Turtle Prion Gene

Turtle prion gene
...Gene organization: 5' UTR, splice acceptor, single exon, 3' UTR
...Signal and GPI peptides
...Turtle repeat: hexapeptide similar to birds
...Invariant region, beta strands, helices, disulfide, glycosylation retained
...Second loop mystery resolved
...EF-hand motif: not really

Evolutionary appearance of turtles relative to birds, mammals
The ancestral amniote prion: alignment of turtle, bird, mammal, and doppel

Turtle prions

FEBS Lett 2000 Mar 3;469(1):33-38 free full text upon guest registration
Tatjana Simonic, Duga S, Strumbo B, Asselta R, Ceciliani F, Severino Ronchi
Office phone: 02 2664343; fax: 02 2666301
Comment (webmaster): This is a very exciting result that resolves a host of old issues and raises new ones. The 959 bp sequence became available at GenBank 17 Mar 00 as accession number AJ245488. The authors, new to prion research, are experienced molecular biologists and have previously sequenced a variety of genes; the paper here shows an excellent awareness of recent prion literature and encompasses everything from genomic sequence to 3D models of turtle prion. (The authors kindly made turtle coordinates available in a pdb format that works well in Swiss-PdbViewer 3.5.)

Advanced methods used here to sequence the turtle prion gene may help push the history of the prion gene even further back in time, central to understanding the normal function of various prion protein domains and paralogues. As noted below, turtles are no longer considered a basal reptilian lineage but instead a sister group to crocodilians, the two clades in turn the nearest neighbor of birds. This topology very much affects interpretation of the turtle prion sequence.

Future research plans for the group reportedly include:

a) cloning of PrP cDNAs from Xenopus laevis and fishes;
b) expression of turtle PrP to obtain antibodies, verification of biochemical properties (dimerization and Ca++ binding,...);
c) purification of prion protein from fish brain;
d) continuing rapid dissemination of results to the prion disease research community via journals like FEBS Letters.

For RT-PCR, total RNA was extracted from the liver of the red-eared slider turtle, Trachemys scripta, and reverse transcribed using degenerate primers based on conserved known prion sequences, obtaining positions 538-620 of the ultimately recovered sequence. Coupling part of the authentic sequence to a successful primer led to recovery of 580-907, which included 94 bp of 3' UTR. The 5' end failed to amplify under a variety of plausible conditions; ultimately 'adaptor-ligated genomic DNA fragments' and a nested PCR step yielded 641 bp including 35 bp of 5' UTR. This was eventually extended to position -52 after further aggravations. The first 25 bp differed from cDNA, suggesting a splice site. This is supported by its canonical ../AG splice acceptor.

The bottom line: a 959 bp sequence composed of 52 bp of an upstream exon, the final 25 bp of an intervening intron, 10 bp of 5' UTR leader, 813 bp of coding region (a 30.1 kDa protein of 270 amino acids) and 94 bp of trailing 3' UTR. [This UTR was not compared to previously determined prion UTR by Blast searches but see below.] Thus the structure of the turtle prion gene is very similar to that seen in mammals: one or more upstream exons and a single uninterrupted coding exon with a short leading and long trailing UTR. Northern blots showed that the gene was transcribed (not a pseudogene): a main band of approximately 2.4 kb was detected, a size similar to other prion mRNAs which have long 3' UTRs.

The turtle sequence thus provides important genomic context through inclusion of part of an upstream exon, the pre-coding splice junction, and a portion of 3' UTR. All too often, mammalian prion sequences -- even in human disease -- have stopped short of providing adequate sequence length. The turtle sequence will allow other investigators to bootstrap off this sequence (or rather, conserved degenerate primers based on both turtle and bird sequences) to find the prion gene of ancestral amniotes and fish.

The inferred turtle protein sequence is unquestionably a prion orthologue with no particular kinship to mammalian doppels. It is said to be about 40, 58, and 20% identical to mammalian, avian, and doppel proteins, respectively. This could be recalculated with a better alignment than that of figure 3: repeat regions should never be aligned across or even within clades because they are not homologous, and the alignment of the loop between helix 2 and 3 is problematic [but see below].

>turtle prion protein 270 aa (signal, mature, GPI):
MGRYRLTCWIVVLLVVMWSDVSFS
KKGKGKGGGGGNTGSNRN PNYPSN PGYPQN PGYPRN PSYPHN PAYPPN PAYPPN PGYPHN PSYPRN PSYPQN PGYPGG GGQHYNPAGGGTNFKNQKPWKPDKPKTNMKAM AGAAAAGAVVGGLGG 
YALGSAMSGMRMNFDRPEERQWWNENSNRYPNQVYYKEYNDRSVPEGRFVRDCVNITVTEYKIDPNENQNVTQVEVRVMKQVIQEMCMQQYQQYQLAS
GVKLLSDPSLMLIIMLVIFFVMH-
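The listing above can be checked against the lengths quoted from the paper. The following is a minimal sketch: the four segment names are illustrative labels for the four lines of the listing (spaces and the trailing dash removed), and the coding length assumes that the 813 bp figure counts the stop codon.

```python
# Segments copied from the listing above; names are illustrative labels only.
SIGNAL = "MGRYRLTCWIVVLLVVMWSDVSFS"
N_TERMINAL = (
    "KKGKGKGGGGGNTGSNRN"
    "PNYPSNPGYPQNPGYPRNPSYPHNPAYPPNPAYPPNPGYPHNPSYPRNPSYPQNPGYPGG"
    "GGQHYNPAGGGTNFKNQKPWKPDKPKTNMKAM"
    "AGAAAAGAVVGGLGG"
)
GLOBULAR = "YALGSAMSGMRMNFDRPEERQWWNENSNRYPNQVYYKEYNDRSVPEGRFVRDCVNITVTEYKIDPNENQNVTQVEVRVMKQVIQEMCMQQYQQYQLAS"
GPI_TAIL = "GVKLLSDPSLMLIIMLVIFFVMH"

protein = SIGNAL + N_TERMINAL + GLOBULAR + GPI_TAIL

# Assumption: the quoted coding length includes one stop codon.
coding_bp = 3 * (len(protein) + 1)

# 1-based residues 247-248, where the text places the Ser-Gly of the GPI site.
gpi_dipeptide = protein[246:248]

print(len(SIGNAL), len(protein), coding_bp, gpi_dipeptide)
```

Under these assumptions the segment lengths agree with the 24-residue signal peptide, the 270-residue precursor, and the 813 bp coding region quoted above.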
The repeat region [see details below] is composed of generally minor variations of the avian doubly periodic hexapeptide, PHNPGY, and is comparable in number (10) to the longest known bird repeats. Neither modified post-signal cleavage arginines nor copper binding was studied; however, the primary sequence can be accommodated into the 3D structure previously proposed for the avian copper-binding repeat.

The core invariant prion region, AGAAAAGAVVGGLGGY, is perfectly conserved and a safe bet to be present in the common ancestor of mammal and bird/turtles at 310 million years. The preceding basic region is quite well preserved back to the post-repeat region, especially relative to birds. The beta strand underpass region also proves to be very stable, though few residues of helix A per se are fixed.

The two primary asparagine-linked glycosylation sites, found at positions 204 and 219 in turtle and conserved in birds and mammalian prion (and the first in doppel), argue for the antiquity of this extra-cellular feature. The third site (Asn 209 in chicken) is absent in turtle, which lacks the recent avian loop duplication, as is a putative doppel glycosylation site. Turtle prion has 3 additional NxS sites but the intervening residue in each case is proline, which is not admissible in the consensus signal.
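The glycosylation sites can be located by scanning the precursor sequence for the N-x-[ST] sequon and flagging matches whose middle residue is proline. This is a minimal sketch: the function name is hypothetical and the sequence is the 270-residue listing given earlier on this page, numbered from the initiating methionine.

```python
import re

# Turtle PrP precursor copied from the listing above (270 residues).
TURTLE_PRP = (
    "MGRYRLTCWIVVLLVVMWSDVSFS"
    "KKGKGKGGGGGNTGSNRN"
    "PNYPSNPGYPQNPGYPRNPSYPHNPAYPPNPAYPPNPGYPHNPSYPRNPSYPQNPGYPGG"
    "GGQHYNPAGGGTNFKNQKPWKPDKPKTNMKAM"
    "AGAAAAGAVVGGLGG"
    "YALGSAMSGMRMNFDRPEERQWWNENSNRYPNQVYYKEYNDRSVPEGRFVRDCVNITVTEYKIDPNENQNVTQVEVRVMKQVIQEMCMQQYQQYQLAS"
    "GVKLLSDPSLMLIIMLVIFFVMH"
)

def find_sequons(seq):
    """Return (1-based position, tripeptide, admissible) for each N-x-[ST] match."""
    hits = []
    # The lookahead lets overlapping candidates be reported as well.
    for match in re.finditer(r"(?=(N.[ST]))", seq):
        tripeptide = match.group(1)
        hits.append((match.start() + 1, tripeptide, tripeptide[1] != "P"))
    return hits

hits = find_sequons(TURTLE_PRP)
admissible = [pos for pos, _, ok in hits if ok]
proline_blocked = [pos for pos, _, ok in hits if not ok]

print("admissible sequons at", admissible)
print("proline-blocked NxS at", proline_blocked)
```

On this sequence the scan reports admissible sequons at 204 and 219 and three N-P-S candidates inside the repeat region, in line with the counts given above.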

The immediate neighborhoods of both cysteines -- the only ones present in mature protein -- are conserved and it is safe to conclude the turtle prion has a similar fold stabilized by a disulfide as other prions. There is no support here for or against an ancient origin of the second disulfide of doppel and no likelihood of inter-molecular disulfide cross-linking.

The first 24 residues comprise a fairly well conserved signal peptide [see below]. The cleavage site predicted from avian homology can be validated with online prediction tools such as PSORT. The last common residues of turtle prion align quite well with the chicken GPI recognition sequence. The turtle and chicken sequences both have Ser-Gly (at positions 247-248 and 248-249, respectively); chicken Gly249 has generally been taken as the GPI substitution site. Still-unpublished methods cited in the Oct JMB paper to analyze doppel for a GPI site would probably support this conclusion.

Secondary and tertiary structure predictions were performed using ProModII. These turned out to be very similar to features known from mammalian prion nmr or inferrable from alignment. The problem with 3D threading is that it is exactly the differences to the reference structures (here rodent and human) that are of interest and yet these are the most problematic aspect of the output.

The paper states that 'the most intriguing feature, unique to the turtle prion, is the presence of an EF-hand Ca(2+) binding motif at positions 213 to 224.' No information in the paper explains the motif search or validation tools used; citations provided are unsatisfactory. Background on EF-hands is provided below; on balance, this motif is not supported though only experiment can ultimately settle the issue of calcium binding.

Gene organization: 5' UTR, splice acceptor, single exon, 3' UTR

14 Mar 00 webmaster
The flanking UTR reported for turtle prion can be studied further by alignment with similarly located regions in other prion sequences. Since alignment of non-coding sequence is subject to statistical flukes, the first step is to see if turtle UTR can pull out prion genomic sequence from the millions of other sequences at GenBank. This is not really expected given the rate at which these regions evolve and the phylogenetic distances involved.

The only avian prion UTRs in GenBank are chicken M61145 (cDNA library: 171 bp is available 5' and 1214 bp 3') and M95404 (genomic: 34 bp is available 5' and 111 bp 3'). By comparing genomic to mRNA, it emerges that chicken has only 2 bp of leading sequence, whereas turtle has 10 (as do placentals and marsupial). Numerous mammalian sequences are available for exon 1, 2, leading UTR, and flanking 3' UTR. These are aligned elsewhere on this site.

Turtle intron 2 is too short to serve as a probe, the distal portion of turtle exon 2 does not align with exon 2 from mammals, the leader UTR is too short to assess statistical significance, and the 3' turtle UTR also fails to align significantly via blastn. (Here, turtle exons are named in parallel with mammals, ie, the coding exon is taken as exon 3.) The only point of interest is that the 10 bp leader sequence in turtle agrees in length with mammal and also in sequence for the first 5 residues preceding the start codon.

In summary, the structure of the turtle prion gene is very similar to other prion genes insofar as this can be determined; it would be necessary to sequence additional reptilian species to assess conservation of upstream exons and promoters.

turtle distal exon 2    ...gggtaaaggtggaggcggtggggcgcgcagcgcagtgcccag
chicken                 ...agtcagaggaagcaaccaccgaccccaagacctcaccccgag
  
turtle distal intron 2  ...tccaaataattactttacag
chicken                 ...gtgtgtccttatgcccgcag
  
turtle leader exon 3       acttatcatc ATG
chicken                            cc ATG
marsupial                  atcacctacc ATG
mammalian consensus        AtAAGtCATC ATG

turtle trailing 3'         gaaaagcagtctcagcctgaactgtgctgatctgtgcaaacgttcagagggaataatctatataaaacagcctctctgttggaggtctctcca...
chicken                    gggatgccgtgccccggccctgtggcagtgagatgacatcgtgtccccgtgcccacccatggggtgttccttgtcctcgcttttgtccatctt... 
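Two of the observations above are easy to verify directly from the aligned fragments: the intron fragments end in the canonical AG acceptor, and the turtle leader shares its last five bases with the mammalian consensus. A minimal sketch, with hypothetical helper and variable names and the sequences copied from the alignment:

```python
def shared_3prime_run(a, b):
    """Length of the identical run ending at the 3' end of both sequences (case-insensitive)."""
    run = 0
    for x, y in zip(reversed(a.lower()), reversed(b.lower())):
        if x != y:
            break
        run += 1
    return run

# Leader sequences immediately preceding the ATG, copied from the alignment above.
leaders = {
    "turtle": "acttatcatc",
    "chicken": "cc",
    "marsupial": "atcacctacc",
    "mammalian consensus": "AtAAGtCATC",
}

# Distal ends of intron 2, copied from the alignment above.
intron_ends = {
    "turtle": "tccaaataattactttacag",
    "chicken": "gtgtgtccttatgcccgcag",
}

turtle_vs_mammal = shared_3prime_run(leaders["turtle"], leaders["mammalian consensus"])
acceptors_ok = all(seq.endswith("ag") for seq in intron_ends.values())

print("bases shared before ATG (turtle vs mammal):", turtle_vs_mammal)
print("all intron fragments end in AG:", acceptors_ok)
```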

Signal and GPI peptide regions

13 Mar 00 webmaster
The N-terminus of turtle PrP has 24 residues recognized by consensus recognition software [reviewed: Curr Opin Biotechnol 2000 Feb;11(1):13-8] or comparison to avian sequences as endoplasmic reticulum targeting signal peptide. Similarly, a GPI anchor sequence beginning with Ser-Gly at positions 247-248 can be identified carboxy-terminally. These targeting motifs are a conserved property of all prion orthologues and paralogues.

Signal peptides (and GPI segments) do not have tertiary structure in the sense of a globular protein domain. Signal peptides are transient helical features recognized during ribosomal extrusion by the very ancient signal recognition particle (SRP), a recently crystallized and reviewed protein-RNA complex. Signal sequences may bind to both protein and RNA: their positively charged region (which usually precedes the hydrophobic core) contacts the RNA directly, and the hydrophobic core nests into the methionine-lined pocket of SRP's M domain, triggering a conformational change that targets the nascent peptide (and the ribosome in which it sits) to the SRP receptor in the ER membrane.

Alignment of prion signal peptides must acknowledge the possibility of convergent evolution (which inflates percent identity): thousands of unrelated proteins must be acceptable to the same processing machinery, from SRP to endo-protease. Amino acid composition and length are both constrained; the signal peptide will begin with methionine and continue with a stretch rich in arg, his, and lys (the only positively charged amino acids), then continue from a limited menu of non-helix-breaking hydrophobic residues, and end before a stretch of basic residues.

Keeping this in mind, the turtle prion supports an ancestral signal peptide length of 24-26. This issue had been somewhat confused by a synapomorphic two residue deletion affecting rodents-primates-lagomorphs. Mammalian prions are not particularly well conserved within the signal region; their alignment to bird prion signal is already somewhat dubious (only 2 of 9 avian sequences cover this region). The rapidly evolving doppel signal peptide was analyzed in detail earlier: only 11 residues are conserved between rodents and primate. None of these are conserved in vertebrate prions.

In short, the situation is unfavorable for alignment of turtle signal peptide because it is an isolated long branch, the neighboring avian clade is poorly represented, and the region is evolving rapidly while trapped in the potential well of SRP convergence and lacks 3D structure supporting distant alignment. On the other hand, the depth of divergence is not so great relative to the rate of mutational fixation that all similarities have been obliterated: a few ancestral values can even be reliably inferred.

The GPI anchors can be aligned anchoring on the final disulfide cysteine and the attachment site. The alignment suffers from long branch effects due to singlet turtle, bird, and marsupial sequences, as well as from a non-determinable 3D structure. Incomplete sequences of Schatzl et al. once again are costly. Taxa should always be sequenced in moderately similar groups of three. PM Harrison's promised study of known mammalian GPI sites remains unpublished and unimplemented as a web tool; requirements of the ER enzyme attaching complex are not as well understood as SRP.

Turtle aligns quite well with chicken (which has an anomalous histidine post-attachment). In helix C, mammals seem to have an extra two terminal helical turns (5-6 residues) relative to turtle/avian and doppel. (This allows the second disulfide of doppel to form without massive rearrangement.) Curiously, turtle has serine at position 238; P238S is a reported CJD mutation. This is the most gappy region of the prion protein, suggesting fewer consequences to not conserving length. This makes it difficult to predict ancestral amniote sequence other than in qualitative terms.

Turtle repeat: hexapeptide similar to birds

14 Mar 00 webmaster research
Turtle prion repeat adds some new dimensions to speculation on repeat region evolution and function. While clearly in the form of avian biperiodic proline-based hexarepeat (rather than mammalian mono-periodic octarepeat), turtle repeat breaks some of the rules. Some long-standing invariant residues are changed [see line 2 of graphic below], apparently resulting in the inability of some repeat units to bind copper or other metals, at least using conventional histidine ring nitrogens and glycine carbonyls.

Metal binding by turtle repeat will have to be determined experimentally. The covalently modified arginine fiasco, now in its twelfth year, involves R25 and R37 (primate numbering); the former is lysine in birds, turtle, and marsupial, but the latter arginine is conserved (turtle residue 41, in SNRNP) -- experimental work on native protein is again necessary to determine whether it is modified. (Doppel conserves both arginine sites.)

However, the overall length (60 residues or 22%) and preponderance of highly basic residues (21:1 relative to acidic, of 112 residues) in flanking regions are comparable to previously studied prions. This discrimination against acid residues argues that charge and solubility per se cannot explain the observed conserved pattern: phospholipid, nucleic acid, and heparin sulfate are three prominent, negatively charged candidates for binding partner.

Turtle sequence suggests important roles for the tyrosine in third position and asparagine in sixth position, which remain conserved in birds (and are comparable to W and Q in mammalian octarepeat). Despite some variation at second and fifth position, their chemical character is largely retained: small, uncharged at position 2, polar or basic at position 5. Note also that the first capping repeat has a polar asparagine in second position, recalling the 5' capping hydrophilic serine here in birds and glutamine in mammals.

Turtle repeat domain has an odd central duplicated repeat, PAYPPN. While not necessarily disruptive of the collagen-like coil proposed for this region, these units seem unlikely to bind copper; indeed only two non-adjacent units have histidine. However, it is not necessary for every repeat unit to be functional in this regard since 10 repeats occur in this protein. The repeats and flanking regions occupy 45% of the entire protein.

Turtle repeats are stabilized against replication slippage events, having more heterogeneity than avian or mammalian repeat domains, meaning fewer over-writes due to insertion and deletion of repeat units. Even the PAYPPN units do not seem of recent origin even though local point mutations often accompany insertion events. This suggests the turtle repeat length is not anomalous for its lineage.

In one functional scenario, the long arm of the repeat reaches out to catch potentially damaging free copper (of especial concern in the vicinity of the synapse), then cycles this copper to the cell interior where it can be safely stored or used. Here both length and copper-binding capability have selective value, with length-only hexapeptides retained provided that they follow the P-small-Y-P-polar-N format needed for consistent modular tertiary structure, whose maintenance over immense time frames is completely contradictory to the unstructured random coil structure proposed from nmr in EDTA.

It is simply not plausible to posit radically different roles for the repeat region in different lineages and functional proposals (such as superoxide dismutase activity) must be viable across all lineages.

Microheterogeneity in turtle repeat domain:
cccaactatcccagcaac
 P  N  Y  P  S  N 
cccggctacccccaaaat
 P  G  Y  P  Q  N 
cctggctatcccagaaac
 P  G  Y  P  R  N  
cctagctacccccataat
 P  S  Y  P  H  N 
cctgcctacccccccaat
 P  A  Y  P  P  N  
cccgcctatccccctaat
 P  A  Y  P  P  N 
cccggctacccccacaat
 P  G  Y  P  H  N 
cccagttaccccaggaac
 P  S  Y  P  R  N   
cctagctacccccagaat
 P  S  Y  P  Q  N 
cctggctaccctggtggt
 P  G  Y  P  G  G
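The codon-level listing above can be re-translated mechanically to confirm the hexapeptides and to count how much of the diversity is silent. A minimal sketch using the standard genetic code; the table construction and variable names are illustrative, and the DNA strings are copied from the listing:

```python
from itertools import product

# Standard genetic code, bases ordered t, c, a, g at each codon position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The ten repeat units, copied from the listing above.
REPEAT_DNA = [
    "cccaactatcccagcaac",
    "cccggctacccccaaaat",
    "cctggctatcccagaaac",
    "cctagctacccccataat",
    "cctgcctacccccccaat",
    "cccgcctatccccctaat",
    "cccggctacccccacaat",
    "cccagttaccccaggaac",
    "cctagctacccccagaat",
    "cctggctaccctggtggt",
]

def translate(dna):
    """Translate an in-frame DNA string codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

units = [translate(dna) for dna in REPEAT_DNA]
histidine_units = [i + 1 for i, unit in enumerate(units) if "H" in unit]

for dna, unit in zip(REPEAT_DNA, units):
    print(dna, unit)
print("distinct DNA units:", len(set(REPEAT_DNA)), "distinct peptides:", len(set(units)))
print("units containing histidine:", histidine_units)
```

All ten DNA units are distinct while only nine peptides are, since the two PAYPPN units differ at synonymous positions; histidine appears only in units 4 and 7.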
Mammalian prion flanking regions are alignable but the repeat region itself is not homologous, having resulted from an independent replication slippage event of different character. Doppel aligns very poorly and only briefly post-signal; doppel may never have had a repeat region, or more likely, after tandem duplication a large deletion took out its repeat and core invariant region. Mature doppel also has a very basic amino terminus, so this is likely an ancient feature.

In summary, this region of turtle prion is fairly conventional but expands the horizons of diversity. The prion protein tends to preserve its qualitative, presumably functional, domain structures -- signal, basic, repeat, basic, invariant, globular domain, disulfide, glycosylation, GPI -- long after precise details have blurred and percent homologies become problematic to estimate. This domain structure conservation is a somewhat distinct aspect from tertiary fold (which will be conserved as well).

Invariant region, beta strands, helices, disulfide, glycosylation retained

19 Mar 00 webmaster research

Alignment of the central region of turtle prion is routine. Turtle sequence expands the possibilities at many formerly invariant positions, which has the effect of reinforcing the significance of the continuing perfect invariance of the 106-126 domain.

In the central region, turtle is mostly of avian character, for example having the double tryptophan of birds instead of the double tyrosine of mammals. (Near the first beta strand, there is better agreement with mammal at 3 positions -- under-dotted in the graphic). Trp and tyr, though both of the aromatic ring class, cannot be interconverted by a single point mutation, and two point mutations in series must pass through cysteine or a stop codon. One wonders what residues are found here in the common tetrapod ancestor and if a 'repeat aromatic' is critical structurally (edge stacking).
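The codon argument for tryptophan and tyrosine can be checked by enumeration. A minimal sketch over the standard genetic code; the helper names are hypothetical:

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons_for(residue):
    """All codons encoding the given one-letter residue ('*' for stop)."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == residue]

def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))

def two_step_intermediates(src, dst):
    """Residues encoded midway along every two-mutation path from src to dst codons."""
    seen = set()
    for s in codons_for(src):
        for d in codons_for(dst):
            if hamming(s, d) != 2:
                continue
            for mid, aa in CODON_TABLE.items():
                if hamming(s, mid) == 1 and hamming(mid, d) == 1:
                    seen.add(aa)
    return seen

min_distance = min(hamming(s, d) for s in codons_for("W") for d in codons_for("Y"))
intermediates = two_step_intermediates("W", "Y")

print("minimum Trp-Tyr codon distance:", min_distance)
print("intermediates on two-step paths:", sorted(intermediates))
```

TGG (Trp) differs from both TAT and TAC (Tyr) at two positions, and the only intermediate codons are TGT or TGC (Cys) and TAG (stop).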

Nine central residues are invariant in all species as well as in doppel. Turtle, avian, and doppel align so well in this region that their 3D folds are over-determined (severely constrained by identities to mammalian residue positioning). Doppel alignment breaks down N-terminally (despite a run of 17 residues conserved in 3 species) and the structure of this 42 amino acid region and question of its first beta strand will have to be determined experimentally. However, turtle prion strongly supports an earlier concept that the invariant structural core of the prion protein is based primarily on packing at the underpass region and spatially adjacent domains.

Second loop mystery resolved?

14 Mar 00 webmaster research
A paradoxical aspect of the prion protein is the evolution and function of the second loop (between alpha helix B and C). It has not been possible to meaningfully align bird and mammal prion in this region due to quite different lengths and amino acid compositions despite reliable flanking anchors (the disulfide cysteines). No plausible sequence of point mutations or deletions (even of 3n type that avoid frameshifts while allowing phaseshifts) could derive both mammal and avian second loop from a common ancestor.

The paradox is that the loop region is quite well conserved, to different net effect, in both mammal and avian lineages. The overall impression is of a gene region that evolved for some period rather like unconstrained intronic or intergenomic spacer but subsequently acquired function (in addition to the glycosylation site) that greatly limited further change over the last 90 million years or so. This makes it difficult to envision a common functionality, much less reconstruct the ancestral sequence.

This region can be reliably located in doppel, again anchoring to the interior flanking conserved disulfide. But doppel has degenerated so fast that it is scarcely alignable with either bird or mammal and so sheds no light on the origin or evolution of the loop domain. By threading to mammalian 3D nmr structures, it is possible in both doppel and bird to identify individual residues playing comparable roles (which, strictly speaking, may not be exactly homologous, or related by evolutionary descent). Nmr structures display positioning of individual loop residues and quantitate loop rigidity in solution.

Nmr structures independently determined in 3 species all have a 5 residue loop, GENFT, at positions 195-199 (PDB: human 1QLX, mouse 1AG2, and hamster 1B10). The corresponding residues in turtle are NQNVT; however, turtle helix B has a proline (normally helix-ending) near the end of helix B and is missing one residue relative to mammals, PNE-NQNVT. The onset of helix C is correctly predicted by software tools. The loop carries one of the conserved glycosylation sites and 2 of the 5 residues can be altered in familial CJD.

Happily, the turtle/crocodilian lineage illuminates the history of this domain (see graphic). Cys-to-cys lengths are similar in turtle, mammal, and doppel at 35, 36, and 35 which suggests this length is ancestral, implying that birds, with 46 residues, have 11 extra amino acids. [Note: placental mammals (101 sequences) all have the same inter-cysteine length with the exception of a camel allele M205-.]
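The turtle cys-to-cys length can be recomputed from the sequence listing. A minimal sketch, assuming that the quoted lengths count both cysteines and that the globular segment of the listing starts at precursor residue 150 (24 signal residues plus 125 N-terminal residues precede it); the variable names are illustrative:

```python
# Globular segment copied from the turtle listing earlier on this page.
GLOBULAR = "YALGSAMSGMRMNFDRPEERQWWNENSNRYPNQVYYKEYNDRSVPEGRFVRDCVNITVTEYKIDPNENQNVTQVEVRVMKQVIQEMCMQQYQQYQLAS"
FIRST_RESIDUE = 150  # assumed precursor numbering of the first residue above

# 1-based precursor positions of the disulfide cysteines.
cysteines = [FIRST_RESIDUE + i for i, aa in enumerate(GLOBULAR) if aa == "C"]

# Assumption: the inter-cysteine length is inclusive of both cysteines.
turtle_span = cysteines[-1] - cysteines[0] + 1

BIRD_SPAN = 46  # avian length quoted in the text above
extra_in_birds = BIRD_SPAN - turtle_span

print("cysteines at", cysteines)
print("turtle cys-to-cys length:", turtle_span)
print("extra residues in birds:", extra_in_birds)
```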

The webmaster proposes a scenario in which an internal duplication of this region occurred in the avian lineage subsequent to its divergence from turtle. Since the most basal bird, ostrich, was not sequenced in this region, this putative event (possibly an internal replication slippage) can only be dated to the 150 million year interval between divergence of birds from turtle/crocodilians and the divergence of galloanserinae. Additional turtle/crocodilian, paleognath bird, and other reptilian sequences would separate out lineage-wide developments from changes unique to the red-eared slider (an isolated or long branch sequence).

The avian loop region duplication has not fully settled down; some 7 positions are hypervariable, though the third glycosylation site has been retained (and perhaps drove fixation of this rare event). It seems that both helices have gotten longer, which increases their terminal divergence so that a longer loop can be accommodated, rather than a secondary loop developing. Only one glycosylation site is universal: that following the first cysteine of the universal disulfide.

In summary, the turtle sequence resolves the second loop paradox in favor of the length seen in mammals, not birds.

The amniote common ancestor is predicted to have an inter-cysteine length of 35 amino acids, a 5 residue loop, and a single loop glycosylation site. Some 12 residues of the ancestral amniote sequence can be reliably predicted based on their conservation over immense time scales: FV.DCVNITV............N.T......M..V...MCI.QY. These may suffice for degenerate primer design in earlier-branching organisms. The turtle sequence also establishes the helix-loop-helix fold seen in mammalian prion nmr as ancient. Dimly related doppel still appears to have diverged prior to the separation of mammals and bird/turtle and retains only 7 of the 12 conserved residues, though it still retains the overall fold.

Backgrounder on EF-hands

13 Mar 00 webmaster: adapted from Medline, PDB, SwissProt
The paper states that 'the most intriguing feature, unique to the turtle prion, is the presence of an EF-hand Ca(2+) binding motif C-terminally.... A canonical 12-residues segment containing the Ca2+ binding ligands was found at positions 213 to 224. The flanking sequences correspond to the second and third α-helices and may be consistent with such secondary structure, the only difference concerning an early break of the first helix (the putative E-helix) by the Pro214. This perfectly agrees with the canonical start of the inter-helical loop of an EF-hand motif....The presence of a similar disulphide bridge has been reported in the first EF-hand motif found in an extracellular protein, the matrix protein BM40....Usually at least two Ca2+ binding sites are present in a same polypeptide chain, however a single functional site has been reported for E.coli lytic transglycosylase Slt35.'

The authors add that there is no experimental support for calcium binding but oddly they do not use their own freshly minted 3D model of turtle prion for a comparison to standard EF-hands at PDB; indeed figure 5b shows 2 EF-hand residues assigned to helix B and 3 to helix C, leaving the EF-hand short of the required 12 flanked residues. This structural motif, provided that it is a valid reading of the primary sequence data, would raise a great many questions about the functional role of this motif and why it is not in other prion lineages.

This claim must be viewed with great caution given limited documentation within the paper. Many earlier structural motifs proposed for the prion protein have been quietly abandoned. In the past, the webmaster has wasted a huge amount of time chasing down rubbish that should never have been submitted for publication and if so should have been rejected during peer review. Expert volunteer help in assessing EF hands, not used by the authors, is readily available from Cox J.A. in Switzerland and Kretsinger R.H. in Virginia.

Suspicions are quickly raised here: the very remote E. coli example justifying a single EF-hand is actually a proteolytic fragment of a larger membrane protein. Three better mammalian examples are quickly found at Prosite: myosin regulatory light chains, osteonectin, and FAD G3P dehydrogenase. The vast majority of mammalian proteins with EF-hands have multiple EF-hands; 984 EF-hands in 423 different sequences have been validated. Turtle has a single candidate EF-hand surrounding an otherwise conserved ER-glycosylation site, making characteristic conformational shifts upon calcium binding difficult to envision.

While it is true that online tools do recognize this EF-hand candidate from primary sequence (justifying some mention of it in the paper) and the required flanking secondary structures (alpha helices B and C) are present, the real definition of an EF-hand lies in its tertiary fold (including handedness of the helices) and actual calcium binding properties. (See list of links to 3D structures.) The helices in prion protein are at altogether wrong angles for an EF-hand; these angles cannot be changed without disrupting highly evolved interior packing relationships that provide the globular domain with stability.

Online tools also give many false positives -- indeed, the very Prosite and Profilescan reporting back the EF-hand also find 3 protein kinase C sites, 3 additional casein kinase II sites, and 6 N-myristoylation sites (as well as the generally accepted N-glycosylation sites at 204-207 and 219-222). Some 120 false positive EF-hands are known -- proteins that satisfy the consensus sequence but do not bind calcium.

The EF-hand tools themselves do not output a statistically assessable score. Precision of the primary sequence tests (true hits / (true hits + false positives)) is 89.1%; recall (true hits / (true hits + false negatives)) is 97.4%. The consensus sequence is much more complex to describe than a glycosylation site and is by no means a hard-and-fast rule.
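The quoted precision can be reproduced from the counts given earlier in this section. A minimal sketch, assuming that the 984 validated EF-hands are the true hits and the 120 known false positives are the only false positives; the number of false negatives is not stated on this page, so it is back-solved from the quoted recall and should be read as an estimate only:

```python
true_hits = 984          # validated EF-hands cited above (assumed to be the true hits)
false_positives = 120    # false positive EF-hands cited above
reported_recall = 0.974  # recall quoted in the text

# precision = true hits / (true hits + false positives)
precision = true_hits / (true_hits + false_positives)

# Assumption: solve recall = TP / (TP + FN) for FN, since FN is not given.
implied_false_negatives = round(true_hits * (1 / reported_recall - 1))
recall_check = true_hits / (true_hits + implied_false_negatives)

print(f"precision = {precision:.1%}")
print(f"implied false negatives = {implied_false_negatives}")
print(f"recall with that count = {recall_check:.1%}")
```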

Positions 1, 3 and 12 are the most conserved. The 6th residue in an EF-hand loop is in most cases glycine (turtle is glutamine), but the number of exceptions to this 'rule' has gradually increased (and includes gln). In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues involved in the binding via oxygen atoms are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens (bidentate ligand) for calcium, so that the metal ion is coordinated by seven oxygen ligands in all. The average calcium-oxygen distance is 2.4 Å.

The EF-hand fold consists of a helix-loop-helix module with the two helices at right angles forming the finger (helix E) and the thumb (helix F) of a right hand [carp parvalbumin nomenclature]. The loop between helices E and F contains the 12 residue EF-hand consensus sequence:

D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]
D-P-  N  -   E    -   N    -    Q    - N  -   V   -     T     - QV - E        V    turtle 213 to 224
1-2-  3  -   4    -   5    -    6    - 7  -   8   -     9     -10/11-12 -    13    EF-hand numbering
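The primary-sequence match can be reproduced by converting the consensus pattern above into a regular expression and scanning the turtle precursor. A minimal sketch: the converter is a hypothetical helper that handles only the syntax elements used in this pattern (x, [..], {..}, and repeat counts), and a hit says nothing about fold or calcium binding.

```python
import re

# Consensus pattern copied from the line above.
EF_HAND = "D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]"

# Turtle PrP precursor copied from the listing earlier on this page (270 residues).
TURTLE_PRP = (
    "MGRYRLTCWIVVLLVVMWSDVSFS"
    "KKGKGKGGGGGNTGSNRN"
    "PNYPSNPGYPQNPGYPRNPSYPHNPAYPPNPAYPPNPGYPHNPSYPRNPSYPQNPGYPGG"
    "GGQHYNPAGGGTNFKNQKPWKPDKPKTNMKAM"
    "AGAAAAGAVVGGLGG"
    "YALGSAMSGMRMNFDRPEERQWWNENSNRYPNQVYYKEYNDRSVPEGRFVRDCVNITVTEYKIDPNENQNVTQVEVRVMKQVIQEMCMQQYQQYQLAS"
    "GVKLLSDPSLMLIIMLVIFFVMH"
)

def prosite_to_regex(pattern):
    """Convert a simple dash-separated consensus pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # Bracketed sets and single letters are already valid regex.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

regex = prosite_to_regex(EF_HAND)
matches = [(m.start() + 1, m.group()) for m in re.finditer(regex, TURTLE_PRP)]

print(regex)
print(matches)
```

The only hit begins at residue 213, so the 12-residue loop spans 213 to 224 with valine 225 filling the thirteenth pattern position.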
A major problem here is that calcium-binding proteins sharing the EF-hand generally belong to the same evolutionary family, which is not the case for turtle prion. For many proteins, calcium is essential for folding, thermostability, reactivity and function. EF-hand proteins undergo conformational changes upon binding of Ca2+; it is hard to see how this could be restricted to just the turtle prion lineage.

The webmaster has verified that no EF-hand is predicted by Profilescan for chicken and other avian prions, though oddly the kinase and N-myristoylation sites are still supported across species. Avian sequences are the closest available sequences to turtle.

We are left with three scenarios, possibilities that could be resolved by further sequencing and by direct calcium binding studies. In the webmaster's opinion, the first scenario is correct:

(1) The turtle prion primary sequence came by accident to superficially resemble an EF-hand (without functional or structural calcium binding) in a variable loop region which is not well conserved between birds and mammal, or for that matter between birds and turtle; the glycosylation site remains the main function.

(2) The turtle prion sequence has a newly evolved paralogous EF-hand that binds calcium in a functionally significant manner. Other species beyond the avian/turtle divergence, such as crocodilians, may also possess the EF-hand, but the ancestral amniote did not.

(3) Alternatively, the turtle prion sequence contains a relic of an ancient EF-hand that has been lost, along with calcium-binding functionality, in the mammalian and avian lineages.

Ironically, protocadherin (which has been proposed as a high-affinity ligand for prion protein) is otherwise involved in calcium-dependent cell-cell adhesion. Its calcium binding site however is not an EF-hand. Presenilin 2 binds with sorcin, a penta-EF-hand Ca2+-binding protein that serves as a modulator of the ryanodine receptor (RyR) intracellular Ca2+ channel but itself does not contain a calcium site [J. Biol. Chem, 10.1074/jbc.M909882199].

For reference, proteins known to contain EF-hand regions (Ca = number of EF-hand regions, known or supposed; a question mark marks an uncertain count):
  - Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
  - Alpha actinin (Ca=2).
  - Calbindin (Ca=4).
  - Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).
  - Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
  - Calcium-binding protein from Schistosoma mansoni (Ca=2?).
  - Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena thermophila (Ca=4?).
  - Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
  - Calcium vector protein from amphioxus (Ca=2).
  - Calcyphosin (thyroid protein p24) (Ca=4?).
  - Calmodulin (Ca=4, except in yeast where Ca=3).
  - Calpain small and large chains (Ca=2).
  - Calretinin (Ca=6).
  - Calcyclin (prolactin receptor associated protein) (Ca=2).
  - Caltractin (centrin) (Ca=2 or 4).
  - Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).
  - Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).
  - FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals (Ca=1).
  - Fimbrin (plastin) (Ca=2).
  - Flagellar calcium-binding protein (1f8) from Trypanosoma cruzi (Ca=1 or 2).
  - Guanylate cyclase activating protein (GCAP) (Ca=3).
  - Inositol phospholipid-specific phospholipase C isozymes gamma-1 and delta-1 (Ca=2).
  - Intestinal calcium-binding protein (ICaBPs) (Ca=2).
  - MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).
  - Myosin regulatory light chains (Ca=1).
  - Oncomodulin (Ca=2).
  - Osteonectin (basement membrane protein BM-40) (SPARC).
  - Parvalbumins alpha and beta (Ca=2).
  - Placental calcium-binding protein (18a2) (nerve growth factor induced protein 42a) (p9k) (Ca=2).
  - Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).
  - Reticulocalbin (Ca=4).
  - S-100 protein, alpha and beta chains (Ca=2).
  - Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).
  - Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), Lps-1 (Ca=8).
  - Serine/threonine protein phosphatase rdgc (EC 3.1.3.16) from Drosophila (Ca=2).
  - Sorcin V19 from hamster (Ca=2).
  - Spectrin alpha chain (Ca=2).
  - Squidulin (optic lobe calcium-binding protein) from squid (Ca=4).
  - Troponins C; from skeletal muscle (Ca=4), from cardiac muscle (Ca=3), from arthropods and molluscs (Ca=2).
 
Recommended reading:
Protein Prof. 2:305-490(1995). 
Trends Biochem Sci 1996 Jan;21(1):14-7
Nat Struct Biol 1997 Jul;4(7):514-6

Evolutionary appearance of turtles relative to birds, mammals

13 Mar 00 webmaster
While the turtle sequence is very good news indeed, a single species makes for a very long branch (here 228 million years); sequencing should always be done in triples. Trachemys scripta is a hidden-neck terrapin (as opposed to a side-necked turtle); there are seven major families of turtles. Of course, having one turtle sequence in hand greatly simplifies the task of sequencing other priority species.

The modern consensus view of turtle evolution [see below] implies that the prion sequence of turtles will align most closely with that of crocodilians and then with that of birds; turtles are thus not a basal amniote lineage. It is therefore no surprise that the red-eared slider has a hexapeptide repeat unit like that of birds; this will likely be the case in their last common ancestor as well as in crocodilians. Turtle data cannot resolve the avian anomaly: the nature of the common ancestor of the hexa- and octapeptide repeat lineages (which cannot have evolved from each other by point mutation).

By great good fortune, Hedges et al. published a major bird phylogenetic paper [all extant orders were sampled] this week in Mol Biol Evol 17#3 Mar 00 pg 451; the gist of it for bird prions is that (ostrich, (duck, chicken), ...) is the correct branching order and that Paleognathae are evolving much more slowly (ie, are closer to the ancestral sequence), Galloanserae are intermediate, and Neognathae the fastest. This, along with the turtle sequence as outgroup, very much helps in determining the ancestral bird prion.

The webmaster earlier located the prion gene in zebrafish and fugu, adjacent to the well-conserved KIAA0168 gene. No ESTs related to prion or doppel can yet be found in the databases; fugu would have little intervening repeat material making it quite feasible to sequence the prion gene from a KIAA0168 clone.

A molecular phylogeny of reptiles.

Science 1999 Feb 12;283(5404):998-1001
Hedges SB, Poling LL
Commentary piece by Olivier Rieppel
See also Rieppel, O., Reisz, R. R. (1999). The origin and early evolution of turtles. Annu. Rev. Ecol. Syst. 30: 1-22
See also Kumar S, Hedges SB, Nature 1998 Apr 30;392(6679):917-20
The classical phylogeny of living reptiles pairs crocodilians with birds, tuataras with squamates, and places turtles at the base of the tree. New evidence from two nuclear genes, and analyses of mitochondrial DNA and 22 additional nuclear genes, join crocodilians with turtles and place squamates at the base of the tree. Morphological and paleontological evidence for this molecular phylogeny is unclear. Molecular time estimates support a Triassic origin for the major groups of living reptiles.

The time estimates indicate that squamates diverged from the other reptiles at 245 ± 12.2 million years ago (Ma) (9 genes), birds diverged from the lineage leading to turtles and crocodilians at 228 ± 10.3 Ma (17 genes), and that turtles diverged from crocodilians at 207 ± 20.5 Ma (7 genes). These divergence times are close to when the first turtles (223 to 210 Ma) and crocodilians (210 to 208 Ma) appear in the fossil record and earlier than the first birds (152 to 146 Ma) and first squamates (157 to 155 Ma).
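
To make the topology and its dates easy to query, here is a small Python sketch. It encodes only the three divergence estimates quoted above (error bars omitted); the nested-tuple layout and the function names are illustrative assumptions, not anything taken from the paper.

```python
# Each internal node is (age_in_Ma, left_child, right_child); leaves are names.
# Ages are the point estimates quoted above from Hedges and Poling.
REPTILE_TREE = (245.0, "squamates",
                (228.0, "birds",
                 (207.0, "turtles", "crocodilians")))


def leaves(node):
    """Set of taxon names at or below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def divergence_time(node, a, b):
    """Age (Ma) of the most recent common ancestor of taxa a and b."""
    if isinstance(node, str):
        raise ValueError("taxa do not diverge at or below this node")
    age, left, right = node
    for child in (left, right):
        if {a, b} <= leaves(child):
            # Both taxa sit inside one child: the split is younger than here.
            return divergence_time(child, a, b)
    if {a, b} <= leaves(node):
        return age
    raise ValueError("taxon not in tree")


def path_length(node, a, b):
    """Total time separating two living taxa: down one lineage, up the other."""
    return 2 * divergence_time(node, a, b)


if __name__ == "__main__":
    print(divergence_time(REPTILE_TREE, "turtles", "birds"))
    print(path_length(REPTILE_TREE, "turtles", "birds"))
```

The second function is the reason a lone turtle sequence is awkward: the evolutionary path between a living turtle and a living bird is twice the age of their split.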

Complete mitochondrial DNA sequences of the green turtle and blue-tailed mole skink: statistical evidence for archosaurian affinity of turtles.

Mol Biol Evol 1999 Jun;16(6):784-92 
Kumazawa Y, Nishida M
Turtles have highly specialized morphological characteristics, and their phylogenetic position has been under intensive debate. Previous molecular studies have not established a consistent and statistically well supported conclusion on this issue. In order to address this, complete mitochondrial DNA sequences were determined for the green turtle and the blue-tailed mole skink. These genomes possess an organization of genes which is typical of most other vertebrates, such as placental mammals, a frog, and bony fishes, but distinct from organizations of alligators and snakes. Molecular evolutionary rates of mitochondrial protein sequences appear to vary considerably among major reptilian lineages, with relatively rapid rates for snake and crocodilian lineages but slow rates for turtle and lizard lineages. In spite of this rate heterogeneity, phylogenetic analyses using amino acid sequences of 12 mitochondrial proteins reliably established the Archosauria (birds and crocodilians) and Lepidosauria (lizards and snakes) clades postulated from previous morphological studies.

The phylogenetic analyses further suggested that turtles are a sister group of the archosaurs, and this untraditional relationship was provided with strong statistical evidence by both the bootstrap and the Kishino-Hasegawa tests. This is the first statistically significant molecular phylogeny on the placement of turtles relative to the archosaurs and lepidosaurs. It is therefore likely that turtles originated from a Permian-Triassic archosauromorph ancestor with two pairs of temporal fenestrae behind the skull orbit that were subsequently lost. The traditional classification of turtles in the Anapsida may thus need to be reconsidered.

Molecular evidence for a clade of turtles.

Mol Phylogenet Evol 1999 Oct;13(1):144-8
Mannen H, Li SS
Although turtles have been generally grouped with the most primitive reptile species, the origin and phylogenetic relationships of turtles have remained unresolved to date. To confirm the phylogenetic position of turtles in amniotes, we have cloned and determined the cDNA sequences encoding for skink lactate dehydrogenase (LDH)-A and LDH-B, snake LDH-A, and African clawed frog LDH-A; four alpha-enolase cDNA sequences from turtle, alligator, skink, and snake were also cloned and determined. All of these eight cDNA sequences, as well as the previously published LDH-A, LDH-B, and alpha-enolase of mammals, birds, reptiles, and African clawed frog, were analyzed by the phylogenetic tree reconstruction methods of neighbor-joining, maximum parsimony, and maximum likelihood.

In the phylogenetic analyses, the turtle was found to be closely related to the alligator. Also, we found that the turtle had diverged after the divergence of squamates and birds. This departs from previous hypotheses of turtle evolution and further suggests that turtles are the latest of divergent reptiles, having been derived from an ancestor of crocodilian lineage within the last 200 million years.

Warm-blooded isochore structure in Nile crocodile and turtle.

Mol Biol Evol 1999 Nov;16(11):1521-7
Hughes S, Zelus D, Mouchiroud D
The genomes of warm-blooded vertebrates are characterized by a strong heterogeneity in base composition, with GC-rich and GC-poor isochores. The GC content of sequences, especially in third codon positions, is highly correlated with that of the isochore they are embedded in. In amphibian and fish genomes, GC-rich isochores are nearly absent. Thus, it has been proposed that the GC increase in a part of mammalian and avian genomes represents an adaptation to homeothermy. To test this selective hypothesis, we sequenced marker protein genes in two cold-blooded vertebrates, the Nile crocodile Crocodylus niloticus (10 genes) and the red-eared slider Trachemys scripta elegans (6 genes).

The analysis of base composition in third codon position of this original data set shows that the Nile crocodile and the turtle also exhibit GC-rich isochores, which rules out the homeothermy hypothesis. Instead, we propose that the GC increase results from a mutational bias that took place earlier than the adaptation to homeothermy in birds and before the turtle/crocodile divergence.

Surprisingly, the isochore structure appears more similar between the red-eared slider and the Nile crocodile than between the chicken and the Nile crocodile. This point questions the phylogenetic position of turtles as a basal lineage of extant reptiles. We also observed a regular molecular clock in the Archosauria, which enables us, by using a more extended data set, to confirm Kumar and Hedges's dating of the bird-crocodile split.
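
The quantity at issue in this abstract, GC content at third codon positions, is simple to compute. A minimal Python sketch follows, assuming an in-frame coding sequence that starts at the first base of a codon; the function name and the toy input are made up, and no real turtle or crocodile sequence is used.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    thirds = cds[2:usable:3]              # bases at positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # Toy sequence, three codons: ATG GCC AAA
    print(gc3("ATGGCCAAA"))
```

Comparing this value across marker genes of several species is the kind of analysis the abstract describes; the statistics used by the authors are not reproduced here.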

The ancestral amniote prion: alignment of turtle, bird, mammal, and doppel

19 Mar 00 webmaster commentary
The alignment of all members of the prion family is given piecewise by the graphics above. Mammals, for which by far the most sequences are available, are shown for brevity as ancestral sequence plus major variants. Software performs poorly on alignment of prion protein because of compositional similarity of non-homologous repeats, the divergence and non-alignable regions of doppel, and gaps in loop regions. Software does not produce an 'objective' alignment; it simply reflects the built-in biases of its programming (consistency is in part systematic error). A tool that works fairly well on all proteins does not always work well on individual cases.

For these reasons, it is best to anchor first to as many invariant and near-invariant residues as possible, then align in between clamped residues using amino acid character and knowledge of interior residues, especially those that must hydrogen-bond or form salt bridges. It is important to use all sequences and sequence fragments because these suggest tolerances at individual codons. Problematic areas remain which only further sequences or direct structural studies can resolve.
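
The anchoring idea can be sketched in a few lines of Python. This is an illustration of the principle only: the clamped residue pairs are supplied by hand, and a plain identity-scored Needleman-Wunsch alignment stands in for the judgment about amino acid character and structure described above. The function names, scoring values and toy peptides are all assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Plain Needleman-Wunsch global alignment of two short strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(a, b, anchors):
    """Align a and b, forcing a[i] to pair with b[j] for every (i, j) anchor.

    anchors: 0-based index pairs, strictly increasing in both coordinates.
    Only the stretches between clamped residues are aligned freely.
    """
    out_a, out_b = [], []
    prev_i = prev_j = 0
    for i, j in anchors:
        seg_a, seg_b = global_align(a[prev_i:i], b[prev_j:j])
        out_a += [seg_a, a[i]]
        out_b += [seg_b, b[j]]
        prev_i, prev_j = i + 1, j + 1
    seg_a, seg_b = global_align(a[prev_i:], b[prev_j:])
    out_a.append(seg_a)
    out_b.append(seg_b)
    return "".join(out_a), "".join(out_b)


if __name__ == "__main__":
    # Toy peptides (made up); C and W play the role of invariant residues.
    top, bottom = anchored_align("MKCAAWY", "MCAWY", [(2, 1), (5, 3)])
    print(top)
    print(bottom)
```

The design choice is that gaps can never move a clamped residue out of its column, so a compositionally similar but non-homologous stretch cannot drag the invariant positions apart.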

As noted, what emerges is not so much a reconstructed ancestral sequence but rather a sense of an ordered conserved set of structural/functional elements: signal-basic-[repeat-basic-invariant]-globular domain-disulfide-glycosylation-GPI. While a foregone conclusion at the levels of percent sequence identity seen here, the conservation of 3D fold has its stronger and weaker regions: the underpass domain in particular has been very resistant to change for 310 million years. The 106-126 region has even better sequence conservation despite its bland composition and non-globular domain location; the reasons for this are unknown but must be central to function.

Does the turtle prion sequence allow for a better probe, perhaps to more distant ancestral sequences already in the databases? Yes, but various special features conspire against a viable search profile. Signal and GPI segments are found in thousands of unrelated proteins, and the requirements of the signal recognition particle result in convergence artifacts in Blast searches. The repeat region has an uncertain ancestral repeat length and its composition is too similar to common unrelated collagen-like proteins. The 106-126 region also has too bland and repetitive a composition to be useful in Blast searches. Long 3' UTRs in the prion gene family mean that ESTs rarely if ever reach the coding region; mutational change is too rapid in the 3' UTR for it to serve as a probe. This leaves the mature globular domain as the only handle. Yet fold searches have not found other members of its superfamily.

An alternative approach, which public databases are not quite designed to handle, to finding distant homologues is successive annotation-filtering prior to Blast alignment. Thus, first require a signal peptide and GPI site (or terminal integral membrane domain, excluding proteins with internal membrane segments). Then require a disulfide. (Glycosylation is almost a given with these two conditions met.) Finally, the Blast search on the severely reduced database must credit residue matches in proportion to their conservation.
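
The filtering steps above can be sketched as follows. This is a hypothetical Python illustration: the record fields, the toy entries and the ungapped sliding-window scorer are all assumptions standing in for a real annotated database and for Blast, which this code does not call.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical pre-annotated database record; every field is an assumption."""
    name: str
    sequence: str
    has_signal_peptide: bool
    has_gpi_site: bool
    has_terminal_membrane_domain: bool
    internal_membrane_segments: int
    has_disulfide: bool


def passes_filters(e):
    """Signal peptide, then GPI site or terminal membrane anchor, then disulfide."""
    anchored = e.has_gpi_site or e.has_terminal_membrane_domain
    return (e.has_signal_peptide
            and anchored
            and e.internal_membrane_segments == 0
            and e.has_disulfide)


def weighted_score(query, weights, candidate):
    """Best ungapped window score; an identity earns that position's weight."""
    if len(weights) != len(query):
        raise ValueError("one weight per query residue is required")
    best = 0.0
    for start in range(len(candidate) - len(query) + 1):
        window = candidate[start:start + len(query)]
        total = sum(w for q, c, w in zip(query, window, weights) if q == c)
        best = max(best, total)
    return best


def search(database, query, weights):
    """Score only the entries that survive annotation filtering, best first."""
    hits = [(weighted_score(query, weights, e.sequence), e.name)
            for e in database if passes_filters(e)]
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # Toy records with made-up sequences and annotations.
    db = [
        Entry("candidate_1", "XXACDXX", True, True, False, 0, True),
        Entry("candidate_2", "XXACDXX", True, True, False, 0, False),  # no disulfide
        Entry("candidate_3", "AXD", True, False, True, 0, True),
    ]
    # Higher weight = more strongly conserved query position.
    print(search(db, "ACD", [1.0, 0.5, 1.0]))
```

Filtering first keeps the many unrelated signal-bearing and GPI-bearing proteins from swamping the hit list; weighting then lets the invariant residues, rather than bland composition, drive the ranking.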

Now the problem is the lack of available sequence in species 'between' Drosophila/nematode and mammals. Little genomic data is available for reptiles and amphibians. In fish, fugu has received only localized genomic attention while zebrafish research has focused on mapping and ESTs: the prion gene has even been located by synteny. Prion protein was detected in salmon by antibody but the sequence was not pursued. Direct searches at fly/worm databases do not yield prion counterparts (though the full fly genome is just being released now); neither species is yet annotated well enough for reliable annotation-filtering. The human genome has not yet yielded further duplications nor pseudogenes (which provide a blurring snapshot of the gene at the time of creation).

In short, while various genomic initiatives move forward, the next development in understanding prion functional evolution will likely come from sequencing work based on the successful turtle sequence and the improved degenerate primers that it affords.
