31 May 01
Xenopus laevis cDNA clone BG813008 from eye. WashU Xenopus EST project. Average insert size 2.3 kb. High quality sequence stops at base 436.
This 5' EST appeared today at GenBank. It did not emerge from prion research per se but rather from a general Xenopus EST (transcriptome) project. This very important sequence pushes back the earliest known prion protein to the time of divergence of the amphibians, some 360 million years ago (Kumar and Hedges, Nature 392:917, 1998).
This is a partial sequence containing 151 bp of upstream sequence followed by 387 bp of coding sequence, of which 22 codons are signal peptide and 107 encode mature protein. Because the sequence stops at the second residue of the second beta strand (both residues are perfectly conserved), it provides no information about helices B and C, the disulfide, glycosylation sites, or the GPI anchor.
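The length bookkeeping can be checked directly against the read. What follows is a minimal Python sketch, assuming the EST and its translation are exactly as listed under Reference Sequences; the names EST, PROTEIN, translate and locate are illustrative only and do not come from any library.

```python
# Sketch: find the annotated coding region in EST BG813008 and measure it.
# EST and PROTEIN are copied from the Reference Sequences section.

EST = (
    "ctttcccgatcaccctggaacagctatcctgaatccccccctggcacatccatttcgtattttcccct"
    "tggcacagaggcacagcaccaggacctgacacccacatagcttctctttggcacactctataccctca"
    "cccaggttgtttatgatgccacaaagtctctggacttgtttagtccttatctccctagtatgcacatt"
    "gactgtatcttccaagaagagcggtggtgggaaaagtaaaactggaggatggaacacagggagcaacc"
    "ggaaccccaactacccaggaggctacccagggaatactggaggcagctgggggcaacaaccttataat"
    "cctagcggttataacaagcaatggaaacctcccaagtccaaaaccaacatgaagtcggtggccatagg"
    "cgctgctgctggtgctattggaggctacatgctcggtaatgcagtgggtcgtatgagttatcaattca"
    "acaatcccatggagtcccgttattataacgactactataaccagatgccaaatcgcgtgtac"
).upper()

PROTEIN = (
    "MPQSLWTCLVLISLVCTLTVSSKKSGGGKSKTGGWNTGSNRNPNYPGGYPGNTGGSWGQQPYNPSGYN"
    "KQWKPPKSKTNMKSVAIGAAAGAIGGYMLGNAVGRMSYQFNNPMESRYYNDYYNQMPNRVY"
)

SIGNAL_LEN = 22  # signal peptide length as stated in the text

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons only; stop codons appear as '*'."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def locate(dna, protein):
    """Return the 0-based nucleotide offset where `protein` is encoded, or None."""
    for frame in range(3):
        idx = translate(dna[frame:]).find(protein)
        if idx >= 0:
            return frame + 3 * idx
    return None


start = locate(EST, PROTEIN)
coding = EST[start:]
print("upstream bp:", start)
print("coding bp:  ", len(coding))
print("codons:     ", len(coding) // 3,
      f"({SIGNAL_LEN} signal + {len(coding) // 3 - SIGNAL_LEN} mature)")
```

On the read as listed, this gives 387 bp of coding sequence (129 codons, 22 signal plus 107 mature) preceded by 151 bp of upstream sequence, the figure used in the exon discussion further down.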
This is definitely a prion protein, bearing 41% BLAST identity (higher without repeat gap penalties) equally to avian and mammalian prions. It bears little resemblance to doppel, again suggesting that the gene duplication was very ancient -- the reconstructed ancestral amniote protein has not converged on any plausible ancestral doppel. Thus, unless there was lineage-specific loss, Xenopus will also have a doppel. If the sequence could be extended to the complete coding region, a single disulfide pair in the prion position would probably be confirmed.
Xenopus prion completely lacks a repeat region (as predicted here years ago from the discordance between marsupial, placental, and avian/turtle repeats). There are no possibilities for copper binding: there are no histidines whatsoever in the 129 residues. Therefore it cannot function as a superoxide dismutase. The globular domain must have a self-contained, distinct function, likely having nothing to do with oxidative stress.
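The histidine claim is easy to verify from the translated sequence. A small Python sketch, assuming the 129-residue translation given under Reference Sequences; the variable names are illustrative.

```python
# Sketch: amino acid composition of the partial Xenopus prion translation.
from collections import Counter

PROTEIN = (
    "MPQSLWTCLVLISLVCTLTVSSKKSGGGKSKTGGWNTGSNRNPNYPGGYPGNTGGSWGQQPYNPSGYN"
    "KQWKPPKSKTNMKSVAIGAAAGAIGGYMLGNAVGRMSYQFNNPMESRYYNDYYNQMPNRVY"
)

composition = Counter(PROTEIN)

# Which of the 20 standard amino acids never occur in the fragment?
absent = [aa for aa in "ACDEFGHIKLMNPQRSTVWY" if composition[aa] == 0]

print("length:", len(PROTEIN))
print("histidines:", composition["H"])
print("absent residues:", absent)
```

Histidine turns out to be the only standard amino acid missing from the fragment, bearing in mind that helices B and C are not yet covered by the read.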
Like doppels, Xenopus prion has a post-signal region, here of 34 residues, that strongly retains the character and composition of the flanking regions of mammalian repeats. The remarkable conservation of this region for 360 million years in another protein lineage strongly refutes the notion of an unstructured extended peptide. Note the WGQ tripeptide conserved in prions and doppels in all species -- this involves a billion years of "round-trip" evolutionary time, requiring a compelling reason. This region is probably folded back over the main domain, making it more globular; some hints of this have come from NMR and modelling. However, there is no hint of helix or sheet in its predicted secondary structure [see below].
Xenopus is the first species studied whose prion has an imperfect 'invariant' region (106-126 in mammals); a new Chinese sheep sequence is also slightly anomalous: VVGSLGGYMLG. The invariant region has a 4 aa deletion but is still highly conserved: the 29 residue stretch QWKPPKsKTNMKSVAigAAAGAi----GGYMLG differs at only 4 positions from the mammalian/avian consensus (86% agreement). Thus this region must be functionally important (it is dispensable structurally) even in the absence of a repeat domain.
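The 86% figure follows from the case coding of the aligned stretch. The Python sketch below assumes the notation works as it appears to: upper case marks agreement with the mammalian/avian consensus, lower case marks a difference, and hyphens mark the deleted positions. The function name is hypothetical.

```python
# Sketch: agreement with consensus, read off a case-coded aligned string.

def consensus_agreement(aligned):
    """Return (matching residues, total residues, gap columns)."""
    residues = [c for c in aligned if c.isalpha()]
    matches = sum(1 for c in residues if c.isupper())
    return matches, len(residues), aligned.count("-")


INVARIANT = "QWKPPKsKTNMKSVAigAAAGAi----GGYMLG"

matches, length, gaps = consensus_agreement(INVARIANT)
print(f"{matches}/{length} residues agree "
      f"({100 * matches / length:.0f}%), {gaps} gap columns")
```

Gap columns are left out of the denominator here, which reproduces 25 of 29 residues; counting the 4 deleted positions as mismatches would lower the figure.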
The region between the two beta strands is 29 residues in length, the same as in mammals and birds. Conservation is lower here, with 17 of the 29 amino acids matching bird/mammal (59%) and the inferred helix A domain far better conserved than the loop region. Remarkably, there are 4 residues completely invariant in all prions, 2 of which (the interior phenylalanine and the helix-boundary proline) are also conserved in all doppels.
Note that Xenopus viewed as outgroup settles the nature of 11 ancestral residues; for example, "codon 129" is M and YY occurred in the common amniote ancestor, not A or V and WW as in birds (newly determined ancestral values have a dot under the Xenopus line). To the extent that highly conserved residues are oriented towards the solvent, they likely reflect a persistent interaction with another protein.
The 78 residue pseudo-ancestral sequence below makes for a very strong BLAST query sequence in searching fugu, zebrafish, and other early vertebrate lineages for ESTs and prion-related proteins. The compositional simplicity of the "106-126" region can cause misleading matches; gap penalties are large when the repeat region is missing or of different periodicity. The conserved glycosylation site and disulfide pair are held in reserve to validate weak matches in more distant model organisms. However, at this time there are no striking matches, even among the 463,953 sequences in the Danio rerio trace archive.
>prion deep query sequence
WGQQYNPSsggYHNQWKPPKSKTNMKSVAgaAAAGAVVLGGYMLGsAmsRMSYhFgNPmEsRYYNeNsyryPNRVYYR
The Xenopus sequence is thus very unlikely to be recruitable by mammalian amyloid seed. Xenopus prion or some fragment thereof may however be capable of forming a rogue conformation on its own (beyond the amyloidogenic propensity that almost every protein possesses); some support comes from software prediction of an alpha/beta ambivalent stretch precisely at the first beta strand and helix A:
Ambivalent Sequence Predictor (ASP v1.0), Young et al., Protein Science (1999) 8:1752-64

         ....,....7....,....8....,....9....,....10...,....11...,....12
AA      |PYNPSGYNKQWKPPKSKTNMKSVAIGAAAGAIGGYMLGNAVGRMSYQFNNPMESRYYNDY|
prH sec |000000000000000001157888998998653555432211000000000344443233|
prE sec |000000000000000000000000000000001123445532388766100000111111|
prL sec |999988889999999987742110000001235321111156511233898555444555|
ASP sec |..................................SSSSS...............SSSSS.|
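The S-marked columns of the ASP line can be mapped back to residues with a few lines of Python. This is only a sketch: it assumes the ruler numbers the Xenopus precursor (so the first residue shown is 61, placing residue 70 under the "7"), and the function name is illustrative rather than part of the ASP program.

```python
# Sketch: map the 'S' (predicted conformational switch) columns of the
# ASP output back to residue numbers and residues.
import re

FIRST_RESIDUE = 61  # assumption read off the ruler: tenth column is residue 70

AA = "PYNPSGYNKQWKPPKSKTNMKSVAIGAAAGAIGGYMLGNAVGRMSYQFNNPMESRYYNDY"
ASP = "..................................SSSSS...............SSSSS."


def switch_segments(aa, asp, first_residue):
    """Return (start, end, residues) for every run of 'S' in the ASP line."""
    segments = []
    for m in re.finditer(r"S+", asp):
        segments.append((first_residue + m.start(),
                         first_residue + m.end() - 1,
                         aa[m.start():m.end()]))
    return segments


segments = switch_segments(AA, ASP, FIRST_RESIDUE)
for start, end, residues in segments:
    print(f"{start}-{end}: {residues}")
```

Under that numbering the two predicted switch stretches are YMLGN (95-99) and RYYND (115-119), which sit at the first beta strand and at the start of the region aligned to helix A.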
What does Xenopus prion tell us about fish? Farmed fish have policy implications because in some countries they are fed bone meal that may carry BSE. The Xenopus sequence suggests that the prion protein evolved more rapidly during the long era between the fish and amphibian divergences from the lineage leading to mammals and birds than it did subsequently, after the copper-binding repeats became established. Thus fish prion sequence can be predicted to lack a repeat region altogether and most likely to have a "106-126" region more like that of amphibians, though Xenopus could have incurred deletions. In summary, fish prion is very unlikely to be recruitable by mammalian prions, though it may well retain the ambivalent secondary structure noted above.
The signal region of Xenopus prion is not overwhelmingly supported by prediction software, but the neural net is not at all tuned to amphibians, which represent in this context a very long branch. Unsurprisingly, it is barely alignable to prion/doppel signal regions. The main evidence for it comes from alignment and the conserved double lysine that follows the putative cleavage site. If not a signal peptide, it would be structurally gratuitous. The issue is better settled by determining disulfides (oxidative external location), glycosylation and GPI anchor (both imply endoplasmic reticulum transit and non-cytoplasmic localization).
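As a crude cross-check that does not depend on a trained predictor, the hydrophobic character of the putative signal region can be compared with the residues that follow it. The Python sketch below uses the Kyte-Doolittle hydropathy scale, which is a different and much simpler technique than the neural-net prediction mentioned above; the 22-residue cleavage position is taken from the text and the helper name is illustrative.

```python
# Sketch: mean Kyte-Doolittle hydropathy of the putative signal peptide
# versus the 22 residues that follow the putative cleavage site.

KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# First 44 residues of the translated EST (see Reference Sequences).
PRECURSOR_START = "MPQSLWTCLVLISLVCTLTVSSKKSGGGKSKTGGWNTGSNRNPN"
SIGNAL_LEN = 22  # putative cleavage after residue 22, as stated in the text


def mean_hydropathy(seq):
    """Average Kyte-Doolittle value over a peptide."""
    return sum(KD[aa] for aa in seq) / len(seq)


signal = PRECURSOR_START[:SIGNAL_LEN]
after = PRECURSOR_START[SIGNAL_LEN:]

print(f"signal  {signal}: {mean_hydropathy(signal):+.2f}")
print(f"after   {after}: {mean_hydropathy(after):+.2f}")
```

The first 22 residues come out clearly hydrophobic on average and the following stretch clearly hydrophilic, which is at least consistent with a signal peptide; it is not evidence of cleavage at that exact position.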
The 151 bp of upstream region, because of stop codons and the signal region, most likely represent untranslated 5' exons (they could also be vector contamination). There is no BLAST alignment to exon 1, exon 2, the 10 bp of 5'UTR in exon 3, or intronic sequence in mammals and turtle, where these data exist. Exon boundaries cannot be determined from a cDNA once splicing (presumed here) has occurred. In summary, the exon structure of the Xenopus prion gene cannot be inferred from this single cDNA, though the coding region is probably mono-exonic.
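The stop-codon argument can be made concrete by scanning the upstream region in the reading frame of the annotated start codon. A minimal Python sketch, assuming the 151 bp preceding the annotated ATG are exactly as listed under Reference Sequences; the names used are illustrative.

```python
# Sketch: list stop codons upstream of the annotated ATG, in the coding frame.

UPSTREAM = (
    "ctttcccgatcaccctggaacagctatcctgaatccccccctggcacatccatttcgtatt"
    "ttccccttggcacagaggcacagcaccaggacctgacacccacatagcttctctttggca"
    "cactctataccctcacccaggttgtttatg"
).upper()

STOPS = {"TAA", "TAG", "TGA"}


def stops_in_frame(dna, offset):
    """Return (1-based position, codon) for each stop codon in the given frame."""
    return [(i + 1, dna[i:i + 3])
            for i in range(offset, len(dna) - 2, 3)
            if dna[i:i + 3] in STOPS]


# The coding frame is the one whose codons end exactly where the ATG begins.
coding_frame = len(UPSTREAM) % 3
in_frame = stops_in_frame(UPSTREAM, coding_frame)

print("upstream length:", len(UPSTREAM))
print("in-frame stops:", in_frame)
```

On the listed sequence there is a single in-frame TGA, at bases 95-97, so translation cannot be read through from the 5' end of the clone into the signal region.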
Reference Sequences

ctttcccgatcaccctggaacagctatcctgaatccccccctggcacatccatttcgtatt
ttccccttggcacagaggcacagcaccaggacctgacacccacatagcttctctttggca
cactctataccctcacccaggttgtttatgatgccacaaagtctctggacttgtttagtc
                               M  P  Q  S  L  W  T  C  L  V
cttatctccctagtatgcacattgactgtatcttccaagaagagcggtggtgggaaaagt
 L  I  S  L  V  C  T  L  T  V  S  S  K  K  S  G  G  G  K  S
aaaactggaggatggaacacagggagcaaccggaaccccaactacccaggaggctaccca
 K  T  G  G  W  N  T  G  S  N  R  N  P  N  Y  P  G  G  Y  P
gggaatactggaggcagctgggggcaacaaccttataatcctagcggttataacaagcaa
 G  N  T  G  G  S  W  G  Q  Q  P  Y  N  P  S  G  Y  N  K  Q
tggaaacctcccaagtccaaaaccaacatgaagtcggtggccataggcgctgctgctggt
 W  K  P  P  K  S  K  T  N  M  K  S  V  A  I  G  A  A  A  G
gctattggaggctacatgctcggtaatgcagtgggtcgtatgagttatcaattcaacaat
 A  I  G  G  Y  M  L  G  N  A  V  G  R  M  S  Y  Q  F  N  N
cccatggagtcccgttattataacgactactataaccagatgccaaatcgcgtgtac
 P  M  E  S  R  Y  Y  N  D  Y  Y  N  Q  M  P  N  R  V  Y

>BG813008 Xenopus laevis cDNA clone WashU Xenopus EST project
ctttcccgatcaccctggaacagctatcctgaatccccccctggcacatccatttcgtattttcccct
tggcacagaggcacagcaccaggacctgacacccacatagcttctctttggcacactctataccctca
cccaggttgtttatgatgccacaaagtctctggacttgtttagtccttatctccctagtatgcacatt
gactgtatcttccaagaagagcggtggtgggaaaagtaaaactggaggatggaacacagggagcaacc
ggaaccccaactacccaggaggctacccagggaatactggaggcagctgggggcaacaaccttataat
cctagcggttataacaagcaatggaaacctcccaagtccaaaaccaacatgaagtcggtggccatagg
cgctgctgctggtgctattggaggctacatgctcggtaatgcagtgggtcgtatgagttatcaattca
acaatcccatggagtcccgttattataacgactactataaccagatgccaaatcgcgtgtac

>xenopus prion protein
MPQSLWTCLVLISLVCTLTVSSKKSGGGKSKTGGWNTGSNRNPNYPGGYPGNTGGSWGQQPYNPSGYN
KQWKPPKSKTNMKSVAIGAAAGAIGGYMLGNAVGRMSYQFNNPMESRYYNDYYNQMPNRVY