Most people have the impression that prion protein sequence is extraordinarily conserved, at least in the mammalian lineage. To make this quantitative, I took an out-of-date compilation of 40 not-necessarily-representative protein families for which the rate of change is known. The usual unit, following Dayhoff, is accepted point mutations (PAMs) per 100 residues per 100 million years, adjusted for multiple hits at the same site. For this data set the mean rate was 19 PAMs, with a median of 14 and a standard deviation of 18.
For comparisons with prion protein, more convenient units (multiply by 1.85) are changes per 250 residues per 135 million years (our earliest node, excluding chicken). In these units the mean is 35, the median 26, and the standard deviation 33. The sorted data set in these units is: 0.5, 0.6, 3.1, 3.1, 3.3, 4.3, 5.2, 7.8, 8.3, 9.3, 9.8, 9.8, 12.4, 13.1, 14.3, 16.9, 20.4, 22.2, 24.1, 25.9, 25.9, 25.9, 31.5, 31.5, 31.5, 37.0, 37.0, 46.3, 46.3, 50.0, 55.6, 61.1, 61.1, 74.1, 79.6, 79.6, 88.9, 98.1, 109.3, 131.5
A better data set would be larger and computed relative to a single phylogenetic tree, that is, with the same bird-marsupial-placental divergence dates throughout. Neither the mean nor the standard deviation is currently robust: for example, dropping the two highest values lowers the mean to 31 and the standard deviation to 27.
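The summary figures above can be rechecked directly from the sorted list. A minimal sketch using only Python's standard `statistics` module and the 40 values exactly as printed; treating the quoted spread as the sample standard deviation (`stdev`) is an assumption about how it was originally computed.

```python
import statistics

# Rates for the 40 protein families, in changes per 250 residues per
# 135 million years, copied from the sorted list in the text.
RATES = [
    0.5, 0.6, 3.1, 3.1, 3.3, 4.3, 5.2, 7.8, 8.3, 9.3,
    9.8, 9.8, 12.4, 13.1, 14.3, 16.9, 20.4, 22.2, 24.1, 25.9,
    25.9, 25.9, 31.5, 31.5, 31.5, 37.0, 37.0, 46.3, 46.3, 50.0,
    55.6, 61.1, 61.1, 74.1, 79.6, 79.6, 88.9, 98.1, 109.3, 131.5,
]


def summarize(values):
    """Return (mean, median, sample standard deviation)."""
    return statistics.mean(values), statistics.median(values), statistics.stdev(values)


full = summarize(RATES)
trimmed = summarize(sorted(RATES)[:-2])  # drop the two fastest-evolving families

print("all 40:     mean %.1f, median %.1f, sd %.1f" % full)
print("drop top 2: mean %.1f, median %.1f, sd %.1f" % trimmed)
```

Rounded to whole numbers, this reproduces the figures in the text (mean 35, median 26, SD 33; after trimming, mean 31 and SD 27), which makes it easy to see how much two outliers move the summary.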
To compute the PAM rate for prion protein, one must decide how to handle three special situations: whether to include the signal peptide, whether to include the cleaved GPI-anchor signal sequence (or only look at the mature protein, recalling that fibrinopeptides evolve wildly whereas fibrin does not), and how to handle changes in the octapeptide repeat region (where the unit of change is roughly the octapeptide itself, not a single nucleotide). The rate needs to be calculated for individual positions, for separate domains, and globally.
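One way to keep these choices visible is to compute the distance over explicit column sets (whole protein, mature chain only, each domain). The sketch below swaps in a simple Poisson correction, d = -ln(1 - p), as a stand-in for Dayhoff's PAM-matrix adjustment for multiple hits. The toy sequences and region boundaries are hypothetical, and gapped columns are simply skipped rather than scored as octapeptide-unit events.

```python
import math


def poisson_distance(seq_a, seq_b, columns=None):
    """Substitutions per site between two aligned protein sequences.

    Uses the Poisson correction d = -ln(1 - p), a simpler stand-in for
    Dayhoff's PAM-based correction. Columns with a gap in either sequence
    are skipped; `columns` optionally restricts the comparison to a subset.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    idx = range(len(seq_a)) if columns is None else columns
    pairs = [(seq_a[i], seq_b[i]) for i in idx if "-" not in (seq_a[i], seq_b[i])]
    if not pairs:
        raise ValueError("no comparable columns")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 1.0:
        raise ValueError("saturated: every compared site differs")
    return -math.log(1.0 - p)


def distances_by_region(seq_a, seq_b, regions):
    """Distance for each named region (a range of alignment columns) plus a global value."""
    result = {name: poisson_distance(seq_a, seq_b, cols) for name, cols in regions.items()}
    result["global"] = poisson_distance(seq_a, seq_b)
    return result


# Hypothetical toy alignment and region boundaries, for illustration only.
a = "MKSAGGWNTGGSRYPGQ"
b = "MKTAGGWNTGGNRYPGQ"
regions = {"signal": range(0, 4), "mature": range(4, 17)}
for name, d in distances_by_region(a, b, regions).items():
    print(f"{name:>7}: {d:.3f} substitutions per site")
```

Dividing each distance by the appropriate divergence time (and scaling to 100 residues) would then give a rate in the units above; which time convention to use is left open here.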
Including the chicken sequence would drastically raise the prion PAM rate. The alternatives are to say that (1) chicken prion is actually an inbred defect and atypical of birds, (2) chicken prion is only paralogous, or (3) chicken prion is orthologous but the molecular clock runs unevenly, with bursts of evolution accompanying the development of new features and functions of the mammalian nervous system. There seem to be no examples in which function has abruptly changed in some lineages but not others without gene duplication.
There is an obvious difference between PAMs [point accepted mutations that propagated across a population and became established within a species] and the much commoner mutational events that occur in an individual but, for whatever reason, never become established. The former are important to prion structure-function analysis; the latter to the natural background incidence of BSE. In species like cattle, where individuals number around a billion (unlike, say, pandas), rare events happen every year.
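The population-size point is simple arithmetic: with N individuals, each carrying a small yearly probability p of some rare event, about N·p events are expected per year, and the chance of at least one is 1 - (1 - p)^N. The sketch assumes independent individuals and uses made-up values for p and for both population sizes; it is not an estimate of BSE incidence.

```python
import math


def rare_event_outlook(per_individual_prob, population):
    """Expected events per year, and probability of at least one, assuming
    independent individuals sharing the same small yearly probability."""
    expected = per_individual_prob * population
    # 1 - (1 - p)^N, computed stably for tiny p and huge N
    p_at_least_one = -math.expm1(population * math.log1p(-per_individual_prob))
    return expected, p_at_least_one


# Hypothetical numbers: a one-in-a-hundred-million yearly event.
p = 1e-8
for label, n in [("cattle-sized population", 1_000_000_000), ("panda-sized population", 2_000)]:
    expected, prob = rare_event_outlook(p, n)
    print(f"{label}: expected {expected:.2g} per year, P(at least one) = {prob:.3g}")
```

The same per-individual probability yields near-certainty in the large population and essentially never in the small one.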
Consider the oft-sequenced protein cytochrome c. Here the "invariant" residues slowly evaporated as the net was cast wider; today only a few core residues remain in this category. Placental mammals and marsupials diverged 135 million years ago, and 'early vertebrates' now go back closer to a billion years. The prion gene has changed slowly, but the record is held by histone IV (H4), with only two amino acid differences between humans and peas.
This still raises the question of why some regions of prion protein are as conserved as they are, given the mild effects of deleting the gene entirely. Prion protein may have a conventional ligand-binding function that would not be particularly sensitive to sequence change, yet in the course of its maturation, processing, and turnover it must maintain binding sites for a large and disparate group of other proteins. In this scenario, prion protein has little "wiggle room" not because its job on the outer cell membrane is exquisitely sensitive to structural disturbance (or particularly important) but because of the sum total of all these other constraints. With some work, one could check whether other GPI-anchored proteins exhibit similar anomalies.
It is important to align the DNA separately from the protein, to partition genetic changes between hairpin structural signalling and conventional amino acid roles. In this case we already know there is a strongly conserved helix C in the nucleic acid, overlapping and out of phase with the first octapeptide repeat. Third-codon-position changes (silent mutations) in the 4-codon amino acids (val, ala, thr, pro, gly, and also arg, ser, and leu) separate selective pressure between the nucleic acid and protein realms. That is, there should be slower neutral (Kimura) drift where nucleic acid structures are important. If not, we obtain a dating technique for the repeat region, based on non-equilibration at silent codon positions.
By examining changes between closely related species such as macaques, where confusing second mutations overwriting the first are not expected, the distribution of silent mutations along the length of the gene can be determined. It is important to recognize that different amino acids use purines and pyrimidines at the third position differently (i.e., keep transitions and transversions separate). Restricting the comparison to conserved amino acids isolates the desired parameter.
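The bookkeeping described above can be sketched in Python: walk the codons in frame, keep only positions where both sequences carry the same fourfold-degenerate codon family (the text's val, ala, thr, pro, gly, plus the 4-codon blocks of arg, ser and leu), and tally third-position differences as transitions or transversions. This is a minimal sketch assuming ungapped, already-aligned DNA; the demo reuses group sequences from the prion alignment, with the reading frame inferred from its third-position markers.

```python
# Codon families in which the first two bases fix the amino acid, so every
# third-position change is silent (the 4-codon blocks named in the text).
FOURFOLD = {
    "gt": "Val", "gc": "Ala", "ac": "Thr", "cc": "Pro", "gg": "Gly",
    "cg": "Arg", "tc": "Ser", "ct": "Leu",
}
PURINES = {"a", "g"}


def silent_third_positions(seq_a, seq_b, frame=0):
    """Compare third codon positions at conserved fourfold-degenerate codons.

    Returns (sites, transitions, transversions). `frame` is the index of the
    first complete codon. Sequences must be aligned, ungapped DNA of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = seq_a.lower(), seq_b.lower()
    sites = transitions = transversions = 0
    for i in range(frame, len(a) - 2, 3):
        prefix = a[i:i + 2]
        # Keep only conserved amino acids: same fourfold family in both sequences.
        if prefix not in FOURFOLD or b[i:i + 2] != prefix:
            continue
        sites += 1
        x, y = a[i + 2], b[i + 2]
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return sites, transitions, transversions


placental = "accgctacccacctcagggtggtggt"  # consensus, placental mammals
groups = {
    "rodents": "accgttacccacctcagggtggcggt",
    "ruminants": "accgctatccacctcagggagggggt",
    "great apes": "accgctacccacctcagggcggtggt",
}
for name, seq in groups.items():
    # frame=2: the alignment opens with the last two bases of the Asn codon
    print(name, silent_third_positions(placental, seq, frame=2))
```

Note that the ruminant tac/tat change is silent but is not counted, because tyrosine is only twofold degenerate; that is exactly the purine/pyrimidine distinction the text asks to keep separate.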
The chart below shows the type of codon used along the prion gene, and so the opportunities for measuring third-position variation at comparable conserved amino acids.
Prion codon type use:

| Codons | Use | % | Type |
|---|---|---|---|
| 6 | 15 | 6 | pur |

These distinctions are needed to compare evolutionary rates at different positions. Some amino acids are encoded only with a purine in position 3, like glutamine (CAA, CAG); others only with a pyrimidine, like asparagine (AAT, AAC).
| Aligned sequence | Description |
|---|---|
| `n--r--y--p--p--q--g--g--g-` | amino acid coded |
| `accgctacccacctcagggtggtggt` | consensus, placental mammals |
| `-3--3--3--3--3--3--3--3--3` | third codon position |
| `----t-----------------c---` | rodents |
| `-------t-----------a--g---` | ruminants |
| `-------------------c------` | great apes |
| `-------------c------------` | Old World monkeys |
| `-------------c-----------c` | New World monkeys |

Mouse prion protein fails to translate as expected, with the third amino acid of the first repeat not represented in the protein [JMB 245: 362, 1995], probably due to a transcription error caused by a weakened 3' hairpin stem.
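| Aligned sequence | Group or species |
|---|---|
| `accgttacccacctcagggtggcggt` | rodents (group consensus) |
| `accgttacccacctcagagtggtggt` | rat |
| `accgttacccacctcagggtggcacc` | mouse |
| `accgttacccacctcagggtggcggc` | hamster |
| `accgctatccacctcagggagggggt` | ruminants (group consensus) |
| `accgttatccacctcagggagggggt` | cow |
| `accgctacccacctcagggcggtggt` | great apes (group consensus) |
| `accgctacccacctcagggcggtggc` | gibbon |
| `accgctacccaccccagggtggtggt` | Old World monkeys (group consensus) |
| `accgctatccaccccagggtggtggt` | baboon |
| `accgctacccaccccagggtggtggc` | New World monkeys (group consensus) |
| `acctctacccaccacagggtggtggc` | capuchin |
| `accgctacccaccacagggtggtggc` | marmoset |
| `accgctacccaccccagagcggtggc` | Aotus (owl monkey) |
| `accgctacccaccccagggtggtggt` | titi |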
It is instructive to look at prion DNA sequences. DNA for the first 25 amino acids is shown gapped and aligned, along with consensus sequences for the earliest placental mammals, earliest mammal, and earliest vertebrate.
The consensus sequences below show a possible small internal duplication within the pre-repeat region (nucleotide helix C), GG-NRYP-Q, which possibly occurs again much later, between alpha helix H1 and beta strand B2 (though with loss of the helix C hairpin, VYY not substituting for GGG). This region has a small repetitive character of its own, YYR, found internally in H1 and B2, or alternatively triple repeats of RY and YR. These regions may be very ancient areas of prion gene extension, attributable to replication slippage or unequal crossing over.
| Region | Aligned row |
|---|---|
| pre-repeat | `---------S------L-----` |
| pre-repeat | `GGSNRYPGQPGSPGG-NRYPPQ` |
| pre-repeat | `GG` |
| between H1 and B2 | `...GNDYEDRYYRENMNRYPNQVYYRPVDQYSN` |
| between H1 and B2 | `...-SEW--------QH---S--M-K-I-E-NS` |
| between H1 and B2 | `...-------------Y----------M-R---` |

Recall that with a random string of 20 amino acids used equally, there are 400 possible pairs, 8,000 triplets, 160,000 quartets, and so on. Prion protein is approximately 250 amino acids long; one does not expect to see a sequence like GGxNRYPyG repeated by chance.
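The back-of-envelope argument can be made explicit. Under the text's model of 20 equally used amino acids, a motif with k fixed residues matches any given window with probability (1/20)^k, so a protein of length N has roughly (N - L + 1)/20^k chance matches for a motif of length L. This sketch treats the lowercase x and y in GGxNRYPyG as unconstrained positions (an assumption about the notation) and ignores real amino acid composition bias.

```python
def expected_chance_occurrences(pattern, protein_length=250, alphabet_size=20):
    """Expected number of windows in a random protein that match `pattern`.

    Assumes every residue is drawn independently and with equal frequency.
    Uppercase letters are fixed residues; any other character is a wildcard.
    """
    fixed = sum(1 for ch in pattern if ch.isupper())
    windows = protein_length - len(pattern) + 1
    return windows * (1 / alphabet_size) ** fixed


for k in range(2, 5):
    print(f"{k}-mers: {20 ** k:,} possible")

motif = "GGxNRYPyG"  # x and y are unconstrained positions
print(f"{motif}: {expected_chance_occurrences(motif):.1e} expected chance matches")
```

With seven fixed residues and 242 windows, the expected count is on the order of 10^-7, which is the quantitative version of "one does not expect to see it repeated by chance."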
In yeast, where nearly all genes have been experimentally deleted, a goodly proportion were deemed non-essential to normal colony growth. Of these, a surprising fraction were assigned a modulatory role. That is, rather than being in the mainstream of intermediary metabolism, they only fine-tuned some process. Further, the cell often had some compensatory mechanism that could come into play in a deletion setting. Evidently there can be considerable selective pressure to maintain a non-essential gene. This could all be applicable to prion protein.
Why wasn't the possibility of a rogue conformer eliminated during evolution, given that sheep with scrapie are struck down in their prime reproductive years? Does it exist by happenstance, or could a second conformer have an unsuspected in vivo value that would explain its continued evolutionary availability?
Suppose we take some homology alignment software off the Web and align a batch of prion sequences only to find (as Southward's and Prusiner's groups have) that something we know to be wrong is returned as the best phylogenetic tree.
Prion paradox or just so-so software?
The options are that (1) the consensus of a century of anatomical, embryological, paleontological, and mega-sequencing data is wrong, and we should be using prion phylogeny to correct it, or (2) the software algorithm is insufficiently sophisticated and gave the wrong result, in which case the nodes in error should be examined by eye to see what, if anything, is going on.
There is a widespread misconception that phylogenetic software is somehow 'objective' when in fact it simply incorporates personal decisions and weighting preferences of the programmer. An expert on protein crystallography may be able to align the sequences much better by eye, using software results only as the point of departure (for bulk processing). The best analogy is in medicine: do you believe your internist or the 'expert' computer system that commented on your blood panels?
In other words, we are not studying prion protein out of any interest in revising vertebrate evolution; the known tree should drive our study of prion evolution, not the reverse. In this situation, we require an optimal alignment with respect to a constrained tree. Otherwise, the tail is wagging the dog.
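A concrete way to optimize alignment with respect to a constrained tree is to hold the accepted topology fixed and score each candidate alignment by how many changes it requires on that tree. The sketch below uses Fitch small parsimony for the scoring, a swapped-in choice rather than any specific published method; the four-taxon tree, species labels and candidate alignments are hypothetical, and treating gaps as a fifth state is a crude simplification.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name or (left subtree, right subtree)


def fitch_column(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass for one alignment column on a fixed rooted binary tree."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_cost = fitch_column(tree[0], column)
    right_states, right_cost = fitch_column(tree[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def alignment_cost(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total parsimony cost of an alignment when the tree topology is held fixed.

    Gaps are treated as a fifth state, which is a crude simplification.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    return sum(
        fitch_column(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Known (constrained) topology and two hypothetical placements of a one-residue gap.
known_tree: Tree = (("human", "chimp"), ("mouse", "rat"))
candidates = {
    "gap after Q": {"human": "PQGG", "chimp": "PQGG", "mouse": "PQ-G", "rat": "PQ-G"},
    "gap before Q": {"human": "PQGG", "chimp": "PQGG", "mouse": "P-QG", "rat": "P-QG"},
}
for label, aln in candidates.items():
    print(label, alignment_cost(known_tree, aln))
best = min(candidates, key=lambda k: alignment_cost(known_tree, candidates[k]))
print("preferred on the known tree:", best)
```

Because the tree never changes, a lower cost can only come from a better alignment, which is the reverse of letting the alignment choose the tree.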
Now, it is often not possible to reproduce computer alignment results, because they are seldom published in detail and the version of the software used has long since vanished (if it even runs on a current computer). However, it is worth asking what information about prion protein could come out of analyzing conflicting nodes:
It may turn out that deletions or insertions in the octapeptide region and elsewhere have thrown off the software; deletions have historically been the Achilles heel of automatic alignment scoring. Another possibility is that mutations on top of previous mutations have reverted or obscured a sequence's real changes. Constrained to the standard tree, we may correctly infer what happened in seemingly improbable events.
The situation for prion protein is that a lot is known about domains beyond the primary sequence. Cysteine residues, glycosylation sites, helices, beta structure, GPI and signal cleavages provide internal anchors for alignment: thus homology software also needs to be clamped to regions between these fixed points.
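Clamping can be sketched as cutting every sequence at the same structural landmarks and aligning only like segments with like. This minimal Python version uses two anchor types that are easy to detect from sequence alone, cysteines and N-X-S/T glycosylation sequons; helices, beta strands and the signal and GPI cleavage sites would have to come from annotation and are left out. The function names are hypothetical, and each segment pair would still be handed to an ordinary aligner.

```python
import re
from typing import List, Tuple


def anchor_positions(protein: str) -> List[int]:
    """Hypothetical anchor set: each Cys and each N-glycosylation sequon N-X-S/T (X not Pro)."""
    cysteines = [m.start() for m in re.finditer("C", protein)]
    # Lookahead so overlapping sequons are all reported.
    sequons = [m.start() for m in re.finditer(r"(?=N[^P][ST])", protein)]
    return sorted(set(cysteines + sequons))


def clamped_segments(protein: str) -> List[str]:
    """Cut a sequence at its anchors so each inter-anchor segment can be aligned on its own."""
    bounds = [0] + anchor_positions(protein) + [len(protein)]
    return [protein[s:e] for s, e in zip(bounds, bounds[1:]) if e > s]


def paired_segments(seq_a: str, seq_b: str) -> List[Tuple[str, str]]:
    """Pair corresponding segments, refusing to proceed if the anchors disagree."""
    segs_a, segs_b = clamped_segments(seq_a), clamped_segments(seq_b)
    if len(segs_a) != len(segs_b):
        raise ValueError("anchor counts differ; inspect these sequences by eye")
    return list(zip(segs_a, segs_b))


# Toy sequences, for illustration only.
for pair in paired_segments("AACNITKCA", "GACNVSRRCA"):
    print(pair)
```

Raising an error when anchor counts differ mirrors the point made above: that is precisely the case to examine by eye rather than let the software paper over.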