Prion Molecular Phylogeny
Mad Cow Home or Best Links
Last updated 30 May 98

Phylogeny for the prion protein
Revised dates of divergence
Application: the cetacean-artiodactyl divergence

Fig. 1 legend: This is the best phylogeny that can be drawn for the artiodactyls as of June 1998. Animals relevant to TSE are shown; unsequenced strategic species are shown in red. The tree is based on a review of about 300 recent scientific publications. Data from short interspersed elements (SINEs) and satellite DNA were used to refine and resolve the tree topology deduced from classical morphological analysis [reflected in GenBank taxonomy] and from sequence alignment of many genes by standard methods.

Colored bars represent independent genetic markers that definitively partition artiodactyls. Everything 'downstream' of a bar necessarily forms a monophyletic clade (its members uniquely share a last common ancestor) -- this is a crucial property of repeat elements. Gray boxes show unresolved nodes where conflicting trees are still prevalent in the literature. For example, all three possible tree topologies for Tylopoda (camels), Suidae (pigs), and Ruminantia (cows) appeared in 1997 in single-gene studies with various degrees of support.
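The clade property of a marker can be sketched in a few lines. This is only an illustration, assuming a toy tree written as nested Python tuples; the topology and taxon list are placeholders rather than the tree in Fig. 1, and `leaves` and `is_clade` are hypothetical helper names.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_clade(tree, carriers):
    """True if some subtree's leaf set equals `carriers` exactly."""
    carriers = set(carriers)
    if leaves(tree) == carriers:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, carriers) for child in tree)


# Toy topology (an assumption for illustration only). The three-way split
# at the root stands in for an unresolved node.
toy_tree = ("camel", "pig",
            (("hippo", "dolphin"), ("chevrotain", ("deer", "cow"))))

# Species carrying a hypothetical insertion: consistent with one clade.
marker_a = {"hippo", "dolphin", "chevrotain", "deer", "cow"}
# A distribution that no single insertion event on this tree could produce.
marker_b = {"pig", "cow"}

print(is_clade(toy_tree, marker_a))  # True
print(is_clade(toy_tree, marker_b))  # False
```

A marker whose carriers do not form a clade on a proposed tree argues against that tree, which is how the colored bars constrain the topology.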

Species shown in red are suggested strategic candidates for prion gene sequencing. Species such as white-tailed deer and cheetah have experienced TSE; reindeer are at risk for scrapie in Norway; squirrel has been associated with CJD in Kentucky; American bison are potentially at risk of CWD. The other species are selected for balance (baleen minke whale), for maximally informative phylogenetic position (chevrotain, nilgai, cape buffalo, rhino, hippo), or to eliminate long branches (peccary, okapi, nutria, guinea pig). Nilgai and cape buffalo are the highest-priority sequences with respect to BSE; white-tailed deer and reindeer with respect to chronic wasting disease.

These sequences, if completed, would vastly improve our ability to reconstruct ancient phylogenetic nodes, our understanding of what constitutes the wild-type allele in sheep and cattle, and our understanding of fixed mutational events during the evolution of this protein. In turn, this helps us understand which protein domains are tightly conserved and which are random connecting loops of little normal function. For example, the complete absence of change in the pre-repeat region over 100 million years conflicts with a recent study asserting no fixed structural role for this domain.

Molecular phylogeny based on aligning single genes gives erratic results, depending on assumptions, methodologies, and weighting schemes. Vertebrates apparently underwent rapid radiation during super-continent breakup about 110 million years ago: the nodes of divergence for the 18 orders of mammals are very tightly clustered and cannot be reliably dissected at this time because only a percent or two of observed change is attributable to this era. Some species, such as guinea pig, form long branches that are difficult to root because the lineage has few extant representatives.

Short interspersed elements (SINEs) have a more definitive interpretation. The best-known retrotransposon is the primate Alu, 300-odd bp of junk (derived from 7SL RNA) found in the human genome at an unbelievable 600,000 copies (compared to 65,000 functional genes, which is only 15 times the number in the bacterium E. coli). The prion gene introns contain 20 copies of various Alus along with 69 similar elements from 53 classes.

In effect, the vertebrate genome is a roach motel: retrotransposons can check into a lineage in high number via reverse transcriptase but after experiencing genetic drift, they can't check out. There are no instances of parallel events, back mutation, or convergence; these aren't really possible but could be detected anyway from sequence minutiae. Subsequent bursts of amplification by founder elements further refine the clade. The key differences favoring a phylogenetic tree made from a highly repetitive transposon set over one made from a single gene (or morphological characters) are the lack of back mutation and the lack of convergent evolution. However, molecular clocks are not feasible with repeat elements.

The combinatorics of node attachment times multiplied by the permutations of species order give rise to an intractably large number of possible trees for even a few dozen species, worse asymptotically than (n+1)!. Current maximum likelihood software cannot explore this space in realistic time frames. How do you find the right tree? This is where SINEs and other repeat elements come in: it is extremely important to exploit the partial but reliable constraints afforded by morphology, alignments, well-supported maximum parsimony trees from various genes, satellite DNA, and SINEs. The idea here is that we might not be sure where the giraffe branches off relative to deer or gazelle, but we are quite sure giraffes are not rodents.
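To get a feel for the numbers, here is a small sketch that counts rooted, strictly bifurcating trees on n labeled species using the standard double-factorial formula (2n-3)!!. This is one common way of counting and is an assumption here; the text may have a different count in mind (for example, one that also orders the node times, which would be larger still). The function name `rooted_binary_trees` is made up for this example.

```python
import math


def rooted_binary_trees(n):
    """Number of rooted, bifurcating, leaf-labeled trees on n taxa: (2n-3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # odd factors 3, 5, ..., 2n-3
        count *= k
    return count


# Compare the tree count with (n+1)! for a few sizes.
for n in (5, 10, 11, 20, 30):
    trees = rooted_binary_trees(n)
    bound = math.factorial(n + 1)
    print(f"n={n:2d}  trees={trees:.3e}  (n+1)!={bound:.3e}  trees>(n+1)!: {trees > bound}")
```

Under this count the number of trees passes (n+1)! at n = 11 and pulls away quickly after that, which is why any reliable constraint that removes whole regions of tree space is so valuable.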

What is the significance of this to prions? Suppose that the sequence of a whale prion had just been completed. Because whales shared a common ancestor with sheep and cattle that pigs and camels did not, its sequence provides information about the structure and function of the prion gene at the time of appearance of the ancestral ruminant. The species barrier to BSE may well be lower to whale than to pig or to camel. The dolphin node helps us understand when and where certain changes took place in the development of the current cow or sheep prion and whether certain polymorphisms are new, have persisted for tens of millions of years, or have occurred in multiple lineages. A case in point is the polymorphic repeat region of the bovine alleles (5 or 6 copies): the longer allele may have arisen 2-3 million years ago and persisted across speciation of kudu, bison, and cow.

Prion sequences can be clamped to the above tree, using both its topology and dated divergences. This turns conventional molecular phylogeny on its head -- instead of trying to deduce the tree and dates from prion sequences, we use a consensus tree derived from many genes and fossils and ask where, how fast, and when mutations in the prion gene become accepted as dominant alleles.
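One way to picture clamping a sequence to a fixed tree is Fitch's small-parsimony pass, which takes the topology as given and counts the minimum number of changes one alignment column requires. The sketch below is a swapped-in textbook technique, not necessarily the procedure used for the figures on this page; the tree, the residues, and the function name `fitch` are all placeholders.

```python
def fitch(tree, states):
    """Fitch bottom-up pass for one alignment column on a fixed binary tree.

    tree   : nested 2-tuples with species names (strings) at the leaves
    states : dict mapping species name -> residue observed at this site
    Returns (candidate ancestral residues at this node, minimum changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one residue: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one change must have occurred on a branch below.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical topology and residues, for illustration only.
fixed_tree = ("pig", (("dolphin", "hippo"), ("cow", "sheep")))
site = {"pig": "N", "dolphin": "S", "hippo": "S", "cow": "N", "sheep": "N"}

root_states, changes = fitch(fixed_tree, site)
print(root_states, changes)  # {'N'} 1
```

Running this column by column, with dated nodes attached to the tree, is what lets one ask on which branch and roughly when a given substitution became fixed.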

Has the rate of genetic change in the prion protein been uniform across lineages? This need not be the case: ovine and human growth hormone genes have had very different histories, with a 40x increase in mutation rate seen in the ruminants and great apes. Antoinette van der Kuyl reports an increased substitution rate of the hominid prion gene during the period of brain expansion relative to ruminants (in press).

Revised dates of divergence

Nature 1998 Apr 30;392(6679):917-920  [See also Nature 1996 May 16;381(6579):226-229]
Kumar S, Hedges SB
Below are the best available dates for species divergences based on molecular clocks averaged over 658 nuclear genes from 207 species. Mammalian orders diverged during continental breakup, which preceded by tens of millions of years the KT boundary event that opened niches occupied by non-avian dinosaurs. The fossil record may be inadequate for the Mesozoic, an upsetting concept to traditional paleontologists.

Below are the dates for various divergences of species for which the prion gene has been sequenced. They are sorted by declining relative precision (ratio of standard error to age of divergence):

Divergence               Mya Div   # Spp   Std Dev   Prec (%)
OW monkeys-great apes       23.3      56       1.2          5
lobed fish-other fish      450.0      44      35.5          8
orangutan-great apes         8.2       6       0.8         10
NW monkeys-primates         47.6       9       8.3         17
gibbon-great apes           14.6       4       2.8         19
gorilla-great apes           6.7       6       1.3         19
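The precision column can be reproduced from the other two numeric columns. The sketch below assumes the reading of the table above in which each row gives the divergence age in millions of years followed by its standard deviation; `relative_precision` is a name made up for this example.

```python
# (divergence, age in Mya, standard deviation), as read from the table above.
rows = [
    ("OW monkeys-great apes", 23.3, 1.2),
    ("lobed fish-other fish", 450.0, 35.5),
    ("orangutan-great apes", 8.2, 0.8),
    ("NW monkeys-primates", 47.6, 8.3),
    ("gibbon-great apes", 14.6, 2.8),
    ("gorilla-great apes", 6.7, 1.3),
]


def relative_precision(age_mya, std_dev):
    """Standard deviation as a whole-number percentage of the divergence age."""
    return round(100 * std_dev / age_mya)


for name, age, sd in rows:
    print(f"{name:24s}{relative_precision(age, sd):3d}%")
```

Smaller percentages mean tighter dates, so the list runs from the best-constrained divergence to the least.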

Pigs, cows, and whales

For over a century, a trochleated astragalus, a partial double mesocylix, narrow lower trigonids, and an expanded orbitosphenoid have been taken as defining synapomorphies of Artiodactyla. For example, a unique double pulley system in the ankle bone gives cows and their allies a flexible joint not found in any other vertebrate. GenBank taxonomy is based on classical morphological markers, as reflected in the tome, "Mammals of the World."

Cetacea (whales and dolphins, but not seals) do not exhibit a trochleated astragalus even vestigially or during development, nor do any of their fossil antecedents. Yet molecular phylogeneticists, using new repeat elements called SINEs, have demonstrated beyond any reasonable doubt that the Tylopoda (camel, llama) and Suidae (pigs, peccaries) diverged from the artiodactyl lineage prior to the divergence of the Ancodonta (hippo, sometimes wrongly put in with pigs) and Cetacea (whales). Whales are thus artiodactyls but not quite ruminants (which begin with Tragulidae, the chevrotains). Cattle are in a later group called Pecora.

GenBank has gone back and forth between traditional and modern taxonomies depending on input.

A partial prion sequence has been completed from a dolphin (toothed whale); from molecular taxonomy, it is expected to have features in common with giraffe, deer, cow, and oryx that are lacking in pig or camel (one expects it to be closest of all to hippo, then chevrotain). The graphic below aligns the whale sequence to these species. Amino acids are colored by chemical group; differences in the dolphin sequence are highlighted either according to phylogenetic signal or as non-informative singlets.

Changes pertinent to dolphin attachment to the tree are weighted for significance by numbers on the top line. Changes in variable regions, oscillating codons, and 'interchangeable' amino acids (ILVM etc.) receive lower weights; consistent changes in conserved regions, secondary structure, and modification sites receive higher weights. Residues fixed across all mammals are considered strongly conserved; if still identical in birds, they are considered ultra-conserved.

A recent paper [GJP Naylor and WM Brown Nature 388:527 1997] quantified the phylogenetic value of change in mitochondrial genomes for each codon position, each gene, and each amino acid over 4081 positions x 19 genes = 77,482 residues; hydrophobic amino acids proved to have the least value.

They used a set of organisms for which the phylogenetic tree was known. Despite the large data set, maximum parsimony produced the wrong tree and, worse, compelling bootstrap support for it, motivating a search for better weighting of codon position.

Variability of a site can also be quantified by counting the number of different residues that appear there across all species or by a lineage-adjusted count of accepted changes. These different quantitative scales might be made commensurate by normalizing to discriminatory power. The actual weightings used here are on a 1-5 scale that reflects all these considerations; while this is a judgement call, the robustness of the weights to affine rescaling can then be studied.
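The simplest of these measures, the count of different residues per site, might be sketched as follows. The sequences are made-up strings, not real prion sequences, and `distinct_residues` is a hypothetical helper; the lineage-adjusted count of accepted changes would need the tree as well and is not attempted here.

```python
def distinct_residues(alignment):
    """Count distinct residues (ignoring gaps) in each column of an alignment.

    alignment : dict mapping species name -> aligned sequence string
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [len({s[i] for s in seqs} - {"-"}) for i in range(length)]


# Toy alignment (placeholder strings for illustration only).
toy_alignment = {
    "cow":     "MKSHIG",
    "sheep":   "MKSHIG",
    "pig":     "MKSHVG",
    "dolphin": "MRSNLG",
}

print(distinct_residues(toy_alignment))  # [1, 2, 1, 2, 3, 1]
```

A column scoring 1 is invariant across the sample; higher counts flag variable sites, which on the scheme described above would receive lower weights.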

Residues with discriminatory value are summarized in the table below in declining order of weights and assigned trees. For example, the first column favors the tree ((cow, dolphin), pig) with weight 4. These 'tree votes' are then summed, with the result that ((cow, dolphin), pig) receives a score of 20, markedly higher than ((cow, pig), dolphin) at 12 or (cow, (pig, dolphin)) at 4.
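The summation of tree votes is simple enough to sketch. Only the first vote (weight 4 for ((cow, dolphin), pig)) and the three totals come from the text; every other individual weight below is a placeholder chosen so that the sums match, not a value taken from the original table. The function name `tally` is likewise made up.

```python
from collections import Counter

TREE_A = "((cow, dolphin), pig)"
TREE_B = "((cow, pig), dolphin)"
TREE_C = "(cow, (pig, dolphin))"

# One (favored tree, weight) pair per informative residue, in declining
# order of weight. Only the first entry is from the text; the rest are
# placeholder values for illustration.
votes = [
    (TREE_A, 4), (TREE_A, 4), (TREE_B, 4),
    (TREE_A, 3), (TREE_A, 3), (TREE_B, 3),
    (TREE_A, 2), (TREE_A, 2), (TREE_B, 2), (TREE_B, 2), (TREE_C, 2),
    (TREE_A, 1), (TREE_A, 1), (TREE_B, 1), (TREE_C, 1), (TREE_C, 1),
]


def tally(votes):
    """Sum the weights cast for each candidate tree."""
    totals = Counter()
    for tree, weight in votes:
        totals[tree] += weight
    return totals


totals = tally(votes)
for tree, score in totals.most_common():
    print(f"{tree:24s}{score:3d}")
```

Because the score is a plain weighted sum, rescaling all weights by a positive constant leaves the ranking of the trees unchanged; an added offset changes it only through the number of votes each tree received.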

In summary, prion protein sequences support the new molecular taxonomy derived from a large set of genes, mitochondrial sequences, and retrotransposons.

The prion sequence at the time that dolphins diverged from the other artiodactyls, 60 million years ago, is not hard to reconstruct. (Some residual ambiguity could be eliminated by judicious sequencing of additional species.) Residues that were to change as whales evolved are shown in red, with the ultimate substitution shown on the right margin. Analogously, residues that were to change in cows are shown in blue, with the ultimate substitution shown on the left margin. One might suppose that cows "looked" more like the common ancestor than did whales and so would change more slowly genetically, yet one sees that the number of fixed mutational changes has been essentially the same in both lineages.
