Knockout News

Prion gene good for something after all?
Knockouts vs knockouts: is the prion gene essential after all?
Phenotypic GSS and new mutations D202N and Q212P
Hydrogen bond donors and acceptors
Nematode genome 85% complete: no prion yet
Triplet repeat diseases: innocent inclusions?
CpG codon depletion

Signal region synapomorphy

22 Oct 98 webmaster
The prion gene may be good for something after all -- helping resolve a vexatious issue in mammalian evolution.

Eutherian mammals are thought to have experienced a very rapid radiation roughly 100 million years ago (mya) into the various taxonomic orders of today such as rodents, primates, carnivores, and ruminants. Events this distant are difficult to resolve by aligning sequence data because fixed mutational changes are rare during the brief window critical to tree topology. If the radiation considered here took place over 1 million years, the 1:99 ratio of branch lengths in the phylogenetic tree implies observed change will be overwhelmingly post-radiation and irrelevant.
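The branch-length arithmetic can be made explicit. This is only a minimal sketch, assuming substitutions accumulate at a constant rate along a lineage; the function name is illustrative and not taken from any phylogenetics package.

```python
def window_fraction(window_my: float, total_my: float) -> float:
    """Fraction of a lineage's total branch length that falls inside the radiation window."""
    if not 0 < window_my <= total_my:
        raise ValueError("window must be positive and no longer than the total span")
    return window_my / total_my


# A 1 my radiation window inside a 100 my lineage, as in the text
fraction = window_fraction(1.0, 100.0)
post_radiation = 1.0 - fraction
print(f"in window: {fraction:.0%}, after window: {post_radiation:.0%}")
```

Under the constant-rate assumption, only about one observed change in a hundred would date from the window that fixes the tree topology.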

For many genes, eg cytochrome oxidase or DNA polymerase, there is no reason whatsoever to expect a favorable acceleration of rate of change during the divergence window -- these genes have the same function whatever the skeletal morphology or habitat niche so do not experience a slackening of selective pressure (though founder drift effects could be enhanced). Recent work in hox genes further illustrates that remarkable changes in morphology can arise from a few modest point mutations or duplications in genes that direct development.

In the prion gene and many others, the rate of evolutionary change varies markedly by codon, by as much as a factor of 50. Stretches such as AGAAAAGA see no change accepted for more than 310my in any lineage; at the other extreme, serine-asparagine toggle codons have fixed changes in many disparate lineages with a characteristic time scale of 10my and are often seen in extant species as alleles. Other codons, such as the ancestral tyrosine in DYEDRY, exhibit synapomorphic changes, here to tryptophan (a 2bp change with DCEDRY as probable intermediate, seen only in post-guinea pig rodents).

Thus the situation is really worse than the first two factors [branch fraction, steady rates] suggest: the mutations most likely to occur during the critical 1my window are in codons with highest rates of evolution -- exactly those which are likely to be over-written numerous times subsequently. In the main prion gene, the codon most likely to change in 1my is GGHNQW. Following any particular lineage forward in time results in several changes. Since the only data are from extant species, nothing reliable can then be inferred about the period of interest 100mya.

Codons with phylogenetic signal relative to the mammalian radiation are thus rare because of conflicting requirements -- a slowly evolving codon needs to have changed during a particular narrow window. If the rate of change by codon position is plotted, codons can be weighted for relevance by convolution of the Fourier transform with a time scale semi-gaussian for the polytenic node in question, ie, by low band-pass Fourier filtration.
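The conflicting requirements can be illustrated with a much simpler model than the Fourier filtration described above. The sketch below assumes substitutions at a codon follow a Poisson process with a constant rate, and asks how likely it is that a codon changed inside the window and was never over-written afterwards. The function names, the 1 my window and the 100 my elapsed time are assumptions for illustration.

```python
import math


def informative_probability(rate_per_my: float,
                            window_my: float = 1.0,
                            elapsed_my: float = 100.0) -> float:
    """P(at least one substitution inside the window) * P(no substitution afterwards)."""
    p_hit_in_window = 1.0 - math.exp(-rate_per_my * window_my)
    p_not_overwritten = math.exp(-rate_per_my * elapsed_my)
    return p_hit_in_window * p_not_overwritten


def best_rate(window_my: float = 1.0, elapsed_my: float = 100.0) -> float:
    """Rate maximising informative_probability (derivative set to zero, solved in closed form)."""
    return math.log(1.0 + window_my / elapsed_my) / window_my


for label, rate in [("ser-asn toggle (10 my scale)", 1 / 10),
                    ("tuned to the window", best_rate()),
                    ("near-invariant (310 my scale)", 1 / 310)]:
    print(f"{label:32s} rate={rate:.4f}/my  P(informative)={informative_probability(rate):.2e}")
```

In this toy model the most useful codons have a characteristic time scale close to the age of the node itself; both faster and slower codons score lower, which is the weighting idea in miniature.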

Codons that change too fast are effectively discarded; codons that change too slowly are moot. It is sometimes hypothesized that rate of change is regionally smooth, yet adjacent amino acids in an alpha helix do not have their side chains in proximity -- one may be a structurally critical interior residue, the next a weakly constrained polar surface residue. In the prion gene, 256 codons are quickly reduced to a handful of potentially synapomorphic positions because there are many invariant residues and additionally rapidly changing positions. The key idea here is to not treat good information (codons with an appropriate rate of change) on an equal footing with mediocre information (codons experiencing multiple hits).

This approach is equally applicable to mutational sites involving insertions and deletions (called indels when the event cannot be resolved) in genes that are not evolving chaotically. However, those indels involving the tandem repeats and oligo-glycines in the prion gene are the analogue of ser-asn toggle codons: they change too fast to have applicability to 100my time scales. Specific indels are rare to begin with; to occur twice in separate lineages or to revert has the effect of squaring an already minuscule probability. (Tandem repeats have special structural features that enhance these rates through replication slippage; retrotransposons raise other special issues.)

Fortuitously, the prion gene contains the perfect indel relative to the mammalian radiation in its signal region. The event, most parsimoniously explained as a 6 bp deletion (two codons so no frameshift), cleanly separates rodents, primates, lagomorphs from the ancestral lineage (marsupial sequence) and ferungulates (ruminants, cetaceans, carnivores, perissodactyls). In other words, the deletion establishes the existence of a common ancestor to the mouse-human-rabbit lineage not shared by the cow-mink-horse lineage.

MA--NLGCWMLVLFVATWSDLGLCKK  great apes (6 species)
MA--NLGCWMLVLFVATWSDLGLCKK  old world monkeys (15 species)
MA--NLGCWMLVLFVATWSDLGLCKK  new world monkeys (8 species)
MA--HLGYWMLLLFVATWSDVGLCKK  lagomorphs (1 species)

MVKSHIGSWILVLFVAMWSDVGLCKK  artiodactyl (20 species)
MVKSHIGSWILVLFVAMWSDVGLCKK  perissodactyls (3 species)
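
The split can be read off the alignment mechanically. A minimal sketch, using the hand-gapped sequences exactly as printed above; the dictionary and function names are illustrative only.

```python
from collections import defaultdict

# Signal-region alignment copied from the text ('-' marks a gap column)
ALIGNMENT = {
    "great apes":        "MA--NLGCWMLVLFVATWSDLGLCKK",
    "old world monkeys": "MA--NLGCWMLVLFVATWSDLGLCKK",
    "new world monkeys": "MA--NLGCWMLVLFVATWSDLGLCKK",
    "lagomorphs":        "MA--HLGYWMLLLFVATWSDVGLCKK",
    "artiodactyl":       "MVKSHIGSWILVLFVAMWSDVGLCKK",
    "perissodactyls":    "MVKSHIGSWILVLFVAMWSDVGLCKK",
}


def group_by_gap_pattern(alignment):
    """Map each gap pattern (tuple of 0-based gapped columns) to the taxa sharing it."""
    groups = defaultdict(list)
    for taxon, seq in alignment.items():
        gaps = tuple(i for i, residue in enumerate(seq) if residue == "-")
        groups[gaps].append(taxon)
    return dict(groups)


groups = group_by_gap_pattern(ALIGNMENT)
aligned_length = len(next(iter(ALIGNMENT.values())))
for gaps, taxa in groups.items():
    residues = aligned_length - len(gaps)
    print(f"gapped columns {gaps or 'none'} ({residues} residues): {', '.join(taxa)}")
```

This only works because the alignment is already hand-gapped; it says nothing about whether the gap was an insertion or a deletion.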


How solid is the evidence for this interpretation?

First, note that deletions and insertions are extremely rare in the prion gene -- excluding tandem repeat slippage, the only other known example in 80 mammalian species is a glycine (GGG codon) inserted in an ancestor of Murinae rodents at the hyper-variable GPI junction: YYDGRRSsavlf.

Second, the 26-residue signal region has been quite stable to point mutation, almost comparable to mature protein. All genetic change accepted in the signal region in the last 100my in 80 species can be explained by a dozen or so point mutational events and the indel under consideration here; another few point mutations accommodate the marsupial divergence at 178my. There is little singlet change (where sequencing experimental error is concentrated in any case), a few conservative toggle codons, and a limited number of deeper synapomorphic changes all consistent with seldom-disputed aspects of the phylogenetic tree (assumed here throughout). For example, ancestral tyrosine at position 8 has gone to cysteine in old world monkeys-great apes and to 4-codon serine in ferungulates.

(Serine is unique in having 6 codons not in the same column of the standard genetic code; direct change requires two base changes and seldom occurs; threonine and cysteine, at the intersections of serine rows and columns, usually mediate change. The effect results in relatively frozen 4-codon and 2-codon serine, giving constraints on toggle opportunities; 2-codon serine can be safely inferred from ser-asn toggles, 4-codon serine from ser-ala-thr.)
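
The serine argument can be checked directly against the standard genetic code. A small sketch; the variable and function names are illustrative, and the code table is built from the usual TCAG-ordered listing of the standard code.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO ('*' = stop)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a, b):
    """Number of base positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


four_codon_ser = [c for c, aa in CODE.items() if aa == "S" and c.startswith("TC")]
two_codon_ser = [c for c, aa in CODE.items() if aa == "S" and c.startswith("AG")]

# Fewest base changes needed to move between the two serine families
min_steps = min(hamming(a, b) for a in four_codon_ser for b in two_codon_ser)


def intermediates(a, b):
    """Amino acids encoded by codons one base change away from both a and b."""
    return {CODE[c] for c in CODE if hamming(c, a) == 1 and hamming(c, b) == 1}


bridge = set()
for a in four_codon_ser:
    for b in two_codon_ser:
        if hamming(a, b) == min_steps:
            bridge |= intermediates(a, b)


def one_step_neighbours(codons):
    """Non-serine amino acids (or stop) reachable by a single base change."""
    return {CODE[c] for c in CODE for s in codons if hamming(c, s) == 1} - {"S"}


print("minimum base changes between serine families:", min_steps)
print("bridging residues:", sorted(bridge))
print("Asn one step from AGY serine:", "N" in one_step_neighbours(two_codon_ser))
print("Ala one step from TCN serine:", "A" in one_step_neighbours(four_codon_ser))
```

The same neighbour sets give the diagnostic rule in the text: asparagine is a single step only from the 2-codon family, alanine only from the 4-codon family.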

Next, note that the indel event can be simply explained by an insertion or deletion involving codons 3 and 4. Two similar scenarios also work, with the indel beginning at position 2 or 3 of codon 2. This region has seen very little change at silent codon positions.

MA--NLG  primates
MA--NLG  rodents
MA--HLG  lagomorph
MVKSHIG  ferungulates
MGKIQLG  marsupial
MARLLTT  chicken
 m   a           NH  l   g
ATG GCG --- --- CAC CTC GGC rabbit
ATG GCG --- --- AAC CTT GGC primates
ATG GCG --- --- AAC CTT GGC rodents
 m  gv   k   is  hq  il  g

Alignment programs such as ClustalW or Blast often do not gap correctly in this region. This error then trickles down through research papers on the rate of change in the prion gene (or of nuclear genes in general) when alignments are not hand-gapped. The effect is not trivial when compounded with gross errors regarding the octarepeat region, because illusory changes can then quantitatively dominate the picture of prion gene evolution. Note that it is most unclear at both the DNA and protein level which residues are still homologous (or even what homologous means) because of the split between function and descendancy.

Chicken and marsupial, safe outgroups, have 26 residues in the signal region, as do all ferungulates. Despite the great span of time, they align quite well with conservative single base changes needed for concordance. This argues for the indel being a deletion within the rabbit-primate-rodent lineage, rather than separate insertions within birds, marsupial, and ferungulates. Further support could be obtained by sequencing mammalian orders not represented in the data, such as basal sloths: all are predicted to have 26 residues in the signal region.

          MA--NLGYWLLALFVATWTDVGLC-KK                 rodent
          MA--HLGYWMLLLFVATWSDVGLC-KK                 rabbit
          MA--NLGCWMLVLFVATWSDLGLC-KK                 sq monkey
          MVKSHIGSWLLVLFVATWSDIGFC-KK                 mink
          MVKSHIGSWILVLFVAMWSDVGLC-KK                 deer
          MGKIQLGYWILVLFIVTWSDLGLC-KK                 marsup 
          MARLLTTCCLLALLLAACTDVALS-KK                 bird 
          MVKSHLGYWILVLFVATWSDVGLC-KK                 ancestral placental mammal
Another region where an indel synapomorphy seems to cleanly separate rodent-primate-lagomorph from ferungulates is in the terminal octapeptide repeat, which is always a nonapeptide in ferungulates but never in the other group. A tetra-glycine becomes a tri-glycine. The marsupial also has a nonapeptide but is not overwhelming in its similarity otherwise. The first and final repeats are not subject to erasure and over-writing like middle repeats by the nature of the slippage mechanism.

In conclusion, [((rodent-primate-lagomorph), ferungulate), marsupial] is the only tree with topology consistent with the signal and repeat region deletions. There are further synapomorphic codons but they simply confirm agreed-upon divisions such as rodent-primate or ferungulate-others. Ruminants cannot be separated from carnivores with protein sequences from this region. The rabbit node cannot be placed -- there is little value in long branch sequencing unless taken in 3's to suppress singlets, eg [(rabbit, hare), pika]. Two similarly chosen marsupials would be of even more value; [(opossum, kangaroo), monotreme] works, again because the topology is certain and the branches are not too short. Prion researchers have squandered immense resources sequencing too closely related taxa.
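
One way to make "consistent with the deletion" concrete is to count the minimum number of indel events each topology requires. The sketch below uses Fitch's small-parsimony count, which is a swapped-in standard technique rather than anything used in the analysis above. It treats signal-region length as a single unordered character and leaves out lagomorphs because their node is unplaced. The alternative tree is the ((primates, ferungulates), rodents) grouping; all names are illustrative.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum number of changes) for a rooted binary tree.

    tree is a leaf name (str) or a pair (left, right); states maps leaf name -> character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one change is required on a branch below this node
    return left_set | right_set, left_cost + right_cost + 1


# Signal-region length as the character (24 = two codons missing, 26 = full length)
SIGNAL_LENGTH = {"rodent": 24, "primate": 24, "ferungulate": 26, "marsupial": 26}

tree_from_indel = ((("rodent", "primate"), "ferungulate"), "marsupial")
tree_alternative = ((("primate", "ferungulate"), "rodent"), "marsupial")

_, cost_indel = fitch(tree_from_indel, SIGNAL_LENGTH)
_, cost_alt = fitch(tree_alternative, SIGNAL_LENGTH)
print("events needed, rodent+primate grouping:", cost_indel)
print("events needed, primate+ferungulate grouping:", cost_alt)
```

The rodent-primate grouping needs a single event, while the alternative needs two identical events in separate lineages, which is the already small probability squared.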

Here is a very curious recent paper on this same topic concerning a mitochondrial gene that also goes against the grain:

Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders.

Cao Y, Janke A, Waddell PJ, Westerman M, Takenaka O, Murata S, Okada N, Paabo S, Hasegawa M
J Mol Evol 1998 Sep;47(3):307-22
"The phylogenetic relationship among primates, ferungulates (artiodactyls + cetaceans + perissodactyls + carnivores), and rodents was examined using proteins encoded by the H strand of mtDNA, with marsupials and monotremes as the outgroup. Trees estimated from individual proteins were compared in detail with the tree estimated from all 12 proteins (either concatenated or summing up log-likelihood scores for each gene). Although the overall evidence strongly suggests ((primates, ferungulates), rodents), the ND1 data clearly support another tree, ((primates, rodents), ferungulates).

To clarify whether this contradiction is due to (1) a stochastic (sampling) error; (2) minor model-based errors (e.g., ignoring site rate variability), or (3) convergent and parallel evolution (specifically between either primates and rodents or ferungulates and the outgroup), the ND1 genes from many additional species of primates, rodents, other eutherian orders, and the outgroup (marsupials + monotremes) were sequenced. The phylogenetic analyses were extensive and aimed to eliminate the following artifacts as possible causes of the aberrant result: base composition biases, unequal site substitution rates, or the cumulative effects of both.

Neither more sophisticated evolutionary analyses nor the addition of species changed the previous conclusion. That is, the statistical support for grouping rodents and primates to the exclusion of all other taxa fluctuates upward or downward in quite a tight range centered near 95% confidence. These results and a site-by-site examination of the sequences clearly suggest that convergent or parallel evolution has occurred in ND1 between primates and rodents and/or between ferungulates and the outgroup. While the primate/rodent grouping is strange, ND1 also throws some interesting light on the relationships of some eutherian orders, marsupials, and monotremes. In these parts of the tree, ND1 shows no apparent tendency for unexplained convergences."

What about convergent evolution in the signal region?

Now a great many proteins have a signal region; within a given species, these are all processed by the same endoplasmic reticulum machinery -- a limited set of signal endopeptidases must recognize the clip point in all these pre-proteins. The 'signature' of a signal peptide is not a specific linear sequence (like that of a restriction site) but rather a more generic pattern of central hydrophobicity followed by a serine or cysteine and charged residues.

This sounds like a prescription that would tolerate rapid rates of change yet it does not. It also sounds like a prescription for convergent evolution or for building exportable proteins by swapping in a universal signal domain (analogue of the Rossmann fold for nucleotides). So why, on a Blast search against an 850,000,000 bp data set, do prion queries only return other prion signal regions?

The answer may be that signal peptides are very ancient, dating back to the divergence with eubacteria -- there has been a great span of time in which to diverge. Convergent evolution does not act in this instance to drive the domain to a universal linear sequence, rather to a common generic property pattern within an immense sequence space (20 to the 26th power). There may not be many proteins with 'new' signal peptides; the source may be existing signal proteins through gene duplication and divergence.
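
For scale, the size of that sequence space can be compared with the data set searched. This is only back-of-envelope arithmetic; the bound on how many 26-residue windows a nucleotide database can contain (one per base per strand) is a deliberately generous assumption, not a statement about how Blast works.

```python
ALPHABET = 20                # standard amino acids
SIGNAL_LENGTH = 26           # residues in the full-length prion signal region
DATABASE_BP = 850_000_000    # size of the Blast data set quoted in the text

sequence_space = ALPHABET ** SIGNAL_LENGTH

# Generous upper bound on translated 26-residue windows: one per base, per strand
max_windows = DATABASE_BP * 2

print(f"possible 26-residue sequences: {sequence_space:.2e}")
print(f"largest fraction the data set could sample: {max_windows / sequence_space:.1e}")
```

Even on this bound the data set covers a vanishing fraction of the space, so a chance match to the prion signal region is not expected.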

Note: why use INDEL for INsertion or DELetion? Because when aligning sequences, one often sees that a gap has to be introduced. That doesn't necessarily mean the shorter sequence had a deletion; the longer sequence might equally have had an insertion. In many situations it is not possible to resolve the issue. So rather than call it 'insertion or deletion' which is too cumbersome for constant use, or call it 'deletion' which is biased, people went for a neutral term, indel.

The indel in the signal region of the prion protein happened to be resolvable and was a deletion. Resolution is only probabilistic: 1 rare event is a whole lot better than 2 rare events, eg the ancestral signal could be 24 aa and the marsupial and ferungulate lineages could each have had the same 2 aa insert in the same spot while mouse-human stayed at ancestral length. Resolution is also predictive: 3-toed sloths, elephants, platypus prions etc. will have 26 aa. Guinea pig is likely 24 aa but could go either way, depending on exactly when it branched off relative to the deletion event in the common ancestor of rodent-primate-rabbit not shared by artiodactyl-carnivore.

Synapomorphy is one of many learned-sounding terms in newer taxonomic theory that do not convey any meaning per se; however, these terms end up being convenient. It refers to a character value [here an aligned amino acid, elsewhere a bump on a tooth] that occurs only and everywhere on a topological subtree. Example: DWEDRY in all rodents is DYEDRY in every other species. The tryptophan at this position is a good synapomorphic character for rodents. But the tyrosine is not, because it does not identify a monophyletic subtree. This is a 2bp change that presumably passed through cysteine (which may still be present in some pre-Murinid rodents). One could also speak of local synapomorphies, eg at codon 4, serine is diagnostic of hamsters within rodents but not within mammals.
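
The "only and everywhere on a subtree" definition translates almost word for word into a test. A minimal sketch on a small illustrative tree; the taxa, topology and function names are assumptions chosen to mirror the DWEDRY/DYEDRY example, not a real data set.

```python
def collect_clades(tree, found):
    """Return the leaf set of `tree`, appending the leaf set of every subtree to `found`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.append(leaves)
    return leaves


def is_synapomorphy(tree, states, value):
    """True if `value` occurs only and everywhere on one subtree of `tree`."""
    found = []
    collect_clades(tree, found)
    carriers = frozenset(taxon for taxon, state in states.items() if state == value)
    return bool(carriers) and carriers in found


# Illustrative topology only
rodents = (("mouse", "rat"), "hamster")
primates = ("human", "monkey")
ferungulates = ("cow", "mink")
TREE = (((rodents, primates), ferungulates), "opossum")

# Residue at the D?EDRY position: W in the rodents, Y elsewhere
STATES = {"mouse": "W", "rat": "W", "hamster": "W",
          "human": "Y", "monkey": "Y", "cow": "Y", "mink": "Y", "opossum": "Y"}

print("W synapomorphic:", is_synapomorphy(TREE, STATES, "W"))
print("Y synapomorphic:", is_synapomorphy(TREE, STATES, "Y"))
```

Tryptophan passes because its carriers are exactly the rodent subtree; tyrosine fails because its carriers are everything left over, which is not a subtree.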

Knockouts vs knockouts?

26 Oct 98 webmaster opinion
Recent papers about knockout mice are split between several finding no ill effects and a few finding minor abnormalities. In either case, there is no support for essentiality of the prion gene, no disease phenotype much less lethality associated with loss-of-function, and thus no explanation for the conservative evolution of the gene. Note transgenic mice expressing PrP with specific amino-proximal deletions develop a neurologic syndrome with ataxia and cerebellar lesions.

How good are the controls for knockout mice? Terrible. No one has ever determined the prion sequence of a real Mus musculus domesticus. A highly inbred lab mouse bears less of a relationship to a wild housemouse than a toy poodle does to a Canadian wolf.

Could 'wildtype' mice already be knockouts of normal prion function? Suppose the prion gene were essential but inbreeding fixed a bad allele, with a compensatory change in another gene strongly selected by the inbreeding process: all the experiments then compare a point mutation knockout to a deletion knockout.

No one in their right mind would ever use linc [long incubation] mice as controls -- this allele is doubly defective relative to sinc [short incubation]: L108F - T189V. (Linc is thus 3 base changes from sinc: C428T and ACc671-673GTc; the 'missing link' would have ile or ala as transition. Sinc can be shown to be closer to wildtype -- see below.) While linc doesn't cause TSE during mouse lifetimes, TSE is not a disease of normal function. The loci in humans that cause familial CJD are speculated to thermodynamically destabilize native protein, yet these are mainly mild conservative changes compared to linc. (It should not be thought that long incubation times (to scrapie passage) mean this protein is 'better' than wildtype; on the contrary, it probably means that it is less like a real prion in structure, hence harder to recruit under the like-like principle of the species barrier.) Linc mice are analogous to the many bizarre alleles in sheep prion -- artefacts of animal husbandry.

Is linc a knockout (or severe setback) of some aspect of normal function? Yes, the severity of the mutations implies this. The argument is threefold and quantitative. First, a residue's functional importance is inversely related to its characteristic rate of evolutionary change; codons 108 and 189 (and surrounding domains) are experiencing exceedingly slow rates of change. (The baseline is set in pseudogenes, introns, or intra-gene loop regions of similar base composition not experiencing selection.) Second, statistical measures of the substitutability of one residue for another reflect general design criteria of proteins (roughly PAM or Blosum matrices): if there is to be an accepted change at some codon, then it is far more likely to be certain residues than others. Third, multiple mutations are generally worse than additive in effect.

Assuming for the moment that sinc has full wildtype function, to replace both a leucine with a phenylalanine and a threonine with valine at codon positions with 100 million year scales of invariance is a bit like winning the lottery the same day you shoot a hole-in-one at golf blindfolded. This is the measure of neutrality of linc relative to sinc for retaining the full gamut of normal prion function. [A third strain of mouse was in wide use in the 1980's but has not shown up in sequences since, M133V relative to sinc; similar arguments apply to it. I call it kinc in view of probable induced structural changes.]

Naturally very few papers in the prion literature actually state up front which of the 3x3=9 genotypes of mice were used. A person familiar with the myriad strain names and their histories might be able to work this out in some cases. The key issue may be whether linc mice were derived from sinc or vice versa. The latter scenario means that even though sinc might be closer to wildtype, if it got there through a compensated linc mouse, its knockouts are no better than linc knockouts.

Alternatively, it could be argued that loss of prion function in 'wildtype' is simply not detected under conditions of cage life (or swimming or maze tests). That is, a mouse could be deaf, dumb, lack night vision and olfaction, and roll over for predators -- what does it matter when you are never more than 6 inches from your food bowl? A gene may be essential in the wild but not in the animal room. (E. coli can dispense with hundreds of genes when grown in rich broth.)

Are sinc mice really wildtype?

Fortunately, the wild type sequence of Mus musculus can be reliably predicted while we wait for someone to sequence the prion gene of real wildtype mice. The basic approach is to (1) align mouse prion with rat, gerbil, 3 hamsters, and 2 cotton rats within the Murinae, (2) clamp to the reliably dated, known topology, (3) use, in effect, an ancestor of primates reconstructed from the dozens of sequences as outgroup, (4) apply marsupial, bird, and ferungulate prion sequences as more remote outgroups and quantitate rates of change and acceptable substitutions at relevant positions in sinc mouse.

Forgetting now the differences between sinc, linc, and kinc mice, let us concentrate on dubious variations mice have relative to what is known about this protein from its evolution. One sees immediately mice have too many changes:

Key: Rodent sequences are globally colored by tree topology. Internal magenta highlights recent synapomorphies and plesiomorphies; internal yellow suppresses singlets. Uncertainty in consensus sequences indicated by lower case.
*Ferungulate, marsupial and bird lines show only unambiguous residues relevant to rodent issues; indels are suppressed to hold the alignment flat.

Note  Clade affected     Mutant    Conservation  Comment
1     mouse+rat+gerbil   A14T      placental     stable codon
2     mouse-rat+gerbil   T15M      MT toggle     M occurs sporadically
3     mouse only         G55del    mammal        repeat region
4     mouse only         D72S      mammal        repeat region
5     mouse only         G80S      mammal        repeat region
6     mouse+rat          M109L     placental     L in marsupial too
7     rodents-(g.pig)    Y145W     non-rodents   2bp synapomorphy
8     rodents            -127D     non-rodents   pre-GPI anchor
9     mouse only         -A234ST   mammal        post GPI-anchor

Changes in the signal and GPI region are major:

Are the changes in the 'wildtype' mouse signal region and the chaos near the site of GPI attachment enough to throw off these processes, resulting in a mix of GPI-attached, transmembrane terminal peptide, and extracellular-released? This could result in effects on normal function via distribution even though these domains do not appear per se in mature protein.

This is difficult to assess because sophisticated identification algorithms [eg Psort] don't like any rodent signal peptides. However, using a virtual chimera of human signal shows that mouse GPI attachment is still expected. The hypervariable region surrounding the GPI join has no good explanation; the 3' terminus itself shows extraordinary conservation. Asp preceding the join, DGRRS-ss, appears to be a very old insertion in rodents. On top of this, mouse has a slippage insert with terminal point mutation immediately after the GPI. One might suppose that the GPI splice signature would be critical and conserved but instead it is one of the most variable regions in the whole protein.

Changes in the repeat region could disturb mouse prion function:

Mouse prion has three very unusual changes in this region. The first shortens the first repeat to 8 residues [seen in new world monkeys as well]; the second substitutes a serine for glycine adjacent to the tryptophan of the third repeat, which is then iterated in the fourth repeat. These two changes are probably related to a single slippage even though the serine codons are different at third position. Note rats -- also highly inbred -- show two idiosyncratic serines in the repeat region at different sites.

While serines are tolerated sporadically at the first and final repeats, it is precisely these serine substitutions at the second and third repeat that are unprecedented and quite possibly enough to knock out or disrupt the structure/function of the repeat region. Mouse prion is a very poor place to study copper and zinc binding to this region because of these unique serine substitutions; most studies fortunately have been done with PHGGGWQG repeats (general mammal).


When the functions of the prion protein finally become measurable, it will be interesting to see if the mouse prion is fully working. Knockout mice are simply not persuasive at this point (and golden hamster would not be much better) because there is no evidence that normal function is retained by lab mice controls. It is high time that real wildtype mouse was sequenced.
Non-redundant rodent prion protein resource in fasta format:

Phenotypic variability of GSS disease is associated with prion protein heterogeneity

J Neuropathol Exp Neurol 1998 Oct;57(10):979-88 
Piccardo P, Dlouhy SR, Lievens PM, Young K, Bird TD, Nochlin D, Dickson DW, Vinters HV, Zimmerman TR, Mackenzie IR, Kish SJ, Ang LC, De Carli C, Pocchiari M, Brown P, Gibbs CJ Jr, Gajdusek DC, Bugiani O, Ironside J, Tagliavini F, Ghetti B
Gerstmann-Straussler-Scheinker disease (GSS), a cerebello-pyramidal syndrome associated with dementia and caused by mutations in the prion protein gene (PRNP), is phenotypically heterogeneous. The molecular mechanisms responsible for such heterogeneity are unknown. Since we hypothesize that prion protein (PrP) heterogeneity may be associated with clinico-pathologic heterogeneity, the aim of this study was to analyze PrP in several GSS variants. Among the pathologic phenotypes of GSS, we recognize those without and with marked spongiform degeneration. In the latter (i.e. a subset of GSS P102L patients) we observed 3 major proteinase-K resistant PrP (PrPres) isoforms of ca. 21-30 kDa, similar to those seen in Creutzfeldt-Jakob disease. In contrast, the 21-30 kDa isoforms were not prominent in GSS variants without spongiform changes, including GSS A117V, GSS D202N, GSS Q212P, GSS Q217R, and 2 cases of GSS P102L.

This suggests that spongiform changes in GSS are related to the presence of high levels of these distinct 21-30 kDa isoforms. Variable amounts of smaller, distinct PrPres isoforms of ca. 7-15 kDa were seen in all GSS variants. This suggests that GSS is characterized by the presence of PrP isoforms that can be partially cleaved to low molecular weight PrPres peptides.

Comment (webmaster): Two of these mutations are apparently new and not on Medline. One presumes they turned up during screening of GSS patients. People should stop calling GSS a disease or just pick one genotype for it. I favor getting rid of both FFI and GSS and just sticking with 'CJD D202N M129M' or whatever. GSS is a subset of CJD with no deep underlying definition or common ground -- no wonder there is paper after paper wrestling with 'phenotypic variability.'

Both D202N and Q212P are found in alpha helix 3 in the mouse and hamster NMR structures. D202 is an invariant residue in mammals (but glutamate in birds) just past the 2nd glycosylation site, hydrogen bonded to Y149, Y157, T199, and T199 amide. Q212 is also strongly invariant (but deleted in birds) just prior to the second cysteine of the disulphide and hydrogen bonded to T216.

Better 3D blow-ups will be posted shortly. There is also a whole long story about how hydrogen bond acceptors cannot be replaced by donors or non-acceptors, or donors by acceptors or non-donors, etc. etc. even though under other circumstances these mutations might be conservative. There are applications to sheep allele hazards and to whether any of the lab mice strains have normal functioning prion protein. E200K R208H V210I Q217R M232R are the other known mutations in this vicinity.



Hydrogen bond donors and acceptors

 Adapted from 27 July 98 PNAS Riek et al
Prion hydrogen bonds from NMR structure of mouse
(CJD mutation codons shown in color)
a Tyr-128 H sidechain O Asp-178 11
b Asn-143 H sidechain O Glu-146 16
c Tyr-149 H sidechain O Asp-202 16
d Tyr-150 H backbone O Pro-137 17
e Arg-151 H sidechain O Glu-152 8
f Asn-153 H backbone O Tyr-149 18
g Arg-156 H sidechain O Glu-196 13
h Tyr-157 H backbone O Asp-202 7
i Gln-160 H backbone O Gly-131 19
j Tyr-162 NH sidechain O Thr-183 12
k Arg-164 H sidechain O Asp-178 7
l Asn-174 H backbone O Asn-171 12
m? Thr-183 OH backbone O Cys-179 6
n Thr-188 OH backbone O Ile-184 12
o Thr-191 OH backbone O His-187 8
p Thr-192 OH backbone O Glu-196 7
q Lys-194 H sidechain O Glu-196 7
r Thr-199 NH sidechain O Asp-202 17
s Thr-199 OH sidechain O Asp-202 14
t Asp-202 NH sidechain O Thr-199 15
u Thr-216 OH backbone O Gln-212 10
v Gln-217 H backbone O Ala-133 8
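
The table above is easier to use when indexed by residue, for example to list every partner of a mutation codon. A minimal sketch: the rows are transcribed from the table as (label, donor residue, acceptor residue), the trailing numeric column is left out, and the names are illustrative.

```python
from collections import defaultdict

# (label, donor residue, acceptor residue), transcribed from the table
BONDS = [
    ("a", "Tyr-128", "Asp-178"), ("b", "Asn-143", "Glu-146"), ("c", "Tyr-149", "Asp-202"),
    ("d", "Tyr-150", "Pro-137"), ("e", "Arg-151", "Glu-152"), ("f", "Asn-153", "Tyr-149"),
    ("g", "Arg-156", "Glu-196"), ("h", "Tyr-157", "Asp-202"), ("i", "Gln-160", "Gly-131"),
    ("j", "Tyr-162", "Thr-183"), ("k", "Arg-164", "Asp-178"), ("l", "Asn-174", "Asn-171"),
    ("m", "Thr-183", "Cys-179"), ("n", "Thr-188", "Ile-184"), ("o", "Thr-191", "His-187"),
    ("p", "Thr-192", "Glu-196"), ("q", "Lys-194", "Glu-196"), ("r", "Thr-199", "Asp-202"),
    ("s", "Thr-199", "Asp-202"), ("t", "Asp-202", "Thr-199"), ("u", "Thr-216", "Gln-212"),
    ("v", "Gln-217", "Ala-133"),
]

# Index every bond under both of its residues
partners_of = defaultdict(set)
for _label, donor, acceptor in BONDS:
    partners_of[donor].add(acceptor)
    partners_of[acceptor].add(donor)


def partners(residue):
    """Residues hydrogen bonded to `residue`, whether it acts as donor or acceptor."""
    return set(partners_of.get(residue, set()))


for residue in ("Asp-202", "Gln-212", "Asp-178"):
    print(residue, "->", ", ".join(sorted(partners(residue))))
```

Looking up Asp-202 and Gln-212 this way returns the same partners named in the comment on D202N and Q212P.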



Nematode genome 85% complete: no prion yet

WUSTL and the Sanger Centre have finished sequencing 85,341,695 bases of the 100 Mb Caenorhabditis elegans genome (make that 86,572,592 as of 25 Nov 98)
Comment (webmaster):

The prion gene has been tracked back 410 million years to the fish-mammal divergence using antibody 3F4 to the core invariant epitope. Yet there is no sign of the gene earlier in yeast, fruit fly, or nematode.

Here is the hnRNP gene product, still the best Blast hit to prion protein, found long ago by hybridization. Note that its terminal repeat does bear an uncanny resemblance, in composition and residue order, to the prion protein octarepeat (and also to the yeast sup35 prionlike protein repeat).

Yeast Sup35:
Also, RNA polymerase II has a very similar repeat to bird prion, YSPTSPS in later eucaryotes, YSPASPA in Mastigamoeba.

cDNA cloning of a novel heterogeneous nuclear ribonucleoprotein gene homologue in Caenorhabditis elegans using hamster prion protein cDNA as a hybridization probe.

Iwasaki M, Okumura K, Kondo Y, Tanaka T, Igarashi H
Nucleic Acids Res 1992 Aug 11;20(15):4001-7   
...The evolutionary conservation of the PrP gene has been reported in the genomes of many vertebrates as well as certain invertebrates. In the genome of nematode Caenorhabditis elegans, the sequence capable of hybridizing with the mammalian PrP cDNA probe has been demonstrated, predicting the presence of the PrP gene homologue in C.elegans. In this study, Southern analysis with the hamster PrP cDNA (HaPrP) probe confirmed the previous observation. Moreover, Northern analysis revealed that the sequence is actively transcribed in adult worms.

Thus, we screened C.elegans cDNA libraries with the HaPrP probe and isolated a cDNA that hybridizes to the same sequence in C.elegans that hybridized with the HaPrP probe in the Southern and Northern analyses. The deduced amino acid sequence of this cDNA, however, is substantially homologous with heterogeneous nuclear ribonucleoprotein (hnRNP) core proteins rather than mammalian PrPc. The hnRNPs contain the glycine-rich domain in the C-terminal half of the molecule, which also seemed to be in PrPc at the N-terminal half of the molecule. Both of the glycine-rich domains are composed of tracts with high G + C content, indicating that these tracts may [cause] the hybridizing signals. These results suggest that this cDNA clone is derived from a novel hnRNP gene homologue in C.elegans but not from a predicted PrP gene homologue.

Triplet repeat diseases: innocent inclusions?

Katrina L. Kelner opinion piece
Science 22 Oct 98
" In a curious set of neurodegenerative diseases, a long string of the nucleotide triplet CAG lodges within genes, causing the death of subsets of neurons and ultimately disease. Exactly how these strings of repeats cause cell death is not known, but they do not simply disrupt the function of their target gene. Rather, the long CAG string has a deadly--but undefined--effect of its own.

One popular idea is that the CAG repeats cause the protein to form a toxic aggregate in the nucleus of cells. These so-called nuclear inclusions are common in the brains of patients with these disorders. But in two recent papers in Cell, this explanation is called into question. One group shows, in a cultured cell model system for Huntington's disease (F. Saudou et al., Cell 95, 55 1998), that cells may die even without the presence of nuclear inclusions. In the most dramatic experiment, expression of a fragment of the mutant huntingtin protein containing a 68-repeat insertion, together with an inhibitory form of the ubiquitin-conjugating enzyme, resulted in far fewer intranuclear inclusions. The mutant huntingtin actually triggered more cell death in this situation than it would have in the presence of inclusions, leading the authors to the bold suggestion that the inclusions may actually be protective.

A second group made transgenic mice that mimicked the disorder spinocerebellar ataxia type 1 (A. Klement et al., ibid., p. 41.), in which the repeat-containing protein ataxin-1 lacked a self-aggregating region. These mice had no nuclear inclusions, but still showed the characteristic degeneration of cerebellar Purkinje cells. The field may now have to look elsewhere for the mechanism by which these repeats do their damage to the cell."

Comment (webmaster): While these two Cell papers should be taken seriously (not forgetting the large literature on these diseases pointing in the other direction), we have seen similar errors of interpretation many times in CJD. It is impossible to show absence of aggregates, only non-detectability up to the sensitivity of whatever methods are used. Many other effects also come into play in transgenic mutants when the protein is of unknown function and the readout is neuropathological phenotyping.

Speaking of ontogeny recapitulating phylogeny, here is the phylogenetic version of repeat disease anticipation:

Evolution of the primate androgen receptor: A structural basis for disease.

Choong CS, Kemppainen JA, Wilson EM
J Mol Evol 1998 Sep;47(3):334-42 
Comparison of androgen receptor from five primate species (human, chimpanzee, baboon, macaque, and collared brown lemur) supports their phylogeny with complete conservation of the DNA and steroid binding domain protein sequence. A linear increase in trinucleotide repeat expansion of homologous CAG and GGC sequences occurs in the NH2-terminal transcriptional activation region and is proportional to the time of species divergence.

A serine phosphate/glutamine repeat interaction is observed where increasing CAG repeat length is associated with an increased rate of serine 94 phosphorylation. Disparity in the calculated and apparent molecular weight with CAG repeat expansion of an AR NH2-terminal fragment suggests self-aggregation with increasing glutamine repeat length into the pathological range. These results suggest that a CAG/glutamine repeat expanded during divergence of the higher primate species, which may have a direct effect on AR structure and support a common pathway in CAG trigenic diseases in the pathophysiology of neurodegeneration observed in X-linked spinal bulbar and muscular atrophy.
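Repeat length comparisons of this kind come down to counting the longest uninterrupted run of a triplet in each sequence. A minimal Python sketch is given here under stated assumptions: the function name is hypothetical, the toy sequences are invented for illustration and are not androgen receptor data, and the scan ignores reading frame.

```python
import re

def longest_cag_run(seq):
    """Return the number of repeats in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

# Hypothetical toy sequences, not real androgen receptor data.
toy_short = "ATG" + "CAG" * 8 + "GAAACT"
toy_long = "ATG" + "CAG" * 21 + "GAAACT"
print(longest_cag_run(toy_short), longest_cag_run(toy_long))
```

The same function applies to the GGC repeat by changing the pattern; a frame-aware version would step through the sequence codon by codon instead.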

CpG codon depletion

21 Oct 98 webmaster
Human codon use per ten thousand codons, based on 7,168,914 codons from 14,529 proteins.
TTT F 164  TCT S 143  TAT Y 122  TGT C  97
TTC F 209  TCC S 177  TAC Y 167  TGC C 127
TTA L  67  TCA S 112  TAA *  06  TGA *  12
TTG L 118  TCG S  44  TAG *  05  TGG W 132

CTT L 121  CCT P 172  CAT H  99  CGT R  47
CTC L 194  CCC P 203  CAC H 148  CGC R 110
CTA L  65  CCA P 165  CAA Q 117  CGA R  61
CTG L 399  CCG P  70  CAG Q 342  CGG R 115

ATT I 157  ACT T 127  AAT N 168  AGT S 115
ATC I 228  ACC T 204  AAC N 207  AGC S 193
ATA I  69  ACA T 147  AAA K 234  AGA R 111
ATG M 224  ACG T  65  AAG K 334  AGG R 110

GTT V 106  GCT A 184  GAT D 222  GGT G 110
GTC V 151  GCC A 288  GAC D 267  GGC G 235
GTA V  67  GCA A 155  GAA E 283  GGA G 167
GTG V 294  GCG A  75  GAG E 406  GGG G 167

Mouse codon use per ten thousand codons, based on 3,403,144 codons from 7,272 proteins.

TTT 158  TCT 154  TAT 120  TGT 109
TTC 214  TCC 180  TAC 172  TGC 129
TTA  59  TCA 111  TAA  06  TGA  12
TTG 122  TCG  45  TAG  05  TGG 130

CTT 121  CCT 187  CAT  98  CGT  48
CTC 193  CCC 191  CAC 151  CGC 100
CTA  74  CCA 174  CAA 118  CGA  65
CTG 387  CCG  69  CAG 342  CGG 102

ATT 146  ACT 133  AAT 157  AGT 120
ATC 228  ACC 199  AAC 217  AGC 198
ATA  66  ACA 157  AAA 217  AGA 115
ATG 223  ACG  61  AAG 348  AGG 115

GTT 101  GCT 198  GAT 216  GGT 120
GTC 157  GCC 265  GAC 276  GGC 231
GTA  69  GCA 153  GAA 269  GGA 179
GTG 290  GCG  71  GAG 398  GGG 161

Cow codon use per ten thousand codons, based on 528,7904 codons from 1,277 proteins.

TTT 162  TCT 126  TAT 116  TGT  96
TTC 243  TCC 175  TAC 195  TGC 138
TTA  52  TCA  94  TAA  07  TGA  12
TTG 111  TCG  46  TAG  05  TGG 138

CTT 109  CCT 149  CAT  81  CGT  42
CTC 206  CCC 206  CAC 146  CGC 111
CTA  54  CCA 143  CAA 100  CGA  56
CTG 426  CCG  76  CAG 325  CGG 111

ATT 150  ACT 116  AAT 151  AGT  97
ATC 259  ACC 217  AAC 230  AGC 185
ATA  66  ACA 136  AAA 222  AGA 103
ATG 226  ACG  76  AAG 352  AGG 111

GTT 101  GCT 173  GAT 212  GGT 113
GTC 168  GCC 309  GAC 298  GGC 253
GTA  61  GCA 133  GAA 269  GGA 164
GTG 320  GCG  82  GAG 416  GGG 173
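
The depletion can be read straight off the tables: within each four-codon family ending in N, the NCG member is the rarest. The Python sketch below uses the human counts for the serine (TCN), proline, threonine, and alanine families, copied from the human table above. The function name and the 0.25 baseline (equal use of all four codons, ignoring base composition) are simplifying assumptions made for illustration.

```python
# Human codon counts per ten thousand, copied from the human table above.
HUMAN = {
    "TCT": 143, "TCC": 177, "TCA": 112, "TCG": 44,
    "CCT": 172, "CCC": 203, "CCA": 165, "CCG": 70,
    "ACT": 127, "ACC": 204, "ACA": 147, "ACG": 65,
    "GCT": 184, "GCC": 288, "GCA": 155, "GCG": 75,
}

def cpg_share(usage, prefix):
    """Share of a four-codon family (prefix + N) taken by the NCG codon."""
    family = [usage[prefix + base] for base in "TCAG"]
    return usage[prefix + "G"] / sum(family)

# Serine also has AGT and AGC; only its TCN block is considered here.
for prefix, aa in (("TC", "Ser"), ("CC", "Pro"), ("AC", "Thr"), ("GC", "Ala")):
    print(aa, prefix + "G", round(cpg_share(HUMAN, prefix), 3))
```

Every share falls well below 0.25. Swapping in the mouse or cow counts from the other two tables allows the same comparison across species.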
