Review of Data
Align + Diffs
|Tables and graphics|
Table 1: Comparison of notations
Table 2: Comparison of micro-variants
Table 3: Frequency of various deletions
Table 4: Variation in cross-species repeats
Table 5: Repeat region consensus sequence
Table 6: Human insertions
Fig.1: Animation of slippage deletion
| ||The animation shows how DNA slippage during replication of tandem repeats can lead to a repeat deletion. Here repeat Rd in the template slipped back to hybridize with Rc of the nascent chain, leading to a lack of Rd in the final new chain. Insertions differ only in that it is the nascent strand that slips, and the process can be iterated. Static view, all frames.
||The animation shows how DNA slippage during replication of tandem repeats can lead to a repeat insertion. Here repeat Rd in the template slipped back to hybridize with Rc of the template chain, leading to an extra slightly modified Rc' in the final new chain. Insertions differ only in that it is the nascent strand that slips, and the process can be iterated. Static view, all frames.
Numerous deletions and insertions of octapeptide units have been found in the repeat region of the mammalian prion gene. The basic unit of genetic change here is a 24 base pair repeat unit -- variation at a single base pair has not been observed in humans in the repeat region. These indels occur as deleterious mutations causing Creutzfeldt-Jakob Disease (CJD), as neutral polymorphisms, and as phylogenetically fixed changes across whole clades. Single module changes in orangutan, squirrel monkey, artiodactyls, and old world monkeys augment human data.
Iterated DNA slippage during replication is proposed as a unified mechanism for the origin of most of these events. This model predicts, for any species, additional classes of insertions and deletions while forbidding others. The model further clarifies the issues of ambiguous end points of deletions and wobble heterogeneity in long insertions.
Parsimonious application of the model to mammalian sequence data allows reconstruction of the repeat region sequence at earlier phylogenetic branch nodes. A terminal nonamer in the ferungulate lineage suggests that long-repeat bovine prion has arisen recently from an internal single insertion event. The repeat region as a whole may have been generated by ancient expansions of an upstream domain, illuminating the relationship of marsupial and chicken repeat to eutherians. These results sharpen the prion probe sequence for distant BLAST II homology searches for the currently orphaned prion gene.
The prion gene from the 60-odd mammals sequenced to date contains a repeat region highly similar in structure to that of humans. At the protein level, the consensus sequence is a direct tandem repeat consisting of a nonapeptide followed by four octapeptides, denoted here Ra, Rb, Rc, Rd, and Re, with typical repeat element, PHGGGWGQ, that can be tracked back to the metatherian-eutherian divergence. Strong sequence conservation over a long period suggests a significant function; however, no role has yet been assigned to this domain nor to the prion protein as a whole. The prion gene is an orphan, having no known homologues earlier than teleosts (salmon).
In fact, the role for this domain may reside mainly in nucleic acid rather than protein. The early part of the repeat domain is associated with an evolutionarily stable helix in mRNA called helix C. In other genes expressed in brain, mRNA structure has been suggested to target the site of translation or regulate its frequency. The repeat domain is easily cleaved by proteolysis and is seldom purified with intact protein; theoretical secondary structure prediction and NMR studies in mice indicate no fixed structure for the repeat domain. Silent mutations occur here with markedly reduced frequency compared with downstream domains that are equally well conserved at the amino acid sequence level, supporting selective bias at the nucleotide level (see Fig. 9).
A rather striking assemblage of human insertions and deletions, always comprising whole 24 base pair modules in the repeat region, has accumulated from the screening of large numbers of CJD victims, family members, neurological patients, and controls for mutations in the prion gene. The larger insertions (four to nine repeats, sometimes imperfect) are causally associated with CJD, while shorter insertions (two and four) have less pronounced effects, and deletions (one module, three separate locations) are not clearly associated with disease (5). Only one homozygous deletion case is on record (3), in a 33-year-old woman with a CJD-compatible dementia; on the other hand, a deleted allele may have ameliorating effects (ref ). Single inserts have not been seen in humans, though they would seemingly not be lethal.
Possibly, long-term study of individuals carrying deletions would show a very late onset CJD or carrier status, a concern for blood transfusion recipients. No (immuno)pathology has been reported for individuals carrying only this allele.
Insertions and deletions in this same region have occurred and become fixed in other species, most notably a single octamer repeat insertion in the squirrel monkey and a terminal nonamer insertion in the ferungulate lineage. (Some cattle exhibit a single repeat deletion polymorphism; this must be a derived condition because sheep, pigs, goats, ferrets, mink, oryx, camel, kudu, other cattle, deer, and elk prion retain the full complement of repeats.)
Any proposed explanatory mechanism for these mutations must address the issues of modularity, specificity of end points, asymmetry between single deletions and insertions, wobble microheterogeneity in some repeats, origin of complex large insertions, and non-observed but seemingly plausible events.
Recombination, in the form of unequal crossing-over, has been put forward by several authors (4, 5, 17, 20) as the mechanism by which insertions and deletions arise in the prion gene, though doubts have arisen as larger and more complex insertions were discovered. A recombination mechanism predicts that single repeat deletions are accompanied by an equal incidence of reciprocal recombinants (corresponding single repeat insertions); however, these have never been observed. Recombination also does not account for wobble imperfections in repeats: this would make recombination a potent point mutagen throughout the human genome, since repeats are found in many genes, whereas recombination is usually viewed as a DNA repair and gene shuffling mechanism of high fidelity. Nonetheless, unequal crossing-over is capable of producing some observed products and may be responsible for a portion of them.
DNA slippage during replication has been proposed (23-33) for mutational change in other genes with repeat regions, as well as in an evolutionary context (34-40), and is proposed here as applicable to the prion gene. Oron-Karni et al. (23) described the mechanism as:
... the mutation arose by slipped strand mispairing, creating a single-stranded loop, followed by DNA elongation, strand breathing and the formation of a mismatch bubble. An extensive literature search has revealed six additional deletion/insertion mutations in humans in which the inserted nucleotides come from the same DNA strand. Our model explains all six mutations, suggesting that rearrangement of a mismatch loop or bubble during DNA replication may be not uncommon.
Asymmetry between insertions and deletions arises from unequal roles of the template and nascent DNA strand in replication. As with recombination, the basic factor giving rise to modularity is slipped hybridization of a DNA strand from a later repeat to that of an earlier one, enabled by tandem repeated sequences. The imperfect fidelity of some insertional repeats arises (in the model) not from polymerase copy error or acceptance of a wobble pair, but from incomplete 3' exonuclease editing. As will be seen, large complex insertions can arise from multiple slippage events within a single genetic episode, and need not be multi-generational sequential events. A single simple constraint on iterated slippage allows all observed insertions, predicts further possibilities, and forbids many others.
1. Some authors number outwards from the first octamer, not counting the very similar nonamer that precedes them. This is unsatisfactory because in species such as rabbit or mouse, the nonamer has a point deletion. This notation is also confusing relative to the more popular notation that starts with the nonamer.
2. Most authors use the numbering system of Goldfarb et al. (20). This counts the repeats 5' to 3' based on human as R1 R2 R2 R3 R4. In humans, the second and third repeat are, in fact, identical at both the DNA and protein sequence level. Today, with more sequences in hand, we see that R2 R2 is not satisfactory for many species for either nucleotide or amino acid sequence; the numbering throws off comparisons. This system is so widely used that recalibrating here to R1 R2 R3 R4 R5 would cause considerable confusion with existing research. Because of the importance of bovine, ovine, and other TSEs, notation needs to work beyond human CJD. The notation also does not recognize partial identity in adjacent repeats, which can be adequate to support slippage (5).
3. Goldfarb et al. (20) also established a sensible notation for extra repeats that depart from the given repeat at a single nucleotide, e.g. R2a, R2c, and R3g. (However, other authors use R2' for R2a and R3' for R3g (18, 19).) Here, R3g differs from R3 by a T-to-G change in the third position of the seventh codon of R3. The notation is meant to imply that, say, R3g, via the mechanism of its formation, is consistently a direct variant of R3. The problem here is that notation has gotten ahead of evidence of mechanism: R3g is equally one base change away from R2, or R2a is equally one base change from R3 or two from R4. (Note that interpreting R3g as an R2 variant would bring the distinct five insertional families into better agreement (21).) If more variants are found, the risk of redundancy arises. The notation also has an implicit reliance on parsimony, while the differences involved are very small.
For these reasons, the notation used here is Ra, Rb, Rc, Rd, and Re for human repeats and their eutherian homologues, in 5' to 3' order. This notation is species-independent and should be stable to new sequences since it represents the consensus ancestral sequence. Also included is Rfrag, a terminal tri-glycine repeat fragment that is best taken as part of this repeat domain. Naming of the variants should not be model-dependent; I take them here as V1, V2, V3, and V4.
|Notation used here:||Other Notations|
|Ra||cct cag ggc ggt ggt ggc tgg ggg cag||(R1)|
|Rb||cct cat ggt ggt ggc tgg ggg cag||(R2)|
|Rc||cct cat ggt ggt ggc tgg ggg cag||(R2)|
|Rd||ccc cat ggt ggt ggc tgg gga cag||(R3)|
|Re||cct cat ggt ggt ggc tgg ggt caa||(R4)|
|Rfrag||-- -- gga ggt ggc -- -- --||-|
|Variant||Human Sequence||Goldfarb (20), Owen (19), van Gool (18)|
|V1||cct cat ggc ggt ggc tgg gga cag||(R2a or R2')|
|V2||cct cat ggt ggt ggc tgg ggg cag||(R2c)|
|V3||ccc cat ggt ggt ggc tgg gga cag||(R3g or R3')|
|Deletion||start and stop range*||Palmer(5)|
|del A||codons 54 to 82||(R2-R2, upstream of codon 76)|
|del B||codons 69 to 76||(R2-R3, upstream of codon 76)|
|del C||codons 83 to 91||(R3-R4, downstream of codon 76)|
*Deletions can only be assigned to an ambiguity class (Fig. 4) from sequencing because of repetitive blocks (5). Microheterogeneity, assumption of complementary strand pairing, and parsimony define outer boundaries here. Some authors only determined approximate sizes and positions of deletions relative to the NcoI restriction site at codon 76. The data are adequate to conclude that no double deletions have been detected and that the relative frequency of occurrence of single deletions is del C >> del B > del A.
Deletion end points in the slippage model vary according to the extent of 3' exonuclease excision during mismatch correction: the model suggests that 5' to 3' synthesis renews as soon as the last mismatch is corrected. This creates a hybrid repeat as end product: the 5' proximal part of the earlier repeat fused with the 3' end of the later repeat. The ambiguity zones for the three observed classes of human single repeat deletions, along with model-preferred end points, are shown in Fig. 4 below.
|Figure 4. Deletions of 24 bp within an ambiguity zone have the same resultant sequences, so end points are not distinguishable.|
Ra-------------------------------Ra Rb---------------------------Rb Rc---------------------------Rc Rd-
pro gln gly gly gly gly trp gly gln pro his gly gly gly trp gly gln pro his gly gly gly trp gly gln pro
51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
CCT CAG GGC GGT GGT GGC TGG GGG CAG CCT CAT GGT GGT GGC TGG GGG CAG CCT CAT GGT GGT GGC TGG GGG CAG CC

Rd---------------------------Rd Re---------------------------Re R1/2----R1/2
pro his gly gly gly trp gly gln pro his gly gly gly trp gly gln gly gly gly
76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94
CCC CAT GGT GGT GGC TGG GGA CAG CCT CAT GGT GGT GGC TGG GGT CAA GGA GGT GGC
Distinguishing deletions: Any deletion of 24 consecutive bases within an ambiguity zone produces the same final sequence. The source of DNA in the final product thus can differ, depending, in the model, on the slipped hairpin and 3' exonuclease activity. The two extremes are shown below, namely the left-most and right-most start points for the deletion:
Del A: resultant sequence after deletion of any 24 consecutive bases from codon 54 to codon 82:
Ra/b Rc Rd Re: CCT CAG GGC GGT GGT GGC TGG GGG CAG CCT CAT GGT GGT GGC TGG GGG CAG CCC CAT GGT ...
Ra Rb Rc/d Re: CCT CAG GGC GGT GGT GGC TGG GGG CAG CCT CAT GGT GGT GGC TGG GGG CAG CCC CAT GGT ...
Del B: resultant sequence after deletion of any 24 consecutive bases from codon 69 to codon 76:
Ra Rb Rc/d Re: CCT CAT GGT GGT GGC TGG GGA CAG
Ra Rb Rc/d Re: CCT CAT GGT GGT GGC TGG GGA CAG
Del C: resultant sequence after deletion of any 24 consecutive bases from codon 83 to codon 91:
Ra Rb Rc/d Re: CCC CAT GGT GGT GGC TGG GGT CAA
Ra Rb Rc/d Re: CCC CAT GGT GGT GGC TGG GGT CAA
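The ambiguity described above can be illustrated with a minimal Python sketch. It takes the human repeat region as listed in this document (codons 51 to 94), deletes 24 consecutive bases at every possible start, and groups the start points that give an identical product. The function names and the zero-based nucleotide offsets are illustrative assumptions, and the grouping rests on sequence identity alone, without the parsimony or strand-pairing arguments used in the text.

```python
from collections import defaultdict

# Human repeat region (codons 51-94) as listed in this document.
REPEAT_REGION = "".join([
    "cct cag ggc ggt ggt ggc tgg ggg cag",  # Ra
    "cct cat ggt ggt ggc tgg ggg cag",      # Rb
    "cct cat ggt ggt ggc tgg ggg cag",      # Rc
    "ccc cat ggt ggt ggc tgg gga cag",      # Rd
    "cct cat ggt ggt ggc tgg ggt caa",      # Re
    "gga ggt ggc",                          # Rfrag
]).replace(" ", "")


def delete_at(seq, start, length=24):
    """Return seq with `length` bases removed, beginning at zero-based `start`."""
    return seq[:start] + seq[start + length:]


def ambiguity_classes(seq, length=24):
    """Group deletion start points whose products are identical.

    Only groups with more than one start point are returned; within such
    a group, sequencing alone cannot tell the deletions apart.
    """
    by_product = defaultdict(list)
    for start in range(len(seq) - length + 1):
        by_product[delete_at(seq, start, length)].append(start)
    return [starts for starts in by_product.values() if len(starts) > 1]


for starts in ambiguity_classes(REPEAT_REGION):
    print(f"starts {starts[0]}-{starts[-1]}: "
          f"{len(starts)} indistinguishable deletions")
```

Because Rb and Rc are identical in human, removing either one (starts 27 and 51 in this indexing) yields the same product, so both fall into a single class.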
|Frequency reported for various deletions.|
|Palmer (5)||del A||1||737||careful sequencing methods, re-did N. African cases|
|Perry (10)||del A||3||121||high quality SSCP sequencing method S80743|
|Pocchiari (6)||del A||9 (5)||4||other allele to a codon 210 CJD case|
|Palmer (5)||del B||1||737||only documented case of this deletion|
|Vnencak-Jones (12)||del A or del B||1||120||second deletion type found in control|
|Masullo (13)||del A or del B||1||30||met/met atypical dementia, homozygous deletion|
|Vnencak-Jones (12, 4)||del C||5||120||Tennessee population, codon 178 proband|
|Puckett (1)||del C||2||30||HeLa cell line and cDNA library, X83416|
|Laplanche (2, 5)||del C||3||5||1 family, downstream of an NcoI site|
|Bosque (4, 12)||del C||3||7||in one family with D178N|
|Palmer (5, 22)||del C||6||737||ambiguity noted of start point|
|Diedrich (3)||del C||1 (1)||6 (6)||dementia cases; GenBank M81929|
|Perry (10)||del C||12||121||found in FAD + PD family, S80732|
|Salvatore (7)||del C||2||217||downstream of NcoI in 2 controls, no sporadics|
|Reder (11)||del C||1||3||family distantly related to Bosque's family|
|Brown (15)||del C||2||136||found among 26 iatrogenic cases, 110 controls|
|Brown (unpub)||del C||6||||unpublished: kuru, 5 families with ataxic illness|
|Windl (16)||none||0||120||UK CJD referrals, allele-specific hybridization|
|Owen (40)||none||0||101||methods could have detected deletions|
|Goldfarb (20)||none||0||||methods focused on insertions, 10 deletions expected|
|Totals**||all dels||53||1934||2.1-2.7% of the populations studied had some deletion**|
* Numbers in parentheses adjust for multiple family members with the same inherited deletion.
** Using these data at face value and assuming Hardy-Weinberg equilibrium, about one Caucasian in ten thousand is homozygous for the most common deletion, del C. Because a single mutation may show up in distant unknown family kinships, because of adoption and unknown parentage, because studies generally screened mainly neurological patients, and because controls were not always randomly selected, this and other frequencies are strictly upper limits. There is also sequencing uncertainty between del A and del B in some reports, and other authors omit data on end points and only estimate the size of the deletion, simply locating deletions with respect to NcoI restriction sites relative to codon 76.
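The Hardy-Weinberg estimate can be checked with a short calculation. The sketch below is only illustrative: it assumes that the 43 del C carriers obtained by summing the del C rows of the table were all heterozygous and were drawn from the 1934 individuals in the totals row, which the table itself does not guarantee. The function name is hypothetical.

```python
def homozygote_frequency(carriers, screened):
    """Expected homozygote frequency q**2 under Hardy-Weinberg equilibrium.

    Assumes every carrier is heterozygous, so the allele frequency is
    q = carriers / (2 * screened).
    """
    q = carriers / (2 * screened)
    return q * q


DEL_C_CARRIERS = 43   # assumption: sum of the del C rows in the table
SCREENED = 1934       # assumption: totals row of the table

freq = homozygote_frequency(DEL_C_CARRIERS, SCREENED)
print(f"allele frequency q = {DEL_C_CARRIERS / (2 * SCREENED):.4f}")
print(f"expected homozygotes: about 1 in {round(1 / freq):,}")
```

With these inputs the expectation comes out at roughly one in eight thousand, the same order of magnitude as the figure quoted in the footnote.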
Del C is the only deletion or insertion to attain the status of common polymorphism. The issue at a population level is to decide whether, say, del C is a deletion hot spot created over and over in independent events because of the inherent proclivity of the DNA to form hairpins at the end of the repeat domain, or whether more exhaustive genealogical study would show that founder events are very rare but have become common alleles through neutral drift or some unknown selective bias. The occurrence of similar recent deletions in other species, the unusual and lethal long insertions, and similar patterns in unrelated genes with repeats argue for an intrinsic instability in such DNA.
|Figure 1 legend: DNA repeats: streak graphic.|
DNA repeats in octapeptides of normal human prion.
The sequence is placed on horizontal and vertical axes and compared to itself, receiving a white dot if there is a match. (This means that the diagonal is always a perfect match.) There is no new information below the diagonal because the matching process is symmetric. A moving convolution window of width 12 (four codons) removes clutter due to insignificant repeats.
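A self-comparison of this kind can be sketched in a few lines of Python. The sketch assumes that a dot is drawn only where a full window of 12 bases matches exactly; the original graphic may have used a different scoring threshold, and the function name is illustrative. The sequence is the human repeat region as listed in this document.

```python
from collections import Counter

# Human repeat region (codons 51-94) as listed in this document.
REPEAT_REGION = "".join([
    "cct cag ggc ggt ggt ggc tgg ggg cag",  # Ra
    "cct cat ggt ggt ggc tgg ggg cag",      # Rb
    "cct cat ggt ggt ggc tgg ggg cag",      # Rc
    "ccc cat ggt ggt ggc tgg gga cag",      # Rd
    "cct cat ggt ggt ggc tgg ggt caa",      # Re
    "gga ggt ggc",                          # Rfrag
]).replace(" ", "")


def dot_plot(seq, window=12):
    """Return the set of (i, j), i < j, where the windows at i and j are identical.

    Only the half above the diagonal is computed, since the comparison
    is symmetric and the diagonal itself always matches.
    """
    hits = set()
    last = len(seq) - window
    for i in range(last + 1):
        for j in range(i + 1, last + 1):
            if seq[i:i + window] == seq[j:j + window]:
                hits.add((i, j))
    return hits


hits = dot_plot(REPEAT_REGION)
# Distance from the diagonal; streaks show up as heavily populated offsets.
offsets = Counter(j - i for i, j in hits)
for offset, count in sorted(offsets.items()):
    print(f"offset {offset:3d}: {count} matching windows")
```

Matches at an offset of 24 correspond to the line just above the diagonal; matches at larger multiples of 24 correspond to the streak lines further out.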
The three observed classes of deletions show up clearly on the intermittent line just above the diagonal. If slippage must be to the first adjacent region of significant complementarity, then del A, del B, and del C represent a complete set of allowable deletion classes. The extended streak for del A corresponds to the extended ambiguity zone first noted by Palmer et al. for deletion resulting in a 1234-pattern.
Streak lines above the adjacency repeat line predict six additional deletion classes, corresponding to deletions of multiples of 24 bp. These may be denoted as Ra/c, Rb/d, and Rc/e; Ra/d and Rb/e; and Ra/e, which would result in final fusion repeats 1-3 45, 1 2-4 5, 12 3-5; 1-4 5, 1 2-5; 1-5, respectively. Slippage here must occur past the first region of possible complementarity to earlier regions; this has not been observed as yet.
A similar graphic quickly predicts deletion patterns in any species.
|Variation in repeat regions|| Table 4: Variation in cross-species repeats|
Human prion repeat protein was used as a BLAST2 probe against a non-redundant set of public databases on August 11, 1997 using default settings. Homologous returned sequences (all prion orthologues) were parsed into individual repeat regions and sorted to eliminate redundancy.
Terminology for repeat units from N to C is given in Column 1. Column 2 shows the number of species having a particular variant. Column 3 provides a proxy for species having the variant whose sequence is given in one-letter amino acid code in Column 4.
The nonamer repeat Ra shows the most variants, with eight. The sequence pqggggwgq is by far the most common and is suggested (with alternative pqgggt-wgq) to be ancestral by the marsupial sequence, pqgggtnwgq. Three point deletions occur in conjunction with various substitutions. Positions 1, 2, 4, 7, 8, and 9 are invariant. The kudu sequence tra.str is a weakly documented allele (ref ) with two surprising changes that may be attributable to cDNA error.
The second repeat Rb is invariant as phgggwgq at the protein level in 75 species; marsupial, however, has phpggsnwgq. The third repeat Rc does not match Rb in three eutherian species even at the amino acid level, so this feature of the human sequence (which holds also at the DNA level) cannot be a required structural feature. However, the slippage model, applied to an earlier era before the modern repeat structure was established, predicts that ancient slippage upstream generated Ra, which in turn generated Rb-Re.
The fourth and fifth repeats Rd, Re have minor point variation; the consensus sequence agrees with Rb and Rc. Mice and rats have an unusual serine preceding the tryptophan in both Rc and Rd.
The fifth repeat Re has some interesting aspects. A fourth internal glycine occurs consistently in the ferungulate clade (affecting artiodactyls, cetaceans, carnivores, and perissodactyls). This nonamer has no connection to Ra at the DNA level and probably represents a single micro-slippage event in a common ancestral sequence.
Certain placental mammals, most notably a bovine group, llama, giraffe, and squirrel monkey, have six repeats due to insertion events. These are similar to human insertion alleles and for alignment purposes must be considered on a case by case basis, using microheterogeneity in the DNA to pinpoint where the insertion occurred. Camel may have had an insertion followed by a deletion of the terminal nonamer. A single unit has been deleted in the black-handed spider monkey, Ateles geoffroyi, and five old world monkeys. Schaetzl (9) also reported an octamer deletion polymorphism in two of five orangutans.
The most common allele in the bovine has six repeats; however, this is a fairly recent fixed insertion mutation. The five repeat polymorphism is the artiodactylan and mammalian norm. It may represent persistence of a more ancestral allele.
    1          11         21         31         41         51
  1 MANLGCWMLV LFVATWSDLG LCKKRPKPGG WNTGGSRYPG QGSPGGNRYP PQGGGGWGQP 60
 61 HGGGWGQPHG GGWGQPHGGG WGQPHGGGWG QGGGTHSQWN KPSKPKTNMK HMAGAAAAGA 120
121 VVGGLGGYML GSAMSRPIIH FGSDYEDRYY RENMHRYPNQ VYYRPMDEYS NQNNFVHDCV 180
181 NITIKQHTVT TTTKGENFTE TDVKMMERVV EQMCITQYER ESQAYYQRGS SMVLFSSPPV 240
241 ILLISFLIFL IVG

atggcgaacct tggctgctgg atgctggttc tctttgtggc cacatggagt gacctgggcc
tctgcaagaa gcgcccgaag cctggaggat ggaacactgg gggcagccga tacccggggc
agggcagccc tggaggcaac cgctacccac ctcagggcgg tggtggctgg gggcagcctc
atggtggtgg ctgggggcag cctcatggtg gtggctgggg gcagccccat ggtggtggct
ggggacagcc tcatggtggt ggctggggtc aaggaggtgg cacccacagt cagtggaaca
agccgagtaa gccaaaaacc aacatgaagc acatggctgg tgctgcagca gctggggcag
tggtgggggg ccttggcggc tacatgctgg gaagtgccat gagcaggccc atcatacatt
tcggcagtga ctatgaggac cgttactatc gtgaaaacat gcaccgttac cccaaccaag
tgtactacag gcccatggat gagtacagca accagaacaa ctttgtgcac gactgcgtca
atatcacaat caagcagcac acggtcacca caaccaccaa gggggagaac ttcaccgaga
ccgacgttaa gatgatggag cgcgtggttg agcagatgtg tatcacccag tacgagaggg
aatctcaggc ctattaccag agaggatcga gcatggtcct cttctcctct ccacctgtga
tcctcctgat ctctttcctc atcttcctga tagtgggatg a

PQGGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ GGG
DNA for normal 123 bp or 41 aa human repeat region: nucl: 151-274 or codons 51-91
cct cag ggc ggt ggt ggc tgg ggg cag
cct cat ggt ggt ggc tgg ggg cag
cct cat ggt ggt ggc tgg ggg cag
ccc cat ggt ggt ggc tgg gga cag
cct cat ggt ggt ggc tgg ggt caa
gga ggt ggc
Repeat regions in other species: amino acid level camel missing gly cow long squirrel monkey long ate.geo; cer.xxx, the.gel mac.syl all short dwa.goa pqggg-gwg.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl ory.leu pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Tra.str sqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Tra.str pqeggdwgq.phgggwgq.phvggwgq.phgggwgq .phggggwgq artiodactyl Bos.tau pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Cam.dro pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq artiodactyl Cer.ela pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Bos.tau pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Cap.hir pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Ovi.ari pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Cer.ela pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Odo.hem pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Sus.scr pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq artiodactyl Mus.put pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq carnivora Mus.vis pqggggwgq.phgggwgq.phgggwgq.phgggwgq .phggggwgq carnivora rat.nor phggg-wgq.phgggwgq.phgggwgq.phgggwsq rodent gol.ham phggg-wgq.phgggwgq.phgggwgq.phgggwgq rodent Cri.gri pqgggtwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq rodent Cri.mig pqgggtwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq rodent Mes.aur pqgggtwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq rodent Mus.mus pqgg-twgq.phgggwgq.phggswgq.phggswgq.phgggwgq rodent Rat.nor pqsggtwgq.phgggwgq.phgggwgq.phgggwgq.phgggwsq rodent Rat.rat pqsggtwgq.phgggwgq.phgggwgq.phggg-gq.phgggwsq rodent Ory.cun pqggg-wgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq lagomorph Sai.sci pqggg-wgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, nw Ate.geo pqggg-wgq.phgggwgq.phgggwgq.phgggwgq monkey, nw Ate.pan pqggg-wgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, nw Cal.jac pqggg-wgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, nw Ceb.ape 
pqggg-wgq.phgggwgq.phgggwgq.phggswgq.phgggwgq monkey, nw Aot.tri pqsgg-wgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, nw Cal.mol pqgggswgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Cer.ate pqggggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Cer.tor pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Cer.dia pqggggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Cer.mon pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Cer.neg pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Cer.pat pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow The.gel pqggggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Man.sph pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Pre.fra pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Mac.arc pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Mac.fas pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Mac.fus pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Mac.mul pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Mac.nem pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Mac.syl pqggggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Pap.ham pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Col.gue pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow Gor.gor pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow ape Hom.sap pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow ape Pan.tro pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow ape Hya.lar pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow ape Hyl.syn pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow ape Pon.pyg pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow ape Sym.syn pqggggwgq.phgggwgq.phgggwgq.phgggwgq.phgggwgq monkey, ow ape
Assuming contiguity of repeated stretches and parsimony based on slight sequence variations of human Rd and Re, the complex longer repeats can be interpreted in terms of wild-type repeat units. Doing so results in the table below.
|Base change differences|
Saimiri sciureus (squirrel monkey) Primates; Platyrrhini; Cebidae; Cebinae; Saimiri. The prion gene from two different animals was sequenced by different groups, with accession numbers U15165 (Cervenakova) and U08310 (Schaetzl). At the protein level, the latter sequence has an extra copy of its third repeat [below] and changes of tyr to cys at codon 6 and lys to arg at codon 163. At the DNA level there are further silent changes of A to C preceding the trp codon in the penultimate repeat, and T to G at bp 606.
CLUSTAL W (1.7) multiple sequence alignment of squirrel monkey
Rc              CCCCATGGTGGCGGCTGGGGACAG
Rd              CCCCATGGTGGCGGATGGGGACAG
Rc U08310       CCCCATGGTGGCGGCTGGGGACAG
Rd U08310       CCCCATGGTGGCGGCTGGGGACAG
Re U08310       CCCCATGGTGGTGGCTGGGGACAG
Ra              CCCCAGGGTGGTGGCTGGGGGCAG
Rb              CCTCATGGTGGTGGCTGGGGGCAA
Re              CCTCATGGTGGCGGCTGGGGTCAA
Ra U08310       CCCCAGGGTGGTGGCTGGGGGCAG
Rb U08310       CCTCATGGTGGTGGCTGGGGGCAA
Rf U08310       CCTCATGGTGGCGGCTGGGGTCAA
invariant pos:  ** ** ***** ** ***** **
Pongo pygmaeus orangutan U08305
PQGGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ GGG
Ambiguity zones:
cct cag ggc ggt ggt ggc tgg ggg cag
cct cat ggt ggt ggc tgg ggg cag
cct cat ggt ggt ggc tgg ggg cag
ccc cat ggt ggt ggc tgg ggg cag
cct cat ggt ggt ggc tgg ggt caa
gga ggt ggt
Signatures for distinguishable orangutan 24bp deletions:
del A (pon.pyg) cct cat ggt ggt ggc tgg ggg cag
del B (pon.pyg) ccc cat ggt ggt ggc tgg ggt caa
X55882 is a GenBank cow sequence with 6 octapeptides; D10614 has 5 octapeptides:
Ra 1 cct cag gga ggg ggt ggc tgg ggt cag
Rb 2 ccc cat gga ggt ggc tgg ggc cag
Ri i cct cat gga ggt ggc tgg ggc cag
Rc 3 cct cat gga ggt ggc tgg ggt cag
Rd 4 ccc cat ggt ggt ggc tgg gga cag
Rf 5 cca cat ggt ggt gga ggc tgg ggt caa ggt ggt

Ra 1 cct cag gga ggg ggt ggc tgg ggt cag
Rb 2 ccc cat gga ggt ggc tgg ggc cag
Rc 3 cct cat gga ggt ggc tgg ggt cag
Rd 4 ccc cat ggt ggt ggc tgg gga cag
Re 5 cca cat ggt ggt gga ggc tgg ggt caa ggt ggt
Comparing the two sequences, the simplest explanation is that (beginning with the short sequence) Rc in the nascent strand slipped back to Rb, and the insert Ri is a chimera of the proximal part of Rc with the distal part of Rb. In other words, the first codon of Ri, cct, derives from Rc whereas its last codon derives from Rb, in the slippage model.
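The chimera interpretation of Ri can be checked directly against the three bovine repeats listed above. The following minimal sketch (function name illustrative) finds every crossover position k at which Ri equals the first k bases of Rc joined to the remaining bases of Rb. It tests only sequence consistency with the slippage interpretation, not the mechanism itself.

```python
def chimera_breakpoints(proximal, distal, insert):
    """Return every k for which insert == proximal[:k] + distal[k:]."""
    return [k for k in range(len(insert) + 1)
            if proximal[:k] + distal[k:] == insert]


# Bovine repeats as listed in this document.
RB = "ccc cat gga ggt ggc tgg ggc cag".replace(" ", "")
RC = "cct cat gga ggt ggc tgg ggt cag".replace(" ", "")
RI = "cct cat gga ggt ggc tgg ggc cag".replace(" ", "")

points = chimera_breakpoints(RC, RB, RI)
print(f"Rc-proximal / Rb-distal crossover possible at k = {points[0]}..{points[-1]}")
print("reverse orientation possible:", bool(chimera_breakpoints(RB, RC, RI)))
```

The crossover is not a single point but a range, bounded on each side by the two positions at which Rb and Rc differ; this is the insertion counterpart of the end point ambiguity seen for deletions.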
Y09760 camel Camelus dromedarius
PQGGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ GGG
ccc cag gga ggg ggc ggc tgg ggt cag
ccc cac gga gga ggc tgg ggt cag
ccc cac gga ggc ggc tgg ggt caa
ccc cac gga ggc ggc tgg ggc cag
ccc cat ggt gga ggc tgg ggt caa
ggt ggt ggc
ccc cat ggt ggt ggc tgg gga cag cow Rd 4
cca cat ggt ggt gga ggc tgg ggt caa cow Rf 5
The camel deletion is ambiguous but should be clarified by llama and any camel repeat polymorphism. If camel originally had a terminal nonamer, then this could simply be deleted.
U08309 black-handed spider monkey Ateles geoffroyi; U15164 black spider monkey x brown-headed Ateles paniscus x Ateles fusciceps
PQGGGWGQ PQGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ GGG AGG
Ra ccc cag ggt ggt ggc tgg ggg caa
Rc ccc cat ggt ggc ggc tgg ggg cag
Rd ccc cat ggt ggc ggc tgg gga cag
Re cct cat ggt ggt ggc tgg ggt caa gga ggt ggc

Ra ccc cag ggt ggt ggc tgg ggg cag
Rb cct cat ggt ggt ggc tgg ggg caa
Rc ccc cat ggt ggc ggc tgg ggg cag
Rd ccc cat ggt ggc ggc tgg gga cag
Re cct cat ggt ggt ggc tgg ggt caa gca ggt ggc
The Ateles geoffroyi sequence results quite simply from deleting repeat Rb of the repeat region found in other species of this genus.
The silent versus sense anomaly
The repeat region, despite the emphasis given here, has been very stable over evolutionary time at the amino acid level, even though no normal function or stable fold can be assigned to it (41, 42). A sophisticated study of secondary structure in prion mRNA has found (37) a strongly conserved stem-loop region, called helix C, just upstream of and partly including Ra. The normal role is not understood but might influence transcription or translation rates or ribosomal targeting within neurons. The structure might form, at least transiently or during replication, in the DNA as well and destabilize slipped strand intermediates, accounting in part for the marked instability of this region. Curiously, its length is exactly 24 bp if the bottom imperfect base pair is not included. BLASTP 2 homology searches do not find this to be a common motif in other proteins; the region is too short to search with BLASTN 2. [Under development]
(Adapted from reference 37)
AC CGC TAC CCA CCT CAG GGC GGT GGT -- human helix C; first 5 codons of Ra underlined
N  R   Y   P   P   Q   G   G   G
If this region, or the repeat region generally, functions mainly at the nucleotide level, then third codon position could be as important as first or second codon position, unlike the situation at the amino acid level. This predicts that evolutionarily fixed silent mutations would be less common here than in distal regions where selection acted mainly at the protein level. The prion protein affords an opportunity to test this idea by comparison to the invariant core domain 104-122: the repeat region should be changing relatively slower at third codon position than the invariant core, if indeed the repeat experiences strong selection at the nucleotide level.
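The comparison proposed here comes down to tallying differences between aligned coding sequences by codon position. As a minimal sketch (the function name is illustrative, and this is not the author's analysis), the tally below compares the human and bovine Ra nonamers as listed in this document. A real test would use full alignments of the repeat region and of the invariant core across many species.

```python
def differences_by_codon_position(a, b):
    """Count mismatches at first, second, and third codon positions.

    Both sequences must be in frame, ungapped, and of equal length.
    """
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("sequences must be aligned, in frame, and ungapped")
    counts = [0, 0, 0]
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            counts[index % 3] += 1
    return counts


# Ra nonamers as listed in this document.
HUMAN_RA = "cct cag ggc ggt ggt ggc tgg ggg cag".replace(" ", "")
COW_RA = "cct cag gga ggg ggt ggc tgg ggt cag".replace(" ", "")

first, second, third = differences_by_codon_position(HUMAN_RA, COW_RA)
print(f"position 1: {first}, position 2: {second}, position 3: {third}")
```

For this pair every difference falls at a third position within a glycine codon, so all are silent; the prediction in the text concerns how the rate of such changes compares with that in the invariant core.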
THNQW NKPSK PKTNM KHVAG AAAAG AVVGG LGGym lgsam srp

1    AGGTHNQWNKPSKP-KTNMKHMAGAAAAGAVVGGLGG
66   GGGTHNQWNKPSKP-KTNMKHMAGAAAAGAVVGGLGG
88   GGGTHSQWNKPSKP-KSNMKHMAGAAAAGAVVGGLGG
99   GGGTHSQWNKPSKP-KTNMKHMAGAAAAGAVVGGLGG
222  GGGTHSQWNKPSKP-KTSMKHMAGAAAAGAVVGGLGG
11   GGGTHNQWHKPNKP-KTSMKHMAGAAAAGAVVGGLGG
33   GGGTHNQWNKPNKP-KTSMKHMAGAAAAGAVVGGLGG
22   GGGTHNQWHKPSKP-KTSMKHMAGAAAAGAVVGGLGG
5    GG-THNQWGKPSKP-KTSMKHVAGAAAAGAVVGGLGG
78   GGGTHNQWNKPSKP-KTSMKHVAGAAAAGAVVGGLGG
7    GGGAHGQWNKPSKP-KTSMKHVAGAAAAGAVVGGLGG
6    GG-THSQWNKPSKP-KTNMKHVAGAAAAGAVVGGLGG
111  GGGTHSQWNKPSKP-KTNMKHVAGAAAAGAVVGGLGG
8    GGGSHGQWGKPSKP-KTNMKHVAGAAAAGAVVGGLGG
9    GGGSHGQWNKPSKP-KTNMKHVAGAAAAGAVVGGLGG
3    GG-SHSQWNKPSKP-KTNMKHVAGAAAAGAVVGGLGG
4    GG-THGQWNKPSKP-KTNMKHVAGAAAAGAVVGGLGG
77   GGGTHNQWNKPSKP-KTNMKHVAGAAAAGAVVGGLGG
44   GGGTHNQWNKPSKP-KTNFKHVAGAAAAGAVVGGLGG
55   GGGTHNQWNKPSKP-KTNLKHVAGAAAAGAVVGGLGG
2    GG--YNKW-KPDKP-KTNLKHVAGAAAAGAVVGGLGG  brush-tailed possum
333  GGSYHNQ--KPWKPPKTNFKHVAGAAAAGAVVGGLGG  chicken
     .*  :.:  ** ** *:.:**:***************
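The identity marks in the bottom line of such an alignment can be recomputed column by column. The sketch below is an illustration only. It uses four rows taken from the alignment above, so a column may appear invariant here that varies in the full set. It reproduces only the "*" class (identical residue, no gap); the ":" and "." classes depend on substitution-group tables that are not attempted here. The function name is an assumption.

```python
# Four rows copied from the alignment above (labels as given there).
ROWS = {
    "1": "AGGTHNQWNKPSKP-KTNMKHMAGAAAAGAVVGGLGG",
    "66": "GGGTHNQWNKPSKP-KTNMKHMAGAAAAGAVVGGLGG",
    "2": "GG--YNKW-KPDKP-KTNLKHVAGAAAAGAVVGGLGG",    # brush-tailed possum
    "333": "GGSYHNQ--KPWKPPKTNFKHVAGAAAAGAVVGGLGG",  # chicken
}


def identity_marks(rows):
    """Return a string with '*' under gap-free, fully identical columns
    and a space under every other column."""
    seqs = list(rows.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*seqs):
        invariant = "-" not in column and len(set(column)) == 1
        marks.append("*" if invariant else " ")
    return "".join(marks)


marks = identity_marks(ROWS)
for label, seq in ROWS.items():
    print(f"{label:<5}{seq}")
print(f"{'':<5}{marks}")
```

Even in this small subset the last 15 columns (AGAAAAGAVVGGLGG) are invariant from eutherians to chicken, in line with the invariant core used as the comparison domain above.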
|Consensus repeat region sequences|
1. Puckett C, Concannon P, Casey C, Hood L. Genomic structure of the human prion protein gene. Am J Hum Genet 49(2), 320-329 (1991)
2. Laplanche JL, Chatelain J, Launay J-M, Gaxengel C, Vidaud M. Deletion in prion protein gene in a Moroccan family. Nucleic Acids Res 18(22), 6745 (1990)
3. Diedrich JF, Knopman DS, List JF, Olson K, Frey WH 2d, Emory CR, Sung JH, Haase AT. Deletion in the prion protein gene in a demented patient. Hum Mol Genet 1(6), 443-444 (1992)
4. Bosque PJ, Vnencak-Jones CL, Johnson MD, Whitlock JA, McLean MJ. A PrP gene codon 178 base substitution and a 24-bp interstitial deletion in familial CJD. Neurology 42(10), 1864-1870 (1992)
5. Palmer MS, Mahal SP, Campbell TA, Hill AF, Sidle KC, Laplanche JL, Collinge J. Deletions in the prion protein gene are not associated with CJD. Hum Mol Genet 2(5), 541-544 (1993)
6. Pocchiari M, Salvatore M, Cutruzzola F, Genuardi M, Allocatelli CT, Masullo C, Macchi G, Alema G, Galgani S, Xi YG. A new point mutation of the prion protein gene in CJD. Ann Neurol 34(6), 802-807 (1993)
7. Salvatore M, Genuardi M, Petraroli R, Masullo C, D'Alessandro M, Pocchiari M et al. Polymorphisms of the prion protein gene in Italian patients with CJD. Hum Genet 94(4), 375-379 (1994); erratum in Hum Genet 95(5), 605 (1995)
8. Laplanche JL. Molecular genetics of familial and sporadic forms of human prion diseases. Ann Pharm Fr 53(5), 193-200 (1995)
9. Schatzl HM, Da Costa M, Taylor L, Cohen FE, Prusiner SB. Prion protein gene variation among primates. J Mol Biol 245(4), 362-374 (1995); erratum in J Mol Biol 265(2), 257 (1997)
10. Perry RT, Go RC, Harrell LE, Acton RT. SSCP analysis and sequencing of the human prion protein gene (PRNP) detects two different 24 bp deletions in an atypical Alzheimer's disease family. Am J Med Genet 60(1), 12-18 (1995)
11. Reder AT, Mednick AS, Cervenakova L, Goldfarb LG, Garay A, Ovsiew F. Clinical and genetic studies of fatal familial insomnia. Neurology 45(6), 1068-1075 (1995)
12. Vnencak-Jones CL, Phillips JA III. Identification of heterogeneous PrP gene deletions in controls by detection of allele-specific heteroduplexes (DASH). Am J Hum Genet 50, 871-872 (1992)
13. Masullo C, Salvatore M, Macchi G, Genuardi M, Pocchiari M. Progressive dementia in a young patient with a homozygous deletion of the PrP gene. Ann NY Acad Sci 724, 358-360 (1994)
14. Goldfarb LG, Brown P, Cervenakova L, Gajdusek DC. Genetic analysis of CJD and related disorders. Phil Trans R Soc Lond B 343, 379-384 (1994)
15. Brown P, Cervenakova L, Goldfarb L et al. Iatrogenic CJD: ancient genes and modern medicine. Neurology 44(2), 291-293 (1994)
16. Windl O, Dempster M, Estibeiro JP, Lathe R et al. Genetic basis of CJD in the UK: a systematic analysis of predisposing mutations and allelic variations in the PRNP gene. Hum Genet 98, 259-264 (1996)
17. Owen F, Poulter M, Shah T. An in-frame insertion in the prion protein gene in familial CJD. Brain Res Mol Brain Res 7, 273-276 (1990)
18. van Gool WA, Hensels GW, Hoogerwaard EM, Wiexer JHA, Wesseling P, Bolhuis PA. Hypokinesia and presenile dementia in a Dutch family with a novel insertion in the prion protein gene. Brain 118, 1565-1571 (1995)
19. Owen F, Poulter M, Collinge J, Leach M, Lofthouse R, Crow TJ et al. A dementing illness associated with a novel insertion in the prion protein gene. Brain Res Mol Brain Res 13, 155-157 (1992)
20. Goldfarb L, Brown P, McCombie WR, Goldgaber D, Swergold GD, Wills PR et al. Transmissible familial CJD associated with 5, 7, and 8 extra octapeptide coding repeats in the PRNP gene. Proc Natl Acad Sci USA 88, 10926-10930 (1991)
21. Cochran EJ, Bennett DA, Cervenakova L, Kenney K, Bernard B, Foster NL, Benson DF, Goldfarb LG, Brown P. Familial CJD with a five-repeat octapeptide insert mutation. Neurology 47, 727-733 (1996)
22. Laplanche J-L, Chatelain J, Thomas S, Brown P, Cathala F. Analyse du gene PrP dans une famille d'origine tunisienne atteinte de maladie de Creutzfeldt-Jakob. Rev Neurol 147, 825-827 (1991)
23. Oron-Karni V, Filon D, Rund D, Oppenheim A. A novel mechanism generating short deletion/insertions following slippage is suggested by a mutation in the human alpha2-globin gene. Hum Mol Genet 6(6), 881-885 (1997)
24. Weitzmann MN, Woodford KJ, Usdin K. DNA secondary structures and the evolution of hypervariable tandem arrays. J Biol Chem 272(14), 9517-9523 (1997)
25. Harvey SC. Slipped structures in DNA triplet repeat sequences: entropic contributions to genetic instabilities. Biochemistry 36(11), 3047-3049 (1997)
26. Hyland PL, McKinney MW, Keegan AL, McKenna PG, Curran MD, Middleton D, Barnett YA. Sequence analysis of spontaneously-arising mutations at the aprt locus in wild-type and thymidine kinase-deficient Friend cells: evidence for strand slippage-misalignment mechanism in formation of deletions. Biochem Soc Trans 25(1), 127S (1997)
27. Pinder DJ, Blake CE, Leach DRF. DIR: a novel DNA rearrangement associated with inverted repeats. Nucleic Acids Res 25(3), 523-529 (1997)
28. Macey JR, Larson A, Ananjeva NB, Papenfuss TJ. Replication slippage may cause parallel evolution in the secondary structures of mitochondrial transfer RNAs. Mol Biol Evol 14(1), 30-39 (1997)
29. Fitches AC, May SJ, Olds RJ. A novel antithrombin gene mutation: slippage and mispairing as a mechanism of genetic disease. Pathology 28(4), 339-342 (1996)
30. Osterholm AM, Bastlova T, Meijer A, Podlutsky A, Zanesi N, Hou SM. Sequence analysis of deletion mutations at the HPRT locus of human T-lymphocytes: association of a palindromic structure with a breakpoint cluster in exon 2. Mutagenesis 11(5), 511-517 (1996)
31. Holden JJ, Walker M, Chalifoux M, White BN. Trinucleotide repeats at the FRAXF locus: frequency and distribution in the general population. Am J Med Genet 64(2), 424-427 (1996)
32. Tran HT, Gordenin DA, Resnick MA. The prevention of repeat-associated deletions in Saccharomyces cerevisiae by mismatch repair depends on size and origin of deletions. Genetics 143(4), 1579-1587 (1996)
33. Ji J, Clegg NJ, Peterson KR, Jackson AL, Laird CD, Loeb LA. In vitro expansion of GGC:GCC repeats: identification of the preferred strand of expansion. Nucleic Acids Res 24(14), 2835-2840 (1996)
34. Ho KF, Craddock EM, Piano F, Kambysellis MP. Phylogenetic analysis of DNA length mutations in a repetitive region of the Hawaiian Drosophila yolk protein gene Yp2. J Mol Evol 43(2), 116-124 (1996)
35. Hood DW, Deadman ME, Jennings MP, Bisercic M, Fleischmann RD, Venter JC, Moxon ER. DNA repeats identify novel virulence genes in Haemophilus influenzae. Proc Natl Acad Sci USA 93(20), 11121-11125 (1996)
36. Vogler AP, Welsh A, Hancock JM. Phylogenetic analysis of slippage-like sequence variation in the V4 rRNA expansion segment in tiger beetles (Cicindelidae). Mol Biol Evol 14(1), 6-19 (1997)
37. Luck R, Steger G, Riesner D. Thermodynamic prediction of conserved secondary structure: application to the RRE element of HIV, the tRNA-like element of CMV and the mRNA of prion protein. J Mol Biol 258(5), 813-826 (1996)
38. Carrell RW, Lomas DA. Conformational disease. Lancet 350, 134-138 (1997)
39. Kelly JW. Alternative conformations of amyloidogenic proteins govern their behavior. Curr Opin Struct Biol 6, 11-17 (1996)
40. Owen F, Poulter M, Collinge J, Leach M, Shah T, Lofthouse R, Chen YF, Crow TJ, Harding AE, Hardy J et al. Insertions in the prion protein gene in atypical dementias. Exp Neurol 112(2), 240-242 (1991)
41. Hornemann S, Korth C, Oesch B, Riek R, Wider G, Wüthrich K, Glockshuber R. Recombinant full-length murine prion protein, mPrP(23-231): purification and spectroscopic characterization. FEBS Lett 413, 277-281 (1997)
42. Riek R, Hornemann S, Wider G, Glockshuber R, Wüthrich K. NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231). FEBS Lett 413, 282-288 (1997)
43. Muramoto T, DeArmond SJ, Scott M, Telling GC, Cohen FE, Prusiner SB. Heritable disorder resembling neuronal storage disease in mice expressing prion protein with deletion of an alpha-helix. Nat Med 3(7), 750-755 (1997)
Mice were constructed carrying prion protein (PrP) transgenes with individual regions of putative secondary structure deleted. Transgenic mice with amino-terminal regions deleted remained healthy at >400 days of age, whereas those with either of carboxy-terminal alpha-helices deleted spontaneously developed fatal CNS illnesses similar to neuronal storage diseases. Deletion of either C-terminal helix resulted in PrP accumulation within cytoplasmic inclusions in enlarged neurons. Deletion of the penultimate C-terminal helix resulted in proliferation of rough endoplasmic reticulum. Mice with the C-terminal helix deleted were affected with nerve cell loss in the hippocampus and proliferation of smooth endoplasmic reticulum. Whether children with the human counterpart of this malady will be found remains to be determined.
44. Muramoto T, Scott M, Cohen FE, Prusiner SB. Recombinant scrapie-like prion protein of 106 amino acids is soluble. Proc Natl Acad Sci USA 93(26), 15457-15462 (1996)
The N terminus of the scrapie isoform of prion protein (PrPSc) can be truncated without loss of scrapie infectivity and, correspondingly, the truncation of the N terminus of the cellular isoform, PrPC, still permits conversion into PrPSc. To assess whether additional segments of the PrP molecule can be deleted, we previously removed regions of putative secondary structure in PrPC; in the present study we found that deletion of each of the four predicted helices prevented PrPSc formation, as did deletion of the stop transfer effector region and the C178A mutation. Removal of a 36-residue loop between helices 2 and 3 did not prevent formation of protease-resistant PrP; the resulting scrapie-like protein, designated PrPSc106, contained 106 residues after cleavage of an N-terminal signal peptide and a C-terminal sequence for glycolipid anchor addition. Addition of the detergent Sarkosyl to cell lysates solubilized PrPSc106, which retained resistance to digestion by proteinase K. These results suggest that all the regions of proposed secondary structure in PrP are required for PrPSc formation, as is the disulfide bond stabilizing helices 3 and 4. The discovery of PrPSc106 should facilitate structural studies of PrPSc, investigations of the mechanism of PrPSc formation, and the production of PrPSc-specific antibodies.

Aug 8 Nature: Based on the large number of sequence-related genes encoding outer membrane proteins and the presence of homopolymeric tracts and dinucleotide repeats in coding sequences, H. pylori, like several other mucosal pathogens, probably uses recombination and slipped-strand mispairing within repeats as mechanisms for antigenic variation and adaptive evolution.