Prion Disease: Doppel Developments

Doppel developments
Doppel changing 2.1x faster than prions
Distant prion relative in Drosophila?
1,525,050 bp human chr 20p contig released
Distinct Dutch 7x repeat CJD
Displaying residue conservation in the prion protein
Doppel bibliography

Doppel developments

18 Dec 00 update of 24 Nov 00 webmaster research

Doppel update by species
...human: end of doppel gene found  ... pig: doppel protein inferred from EST
...rat: new pre-CDS exon defined    ... cow: 5'coding doppel EST found by USDA
...mouse: cryptic pre-CDS exon      ... mouse: 3'UTR end from EST data
...alignment: doppel 5'UTR splice exon junction from five species
...blastn: advanced parameter settings for alignment of non-coding exons
...ClustalW: doppel protein alignment from 6 species

Human: A region of 522kb of finished human genomic sequence provides adequate flanking sequence context around prion-doppel and defines bounding genes on both sides (essential to assigning downstream ESTs). Some 24 ESTs and 1 mRNA at GenBank "belong" to human doppel, primarily originating within 3'UTR because of its extraordinary length (5 "harmless" retrotransposons account for 1000 bp).

None of these ESTs spans a splice junction and together afford no tiling, thus leaving the boundaries of downstream exons unresolved. Exon 2 of human doppel could have structure 5'UTR+CDS+3'UTR of 11+531+3386=3928 bp. On the other hand, mouse has a pure 3'UTR exon 3. Though poorly alignable, mouse exon 3 does elicit a patchy blastn match to human genomic, one that terminates exactly at the poly A signal and site of the single sequenced human mRNA in this region, AF086354, which likely represents the end of the human doppel gene.

The human gene is not properly annotated in human genomic sequences such as AF106918. Where is the end of the gene? Do humans have a pure 3'UTR exon 3 like that of mouse and, if so, where is the splice junction to exon 2? Two long-delayed GenBank entries, AF187844 and AF187843, from Lee,I.Y. and Hood,L.E., associated with a never-to-be-submitted article called 3' RACE for prnd gene, appeared on 7 Jan 01. However, the 3'RACE method failed, merely priming off a naturally occurring genomic stretch of polyT: 22261 gtggcatgaa gattttcttt ctcttttttt tttttttaga

This was never a candidate region for mRNA termination, lacking as it does any polyA signal or site, as well as homology with full length rodent transcripts. None of the human doppel ESTs terminates at this position; several terminate instead at AATAAAtgaaaacacctga 1162, by no coincidence just after a canonical polyA signal, eg AW137067, AI637716, AA262992, AI656950, and BF092481.

After exploring the 6 dimensional parameter space of advanced blastn options for suitable alignment of non-coding regions across species that contain many insertions/deletions, retrotransposons, and simple repeats as well as point mismatches, conditions can be found where exon 3 of mouse doppel specifically pulls out genomic human doppel from GenBank (712,456 sequences). This is far more definitive than aligning two sequences presumed ab initio to be alignable; gaps make expectation scoring unfeasible. The expectation is that, apart from splice junctions, introns diverge more rapidly than 3'UTR exons.

The blastn outcome is independently confirmed by a cDNA sequence that defines the same terminus of the gene: AF086354 spanning 24326-24889 of genomic AF106918 and ending with a canonical polyA signal and site, aacAATAAAcatccttgtacatac. Of the additional 24 human doppel ESTs, identifiable now that the new adjacent gene has set the downstream boundary, five also begin precisely at the end of the gene, AW296843, AW243540, AI187842, W73057, and W76645.

The next gene downstream is on the opposite strand, whereas the polyA signal at the end of the mRNA AF086354 is on the doppel sense strand. Oddly, this mRNA is seen by RepeatMasker as containing 132 bp of a near-terminal LINE1 on the anti-sense strand, whereas the aligning mouse sequence is not so recognized. This, as well as the L2-MIR earlier, may be either very old retrotransposons or simply false positives.

The distance from the ATG start codon to the end of the human doppel gene is thus 3991 bp, of which 1000 bp is junk DNA. This is too long for a continuous transcript (the statistical distribution of 3'UTR lengths in the whole genome is known), suggesting a pure 3'UTR human exon 3 like that of mouse. Blastn with mouse exon 3 readily identifies a candidate human exon 3 (see figure).

The splice acceptor joining exon 3 to exon 2 is at position 23040 of genomic AF106918, consistent with the GT-AG consensus splice acceptor seen ... cctacaggcgctccaggccactctcagag ACTCCCAGGAG (exon 3). Human exon 3 is considerably longer than mouse, 1854 bp vs 1302 bp, of which 438 bp (23.6%) are masked for -AluSx and a weak -L1ME1 in human. A singlet human EST, AA347619, is found near the splice junction but is contiguous with 197 bp of genomic sequence across it, so it may represent an artefact, as the 5' end of mouse exon 3 matches fairly well to human (60-65% identical). Note, however, that mouse exon 3 is reported to have alternative splice acceptors.

The mouse genome proves fairly unsatisfactory as a guide to untranslated regions 5' as well as 3' for both human prion and doppel genes: too much mutational distance allows only the most conserved features to be reliably recognized through alignment. Genomic programs underway with rat, pig, cow, and zebrafish are also less than ideal for this purpose; chimpanzee would be near-useless for the opposite reason. A primate such as macaque would be best for identifying untranslated exons and the like. Clearly a range of genomes is needed, depending on purpose. In the short term, cow and sheep doppel genomic sequence will arrive first, improve on mouse, and potentially provide strong independent support of human doppel downstream gene features.

The EST data conflict with the annotation of AF106918, an independently sequenced genomic region of length 33260 bp mentioned in experimental doppel studies by Moore et al. That entry shows a putative exon 2 containing only 788 bp of 3'UTR but possessing no splice donor or polyA signal, and ending within a quasi-polyT associated with an Alu on the complementary strand (potentially giving rise to an experimental PCR artefact). About 70% of the known doppel ESTs are farther downstream.

However, blastn with mouse exon 2b (long form) as query under permissive conditions shows moderately good alignment up to position 22273 of human doppel, 1301 bp from the start codon, which marks the start of a human retrotransposon pair. Only 48 bp of the 1261 bp mouse query remains. The short form, mouse exon 2a, ends at 21550. There is no plausible polyA signal or evident splice donor near either region in human.

Nevertheless, a stack of human ESTs originates 1101 bp below the start codon at position 22073, eg EST -AW139611, which is 218 bp upstream of the end of the region aligning to the end of exon 2b of mouse. That position corresponds to TAGAAATAAATGAAAACACCTGagctggtggctgcgtact, that is, a canonical polyA signal 12 bp before the end of the 6 such ESTs that terminate precisely there; there is no genomic polyA or polyT tract nearby to generate artefacts. Thus position 22073 is a plausible human counterpart of exon 2b of mouse. (It is exon 2a that splices to exon 3; its splice donor follows the stop codon by 48 residues. Humans have a potential splice donor GT 53 residues after the stop codon that could splice to exon 3 at 23040, creating a 1487 bp intron 2.)

GenScanW readily finds the doppel coding region and suggests a canonical AATAAA polyA signal 1083 bp below the start codon at 22055-22060, confirming the EST data. The algorithm is not at its best at finding non-coding exons such as the putative exon 3. The 17 downstream ESTs, while confused by the presence of retrotransposons, do not originate or terminate within the AluSx and LINE1 elements. There is no support for an additional gene in terms of ORFs or gene prediction or blastx protein searches -- these ESTs "belong" to doppel.
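The offsets quoted in this section can be cross-checked with a few lines of Python. This is only a sketch: the ATG coordinate 20972 is not stated in the text but is inferred here from the quoted position/offset pairs (22073 at 1101 bp, 22273 at 1301 bp, 22055 at 1083 bp), and the landmark labels are informal.

```python
# AF106918 coordinate of the doppel ATG.
# ASSUMPTION: inferred from the position/offset pairs quoted in the text,
# not read from the GenBank entry itself.
ATG_START = 20972

# Landmarks quoted in the text (AF106918 numbering); labels are informal.
LANDMARKS = {
    "AATAAA polyA signal (GenScanW)": 22055,
    "EST stack terminus": 22073,
    "end of mouse exon 2b alignment": 22273,
    "exon 3 splice acceptor": 23040,
}


def bp_below_start(position, start=ATG_START):
    """Distance in bp from the start codon to a genomic position."""
    return position - start


for name, pos in LANDMARKS.items():
    print(f"{name:32s} {pos:6d} {bp_below_start(pos):5d} bp below ATG")
```

Keeping every landmark in one coordinate system makes it easier to spot where two quoted offsets cannot both be right.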

The human counterpart of mouse doppel, pending confirmation from sheep or cow comparison, has at least two downstream gene models (AF106918 numbering):

5' cds  3'
11+531+3391 = 3934 (read-thru model: single exon containing 5' flanker, CDS, and long 3'UTR)
11+531+569  = 1112 (short model)
Human doppel ESTs, grouped in order of increasing distance from the stop codon as shown in the graphic, show a tissue expression profile similar to that reported experimentally (testis, fetal heart, ...):
AA234322   AI288920   AI825182
AW139611   AA347619   AA758081
AW137067              AL041968
BF091068   BE503379   BE671981
AA262992   AI655440   AW296843
AI656950   AI337054   W76645
BF092481   AW207420   AW243540
AI637716   AI242370   AI187842
           W73057
           AF086354 

The prion-doppel superfamily appears very limited: no additional proteins paralogous to doppel are found in the human genome (or any other species) at this time according to tBlastn of ancestral coding doppel query sequence against nr, dbEST, finished, and unfinished genomic databases. No anti-prion mouse EST has emerged (the search assumes a less imperfect match than a bona fide prion EST); in particular anti-prion transcripts cannot originate with doppel.

Unfinished human genome contains AL109808 (4 unordered pieces, 150,871 bp, doppel at 97318-97845, allele TVK-); finished genome contains NT_001001 (522,111 bp), within which the relevant contig AL133396 is annoyingly given as minus strand (148,497 bp, 16-Feb-2000). The distance between prion-doppel start codons is 25,331 bp; the intergenic region is 22,908 bp.

>dop_ancestral (lower case less certain)
MrkhLGgWWLAIlCmLLfSHLStVKARGIKHRiKWNRKVLPStgQITEAqVAENRPGAFI
KQGRKLDIDFGAEGNRYYeANYWQFPDGIyYeGCSEANVTKEafVTsCvNATQAANQaEF
SREKQDnKLHQRVLWRLIkELCStKHCDFWLERGAgLRVTvDQPaMlCLLgFIWFIVK

Although prion and doppel are common enough human transcripts, pseudogenes are seldom full length, implying that any pseudogenes would be pure 3'UTR. Unless these somehow retained function (eg, anti-prion transcripts were once reported), non-coding pseudogenes quickly drop below limits of reliable detectability. In any event, blastn of masked 3'UTR returns no pseudogene candidates for either gene as of 29 Nov 00, even using permissive blastn parameters.

Prion 3'UTR, including a puzzling near-invariant central region, is far better conserved across species than doppel, which exhibits very poor conservation due to both point substitutions and indels. There is no support for an unfused 3'UTR exon in the prion gene nor for alternative polyadenylation -- all ESTs end cleanly at the same site.

The ancestral status of non-coding exons is unclear: neither the exon structure nor the end point inclusions of the duplication are established. One intriguing theory is that the ancestral gene had a detached 3' exon corresponding to present-day doppel exon 3 that was not included in the doubling event (which amounted to an insertion in the preceding intron). This left the new prion gene without such an exon (and so no polyA site) and the new doppel gene without a transcription start site but still a 3'UTR exon. Only genomic sequence that diverged prior to the event could resolve this.

The one conserved gene feature is the coding region core: a short fused 5'UTR of 8-12 bp (even turtle is 10 bp), an intronless coding region, followed by a very long 3'UTR. Exons 1 and 3 in doppel have no counterpart in prion; exons 1 and 2 of prion have no counterpart in doppel. Doppel now has its own regulatory region and transcription start; it may have escaped pseudogene fate initially by chimeric transcripts that today are a relic.

Doppel 3'UTR splice sites seem quite fluid within species and inconsistent across species: thus mouse has alternative splice donors (exons 2a and 2b) as well as alternative splice acceptors and polyadenylation sites (exons 3a, b, c, and d; annotated only in the Oct 99 J Mol Biol paper, fig 1b and 2b: exon 3 splice acceptors at 37,983 and 38,013 of U29187, alternative polyadenylation sites at 39,099 and 39,315 with CTTAAA and AATAAA signals, the former implausible). Is all this variability of in vivo significance?

The retrotransposons in human doppel 3'UTR confound comparison with other species and make Blast searches for ESTs more difficult (some proceed through these repeats). Here are RepeatMasker-defined repeats: note that the sense strand (T)n is better viewed as an anti-sense strand (A)n just part of the anti-sense AluSq. The L2, MIR, and L1ME1 may be fairly old; unlike the Alu, they do not summon up columns of matches under normal blastn conditions.

retrotransposons downstream of exon 2: total length 1000 bp
 721   802  +  L2      3175 3262 (10)
 814   995  C  MIR     (37)  225  21 
1312  1583  C  AluSq    (6)  307  27
1587  1612  +  (T)n      1    26  -
3252  3557  C  AluSx    (8)  304  1  
3794  3925  C  L1ME1  (343) 5825  5686  
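As a quick consistency check on the table above, the element lengths can be summed directly from the first two columns. A minimal sketch, assuming those columns are 1-based inclusive start and end coordinates; the variable names are illustrative only.

```python
# (start, end, strand, repeat) copied from the RepeatMasker table above.
# ASSUMPTION: coordinates are 1-based and inclusive.
REPEATS = [
    (721, 802, "+", "L2"),
    (814, 995, "C", "MIR"),
    (1312, 1583, "C", "AluSq"),
    (1587, 1612, "+", "(T)n"),
    (3252, 3557, "C", "AluSx"),
    (3794, 3925, "C", "L1ME1"),
]


def span(start, end):
    """Length in bp of an inclusive interval."""
    return end - start + 1


lengths = {name: span(start, end) for start, end, _strand, name in REPEATS}
total = sum(lengths.values())

# The two elements the text places within the candidate human exon 3.
exon3_masked = lengths["AluSx"] + lengths["L1ME1"]

for name, bp in lengths.items():
    print(f"{name:6s} {bp:4d} bp")
print("total masked:", total, "bp; AluSx + L1ME1:", exon3_masked, "bp")
```

The same two-column arithmetic applies to any RepeatMasker output, so the check is easy to repeat when the table is regenerated.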

Bovine: There is an 83% match to a new high quality bovine doppel cDNA, BF230757, from USDA, ARS, Beltsville ARC, posted Nov 14, 2000. This sequence contains 57 base pairs of the 5'UTR as well as the first doppel coding polymorphism outside human (K101N, Asn for wildtype Lys at invariant codon 101; t to g). The 138 amino acids agree otherwise with a fragmentary Norwegian sequence. The polymorphism is not likely an artefact: there are no silent mutations elsewhere in the cDNA, the odds of a point mutation being silent are about 1 in 9, and cDNAs are not as error-prone as ESTs.

With an active imagination, one can even align the 5'UTR of human and cow finding 43/57 identities, suggesting in part a similar upstream exonic structure and splice junction that is validated when all known species are included in the alignment.

Bovine mammary doppel cDNA BF230757
ggacagggtgcccgtggctccagaggtgcatcagagagaccctaagatcccgacaca
ATGaggaaacatctgggtggatgctggttggccattgtatgtatcctgctctttagccaactc
 M  R  K  H  L  G  G  C  W  L  A  I  V  C  I  L  L  F  S  Q  L
tgctcagtcaaggcgagaggcataaagcacagaatcaagtggaaccggaaggtcttgcca
 C  S  V  K  A  R  G  I  K  H  R  I  K  W  N  R  K  V  L  P
agtacctcccaggtcacggaggcccgcactgcggaaatccgcccaggggccttcatcaag
 S  T  S  Q  V  T  E  A  R  T  A  E  I  R  P  G  A  F  I  K
caaggccgaaagctggatatcgactttggagtggagggcaataggtactatgaggccaac
 Q  G  R  K  L  D  I  D  F  G  V  E  G  N  R  Y  Y  E  A  N
tattggcagtttcctgacggcatccattacaacggctgctccaaggccaatgtcaccaat
 Y  W  Q  F  P  D  G  I  H  Y  N  G  C  S  K  A  N  V  T  N
gaaaagtttatcaccagctgcattaatgccacccaggcggcgaatcaagaggaactgtcc
 E  K  F  I  T  S  C  I  N  A  T  Q  A  A  N  Q  E  E  L  S
cgtgagaaacaagacaacaagctttaccagcgggtcctgtggcagctgatca
 R  E  K  Q  D  N  K  L  Y  Q  R  V  L  W  Q  L  I
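The translation shown under the cDNA can be reproduced with the standard genetic code. The sketch below uses only the first five coding lines (codons 1-101), so that the last codon is the one carrying the suspected K101N change; the helper names are illustrative, and the codon table is built from the usual TCAG ordering rather than taken from any particular library.

```python
from itertools import product

# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First five coding lines of bovine BF230757 as printed above (codons 1-101).
CDS = (
    "ATGaggaaacatctgggtggatgctggttggccattgtatgtatcctgctctttagccaactc"
    "tgctcagtcaaggcgagaggcataaagcacagaatcaagtggaaccggaaggtcttgcca"
    "agtacctcccaggtcacggaggcccgcactgcggaaatccgcccaggggccttcatcaag"
    "caaggccgaaagctggatatcgactttggagtggagggcaataggtactatgaggccaac"
    "tattggcagtttcctgacggcatccattacaacggctgctccaaggccaatgtcaccaat"
).upper()


def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(CDS)
codon_101 = CDS[300:303]  # 0-based slice covering codon 101

print(protein)
print("codon 101:", codon_101, "->", CODON_TABLE[codon_101])
```

Translating in frame from the ATG and ignoring a trailing partial codon mirrors how the EST translations in this section are laid out.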

Sheep: The bovine doppel mRNA BF230757, when translated, differs at two amino acids from sheep doppel protein. Assuming single nucleotide changes, using the genetic code and human DNA, the bovine leading sequence can be unambiguously corrected to give a better sheep primer sequence (silent mutations cannot be corrected):

atgaggaaacatctgggtggatgctggttggccattgtatgtGtcctgctctttagccaactctCctcagtcaag
 M  R  K  H  L  G  G  C  W  L  A  I  V  C  v  L  L  F  S  Q  L  s  S  V  K 
Should full-length ovine and bovine cDNA sequences become available, the main issues would be the degree of conservation with human and rodent, with the primary goal of determining human exonic structure. Supposing bovid cDNA aligns persuasively but discontinuously with human genomic, that would suggest an exon 2/3 junction in human that could be further refined by searching for GT-AG consensus splice donor and acceptor (and appropriate flanking sequence) in human genomic. On the other hand, long continuous bovid 3'UTRs would support a similar model in human. Thus if supported by a similar alignment of mouse mRNA and consistency with human ESTs, the exon 3 boundaries in human might be reliably established through comparative genomics alone.

Mouse 3'UTR: The mouse sequence U29187 is sloppily annotated. The mRNA data show a 3'UTR intron from 569-1799 of length 1231 and a subsequent UTR exon 3 (post-cds positions 1800-3105, length 1305 bp). Other mRNA and EST data confirm the existence of read-thru transcripts, eg AF192382 and BE306939. Thus the data support a variety of downstream mouse models:

8+540+ 29+1305 = 1882 (2a intron model)
8+540+722+1305 = 2575 (2b intron model)
8+540  +2564   = 3112 (read-thru model)
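The model lengths are simple sums of segment lengths and are easy to recompute. In this sketch the segments are read as 5'UTR, CDS, 3'UTR portion of exon 2 and, where present, exon 3; that reading and the dictionary layout are assumptions made for illustration.

```python
# Segment lengths in bp for each mouse model, copied from the lines above.
# ASSUMPTION: segments are 5'UTR, CDS, exon 2 3'UTR and (if spliced) exon 3.
MOUSE_MODELS = {
    "2a intron": (8, 540, 29, 1305),
    "2b intron": (8, 540, 722, 1305),
    "read-thru": (8, 540, 2564),
}


def transcript_length(segments):
    """Total transcript length as the sum of its segment lengths."""
    return sum(segments)


for name, segments in MOUSE_MODELS.items():
    terms = "+".join(str(s) for s in segments)
    print(f"{terms} = {transcript_length(segments)} ({name} model)")
```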
 
The final 1811 bp of mouse genomic sequence U29187 are free of retrotransposons and simple repeats. This sequence is a suitable query for existing extensions as part of the mouse genome project. However, no extensions are in the public domain. Mouse sequence, from doppel cds on, when masked, gives variable quality alignments to human genomic.

It is really the mouse ESTs that are remarkable. There are 75 of these, the vast majority beginning at the end of gene (3104 bp below start codon, poly A signal AATAAA at 3085). This suggests that the mouse genomic region available is adequate for doppel annotation. Two rat ESTs are also found in this region but human ESTs are not picked out without permissive blastn conditions.

Rat: Mouse doppel genomic probe finds no rat genomic (finished or unfinished), no mRNA, but 2 ESTs, proximal BE107154 of length 322 (corresponding to exon 2b, with a dozen bp possibly extending over a splice junction to exon 3) and more distal AI136375 of length 368 bp (corresponding to 706-1069 of the 1303 bp mouse exon 3). The Westaway group has never released the rat DNA sequence used to deduce the rat doppel protein sequence published 01 Oct 99.

However, a new rat doppel EST, accession BF566683, turns out to be quite revealing. The inferred coding sequence agrees with the previous rat doppel protein for 115 amino acids, indicating a high quality sequence. The 5'UTR upstream of the CDS did not align well with mouse as a whole. Blast2 with U29187, however, quickly reveals that the first 27 bp of rat agree with mouse exon 1, positions 34094-34120, at 26 of 27 positions. The remaining rat leader sequence agrees at the 96% identity level with a non-contiguous distal region of mouse genomic, 34236-34279, numbering relative to U29187. Canonical GT-AG splice donors and acceptors are present in mouse.

In other words, this rat mature transcript reveals an additional intervening exon, called here exon 1.5 so as not to throw off existing numbering, which corresponds to a cryptic counterpart in mouse. This necessitates a repartition of intron 1 in mouse genomic U29187 to intron 1 (34123-34237) and intron 1.5 (34280-36204).

This mouse exon may well be expressed in tissue types such as atrium, ventricle, and atrium ventricular canal just as it is in rat, since both the splice donor and splice acceptor, GT-AG, are conserved in mouse and species divergence is moderate. No sign of exon 1.5 [corresponding sequence conservation] is seen in human genomic sequence; exon1.5 does not appear in human, pig, sheep, or cow EST data either under permissive blastn.

This situation recalls the cryptic exon 2 in human prion. As in mouse, if sufficient tissue types are examined, cryptic exon utilization will likely be observed. Expression per se could only partly explain the conservation pressure on such features; somewhere alternative splicing options must be exploited to regulatory advantage -- after all the final protein is not affected by inclusion or exclusion of various non-coding exons.

rat BF566683 89 bp of 5'UTR + 349 bp of coding DNA for 116 aa
MKNRLGTWGLAILCLLLASHLSTVKARGIKHRFKWNRKVLPSSGQITEAQVAENRPGAFI
KQGRKLDIDFGAEGNKYYAANYWQFPDGIYYEGCSEANVTKEVLVTRCVNATQAGN

cacgagggcttcagaggccagagtagcagagaacaaagctgcctctgcattcctgtgctctg
atgctactgggaaatgattctcccaccATGaagaaccgtctgggtacatgggggctggcc
                            M  K  N  R  L  G  T  W  G  L  A 
atcctctgcctgctgcttgctagccacctctccacggttaaggccaggggcataaagcat
 I  L  C  L  L  L  A  S  H  L  S  T  V  K  A  R  G  I  K  H 
aggttcaagtggaaccggaaggtcctgcccagcagcggccagattaccgaagcccaggtg
 R  F  K  W  N  R  K  V  L  P  S  S  G  Q  I  T  E  A  Q  V 
gctgagaaccgcccaggagccttcatcaagcaaggccgaaagctggacatcgactttgga
 A  E  N  R  P  G  A  F  I  K  Q  G  R  K  L  D  I  D  F  G 
gcagagggcaacaagtactatgcggccaactactggcagttccctgatgggatctactac
 A  E  G  N  K  Y  Y  A  A  N  Y  W  Q  F  P  D  G  I  Y  Y 
gaaggctgctctgaagccaacgtgaccaaggaggtgctggtgacccgctgcgtcaacgcc
 E  G  C  S  E  A  N  V  T  K  E  V  L  V  T  R  C  V  N  A 
acccaggcgggcaatc
 T  Q  A  G  N 

Alignment of rat exon 1 and exon 1.5 to mouse genomic:

                        exon 1
rat: 1     gcttcagaggccagagtagcagagaac 27                         
             ||||||||||||| |||||||||||||
mus: 34094 gcttcagaggccacagtagcagagaac 34120

                        exon 1.5       
rat: 30    ag|ctgcctctgcattcctgtgctctgatgctactgggaaatg|gt    71 numbering relative to rat BF566683 
             ||||||||| | |||||| |||||||||||||||||||||||
mus: 34236 ag|ctgcctcagtattcctatgctctgatgctactgggaaatg|gt 34279 numbering relative to mus U29187

Pig: From a single EST, 147 amino acids of pig doppel could be inferred, plus 5 bp across the exon 1/2 splice junction. Assuming the EST is accurate, pig doppel has substantial oddities in the signal region, but the 4 cysteines and both glycosylation sites are conserved. There is 88% identity to sheep and cow. Across the 6 species in which doppel has been determined, 92 of the 147 residues are conserved, which is rather respectable. The alignment of all known doppel proteins is shown later.

>pig BF441543 16 bp of 5'UTR + 458 bp coding for 147 aa 
Source: pooled testis, ovary, endometrium, hypothalamus, pituitary, placenta

CCAAGgtcctgacaccatgaggaagcacctgggtggacgtaggtgggccattgtctgcatc
  .............. M  R  K  H  L  G  G  R  R  W  A  I  V  C  I
ctgctcttcagccagctctccgaagtcaaggcgaggggcataaagcacagaatcaagtgg
 L  L  F  S  Q  L  S  E  V  K  A  R  G  I  K  H  R  I  K  W
aaccggaaggccctgccaagtacctcccaggtcacagaggcccacacagcggagatgcgc
 N  R  K  A  L  P  S  T  S  Q  V  T  E  A  H  T  A  E  M  R
ccaggggctttcattaagcaaggtcgaaagctggatattgactttggggcagagggcaat
 P  G  A  F  I  K  Q  G  R  K  L  D  I  D  F  G  A  E  G  N
aggtactacgaggccaactattggcggttccctgatgggatccattacaacggctgctcc
 R  Y  Y  E  A  N  Y  W  R  F  P  D  G  I  H  Y  N  G  C  S
gaggtcaacgtcaccaaggagaagtttgtcaccagctgcatcaacaccacccaggcggcg
 E  V  N  V  T  K  E  K  F  V  T  S  C  I  N  T  T  Q  A  A
aaccaggaggaactgtcccacgagaaaccggacaataagctttaccagcgggtcctgtgg
 N  Q  E  E  L  S  H  E  K  P  D  N  K  L  Y  Q  R  V  L  W
cggctgatcaaggagctctgctccatcaagcactgtg
 R  L  I  K  E  L  C  S  I  K  H  C...

Alignment: doppel 5'UTR splice junction from 5 species

ggacagggtgcccgtggctccagaggtgca.tc.ag.agagaccctaag.atcccgacacaATG cow BF230757  
...........................................cc.aag.gtcctgacaccATG pig BF441543 
actgtgcagctc.gaggctccagaggcacactccag.agagagcc.aag gttctgacgcgATG human AF106918 
.....gg..ctccaagctt.cagaggc.cacagtagcagagaacc.gag.att....caccATG mouse U29187 
cacgagg........gctt.cagaggc.cagagtagcagagaaca.aag.attctcccaccATG rat BF566683 

ClustalW: doppel protein alignment from 6 species

dopAnc          MrkhLGgWWLAIvCmLLfSHLStVKARGIKHRiKWNRKVLPStg-QITEAqVAENRPGAF 59
dopHum          MRKHLSWWWLATVCMLLFSHLSAVQTRGIKHRIKWNRKALPSTA-QITEAQVAENRPGAF 59
dopSus          MRKHLGGRRWAIVCILLFSQLSEVKARGIKHRIKWNRKALPSTS-QVTEAHTAEMRPGAF 59
dopOvi          MRKHLGGCWLAIVCVLLFSQLSSVKARGIKHRIKWNRKVLPSTS-QVTEAHTAEIRPGAF 59
dopBos          MRKHLGGCWLAIVCILLFSQLCSVKARGIKHRIKWNRKVLPSTS-QVTEARTAEIRPGAF 59
dopMus          MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGGQITEARVAENRPGAF 60
dopRat          MKNRLGTWGLAILCLLLASHLSTVKARGIKHRFKWNRKVLPSSG-QITEAQVAENRPGAF 59
                *:::*.    * :*:** *:*. *::******:*****.***:. *:***:.** *****

dopAnc          IKQGRKLDIDFGAEGNRYYeANYWQFPDGIyYeGCSEANVTKEafVTsCvNATQAANQaE 119
dopHum          IKQGRKLDIDFGAEGNRYYEANYWQFPDGIHYNGCSEANVTKEAFVTGCINATQAANQGE 119
dopSus          IKQGRKLDIDFGAEGNRYYEANYWRFPDGIHYNGCSEVNVTKEKFVTSCINTTQAANQEE 119
dopOvi          IKQGRKLDINFGVEGNRYYEANYWQFPDGIHYNGCSEANVTKEKFVTSCINATQVANQEE 119
dopBos          IKQGRKLDIDFGVEGNRYYEANYWQFPDGIHYNGCSKANVTKEKFITSCINATQAANQEE 119
dopMus          IKQGRKLDIDFGAEGNRYYAANYWQFPDGIYYEGCSEANVTKEMLVTSCVNATQAANQAE 120
dopRat          IKQGRKLDIDFGAEGNKYYAANYWQFPDGIYYEGCSEANVTKEVLVTRCVNATQAANQAE 119
                *********:**.***:** ****:*****:*:***:.***** ::* *:*:**.*** *

dopAnc          FSREKQDnKLHQRVLWRLIkELCStKHCDFWLERGAgLRVTvDQPaMlCLLgFIWFIVK 178
dopHum          FQ--KPDNKLHQQVLWRLVQELCSLKHCEFWLERGAGLRVTMHQPVLLCLLALIWLTVK 176
dopSus          LSHEKPDNKLYQRVLWRLIKELCSIKHC------------------------------- 147
dopOvi          LSREKQDNKLYQRVLWQLIRELCSIKHCDFWLERGAGLQVTLDQPMMLCLLVFIWFIVK 178
dopBos          LSREKQDNKLYQRVLWQLIRELCSTKHCDFWLERGAGLRVTLDQPMMLCLLVFIWFIVK 178
dopMus          FSREKQDSKLHQRVLWRLIKEICSAKHCDFWLERGAALRVAVDQPAMVCLLGFVWFIVK 179
dopRat          FSREKQDSKLHQRVLWRLIKEICSTKHCDFWLERGAALRITVDQQAMVCLLGFIWFIVK 178
                :.  * *.**:*:***:*::*:** ***    

Blastn parameters: recommended advanced options for Blastn non-coding exon alignment and homology searches:

-W 7 -G 1 -E 1 -e 3.00 -q -1 -b 15 -v 15     permissive  parameter choice for 3'UTR
-W 7 -G 1 -E 1 -e e-01 -q -2 -b 15 -v 15     stringent   parameter choice for 3'UTR

-W  Word size, default is 11 for blastn, 3 for other programs, 7 is minimum
-G  Cost to open a gap [Integer] default = 5
-E  Cost to extend a gap [Integer] default = 2
-q  Penalty for a mismatch [Integer] default = -3
-r  Reward for a match [Integer] default = 1
-e  Expectation value (E) [Real] default = 10.0
-v  Number of one-line descriptions (V) [Integer] default = 100
-b  Number of alignments to show (B) [Integer] default = 100
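For scripted searches, the two parameter sets can be kept as data and turned into a command line. The sketch below assumes the legacy NCBI blastall program, whose option letters match those listed above; the query file name and database name are hypothetical, and the stringent expectation value is written as 0.1 on the assumption that "e-01" above means 1e-01. Nothing is executed here; the function only assembles the argument list.

```python
# Option sets copied from the recommendations above.
# ASSUMPTION: the stringent "-e e-01" is read as an expectation value of 0.1.
PERMISSIVE = {"-W": "7", "-G": "1", "-E": "1", "-e": "3.00",
              "-q": "-1", "-b": "15", "-v": "15"}
STRINGENT = {"-W": "7", "-G": "1", "-E": "1", "-e": "0.1",
             "-q": "-2", "-b": "15", "-v": "15"}


def blastall_command(query_fasta, database, options):
    """Assemble (but do not run) a legacy blastall blastn command line."""
    command = ["blastall", "-p", "blastn", "-i", query_fasta, "-d", database]
    for flag, value in options.items():
        command.extend([flag, value])
    return command


# Hypothetical file and database names, for illustration only.
cmd = blastall_command("mouse_exon3.fa", "nt", PERMISSIVE)
print(" ".join(cmd))
```

Holding the options in dictionaries makes it easy to rerun the same query under both settings and compare which hits survive the stringent set.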

Doppel proteins changing 2.1x faster than prions

25 Nov 00 webmaster research
Now that full length protein sequences are available for both doppel and prion for the same 5 species (human, mouse, rat, cow, and sheep), it becomes feasible to estimate the rate at which each lineage of this tandem paralogue pair is changing.

We all know that doppel is changing faster; the question is how much faster and why.

Rates are obtained by reconstructing the doppel and prion ancestral proteins at the time of the rodent divergence from the others 100 million years ago, roughly counting the number of changes necessary to get to the contemporary proteins (ClustalW score), and then averaging. This is done separately for full length proteins (F; signal and GPI included), for mature proteins (M; not including prion repeat), and for the alignable region (A; IDF...CDF).

The bottom line is that over the last 100my at least, doppel is evolving twice as fast as prion protein (12.7% residues changed vs 6.1% or 2.1x).

Since mouse-human prion is changing slowly (upper 10% of 1160 compared orthologous gene pairs), doppel is changing at a fairly average absolute rate. It is not possible to say whether the normal function of doppel is less important or whether it has changed, necessitating adaptation. Doppel has not become a pseudogene in any of the lineages studied so far, even though that is the fate of the vast majority of duplicated genes.

The tandem duplication must be very ancient (lampreys?) because there is no demonstrable convergence: the reconstructed ancestral doppel and prion are no closer to each other than contemporary doppels and prions. Doppel and prion will be easily recognizable at these rates, if still present, in the fish genomes being completed by March of next year.

The fantastic effort going into determining the 3D structure of doppel (5 labs in a race) is largely wasted on the alignable region, because doppels will simply have the same fold as prions there; the great interest lies in the structure of the 44 leading amino acids of doppel.

These will not be flopping around as random coil but instead define the secondary structure of the ancestral fold prior to repeat region creation in prion and 106-126 loss in doppel. And since the globular domain of prion was never really globular (proper hydrophobic interior, buried tryptophan) in the fragmentary prion NMR structures, no doubt it is "completed" in doppel, which doesn't have the technical problems of the copper domain.

So the real prize is in sight: running the ancestral fold as DALI query to identify the fold superfamily and perhaps in turn some clue as to normal function.

The inner disulfide situation in human raises some special problems. That is, human doppel, but not cow, sheep, rat, or mouse, has a two-residue deletion between the cysteines of the inner disulfide, making it 6 angstroms "short" if in a helix. A loop might be forgiving, but an alpha helix could not absorb this without changing a host of packing and polarity relationships.

In mammalian prion, there are without exception 36 residues cys to cys, 179-214 in human numbering. Turtle and doppels other than human have 35. (Birds have 46 due to an expansion located by alignment in loop 2.) The best way to think of it is that mammals have 1 extra residue in loop 2 relative to ancestral, whereas human doppel is 2 residues short, a shortfall accommodated, however, by its proline 123.

The outer disulfide of doppel also takes some explaining. The structural scenario favored has mammalian prion losing a residue just past the second beta strand, giving rise to proline 165. This amounts to an extra residue in loop 1 of doppel relative to mammalian prion; the difference is enough to position cys 94 face to face with cys 145, provided that the end of a short helix C (whose tail length is quite unstable 5 aa after the last inner cysteine) has somewhat unravelled. Helix A and the underpass in doppel will also be conserved.

>dopAnc
MrkhLGgWWLAIlCmLLfSHLStVKARGIKHRiKWNRKVLPStgQITEAqVAENRPGAFI
KQGRKLDIDFGAEGNRYYeANYWQFPDGIyYeGCSEANVTKEafVTsCvNATQAANQaEF
SREKQDnKLHQRVLWRLIkELCStKHCDFWLERGAgLRVTvDQPaMlCLLgFIWFIVK
>dopHum (3 coding alleles:T26M P56L M174T)
MRKHLSWWWLATVCMLLFSHLSAVQTRGIKHRIKWNRKALPSTAQITEAQVAENRPGAFI
KQGRKLDIDFGAEGNRYYEANYWQFPDGIHYNGCSEANVTKEAFVTGCINATQAANQGEF
QKPDNKLHQQVLWRLVQELCSLKHCEFWLERGAGLRVTMHQPVLLCLLALIWLTVK
>dopOvi
MRKHLGGCWLAIVCVLLFSQLSSVKARGIKHRIKWNRKVLPSTSQVTEAHTAEIRPGAFI
KQGRKLDINFGVEGNRYYEANYWQFPDGIHYNGCSEANVTKEKFVTSCINATQVANQEEL
SREKQDNKLYQRVLWQLIRELCSIKHCDFWLERGAGLQVTLDQPMMLCLLVFIWFIVK
>dopBos (K101N suspected allele)
MRKHLGGCWLAIVCILLFSQLCSVKARGIKHRIKWNRKVLPSTSQVTEARTAEIRPGAFI
KQGRKLDIDFGVEGNRYYEANYWQFPDGIHYNGCSKANVTKEKFITSCINATQAANQEEL
SREKQDNKLYQRVLWQLIRELCSTKHCDFWLERGAGLRVTLDQPMMLCLLVFIWFIVK
>dopSus
MRKHLGGRRWAIVCILLFSQLSEVKARGIKHRIKWNRKALPSTSQVTEAHTAEMRPGAFI
KQGRKLDIDFGAEGNRYYEANYWRFPDGIHYNGCSEVNVTKEKFVTSCINTTQAANQEEL
SHEKPDNKLYQRVLWRLIKELCSIKHC
>dopMus
MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGGQITEARVAENRPGAFI
KQGRKLDIDFGAEGNRYYAANYWQFPDGIYYEGCSEANVTKEMLVTSCVNATQAANQAEF
SREKQDSKLHQRVLWRLIKEICSAKHCDFWLERGAALRVAVDQPAMVCLLGFVWFIVK
>dopRat 
MKNRLGTWglAILClLLASHLSTVKARGIKHRFKWNRKVLPSSGQITEAqVAENRPGAFI
KQGRKLDIDFGAEGNkYYAANYWQFPDGIYYEGCSEANVTKEvLVTrCVNATQAANQAEF
SREKQDSKLHQRVLWRLIKEICStKHCDFWLERGAALRiTVDQqAMVCLLGFIWFIVK
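The cysteine spacing discussed above can be read straight off these sequences. The sketch below does so for human and sheep doppel; the 30-150 residue window used to exclude the signal peptide and C-terminal cysteines is an assumption made for this illustration, as are the function names.

```python
# Full-length sequences copied from the dopHum and dopOvi entries above.
DOP_HUM = (
    "MRKHLSWWWLATVCMLLFSHLSAVQTRGIKHRIKWNRKALPSTAQITEAQVAENRPGAFI"
    "KQGRKLDIDFGAEGNRYYEANYWQFPDGIHYNGCSEANVTKEAFVTGCINATQAANQGEF"
    "QKPDNKLHQQVLWRLVQELCSLKHCEFWLERGAGLRVTMHQPVLLCLLALIWLTVK"
)
DOP_OVI = (
    "MRKHLGGCWLAIVCVLLFSQLSSVKARGIKHRIKWNRKVLPSTSQVTEAHTAEIRPGAFI"
    "KQGRKLDINFGVEGNRYYEANYWQFPDGIHYNGCSEANVTKEKFVTSCINATQVANQEEL"
    "SREKQDNKLYQRVLWQLIRELCSIKHCDFWLERGAGLQVTLDQPMMLCLLVFIWFIVK"
)


def core_cysteines(seq, first=30, last=150):
    """1-based positions of Cys within an assumed mature-core window."""
    return [i for i, aa in enumerate(seq, start=1)
            if aa == "C" and first <= i <= last]


def inner_disulfide_span(seq):
    """Residues from cys to cys (inclusive) for the inner pair of four."""
    positions = core_cysteines(seq)
    if len(positions) != 4:
        raise ValueError("expected four core cysteines")
    _outer1, inner1, inner2, _outer2 = positions
    return inner2 - inner1 + 1


for name, seq in (("dopHum", DOP_HUM), ("dopOvi", DOP_OVI)):
    print(name, core_cysteines(seq), inner_disulfide_span(seq))
```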

import ClustalW scores to spreadsheet:

seq,seq,,dop,dop,dop,,prion,prion,prion
,,,F,M,A,,F,M,A
anc,hum,,83,87,85,,91,94,92
anc,ovi,,84,86,84,,95,96,95
anc,bos,,85,87,85,,94,95,93
anc,mus,,89,92,92,dop to anc,92,94,97
anc,rat,,88,92,91,87.3,92,93,96
hum,ovi,,77,81,80,12.7,89,92,90
hum,bos,,77,81,80,prn to anc,90,92,91
hum,mus,,73,80,78,93.9,89,89,90
hum,rat,,73,80,77,6.1,89,89,89
ovi,bos,,94,95,93,rate dif,98,98,96
ovi,mus,,75,81,79,6.6,88,90,92
ovi,rat,,74,79,76,rate ratio,86,89,91
bos,mus,,76,81,79,2.1,87,89,91
bos,rat,,75,80,78,,86,89,90
mus,rat,,93,96,95,,98,99,98
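The summary figures embedded in the spreadsheet (87.3, 12.7, 93.9, 6.1, 6.6, 2.1) can be recomputed from the five ancestor rows. This is a sketch under the assumption that each summary is the plain mean of the F, M and A scores over the five species, with percent changed taken as 100 minus the mean score; the variable names are illustrative.

```python
# Ancestor-to-species ClustalW scores (F, M, A) copied from the rows above.
ANC_SCORES = {
    "hum": {"dop": (83, 87, 85), "prion": (91, 94, 92)},
    "ovi": {"dop": (84, 86, 84), "prion": (95, 96, 95)},
    "bos": {"dop": (85, 87, 85), "prion": (94, 95, 93)},
    "mus": {"dop": (89, 92, 92), "prion": (92, 94, 97)},
    "rat": {"dop": (88, 92, 91), "prion": (92, 93, 96)},
}


def mean_score(protein):
    """Mean of all F, M and A scores to the ancestor for one protein."""
    values = [v for row in ANC_SCORES.values() for v in row[protein]]
    return sum(values) / len(values)


# ASSUMPTION: percent changed is 100 minus the mean ClustalW score.
dop_changed = 100 - mean_score("dop")
prn_changed = 100 - mean_score("prion")
rate_difference = dop_changed - prn_changed
rate_ratio = dop_changed / prn_changed

print(f"doppel changed {dop_changed:.1f}%, prion changed {prn_changed:.1f}%")
print(f"difference {rate_difference:.1f}, ratio {rate_ratio:.1f}x")
```

Only the rows against the reconstructed ancestor enter the rate; the species-to-species rows are not used in this calculation.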

Distant prion relative in Drosophila?

1 Dec 00 webmaster research
In the bad old days, before in silico hybridization, prion researchers looked in fruit fly and nematode for counterparts of the prion gene. This turned up various false matches reminiscent of the prion repeat, eg QGGWGG PQQQQGGG GWGQQGGG GQGGWGGPQ... in C elegans (accession S35500; NAR 20 400 1992). Earlier, Westaway and Prusiner claimed, wrongly as it turned out, that "DNA sequences related to PrP can be detected in a wide variety of organisms under relatively stringent conditions ... such as nematode, Drosophila and possibly yeast." [NAR 1986 Mar 11;14(5):2035] A. Raeber subsequently knocked in hamster prion to Drosophila, finding the GPI signal was still recognized [Mech Dev 1995 Jun;51(2-3):317-27], but never pursued it as a disease or normal function model.

As complete genomes of yeast, fly, and worm were completed, of course everyone looked for homologs but to no avail. With increasingly permissive blast conditions, matches can always be found, but it is time-consuming and generally unsatisfying to pursue weak candidates (eg, supporting secondary structure or disulfide).

For example, when the id code for prion protein (but not its gene sympol or name!) are used at the pre-blasted proteome of Drosophila (keeping in mind that an expert in Drosophila/human protein comparison would know how to best set blastp parameters http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/7227.html ) a match is reported, to Drosophila Dlc90F gene product AAF55532, 111 aa, chr 3R, dynein light chain:

        1 MDDSREESQF IVDDVSKTIK EAIETTIGGN AYQHDKVNNW TGQVVENCLT VLTKEQKPYK
       61 YIVTAMIMQK NGAGLHTASS CYWNNDTDGS CTVRWENKTM YCIVSVFGLA V
Now the webmaster had duly commented on this dismal distal match before, but who has time to build a case for:
Score = 31 Expect = 4.8  Identities = 18/69 (26%) Positives = 34/69 (49%) Gaps = 1/69 (1%)
Dlc90F:  1 MDDSREESQFIVDDVSKTIKE-AIETTIGGNAYQHDKVNNWTGQVVENCLTVLTKEQKPY KYIVTAMIM 68
           MD+   ++ F+ D V+ TIK+  + TT  G  +    V      V + C+T   +E + Y     ++M++
prion: 158 MDEYSNQNNFVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAY YQRGSSMVL 226
Now the function of this featureless microtubule motor protein in the Tctex1 superfamily is fairly well laid out at its SwissProt entry DYLX_DROME; it has a much better conserved full-length (48% identical) human counterpart t-complex-associated-testis-expressed factor P51808, 116 aa that is surely its ortholog. Blastp(pdb) does not show an available 3D structure. A intracellular protein would not conserve disulfide and glycan sites. There is no apparent connection to a recent pure 3'UTR dynein pseudogene downstream of doppel

As the large segment of finished and annotated chr 20p sequence flanking the prion-doppel genes became available on16-Nov-2000 (NT_011424 1,525,050 bp), another method became feasible to settle the matter of a prion gene counterpart in completed invertabrate genomes: comparison of corresponding chromosomal locations.

Now this might seem hopeless given that gene order has gotten very scrambled over the years by inversions, translocations, fusions, and deletions. Fortunately, human chr 20p has not been a party to this to any great extent, at least not over the last 450 million years to divergence with zebrafish, and local order at the level of a few genes is not affected by larger scale events.

As a practical matter, the best way of doing this is tblastn Drosophila genome against a half dozen concatenated, adjacent human protein query within which prion-doppel are embedded. Now most genes are part of extended superfamilies and a great deal of time has elapsed, meaning quite a few matches can be returned for each protein, most of which would not be syntenic.

Now one of the flanking genes, PCNA (proliferating cell nuclear antigen or cyclin, chain E in DNA replication polymerase delta, PCNA or MUS209), is surprisingly conserved with human-to-fly identities of 71% (to AAF57493), with potential for confusion with only one other superfamily member, AE003663 = CG10262 with identities 47%. (The match to nematode is 48%, to zebrafish 90%, to rat 98%.)

The other genes are not as nice but still quite managable: ADRA1a (adrenergic receptor alpha1a, large conserved family), KIAA0168 (RASSF2; CG4656 in fly 32% identical), SVCT2 (sodium-coupled vitamin C transporter 2 or solute carrier family 23, CG6293 in fly 44% identical)

Recall the webmaster previously established the gene order in vertabrates as goliath-genX-ADRA1a-Prnp-Prnd-genY-KIAA0168-SVCT2 -PCNA, not to mention various pseudogenes, all annotated at great length here earlier based on unfinished contigs.

What's the trick for finding a candidate syntenic region in fly, given long columns of paralogous and partial matches for each of 8 genes? Simply look for a universal genomic scaffolding number in the tblastn match annotations:

AE003523 genX genomic scaffold 142000013386050 section 45 of 54, Length = 283337
AE003770  ADRA1a genomic scaffold 142000013386035  section 95 of 105, Length = 224100
..................prion-doppel: expected position, not immediately seen
AE003742 KIAA0168 genomic scaffold 142000013386035 section 67 of 105, Length = 233747
AE003686 SVCT2 genomic scaffold 142000013386035 section 11 of 105, Length = 221888
AE003792 PCNA genomic scaffold 142000013386047 section 51 of 52, Length = 261900 too far over
While nowhere explained, scaffolding numbers somehow give nucleotides coordinates within the finished Drosophila genome; more useful accession numbers are similarly serially ordered. In other words, the prion-doppel gene,should be present within sections 67-095 of scaffold 142000013386035, ie flanked by SVCT2, KIAA0168 and ADRA1a with accession number between AE003770-AE003742. PCNA is nearby, but a little distal on another scaffold. Remarkably all 5 of the prion flanking genes are still found in the same small region within the Drosophila genome with fairly good retention of gene order.

All of these genes are found near the end of the 52 million bp Drosophila chromosome 3R. All of this is independent of the supposed blastp match of prion protein to Drosophila Dlc90F gene product. So here is a curious situation since the above weak match to prion protein is also found on chr 3R, indeed within AE003721 or slightly on the wrong side of its expected syntenic position. Looking up the exact coordinates in Kbp of each of the genes gives:

AE003523 CG6034  17,187-18,235 genX'
AE003680 CG8032  28,655-29,703 genX
AE003770 DOPR2   49,530-50,578 ADRA1a
...............................prion-doppel expected position
AE003742 CG4656  43,032-44,081 KIAA0168
AE003721 Dlc90F  38,236-39,285 supposed blastp prion match
AE003686 CG6293  30,281-31,330 SVCT2
AE003792 MUS209  41,792-42,762 on chr 2 PCNA
AE003663 CG10262 18,982-19,952 on chr 2 PCNA'
While this is certainly suggestive, there are quite a few intervening genes in fly that have no immediate positional human counterpart. Since the order is slightly off (a small inversion is needed), it is necessary to check which strand the genes are found on in both organisms, which is immediate for human but GenBank has not made so easy in fly.
    90774..120728 - RNF24 G1L goliath
   331767..344118 + FLJ20746 genX
   378150..406010 - ADRA1D ADRA1
   856270..857031 + PRNP CJD prion
   881601..882131 + PRND DPL doppel
   941322..958079 - RASSF2 KIAA0168
 1014021..1089611 - SVCT2 = SLC23A1
 1322999..1327514 - PCNA
The relative strand order in fly, given the putative inversion and holding ADRA1a and SVCT2 fixed anti-sense, needs have strand-reversal for KIAA0168 and the putative prion gene:
-AE003770 DOPR2   49,530-50,578 ADRA1a
............................... prion-doppel expected position
+AE003742 CG4656  43,032-44,081 KIAA0168
-AE003721 Dlc90F  38,236-39,285 supposed blastp prion match
-AE003686 CG6293  30,281-31,330 SVCT2
Looking at the coordinate distances in Kbp:
fly chr 3R
28655   29703           genX
49530   50578   20875   ADRA1a
43032   44081   6497    KIAA0168
38236   39285   4796    supposed blastp prion
30281   31330   7955    SVCT2

human chr 20p
331767  344118  223.4   genX
378150  406010  61.9    ADRA1D
856270  857031  451.0   PRNP
881601  882131  25.1    PRND
941322  958079  75.9    KIAA0168
While more could be done, what is really needed are some intermediate genomes.

1,525,050 bp human chr 20p contig released

Thu, 30 Nov 2000 GenBank NT_011424
The strech of human chromosome 20p containing the prion-doppel doublet is a rather pathetic stretch of chromosome, with only 8-9 genes coding 2923 amino acids in a stretch of DNA large enough to encode an entire autotrophic bacterial genome (1500 genes). At this rate, humans would only have 17,000 genes in the whole genome, less than a nematode.

The short arm, chromosome 20p, has 115 detected genes in 32 million bp; the entire chr 20 has 258 detected genes (click here for complete list with links to protein sequence) in 72,153,882 bp. This gives 3.6 genes per million bp, in other words, on average only one gene is found in 278,293 bp of genome. This is a very low gene density for an entire chromosome but is not all that different from completed chr 21 and 22.

However, annotators seem to have missed the gene downstream of doppel, failed to annotate any of the pseudogenes, and generally left a lot unassigned ESTs.

StartStopStrandLengthIntergenicGene ProductName
90,774120,728-9,984-RNF24G1L goliath-like
331,767344,118+4,117211,039FLJ20746polyamine oxidase-like
378,150406,010-9,28634,032ADRA1adrenergic alpha 1A
856,270857,031+253450,260PRNPprion
881,601882,131+17624,570PRNDdoppel
941,322958,079-5,58559,191RASSF2KIAA0168
1,014,0211,089,611-25,19655,942SLC23A1Na+ ascorbate transport
1,322,9991,327,514-1,505233,388PCNAproliferating cell nuclear antigen

It is instructive to blastn amd tblastn the 2923 aa of concatenated protein against other human chromosomes to look for ancestral synteny and translocations, against other genomes such as mouse (chr 20 corresponds in its entirety to mouse chr 2), or merely to blastn against dbEST to see which transcripts are not accounted for. The graphic below shows EST tBlastn matches of the concatenated proteins (telomere to centromere); genes with long 3'UTRs are not well represented since few reach the coding area.

UC Santa Cruz has walked away with top honors for displaying the whole human genome in a useful way. The Genome Browser site is far more effective than competitive implementations at NCBI and Ensembl. The UCSC site displays mRNAs, ESTs containing introns, other ESTs, Genie gene predictions, and exon models, all in track format at whatever resolution needed.

To extract ESTs, mRNAs and genes from the Genome Browser graphic, the webmaster recommends the following protocol:

 1: "view page source html" in the browser.  

2: Copy desired text lines out of html to a text editor, these contain bold-faced ALT, jump to first desired word.

3: replace all occurences of "> by </a><br>

4: replace occurences of Alt= " by ">

5: replace HREF=".. by ><A HREF="http://genome.cse.ucsc.edu/cgi-bin/

6: redisplay in the browser

7: copy-paste clean list to the final destination, which displays correctly despite some residual area shape code.
The raw DNA sequence can be obtained for the exactly the region desired, a great advantage over opening a whole-chromosome file and trying to find start and stop points within the sequence. However, the build dates to 5 Sep 00 whereas the sequencing has moved on. Using NCBI, the 1.5mbp is broken into contigs which may or may not cover the region being investigated:

GenBank NT_011424

AL365183.4: 82441-84931. . . gap(100)
AL365183.4: 80159-82340. . . gap(100)
AL365183.4: 58747-61034. . . gap(100)
AL365183.4: 52714-58646. . . gap(100)
AC021974.4: 3517-5802. . . gap(100)
AL365183.4: 24912-26976. . . gap(100)
AL353194.13: 1-56570
AL031670.6: 1-130163
AL365183.4: 5135-12387
AL356414.11: 7354-72540
AL121675.36: 101-113168
AL357040.12: 21285-36884. . . gap(100)
AL357040.12: 36985-213004
AL139350.17: 168765-181585
AL121781.38: 86506-178568
AL121916.14: 102-100167
AL109808.2: 69705-96084
AL133396.1: 1-122017 prion-doppel region
AL109808.2: 150871-150871
AL133354.14: 106-59726
AL445075.1: 101-5973
AL389886.10: 101-117824
AC068582.2: 32508-34816. . . gap(100)
AC068582.2: 94422-108651. . . gap(100)
AC068582.2: 69340-82363. . . gap(100)
AC021974.4: 101954-121327. . . gap(100)
AC021974.4: 25639-32062. . . gap(100)
AC021974.4: 1960-3416. . . gap(100)
AC021974.4: 1-1859. . . gap(100)
AL357040.12: 4500-7352. . . gap(100)
AL357040.12: 2225-4399. . . gap(100)
AL357040.12: 1-2124. . . gap(100)
AC068582.2: 82464-94321
AL121890.32: 2039-156810. . . gap(100)
AC016073.2: 148595-181121
AL121755.23: 3301-143892


 


 


 

Distinct Dutch 7x repeat CJD

Neurology 2000;55:1055-1057  free full text 
B. van Harten, W. A. van Gool, I. M. van Langen, J. M. Deekman, P. H. S. Meijerink,  and H. C. Weinstein 
Comment (webmaster):

Another human family, here Dutch, with 7 octapeptide repeats and CJD raises the steaks for the breed of cow discovered to have 5+2 octapeptide repeats (thus a candidate for inbred BSE, is Prionics testing these?) The sequence was different from the previous known human 2x; the mutant prion had sequence R122a22a2a4 vs R12232a2a4 for the American family. The known distinct extra repeat situations now number 31.

R122 32a2a4 US
R122a22a2a4 Dutch

R2:   CCT CAT ... GGT GGT GGC TGG GGG CAG
R2a:  CCT CAT ... GGT GGT GGC TGG GGA CAG
R3g:  CCC CAT ... GGT GGT GGC TGG GGG CAG
R3:   CCC CAT ... GGT GGT GGC TGG GGA CAG

In addition to the repeat discussed here, a 4x R1223g23g234 met/met was reported earlier this summer by Rossi et al. Neurology 2000;55:405-410. Dermaut B et al. J Neurol 2000 May;247(5):364-8 report a novel 7x repear with some loose ends still on micro-variations.

This has gotten to be quite a remarkable set of data -- no other human gene mutation set is quite like it. Because of the amino acid composition (high glycine), the GC composition and resulting hairpin potential are high -- the repeat region amounts to imperfect microsatelites within the imperfect 24mer repeat structure.

Loosely, once replication slippage gets the repeats in staggered hybridization, an out-of-control situation develops with a great many possible outcomes (all seemingly equally likely) before the polymerase gets back on track. The accompaning point mutations quite restricted, with R2a and R3g upon closer examination, instead chimeric R2/R3 and R3/R2 respectively.

Despite the variability, some hard rules apply: The first (9 bp) and final repeats are never repeated, never modified, and always present in their standard position. The second repeat is often repeated, never modified, and always present in its usual spot. These follow immediately from the slippage mechanism and are valid as well in animals (where many slippage events have also been documented). Repeat R3 almost always occupies the penultimate runout position (R234), R2a can occur here but never R3g despite being more common, 14:22, consistent with R2a interpreted as ordered chimeric halves of R2 and R3.

With the single exception of R2c, the modified bases are restricted to R2a and R3g. This argues against strongly against random replication errors associated with the conformational strain of slippage. These may result instead from base repair of R2 and R3 slightly mismatched during slippage alignment; these differ only in the 3rd and 21st base pairs, precisely the modifications observed.

The set of mutations is far from saturated -- it is rare to see the same one reported twice -- so the eventual set of these might number in the hundreds. (The mechanism cannot be recombination because of the lack of reciprocal recombinants for the single repeat deletion, which is a 1-2% polymorphism with a great many kindreds.)

30 known repeat insertions
1.2.2.3.4
1.2.2.2.3.4
1.2.2a.2.2a.2a.4
1.2.2.3.2a.2a.4
1.2.2.3.2.2.2.3.4
1.2.2.3.2.3.2.3.4
1.2.2.2.2.2.2.3.4
1.2.2.2.2.2.2.3.4
1.2.2.3g.2.2.2.3.4
1.2.2.3g.2.3g.2.3.4
1.2.2.3g.3g.3g.2.2.3.4
1.2.2.3.2.2.2a.2.3.4
1.2.2.3.2.3g.2.2.3.4
1.2.2.2a.2.2a.2.2.3.4
1.2.2.3.2.2.2.2.3.4
1.2.2.3.2a.2.2a.2.2.4
1.2.2.2.3.2.3g.2.2.3.4
1.2.2.3.2.3g.2.3g.2.3.4
1.2.2.2.2.2.2.2.2.3.4
1.2.2.3g.2.2.3g.2.2.3.4
1.2.2.3g.2.2.3g.2.2.3.4
1.2.2.3g.2.2.2.2.2.3.4
1.2.2.2.2.2.2.2.3g.3.4
1.2.2c3.2.3.2.3.2.3g.3.4
1.2.2.3.2.2.2.3g.2.2.3.4
1.2.2.3.2.2.2.2a.2.2.2.3.4
1.2.2.3g.3.2.2.2.2.2.2.3.4
1.2.2.3.2.2.2.2.2.2.2.2a.4
1.2.2.3.2.3g.2a.2.2.2.3g.2.3.4
1.2.2.3.2.3.3g.2.2a.2.3.2.3.4

composition
1   30
2  163
3   47
4   30
2a  14
3g  22
2c   1

A new mutation in the prion protein gene: A patient with dementia and white matter changes

Clinical characteristics, MRI abnormalities, and molecular findings in a Dutch patient with a new two-octarepeat insertion mutation in the prion protein gene. This patient presented with moderately progressive dementia of presenile onset and gait ataxia. MRI showed extensive cortical atrophy and white matter abnormalities. The mutation consists of a two-octarepeat insertion mutation and irregularities in the nucleotide sequence of the octarepeat region.

Individual II-1 (index patient, 2x insert, val/val): In 1991, a 64-year-old woman with progressive memory impairment for 3 years was seen at the outpatient clinic of another hospital. . Her gait was unsteady. CT showed severe cerebral atrophy and bilateral frontal leukoencephalopathy. A diagnosis of probable AD was initially made. With a moderately progressive dementia of presenile onset, the clinical characteristics of this patient are consistent with those of patients with inherited prion disease reported previously.

We saw the patient 3 years later when she was admitted to a psychiatric clinic for observation of behavioral problems. At that time her attention and concentration were impaired, she was dysarthric and used semantic paraphasias, her orientation to time and place was disturbed, the fluency of her speech was reduced, and she had agraphia, apraxia, and acalculia. Positive pseudobulbar reflexes were noted, and forced laughing and crying were present. Her gait was ataxic.

Vascular dementia, including cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy, was considered in the differential diagnosis. However, the gradual course rather than a stepwise deterioration, the cortical symptoms, and the absence of cortical or lacunar infarcts on neuroimaging argued against this diagnosis. The presence of a possible positive family history for dementia and the gait disorder did raise a suspicion of human prion disease. The patient died in 1995, almost 7 years after the onset of her illness. An autopsy was not performed.

I-1 (prion gene not studied) The father of index patient was referred to a psychiatric clinic with memory problems and agitation when he was 60 years old. Physical examination revealed dysarthria, anomia, disorientation, and short-term memory problems. Additional neurologic examination results were normal. Serum investigation revealed a disturbed renal function. The patient died 7 months after the onset of his illness. No information about his family history could be obtained.

Individual I-2 (prion gene not studied). At the age of 52 mother of index patient was readmitted with a severe dementia syndrome. She had been diagnosed with hypertension and diabetes mellitus. On neurologic examination she was hallucinating, had a dementia syndrome with cortical features, and had an ataxic gait. A diagnosis of hypertensive encephalopathy was made. Two of her siblings died during childhood, and five other siblings died without signs of a neurodegenerative disease at the ages of 63, 83, 83, 85, and 99 years.

11-2 (prion gene not studied) died of meningitis, age 23

Individual II-3 (normal prion gene). At the age of 49 years, the patientıs brother complained of memory disturbances. Neuropsychological investigation revealed slowness of visual and psychomotor functions, and verbal memory dysfunction for new and unstructured material. These deficits were attributed to a depression. The patient died at age 54 as a result of mesothelioma.

In the family described by Goldfarb LG et al Neurology 1993; 43: 2392­2394 the proband had a history of a rapidly progressive cortical dementia developing from a mutistic to a comatose state, and died 3 months after the onset of her illness. However, the mother of this patient had a two-repeat octapeptide mutation and demonstrated a very gradually progressive global dementia in 13 years. In a population of approximately 15 million people in the Netherlands, the only other known Dutch mutation is 8x.

Prion protein devoid of the octapeptide repeat region restores susceptibility to scrapie in PrP knockout mice.

Neuron 2000 Aug;27(2):399-408
Flechsig E, Shmerling D, Hegyi I, Raeber AJ, Fischer M, Cozzio A, von Mering C, Aguzzi A, Weissmann C
Comment (webmaster):

This is quite an interesting finding, especially the infectivity and disease presentation. Hopefully they determined the amino terminus of what protease-resistant PrP there was and reported incubation times for second passage in both kinds of mice.

One cautionary note: there are 28 well-conserved residues between the signal peptide and start of the repeat region. These cannot be assigned a role in copper binding at this time and indeed there is not even a proposal on the table that accounts for their observed conservation. So it is not just the copper domain that is missing in these truncated transgenic mice.

"Mice devoid of PrP are resistant to scrapie and fail to replicate the agent. Introduction of transgenes expressing PrP into such mice restores susceptibility to scrapie.

We find that truncated PrP devoid of the five copper binding octarepeats still sustains scrapie infection; however, incubation times are longer and prion titers and protease-resistant PrP are about 30-fold lower than in wild-type mice. Surprisingly, brains of terminally ill animals show no histopathology typical for scrapie. However, in the spinal cord, infectivity, gliosis, and motor neuron loss are as in scrapie-infected wild-type controls.

Thus, while the region comprising the octarepeats is not essential for mediating pathogenesis and prion replication, it modulates the extent of these events and of disease presentation."

Displaying residue conservation in the prion protein

05 Nov 00 webmaster research  [graphics to be supplied shortly]
Awash in genomic information, the question is how to effectively display it. No one wants to look at hundreds of pages of raw sequence or Blast output: yes, the information is all there, no the salient points to not jump out of text. Most information can be effectively conveyed through registered track or bands representing some information or analysis aligned with the linear gene or protein sequence from which it is derived.

For example, few people want to look at 3.1 billion base pairs of human genome sequence; the exonic structure of genes, mRNA expression data, cross-species homologies, alternate splicing, assembly contigs, retrotransposons, etc. are effectively displayed as tracks.

Genebander, a versatile web track tool developed on this site, allows a great variety of track input data, giving full control over track color, size, and stacking order. Many dozen tracks have been given on this site for prion and doppel.

Some important information, though inherently a 3-dimensional property of the folded protein and only calculable there, can still be represented effectively as a track. For example, percent exposure to solvent , can only be computed in the folded protein (say in 10% bins), but it easily displays in a linear track by spectral coloring. Residue proximity, say all side chains within 5 angstroms of a given beta carbon, requires a square stack of tracks, one for each residue's neighborhood. In conjunction with a track of all known point mutations, information can be correlated in ways that are difficult to attain in 3-dimensional displays

How might the degree of evolutionary conservation of each amino acid in the prion gene be effectively displayed, both as track and as 3D? First, some semi-quantitative measure needs to be chosen and deployed against a curated database of prion protein sequences from all species available.

The data is non-uniform because only gene fragments were sequenced in many instances; avian prions have regions of non-alignability with mammal, doppel is only alignable in mid to distal regions because of missing proximal domains, and overall taxon sampling was chaotic. Note too that only the first and last octapeptide repeats can be included -- internal repeats are over-written during evolution and thus are not homologous cross-species; the hexapeptide bird repeat, is further only analagous to mammalian octapeptide, despite compositional, structural, and likely functional similarites.

The observed degree of conservation of a given amino acid reflects many possible considerations, including close packing in the hydrophoic interior, salt bridges, buried hydrogen bonds, necessity for metal binding or an active site, inter-protein binding sites, tight turns, rigidity, appropriateness to secondary structure, side chain size, charge, polarity, hydrophobicity, etc.

Further, the degree of conservation is not always a property of an individual residue nor even of its local neighborhood in the linear sequence; it may be strongly co-evolving with a remote residue or region that are adjacent (or somehow coupled) in the 3D structure. For example, an glutamic acid, if involved in a buried salt bridge with a remote lysine, can hardly mutate into an arginine even though on the surface of the protein, charged-to-charged might well be a mild change. In conservative change scoring systems such as PAM matrices, surface and interior modalities of change to glutamic acid etc. are lumped, inappropriately to a situation with known 3D structure and relations.

The expert system here uses all available data to assign relative ranks (roughly proportional to evolutionary rates) from 1-10 for each of the 253 amino acids of the human prion protein according to these criteria:

10: absolute invariant in all known mammalian, avian and turtle prions and doppels
 9: conservative substitutions, alignable in all known mammalian avian and turtle prions and doppels
 8: absolute invariant in all known mammalian and avian prions, doppels not applicable
 7: conservative substitutions, alignable in all known mammalian and avian prions, doppels not applicable
 6: absolute invariant in all known mammalian prions, marsupial conservative
 5: conservative or toggle substitutions, alignable in all known mammalian prions
 4: small number of older synapomorphies associated with substantial evolutionary clade
 3: recent, phylogenetically scattered, alleles within species
 2: variable but limited in range to a column of genetic code
 1: unconstrained residue, non-conservative substitutions or small indels in closely related species
Almost need first level of determining phylogenetic depth of alignability, then fine-tune by breadth of variability. Suppress singletons, except key long branches such as marsupial, turtle, avian signal. Need to give different considerations to different domains in signal and gpi region as have known universal constraints.

Once each residue is evaluated, the rank numbers are recast proportionally as 8-bit grayscale values from 0-255 and used to color a graphic track corresponding to linear position in the protein. This grayscale graphic can then be made more visually accessible by thermal or blackbody coloring. Convolution smoothing (moving average window) optionally draws attention to local and regional trends. For 3D display, the pdf coordinates are rebuilt with conservation rank on a 0 to 1.0 scale replacing the B scale (temperature) of the pdf file, allowing display in 3D in molecular viewers such as SwissProtView permitting coloration by B scale.

Doppel Bibliography

Medline search term: (Prnd OR PrP-like OR Doppel OR PrPLP/Dpl OR prion-like) NOT yeast
Note prnD was in prior use for a gene in proline operon.

Perspectives: neurobiology. PrP's double causes trouble.

Science. 1999 Oct 29;286(5441):914-5.
Weissmann C, Aguzzi A

Physiological expression of the gene for PrP-like protein, PrPLP/Dpl. .

Am J Pathol 2000 Nov;157(5):1447-52
Li A, Sakaguchi S, Shigematsu K, Atarashi R, Roy BC, Nakaoke R, Arima K, Okimura N, Kopacek J, Katamine S
. .In adult wild-type mice, PrPLP/Dpl mRNA was physiologically expressed at a high level by testis and heart, but was barely detectable in brain. However, transient expression of PrPLP/Dpl mRNA was detectable by Northern blotting in the brain of neonatal wild-type mice, showing maximal expression around 1 week after birth. In situ hybridization paired with immunohistochemistry using anti-factor VIII serum identified brain endothelial cells as expressing the transcripts. Moreover, in the neonatal wild-type mice, the PrPLP/Dpl mRNA colocalized with factor VIII immunoreactivities in spleen and was detectable on capillaries in lamina propria mucosa of gut. These findings suggested a role of PrPLP/Dpl in angiogenesis, in particular blood-brain barrier maturation in the central nervous system. ..

Identification of a novel gene encoding a PrP-like protein expressed as chimeric transcripts fused to PrP exon 1/2 in ataxic mouse line with a disrupted PrP gene.

Cell Mol Neurobiol 2000 Oct;20(5):553-67.
Li A, Sakaguchi S, Atarashi R, Roy BC, Nakaoke R, Arima K, Okimura N, Kopacek J, Shigematsu K.
. . Here, we identified aberrant mRNA species in the brain of Ngsk Prnp0/0 ataxic, but not in nonataxic Zrch Prnp0/0 mouse line. These mRNAs were chimeric between the noncoding exons 1 and 2 of the PrP gene (Prnp) and the novel sequence encoding PrP-like protein (PrPLP), a putative membrane glycoprotein with 23% identity to PrP(C) in the primary amino acid structure. . . In the brain of wild-type and Zrch Prnp0/0 mice, PrPLP mRNA was barely detectable. In contrast, in the brain of Ngsk Prnp0/0 mice, PrP/PrPLP chimeric mRNAs were expressed in neurons, at a particularly high level in hippocampus pyramidal cells and Purkinje cells under the control of the Prnp promoter. ..

A mouse prion protein transgene rescues mice deficient for the prion protein gene from purkinje cell degeneration and demyelination.

Lab Invest. 1999 Jun;79(6):689-97.  .
Nishida N, Tremblay P, Sugimoto T, Shigematsu K, . .Sakaguchi S, DeArmond SJ, Prusiner SB, Katamine S.

[Physiopathology and molecular diagnosis for prion diseases].

Rinsho Byori 2000 May;48(5):437-41  [Article in Japanese]
Katamine S
. . The PrP-null mice (Ngsk Prnp0/0) revealed progressive ataxia due to the degeneration of cellebellar Purkinje cells at old ages. Successful rescue of Ngsk Prnp0/0 mice from neurodegeneration by a transgene encoding the normal mouse PrPC has indicated that the functional loss of PrPC is essential for this phenotype. Moreover, we detected aberrant mRNAs chimeric between Prnp exon 1-2 and a novel gene encoding PrP-like protein (PrPLP); ectopic expression of the PrPLP in the brain of Ngsk Prnp0/0 mice could be associated with Purkinje cell degeneration.

Doppel is an N-glycosylated, gpi-anchored protein expressed in testis. .

J Biol Chem. 2000 Sep 1;275(35):26834-41
Silverman GL, Qin K, Moore RC, Yang Y, Mastrangelo P, Tremblay P, Prusiner SB, Cohen FE, Westaway D.

Expression and structural characterization of the recombinant human doppel protein.

Biochemistry. 2000 Nov 7;39(44):13575-83.
Lu K, Wang W, Xie Z, Wong BS, Li R, Petersen RB, Sy MS, Chen SG.

Ataxia in prion protein (PrP)-deficient mice is associated with upregulation of the novel PrP-like protein doppel.

J Mol Biol. 1999 Oct 1;292(4):797-817  
Moore RC, Lee IY, Silverman GL, Harrison PM, . .Hood LE, Westaway D.

First report of polymorphisms in the prion-like protein gene (PRND): implications for human prion diseases.

Neurosci Lett. 2000 Jun 2;286(2):144-8.
Peoc'h K, Guerin C, Brandel JP, Launay JM, Laplanche JL.                 

Examination of the human prion protein-like gene doppel for genetic susceptibility to sporadic and variant CJD

Neurosci Lett. 2000 Aug 25;290(2):117-20
Mead S, Beck J, Dickinson A, Fisher EM, Collinge J




Displaying residue conservation in the prion protein

Mad Cow Home . . Best Links . . Contact Researcher . . Science Index