Doppel Prion Upregulated in Knockouts

Doppel prion upregulated in knockouts
Ancestral origins of E200K -- better chr 20 microsatellites now available
Did a tandem gene doubling create the prion superfamily?
Details on 4th gene: RPS4X pseudogene
Massive parasitism of chromosome 20
Have any genes been missed?
Novel approach to monoclonal antibodies
Glycotype analysis in familial and sporadic CJD patients
Unusual repeat in lemur prion

Doppel prion upregulated in knockouts

J. Mol. Biol. (1999) In press [Sept 24, 1999  or later]
Guest access to full text pdf now available 817k 
Moore,R.C., Lee,I.Y., Silverman,G.L., Harrison,P.M., Strome,R., Heinrich,C., 
Karunaratne,A., Pasternak,S.H., Chishti,M.A., Liang,Y., Mastrangelo,P., Wang,K., 
Smit,A.F.A., Katamine,S.,  Carlson,G.A., Cohen,F.E., Prusiner,S.B., Melton,D.W., 
Tremblay,P., Hood,L.E. and Westaway,D.
On 8 Sep 99 the GenBank entry for long incubation mouse U29187 was updated yet again. This time the provocative title and authors of the unavailable JMB article were disclosed:

"Ataxia in Prion Protein (PrP) Deficient Mice is Associated with Upregulation of the Novel PrP-like Protein Doppel"

The first author, RC Moore, is known for precise knockouts/knockins in mice, most notably the study that resolved longstanding confusion over linc/sinc alleles in mice in favor of the two amino acid substitutions only (but whether one or both is crucial is not published).

From the title, it appears that earlier published studies on knockout mice are problematical until it is established whether strain construction disrupted _both_ the prion and prion-doppel genes or just the prion gene, because the ghost prion evidently has compensatory (rather anti-compensatory) abilities. It seems possible from the title that double knockouts might exhibit less severe symptoms. The result also throws a monkey wrench into plans to breed livestock incapable of sustaining TSE via prion gene knockouts.

Note the title doesn't say that the prnd upregulation actually causes the ataxia; perhaps a mouse strain developing ataxia had upregulation but a strain not exhibiting ataxia did not have upregulation. (One hopes controls show upregulation of prnd is narrow and specific, ie, not one of hundreds of miscellaneous genes upregulated and that mRNA quantity is a reasonable proxy for prnd protein.) The ataxia is unlikely to be a TSE (though ataxia is often observed there); it may be a neurological disorder with a completely different mechanism -- obviously there is no prion amyloid possible in a knockout.

In other words, the title suggests that the prnd gene can sense when the prion gene is not doing its job and tries to compensate for its absence. This raises the question of what enables the feedback cascade and how it works. Need the prion gene be deleted to get the effect, or could a subtle inactivating point mutation also do this? There must exist natural or environmental circumstances under which prnd is up-regulated (or down-regulated) -- the gene surely does not lie in wait for one-in-a-trillion deletion events in prnp. The other half of the co-regulation equation is that prnp itself may be down-regulated under some circumstances as the cell adjusts the mix of these proteins to meet conditions.

What then happens with prnd regulation in familial CJD? Are there common human or mouse polymorphisms of prnd, what sort of diseases if any do mutations in prnd cause (eg, known but unmapped dementias or ataxias), and how do ghost prion gene variants interact with prion alleles and diseases? It seems like sporadic CJD, nvCJD, scrapie, and BSE will all have to be re-examined for ghost prion alleles and mutations, including those of regulatory type. (Codon 129 has no identifiable counterpart in prnd as yet; alignment with prnp only becomes reliable 11 aa later, but this could change if more species of prnd became available.)

Note that regulation of prnd in mouse could involve alternate splicing, favoring exon 2a or exon 2b depending on stage of development, tissue, and conditions. The much more rapid (2.4x) acceptance of point mutations in ghost prion suggests that prnp is gradually taking over its role or that it has become an auxiliary protein helpful under special conditions.

Thus the plot has really thickened -- prion and prnd are only 22% identical in sequence, ie, faint paralogues. The prnd gene is adjacent and on the same strand in both human and mouse, with inter-CDS distances (stop to met) of 14,472 bp and 15,771 bp respectively, and no intervening genes. Although the ghost prion gene was known to be transcribed (represented in dbEST), it seemed little used. However, the heavily weighted conserved TATA box observed by the webmaster in aligning promoter regions of human and mouse prnd shows a classic RNA polymerase II positive regulatory signal [lacking in constitutive prion promoter], so perhaps prnd is poorly represented in the ESTs because conditions for induction were absent in the tissues where cDNA was prepared.

The 3D structure of the new protein, as noted earlier by the webmaster, is largely predictable because key anchor regions, such as the underpass and disulphide/first glycosylation regions, are intact (these fix the ghost protein's interior) and, while weak, the homology is far more than adequate to guarantee conserved structural folds. The homology only extends post-repeat, post 106-126, which fortunately was the domain determined by NMR. The function of the globular domain could thus be somewhat conserved and a direct physical interaction (heterodimer) seems possible as well (though this should have been detected in yeast two-hybrid screens; the ill-characterized protein X should be considered here as well). The two proteins must have interactions in common to explain why an ancient paralogue is not equally remote to avian prions.

What seems to be established is that the normal functions of these two proteins are not so diverged as to make them unrelated or uncoupled. It is all a little reminiscent of the contiguous hemoglobin genes and compensation for myoglobin knockouts. Are prnp and prnd just the start of a similarly large protein superfamily? The rapid progress in sequencing chromosome 20 and the rest of the human genome will settle this issue shortly.

The ghost prion is on the one hand an unwelcome complexity raising troubling questions about conclusions in hundreds of earlier prion papers. On the other hand, it affords new opportunities to understand normal prion domain function, its historical origin, and relation to other genes. While many other amyloid diseases (notably Alzheimer) implicate several genes, none have involved the interplay of tandem paralogues.

Prnd is unlikely in and of itself to cause amyloid disease because it lacks the crucial 106-126 cross-beta region. It is amyloid formation that makes familial CJD autosomal dominant; this is a gain-of-toxic-function disease, not a loss of normal function. Mutations in doppel prion might thus be recessive because a single functioning allele could respond adequately (or even normally) to an ongoing signal at its TATA promoter. If so, prnd disease could be extremely rare because only homozygotes would be affected (not their offspring who are just carriers). For this reason, it is not surprising that mutations in CJD kindreds always map to the prnp gene, not prnd. But one wonders about sporadic CJD modulatory effects in prnd heterozygotes. And perhaps zitter rats need to be revisited.

Does the ghost prion gene complicate efforts to find therapies for CJD? Yes. Some therapies envision knocking out or knocking back the amount of prion protein produced, so that the brain can clear amyloid faster than it is produced. Genetically engineered cows and sheep lacking the prion gene are sometimes put forward as a permanent solution to scrapie and BSE. The lesson from the knockout mice is that a novel and unacceptable ataxia might result in place of CJD. Knockouts of both the prion and ghost prion gene might be lethal (as once was expected for the highly conserved prion gene).

Medline history of ataxia in prion knockout mice:

A mouse prion protein transgene rescues mice deficient for the prion protein gene from purkinje cell degeneration and demyelination.

Lab Invest 1999 Jun;79(6):689-97
Nishida N, Tremblay P, Sugimoto T, Shigematsu K, Shirabe S, Petromilli C, Erpel SP, 
Nakaoke R, Atarashi R, Houtani T, Torchia M, Sakaguchi S, DeArmond SJ, Prusiner SB, Katamine S
Disruption of both alleles of the prion protein gene, Prnp, renders mice resistant to prions; in a Prnp o/o line, mice progressively developed ataxia and Purkinje cell loss. Here we report torpedo-like axonal swellings associated with residual Purkinje cells in Prnp o/o mice, and we demonstrate abnormal myelination in the spinal cord and peripheral nerves in mice from two independently established Prnp o/o lines. Mice were successfully rescued from both demyelination and Purkinje cell degeneration by introduction of a transgene encoding wild-type mouse cellular prion protein. These findings suggest that cellular prion protein expression may be necessary to maintain the integrity of the nervous system.

A 2-year longitudinal study of swimming navigation in mice devoid of the prion protein: no evidence for neurological anomalies or spatial learning impairments.

Lipp HP, Stagliar-Bozicevic M, Fischer M, Wolfer DP
Behav Brain Res 1998 Sep;95(1):47-54
Uncontrolled accumulation of a conformationally distorted protein (PrP(Sc)) is supposed to be the pathological process leading to spongiform encephalopathy. Targeted disruptions of the Prn-P gene in the mouse have resulted in animals that did not show anomalies in spatial and avoidance learning and were resistant to experimental infections. However, another Prn-P knockout mouse was reported to show ataxia and Purkinje cell degeneration developing after 70 weeks of age. In this study the initial observations are confirmed on swimming navigation of PrP-null mutant mice using an enlarged sample of 58 mice. A representative subsample of 16 mice was then followed up for their ability of swimming navigation up to an age of two years (104 weeks). Surviving PrP-null mutants (n = 4) and controls (n = 6) did not differ in any measure, nor were there indications of ataxia and Purkinje cell degeneration. It was concluded that the PrP-knockout mice used by Bueler et al. were probably normal with respect to aging processes and that resistance to scrapie is not necessarily paid for by late neuronal degeneration. The reasons for the discrepancy between different knockout experiments require experimental clarification, however.

Expression of amino-terminally truncated PrP in the mouse leading to ataxia and specific cerebellar lesions.

Shmerling D, Hegyi I, Fischer M, Blattler T, Brandner S, Gotz J, Rulicke T, Flechsig E, Cozzio A, von Mering C, Hangartner C, Aguzzi A, Weissmann C
Cell 1998 Apr 17;93(2):203-14 
The physiological role of prion protein (PrP) remains unknown. Mice devoid of PrP develop normally but are resistant to scrapie; introduction of a PrP transgene restores susceptibility to the disease. To identify the regions of PrP necessary for this activity, we prepared PrP knockout mice expressing PrPs with amino-proximal deletions. Surprisingly, PrP lacking residues 32-121 or 32-134, but not with shorter deletions, caused severe ataxia and neuronal death limited to the granular layer of the cerebellum as early as 1-3 months after birth. The defect was completely abolished by introducing one copy of a wild-type PrP gene. We speculate that these truncated PrPs may be nonfunctional and compete with some other molecule with a PrP-like function for a common ligand.

Loss of cerebellar Purkinje cells in aged mice homozygous for a disrupted PrP gene.

Nature 1996 Apr 11;380(6574):528-31
Sakaguchi S, Katamine S, Nishida N, Moriuchi R, Shigematsu K, Sugimoto T, Nakatani A, Kataoka Y, Houtani T, Shirabe S, Okada H, Hasegawa S, Miyamoto T, Noda T
Prion protein (PrP) is a glycoprotein constitutively expressed on the neuronal cell surface. A protease-resistant isoform of prion protein is implicated in the pathogenesis of a series of transmissible spongiform encephalopathies. We have developed a line of mice homozygous for a disrupted PrP gene in which the whole PrP-coding sequence is replaced by a drug-resistant gene. In keeping with previous results, we find that homozygous loss of the PrP gene has no deleterious effect on the development of these mice and renders them resistant to prion. The PrP-null mice grew normally after birth, but at about 70 weeks of age all began to show progressive symptoms of ataxia. Impaired motor coordination in these ataxic mice was evident in a rotorod test. Pathological examination revealed an extensive loss of Purkinje cells in the vast majority of cerebellar folia, suggesting that PrP plays a role in the long-term survival of Purkinje neurons.

Ancestral origins and worldwide distribution of the prnp E200K mutation causing familial CJD.

Am J Hum Genet 1999 Apr;64(4):1063-1070
Lee HS, Sambuughin N, Cervenakova L, Chapman J, Pocchiari M, Litvak S, Qi HY, Budka H, del Ser T, Furukawa H, Brown P, Gajdusek DC, Long JC, Korczyn AD, Goldfarb LG
CJD belongs to a group of prion diseases that may be infectious, sporadic, or hereditary. The 200K point mutation in the PRNP gene is the most frequent cause of hereditary CJD, accounting for over 70% of families with CJD worldwide.

Prevalence of the 200K variant of familial CJD is especially high in Slovakia, Chile, and Italy, and among populations of Libyan and Tunisian Jews. To study ancestral origins of the 200K mutation-associated chromosomes, we selected microsatellite markers flanking the PRNP gene on chromosome 20p12-pter and an intragenic single-nucleotide polymorphism at the PRNP codon 129.

Haplotypes were constructed for 62 CJD families originating from 11 world populations. The results show that Libyan, Tunisian, Italian, Chilean, and Spanish families share a major haplotype, suggesting that the 200K mutation may have originated from a single mutational event, perhaps in Spain, and spread to all these populations with Sephardic migrants expelled from Spain in the Middle Ages. Slovakian families and a family of Polish origin show another unique haplotype.

The haplotypes in families from Germany, Sicily, Austria, and Japan are different from the Mediterranean or eastern European haplotypes. On the basis of this study, we conclude that founder effect and independent mutational events are responsible for the current geographic distribution of hereditary CJD associated with the 200K mutation.

Comment (webmaster):
This was a good effort that brings CJD mutations closer to a par with what is done in other diseases. Because of ambiguities and recombination, they did not quite settle the original objection to the model, namely why does E200K not show up in the Netherlands and Turkey which also received large numbers of Sephardic immigrants. Although formulas exist for extracting the date of the original mutation, no real estimate was made here other than Middle Ages.

CJD is an interesting disease genetically in that onset is so late that it has little effect on reproductive success, especially in eras where the lifespan was in the forties. For that reason, it can and does persist for many generations without any clear selective pressure for elimination. However, no really old kindreds are known in CJD: a few hundred years at most (unless val 129, a European non-CpG allele, is considered a huge and ancient kindred).

Similar studies could be done for all the other CJD point mutations. The effect of identifying distant and unsuspected kindreds can only be to reduce the number of independent mutations at each locus, whether seeming unrelatedness arises through extra-marital affairs or geographical migration. However, many of the mutation sites (eg, CpG hotspots) will prove to have more than one ultimate kindred. The bottom line is that familial CJD (usually defined as prion ORF mutation) is a much smaller fraction of total CJD if each kindred is only counted once, perhaps only 2-3%.

This paper was submitted 13 Nov 98 and the microsatellite markers used of course had to be selected many months earlier. So the authors cannot be faulted for not using September 1999 results from the Sanger Centre chromosome 20 sequencing team. But if this study were to be repeated today (for this or another allele), how would the microsatellites be chosen differently?

Since linkage disequilibrium is sought, markers should be more tightly linked to the prion gene itself. Here the closest markers were millions of base pairs away from the prion gene, greatly increasing the risk of recombination and loss of the microsatellite signature of the original mutation carrier. Markers used in the paper are given below; MIT, Sanger Centre, or chr 20 markers developed at this web site are intercalated as indents. [Note: the NCBI radiation hybrid map has the prion gene placed erroneously and is not used further here; the Sanger Centre map of chr_20ctg440 is upside-down. Marker SG13212 is actually a common Mer2 retrotransposon and maps spuriously to chr_20ctg440.]

telomere...
D20S842
D20S181
..WI-8798 
..WI-9015 
..WI-5517 
D20S193
..CHLC.ATA21E04 
D20S473
D20S867
..WI-4876 
D20S889
D20S116           
..WI-3772        [STS for dJ1115K8 in shotgun ]
D20S97           [not in dJ1068H6]
D20S482          [now called GATA51D03, not in dJ1068H6]
..WI-2640        [D20S500, could match dJ1068H6 but does not] 
..WI-3651        [D20S1095 or G13331. IDI isomerase pseudogene dJ1068H6.01598 11694-11887 minus strand]
..GDB:513003     [prion est dJ1068H6.01172  43200-45527]
WI-7784          [D20S1014, prion gene CDS dJ1068H6.01172  43966-45571] 
..g29963         [prion est dJ1068H6.01172 45222-45562; 27468-27808 of human prion U29185]
..RPS4X          [mRNA for RPS4X matches and overhangs the 5' end of the contig dJ1068H6]
..FB25H5         [T03153, does not match dJ1068H6, nearest known marker 5' of this region, fetal brain mRNA]
D20S895          [not in dJ1068H6]
..WI-4689        [D20S751]
D20S849          [not in dJ1068H6]
D20S882          [not in dJ1068H6]
..D20S95         [not in dJ1068H6]
...centromere

U. Southampton map
Locus           Location
WI-17681  1.342  
STSG29963 1.342
D20S116   1.342    
STSG408   1.342  
STSG20158 1.342  
STSG8000  1.342  
STSG30106 1.354      
A005O05   1.361 
A002D12   1.361  
H22126    1.373     
PM1146    1.374    
D20S742   1.413   
STSG20381 1.423
A009O14   1.467
D20S97    1.483
A005R07   1.519    
AA026396  1.519    
D20S1149  1.60
CENPB     1.637      
D20S482   1.681 
D20S437   1.867    
D20S500   1.88
PRNP      2.079  
STSG8643  2.097
STSG10203 2.097  
D20S168   2.12 
WI-17673  2.125      
PCNA      2.125 
SHGC-1691 2.125 
STSG30431 2.125
SGC34960  2.125   
D20S835   2.125
STSG29447 2.125 
R79078    2.125 
D20S895   2.251     
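
For choosing markers tightly linked to the prion gene, the map above can be ranked by distance from PRNP. A minimal sketch in Python, assuming the Location column is a single linear coordinate (the table does not state its units) and using only a subset of the rows; the function name `nearest_markers` is illustrative:

```python
# Subset of the U. Southampton map above: locus -> location.
# The units of the Location column are not given in the source table.
MAP = {
    "D20S116": 1.342,
    "D20S97": 1.483,
    "D20S482": 1.681,
    "D20S437": 1.867,
    "D20S500": 1.88,
    "PRNP": 2.079,
    "STSG8643": 2.097,
    "D20S168": 2.12,
    "PCNA": 2.125,
    "D20S895": 2.251,
}


def nearest_markers(locations, target="PRNP", n=3):
    """Return the n loci closest to target as (locus, distance) pairs."""
    origin = locations[target]
    others = [
        (name, abs(pos - origin))
        for name, pos in locations.items()
        if name != target
    ]
    return sorted(others, key=lambda pair: pair[1])[:n]


for name, dist in nearest_markers(MAP):
    print(f"{name:10s} {dist:.3f}")
```

On this map the paper's markers D20S97 and D20S482 sit well away from PRNP, which is the point made above about recombination risk.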

Despite 150,000 bp of chr 20 sequence, the prion--prnd genes at this point lack flanking active genes. In other words, though flanking pseudogenes have been found, the active gene flankers that might provide a clue to normal prion or doppel prion function are farther out on the sequence.

The two nearest markers are WI-2640 [GenBank G03655], a 311 bp sequence created from random sheared human DNA, and FB25H5 [GenBank T03153], a more interesting 312 bp fetal brain mRNA sequence that, however, is not found in anything sequenced yet on chr 20 nor in any known protein.

Now D20S1095 and RPS4X are very close-in markers to the prion gene. These do not have simple repeats per se (as used in the article) but are unconstrained pseudogenes possibly suitable for CJD mutation mapping. There is no shortage of simple repeats, which are expected to be highly polymorphic because of replication slippage, like the CA repeats used in this study. There would be little likelihood of recombination for markers this close. [The -21 polymorphism for A117V has never been found outside of the context of this mutation.] The simple repeats and low complexity regions (quasi-repeats) are:

1172-exon1prp     534     588   (14622)   +   GC_rich   Low_complexity
1172-exon1prp     858     905   (14305)   +   G-rich    Low_complexity
1172-exon1prp   12202   12227   (2983)    +   AT_rich   Low_complexity
1172-exon1prp   14876   14906   (304)     +   AT_rich   Low_complexity
1277_exon1prp    6999    7071   (49770)   +   GA-rich   Low_complexity
1277_exon1prp    7288    7343   (49498)   +   GA-rich   Low_complexity
1277_exon1prp   21154   21176   (35665)   +   (TTTTC)5  Simple_repeat
1277_exon1prp   22845   22866   (33975)   +   AT_rich   Low_complexity
1277_exon1prp   26226   26268   (30573)   +   (TG)21    Simple_repeat
1277_exon1prp   37238   37291   (19550)   +   AT_rich   Low_complexity
1277_exon1prp   43575   43655   (13186)   +   CT-rich   Low_complexity
1277_exon1prp   46133   46160   (10681)   +   AT_rich   Low_complexity
1277_exon1prp   51170   51181   (5660)    +   AT_rich   Low_complexity
1277_exon1prp   51476   51499   (5342)    +   AT_rich   Low_complexity
1277_exon1prp   51742   51767   (5074)    +   (GA)13    Simple_repeat
795_end_prnpM    5782    5813   (14498)   +   (CAAAA)16 Simple_repeat
795_end_prnpM    7598    7630   (12681)   +   (A)33     Simple_repeat
795_end_prnpM   19859   19883   (428)     +   GC_rich   Low_complexity
post3'.......    7779    7799   (43425)   +   AT_rich   Low_complexity
post3'.......   11155   11207   (40017)   +   (TAAA)13  Simple_repeat
post3'.......   19633   19690   (31534)   +   (TTTTA)12 Simple_repeat
post3'.......   24504   24530   (26694)   +   AT_rich   Low_complexity
post3'.......   25551   25575   (25649)   +   (T)25     Simple_repeat
post3'.......   28245   28326   (22898)   +   (TTTC)20  Simple_repeat
post3'.......   36249   36287   (14937)   +   (TAAA)10  Simple_repeat
post3'.......   47799   47831   (3393)    +   (TTTTG)6  Simple_repeat
post3'.......   50707   50742   (482)     +   (CA)18    Simple_repeat
post3'.......   50902   50927   (297)     +   AT_rich   Low_complexity
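
To pick out candidates comparable to the CA repeats used in the study, the table above can be filtered for dinucleotide simple repeats and ranked by span. This is a rough sketch, assuming the parenthesised column is the number of bases remaining after the match (as in RepeatMasker-style output); under that reading, end plus remaining gives 56,841, 20,311 and 51,224 bp for the three flanking pieces. Only a few rows are copied in, and the function names are illustrative:

```python
import re

# A few rows copied from the table above
# (piece, begin, end, (left), strand, repeat, class).
ROWS = """\
1277_exon1prp   21154   21176   (35665)   +   (TTTTC)5  Simple_repeat
1277_exon1prp   26226   26268   (30573)   +   (TG)21    Simple_repeat
795_end_prnpM   19859   19883   (428)     +   GC_rich   Low_complexity
post3'.......   28245   28326   (22898)   +   (TTTC)20  Simple_repeat
post3'.......   50707   50742   (482)     +   (CA)18    Simple_repeat
"""


def parse_row(line):
    """Split one table row into a dict; motif is None for non-repeat rows."""
    piece, begin, end, left, strand, name, klass = line.split()
    motif = re.fullmatch(r"\(([ACGT]+)\)\d+", name)
    return {
        "piece": piece,
        "begin": int(begin),
        "end": int(end),
        "left": int(left.strip("()")),  # assumed: bases remaining after match
        "motif": motif.group(1) if motif else None,
        "class": klass,
    }


def dinucleotide_repeats(text):
    """Simple repeats with a 2-base motif, longest span first."""
    rows = [parse_row(line) for line in text.splitlines() if line.strip()]
    hits = [
        r for r in rows
        if r["class"] == "Simple_repeat" and r["motif"] and len(r["motif"]) == 2
    ]
    return sorted(hits, key=lambda r: r["end"] - r["begin"] + 1, reverse=True)


for r in dinucleotide_repeats(ROWS):
    span = r["end"] - r["begin"] + 1
    print(r["piece"], r["motif"], r["begin"], r["end"], f"{span} bp")
```

Ranking by span rather than by the number printed after the motif is deliberate, since the span is read directly from the coordinates.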

D20S849 is not really mapped properly with respect to D20S895 or FB25H5. It could be the closest marker to the prion gene. The order is not established with the YACs used, though this is not discussed in the paper. D20S849 has not been mapped on chromosome 20 and apparently is a little-used Genethon sequence. This raises the point that despite 150,000 bp of sequence about the prion gene, none of the known microsatellites actually map to this region. For that reason, global orientation of the prion-prnd genes relative to the telomere and centromere is not completely certain. However, this is a simple matter of mapping WI-3651 [D20S1095] to the MIT YAC set. The Sanger Centre map put this telomeric.

         STSs (key below)
  YACS   25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
753-G-9*  A  D  C  D  S  D  D  D  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
772-D-10  .  F  .  .  .  D  D  F  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
972-H-12  .  .  D  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
763-E-2   .  .  S  S  S  F  F  F  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
917-E-1   .  .  .  F  S  F  F  F  S  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
788-D-12  .  .  .  .  S  D  D  S  D  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
804-D-7   .  .  .  .  S  S  S  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
775-D-4   .  .  .  .  D  D  D  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
717-G-1   .  .  .  .  S  F  F  F  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
826-C-11  .  .  .  .  D  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
938-C-2*  .  .  .  .  .  .  .  S  D  F  D  .  .  .  D  S  S  D  .  .  .  .  .  .
898-D-9*  .  .  .  .  .  .  .  .  F  D  D  .  .  .  .  .  .  .  .  .  .  .  .  .
839-G-4   .  .  .  .  .  .  .  .  .  D  .  .  .  .  .  .  .  .  .  .  .  .  .  .
918-A-5   .  .  .  .  .  .  .  .  .  F  F  .  C  F  F  S  S  F  .  .  .  .  .  .
793-A-7   .  .  .  .  .  .  .  .  .  .  D  .  .  D  D  S  .  .  .  .  .  .  .  .
841-D-7   .  .  .  .  .  .  .  .  .  .  .  D  .  D  .  D  S  S  S  D  D  D  .  .
812-G-9   .  .  .  .  .  .  .  .  .  .  .  D  F  .  D  F  F  .  S  .  .  .  .  .
938-B-6   .  .  .  .  .  .  .  .  .  .  .  F  .  .  .  .  .  .  .  .  .  .  .  .
971-A-4   .  .  .  .  .  .  .  .  .  .  .  F  .  F  .  .  .  .  .  .  .  .  .  .
890-F-1   .  .  .  .  .  .  .  .  .  .  .  D  .  D  D  .  D  D  .  .  .  .  .  .

*used in E200K paper
V = Verified YAC/STS hit.
D = Unique ("Definite") YAC/STS hit.
F = Ambiguous YAC/STS hit, resolved using CEPH fingerprint data.
S = Ambiguous YAC/STS hit, resolved using STS content data.
C = Verified hit reported by other lab, primarily CEPH
c = Unique, unverified hit reported by other lab, primarily CEPH

  25  WI-4876  [G04713] on dJ1009E24.00239 chr_20ctg192
  26  D20S889 
  27  D20S116 
  28  WI-3772 
  29  D20S97 
  30  CHLC.GATA51D03 
  31  WI-2640 
  32  WI-7784 
  33  FB25H5 
  34  D20S895 
  35  WI-4689 
  36  D20S882 
  37  D20S95 
  38  WI-5288 
  39  WI-9063 
  40  D20S916 
  41  WI-9399 
  42  CHLC.GGAA9H10 
  43  D20S194 
  44  WI-9559 
  45  WI-8757 
  46  WI-8238 
  47  D20S892 
  48  D20S192 
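
The YAC/STS matrix above can be queried for which YACs carry a given STS, for example STS 32 (WI-7784, the prion gene CDS). A minimal sketch, with three rows copied from the matrix; the hit codes are kept exactly as printed and the function names are illustrative:

```python
FIRST_STS = 25  # the matrix columns run from STS 25 to STS 48

# Three rows copied from the matrix above ('.' means no hit).
MATRIX = """\
753-G-9*  A  D  C  D  S  D  D  D  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
938-C-2*  .  .  .  .  .  .  .  S  D  F  D  .  .  .  D  S  S  D  .  .  .  .  .  .
898-D-9*  .  .  .  .  .  .  .  .  F  D  D  .  .  .  .  .  .  .  .  .  .  .  .  .
"""


def parse_matrix(text):
    """Map YAC name -> {STS number: hit code}, skipping '.' entries."""
    hits = {}
    for line in text.splitlines():
        name, *codes = line.split()
        hits[name.rstrip("*")] = {
            FIRST_STS + i: code
            for i, code in enumerate(codes)
            if code != "."
        }
    return hits


def yacs_containing(hits, sts):
    """Names of YACs with any hit code for the given STS number."""
    return sorted(name for name, row in hits.items() if sts in row)


hits = parse_matrix(MATRIX)
print(yacs_containing(hits, 32))  # YACs hitting WI-7784
print(yacs_containing(hits, 33))  # YACs hitting FB25H5
```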

The current status of the large DNA piece containing the prion gene: 30 contigs have been assembled into 4 contigs as of early September 1999. These are easily joined in the proper order using mouse homology: dJ1068H6 is ordered 1277-1172-795-1598 to make a single stretch of 150kb of DNA. Next additions will come from 599I11 (DNA made) and 189G13 (in shotgun).

dJ1068H6.01277   26480 bp   RPS4X pseudogene at 5' end, minus strand
dJ1068H6.01172   60129 bp   prion gene 00030 through 00024
dJ1068H6.00795    7875 bp   prion-prnd intergene region 00905 237-3118, 00894, 01225 3211-6055, 6175-7875
dJ1068H6.01598   53987 bp   ghost prion gene + IDI pseudogene

[Actually it is not so easy to attach piece 1277. Fortunately, a nested retrotransposon pair was split in half in forming the contigs. This allows a reliable join by a most unusual method.]

Did a tandem gene doubling create the prion superfamily?

15 Sep 99 webmaster research
Did a tandem gene doubling create the two prion superfamily genes? From the mouse sequence, it is seen that the two genes are on the same + strand with order 5'-prion-doppel-3'. Since the genes diverged prior to mouse/human divergence, the same orientation will be found in human (and is). This is consistent with a localized tandem duplication without inversion. However, many odd chromosomal rearrangements have been documented over the years. In fact, an inversion did occur in intron 1 of the prion gene involving several thousand bp or more: see below. Tandem duplication does not explain the missing domain 106-126 (invariant over 310+ million years) nor the missing repeat domain (which probably represents a more recent internal replication slippage).

The boundaries of this duplication seem to have been quite narrow. That is, if the original duplication had been more extensive, adjacent genes on both sides would now intervene, but none are found. So unless these were lost later by deletion, the event was indeed quite localized. It is not known which position is ancestral; the outcome is the same:

5'-X-prion-Y-3' (historical single gene)
5'-X-prion-Y-X-prion-Y-3' (tandem duplication)
5'-X-prion-Y-X-doppel-Y-3' (right-hand duplicated gene diverges)

5'-X-doppel-Y-3' (historical single gene)
5'-X-doppel-Y-X-doppel-Y-3'(tandem duplication)
5'-X-prion-Y-X-doppel-Y-3' (left-hand duplicated gene diverges)

Even if no other functional genes were duplicated, the boundaries are unlikely to have been exactly those of the mRNAs, ie zero gap. Can any of this intervening material be detected in the face of subsequent extensive parasitization of this region by retrotransposons (today constituting up to 70% of the contig DNA)? Little conservation of nucleotide sequence is expected in the inter-gene region and the date of tandem duplication may be very ancient. Under this scenario, even stripping out retrotransposons would leave little chance of finding alignable regions reflecting the old boundaries. If any were found, however, they would give an indication of the gap size at time of duplication.

On the other hand, if the duplication was more recent, some traces might be left in the older retrotransposons themselves. These can go back a couple of hundred million years or more. In this case, mammalian order-specific retrotransposons such as Alu could be removed to improve the comparison and those found in both human and mouse taken as the older ones. This method does not rely on streak or blast comparisons of individual nucleotides but on more global features of repeats of retrotransposons per se.

Poor alignment of doppel prion to avian prions supports this more recent scenario; an early paralogue has no reason to be closer to mammalian prion. The fact that it is closer suggests that a physical interaction between the two proteins (or with a third protein in common) has caused co-evolution of ghost prion with the respective prion protein within mammalian and avian lineages. This scenario would be strengthened if bird prnd is found closer to bird prion, or if prnd-prnp heterodimers are found.

At this time, 56,841 bp of sequence are available 5' of exon 1 of the prion gene (the X region above), 20,311 bp of inter-gene region (end of prion mRNA to TATA of the prnd promoter, the YX region), and some 51,224 bp 3' to the end of the ghost mRNA (the Y region). The human genome averages 47% identifiable interspersed repeats.

There are no dbEST hits or Blastp matches to the inter-gene region with or without masking of retrotransposons. If mouse inter-gene region 21676-34085 is also masked and compared to the 11846 bp masked (41.7%) human inter-gene, there are only 6 short scattered stretches of homology of about 100 bp each totalling 683 bp. These have been conserved for unknown reasons but are not found elsewhere in known sequences, in particular not in tandem doubling positions.

In other words, any signature of a tandem duplication has been totally erased at the DNA level. This is not at all surprising given the poor correspondence of intron/exon structure, the failure of Blastn even within coding sequences, and the borderline success of Blastp.

If there was a tandem duplication, why does the doppel prion lack the amino terminal domains? Perhaps it is not lacking: under some conditions, Blast searches extend the homology back to the end of the repeat region and further suggest that the unusual WWW in human sequence is a frameshift of gly-gly-gly within one of the repeats:

doppel:   34 KWNRKALPST-----AQITEAQVAENRPGAFI---KQGRKLDIDFGAE-GNRYYEANYWQ 84
             +WN+ + P T     A    A       G ++      R + I FG++  +RYY  N  +
prion: 43495 QWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPI-IHFGSDYEDRYYRENMHR 43671

doppel:     6 SWWWLAT 12
              SWWWL +
prion:  43455 SWWWLGS 43475
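
How weak the first alignment above is can be checked by counting identical columns directly. A small sketch using the two aligned strings exactly as printed (gaps as `-`); the helper name `identity_stats` is illustrative:

```python
# Aligned strings copied from the doppel/prion alignment above.
DOPPEL = "KWNRKALPST-----AQITEAQVAENRPGAFI---KQGRKLDIDFGAE-GNRYYEANYWQ"
PRION = "QWNKPSKPKTNMKHMAGAAAAGAVVGGLGGYMLGSAMSRPI-IHFGSDYEDRYYRENMHR"


def identity_stats(a, b):
    """Return (identical, columns, ungapped_columns) for two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    ungapped = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
    return identical, len(a), ungapped


identical, columns, ungapped = identity_stats(DOPPEL, PRION)
print(identical, columns, ungapped)  # 15 60 50
print(f"{100 * identical / ungapped:.0f}% identity over ungapped columns")
```

The percentage depends on the denominator chosen (all columns or ungapped columns only), so both counts are returned.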

PHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGA          normal prion
ASWWWLGAAPWWWLGTASWWWLGSRRWHPQSVEQAE-AKNQHEAHGWCC         frameshift
............MRKHLSWWWLATVCMLLFSHLSAVQTRGIKHRIKWNRKALPSTAQ normal human ghost
............MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGG normal mouse ghost
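
The frameshift reading can be illustrated by translating one octarepeat in two frames. This is only a sketch: the 24-nt repeat sequence below is an assumption (the repeat DNA is not given on this page) and should be checked against human prion U29185 before relying on it. The codon table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, offset=0):
    """Translate dna starting at offset; trailing partial codons are dropped."""
    dna = dna.upper()
    return "".join(CODON[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3))


# ASSUMED octarepeat coding sequence, not taken from this page.
OCTAREPEAT = "CCTCATGGTGGTGGCTGGGGGCAG"

print(translate(OCTAREPEAT, 0))  # PHGGGWGQ
print(translate(OCTAREPEAT, 2))  # SWWWLGA
```

With this input, the same bases that encode gly-gly-gly in the normal frame read as trp-trp-trp when shifted, matching the frameshift line above.
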
extent of human/mouse prnp-prnd inter-gene homology:
104/128 (81%)
53/63 (84%)
143/199 (71%)
94/124 (75%)
54/65 (83%)
77/101 (76%)
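
The six stretches can be pooled into a single figure. A minimal sketch using only the identities/length pairs listed above; the listed lengths sum to 680 bp, close to the 683 bp total quoted earlier:

```python
# (identical bases, aligned length) for the six conserved stretches above.
STRETCHES = [(104, 128), (53, 63), (143, 199), (94, 124), (54, 65), (77, 101)]


def pooled_identity(stretches):
    """Return (total identities, total length, pooled fraction identical)."""
    matches = sum(m for m, _ in stretches)
    length = sum(n for _, n in stretches)
    return matches, length, matches / length


matches, length, fraction = pooled_identity(STRETCHES)
print(f"{matches}/{length} ({100 * fraction:.0f}%)")  # 525/680 (77%)
```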

>human/mouse inter-gene conserved stretches, concatenated
ttagaaatagattttcttgattaaaatgaaaattaacaagctctaaagaacttgcctgctttagtctggcacaaagtgaagagtttcatgatca
accaacgtggtttagtttggctcagacctaggaagatctaaaactttacgccggatcatgttcctcccacacacaaatacactgcaaacacacc
tggcttggagttatgctgattagaagttaattaacataaagcagggagtgttggctgtaattcagtggctcctgacatctcaatctgcccagtt
gctccttctctacctcacaggaaagttatgctgatgagtaaggacacacccttctcacacactgagctgttttggcaagactaggtggtgtgaa
tttagtatttctggcaggctggtaattaacatgttagcacatacctgtgtattcaaagcagcttttgggcccgagcttactatcctttacacta
tttcaggggtcttccccattcctctgtgcctttgtaaacagccttttgttcactaacaagaggaatctgccatcaagattttcacgtggtttcc
ttagtaaagtgtgatgagaaggtccatccttctcaggatgaaggagtggtccaggaagccctgattggtctgccggggagggaagggctgcctt
atttggag

Details on 4th gene: psRPS4X

Thu, 16 Sep 1999 webmaster research
The fourth and last gene to be found by the webmaster on this stretch of human chromosome 20 is located at the very 5' start of the assembled sequence, on the minus strand. It turns out to be a very recently inactivated gene from the ribosomal protein S4 superfamily, with homologues at 95% DNA and 93% protein identity on the X and Y chromosomes, as well as on chr 10 and 12. This protein family has no apparent structural or functional connection to prion or doppel proteins; a pseudogene would have even less.

Not all of it is represented in the chr 20 sequences at hand: only the first 204 aa of the 263 in the originating gene are present. A cysteine-to-stop codon change, followed shortly by a 1 bp frameshift, ensures that it is not functional. There are no other strong gene candidates in the 157,000 consecutive nucleotides of chromosome 20 besides psRPS4X-prnp-prnd-psIDI.

The inactive protein is shown below -- it begins with the ATG met start codon and continues until truncated by the 5' end of the chr 20 sequence (suggesting that its DNA continuation would help join later contigs extending this region):

3'5' Frame 1
MARGPKKHLKQVAAPKHWMLDKLTGVFAPHPSTSPHKLRECLPLIIFIRNRLKYALTGEE
VKKICMQRFIKIDGKVRTDITYPAGFMDVISIDKMGENFCLICDTKGHFAVHRITPEEAK
YKL-KVRKnicahkrnpssgds-csyhllp-sphqge-hhsd-fgdwqdy-lhqvrhw-p
vygdwrcqpgknwcdhqqreapwi

3'5' Frame 2
wlvvprsi-sr-qlqsigcwin-lvclllihppvpts-esvspssfs-gtdlsmp-qerk
-rrfacsgslrsmarsali-ptlldswmssaltrwerisv-svtprvtllyivlhlrrps
tscek-ekIFVRTKGIPHLVTHDARTICYPDPLIKVNDTIQIDLETGKITDFIRFDTGNL
CMVTGGANLGRIGVITNRERHPG...
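
Because the pseudogene lies on the minus strand, the frames above are read from the reverse complement of the contig. A minimal sketch of that step follows, using the last 30 lowercase bases of the contig 1277 sequence given further down this page (the bases just before the capitalised promoter). The helper names `reverse_complement` and `translate` are hypothetical, introduced only for this illustration.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate DNA starting at `offset`, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


# Last 30 lowercase bases of the plus-strand contig, ending in "cat",
# which is the reverse complement of the ATG start codon.
contig_tail = "ctttagatgcttcttgggaccacgagccat"

minus_strand = reverse_complement(contig_tail)
peptide = translate(minus_strand)

print(minus_strand)
print(peptide)
```

The peptide matches the first ten residues of Frame 1. Applying the same two functions to the whole contig, at each of the three offsets, would give the full set of minus-strand frames.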

Assembled protein from the two reading frames by comparison to authentic RPS4X:

MARGPKKHLKQVAAPKHWMLDKLTGVFAPHPSTSPHKLRECLPLIIFIRNRLKYALTGEE
VKKICMQRFIKIDGKVRTDITYPAGFMDVISIDKMGENFCLICDTKGHFAVHRITPEEAK
YKL KVRKIFVRTKGIPHLVTHDARTICYPDPLIKVNDTIQIDLETGKITDFIRFDTGNL
CMVTGGANLGRIGVITNRERHPG

Alignment of assembled protein with authentic RPS4X (93% identity):

psRPS4X: 1   MARGPKKHLKQVAAPKHWMLDKLTGVFAPHPSTSPHKLRECLPLIIFIRNRLKYALTGEE 60
             MARGPKKHLK+VAAPKHWMLDKLTGVFAP PST PHKLRECLPLIIF+RNRLKYALTG+E
RPS4X:   1   MARGPKKHLKRVAAPKHWMLDKLTGVFAPRPSTGPHKLRECLPLIIFLRNRLKYALTGDE 60

psRPS4X: 61  VKKICMQRFIKIDGKVRTDITYPAGFMDVISIDKMGENFCLICDTKGHFAVHRITPEEAK 120
             VKKICMQRFIKIDGKVRTDITYPAGFMDVISIDK GENF LI DTKG FAVHRITPEEAK
RPS4X:   61  VKKICMQRFIKIDGKVRTDITYPAGFMDVISIDKTGENFRLIYDTKGRFAVHRITPEEAK 120

psRPS4X: 121 YKL-KVRKIFVRTKGIPHLVTHDARTICYPDPLIKVNDTIQIDLETGKITDFIRFDTGN 179
             YKL KVRKIFV TKGIPHLVTHDARTI YPDPLIKVNDTIQIDLETGKITDFI+FDTGN
RPS4X:   121 YKLCKVRKIFVGTKGIPHLVTHDARTIRYPDPLIKVNDTIQIDLETGKITDFIKFDTGN 179

psRPS4X: 180 LCMVTGGANLGRIGVITNRERHPG 203
             LCMVTGGANLGRIGVITNRERHPG
RPS4X:   180 LCMVTGGANLGRIGVITNRERHPG 203
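
The identity figure can be recomputed directly from the alignment above. The sketch below joins the four alignment blocks for each sequence and counts matching columns. It assumes the single gap column (the stop codon position, shown as "-") counts as a mismatch and stays in the denominator; other conventions would shift the figure slightly.

```python
# The four alignment blocks from above, concatenated per sequence.
pseudogene = (
    "MARGPKKHLKQVAAPKHWMLDKLTGVFAPHPSTSPHKLRECLPLIIFIRNRLKYALTGEE"
    "VKKICMQRFIKIDGKVRTDITYPAGFMDVISIDKMGENFCLICDTKGHFAVHRITPEEAK"
    "YKL-KVRKIFVRTKGIPHLVTHDARTICYPDPLIKVNDTIQIDLETGKITDFIRFDTGN"
    "LCMVTGGANLGRIGVITNRERHPG"
)
authentic = (
    "MARGPKKHLKRVAAPKHWMLDKLTGVFAPRPSTGPHKLRECLPLIIFLRNRLKYALTGDE"
    "VKKICMQRFIKIDGKVRTDITYPAGFMDVISIDKTGENFRLIYDTKGRFAVHRITPEEAK"
    "YKLCKVRKIFVGTKGIPHLVTHDARTIRYPDPLIKVNDTIQIDLETGKITDFIKFDTGN"
    "LCMVTGGANLGRIGVITNRERHPG"
)
assert len(pseudogene) == len(authentic)

# Record every differing column as (1-based position, pseudogene, authentic).
differences = [
    (pos, a, b)
    for pos, (a, b) in enumerate(zip(pseudogene, authentic), start=1)
    if a != b
]

columns = len(pseudogene)
identical = columns - len(differences)
percent_identity = 100 * identical / columns

print(f"{identical}/{columns} identical ({percent_identity:.1f}%)")
for pos, a, b in differences:
    print(f"  position {pos}: {a} -> {b}")
```

This reproduces the 93% quoted in the heading, and the list of differing positions makes it easy to see which cysteine was replaced by the stop codon.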


The DNA sequences continue to agree for 14 bp 5' to the coding sequences:
psRPS4X: 613 ggctacgttaggcaCAGAAAAAGAA 637
             |||| |||||||||  |||| || |
RPS4X:   59  ggctgcgttaggcaAGGAAAGAGGA 11

At the DNA level, another 38 bp are available upstream before interruption by a FRAM Alu of 430 bp. Another 1068 bp of non-retrotransposon sequence occurs before a massive set of insertions. Only the first 14 bp extend the coding sequence to any known RPS4X gene (even though a further 21 bp of that sequence exists); i.e. only positions 1-626 of the chr 20 contig have to do with the originating gene. [Some Blastn queries extend this to 637.]

In order to determine whether the chr 20 region is a processed pseudogene, it is necessary to know the intron/exon structure of the originating RPS4X and to find a spliced-out intron. Its sequence is annotated as mRNA rather than genomic, yet a promoter and 10 bp 5' are shown in the GenBank entry for RPS4X, suggesting that it is really genomic. No introns are found, and the sequence does not extend far enough 5' to determine whether it has the FRAM Alu (nor is the sequence found at the chr X Blast server). The chr 20 alignment extends 4-15 bp into the promoter, which could well be the start of transcription. Thus the chr 20 segment could be a translocation, an unprocessed pseudogene, or a processed pseudogene.

When the chr 20 segment gets extended, 245 bp of possible further matches, 61 of which are post-coding 3' UTR, become available. These are currently not found in Sanger Center chromosome 20 sequence.

The rest of RPS4X (non-coding 3' in caps):
tcttttgacgtggttcacgtgaaagatgccaatggcaacagctttgccactcgacttt
ccaacatttttgttattggcaagggcaacaaaccatggatttctcttccccgaggaaagggta
tccgcctcaccattgctgaagagagagacaaaagactggcggccaaacagagcagtgggtga
AATGGGTCCCTGGGTGACATGTCAGATCTTTGTACGTAATTAAAAATATTGTGGCAGGATT  

>chromosome 20 contig 1277 containing the RPS4X protein fragment and promoter (caps)
gatccagggtgcctctctctgttggtgatcacaccaattcttcccaggttggcacctccagtcaccatacacaggtt
accagtgtcgaacctgatgaagtcagtaatcttgccagtctccaaatcaatctgaatggtgtcattcaccttgatga
ggggatcagggtagcagatggtacgagcatcatgagtcaccagatgagggattccttttgtgcgcacaaatattttt
tctcacttttcacaacttgtacttggcctcctcaggtgtaatacgatgtacagcaaagtgacccttggtgtcacaga
tcagacagaaattctctcccatcttgtcaatgctgatgacatccatgaatccagcagggtaggttatatcagtgcgg
accttgccatcgatcttaatgaaccgctgcatgcaaatcttctttacttcctctcctgtcagggcatacttaagtct
gttccttatgaaaatgatgagggggagacactctctcaacttgtggggactggtggatggatgaggagcaaacacac
cagtcaatttatccagcatccaatgctttggagctgctacctgctttagatgcttcttgggaccacgagccatGGCTACGTTAGGCA

Two final oddities are worth mentioning. There are very strong Blastn hits to sequences on chr 10 and 12. These extend to include 83% matches to the chr 20 FRAM Alu. These may be earlier pseudogenes.

Finally, green monkey RPS4X actually agrees slightly better with the pseudogene DNA than human RPS4X.

Massive parasitism of chromosome 20

17 Sep 99 Censor server and RepeatMasker analysis, slow mode
The webmaster thanks Jerzy Jurka and Arian Smit for helpful correspondence
The 288 retrotransposons in a large segment of human chromosome 20 centered on the prion genes are listed below. These can be analyzed by type and timing of insertion. Many are nested within earlier insertion events, sometimes multiply so. Out of a total length of 148,471 bp, the bases identified as retrotransposons or repeats amount to 78,189 bp (52.66 %). The GC content of 44.04 % is of some significance because it fixes the isochore.

               number of      length   percentage
               elements*    occupied  of sequence
SINEs:              82       20128 bp    13.56 %
      ALUs          76       19270 bp    12.98 %
      MIRs           6         858 bp     0.58 %
LINEs:              56       30935 bp    20.84 %
      LINE1         36       24041 bp    16.19 %
      LINE2         16        5759 bp     3.88 %
LTR elements:       38       18666 bp    12.57 %
      MaLRs         13        8023 bp     5.40 %
      Retrov.       13        5589 bp     3.76 %
      MER4_group     6        2968 bp     2.00 %
DNA elements:       29        6889 bp     4.64 %
      MER1_type     18        3081 bp     2.08 %
      MER2_type      7        3278 bp     2.21 %
      Mariners       0           0 bp     0.00 %

Total interspersed repeats:  76618 bp    51.60 %

Satellites:          1         261 bp     0.18 %
Simple repeats:     16         621 bp     0.42 %
Low complexity:     15         631 bp     0.42 %
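
The percentages in the summary table can be checked against the bp counts. The sketch below takes the four top-level class totals and the 148,471 bp sequence length from the text above, and recomputes each share and the interspersed-repeat total. The variable names are illustrative only.

```python
TOTAL_LENGTH = 148471  # bp, total length of the analyzed segment

# bp occupied by each top-level class, copied from the summary table.
class_bp = {
    "SINEs": 20128,
    "LINEs": 30935,
    "LTR elements": 18666,
    "DNA elements": 6889,
}

# Percentage of the sequence occupied by each class, to two decimals.
class_percent = {
    name: f"{100 * bp / TOTAL_LENGTH:.2f}" for name, bp in class_bp.items()
}

interspersed_bp = sum(class_bp.values())
interspersed_percent = f"{100 * interspersed_bp / TOTAL_LENGTH:.2f}"

for name, bp in class_bp.items():
    print(f"{name:<14}{bp:>7} bp  {class_percent[name]:>6} %")
print(f"{'Total':<14}{interspersed_bp:>7} bp  {interspersed_percent:>6} %")
```

The four classes add up to the "Total interspersed repeats" line, so the sub-rows (ALUs, LINE1, MaLRs and so on) are breakdowns within those classes rather than additional sequence.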

query                  begin   end  (left) strand repeat          class/family            position in repeat
1172-exon1prp_mRNA       534   588 (14622) +  GC_rich            Low_complexity             1   55    (0)
1172-exon1prp_mRNA       858   905 (14305) +  G-rich             Low_complexity             4   51    (0)  
1172-exon1prp_mRNA      1783  1868 (13342) +  L1MC/D             LINE/L1                 5417 5502  (839)  
1172-exon1prp_mRNA      1953  2029 (13181) +  MIR                SINE/MIR                  84  160  (102)  
1172-exon1prp_mRNA      3571  3704 (11506) C  MER5B              DNA/MER1_type           (30)  148      5  
1172-exon1prp_mRNA      3964  4065 (11145) +  MIR                SINE/MIR                  45  149  (113)  
1172-exon1prp_mRNA      4315  4934 (10276) +  L1                 LINE/L1                 4021 4649 (1497)  
1172-exon1prp_mRNA      4936  5395  (9815) C  L1                 LINE/L1               (2043) 4103   3640  
1172-exon1prp_mRNA      5390  5645  (9565) C  L1                 LINE/L1               (3310) 2836   2581 *
1172-exon1prp_mRNA      5695  5920  (9290) C  L1M4               LINE/L1               (1530) 4616   4372  
1172-exon1prp_mRNA      6259  6452  (8758) C  L1M4               LINE/L1               (2201) 3945   3719  
1172-exon1prp_mRNA      6524  6954  (8256) +  L2                 LINE/L2                 2195 2673  (640)  
1172-exon1prp_mRNA      6970  7078  (8132) C  AluJb              SINE/Alu               (189)  123     15  
1172-exon1prp_mRNA      7079  7295  (7915) +  L2                 LINE/L2                 2691 2916  (397)  
1172-exon1prp_mRNA      7300  7388  (7822) +  L2                 LINE/L2                 3186 3272    (0)  
1172-exon1prp_mRNA      7445  7718  (7492) +  AluJo              SINE/Alu                   9  298   (14)  
1172-exon1prp_mRNA      7841  7973  (7237) +  FLAM_C             SINE/Alu                   1  133    (0)  
1172-exon1prp_mRNA      7987  8032  (7178) +  L2                 LINE/L2                 3268 3313    (0)  
1172-exon1prp_mRNA      8513  8767  (6443) C  L1M4               LINE/L1                (743) 5403   5139  
1172-exon1prp_mRNA      8768  9297  (5913) C  L1PA13             LINE/L1                  (7) 6149   5588  
1172-exon1prp_mRNA      9298  9498  (5712) C  L1M4               LINE/L1               (1007) 5139   4942  
1172-exon1prp_mRNA      9580 10233  (4977) C  L1M4               LINE/L1               (1302) 4844   4156  
1172-exon1prp_mRNA     10876 11557  (3653) C  L2                 LINE/L2                  (0) 3313   2475  
1172-exon1prp_mRNA     12052 12190  (3020) +  L1MB8              LINE/L1                 6023 6167    (8)  
1172-exon1prp_mRNA     12202 12227  (2983) +  AT_rich            Low_complexity             1   26    (0)  
1172-exon1prp_mRNA     14876 14906   (304) +  AT_rich            Low_complexity             1   31    (0)  
1277_exon1prp            652   825 (56016) +  FRAM               SINE/Alu                  -7  166   (10)  
1277_exon1prp            827   990 (55851) +  L1M4               LINE/L1                 2593 2767 (3379)  
1277_exon1prp           2059  3289 (53552) +  THE1C-internal     LTR/MaLR                 312 1580    (0)  
1277_exon1prp           3290  3624 (53217) +  THE1C              LTR/MaLR                   1  318   (53)  
1277_exon1prp           3637  4181 (52660) +  LTR26E             LTR/Retroviral             1  615    (7)  
1277_exon1prp           4258  4653 (52188) C  LTR26              LTR/Retroviral         (199)  404      1  
1277_exon1prp           4762  4997 (51844) +  L2                 LINE/L2                 1564 1824 (1489)  
1277_exon1prp           5434  6226 (50615) +  L2                 LINE/L2                 2466 3309    (4)  
1277_exon1prp           6306  6669 (50172) +  LTR16A1            LTR/Retroviral            11  420   (37)  
1277_exon1prp           6675  6975 (49866) +  AluJo              SINE/Alu                   5  299   (13)  
1277_exon1prp           6999  7071 (49770) +  GA-rich            Low_complexity             3   76    (0)  
1277_exon1prp           7288  7343 (49498) +  GA-rich            Low_complexity             1   58    (0)  
1277_exon1prp           8271  8539 (48302) C  AluSx              SINE/Alu                (43)  269      1  
1277_exon1prp           9437  9731 (47110) +  AluSx              SINE/Alu                   3  302   (10)  
1277_exon1prp           9736  9880 (46961) C  FRAM               SINE/Alu                (30)  146      2  
1277_exon1prp          10160 10374 (46467) +  AluJo              SINE/Alu                  84  298   (14)  
1277_exon1prp          10375 10546 (46295) C  MER92B             LTR/MER4-group          (52)  584    409  
1277_exon1prp          10547 10864 (45977) +  AluSx              SINE/Alu                   1  312    (0)  
1277_exon1prp          10865 11121 (45720) C  MER92B             LTR/MER4-group         (227)  409    173  
1277_exon1prp          11122 11451 (45390) +  MER1B              DNA/MER1_type              1  337    (0)  
1277_exon1prp          11452 11620 (45221) C  MER92B             LTR/MER4-group         (463)  173      1  
1277_exon1prp          12011 12619 (44222) +  L1M3d              LINE/L1                  254 1137 (5004)  
1277_exon1prp          12623 14303 (42538) +  Tigger1            DNA/MER2_type              1 1672  (743)  
1277_exon1prp          14304 14725 (42116) C  MSTA               LTR/MaLR                 (0)  426      1  
1277_exon1prp          14730 14963 (41878) C  MSTA-internal      LTR/MaLR                 (4) 1576   1345  
1277_exon1prp          14964 15063 (41778) C  THE1B              LTR/MaLR               (247)  117      3  
1277_exon1prp          15066 16212 (40629) C  THE1B-internal     LTR/MaLR                 (1) 1579    427  
1277_exon1prp          16219 16385 (40456) +  AluSc              SINE/Alu                 141  307    (2)  
1277_exon1prp          16386 16479 (40362) +  Tigger1            DNA/MER2_type           1739 1832  (586)  
1277_exon1prp          16505 16586 (40255) C  MER47              DNA/MER2_type            (1) 2322   2241  
1277_exon1prp          16602 16644 (40197) +  MER47              DNA/MER2_type              1   44 (2279)  
1277_exon1prp          16647 16717 (40124) +  L1MA7              LINE/L1                 4134 4201 (1945)  
1277_exon1prp          16718 17011 (39830) C  AluSx              SINE/Alu                (17)  295      2  
1277_exon1prp          17012 18038 (38803) +  L1MA7              LINE/L1                 4201 5268  (878)  
1277_exon1prp          18039 18275 (38566) C  MER8               DNA/MER2_type            (0)  239      3  
1277_exon1prp          18277 18828 (38013) +  L1MA7              LINE/L1                 5266 5828  (461)  
1277_exon1prp          18831 19016 (37825) +  AluSq              SINE/Alu                   1  178  (135)  
1277_exon1prp          19017 19187 (37654) +  L1MB1              LINE/L1                 5998 6171    (6)  
1277_exon1prp          19401 19697 (37144) C  AluJo              SINE/Alu                 (6)  306      1  
1277_exon1prp          19723 20237 (36604) C  MER90              LTR                      (0)  575     10  
1277_exon1prp          20256 20720 (36121) C  L1M4               LINE/L1               (1873) 4273   3782  
1277_exon1prp          20907 21080 (35761) +  MER3               DNA/MER1_type              1  127   (52)  
1277_exon1prp          21154 21176 (35665) +  (TTTTC)n           Simple_repeat              2   24    (0)  
1277_exon1prp          21179 21455 (35386) C  AluJb              SINE/Alu                (31)  281      1  
1277_exon1prp          21729 21900 (34941) +  L2                 LINE/L2                 1521 1687 (1626)  
1277_exon1prp          21963 22385 (34456) C  MLT1F              LTR/MaLR                 (0)  561      9  
1277_exon1prp          22845 22866 (33975) +  AT_rich            Low_complexity             1   22    (0)  
1277_exon1prp          22869 23994 (32847) C  L1PB3              LINE/L1                  (9) 6141   5031  
1277_exon1prp          24446 24824 (32017) +  MER77              LTR/MER21-group           81  488  (117)  
1277_exon1prp          24864 25020 (31821) C  AluJo/FRAM         SINE/Alu                (38)  274    157  
1277_exon1prp          25021 25157 (31684) +  MER77              LTR/MER21-group          451  598    (7)  
1277_exon1prp          25548 25853 (30988) C  AluSx              SINE/Alu                 (2)  310      6  
1277_exon1prp          26082 26216 (30625) +  MSTD               LTR/MaLR                   1  133  (261)  
1277_exon1prp          26226 26268 (30573) +  (TG)n              Simple_repeat              2   44    (0)  
1277_exon1prp          26408 26682 (30159) C  L1PA5              LINE/L1                  (7) 6138   5928  
1277_exon1prp          26694 26951 (29890) +  MSTD               LTR/MaLR                 142  394    (0)  
1277_exon1prp          27802 27968 (28873) +  L2                 LINE/L2                  937 1137 (2176)  
1277_exon1prp          28181 28651 (28190) +  L1MCb              LINE/L1                  106  570 (5576)  
1277_exon1prp          28642 29024 (27817) +  L1MEc              LINE/L1                 1701 1775 (4371) *
1277_exon1prp          29025 29054 (27787) +  MLT1A2             LTR/MaLR                 274  303   (71)  
1277_exon1prp          29055 29568 (27273) +  L1MEc              LINE/L1                 1775 2285 (3861)  
1277_exon1prp          29579 30238 (26603) +  L1MB7              LINE/L1                 4781 5434  (742)  
1277_exon1prp          30239 30667 (26174) +  LTR7               LTR/Retroviral             1  448    (2)  
1277_exon1prp          30668 31045 (25796) +  L1MB7              LINE/L1                 5434 5830  (346)  
1277_exon1prp          31046 31338 (25503) +  AluJb              SINE/Alu                   1  295   (17)  
1277_exon1prp          31339 31399 (25442) +  L1MB7              LINE/L1                 5830 5889  (287)  
1277_exon1prp          31429 31671 (25170) C  MER4D              LTR/MER4-group          (36)  981    727  
1277_exon1prp          31672 32057 (24784) +  MER48              LTR/Retroviral             1  398    (0)  
1277_exon1prp          32058 32185 (24656) C  MER4D              LTR/MER4-group         (290)  727    594  
1277_exon1prp          32186 32494 (24347) C  AluY               SINE/Alu                 (1)  310      1  
1277_exon1prp          32495 32908 (23933) C  MER4D              LTR/MER4-group         (423)  594    236  
1277_exon1prp          32910 33204 (23637) +  L1MB7              LINE/L1                 5886 6173    (2)  
1277_exon1prp          33239 33422 (23419) +  MER30              DNA/MER1_type              1  186   (44)  
1277_exon1prp          33423 33515 (23326) C  AluJ/FLAM          SINE/Alu               (218)   94      2  
1277_exon1prp          33933 34598 (22243) C  L1PB3              LINE/L1                  (3) 6147   5470  
1277_exon1prp          34655 34839 (22002) +  AluSx              SINE/Alu                   5  189  (123)  
1277_exon1prp          34907 35137 (21704) C  LTR1B              LTR/Retroviral         (237)  589    343 *
1277_exon1prp          34940 35140 (21701) C  LTR28              LTR/MER4-group         (413)  607    404  
1277_exon1prp          35236 35400 (21441) C  LTR1               LTR/Retroviral         (568)  217     36  
1277_exon1prp          35411 35465 (21376) C  LTR1B              LTR/Retroviral         (724)  102     43  
1277_exon1prp          35482 35585 (21256) +  AluSx              SINE/Alu                 187  290   (22)  
1277_exon1prp          35997 36185 (20656) +  HAL1               LINE/Other               377  559 (1218)  
1277_exon1prp          36186 36901 (19940) +  LTR8               LTR/Retroviral             2  691    (0)  
1277_exon1prp          36902 37018 (19823) +  HAL1               LINE/Other               559  677 (1100)  
1277_exon1prp          37238 37291 (19550) +  AT_rich            Low_complexity             1   54    (0)  
1277_exon1prp          37437 37745 (19096) C  AluSq              SINE/Alu                 (3)  310      1  
1277_exon1prp          38472 38576 (18265) C  MLT1A2             LTR/MaLR               (135)  255    141  
1277_exon1prp          38577 38871 (17970) C  AluSq              SINE/Alu                (16)  297      1  
1277_exon1prp          38872 39032 (17809) C  MLT1A2             LTR/MaLR               (249)  141      3  
1277_exon1prp          39206 39492 (17349) C  AluSx              SINE/Alu                (17)  295      1  
1277_exon1prp          39520 39575 (17266) +  (TG)n              Simple_repeat              2   57    (0)  
1277_exon1prp          39635 39762 (17079) +  L1M4               LINE/L1                 5269 5399  (747)  
1277_exon1prp          40294 40451 (16390) C  MER5A              DNA/MER1_type            (0)  189      9  
1277_exon1prp          40733 40866 (15975) +  L2                 LINE/L2                 2889 3026  (287)  
1277_exon1prp          40868 41155 (15686) C  L2                 LINE/L2               (1050) 2263   1967  
1277_exon1prp          41730 41913 (14928) +  HAL1               LINE/Other                22  231 (1546)  
1277_exon1prp          41922 42233 (14608) +  AluY               SINE/Alu                   1  311    (0)  
1277_exon1prp          42312 42441 (14400) +  MER21B             LTR/MER21-group          708  852   (11)  
1277_exon1prp          42643 42931 (13910) +  AluSx              SINE/Alu                   1  292   (20)  
1277_exon1prp          43321 43452 (13389) +  FLAM_A             SINE/Alu                   1  131    (2)  
1277_exon1prp          43575 43655 (13186) +  CT-rich            Low_complexity             2   83    (0)  
1277_exon1prp          43656 43938 (12903) C  AluSg              SINE/Alu                 (0)  310      1  
1277_exon1prp          44004 44477 (12364) C  MER74B             LTR/MER73-group         (75)  547     60  
1277_exon1prp          44493 44611 (12230) +  MER3               DNA/MER1_type             70  191   (18)  
1277_exon1prp          45838 45894 (10947) +  L1M4               LINE/L1                 5541 5607  (568)  
1277_exon1prp          46133 46160 (10681) +  AT_rich            Low_complexity             1   28    (0)  
1277_exon1prp          46315 46735 (10106) +  MER65A             LTR/MER4-group             1  445    (0)  
1277_exon1prp          47021 47339  (9502) C  MLT1A1             LTR/MaLR                (51)  314     51  
1277_exon1prp          47340 47633  (9208) +  AluSx              SINE/Alu                   1  292   (20)  
1277_exon1prp          47634 47678  (9163) C  MLT1A1             LTR/MaLR               (314)   51      1  
1277_exon1prp          49619 49849  (6992) C  LTR16C             LTR/Retroviral         (154)  233      1  
1277_exon1prp          50178 50466  (6375) +  AluSx              SINE/Alu                   1  294   (18)  
1277_exon1prp          51170 51181  (5660) +  AT_rich            Low_complexity             1   12   (25)  
1277_exon1prp          51182 51475  (5366) C  AluSx              SINE/Alu                (18)  294      1  
1277_exon1prp          51476 51499  (5342) +  AT_rich            Low_complexity            11   37    (0)  
1277_exon1prp          51742 51767  (5074) +  (GA)n              Simple_repeat              2   27    (0)  
1277_exon1prp          52169 52466  (4375) +  L2                 LINE/L2                 1721 2021 (1292)  
1277_exon1prp          52467 52644  (4197) +  MER5A              DNA/MER1_type              4  183    (6)  
1277_exon1prp          52645 53159  (3682) +  L2                 LINE/L2                 2021 2697  (616)  
1277_exon1prp          53160 53456  (3385) +  AluJb              SINE/Alu                   1  296   (16)  
1277_exon1prp          53457 53594  (3247) +  L2                 LINE/L2                 2697 2835  (478)  
1277_exon1prp          54094 54191  (2650) +  MLT1G              LTR/MaLR                   1  109  (403)  
1277_exon1prp          54192 54490  (2351) C  AluSx              SINE/Alu                (12)  300      2  
1277_exon1prp          54491 54973  (1868) +  MLT1G              LTR/MaLR                 109  512    (0)  
1277_exon1prp          55687 56009   (832) C  AluJo              SINE/Alu                (11)  301      1  
1277_exon1prp          56773 56794    (47) +  GC_rich            Low_complexity             1   22    (0)  
1598_TATA_prnd_mRNA     1336  1511  (3374) +  MIR                SINE/MIR                  62  248   (14)  
1598_TATA_prnd_mRNA     1793  2059  (2826) +  AluJo              SINE/Alu                   6  290   (19)  
1598_TATA_prnd_mRNA     2060  2112  (2773) +  (TG)n              Simple_repeat              2   54    (0)  
1598_TATA_prnd_mRNA     3265  3346  (1539) +  L2                 LINE/L2                 3175 3262   (10)  
1598_TATA_prnd_mRNA     3358  3539  (1346) C  MIR                SINE/MIR                (37)  225     21  
1598_TATA_prnd_mRNA     3856  4137   (748) C  AluSq              SINE/Alu                 (4)  309     27  
795_end_prnpMrna_TATA    343   409 (19902) C  Tigger2a           DNA/MER2_type            (2)  432    365  
795_end_prnpMrna_TATA    410   438 (19873) +  (TTTTG)n           Simple_repeat              1   31    (0)  
795_end_prnpMrna_TATA    439   717 (19594) C  AluSx              SINE/Alu                (31)  281      1  
795_end_prnpMrna_TATA    718  1090 (19221) C  Tigger2a           DNA/MER2_type           (69)  365      1  
795_end_prnpMrna_TATA   1367  1817 (18494) +  MER88              LTR/MER73-group            1  460    (0)  
795_end_prnpMrna_TATA   1925  2230 (18081) +  AluSq              SINE/Alu                   1  304    (9)  
795_end_prnpMrna_TATA   2998  3180 (17131) C  MER5A              DNA/MER1_type            (4)  185      1  
795_end_prnpMrna_TATA   3453  3613 (16698) C  MER5A              DNA/MER1_type            (0)  189      4  
795_end_prnpMrna_TATA   4000  4131 (16180) C  FLAM_A             SINE/Alu                 (1)  132      1  
795_end_prnpMrna_TATA   4544  4841 (15470) +  MER115             DNA                      134  455  (238)  
795_end_prnpMrna_TATA   5372  5497 (14814) +  AluJo/FRAM         SINE/Alu                 169  294   (18)  
795_end_prnpMrna_TATA   5503  5781 (14530) +  AluSg              SINE/Alu                   1  279   (31)  
795_end_prnpMrna_TATA   5782  5813 (14498) +  (CAAAA)n           Simple_repeat              1   32    (0)  
795_end_prnpMrna_TATA   6029  6316 (13995) +  AluY               SINE/Alu                   3  290   (21)  
795_end_prnpMrna_TATA   6332  6460 (13851) +  HAL1B              LINE/Other              1139 1260   (46)  
795_end_prnpMrna_TATA   6470  6930 (13381) +  MLT2CB             LTR/Retroviral            18  501    (0)  
795_end_prnpMrna_TATA   7290  7360 (12951) +  MER91C             DNA/MER1_type?            49  119   (21)  
795_end_prnpMrna_TATA   7598  7630 (12681) +  (A)n               Simple_repeat              1   33    (0)  
795_end_prnpMrna_TATA   7770  8149 (12162) C  MLT1B              LTR/MaLR                 (0)  390      1  
795_end_prnpMrna_TATA   9350  9653 (10658) C  AluJb              SINE/Alu                 (6)  306      1  
795_end_prnpMrna_TATA   9967 10325  (9986) +  L1MB6              LINE/L1                 5798 6156   (19)  
795_end_prnpMrna_TATA  10357 10667  (9644) C  AluSx              SINE/Alu                 (0)  312      1  
795_end_prnpMrna_TATA  11318 11717  (8594) C  L1PA8              LINE/L1                  (2) 6161   5759  
795_end_prnpMrna_TATA  12803 13102  (7209) +  AluSx              SINE/Alu                   1  300   (12)  
795_end_prnpMrna_TATA  13123 13362  (6949) C  Charlie4a          DNA/MER1_type           (37)  471    206  
795_end_prnpMrna_TATA  13363 13472  (6839) C  MER81              DNA                      (0)  114      2  
795_end_prnpMrna_TATA  13475 13525  (6786) C  MER81              DNA                     (63)   51      1  
795_end_prnpMrna_TATA  13527 13744  (6567) C  Charlie4a          DNA/MER1_type          (296)  212      1  
795_end_prnpMrna_TATA  13939 14238  (6073) +  L1ME               LINE/L1                 5509 5819  (345)  
795_end_prnpMrna_TATA  14255 14547  (5764) +  AluJo              SINE/Alu                   1  294   (18)  
795_end_prnpMrna_TATA  14550 14657  (5654) C  L1PA14             LINE/L1                (209) 5940   5832  
795_end_prnpMrna_TATA  14762 14856  (5455) +  L1ME3A             LINE/L1                 6062 6158    (5)  
795_end_prnpMrna_TATA  16458 16534  (3777) C  L2                 LINE/L2                 (35) 3237   3158  
795_end_prnpMrna_TATA  16707 17185  (3126) +  L1ME3A             LINE/L1                 5594 6102   (61)  
795_end_prnpMrna_TATA  17186 17480  (2831) +  AluY               SINE/Alu                   1  295   (16)  
795_end_prnpMrna_TATA  17481 17545  (2766) +  L1ME3A             LINE/L1                 6102 6163    (0)  
795_end_prnpMrna_TATA  17563 17926  (2385) C  MLT1A1             LTR/MaLR                 (0)  365      1  
795_end_prnpMrna_TATA  19859 19883   (428) +  GC_rich            Low_complexity             1   25    (0)  
post3'                   913  1206 (50018) C  AluSx              SINE/Alu                (20)  292      1  
post3'                  1443  1574 (49650) C  L1ME1              LINE/L1                (343) 5825   5686  
post3'                  3418  3723 (47501) C  AluSp              SINE/Alu                 (0)  313      7  
post3'                  4926  5017 (46207) C  L2                 LINE/L2                  (5) 3267   3172  
post3'                  5108  5250 (45974) +  MER63A             DNA/MER1_type             53  204    (6)  
post3'                  5515  5637 (45587) C  L2                 LINE/L2                 (11) 3302   3178  
post3'                  5983  6078 (45146) C  MER5A              DNA/MER1_type           (80)  109     11  
post3'                  6251  6542 (44682) +  AluJb              SINE/Alu                   1  295   (17)  
post3'                  7220  7431 (43793) +  MER20              DNA/MER1_type              1  219    (0)  
post3'                  7779  7799 (43425) +  AT_rich            Low_complexity             1   21    (0)  
post3'                  8123  8433 (42791) C  AluSx              SINE/Alu                 (2)  310      1  
post3'                  8434  8726 (42498) C  AluJb              SINE/Alu                (19)  293      1  
post3'                  9418  9708 (41516) +  AluJo              SINE/Alu                   7  297   (15)  
post3'                 10358 10618 (40606) C  ALR6/alpha         Satellite/centromeric   (79)  265      4  
post3'                 10651 10831 (40393) +  AluJo              SINE/Alu                  91  269   (43)  
post3'                 10868 11153 (40071) +  AluSx              SINE/Alu                   1  284   (28)  
post3'                 11155 11207 (40017) +  (TAAA)n            Simple_repeat              1   52    (0)  
post3'                 11310 11667 (39557) C  LTR16C             LTR/Retroviral           (0)  387     26  
post3'                 12700 12999 (38225) C  AluSc              SINE/Alu                 (7)  302      1  
post3'                 13539 13706 (37518) +  L2                 LINE/L2                 2965 3165  (107)  
post3'                 14531 14758 (36466) C  AluSg/x            SINE/Alu                 (1)  311     84  
post3'                 14792 14958 (36266) C  FAM                SINE/Alu                 (8)  167      1  
post3'                 15596 16014 (35210) +  MSTB               LTR/MaLR                   1  422    (4)  
post3'                 16067 17239 (33985) +  L1MC3              LINE/L1                 6371 7485  (255)  
post3'                 17239 17406 (33818) +  L1MC2              LINE/L1                 6155 6330    (0) *
post3'                 17416 17704 (33520) +  AluJb              SINE/Alu                   1  300   (12)  
post3'                 17733 17863 (33361) +  L1MB3              LINE/L1                 5891 6022  (163)  
post3'                 17864 17919 (33305) +  Alu                SINE/Alu                 242  297    (5)  
post3'                 17923 18261 (32963) +  L1M2               LINE/L1                 5194 5543  (681)  
post3'                 18375 19107 (32117) +  L1ME3              LINE/L1                 5361 6132   (32)  
post3'                 19249 19631 (31593) +  L1MB7              LINE/L1                 5754 6151   (24)  
post3'                 19633 19690 (31534) +  (TTTTA)n           Simple_repeat              4   61    (0)  
post3'                 20166 20443 (30781) +  L1ME1              LINE/L1                 5627 5920  (248)  
post3'                 20449 20877 (30347) +  MLT1C              LTR/MaLR                   9  462    (4)  
post3'                 21177 21311 (29913) C  FLAM_C             SINE/Alu                 (0)  133      1  
post3'                 21434 21499 (29725) C  LTR16C             LTR/Retroviral           (6)  381    317  
post3'                 22772 22947 (28277) +  MIR                SINE/MIR                  65  259    (3)  
post3'                 23165 23589 (27635) +  L2                 LINE/L2                 2589 3082  (231)  
post3'                 23632 24020 (27204) C  MLT1H              LTR/MaLR                 (0)  547     56  
post3'                 24159 24210 (27014) +  L2                 LINE/L2                 3243 3294   (19)  
post3'                 24167 24220 (27004) +  MIR                SINE/MIR                 206  261    (1) *
post3'                 24223 24386 (26838) C  FRAM               SINE/Alu                 (6)  170      7  
post3'                 24504 24530 (26694) +  AT_rich            Low_complexity             1   27    (0)  
post3'                 24531 24801 (26423) C  AluSq              SINE/Alu                (39)  274      3  
post3'                 25388 25479 (25745) C  Charlie7           DNA/MER1_type            (0) 2616   2513  
post3'                 25551 25575 (25649) +  (T)n               Simple_repeat              1   25    (0)  
post3'                 25721 25948 (25276) +  AluJ/FLAM          SINE/Alu                   4  306    (6)  
post3'                 25995 26029 (25195) C  L1PB1              LINE/L1                 (17) 6138   6105  
post3'                 26030 26334 (24890) C  AluSg              SINE/Alu                 (5)  305      1  
post3'                 26335 28244 (22980) C  L1PB1              LINE/L1                 (50) 6105   4232  
post3'                 28245 28326 (22898) +  (TTTC)n            Simple_repeat              1   81    (0)  
post3'                 28330 28614 (22610) C  AluSq              SINE/Alu                (29)  284      1  
post3'                 28615 30541 (20683) C  L1PB1              LINE/L1               (1923) 4232   2342  
post3'                 30542 30861 (20363) C  AluSx              SINE/Alu                 (2)  310      1  
post3'                 30862 32417 (18807) C  L1PB1              LINE/L1               (3813) 2342   -279  
post3'                 32418 33434 (17790) C  L1PB1              LINE/L1               (5625) -425  -1537  
post3'                 33472 33606 (17618) C  MIR                SINE/MIR                (74)  188     36  
post3'                 34709 35409 (15815) C  MER44C             DNA/MER2_type           (12)  716      4  
post3'                 35412 35520 (15704) +  L1ME2              LINE/L1                 6054 6165    (3)  
post3'                 35563 35717 (15507) C  L2                 LINE/L2                (246) 3067   2931  
post3'                 35959 36248 (14976) +  AluJb              SINE/Alu                   1  295   (16)  
post3'                 36249 36287 (14937) +  (TAAA)n            Simple_repeat              2   40    (0)  
post3'                 36377 36507 (14717) C  FLAM_A             SINE/Alu                 (2)  131      1  
post3'                 37622 37779 (13445) +  MLT1D              LTR/MaLR                   1  144  (361)  
post3'                 37973 38322 (12902) +  LTR16C             LTR/Retroviral             2  339   (48)  
post3'                 38323 38632 (12592) C  AluSg              SINE/Alu                 (0)  310      1  
post3'                 38633 38679 (12545) +  LTR16C             LTR/Retroviral           339  386    (1)  
post3'                 39308 39607 (11617) C  AluSx              SINE/Alu                (12)  300      1  
post3'                 39766 39881 (11343) C  MER102             DNA/MER1_type          (102)  239    131  
post3'                 40055 40570 (10654) +  HAL1               LINE/Other              1161 1772    (5)  
post3'                 40774 41079 (10145) +  AluSq              SINE/Alu                   1  305    (8)  
post3'                 41933 42228  (8996) C  AluY               SINE/Alu                (15)  296      1  
post3'                 43267 43361  (7863) +  MER5A              DNA/MER1_type             10  105   (84)  
post3'                 43967 44002  (7222) C  L2                 LINE/L2                  (0) 3313   3284  
post3'                 44003 44664  (6560) C  MER4A              LTR/MER4-group           (0)  660      1  
post3'                 44665 45007  (6217) C  L2                 LINE/L2                 (29) 3284   2875  
post3'                 45044 45542  (5682) +  LOR1a              LTR/MER4-group             1  497    (0)  
post3'                 46186 46543  (4681) C  THE1B              LTR/MaLR                 (0)  364      1  
post3'                 46565 46673  (4551) +  MER5B              DNA/MER1_type             48  155   (23)  
post3'                 46705 46843  (4381) +  MER5A              DNA/MER1_type             28  166   (23)  
post3'                 47799 47831  (3393) +  (TTTTG)n           Simple_repeat              5   38    (0)  
post3'                 48066 48340  (2884) C  AluSx              SINE/Alu                (36)  276      1  
post3'                 48445 48576  (2648) C  L1M4               LINE/L1               (1526) 4620   4472  
post3'                 48840 49628  (1596) C  LTR1               LTR/Retroviral           (1)  784      1  
post3'                 50030 50197  (1027) C  L1                 LINE/L1                (852) 5294   5134  
post3'                 50198 50491   (733) C  AluJb              SINE/Alu                (18)  294      1  
post3'                 50492 50539   (685) C  L1                 LINE/L1               (1012) 5134   5086  
post3'                 50540 50706   (518) +  L1MA9              LINE/L1                 6021 6137  (171)  
post3'                 50707 50742   (482) +  (CA)n              Simple_repeat              1   36    (0)  
post3'                 50743 50894   (330) +  L1MA9              LINE/L1                 6137 6307    (1)  
post3'                 50902 50927   (297) +  AT_rich            Low_complexity             1   26    (0)  
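
A rough tally of how much of the region each repeat class occupies can be pulled straight from a listing like the one above. The sketch below is only illustrative: the function name bp_per_class is made up here, only four rows are copied in, and it assumes the column order shown (sequence name, start, end, bases left, strand, repeat name, class/family).

```python
from collections import defaultdict

# Four rows copied from the listing above; a full run would read every row.
ROWS = """\
post3'                  9418  9708 (41516) +  AluJo              SINE/Alu                   7  297   (15)
post3'                 10358 10618 (40606) C  ALR6/alpha         Satellite/centromeric   (79)  265      4
post3'                 10651 10831 (40393) +  AluJo              SINE/Alu                  91  269   (43)
post3'                 16067 17239 (33985) +  L1MC3              LINE/L1                 6371 7485  (255)
"""

def bp_per_class(text):
    """Sum the span (end - start + 1) of each hit, grouped by class/family.

    Hypothetical helper. Overlapping hits are counted twice, so the totals
    are an upper bound on the bases actually covered.
    """
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:          # skip blank or truncated lines
            continue
        start, end, repeat_class = int(fields[1]), int(fields[2]), fields[6]
        totals[repeat_class] += end - start + 1
    return dict(totals)

summary = bp_per_class(ROWS)
for repeat_class, bp in sorted(summary.items()):
    print(f"{repeat_class:24s}{bp:6d} bp")
```

Splitting on whitespace rather than fixed column positions keeps the sketch tolerant of the trailing asterisk that marks overlapping hits.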

Have any genes been missed?

17 Sep 99 webmaster research
How many genes occur in the 150,000 bp about the prion gene? So far, 3 additional genes have been found and described: the functional prion doppel or ghost gene, an old pseudogene for isoprenyl delta isomerase, and a recent pseudogene fragment for ribosomal 40S protein S4 X isoform. This gives two active genes totalling 433 amino acids within a stretch of DNA with the potential to code for 50,000 amino acids, so a density of less than 1%. Even allowing for promoter and poly A signals, the percentage would not be too much better at the DNA level, say 2%.
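
The density estimate is simple arithmetic and can be checked in a few lines. The figures are the ones quoted in the paragraph above; the variable names are just labels chosen for this sketch.

```python
# Back-of-envelope coding density for the region around the prion gene.
REGION_BP = 150_000      # region size quoted in the text
CODING_AA = 433          # combined length of the two active genes, from the text
BP_PER_CODON = 3

potential_aa = REGION_BP // BP_PER_CODON   # codons the region could hold in one frame
coding_bp = CODING_AA * BP_PER_CODON       # bases actually spent on coding sequence
density = CODING_AA / potential_aa         # fraction of the potential that is used

print(f"potential amino acids: {potential_aa}")
print(f"coding bases:          {coding_bp}")
print(f"coding density:        {density:.2%}")
```

The last line prints 0.87%, which is the "less than 1%" figure; the 2% guess for promoters and poly A signals is an allowance on top of that, not something this calculation derives.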

Is it possible that other genes occur in this region but have not been detected? The main method for finding genes is to run a gene-finding program such as GenScanW and investigate suggested candidate proteins with Blastx on dbEST or Blastp on GenBank proteins. GenScanW indeed suggests the 4 proteins mentioned above (along with an equal number of intermingled false positives).

In the case of the 150,000 bp fragment of chr 20, GenScanW finds psRPS4X (though not the part after the frame shift), the prion protein with a long non-existent leader peptide, the ghost prion protein (again with a bogus leader peptide), and the psIDI (also with an inappropriate leader). The residual predicted proteins are shown below; they are not supported by either existing ESTs or Blastx matches of statistical significance. Therefore they are treated as artefactual for the time being.

>protein 2 81_aa 10329-10393, 49490-49543, 63380-63493, 69521-70292 plus. no ESTS, no Blastp
MRLESGVNGSVMIRKQISSRARTFYEIEGGSTLQVIQLQEHDPPPVCPISVDGITAISLVMEVPDPGNSHSSIHSTSRAVI

>protein 3 39_aa 86310-86315, 91833-91932, 94781-95322 plus. no ESTS, no Blastp
MVGVVVQVNACPLFELNLPTEAVSKDCFAVQGRPLGSDA

>protein 4 50_aa 119215-119065, 106535-106273 minus.  weak ESTS, no Blastp
MPWTALRHAENNGALESLGPASSGQSSSSPDVFAEVTLLSWKGVGSCALG

An independent method for looking for genes within this stretch of chromosome 20 is to remove the 4 known genes (from promoter through 3' UTR) as well as all known retrotransposons (even though these are sometimes within genes) and evaluate what remains with various Blast tools. Note GenScanW is not designed to work on masked sequences such as this and will return artefacts. Also, there are retrotransposons known but not yet implemented in RepeatMasker. Further, many genes may not be well represented in dbEST or GenBank collections.
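
The masking step described above amounts to blanking out a list of coordinate ranges before handing the remainder to Blast. A minimal sketch follows; the function name mask_intervals and the toy 20 bp sequence are invented for illustration, and coordinates are assumed to be 1-based and inclusive, as in RepeatMasker listings.

```python
def mask_intervals(seq, intervals, mask_char="N"):
    """Return seq with every 1-based inclusive (start, end) range replaced.

    Hypothetical helper: the intervals would come from the known gene
    boundaries plus the RepeatMasker hits for the contig.
    """
    chars = list(seq)
    for start, end in intervals:
        lo = max(start, 1) - 1          # convert to 0-based, clamp at sequence start
        hi = min(end, len(chars))       # clamp at sequence end
        for i in range(lo, hi):
            chars[i] = mask_char
    return "".join(chars)

toy = "ACGTACGTACGTACGTACGT"            # 20 bp stand-in for a contig
masked = mask_intervals(toy, [(3, 6), (15, 18)])
unmasked_bp = sum(1 for c in masked if c != "N")

print(masked)        # ACNNNNGTACGTACNNNNGT
print(unmasked_bp)   # 12
```

Replacing with N rather than deleting keeps the coordinates of whatever is left intact, so a Blast hit on the masked sequence can be read off against the original contig positions.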

Given these limitations, it can be said that no candidate genes exist from the beginning of the U29185 prion sequence through the end of the 01225 prion doppel contig. It is readily verified that the 5' leader and 3' tail sequences also suggest no other good gene candidates. Thus it appears that this region of chromosome 20 is indeed very sparse in active genes and that additional flanking chromosome is needed to determine the flanking active genes of prion and prion doppel.

As an example of the artefacts encountered, contig 1277 after masking shows a highly significant Blast hit on a human EST in the region 20044-20274. However, when the 38 bp of masked agreement are extended using the EST, it emerges that while the full extension is found on the contig, it is part of a MER90 LTR retrotransposon, i.e., the match was merely something that RepeatMasker imperfectly recognized.

The closest-in known markers based on mRNA (as opposed to simple repeats or random fragments) can also be explored:

-- FB25H5 [or T03153] is centromeric to prion protein. It has many high quality human EST hits and can be extended significantly in both directions by the 625 bp mRNA AA085766. However, there is no known protein corresponding to this mRNA (which may be all 3' UTR) nor has it been reached by the human genome project.

-- WI-2640 [or D20S500] and D20S482 [or G08052] are telomeric but simply represent sheared DNA. D20S895 and D20S849 are simple repetitive CA microsatellite sequences. WI-4689 [or D20S751] is more sheared DNA. These exhaust the known nearby markers.

The 1999 NCBI map shows tyrosine phosphatase and an EST SGC32955 flanking the prion gene. The first appears on dJ684O24 but this is very distant.

Now the 17 Sep 99 make-over of the Sanger Centre Blast server allows all of the Human Genome Project to be surveyed at one time, with interest restrictable to either finished or unfinished sequence or particular chromosomes or even CpG islands. As of this date, there were no pseudogenes known for prion or ghost prion anywhere else in the human genome. It would not be at all surprising to see prion gene pseudogenes given a reasonable level of mRNA formation and the general ubiquity of pseudogenes.

Generation of monoclonal antibodies against prion proteins with an unconventional nucleic acid-based immunization strategy.

J Biotechnol 1999 Aug 20;73(2-3):119-29
Krasemann S, Jurgens T, Bodemer W
 
Prion diseases belong to a group of neurodegenerative disorders affecting humans and animals. The human diseases include kuru, Creutzfeldt-Jakob disease (CJD), Gerstmann-Straussler-Scheinker syndrome (GSS) and fatal familial insomnia (FFI). The pathomechanisms of the prion diseases are not yet understood. Therefore, monoclonal antibodies (mAbs) would provide valuable tools in diagnostics as well as in basic research of these diseases.

In contrast to conventional strategies we have developed an immunization protocol based on nucleic acid injection into non tolerant PrP0/0-mice. DNA or RNA coding for different human prion proteins including the mutated sequences associated with CJD, GSS and FFI were injected into muscle tissue. The mice were primarily inoculated with DNA-plasmids encoding PRNP and boosted either with DNA, RNA or recombinant Semliki Forest virus (SFV) particles expressing PRNP.

After hybridoma preparation, different mAbs against prion proteins were obtained and their binding behaviour was analysed by peptide-ELISA, Western blot, immunofluorescence and immunoprecipitation. Our mAbs are directed against four different linear epitopes and may also recognize discontinuous regions of the native prion protein.

It could, therefore, be demonstrated that immunization of non tolerant mice with DNA and live attenuated SF virus is a valuable means to induce a broad immune response leading eventually to the generation of a panel of mAbs for basic science as well as for diagnostics.

Comment (webmaster): This approach could also work for the ghost prion doppel protein. Here the need will be cytological localization and discrimination from prion protein. Most, but not all, of the homologous regions of these proteins are interior, so it should not be difficult to avoid cross-reaction.

Unusual repeat in lemur prion

GenBank 13 Sep 99
Gilch and Schatzl have made a wise choice here in extending their sequencing studies. Lemurs represented a significant gap in the primate-rodent ensemble though 3 species would be needed to avoid long branch noise. (Note also new rodent sequences as well as a partial guinea pig have been posted recently.) Lemurs are methionine at codon 129.

The GenBank entry does not say how many individual animals were sequenced, so we do not know if this is an allele or the norm. Alleles have been found in numerous species.

Recall ruffed lemur was one of those studied by Noelle Bons.

Varecia variegata variegata   1985  Mulhouse zoo       1990 Montpellier zoo
Varecia variegata variegata   1993  xxx                      1994 Montpellier zoo

A three-repeat allele was found in goats (plus W102G) 
PQGGGGWGQ
PHGGGWGQ
PHGGGGWGQ

The posted lemur sequence is a fragment. It shows 3 repeats but does not show what comes earlier. This is unfortunate because hairpin C requires an earlier stretch. But one sees already a problem with the glutamine in second position in the first repeat -- that suggests strongly that it is the first repeat and that the whole section is only 3 repeats long. There is no compensatory W102G in lemur. Otherwise the sequence is normal. Lemurs are a good outgroup to the other primates but most of the residues where it differs are rapid oscillators within small subsets of allowable amino acids.

Unusual prion protein octarepeat structure of the highly BSE-susceptible lemur monkey

J Virology (unpublished)
Gilch,S. and Schatzl,H.M
 Varecia variegata variegata prion protein
 AF177293      373 bp 10-SEP-1999 peripheral blood lymphocyte
  Primates; Strepsirhini; Lemuridae; Varecia.

PQGGGWGQ
PHGGGWGQ
PHGGGWGQ
GGGsHgQWNKPSKPKTNM
KHVAGAAAAGAVVGGLGG
YMLGSAMSRPLIHFGNDYEDRYYRENMYRYPNQVYYkPVDQYSNQNs
FVHDCVNITIKQHTVTT

        1 ccccagggcg gcggctgggg acaaccccat gggggtggct ggggacagcc tcatggtggt
       61 ggctggggtc aaggaggtgg ctctcacggt cagtggaaca agcccagtaa accaaaaacc
      121 aacatgaagc acgtggcagg tgccgcagcg gctggggcag tggtgggtgg ccttggtggc
      181 tacatgctag ggagtgccat gagcaggccc ctcatacatt ttggcaatga ctatgaggac
      241 cgttactatc gcgaaaacat gtaccgttac cccaaccaag tgtactacaa accggtggat
      301 cagtacagca accagaacag cttcgtgcac gactgcgtca atatcaccat caagcagcac
      361 acggtcacca cca

ccccagggcggcggctggggacaaccccatgggggtggctggggacagcctcatggtggt
 P  Q  G  G  G  W  G  Q  P  H  G  G  G  W  G  Q  P  H  G  G
ggctggggtcaaggaggtggctctcacggtcagtggaacaagcccagtaaaccaaaaacc
 G  W  G  Q  G  G  G  S  H  G  Q  W  N  K  P  S  K  P  K  T
aacatgaagcacgtggcaggtgccgcagcggctggggcagtggtgggtggccttggtggc
 N  M  K  H  V  A  G  A  A  A  A  G  A  V  V  G  G  L  G  G
tacatgctagggagtgccatgagcaggcccctcatacattttggcaatgactatgaggac
 Y  M  L  G  S  A  M  S  R  P  L  I  H  F  G  N  D  Y  E  D
cgttactatcgcgaaaacatgtaccgttaccccaaccaagtgtactacaaaccggtggat
 R  Y  Y  R  E  N  M  Y  R  Y  P  N  Q  V  Y  Y  K  P  V  D
cagtacagcaaccagaacagcttcgtgcacgactgcgtcaatatcaccatcaagcagcac
 Q  Y  S  N  Q  N  S  F  V  H  D  C  V  N  I  T  I  K  Q  H
acggtcaccacca
 T  V  T  T

ccc cag ggc ggc ggc tgg gga caa
ccc cat ggg ggt ggc tgg gga cag
cct cat ggt ggt ggc tgg ggt caa
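
The translation and repeat count above can be reproduced with a short script. This is a sketch, not the method used to prepare the listing: the codon table is the standard genetic code in NCBI's TCAG ordering, the sequence is the 373 bp AF177293 fragment as posted above, and the octarepeat pattern P[HQ]G{3,4}WGQ is an assumption chosen to match the repeats shown on this page.

```python
import re
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The 373 bp lemur fragment as posted in the GenBank entry above.
AF177293 = (
    "ccccagggcggcggctggggacaaccccatgggggtggctggggacagcctcatggtggt"
    "ggctggggtcaaggaggtggctctcacggtcagtggaacaagcccagtaaaccaaaaacc"
    "aacatgaagcacgtggcaggtgccgcagcggctggggcagtggtgggtggccttggtggc"
    "tacatgctagggagtgccatgagcaggcccctcatacattttggcaatgactatgaggac"
    "cgttactatcgcgaaaacatgtaccgttaccccaaccaagtgtactacaaaccggtggat"
    "cagtacagcaaccagaacagcttcgtgcacgactgcgtcaatatcaccatcaagcagcac"
    "acggtcaccacca"
)

def translate(dna):
    """Translate in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

protein = translate(AF177293)

# Assumed repeat pattern: P, then H or Q, then 3-4 glycines, then WGQ.
OCTAREPEAT = re.compile(r"P[HQ]G{3,4}WGQ")
repeats = OCTAREPEAT.findall(protein)

print(protein)
print(len(repeats), repeats)
```

The fragment is one base longer than a whole number of codons, so the final a is dropped and 124 residues come out. The pattern allows 3 or 4 glycines so that it also picks up the nine-residue goat repeats listed earlier.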

Prion protein glycotype analysis in familial and sporadic CJD patients.

Brain Res Bull 1999 Aug;49(6):429-33
Cardone F, Liu QG, Petraroli R, Ladogana A, D'Alessandro M, Arpino C, Di Bari M, Macchi G, Pocchiari M
 
CJD and other transmissible spongiform encephalopathies (TSEs) are characterised by the accumulation of a pathological conformer of PrP, named PrPsc. Molecular weight and glycosylation of the protease-resistant core of PrPsc (PrP27-30) are heterogeneous in different forms of TSEs. We analysed PrP27-30 glycotypes in a large number of TSE-affected patients: 50 sporadic CJD (sCJD), 1 iatrogenic CJD, 1 Gerstmann-Straussler-Scheinker syndrome (GSS) with the Pro102Leu mutation of PrP, 3 familial CJD (fCJD) with the Glu200Lys mutation and, for the first time, 7 fCJD with the V210I mutation. All patients were screened for the polymorphic codon 129 of the PrP gene. PrP27-30 deglycosylation and PrPsc immunohistochemistry were performed in selected cases.

We found that two PrP27-30 glycotypes (type 1A and type 2A) are produced in sCJD. Type 1A is more frequently associated with methionine than valine in position 129. Type 1A is also formed in Val210Ile fCJD. In Glu200Lys fCJD and GSS patients, we found that PrP27-30 has the same mobility of type 1 but different glycosylation ratios (type 1B). Our findings indicate that the polymorphic residue 129 of PrP has a leading role in determining the proteinase degradation site of PrPsc while mutant residues 102 or 200 influence only the glycosylation pattern.

Comment (webmaster):
It sounds like classification of CJD has taken the next step in refinement here. There is still more that could be done to subdivide, most notably distinguishing strain types on panels of mice. There is no question about molecular memory in TSE; the question is whether that memory is good enough to track back to a unique etiology.
