The structure of the prion gene for all species of mammals studied contains three exons. The first is contiguous to promoter and regulatory elements while the third contains just 10 nucleotides preceding the start codon. The second exon is transcriptionally expressed in most species, yet human mRNA reflects only exon 1 and exon 3, suggesting to earlier workers that the gene had a different structure.
28 Aug 98 webmaster
However, a region clearly related to exon 2 was identified in humans from its strong sequence homology to expressed exon 2 in other species. The question then arises, is human exon 2 cryptic [present but spliced out of mRNA] or is it only expressed in certain rare cell types or tissues or stages of development or at undetectable levels? Hamsters were also once thought to have no exon 2, much less expression, but later it emerged that splice type 1-3 swamped 1-2-3 mRNA unless astrocytes expressing it were present at high levels.
Exon 2 is changing very slowly in evolutionary time, with 6 species averaging 85% identity to a common ancestral consensus sequence some 100 million years back. Human exon 2 shows no sign of rapid change [loss of selective pressure]: the sequence changes orders of magnitude more slowly than [unselected] pseudogenes. Therefore, if human exon 2 is a molecular fossil without function, this must be a very recent development. Alternatively, exon 2 could have a cryptic function without appearing in mature mRNA. More likely, human exon 2 will turn out like hamster: preferentially expressed in some cell types under certain conditions and possibly at low -- but important -- levels in other cell types.
The mystery of exon 2 is why the sequence is so conserved when the uses of it seem minor if not obscure: how can exon 2 be so important that accepted point mutations are rare when the ORF itself can be knocked out without dramatic consequences? A similar question arises in exon 1b in sheep -- the splice junction seems unflawed but, unlike in cow, not utilized. And a conserved motif region upstream of promoter elements also begs for an explanation.
Since over-production of prion protein is a strong risk factor in animal models of CJD, it has long been speculated that polymorphisms in 5' and 3' UTR leading to higher levels of expression of prion protein could explain some fraction of sporadic CJD, nvCJD, or susceptibility to iatrogenic CJD. Polymorphisms have been reported in human at -21 (relative to exon 3) and at -600 (relative to exon 1) but not pursued. Of course, a polymorphism in a conserved region is more suggestive than one in a chaotic domain or the middle of a huge intron.
Is human exon 2 a suitable substrate for the spliceosome or does it contain some subtle or not-so-subtle variation that incapacitates it? If exon 2 is splice-competent, how is the level of participation or tissue specificity built in to the sequence or secondary structure? The issue is apparently why the donor 3' to exon 1 does not splice with the acceptor 5' of exon 2, which then forces the donor to splice with the acceptor 5' of exon 3 (a region where a polymorphism is found at the polypyrimidine boundary, G-21A).
2 Sept 98 webmaster
Alternative splicing is common in other genes. However, it typically results in domains added or dropped from the final protein product. This scenario is not applicable to exon 2: extra amino acids are not wanted upstream of a signal peptide and exon 2 is out of register in all three reading frames plus lacks an initiation codon, as seen from translation:
DS-IFFKTEQFQPCLSFPSSWRHKSSLAEPQQI
TPEYFSKLNNFSHV-AFRLPGGTNLV-LNHNR
LLNIFQN-TISAMSELSVFLEAQI-FS-TTTD

Alternative splicing, resulting in exon 123 or exon 13 splice products, has been demonstrated for hamster, mouse, rat, cow, and sheep -- in all species studied except human, which so far only shows exon 13 splicing. It is fair to say that 1-3 can be the more common variant in the other species. Whole genome projects are mass-sequencing cDNA from a wide variety of tissues; these are kept separately from GenBank in STS or EST databases that are separately searchable. However, using human exon 123 as probe (or subsets thereof), nothing is found in these databases, though mouse exon 123 and mouse exon 13 yield solid returns.
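The three-frame translation of human exon 2 quoted above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the standard genetic code, takes the human exon 2 sequence (U29185) as listed in the exon 2 alignment further down this page, and writes stop codons as '-' to match the notation used here. The names `translate_frames` and `HUMAN_EXON2` are illustrative, not from any published tool.

```python
# Standard genetic code packed in TCAG order; stop codons are written '-'
# to match the notation used on this page.
BASES = "TCAG"
AMINO = "FFLLSSSSYY--CC-WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Human exon 2 region as given for U29185 in the alignment on this page.
HUMAN_EXON2 = (
    "gactcctgaatatttttcaaaactgaacaatttcagccatgtctgagctttccgtcttcctgg"
    "aggcacaaatctagtttagctgaaccacaacagatt"
)


def translate_frames(dna):
    """Translate the forward strand in frames 1-3, dropping partial codons."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE[c] for c in codons))
    return frames


for number, protein in enumerate(translate_frames(HUMAN_EXON2), start=1):
    print(f"frame {number}: {protein}  (stops: {protein.count('-')})")
```

Every frame contains at least one stop ('-'); the only M, in frame 3, is followed by a stop 11 codons later.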
A very large number of intron/exon junctions have been determined -- a single issue of Genomics alone contains many dozens of new ones. In many cases it is known whether these are subject to alternative splicing. Specialized online software has been developed to detect valid exon-intron boundaries: human exon 2 is affirmed by comparisons and these methods. However, software cannot at this time determine tissue specificity, regulatory response, or proportions of alternative splice usage (zero here so far).
To understand the protein and RNA spliceosome requirements for exon 2 functioning, eukaryotic splicing machinery must be understood. However splicing is sufficiently complex that no help is in sight, as seen here in a very recent paper:
"Pre-mRNA splicing requires the bridging of the 5' and 3' ends of the intron involving interactions between the WW domains in the splicing factor (FBP11 and FBP21) and a proline-rich domain in the branchpoint binding protein, BBP, in spliceosomal complex A associated with U2 snRNPs colocalized with splicing factors in nuclear speckle domains. FBP21 interacts directly with the U1 snRNP protein U1C, the core snRNP proteins SmB and SmB', and the branchpoint binding protein SF1/mBBP with a role in cross-intron bridging of U1 and U2 snRNPs in the mammalian A complex suggesting a network of interactions between specific splicing factors bound to the 5' splice site, SF1/BBP bound to the branch site, and U2AF bound to the 3' splice site in the spliceosomal complex E. FBP11 protein binds the splicing factor SF1/mBBP; FBP21 also interacts with SF1/mBBP and binds three other splicing factors, SmB, SmB', and U1C. FBP21 corresponds to a spliceosome-associated protein that is detected in the spliceosomal complex A and present in spliceosomes assembled on different pre-mRNAs associated with U2 snRNPs."
From the sequence alignment below, one sees an extraordinary 75% average pairwise sequence identity and fully half of all bases completely invariant. Splicing donor/acceptor sites cannot account for this well-conserved sequence because these typically consist of just 4 nucleotides, AG-GT, belonging to the flanking introns. Note that exon 2 is almost as well conserved as the protein coding region itself.
28 Aug 98 webmaster
Knowledge of the mammalian prion gene is bounded by the marsupial and chicken 5' UTR sequences. These are sequenced only in the region of exon 3; at least one splice donor must lie upstream. Both species have uninterrupted ORFs. Marsupial still has 10 bp of exon preceding the start codon with residual homology to placental mammals; chicken has lost all visible similarities and has only 2 bp in its exon 3 prior to the ATG of methionine.
It is possible to date recent 'growth' of human prion introns using the Alu elements, which are primate-only and thoroughly studied. Other mobile elements are shared with counterparts in the same position in sheep and mouse, suggesting their insertion predates the divergence of these species.
Exon 3 is different. The 10 bp upstream of the coding sequence has both the intron-exon splice junction to preserve as well as initiation of ribosomal translation. The latter region consensus sequence was described by Kozak (Mamm.Genome (1996) 7:563-574) as (gcc) gcc a/gcc AUG G, with residues -3 and +4 especially significant for optimal ribosome binding.
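The -3 position of the Kozak context can be checked directly against the exon 3 UTR sequences tabulated later on this page. The sketch below is illustrative only: it assumes the usual convention that -1 is the base immediately before the A of AUG (there is no position 0), tests only whether -3 is a purine, and leaves +4 alone because the table stops at the ATG. The dictionary holds a hand-picked subset of the table rows, and `kozak_minus3` is a hypothetical helper name.

```python
PURINES = {"a", "g"}

# Exon 3 5' UTR (bases between the splice acceptor and the ATG),
# copied from the intron 2 / exon 3 table on this page.
EXON3_UTR = {
    "Homo sapiens U29185": "agcagtcatt",
    "Mus musculus U29186": "atcagtcatc",
    "Bos taurus D26151": "ataagtcatc",
    "Oryctolagus cuniculus U28334": "gtcagctgtc",
    "Gallus gallus M95404": "cc",
}


def kozak_minus3(utr):
    """Return the base at -3 relative to the A of ATG, or None if the
    exon 3 UTR is shorter than 3 bases (the base then lies in another exon)."""
    return utr[-3].lower() if len(utr) >= 3 else None


for name, utr in EXON3_UTR.items():
    base = kozak_minus3(utr)
    if base is None:
        verdict = "UTR too short; -3 comes from the upstream exon"
    else:
        verdict = "purine" if base in PURINES else "pyrimidine"
    print(f"{name:32s} -3 = {base}  ({verdict})")
```

For chicken, whose exon 3 carries only 2 bp before the ATG, the -3 base of the mature mRNA is contributed by the upstream exon, which is not sequenced.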
The reported human prion sequence, U29185, contains 12,623 bp upstream from exon 1 -- does this contain the 3' end of another human gene? If so, it might shed light on the general area of metabolic function of the prion gene, even though eukaryotic genes are rarely organized into operons.
GenBank searches for genes flanking the prion gene 4 Sept 98 webmaster
This region, as the GenBank entry shows, contains many retrotransposons such as Alu, but these have sufficient gaps to accommodate exons and ORFs for another gene. A Blast search on 4 Sept 98 against 808,000,000 database letters for all of the region upstream of exon 1 failed to turn up any homologies to coding regions of other genes, though transposons and mobile elements are represented elsewhere in thousands of places in the genome: 42,079 significant hits were found.
Similarly, a search of chromosome 20pter-12 for neighbors of PrnP GDB:120720 shows CHGB Hs.2281 SCG1 chromogranin B (secretogranin 1 GDB:118770), PCNA Hs.78996 proliferating cell nuclear antigen GDB:120261, and PDYN prodynorphin GDB:120269 as known genes in this region (only 135 genes, 5 pseudogenes, and 1 unknown gene have been identified on the whole chromosome 20 thus far; the human genome has 7436 known individual genes). However, precise placement of these and other genes relative to the prion gene is unknown.
The gene responsible for spongiform degeneracy in the zitter rat has still not been pinned down though it maps close to the prion gene on chromosome 3. Exon 2 was sequenced and found normal in zitter rats; enhanced mRNA was not found.
CENPB, CHGB, AVP, AGS, OXT, PCNA, PDYN, PRNP, BMP2, RA, FKBP1, CSNK2A1, CDC25B, ADRA1D, F15, NBIA1, NDUFA7, SN, PLCB4, 6175920, SOX22, P3, GNRH2, HMCS.
These are the relevant Medline hits from 22 papers containing 'prion AND exon.' An important additional reference has only appeared on GenBank and in the Erice symposium:
Prions and Brain Diseases in Animals and Humans, ed. D. Morrison NATO ISI Series Plenum Press ISBN 0-306-45825-X ... August 19-23, 1996 Erice workshop Large-scale sequencing of human, mouse, and sheep prion protein genes, pg 59-76 Lee,I.Y., Westaway,D., Smit,A.F., Cooper,C., Yao,H., Prusiner,S.B. and Hood,L. GenBank entries U29185, U67922, U29186 corresponding to this article refer to unpublished item: Structure and Organization of Chromosomal Regions Carrying the Mammalian Prion Gene from Three Species
A region strongly homologous to exon 2 in other species is identified in humans. The 5' half is more conserved than the 3' region. Exon 2 is annotated at GenBank. All mammalian species should use 123 exon nomenclature whether or not expression has been seen for exon 2. Figures 1-7 provide important global registration of sequences from mouse, human, and sheep; CpG islands, retrotransposon positions, and alignments. Figure 7A finds a non-maximal registration of these 3 species within intron 1 extended below (webmaster) to all 6 species. Note that sequence numbering in the book is not concordant with GenBank numbering, eg, human 14408 in figure 7A corresponds to 14775 at GenBank.
Horiuchi M, Ishiguro N, Nagasawa H, Toyoda Y, Shinagawa M Biochem Biophys Res Commun 1997 Apr 28;233(3):650-654
Cattle use exon 1a23 and exon 1b23 equally (except in spleen, all exon 1a23 plus an uncharacterized minor product not 1b23). Exon 1a and 1b have identical start points but 1b contains 53+115=168 bp read-through to an alternate splice donor. Translation efficiency was the same. Usage of exons 2 and 3 was identical for the two mRNA species. This is the first case of tissue-specific alternative splicing for exon 1. Evidently spliceosomes have significant species and tissue differences; I haven't seen this investigated trans-genetically. Sheep were not found to use this distal splice donor despite a very similar sequence to the bovine 1b splice junction and to the splice consensus (C/A)AG--gt(a/g)agt. Adult cows had 5x the prion mRNA as an 8 month fetus; cows and sheep had reversed abundance in kidney and spleen.
Sheep, but not rodents and human, are 88% identical over this region but identical around the splice junction. No tissue has been found in sheep that expresses its exon 1b though the CpG island skew might be seen as supporting it. This feature of sheep prion gene is not annotated at GenBank. It is one thing to decide on what ORF to use to breed scrapie-proof sheep; it is quite another to decide on the 5' UTR sequence that will drive it.
Data for ragged transcription starts is shown by frequency in figure 1; the top line for 1a, the bottom for 1b:
...............4..5......1........12
..1.....1.12..10..3......1..1...2.11
TTACCCGCCCTAGTTGCCAGTCGCTGACAGCCGCAGA
Inoue S, Tanaka M, Horiuchi M, Ishiguro N, Shinagawa M J Vet Med Sci 1997 Mar;59(3):175-183
We cloned the part of the bovine PrP gene which contains the 5'-flanking region, exon 1, exon 2 and intron 1 to analyze its promoter region. The 5' non-coding region of the bovine PrP gene consisted of three exons and two introns, and its organization was similar to that of the mouse, rat and sheep PrP genes. The 5'-flanking region of the bovine PrP gene from the transcription start site to nucleotide position -88 was (G + C)-rich (78%) and contained three potential binding sites for the transcription factor Sp1, but no CCAAT-box or TATA-box. [They took CCGCCC = Sp-1 and CCCCGGGC = AP-2 (inactive). -88 to -30 seemed to be key for promoter activity, as tested with CAT.]
This region showed high homology (89%) with that of the sheep PrP gene, but relatively low homology (approximately 46-62%) with the same region of the mouse, rat, hamster and human PrP genes. The position from -88 to -30 within the 5'-flanking region of the bovine PrP gene showed major promoter activity. However, this region was able to function properly only in collaboration with the region at +123 to +891 of intron 1 of the bovine PrP gene. [This fits with CpG island overhang. Nine tissues were studied, with brain, spleen, adrenal glands and kidney high but lymph nodes and skeletal muscles low.]
August 1998 TSE meeting in Iceland; posters P10 and P11 and talk T56 from the Goldmann-Hunter group addressed control regions in sheep.
A 524 bp sheep promoter fragment was studied with a reporter gene and a series of deletions. A mutation G to T at -96 knocks out the single AP-2 and conveniently a SmaI restriction site. A single SP-1 was found at -48. They are now looking at genotype and breed promoter variations. [These numbers do not work on GenBank sheep sequences, possibly because of different assumed start points -- it might be better to number from the end of exon 1.]
They found the 3' UTR had tissue-specific use of alternative polyadenylation sites that differed significantly in their translation efficiency and regulation during development. They note further, "Additional RNA processing in other positions of the untranslated regions has been found resulting in a complex system of post-transcriptional modulation of PrP expression ['ruminants only', implying cows and non-ruminants were tested.].... A parallel study of PrP polymorphisms revealed a high variability in the 3' UTR within and between breeds..." This was investigated further with 3' UTR deletions. The mRNAs are given as 4.6 kb and 2.1 kb in P10 which are said to differ by 2.3 kb in T56. Online poly-A software can probably find the [unreported] alternative site.
Hum Mutat 1996;7(3):280-281 no abstract GenBank staff-entry S82948 Palmer MS, van Leeven RH, Mahal SP, Campbell TA, Humphreys CB, Collinge J
This paper, on its face, is an obscure technical note that calls attention to a widely used faulty sequencing primer, PDG-45. This came about because a paper on GSS by Hsiao (1989) Nature 338:342 inadvertently uncovered an uncommon allele at position -21 (relative to the start of exon 3) that was mistaken for wildtype and used by others for diagnostic sequencing in a clinical decision setting. Because the G to A at -21 occurs early on in the primer, it causes certain alleles not to be amplified. They sequenced 62 controls, finding seven A-21 among the 124 alleles for a frequency of 5.6%.
Now it gets more curious. They sequenced two [related] A117V cases, finding G-21A GCA117GTA and -129V for the haplotypes, ie, the polymorphism at -21 was on the disease allele. This was also the result in the original GSS case. The seven normal A-21 all co-occurred with GCA117GCG, the silent A117A polymorphism called PvuII negative in restriction language (10% of European population). In other words, the -21 intron polymorphism has a curious correlation with codon 117.
GenBank staff-entry S82948 thus gives the wildtype sequence in this region, which agrees with the more modern U29185; the Hsiao (1989) sequence never made it into the database, which is just as well as it seems to have 3 upstream sequencing errors:
tgataccattgctatgcactcattcattatgcaggaaacatttagtaatttcaacataaatatgggactctgac g ttctcctcttcattttgcag agcagtcatt ATG S82948
tgataccattgctatgcactcattcattatgcaggaaacatttagtaatttcaacataaatatgggactctgac g ttctcctcttcattttgcag agcagtcatt ATG U29185
........................cattatgcag-aaacatttagtaatt-caacataaatatggAactctgac A ttctcctcttcattttgcag agcagtcatt ATG Hsiao (1989)
............................................................................................................................................ttttgcag agcagtcatt ATG Puckett X83416

No one ever looked at this polymorphism again, not in sporadic CJD, not in nvCJD, not in familial CJD. Obviously the control regions should have been sequenced long ago in a couple thousand cases of sporadic CJD and all nvCJD. Little work is involved because exon 1, exon 2, distal intron 2, and their flanking regions involve just a few hundred bases with known primers; middle stretches of the introns are not needed. (Eleven other species were sequenced in the exon 3 region but it is changing rather fast, no big surprise for an intron.) Of course, the interest is over-production attributable to regulatory sequences driving accumulation of rogue conformer.
Notice that -21 is precisely at the boundary of the standard poly-pyrimidine tract that comprises part of the splice acceptor region. This feature is strongly conserved even though sequence specifics are not. This raises the questions of how efficiently the splice is made and whether the splice donor at cryptic human exon 2 might not be expressed differently in the -21A setting.
Poster session, Am Soc Hum Genetics 1996 v46 Mahal SP, Beck JA, Palmer MS, Antoniou M, Collinge J
Meeting abstract, no follow-up as of 14 Sept 98. Interpreted 2.9 kb of 5' UTR as containing Ap-1, Ap-2, CBP, MyoD, NF-IL6 and heat shock factors and a 200 bp active promoter. Two polymorphisms were found in sporadic CJD 600 bp upstream of exon 1. nvCJD cases also to be sequenced in this region.
Proc Natl Acad Sci U S A 1994 Jul 5;91(14):6418-6422 Westaway D, Cooper C, Turner S, Da Costa M, Carlson GA, Prusiner SB
...We retrieved mouse PrP gene (Prn-p) yeast artificial chromosome (YAC), cosmid, phage, and cDNA clones. Physical mapping positions Prn-p approximately 300 kb from ecotropic virus integration site number 4 (Evi-4), compatible with failure to detect recombination between Prn-p and Evi-4 in genetic crosses. The Prn-pa allele encompasses three exons, with exons 1 and 2 encoding the mRNA 5' untranslated region. Exon 2 has no equivalent in the Syrian hamster and human PrP genes. [wrong -- see Erice ref]
The Prn-pb gene shares this intron/exon structure but harbors an approximately 6-kb deletion within intron 2 [ defective IAP retrovirus]. While the Prn-pb open reading frame encodes two amino acid substitutions linked to prolonged scrapie incubation periods, a deletion of intron 2 sequences also characterizes inbred strains such as RIII/S and MOLF/Ei with shorter incubation periods, making a relationship between intron 2 size and scrapie pathogenesis unlikely. The promoter regions of a and b Prn-p alleles include consensus Sp1 and AP-1 sites, as well as other conserved motifs which may represent binding sites for as yet unidentified transcription factors.
Comment (webmaster): No polymorphisms distinguishing a and b mice strains were found in exons 1, 2. The splice donor and acceptor sites are said to differ from a consensus by 3/13 and 2/8 mismatches. Figure 2 shows a difference upstream abutting the AP-1 site: the b allele is TGACTCA where the a allele is TGACTCA. AP-1 is noted to be a dimeric transcriptional activator composed of Fos and Jun proteins. The motif terminology is launched in a 4 species alignment (figure 4). A speculative resemblance to sequences found in muscle-specific genes is noted; Mahal et al noted a human MyoD binding site in a meeting abstract. A previous study found muscle degeneration in an over-production setting (Cell 76 117-129 1994).
Baybutt H, Manson J Gene 1997 Jan 3;184(1):125-131 GenBank U52821 3498 bp...In order to define the sequences that are responsible for the normal expression of the PrP gene we have isolated and sequenced a 5' region of the murine PrP gene, which includes 1.2 kb upstream from exon 1, intron I and exon 2. Sequencing of this region from several strains of mice identified a polymorphism linked to Sinc, the gene controlling the incubation period of scrapie in mice.
We used this gene fragment and deletions of it to examine promoter mediated expression of a chloramphenicol acetyl transferase reporter gene in neuroblastoma cells (N2a). Both promoter and suppressor elements were identified within this region. The two major areas of promoter activity were sequences adjacent to and 5' to exons 1 and 2. The 5' region of intron 1 was shown to contain elements that were capable of suppressing promoter activity. Transcription factor binding sites have been identified within these sequences.
Comment (webmaster): Binding sites for AP-1, AP-2 (TCCCCAG), and SP-1 are vaguely indicated in figure 1 and not all confirmable. Unspecified 'binding sites' are also vaguely described in intron 1 1960-2020 and 3328-3358. Significant transcription remained (21%) after deleting 1-1241 which includes exon 1. Fig 3 shows the effects of various deletions; 1864 to 2309 in the center of intron 1 has suppressor activity on both the exon 1 and exon 2 promoters. Transcription starts were at 1151-1159 and 2750-2776 (exon 1 deleted). The authors suggest tissue specificity may arise through suppression of transcription. Table 1 gives polymorphisms in various mouse sinc strains that could not be correlated with incubation time as plausibly as amino acid 108 and 189. Further work using smaller deletions and site-directed mutagenesis is underway; competing oligonucleotides may have therapeutic possibilities.
Biochem Biophys Res Commun 1996 Feb 6;219(1):47-52 GenBank D50092 Saeki K, Matsumoto Y, Matsumoto Y, Onodera T* *fax: 81-3 5800-6974 Tokyo
We have demonstrated the presence of a rat prion protein (RaPrP) gene promoter upstream of multiple initiation sites. A 0.1-kb fragment upstream of the 5'-untranslated region contains specific DNA motifs characteristic of promoter elements including an AP-1 binding site, an inverted CCAAT motif [that reads ATTGGTG] and three inverted Sp-1 binding sites. This fragment directs transcription of a luciferase reporter gene in pheochromocytoma cells (PC12) and rat glioma cells (C6), suggesting that it contains the promoter for the RaPrP gene. To more precisely localize the transcription regulatory elements in this region, a series of 5'-deletion mutants were generated. Deletion analysis showed that an inverted CCAAT and adjoining Sp-1 binding sequences may play an important role in transcription of the RaPrP gene.
Comment (webmaster): Figure 1 shows 2831 bp upstream and 168 bp downstream and the location of regulatory elements such as AP-2 (cggTCCCCAGctc) at -597 to -591; Figure 3 shows rat, mouse, and hamster gapped and aligned, with motifs 1-4, the AP-1, the CCAAT, 3 x Sp-1 and the start of exon 1. No TATA, CRE, NF-kB, or OTF-1 sites were found in either orientation. The motif region was deleted without effects on the promoter. The inverted consensus SP-1 binding sequence is (G/A)(C/T)(C/T)(C/A)CGCC(C/T)(C/A); one base mismatches were seen. About 90 bp were needed upstream of the transcriptional start site.
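The degenerate inverted SP-1 consensus quoted above, together with the remark that one-base mismatches were seen, lends itself to a simple mismatch-tolerant scan. The following is a minimal sketch, not the authors' method: `parse_consensus` and `find_sites` are hypothetical helper names, and the test sequence is an invented string built around the hamster GCCCCGCCC repeat purely for illustration.

```python
import re

# Inverted SP-1 consensus as written in the comment above.
SP1_INVERTED = "(G/A)(C/T)(C/T)(C/A)CGCC(C/T)(C/A)"


def parse_consensus(pattern):
    """Turn '(G/A)(C/T)CG' into a list of allowed-base sets, one per position."""
    positions = []
    for group, single in re.findall(r"\(([ACGT/]+)\)|([ACGT])", pattern):
        positions.append(set(group.split("/")) if group else {single})
    return positions


def find_sites(seq, positions, max_mismatches=1):
    """Slide the consensus along seq; report (start, window, mismatches)."""
    seq = seq.upper()
    width = len(positions)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        mismatches = sum(
            base not in allowed for base, allowed in zip(window, positions)
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


sp1 = parse_consensus(SP1_INVERTED)
# Invented test sequence (NOT a real promoter): GCCCCGCCCC flanked by TT.
print(find_sites("ttGCCCCGCCCCtt", sp1, max_mismatches=0))
```

Raising `max_mismatches` to 1 mirrors the one-base mismatches mentioned in the comment, at the cost of more chance hits in GC-rich sequence.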
Virus Genes 1996;12(1):15-20 Saeki K, Matsumoto Y, Hirota Y, Matsumoto Y, Onodera T* *fax: 81-3 5800-6974 Tokyo
The prion protein (PrP), encoded by a chromosomal gene, is associated with development of the neurodegeneration of prion-induced diseases. Since determination of the complete structure of the gene encoding PrP is important for understanding gene expression in the central nervous system (CNS), the nucleotide (nt) sequence of the isolated whole gene encoding rat PrP was determined.
The rat PrP gene (chromosome 3) spans 16 kilobases (kb) of the rat genome and contains three exons of 19-47 base pairs (bp), 98 bp, and 2 kb separated by two introns of 2.2 kb and 11 kb. The first and second exons are noncoding, while the third exon contains a short 5' untranslated region, the entire 762-bp open reading frame (ORF), and a 3' untranslated region. The putative raPrP promoter in the 5' flanking region contains putative Sp1, AP-1, and AP-2 binding sites without a consensus TATA box.
This TATA box-deficient feature, coupled with the presence of a high G+C content and Sp1-binding sites in the raPrP promoter, characterizes it as a housekeeping gene. Analysis of the raPrP cDNA 5'-end showed that raPrP mRNA transcription was initiated at multiple sites. Northern blot analysis showed that the levels of raPrP mRNA varied among rat tissues, with the highest levels found in the brain and placenta. This determination of raPrP nt sequences, including the introns and the 5' and 3' flanking regions, may make it possible to elucidate cis-acting elements that regulate the expression of this gene in different tissues and cell lines.
Comment (webmaster): Figure 1 shows the ragged transcription start points by frequency. These result in a 19-47 bp mRNA portion of exon 123 products:
2....1.81..*2.83.......2.3.3...................
GCGTTGTCAGCGCAGCAGACGGAGTCTGAGCGTCGCGTCGGTGGCAG
* = 12

Nerve growth factor, insulin-like growth factor 1, and human growth hormone are known to increase prion protein expression in this rat cell line. mRNA was found in spleen, liver, lung, kidney, heart, testis, brain, and placenta. The latter two tissues were highest; spleen was low and liver undetectable.
Kuramoto T, Mori M, Yamada J, Serikawa T Biochem Biophys Res Commun 1994 Apr 29;200(2):1161-1168
Spontaneously epileptic rat (SER) is a homozygote for both tremor (tm) and zitter (zi) genes and exhibits epilepsy-like seizures and spongiform encephalopathy. Genetic linkage analyses revealed that the tm and zi loci were tightly linked to the synaptobrevin-2 (Syb2) on chromosome 10 and the prion protein (Prnp) on chromosome 3, respectively. The genomic DNA sequences of Syb2 of the tm/tm (TRM) rats and exon 2 of the Prnp of the zi/zi (ZI) rats were identical to those of a control rat strain WTC. In addition, no difference was detected for expression of the Syb2 and Prnp on the Northern blot analyses of TRM, ZI and WTC brain.
Cell 1986 Aug 1;46(3):417-428 Basler K, Oesch B, Scott M, Westaway D, Walchli M, Groth DF, McKinley MP, Prusiner SB, Weissmann C GenBank M14055
PrP 27-30 is the major protein in purified preparations of scrapie agent. An almost complete PrP cDNA was used to select PrP-related genomic clones from normal hamster DNA. The gene contains a noncoding exon of 56 to 82 bp and a 2 kb coding exon, separated by a 10 kb intron. Transcription initiates at the same multiple sites in vivo and in vitro. The promoter lacks a TATA box and contains three repeats of the sequence GCCCCGCCC, which resembles the Sp1 binding site found in "housekeeping" genes. The PrP coding sequence encodes a presumptive amino-terminal signal peptide. The primary structure of PrP encoded by the gene of a healthy animal does not differ from that encoded by a cDNA from a scrapie-infected animal, suggesting that the different properties of PrP from normal and scrapie-infected brains are due to post-translational events.
Brain Res. 1997 Mar 21; 751(2): 265-274. Li G, Bolton DC GenBank U78769 see also M14055
This meticulous study found that hamsters express both exon 123 and 13 mRNA isoforms. For each exon 123 mRNA in colliculi, 2.9 exon 13 mRNAs are found. In hippocampus, this ratio is 3.5; in frontal cortex, it is 2.1 (statistically significant increase). During scrapie infection, relative expression of exon 123 increased in colliculi, reaching 1:1.25 relative to exon 13 mRNA (or 2.5x of previous exon 123 levels), which was attributed to proliferating astrocytes. 492 bp was sequenced about exon 2. The central region of hamster intron 1 has not been determined.
These ratios are turned around in mouse: only exon 123 expression is reported, whereas human is solely exon 13 so far. No exon 123 was found here in human, using isolated neutrophils, frozen adult brain, or a commercial cDNA adult brain library. Cattle use exon 1a23 and exon 1b23 equally where exon 1b contains 53+115=168 bp read-through to an alternate splice donor.
The isoform ratio in pure astrocytes (or other cell types) was not determinable; isoform ratio changes during infection may reflect changes in cell type ratios rather than changes in isoform ratios within a given cell type; no change was seen in rates of prion mRNA synthesis overall.
Thus, mRNA isoform usage is tissue- and cell-type specific. Lesser isoforms are easily missed in whole brain RNA or in studies when only 4-5 cDNA clones are taken. Relative stabilities, translational efficiencies, and possible target-encoding are unknown. Isoforms in the prion gene have nothing whatsoever to do with producing alternative protein products.
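The hamster isoform ratios quoted above from Li and Bolton are easier to compare as fractions of total prion mRNA. The sketch below does only that arithmetic. It assumes each ratio means "exon 13 mRNAs per one exon 123 mRNA" and that these two isoforms make up all prion mRNA in the sample. The function name `exon123_fraction` is illustrative.

```python
# Exon 13 mRNAs per one exon 123 mRNA, as quoted from Li and Bolton (1997).
EXON13_PER_EXON123 = {
    "colliculi": 2.9,
    "hippocampus": 3.5,
    "frontal cortex": 2.1,
    "colliculi, scrapie-infected": 1.25,
}


def exon123_fraction(exon13_per_exon123):
    """Fraction of prion mRNA carrying exon 2, assuming only two isoforms."""
    return 1.0 / (1.0 + exon13_per_exon123)


for tissue, ratio in EXON13_PER_EXON123.items():
    print(f"{tissue:28s} exon 123 = {100 * exon123_fraction(ratio):.1f}% of prion mRNA")
```

On these assumptions, exon 123 rises from roughly a quarter of colliculi prion mRNA to a little under half during infection. This is a share of the total, not the 2.5x change in absolute exon 123 level mentioned in the study.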
Based on GenBank and Blastn data containing 5'UTR exons as of 1 Sept 98 -- webmaster
|Non-redundant set of exon 2 sequences|
|cow||D26150 = AB001468 = D10612|
|mouse||U29186 = U52821 = X79931 = M13685|
|sheep||U67922 = X79913|
>Consensus_exon2
GACTcCTGAaTATaTTtcAaAACTGAACaaTTTCAaCcaagctgaAGcattCtGtcTTcctaGaGgtACcagTccagtTTAGgaGAgcCAcAgCaGAtt
>U29185 Homo sapiens Lee IY 1998 exon 2
gactcctgaatatttttcaaaactgaacaatttcagccatgtctgagctttccgtcttcctggaggcacaaatctagtttagctgaaccacaacagatt
>D26150 Bos taurus Yoshimoto J 1994 exon 2
gacttctgaatatatttgaaaactgaacagtttcaaccaagccgaagcatctgtcttcccagagacacaaatccaacttgagctgaatcacagcagat
>U67922 Ovis aries Lee IY 1998 exon 2
gacttctgaatatatttgaaaactgaacagtttcaaccaagctgaagcatctgtcttcccagagacacagatccaacttgagctgaatcacagcagat
>U29186 Mus musculus Lee IY 1998 exon 2
gactcctgagtatatttcagaactgaaccatttcaaccgagctgaagcattctgccttcctagtggtaccagtccaatttaggagagccaagcagact
>D50092 Rattus norvegicus Saeki,K. 1997 exon 2+
gactcctgaatatatttcaaaactgaaccatttcaacccaactgaagtattctgccttcttagcggtaccagtccggtttaggagagccaagccgact
>U78769 Mesocricetus auratus Li G and Bolton 1996 exon 2+
gactcctgaatatattccaaaactgaacaatttcaactgagctgaagtactctgtttttctagaggtaccagttcagtttaggagagtcacagcagatc
>D26150 Bos taurus Yoshimoto J 1994 exon 2+ bovine gene in mouse L-929 cells
>AB001468 Bos taurus Yoshimoto,J 1997 exon 2 brain
>D10612 Bos taurus Yoshimoto,J 1993 exon 2 brain
>X79913 Ovis aries Westaway D, Zuliani V 1994 exon 2 adult brain
>U67922 Ovis aries Lee IY 1998 exon 2 adult brain
>U29186 Mus musculus Lee IY 1998 exon 2 brain
>D50092 Rattus norvegicus Saeki,K. 1997 exon 2 liver
>U78769 Mesocricetus auratus Li G and Bolton 1996 exon 2 scrapie-infected inferior, superior colliculi
>U52821 Mus musculus Baybutt,H.N 1997 exon 2 neuronal cells
>X79931 M.musculus Westaway,D 1994 exon 2 adult brain
>M13685 M.musculus Locht,C 1986 exon 2 scrapie infected brain
Compiled from GenBank and Blastn on 1 Sept 98 -- webmaster
Note: GenBank marsupial entry suggests an additional 3' splice site at 230 based on pattern similarity.
  1 aagcttcagc tggctggctg gtgtccaaag aaggttaagt gtcgcttcta aagggtttct
 61 cccaaaagaa catcaaagaa agtttacact tcatattgca ttcaaggctg ccaatctttg
121 ctgtttttta atagaagcat cctactcctt cctgatatca tattatgaca attaaaaatg
181 acatatatgt gtcttaattg tgtttctttt ttcccctcct tttcctttag tggtttctaa
241 ataaacccag aattttcatg tctttttttt tttccagatc acctaccatg ggaaaaatcc
|intron 2 distal portion||exon 3 UTR||ORF||GenBank||Species||Reference|
|gcttcagcctgagtgccggacactgatgccttgttcttcatttcacag||atcagccatc||atg||D50093||Rattus norvegicus||1997 Saeki,K|
|..................................tcctcattttgcag||atcagtcatc||atg||S69654||zitter rats||Gomi,H 1994|
|aatgacgtgttgctggagtacaatgatgccttgttcttcattttgcag||atcagccatc||atg||M14054||Mesocricetus auratus||1988 Basler,K|
|agtgttgtgttgttggagtatactgacgccttgttcttcattttgcag||attagccatc||atg||M33958||Chinese hamster||Lowenstein,DH 1994|
|ccttcagcctaaatactgggcactgataccttgttcctcattttgcag||atcagtcatc||atg||U29186||Mus musculus short inc||Lee,IY 1998|
|atttcaacataaatatgggactctgacgttctcctcttcattttgcag||agcagtcatt||atg||U29185||Homo sapiens||Lee,IY 1998|
|..................tcattttgttttgttttgttttgtttgcag||ataagccatc||atg||S46825||Mustela sp.||Kretzschmar,HA 1993|
|gtgatttttacatgggcatatgatgctgacaccctctttattttgcag||ataagtcatc||atg||D26151||Bos taurus||Yoshimoto,J 1994|
|gtgatttttacgtgggcatttgatgctgacaccctctttattttgcag||agaagtcatc||atg||U67922||Ovis aries||Lee,IY 1998|
|gtgattcttacgtgggcatttgatgctgacaccctctttattttgcag||agaagtcatc||atg||AJ000681||Ovis aries||Bossers,A 1997|
|.......................................attttgcag||agaagtcatc||atg||X91999||Capra hircus||Goldmann,W 1996|
|ggctttagcatcggtccaggccactgacagcctcctctctctttccag||gtcagctgtc||atg||U28334||Oryctolagus cuniculus||Loftus,B 1997|
|gtggtttctaaataaacccagaattttcatgtctttttttttttccag||atcacctacc||atg||L38993||Trichosurus vulpecula||Windl,O 1995|
|..............actgccctaacagtgtgtgtccttatgcccgcag||cc||atg||M95404||Gallus gallus||Gabriel,JM 1998|
Redundant GenBank entries in this region:
sheep: U67922 = D38179. C-42T polymorphism relative to 6 other sheep sequences; cows are C-42T, T-29A, and G2T.
sheep: AJ000736 = AJ000681 = AJ000679 = M31313 = AJ000680 = X91999.
human: S82948 = U29185 = X83416. Polymorphism G-21_ reported in sporadic CJD.
mouse: U29186 = M18070 = M18071
Non-redundant set, fasta format, exon 3 and upstream intron 2 flanker:
>Consensus_probe_sequence
TttagccTaggtacaggacacTGAcgcccTgttCtTcatTTtgCAGatcAGccaTcATG
>D26151_cow
gtgatttttacatgggcatatgatgctgacaccctctttattttgcagataagtcatcatg
>D38179_sheep
gtgatttttacgtgggcatttgatgctgacaccctctttattttgcagagaagtcatcatg
>D50093_rat
gcttcagcctgagtgccggacactgatgccttgttcttcatttcacagatcagccatcatg
>M18070_mouse
ccttcagcctaaatactgggcactgataccttgttcctcattttgcagatcagtcatcatg
>M18071_mouse
ccttcagcctaaatactgggcactgataccttgttcctcattttgcagatcagtcatcatg
>S46825_mink
..................tcattttgttttgttttgttttgtttgcagataagccatcatg
>S82948_human
atttcaacataaatatgggactctgacgttctcctcttcattttgcagagcagtcattatg
>U28334_rabbit
ggctttagcatcggtccaggccactgacagcctcctctctctttccaggtcagctgtcatg
>U29185_human
atttcaacataaatatgggactctgacgttctcctcttcattttgcagagcagtcattatg
>M14054_golden_hamster
aatgacgtgttgctggagtacaatgatgccttgttcttcattttgcagatcagccatcatg
>M33958_Chinese_hamster
agtgttgtgttgttggagtatactgacgccttgttcttcattttgcagattagccatcatg
>U29186_mouse_short_inc
ccttcagcctaaatactgggcactgataccttgttcctcattttgcagatcagtcatcatg
Compiled from GenBank and Blastn searches 1 Sept 98 -- webmaster
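Collapsing redundant entries like those listed above can be sketched in a few lines. This is only an illustration, assuming that "redundant" means an exact match over the 61-nt flanker; the function name `collapse_identical` and the dictionary layout are made up for the example, and the sequences are copied from the flanker set above.

```python
from collections import defaultdict

# Flanker sequences (intron 2 tail + exon 3 UTR + atg) copied from the set above.
records = {
    "M18070_mouse": "ccttcagcctaaatactgggcactgataccttgttcctcattttgcagatcagtcatcatg",
    "M18071_mouse": "ccttcagcctaaatactgggcactgataccttgttcctcattttgcagatcagtcatcatg",
    "U29186_mouse_short_inc": "ccttcagcctaaatactgggcactgataccttgttcctcattttgcagatcagtcatcatg",
    "S82948_human": "atttcaacataaatatgggactctgacgttctcctcttcattttgcagagcagtcattatg",
    "U29185_human": "atttcaacataaatatgggactctgacgttctcctcttcattttgcagagcagtcattatg",
    "D26151_cow": "gtgatttttacatgggcatatgatgctgacaccctctttattttgcagataagtcatcatg",
    "D38179_sheep": "gtgatttttacgtgggcatttgatgctgacaccctctttattttgcagagaagtcatcatg",
}


def collapse_identical(seqs_by_name):
    """Group entry names whose sequences are identical (case-insensitive)."""
    groups = defaultdict(list)
    for name, seq in seqs_by_name.items():
        groups[seq.lower()].append(name)
    return [sorted(names) for names in groups.values()]


for group in collapse_identical(records):
    print(" = ".join(group))
```

Exact matching is the simplest possible criterion; entries that differ by a single polymorphism (such as the sheep and cow flankers) stay separate under it.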
|5' end of exon 1||3' end of exon 1||5' intron 2 or splice||GenBank||Species and source|
|gcggcgtccgagcagcagaccgagaaggc||acatcgagtccactcgtcgcgtcggtggcag||gtaagcggcttctgaaggta||M14055||Mesocricetus auratus Basler,K 1987|
|gcg---ttgtcggatcagcagacc||gattctgggcgctgcgtcgcatcggtggcag||gtaagcgggctgctgaagcc||X79932||Mus musculus Westaway,D 1994|
|gcg---ttgtcggatcagcagacc||gattctgggcgctgcgtcgcatcggtggcag||gtaagcgggctgctgaagcc||U29186||Mus musculus Lee,I.Y 1998|
|gcg---ttgtcggatcagcagacc||gattctgggcgctgcgtccgatcggtggcag||gtaagcgggctgctgaagcc||U52821||Mus musculus Baybutt,HN 1997|
|gcg---ttgtcagagcagcagacg||gagtctgagcgtcgcg-----tcggtggcag||gtaagcgggctgctgaagcc||D50092||Rattus norvegicus Saeki,K 1997|
|....gccagtcgctgacagccgcaga||gctgagagcgtcttctctctcgcagaagcag||gacttctgaatatatttgaa+||D10612||Bos taurus Yoshimoto,J 1993|
|....gccagtcgctgacagccgcaga||gctgagagcgtcttctctctcgcagaagcag||gacttctgaatatatttgaa+||AB001468||Bos taurus Yoshimoto,J 1997|
|agttgccagtcgctgacagccgcaga||gctgagagcgtcttctctctcgcagaagcag||gtaaatagccgcgtagtcct*||D26150||Bos taurus Yoshimoto,J 1994|
|ctagttgccagtcgctgacagccgca||gagctgagagcgtcttctctcccagaggcag||gtaaatagccacgtagtcct||X79914||Ovis aries Westaway,D 1996|
|ctagttgccagtcgctgacagccgca||gagctgagagcgtcttctctcccagaggcag||gtaaatagccacgtagtcct||U67922||Ovis aries Lee,IY 1998|
|gccagtcgctgacagccgcggcgccg#||cgagcttctcctctcctcacgaccgaggcag||gtaaacgcccggggtgggag||U29185||Homo sapiens Lee,IY 1998|
|gccagtcgctgacagccgcggcgccg#||cgagcttctcctctcctcacgaccgaggcag||gtaaacgcccgggg......||X83415||Homo sapiens Puckett,C 1996|
|gccagtcgctgacagccgcggcgccg||cgagcttctcctctcctcacgaccgag----@||agcagtcattatggcgaacc||X82545||Homo sapiens Kniazeva,MV 1997|
# X83415 and U29185 begin farther in the 5' direction with ccgcccgcgagcgccgccgct tcccttccccgccccgcgt ccctccccctcggccccgcgc gtcgcctgtcctccga.
Oddly, the putative exon 1b splice junction is not used in sheep despite an almost identical sequence after the end of 1a: gcag...1a end..gtaaatagccacgtagtcctttaaacccccagcggaggccgcccccggcttgcggccgagg ccctagggcactcagccggatcggactggctgggaggcagaccttgacc...1b end..gtgaggaggactgggggc ttccggcgggcgcggggaacgtcgggcctgttt. Inoue reports intron elements 123-891 to be important for exon functioning in bovine. Rodents and human align poorly with this stretch of the artiodactyl prion gene:
    101                               150
cow GACCTTGACC ...GTGAGTA GGG.CTGGGG GCT
she GACCTTGACC ...GTGAGGA GGA.CTGGGG GCT
hum GGTCGGGACC CCAGTGAGGA GGGGCCGGGG GCT
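How well the three rows agree can be tallied column by column. The sketch below is a minimal illustration, not the method used to build the alignment: it takes the cow, sheep and human rows above with the block spacing removed, treats both '.' and '-' as gap characters (an assumption), and uses its own symbols for the summary line.

```python
GAP_CHARS = {".", "-"}

# Rows copied from the alignment above, with the 10-column spacing removed.
alignment = {
    "cow": "GACCTTGACC...GTGAGTAGGG.CTGGGGGCT",
    "she": "GACCTTGACC...GTGAGGAGGA.CTGGGGGCT",
    "hum": "GGTCGGGACCCCAGTGAGGAGGGGCCGGGGGCT",
}


def conservation_line(rows):
    """One symbol per column: '*' invariant, '-' contains a gap, '.' variable."""
    seqs = [row.upper() for row in rows]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned rows must all have the same length")
    symbols = []
    for column in zip(*seqs):
        if any(base in GAP_CHARS for base in column):
            symbols.append("-")
        elif len(set(column)) == 1:
            symbols.append("*")
        else:
            symbols.append(".")
    return "".join(symbols)


line = conservation_line(alignment.values())
for name, row in alignment.items():
    print(name, row)
print("   ", line)
print(line.count("*"), "invariant columns out of", len(line))
```

The same function works unchanged on longer alignments, provided every row has been padded to the same length.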
>rat D50092 tcttcctctttaccaatttcttgttaccaaagttccacgatggcctttttctttccgttaggtaacctttcattttctcgactacccattatgtaacgggagcgctgggttctggatcagtcttccattaaagatgacttttatagtctgtgagcgtcgtcacagagtgctgacactggggtggggaggggagtacggggggagggggttaaacagataacaagcatttaagccagtacggagcggtgactcatcccaccgcgagaagccattggtgagcatcacgctccgcccctcgccccgcccagcccccggcctgtcgggtccctcaccacgccccgctcccccgcgttgtcagagcagcagacggagtctgagtctgagcgtcgcg-----tcggtggcag >mouse2 U29186 Lee tcttcgttaccaatttcttgttaccaaagttcaacgatggcttcctcgctccgttaggtaacctttcattttctcaactacccattatgtaacgggagcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagtgctgacactgggggcggtttaaacagatacaagcatttaagccagtccggagcggtgactcattcccccaccccccacccccccgcgagagacgcggcgcggccattggtgagcatcacgccccgcccctcgcccagcctagctcccgcctgccccgcccctttccactcccggctcccccgcgttgtcggatcagcagaccgattctgggcgctgcgtccgatcggtggcag >hamster M14055 Basler tctccctctttagcaatttcttgctcctagagtttcagcaattgctttctcgctccattaggcaacctttcattttctcaccttccccattatgtaacgggagcaatgggttctggaccagtcttccattaaagatgatttttatagtcggtgagcgccgtcagggagtgatgacacctgggggcggtttaaaccgtacaatcccttaaaccagtctggagcggtgactcatggcgcggccattggtgagcacgacgcaagccccgccccacccagcccggccccgccctgctacccctcctgactcactgccccgcccgctcccccgcggcgtccgagcagcagaccgagaaggcacatcgagtccactcgtcgcgtcggtggcag >human U29185 Lee tctcctctttagaaatttctggttgccaaagttccagaaattgcttcctcattcctgagcctttcattttctcgatttctccattatgtaacggggagctggagctttgggccgaatttccaattaaagatgatttttacagtcaatgagccacgtcagggagcgatggcacccgcaggcggtatcaactgatgcaagtgttcaagcgaatctcaactcgttttttccggtgactcattcccggccctgcttggcagcgctgcaccctttaacttaaacctcggccggccgcccgccgggggcacagagtgtgcgccgggccgcgcggcaattggtccccgcgccgacctccgcccgcgagcgccgccgcttcccttccccgccccgcgtccctccccctcggccccgc >sheep1 U67922=X79914 Lee 
tgtccttttcagaaatttctggttaccagagttcccgaaattgctttctcattccctaatctttcattttctccattacgtaacgagaagctggggctttggccgattttccctctaaagatgatttttatcgtcaacaagcaatttcagggagtgatgagccagggaggcggtgttagttgatgctagcgtttatgctagtctcaactcgtttttcccagggacttagattcctgggtctgccggtaaaccccgggcgcccgcagcgggcgcgcctgagcgtgcgcgcgccgtcgcctccccccccccgcagctcctcctctgcacggcgactcaccagccctagttgccagtcgctgacagccgcagagctgagagcgtcttctctcccagaggcaggt >cow D26150 Yoshimoto tgtcccttttagaaatttctggttaccaaagttccagaaattgctttctcattccctaatctttcattttctccattacgtaacgagaagctggggctttggccgattttccctttaaagatgatttttatcgtcaacaagcaatttcagggagtgatgagccggggaggcggtattagctgatgctagcgtttaagctagtctcaactcgtttttcccagggacttagattcctgggtctgccagtaaaccccgggcgccggcagcgggtgcgcctgagcgtcgcgcgcgccgtcgcctccccgcccctgcccctcctcctccgcccggcgacttacccgccctagttgccagtcgctgacagccgcagagctgagagcgtcttctctctcgcagaagca >mouse1 X79932 Saeki cttcctcgctccgttaggtaacctttcattttctcaactacccattatgtaacgggagcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagtgctgacactgggggcggtttaaacagatacaagcatttaagccagtccggagcggtgactcattccccccaccccccacccccccgcgagagacgcggcgcggccattggtgagcatcacgccccgcccctcgcccagcctagctcccgcctgccccgcccctttccactcccggctcccccgcgttgtcggatcagcagaccgattct >mouse3 U52821 Baybutt aatgtcgaaaatcttcgttaccaatttcttgttaccaaagttcaacgatggcttcctcgctccgttaggtaacctttcattttctcaactacccattatgtaacgggagcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagtgctgacactgggggcggtttaaacagatacaagcatttaagccagtccggagcggtgactcatccccccccacccccacccccccgcgagagacgcggcgcggccattggtgagcatcacgccccgcccctcgccccgcctagctcccgcctgccccgcccctttccactcccggctcccccgcgttgtcggatcagcagaccgattct
Motif Region through to exon 1>ra tcttcct-ctttaccaatttcttgttaccaaagttccacga-tggcctttttctttccgttaggtaacctttcattttctc >mo tcttc--g-tt-accaatttcttgttaccaaagttcaacga--tggcttcctcgctccgttaggtaacctttcattttctc >ha tctccctgctttac-aatttcttgctcctagagtttca-gcaattgctttctcgctccattaggcaacctttcattttctc >hu tctcct--ctttagaaatttctggttgccaaagttcca-gaaattgcttcctcattcc-t--g--agcctttcattttctc >sh tgtcctt--ttcagaaatttctggttaccagagttccc-gaaattgctttctcattccct-----aatctttcattttctc >co tgtccct--tttagaaatttctggttaccaaagttcca-gaaattgctttctcattccct-----aatctttcattttctc ....o.o.o....oo.o..ooooooo.o:o.o:o.oooo:o:.o:..:..ooo..oo..ooo.o.....o:.ooooooooooooo consensus >ra ctttcattttctcgacta-cccattatgtaacgg-gagcgctgggttctggatcagtcttccattaaagatgacttttatagtctgtgagcgtcgtcacagagt >mo ctttcattttctcaacta-cccattatgtaacgg-gagcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagt >ha ctttcattttctcaccttccccattatgtaacgg-gagcaatgggttctggaccagtcttccattaaagatgatttttatagtcggtgagcgccgtcagggagt >hu ctttcattttctcgatttctccattatgtaacggggagctggagctttgggccgaatttccaattaaagatgatttttacagtcaatgagccacgtcagggagc >sh ctttcattttct--------ccattacgtaacgagaagctggggcttt-ggccgattttccctctaaagatgatttttatcgtcaacaagcaatttcagggagt >co ctttcattttct--------ccattacgtaacgagaagctggggcttt-ggccgattttccctttaaagatgatttttatcgtcaacaagcaatttcagggagt >ra tcacagagtgctgacac-tggggtggggaggggagtacggggggagggggttaaacagataacaagcatttaagccagtacggagcggtgactca >mo tcagggagtgctgacac-tgggggcggt-----------------------ttaaacagatacaagcatttaagccagtccggagcggtgactca >hu tcagggagcgatggcacccgcaggcggt-------------atcaactgatgcaagtgttcaagcgaatctcaactcgttttttccggtgactca >sh tcagggagtgatgagccagggaggcggt-------------gttagttgatgctagcgtttatgctagtctcaactcgtttttcccagggactta >co tcagggagtgatgagccggggaggcggt-------------attagctgatgctagcgtttaagctagtctcaactcgtttttcccagggactta >ra 
gtgactcat---cccacc------------gcgagaa----------gccattggtgagca----tcacgctccgcccctc--------gccccgcccagcccccgg-cctgtcgggtccctcaccacgcccc------------gctccccc_gcgttgtcagagcagcagacggagtctgag----------------cgtcgcgtcggtggcag >mo gtgactcattcccccaccccccacccccccgcgagagacgcggcgcggccattggtgagca----tcacgccccgcccctc--------gcccagcctagct-cccg-cct------------gccccgcccctttccactcccggctccccc_gcgttgtcggatcagcagaccgattctgggcg-----------ctgcgtcgcatcggtggcag >ha gtgactcat--------------------------------ggcgcggccattggtgagcacgacgcaagccccgccccacccagcccggccccgccctgctacccctcctgactca----ctgccccgcccg-------------ctccccc_gcggcgtccgagcagcagaccgag--aaggcacatcgagtccact-cgtcgcgtcggtggcag >hu gtgactca--ttcccggccctgc--ttggc-agcgctgcaccctttaacttaaacctcggccggccgcccgccgggggcacagagtgtgcgccgggccgcgcggcaattggtccccgcgccgacctccgcccgcgagcg_ccgccgcttcccttccccgccccgcccgcgtccctccccctcggccccgcgcgtcgcctgtcctccga------_gccagtcgctgacagccgcggcgccgcgagcttcc >sh gggacttagattcctgggtctgccggtaaaccccgggcgcccgcagcgggcgcgcctgagcgt-----------------------------------gcgcgcgccgt--------cgcc--tccccccccccgcagctcctcctctgcacggcgactcaccagc-----------------------------------cct------_agttgccagtcgctgacagccgcagagctgagagcgtct >co gggacttagattcctgggtctgccagtaaaccccgggcgccggcagcgggtgcgcctgagcgt----------------------------------cgcgcgcgccgt--------cgcc--tccccgcccctgcccctcctcctccgcccggcgacttacccgccct-----------------------------------------_agttgccagtcgctgacagccgcagagctgagagcgtct
"The Cs of most CpG dinucleotides in the human genome are methylated. Methyl-C tends to mutate to T, and so CpG dinucleotides tend to decay to TpG / CpA. This is believed to account for the fact that in bulk human DNA CpG dinucleotides occur about five times less frequently than expected (Bird, 1980, Jones et al 1992).
CpG islands are unmethylated regions of the genome that are associated with the 5' ends of most house-keeping genes and many regulated genes (Bird, 1986, Larsen et al 1992). The absence of methylation slows CpG decay, and so CpG islands can be detected in DNA sequence as regions in which CpG pairs occur at close to the expected frequency. The fact that CpG islands can be detected in this way indicates that the corresponding germline DNA has been substantially hypomethylated for an extended period of time, and in fact about 80% of CpG islands are common to man and mouse ( Antequera and Bird 1993 ).
About 56% of human genes and 47% of mouse genes are associated with CpG islands ( Antequera and Bird, 1993 ) Often CpG islands overlap the promoter and extend about 1000 base pairs downstream into the transcription unit. Identification of potential CpG islands during sequence analysis helps to define the extreme 5' ends of genes. CpG islands are commonly defined as regions of DNA of at least 200 bp in length and that have a G+C content above 50% and a ratio of observed vs. expected CpGs close to or above 0.6. "
Bird (1980) NAR 8, 1499-1504
Bird (1986) Nature 321, 209-213
Jones et al (1992) BioEssays 14, 33-36
Larsen et al (1992) Genomics 13, 1095-1107
Antequera and Bird (1993) PNAS 90, 11995-11999
Cross et al (1994) Nature Genetics 6, 236-244
Gardiner-Garden and Frommer (1987)
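The working definition quoted above (at least 200 bp, G+C above 50%, observed/expected CpG near or above 0.6) is easy to check on any stretch of sequence. The sketch below assumes the usual convention that the expected CpG count is (number of C x number of G) / length; the quoted passage does not spell the formula out, and the function names are invented for the example.

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG) for a DNA string."""
    s = "".join(seq.split()).upper()
    n = len(s)
    if n == 0:
        return 0, 0.0, 0.0
    c_count = s.count("C")
    g_count = s.count("G")
    observed = s.count("CG")
    # Assumed convention: expected CpG = (#C * #G) / length.
    expected = c_count * g_count / n
    ratio = observed / expected if expected else 0.0
    return n, (c_count + g_count) / n, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the three thresholds from the quoted definition."""
    n, gc_fraction, ratio = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and ratio >= min_ratio


# Toy inputs only; these are not prion gene sequences.
print(cpg_stats("CG" * 100))
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```

In practice the test would be run over a sliding window rather than a whole sequence; the thresholds are left as parameters because the definition itself says "close to or above".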
There are 3 human retrotransposons in the region (intron 1) between exon 1 and 2; rodents have 3 different ones; artiodactyls have none:
human: intron 1 = 2622 bp; only last 180 bp related to cow/sheep
repeat_region 11478..11800 AluJo = 323 bp
exon 1 12634..12767 = 134 bp
repeat_region 14413..14498 L1MC1 = 86 bp
repeat_region 14583..14653 MIR = 71 bp = 353 bp total
repeat_region 14752..14947 L1ME3 = 196 bp
exon 2 15390..15488 = 469 bp
mouse: intron 1 = 2190 bp; 83% identical to rat
repeat_region 7980..8069 PB1D7
exon 1 8612..8658 = 47 bp
repeat_region 9619..9844 B3 = 226 bp
repeat_region 10044..10163 B1-F = 120 bp = 539 bp total
repeat_region 10070..10163 PB1D7 = 193 bp
exon 2 10849..10946 = 469 bp
sheep: intron 1 = 2421 bp; 91% identical to cow
repeat_region 3756..4215 MLT1F = 469 bp
exon 1a 5666..5717 = 52 bp
exon 2 8139..8236 = 98 bp
The CpG island in the sheep prion gene is similar: 89 occurrences of CpG in 1020 bp (8.7%) of a normally severely depleted dinucleotide. Recall that CpG are mutational hotspots when methylated to 5mC but that poly ADP-ribosylation protects CpG in promoter regions of eukaryotic genes.
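The intron sizes and the CpG percentage follow from simple arithmetic on the coordinates. As a rough check, assuming GenBank-style inclusive 1-based intervals, the exon 1 and exon 2 positions listed above give the following (helper names are hypothetical):

```python
def span_length(start, end):
    """Length of an inclusive 1-based interval such as 12634..12767."""
    return end - start + 1


def intron_length(upstream_exon_end, downstream_exon_start):
    """Number of bases strictly between two exons."""
    return downstream_exon_start - upstream_exon_end - 1


# (exon 1 start, exon 1 end, exon 2 start) taken from the listing above.
genes = {
    "human": (12634, 12767, 15390),
    "mouse": (8612, 8658, 10849),
    "sheep": (5666, 5717, 8139),
}

for species, (e1_start, e1_end, e2_start) in genes.items():
    print(species,
          "exon 1 =", span_length(e1_start, e1_end), "bp;",
          "intron 1 =", intron_length(e1_end, e2_start), "bp")

# CpG density quoted for the sheep island: 89 CpG in 1020 bp.
print(round(100 * 89 / 1020, 1), "%")
```

The intron 1 sizes computed this way agree with the 2622, 2190 and 2421 bp quoted for human, mouse and sheep.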
Note that the anomaly is strongly skewed against the promoter and exon 1 proper, that is, it falls mainly to the 3' UTR side in intron 1. (Researchers confirmed the significance of this region with nested deletions.) The known retrotransposons do not kick in until much later, eg 12806..13148 MLT1F, 13146..13321 MLT1F, 13227..13648 MER57_internal, 16433..16524 Bov-tA2, whereas exon 2 is already at 8139..8236.
CpG incidence is displayed in terms of intervening nucleotides below, relative to exon 1 (positions 5666 - 5717).
ctcaact CGtttttcccagggacttagattcctgggtctgc CGgtaaaccc CGgg CGcc CGcag CGgg CG CGcctgag CGtg CG CG CGc CGt CGcctccccccccc CGcagctcctcctctgca CGg CGactcaccagccctagttgccagt CGctgacagc CGcagagctgagag CGtcttctctcccagaggcaggtaaatagcca CGtagtcctttaaacccccag CGgaggc CGcccc CGgcttg CGgc CGaggccctagggcactcagc CGgat CGgactggctgggaggcagaccttgac CGtgaggaggactgggggcttc CGg CGgg CG CGgggaa CGt CGggcctgtttag CGtgct CGttggtttttgccagccac CGct CGgttttgccctcctggttaggagagctccatttact CGgaatgtggg CGggggc CG CGgctggctggtccccctcctgaagtatgtgggtggtgtgtaggaatctagccccctccca CGct CGtccactg CGggagtggcatggg CGgat CGcac CGgtagaggggc CGcagtc CGaggaac CGctggggagatcagaagaacaag CGagaggccc CGggctctgggccctcc CGaagcccag CGgaga CG CGgaattgggggtggggggtggggaagaag CGgg CGcccaa CGgggccagacct CGgc CGtgaggagtgc CGgag CGac CGtgggcccccagc CGctgctgc CGaactcctcc CGagagg CGgccctgcttgccatca CG CGgctgggaggtacctgggtagc CGcag CGggtgggtctctggcagccccctggggat CGgct CGgg CGgg CGtg CGtggcctgggcttcagcct CGg CGaggggagtcatggg CGacc CGgccctctctccagagaaatccaggtac CGggagcagtgtttcctgggagctctgatgtggt CGacccaaaagcaaag CGatatttt CGctgtct CGactgaaggagggaact CGgcc
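A display of this kind can be regenerated from plain sequence. The sketch below is one way to do it, assuming the convention visible above (lowercase sequence, each CpG capitalised and set off by a leading space); `mark_cpg` and `cpg_spacing` are names invented here, and the test fragment is the first stretch of the listing.

```python
import re


def mark_cpg(seq):
    """Lowercase the sequence and render every CpG as ' CG'."""
    s = "".join(seq.split()).lower()
    return s.replace("cg", " CG").strip()


def cpg_spacing(seq):
    """Number of intervening nucleotides between successive CpGs."""
    s = "".join(seq.split()).lower()
    starts = [m.start() for m in re.finditer("cg", s)]
    return [b - a - 2 for a, b in zip(starts, starts[1:])]


# First stretch of the display above.
fragment = ("ctcaact CGtttttcccagggacttagattcctgggtctgc CGgtaaaccc "
            "CGgg CGcc CGcag CGgg CG CGcctgag")

print(mark_cpg(fragment))
print(cpg_spacing(fragment))
```

Short runs of small spacings pick out the dense part of the island; long gaps mark where the sequence reverts to ordinary CpG-depleted DNA.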
>U29185-spliced human hypothetical 1-2-3-ORF fusion
1 Sept 98 webmaster
>U29185-spliced human 1-3-ORF fusion
It is possible to track the growth in size of the human prion gene over time, at least for insertion of the Alu elements, which have been dated with Fla-A at 68 mya, subfamilies Jo and Jb originating around 57 mya, subfamily Sx around 37 mya, and subfamily Y quite recent and possibly still retrotranspositionally competent. Mouse prion has three B1-F and six B1_MM which could be dated as well. A very recent paper suggests that the mouse LINE-1 elements, the 541 bp L1MA4, the 169 bp L1_MM, and the 181 bp LINE2 -- which are possibly still active -- could also be dated:
PNAS 95 11284-11289, September 15, 1998. Olivier Verneau, François Catzeflis and Anthony V. Furano
"The repeated DNA subfamilies generated by the mammalian L1 (LINE-1) retrotransposon are apparently homoplasy-free phylogenetic characters. L1 retrotransposons are transmitted only by inheritance and rapidly generate novel variants that produce distinct subfamilies of mostly defective copies, which then "age" as they diverge. Here we show that the L1 character can both resolve and date recent speciation events within the large group of very closely related rats known as Rattus sensu stricto...
All mammalian L1 elements contain four regions: a 5' UTR involved in regulation; ORF I, which encodes an RNA-binding protein; ORF II, which encodes a reverse transcriptase; and the 3' UTR. The evolution of the 3' UTR appears to occur rapidly enough to make it a useful source of phylogenetic characters for analyzing recent or rapid speciations. R. rattus and R. norvegicus diverged 2 million years ago. The Mus/Rattus dichotomy from the fossil record dates is 12.2 Mya."
The GenBank entry for human prion, U29185, is mis-annotated. The Alu retrotransposons have left and right homologous regions joined by an A-rich linker with a poly-A terminus, 300 bp or so all told. Alu units arose from 7SL RNA and are found in primates only, though distant similarities are found in rodent B1 repeats; the oldest offshoots, Fam, Fla, and Fra, are monomers, but only Fla-A and Fla-C are reported in the human prion gene (where they are called FLAM_A and FLAM_C). The GenBank entry double-counts 9 of the Alu units through some sort of offset error. There are only 15 Alu in actuality.
mRNA: 12634-12767 exon 1, 15390-15488 exon 2, 25464-27817 exon3
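The two splice forms differ only in whether the exon 2 interval is joined in. The sketch below shows the bookkeeping, assuming inclusive 1-based coordinates as in the mRNA line above; the U29185 genomic sequence is not reproduced here, so the actual joining is demonstrated on a made-up toy string.

```python
def splice(genomic, exons):
    """Concatenate inclusive 1-based exon intervals from a genomic string."""
    return "".join(genomic[start - 1:end] for start, end in exons)


def transcript_length(exons):
    """Total length of the joined exons."""
    return sum(end - start + 1 for start, end in exons)


# Coordinates from the mRNA line above (U29185).
EXON1 = (12634, 12767)
EXON2 = (15390, 15488)
EXON3 = (25464, 27817)

print("1-2-3 transcript:", transcript_length([EXON1, EXON2, EXON3]), "nt")
print("1-3 transcript:", transcript_length([EXON1, EXON3]), "nt")

# Toy illustration only: uppercase blocks stand in for three exons.
toy_genomic = "aaGGGttttCCCaaTTT"
toy_exons = [(3, 5), (10, 12), (15, 17)]
print(splice(toy_genomic, toy_exons))
print(splice(toy_genomic, [toy_exons[0], toy_exons[2]]))
```

With the real genomic record loaded into a string, the same `splice` call with the U29185 coordinates would yield the hypothetical 1-2-3 fusion or the 1-3 fusion, depending on which intervals are passed.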
A non-redundant set of Alu sequences from the human prion is shown below. It is not easy to confirm their assignments using the Alu-only Blast server at NCBI because percent identity is not as important for classification as critical changes that affect secondary structure etc. For example, the reported AluY has a much better hit with consensus Alu-Sb than with consensus AluY, though one of the latter provides the best non-consensus match.
strand start end family length separation screw-up + 3,131 3,424 AluSx 293 2838 ok ggccgggtgt ggtggctcac acctgtaatc ccaacacttt gggaggctga 3181 ggcgggcaga tcacctgagg tcaggagttt gaaaccagcc tggccaacat ggcaaaaaca 3241 ctgtctctac taaaaaatac aaaaattagc cgggtgtgtt ggcacgtgcc tataatccca 3301 gctacttggg gggctgaggc aggagagtca cttgaacccg ggaggcagag gttgcagtga 3361 gacaagatca tgccactgca ctccagcctg gggaacagag cgaaactccg tctaaaaaaa 3421 aaaa + 5,969 6,265 AluSx 296 1004 ok gg ctgggcacag tggctcatgc ctgtaatccc 6001 agcactttgg gaggctgagg cgggcagatc acttgaggtc aggagttcga gaccagcctg 6061 ggcaatacgg tgaaacccag tctctactaa aaacacaaaa attagctagg catagtggtg 6121 catgactgta atcccagcta gttaggaggc tgaggcagga gaatcgcttg aacccaggag 6181 gtggaggttg cagtgagatt gtgccactgc actccagcct gggtgacaga gcgagactcc 6241 atctaagaaa aaaaaaatca gaaaa - 6,973 7,266 AluSx 293 1978 ok tttttttt ttttgagatg gagtcttgct ctgtcaccca ggctggagtg 7021 cagtggcaca atcttggctc actgcaacct ccgcctcctg ggttcaagcg attctcctgc 7081 ctcagcctcc ttagtagctg ggactacagg catgcaccac cacgcccagc taatttttgt 7141 atttttagta gagacagggt ttcaccatgt tggccaggct gttcttgaac tcttgacctt 7201 aggtgatcca cccgcctcgg cctcccaaag tgctgggact ataggcgtga gccacttcac 7261 caggcc + 8,951 9,253 AluJb 302 1032 ok ggctaagcat ggtggctcac acctgtgatc ccagcaattt gagaggccga 9001 gataagcaga ttgcttgagc tcaagagttc aaaaccaacc tggacaacat agtgagaccc 9061 ccgtctctaa aacaaataca aaaatcagcc aggcatggtg gctcacgcct gtggtcccag 9121 ctactcagaa gactgaggta agaggatagc ttaagcccag gaggcagagg ttgtagtgag 9181 ccgagatcac cccacggcac tccagcctgg gcaacagagt gagaccctat ctaaaaaaaa 9241 agaagaaggt aaa - 9,983 10,281 AluSx 298 1495 ok tttttttt tttcttttct gagacagagt ctcgctctgt 10021 cgcccaggct gaagtgcaat ggtgccatct tggctcactg caacctctgc ctcccgggtt 10081 caagcgattc tcctgcctca ccctcctgag tagctgagat tacaggtgcg catcaccatg 10141 cctggctaat ttttgtattt ttaatggaga cggggtttca ccatgttgcc caggctggtc 10201 ttgaactcct gacctcaagt aatccatggg cctcggcctc ccaaagtgct gggattacag 10261 gcatgagcca ccgcgcccgg c - 11,478 11,800 
AluJo 322 8119 ok ttt ttttgtaaag agacaaagag tgagacaggg 11521 tcttggccta tagccctgtg gcccaggctg gagtgcagtg gcacaattaa agctcactgc 11581 agcctctacc tcctgggctc aagcaatcct cccatctcag cctccccagt agctgggact 11641 acaggactgt gccaccgttc ccagcaattt tttttatttt ttgtagaaat ggggtctcac 11701 tatgttgctc actctggtct caaactcctg agctcaagca atcttcctgc cttggcctct 11761 gaaagtgctg ggattacagg cctgagccac tgcacctggc - 19,597 19,708 AluJb 111 1 duplicate 477 atgt tttgtagaca tggggtttca 19621 ccatgttgcc caggctggtc tcaaactcct gggctcaagc aatcctcctg cctgagcttc 19681 ccaaagtgct gggattatag gcttgagc + 20,075 20,349 AluJo 274 1 duplicate 396 gctgtg gctcacacct ataatcccag 20101 cactttggga ggctgaggcc tgcagattgg ttgagcccag gaatttgaga ccagcctggg 20161 caacatggca agaccccatc actattaaaa atacaaaaaa aaaaaaaaac caggcgtggt 20221 gttgcatacc tgtggtccca gctactcagg aggatcactt gagcccactg ggtggaggtt 20281 gcagagagct gagatcatgc cactacactt tggcctgggt aagaccctgt ctcaagaaaa 20341 aaaaaaaaa + 20,472 20,604 FLAM_C 132 1 duplicate 7764 ggctgggca tggtggctca cacctgtaat ctcagcactt tggggggcca 20521 agacaggaag atctcttgag cccagaagtt tgaggcgagc ctgggcaaca tagcaagacc 20581 ccatctctac cagaaaagaa acaa - 28,237 28,534 AluSx 297 2 duplicate 1504 tttt gttttgtttt ttttgagacg 28261 gagtctcgct ctgtcgccca ggctggagtg cagtggcggg atctcggctc actgcaacct 28321 ccgcctcccg ggttcaggcg attctcctgc ctcagcctcc tgagtagctg ggactacagg 28381 catatgccac catgcccggc taatttttgt atttttagta gagatggagt ttcaccatat 28441 tggccaggct gttctcaaac tcggcctcaa gtgatctgct cgcctcagcc acccaaagtg 28501 ctaggattac aagcatgagc caccgcgccc ggcc + 29,743 30,047 AluSq 304 2 duplicate 2073 ggccaggc acagttgctc 29761 acgcctgtaa tccctgcact ttgggaggcc gaggcaggca gatcacttga ggtcaggagt 29821 tcaagaccag cctggccaac atggtgaaac ccccattccc tactaaaatt acaaaaaatt 29881 agctgggtgt ggtcgcacgt gcctgtaatc ccagctactc aggaggctga ggcaggagaa 29941 tcgcttataa ccgggaggca gaggctgcag tgagccgaga tcccgccatt gcactccagc 30001 ctggcggtaa gagcgaaact ccgtctcaaa aaaataaaat aaaataa - 31,818 31,949 FLAM_A 131 2 duplicate 
1323 tt tttttttttc ttttgagata ggatcttgct atattgccca 31861 ggctggtctc aaattcctgg gctcaagcaa tcctcccatc tcagcctccc aagtggctga 31921 gatgacaggc acgtgccact atgtctggc + 33,143 33,315 AluJo 172 2 duplicate 177 caatcaat tcaaaaatta tagtggcctg gggttgtgtg 33181 ggtgaggtgg ctgcttggga ggctgaggtg ggaggatggc ttaagcccag tagtgcaagg 33241 ctgttgcaag ctgtgaccgc accactgcac tcctgcctgt gcaacagagc aagacaccat 33301 ctttataaaa acaaa + 33,322 33,619 AluSg 297 2 duplicate 523 gccagacgc ggtggctgac acctgtaatc ccaacacttt 33361 gggaggccga ggtgggcgga tcacgaggtg aggagtttga gaccagcctg gccaatatgg 33421 tgaaaccctg tctctcccaa aaacacaaaa actatctggg tgtggtggtg tgtgcctgta 33481 gtcccagcca ctcgggaggc tgaggcagaa gaatcgcttg aacccaggag gcggaggctg 33541 cagtgagccg agatcgcgcc actgcactcc agcctgggtg acagagtgag actctgtctc 33601 aaaacaaaac aaaacaaaa + 33,847 34,134 AluY 287 2 duplicate 0 ccgg gcgcggtggc tcacacctgt aatcccagca ctttgggagg ccgaggcggg 33901 cagatcagga ggtcaggagc tcgagaccat cctggctaac acggtgaaac cccgtctcta 33961 ctaaaaatac aaaaaattag ccgggcgtgg tggcgggcgc ctgtagtccc agctactcgg 34021 gaggctgagg caggagaatg gcgtgaaccc gggaggcgga gcttgcagtg agccgagatt 34081 gtgcctctgc actccaacct gggtgacaga gcgagactcc gtctcaaaaa taaa
14 Sept 98 webmaster
In PNAS 1994 Jul 5;91(14):6418-6422, Westaway et al. put forward the unsatisfactory idea of naming discrete motifs 1 through 4 upstream of exon 1, some of which feature insignificant palindromy. Motif 1 has held up well as the number of species sequenced rose to 6, but the others have evaporated (or fused or softened) as even better ones emerged (well, they were there all along) upstream of motif 1.
It is more accurate to say that there is a non-discrete 'motif' region of no known function [starting at about -350 relative to exon 1] containing surprisingly (and variously) conserved stretches that can be used to anchor alignments in the face of many pesky deletions in the promoter itself and of incongruities with exon 1. A number of nucleotides are absolute invariants in the three lineages; for comparison, exon 2 has over half of its nucleotides invariant and additionally lacks the deletions.
A polymorphism in the conserved motif region is more suggestive than one in a chaotic region or the middle of a large intron. If associated with disease, it would move things into interesting and unknown biological territory. It is safe to say that the motif region has not been stable for 100 million years without strong selective pressure on a function. That same point can be made for cryptic human exon 2.
It is very odd to see the promoter region (in rodents, includes 3x Sp-1, Ap-1, CCAAT) experiencing greater variability than the motif region. The motif region has been experimentally deleted in two species without consequences for transcription, though not all facets -- such as tissue-specific expression -- can be tested. [H. Baybutt even reported that the much more orderly exon 2 region can serve as a fairly good self-contained promoter.] Were the motif region a standard regulatory element of eukaryotic gene expression or a universal structural element, it would call up something on Blastn, but it does not, though perhaps gap parameters cannot be sufficiently relaxed or a more special-purpose database is needed.
Though the prion gene is chock-full of retrotransposons, there are no recognizable ones here except for 3 each in human and mouse intron 1. A portion of the motif region looks like this (with 'motif 1' shown in blue):
>ra tcttcct-ctttaccaatttcttgttaccaaagttccacga-tggcctttttctttccgttaggtaacctttcattttctc >mo tcttc--g-tt-accaatttcttgttaccaaagttcaacga--tggcttcctcgctccgttaggtaacctttcattttctc >ha tctccctgctttac-aatttcttgctcctagagtttca-gcaattgctttctcgctccattaggcaacctttcattttctc >hu tctcct--ctttagaaatttctggttgccaaagttcca-gaaattgcttcctcattcc-t--g--agcctttcattttctc >sh tgtcctt--ttcagaaatttctggttaccagagttccc-gaaattgctttctcattccct-----aatctttcattttctc >co tgtccct--tttagaaatttctggttaccaaagttcca-gaaattgctttctcattccct-----aatctttcattttctc ....o.o.o....oo.o..ooooooo.o:o.o:o.oooo:o:.o:..:..ooo..oo..ooo.o.....o:.ooooooooooooo identity >ra gacta-cccattatgtaacggg-agcgctgggttctggatcagtcttccattaaagatgacttttatagtctgtgagcgtcgtcacagagt >mo aacta-cccattatgtaacggg-agcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagt >ha accttccccattatgtaacggg-agcaatgggttctggaccagtcttccattaaagatgatttttatagtcggtgagcgccgtcagggagt >hu gatttctccattatgtaacggggagctggagctttgggccgaatttccaattaaagatgatttttacagtcaatgagccacgtcagggagc >sh -------ccattacgtaacgagaagctggggcttt-ggccgattttccctctaaagatgatttttatcgtcaacaagcaatttcagggagt >co -------ccattacgtaacgagaagctggggcttt-ggccgattttccctttaaagatgatttttatcgtcaacaagcaatttcagggagt ...........oooooo.oooooo.o.ooo....o.o...oo.....o.o.o...ooooooooo.ooooo..oo......oo....ooo...oo. identity
>She gaaatttgtgaaaaa---------------tggatcctttaagccatgaccctgaaaccccactcctgggaacttacctg-caat--ggaagaaattcggaaagaagaa-- >Bov gaaatttgtgaaaaa--------cagtcaggtgatcctttaagccatgaccctaaaaccc-actcctgggaacttacctg-taat--ggaggaaaccaggaaagaagaaga >Hom acattttgttaagcaatctggtgatgcattaagaagctggaagctgtgacccagaaaccccactcctgagaacttacctg-caat--ggaagaaacaaacaaacaaaaac- >Mus cccagcagtaaaacaatctggtgaggtatta-ttagtcgtgtgctgtgacccagaaaccccactcctggcaatttac-tgggaa---ggaacaaacaaagggctagggg-- >Rat cccagccgtaaaacaatctggtgaggtatta-ttagttgcatgctgtgacccagaaaccccacttctggcaattcacctgccgtggtggaaccaacaaagggctagggg-- >She -aagctgcattcacccacagggctcagaatgatctaaaattagatccagt-ccagagacaacctaaaggtattaagaaaatagcagggcagcagctaagaaaatcatagcactttaa >Bov aaagctgcattcacccacagaactcagaatgatctaaaattagatccagt-ccggagtcaacctaaatgtattaataaaatagcagggcagcagctaagaaaatcatagcactttaa >Hom aggcatgtattcc----tag----cagaatgatctaaaattagaacacctggaaaag--agcctaaatgtat------aacaccagggcagtagctaagaaaattatgacacattaa >Mus agccatatggcctg----------cagttagag--aaaattagatccaactgaaaaatcaacctaaaggtgt------aaaagccaagcagt---taagaaact---gaca--ggct >Rat agccatatggccaa----------cagttacag--aaaattagatccaagggaaaag-caacctaaatgttt------aacaggcgagcagc---taagaaact---gaca--ggct ...