Last revised: 21 Sept 98

Exon 2: the mystery deepens
Is exon 2 splice-competent?
Evolution of exon 2: ancient region changes slowly
Adjacent genes
Annotated References
Supplementary material: see also on-site, off-page archive
...Exon 1: 13 sequences from 6 species with flanking regions and splice-typing
......Exon1b of cow compared to putative sheep exon 1b
......Major CpG island around exon 1
......Motif mystery upstream of exon 1 promoter
......A long alignable region within intron 1
......Reference sequences for motif and promoter region
...Exon 2: non-redundant set for 6 species, alignment, consensus, distance table, fasta, probes, flanking regions
...Exon 3: 13 species, alignment, fasta, flanking regions, probes
......The G-21A polymorphism
......Exon 1-2-3 spliced probe sequences
...Alu and L1 retrotransposon insertion dating

Exon 2: the mystery deepens

28 Aug 98 webmaster
The structure of the prion gene for all species of mammals studied contains three exons. The first is contiguous to promoter and regulatory elements while the third contains just 10 nucleotides preceding the start codon. The second exon is transcriptionally expressed in most species, yet human mRNA reflects only exon 1 and exon 3, suggesting to earlier workers that the gene had a different structure.

However, a region clearly related to exon 2 was identified in humans from its strong sequence homology to expressed exon 2 in other species. The question then arises, is human exon 2 cryptic [present but spliced out of mRNA] or is it only expressed in certain rare cell types or tissues or stages of development or at undetectable levels? Hamsters were also once thought to have no exon 2, much less expression, but later it emerged that splice type 1-3 swamped 1-2-3 mRNA unless astrocytes expressing it were present at high levels.

Exon 2 is changing very slowly in evolutionary time, with 6 species averaging 85% identity to a common ancestral consensus sequence some 100 million years back. Human exon 2 shows no sign of rapid change [loss of selective pressure]: the sequence changes orders of magnitude more slowly than [unselected] pseudogenes. Therefore, if human exon 2 is a molecular fossil without function, this must be a very recent development. Alternatively, exon 2 could have a cryptic function without appearing in mature mRNA. More likely, human exon 2 will turn out like hamster: preferentially expressed in some cell types under certain conditions and possibly at low -- but important -- levels in other cell types.

The mystery of exon 2 is why the sequence is so conserved when the uses of it seem minor if not obscure: how can exon 2 be so important that accepted point mutations are rare when the ORF itself can be knocked out without dramatic consequences? A similar question arises in exon 1b in sheep -- the splice junction seems unflawed but, unlike in cow, not utilized. And a conserved motif region upstream of promoter elements also begs for an explanation.

Since over-production of prion protein is a strong risk factor in animal models of CJD, it has long been speculated that polymorphisms in the 5' and 3' UTR leading to higher levels of expression of prion protein could explain some fraction of sporadic CJD, nvCJD, or susceptibility to iatrogenic CJD. Polymorphisms have been reported in human at -21 (relative to exon 3) and at -600 (relative to exon 1) but not pursued. Of course, a polymorphism in a conserved region is more suggestive than one in a chaotic domain or the middle of a huge intron.

Is exon 2 splice-competent?

2 Sept 98 webmaster
Is human exon 2 a suitable substrate for the spliceosome or does it contain some subtle or not-so-subtle variation that incapacitates it? If exon 2 is splice-competent, how is the level of participation or tissue specificity built in to the sequence or secondary structure? The issue is apparently why the donor 3' to exon 1 does not splice with the acceptor 5' of exon 2, which then forces the donor to splice with the acceptor 5' of exon 3 (a region where a polymorphism is found at the polypyrimidine boundary, G-21A).

Alternative splicing is common in other genes. However, it typically results in domains added or dropped from the final protein product. This scenario is not applicable to exon 2: extra amino acids are not wanted upstream of a signal peptide and exon 2 is out of register in all three reading frames plus lacks an initiation codon, as seen from translation:

DS-IFFKTEQFQPCLSFPSSWRHKSSLAEPQQI
TPEYFSKLNNFSHV-AFRLPGGTNLV-LNHNR
LLNIFQN-TISAMSELSVFLEAQI-FS-TTTD
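
The three rows above can be regenerated from the human exon 2 sequence listed for U29185 in the supplementary probe set. What follows is only a minimal sketch: it assumes the standard genetic code, reads the forward strand only, and prints stop codons as '-' to match the rows above. The function names are illustrative and not taken from any tool named on this page.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Human exon 2 as listed under U29185 in the exon 2 probe set.
HUMAN_EXON2 = "gactcctgaatatttttcaaaactgaacaatttcagccatgtctgagctttccgtcttcctggaggcacaaatctagtttagctgaaccacaacagatt"


def translate(dna, frame):
    """Translate one forward frame (0, 1 or 2); stop codons are shown as '-'."""
    seq = dna.upper()[frame:]
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons).replace("*", "-")


def three_frames(dna):
    """Return the three forward-frame translations in order."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    for number, protein in enumerate(three_frames(HUMAN_EXON2), start=1):
        print(f"frame {number}: {protein}")
```

Each '-' marks a stop codon, so the interruptions in every frame can be read straight off the output.
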
Alternative splicing, resulting in exon 123 or exon 13 splice products, has been demonstrated for hamster, mouse, rat, cow, and sheep -- in all species studied except human, which so far only shows exon 13 splicing. It is fair to say that 1-3 can be the more common variant in the other species. Whole genome projects are mass-sequencing cDNA from a wide variety of tissues; these are kept separately from GenBank in STS or EST databases that are separately searchable. However, using human exon 123 as probe (or subsets thereof), nothing is found in these databases, though mouse exon 123 and mouse exon 13 yield solid returns.

A very large number of intron/exon junctions have been determined -- a single issue of Genomics alone contains many dozens of new ones. In many cases it is known whether these are subject to alternative splicing. Specialized online software has been developed to detect valid exon-intron boundaries: human exon 2 is affirmed by comparisons and these methods. However, software cannot at this time determine tissue specificity, regulatory response, or proportions of alternative splice usage (zero here so far).

To understand the protein and RNA spliceosome requirements for exon 2 functioning, eukaryotic splicing machinery must be understood. However, splicing is sufficiently complex that no help is in sight, as seen here in a very recent paper:

"Pre-mRNA splicing requires the bridging of the 5' and 3' ends of the intron involving interactions between the WW domains in the splicing factor (FBP11 and FBP21) and a proline-rich domain in the branchpoint binding protein, BBP, in spliceosomal complex A associated with U2 snRNPs colocalized with splicing factors in nuclear speckle domains. FBP21 interacts directly with the U1 snRNP protein U1C, the core snRNP proteins SmB and SmB', and the branchpoint binding protein SF1/mBBP with a role in cross-intron bridging of U1 and U2 snRNPs in the mammalian A complex suggesting a network of interactions between specific splicing factors bound to the 5' splice site, SF1/BBP bound to the branch site, and U2AF bound to the 3' splice site in the spliceosomal complex E. FBP11 protein binds the splicing factor SF1/mBBP; FBP21 also interacts with SF1/mBBP and binds three other splicing factors, SmB, SmB', and U1C. FBP21 corresponds to a spliceosome-associated protein that is detected in the spliceosomal complex A and present in spliceosomes assembled on different pre-mRNAs associated with U2 snRNPs."

Evolution of exon 2: ancient region changes slowly

28 Aug 98 webmaster
From the sequence alignment below, one sees an extraordinary 75% average pairwise sequence identity and fully half of all bases completely invariant. Splicing donor/acceptor sites cannot account for this well-conserved sequence because these typically consist of just 4 nucleotides, AG-GT, belonging to the flanking introns. Note that exon 2 is almost as well conserved as the protein coding region itself.
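
For readers who want to recompute figures of this kind, here is a rough sketch of pairwise identity and the invariant-column fraction. It assumes the rows have already been aligned to equal length, with '-' for gaps; it does no alignment of its own. As a small check it uses the cow (D26150) and sheep (U67922) exon 2 entries from the probe set, which are the same length as listed and are compared here without gaps. That is an assumption of convenience, and the other species need a real alignment first. Function names are illustrative only.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of matching columns between two pre-aligned, equal-length rows."""
    if len(a) != len(b):
        raise ValueError("rows must be aligned to the same length")
    # Ignore columns where both rows carry a gap.
    pairs = [(x, y) for x, y in zip(a.lower(), b.lower()) if x != "-" or y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_pairwise_identity(rows):
    """Average of identity() over every unordered pair of rows."""
    scores = [identity(a, b) for a, b in combinations(rows, 2)]
    return sum(scores) / len(scores)


def invariant_fraction(rows):
    """Share of alignment columns in which every row carries the same base."""
    columns = list(zip(*(row.lower() for row in rows)))
    same = sum(len(set(col)) == 1 and col[0] != "-" for col in columns)
    return same / len(columns)


# Exon 2 entries copied from the probe set (treated here as already aligned).
COW = "gacttctgaatatatttgaaaactgaacagtttcaaccaagccgaagcatctgtcttcccagagacacaaatccaacttgagctgaatcacagcagat"
SHEEP = "gacttctgaatatatttgaaaactgaacagtttcaaccaagctgaagcatctgtcttcccagagacacagatccaacttgagctgaatcacagcagat"

if __name__ == "__main__":
    print(f"cow vs sheep identity: {identity(COW, SHEEP):.3f}")
```

Feeding all six aligned rows to mean_pairwise_identity and invariant_fraction would give the two statistics quoted above, provided the same alignment is used.
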

Knowledge of the mammalian prion gene is bounded by the marsupial and chicken 5' UTR sequences. These are sequenced only in the region of exon 3; at least one splice donor must lie upstream. Both species have uninterrupted ORFs. Marsupial still has 10 bp of exon preceding the start codon with residual homology to placental mammals; chicken has lost all visible similarities and has only 2 bp in its exon 3 prior to the ATG of methionine.

It is possible to date recent 'growth' of human prion introns using the Alu elements, which are primate-only and thoroughly studied. Other mobile elements are shared with counterparts in the same position in sheep and mouse, suggesting their insertion predates the divergence of these species.

Exon 3 is different. The 10 bp upstream of the coding sequence must preserve both the intron-exon splice junction and the context for initiation of ribosomal translation. The consensus for the latter was described by Kozak (Mamm. Genome (1996) 7:563-574) as (gcc)gcc(a/g)ccAUGG, with residues -3 and +4 especially significant for optimal ribosome binding.
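
The two positions singled out by Kozak are easy to check by hand, but a small sketch makes the numbering explicit: -1 is the base just before the A of AUG, and +4 is the base just after the G. The 10 bp human leader used here, agcagtcatt, is taken from the S82948/U29185 rows further down this page. The +4 base passed in the example is a placeholder assumption, since the coding sequence is not listed here. The function name is hypothetical.

```python
PURINES = {"a", "g"}


def kozak_flags(upstream, plus4):
    """Check the two positions stressed in the text.

    upstream: bases immediately 5' of the AUG (its last base is position -1).
    plus4:    the single base immediately 3' of the AUG.
    Returns (minus3_is_purine, plus4_is_g).
    """
    up = upstream.lower().replace("u", "t")
    if len(up) < 3:
        raise ValueError("need at least 3 upstream bases to reach position -3")
    return up[-3] in PURINES, plus4.lower() == "g"


# 10 bp of exon 3 preceding the ATG, from the S82948 / U29185 rows on this page.
HUMAN_EXON3_LEADER = "agcagtcatt"

if __name__ == "__main__":
    # "g" at +4 is a placeholder assumption, not a value given on this page.
    minus3_ok, plus4_ok = kozak_flags(HUMAN_EXON3_LEADER, "g")
    print(f"-3 purine: {minus3_ok}, +4 G: {plus4_ok}")
```
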

Adjacent genes

GenBank searches for genes flanking the prion gene 4 Sept 98 webmaster
The reported human prion sequence, U29185, contains 12,623 bp upstream from exon 1 -- does this contain the 3' end of another human gene? If so, this might shed light on the general area of metabolic function of the prion gene, even though eukaryotic genes are rarely organized into operons.

This region, as the GenBank entry shows, contains many retrotransposons such as Alu, but these have sufficient gaps to accommodate exons and ORFs for another gene. A Blast search on 4 Sept 98 against 808,000,000 database letters for all of the region upstream of exon 1 failed to turn up any homologies to coding regions of other genes, though transposons and mobile elements are represented elsewhere in thousands of places in the genome: 42,079 significant hits were found.

Similarly, a search of chromosome 20pter-12 for neighbors of PrnP GDB:120720 shows CHGB Hs.2281 SCG1 chromogranin B (secretogranin 1 GDB:118770), PCNA Hs.78996 proliferating cell nuclear antigen GDB:120261, and PDYN prodynorphin GDB:120269 as known genes in this region (only 135 genes, 5 pseudogenes, and 1 unknown gene have been identified on the whole chromosome 20 thus far; the human genome has 7436 known individual genes). However, precise placement of these and other genes relative to the prion gene is unknown.

The gene responsible for spongiform degeneracy in the zitter rat has still not been pinned down though it maps close to the prion gene on chromosome 3. Exon 2 was sequenced and found normal in zitter rats; enhanced mRNA was not found.

CENPB, CHGB, AVP, AGS, OXT, PCNA, PDYN, PRNP, BMP2, RA, FKBP1, CSNK2A1, CDC25B, ADRA1D, F15, NBIA1, NDUFA7, SN, PLCB4, 6175920, SOX22, P3, GNRH2, HMCS.

Annotated References

These are the relevant Medline hits from 22 papers containing 'prion AND exon.'
An important additional reference has only appeared on GenBank and in the Erice symposium:

Structure and Organization of Chromosomal Regions Carrying the Mammalian Prion Gene from Three Species

Prions and Brain Diseases in Animals and Humans, ed. D. Morrison
NATO ISI Series Plenum Press ISBN 0-306-45825-X ...  August 19-23, 1996 Erice workshop
Large-scale sequencing of human, mouse, and sheep prion protein genes, pg 59-76
Lee,I.Y., Westaway,D., Smit,A.F., Cooper,C., Yao,H., Prusiner,S.B. and Hood,L.

GenBank entries U29185, U67922, U29186 corresponding to this article refer to unpublished item: Structure and Organization of Chromosomal Regions Carrying the Mammalian Prion Gene from Three Species
A region strongly homologous to exon 2 in other species is identified in humans. The 5' half is more conserved than the 3' region. Exon 2 is annotated at GenBank. All mammalian species should use 123 exon nomenclature whether or not expression has been seen for exon 2. Figures 1-7 provide important global registration of sequences from mouse, human, and sheep; CpG islands, retrotransposon positions, and alignments. Figure 7A finds a non-maximal registration of these 3 species within intron 1 extended below (webmaster) to all 6 species. Note that sequence numbering in the book is not concordant with GenBank numbering, e.g., human 14408 in figure 7A corresponds to 14775 at GenBank.

Alternative usage of exon 1 of bovine PrP mRNA.

Horiuchi M, Ishiguro N, Nagasawa H, Toyoda Y, Shinagawa M
Biochem Biophys Res Commun 1997 Apr 28;233(3):650-654 
Cattle use exon 1a23 and exon 1b23 equally (except in spleen, all exon 1a23 plus an uncharacterized minor product not 1b23). Exon 1a and 1b have identical start points but 1b contains 53+115=168 bp read-through to an alternate splice donor. Translation efficiency was the same. Usage of exons 2 and 3 was identical for the two mRNA species. This is the first case of tissue-specific alternative splicing for exon 1. Evidently spliceosomes have significant species and tissue differences; I haven't seen this investigated trans-genetically. Sheep were not found to use this distal splice donor despite a very similar sequence to the bovine 1b splice junction and to the splice consensus (C/A)AG--gt(a/g)agt. Adult cows had 5x the prion mRNA as an 8 month fetus; cows and sheep had reversed abundance in kidney and spleen.

Sheep, but not rodents and human, are 88% identical over this region but identical around the splice junction. No tissue has been found in sheep that expresses its exon 1b though the CpG island skew might be seen as supporting it. This feature of sheep prion gene is not annotated at GenBank. It is one thing to decide on what ORF to use to breed scrapie-proof sheep; it is quite another to decide on the 5' UTR sequence that will drive it.

Data for ragged transcription starts is shown by frequency in figure 1; the top line for 1a, the bottom for 1b:

...............4..5......1........12
..1.....1.12..10..3......1..1...2.11
TTACCCGCCCTAGTTGCCAGTCGCTGACAGCCGCAGA

Characterization of the bovine prion protein gene: the expression requires interaction between the promoter and intron.

Inoue S, Tanaka M, Horiuchi M, Ishiguro N, Shinagawa M
J Vet Med Sci 1997 Mar;59(3):175-183
We cloned the part of the bovine PrP gene which contains the 5'-flanking region, exon 1, exon 2 and intron 1 to analyze its promoter region. The 5' non-coding region of the bovine PrP gene consisted of three exons and two introns, and its organization was similar to that of the mouse, rat and sheep PrP genes. The 5'-flanking region of the bovine PrP gene from the transcription start site to nucleotide position -88 was (G + C)-rich (78%) and contained three potential binding sites for the transcription factor Sp1, but no CCAAT-box or TATA-box. [They took CCGCCC = Sp-1 and CCCCGGGC = AP-2 (inactive). -88 to -30 seemed to be key for promoter activity, as tested with CAT.].

This region showed high homology (89%) with that of the sheep PrP gene, but relatively low homology (approximately 46-62%) with the same region of the mouse, rat, hamster and human PrP genes. The position from -88 to -30 within the 5'-flanking region of the bovine PrP gene showed major promoter activity. However, this region was able to function properly only in collaboration with the region at +123 to +891 of intron 1 of the bovine PrP gene. [This fits with the CpG island overhang. Nine tissues were studied, with brain, spleen, adrenal glands and kidney high but lymph nodes and skeletal muscles low.]

Sheep prion control regions -- Iceland meeting

August 1998 TSE meeting in Iceland; posters P10 and P11 and talk T56 from the Goldmann-Hunter group addressed control regions in sheep.
A 524 bp sheep promoter fragment was studied with a reporter gene and a series of deletions. A mutation G to T at -96 knocks out the single AP-2 and conveniently a SmaI restriction site. A single SP-1 was found at -48. They are now looking at genotype and breed promoter variations. [These numbers do not work on GenBank sheep sequences, possibly because of different assumed start points -- it might be better to number from the end of exon 1.]

They found the 3' UTR had tissue-specific use of alternative polyadenylation sites that differed significantly in their translation efficiency and regulation during development. They note further, "Additional RNA processing in other positions of the untranslated regions has been found resulting in a complex system of post-transcriptional modulation of PrP expression ['ruminants only', implying cows and non-ruminants were tested.].... A parallel study of PrP polymorphisms revealed a high variability in the 3' UTR within and between breeds..." This was investigated further with 3' UTR deletions. The mRNAs are given as 4.6 kb and 2.1 kb in P10, which are said to differ by 2.3 kb in T56. Online poly-A software can probably find the [unreported] alternative site.

Sequence variation in intron of [human] prion protein gene, crucial for complete diagnostic strategies.

Hum Mutat 1996;7(3):280-281 no abstract GenBank staff-entry S82948
Palmer MS, van Leeven RH, Mahal SP, Campbell TA, Humphreys CB, Collinge J
This paper, on its face, is an obscure technical note that calls attention to a widely used faulty sequencing primer, PDG-45. This came about because a 1989 paper on GSS by Hsiao (1989) Nature 338:342 inadvertently uncovered an uncommon allele at position -21 (relative to the start of exon 3) that was mistaken for wildtype and used by others for diagnostic sequencing in a clinical decision setting. Because the G to A at -21 occurs early on in the primer, it causes certain alleles not to be amplified. They sequenced 62 controls, finding seven A-21 among the 124 alleles for a frequency of 5.6%.
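
The frequency quoted above is simple arithmetic: 62 diploid controls give 124 alleles, and 7 of 124 is 5.6%. The sketch below spells that out and adds a Wilson 95% interval to show how loosely seven observations pin the number down. The interval is an addition for illustration and is not reported in the paper. Function names are illustrative.

```python
from math import sqrt


def allele_frequency(variant_alleles, individuals, ploidy=2):
    """Return (frequency, total alleles) for a count of variant alleles."""
    total = individuals * ploidy
    return variant_alleles / total, total


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes out of n (z=1.96 for ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    freq, total = allele_frequency(7, 62)   # seven A-21 alleles in 62 controls
    low, high = wilson_interval(7, total)
    print(f"{freq:.1%} of {total} alleles (approx. 95% interval {low:.1%} to {high:.1%})")
```
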

Now it gets more curious. They sequenced two [related] A117V cases, finding G-21A GCA117GTA and -129V for the haplotypes, ie, the polymorphism at -21 was on the disease allele. This was also the result in the original GSS case. The seven normal A-21 all co-occurred with GCA117GCG, the silent A117A polymorphism called PvuII negative in restriction language (10% of European population). In other words, the -21 intron polymorphism has a curious correlation with codon 117.

GenBank staff-entry S82948 thus gives the wildtype sequence in this region, which agrees with the more modern U29185; the Hsiao (1989) sequence never made it into the database, which is just as well as it seems to have 3 upstream sequencing errors:

tgataccattgctatgcactcattcattatgcaggaaacatttagtaatttcaacataaatatgggactctgac g ttctcctcttcattttgcag agcagtcatt ATG S82948
tgataccattgctatgcactcattcattatgcaggaaacatttagtaatttcaacataaatatgggactctgac g ttctcctcttcattttgcag agcagtcatt ATG U29185
........................cattatgcag-aaacatttagtaatt-caacataaatatggAactctgac A ttctcctcttcattttgcag agcagtcatt ATG Hsiao (1989)
............................................................................................................................................ttttgcag agcagtcatt ATG Puckett X83416
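
The upstream discrepancies in the Hsiao (1989) row can be picked out column by column. This sketch compares only the 50-base stretch that both rows cover upstream of the -21 position, copied from the rows above; '-' marks a base missing from the Hsiao row and case is ignored. Indices are 0-based within that excerpt and are not GenBank coordinates. The function name is illustrative.

```python
# 50-base excerpt shared by the reference rows (S82948 / U29185) and the
# Hsiao (1989) row, ending just before the -21 position.
REFERENCE = "cattatgcaggaaacatttagtaatttcaacataaatatgggactctgac"
HSIAO_1989 = "cattatgcag-aaacatttagtaatt-caacataaatatggAactctgac"


def column_differences(ref, query):
    """List (index, ref_base, query_base) for every column that differs."""
    if len(ref) != len(query):
        raise ValueError("rows must be aligned to the same length")
    return [
        (i, r, q)
        for i, (r, q) in enumerate(zip(ref.lower(), query.lower()))
        if r != q
    ]


if __name__ == "__main__":
    for index, ref_base, query_base in column_differences(REFERENCE, HSIAO_1989):
        print(f"excerpt index {index}: reference {ref_base}, Hsiao {query_base}")
```

On these rows the differences are two missing bases and one substitution, separate from the g/A difference at -21 itself.
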

No one ever looked at this polymorphism again, not in sporadic CJD, not in nvCJD, not in familial CJD. Obviously the control regions should have been sequenced long ago in a couple thousand cases of sporadic CJD and all nvCJD. Little work is involved because exon 1, exon 2, distal intron 2, and their flanking regions involve just a few hundred bases with known primers; middle stretches of the introns are not needed. (Eleven other species were sequenced in the exon 3 region but it is changing rather fast, no big surprise for an intron.) Of course, the interest is over-production attributable to regulatory sequences driving accumulation of rogue conformer.

Notice that -21 is precisely at the boundary of the standard poly-pyrimidine tract that comprises part of the splice acceptor region. This feature is strongly conserved even though sequence specifics are not. This raises the questions of how efficiently the splice is made and whether the splice donor at cryptic human exon 2 might not be expressed differently in the -21A setting.

Isolation and characterization of the promoter region of the human prion protein gene.

Poster session, Am Soc Hum Genetics 1996 v46 
Mahal SP, Beck JA, Palmer MS, Antoniou M, Collinge J
Meeting abstract, no follow-up as of 14 Sept 98. Interpreted 2.9 kb of 5' UTR as containing Ap-1, Ap-2, CBP, MyoD, NF-IL6 and heat shock factors and a 200 bp active promoter. Two polymorphisms were found in sporadic CJD 600bp upstream of exon 1. nvCJD cases also to be sequenced in this region.

Structure and polymorphism of the mouse prion protein gene.

Proc Natl Acad Sci U S A 1994 Jul 5;91(14):6418-6422 
Westaway D, Cooper C, Turner S, Da Costa M, Carlson GA, Prusiner SB
...We retrieved mouse PrP gene (Prn-p) yeast artificial chromosome (YAC), cosmid, phage, and cDNA clones. Physical mapping positions Prn-p approximately 300 kb from ecotropic virus integration site number 4 (Evi-4), compatible with failure to detect recombination between Prn-p and Evi-4 in genetic crosses. The Prn-pa allele encompasses three exons, with exons 1 and 2 encoding the mRNA 5' untranslated region. Exon 2 has no equivalent in the Syrian hamster and human PrP genes. [wrong -- see Erice ref]

The Prn-pb gene shares this intron/exon structure but harbors an approximately 6-kb deletion within intron 2 [ defective IAP retrovirus]. While the Prn-pb open reading frame encodes two amino acid substitutions linked to prolonged scrapie incubation periods, a deletion of intron 2 sequences also characterizes inbred strains such as RIII/S and MOLF/Ei with shorter incubation periods, making a relationship between intron 2 size and scrapie pathogenesis unlikely. The promoter regions of a and b Prn-p alleles include consensus Sp1 and AP-1 sites, as well as other conserved motifs which may represent binding sites for as yet unidentified transcription factors.

Comment (webmaster): No polymorphisms distinguishing a and b mice strains were found in exons 1, 2. The splice donor and acceptor sites are said to differ from a consensus by 3/13 and 2/8 mismatches. Figure 2 shows a difference upstream abutting the AP-1 site: the b allele is TGACTCA where the a allele is TGACTCA. AP-1 is noted to be a dimeric transcriptional activator composed of Fos and Jun proteins. The motif terminology is launched in a 4 species alignment (figure 4). A speculative resemblance to sequences found in muscle-specific genes is noted; Mahal et al noted a human MyoD binding site in a meeting abstract. A previous study found muscle degeneration in an over-production setting (Cell 76 117-129 1994).

Characterisation of two promoters for prion protein (PrP) gene expression in [mouse] neuronal cells.

Baybutt H, Manson J
Gene 1997 Jan 3;184(1):125-131
GenBank U52821 3498 bp
...In order to define the sequences that are responsible for the normal expression of the PrP gene we have isolated and sequenced a 5' region of the murine PrP gene, which includes 1.2 kb upstream from exon 1, intron I and exon 2. Sequencing of this region from several strains of mice identified a polymorphism linked to Sinc, the gene controlling the incubation period of scrapie in mice.

We used this gene fragment and deletions of it to examine promoter mediated expression of a chloramphenicol acetyl transferase reporter gene in neuroblastoma cells (N2a). Both promoter and suppressor elements were identified within this region. The two major areas of promoter activity were sequences adjacent to and 5' to exons 1 and 2. The 5' region of intron 1 was shown to contain elements that were capable of suppressing promoter activity. Transcription factor binding sites have been identified within these sequences.

Comment (webmaster): Binding sites for AP-1, AP-2 (TCCCCAG), and SP-1 are vaguely indicated in figure 1 and not all confirmable. Unspecified 'binding sites' are also vaguely described in intron 1 at 1960-2020 and 3328-3358. Significant transcription remained (21%) after deleting 1-1241, which includes exon 1. Fig 3 shows the effects of various deletions; 1864 to 2309 in the center of intron 1 has suppressor activity on both the exon 1 and exon 2 promoters. Transcription starts were at 1151-1159 and 2750-2776 (exon 1 deleted). The authors suggest tissue specificity may arise through suppression of transcription. Table 1 gives polymorphisms in various mouse sinc strains that could not be correlated with incubation time as plausibly as amino acids 108 and 189. Further work using smaller deletions and site-directed mutagenesis is underway; competing oligonucleotides may have therapeutic possibilities.

Identification of a promoter region in the rat prion protein gene.

Biochem Biophys Res Commun 1996 Feb 6;219(1):47-52 
GenBank D50092
Saeki K, Matsumoto Y, Matsumoto Y, Onodera T*
*fax: 81-3 5800-6974 Tokyo
We have demonstrated the presence of a rat prion protein (RaPrP) gene promoter upstream of multiple initiation sites. A 0.1-kb fragment upstream of the 5'-untranslated region contains specific DNA motifs characteristic of promoter elements including an AP-1 binding site, an inverted CCAAT motif [that reads ATTGGTG] and three inverted Sp-1 binding sites. This fragment directs transcription of a luciferase reporter gene in pheochromocytoma cells (PC12) and rat glioma cells (C6), suggesting that it contains the promoter for the RaPrP gene. To more precisely localize the transcription regulatory elements in this region, a series of 5'-deletion mutants were generated. Deletion analysis showed that an inverted CCAAT and adjoining Sp-1 binding sequences may play an important role in transcription of the RaPrP gene.

Comment (webmaster): Figure 1 shows 2831 bp upstream and 168 bp downstream and the location of regulatory elements such as AP-2 (cggTCCCCAGctc) at -597 to -591; Figure 3 shows rat, mouse, and hamster gapped and aligned, with motifs 1-4, the AP-1, the CCAAT, 3 x Sp-1 and the start of exon 1. No TATA, CRE, NF-kB, or OTF-1 sites were found in either orientation. The motif region was deleted without effects on the promoter. The inverted consensus SP-1 binding sequence is (G/A)(C/T)(C/T)(C/A)CGCC(C/T)(C/A); one-base mismatches were seen. About 90 bp were needed upstream of the transcriptional start site.
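
A consensus written with alternatives at each position, like the inverted SP-1 site above, can be scanned for with a short script that tolerates a set number of mismatches. The sketch below rewrites the consensus in IUPAC codes (R = G/A, Y = C/T, M = C/A) and reports every window with at most one mismatch, since one-base mismatches were seen. It scans the given strand only. The test sequence in the example is made up for illustration and is not from any PrP gene.

```python
# Only the IUPAC codes needed for this consensus are included.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "M": "AC"}

# (G/A)(C/T)(C/T)(C/A)CGCC(C/T)(C/A) written in IUPAC codes.
SP1_INVERTED = "RYYMCGCCYM"


def mismatches(window, consensus=SP1_INVERTED):
    """Count positions where the window base is not allowed by the consensus."""
    return sum(base not in IUPAC[code] for base, code in zip(window, consensus))


def scan(seq, consensus=SP1_INVERTED, max_mismatch=1):
    """Return (start, window, mismatch_count) for every acceptable window."""
    seq = seq.upper()
    width = len(consensus)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        count = mismatches(window, consensus)
        if count <= max_mismatch:
            hits.append((start, window, count))
    return hits


if __name__ == "__main__":
    # Made-up test sequence, for illustration only.
    for start, window, count in scan("ttACCCCGCCCCtt"):
        print(f"offset {start}: {window} ({count} mismatch)")
```

To cover both orientations, the reverse complement of the sequence would have to be scanned as well.
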

Three-exon structure of the gene encoding the rat prion protein and its expression in tissues.

Virus Genes 1996;12(1):15-20 
Saeki K, Matsumoto Y, Hirota Y, Matsumoto Y, Onodera T*
*fax: 81-3 5800-6974 Tokyo
The prion protein (PrP), encoded by a chromosomal gene, is associated with development of the neurodegeneration of prion-induced diseases. Since determination of the complete structure of the gene encoding PrP is important for understanding gene expression in the central nervous system (CNS), the nucleotide (nt) sequence of the isolated whole gene encoding rat PrP was determined.

The rat PrP gene (chromosome 3) spans 16 kilobases (kb) of the rat genome and contains three exons of 19-47 base pairs (bp), 98 bp, and 2 kb separated by two introns of 2.2 kb and 11 kb. The first and second exons are noncoding, while the third exon contains a short 5' untranslated region, the entire 762-bp open reading frame (ORF), and a 3' untranslated region. The putative raPrP promoter in the 5' flanking region contains putative Sp1, AP-1, and AP-2 binding sites without a consensus TATA box.

This TATA box-deficient feature, coupled with the presence of a high G+C content and Sp1-binding sites in the raPrP promoter, characterizes it as a housekeeping gene. Analysis of the raPrP cDNA 5'-end showed that raPrP mRNA transcription was initiated at multiple sites. Northern blot analysis showed that the levels of raPrP mRNA varied among rat tissues, with the highest levels found in the brain and placenta. This determination of raPrP nt sequences, including the introns and the 5' and 3' flanking regions, may make it possible to elucidate cis-acting elements that regulate the expression of this gene in different tissues and cell lines.

Comment (webmaster): Figure 1 shows the ragged transcription start points by frequency. These result in a 19-47 bp mRNA portion of exon 123 products:

2....1.81..*2.83.......2.3.3...................
GCGTTGTCAGCGCAGCAGACGGAGTCTGAGCGTCGCGTCGGTGGCAG
* = 12
Nerve growth factor, insulin-like growth factor 1, and human growth hormone are known to increase prion protein expression in this rat cell line. mRNA was found in spleen, liver, lung, kidney, heart, testis, brain, and placenta. The latter two tissues were highest; spleen was low and liver undetectable.

[Rat] tremor and zitter, causative mutant genes for epilepsy with spongiform encephalopathy in spontaneously epileptic rat (SER), are tightly linked to synaptobrevin-2 and prion protein genes, respectively.

Kuramoto T, Mori M, Yamada J, Serikawa T
Biochem Biophys Res Commun 1994 Apr 29;200(2):1161-1168 
Spontaneously epileptic rat (SER) is a homozygote for both tremor (tm) and zitter (zi) genes and exhibits epilepsy-like seizures and spongiform encephalopathy. Genetic linkage analyses revealed that the tm and zi loci were tightly linked to the synaptobrevin-2 (Syb2) on chromosome 10 and the prion protein (Prnp) on chromosome 3, respectively. The genomic DNA sequences of Syb2 of the tm/tm (TRM) rats and exon 2 of the Prnp of the zi/zi (ZI) rats were identical to those of a control rat strain WTC. In addition, no difference was detected for expression of the Syb2 and Prnp on the Northern blot analyses of TRM, ZI and WTC brain [so Syb2 and Prnp are evidently not the tm and zi genes, respectively]. The assignments of tm and zi to rat chromosome 10q24 and 3q35, however, will be the first step towards the positional cloning of the genes.

Scrapie and cellular PrP [hamster] isoforms are encoded by the same chromosomal gene.

Cell 1986 Aug 1;46(3):417-428 
Basler K, Oesch B, Scott M, Westaway D, Walchli M, Groth DF, McKinley MP, Prusiner SB, Weissmann C
GenBank M14055
PrP 27-30 is the major protein in purified preparations of scrapie agent. An almost complete PrP cDNA was used to select PrP-related genomic clones from normal hamster DNA. The gene contains a noncoding exon of 56 to 82 bp and a 2 kb coding exon, separated by a 10 kb intron. Transcription initiates at the same multiple sites in vivo and in vitro. The promoter lacks a TATA box and contains three repeats of the sequence GCCCCGCCC, which resembles the Sp1 binding site found in "housekeeping" genes. The PrP coding sequence encodes a presumptive amino-terminal signal peptide. The primary structure of PrP encoded by the gene of a healthy animal does not differ from that encoded by a cDNA from a scrapie-infected animal, suggesting that the different properties of PrP from normal and scrapie-infected brains are due to post-translational events.

A novel hamster prion protein mRNA contains an extra exon: increased expression in scrapie

Brain Res. 1997 Mar 21; 751(2): 265-274. 
Li G, Bolton DC
GenBank U78769 see also M14055
This meticulous study found that hamsters express both exon 123 and 13 mRNA isoforms. For each exon 123 mRNA in colliculi, 2.9 exon 13 mRNAs are found. In hippocampus, this ratio is 3.5; in frontal cortex, it is 2.1 (statistically significant increase). During scrapie infection, relative expression of exon 123 increased in colliculi, reaching 1:1.25 relative to exon 13 mRNA (or 2.5x of previous exon 123 levels), which was attributed to proliferating astrocytes. 492 bp was sequenced about exon 2. The central region of hamster intron 1 has not been determined.

These ratios are turned around in mouse: only exon 123 expression is reported, whereas human is solely exon 13 so far. No exon 123 was found here in human, using isolated neutrophils, frozen adult brain, or a commercial cDNA adult brain library. Cattle use exon 1a23 and exon 1b23 equally where exon 1b contains 53+115=168 bp read-through to an alternate splice donor.

The isoform ratio in pure astrocytes (or in other cell types) was not determinable; isoform ratio changes during infection may reflect changes in cell type ratios rather than changes in isoform ratios within a given cell type; no change was seen in rates of prion mRNA synthesis overall.

Thus, mRNA isoform usage is tissue- and cell-type specific. Lesser isoforms are easily missed in whole brain RNA or in studies when only 4-5 cDNA clones are taken. Relative stabilities, translational efficiencies, and possible target-encoding are unknown. Isoforms in the prion gene have nothing whatsoever to do with producing alternative protein products.

Supplementary material

Based on GenBank and Blastn data containing 5'UTR exons as of 1 Sept 98 -- webmaster

Exon 2: non-redundant set, 6 species, alignment, consensus sequence, sequence distance table, fasta format, flanking regions


Non-redundant set of exon 2 sequences
cow D26150 = AB001468 = D10612
rat D50092
human U29185
mouse U29186 = U52821 = X79931 = M13685
sheep U67922 = X79913
hamster U78769

Exon 2 probe set

>Consensus_exon2
GACTcCTGAaTATaTTtcAaAACTGAACaaTTTCAaCcaagctgaAGcattCtGtcTTcctaGaGgtACcagTccagtTTAGgaGAgcCAcAgCaGAtt

>U29185 Homo sapiens Lee IY 1998 exon 2
gactcctgaatatttttcaaaactgaacaatttcagccatgtctgagctttccgtcttcctggaggcacaaatctagtttagctgaaccacaacagatt

>D26150 Bos taurus Yoshimoto J 1994 exon 2
gacttctgaatatatttgaaaactgaacagtttcaaccaagccgaagcatctgtcttcccagagacacaaatccaacttgagctgaatcacagcagat

>U67922 Ovis aries Lee IY 1998 exon 2 
gacttctgaatatatttgaaaactgaacagtttcaaccaagctgaagcatctgtcttcccagagacacagatccaacttgagctgaatcacagcagat

>U29186 Mus musculus Lee IY 1998 exon 2 
gactcctgagtatatttcagaactgaaccatttcaaccgagctgaagcattctgccttcctagtggtaccagtccaatttaggagagccaagcagact

>D50092 Rattus norvegicus Saeki,K. 1997 exon 2+
gactcctgaatatatttcaaaactgaaccatttcaacccaactgaagtattctgccttcttagcggtaccagtccggtttaggagagccaagccgact

>U78769 Mesocricetus auratus Li G and Bolton 1996 exon 2+
gactcctgaatatattccaaaactgaacaatttcaactgagctgaagtactctgtttttctagaggtaccagttcagtttaggagagtcacagcagatc
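A rough way to get a feel for the distances between these exon 2 records is sketched below. It parses fasta text and scores pairs with the standard library's difflib matching-block ratio, which is a stand-in for a proper alignment-based distance and is not the method used to build the distance table. The helper names are illustrative, and only three of the six records are included to keep the example short.

```python
from difflib import SequenceMatcher
from itertools import combinations

def parse_fasta(text):
    """Return {accession: sequence}; the accession is the first word of each header."""
    records, key = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            key = line[1:].split()[0]
            records[key] = ""
        elif key is not None:
            records[key] += line.lower()
    return records

def similarity(a, b):
    """difflib ratio: 2 * matched characters / total characters (1.0 = identical)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

EXON2_FASTA = """
>U29185 Homo sapiens Lee IY 1998 exon 2
gactcctgaatatttttcaaaactgaacaatttcagccatgtctgagctttccgtcttcctggaggcacaaatctagtttagctgaaccacaacagatt
>D26150 Bos taurus Yoshimoto J 1994 exon 2
gacttctgaatatatttgaaaactgaacagtttcaaccaagccgaagcatctgtcttcccagagacacaaatccaacttgagctgaatcacagcagat
>U67922 Ovis aries Lee IY 1998 exon 2
gacttctgaatatatttgaaaactgaacagtttcaaccaagctgaagcatctgtcttcccagagacacagatccaacttgagctgaatcacagcagat
"""

records = parse_fasta(EXON2_FASTA)
for (name_a, seq_a), (name_b, seq_b) in combinations(records.items(), 2):
    print(f"{name_a} vs {name_b}: {similarity(seq_a, seq_b):.3f}")
```

The remaining records can be pasted into the fasta text unchanged; for a publishable distance table a true pairwise or multiple alignment would be the better choice.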

Exon 2 with flanking regions

>U29185 Homo sapiens Lee IY 1998 exon 2+ Hela cell line S3
ttttaagaatcagttcttagattcatttatcaattctagttttttgttgttgtttttaaggactcctgaatatttttcaaaactgaacaatttcagccatgtctgagctttccgtcttcctggaggcacaaatctagtttagctgaaccacaacagattgtacatatcctgcagaacctctgtggtcttaggaaggttgaaagtcaccaaatgtcacag

>D26150 Bos taurus Yoshimoto J 1994 exon 2+ bovine gene in mouse L-929 cells
attctgttaaataatccgttcttagatttatcaattatagttttttcttttttttttaaggacttctgaatatatttgaaaactgaacagtttcaaccaagccgaagcatctgtcttcccagagacacaaatccaacttgagctgaatcacagcagatgtaggtacc

>AB001468 Bos taurus Yoshimoto,J 1997 exon 2 brain
gccagtcgctgacagccgcagagctgagagcgtcttctctctcgcagaagcaggacttctgaatatatttgaaaactgaacagtttcaaccaagccgaagcatctgtcttcccagagacacaaatccaacttgagctgaatcacagcagatataagtcatcatg

>D10612 Bos taurus Yoshimoto,J 1993 exon 2 brain
gccagtcgctgacagccgcagagctgagagcgtcttctctctcgcagaagcaggacttctgaatatatttgaaaactgaacagtttcaaccaagccgaagcatctgtcttcccagagacacaaatccaacttgagctgaatcacagcagatataagtcatatg

>X79913 Ovis aries Westaway D, Zuliani V 1994 exon 2 adult brain
ctattaaataatccgttcttagatttatcaattatagtttgtttttttttttaaggacttctgaatatatttgaaaactgaacagtttcaaccaagctgaagcatctgtcttcccagagacacagatccaacttgagctgaatcacagcagatgtaggtaccctgcggaatctctctggt

>U67922 Ovis aries Lee IY 1998 exon 2 adult brain
ttattctattaaataatccgttcttagatttatcaattatagtttgtttttttttttaaggacttctgaatatatttgaaaactgaacagtttcaaccaagctgaagcatctgtcttcccagagacacagatccaacttgagctgaatcacagcagatgtaggtacctgcggaatctctctggtcttgtgatggttgaaagtgcccaactgtttcaag

>U29186 Mus musculus Lee IY 1998 exon 2 brain
ttgtatttcagttctcagacttatttatcaattctagttttctctttttgttgttttaaaggactcctgagtatatttcagaactgaaccatttcaaccgagctgaagcattctgccttcctagtggtaccagtccaatttaggagagccaagcagactgtgagtgccctgtgaatcatgatggtcttggggagggttgggaggggaactgaaaaatcatcaact

>D50092 Rattus norvegicus Saeki,K. 1997 exon 2 liver
tcatctttcagttctcagacttatttatcaattctagtttttctttttgttgttttaaaggactcctgaatatatttcaaaactgaaccatttcaacccaactgaagtattctgccttcttagcggtaccagtccggtttaggagagccaagccgactgtaagtgccctgtgaatcatgatggtctgtgggtcggggattgaaaatcaccaactgtct

>U78769 Mesocricetus auratus Li G and Bolton 1996 exon 2 scrapie-infected inferior, superior colliculi
tggtatttcagtacttagatttattcatcaattctaatttttctttttcatgttttgaaggactcctgaatatattccaaaactgaacaatttcaactgagctgaagtactctgtttttctagaggtaccagttcagtttaggagagtcacagcagatcgtaagtgccctgtcaatcttggtagagggcttgaaaatctccaactgtctggggagatgg

>U52821 Mus musculus Baybutt,H.N 1997 exon 2 neuronal cells
tgtatttcagttctcagacttatttatcaattctagttttctctttttgttgttttaaaggactcctgagtatatttcagaactgaaccatttcaaccgagctgaagcattctgccttcctagtggtaccagtccaatttaggagagccaagcagact...

>X79931 M.musculus Westaway,D 1994 exon 2 adult brain
tgtatttcagttctcagacttatttatcaattctagttttctctttttgttgttttaaaggactcctgagtatatttcagaactgaaccatttcaaccgagctgaagcattctgccttcctagtggtaccagtccaatttaggagagccaagcagactgtgagtgccctgtgaatcatgatggtcttggggagggttgggaggggaac

>M13685 M.musculus Locht,C 1986 exon 2 scrapie infected brain
.....aattccttcagaactgaaccatttcaaccgagctgaagcattctgccttcctagtggtaccagtccaatttaggagagccaagcagactatcagtcatcatggcgaaccttggctactggctgctggccctctttgtgactatgtggactgatgtcggcctctgcaaaaagcggccaaagcctggagggtggaacaccggtggaagccggtatcccgggcagggaagccctggaggcaac

Non-redundant set of exon 3 sequences (5' UTR portion)

Compiled from GenBank and Blastn on 1 Sept 98 -- webmaster
Note: GenBank marsupial entry suggests an additional 3' splice site at 230 based on pattern similarity.

           1 aagcttcagc tggctggctg gtgtccaaag aaggttaagt gtcgcttcta aagggtttct
        61 cccaaaagaa catcaaagaa agtttacact tcatattgca ttcaaggctg ccaatctttg
      121 ctgtttttta atagaagcat cctactcctt cctgatatca tattatgaca attaaaaatg
      181 acatatatgt gtcttaattg tgtttctttt ttcccctcct tttcctttag tggtttctaa
      241 ataaacccag aattttcatg tctttttttt tttccagatc acctaccatg ggaaaaatcc
intron 2 distal portion   exon 3 UTR   ORF   GenBank   Species   Reference
gcttcagcctgagtgccggacactgatgccttgttcttcatttcacag atcagccatc atg D50093 Rattus norvegicus 1997 Saeki,K
..................................tcctcattttgcag atcagtcatc atg S69654 zitter rats Gomi,H 1994
aatgacgtgttgctggagtacaatgatgccttgttcttcattttgcag atcagccatc atg M14054 Mesocricetus auratus 1988 Basler,K
agtgttgtgttgttggagtatactgacgccttgttcttcattttgcag attagccatc atg M33958 Chinese hamster Lowenstein,DH 1994
ccttcagcctaaatactgggcactgataccttgttcctcattttgcag atcagtcatc atg U29186 Mus musculus short inc Lee,IY 1998
atttcaacataaatatgggactctgacgttctcctcttcattttgcag agcagtcatt atg U29185 Homo sapiens Lee,IY 1998
..................tcattttgttttgttttgttttgtttgcag ataagccatc atg S46825 Mustela sp. Kretzschmar,HA 1993
gtgatttttacatgggcatatgatgctgacaccctctttattttgcag ataagtcatc atg D26151 Bos taurus Yoshimoto,J 1994
gtgatttttacgtgggcatttgatgctgacaccctctttattttgcag agaagtcatc atg U67922 Ovis aries Lee,IY 1998
gtgattcttacgtgggcatttgatgctgacaccctctttattttgcag agaagtcatc atg AJ000681 Ovis aries Bossers,A 1997
.......................................attttgcag agaagtcatc atg X91999 Capra hircus Goldmann,W 1996
ggctttagcatcggtccaggccactgacagcctcctctctctttccag gtcagctgtc atg U28334 Oryctolagus cuniculus Loftus,B 1997
gtggtttctaaataaacccagaattttcatgtctttttttttttccag atcacctacc atg L38993 Trichosurus vulpecula Windl,O 1995
..............actgccctaacagtgtgtgtccttatgcccgcag cc atg M95404 Gallus gallus Gabriel,JM 1998

Redundant GenBank entries in this region:
 
sheep: U67922 = D38179.  C-42T polymorphism relative to 6 other sheep sequences, cows are C-42T, T-29A, and G2T.

sheep: AJ000736 = AJ000681 = AJ000679 = M31313 = AJ000680 = X91999.

human: S82948 = U29185 = X83416. Polymorphism G-21A reported in sporadic CJD.

mouse: U29186 = M18070 = M18071
Non-redundant set, fasta format  exon 3 and upstream intron 2 flanker:

>Consensus_probe_sequence
TttagccTaggtacaggacacTGAcgcccTgttCtTcatTTtgCAGatcAGccaTcATG
>D26151_cow
gtgatttttacatgggcatatgatgctgacaccctctttattttgcagataagtcatcatg
>D38179_sheep
gtgatttttacgtgggcatttgatgctgacaccctctttattttgcagagaagtcatcatg
>D50093_rat
gcttcagcctgagtgccggacactgatgccttgttcttcatttcacagatcagccatcatg
>M18070_mouse
ccttcagcctaaatactgggcactgataccttgttcctcattttgcagatcagtcatcatg
>M18071_mouse
ccttcagcctaaatactgggcactgataccttgttcctcattttgcagatcagtcatcatg
>S46825_mink
..................tcattttgttttgttttgttttgtttgcagataagccatcatg
>S82948_human
atttcaacataaatatgggactctgacgttctcctcttcattttgcagagcagtcattatg
>U28334_rabbit
ggctttagcatcggtccaggccactgacagcctcctctctctttccaggtcagctgtcatg
>U29185_human
atttcaacataaatatgggactctgacgttctcctcttcattttgcagagcagtcattatg
>M14054_golden_hamster
aatgacgtgttgctggagtacaatgatgccttgttcttcattttgcagatcagccatcatg
>M33958_Chinese_hamster
agtgttgtgttgttggagtatactgacgccttgttcttcattttgcagattagccatcatg
>U29186_mouse_short_inc
ccttcagcctaaatactgggcactgataccttgttcctcattttgcagatcagtcatcatg
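Each mammalian record above ends with the intron 2 flank, the 10 nucleotides of exon 3 5' UTR, and the start codon. The sketch below splits a record on that assumption and checks for the canonical AG at the 3' end of the intron; the function name and the `utr_len` parameter are illustrative only.

```python
def split_exon3_probe(seq, utr_len=10):
    """Split a probe into (intron_flank, utr, start_codon).

    Assumes the record ends with the start codon and that exactly
    `utr_len` nucleotides of exon 3 precede it.
    """
    seq = seq.lower().replace(" ", "")
    start_codon = seq[-3:]
    utr = seq[-(3 + utr_len):-3]
    intron_flank = seq[:-(3 + utr_len)]
    return intron_flank, utr, start_codon

# D26151_cow record from the fasta set above.
cow = "gtgatttttacatgggcatatgatgctgacaccctctttattttgcagataagtcatcatg"
flank, utr, codon = split_exon3_probe(cow)
print("intron ends with:", flank[-2:])   # splice acceptor dinucleotide
print("exon 3 UTR:", utr)
print("start codon:", codon)
```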

Complete set of exon 1 sequences

Compiled from GenBank and Blastn searches 1 Sept 98 -- webmaster
 5' end of exon 1	        3' end of exon 1	          5'intron 2 or splice	GenBank	    Species and source
gcggcgtccgagcagcagaccgagaaggc	acatcgagtccactcgtcgcgtcggtggcag	  gtaagcggcttctgaaggta	M14055	    Mesocricetus auratus Basler,K 1987
gcg---ttgtcggatcagcagacc	gattctgggcgctgcgtcgcatcggtggcag	  gtaagcgggctgctgaagcc	X79932	    Mus musculus Westaway,D 1994
gcg---ttgtcggatcagcagacc	gattctgggcgctgcgtcgcatcggtggcag	  gtaagcgggctgctgaagcc	U29186	    Mus musculus Lee,I.Y 1998
gcg---ttgtcggatcagcagacc	gattctgggcgctgcgtccgatcggtggcag	  gtaagcgggctgctgaagcc	U52821	    Mus musculus Baybutt,HN 1997
gcg---ttgtcagagcagcagacg	gagtctgagcgtcgcg-----tcggtggcag	  gtaagcgggctgctgaagcc	D50092	    Rattus norvegicus Saeki,K 1997
....gccagtcgctgacagccgcaga	gctgagagcgtcttctctctcgcagaagcag	  gacttctgaatatatttgaa+	D10612	    Bos taurus Yoshimoto,J 1993
....gccagtcgctgacagccgcaga	gctgagagcgtcttctctctcgcagaagcag	  gacttctgaatatatttgaa+	AB001468    Bos taurus Yoshimoto,J 1997
agttgccagtcgctgacagccgcaga	gctgagagcgtcttctctctcgcagaagcag	  gtaaatagccgcgtagtcct*	D26150	    Bos taurus Yoshimoto,J 1994
ctagttgccagtcgctgacagccgca	gagctgagagcgtcttctctcccagaggcag	  gtaaatagccacgtagtcct	X79914	    Ovis aries Westaway,D 1996
ctagttgccagtcgctgacagccgca	gagctgagagcgtcttctctcccagaggcag	  gtaaatagccacgtagtcct	U67922	    Ovis aries Lee,IY 1998
gccagtcgctgacagccgcggcgccg#	cgagcttctcctctcctcacgaccgaggcag	  gtaaacgcccggggtgggag	U29185	    Homo sapiens Lee,IY 1998
gccagtcgctgacagccgcggcgccg#	cgagcttctcctctcctcacgaccgaggcag	  gtaaacgcccgggg......	X83415	    Homo sapiens Puckett,C 1996
gccagtcgctgacagccgcggcgccg	cgagcttctcctctcctcacgaccgag----@  agcagtcattatggcgaacc	X82545	    Homo sapiens Kniazeva,MV 1997
# X83415 and U29185 begin farther in the 5' direction with ccgcccgcgagcgccgccgct tcccttccccgccccgcgt ccctccccctcggccccgcgc gtcgcctgtcctccga.

@ Deletion probable sequencing error, otherwise cDNA splice type 1-3.

+ Start of bovine exon 2, so sequences are splice type 1a23. Note D10612 = D90545; D26151 contains 224 intron bp above exon 3, no exon 2 or exon 1

* Bovine (along with other species) has minor ragged transcription starts: mainly +1 G; +4 A, and -4A plus a major ragged finish with an alternative splice site for exon 1b 115 additional bp partly shown, in full gtaaatagccgcgtagtcctt taaactcccagcggaggacgccaaccc tgggtctgcggccgaggcccagggacccagccgaatcgga ttggtgggaggcagaccttgacc with flanker gtgagtagggctgggggctt gcggcgggcgcgggg...

Oddly the putative exon 1b splice junction is not used in sheep despite an almost identical sequence after the end of 1a: gcag...1a end..gtaaatagccacgtagtcctttaaacccccagcggaggccgcccccggcttgcggccgagg ccctagggcactcagccggatcggactggctgggaggcagaccttgacc...1b end..gtgaggaggactgggggc ttccggcgggcgcggggaacgtcgggcctgttt. Inoue reports intron elements 123-891 to be important for exon functioning in bovine. Rodents and human align poorly with this stretch of the artiodactyl prion gene:

                      101                                                150
                 cow   GACCTTGACC ...GTGAGTA GGG.CTGGGG GCT
                 she   GACCTTGACC ...GTGAGGA GGA.CTGGGG GCT
                 hum   GGTCGGGACC CCAGTGAGGA GGGGCCGGGG GCT

Exon 1 and motif region reference sequences and alignment

14 Sept 98 webmaster. See also on-site, off-page archive

>rat  D50092 
tcttcctctttaccaatttcttgttaccaaagttccacgatggcctttttctttccgttaggtaacctttcattttctcgactacccattatgtaacgggagcgctgggttctggatcagtcttccattaaagatgacttttatagtctgtgagcgtcgtcacagagtgctgacactggggtggggaggggagtacggggggagggggttaaacagataacaagcatttaagccagtacggagcggtgactcatcccaccgcgagaagccattggtgagcatcacgctccgcccctcgccccgcccagcccccggcctgtcgggtccctcaccacgccccgctcccccgcgttgtcagagcagcagacggagtctgagtctgagcgtcgcg-----tcggtggcag

>mouse2 U29186 Lee 
tcttcgttaccaatttcttgttaccaaagttcaacgatggcttcctcgctccgttaggtaacctttcattttctcaactacccattatgtaacgggagcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagtgctgacactgggggcggtttaaacagatacaagcatttaagccagtccggagcggtgactcattcccccaccccccacccccccgcgagagacgcggcgcggccattggtgagcatcacgccccgcccctcgcccagcctagctcccgcctgccccgcccctttccactcccggctcccccgcgttgtcggatcagcagaccgattctgggcgctgcgtccgatcggtggcag

>hamster M14055 Basler
tctccctctttagcaatttcttgctcctagagtttcagcaattgctttctcgctccattaggcaacctttcattttctcaccttccccattatgtaacgggagcaatgggttctggaccagtcttccattaaagatgatttttatagtcggtgagcgccgtcagggagtgatgacacctgggggcggtttaaaccgtacaatcccttaaaccagtctggagcggtgactcatggcgcggccattggtgagcacgacgcaagccccgccccacccagcccggccccgccctgctacccctcctgactcactgccccgcccgctcccccgcggcgtccgagcagcagaccgagaaggcacatcgagtccactcgtcgcgtcggtggcag

>human U29185 Lee
tctcctctttagaaatttctggttgccaaagttccagaaattgcttcctcattcctgagcctttcattttctcgatttctccattatgtaacggggagctggagctttgggccgaatttccaattaaagatgatttttacagtcaatgagccacgtcagggagcgatggcacccgcaggcggtatcaactgatgcaagtgttcaagcgaatctcaactcgttttttccggtgactcattcccggccctgcttggcagcgctgcaccctttaacttaaacctcggccggccgcccgccgggggcacagagtgtgcgccgggccgcgcggcaattggtccccgcgccgacctccgcccgcgagcgccgccgcttcccttccccgccccgcgtccctccccctcggccccgc

>sheep1 U67922=X79914 Lee 
tgtccttttcagaaatttctggttaccagagttcccgaaattgctttctcattccctaatctttcattttctccattacgtaacgagaagctggggctttggccgattttccctctaaagatgatttttatcgtcaacaagcaatttcagggagtgatgagccagggaggcggtgttagttgatgctagcgtttatgctagtctcaactcgtttttcccagggacttagattcctgggtctgccggtaaaccccgggcgcccgcagcgggcgcgcctgagcgtgcgcgcgccgtcgcctccccccccccgcagctcctcctctgcacggcgactcaccagccctagttgccagtcgctgacagccgcagagctgagagcgtcttctctcccagaggcaggt

>cow D26150 Yoshimoto
tgtcccttttagaaatttctggttaccaaagttccagaaattgctttctcattccctaatctttcattttctccattacgtaacgagaagctggggctttggccgattttccctttaaagatgatttttatcgtcaacaagcaatttcagggagtgatgagccggggaggcggtattagctgatgctagcgtttaagctagtctcaactcgtttttcccagggacttagattcctgggtctgccagtaaaccccgggcgccggcagcgggtgcgcctgagcgtcgcgcgcgccgtcgcctccccgcccctgcccctcctcctccgcccggcgacttacccgccctagttgccagtcgctgacagccgcagagctgagagcgtcttctctctcgcagaagca

>mouse1 X79932 Saeki
cttcctcgctccgttaggtaacctttcattttctcaactacccattatgtaacgggagcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagtgctgacactgggggcggtttaaacagatacaagcatttaagccagtccggagcggtgactcattccccccaccccccacccccccgcgagagacgcggcgcggccattggtgagcatcacgccccgcccctcgcccagcctagctcccgcctgccccgcccctttccactcccggctcccccgcgttgtcggatcagcagaccgattct

>mouse3 U52821 Baybutt
aatgtcgaaaatcttcgttaccaatttcttgttaccaaagttcaacgatggcttcctcgctccgttaggtaacctttcattttctcaactacccattatgtaacgggagcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagtgctgacactgggggcggtttaaacagatacaagcatttaagccagtccggagcggtgactcatccccccccacccccacccccccgcgagagacgcggcgcggccattggtgagcatcacgccccgcccctcgccccgcctagctcccgcctgccccgcccctttccactcccggctcccccgcgttgtcggatcagcagaccgattct

Motif Region through to exon 1

>ra tcttcct-ctttaccaatttcttgttaccaaagttccacga-tggcctttttctttccgttaggtaacctttcattttctc
>mo tcttc--g-tt-accaatttcttgttaccaaagttcaacga--tggcttcctcgctccgttaggtaacctttcattttctc
>ha tctccctgctttac-aatttcttgctcctagagtttca-gcaattgctttctcgctccattaggcaacctttcattttctc
>hu tctcct--ctttagaaatttctggttgccaaagttcca-gaaattgcttcctcattcc-t--g--agcctttcattttctc
>sh tgtcctt--ttcagaaatttctggttaccagagttccc-gaaattgctttctcattccct-----aatctttcattttctc
>co tgtccct--tttagaaatttctggttaccaaagttcca-gaaattgctttctcattccct-----aatctttcattttctc
....o.o.o....oo.o..ooooooo.o:o.o:o.oooo:o:.o:..:..ooo..oo..ooo.o.....o:.ooooooooooooo consensus

>ra ctttcattttctcgacta-cccattatgtaacgg-gagcgctgggttctggatcagtcttccattaaagatgacttttatagtctgtgagcgtcgtcacagagt
>mo ctttcattttctcaacta-cccattatgtaacgg-gagcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagt
>ha ctttcattttctcaccttccccattatgtaacgg-gagcaatgggttctggaccagtcttccattaaagatgatttttatagtcggtgagcgccgtcagggagt
>hu ctttcattttctcgatttctccattatgtaacggggagctggagctttgggccgaatttccaattaaagatgatttttacagtcaatgagccacgtcagggagc
>sh ctttcattttct--------ccattacgtaacgagaagctggggcttt-ggccgattttccctctaaagatgatttttatcgtcaacaagcaatttcagggagt
>co ctttcattttct--------ccattacgtaacgagaagctggggcttt-ggccgattttccctttaaagatgatttttatcgtcaacaagcaatttcagggagt

>ra tcacagagtgctgacac-tggggtggggaggggagtacggggggagggggttaaacagataacaagcatttaagccagtacggagcggtgactca
>mo tcagggagtgctgacac-tgggggcggt-----------------------ttaaacagatacaagcatttaagccagtccggagcggtgactca
>hu tcagggagcgatggcacccgcaggcggt-------------atcaactgatgcaagtgttcaagcgaatctcaactcgttttttccggtgactca
>sh tcagggagtgatgagccagggaggcggt-------------gttagttgatgctagcgtttatgctagtctcaactcgtttttcccagggactta
>co tcagggagtgatgagccggggaggcggt-------------attagctgatgctagcgtttaagctagtctcaactcgtttttcccagggactta

>ra gtgactcat---cccacc------------gcgagaa----------gccattggtgagca----tcacgctccgcccctc--------gccccgcccagcccccgg-cctgtcgggtccctcaccacgcccc------------gctccccc_gcgttgtcagagcagcagacggagtctgag----------------cgtcgcgtcggtggcag
>mo gtgactcattcccccaccccccacccccccgcgagagacgcggcgcggccattggtgagca----tcacgccccgcccctc--------gcccagcctagct-cccg-cct------------gccccgcccctttccactcccggctccccc_gcgttgtcggatcagcagaccgattctgggcg-----------ctgcgtcgcatcggtggcag
>ha gtgactcat--------------------------------ggcgcggccattggtgagcacgacgcaagccccgccccacccagcccggccccgccctgctacccctcctgactca----ctgccccgcccg-------------ctccccc_gcggcgtccgagcagcagaccgag--aaggcacatcgagtccact-cgtcgcgtcggtggcag
>hu gtgactca--ttcccggccctgc--ttggc-agcgctgcaccctttaacttaaacctcggccggccgcccgccgggggcacagagtgtgcgccgggccgcgcggcaattggtccccgcgccgacctccgcccgcgagcg_ccgccgcttcccttccccgccccgcccgcgtccctccccctcggccccgcgcgtcgcctgtcctccga------_gccagtcgctgacagccgcggcgccgcgagcttcc
>sh gggacttagattcctgggtctgccggtaaaccccgggcgcccgcagcgggcgcgcctgagcgt-----------------------------------gcgcgcgccgt--------cgcc--tccccccccccgcagctcctcctctgcacggcgactcaccagc-----------------------------------cct------_agttgccagtcgctgacagccgcagagctgagagcgtct
>co gggacttagattcctgggtctgccagtaaaccccgggcgccggcagcgggtgcgcctgagcgt----------------------------------cgcgcgcgccgt--------cgcc--tccccgcccctgcccctcctcctccgcccggcgacttacccgccct-----------------------------------------_agttgccagtcgctgacagccgcagagctgagagcgtct

Astonishing CpG Island around human exon 1

According to the Sanger Center:

"The Cs of most CpG dinucleotides in the human genome are methylated. Methyl-C tends to mutate to T, and so CpG dinucleotides tend to decay to TpG / CpA. This is believed to account for the fact that in bulk human DNA CpG dinucleotides occur about five times less frequently than expected (Bird, 1980, Jones et al 1992).

CpG islands are unmethylated regions of the genome that are associated with the 5' ends of most house-keeping genes and many regulated genes (Bird, 1986, Larsen et al 1992). The absence of methylation slows CpG decay, and so CpG islands can be detected in DNA sequence as regions in which CpG pairs occur at close to the expected frequency. The fact that CpG islands can be detected in this way indicates that the corresponding germline DNA has been substantially hypomethylated for an extended period of time, and in fact about 80% of CpG islands are common to man and mouse ( Antequera and Bird 1993 ).

About 56% of human genes and 47% of mouse genes are associated with CpG islands ( Antequera and Bird, 1993 ) Often CpG islands overlap the promoter and extend about 1000 base pairs downstream into the transcription unit. Identification of potential CpG islands during sequence analysis helps to define the extreme 5' ends of genes. CpG islands are commonly defined as regions of DNA of at least 200 bp in length and that have a G+C content above 50% and a ratio of observed vs. expected CpGs close to or above 0.6. "

Bird ( 1980 ) NAR, 8, 1499 - 1504
Bird (1986) Nature, 321, 209 - 213
Jones et al, (1992) BioEssays, 14, 33-36
Larsen et al, (1992) Genomics 13, 1095-1107
Antequera and Bird (1993) PNAS 90, 11995-11999
Cross et al (1994) Nature Genetics 6, 236-244
Gardiner-Garden and Frommer (1987) J Mol Biol 196, 261-282
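The working definition quoted above (at least 200 bp, G+C above 50%, observed/expected CpG of about 0.6 or more) is easy to test on any stretch of sequence. The sketch below uses the usual expected-count formula, (number of C x number of G) / length; the function names and default thresholds are illustrative and simply restate the quoted criteria, not the Sanger Centre's own software.

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    return n, gc_fraction, ratio

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the three quoted criteria to a single window of sequence."""
    n, gc_fraction, ratio = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and ratio >= min_ratio

# Toy inputs only; real use would slide a window along the GenBank entry.
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```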

There are 3 human retrotransposons in the region (intron 1) between exon 1 and exon 2; rodents have 3 different ones; artiodactyls have none:

human:intron 1 = 2622 bp; only last 180 bp related to cow/sheep
     repeat_region   11478..11800 AluJo  = 323 bp
     exon 1          12634..12767        = 134 bp
     repeat_region   14413..14498 L1MC1  =  86 bp
     repeat_region   14583..14653 MIR    =  71 bp = 353 bp total
     repeat_region   14752..14947 L1ME3  = 196 bp
     exon 2          15390..15488        =  99 bp

mouse:intron 1 = 2190 bp; 83% identical to rat
     repeat_region     7980..8069  PB1D7 
     exon 1            8612..8658        =  47 bp
     repeat_region     9619..9844  B3    = 226 bp
     repeat_region    10044..10163 B1-F  = 120 bp = 539 bp total 
     repeat_region    10070..10163 PB1D7 = 193 bp
     exon 2           10849..10946       =  98 bp

sheep:intron 1 = 2421 bp; 91% identical to cow
     repeat_region    3756..4215 MLT1F   = 469 bp
     exon 1a          5666..5717         =  52 bp
     exon 2           8139..8236         =  98 bp
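The lengths in these listings follow from the GenBank-style coordinates, which are 1-based and inclusive at both ends. A minimal sketch of the arithmetic follows; the helper names and the `EXONS` dictionary are illustrative, with coordinates copied from the listings above.

```python
def span_length(start, end):
    """Length of a feature given inclusive 1-based coordinates."""
    return end - start + 1

def intron_length(upstream_exon_end, downstream_exon_start):
    """Number of bases strictly between two exons."""
    return downstream_exon_start - upstream_exon_end - 1

# (first exon, exon 2) coordinates as listed for each species.
EXONS = {
    "human": ((12634, 12767), (15390, 15488)),
    "mouse": ((8612, 8658), (10849, 10946)),
    "sheep": ((5666, 5717), (8139, 8236)),
}

for species, (exon1, exon2) in EXONS.items():
    print(species,
          "exon 1:", span_length(*exon1), "bp,",
          "intron 1:", intron_length(exon1[1], exon2[0]), "bp,",
          "exon 2:", span_length(*exon2), "bp")
```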
The CpG island in the sheep prion gene is similar: 89 occurrences of CpG in 1020 bp (8.7%), for a dinucleotide that is normally severely depleted. Recall that CpG sites are mutational hotspots when methylated to 5mC, but that poly ADP-ribosylation protects CpG in promoter regions of eukaryotic genes.

Note that the anomaly is strongly skewed against the promoter and exon 1 proper, that is, it falls mainly to the 3' UTR side in intron 1. (Researchers confirmed the significance of this region with nested deletions.) The known retrotransposons do not kick in until much later, eg 12806..13148 MLT1F, 13146..13321 MLT1F, 13227..13648 MER57_internal, 16433..16524 Bov-tA2, whereas exon 2 is already at 8139..8236.

CpG incidence is displayed in terms of intervening nucleotides below, relative to exon 1 (positions 5666 - 5717).

ctcaact
CGtttttcccagggacttagattcctgggtctgc
CGgtaaaccc
CGgg
CGcc
CGcag
CGgg
CG
CGcctgag
CGtg
CG
CG
CGc
CGt
CGcctccccccccc
CGcagctcctcctctgca
CGg
CGactcaccagccctagttgccagt
CGctgacagc
CGcagagctgagag
CGtcttctctcccagaggcaggtaaatagcca
CGtagtcctttaaacccccag
CGgaggc
CGcccc
CGgcttg
CGgc
CGaggccctagggcactcagc
CGgat
CGgactggctgggaggcagaccttgac
CGtgaggaggactgggggcttc
CGg
CGgg
CG
CGgggaa
CGt
CGggcctgtttag
CGtgct
CGttggtttttgccagccac
CGct
CGgttttgccctcctggttaggagagctccatttact
CGgaatgtggg
CGggggc
CG
CGgctggctggtccccctcctgaagtatgtgggtggtgtgtaggaatctagccccctccca
CGct
CGtccactg
CGggagtggcatggg
CGgat
CGcac
CGgtagaggggc
CGcagtc
CGaggaac
CGctggggagatcagaagaacaag
CGagaggccc
CGggctctgggccctcc
CGaagcccag
CGgaga
CG
CGgaattgggggtggggggtggggaagaag
CGgg
CGcccaa
CGgggccagacct
CGgc
CGtgaggagtgc
CGgag
CGac
CGtgggcccccagc
CGctgctgc
CGaactcctcc
CGagagg
CGgccctgcttgccatca
CG
CGgctgggaggtacctgggtagc
CGcag
CGggtgggtctctggcagccccctggggat
CGgct
CGgg
CGgg
CGtg
CGtggcctgggcttcagcct
CGg
CGaggggagtcatggg
CGacc
CGgccctctctccagagaaatccaggtac
CGggagcagtgtttcctgggagctctgatgtggt
CGacccaaaagcaaag
CGatatttt
CGctgtct
CGactgaaggagggaact
CGgcc
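A display like the one above can be regenerated from any sequence by breaking the line in front of every CpG and capitalising it. The sketch below is one way to do that; the function name is illustrative, and the test string is simply the first four lines of the listing joined back together.

```python
import re

def cpg_display(seq):
    """Split a sequence before each CpG; every fragment after the first
    starts with an uppercase CG followed by the intervening nucleotides."""
    seq = seq.lower()
    starts = [m.start() for m in re.finditer("cg", seq)]
    cuts = [0] + starts + [len(seq)]
    pieces = [seq[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]
    return ["CG" + p[2:] if p.startswith("cg") else p for p in pieces]

fragment = "ctcaactcgtttttcccagggacttagattcctgggtctgccggtaaaccccggg"
for line in cpg_display(fragment):
    print(line)
```

Short lines mark tightly clustered CpGs and long lines mark gaps, so the island is visible at a glance without computing any statistics.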

Spliced sequence probes

1 Sept 98 webmaster
>U29185-spliced human hypothetical 1-2-3-ORF fusion
ccgcccgcgagcgccgccgcttcccttccccgccccgcgtccctccccctcggccccgcgcgtcgcctgtcctccgagccagtcgctgacagccgcggcgccgcgagcttctcctctcctcacgaccgaggcaggactcctgaatatttttcaaaactgaacaatttcagccatgtctgagctttccgtcttcctggaggcacaaatctagtttagctgaaccacaacagattagcagtcattatg

>U29185-spliced human 1-3-ORF fusion
ccgcccgcgagcgccgccgcttcccttccccgccccgcgtccctccccctcggccccgcgcgtcgcctgtcctccgagccagtcgctgacagccgcggcgccgcgagcttctcctctcctcacgaccgaggcagagcagtcatt-atg
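The two probes above are plain concatenations of exon sequences. The sketch below rebuilds the 3' portion of each from pieces listed elsewhere on this page (the 3' end of human exon 1, human exon 2, and the 10 nt exon 3 UTR plus start codon); the `splice` helper and variable names are illustrative.

```python
def splice(*segments):
    """Join exon segments 5' to 3', ignoring case and embedded spaces."""
    return "".join(s.lower().replace(" ", "") for s in segments)

# Human U29185 pieces as listed in the exon 1, exon 2 and exon 3 tables.
exon1_3prime = "cgagcttctcctctcctcacgaccgaggcag"
exon2 = ("gactcctgaatatttttcaaaactgaacaatttcagccatgtctgagctttccgtcttcctgg"
         "aggcacaaatctagtttagctgaaccacaacagatt")
exon3_utr = "agcagtcatt"
start_codon = "atg"

probe_123 = splice(exon1_3prime, exon2, exon3_utr, start_codon)
probe_13 = splice(exon1_3prime, exon3_utr, start_codon)

print(len(probe_123), len(probe_13))
print(len(probe_123) - len(probe_13) == len(exon2))
```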

When were the Alu inserted in the prion gene?

8 Sept 98 webmaster opinion
A new tool allows researchers to Blast any query sequence against their own sequence database. This is a great help in cross-species comparisons of upstream and downstream prion sequences. Note too that conventional NCBI Blast does not choke (or even slow down) on huge queries, like the 35,000 bp of human prion.

It is possible to track the growth in size of the human prion gene over time, at least for insertion of the Alu elements, which have been dated with Fla-a at 68 mya, subfamilies Jo and Jb originating around 57 mya, subfamily Sx around 37 mya, and subfamily Y quite recent and possibly still retrotranspositionally competent. Mouse prion has three B1-F and six B1_MM which could be dated as well. A very recent paper suggests that the mouse Line-1 elements, the 541 bp L1MA4, the 169 bp L1_MM, and the 181 bp LINE2 -- which are possibly still active -- could also be dated:

Determining and dating recent rodent speciation events by using L1 (LINE-1) retrotransposons

PNAS 95 11284-11289, September 15, 1998
Olivier Verneau, François Catzeflis and Anthony V. Furano
"The repeated DNA subfamilies generated by the mammalian L1 (LINE-1) retrotransposon are apparently homoplasy-free phylogenetic characters. L1 retrotransposons are transmitted only by inheritance and rapidly generate novel variants that produce distinct subfamilies of mostly defective copies, which then "age" as they diverge. Here we show that the L1 character can both resolve and date recent speciation events within the large group of very closely related rats known as Rattus sensu stricto...

All mammalian L1 elements contain four regions: a 5' UTR involved in regulation; ORF I, which encodes an RNA-binding protein; ORF II, which encodes a reverse transcriptase; and the 3' UTR. The evolution of the 3' UTR appears to occur rapidly enough to make it a useful source of phylogenetic characters for analyzing recent or rapid speciations. R. rattus and R. norvegicus diverged 2 million years ago. The Mus/Rattus dichotomy from the fossil record dates is 12.2 Mya."
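Returning to the Alu ages quoted before the abstract, the growth of the human gene can be roughed out by binning Alu base pairs by subfamily age. This is a sketch under several assumptions: lengths are taken from the Alu table in this section after removing the duplicated entries, Fla-a is equated with the FLAM_A annotation, a subfamily's origin is only an upper bound on when a given copy inserted, and subfamilies with no quoted age (FLAM_C, Sq, Sg, Y) are left undated rather than guessed.

```python
# Approximate subfamily origins (million years ago) as quoted in the text.
SUBFAMILY_AGE_MYA = {"FLAM_A": 68, "AluJo": 57, "AluJb": 57, "AluSx": 37}

# (family, length in bp) for the 15 non-duplicate Alu units.
ALU_UNITS = [
    ("AluSx", 293), ("AluSx", 296), ("AluSx", 293), ("AluJb", 302),
    ("AluSx", 298), ("AluJo", 322), ("AluJb", 111), ("AluJo", 274),
    ("FLAM_C", 132), ("AluSx", 297), ("AluSq", 304), ("FLAM_A", 131),
    ("AluJo", 172), ("AluSg", 297), ("AluY", 287),
]

def bp_by_age(units, ages):
    """Sum Alu lengths per quoted subfamily age; return (dated, undated_bp)."""
    dated, undated = {}, 0
    for family, length in units:
        if family in ages:
            dated[ages[family]] = dated.get(ages[family], 0) + length
        else:
            undated += length
    return dated, undated

dated, undated = bp_by_age(ALU_UNITS, SUBFAMILY_AGE_MYA)
for age in sorted(dated, reverse=True):
    print(f"subfamilies arising ~{age} mya: {dated[age]} bp")
print(f"subfamilies with no quoted age: {undated} bp")
```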

The GenBank entry for human prion, U29185, is mis-annotated. The Alu retrotransposons have left and right homologous regions joined by an A-rich linker with a poly-A terminus, 300 bp or so all told. Alu units arose from 7SL RNA and are found in primates only, though distant similarities are found in rodent B1 repeats; the oldest offshoots, Fam, Fla, and Fra, are monomers, but only Fla-A and Fla-C are reported in the human prion gene (where they are called FLAM_A and FLAM_C). The GenBank entry double-counts 9 of the Alu units through some sort of weird offset error. There are only 15 Alu in actuality.

strand start end family length separation screw-up
+ 3,131 3,424 AluSx 293 2838 ok
+ 5,969 6,265 AluSx 296 1004 ok
- 6,973 7,266 AluSx 293 1978 ok
+ 8,951 9,253 AluJb 302 1032 ok
- 9,983 10,281 AluSx 298 1495 ok
- 11,478 11,800 AluJo 322 8119 ok
- 19,597 19,708 AluJb 111 1 duplicate
- 19,598 19,709 AluJb 111 477 ok
+ 20,075 20,349 AluJo 274 1 duplicate
+ 20,076 20,350 AluJo 274 396 ok
+ 20,472 20,604 FLAM_C 132 1 duplicate
+ 20,473 20,605 FLAM_C 132 7764 ok
- 28,237 28,534 AluSx 297 2 duplicate
- 28,239 28,536 AluSx 297 1504 ok
+ 29,743 30,047 AluSq 304 2 duplicate
+ 29,745 30,049 AluSq 304 2073 ok
- 31,818 31,949 FLAM_A 131 2 duplicate
- 31,820 31,951 FLAM_A 131 1323 ok
+ 33,143 33,315 AluJo 172 2 duplicate
+ 33,145 33,317 AluJo 172 177 ok
+ 33,322 33,619 AluSg 297 2 duplicate
+ 33,324 33,621 AluSg 297 523 ok
+ 33,847 34,134 AluY 287 2 duplicate
+ 33,849 34,136 AluY 287 - ok
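The double-counting can be detected mechanically: a duplicate is an entry with the same strand, family and length as its neighbour, shifted by only 1 or 2 bp. The sketch below applies that rule to the 24 rows of the table; the function name and the `max_offset` parameter are illustrative, and the later copy of each pair is kept because the table marks the earlier one as the duplicate.

```python
# (strand, start, end, family) for all 24 annotated rows in the table.
ROWS = [
    ("+", 3131, 3424, "AluSx"), ("+", 5969, 6265, "AluSx"),
    ("-", 6973, 7266, "AluSx"), ("+", 8951, 9253, "AluJb"),
    ("-", 9983, 10281, "AluSx"), ("-", 11478, 11800, "AluJo"),
    ("-", 19597, 19708, "AluJb"), ("-", 19598, 19709, "AluJb"),
    ("+", 20075, 20349, "AluJo"), ("+", 20076, 20350, "AluJo"),
    ("+", 20472, 20604, "FLAM_C"), ("+", 20473, 20605, "FLAM_C"),
    ("-", 28237, 28534, "AluSx"), ("-", 28239, 28536, "AluSx"),
    ("+", 29743, 30047, "AluSq"), ("+", 29745, 30049, "AluSq"),
    ("-", 31818, 31949, "FLAM_A"), ("-", 31820, 31951, "FLAM_A"),
    ("+", 33143, 33315, "AluJo"), ("+", 33145, 33317, "AluJo"),
    ("+", 33322, 33619, "AluSg"), ("+", 33324, 33621, "AluSg"),
    ("+", 33847, 34134, "AluY"), ("+", 33849, 34136, "AluY"),
]

def collapse_offset_duplicates(rows, max_offset=2):
    """Merge consecutive rows that differ only by a small coordinate shift."""
    kept = []
    for row in sorted(rows, key=lambda r: r[1]):
        strand, start, end, family = row
        if kept:
            p_strand, p_start, p_end, p_family = kept[-1]
            same_unit = (strand, family, end - start) == (p_strand, p_family, p_end - p_start)
            if same_unit and 0 < start - p_start <= max_offset:
                kept[-1] = row  # keep the later copy, marked "ok" in the table
                continue
        kept.append(row)
    return kept

unique = collapse_offset_duplicates(ROWS)
print(len(ROWS), "annotated rows ->", len(unique), "distinct Alu units")
```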

mRNA: 12634-12767 exon 1, 15390-15488 exon 2, 25464-27817 exon 3

A non-redundant set of Alu sequences from the human prion is shown below. It is not easy to confirm their assignments using the Alu-only Blast server at NCBI because percent identity is not as important for classification as critical changes that affect secondary structure etc. For example, the reported AluY has a much better hit with consensus Alu-Sb than with consensus AluY, though one of the latter provides the best non-consensus match.

strand	start	end	family	length	separation	screw-up	
+	3,131	3,424	AluSx	293	2838	ok	
ggccgggtgt ggtggctcac acctgtaatc ccaacacttt gggaggctga
     3181 ggcgggcaga tcacctgagg tcaggagttt gaaaccagcc tggccaacat ggcaaaaaca
     3241 ctgtctctac taaaaaatac aaaaattagc cgggtgtgtt ggcacgtgcc tataatccca
     3301 gctacttggg gggctgaggc aggagagtca cttgaacccg ggaggcagag gttgcagtga
     3361 gacaagatca tgccactgca ctccagcctg gggaacagag cgaaactccg tctaaaaaaa
     3421 aaaa

+	5,969	6,265	AluSx	296	1004	ok	
gg ctgggcacag tggctcatgc ctgtaatccc
     6001 agcactttgg gaggctgagg cgggcagatc acttgaggtc aggagttcga gaccagcctg
     6061 ggcaatacgg tgaaacccag tctctactaa aaacacaaaa attagctagg catagtggtg
     6121 catgactgta atcccagcta gttaggaggc tgaggcagga gaatcgcttg aacccaggag
     6181 gtggaggttg cagtgagatt gtgccactgc actccagcct gggtgacaga gcgagactcc
     6241 atctaagaaa aaaaaaatca gaaaa

-	6,973	7,266	AluSx	293	1978	ok	
tttttttt ttttgagatg gagtcttgct ctgtcaccca ggctggagtg
     7021 cagtggcaca atcttggctc actgcaacct ccgcctcctg ggttcaagcg attctcctgc
     7081 ctcagcctcc ttagtagctg ggactacagg catgcaccac cacgcccagc taatttttgt
     7141 atttttagta gagacagggt ttcaccatgt tggccaggct gttcttgaac tcttgacctt
     7201 aggtgatcca cccgcctcgg cctcccaaag tgctgggact ataggcgtga gccacttcac
     7261 caggcc

+	8,951	9,253	AluJb	302	1032	ok	
ggctaagcat ggtggctcac acctgtgatc ccagcaattt gagaggccga
     9001 gataagcaga ttgcttgagc tcaagagttc aaaaccaacc tggacaacat agtgagaccc
     9061 ccgtctctaa aacaaataca aaaatcagcc aggcatggtg gctcacgcct gtggtcccag
     9121 ctactcagaa gactgaggta agaggatagc ttaagcccag gaggcagagg ttgtagtgag
     9181 ccgagatcac cccacggcac tccagcctgg gcaacagagt gagaccctat ctaaaaaaaa
     9241 agaagaaggt aaa

-	9,983	10,281	AluSx	298	1495	ok	
tttttttt tttcttttct gagacagagt ctcgctctgt
    10021 cgcccaggct gaagtgcaat ggtgccatct tggctcactg caacctctgc ctcccgggtt
    10081 caagcgattc tcctgcctca ccctcctgag tagctgagat tacaggtgcg catcaccatg
    10141 cctggctaat ttttgtattt ttaatggaga cggggtttca ccatgttgcc caggctggtc
    10201 ttgaactcct gacctcaagt aatccatggg cctcggcctc ccaaagtgct gggattacag
    10261 gcatgagcca ccgcgcccgg c

-	11,478	11,800	AluJo	322	8119	ok	
ttt ttttgtaaag agacaaagag tgagacaggg
    11521 tcttggccta tagccctgtg gcccaggctg gagtgcagtg gcacaattaa agctcactgc
    11581 agcctctacc tcctgggctc aagcaatcct cccatctcag cctccccagt agctgggact
    11641 acaggactgt gccaccgttc ccagcaattt tttttatttt ttgtagaaat ggggtctcac
    11701 tatgttgctc actctggtct caaactcctg agctcaagca atcttcctgc cttggcctct
    11761 gaaagtgctg ggattacagg cctgagccac tgcacctggc

-	19,597	19,708	AluJb	111	1	duplicate	477
atgt tttgtagaca tggggtttca
    19621 ccatgttgcc caggctggtc tcaaactcct gggctcaagc aatcctcctg cctgagcttc
    19681 ccaaagtgct gggattatag gcttgagc

+	20,075	20,349	AluJo	274	1	duplicate	396
gctgtg gctcacacct ataatcccag
    20101 cactttggga ggctgaggcc tgcagattgg ttgagcccag gaatttgaga ccagcctggg
    20161 caacatggca agaccccatc actattaaaa atacaaaaaa aaaaaaaaac caggcgtggt
    20221 gttgcatacc tgtggtccca gctactcagg aggatcactt gagcccactg ggtggaggtt
    20281 gcagagagct gagatcatgc cactacactt tggcctgggt aagaccctgt ctcaagaaaa
    20341 aaaaaaaaa

+	20,472	20,604	FLAM_C	132	1	duplicate	7764
ggctgggca tggtggctca cacctgtaat ctcagcactt tggggggcca
    20521 agacaggaag atctcttgag cccagaagtt tgaggcgagc ctgggcaaca tagcaagacc
    20581 ccatctctac cagaaaagaa acaa

-	28,237	28,534	AluSx	297	2	duplicate	1504
tttt gttttgtttt ttttgagacg
    28261 gagtctcgct ctgtcgccca ggctggagtg cagtggcggg atctcggctc actgcaacct
    28321 ccgcctcccg ggttcaggcg attctcctgc ctcagcctcc tgagtagctg ggactacagg
    28381 catatgccac catgcccggc taatttttgt atttttagta gagatggagt ttcaccatat
    28441 tggccaggct gttctcaaac tcggcctcaa gtgatctgct cgcctcagcc acccaaagtg
    28501 ctaggattac aagcatgagc caccgcgccc ggcc

+	29,743	30,047	AluSq	304	2	duplicate	2073
ggccaggc acagttgctc
    29761 acgcctgtaa tccctgcact ttgggaggcc gaggcaggca gatcacttga ggtcaggagt
    29821 tcaagaccag cctggccaac atggtgaaac ccccattccc tactaaaatt acaaaaaatt
    29881 agctgggtgt ggtcgcacgt gcctgtaatc ccagctactc aggaggctga ggcaggagaa
    29941 tcgcttataa ccgggaggca gaggctgcag tgagccgaga tcccgccatt gcactccagc
    30001 ctggcggtaa gagcgaaact ccgtctcaaa aaaataaaat aaaataa

-	31,818	31,949	FLAM_A	131	2	duplicate	1323
tt tttttttttc ttttgagata ggatcttgct atattgccca
    31861 ggctggtctc aaattcctgg gctcaagcaa tcctcccatc tcagcctccc aagtggctga
    31921 gatgacaggc acgtgccact atgtctggc

+	33,143	33,315	AluJo	172	2	duplicate	177
caatcaat tcaaaaatta tagtggcctg gggttgtgtg
    33181 ggtgaggtgg ctgcttggga ggctgaggtg ggaggatggc ttaagcccag tagtgcaagg
    33241 ctgttgcaag ctgtgaccgc accactgcac tcctgcctgt gcaacagagc aagacaccat
    33301 ctttataaaa acaaa

+	33,322	33,619	AluSg	297	2	duplicate	523
gccagacgc ggtggctgac acctgtaatc ccaacacttt
    33361 gggaggccga ggtgggcgga tcacgaggtg aggagtttga gaccagcctg gccaatatgg
    33421 tgaaaccctg tctctcccaa aaacacaaaa actatctggg tgtggtggtg tgtgcctgta
    33481 gtcccagcca ctcgggaggc tgaggcagaa gaatcgcttg aacccaggag gcggaggctg
    33541 cagtgagccg agatcgcgcc actgcactcc agcctgggtg acagagtgag actctgtctc
    33601 aaaacaaaac aaaacaaaa

+	33,847	34,134	AluY	287	2	duplicate	0
ccgg gcgcggtggc tcacacctgt aatcccagca ctttgggagg ccgaggcggg
    33901 cagatcagga ggtcaggagc tcgagaccat cctggctaac acggtgaaac cccgtctcta
    33961 ctaaaaatac aaaaaattag ccgggcgtgg tggcgggcgc ctgtagtccc agctactcgg
    34021 gaggctgagg caggagaatg gcgtgaaccc gggaggcgga gcttgcagtg agccgagatt
    34081 gtgcctctgc actccaacct gggtgacaga gcgagactcc gtctcaaaaa taaa

Motif Mystery

14 Sept 98 webmaster
In PNAS 1994 Jul 5;91(14):6418-6422, Westaway et al. put forward the unsatisfactory idea of naming discrete motifs 1 through 4 upstream of exon 1, some of which feature insignificant palindromy. Motif 1 has held up well as the number of species sequenced rose to 6, but the others have evaporated (or fused or softened) as even better ones emerged (well, they were there all along) upstream of motif 1.

It is more accurate to say that there is a non-discrete 'motif' region of no known function [starting at about -350 relative to exon 1] containing surprisingly (and variously) conserved stretches that can be used to anchor alignments in the face of many pesky deletions in the promoter itself and of incongruities with exon 1. A number of nucleotides are absolute invariants in the three lineages; for comparison, over half the nucleotides of exon 2 are invariant, and exon 2 additionally lacks the deletions.

A polymorphism in the conserved motif region is more suggestive than one in a chaotic region or the middle of a large intron. If associated with disease, it would move things into interesting and unknown biological territory. It is safe to say that the motif region has not been stable for 100 million years without strong selective pressure on a function. That same point can be made for cryptic human exon 2.

It is very odd to see the promoter region (in rodents, includes 3x Sp-1, Ap-1, CCAAT) experiencing greater variability than the motif region. The motif region has been experimentally deleted in two species without consequences for transcription, though not all facets -- such as tissue-specific expression -- can be tested. [H. Baybutt even reported that the much more orderly exon 2 region can serve as a fairly good self-contained promoter.] Were the motif region a standard regulatory element of eukaryotic gene expression or a universal structural element, it would call up something on Blastn, but it does not, though perhaps gap parameters cannot be sufficiently relaxed or a more special-purpose database is needed.

Though the prion gene is chock-full of retrotransposons, there are no recognizable ones here except for 3 each in human and mouse intron 1. A portion of the motif region looks like this (with 'motif 1' shown in blue):

>ra tcttcct-ctttaccaatttcttgttaccaaagttccacga-tggcctttttctttccgttaggtaacctttcattttctc 
>mo tcttc--g-tt-accaatttcttgttaccaaagttcaacga--tggcttcctcgctccgttaggtaacctttcattttctc 
>ha tctccctgctttac-aatttcttgctcctagagtttca-gcaattgctttctcgctccattaggcaacctttcattttctc
>hu tctcct--ctttagaaatttctggttgccaaagttcca-gaaattgcttcctcattcc-t--g--agcctttcattttctc 
>sh tgtcctt--ttcagaaatttctggttaccagagttccc-gaaattgctttctcattccct-----aatctttcattttctc
>co tgtccct--tttagaaatttctggttaccaaagttcca-gaaattgctttctcattccct-----aatctttcattttctc
....o.o.o....oo.o..ooooooo.o:o.o:o.oooo:o:.o:..:..ooo..oo..ooo.o.....o:.ooooooooooooo  identity


>ra gacta-cccattatgtaacggg-agcgctgggttctggatcagtcttccattaaagatgacttttatagtctgtgagcgtcgtcacagagt
>mo aacta-cccattatgtaacggg-agcattgggtactggatcagtcttccattaaagatgatttttatagttgctgagcgtcgtcagggagt
>ha accttccccattatgtaacggg-agcaatgggttctggaccagtcttccattaaagatgatttttatagtcggtgagcgccgtcagggagt
>hu gatttctccattatgtaacggggagctggagctttgggccgaatttccaattaaagatgatttttacagtcaatgagccacgtcagggagc
>sh -------ccattacgtaacgagaagctggggcttt-ggccgattttccctctaaagatgatttttatcgtcaacaagcaatttcagggagt
>co -------ccattacgtaacgagaagctggggcttt-ggccgattttccctttaaagatgatttttatcgtcaacaagcaatttcagggagt
...........oooooo.oooooo.o.ooo....o.o...oo.....o.o.o...ooooooooo.ooooo..oo......oo....ooo...oo.  identity
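
For anyone wanting to recompute an identity line, here is a minimal Python sketch. It assumes the rows have been stripped of their '>ra ' style labels and padded to equal length. The function name invariant_columns is hypothetical, and only 'o' (the same base in every species) and '.' are produced, since the meaning of ':' is not defined on this page.

```python
def invariant_columns(rows, gap="-"):
    """Mark each alignment column 'o' if every row carries the same base, else '.'.

    Assumption: rows are label-free and of equal length; a column that
    contains a gap in any row is never counted as invariant.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*rows):
        first = column[0]
        same = first != gap and all(base == first for base in column)
        marks.append("o" if same else ".")
    return "".join(marks)


# First eight columns of the six rows of the motif alignment above
# (ra, mo, ha, hu, sh, co), labels removed.
first_columns = [
    "tcttcct-",
    "tcttc--g",
    "tctccctg",
    "tctcct--",
    "tgtcctt-",
    "tgtccct-",
]
print(invariant_columns(first_columns))  # o.o.o...
```

These eight columns give o.o.o..., which agrees with the start of the identity line above once its four-character label offset is taken into account.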

An alignable region within intron 1

15 Sept 98 webmaster
Aligning intron 1 across all species calls for a different strategy than aligning proteins because gaps are common (coding phase is not an issue). What works best is to align phylogenetically close species first, i.e., cow-sheep and rat-mouse-hamster (after noting/removing any recent retrotransposons). These sequences provide enough long anchor segments that the alignment is reliable; three or more close species allow a consensus sequence that avoids singlet noise. Next, these sequence sets are aligned to each other, holding gaps and alignment from the first round fixed. This reduced ambiguity helps greatly in a distant alignment; again, strong anchor sequences are crucial. Finally, human can be aligned, bridging to the sequence set that fits best on a particular stretch. Using this method -- which greatly extends what is possible in software -- the last 700 bp of intron 1 are alignable with some confidence:
>She  gaaatttgtgaaaaa---------------tggatcctttaagccatgaccctgaaaccccactcctgggaacttacctg-caat--ggaagaaattcggaaagaagaa--
>Bov  gaaatttgtgaaaaa--------cagtcaggtgatcctttaagccatgaccctaaaaccc-actcctgggaacttacctg-taat--ggaggaaaccaggaaagaagaaga
>Hom  acattttgttaagcaatctggtgatgcattaagaagctggaagctgtgacccagaaaccccactcctgagaacttacctg-caat--ggaagaaacaaacaaacaaaaac-
>Mus  cccagcagtaaaacaatctggtgaggtatta-ttagtcgtgtgctgtgacccagaaaccccactcctggcaatttac-tgggaa---ggaacaaacaaagggctagggg--
>Rat  cccagccgtaaaacaatctggtgaggtatta-ttagttgcatgctgtgacccagaaaccccacttctggcaattcacctgccgtggtggaaccaacaaagggctagggg--

>She  -aagctgcattcacccacagggctcagaatgatctaaaattagatccagt-ccagagacaacctaaaggtattaagaaaatagcagggcagcagctaagaaaatcatagcactttaa
>Bov  aaagctgcattcacccacagaactcagaatgatctaaaattagatccagt-ccggagtcaacctaaatgtattaataaaatagcagggcagcagctaagaaaatcatagcactttaa
>Hom  aggcatgtattcc----tag----cagaatgatctaaaattagaacacctggaaaag--agcctaaatgtat------aacaccagggcagtagctaagaaaattatgacacattaa
>Mus  agccatatggcctg----------cagttagag--aaaattagatccaactgaaaaatcaacctaaaggtgt------aaaagccaagcagt---taagaaact---gaca--ggct
>Rat  agccatatggccaa----------cagttacag--aaaattagatccaagggaaaag-caacctaaatgttt------aacaggcgagcagc---taagaaact---gaca--ggct
...
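
Two bookkeeping steps of the strategy described above can be sketched in Python: taking a majority consensus from a set of already-aligned close species, and carrying gaps from a later round back into every member of that set so the first-round alignment stays fixed. This is only an illustration under stated assumptions, not the procedure actually used for the alignment shown. The function names consensus and propagate_gaps and the toy sequences are hypothetical, and the pairwise alignment step that would produce the gapped consensus is not shown.

```python
from collections import Counter

GAP = "-"


def consensus(rows):
    """Majority base per column of an existing alignment, ignoring gaps.

    Columns with a tie for first place, or with gaps only, become 'n'.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    out = []
    for column in zip(*rows):
        counts = Counter(base for base in column if base != GAP)
        ranked = counts.most_common(2)
        if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
            out.append("n")
        else:
            out.append(ranked[0][0])
    return "".join(out)


def propagate_gaps(rows, gapped_consensus):
    """Insert the gap columns added to the consensus into every row.

    Assumption: gapped_consensus is consensus(rows) with extra '-' columns
    inserted by a later alignment round; first-round columns are unchanged.
    """
    out = [[] for _ in rows]
    col = 0
    for symbol in gapped_consensus:
        if symbol == GAP:
            for new_row in out:
                new_row.append(GAP)  # new gap column shared by the whole set
        else:
            for new_row, row in zip(out, rows):
                new_row.append(row[col])  # copy the first-round column as is
            col += 1
    if col != len(rows[0]):
        raise ValueError("gapped consensus does not match the row length")
    return ["".join(new_row) for new_row in out]


# Toy data (hypothetical): three close species aligned in round one.
close_set = ["acgt-a", "acgtta", "aagtta"]
cons = consensus(close_set)
# Suppose a later round against a distant set opened two gaps after 'ac'.
widened = propagate_gaps(close_set, cons[:2] + "--" + cons[2:])
```

With three rows the single a in column two of the third row is outvoted, which is the sense in which a consensus avoids singlet noise. With only two rows a disagreement becomes n.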
