Prion Gene Pseudogenes

Analysis of the prion gene: introduction
Feature tables of mammalian prion genes
Rat prion cytochrome c pseudogene
Outgroup arbitration applied to prion promoter
Alignment of cyt c pseudogene delimited pre-exon 1
Ancestral size of intron 1
Full length alignment and analysis of cow and sheep



Supplemental resources: on-site, off-page
...Masked human prion: 13,885 bp removed, 21,637 bp left
...Masked mouse prion: 11,692 bp removed, 26,726 bp left
...Masked rat prion: 917 bp removed, 7577 bp left
...Masked sheep prion
...Masked cattle prion

Supplemental resources: off-site
...Censor Server at GIRI
...Blast services: 2 sequences, personal database, or by taxon

Analysis of the human prion gene: introduction

25 May 99 webmaster
IY Lee et al. published a major article on the human, sheep, and mouse prion gene sequences [Genome Res. 1998 Oct;8(10):1022-37]. Recall that the coding part of the mammalian gene comprises some 774 nucleotides, whereas these researchers sequenced 35,000 bp envelopes of these three species. Many improvements and new observations were made relative to the version that appeared in the 1996 Erice symposium, previously summarized and expanded upon at this site.

The article is well worth reading in detail. A cytochrome c processed pseudogene was observed in rat prion delimiting the promoter region; a laminin receptor pseudogene in mouse far upstream (a curious coincidence given its purported binding to prion protein); a 4.5 S rRNA pseudogene, and various SINE and LINE retrotransposons that were classified into ancestral mammalian and lineage-specific. They also reconstructed the history of intron 2, which is 9,970, 18,012, and 14,031 bp in human, mouse, and sheep, respectively. The mammalian prion gene grew longer over time on average through retrotranspositional insertional events in excess of deletional loss.

Intron 2 was 9,100 bp at the time of mammalian divergence. Since the average is 14,004 bp today, an extra 4,904 bp (54%) has crept in over the eons, about 50 bp per million years. This process is no doubt continuing today (viz., the IAP defective virus in some mouse strains). A more recent retrotransposon can split a previous one upon insertion. This gives a strip-and-join method of finding unrecognizable older insertions and a way of ordering events that supplements point-mutational rate dating techniques (as used with the Alu primate family).
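
The arithmetic behind these figures can be checked with a short sketch. The intron lengths and the 9,100 bp ancestral size are the values quoted above; the 100-million-year span is an assumption used here only to reproduce the quoted rate, and the variable names are illustrative.

```python
# Intron 2 lengths in bp, as quoted above from Lee et al.
intron2_bp = {"human": 9970, "mouse": 18012, "sheep": 14031}
ancestral_bp = 9100   # reconstructed size at mammalian divergence (from the text)
elapsed_my = 100      # ASSUMPTION: roughly 100 million years since divergence

mean_bp = sum(intron2_bp.values()) / len(intron2_bp)
gain_bp = mean_bp - ancestral_bp
gain_pct = 100 * gain_bp / ancestral_bp
rate_bp_per_my = gain_bp / elapsed_my

print(f"mean length today: {mean_bp:.0f} bp")
print(f"net gain: {gain_bp:.0f} bp ({gain_pct:.0f}%)")
print(f"average rate: {rate_bp_per_my:.0f} bp per million years")
```

The rate scales inversely with the assumed divergence time, so a different date for the mammalian radiation changes the last figure proportionally.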

SINEs and LINEs are a specialized world. The literature is immense, the sequences an acquired taste, the subtypes unending. Because retrotransposon analysis is constantly improving, the paper by Lee is fully consistent neither with their GenBank entries nor with a 15 Mar 99 analysis by retrotransposon expert Jerzy Jurka, whose Censor Server analyzes and annotates genes in real time. Indeed, many GenBank entries are under-annotated or mis-annotated with respect to retrotransposons.

Below, various topics are explored concerning the structure of the prion gene in more depth than was possible in the IY Lee paper, exploiting the more contemporary feature analysis by Jurka and by considering additional sequenced species. By stripping off insertional elements and pseudogenes found in particular lineages, better alignments and Blast queries are possible.
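
As a minimal sketch of the stripping step, assuming features are given as 1-based inclusive (start, stop) coordinates like those in the feature tables on this page: the function names and the toy sequence are hypothetical, not taken from Censor or from the Lee paper.

```python
def strip_features(seq, features):
    """Return seq with the 1-based inclusive (start, stop) intervals removed."""
    kept = []
    pos = 0  # 0-based index of the next base not yet copied or skipped
    for start, stop in sorted(features):
        if start - 1 > pos:
            kept.append(seq[pos:start - 1])
        pos = max(pos, stop)  # max() tolerates overlapping intervals
    kept.append(seq[pos:])
    return "".join(kept)


def mask_features(seq, features, mask_char="N"):
    """Return seq with the same intervals overwritten by mask_char."""
    out = list(seq)
    for start, stop in features:
        for i in range(start - 1, min(stop, len(seq))):
            out[i] = mask_char
    return "".join(out)


toy = "AAAACCCCGGGGTTTT"            # invented sequence
toy_features = [(5, 8), (13, 14)]   # invented insertion coordinates
stripped = strip_features(toy, toy_features)
masked = mask_features(toy, toy_features)
print(stripped)
print(masked)
```

Stripping shortens the sequence and so changes downstream coordinates; masking preserves coordinates, which is usually more convenient when results must be mapped back to a GenBank entry.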

The rat cytochrome c pseudogene conveniently limits the extent of the prion promoter to 454 bp. A primitive rodent promoter can be reconstructed using repeat masking and a tree-topology driven consensus technique called outgroup arbitration, which allows resolution of most indels into either deletions or insertions and determines as well which lineage had the point mutation where sequences are different.

It turns out that in non-coding DNA, whether promoter or not, small deletions or insertions occur at roughly the same rate as point mutations. (Note these changes are 'accepted' rates, not incidence rates; only changes fixed by genetic drift and selection across the species are under consideration.) The size distribution of indels shows a rapid fall-off with increasing size; many occur in a repeat context, suggesting replication slippage.

The whole region delimited 5' by the rat pseudogene and 3' by the start of exon 2 is aligned here for all species available. Erratic changes can be understood by first aligning close pairs such as sheep and cow that present no difficulties. The human sequence can often cast a deciding vote when sheep and cow differ. Anchor regions (long conserved stretches) allow rodents to be added and localize residual uncertainty.

This sets the stage for asking why exon 1 aligns so poorly between rodents and other mammals; indeed, upstream presumptive transcription start and control signals are far better preserved. The ancestral size and history of intron 1 can be deduced using rodent, primate, and artiodactyl sequences. Here, after discounting retrotransposons, non-rodents average 2,386 bp while rodents average 1,871 yet no conspicuous large deletion can be identified that accounts for the 515 bp difference.

Features of mammalian prion genes

15 May 99 GenBank U29185 human gene feature analysis by Lee et al, Censor Server, webmaster
The table below shows that 57 retrotransposons make up 39.1% of the 35,522 bp human prion gene region sequenced. Exons 1, 2, 3 total about 991 bp or only 2.8%. That leaves the origin and function of the remaining 58% of the region unexplained.

None of the insertion events below are recent. The Alu mainly date to a period 35 million years ago when a master element was active in primates. The LINE elements were in some cases inserted prior to the mammalian radiation. Thus, these elements are not plausible human population polymorphisms. Simple sequence repeats (SSRs) are more useful markers in this regard: see Figure 1 of the Lee paper.

Features of the human prion gene
dir   start    stop     bp    gap  feature
+         4     206    203     81  MER74
+       288     402    115   1223  MER3
+      1626    1685     60    240  L1P_MA2
-      1926    2000     75    105  L1P_MA2
+      2106    2526    421    287  MER65A
-      2814    2926    113     35  MLT1A
-      2962    3118    157     12  MLT1A
+      3131    3422    292      2  Alu-Spqxz
-      3425    3469     45   1940  MLT1A
-      5410    5640    231    198  LTR16C
+      5839    5965    127      3  L1ME_ORF2
+      5969    6253    285    723  Alu-Sz
-      6977    7266    290    254  Alu-Sx
-      7521    7558     38    401  HERVFH21
+      7960    8260    301      1  LINE2
+      8262    8440    179     40  MER5A
+      8481    8951    471      9  LINE2
+      8961    9241    281      4  Alu-Jb
+      9246    9349    104    565  LINE2
+      9915    9971     57     23  MLT1G1
-      9995   10281    287      2  Alu-Sz
+     10284   10416    133      0  MLT1G1
+     10417   10757    341    753  MLT1G
-     11511   11800    290   2612  Alu-Jo
+     14413   14514    102     68  L1MD2
+     14583   14657     75     94  MIR
+     14752   14952    201   1314  L1ME3A
+     16267   16334     68    610  MER5A
+     16945   17564    620      1  L1P_MA2
-     17566   18021    456      7  L1P_MA2
-     18029   18302    274     22  L1ME_ORF2
-     18325   18550    226    417  L1ME_ORF2
-     18968   19075    108     78  L1P_MA2
+     19154   19584    431     12  LINE2
-     19597   19708    112      4  Alu-Jb
+     19713   20018    306     56  LINE2
+     20075   20341    267    130  Alu-Jo
+     20472   20607    136     10  Alu-Jo
+     20618   20663     46    480  LINE2
-     21144   21395    252      4  L1ME_ORF2
-     21400   21640    241      2  L1PA11
-     21643   21927    285      0  L1PA15
-     21928   22898    971    608  L1ME_ORF2
-     23507   24188    682    486  LINE2
+     24675   24821    147   3338  L1MB7
-     28160   28217     58     29  MER28
-     28247   28534    288      0  Alu-Sz
-     28535   28907    373    277  MER28
-     29185   29635    451    107  MER88
+     29743   30034    292    781  Alu-Sq
-     30816   30998    183    332  MER5A
-     31331   31431    101    386  MER5A
-     31818   31950    133   1244  Alu-Jo
-     33195   33300    106     42  SVA
-     33343   33592    250    254  SVA
+     33847   34133    287    154  Alu-Y
+     34288   34748    461    773  MLT2B2
29+  Totals   35522  13885  21637  57
                     39.1%  60.9%

Human features sorted by name, with lineage
feature      copies  lineage
Alu-Jb            2  primate
Alu-Jo            4  primate
Alu-Spqxz         1  primate
Alu-Sq            1  primate
Alu-Sx            1  primate
Alu-Sz            3  primate
Alu-Y             1  primate
HERVFH21          1  unknown
L1MB7             1  mammalian
L1MD2             1  mammalian
L1ME3A            1  mammalian
L1ME_ORF2         5  mammalian
L1PA11            1  mammalian
L1PA15            1  mammalian
L1P_MA2           5  mammalian
LINE2             7  mammalian
LTR16C            1  unknown
MER28             2  mammalian
MER3              1  mammalian
MER5A             4  mammalian
MER65A            1  mammalian
MER74             1  mammalian
MER88             1  mammalian
MIR               1  mammalian
MLT1A             3  mammalian
MLT1G             1  mammalian
MLT1G1            2  mammalian
MLT2B2            1  mammalian
SVA               2  unknown
Total            57  13 Alu
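
The bp and gap columns of the human table appear to follow from the start and stop coordinates: bp = stop - start + 1, and gap = next start - stop - 1. A minimal sketch checking this on the first four rows and on the column totals (function names are illustrative):

```python
# (start, stop) for the first four human features: MER74, MER3, L1P_MA2, L1P_MA2
rows = [(4, 206), (288, 402), (1626, 1685), (1926, 2000)]


def feature_lengths(rows):
    """Length of each feature, with inclusive 1-based coordinates."""
    return [stop - start + 1 for start, stop in rows]


def gaps_between(rows):
    """Unannotated bases between each feature and the next one."""
    return [nxt_start - stop - 1
            for (_, stop), (nxt_start, _) in zip(rows, rows[1:])]


region_bp, feature_bp, gap_bp = 35522, 13885, 21637  # totals from the table
print(feature_lengths(rows), gaps_between(rows))
print(f"{100 * feature_bp / region_bp:.1f}% in features, "
      f"{100 * gap_bp / region_bp:.1f}% in gaps")
```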

Features of the rat prion gene
dir   start    stop     bp    gap  feature
-       445     477     33    265  RSINE2
+       742     881    140    255  RSINE2
-      1136    1268    133      7  B1
+      1275    1342     68    832  RMER17A
-      2174    2266     93   1598  RDRE1_RN
+      3864    4069    206    233  B3
+      4302    4428    127      -  B1
+      2943    3059    117      -  B3

Rat features sorted by name, with lineage
feature      copies  lineage
B1                2  rodent
B3                2  rodent
RDRE1_RN          1  rodent
RMER17A           1  rodent
RSINE2            2  rodent

Features of the mouse prion gene
dir   start    stop     bp    gap  feature
-      1400    1478     79    747  BGLII
+      2225    2283     59     31  ORR1B
+      2314    2608    295   1087  ORR1B
+      3695    4116    422     73  MER67C
-      4189    4342    154    301  RSINE2
-      4643    4738     96     58  ORR1B
-      4796    4929    134    367  ORR1B
+      5296    5521    226   1793  B3
-      7314    7348     35    301  RSINE2
+      7649    7758    110    222  RSINE2A
-      7980    8095    116     15  B1
+      8110    8169     60   1450  RMER17A
+      9619    9844    226    200  B3
+     10044   10163    120   1070  B1
+     11233   11296     64    902  L1MD2
+     12198   12243     46    432  RSINE2
+     12675   12747     73    261  MUSID4
+     13008   13214    207    393  B2
-     13607   13739    133     16  B1
+     13755   13793     39    214  RMER1B
+     14007   14061     55    974  MUSID5
+     15035   15534    500    685  L1MA2
-     16219   16589    371     98  ORR1D
-     16687   17068    382    372  RLTR1IAP_MM
+     17440   17992    553    230  IAPEYI
+     18222   18772    551     34  IAPEYI
+     18806   20986   2181    155  IAPEYI
+     21141   22619   1479     24  IAPEYI
+     22643   22679     37      1  IAPA_MM
+     22680   22717     38      3  IAPEYI
+     22720   22811     92      1  IAPA_MM
+     22812   22853     42      1  IAPEYI
+     22854   22897     44      1  IAPA_MM
-     22898   23279    382      1  RLTR1IAP_MM
+     23280   23384    105   1562  B1
+     24946   24994     49    813  MUSID6
-     25807   25889     83    312  LTR10A
+     26201   26397    197     73  LINE2
+     26470   26534     65    456  MUSID4
+     26990   27159    170   3788  L1_MM
+     30947   31131    185    158  B3
+     31289   31398    110     40  B1
+     31438   31555    118   1052  B1
-     32607   32873    267    544  RSINE2A
+     33417   33570    154   2619  MER5A
+     36189   36323    135     47  B1
+     36370   36420     51     35  B1
-     36455   36588    134    131  B1
-     36719   36908    190     35  MLT2B2
-     36943   37139    197     27  MLT2B2
+     37166   37217     52    374  RSINE2
+     37591   37790    200      -  MLT1B

Mouse features sorted by name, with lineage
feature      copies  lineage
B1                9  rodent
B2                1  rodent
B3                3  rodent
BGLII             1  rodent
IAPA_MM           3  rodent
IAPEYI            6  rodent
L1MA2             1  rodent
L1MD2             1  rodent
L1_MM             1  rodent
LINE2             1  rodent
LTR10A            1  rodent
MER5A             1  rodent
MER67C            1  rodent
MLT1B             1  rodent
MLT2B2            2  rodent
MUSID4            2  rodent
MUSID5            1  rodent
MUSID6            1  rodent
ORR1B             4  rodent
ORR1D             1  rodent
RLTR1IAP_MM       2  rodent
RMER17A           1  rodent
RMER1B            1  rodent
RSINE2            4  rodent
RSINE2A           2  rodent

Features of the sheep prion gene
dir   start    stop     bp    gap  feature
-         1     316    316      5  BOV2
+       321     539    219    499  MER21B
+      1038    1159    122    933  BOVA2
-      2092    2192    101      2  MER5A
-      2194    2492    299     11  BOV2
-      2503    2553     51    131  MER5A
+      2684    2842    159    918  BOVTA
-      3760    3958    199    174  MLT1F
-      4132    4215     84   7725  MLT1F
+     11940   12038     99   4395  ALTR2
+     16433   16524     92     42  BOVTA
-     16566   16654     89   1093  BOVTA
-     17747   18115    369    307  BOV2
+     18422   18709    288     49  BTALUL1
+     18758   18929    172     46  L1A_OC
-     18975   19181    207     30  BOVTA
+     19211   19368    158      8  L1A_OC
-     19376   19781    406      1  BTALUL2
+     19782   19839     58      4  BDDF2
+     19843   19866     24   1714  L1A_OC
+     21580   21689    110   2112  MIR2
-     23801   24187    387    521  BOV2
-     24708   24872    165   2377  BOVTA
+     27249   27346     98      1  ARMER1
-     27347   27563    217      5  NLA
+     27568   27711    144     10  ARMER1
-     27721   27815     95      1  LINE_CH
+     27816   27906     91     26  L1MA5
+     27932   28134    203      6  L1S2_SS
-     28140   28336    197      7  BTALUL1
+     28343   28591    249     11  L1S2_SS
+     28602   28652     51    697  L1MA9
+     29349   29733    385    684  L1A_OC
+     30417   30545    129    262  MER5A
+     30807   30950    144    337  MER5A
+     31287   31412    126      -  BOVTA

Sheep features sorted by name, with lineage
feature      copies  lineage
ALTR2             1  bovidae
ARMER1            2  bovidae
BDDF2             1  bovidae
BOV2              4  bovidae
BOVA2             1  bovidae
BOVTA             6  bovidae
BTALUL1           2  bovidae
BTALUL2           1  bovidae
L1A_OC            4  bovidae
L1MA5             1  bovidae
L1MA9             1  bovidae
L1S2_SS           2  bovidae
LINE_CH           1  bovidae
MER21B            1  bovidae
MER5A             4  bovidae
MIR2              1  bovidae
MLT1F             2  bovidae
NLA               1  bovidae

Features of the cattle prion gene
dir   start    stop     bp    gap  feature
-      1717    2093    377      -  BOV2
-      2633    2797    165      -  BOVTA
+      1830    1878     49      -  C_OC

Cattle features sorted by name, with lineage
feature      copies  lineage
BOV2              1  bovidae
BOVTA             1  bovidae
C_OC              1  bovidae

Rat prion cytochrome c pseudogene

27 May 99 webmaster
The rat prion gene was sequenced by Saeki et al in 1994. The 8,494 bp sequence, SEG_D50092S, begins 2,831 bp upstream of exon 1 and ends 1,687 bp downstream of the coding region. The long intron 2 was only partially sequenced: 197 bp down from the 5' end and 628 bp up from the 3' end. The GenBank entry contains no feature analysis other than exon boundaries and poly A sites, though three associated articles make an excellent analysis of transcriptional regulation.

Subsequently, IY Lee et al. briefly noted a processed cytochrome c pseudogene in rat prion just upstream of exon 1 that presumably delimits the promoter region in all mammals. As a comparable pseudogene is missing in mouse and no outsplicing mechanism is available, the insertion in rat occurred after the two species diverged.

A great deal is known about cytochrome c pseudogenes in rodents and primates [due to R. Scarpulla]. Rats have a cytochrome c multigene family of approximately 30 processed pseudogenes representing three alternative cytochrome c mRNAs of sizes 1400, 1100 and 700 (length heterogeneity is in their 3' noncoding regions). One pseudogene class has a complete 3' noncoding region including a short poly A tail but is abruptly truncated at its 5' end, 19 amino acid codons up from the translation terminator, where it is fused through 17 consecutive A's to a 1.3 kb truncated long interspersed repeat sequence flanked by direct repeats.

There is a single functional copy of somatic rat cytochrome c containing a 105-base pair intron interrupting glycine codon 56; none of the pseudogenes contains this or an upstream intron, hence they are processed (reverse-transcribed from mature mRNA) with short direct flanking repeats and non-functional because of internal stop codons. This applies to the rat prion pseudogene.

Rats and mice additionally have an older single-copy paralogous cytochrome c gene, expressed during spermatogenic differentiation, with no detectable pseudogenes. The rat testis gene spans 7 kb and has three introns totaling 6.5 kb, which differ in position and size from the two introns (0.9 kb in total) of the 2.1-kb somatic gene. Humans have a single-copy somatic cytochrome c gene nearly identical in both size and intron/exon structure to rodent somatic genes. The 11 known human processed pseudogenes fall into two very distinct age classes, early and late. Some human cytochrome c pseudogenes have evolved at an exceptionally low rate. Human prion does not contain a cytochrome c pseudogene.

Despite the equimolar ratio of the three mRNA in rat tissues, nearly all of the 30 pseudogenes are derived from the 1,100-nucleotide mRNA. The rat prion pseudogene belongs to this commonest class. It is not identical to any of the previously sequenced rat pseudogenes but might be one of the 30 identified by hybridization. The rat testis cytochrome c gene can be eliminated as parent of the prion pseudogene by ClustalW alignment.

An odd feature of rat prion, not noted earlier, is the presence of a 111 bp disruption of the cytochrome c pseudogene at position 2174-2266. Jerzy Jurka has identified this as a 99 bp + RDRE1_RN plus flanking regions, a very common repeat element in the rat genome. The site of insertion bears no relation to the site of out-splicing of the cytochrome c intron. The obvious sequence of events is mouse-rat divergence, followed by pseudogene insertion into the prion gene, then RDRE1_RN insertion within the inserted pseudogene; i.e., the repeat element insertion is the most recent event.

The other four retrotransposon elements upstream of rat exon 1 cannot be dated by this method and pre-date the mouse-rat divergence since they are present in both lineages. Hamster sequence does not extend far enough and comparable regions are missing from other mammals.

Hamster prion would be a very instructive sequence. Unfortunately, the best sequence in this region, M14055, does not extend up to the pseudogene or repeat insertion regions. In using Blast on a pair of sequences of this type, it is vital to lower gap penalties, say 2 for opening and 1 for extending. Gaps are just as common as point mutations in non-coding sequences and so should be weighted similarly. (The NCBI Blast service supports custom gaps as an advanced option: paste in "-G 2 -E 1" and shut off filtering.) The hamster sequence gave 310/470 (65%) identities using generous gaps; using default parameters, two strongly conserved regions exhibit 153/181 (84%) and 31/35 (88%) levels of conservation.
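
To illustrate why the gap weight matters, here is a toy global alignment scorer. It is a sketch only: it uses a simple linear gap cost rather than the open/extend scheme Blast uses, and the sequences are invented for the example.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap cost."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


# Two invented sequences that differ by a single-base deletion.
x, y = "ACGTACGT", "ACGACGT"
print(global_score(x, y, gap=-1))  # gap weighted like a point mutation
print(global_score(x, y, gap=-5))  # gap weighted much more heavily
```

With the heavier gap cost the same one-base indel wipes out most of the score, which is the behavior that causes indel-rich non-coding alignments to fragment under default settings.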

promoter region
rat  33 bp +RSINE2     445- 477     mouse  35 bp +RSINE2   7314-7348   
rat 140 bp -RSINE2     742- 881     mouse 110 bp -RSINE2A  7649-7758   
rat 133 bp +B1        1136-1268     mouse 116 bp +B1 MM    7980-8095    
rat  67 bp -RMER17A   1275-1342     mouse  60 bp -RMER17A  8110-8169   
rat  99 bp +RDRE1_RN  2174-2266     mouse  ----   ----      ----  

intron 1
rat 206 bp -B3        3864-4069     mouse 226 bp -B3   9619-9844                  
rat 127 bp -B1        4302-4428     mouse 120 bp -B1F 10044-10163                    

post 3' UTR
rat 117 bp -B3       [2943-3059]*   mouse 185 bp -B3  30947-31131 or -B3 9619-9844 or 1 -B3 5296-552

*numbering of rat switches from D50092 to D50093 fragment; the rat polyA site is at 2625

Masking the repeat element RDRE1_RN improves alignment of the pseudogene with the parent cytochrome c gene K00750. Overall, 848/917 identity (92%) is found with minimal gapping. However, 38 bp of the exon 1 5' UTR are missing from the pseudogene; this bears no relation to the out-splicing site of intron 1 of cytochrome c.

Masking both the pseudogene and its insertion also improves alignment with mouse. The other 4 retrotransposons and the motif regulatory region serve as reliable flanking anchors for this alignment. However, these repeats are rodent lineage only and are lacking from human and sheep, so they should be masked prior to alignment there. A portion of the rat gene upstream of exon 1, annotated for all the features mentioned, is shown below:

One measure of this pseudogene's history is the accumulation of stop codons (2), frame shifts (0), point mutations (11), deletions (3 bp x1), and insertions (3 bp x1) relative to the 105 residues of functional cytochrome c. Note rat and mouse functional proteins are identical, so the pseudogene unsurprisingly is evolving much faster. The insertion event in the prion gene thus might be dated by these events to roughly ten million years ago. The pseudogene could never have been functional because a very old B1 repeat element occupied the place of a promoter. Thus, the functional rat prion promoter (and by homology, the mammalian promoter) must reside in the first 452 bases upstream of exon 1 because 1,245 extraneous bases precede this.

Outgroup arbitration applied to prion promoter

28 May 99 webmaster
To determine those sequences upstream of exon 1 important to prion gene regulation, researchers commonly align various species and take conserved stretches as important. The difficulty here is that introns and promoters require much more extensive gapping than coding sequences because frame shifts are not a significant issue. As gap opening and gap extension software parameters are arbitrary (or pertain only to coding sequences), this leads to loss of confidence in alignments.

Outgroup arbitration is a topologically driven consensus technique proposed here to sharpen these alignments. It requires sequences from 3 or more species clamped to a definitely known topology of divergence. In the case of the prion promoter, (hamster, (mouse, rat)) and (rat, (mouse1, mouse2)) are the known trees. The geometry of the tree is most favorable if branch lengths are approximately equal. For example, had (mouse, rat) divergence been too recent relative to the rate of mutation, few opportunities to apply the technique would exist; if the divergence had been too early, the tree becomes a trichotomy and alleles may drift differently than speciation.

Outgroup arbitration works because two rare events are far less likely than one rare event: a small number squared versus a small number. The method is much stronger than simple consensus but does require knowing the topology of the relevant tree. When the context allows for slippage of tandem repeats, confidence is down-weighted as these are evidently rather common.

The tree below illustrates the case analysis. The outgroup can resolve feature ambiguity in the ingroup when it can cast a deciding vote. In nested outgroup arbitration, the restored sequence can then serve as a noise-filtered outgroup to, say, (cow, sheep) as the process is repeated. The table below lists all combinations of scenarios (either gaps or point mutations; may include sequencing errors). Outgroup arbitration is only applicable to half of these cases. The others must be deferred to a more distant outgroup.

Sequence Feature Analysis by Outgroup Arbitration
Distribution of feature      Allocating events to rodent tree
hamster  mouse  rat          Interpretation
   +       +     +           none
   +       +     -           rat deletion
   +       -     +           mouse deletion
   -       +     +           deferred
   -       -     -           none
   -       -     +           rat insertion
   -       +     -           mouse insertion
   +       -     -           deferred
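
The eight rows of the table reduce to a simple rule, sketched here under the assumption that presence or absence of a feature is coded as True or False in each species; the function name is illustrative.

```python
def arbitrate(hamster, mouse, rat):
    """Allocate a presence/absence pattern to the (hamster, (mouse, rat)) tree."""
    if mouse == rat:
        # Ingroup agrees: either nothing happened, or the event cannot be
        # placed without a more distant outgroup.
        return "none" if hamster == mouse else "deferred"
    # Ingroup disagrees: the outgroup casts the deciding vote, and the
    # species that differs from it is taken to carry the event.
    changed = "mouse" if hamster == rat else "rat"
    present_in_changed = mouse if changed == "mouse" else rat
    return f"{changed} {'insertion' if present_in_changed else 'deletion'}"


for pattern in [(True, True, False), (False, True, False), (True, False, False)]:
    print(pattern, "->", arbitrate(*pattern))
```

The same function applies to point differences if True and False are read as "matches the reference base" and "differs from it".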

The prion promoter sequence upstream of exon 1 is surely bounded by the 1254 extraneous bases inserted in the rat prion. This leaves 454 bp between exon 1 and the cytochrome c pseudogene to analyze. That same region has been sequenced (several times) in mouse and no additional insert sequences occur. Regrettably, hamster sequence M14055 stops short with 339 bp. This limits the opportunity (by 115 bp) to fully define conserved areas within the rodent prion promoter.

Nested outgroup arbitration is surprisingly effective when applied to (hamster, ((mouse1, mouse2), rat)) as seen below. Some 39 point mutations can be resolved. An unsuspected 22 bp insertion of high G content is found in the rat prion promoter at site 290 that is missing in hamster and mouse: it is unlikely that hamster and mouse had separate deletions at the same site. Topology rules: had the extra bases been found in hamster, nothing could have been inferred. Mouse has a 16 base insertion with high C content beginning at 365, lacking in rat and hamster. At 394, a 10 bp deletion is seen in rat. The figure shows how several of the 40 point mutations were resolved. Many point events and gaps do not qualify under the rules of arbitration; their resolution is deferred.
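
A column-by-column version of the same idea can be sketched for point differences in an existing three-way alignment. This is a simplified illustration, not the procedure actually used to build the reference sequence: it assumes the sequences are already aligned with '-' for gaps, writes arbitrated positions in lower case, and marks three-way disagreements with 'n'. The toy sequences are invented.

```python
def reconstruct_ingroup(outgroup, a, b, unresolved="n"):
    """Column-wise ancestral guess for ingroup (a, b) given an aligned outgroup."""
    if not (len(outgroup) == len(a) == len(b)):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for o, x, y in zip(outgroup.upper(), a.upper(), b.upper()):
        if x == y:
            out.append(x)           # ingroup agrees: keep the shared state
        elif o == x or o == y:
            out.append(o.lower())   # outgroup casts the deciding vote
        else:
            out.append(unresolved)  # three-way disagreement: deferred
    return "".join(out)


hamster = "ACGT-ACA"   # outgroup (invented)
mouse   = "ACTTGACC"   # ingroup member (invented)
rat     = "ACGT-ATG"   # ingroup member (invented)
ancestor = reconstruct_ingroup(hamster, mouse, rat)
print(ancestor)
print(ancestor.replace("-", ""))  # gap columns dropped for use as a query
```

For the nested case, the output of one call can be passed as the outgroup or as an ingroup member of the next, once all sequences share one alignment.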


D50092 rat 2380-2831 -- M14055 hamster 1-411
U29186 mouse1 8193-8573 -- U52821 mouse2 757-1138 -- Y17510 mouse3 4-310 -- X79932 mouse4 10-273

The prion promoter is somewhat mysterious in that even with filtering of insertion elements and a strong reconstruction procedure to suppress noise, the rodent sequence still aligns poorly in places with other mammalian prion promoters. (This was noted earlier by IY Lee.) The reference sequence above as query only calls forth a 185 bp stretch below of human prion U29185 with Blastn (set generously). There are no interfering insertions in human prion in this vicinity. Bovine and sheep prion promoters are also brought out by this particular stretch. These sequences are well-anchored in the so-called motif region.

The best rodent promoter region that emerges from this process (delimited 5' by cytochrome c pseudogene and 3' by exon 1; lower case bases inferred; best human homology zone in blue):

>rodent reference promoter by outgroup arbitration;  516 bp
agaAggGGAAAAGAAGGAATCAGCAGAGAATAAATAAGTCAACATGCAATGGCCAAtATACTTTCTAGGCCTCTAATTCTTTTATAGTTT
GTGGGAAAATGTCgAAAATCTTCctctTTACCAATTTCTTGTTACCAAAGTTtCaACGATGGCTTtcTCgcTCCGTTAGGTAACCTTTCA
TTTTCTCaACTaCCCcATTATGTAACGGGAGCatTGGGTtCTGGATCAGTCTTCCATTAAAGATGAtTTTTATAGTcggTGAGCGtCGTC
AgGGAGTGcTGACACTGGGGGcGGtTTAAACAGATACAAGCATTTAAGCCAGTcCGGAGCGGTGACTCATTCCCcCCGCGAGAgACGCGG
CGCGGCCATTGGTGAGCACgacgCAaGCcCCGCCCCACccaGCCCGGCCcCGCcCTgCtaCCtgtCctGcCcCtctcCCCcgCCcGCTCC
CCCGCGttGTCgGAgCAGCAGACcGAgaaggcactTCgaggCgcttcgTCGCgTCGGTGGCAGGT
>rodent intron 1: the 3' end + exon 2 by outgroup arbitration; 40 improvements, 185 bp
gaacgtgccatgtttgcttttgggaatctatctgagctgttcttatttccAtTTtcaaatactgccccatttttATgTgCCtgtAtttaG
TaGtggTttgAtAatttgTatattagatGGtAtttcagttctcagacttattcatcaattctagtttttctttttgttgttttaaaggac
tcctgaAtatattccaAaactgaaccatttcaaccGaGctgaagTattctgTTTttctagaggtaccagtccAGtttaggagagccacag
cagatc

Alignment of cyt c pseudogene delimited pre-exon 1

webmaster 30 May 99
The fortuitous occurrence of a pseudogene in rat prion suggests that all active regulatory and transcription initiation control regions will be found downstream of the insertion site, in all mammals by homology.

rat 170 + motif 1 + 318
gaaaggaaagaagggaatcaa
ccgagaatgataaaccaacattcaatggccaatatactttctaagcctctaattctttta
tagtttatggggaaatgtcaaaaatcttcctctttaccaatttcttgttaccaaagttcc
acgatggctttttctttccgttaggtaacCTTTCATTTTCTCgactacccattatgtaac
gggagcgctgggttctggatcagtcttccattaaagatgacttttatagtctgtgagcgt
cgtcacagagtgctgacactggggtggggaggggagtacggggggagggggttaaacaga
taacaagcatttaagccagtacggagcggtgactcatcccaccgcgagaagccattggtg
agcatcacgctccgcccctcgccccgcccagcccccggcctgtcgggtccctcaccacgc
cccgctcccccgcgttgtcagagcagcagacggagtctgagcgtcgcgtcggtggcaggt

mouse 231 + motif 1 + 322
ctatggga
ttgtctccaaagataaagaaaaaagggaaggagagaaaagaaaaagaaaggaaagaaggg
gaaaagaaggaatcagcagagaataaataagtcaacatgcaatggccaatatactttcta
ggcctctaattcttttatagtttgtgggaaaatgtcgaaaatcttcgttaccaatttctt
gttaccaaagttcaacgatggcttcctcgctccgttaggtaacCTTTCATTTTCTCaact
acccattatgtaacgggagcattgggtactggatcagtcttccattaaagatgattttta
tagttgctgagcgtcgtcagggagtgctgacactgggggcggtttaaacagatacaagca
tttaagccagtccggagcggtgactcattcccccaccccccacccccccgcgagagacgc
ggcgcggccattggtgagcatcacgccccgcccctcgcccagcctagctcccgcctgccc
cgcccctttccactcccggctcccccgcgttgtcggatcagcagaccgattctgggcgct
gcgtcgcatcggtggcag

hamster 73 + motif 1 + 324
tcgaaaatctccctctttagcaatttcttgctcctagagtttcagcaattgctttctcgc
tccattaggcaacCTTTCATTTTCTCaccttccccattatgtaacgggagcaatgggttc
tggaccagtcttccattaaagatgatttttatagtcggtgagcgccgtcagggagtgatg
acacctgggggcggtttaaaccgtacaatcccttaaaccagtctggagcggtgactcatt
tccccagggagaagtggcgcggccattggtgagcacgacgcaagccccgccccacccagc
ccggccccgccctgctacccctcctgactcactgccccgcccgctcccccgcggcgtccg
agcagcagaccgagaaggcacatcgagtccactcgtcgcgtcggtggcag

human 223 + motif 1 + 411   start 12121 
taatatcaccacctaaatcatctcttgcctaaaacaaggagtagaaagtgaatgaaggaa
ggaacaggtgatggtcagtgtcctttctacgcctcaaaatttaagagtttatgtgaaaat
tcataaatattaatctcaatccaggttaagcaaaattttttgctctcctctttagaaatt
tctggttgccaaagttccagaaattgcttcctcattcctgagcCTTTCATTTTCTCgatt
tctccattatgtaacggggagctggagctttgggccgaatttccaattaaagatgatttt
tacagtcaatgagccacgtcagggagcgatggcacccgcaggcggtatcaactgatgcaa
gtgttcaagcgaatctcaactcgttttttccggtgactcattcccggccctgcttggcag
cgctgcaccctttaacttaaacctcggccggccgcccgccgggggcacagagtgtgcgcc
gggccgcgcggcaattggtccccgcgccgacctccgcccgcgagcgccgccgcttccctt
ccccgccccgcgtccctccccctcggccccgcgcgtcgcctgtcctccgagccagtcgct
gacagccgcggcgccgcgagcttctcctctcctcacgaccgaggcag

sheep 278 + motif 1 + 324
gttttatacaagaaatctcaggtaatgtgcagaatggacttgttaaatggagtgcatttc
cctcacttatgaatatcataatctaaatcatttactttgtaaataatgagcaggaactga
gtaaatgacggcaggtgatggctaatatcctttctaggcctcaaattttaatctgaaaat
tcacaaacattgggcttaatccagggtagtagaatttttgtccttttcagaaatttctgg
ttaccagagttcccgaaattgctttctcattccctaatCTTTCATTTTCTCcattacgta
acgagaagctggggctttggccgattttccctctaaagatgatttttatcgtcaacaagc
aatttcagggagtgatgagccagggaggcggtgttagttgatgctagcgtttatgctagt
ctcaactcgtttttcccagggacttagattcctgggtctgccggtaaaccccgggcgccc
gcagcgggcgcgcctgagcgtgcgcgcgccgtcgcctccccccccccgcagctcctcctc
tgcacggcgactcaccagccctagttgccagtcgctgacagccgcagagctgagagcgtc
ttctctcccagaggcag

cattle 513 + motif 1 + 328 start 1
aagcttagtggagcctcttgcccataacaaggggactagatatttcatttttcccaggtt
tatatccatttccctggcataattaatattggtactctcaaaagtgcccaaatttgggta
atgatatatatgatccctctaaccctaacacatgtcttctatcacttgccatccttcaca
tgagacaaacccctacataaaattttggcagtaataatgatcaagtacacaccatgtttt
atacaagaaacctcaggtaatgtgctgaatggacttgttaaatggagtgcatttccctca
cttatgaatatcataatctaaatcatttattttgtagataatgagcaggaactgagtaaa
tgacggcaggtgatggctaatatactttctaggcctcaaattttaatctgaaaattcaca
aacattgggctcaatccagggcaatagaatttttgtcccttttagaaatttctggttacc
aaagttccagaaattgctttctcattccctaatCTTTCATTTTCTCcattacgtaacgag
aagctggggctttggccgattttccctttaaagatgatttttatcgtcaacaagcaattt
cagggagtgatgagccggggaggcggtattagctgatgctagcgtttaagctagtctcaa
ctcgtttttcccagggacttagattcctgggtctgccagtaaaccccgggcgccggcagc
gggtgcgcctgagcgtcgcgcgcgccgtcgcctccccgcccctgcccctcctcctccgcc
cggcgacttacccgccctagttgccagtcgctgacagccgcagagctgagagcgtcttct
ctctcgcagaagcag

Ancestral size of intron 1

29 Mar 99 webmaster

Intron 1 has been sequenced from 5 species and the intervening retrotransposons determined by several methods. The length averages 2,401 bp with 221 bp of inserted sequence, leaving 2,180 bp as the reconstructed ancestral length. However, after discounting retrotransposons, non-rodents average 2,386 bp while rodents average 1,871 bp, a 515 bp difference. Proposed retrotransposon events include two in the ancestor of mouse-rat, three in humans, four in cattle, and three in sheep and are shown in the table below.

Somewhat confusingly, an L1M-MIR-L1M within intron 1 is suggested in human and sheep by Lee (in Figure 2) but is not in their GenBank entry; only human is confirmed by Jurka (more precisely as L1MD2, MIR, L1ME3A). We shall see shortly that human aligns strongly with sheep in this region, validating Lee on sheep. Cattle is strongly homologous to sheep along all of intron 1 and so deserves a similar annotation. Rodents present a complication in that more recent disruptive rodent-specific B1 and B3 elements must be removed before any comparisons can be made.

Jurka identified a 49 bp element C_OC in cattle but not in sheep. However, the alignment is excellent other than for one puzzling stretch, T's in cattle, C's in sheep. This is treated as a homologous 45 bp C_OC element in sheep here. Human, rat, and mouse lack significant alignment here.

ttttttgctgtcctttttcgtttttgttttttttttttccttttcttt C_OC cattle element 1830-1878
ttttttgctgttctttttcgttttttcccccctct----cttttcttt C_OC sheep  element 6682-6726

Intron 1 has been rather conserved in length (9% increase) over the last 100 million years of evolutionary time, in marked contrast to intron 2 which has grown by 40% in size. Notice however that the rodents are discordant with the other mammals averaging 1,871 bp vs 2,386 bp. This suggests either a 515 bp deletion in the rodent lineage (or numerous smaller deletions summing to this) at the upstream end of intron 1 or an unidentified 515 bp insertional element in other mammals.

Intron 1
species  length  start   stop  inserts   net  inserted elements
bovine     2442    856   3297       49  2393  -C_OC 1830-1878
sheep      2423   5718   8140       45  2423  -C_OC 6682-6726
human      2720  12768  15487      378  2342  +L1MD2 14413-14514, +MIR 14583-14657, +L1ME3A 14752-14952
rat        2230   2879   5108      333  1897  -B3 3864-4069, -B1 4302-4428
mouse      2190   8659  10848      346  1844  -B3 9619-9844, -B1F 10044-10163
average    2401      -      -      221  2180  -
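
The averages in the table can be reproduced from the length and net columns. A minimal sketch, using the table values as given (names are illustrative):

```python
# (length, net) in bp for each species, copied from the intron 1 table
intron1 = {
    "bovine": (2442, 2393),
    "sheep": (2423, 2423),
    "human": (2720, 2342),
    "rat": (2230, 1897),
    "mouse": (2190, 1844),
}
rodents = {"rat", "mouse"}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


mean_length = mean(length for length, _ in intron1.values())
mean_net = mean(net for _, net in intron1.values())
rodent_net = mean(net for sp, (_, net) in intron1.items() if sp in rodents)
other_net = mean(net for sp, (_, net) in intron1.items() if sp not in rodents)

print(f"mean length {mean_length:.0f} bp, mean net {mean_net:.0f} bp")
print(f"non-rodent net {other_net:.0f} bp, rodent net {rodent_net:.1f} bp, "
      f"difference {other_net - rodent_net:.1f} bp")
```

Note that the sheep net value in the table is not reduced by the 45 bp C_OC element, so the sketch follows the table rather than recomputing net as length minus inserts.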

>intron  1 bovine    856..3297  D26150
                          gtaaa tagccgcgta gtcctttaaa ctcccagcgg aggacgccaa
      901 ccctgggtct tgcggccgag gcccagggac ccagccgaat cggattggtg ggaggcagac
      961 cttgaccgtg agtagggctg ggggcttgcg gcgggcgcgg ggaacgtcgg gcctgttgag
     1021 cgtgctcgtt ggtttttgcc agccgccgct cggttttacc ctcctggtta ggagagctcc
     1081 atttactcgg aatgtgggcg ggggccgcgg ctggctggtc cccctcccga ggtatgtggg
     1141 tggtgtgtag gaatctagcc ccctcccacg ctcgtccact gcgggagtgg gatgggcgaa
     1201 tcgcaccggt agaggggccg cagtcgagga accgctgggg acctcagaag aacaagggcg
     1261 agcccgggat ttgggccctc ccgaagccca gaggagtcgc ggaattgggg gtgggggtgg
     1321 tggggaagaa acgggcgcca acggggcccg acctcggcgg tgaggagtgc cggagcatcc
     1381 gtgggccccc agccgctgct gccgaactcc tcccgagagg cggccctgcc tgccatcacg
     1441 cggctgggag gtacctgggt agccgcagcg ggtgggtctc tggcaacccc ccggggatcg
     1501 gctctggcgg gcgtacgtgg cctgggcttc agcctcggcg cggggaatca tgggccacct
     1561 ggcgctctct ccgggccaga gaaatccagg taccgggaac agtgtttcct gggagctctg
     1621 atgtggtgga cccaaaagca aagcgaaatt ttccctgtct cgactgatcc tccagaagga
     1681 gggaactcgg ccgtcaggag actgagggga ggggattcag gcgcctctca gagaaccacc
     1741 ctcatctgcc agtaagggtg gcaccttcac gcttgatttt tttttttttt tcccctcaca
     1801 cgtttgatta ttaaacaacg agaagtccgt tttttgctgt cctttttcgt ttttgttttt
     1861 tttttttcct tttcttttgg taccatatgt agcaaataga ttttttaaaa tcataagccc
     1921 accaccctca ccatcttttt ttcagtttcc tcgtctccag attcttaaca acaaagcagt
     1981 ttcacctccc tgatcatggt tatccttatc tcatggccgg gttattttct tgtacttaag
     2041 agcaatcacg ttttattaag cagttccccg aatgctgaac ctttgaagtg ttacctttcc
     2101 ttacaaaaga taccacatag aataggatta aaaattttca caagttgtca gagaaaaata
     2161 ggaacagaaa attgtataaa aatgtcagac ctctggaaaa tgaacagctc tctcagattt
     2221 gaaaattaac ctatgaaaag gaacagtttt cctacggaaa cattgaggtg ctctaacaat
     2281 gaaaaagaat cagaaaagga aaaaaacaga gttaggatgt gatttgtata tgatttgtat
     2341 ctgatgcaaa tttttcatac ttgtgaaaga aaaatatcaa gattataaaa agataaatgg
     2401 tgaaatgaac aatcatttat gaaataaaat acaaatcaaa gcaagtctgg atttacaact
     2461 actagtaaaa acaacagtaa cagcaaccac ttctggaaag ttacctagaa atttgcatat
     2521 tcagtatgtg aggtggcaag gctttggagt tagaaatatg gtctgcaact aattttacaa
     2581 tttgggacct aatttcctca tccccctttt ggacattcat aaaatagagg aaattatacc
     2641 tacttcagag tttgccaaga ttaactgtgt aaaactgacc tttagtgtgt atacttttat
     2701 tcttttccta gtcacactgc actgggggac gttgtgaatc tgtatgaaat ttgtgaaaaa
     2761 cagtcaggtg atcctttaag ccatgaccct aaaacccact cctgggaact tacctgtaat
     2821 ggaggaaacc aggaaagaag aagaaaagct gcattcaccc acagaactca gaatgatcta
     2881 aaattagatc cagtccggag tcaacctaaa tgtattaata aaatagcagg gcagcagcta
     2941 agaaaatcat agcactttaa ctgaaaggaa cattgtgtaa ccatcacgag tcataatttt
     3001 agagcctctc tgtgatatac aggaaaaaac tgacaggtca aagtaagatt actcagacat
     3061 ggatgcgttt gtggaaaatc tgaatgaaaa atgaatccac agtttgctgt gtatgggagg
     3121 agagttcagt gtcacgtttg ctgctttttt taagttagca tcatctcttt tttaaaaata
     3181 ctatcatatt ttttccctga gtagattcat tagtggttta ataatttata tactgttatt
     3241 ctgttaaata atccgttctt agatttatca attatagttt tttctttttt ttttaag
>sheep intron 1 U67922  5718-8140
                            gta aatagccacg tagtccttta aacccccagc ggaggccgcc
     5761 cccggcttgc ggccgaggcc ctagggcact cagccggatc ggactggctg ggaggcagac
     5821 cttgaccgtg aggaggactg ggggcttccg gcgggcgcgg ggaacgtcgg gcctgtttag
     5881 cgtgctcgtt ggtttttgcc agccaccgct cggttttgcc ctcctggtta ggagagctcc
     5941 atttactcgg aatgtgggcg ggggccgcgg ctggctggtc cccctcctga agtatgtggg
     6001 tggtgtgtag gaatctagcc ccctcccacg ctcgtccact gcgggagtgg catgggcgga
     6061 tcgcaccggt agaggggccg cagtccgagg aaccgctggg gagatcagaa gaacaagcga
     6121 gaggccccgg gctctgggcc ctcccgaagc ccagcggaga cgcggaattg ggggtggggg
     6181 gtggggaaga agcgggcgcc caacggggcc agacctcggc cgtgaggagt gccggagcga
     6241 ccgtgggccc ccagccgctg ctgccgaact cctcccgaga ggcggccctg cttgccatca
     6301 cgcggctggg aggtacctgg gtagccgcag cgggtgggtc tctggcagcc ccctggggat
     6361 cggctcgggc gggcgtgcgt ggcctgggct tcagcctcgg cgaggggagt catgggcgac
     6421 ccggccctct ctccagagaa atccaggtac cgggagcagt gtttcctggg agctctgatg
     6481 tggtcgaccc aaaagcaaag cgatattttc gctgtctcga ctgaaggagg gaactcggcc
     6541 ctcaggagac tgaggggagg ggatcaggcg cctcttggag aaccaccctc atctgccagt
     6601 aagggtggca ccttcacgtt tttttttttg ttgttgttgt tttctcacac gtttgattat
     6661 taaacaacga ggagaagtcc gttttttgct gttctttttc gttttttccc ccctctcttt
     6721 tcttttggta ccatatgtag caaatagatt ttttaaaatc ataagaccac catcctcacc
     6781 atcttgtttt tcagtttcct cgtctccaga ttcttaacaa agcagtttca cttccctgat
     6841 gatggttatc ctcatctcat ggccaggtta ttttcttgta cttaagagca atcactgttt
     6901 attaagcagt ttcccgaatg ctgaaccttt gaagtgttac ctttccttgc aaaagattcc
     6961 gtatagaata ggattaaaaa ttttcacaag ttgtcagaga aaaataagaa cagaaaattg
     7021 aataaaatgt cagacctctg gaaaatgaac agctttctca aatttgaaaa ttaactataa
     7081 aaaggaacag ttttcctacg gagacactga ggcgctctca gtgaaaaaga acgatgaaaa
     7141 agaaccagaa aaggaaagaa aacggagtta tgtatatgat ttgtatctga tgcaaatttt
     7201 tcatacttgt gaaagaaaaa tatcaagatt ataaaaagat aaatggtgaa atgaagaatc
     7261 atttatggaa taaaatacaa atcaaagcaa gtctggatta tcgttttaca actactagta
     7321 aaaacagtaa cagcaaccac tcctggaagg ttacctagaa atttgcatat tcgtttatgt
     7381 gaggtggcaa ggctttggag ttagaaatat ggctctgcag ctaattttac aatttgggac
     7441 ctaatttcgt catcgtcctt ttgtccattt ataaaataga ggaaattata cctacttcag
     7501 gagtttgcca agattaactg tgtaaaactg acctttagca tgtatacatt tattctttcc
     7561 ctagtcacac tgcactgggg gacatttgtg aatctatgaa atttgtgaaa aatggatcct
     7621 ttaagccatg accctgaaac cccactcctg ggaacttacc tgcaatggaa gaaattcgga
     7681 aagaagaaaa gctgcattca cccacagggc tcagaatgat ctaaaattag atccagtcca
     7741 gagacaacct aaaggtatta agaaaatagc agggcagcag ctaagaaaat catagcactt
     7801 taactgaaag gaacattgtg taacccatca cgtggcataa ttttagagcc tctctgtgat
     7861 atataggaaa aaagtgacag gtcaaagtaa gattactcag acatggatgc atatgtggaa
     7921 aatctgaata aaaaatggac ccacagtttt ctgtgtatgg gaggagagtt cagtgtcatg
     7981 tttgctgctt ttttttagtc agcgtcatct cttttaaaaa tactatcata tttttttcct
     8041 tgagtagatt cattagtggt ttaataattt atatactgtt attctattaa ataatccgtt
     8101 cttagattta tcaattatag tttgtttttt tttttaagga
>human intron 1 U29185 12678-15389
                            ccc cctcggcccc gcgcgtcgcc tgtcctccga gccagtcgct
    12721 gacagccgcg gcgccgcgag cttctcctct cctcacgacc gaggcaggta aacgcccggg
    12781 gtgggaggaa cgcgggcggg ggcaggggag ccgcgggggc cgagtgagga ccccgggcct
    12841 cgggtcccag gcgcaagggt gcccggccgg gcggggtcgg gaccccagtg aggaggggcc
    12901 gggggctgcc ccgcgggcgc gtgacggtct cgggcctgcc cggctgcgct ggtctccgct
    12961 cgggtgaggc ggcttggctt cgcttttcag gttaggaaag ctccctttac tgcgcgttgg
    13021 ggggctgggg gagctggcgg agccacgtta gggaggtcgg tggcgccggg gtgtctcagc
    13081 gccccctgca ccccgcgcgg gtccggccca gcgggcgatc gctggcgccc agggaactcc
    13141 gggagggccg ccagcgggct ccgcaggcgc ggggcgggga ggggcgcctg ggggccgcgg
    13201 ggctcgcgct ccccgcccgt tggccgcccc tcggaggccg agatcggggc ccagaacgcc
    13261 ccttggcaaa gcctggcgct tccgcgatgc ccagagggtg cttgggggga tggagagagg
    13321 ggcgcccgcc ggggtagttc cgggagcctc ggtgcctccc gccgcagctg cagcgttcct
    13381 cccgggaggc ggcccagccc ttcatcctcg ccgcctgagc ttctccgagg ggggctgcag
    13441 ccttgcggcc gttgccaccg cctggagaag cggcccacgc ggactgacgg gcgggggcgg
    13501 ggcctcgggc ctcggcgggg gcggggtccg gggaggcccc accctctgtt ctccaggggc
    13561 ggggagagag gagctgcagg tctgcggcct ggccccaggt gcgatggcgg accccagctt
    13621 ggccagtcac attcctccca gtccccctgg agggagaacg ctggccatgg ggggctccaa
    13681 ggaacaacca gcctcggatg acgacccttg ggtcaccggt ctccccacct gtgcggcagg
    13741 cgccttcacg tttcattatt aaacaatggg gagaaatcca tgtttactgt cctttttagg
    13801 aattttttgc tcttctcttt gaggtggctg taggaaatag attttttttt taacctcgca
    13861 attccaccac ggtcacatcc atcctcgcca tcgcagagcc acagctctcc gtttttgttt
    13921 cctagcctcc agattctcac acaacacagt gcagtttcac tgctgtaatg atgaggatct
    13981 tcatggccgc gttattttct tgttctgaga gcatcacggt ttaattagca gttccccata
    14041 tgatttgaag tgtttcccgt ttccttaggg aaaactcctg gtagaatagg attaaggatt
    14101 tttacaaata taattatcaa aaacatagga acagggaatt ggataaatat gttaaacttc
    14161 tggaaaaatc aacaacgctc ttagatttgt agaagaaagg aaaaaatcac cagtggaaag
    14221 gagcaatttt acttacacaa acacagagaa ggtcttacag tgaaaaaaag ctaaccagta
    14281 aggggaaaag caggcagagg ggtaggatgt gatttgtatg ttatttatat ctaacacaag
    14341 tcttccacac cgaaaggaaa atattaagat tataatagat aaatggcaaa atgatgagtc
    14401 atttacacaa taaaatgcaa attagagcat gtttgggtta tcattttaca tctattaaaa
    14461 taaccaaaat aattaatagt aacagcaacc cttgctggaa ggttgcccaa aacttggcat
    14521 tttcaagtgt ctggggaggt ggcagggctt tggggtcaca aagatggttc tgcagtcaat
    14581 tttgtgacct tggacaggct acctaatttc ctgatcctcc ttttgtccat tcatagaatg
    14641 gaggaaatga tagctacttt ctgcgtctgt atgtatgagt tattgggggc atttcgaacc
    14701 agtgacaaac attttgttaa gcaatctggt gatgcattaa gaagctggaa gctgtgaccc
    14761 agaaacccca ctcctgagaa cttacctgca atggaagaaa caaacaaaca aaaacaggca
    14821 tgtattccta gcagaatgat ctaaaattag aacacctgga aaagagccta aatgtataac
    14881 accagggcag tagctaagaa aattatgaca cattaactga aatgaacatt atgtaaccac
    14941 taaaaatcat gattttggag cctgtgatat gtggggaaaa actgacaagt aaaaaagtgg
    15001 gttattaact gcacctgctt actctaacgt gaacgcatat gtgaaaaatc tgaaaggaaa
    15061 agcacagaaa atggacgttt tcattgaaat tgtcggtgat cttaattttc ctttggtgaa
    15121 tatattgctc tcactaaggg cgtttaaaaa atagttcaca ggttttaatt ttttagatga
    15181 aatggaccca cagttttctg taagagaaag gagagattgt tatatttgct acttagaata
    15241 aaagattttt agccaacgtt gtttcctttt tcaaatattt ttccattttt ttagttgatt
    15301 aatgatttag taatttgtgt attgggtttt tttaagaatc agttcttaga ttcatttatc
    15361 aattctagtt ttttgttgtt gtttttaag
>rat intron 1 D50092 2879-5108
     2879 gtaagcgggctg ctgaagccag gcgtcagcga gcattcagcc ttcctcccgt cgacaagctc
     2941 ggcttactgt gcctctccgg gacttgaggc cgcggggctg ggactggggt tgagcttggc
     3001 taggaggtgg ctgtgcaccc gctgtgcgcg actcctggag ggaccgaatc ccagggcagc
     3061 gaggccggga gccgagcctg attcacagct caacatcgct gtgggggatg gggggttggg
     3121 ggggtggcat cttttaactg ccctgtgctg ttttcttctc tcgttgtaat agctacagcg
     3181 aacataattt caccccgtga ttccaccacg gtctcatccg tcctcagcac cacactcatt
     3241 gctccccttg ctcagtttca tactcagcgc agccgttcgc cttcactgcc ctgcctaggc
     3301 gttttcatgg ttgtcttata ttcttttact ttgaatatcg tggtttaata gcagttgccg
     3361 gtgtgctaaa ttcctcattt ccttaagaga aactcctggg aggatggaat taaagacgtt
     3421 gcaaatttaa ttataccaca aacaggaatc aaaattttgc attaaaatgc cagacatctt
     3481 gaaaaattta actattcaat aaaaaaaaaa aaggaactac tttacctaca cacacatccg
     3541 agtgcttcaa agagtccaag gaaatagaaa gctaagggat gatttgggtt gtatttgaat
     3601 ctgacacgag ctttccatat tatttatagc agggactgaa ggatgagtca ttttctgaat
     3661 aagatgcaaa ttaaagcaag tttgttgtct ttacatcgat taaacagaca gagatgatga
     3721 cagcagcaac cctaacctag aggttgtctg aaaccaccgt gttcaagttt ggggagcagg
     3781 tggccctcct taagagctcg attgattgct ttacaaccaa cgttatgact tggcattgcc
     3841 tggggttcct tttatttatt cctttcttta aaagactact atctatttta tgagcatgag
     3901 tgtttcgctc cacagaagca tgtatacaag cctggttctg cggaggtcag aagagacagg
     3961 gtgttggaag ccctggaact agagctaggg atgattctgt gagcccctgc cacaggggag
     4021 ctcagatccc caatccaggc tgtctggaag agcagccaga gctcttaact accgaacacc
     4081 ccccccccca tcccctctca ttcacattta gaaaggagaa aactgctacc catgtctggc
     4141 atttatttca gagattaact gtgcaaaact cgatgtgaaa gtatactatt ctgtttccca
     4201 gtcacactta gttgacagtg taagtcagta agggctttgg ttggttggtt tggttggttg
     4261 gttcctgggt tagtctggat gtgcttgttg agagctcaat aacaggcttt caatatggat
     4321 atgtagctgg gaattcgcta tgtagaccag gcaggcctca aatttgtggc aatcctccct
     4381 gtgattcccc agaatgccct ggtacaggca taagccactg tgcccagccg taaaacaatc
     4441 tggtgaggta ttattagttg catgctgtga cccagaaacc ccacttctgg caattcacct
     4501 gccgtggtgg aaccaacaaa gggctagggg agccatatgg ccaacagtta cagaaaatta
     4561 gatccaaggg aaaagcaacc taaatgttta acaggcgagc agctaagaaa ctgacaggct
     4621 cgtgagggag ctgtagcaat cccgaagaac actcttcatt ttagactcca tgtatccctg
     4681 ggaaaaacag agtcaaagta caggttagga gaccgggact cctctggacc catgctgtcc
     4741 tctgaaaagc ccagaagagc tataatgaaa gagctcagaa gatgtctgat cttggctttc
     4801 tttatgtttg ttgctgtatt gtttccacta acaaacaact aaaaaaaaaa aaaaagttca
     4861 caggcttctt tccttaaaat actggggatt gaacccaggg atagtttttt agtgtctaaa
     4921 ttaacatgac catgccctgt ttgccttttt ggagtatgtt tgaatctgcc cttatttcca
     4981 ttctcaaata ctgctccatt ttatatgact atttagtttt ggcttgataa tttgcatatg
     5041 agattagatc atctttcagt tctcagactt atttatcaat tctagttttt ctttttgttg
     5101 ttttaaag
>mouse intron 1 U29186 8659-10848
                              t aagcgggctg ctgaagccag gccttggcga gcactcagcc
     8701 ttccgtcgtc aagctcggct cactgcgcct ctcggggcct tgaggccacg gggactagga
     8761 ctgggactgg gactggggct gagtctggct gggaggtgac tgtacacccc ctgtgcgcga
     8821 ctcctggagg aaccgaatcc cagggcagcc aggccgggag ccagcctttc cttcccgagc
     8881 cagattcaca gctcagcatc gctggggatg ggggtggcat cttttgactg tccttggctg
     8941 ttttcttctc tctttgtagt agctacagcg aacataattt tacctcgtta ttccaccaca
     9001 gtcattactc ccttgcacag tttcattctc aacgtcgccg tgcgccttca ctgccctgtc
     9061 taggcgtttt catgattgtc tattttcttg tactttgaat accgtggttt aatagcagtt
     9121 gcgggtgcgc agaattctcc atttccttaa gagaaactcc tgggagaatg ggactaaaga
     9181 cgtgcaaatt taattatatc gcaaacagga atcaaaattt tgcattaaaa tgccaaacat
     9241 cttgaaaaat taactattca atgaagaaaa ggaactactt tacctacaca cacatccgag
     9301 agcttcgagg aggcgaagga aatagaaagc taagggatga tttgggttgt atttgaatct
     9361 gacacaagct ttccatatta tttatagcag ggactaaacg atgagtcatt ttctgaataa
     9421 gatgcaaatt aaagcaagtt tgtttgttgt ctttacatct attaaataga cagagacaat
     9481 ggcaacagca accctaacct agaggttgcc tgaaagtgtc aggtttggga acaagtggcc
     9541 ctgcttaagg gctagaaaga ttgctttaca accaacaatc atgacttgac attgcctggg
     9601 gttccttttg tctattcctt ttttaaaaga ctagtgttta ttttatgtgc atgagtgttt
     9661 tgcatccaca ttcgcctgta tacacacctg gttctgtgga ggtcaggaga gggtgctgga
     9721 tgccctggca ctagagccgt gaatggttat gtgagcccct gccacagggg agctcagaac
     9781 caaatccagg tcctctggaa gagcaaccag agctcttaaa acttctaagt atccctccat
     9841 cccctttcca tcatatttgg aaaggagaaa actgctaccc atgcctggca tttatttcag
     9901 agattaactg tctgtgtaaa acttgacatt gaaagtgcac tattctgttt cccattcata
     9961 cttagttgag actactgtaa gtcagttagg gctttttttg tttggttcct tggttagttt
    10021 ggagtgtgtt tgtgagctca ttaacaggct ttcagtatgt agctgaaatt tgctgtgtag
    10081 accagacagg cctcaaattt gtggcaatcc tccctgcatc ttcccagaat gccctggtac
    10141 aggcataaac caccgtgccc agcagtaaaa caatctggtg aggtattatt agtcgtgtgc
    10201 tgtgacccag aaaccccact cctggcaatt tactgggaag gaacaaacaa agggctaggg
    10261 gagccatatg gcctgcagtt agagaaaatt agatccaact gaaaaatcaa cctaaaggtg
    10321 taaaagccaa gcagttaaga aactgacagg ctcatgatgg aagccgaggc catcgtgaac
    10381 actcttcatt ttaggcccca cgtatcactg gggacaactg agagtcaaag tacaggtaag
    10441 gagaccaagg cttttcagga ctcaggctgt ctcagtgaaa agcccagaag agcagtaatt
    10501 gaaagagctc agacgatgtg tctgatctcc tctgtttgtt tgttgctgta ttatttccac
    10561 taacttattt gggaggaaaa aaaacagttc acaggcttct tttcttgaaa tactggggat
    10621 tgctgggatc gaacccaggg ataggttttt agtttctaaa ataacataga tcatgccctg
    10681 tttgcttttt ggaatatgtt tgcgctgccc ttattttcat gttcaaatac tgctccattt
    10741 tgcgtgactc tttagtattg gtttgatgat ttgcatatta gattagattg tatttcagtt
    10801 ctcagactta tttatcaatt ctagttttct ctttttgttg ttttaaag

>hamster M14055
        1 tcgaaaatct ccctctttag caatttcttg ctcctagagt ttcagcaatt gctttctcgc
       61 tccattaggc aacctttcat tttctcacct tccccattat gtaacgggag caatgggttc
      121 tggaccagtc ttccattaaa gatgattttt atagtcggtg agcgccgtca gggagtgatg
      181 acacctgggg gcggtttaaa ccgtacaatc ccttaaacca gtctggagcg gtgactcatt
      241 tccccaggga gaagtggcgc ggccattggt gagcacgacg caagccccgc cccacccagc
      301 ccggccccgc cctgctaccc ctcctgactc actgccccgc ccgctccccc gcggcgtccg
      361 agcagcagac cgagaaggca catcgagtcc actcgtcgcg tcggtggcag gtaagcggct
      421 tctgaagcct ggccccggga agggtgctgg agccaggcct cggtaagcct tcggcttccc
      481 agagccaagc ccggcttact ccggctctcg gggcgctgag gccgcggggc tgaggttgag
      541 tctggctggg aggtgaccgc gcacccgcag ccgcgcgtct ccttgaggga ccgaacccca
      601 ggagaggcca ggagccatcc cttcctcccg agcccggctc acccccagag tcgctcgggg
      661 atgggggatg ggggatgggg tggcatcttt tgactgtcgt ttgctgtttt cttctctctt
      721 tgtaatagct acagcgaaca taattttacc cagggttcca ccgtggtctc gtccgtcctc
      781 ggcatctctc agtccagtac atacccaagg 


>hamster U78769 177+ exon 2
        1 gaacgtgcca tgtttgcttt tgggaatcta tctgagctgt tcttatttcc gttttcaaat
       61 actgccccat ttttatgtgc ctgtatttat tagtggtttg gtaatttgta tattagatgg
      121 tatttcagta cttagattta ttcatcaatt ctaatttttc tttttcatgt tttgaaggac
      181 tcctgaatat attccaaaac tgaacaattt caactgagct gaagtactct gtttttctag
      241 aggtaccagt tcagtttagg agagtcacag cagatc


>mouse intron 1 8659-10848 without B1 and B3 elements; the 9845-10043 segment was sandwiched between them
                              t aagcgggctg ctgaagccag gccttggcga gcactcagcc
     8701 ttccgtcgtc aagctcggct cactgcgcct ctcggggcct tgaggccacg gggactagga
     8761 ctgggactgg gactggggct gagtctggct gggaggtgac tgtacacccc ctgtgcgcga
     8821 ctcctggagg aaccgaatcc cagggcagcc aggccgggag ccagcctttc cttcccgagc
     8881 cagattcaca gctcagcatc gctggggatg ggggtggcat cttttgactg tccttggctg
     8941 ttttcttctc tctttgtagt agctacagcg aacataattt tacctcgtta ttccaccaca
     9001 gtcattactc ccttgcacag tttcattctc aacgtcgccg tgcgccttca ctgccctgtc
     9061 taggcgtttt catgattgtc tattttcttg tactttgaat accgtggttt aatagcagtt
     9121 gcgggtgcgc agaattctcc atttccttaa gagaaactcc tgggagaatg ggactaaaga
     9181 cgtgcaaatt taattatatc gcaaacagga atcaaaattt tgcattaaaa tgccaaacat
     9241 cttgaaaaat taactattca atgaagaaaa ggaactactt tacctacaca cacatccgag
     9301 agcttcgagg aggcgaagga aatagaaagc taagggatga tttgggttgt atttgaatct
     9361 gacacaagct ttccatatta tttatagcag ggactaaacg atgagtcatt ttctgaataa
     9421 gatgcaaatt aaagcaagtt tgtttgttgt ctttacatct attaaataga cagagacaat
     9481 ggcaacagca accctaacct agaggttgcc tgaaagtgtc aggtttggga acaagtggcc
     9541 ctgcttaagg gctagaaaga ttgctttaca accaacaatc atgacttgac attgcctggg
     9601 gttccttttg tctattccttttcca tcatatttgg aaaggagaaa actgctaccc atgcctggca tttatttcag
     9901 agattaactg tctgtgtaaa acttgacatt gaaagtgcac tattctgttt cccattcata
     9961 cttagttgag actactgtaa gtcagttagg gctttttttg tttggttcct tggttagttt
    10021 ggagtgtgtt tgtgagctca ttaagtaaaa caatctggtg aggtattatt agtcgtgtgc
    10201 tgtgacccag aaaccccact cctggcaatt tactgggaag gaacaaacaa agggctaggg
    10261 gagccatatg gcctgcagtt agagaaaatt agatccaact gaaaaatcaa cctaaaggtg
    10321 taaaagccaa gcagttaaga aactgacagg ctcatgatgg aagccgaggc catcgtgaac
    10381 actcttcatt ttaggcccca cgtatcactg gggacaactg agagtcaaag tacaggtaag
    10441 gagaccaagg cttttcagga ctcaggctgt ctcagtgaaa agcccagaag agcagtaatt
    10501 gaaagagctc agacgatgtg tctgatctcc tctgtttgtt tgttgctgta ttatttccac
    10561 taacttattt gggaggaaaa aaaacagttc acaggcttct tttcttgaaa tactggggat
    10621 tgctgggatc gaacccaggg ataggttttt agtttctaaa ataacataga tcatgccctg
    10681 tttgcttttt ggaatatgtt tgcgctgccc ttattttcat gttcaaatac tgctccattt
    10741 tgcgtgactc tttagtattg gtttgatgat ttgcatatta gattagattg tatttcagtt
    10801 ctcagactta tttatcaatt ctagttttct ctttttgttg ttttaaag
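
Removing the rodent-specific elements is a simple strip-and-join on database coordinates. A minimal sketch, assuming 1-based inclusive coordinates as in the feature tables; the function name strip_and_join and the toy sequence are hypothetical, and only the mouse B3 and B1F coordinates come from the Intron 1 table:

```python
def strip_and_join(seq, origin, intervals):
    """Remove 1-based inclusive intervals from seq and join the flanking pieces.

    seq       -- sequence string whose first base has database coordinate `origin`
    intervals -- (start, stop) pairs in database coordinates, non-overlapping
    """
    kept = []
    cursor = origin  # database coordinate of the next base to keep
    for start, stop in sorted(intervals):
        if start < cursor or stop < start or stop - origin >= len(seq):
            raise ValueError(f"bad or overlapping interval {start}-{stop}")
        kept.append(seq[cursor - origin:start - origin])
        cursor = stop + 1
    kept.append(seq[cursor - origin:])
    return "".join(kept)


# Toy example: uppercase blocks stand in for inserted elements.
toy = "aaaaCCCCggggTTTTaaaa"
print(strip_and_join(toy, 101, [(105, 108), (113, 116)]))

# Length check with the mouse coordinates (placeholder bases, not real sequence).
MOUSE_INSERTS = [(9619, 9844), (10044, 10163)]
masked = strip_and_join("n" * 2190, 8659, MOUSE_INSERTS)
print(len(masked))
```

The same call with the rat coordinates (B3 3864-4069, B1 4302-4428, origin 2879) would leave 1897 bp from the 2230 bp rat intron.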
Nearest human insertion features:
U29185        11511-11800  -Alu-Jo
              12634-12767   exon 1

Nearest bovine insertion features:
D26150         1830-1878   -C_OC
               <803-855     exon 1

Nearest ovine insertion feature:
U67922         3760-3958   +MLT1F
               5666-5717    exon 1           

Cow versus sheep

31 May 99 webmaster
The table below shows 3,404 bp of cattle prion containing intron 1 compared to sheep prion. There are no particular difficulties with the alignment; however, without the cervid sequence, there is not a close enough outgroup to allocate events to cow or sheep, respectively.

The 146 point mutations and 63 indels are uniformly distributed. The transition-to-transversion ratio, 92:54 (about 1.7:1), is roughly 2:1, similar to coding DNA. However, the 144 bp of change due to deletions and insertions is nearly identical to the 146 bp of point mutation change; this ratio is much lower in coding DNA, where frame shifts are more significant. The table below gives the point mutation matrix. It is read, "A in sheep was unchanged at 918 positions but was found to be C in cattle in 7 places, G in 29 places, etc."

Rows: sheep; columns: cattle

           A     C     G     T     -  totals
A        918     7    29     5    18      59
C          6   762     9    23    19      57
G         18     8   786     4    19      49
T          8    22     7   938    28      65
-         10    16    17    17   144      60
totals    42    53    62    49    84     290

A          918    A<->G    47
C          762    T<->C    45
G          786    ti       92
T          938    tv       54   146
total bp  3404    x<->-   144
GC         45%    total   290
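
The summary figures can be recomputed from the matrix. This is a sketch only: the counts are those of the table (rows sheep, columns cattle), the gap-versus-gap cell is set to 0 because the 144 printed there is the indel total rather than a count, and the names ORDER, COUNTS and tally are made up for illustration.

```python
ORDER = "ACGT-"  # column order: A, C, G, T, gap

# Rows are sheep, columns are cattle, as in the table.
COUNTS = {
    "A": [918, 7, 29, 5, 18],
    "C": [6, 762, 9, 23, 19],
    "G": [18, 8, 786, 4, 19],
    "T": [8, 22, 7, 938, 28],
    "-": [10, 16, 17, 17, 0],  # gap/gap cell left at 0 (the table shows the total there)
}

# Purine<->purine and pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def tally(counts):
    """Return (identical sites, transitions, transversions, indel bp)."""
    identical = ti = tv = indel = 0
    for row, values in counts.items():
        for col, n in zip(ORDER, values):
            if "-" in (row, col):
                indel += n
            elif row == col:
                identical += n
            elif (row, col) in TRANSITIONS:
                ti += n
            else:
                tv += n
    return identical, ti, tv, indel


identical, ti, tv, indel = tally(COUNTS)
gc = (COUNTS["C"][1] + COUNTS["G"][2]) / identical  # GC share of unchanged sites
print(f"identical={identical} ti={ti} tv={tv} ti/tv={ti / tv:.2f} indel bp={indel}")
print(f"GC={gc:.1%} total changes={ti + tv + indel}")
```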

The length distribution of indels falls off exponentially, with 53 of 63 events being 3 bp or shorter. Longer indels may arise from slippage of tandem repeats, e.g., the overlapping repeat tgatttgtatctgat:

cattle 2299 gaaaxaaaacagagttaggatgtgatttgtatatgatttgtatctgatgcaaatttttca 2357
sheep  7154 gaaagaaaacggagttaxxxxxxxxxxtgtatatgatttgtatctgatgcaaatttttca 7203

36  indels of length   1 totaling 36 bp
12  indels of length   2 totaling 24 bp
 5  indels of length   3 totaling 15 bp*
 2  indels of length   4 totaling  8 bp
 2  indels of length   5 totaling 10 bp
 2  indels of length   6 totaling 12 bp*
 1  indels of length   7 totaling  7 bp
 1  indels of length  10 totaling 10 bp
 2  indels of length  11 totaling 22 bp

63  indels of          9 types   144 bp
(* only 7 indels totaling 27 bp are not frame-shifting)
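
The totals in the indel table are easy to check. A minimal sketch, with the length-to-count mapping copied from the table (the name INDELS is made up for illustration):

```python
# indel length (bp) -> number of events, from the table above
INDELS = {1: 36, 2: 12, 3: 5, 4: 2, 5: 2, 6: 2, 7: 1, 10: 1, 11: 2}

events = sum(INDELS.values())
total_bp = sum(length * n for length, n in INDELS.items())
short_events = sum(n for length, n in INDELS.items() if length <= 3)

# Indels whose length is a multiple of 3 would not shift a reading frame.
in_frame = {length: n for length, n in INDELS.items() if length % 3 == 0}
in_frame_events = sum(in_frame.values())
in_frame_bp = sum(length * n for length, n in in_frame.items())

print(f"{events} indels of {len(INDELS)} types, {total_bp} bp")
print(f"{short_events} of {events} events are 3 bp or shorter")
print(f"{in_frame_events} indels totaling {in_frame_bp} bp are not frame-shifting")
```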
 




 



