
KIAA0168: nearest neighboring gene 3' of prion protein
Prion gene mapped in horse
Human prion nmr structure
Prion rods contain an inert polysaccharide scaffold.
Increased sensitivity to seizures in mice prion knockouts
38k form responsible for rapid anterograde axonal transport
Preprint servers off and running: BMJ makes NIH look stupid

Annotation of unfinished contig fragments

Last updated: 28 Dec 99, webmaster
The sequence neighborhood of a gene of interest can often be enlarged by Blastn(htgs) of its repeatmasked ends, that is, by looking in the unfinished contigs database for new sequence that overlaps at an end. The overlap is often just one fragment in an unassembled contig set (which may contain contaminating vectors). Using various tricks, these fragments can sometimes be assembled (or partly assembled) in the correct order. Then the extension process is iterated.

In the example below, the neighborhood of the prion-doppel genes was enlarged with an overlapping contig, locus AL133354, adding 12,485 new bp (after adjusting for overlap of 2252 bp).
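The extension step can be illustrated with a toy sketch. It assumes an exact suffix/prefix overlap, which is a simplification: the real overlaps were found with Blastn(htgs) and tolerate mismatches. The function names `extend_contig` and `net_new_bases` are hypothetical, and the 14,737 bp fragment length is derived here from the two figures quoted above (12,485 + 2,252), not taken from the GenBank entry.

```python
def extend_contig(known: str, fragment: str, min_overlap: int = 20) -> tuple[str, int]:
    """Append `fragment` to the 3' end of `known` using the longest exact
    suffix/prefix overlap. Returns (merged sequence, overlap length);
    the overlap length is 0 and `known` is unchanged if nothing overlaps."""
    longest = min(len(known), len(fragment))
    for k in range(longest, max(1, min_overlap) - 1, -1):
        if known[-k:] == fragment[:k]:
            return known + fragment[k:], k
    return known, 0


def net_new_bases(fragment_len: int, overlap: int) -> int:
    """New sequence contributed by a fragment once the overlap is discounted."""
    return fragment_len - overlap


# Toy sequences: the last 6 bases of the first match the first 6 of the second.
merged, overlap = extend_contig("AAACCCGGGTTT", "GGGTTTACGT", min_overlap=4)
print(merged, overlap)

# Figures from the text: a 2,252 bp overlap leaves 12,485 bp of new sequence.
print(net_new_bases(12485 + 2252, 2252))
```

Iterating the process means feeding the repeatmasked end of `merged` back into the database search and repeating the merge.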

GenScan predicted six terminal exons for this stretch; these quickly led to a complete cDNA from a capped mRNA survey that coded for an uncharacterized protein. The mRNA, because it lacked numerous large introns, was then long enough to order 3 additional contig fragments. Blastp(nrp) and Blastn(est) showed no further exons were missing -- the match of contig genomic to database mRNA was perfect. This left 17 small contig fragments -- presumably purely intronic -- that could not be assembled. The annotation can still proceed successfully under these circumstances using GeneBander technology.

The gene, known as KIAA0168, has 326 amino acids from 5,426 bp of mRNA. The mRNA nucleotide sequence was somewhat annotated by 10 July 97 even though genomic sequence was not known and exon/intron structure could not be determined.

GenScan found 6 downstream exons extending nearly the full length on the minus strand of the extension, that is, Blastp(nrp) of GenScan output recovered the carboxy terminus of KIAA0168. However, GenScan erred in determining the 3' end of the second exon, resulting in 10 amino acids being dropped. (This might represent alternative splicing but is completely unsupported by ESTs and paralogues, all of which contain the 10 amino acids.) It is important to note that matches are perfect, ie, this genomic stretch on chromosome 20 is the origin of KIAA0168 mRNA and not merely a related gene.

The contig beginning at GenBank position 139,664 of AL133354 contained the 3 middle exons in its 6617 bp and the first contig of the entry had the amino terminus as a single coding exon. This gave 10 coding exons with reading frame phases fixed for adjacent exons within each contig but not across contigs (because some of the unplaced fragments could be located here). It is important to keep track of plus/minus in Blast output and reverse-complement as necessary to build a consistent genomic stretch.
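The reverse-complement bookkeeping can be sketched in a few lines. This is a minimal illustration, not the tool actually used for the annotation; the helper `to_plus_strand` and its "+"/"-" strand labels are assumptions made for the example.

```python
# Translation table for DNA bases; case is preserved so that lowercase
# (e.g. repeatmasked or UTR) stretches stay recognizable, and N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_plus_strand(seq: str, strand: str) -> str:
    """Orient a Blast hit so that every fragment reads on the plus strand."""
    if strand == "+":
        return seq
    if strand == "-":
        return reverse_complement(seq)
    raise ValueError(f"unknown strand label: {strand!r}")


print(reverse_complement("ATGC"))          # GCAT
print(to_plus_strand("ctgaggN", "-"))      # Ncctcag
```

Applying the same orientation rule to every fragment before joining them avoids mixing strands within one assembled stretch.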

The very long 3' UTR of 4,249 bp is part of exon 10. It matches the adjoined contig for 4,319 bp with 100% identity, from 235 to 4553 in the non-overlap numbering. This also establishes a poly A site at position 235 (confirmed by EST starts, wrongly predicted by GenScan), a typical distance of 15 bp downstream of the poly A signal at 250-255. The stop codon is TGA ending at 4483.

The extraordinary length of 3'UTR (4,249 bp) cannot be attributed to retrotransposons -- only a full length 310 bp AluSg occurs 895 bp upstream of the poly A site (positions 4458-4767 of the mRNA). There may be selective pressure operative: only 7% of the 3' UTR is repeatmasked compared to some 45% for the contig overall.
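The coordinates quoted for the 3' UTR can be cross-checked with inclusive interval arithmetic. The sketch below is only a consistency check on numbers already given on this page; the helper `span` is hypothetical, and the final tally assumes that the stop codon is counted separately from the 3' UTR and that the 196 bp leader reported for the GenBank mRNA entry is the full 5' UTR.

```python
def span(start: int, end: int) -> int:
    """Length of an inclusive coordinate range."""
    return end - start + 1


utr3 = span(235, 4483)         # poly A site to the end of the TGA stop codon
match = span(235, 4553)        # 100% identity match to the adjoined contig
alu = span(4458, 4767)         # AluSg, in mRNA coordinates
masked_fraction = alu / utr3   # share of the 3' UTR that is repeat

print(utr3, match, alu)             # 4249 4319 310
print(f"{masked_fraction:.1%}")     # 7.3%

# Tally of the whole mRNA: leader + 326 codons + stop codon + 3' UTR.
mrna_length = 196 + 326 * 3 + 3 + utr3
print(mrna_length)                  # 5426
```

Under those assumptions the pieces add up to the 5,426 bp reported for the mRNA, and the single Alu accounts for the roughly 7% repeatmasked share.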

A 3' UTR of 4,249 bp is bioinformatically unfavorable because ESTs are generally only a few hundred bp in length and begin at the end. EST matches should and do mostly begin far downstream with the poly A site and rarely reach coding exons. Blastn(est) of the final 1000 bp (unmasked) gives 102 matches, 45 from human, mostly retina and brain.

Blastn(est) of masked sequence up to and including the terminal exon shows the distribution of EST mRNAs for this region. Three mRNAs appear to splice out a cryptic intron of length 1287; note the consistent edges, suggesting a good splice site. Thus alternative 3' UTR splicing occurs in about 5% of the mRNAs; the significance is unknown. Interestingly, the spliced-out portion includes the AluSg, though the end points are not close.

236 - 327  1304 - 1697 AI810897  [Blast numbering drops masked Alu should add 310 bp]
236 - 327  1304 - 1626 AI420812  ctgaggCCCTGTCAGA - CAGCATTTTTcttg 
276 - 327  1304 - 1587 AI680109 

The 5' UTR of the GenBank mRNA entry can be tested for contiguity with genomic sequence with Blastn(2) of the 196 bp leader sequence to see where it lies within the contig of exon 1. The match consists of 166-196, meaning the coding part of exon 1 has a 31 bp leader. The 165 ends in an AG splice acceptor. Blastn(htgs) then shows that 89-165 is elsewhere in the contig assembly AL133354, namely contig positions 94426 - 94502 ++ match, a 12,089 bp fragment. The match begins 1318 bp into this contig and continues 77 bp. If this contig is truly adjacent, then a minimum of 3242 bp separates this exon from the coding region.

There is no sign of the first 88 bp, the 5' end of the reported mRNA, in any of the contigs (nor in htgs or nrn). The 17 unplaceable contigs are thus sandwiched between the prion-doppel contig and the extreme 5' end of the KIAA0168 gene -- they cannot contain other genes (unless they lie within an intron of KIAA0168 on the opposite strand, as happened on chr 22). The 88 bp makes a highly desirable search sequence for later extension of the contig as new genomic data comes in.

Structure of the upstream mRNA: unplaced; pure 5' UTR; leading UTR; first coding exon:

  1 ggggaggaag aaaggcgaag gcaaggcgaa ggggtggaga gtgatatgaa gagcgagaga
 61 aaagagagga cagcggacga gcagatccGG TATCTGGAAT CCCGGCGCCT AGAACGTGTT
181 aaaagagaaa ggaagaATGg actacagcca ccaaacgtcc ctagtcccat gtggacaaga
241 taaatacatt tccaa 
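
The three pieces of the 196 bp leader described above (unplaced 1-88, upstream exon 89-165, leader within exon 1 at 166-196) should tile the leader with no gaps or overlaps. A small sketch of that check follows; the function `tiles` and the segment labels are hypothetical names chosen for illustration.

```python
def tiles(segments: list[tuple[str, int, int]], total: int) -> bool:
    """True if the inclusive (label, start, end) segments cover 1..total
    exactly once, in order, with no gaps and no overlaps."""
    expected_start = 1
    for _label, start, end in sorted(segments, key=lambda s: s[1]):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == total + 1


leader = [
    ("unplaced", 1, 88),
    ("upstream 5' UTR exon", 89, 165),
    ("leader in exon 1", 166, 196),
]

for label, start, end in leader:
    print(f"{label}: {end - start + 1} bp")

print(tiles(leader, 196))

# The genomic match for the upstream exon should be the same length (77 bp).
print(94502 - 94426 + 1)
```

The segment lengths come out as 88, 77 and 31 bp, agreeing with the figures in the text.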

Because KIAA0168 is the immediate neighbor of the prion-doppel genes and because two new diseases map very close by (Parkinson-like dementia and childhood epithelial blindness), it is important to look into theoretical properties of the coded protein.

The protein appears targeted to the nucleus; it is neither membrane-associated nor on the cell surface like prion protein. The amino terminus has a limited amount of alpha helix, the carboxy terminus a lot. The only known homology domain (KIAA0168 positions 173-316) is ras effector, confirmed both at ProtSite/Profam and by direct Blastp(nrp) to 8 homologues [identities 44/147 (29%), positives 75/147 (50%), gaps 10/147 (6%)]. This is a significant bit of information for function; however, the 3D structure has not been determined for any related protein. The domain boundaries do not match exon boundaries.

The first 172 residues match only unfinished genomic material from Drosophila chromosome 3 [AC008200; 18/50 (36%) identical, 35/50 (70%) similar to residues 28-77]. This inferred protein also matches KIAA0168 fairly well across 221-307 in the ras effector homology domain. The Drosophila gene lacks introns 4-10 relative to human. (The human protein assembles 2 of 70 unfinished contigs in Drosophila.)

Protein-coding exons: Drosophila genomic match; human ras effector homology domain; missed by GenScan

Within humans, there is little evidence for closely related superfamily members with tBlastn of finished and unfinished human sequence. However, tBlastn(est) of full length KIAA0168 turns up a dozen human paralogous ESTs in this region, notably AW197500 has 119/189 (62%) identity, 141/189 (73%) similarity and AI890191, 54% identity and 66% similarity.

In other species, Blastp(nrp) turns up distant ras effector domain homologues in rat (AF002251) and mouse (AF132851). tBlastn(est) turns up a remarkable set of zebrafish mRNAs [AI958150 = fc91d04.y1, AI958155 = fc91d10.y1, and AA497369 = fa04h12.r1] tileable for 246 amino acids, which most likely represent the orthologous protein in this species. Unfortunately these ESTs have not been mapped in zebrafish.


           MQDD+ERI+PPPSS+SWHSGCNL  Q    +   P T P+++++    PPE      +T 


This proves once again that the EST database is far more useful for gene-finding than nrn or nrp: KIAA0168 is part of an extensive family of related human proteins. Unfortunately none of these matches are meaningfully annotated. If zebrafish genomic sequence were available, the prion genes might very well be nearby.

The addition of this stretch to previous bidirectional extensions of the prion-doppel region brings the total to some 296,426 bp. However, this is still not enough to unambiguously determine the orientation relative to the telomere-centromere axis. KIAA0168 was not precisely located on radiation hybrid panels. Note how helpful Blastn(sts) is in determining which sequence tagged sites a given contig contains:

  1 tcacttgagc ccaggagttc aagaccagcc tgggcagcag ggtgaaaccc tgtnctctac
 61 caaacaaaca aacaaaantt agctgggtgt ggtggttcat gcctatagtc cctgttactg
121 ggtaggctga ggtgggagga aggcttgaac ctgggaggca gaggttgcag tgagttcata
181 tggctatact ccagtggtgt gcactggagt tgcctgggtg acagaaagaa agaaagaagt
241 gaaagaaaga aagaaagaaa gaaaganaga aagaaagaaa gaaagaaaga aaganagaaa
301 gaaaganaga

L29932 HUMUT235A STS UT235 chr 20
WI-7784  prion gene
WI-3651 psIPP
 LocusID:  9770  UniGene: Hs.80905   Map D79990 
Chr. 20  WI-12264     [This is name for 1423-1860 of KIAA0168 mRNA]
Chr. 20  stSG35837  
Chr. 20  stSG25710  
Chr. 20  SHGC-34960    [Positions 4999-5404 of KIAA0168 mRNA]
Chr. 20  STS EST405788 [Positions 5052-5404 of KIAA0168 mRNA]

 stSG39181    BMP2
 WIAF-730    CHGB   chromogranin B (secretogranin 1)
  stSG20076  KIAA0168 gene product
  stSG25710  KIAA0168 gene product
  SGC30394   CHGB   chromogranin B (secretogranin 1)
 WI-12264    KIAA0168
 SGC34960    KIAA0168 gene product
 stSG42745   PCNA    proliferating cell nuclear antigen
 stSG10925   CHGB   chromogranin B (secretogranin 1)
D20S116            did htgs -
..WI-3772        [STS for dJ1115K8 in shotgun ] did htgs + on  AL121781.8 HSJ1164C1
D20S97           [not in dJ1068H6] did htgs+ on  AL121916.8  HSJ189G13  104195-104637 
D20S482          [now called GATA51D03, not in dJ1068H6]: did htgs + on AL121781.8 HSJ1164C1 
..WI-2640        [D20S500, could match dJ1068H6 but does not]  did htgs + on  AL121781.8 HSJ1164C1  
..WI-3651        [D20S1095 or  G13331. IDI isomerase pseudogene dJ1068H6.01598 11694-11887 minus  NT_002559 
    showing on both  AL133354   HSJ1187J4 
..GDB:513003     [prion est dJ1068H6.01172  43200-45527]
WI-7784          [D20S1014, prion gene CDS dJ1068H6.01172  43966-45571] 
..g29963         [prion est dJ1068H6.01172 45222-45562; 27468-27808 of human prion U29185]
..RPS4X          [mRNA for RPS4X matches and overhangs the 5' end of the contig dJ1068H6]
..FB25H5         [T03153, does not match dJ1068H6, nearest known marker 5' of this region, fetal brain mRNA] weak too filtered some hits on right things.
D20S895          [not in dJ1068H6] |AL121890.1  and HS1116H23  
..WI-4689        [D20S751] nothing
D20S849          [not in dJ1068H6]
D20S882          [not in dJ1068H6]
..D20S95         [not in dJ1068H6]
Two genes have been annotated nearby, however, in clone 681N20 on chromosome 20p12.1-13: FTLL1 (ferritin, light polypeptide-like 1) and goliath protein. At some point, extensions of the prion-doppel region will bump into these.

Prediction of the coding sequences of unidentified human genes. V. The coding sequences of 40 new genes (KIAA0161-KIAA0200) deduced by analysis of cDNA clones.

Nagase T, Seki N, Ishikawa K, Tanaka A, Nomura N
DNA Res 1996 Feb 29;3(1):17-24 
As part of our continuing efforts to accumulate information on the coding region of unidentified human genes, we newly determined the sequences of 40 cDNA clones of human cell line KG-1 which correspond to relatively long and nearly full-length transcripts, and predicted the coding sequences of the corresponding genes, named KIAA0161 to 0200. The average size of the cDNA clones analyzed was approximately 5.0 kb. A computer search of the sequences in public databases indicated that the sequences of 20 genes were unrelated to any reported genes, while the remaining 20 genes carried sequences which show some similarities to known genes. ... Northern hybridization analysis demonstrated that 10 genes are expressed in a cell- or tissue-specific manner.

Comment (webmaster): There are 2 other inherited diseases known to map right on top of the prion-doppel region on chr 20. One is a childhood corneal blindness and the other is a Parkinson-like dementia. While the mutations might not lie in either the prion or doppel gene, finding them is a spin-off of gene discovery in this region.

Three apparent new genes (or gene pieces) have preliminary support within the upstream 187,000 bp contig, all on the 3' end. A single high quality mRNA apparently has to do with an ultra-high sulfur keratin. Another single long high quality EST, about which nothing is known, has a translation as a 90 amino acid entity, QKILHGFKLKIAMLILLKFSFQQCFLAFKHFSNLFKCLQNLTVKSCTHSKLHSVIASLPK IDNTKLLHEICFYKTSQELPAPLAEGY-

Distances to next gene, with the corresponding figure after 45% repeat-masking:
106,461 bp to ATG of prion        47,907 bp
 25,331 bp to ATG of doppel       11,399 bp
 38,620 bp to start of dynein     17,379 bp
 16,617 bp to end of contig        7,478 bp
 45,000 bp to end of KIAA0168     20,250 bp
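
The five repeat-masking figures above can be reproduced by taking 45% of each distance, rounded to the nearest bp. The sketch below only demonstrates that arithmetic; it does not settle whether the figures are meant as masked or unmasked sequence (the unmasked remainder would be the other 55%). The dictionary keys and the helper `scale_percent` are labels chosen for this example.

```python
distances_bp = {
    "ATG of prion": 106461,
    "ATG of doppel": 25331,
    "start of dynein": 38620,
    "end of contig": 16617,
    "end of KIAA0168": 45000,
}


def scale_percent(length_bp: int, percent: int) -> int:
    """Return `percent` % of a length, rounded half up, using integer math
    so that results do not depend on floating-point representation."""
    return (length_bp * percent + 50) // 100


scaled = {name: scale_percent(bp, 45) for name, bp in distances_bp.items()}

for name, bp in distances_bp.items():
    print(f"{bp:>8,} bp to {name:<18} -> {scaled[name]:>7,} bp")
```

Integer arithmetic is used because values such as 11,398.95 sit close to a rounding boundary.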

Prion gene mapped in horse

27 Dec 99 Genome Research  
The prion gene was used as one of the markers in constructing a gene map of the horse. The article does not discuss the prion marker in the text but shows it mapped in table 3 to horse chromosome 22. The bottom line is that this region is syntenous over its full length with human chromosome 20p, ie, little has happened since lineage divergence by way of duplicative transpositions on this chromosome.

And the stability of the prion-containing region of chromosome 20 seems to hold largely back to zebrafish linkage groups 17 and 20. However, recent fine-structure mapping on chr 21 and 3 shows more has happened to other human chromosomes.

This means, since KIAA0168 mRNA from zebrafish is available as primer, the prion gene in zebrafish could readily be located to a zebrafish contig library and sequenced, even if it has changed too much to be primed directly.

A Comparative Gene Map of the Horse (Equus caballus)

Genome Res. 1999 9(12): p. 1239-1249
Alexandre R. Caetano, Yow-Ling Shiue, ... Ann T. Bowling, and James D. Murray
Human chr 20 genes mapping to horse chr 22

20p12    PRNP  prion protein
20q11.2  GHRH  growth hormone releasing hormone
20q11.2  ASP   agouti signaling protein
20q13.11 ADA   adenosine deaminase
20q13.2  GNAS1 guanine nucleotide-binding protein

Solution Structure of the Human Prion Protein

 Proc.Nat.Acad.Sci.USA 97 pp. 145 (2000)   [Jan 4 issue]
 Zahn, R., Liu, A., Luhrs, T., Calzolai, L., ... Billeter, M., Wuthrich, K.:
17-SEP-99   pdb accession numbers:
 1QLX  long
 1QM0  90-230
 1QM2 121-230
Comment (webmaster): There is nothing of interest in this paper other than some loop mobility observations. There was no progress towards using native protein: in vivo covalent modifications and copper are missing. The lab talked both about this structure and those of mutants at a Dec 1996 meeting. In any event, SwissProt already published the structure of human prion many months ago, threading it from mouse and hamster. Protein folds don't even begin to change until 15% sequence identity whereas this situation is 90% identity.

At the minimum they could have exerted themselves a little to report on codon 129 valine or determine doppel. One sees that they are just going to slowly dribble out a large number of inconsequential little articles on mutants [which could just as easily be created within SwissProtViewer and optimized for rotamer]. We already know that these structures will not illuminate why the mutants are more susceptible to conformational change in vivo.

In webmaster's opinion, this paper represents an irresponsible waste of precious research money on a very serious disease and a cynical misuse of PNAS posting privileges.

Preprint servers off and running

PubMed Central
 Clinical Medicine and Health BMJ's NetPrint server:
The so-called PubMed Central preprint server is off and running. The editor of PNAS called it 'power to the people' in the 4 Jan 00 issue. Yet the reality is not a preprint server at all in the sense of astronomy or physics. The overwhelming focus of the NIH proposal is protection of foreign journal corporations and lucrative professional societies such as the Massachusetts Medical Society [NEJM] and AMA [JAMA]. Meanwhile the British Medical Journal and Stanford's HighWire Press have made NIH look like complete fools [below].

Why does NIH, which basically controls and enables all biomedical article creation with public tax dollars, have to get down on its knees and grovel before journal owners to get back articles that NIH paid for and never should have given away in the first place? [Through page charges, NIH actually pays journals to haul off their articles.] When did a journal ever fund research or pay for an article? A pathetic preprint server:

"PubMed Central is the barrier-free NIH repository for peer-reviewed primary research reports in the life sciences. All material in PubMed Central will be contributed by journals currently indexed by one of the major abstracting and indexing services, or journals that have three or more research-grant holders from major funding agencies on their editorial boards [ie, only establishment players] ... Copyright will reside with the submitting groups."

But then they leave the door open a little:

"The non-peer-reviewed reports will also enter PubMed Central through independent organizations, which will be responsible for screening this material. Many of the non-peer-reviewed reports will be "preprints," both deposited in PubMed Central and subjected to formal peer review by journal editorial boards. In other cases, these reports may never be submitted to a journal for traditional peer review, yet will be deposited in PubMed Central because, in the judgment of the screening organization, they provide valuable data to the research community. "

How is screening organization defined? Any journal, no matter how mediocre, that gets indexed and "any organization with at least three members who are principal investigators on research grants from major funding agencies and foundations." Thus we wait around for 3 people that want a real preprint server to form a screening organization or an existing internet journal to announce that it will routinely pass through preprints.

Meanwhile, BMJ already has started its clinical medicine preprint server. This is a fun site: visitors have to read and accept a scary nuclear-bomb-shelter warning that they are about to enter a non peer-reviewed sector:

"This week we launch an electronic archive where authors can post their research into clinical medicine and health before, during, or after peer review by other agencies. Resulting from a collaboration between the BMJ Publishing Group and Stanford University Libraries, it will allow researchers to share their findings in full, for free, and as soon as their studies are complete.

Articles will be screened for breaches of confidentiality and libel before we post them. After posting authors may submit them to any peer reviewed journal that will accept submissions that have appeared as electronic preprints. The list of such journals extends far beyond those of the BMJ Publishing Group and is growing daily (see box). Researchers who have retained the right to post their research results after publication in a peer reviewed journal can archive their articles here rather than on possibly more ephemeral institutional or personal websites.

Journals accepting submissions that have appeared on preprint servers:

          American Journal of Botany 
          Biophysical Journal 
          British Journal of Ophthalmology 
          European Journal of Public Health 
          Journal of Accident and Emergency Medicine 
          Journal of Biological Chemistry 
          Journal of Clinical Pathology 
          Journal of Cognitive Neuroscience 
          Journal of Epidemiology and Community Health 
          Journal of Medical Genetics 
          Journal of Medical Screening 
          Molecular Biology of the Cell 
          Molecular Pathology 
          Nature Medicine 
          Neural Computation 
          Occupational and Environmental Medicine 
          Pre-Hospital Immediate Care 
          Proceedings of the National Academy of Sciences 
          Quality in Health Care 
          Tobacco Control (12 month trial)

A novel cellular prion protein isoform present in rapid anterograde axonal transport.

Neuroreport 1999 Nov 26;10(17):3639-4
Rodolfo K, Hassig R, Moya KL, Frobert Y, Grassi J, Di Giamberardino 
We studied the axonal transport of PrP(C) in hamster retinal and sciatic nerve axons. Our results show that a novel 38kDa form is the predominant form in rapid anterograde axonal transport while the 36k and 33k PrP(C) forms, abundant in nerve and brain, appear to be either stationary or slowly transported.

We did not detect any significant retrograde transport of PrP(C). These results show that 38k PrP(C) is the form exported from the cell body to the axonal compartment where it may represent the precursor to the more abundant PrP(C) forms after its modification in nerve fibres or terminals.

Comment (webmaster): It is not clear from the abstract whether they did any work to characterize the 38k band. This could represent a significant development in understanding of disease transmission. It is not feasible to have a longer prion peptide via alternative exon use. However, different lipid or polysaccharide could be associated.

There is an apparent conflict with earlier work, in which retrograde spread was reported:

J Gen Virol 1996 Aug;77 ( Pt 8):1925-34
J Gen Virol 1992 Jul;73 ( Pt 7):1637-44
Ciba Found Symp 1988;135:24-36

Prion rods contain an inert polysaccharide scaffold.

Biol Chem 1999 Nov;380(11):1295-306
 Appel TR, Dumpitak C, Matthiesen U, Riesner D
A polysaccharide consisting of mainly 1,4-linked glucose units was found associated with prion rods, which are composed mainly of insoluble aggregates of the N-terminally truncated prion protein (PrP 27-30) exhibiting the ultrastructural and tinctorial properties of amyloid. The polysaccharide differs in composition from the Asn-linked oligosaccharides and the GPI-anchor of the prion protein.

Prion rods were prepared from scrapie-infected hamster brains using two different purification protocols. Prolonged digestion of rods with proteinase K reduced PrP by a factor of at least 500, leaving about 10% (w/w) of the sample as an insoluble remnant. Only glucose was obtained by acid hydrolysis of the remnant and methylation analysis showed 80% 1,4-, 15% 1,6- and 5% 1,4,6-linked glucose units. The physical and chemical properties as well as the absence of terminal glucose units indicate a very high molecular mass of the polysaccharide. No evidence was found for covalent bonds between PrP and the polysaccharide. The polysaccharide certainly contributes to the unusual chemical and physical stability of prion rods, acting like a scaffold. A potential structural and/or functional relevance of the polysaccharide scaffold is discussed.

Increased sensitivity to seizures in mice lacking cellular prion protein.

Epilepsia 1999 Dec;40(12):1679-82 
Walz R, Amaral OB, Rockenbach IC, Roesler R, Izquierdo I, Cavalheiro EA, Martins VR, Brentani RR
The physiologic role of the cellular prion protein (PrPc) is unknown. Mice devoid of PrPc develop normally and show only minor deficits. However, electrophysiologic and histologic alterations found in these mice suggest a possible role for PrPc in seizure threshold and/or epilepsy.

We tested the sensitivity of PrPc knockout mice to seizures induced by single convulsant or repeated subconvulsant (kindling) doses of pentylenetetrazol (PTZ), and to status epilepticus (SE) induced by kainic acid or pilocarpine.

In PTZ kindling, seizure severity progressed faster in the PrPc knockout group, in which 92.8% reached stage 5 or death after 4 days of stimulation, as opposed to 38.4% in wild-type animals. After 10 injections, mortality was 85.7% among knockouts and 15.3% among controls. After a single PTZ injection (60 mg/kg), overall mortality due to seizures was 91% in knockout mice, but only 33% among wild-type animals. Pilocarpine-induced SE (320 mg/kg) caused an 86.7% mortality in knockouts, as opposed to 40% in wild-type animals. Finally, after kainic acid injections (10 mg/kg), 70% of the knockouts developed at least one severe seizure, and 50% showed repetitive seizures, whereas no wild-type animal exhibited observable seizures.

Animals lacking cellular prion protein expression are more susceptible to seizures induced by various convulsant agents. This is perhaps the most striking alteration yet found in PrPc-null mice, who at first analysis appeared to be completely normal. A possible role for PrPc in chronic and idiopathic (familial), secondary, or cryptogenic epilepsies in humans remains to be investigated.

Comment (webmaster): The abstract does not make clear which kind of knockout was used, ie, whether doppel was affected. It is very important to describe the detailed genetic structure of the final knockout mice for results to be interpretable.
