
Fine Song
Bio 4342
April 2, 2006

Chimp Chunk 2.5 Annotation

Summary

A finished genome sequence has limited practical value unless its key features are identified. We attempted to better understand the process and principles behind characterizing the key features of a genome by annotating chimp chunk 2.5. After creating a repeat-masked FASTA file of our chimp sequence, we used GENSCAN, a gene prediction program, and various BLAST searches to identify features and determine possible functions of the genes in chimp chunk 2.5. The GENSCAN output in table and map format is shown in Figs. 1 and 2, respectively.

Figure 1. GENSCAN output table.
Figure 2. GENSCAN output map.

On completing our annotation, we found two gene-like features:

Feature   Location (bp)   Type of feature   Related function
1         3315-4299       Pseudogene        Heterogeneous Nuclear Ribonucleoprotein K
2         50168-76125     Gene              Apoptosis Inhibitor 5

Table 1. Features of chimp chunk 2.5 and their location, type, and related function.

Feature 1 appears to be a pseudogene derived from the heterogeneous nuclear ribonucleoprotein K gene (HNRPK; human GenBank accession number NM_002140). HNRPK influences pre-mRNA processing and other aspects of mRNA metabolism and transport, and it is thought to play a role in cell cycle progression. Feature 2 is a gene that encodes apoptosis inhibitor 5 (API5, GenBank accession number NM_006595), which prevents programmed cell death.

Our ~83 kb (82,926 bp) chimp chunk has 40.65% GC content, and 44.86% of its bases (37,203 bp) were masked as repeats (Fig. 3). There are ten significant repeat regions, defined here as non-Alu repeats spanning more than 500 bp:

Location (bp)   Length (bp)   Type of repeat
12654-15729     1071          LTR/ERVK
26796-27598     802           LTR/ERV1
31121-31789     668           LTR/ERV1
32774-36323     2006          LINE/L1
35077-36323     1246          LINE/L1
56603-57858     1255          DNA/Mariner
63707-64656     949           LTR/ERV1
71521-72813     1292          LINE/L1
72868-73437     569           LINE/L1
77249-77838     589           LINE/L2

Table 2. Ten significant repeat regions in chimp chunk 2.5.
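The summary statistics above (GC content, masked fraction, and the filter for significant repeats) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline we used: it assumes repeats are masked with `N` (RepeatMasker's default masking character) and that the unmasked sequence contains no `N` of its own, computes GC over unambiguous A/C/G/T bases only, and stores each RepeatMasker hit in a hypothetical `Repeat` record whose class string follows RepeatMasker's class/family style (e.g. `LTR/ERVK`, `SINE/Alu`). The 500 bp cutoff and the Alu exclusion follow the definition of a significant repeat given in the text.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    """One RepeatMasker hit (hypothetical record layout, for illustration only)."""
    start: int
    end: int
    length: int
    repeat_class: str  # class/family string, e.g. "LTR/ERVK" or "SINE/Alu"


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); N and other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def masked_percent(masked_seq: str) -> float:
    """Percent of positions written as 'N' in a repeat-masked sequence.

    Assumes the original sequence had no N characters (e.g. no sequencing gaps).
    """
    if not masked_seq:
        return 0.0
    return 100.0 * masked_seq.upper().count("N") / len(masked_seq)


def significant_repeats(repeats: list[Repeat], min_length: int = 500) -> list[Repeat]:
    """Keep non-Alu repeats longer than min_length bp, as defined in the text."""
    return [
        r for r in repeats
        if r.length > min_length and not r.repeat_class.endswith("/Alu")
    ]


if __name__ == "__main__":
    # Arithmetic check of the reported figure: 37,203 masked bp out of 82,926 bp.
    print(round(100.0 * 37203 / 82926, 2))  # 44.86
    sample = [
        Repeat(26796, 27598, 802, "LTR/ERV1"),  # row from Table 2
        Repeat(72868, 73437, 569, "LINE/L1"),   # row from Table 2
        Repeat(40000, 40300, 300, "SINE/Alu"),  # hypothetical Alu hit, not from our data
    ]
    for r in significant_repeats(sample):
        print(r.start, r.end, r.length, r.repeat_class)
```

Filtering on the length field, rather than recomputing it from the coordinates, keeps the sketch independent of whether a given tool reports end minus start or an inclusive count.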
Figure 3. RepeatMasker summary table.

Feature 1

GENSCAN predicted a two-exon gene between 3315 bp and 4299 bp in chimp chunk 2.5. This two-exon gene fell within the 2-4 kb region, where the Herne output showed alignments with human ESTs. When the GENSCAN-predicted protein was submitted to an NCBI BLASTp search, the results indicated that this feature is related to HNRPK. Looking at the BLAST alignments in detail, however, there was only ~60% amino acid homology between the GENSCAN-predicted protein and the HNRPK amino acid sequence. When the predicted chimp protein sequence was used in a BLAT search of the human genome, the best match, a site on human chromosome 11, had only 95.1% identity, which is low for a chimp-human comparison and made a pseudogene or a paralog more likely than a functional gene. When we used the HNRPK protein in a BLAT search of the human genome, we found a site with 100% identity on human chromosome 9 rather than chromosome 11 (Fig. 4).

Figure 4. Human BLAT results for HNRPK protein.
Figure 5. BLAT browser showing the gene for human HNRPK protein.

The browser shows that the HNRPK gene has 13 exons, far more than the two exons predicted by GENSCAN (Fig. 5). Additionally, when we examined the best match for the HNRPK protein sequence on chromosome 11 (a low 73.6% sequence match), we observed a stop codon (Fig. 6). Both observations support our conclusion that this feature is a pseudogene.

Figure 6. Presence of stop codon in HNRPK match for human chromosome 11.

To estimate the age of this pseudogene, we searched for the HNRPK gene in the mouse. The functional mouse ortholog of HNRPK was on chromosome 13 (100% sequence match), and the ortholog of the pseudogene was on chromosome 7 (98.9% sequence match). No stop codon was observed in the mouse pseudogene ortholog. These data suggest that the stop codon mutation was introduced into the pseudogene after the primate and rodent lineages split.

Feature 2

GENSCAN predicted a 14-exon gene between 12509 bp and 67662 bp.
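Two of the checks used for Feature 1, percent identity of an alignment and the search for an in-frame stop codon, can be sketched as follows. This is an illustrative sketch, not how BLAST or BLAT compute their results: the function names are hypothetical, `percent_identity` assumes two already-aligned, equal-length sequences with `-` for gaps and divides identical columns by all alignment columns (other tools, including BLAT, may compute identity differently), and `premature_stops` assumes the input is a coding sequence already in the correct reading frame, using the three stop codons of the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical columns / all alignment columns * 100; gap columns count as non-identical."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        return 0.0
    matches = sum(
        1 for a, b in zip(aln_a.upper(), aln_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aln_a)


def premature_stops(cds: str) -> list[int]:
    """0-based nucleotide offsets of in-frame stop codons before the last complete codon."""
    seq = cds.upper().replace("U", "T")
    n_codons = len(seq) // 3
    offsets = []
    # The final codon is skipped because a stop is expected there in a functional gene.
    for i in range(n_codons - 1):
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            offsets.append(3 * i)
    return offsets


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACGTTA"))  # 5 of 6 columns identical
    print(premature_stops("ATGAAATAGCCCTAA"))  # [6]: TAG interrupts the reading frame
```

A premature stop found this way is only meaningful if the reading frame is right, which is why the sketch leaves frame selection to the caller rather than guessing it.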
Both the BLAST and human BLAT alignments for this predicted protein were strong: BLASTp gave an E-value of 0.0 and 93% amino acid homology with apoptosis inhibitor 5 (API5), and BLAT gave 99.9% amino acid identity with a site on human chromosome 11. When the human API5 protein sequence was used in a BLAT search of the human genome, the gene identified also had 14 exons. Additionally, a BLAST2 alignment between the chimp chunk fragment 50168-76125 and human API5 showed 98-100% nucleotide homology. However, the distribution of the exons was difficult to annotate, for two main reasons. First, the Herne output shows no human EST matches in the region between 4 kb and 50 kb, which is where three of the predicted exons (11023-11062, 12453-12509, 16506-16660) fall (Fig. 7). Since EST matches usually indicate that a region is conserved and transcribed, the absence of EST matches suggests that these regions are not conserved exons, so the prediction of these three exons may not be reliable. Second, there is an EST-rich region between 68 kb and 76 kb in which GENSCAN predicted no exons (Fig. 8).

Figure 7. Herne output showing no human EST matches to parts of feature 2.
Figure 8. Herne output with significant human EST matches to parts of feature 2.

Considering these observations, the three exons in the EST-free 4-50 kb region may be GENSCAN errors, since GENSCAN correctly predicts only 20-25% of mammalian genes. In the alignment of the predicted protein to the human API5 gene (BLAT search), the region missing from the predicted protein might be found among sequences in the 68-76 kb region (Fig. 9). Most likely the 68-76 kb region corresponds to the terminal exon and/or UTR but was not predicted by GENSCAN because of the gene predictor's limitations. The human API5 gene has a 3' untranslated region (UTR) immediately adjacent to its terminal exon (Fig. 10).
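The first annotation problem for Feature 2 is essentially an interval-overlap test: a predicted exon that overlaps no EST alignment has no transcript support. The sketch below flags such exons. It assumes closed [start, end] coordinates on the chimp chunk, uses hypothetical function names, and, for illustration only, reduces the EST evidence to two coarse blocks (the 2-4 kb and 68-76 kb EST-rich regions described in the text); a real Herne EST track would list many individual alignments.

```python
Interval = tuple[int, int]  # closed [start, end] coordinates on the chimp chunk


def overlaps(a: Interval, b: Interval) -> bool:
    """Closed intervals overlap unless one ends before the other begins."""
    return a[0] <= b[1] and b[0] <= a[1]


def unsupported_exons(exons: list[Interval], est_hits: list[Interval]) -> list[Interval]:
    """Return predicted exons that overlap no EST alignment."""
    return [exon for exon in exons if not any(overlaps(exon, hit) for hit in est_hits)]


if __name__ == "__main__":
    # Three GENSCAN exons for Feature 2 (coordinates from the text).
    exons = [(11023, 11062), (12453, 12509), (16506, 16660)]
    # Simplified EST evidence: coarse blocks standing in for the Herne EST track (assumption).
    est_blocks = [(2000, 4000), (68000, 76000)]
    print(unsupported_exons(exons, est_blocks))  # all three exons lack EST support
```

The same test run in reverse (EST blocks with no overlapping predicted exon) would flag the second problem, the EST-rich 68-76 kb region that GENSCAN left unannotated.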
Since UTRs can have regulatory functions, the high conservation of the 68-76 kb region, as indicated by good-quality EST evidence, is consistent with this interpretation.

Figure 9. BLAT browser showing GENSCAN-predicted gene for Feature 2.
Figure 10. Map of human API5 gene from NCBI.

Overall, since 11 of the 14 exons predicted by GENSCAN in the chimp DNA matched perfectly with human API5 exons, and since the discrepancies that were present occurred in the less



WUSTL BIOL 4342 - Chimp Chunk 2.5 Annotation

Download Chimp Chunk 2.5 Annotation
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Chimp Chunk 2.5 Annotation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Chimp Chunk 2.5 Annotation 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?