WUSTL BIOL 4342 - Annotating the D. virilis Fourth Chromosome


Sonal Singhal
3 May 2006
Bio 4342W

Annotating the D. virilis Fourth Chromosome: Fosmid 99M21

Abstract

In this project, I annotated a segment of the D. virilis fourth chromosome (fosmid 99M21) by considering genes, repeat structure, synteny, and conserved coding and non-coding regions, drawing on multiple tools and databases. I describe two partial but likely functional genes: one shows similarity to D. melanogaster toy and the other to D. melanogaster cathepsin-L. My repeat analysis increased the overall percentage of repeats by nearly 10% and found four possible novel repeats. The synteny of this fosmid suggests that the D. virilis dot chromosome might share genetic material with D. melanogaster chromosome 2. Finally, ClustalW analysis helped identify a putative promoter for toy and showed the remarkable conservation of cathepsin-L. My final annotation is shown in Figure 1.

Introduction

Sequencing a genome merely provides the order of base pairs on a DNA strand; an appropriate analogy is that it provides a code that we must decipher before it makes sense. Annotation can be seen as the process of decoding a genome, because it extracts important biological information by analyzing the functional elements in a genome. In this class, we are annotating the fourth and largely euchromatic chromosome of Drosophila virilis so that we can compare it to the already-annotated fourth and largely heterochromatic chromosome of Drosophila melanogaster. By specifically considering chromosome-wide changes in repeat density and distribution, synteny, and gene organization, we hope to better understand how heterochromatin forms. In this paper, I discuss my contribution to this project: the annotation of a fosmid (99M21) containing sequence from the fourth chromosome of Drosophila virilis. This fosmid is approximately 37 kb in size and has a G/C content of 38%.
With respect to my fosmid, I will discuss (1) identified genes, (2) repeat structure, (3) synteny with D. melanogaster, and (4) conservation of genic and non-genic regions.

Gene Finding

Method

I used the same basic procedure to annotate all genes in fosmid 99M21. After applying RepeatMasker, I used the ab initio gene finder Genscan to identify all possible coding features in the fosmid. As shown in Figure 2, Genscan predicted three features in my fosmid, and I handled each feature separately. To find a possible D. melanogaster homolog for each prospective gene, I used Blat to search the D. melanogaster genome and blastx to search the RefSeq database, looking for matches to my predicted coding sequence (CDS). Through these searches, I was able to find the putative homolog for the feature in question, and I also determined what portion of the putative homolog was encoded by my fosmid. I then found the putative homolog in Ensembl and used the transcript information available through this website to build a gene model. A gene model describes the spacing, number, and length of exons for the gene in question. For this project, when multiple transcripts were available, we used the transcript that contains the most genetic information (i.e., the one that encodes the most amino acids).

[Figure 1: Final annotation of Fosmid 99M21. Scale from 1 to 37; legend: repeats, coding exons, toy homolog, cathepsin-L homolog.]

After determining the gene model, I attempted to characterize each exon in the model separately. To determine exon boundaries, I used bl2seq (tblastn) to search the fosmid sequence with the amino acid sequence of each exon. In order to maximize matches, all searches were run without the "low complexity filter" and with an expect value of 1000.
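The gene-model step described above (choosing the transcript with the most amino acids, then recording the number, length, and spacing of its exons) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, the coordinates are invented rather than taken from Ensembl, and the exon coordinates are assumed to cover only coding sequence, stop codon included, on the plus strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


@dataclass(frozen=True)
class Transcript:
    name: str
    exons: tuple  # coding exons in transcription order

    @property
    def amino_acids(self) -> int:
        # Total coding length in codons, minus one for the stop codon
        # (assumes the stop codon is included in the last exon).
        cds_length = sum(exon.length for exon in self.exons)
        return cds_length // 3 - 1


def pick_longest(transcripts):
    """Return the transcript that encodes the most amino acids."""
    return max(transcripts, key=lambda t: t.amino_acids)


def gene_model(transcript):
    """Describe the number, length, and spacing of a transcript's exons."""
    exons = transcript.exons
    model = []
    for i, exon in enumerate(exons):
        if i + 1 < len(exons):
            intron_after = exons[i + 1].start - exon.end - 1
        else:
            intron_after = None  # no intron follows the last exon
        model.append(
            {"exon": i + 1, "length": exon.length, "intron_after": intron_after}
        )
    return model


# Two made-up isoforms of a hypothetical gene (not real Ensembl data).
isoform_a = Transcript("isoform_A", (Exon(100, 249), Exon(400, 549)))
isoform_b = Transcript(
    "isoform_B", (Exon(100, 249), Exon(400, 549), Exon(700, 999))
)

chosen = pick_longest([isoform_a, isoform_b])
for row in gene_model(chosen):
    print(row)
```

Ranking by amino acid count rather than by transcript length keeps untranslated regions from influencing the choice, which matches the selection rule stated in the method.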
Generally, the results from these searches described the exon boundaries well, providing the coordinates of both the start and the end of each exon. I then did a first-pass check to determine whether the exons were bounded, as appropriate, by start and stop codons and by splice acceptor and donor sites. I modified my descriptions as necessary to conform to these rules without significantly changing the peptide that would result. Once my initial characterization of the exons had been confirmed, I used Wilson Leung's program "Annotation Check" to do a more thorough and reliable check of my annotation. If my annotation passed the check, I used bl2seq (blastp) to compare the polypeptide predicted by the concatenated exons to the homologous polypeptide from D. melanogaster. If I saw any drastic deviations, I re-evaluated my annotation as necessary, in particular ensuring that exon boundaries and exon phases had been accurately defined. Here, exon phase refers to how the reading frame of each exon compares to the gene as a whole. Finally, because students in previous classes had already annotated both of my genes, I compared my annotation to the earlier class annotation.

[Figure 2: Genscan output for Fosmid 99M21.]

Feature 1

Feature 1 consists of two exons and is on the minus strand of the fosmid. Genscan predicted an initial exon, but it did not predict a terminal exon. Thus, my initial suspicion was that the fosmid contains only a partial 5' region of the gene. I used blastp to search the complete coding sequence (CDS) database with the peptide predicted by Genscan. Blastp identified a conserved homeodomain in the peptide, suggesting that this gene encodes a transcription factor. Indeed, the search shows that the predicted peptide has high homology to the product of the toy gene from D. melanogaster, which encodes a transcription factor that is similar to eyeless. toy is located on the fourth chromosome of D. melanogaster.

Developed with D. melanogaster transcript information from Ensembl, the gene model for toy is unambiguous: there is only one characterized splice form of toy mRNA, which consists of seven exons (Figure 3). Using bl2seq (tblastn), I searched my masked fosmid sequence with the peptide encoded by each individual exon. Doing so confirmed that my fosmid contains the first two exons of the toy gene (Figure 4, Table 1). As determined by the Annotation Check program, these predicted exons are corroborated by the presence of appropriate splice acceptor and donor sites. Further, the final peptide shares very high homology
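The first-pass exon check described in the method (start and stop codons, splice donor and acceptor sites, exon phase) can be illustrated with a short Python sketch. It is not the Annotation Check program, whose internals are not described here. The function names and the test sequence are hypothetical, and the sketch assumes canonical GT donor and AG acceptor dinucleotides, 1-based inclusive coordinates, and a sequence already oriented on the gene's strand (a minus-strand gene would be reverse-complemented first). Phase is defined here as the number of coding bases upstream of an exon modulo 3, which is one possible convention.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def exon_phases(exons):
    """For each exon, return (coding bases upstream of it) % 3."""
    phases, upstream = [], 0
    for start, end in exons:
        phases.append(upstream % 3)
        upstream += end - start + 1
    return phases


def check_exons(seq, exons, has_start=True, has_stop=True):
    """Return a list of problems found in a set of exon coordinates.

    seq       -- uppercase genomic sequence on the gene's strand
    exons     -- (start, end) pairs, 1-based inclusive, in transcription order
    has_start -- False if the first listed exon is not the gene's first exon
    has_stop  -- False if the last listed exon is not the gene's last exon
    """
    problems = []
    first_start = exons[0][0]
    last_end = exons[-1][1]
    if has_start and seq[first_start - 1:first_start + 2] != "ATG":
        problems.append("first exon does not begin with ATG")
    if has_stop and seq[last_end - 3:last_end] not in STOP_CODONS:
        problems.append("last exon does not end with a stop codon")
    for i, (start, end) in enumerate(exons):
        # A partial gene still needs splice sites at its truncated ends.
        needs_donor = i < len(exons) - 1 or not has_stop
        needs_acceptor = i > 0 or not has_start
        if needs_donor and seq[end:end + 2] != "GT":
            problems.append(f"exon {i + 1}: no GT donor after position {end}")
        if needs_acceptor and seq[start - 3:start - 1] != "AG":
            problems.append(f"exon {i + 1}: no AG acceptor before position {start}")
    if has_start and has_stop:
        total = sum(end - start + 1 for start, end in exons)
        if total % 3 != 0:
            problems.append("total coding length is not a multiple of 3")
    return problems


# Made-up sequence: flank, exon 1, intron (GT...AG), exon 2, flank.
seq = "CC" + "ATGGCAG" + "GTAAGTTTTTAG" + "CTAAATGA" + "CC"
good = [(3, 9), (22, 29)]
bad = [(3, 8), (22, 29)]  # exon 1 ends one base too early

print(check_exons(seq, good), exon_phases(good))
print(check_exons(seq, bad))
```

The `has_start` and `has_stop` flags exist because a fosmid may hold only part of a gene, as with the two toy exons here; in that case the last listed exon should still be followed by a donor site rather than end in a stop codon.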

