XAAA113 Fosmid Annotation
Andrew Nett
5/7/04

OVERVIEW

The XAAA113 fosmid contains only a partial fragment of one gene. Three discernible exons of this gene, a homologue of the D. melanogaster gene CG2052, are present on the littoralis fosmid. The first exon of this gene, if it exists, encodes a UTR, as does the first exon of the D. melanogaster gene. The second, third, and fourth exons of the littoralis gene lie at bases 10337-9482, 5313-5236, and 507-223, respectively. Clustal analysis and blastn alignments of multiple Drosophila species suggest that a conserved promoter region containing a TATA box exists at roughly bases 11900-11986 of the littoralis fosmid. This promoter sequence does not seem to be specific to the CG2052 gene. XAAA113 also contains 57 repetitive elements, including two classes of potentially novel repeats. The fosmid is syntenic with the fourth chromosome of D. melanogaster through at least the first 16,000 bases of XAAA113. Divergence from chromosome four sequence beyond that point may stem from a 40 kb region devoid of genes, although one cannot rule out that the remaining fosmid sequence aligns better to a different D. melanogaster chromosome.

XAAA113 GENES

Preliminary gene exploration

Fig 1. Genescan predictions, XAAA113 fosmid.

Genescan predicts that the XAAA113 fosmid encodes two genes (Figure 1). The first is a single-exon gene beginning at base 486 and ending at base 222, with a length of 265 bases. The second predicted gene has nine exons located throughout the contig, spanning from base 31045 to base 2408 (Table 1). Despite this prediction, only one gene actually exists on the contig, and its exons overlap predicted exons of both Genescan genes.

Evidence for this hypothesis begins with a blastx query of unmasked XAAA113 against a Drosophila melanogaster protein database (Figure 2). This search yields matches to only one region of the littoralis fosmid, with the best hit being the protein CG2052-PB (Accession #24638609).

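The feature coordinates quoted above are listed from high to low, which presumably reflects a gene on the minus strand (an assumption; the text does not state the strand). A minimal Python sketch for checking inclusive feature lengths from such coordinates follows. The helper name `span_length` is hypothetical, and the coordinates are simply those quoted in the text.

```python
def span_length(start: int, end: int) -> int:
    """Inclusive length of a feature from 1-based coordinates given in either order."""
    return abs(start - end) + 1


# Coordinates as quoted in the text (listed high to low).
genescan_single_exon = (486, 222)
littoralis_exons = {
    "exon 2": (10337, 9482),
    "exon 3": (5313, 5236),
    "exon 4": (507, 223),
}

# The Genescan single-exon prediction: the text reports 265 bases.
print("Genescan single exon:", span_length(*genescan_single_exon))

# Lengths of the three annotated littoralis exons.
for name, (start, end) in littoralis_exons.items():
    print(name, span_length(start, end))
```

The first value reproduces the 265-base length given for the single-exon Genescan prediction, which suggests the coordinates in the text are inclusive.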
CG2052, which has nine exons, contains a C2H2-type zinc finger domain. It has a role in transcriptional regulation and may bind to DNA promoter or enhancer regions. Initial blastx alignment of the contig to this 1,097 aa protein occurs at bases 217-510 with 86% identity (matching aa 342-349) and at bases 9458-9595 with 73% identity (matching aa 279-324). A blastn query of unmasked XAAA113 against the D. melanogaster EST database results in a match that overlaps the CG2052 alignment. The EST specified as GH06573. complete AY58304 aligns to fosmid bases 224-384 (84% identity) and to bases 5236-5318 (89% identity).

Table 1. Genescan predictions.

Fig 2. Blastx alignment of CG2052 (aa 342-349) to XAAA113.

Additionally, a blat alignment places the first 25,000 bases of the XAAA113 fosmid in a region of the fourth D. melanogaster chromosome that contains the CG2052 gene (Figure 3).

Fig 3. Blat alignment of XAAA113 (bases 1-25000) to D. melanogaster chromosome 4.

These alignment results suggest that the littoralis fosmid may contain CG2052, but further analysis is necessary to determine whether the littoralis sequences similar to CG2052 are part of a gene or a pseudogene. The blat alignment (Figure 3) and an unfiltered tblastn blast2 alignment of XAAA113 against the CG2052 amino acid sequence (accession #45551180) establish that the fosmid contains at most a partial fragment of the CG2052 gene, since the end of the contig falls in the middle of the potential gene. Furthermore, the blast2 query finds no sequence encoding the first 51 aa of the protein. Alignment occurs with only 36% identity to aa 52-377 of CG2052 at fosmid bases 10351-9458 (Figures 4,5). CG2052 aa 369-399 align to bases 5316-5224 (90% identity), and aa 395-492 align to bases 510-217 (86% identity).

Fig 5. XAAA113 alignment to CG2052 (36% identity).

A targeted blast2 query (blastx and tblastn) of only the first 51 aa of CG2052 against the unmasked littoralis contig yields no results.
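Because a translated alignment consumes three bases per residue, the nucleotide and amino acid spans reported for the tblastn hits can be cross-checked arithmetically. The sketch below is only an illustration under that assumption. The function name `codons_vs_residues` is hypothetical, and a mismatch would be expected wherever the alignment contains gaps.

```python
def codons_vs_residues(nt_span, aa_span):
    """Return (whole codons in the nucleotide span, residues in the protein span).

    Both spans are inclusive 1-based coordinate pairs; the nucleotide
    pair may be given in either order (e.g. high-to-low for a minus-strand hit).
    """
    nt_length = abs(nt_span[0] - nt_span[1]) + 1
    codons = nt_length // 3  # any leftover bases are ignored
    residues = abs(aa_span[0] - aa_span[1]) + 1
    return codons, residues


# tblastn hits as quoted in the text: (fosmid bases, CG2052 residues).
hits = [
    ((10351, 9458), (52, 377)),
    ((5316, 5224), (369, 399)),
    ((510, 217), (395, 492)),
]

for nt_span, aa_span in hits:
    codons, residues = codons_vs_residues(nt_span, aa_span)
    print(nt_span, aa_span, "codons:", codons, "residues:", residues,
          "difference:", residues - codons)
```

For the two well-conserved hits the codon and residue counts agree, while the 36% identity hit differs, which would be consistent with gaps in that weaker alignment.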
Alignment does not occur even with the expect (E-value) threshold raised to 1,000,000. Multiple sequence alignment, however, does show relative conservation of a sequence less than 51 aa long that lies upstream of the regions of similarity revealed by the above blast2 query (Figure 6).

Fig 4. Tblastn blast2 alignment of CG2052 and XAAA113.

Clustal analysis

If the alignment of XAAA113 bases 10351-9458 to CG2052 is taken as a potential, though poorly conserved, exon, Clustal analysis of the surrounding fosmid may uncover a nearby upstream exon short enough to escape blast2 alignments of CG2052 and XAAA113. Indeed, blastn searches of an XAAA113 extract (bases 8500-12500) yield matches to D. melanogaster, D. yakuba, and D. pseudoobscura contigs that align to almost exactly the same location in the littoralis query. Bases 11900-11986 of the littoralis fosmid align to bases 413116-413202 of D. melanogaster chromosome 4 (88% identity). This same region of the littoralis contig also aligns to Contig5960_Contig5609 of a genomic D. pseudoobscura database at bases 75459-75526 with 94% identity. Additionally, bases 11900-11987 of the littoralis contig match bases 16885-16799 of Contig 25.42 of the genomic D. yakuba database with 88% identity. (A blastn query of the XAAA113 extract masked by Repeatmasker against the D. yakuba database gives the same result, ensuring that alignments to the same region of the littoralis contig are not the result of a common repeat element at that location.)

Clustal alignment of sequence extracts (each comprising the region of alignment to littoralis plus 500 bp of flanking sequence on each side) shows conservation across the four Drosophila species that appears relatively high around the region targeted by blastn, corresponding to littoralis contig bases 11900-11986 (Figures 6,7).

Fig 6. Conservation relatively high around region corresponding to littoralis bases 11900-11986.

Fig 7. Clustal alignment output; comparison of conservation.

Fig 9. 11900-11986 of XAAA113 hit upstream of CG2052.

At this point, a search for open reading frames in the translated sequence of the littoralis fosmid surrounding bases 11900-11986 would seem to be the next step in elucidating an exon in the area. Closer examination of the blastn alignment of bases 11900-11986 to the fourth D. melanogaster chromosome, however, suggests that this littoralis region does not actually encode a CG2052 exon, but instead


WUSTL BIOL 4342 - XAAA113 Fosmid Annotation
