
Annotating 7G24-63
Justin Richner
May 4, 2005
Figure 1: Map of my sequence. [Legend: Zfh2 exons; Thd1 exons; Pur-alpha exons. Scale: 0, 40 kb, 80 kb; scale mark = 1 kb. Repeat classes: LINE, Penelope; DNA/Transib, Transib1; DINE; novel repeat; LTR/PAO, Diver2 I; LTR/Gypsy, Invader; transposon, Tel1; DNA, DNAREP1 DM.]
I was given 80,940 bases of sequence to annotate from the Drosophila virilis dot chromosome. The sequence consists of two fosmids of approximately 40 kb each, joined together: 7G24 and 63. Fosmid 7G24 comprises bases 1 to 39,070. Fosmid 63 was annotated last year (Figure 1), and three genes were found: zfh2, thd1, and pur-alpha. I found and annotated the same three genes.
Zfh2 is zinc finger homeodomain protein 2, a probable transcription factor that is required for wing development. Zfh2 stretches from 22793 to 45965 and contains nine exons. Thd1 is mismatch-dependent uracil/thymine DNA glycosylase, which removes mismatched uracil or thymine from double-stranded DNA. Thd1 stretches from 62505 to 54357 and contains five exons. Pur-alpha is purine-rich binding protein-α, a single-stranded DNA binding protein thought to be involved in DNA replication. Pur-alpha begins at 80071 and extends past the end of my sequence; two of its exons lie within my sequence.
The entire sequence contains 32 repeated segments, one of which is a novel repeat and five of which are DINEs. The Zfh2 protein is conserved across species in the zinc finger binding domain. No conserved non-genic regions were found. This segment of the dot chromosome has high synteny with the fourth chromosome of D. melanogaster.
Figure 2: Gene map from last year’s submitted paper
Genes:
I first tried to identify genes using the Twinscan output displayed in the UCSC Genome Browser format on the Goose server (Figure 3). The first gene predicted (chr6.001.1) is the tel1 gene, which encodes a protein associated with transposable elements. I will look at this gene more closely in the Repeat section.
Figure 3: UCSC output on Goose server
The next predicted feature I analyzed was chr6.002.1. Twinscan predicts this to be a single-exon feature, but Genscan and mRNA data suggest that there are multiple exons. A Blast search against the nr database showed very good homology between this feature and the Zfh2 protein. However, the Zfh2 protein was much longer than the one-exon gene predicted by Twinscan. I did a Blast search with the next predicted feature, chr6.003.1, and again found high homology to Zfh2. I decided that these features were most likely exons of the same gene and attempted to find the rest of the exons.
At this point I did not know how to use Ensembl or FlyBase, so to look for the exons I blasted my entire repeat-masked sequence against the nr database and examined the Blast output file using herne. The results were not what I expected. The first four exons were transcribed in the forward direction, from around base 20000 to base 40000 (Figure 4), and the last five exons were transcribed in the reverse direction, from the very end of my sequence to about base 60000 (Figure 5).
Figure 4: Two of the exons for Zfh2 transcribed in the forward direction.
Figure 5: Three of the exons for Zfh2 transcribed in the reverse direction.
I realized that my sequence had not been assembled correctly: XAAA63 should have been oriented in the opposite direction before it was joined with 7G24. Chris corrected my sequence but could not load the corrected sequence into the UCSC output on the Goose server. As a result, all of the coordinates in the second half of my sequence were incorrect when I looked at data in the UCSC output, and I repeatedly had to do Blast2 alignments to find the proper coordinates. The Twinscan output for Zfh2 was also wrong.
After performing a Blast search with the corrected sequence file, I looked at the hits to Zfh2.
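The coordinate mismatch described above can also be worked out arithmetically instead of by repeated Blast2 alignments. The following is a minimal sketch under stated assumptions, not the procedure used in this report: it assumes that fosmid 63 occupies bases 39,071 to 80,940 of the joined sequence (the report gives only the 7G24 range and the total length) and that the correction was an in-place reverse complement of that segment. The function names `flip_position` and `flip_feature` are hypothetical.

```python
# Assumed layout: fosmid 63 spans 39,071..80,940 (1-based, inclusive) of the
# joined sequence. The report does not state this boundary explicitly.
SEG_START = 39_071
SEG_END = 80_940


def flip_position(pos, seg_start=SEG_START, seg_end=SEG_END):
    """Map a 1-based position to its coordinate after the segment
    seg_start..seg_end is reverse-complemented in place."""
    if seg_start <= pos <= seg_end:
        return seg_start + seg_end - pos
    return pos  # positions outside the flipped segment do not move


def flip_feature(low, high, strand, seg_start=SEG_START, seg_end=SEG_END):
    """Convert a feature (low <= high, strand '+' or '-') lying entirely
    inside the flipped segment to the corrected coordinate system."""
    if not (seg_start <= low <= high <= seg_end):
        raise ValueError("feature must lie entirely inside the flipped segment")
    # The old high end becomes the new low end, and the strand is inverted.
    new_low = flip_position(high, seg_start, seg_end)
    new_high = flip_position(low, seg_start, seg_end)
    new_strand = "-" if strand == "+" else "+"
    return new_low, new_high, new_strand


if __name__ == "__main__":
    # A hypothetical reverse-strand hit near the end of the old assembly.
    print(flip_feature(60_000, 80_940, "-"))
```

Under these assumptions, a reverse-strand hit spanning 60,000 to 80,940 in the old assembly maps to 39,071 to 60,011 on the forward strand. If the two fosmids overlap or were trimmed when joined, the segment boundaries would need to be adjusted accordingly.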
With an E-value of 0.0, predicted exons covering nearly all of the amino acids, no stop codons within the predicted exons, and last year’s data, I concluded that zfh2 is a real gene. I then began searching for exons.
The first exon predicted by Twinscan was much shorter than the first exon in D. melanogaster, obtained from the Ensembl database. However, I noticed that the exon could extend for quite some distance in the +2 frame without encountering a stop codon, as shown by the green arrow in Figure 6. I hypothesized that the exon actually continued through the first three exons predicted by Genscan, as shown in Figure 6.
Figure 6: UCSC output of first exon of zfh2
I performed a Blast2 alignment of my hypothesized exon against the D. melanogaster first exon and obtained a good match (Figure 7). I hypothesize that this region, from 22805 to 24577, is the first exon of zfh2.
Figure 7: D. melanogaster vs. predicted zfh2 first exon
Figure 8: Blast2 of D. melanogaster 2nd exon with my sequence.
At this point I realized two things: Twinscan and Genscan are not reliable, and the method I had used to find the first exon was highly inefficient. I began to find exons much more quickly by performing Blast2 alignments of the D. melanogaster exons from Ensembl against my entire sequence (Figure 8).
Later, I came back to exon 1 and examined the intron/exon boundaries to determine the exact end of this exon. Based on mRNA data, the beginning of exon 1 was moved upstream to base 22793 (Figure 9), so the exon now includes a 5’ untranslated region. The end of exon 1 had to be adjusted to base 24576 because introns almost always begin with the dinucleotide GT (Figure 10).
Figure 9: Beginning of exon 1; Red arrow = old boundary; Green arrow = new boundary
Figure 10: End of exon 1
Exons 2, 3, and 4 were found without much difficulty. When searching for exon 5, only half of the D. melanogaster exon matched my sequence. I joined exons 5 and 6 of D. melanogaster, performed a Blast2 alignment with my sequence, and found a complete exon encompassing both D. melanogaster exons without any internal stop codons (Figure 11). I hypothesize that exons 5 and 6 from D. melanogaster have combined to form one exon in D. virilis.
Figure 11: Exons 5 and 6 of D. melanogaster aligned with my sequence
Exons 6, 7, 8, and 9 were all straightforward and matched the exons from D. melanogaster. Because exon 9 is the last exon in the ORF, it ends with a stop codon. I was unable to find any 3’
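Two checks recur in the exon searches in this section: confirming that a candidate exon has no in-frame stop codon, and confirming that the intron following an exon begins with GT. The sketch below illustrates both on a made-up toy sequence; it is not the tool used for this annotation. The function names are hypothetical, and `frame_offset` is a 0-based offset into the given string rather than the UCSC frame label.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq, frame_offset=0):
    """Return the 0-based index of the first in-frame stop codon when
    reading seq from frame_offset, or None if the frame stays open."""
    seq = seq.upper()
    for i in range(frame_offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_gt_donor(seq, exon_end):
    """Check whether the two bases immediately after exon_end are GT.

    exon_end is the 1-based coordinate of the last exon base, so the
    intron starts at 0-based index exon_end.
    """
    return seq[exon_end:exon_end + 2].upper() == "GT"


if __name__ == "__main__":
    toy = "ATGGCCAAAGTAAGT"  # invented sequence, for illustration only
    print(first_stop(toy, 0))    # frame 0: ATG GCC AAA GTA AGT
    print(first_stop(toy, 1))    # frame 1: TGG CCA AAG TAA
    print(has_gt_donor(toy, 9))  # bases 10-11 of the toy sequence
```

In the toy sequence, frame 0 stays open, frame 1 reaches TAA at index 10, and the bases following position 9 are GT. A small number of real introns use other donor dinucleotides, so a failed GT check is a reason to re-examine a boundary rather than proof that it is wrong.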



WUSTL BIOL 4342 - Study Guide
