WUSTL BIOL 4342 - Finishing the Dot Chromosome of D. virilis - D1270223


School name Washington University in St. Louis

Course Biol 4342 - Research Explorations in Genomics

Pages 9


Sonal Singhal
24 March 2006
Bio 4342, Finishing Paper: Final Draft

Finishing the Dot Chromosome of D. virilis: 99M21

DNA is often portrayed as a one-dimensional object; most pictures of DNA show it in a linear form. In the cell, however, DNA is packaged tightly with proteins to form three-dimensional structures called chromosomes. The mode and mechanism of this condensation remain relatively unknown, but they are likely important: research has suggested that levels of condensation are strongly correlated with levels of gene expression. To better understand how DNA becomes a chromosome, our class is sequencing the fourth chromosome (or dot chromosome) of Drosophila virilis. Previous classes have already sequenced two-thirds of this chromosome. Upon completion, this sequence will be compared to the already-sequenced dot chromosome of Drosophila melanogaster. The D. melanogaster dot chromosome is highly heterochromatic (i.e., tightly packed), yet initial analysis of its sequence suggests it has normal gene density. In contrast, the dot chromosome of D. virilis is euchromatic: more loosely packed and presumably more permissive for transcription. By sequencing and annotating both dot chromosomes to high fidelity, we hope a comparative approach can help us determine how gene distribution and sequence organization contribute to heterochromatin formation. My contribution to the project is finishing a fosmid (99M21) containing approximately 36 Kb of D. virilis sequence. This fosmid presented several challenges, which were solved by a variety of strategies. At completion, it still has two problem regions, but they are well defined and likely to be resolved easily with additional read calls.

The sequences for my fosmid are derived from approximately 700 reads generated by the Genome Sequencing Center (GSC) using its standard pipeline and 96 reads generated by me. Phred base-calls these reads, and Phrap then assembles them into contigs.
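Phred reports each base call with a quality value Q related to the estimated probability P that the call is wrong by Q = -10 log10(P), so Q20 means roughly a 1-in-100 chance of error and Q40 a 1-in-10,000 chance. The short Python sketch below converts between the two and flags stretches of low-quality bases in a list of per-base qualities. The function names and the cutoff of 20 are illustrative assumptions, not settings taken from Phred, Phrap, or Consed.

```python
import math

def phred_to_error(q: float) -> float:
    """Convert a Phred quality value to the estimated probability of a wrong base call."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Convert an error probability back to a Phred quality value."""
    return -10 * math.log10(p)

def low_quality_runs(quals, threshold=20):
    """Return (start, end) index pairs, end exclusive, of consecutive bases below threshold.

    The default threshold of 20 is an illustrative choice, not a Phred or Consed default.
    """
    runs = []
    start = None
    for i, q in enumerate(quals):
        if q < threshold and start is None:
            start = i                  # a low-quality run begins here
        elif q >= threshold and start is not None:
            runs.append((start, i))    # the run ended at the previous base
            start = None
    if start is not None:
        runs.append((start, len(quals)))  # run extends to the end of the read
    return runs

if __name__ == "__main__":
    print(phred_to_error(20))  # 0.01
    print(phred_to_error(40))  # 0.0001
    print(low_quality_runs([35, 40, 12, 8, 30, 15]))  # [(2, 4), (5, 6)]
```

Because the scale is logarithmic, each additional 10 quality points corresponds to a tenfold drop in the expected error rate.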
As seen in Consed's Assembly View, the initial assembly consists of five major contigs separated by four gaps (Figure 1). Based on Assembly View, the gaps together represent approximately 1 Kb of missing sequence.

Figure 1: The initial assembly of 800 reads.

My first objective was to determine the clone ends so that I could identify the orientation of the contigs. This year the GSC used a new vector to carry our fosmid's subcloned fragments, and unfortunately this vector sequence had not been entered into the Phred/Phrap program. As a result, the clone ends were misidentified. By looking for the palindromic sequence "GATC" that marks most vector ends, I was able to identify both clone ends correctly and thus determine the orientation of the contigs in the assembly (Fig. 2).

My next goal was to close the gaps to form one continuous contig, which proved to be the biggest challenge of my project. Initially, I attempted to find sequence matches at the ends of neighboring contigs to force a join. This was unsuccessful, so I called reads to span the gaps. Using Consed's primer-picking software, I ordered eight oligos, one for each side of each of the four gaps (Table 1). Because of time constraints I needed these reads immediately, so I ordered each oligo for use with all three available chemistries (BigDye, dGTP, and 4:1) and, where possible, on multiple templates, in case some reactions failed.

I then compared this first round of read calls with those made by Autofinish, the automated finishing program. Autofinish and I made similar calls, though Autofinish called two fewer reads (Table 2). Both sets of oligos are designed to generate reads that span the gaps in the assembly; Autofinish calls two fewer reads because, unlike me, it does not call reads from both sides of every gap. In an ideal system where all reads are of good quality and extend 600 to 700 base pairs, Autofinish's calls might be sufficient to resolve the gaps.
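The trade-off between one-sided and two-sided read calls comes down to read length versus gap size. The Python sketch below assumes, purely for illustration, that a finishing read is primed a fixed distance inside the contig end (the primer offset) and must overlap the far side by a minimum number of bases to support a join. The 650-bp read length, 50-bp offset, and 40-bp overlap are hypothetical values, not figures from this project or settings from Consed or Autofinish.

```python
def single_read_spans(gap_bp, read_len, primer_offset, min_overlap):
    """True if one read primed inside one contig can cross the gap and overlap the other contig.

    All parameters are hypothetical illustration values.
    """
    usable = read_len - primer_offset  # bases that extend past the contig end into the gap
    return usable >= gap_bp + min_overlap

def paired_reads_span(gap_bp, read_len, primer_offset, min_overlap):
    """True if two reads, one primed from each flanking contig, together cover the gap
    and overlap each other by at least min_overlap bases."""
    usable = read_len - primer_offset
    return 2 * usable >= gap_bp + min_overlap

for gap in (300, 900):
    print(gap,
          single_read_spans(gap, read_len=650, primer_offset=50, min_overlap=40),
          paired_reads_span(gap, read_len=650, primer_offset=50, min_overlap=40))
# 300 True True
# 900 False True
```

Under these assumed numbers, a single read closes a short gap, but a gap of roughly 900 bp, like the largest gap in this fosmid, needs reads from both sides.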
However, sequencing is not always optimal, so it is necessary to call reads from both sides to get enough data to cover the gap between contigs. For example, one gap is too large (~900 bp) to be spanned by a single read, as Autofinish attempts to do. For projects with enough time to order multiple rounds of reads, Autofinish will help save on sequencing costs, because it is conservative in how many reads it calls. Autofinish is also more accurate: I mistakenly called oligo 2 off an incorrect subclone, whereas Autofinish specifies the correct template. In our case, however, where time was the major constraint, Autofinish is likely too conservative.

A second problem is that Assembly View predicted that my fosmid is only 36 Kb long, including gaps. Because our fosmids should contain 38 to 40 Kb of sequence, I checked whether the contig had repeat structures that had been misassembled, which would produce a shorter contig. First, I viewed the organization of the repeats in the contig using Consed's Crossmatch, which identified the regions of the contig that contain repetitive DNA sequence (Fig. 3). I then scanned those regions for high-quality discrepancies, since these often indicate misassembled repeats. Because I found none, I concluded that the contig is simply shorter than expected rather than misassembled.

While waiting for new reads, I surveyed the assembly for low-quality consensus sequence, high-quality discrepancies, and high-quality unaligned sequence. Consed's navigation windows make these regions straightforward to find.

Figure 2: My misidentified clone ends. The white box outlines the true start of the clone.

Figure 3: Contig after running Crossmatch. Orange lines represent repetitive units.

Table 1
Oligo  Sequence                   Direction  Purpose                  Success  Chemistry  Templates
4      tcgggaaatattgtaatggac      reverse    gap (contigs 16 and 8)   no       all 3      aaf02a12, aaf03b04
5      ctcgcaactgacagcagta        forward    gap (contigs 8 and 15)   no       all 3      aaf05d01, aaf05e04
8      aatcaagggatctcattagacc     reverse    gap (contigs 15 and 10)  no       all 3      aaf04e10
9      tggaatggaagtcatataaacttg   forward    gap (contigs 12 and 16)  yes      all 3      aaf02c05, aaf03f12
7      cctgaaaatgaatgtaaggga      forward    gap (contigs 15 and 10)  no       all 3      aaf04e10, aaf05e09, aaf03d10
6      gcactaggaggacatacatctaaaa  reverse    gap (contigs 15 and 8)   yes      all 3      aaf05e04, aaf05d01
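The search for high-quality discrepancies in the repeat regions can be illustrated with a toy model. In the Python sketch below, each column of an alignment is a list of (base, quality) pairs from the reads covering that position, and a column is flagged when two or more different bases are each called at or above a quality cutoff. The data layout, the function name, and the cutoff of 40 are hypothetical stand-ins; in this project the search was done on the real assembly through Consed's navigation windows.

```python
from typing import List, Tuple

# (base, Phred quality) for each read covering one position of the contig
Column = List[Tuple[str, int]]

def high_quality_discrepancies(columns: List[Column], min_qual: int = 40) -> List[int]:
    """Return indices of columns where two or more distinct bases are each called
    at or above min_qual. The cutoff of 40 is an illustrative assumption."""
    flagged = []
    for pos, column in enumerate(columns):
        # Keep only confident calls; low-quality disagreements are ignored.
        confident_bases = {base for base, qual in column if qual >= min_qual}
        if len(confident_bases) > 1:
            flagged.append(pos)
    return flagged

alignment = [
    [("A", 45), ("A", 50), ("A", 30)],  # all reads agree
    [("C", 48), ("T", 42)],             # two confident reads disagree, so flagged
    [("G", 50), ("A", 15)],             # disagreement, but one call is low quality
]
print(high_quality_discrepancies(alignment))  # [1]
```

Ignoring low-quality disagreements is the point of the filter: sequencing errors tend to have low quality values, whereas confident disagreements between reads placed at the same position are more likely to come from collapsed copies of a repeat.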
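The clone-end search relies on GATC being palindromic: its reverse complement is GATC again, so the site reads the same on both strands and scanning one strand finds every occurrence. The Python sketch below finds each GATC site in a sequence and confirms the palindrome property. The example sequence and function names are made up for illustration, and deciding which site marks the true clone end still depends on inspecting the assembly.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif(seq: str, motif: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) match of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions

site = "GATC"
print(reverse_complement(site) == site)      # True: GATC is its own reverse complement
print(find_motif("ttaGATCcaggatcAA", site))  # [3, 10]
```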
