WUSTL BIOL 4342 - Finishing Drosophila virilis Fosmid Clone 4N16 - D1587384


School name Washington University in St. Louis

Course Biol 4342- Research Explorations in Genomics

Pages 10


Finishing Drosophila virilis Fosmid Clone 4N16
David Desruisseau
April 12, 2006
Bio 434W Genomics
DMD 4/12 rev. 1

Abstract

The overarching goal of Bio 4342/W research over the past several years has been to understand a euchromatic region of the Drosophila virilis genome well enough to distinguish this domain at the DNA level from its heterochromatic counterparts in related species, such as Drosophila melanogaster, the common fruit fly. The class uses the well-characterized genome of D. melanogaster, a model organism with a heterochromatic fourth chromosome, for comparison. Current class research focuses on completing a reliable genome sequence for the equivalent “dot” chromosome in D. virilis, in hopes of discerning sequence domains or gene characteristics that explain this inter-species difference in chromosomal organization. In this report, I describe sequence finishing for fosmid 4N16.

Finishing Workflow

To start off

After running Phred/Phrap to call bases on the raw sequence data and create an initial assembly for my fosmid, I was presented with a Consed assembly consisting of five major contigs with a significant number of inconsistent forward/reverse pairs, a significant low-coverage region, and extensive misassembly. Crossmatch identifies direct sequence repeats, indicated by orange lines, and tandem sequence repeats, indicated by black lines (none visible here), at a chosen percent similarity. Viewing the Crossmatch results at a sequence similarity of 90% revealed very high sequence repetition throughout my entire fosmid.

Figure 1.1 Initial assembly

As a first effort to quickly improve assembly quality, multiple high-quality discrepancies were tagged with the command “Tell Phred/Phrap not to overlap reads at this location” in order to increase the stringency of assembly joins.
After rerunning Phred/Phrap with these tags in place, Assembly View displayed slightly altered contigs with somewhat fewer inconsistent forward/reverse pairs. To improve the readability of Assembly View, the “Reorient Contigs” command was used to ensure consistent sequence directionality between all contigs. Finally, the “exclude contig if depth of coverage greater than this” parameter was increased to 80. This tells Consed to display even highly misassembled contigs with unusually high read depths, revealing a greater amount of the available sequence. The resulting assembly is shown in Figure 1.1.

Locating and establishing ends

The next task was to locate and establish two different types of ends: those of the individual reads and those of the entire fosmid. Looking through the aligned reads, my goal was to identify vector sequence that had been represented as clone sequence and eliminate it wherever possible. This process involved recognizing vector end sequences on individual reads. The ends were as follows: GAATTCGTC-insert, GAATTCGTT-insert, insert-GACGAATTC, and insert-AACGAATTC. After locating regions where vector sequence remained in a read, the regions were excluded using the command “Change to x’s to left/right”. This tells Consed to ignore the masked sequence, so that it can calculate a better consensus for that area and therefore construct a better overall assembly.

Whole clone ends were located in a similar fashion. They were identified by a GATC end sequence followed by a string of x’s in either direction, representing a fosmid end. High-quality flanking vector sequence (from pfos1, the vector used here) was almost always correctly identified by Consed, making the process of finding clone ends relatively straightforward. However, if the vector sequence was of too low quality for Consed to define it as a clone end (see Figure 2.1), the trace window was used to compare the general trace patterns of the confirmed and unconfirmed vector sequence. In this way, resolving the exact sequence of the unconfirmed trace was unnecessary. Simply comparing the general trace patterns gave sufficient visual proof that the read in question did contain pfos1 vector sequence and that the region could be safely ignored (Figure 2.2).

Figure 2.1 Candidate end

Figure 2.2 Dye trace. Note the high similarity between peak distribution and general trace shape.

Assessing inconsistent forward/reverse pairs

After locating my fosmid ends, the next problem to address was the inconsistent forward/reverse pairs. As is apparent in the original Assembly View (Figure 1.1), my fosmid was initially highly misassembled, with mismatches falling almost entirely within highly repetitive regions. Given this, and the time constraints on the project, a decision was made to use a version of the fosmid that had been partially finished by a WU GSC finisher. The assembly pieces borrowed from this alternate version would serve as a scaffold for my assembly. The scaffold data was prepared as a .phd file and added to the phd_dir of my project folder, and Phred/Phrap was rerun. This data augmented my own and provided a relatively trustworthy template to which Phred/Phrap could align my sequence. Incorporating it improved my Assembly View significantly and produced a large contig approximately 37 kilobases in length (Figure 3.1). The four remaining smaller contigs also exhibited fewer inconsistent forward/reverse pairs relative to the main contig.

Figure 3.1 Scaffold incorporated

Note about Findid

The GSC ran my assembly against their in-house database of sequence from 14 different organisms, including bacterial, human, yeast, and maize DNA. This scan confirmed that my main contig contained no findid-labeled contamination.
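As an illustration of the read-end masking described under “Locating and establishing ends”, the following is a minimal Python sketch, not the procedure Consed itself uses. It assumes that the four end motifs listed in the report belong to the vector, that a left-end motif is taken at its first occurrence and a right-end motif at its last, and that everything outside those positions should be masked. The function name mask_vector_ends and the example reads are hypothetical.

```python
# Vector end motifs taken from the report text.
LEFT_VECTOR_ENDS = ("GAATTCGTC", "GAATTCGTT")   # vector, then insert
RIGHT_VECTOR_ENDS = ("GACGAATTC", "AACGAATTC")  # insert, then vector


def mask_vector_ends(read: str) -> str:
    """Return the read with vector sequence replaced by x's.

    Assumption (not from the report): the motif itself is vector and is
    masked together with everything on its outer side.
    """
    seq = read.upper()
    start, end = 0, len(seq)

    # "Change to x's to left": mask through the end of the first left motif.
    left_hits = [seq.find(m) + len(m) for m in LEFT_VECTOR_ENDS if m in seq]
    if left_hits:
        start = min(left_hits)

    # "Change to x's to right": mask from the start of the last right motif.
    right_hits = [seq.rfind(m, start) for m in RIGHT_VECTOR_ENDS]
    right_hits = [i for i in right_hits if i != -1]
    if right_hits:
        end = max(right_hits)

    return "x" * start + seq[start:end] + "x" * (len(seq) - end)


if __name__ == "__main__":
    # Hypothetical reads, for illustration only.
    print(mask_vector_ends("TTTGAATTCGTCACGTACGTAA"))
    print(mask_vector_ends("ACGTACGTAAGACGAATTCGG"))
```

Exact string matching like this only works where the vector sequence is of high quality. The low-quality case in the report, where the end was confirmed by comparing trace shapes, is outside what this sketch can handle.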
Calling reads

In order to close the gap at the right end of my fosmid, I needed to call reads to obtain additional sequence data for that region. Ordering oligos that would effectively span this region required careful selection of unique sequences that would anneal to only one complementary region during the PCR reaction. This was somewhat difficult with my clone, since it contains so much repetitive sequence, but I was able to define four unique oligos through trial and error using the “Search for String” command.

Autofinish
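The uniqueness check described under “Calling reads” can be sketched in a few lines of Python. This is a hedged illustration rather than a reproduction of Consed's Search for String: it assumes exact matching only, checks both strands, and ignores near-matches and melting temperature, which a real primer design would also weigh. The function names and example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_sites(consensus: str, oligo: str) -> int:
    """Count exact, possibly overlapping matches of the oligo on both strands."""
    target = consensus.upper()
    # A set avoids double counting when the oligo is its own reverse complement.
    queries = {oligo.upper(), reverse_complement(oligo)}
    total = 0
    for query in queries:
        pos = target.find(query)
        while pos != -1:
            total += 1
            pos = target.find(query, pos + 1)
    return total


def is_unique_oligo(consensus: str, oligo: str) -> bool:
    """True if the oligo has exactly one exact-match site in the consensus."""
    return count_sites(consensus, oligo) == 1


if __name__ == "__main__":
    # Hypothetical consensus with a repeated 8-mer, for illustration only.
    consensus = "ACGTTGCAAGGCTTACGTTGCA"
    for candidate in ("AAGGCT", "ACGTTGCA"):
        print(candidate, count_sites(consensus, candidate),
              is_unique_oligo(consensus, candidate))
```

In a highly repetitive clone such as this one, most candidates would fail the check, which matches the trial-and-error experience described in the report.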
