Computational Molecular Biology and Genomics 02-711/03-711/15-856

Assignment 1: Challenges of second-generation sequencing
Due September 28th

Since 2005, several “second-generation sequencing” platforms have become available characterized by lower cost, higher throughput and shorter reads than conventional Sanger sequencing. The following articles discuss algorithmic challenges that arise in the analysis of second-generation sequencing data.

- Bioinformatics challenges of new sequencing technology. Pop M, Salzberg SL. Trends Genet. 2007;24(3):142-149.
- Sense from sequence reads: methods for alignment and assembly. Flicek P, Birney E. Nat Methods. 2009 Nov;6(11 Suppl):S6-S12.

Read these commentaries and briefly answer the following questions. You may read additional materials, if you wish. If you do, you must cite your sources. You may not quote verbatim without attribution.

1. In addition to short reads, what are two fundamental attributes of second-generation sequencing data that differentiate it from Sanger sequencing data and lead to fundamentally different algorithmic tradeoffs?

2. Why are repetitive regions a bigger challenge for second-generation sequencing data than for Sanger sequencing data?

3. How does paired-end sequencing address this problem?

4. Resequencing refers to the practice of sequencing DNA from an individual organism for which a reference genome sequence has already been determined. Two important applications for resequencing are investigating variation within a population and identifying mutations associated with disease. Why is sequence assembly easier for resequencing data than for de novo assembly?

5. Compare the error characteristics of traditional Sanger sequencing and Roche 454 sequencing.

6. Why is assembly of second-generation sequencing data easier for bacterial genomes?

7. What is a spaced seed? Why is finding spaced seeds useful in alignment?

8. Although assembly algorithms based on De Bruijn graphs had already been proposed before the first second-generation sequencing technologies appeared on the market, they have become much more important in the context of second-generation sequencing. What fundamental property of De Bruijn graphs makes this a natural framework for second-generation sequencing?

9. How do assembly algorithms based on overlaps scale with the number of reads? How do assembly algorithms based on De Bruijn graphs scale with the number of