05-511/711 Computational Genomics and Molecular Biology, Fall 2005

Problem Set 1
Due Thursday, September 29th

This homework is intended to evaluate your understanding of the genome assembly problem and algorithms, as presented in Martin Farach-Colton’s 2003 Biosymposium talk (see class syllabus for a link to the Quicktime movie). Please answer each question with no more than one or two sentences.

Collaboration is allowed on this homework, but you must hand in homeworks individually and list the names of the people you worked with.

1. What is the major limitation of current direct sequencing technology that makes it necessary to use computational methods to assemble the human genome?

2. Explain the difference between a bottom-up and a top-down approach to genome assembly. Which did NCBI use? Why?

3. What are the advantages of creating a physical map before sequencing BACs? How about the disadvantages?

4. What is an invalid overlap? Draw one. What kind of error might it indicate? Explain.

5. Overlaps between BACs are used to determine which BACs are adjacent in the genome sequence. Sometimes two BACs which are truly adjacent have little or no overlap. Why might these false negatives occur?

6. DNA cloning and sequencing technologies invariably yield noisy data. Why are interval graphs a useful formalism for identifying sequencing and other types of errors? Explain how they can be used to identify one particular type of error.

7. Since the true human genome sequence is not known, evaluating an assembly is not straightforward. Describe one method which can be used to assess and compare alternative assemblies.

8. Which regions in the human genome still remain to be correctly assembled? Explain why these regions are