Short Primer on Comparative Genomics

Today: Special guest lecture
12pm, Alway M108
Comparative genomics of animals and plants
Adam Siepel
Assistant Professor of Biological Statistics and Computational Biology
Cornell University

Evolution at the DNA level
…ACGGTGCAGTTACCA…
…AC----CAGTCCACCA…
SEQUENCE EDITS: Mutation, Deletion
REARRANGEMENTS: Inversion, Translocation, Duplication

Orthology and Paralogy
Figure labels: HB Human, WB Worm, HA1 Human, HA2 Human, Yeast, WA Worm
Orthologs: Derived by speciation
Paralogs: Everything else

Orthology, Paralogy, Inparalogs, Outparalogs

Synteny maps
Comparison of human and mouse

Building synteny maps
Recommended local aligners
• BLASTZ
  Most accurate, especially for genes
  Chains local alignments
• WU-BLAST
  Good tradeoff of efficiency/sensitivity
  Best command-line options
• BLAT
  Fast, less sensitive
  Good for
  • comparing very similar sequences
  • finding rough homology map

Index-based local alignment
Dictionary: All words of length k (~10)
  Alignment initiated between words of alignment score T (typically T = k)
Alignment: Ungapped extensions until score below statistical threshold
Output: All local alignments with score > statistical threshold
Figure labels: query, DB, scan
Question: Using an idea from overlap detection, is there a better way to find all local alignments between two genomes?

Local Alignments

After chaining

Chaining local alignments
1. Find local alignments
2. Chain: O(N log N) L.I.S. (longest increasing subsequence)
3. Restricted DP

Progressive Alignment
• When evolutionary tree is known:
  Align closest first, in the order of the tree
  In each step, align two sequences x, y, or profiles p_x, p_y, to generate a new alignment with associated profile p_result
Weighted version:
  Tree edges have weights, proportional to the divergence in that edge
  New profile is a weighted average of two old profiles
Figure labels: x, w, y, z

Threaded Blockset Aligner
Figure labels: Human–Cow, HMR – CD, Restricted Area, Profile Alignment

Reconstructing the Ancestral Mammalian Genome
Human: C
Baboon: C
Cat: C
Dog: G
Figure labels: C, C or G, G

Neutral Substitution Rates

Finding Conserved Elements (1)
• Binomial method
  25-bp window in the human genome
  Binomial distribution of k matches in N bases given the neutral probability of substitution

Finding Conserved Elements (2)
• Parsimony Method
  Count minimum # of mutations explaining each column
  Assign a probability to this parsimony score given neutral model
  Multiply probabilities across 25-bp window of human genome
Figure labels: A, C, A, A, G

Finding Conserved Elements

Finding Conserved Elements (3)
GERP

Phylo HMMs
HMM
Phylogenetic Tree Model
Phylo HMM

How do the methods agree/disagree?

Statistical Power to Detect Constraint
LNC: cutoff # mutations
D: neutral mutation rate
constraint mutation rate relative to neutral
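
Step 2 of "Chaining local alignments" names an O(N log N) longest-increasing-subsequence (L.I.S.) pass. A minimal sketch of that idea follows, under simplifying assumptions that are not in the slides: each local alignment is reduced to a single (query_pos, target_pos) anchor on the same strand, every anchor counts equally, and the best chain is simply the longest one. The function name chain_anchors and the example coordinates are hypothetical. Real chainers typically also weigh alignment scores and gap costs, which this sketch ignores.

```python
from bisect import bisect_left


def chain_anchors(anchors):
    """Longest chain of (query_pos, target_pos) anchors increasing in both coordinates.

    Assumption: anchors are unweighted points on the same strand.
    """
    # Sort by query position; break ties by descending target position so that
    # two anchors sharing a query position can never both join the chain.
    order = sorted(anchors, key=lambda a: (a[0], -a[1]))

    tails = []      # tails[i]: smallest target position ending a chain of length i + 1
    tails_idx = []  # index into `order` of the anchor stored in tails[i]
    prev = [-1] * len(order)  # back-pointers used to recover the chain

    for i, (_, target) in enumerate(order):
        pos = bisect_left(tails, target)  # binary search gives the O(log N) step
        if pos > 0:
            prev[i] = tails_idx[pos - 1]
        if pos == len(tails):
            tails.append(target)
            tails_idx.append(i)
        else:
            tails[pos] = target
            tails_idx[pos] = i

    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    chain.reverse()
    return chain


# Hypothetical anchors: (position in genome 1, position in genome 2).
example = [(10, 100), (20, 50), (30, 120), (40, 130), (50, 60), (60, 140)]
print(chain_anchors(example))
```

On the hypothetical anchors above the chain keeps four of the six points and drops the two that would break the ordering. Step 3 (restricted DP) would then be run only in the gaps between consecutive chained anchors.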
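
The binomial method in "Finding Conserved Elements (1)" can be sketched as a sliding-window tail probability. The code below is an illustration under stated assumptions, not the method as implemented by any particular tool: the alignment is reduced to a 0/1 vector (1 = match between human and one other species), matches are treated as independent, and the neutral match probability p_match = 0.7 is a hypothetical placeholder for one minus the neutral probability of substitution. The function names binomial_tail and score_windows are likewise made up for this example.

```python
from math import comb


def binomial_tail(k, n, p_match):
    """P(X >= k) for X ~ Binomial(n, p_match): chance of k or more matches under neutrality."""
    return sum(
        comb(n, i) * p_match**i * (1 - p_match) ** (n - i)
        for i in range(k, n + 1)
    )


def score_windows(matches, window=25, p_match=0.7):
    """Slide a fixed window over a 0/1 match vector.

    Returns one (start, k, p_value) tuple per window, where k is the number of
    matches in the window. p_match is a hypothetical neutral match probability.
    """
    results = []
    if len(matches) < window:
        return results
    k = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Update the count incrementally: add the entering base, drop the leaving one.
            k += matches[start + window - 1] - matches[start - 1]
        results.append((start, k, binomial_tail(k, window, p_match)))
    return results


# Hypothetical alignment column summary: 25 matching bases followed by 5 mismatches.
for start, k, p_value in score_windows([1] * 25 + [0] * 5):
    print(start, k, p_value)
```

Windows with a small tail probability have more matches than the neutral model predicts and are candidates for conserved elements. The cutoff to apply is a separate choice that the slides do not specify.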