Computational Genomics
Lecture 1, Tuesday April 1, 2003

Biology in One Slide: 2 Paradigms
    Molecular Paradigm
    Evolution Paradigm

High Throughput Biology
Biology is becoming an information science
[Figure: sequence data …ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGTCTAGCTAGACTACGTTTTATATATATATACGTCGTCGTACTGATGACTAGATTACAGACTGATTTAGATACCTGACTGATTTTAAAAAAATATT…; labels: Gene Expression, DNA Sequencing]

Goals of this course
• Introduction to Computational Biology
    Basic biology for computer scientists
    Breadth: mention many topics & applications
• In-depth coverage of Computational Genomics
    Algorithms for sequence analysis
    Current applications, trends, and open problems
• Coverage of useful algorithms
    Hidden Markov models
    Dynamic Programming
    String algorithms
    Applications of AI techniques

Topics in CS262
Part 1: In-depth coverage of basic computational methods for analysis of biological sequences
    Sequence Alignment & Dynamic Programming
    Hidden Markov models
These methods are used heavily in most genomics applications:
    DNA sequencing
    Comparison of DNA and proteins across organisms
    Discovery of genes, promoters, regulatory sites

Topics in CS262
Part 2: Topics in computational genomics, more algorithms, and areas of active research
    DNA sequencing & assembly: reading a complete genome such as the human DNA
    Gene finding: marking genes on the DNA sequence
    Large-scale comparative genomics: comparing whole genomes from multiple organisms
    Microarrays & regulation: understanding the regulatory code, and potential disease-causing genes
    RNA structure: predicting the folding of RNA
    Phylogeny and evolution: quantifying the evolution of biological sequences

Course responsibilities
• Homeworks [72%]
    4 challenging problem sets, 4-5 problems/pset
    Collaboration allowed; please give credit
    Homeworks due Thursday, solutions explained Friday
    The two worst problems across all homeworks do not count
• Final [18%]
    Takehome, 1 day
    Collaboration not allowed
    Basic questions, much easier than the homeworks
• Scribing [10%]
    Due one week after the lecture, except by special permission

Reading material
• Books
    “Biological sequence analysis” by Durbin, Eddy, Krogh, Mitchison
        Chapters 1-4, 6, (7-8), (9-10)
    “Algorithms on strings, trees, and sequences” by Gusfield
        Chapters (5-7), 11-12, (13), 14, (17)
• Papers
• Lecture notes

Topic 1. Sequence Alignment

Complete genomes

Evolution

Evolution at the DNA level
[Figure: two sequences, …ACGGTGCAGTCACCA… and …ACGTTGCAGTCCACCA…; labels: SEQUENCE EDITS, REARRANGEMENTS]

Evolutionary Rates
[Figure: sites marked OK, OK, OK, X, X, and "Still OK?"; label: next generation]
Changes in non-functional sites are OK, so will be propagated
Most changes in functional sites are deleterious and will be rejected

Sequence conservation implies function
[Figure: Interleukin region in human and mouse; axis labels 100%, 40%]

Sequence Alignment
Aligned:
    -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
    TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Definition
Given two strings x = x1x2...xM, y = y1y2...yN, an alignment is an assignment of gaps to positions 0,…, M in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence
Unaligned:
    AGGCTATCACCTGACCTCCAGGCCGATGCCC
    TAGCTATCACGACCGCGGTCGATTTGCCCGAC

What is a good alignment?
Alignment: The “best” way to match the letters of one sequence with those of the other
How do we define “best”?
Alignment: A hypothesis that the two sequences come from a common ancestor through sequence edits
Parsimonious explanation: Find the minimum number of edits that transform one sequence into the other

Scoring Function
• Sequence edits: AGGCCTC
    Mutations     AGGACTC
    Insertions    AGGGCCTC
    Deletions     AGG.CTC
Scoring Function:
    Match: +m
    Mismatch: -s
    Gap: -d
Score F = (# matches) × m - (# mismatches) × s - (# gaps) × d
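
The scoring function above can be tried out on the alignment shown on the Sequence Alignment slide. The following is a minimal Python sketch, not code from the course: the function name alignment_score, the default values m = s = d = 1, and the reading that every gapped column costs d once (per the "Gap: -d" term) are assumptions made for illustration.

```python
def alignment_score(row_x, row_y, m=1, s=1, d=1, gap="-"):
    """Score two already-aligned rows of equal length.

    Returns (F, matches, mismatches, gaps), where
    F = matches*m - mismatches*s - gaps*d.
    The defaults m = s = d = 1 are illustrative assumptions.
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")

    matches = mismatches = gaps = 0
    for a, b in zip(row_x, row_y):
        if a == gap and b == gap:
            raise ValueError("a column may not pair a gap with a gap")
        if a == gap or b == gap:
            gaps += 1          # a letter lined up with a gap
        elif a == b:
            matches += 1       # identical letters
        else:
            mismatches += 1    # different letters (a mutation)

    score = matches * m - mismatches * s - gaps * d
    return score, matches, mismatches, gaps


# The two aligned rows from the Sequence Alignment slide.
x = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
y = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"

score, n_match, n_mismatch, n_gap = alignment_score(x, y)
print(n_match, n_mismatch, n_gap, score)  # prints: 24 2 11 11
```

With these assumed parameters the slide's alignment has 24 matches, 2 mismatches, and 11 gap columns, so F = 24 - 2 - 11 = 11. Raising d relative to s makes gapped columns more expensive than mismatched ones, which changes which alignment counts as "best".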