Some new sequencing technologiesMolecular Inversion ProbesSingle Molecule Array for Genotyping—SolexaNanopore SequencingPyrosequencingPyrosequencing on a chipPolony SequencingSome future directions for sequencingSlide 9Slide 10RNA Secondary StructureRNA and TranslationRNA and SplicingSlide 14Slide 15Slide 16Slide 17Modeling RNA Secondary Structure: Context-Free GrammarsA Context Free GrammarExample: modeling a stem loopSlide 21A parse tree: alignment of CFG to sequenceAlignment scores for parses!The Nussinov AlgorithmSlide 25Slide 26The Nussinov Algorithm and CFGsSlide 28Stochastic Context Free GrammarsExampleComputational ProblemsNormal Forms for CFGsExample of converting a CFG to C.N.F.Another exampleDecoding: the CYK algorithmThe CYK algorithm (Cocke-Younger-Kasami)A SCFG for predicting RNA structureCYK for RNA foldingEvaluationThe Inside AlgorithmAlgorithm: InsideThe Outside AlgorithmAlgorithm: OutsideLearning for SCFGsSlide 45Summary: SCFG and HMM algorithmsThe Zuker algorithm – main ideasCS262 Lecture 11, Win06, BatzoglouSome new sequencing technologiesCS262 Lecture 11, Win06, BatzoglouMolecular Inversion ProbesCS262 Lecture 11, Win06, BatzoglouSingle Molecule Array for Genotyping—SolexaCS262 Lecture 11, Win06, BatzoglouNanopore Sequencinghttp://www.mcb.harvard.edu/branton/index.htmCS262 Lecture 11, Win06, BatzoglouPyrosequencingCS262 Lecture 11, Win06, BatzoglouPyrosequencing on a chipMostafa Ronaghi, Stanford Genome Technologies Center454 Life SciencesCS262 Lecture 11, Win06, BatzoglouPolony SequencingCS262 Lecture 11, Win06, BatzoglouSome future directions for sequencing1. Personalized genome sequencing•Find your ~1,000,000 single nucleotide polymorphisms (SNPs)•Find your rearrangements•Goals:•Link genome with phenotype•Provide personalized diet and medicine•(???) designer babies, big-brother insurance companies•Timeline:•Inexpensive sequencing: 2010-2015•Genotype–phenotype association: 2010-???•Personalized drugs: 2015-???CS262 Lecture 11, Win06, BatzoglouSome future directions for sequencing2. Environmental sequencing•Find your flora: organisms living in your body•External organs: skin, mucous membranes•Gut, mouth, etc.•Normal flora: >200 species, >trillions of individuals•Flora–disease, flora–non-optimal health associations•Timeline:•Inexpensive research sequencing: today•Research & associations within next 10 years•Personalized sequencing 2015+•Find diversity of organisms living in different environments•Hard to isolate•Assembly of all organisms at onceCS262 Lecture 11, Win06, BatzoglouSome future directions for sequencing3. Organism sequencing•Sequence a large fraction of all organisms•Deduce ancestors•Reconstruct ancestral genomes•Synthesize ancestral genomes•Clone—Jurassic park!•Study evolution of function•Find functional elements within a genome•How those evolved in different organisms•Find how modules/machines composed of many genes evolvedRNA Secondary StructureaagacuucggaucuggcgacacccuacacuucggaugacaccaaagugaggucuucggcacgggcaccauucccaacuucggauuuugcuaccauaaagccuucggagcgggcguaacucCS262 Lecture 11, Win06, BatzoglouRNA and TranslationCS262 Lecture 11, Win06, BatzoglouRNA and SplicingCS262 Lecture 11, Win06, BatzoglouHairpin LoopsStemsBulge loopInterior loopsMulti-branched loopCS262 Lecture 11, Win06, BatzoglouSecondary StructureTertiary StructureCS262 Lecture 11, Win06, BatzoglouCS262 Lecture 11, Win06, BatzoglouCS262 Lecture 11, Win06, BatzoglouModeling RNA Secondary Structure:Context-Free GrammarsCS262 Lecture 11, Win06, BatzoglouA Context Free GrammarS AB Nonterminals: S, A, BA aAc | a Terminals: a, b, c, dB bBd | b Production Rules: 5 rulesDerivation:Start from the S nonterminal;Use any production rule replacing a nonterminal with a terminal, until no more nonterminals are presentS AB aAcB … aaaacccB aaaacccbBd … aaaacccbbbbbddddProduces all strings ai+1cibj+1dj, for i, j 0CS262 Lecture 11, Win06, BatzoglouExample: modeling a stem loopS a W1 uW1 c W2 gW2 g W3 cW3 g L cL agugcWhat if the stem loop can have other letters in place of the ones shown?ACGGUGCCAG UCGCS262 Lecture 11, Win06, BatzoglouExample: modeling a stem loopS a W1 u | g W1 u W1 c W2 gW2 g W3 c | g W3 uW3 g L c | a L uL agucg | agccg | cugugcMore general: Any 4-long stem, 3-5-long loop:S aW1u | gW1u | gW1c | cW1g | uW1g | uW1aW1 aW2u | gW2u | gW2c | cW2g | uW2g | uW2aW2 aW3u | gW3u | gW3c | cW3g | uW3g | uW3aW3 aLu | gLu | gLc | cLg | uLg | uLaL aL1 | cL1 | gL1 | uL1L1 aL2 | cL2 | gL2 | uL2L2 a | c | g | u | aa | … | uu | aaa | … | uuuACGGUGCCAG UCGGCGAUGCUAG CCGGCGAUGUUCUG UCGCS262 Lecture 11, Win06, BatzoglouA parse tree: alignment of CFG to sequenceACGGUGCCAG UCGA C G G A G U G C C C G USW1W2W3L•S a W1 u•W1 c W2 g•W2 g W3 c•W3 g L c•L agucgCS262 Lecture 11, Win06, BatzoglouAlignment scores for parses!We can define each rule X s, where s is a string,to have a score.Example:W g W’ c: 3 (forms 3 hydrogen bonds)W a W’ u: 2 (forms 2 hydrogen bonds)W g W’ u: 1 (forms 1 hydrogen bond)W x W’ z -1, when (x, z) is not an a/u, g/c, g/u pairQuestions:-How do we best align a CFG to a sequence? (DP)-How do we set the parameters? (Stochastic CFGs)CS262 Lecture 11, Win06, BatzoglouThe Nussinov AlgorithmLet’s forget CFGs for a momentProblem:Find the RNA structure with the maximum (weighted) number of nested pairingsAGACCUCUGGGCGGCAGUCUAUGCGAACGCGUCAUCAGCUGGAAGAAGGGAGAUCUUCACCAAUACUGAAUUGCACCACGCUUAAGACACCUAGCUUGUGUCCUGGAGGUCUAUAAGUCAGACCGCGAGAGGGAAGACUCGUAUAAGCGACS262 Lecture 11, Win06, BatzoglouThe Nussinov AlgorithmGiven sequence X = x1…xN,Define DP matrix:F(i, j) = maximum number of weighted bonds if xi…xj folds optimallyTwo cases, if i < j:•xi is paired with xjF(i, j) = s(xi, xj) + F(i+1, j – 1)•xi is not paired with xjF(i, j) = max{ k: i k < j } F(i, k) + F(k+1, j)i ji ji jkCS262 Lecture 11, Win06, BatzoglouThe Nussinov AlgorithmInitialization:F(i, i-1) = 0; for i = 2 to NF(i, i) = 0; for i = 1 to NIteration:For l = 2 to N:For i = 1 to N – lj = i + l – 1F(i+1, j – 1) + s(xi, xj)F(i, j) = maxmax{ i k < j } F(i, k) + F(k+1, j)Termination: Best structure is given by F(1, N)(Need to trace back; refer to the
View Full Document