UW CSEP 590 - RNA Search and Motif Discovery - D1085069

Home> Schools> University of Washington> (CSEP) > CSEP 590> RNA Search and Motif Discovery

DOC PREVIEW

UW CSEP 590 - RNA Search and Motif Discovery

School name University of Washington

Course Csep 590- Special Topics In Computer Science (PMP)

Pages 27

This preview shows page 1-2-3-25-26-27 out of 27 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 27 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 27 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 27 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 27 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 27 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 27 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 27 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

RNA Search and !Motif Discovery"Lecture 9!CSEP 590A!Autumn 2008"Outline"Whirlwind tour of ncRNA search & discovery"RNA motif description (Covariance Model Review)"Algorithms for searching"Rigorous & heuristic filtering"Motif discovery"Applications"2 Motif Description"7 RNA Motif Models"“Covariance Models” (Eddy & Durbin 1994)"aka profile stochastic context-free grammars"aka hidden Markov models on steroids"Model position-specific nucleotide preferences and base-pair preferences"Pro: accurate"Con: model building hard, search sloooow"8Example: searching for tRNAs 13 Mj: "Match states (20 emission probabilities)"Ij: "Insert states (Background emission probabilities)"Dj: "Delete states (silent - no emission)"Profile Hmm Structure"17 18 CM Structure"A: Sequence + structure"B: the CM “guide tree”"C: probabilities of letters/ pairs & of indels"Think of each branch being an HMM emitting both sides of a helix (but 3’ side emitted in reverse order)"CM Viterbi Alignment"! ! xi= ithletter of inputxij= substring i,..., j of inputTyz= P(transition y " z)Exi, xjy= P(emission of xi, xjfrom state y)Sijy= max#log P(xijgen'd starting in state y via path#)20! Sijy= max"log P(xijgenerated starting in state y via path")Sijy=maxz[Si+1, j#1z+ logTyz+ log Exi, xjy] match pairmaxz[Si+1, jz+ logTyz+ log Exiy] match/insert leftmaxz[Si, j#1z+ logTyz+ log Exjy] match/insert rightmaxz[Si, jz+ logTyz] deletemaxi<k$ j[Si,kyleft+ Sk +1, jyright] bifurcation% & ' ' ' ( ' ' ' Time O(qn3), q states, seq len n 21 23 mRNA leader mRNA leader switch? 24 Mutual Information"Max when no seq conservation but perfect pairing"MI = expected score gain from using a pair state"Finding optimal MI, (i.e. opt pairing of cols) is hard(?)"Finding optimal MI without pseudoknots can be done by dynamic programming"! Mij= fxi,xjxi,xj"log2fxi,xjfxifxj; 0 # Mij# 22629 Pseudoknots disallowed allowed ! maxjMi, ji=1n"# $ % & ' ( /231 Rfam – an RNA family DB!Griffiths-Jones, et al., NAR ‘03,’05"Biggest scientific computing user in Europe - 1000 cpu cluster for a month per release"Rapidly growing:"Rel 1.0, 1/03: 25 families, 55k instances"Rel 7.0, 3/05: 503 families, >300k instances"33 IRE (partial seed alignment):!Hom.sap. GUUCCUGCUUCAACAGUGUUUGGAUGGAAC Hom.sap. UUUCUUC.UUCAACAGUGUUUGGAUGGAAC Hom.sap. UUUCCUGUUUCAACAGUGCUUGGA.GGAAC Hom.sap. UUUAUC..AGUGACAGAGUUCACU.AUAAA Hom.sap. UCUCUUGCUUCAACAGUGUUUGGAUGGAAC Hom.sap. AUUAUC..GGGAACAGUGUUUCCC.AUAAU Hom.sap. UCUUGC..UUCAACAGUGUUUGGACGGAAG Hom.sap. UGUAUC..GGAGACAGUGAUCUCC.AUAUG Hom.sap. AUUAUC..GGAAGCAGUGCCUUCC.AUAAU Cav.por. UCUCCUGCUUCAACAGUGCUUGGACGGAGC Mus.mus. UAUAUC..GGAGACAGUGAUCUCC.AUAUG Mus.mus. UUUCCUGCUUCAACAGUGCUUGAACGGAAC Mus.mus. GUACUUGCUUCAACAGUGUUUGAACGGAAC Rat.nor. UAUAUC..GGAGACAGUGACCUCC.AUAUG Rat.nor. UAUCUUGCUUCAACAGUGUUUGGACGGAAC SS_cons <<<<<...<<<<<......>>>>>.>>>>> Rfam"Input (hand-curated):"MSA “seed alignment”"SS_cons"Score Thresh T"Window Len W"Output:"CM"scan results & “full alignment”"34Faster Search"37 Faster Genome Annotation !of Non-coding RNAs !Without Loss of Accuracy"Zasha Weinberg "& W.L. Ruzzo"Recomb ‘04, ISMB ‘04, Bioinfo ‘06"38 RaveNnA: Genome Scale !RNA Search"Typically 100x speedup over raw CM, w/ no loss in accuracy: "" "drop structure from CM to create a (faster) HMM""use that to pre-filter sequence; ""discard parts where, provably, CM score < threshold;""actually run CM on the rest (the promising parts)""assignment of HMM transition/emission scores is key """(large convex optimization problem)"Weinberg & Ruzzo, Bioinformatics, 2004, 2006 39 CM’s are good, but slow!EMBL CM hits junk Rfam Goal 10 years, 1000 computers 1 month, 1000 computers Our Work ~2 months, 1000 computers EMBL CM hits Ravenna Rfam Reality EMBL hits junk BLAST CM 41CM to HMM"25 emisions per state 5 emissions per state, 2x states A C G U – A C G U – A C G U – A C G U – A C G U – A C G U – A C G U – A C G U – CM HMM 43 Need: log Viterbi scores CM ! HMM"Key Issue: 25 scores " 10"P A C G U – A C G U – L A C G U – A C G U – R CM HMM 44 Viterbi/Forward Scoring"Path ! defines transitions/emissions"Score(!) = product of “probabilities” on !"NB: ok if “probs” aren’t, e.g. !"1!(e.g. in CM, emissions are odds ratios vs !0th-order background)"For any nucleotide sequence x:"Viterbi-score(x) = max{ score(!) | ! emits x}"Forward-score(x) = !{ score(!) | ! emits x}"45 Key Issue: 25 scores " 10"Need: log Viterbi scores CM ! HMM"PCA ! LC + RA PCC ! LC + RC PCG ! LC + RG PCU ! LC + RU PC– ! LC + R– … … … … … PAA ! LA + RA PAC ! LA + RC PAG ! LA + RG PAU ! LA + RU PA– ! LA + R– NB: HMM not a prob. model P A C G U – A C G U – L A C G U – A C G U – R CM HMM 46Rigorous Filtering"Any scores satisfying the linear inequalities give rigorous filtering!Proof: ! CM Viterbi path score ! ! “corresponding” HMM path score! ! Viterbi HMM path score ! (even if it does not correspond to any CM path)"PAA ! LA + RA PAC ! LA + RC PAG ! LA + RG PAU ! LA + RU PA– ! LA + R– … 47 Some scores filter better"PUA = 1 ! LU + RA"PUG = 4 ! LU + RG"" " "" " " Assuming ACGU # 25%"Option 1: " """Opt 1:" LU = RA = RG = 2 " " LU + (RA + RG)/2 = 4 "Option 2: " """Opt 2:" LU = 0, RA = 1, RG = 4 " LU + (RA + RG)/2 = 2.5"48 Optimizing filtering"For any nucleotide sequence x:"Viterbi-score(x) = max{ score(!) | ! emits x }"Forward-score(x) = !{ score(!) | ! emits x }"Expected Forward Score"E(Li, Ri) = !all sequences x Forward-score(x)*Pr(x)"NB: E is a function of Li, Ri only"Optimization: !Minimize E(Li, Ri) subject to score Lin.Ineq.s"This is heuristic (“forward$ % Viterbi$ % filter$”)"But still rigorous because “subject to score Lin.Ineq.s”"Under 0th-order background model 49 Calculating E(Li, Ri)"E(Li, Ri) = !x Forward-score(x)*Pr(x)"Forward-like: for every state, calculate expected score for all paths ending there; easily calculated from expected scores of predecessors & transition/emission probabilities/scores"50Minimizing E(Li, Ri)"Calculate E(Li, Ri) symbolically, in terms of emission scores, so we can do partial derivatives for numerical convex optimization algorithm"! "E (L1, L2, ...)"LiForward: Viterbi: 51 “Convex” Optimization"Convex: !local max = global max;"simple “hill climbing” works"Nonconvex:!can be many local maxima, << global max;!“hill-climbing” fails"52 Estimated Filtering Efficiency!(139 Rfam 4.0

View Full Document