DOC PREVIEW
UW CSEP 590 - Lecture Notes

This preview shows page 1-2-3-4-5-6-7-50-51-52-53-54-55-56-100-101-102-103-104-105-106 out of 106 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 106 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

RNA Search and !Motif Discovery"Lecture 9!CSEP 590A!Autumn 2008"Outline"Whirlwind tour of ncRNA search & discovery"RNA motif description (Covariance Model Review)"Algorithms for searching"Rigorous & heuristic filtering"Motif discovery"Applications"2Motif Description"7RNA Motif Models"“Covariance Models” (Eddy & Durbin 1994)"aka profile stochastic context-free grammars"aka hidden Markov models on steroids"Model position-specific nucleotide preferences and base-pair preferences"Pro: accurate"Con: model building hard, search sloooow"8Example: searching for tRNAs 13Mj: "Match states (20 emission probabilities)"Ij: "Insert states (Background emission probabilities)"Dj: "Delete states (silent - no emission)"Profile Hmm Structure"1718 CM Structure"A: Sequence + structure"B: the CM “guide tree”"C: probabilities of letters/ pairs & of indels"Think of each branch being an HMM emitting both sides of a helix (but 3’ side emitted in reverse order)"CM Viterbi Alignment"! ! xi= ithletter of inputxij= substring i,..., j of inputTyz= P(transition y " z)Exi,xjy= P(emission of xi, xjfrom state y)Sijy= max#log P(xijgen'd starting in state y via path#)20! Sijy= max"log P(xijgenerated starting in state y via path")Sijy=maxz[Si+1, j#1z+ logTyz+ log Exi,xjy] match pairmaxz[Si+1, jz+ logTyz+ log Exiy] match/insert leftmaxz[Si, j#1z+ logTyz+ log Exjy] match/insert rightmaxz[Si, jz+ logTyz] deletemaxi<k$ j[Si,kyleft+ Sk +1, jyright] bifurcation% & ' ' ' ( ' ' ' Time O(qn3), q states, seq len n 2123 mRNA leader mRNA leader switch?24Mutual Information"Max when no seq conservation but perfect pairing"MI = expected score gain from using a pair state"Finding optimal MI, (i.e. opt pairing of cols) is hard(?)"Finding optimal MI without pseudoknots can be done by dynamic programming"! Mij= fxi,xjxi,xj"log2fxi,xjfxifxj; 0 # Mij# 22629Pseudoknots disallowed allowed ! maxjMi, ji=1n"# $ % & ' ( /231Rfam – an RNA family DB!Griffiths-Jones, et al., NAR ‘03,’05"Biggest scientific computing user in Europe - 1000 cpu cluster for a month per release"Rapidly growing:"Rel 1.0, 1/03: 25 families, 55k instances"Rel 7.0, 3/05: 503 families, >300k instances"33IRE (partial seed alignment):!Hom.sap. GUUCCUGCUUCAACAGUGUUUGGAUGGAAC Hom.sap. UUUCUUC.UUCAACAGUGUUUGGAUGGAAC Hom.sap. UUUCCUGUUUCAACAGUGCUUGGA.GGAAC Hom.sap. UUUAUC..AGUGACAGAGUUCACU.AUAAA Hom.sap. UCUCUUGCUUCAACAGUGUUUGGAUGGAAC Hom.sap. AUUAUC..GGGAACAGUGUUUCCC.AUAAU Hom.sap. UCUUGC..UUCAACAGUGUUUGGACGGAAG Hom.sap. UGUAUC..GGAGACAGUGAUCUCC.AUAUG Hom.sap. AUUAUC..GGAAGCAGUGCCUUCC.AUAAU Cav.por. UCUCCUGCUUCAACAGUGCUUGGACGGAGC Mus.mus. UAUAUC..GGAGACAGUGAUCUCC.AUAUG Mus.mus. UUUCCUGCUUCAACAGUGCUUGAACGGAAC Mus.mus. GUACUUGCUUCAACAGUGUUUGAACGGAAC Rat.nor. UAUAUC..GGAGACAGUGACCUCC.AUAUG Rat.nor. UAUCUUGCUUCAACAGUGUUUGGACGGAAC SS_cons <<<<<...<<<<<......>>>>>.>>>>> Rfam"Input (hand-curated):"MSA “seed alignment”"SS_cons"Score Thresh T"Window Len W"Output:"CM"scan results & “full alignment”"34Faster Search"37Faster Genome Annotation !of Non-coding RNAs !Without Loss of Accuracy"Zasha Weinberg "& W.L. Ruzzo"Recomb ‘04, ISMB ‘04, Bioinfo ‘06"38RaveNnA: Genome Scale !RNA Search"Typically 100x speedup over raw CM, w/ no loss in accuracy: "" "drop structure from CM to create a (faster) HMM""use that to pre-filter sequence; ""discard parts where, provably, CM score < threshold;""actually run CM on the rest (the promising parts)""assignment of HMM transition/emission scores is key """(large convex optimization problem)"Weinberg & Ruzzo, Bioinformatics, 2004, 2006 39CM’s are good, but slow!EMBL CM hits junk Rfam Goal 10 years, 1000 computers 1 month, 1000 computers Our Work ~2 months, 1000 computers EMBL CM hits Ravenna Rfam Reality EMBL hits junk BLAST CM 41CM to HMM"25 emisions per state 5 emissions per state, 2x states A C G U – A C G U – A C G U – A C G U – A C G U – A C G U – A C G U – A C G U – CM HMM 43Need: log Viterbi scores CM ! HMM"Key Issue: 25 scores " 10"P A C G U – A C G U – L A C G U – A C G U – R CM HMM 44Viterbi/Forward Scoring"Path ! defines transitions/emissions"Score(!) = product of “probabilities” on !"NB: ok if “probs” aren’t, e.g. !"1!(e.g. in CM, emissions are odds ratios vs !0th-order background)"For any nucleotide sequence x:"Viterbi-score(x) = max{ score(!) | ! emits x}"Forward-score(x) = !{ score(!) | ! emits x}"45Key Issue: 25 scores " 10"Need: log Viterbi scores CM ! HMM"PCA ! LC + RA PCC ! LC + RC PCG ! LC + RG PCU ! LC + RU PC– ! LC + R– … … … … … PAA ! LA + RA PAC ! LA + RC PAG ! LA + RG PAU ! LA + RU PA– ! LA + R– NB: HMM not a prob. model P A C G U – A C G U – L A C G U – A C G U – R CM HMM 46Rigorous Filtering"Any scores satisfying the linear inequalities give rigorous filtering!Proof: ! CM Viterbi path score ! ! “corresponding” HMM path score! ! Viterbi HMM path score ! (even if it does not correspond to any CM path)"PAA ! LA + RA PAC ! LA + RC PAG ! LA + RG PAU ! LA + RU PA– ! LA + R– … 47Some scores filter better"PUA = 1 ! LU + RA"PUG = 4 ! LU + RG"" " "" " " Assuming ACGU # 25%"Option 1: " """Opt 1:" LU = RA = RG = 2 " " LU + (RA + RG)/2 = 4 "Option 2: " """Opt 2:" LU = 0, RA = 1, RG = 4 " LU + (RA + RG)/2 = 2.5"48Optimizing filtering"For any nucleotide sequence x:"Viterbi-score(x) = max{ score(!) | ! emits x }"Forward-score(x) = !{ score(!) | ! emits x }"Expected Forward Score"E(Li, Ri) = !all sequences x Forward-score(x)*Pr(x)"NB: E is a function of Li, Ri only"Optimization: !Minimize E(Li, Ri) subject to score Lin.Ineq.s"This is heuristic (“forward$ % Viterbi$ % filter$”)"But still rigorous because “subject to score Lin.Ineq.s”"Under 0th-order background model 49Calculating E(Li, Ri)"E(Li, Ri) = !x Forward-score(x)*Pr(x)"Forward-like: for every state, calculate expected score for all paths ending there; easily calculated from expected scores of predecessors & transition/emission probabilities/scores"50Minimizing E(Li, Ri)"Calculate E(Li, Ri) symbolically, in terms of emission scores, so we can do partial derivatives for numerical convex optimization algorithm"! "E (L1, L2, ...)"LiForward: Viterbi: 51“Convex” Optimization"Convex: !local max = global max;"simple “hill climbing” works"Nonconvex:!can be many local maxima, << global max;!“hill-climbing” fails"52Estimated Filtering Efficiency!(139 Rfam 4.0 families)"Filtering fraction"# families


View Full Document

UW CSEP 590 - Lecture Notes

Documents in this Course
Sequitur

Sequitur

56 pages

Sequitur

Sequitur

56 pages

Protocols

Protocols

106 pages

Spyware

Spyware

31 pages

Sequitur

Sequitur

10 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?