Stanford CS 374 - Lecture 11 - RNA finding

RNA finding
CS374 2008 Spring
Riku Inoue

What Does It Look Like?

Source: http://en.wikipedia.org/wiki/TRNA

It’s a Non-coding RNA!

Noncoding RNA (ncRNA)
- An RNA molecule that is not translated into a protein but is a functional molecule by itself
- E.g. transfer RNA (tRNA), ribosomal RNA (rRNA), riboswitches
- Catalytic and regulatory functions
- Recent studies suggest that there are many undiscovered ncRNAs
Source: Gisela Storz (2002) An Expanding Universe of Noncoding RNAs. Science 296: 1260-1263.

Primary, Secondary and Tertiary Structures of ncRNA

Figure panels: Primary, Secondary, Tertiary.
Source: http://en.wikipedia.org/wiki/TRNA

Different Approaches to Find ncRNA

- De novo prediction: look for signals that suggest a functional RNA in the molecule
- Comparative method: compare with known ncRNAs to detect RNA sequences that are likely to be ncRNAs

Issues with the Preceding Methods

Approach           | Method                                      | Issue
-------------------+---------------------------------------------+----------------------------------------------------------
De novo prediction | Secondary structure based                   | The secondary structure signal is too weak
De novo prediction | Transcription start and secondary structure | Signals are still too weak
Comparative method | Scan conserved regions, e.g. QRNA           | ncRNA sequences tend to diverge
Comparative method | Covariance model, e.g. ERPIN, RSEARCH       | Good performance, but needs to align sequences; too slow

Apply Fast Filter Operation to Speed It Up!

- Use rough but efficient filters to filter out most of the sequences unlikely to be ncRNAs
- Then apply the expensive but accurate alignment process to the small number of remaining sequences
Fast & Accurate!

The Problem This Approach Solves

Given an RNA sequence with known secondary structure, efficiently compute all structural homologs (computed as a function of both sequence and structural similarity) in a genomic database.
Figure labels: Database, Query Sequence, Homologs.

Specifically, the authors chose riboswitch discovery as a target problem, since riboswitches
- Play important roles in the regulation of metabolite synthesis
- Are very diverse and relatively difficult to filter
Source: http://www.yale.edu/breaker/riboswitch.htm

Methods Proposed by Zhang et al.

- Secondary structure based filter (FastR)
  Zhang et al. Searching Genomes for Noncoding RNA Using FastR. IEEE/ACM Transactions on Computational Biology and Bioinformatics. Vol. 2, no. 4, 2005.
- Multiple keyword filter (Chain filter)
  Zhang et al. A sequence-based filtering method for ncRNA identification and its application to searching for riboswitch elements. Bioinformatics. Vol. 22, no. 14, 2006.

Structural Filter + Seq. Alignment = FastR

- Input: query sequence and gene database
- Step 1, Filtering: produces the filtered sequences
- Step 2, Alignment: produces the homologs
- Output: homologs
Running time: T1 << T2

Filters

- Sequence-based filters: compare the sequences themselves to get matches
- Structure-based filters: compare secondary structures to get matches

Secondary Structure of RNA

Stacks are the basis of the filter.
Source: Zhang et al.

Stacks

Stack: nucleotide strings that can form an energetically favorable pair.
Figure: a (k, w)-stack.
Source: Zhang et al.

Nested stack: a (k, w, l)-nested_stack is a collection of l (k, w)-stacks s_1, s_2, …, s_l such that s_{i+1} is nested in s_i.
Figure: a (k, w, 4)-nested_stack.
Source: Zhang et al.

Parallel stack: a (k, w, l)-parallel_stack is a collection of l (k, w)-stacks s_1, s_2, …, s_l such that no s_i overlaps another.
Figure: a (k, w, 4)-parallel_stack.
Source: Zhang et al.

Designing Filters

- Basis: stacks
- Extension: distance constraints (# of BP)
  For a filter with l stacks, define a 2l-dimensional vector w containing the allowed ranges between stacks
Fig: (k, w, 4)-multiloop stack for tRNA with distance constraints, with w = [(50,75),(3,7),(3,15),(0,7),(5,15),(0,7),(5,15),(3,7)]
Source: Zhang et al.

Filtering Algorithms

1. Hash
   Description: build a hash table to compute all k-mer positions in the database.
   Time complexity: O(m), where m is the size of the DB.
2. Identify (k,w)-stacks
   Description: for every s_i (the k-mer at position i), compute a neighborhood N(s_i) of all "complementary" k-mers. s_i and a neighbor s_j ∈ N(s_i) form a (k,w)-stack if (j - i) satisfies the distance constraint.
   Time complexity: O(n), where n is the number of (k,w)-stacks in the DB.
3. Filters
   Description: scan the DB for (k,w)-stacks with a moving window of size w. Maintain an active list of (k,w)-stacks in the window and match it with the filter.
   Time complexity: O(m_k w), where m_k is the number of (k,w)-stacks. Typically, m_k < m/w.

Filtering in Action

Figure labels: Query Sequence, Filter, Database Sequence, Scan Window (length w). Successive frames show "No match!", "No match!", then "Hit!".

What is a “Good” Filter?

- Efficient: (the time to filter) < (the time to align and score the filtered hits)
- High sensitivity: (# of ncRNA admitted by the filter) / (# of ncRNA input to the filter) should be as close to 1 as possible, i.e. fewer false negatives
- High specificity: (expected # of hits per random base-pair) should be as small as possible, i.e. fewer false positives

Equation for Specificity

The bound for specificity is:
[Equation not preserved in the extracted slide text.]
To get the optimal filter:
- Maximize kl
- Keep k as low as possible
- Also maintain sensitivity

Finding an Optimal Filter

1. Employ a dynamic programming algorithm to automatically generate filters with high specificity
   - Iterates over every value of k, l
   - Checks if a (k, l)-nested filter is possible
   - Chooses the one that maximizes kl while keeping k low
2. Users tweak the filter parameters to get the desired sensitivity using the software
   - However, the test results show that the automatically generated filters have sensitivity comparable to manually tuned filters

What is an Alignment Algorithm?

Example sequences from the slide figure: AGTGCCCTGGAACCCTGACGGTGGGTCACAAAAC TTCTGGAAGTGACCTGGGAAGACCCTGACCCTGGGTCACAA AACTC
Method: dynamic programming.
Source: CS262 Winter 2008

Sequence Structure Alignment

- Align a plain sequence to a secondary structure
- Based on Bafna et al.
- Extension: use the extra information from the filter match (next page)

Exploiting the structure of filter matches
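
The first two steps of the Filtering Algorithms slide (hashing all k-mer positions, then pairing each k-mer with complementary k-mers at an allowed distance) can be sketched in Python as follows. This is a minimal illustration, not the FastR implementation. The function names are hypothetical, the "complementary" neighborhood is simplified to exact Watson-Crick reverse complements (no G-U wobble pairs), the input is assumed to use only the RNA letters A, C, G, U, and the distance constraint is assumed to be k <= j - i <= w.

```python
from collections import defaultdict

# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored (assumption).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(kmer):
    """Return the reverse complement of an RNA k-mer (letters A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def build_kmer_index(seq, k):
    """Step 1 (hash): map every k-mer to the ascending list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_stacks(seq, k, w):
    """Step 2: return (i, j) pairs whose k-mers are exact reverse complements.

    Assumed distance constraint: k <= j - i <= w, so the two arms
    do not overlap and start at most w positions apart.
    """
    index = build_kmer_index(seq, k)
    stacks = []
    for i in range(len(seq) - k + 1):
        partner = reverse_complement(seq[i:i + k])
        # .get avoids inserting empty entries into the defaultdict
        for j in index.get(partner, []):
            if k <= j - i <= w:
                stacks.append((i, j))
    return stacks
```

For example, `find_stacks("GGGAAAACCC", 3, 10)` returns `[(0, 7)]`: the GGG at position 0 pairs with the CCC at position 7. Shrinking the window to `w = 5` returns an empty list because the two arms are 7 positions apart.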

