UCSD CSE 280B - ncRNA Detection

ncRNA detection with multiple alignments
March 2006, Vineet Bafna

Comparative detection of ncRNA
Given a pairwise alignment, QRNA decides whether it is RNA, coding, or other. The key to detecting RNA is covarying mutations. A multiple alignment should provide more information on covarying mutations.

RNAz
RNAz computes the probability of ncRNA in a multiple alignment. It computes two novel statistics: the minimum free energy of the sequences (MFE) and the conserved secondary structure (SCI). An SVM is trained using the following features: MFE, SCI, mean pairwise identity, and the number of sequences in the input.

SCI
Apply minimum-energy folding to a multiple alignment. The score of a pair of columns depends on base pairing as well as compensatory mutations. Let $E_A$ denote the consensus fold energy and $\bar{E}$ the average MFE of all sequences. Then $\mathrm{SCI} = E_A / \bar{E}$.
Claim: low SCI is bad, high is good.
Q: What is the SCI for diverged random sequences? What is the SCI for identical sequences?

MFE
Compute a z-score for a sequence with MFE $m$: $z = (m - \mu) / \sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the MFE over shuffled versions of the sequence. Instead of computing $\mu$ and $\sigma$ by shuffling and refolding (slow), use regression to predict them from sequence length and base composition.

Non-linear classification
The z statistic and SCI capture different properties.
[Figure: z statistic versus SCI; green is good (native), red is bad (shuffled).]
Is SCI a good statistic given different levels of sequence identity?

Using RNAz to predict ncRNA
Applying RNAz to conserved regions results in the discovery of 30k putative RNAs. Is this list complete? Is it valid?

Structural Alignment
Sequences: X07545 M21086 X05870 U05019 M16530 X01588 AF034619 L27170 X05532
GC ACCCGGC CAUA GUGGCCG GGCAA CAC CCGG U C UCGUU ACCCGGC CAUA GCGGCCG GGCAA CAC CCGG A C UCAUG ACCCGGC CACA GUGAGCG GGCAA CAC CCGG A C UCAUU ACCCGGU CAUA GUGAGCG GGUAA CAC CCGG A C UCGUU ACCCGGC AAUA GGCGCCGGUGCUA CGC CCGG U C UCUUC ACCCGGU CACA GUGAGCG GGCAA CAC CCGG A C UCAUU GGCGGC CACA GCGGUGG GGUUGCCUC CCGU A C CCAUC AGUGGUGGC CAUA UCGGCGG GGUUC CUCCCCGU A C CCAUC AGGAACGGC CAUA CCACGUC GAUCG CAC CACA U C CCGUC
Conserved sequences and conserved structure are more apparent in multiple alignments.

RNA multiple alignments
Detection of RNA depends on reliable prediction of covarying mutations as well as regions of conserved sequence. Precomputing multiple alignments based on sequence considerations alone is probably not sufficient (this should be tested). How can structural alignments be computed?

Computing Structural Alignments
[Figure: aligned sequence columns annotated with position-specific base probabilities such as Pr(G).]
Analogy: in sequence alignment, the score for aligning a column is position independent. In profiles or HMMs, position-specific scoring is used to distinguish conserved positions from non-conserved positions. Similar ideas can be used for RNA.

Covariance models: RNA profiles
Terminal symbols correspond to columns.
[Figure: a grammar with start symbol S and nonterminals W1, W2, W3, W4, drawn over alignment columns of A and U.]

Aligning a sequence to a covariance model
We align the sequence to each node of the covariance model (it is tree-like, but may be a graph). The alignment score follows the same recurrence as in Lecture 7, but with position-specific probabilities. Example:
$A(W_i, i, j) = \log \Pr(W_i \to s_i \, W_j \, s_j) + A(W_j, i+1, j-1)$
If we wish to compute the probability that a sequence belongs to a family, we compute the total likelihood (the sum over all parses). If we wish to compute the structure of an unknown sequence by comparison to a covariance model, we compute the maximum-likelihood parse in this graph.

Covariance models and ncRNA discovery
Given a family of ncRNA sequences, scan a genomic sequence with a covariance model and retrieve all high-scoring subsequences. This is the most common method, but it is expensive. Assume the covariance model has m states, the substring has at most n symbols, and the database has L symbols.
Alignment cost: $O(n^2 m_1 + n^3 m_2)$
Total time:

Computing covariance models
If we are given a CM, a multiple structural alignment is easy: align each sequence to the CM in turn. If we are given a multiple alignment, computing the covariance model is easy. For simultaneous prediction, a Bayesian iterative approach is used: compute a seed alignment, use the alignment to compute a CM, use the CM to compute a new alignment, and iterate.

Open
Compute a structural multiple alignment. Existing methods do not work well without a good seed alignment and require excessive hand curation. Here we solve a simpler problem: predict conserved structure in unaligned sequences.

Motivation for a new approach
ACCUU
AAGGA
$p = (1/4)^5 \approx 0.001$
Base pairs appear in clusters, which we call stacks; stacking is energetically favorable. Most of the stability of an RNA secondary structure is determined by stacks.

Statistics of the stacks in the Rfam database
Most base pairs are stacked.
[Figure: fraction of true stacks missed (0 to 1) versus length of stacks (1 to 10).]

Using stacks as anchors for predictions
The idea of anchors as constraints has been used in multiple genomic sequence alignment: MAVID (Bray and Pachter, 2004) and TBA (Blanchette et al., 2004). Several heuristic methods have been developed that find anchored stacks. Waterman (1989) used a statistical approach to choose conserved stacks within fixed-size windows. Ji and Stormo (2004) and Perriquet et al. (2003) use primary sequence conservation of the stacks and the length of loop regions to reduce the search space.

Stack anchors have low sequence similarity, so it is hard to find correct anchors.
Problem: selecting one stack at a time may match the wrong stacks.

A global approach: configuration of stacks
RNA secondary structure can be viewed as stacks plus unpaired loops (no individual base pairs). The energy of the structure is the sum of the energies of stacks and loops.
Stack configurations: nested stacks, parallel stacks, crossing stacks (pseudo-knots).
More generalized stacks can include mismatches in the stacks.

RNA Stack-based Consensus Folding (RNAscf) problem
Find conserved stack configurations for a set of unaligned RNA sequences. Optimize both stability (free energy of the structure) and sequence similarity, computed using these common stacks as anchors.

RNA stack
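The two RNAz statistics defined in the SCI and MFE slides, and the chance-pairing probability from the motivation slide, reduce to a few lines of arithmetic. The following is a minimal sketch in Python, not RNAz's implementation: the function names are hypothetical, the energies are assumed to come from an external folding program that is not called here, and the z-score estimates the mean and standard deviation directly from a list of shuffled-sequence MFEs (the slow route that the slides say RNAz replaces with regression).

```python
from statistics import mean, stdev


def sci(consensus_energy, single_mfes):
    """SCI = E_A / mean(MFE_i): consensus fold energy over the average
    single-sequence minimum free energy (hypothetical helper)."""
    avg = mean(single_mfes)
    if avg == 0:
        raise ValueError("average MFE is zero; SCI is undefined")
    return consensus_energy / avg


def mfe_z_score(m, shuffled_mfes):
    """z = (m - mu) / sigma, with mu and sigma estimated from the MFEs of
    shuffled sequences (assumption: sample standard deviation)."""
    mu = mean(shuffled_mfes)
    sigma = stdev(shuffled_mfes)
    return (m - mu) / sigma


def chance_stack_probability(k):
    """Probability that k positions match a given complement by chance,
    assuming four equally likely bases: (1/4)^k."""
    return 0.25 ** k
```

For example, `sci(-20.0, [-20.0, -20.0])` returns 1.0, the case where the consensus fold is exactly as stable as each individual fold, and `chance_stack_probability(5)` gives the value of roughly 0.001 quoted for the 5-base stack.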

