DOC PREVIEW
Stanford CS 262 - Rapid Global Alignments

This preview shows page 1-2-3-24-25-26-27-48-49-50 out of 50 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Rapid Global AlignmentsSlide 2Saving cells in DPSlide 4The Problem: Find a Chain of Local AlignmentsSparse Dynamic ProgrammingSparse Dynamic Programming – L.I.S.Sparse Dynamic Programming – L.I.S.Sparse LCS expressed as LISSlide 10Sparse Dynamic Programming for LISSparse DP for rectangle chainingSlide 13Slide 14ExampleTime AnalysisExamplesGene RecognitionGene structureSlide 20Slide 21Needles in a HaystackGene FindingSignals for Gene FindingSlide 25Exon and Intron LengthsNucleotide CompositionSlide 28Splice SitesHMMs for Gene RecognitionSlide 31Duration HMMs for Gene RecognitionGenscanUsing Comparative InformationUsing Comparative InformationPatterns of ConservationComparison-based Gene FindersTwinscanSLAM – Generalized Pair HMMNSCAN—Multiple Species Gene PredictionSlide 41Performance ComparisonCONTRASTSlide 44CONTRAST - FeaturesCONTRAST – SVM accuraciesCONTRAST - DecodingCONTRAST - TrainingSlide 49Slide 50CS262 Lecture 9, Win07, BatzoglouRapid Global AlignmentsHow to align genomic sequences in (more or less) linear timeCS262 Lecture 9, Win07, BatzoglouCS262 Lecture 9, Win07, BatzoglouSaving cells in DP1. Find local alignments2. Chain -O(NlogN) L.I.S.3. Restricted DPCS262 Lecture 9, Win07, BatzoglouMethods to CHAIN Local AlignmentsSparse Dynamic ProgrammingO(N log N)CS262 Lecture 9, Win07, BatzoglouThe Problem: Find a Chain of Local Alignments(x,y)  (x’,y’)requiresx < x’y < y’Each local alignment has a weightFIND the chain with highest total weightCS262 Lecture 9, Win07, BatzoglouSparse Dynamic Programming15 3 24 16 20 4 24 3 11 1842024311151141820•Imagine a situation where the number of hits is much smaller than O(nm) – maybe O(n) insteadCS262 Lecture 9, Win07, BatzoglouSparse Dynamic Programming – L.I.S. •Longest Increasing Subsequence•Given a sequence over an ordered alphabetx = x1, …, xm•Find a subsequences = s1, …, sks1 < s2 < … < skCS262 Lecture 9, Win07, BatzoglouSparse Dynamic Programming – L.I.S.Let input be w: w1,…, wnINITIALIZATION:L: last LIS elt. array L[0] = -inf L[1] = w1 L[2…n] = +infB: array holding LIS elts; B[0] = 0P: array of backpointers// L[j]: smallest jth element wi of j-long LIS seen so farALGORITHMfor i = 2 to n { Find j such that L[j – 1] < w[i] ≤ L[j] L[j]  w[i]B[j]  iP[i]  B[j – 1]}That’s it!!!•Running time?CS262 Lecture 9, Win07, BatzoglouSparse LCS expressed as LISCreate a sequence w•Every matching point (i, j), is inserted into w as follows:•For each column j = 1…m, insert in w the points (i, j), in decreasing row i order•The 11 example points are inserted in the order given•a = (y, x), b = (y’, x’) can be chained iffa is before b in w, andy < y’15 3 24 16 20 4 24 3 11 18642 71 8109511342024311151141820xyCS262 Lecture 9, Win07, BatzoglouSparse LCS expressed as LISCreate a sequence ww = (4,2) (3,3) (10,5) (2,5) (8,6) (1,6) (3,7) (4,8) (7,9) (5,9) (9,10)Consider now w’s elements as ordered lexicographically, where •(y, x) < (y’, x’) if y < y’Claim: An increasing subsequence of w is a common subsequence of x and y15 3 24 16 20 4 24 3 11 18642 71 8109511342024311151141820xyCS262 Lecture 9, Win07, BatzoglouSparse Dynamic Programming for LISExample:w = (4,2) (3,3) (10,5) (2,5) (8,6) (1,6) (3,7) (4,8) (7,9) (5,9) (9,10) L = [L1] [L2] [L3] [L4] [L5] … 1. (4,2)2. (3,3)3. (3,3) (10,5)4. (2,5) (10,5)5. (2,5) (8,6)6. (1,6) (8,6)7. (1,6) (3,7)8. (1,6) (3,7) (4,8)9. (1,6) (3,7) (4,8) (7,9)10. (1,6) (3,7) (4,8) (5,9)11. (1,6) (3,7) (4,8) (5,9) (9,10)Longest common subsequence:s = 4, 24, 3, 11, 1815 3 24 16 20 4 24 3 11 18642 71 8109511342024311151141820xyCS262 Lecture 9, Win07, BatzoglouSparse DP for rectangle chaining•1,…, N: rectangles•(hj, lj): y-coordinates of rectangle j•w(j): weight of rectangle j•V(j): optimal score of chain ending in j•L: list of triplets (lj, V(j), j)L is sorted by lj: smallest (North) to largest (South) valueL is implemented as a balanced binary treeyhlCS262 Lecture 9, Win07, BatzoglouSparse DP for rectangle chainingMain idea: •Sweep through x-coordinates•To the right of b, anything chainable to a is chainable to b•Therefore, if V(b) > V(a), rectangle a is “useless” for subsequent chaining•In L, keep rectangles j sorted with increasing lj-coordinates  sorted with increasing V(j) scoreV(b)V(a)CS262 Lecture 9, Win07, BatzoglouSparse DP for rectangle chainingGo through rectangle x-coordinates, from lowest to highest:1. When on the leftmost end of rectangle i:a. j: rectangle in L, with largest lj < hib. V(i) = w(i) + V(j)2. When on the rightmost end of i:a. k: rectangle in L, with largest lk  lib. If V(i) > V(k):i. INSERT (li, V(i), i) in Lii. REMOVE all (lj, V(j), j) with V(j)  V(i) & lj  liijkIs k ever removed?CS262 Lecture 9, Win07, BatzoglouExamplexya: 5c: 3b: 6d: 4e: 225691011121415161. When on the leftmost end of rectangle i:a. j: rectangle in L, with largest lj < hib. V(i) = w(i) + V(j)2. When on the rightmost end of i:a. k: rectangle in L, with largest lk  lib. If V(i) > V(k):i. INSERT (li, V(i), i) in Lii. REMOVE all (lj, V(j), j) with V(j)  V(i) & lj  lia b c d eV5LliV(i)i55a8118c11 12911b1512d1316133CS262 Lecture 9, Win07, BatzoglouTime Analysis1. Sorting the x-coords takes O(N log N)2. Going through x-coords: N steps3. Each of N steps requires O(log N) time:•Searching L takes log N•Inserting to L takes log N•All deletions are consecutive, so log N per deletion•Each element is deleted at most once: N log N for all deletions•Recall that INSERT, DELETE, SUCCESSOR, take O(log N) time in a balanced binary search treeCS262 Lecture 9, Win07, BatzoglouExamplesHuman Genome BrowserABCCS262 Lecture 9, Win07, BatzoglouGene RecognitionCS262 Lecture 9, Win07, BatzoglouGene structureexon1exon2 exon3intron1 intron2transcriptiontranslationsplicingexon = protein-codingintron = non-codingCodon:A triplet of nucleotides that is converted to one amino acidCS262 Lecture 9, Win07, BatzoglouWhere are the genes?Where are the genes?Where are the genes?Where are the genes?CS262 Lecture 9, Win07, BatzoglouCS262 Lecture 9, Win07, BatzoglouNeedles in a HaystackCS262 Lecture 9, Win07, Batzoglou•Classes of Gene predictorsAb initio•Only look at the genomic DNA of target genomeDe novo•Target genome + aligned informant genome(s)EST/cDNA-based & combined approaches•Use aligned ESTs or cDNAs + any other kind of evidenceGene FindingEXON EXON EXON EXON


View Full Document

Stanford CS 262 - Rapid Global Alignments

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Rapid Global Alignments
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Rapid Global Alignments and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Rapid Global Alignments 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?