DOC PREVIEW
Stanford CS 262 - Rapid Global Alignments

This preview shows page 1-2-3-20-21-22-41-42-43 out of 43 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Rapid Global AlignmentsSlide 2Main IdeaSaving cells in DPSlide 5The Problem: Find a Chain of Local AlignmentsSparse Dynamic Programming – L.I.S.Sparse LCS expressed as LISSlide 9Sparse Dynamic Programming for LISSparse DP for rectangle chainingSlide 12Slide 13ExampleTime AnalysisExamplesWhole-genome alignment Rat—Mouse—HumanSlide 18Protein ClassificationPDB GrowthProtein classificationProtein structure classificationStructure Classification DatabasesSlide 24PSI-BLASTProfiles & Profile HMMsSlide 27Profile HMMsSlide 29Slide 30Alignment of a protein to a profile HMMHow to build a profile HMMResources on the webClassification with Profile HMMsSlide 35Generative ModelsSlide 37Slide 38Slide 39Discriminative Models -- SVMk-mer based SVMs for protein classificationSVMs will find a few support vectorsBenchmarksCS262 Lecture 15, Win06, BatzoglouRapid Global AlignmentsHow to align genomic sequences in (more or less) linear timeCS262 Lecture 15, Win06, BatzoglouCS262 Lecture 15, Win06, BatzoglouMain IdeaGenomic regions of interest contain islands of similarity, such as genes1. Find local alignments2. Chain an optimal subset of them3. Refine/complete the alignmentSystems that use this idea to various degrees:MUMmer, GLASS, DIALIGN, CHAOS, AVID, LAGAN, TBA, & othersCS262 Lecture 15, Win06, BatzoglouSaving cells in DP1. Find local alignments2. Chain -O(NlogN) L.I.S.3. Restricted DPCS262 Lecture 15, Win06, BatzoglouMethods to CHAIN Local AlignmentsSparse Dynamic ProgrammingO(N log N)CS262 Lecture 15, Win06, BatzoglouThe Problem: Find a Chain of Local Alignments(x,y)  (x’,y’)requiresx < x’y < y’Each local alignment has a weightFIND the chain with highest total weightCS262 Lecture 15, Win06, BatzoglouSparse Dynamic Programming – L.I.S.Let input be w: w1,…, wnINITIALIZATION:L: 1-indexed array, L[1]  w1B: 0-indexed array of backpointers; B[0] = 0P: array used for traceback// L[j]: smallest last element wi of j-long LIS seen so farALGORITHMfor i = 2 to n { Find j such that L[j] < w[i] ≤ L[j+1] L[j+1]  w[i]B[j+1]  iP[i]  B[j]}That’s it!!!•Running time?CS262 Lecture 15, Win06, BatzoglouSparse LCS expressed as LISCreate a sequence w•Every matching point (i, j), is inserted into w as follows:•For each column j = 1…m, insert in w the points (i, j), in decreasing row i order•The 11 example points are inserted in the order given•a = (y, x), b = (y’, x’) can be chained iffa is before b in w, andy < y’15 3 24 16 20 4 24 3 11 18642 71 8109511342024311151141820xyCS262 Lecture 15, Win06, BatzoglouSparse LCS expressed as LISCreate a sequence ww = (4,2) (3,3) (10,5) (2,5) (8,6) (1,6) (3,7) (4,8) (7,9) (5,9) (9,10)Consider now w’s elements as ordered lexicographically, where •(y, x) < (y’, x’) if y < y’Claim: An increasing subsequence of w is a common subsequence of x and y15 3 24 16 20 4 24 3 11 18642 71 8109511342024311151141820xyCS262 Lecture 15, Win06, BatzoglouSparse Dynamic Programming for LISExample:w = (4,2) (3,3) (10,5) (2,5) (8,6) (1,6) (3,7) (4,8) (7,9) (5,9) (9,10) L =1. (4,2)2. (3,3)3. (3,3) (10,5)4. (2,5) (10,5)5. (2,5) (8,6)6. (1,6) (8,6)7. (1,6) (3,7)8. (1,6) (3,7) (4,8)9. (1,6) (3,7) (4,8) (7,9)10. (1,6) (3,7) (4,8) (5,9)11. (1,6) (3,7) (4,8) (5,9) (9,10)Longest common subsequence:s = 4, 24, 3, 11, 1815 3 24 16 20 4 24 3 11 18642 71 8109511342024311151141820xyCS262 Lecture 15, Win06, BatzoglouSparse DP for rectangle chaining•1,…, N: rectangles•(hj, lj): y-coordinates of rectangle j•w(j): weight of rectangle j•V(j): optimal score of chain ending in j•L: list of triplets (lj, V(j), j)L is sorted by lj: smallest (top) to largest (bottom) valueL is implemented as a balanced binary treeyhlCS262 Lecture 15, Win06, BatzoglouSparse DP for rectangle chainingMain idea: •Sweep through x-coordinates•To the right of b, anything chainable to a is chainable to b•Therefore, if V(b) > V(a), rectangle a is “useless” – remove it•In L, keep rectangles j sorted with increasing lj-coordinates  sorted with increasing V(j) scoreV(b)V(a)CS262 Lecture 15, Win06, BatzoglouSparse DP for rectangle chainingGo through rectangle x-coordinates, from lowest to highest:1. When on the leftmost end of rectangle i:a. j: rectangle in L, with largest lj < hib. V(i) = w(i) + V(j)2. When on the rightmost end of i:a. k: rectangle in L, with largest lk  lib. If V(i)  V(k):i. INSERT (li, V(i), i) in Lii. REMOVE all (lj, V(j), j) with V(j)  V(i) & lj  liijkCS262 Lecture 15, Win06, BatzoglouExamplexy1: 53: 32: 64: 45: 22569101112141516CS262 Lecture 15, Win06, BatzoglouTime Analysis1. Sorting the x-coords takes O(N log N)2. Going through x-coords: N steps3. Each of N steps requires O(log N) time:•Searching L takes log N•Inserting to L takes log N•All deletions are consecutive, so log N per deletion•Each element is deleted at most once: N log N for all deletions•Recall that INSERT, DELETE, SUCCESSOR, take O(log N) time in a balanced binary search treeCS262 Lecture 15, Win06, BatzoglouExamplesHuman Genome BrowserABCCS262 Lecture 15, Win06, BatzoglouWhole-genome alignment Rat—Mouse—HumanCS262 Lecture 15, Win06, BatzoglouNext 2 years: 20+ mammals, & many other animals, will be sequenced & alignedCS262 Lecture 15, Win06, BatzoglouProtein ClassificationCS262 Lecture 15, Win06, BatzoglouPDB GrowthNew PDB structuresCS262 Lecture 15, Win06, BatzoglouProtein classification•Number of protein sequences grow exponentially•Number of solved structures grow exponentially•Number of new folds identified very small (and close to constant)•Protein classification canGenerate overview of structure typesDetect similarities (evolutionary relationships) between protein sequencesMorten Nielsen,CBS, BioCentrum, DTUSCOP release 1.67, Class # folds # superfamilies # familiesAll alpha proteins 202 342 550All beta proteins 141 280 529Alpha and beta proteins (a/b) 130 213 593Alpha and beta proteins (a+b) 260 386 650Multi-domain proteins 40 40 55Membrane & cell surface 42 82 91Small proteins 72 104 162Total 887 1447 2630CS262 Lecture 15, Win06, BatzoglouProtein worldProtein foldProtein structure classificationProtein superfamilyProtein familyMorten Nielsen,CBS, BioCentrum, DTUCS262 Lecture 15, Win06, BatzoglouStructure Classification Databases•SCOPManual classification (A. Murzin)scop.berkeley.edu•CATHSemi manual classification (C. Orengo)www.biochem.ucl.ac.uk/bsm/cath •FSSPAutomatic classification


View Full Document

Stanford CS 262 - Rapid Global Alignments

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Rapid Global Alignments
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Rapid Global Alignments and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Rapid Global Alignments 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?