DOC PREVIEW
Stanford CS 262 - Rapid Global Alignments

This preview shows page 1-2-3-21-22-23-43-44-45 out of 45 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Rapid Global AlignmentsSlide 2The Problem: Find a Chain of Local AlignmentsSparse DP for rectangle chainingSlide 5Slide 6ExampleTime AnalysisPutting it All Together: Fast Global Alignment AlgorithmsLAGAN: Pairwise AlignmentLAGANLAGAN: recursive callA trick to save on memoryMultiple Sequence AlignmentsSlide 15Slide 16OverviewDefinitionScoring FunctionSlide 20Scoring Function: Sum Of PairsSum Of Pairs (cont’d)Slide 23ConsensusSlide 251. Multidimensional Dynamic ProgrammingSlide 27Slide 282. Progressive AlignmentSlide 30CLUSTALW: progressive alignmentCLUSTALW & the CINEMA viewerMLAGAN: progressive alignment of DNAMLAGAN: main stepsMLAGAN: multi-anchoringHeuristics to improve multiple alignmentsIterative RefinementSlide 38Slide 39Slide 40Slide 41Restricted MDPSlide 43Slide 44Optional refinement steps in MLAGANRapid Global AlignmentsHow to align genomic sequences in (more or less) linear timeMethods to CHAIN Local AlignmentsSparse Dynamic ProgrammingO(N log N)The Problem: Find a Chain of Local Alignments(x,y)  (x’,y’)requiresx < x’y < y’Each local alignment has a weightFIND the chain with highest total weightSparse DP for rectangle chaining•1,…, N: rectangles•(hj, lj): y-coordinates of rectangle j•w(j): weight of rectangle j•V(j): optimal score of chain ending in j•L: list of triplets (lj, V(j), j)L is sorted by ljL is implemented as a balanced binary treeyhlSparse DP for rectangle chainingMain idea: •Sweep through x-coordinates•To the right of b, anything chainable to a is chainable to b•Therefore, if V(b) > V(a), rectangle a is “useless” – remove it•In L, keep rectangles j sorted with increasing lj-coordinates  sorted with increasing V(j)V(b)V(a)Sparse DP for rectangle chainingGo through rectangle x-coordinates, from left to right:1. When on the leftmost end of rectangle i, compute V(i)a. j: rectangle in L, with largest lj < hib. V(i) = w(i) + V(j)2. When on the rightmost end of i, possibly store V(i) in L:a. j: rectangle in L, with largest lj  lib. If V(i) > V(j):i. INSERT (li, V(i), i) in Lii. REMOVE all (lk, V(k), k) with V(k)  V(i) & lk  liijExamplexy1: 53: 32: 64: 45: 22569101112141516Time Analysis1. Sorting the x-coords takes O(N log N)2. Going through x-coords: N steps3. Each of N steps requires O(log N) time:•Searching L takes log N•Inserting to L takes log N•All deletions are consecutive, so log N per deletion•Each element is deleted at most once: N log N for all deletions•Recall that INSERT, DELETE, SUCCESSOR, take O(log N) time in a balanced binary search treePutting it All Together:Fast Global Alignment Algorithms1. FIND local alignments2. CHAIN local alignments FIND CHAINGLASS: k-mers hierarchical DPMumMer: Suffix Tree sparse DPAvid: Suffix Tree hierarchical DPLAGAN CHAOS sparse DPLAGAN: Pairwise Alignment1. FIND local alignments2. CHAIN local alignments3. DP restricted around chainLAGAN1. Find local alignments2. Chain -O(NlogN) L.I.S.3. Restricted DPLAGAN: recursive call•What if a box is too large?Recursive application of LAGAN,more sensitive word searchA trick to save on memory“necks” have tiny tracebacks…only store tracebacksMultiple Sequence Multiple Sequence AlignmentsAlignmentsOverview•Definition•Scoring Schemes•AlgorithmsDefinition•Given N sequences x1, x2,…, xN:Insert gaps (-) in each sequence xi, such that•All sequences have the same length L•Score of the global map is maximum•A faint similarity between two sequences becomes significant if present in many•Multiple alignments can help improve the pairwise alignmentsScoring Function•Ideally:Find alignment that maximizes probability that sequences evolved from common ancestor, according to some phylogenetic model•More on phylogenetic models laterxyzwv?Scoring Function•A comprehensive model would have too many parameters, too inefficient to optimize•Possible simplificationsIgnore phylogenetic treeStatistically independent columns:S(m) = G(m) + i S(mi) m: alignment matrixG: function penalizing gapsScoring Function: Sum Of Pairs Definition: Induced pairwise alignmentA pairwise alignment induced by the multiple alignmentExample: x: AC-GCGG-C y: AC-GC-GAG z: GCCGC-GAGInduces:x: ACGCGG-C; x: AC-GCGG-C; y: AC-GCGAGy: ACGC-GAC; z: GCCGC-GAG; z: GCCGCGAGSum Of Pairs (cont’d)•The sum-of-pairs score of an alignment is the sum of the scores of all induced pairwise alignmentsS(m) = k<l s(mk, ml)s(mk, ml): score of induced alignment (k,l)Sum Of Pairs (cont’d)•Heuristic way to incorporate evolution tree:HumanMouseChicken•Weighted SOP:S(m) = k<l wkl s(mk, ml)wkl: weight decreasing with distanceDuckConsensus-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGACCAG-CTATCAC--GACCGC----TCGATTTGCTCGACCAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC•Find optimal consensus string m* to maximizeS(m) = i s(m*, mi)s(mk, ml): score of pairwise alignment (k,l)Multiple Sequence AlignmentsAlgorithms1. Multidimensional Dynamic ProgrammingGeneralization of Needleman-Wunsh:S(m) = i S(mi)(sum of column scores)F(i1,i2,…,iN) = max(all neighbors of cube)(F(nbr)+S(nbr))•Example: in 3D (three sequences):•7 neighbors/cellF(i,j,k) = max{ F(i-1,j-1,k-1)+S(xi, xj, xk),F(i-1,j-1,k )+S(xi, xj, - ),F(i-1,j ,k-1)+S(xi, -, xk),F(i-1,j ,k )+S(xi, -, - ),F(i ,j-1,k-1)+S( -, xj, xk),F(i ,j-1,k )+S( -, xj, xk),F(i ,j ,k-1)+S( -, -, xk) } 1. Multidimensional Dynamic ProgrammingRunning Time:1. Size of matrix: LN;Where L = length of each sequence N = number of sequences2. Neighbors/cell: 2N – 1Therefore………………………… O(2N LN)1. Multidimensional Dynamic Programming2. Progressive Alignment•Multiple Alignment is NP-complete•Most used heuristic: Progressive AlignmentAlgorithm:1. Align two of the sequences xi, xj2. Fix that alignment3. Align a third sequence xk to the alignment xi,xj4. Repeat until all sequences are alignedRunning Time: O( N L2 )2. Progressive Alignment•When evolutionary tree is known:Align closest first, in the order of the treeExample:Order of alignments: 1. (x,y)2. (z,w)3. (xy, zw)xwyzCLUSTALW: progressive alignmentCLUSTALW: most popular multiple protein alignmentAlgorithm:1. Find all dij: alignment dist (xi, xj)2. Construct a tree(Neighbor-joining hierarchical clustering)3. Align nodes in order of decreasing similarity+ a large number of heuristicsCLUSTALW & the CINEMA viewerMLAGAN: progressive alignment of DNAGiven


View Full Document

Stanford CS 262 - Rapid Global Alignments

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Rapid Global Alignments
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Rapid Global Alignments and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Rapid Global Alignments 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?