DOC PREVIEW
Stanford CS 262 - Rapid Global Alignments

This preview shows page 1-2-16-17-18-33-34 out of 34 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Rapid Global AlignmentsMotivationMain IdeaOutlineMethods to FIND Local AlignmentsFinding Local Alignments: Sorting k-long wordsSorting k-long words: exampleRunning timeSuffix TreesDefinition of a Suffix TreeConstructing a Suffix TreeNaïve Algorithm to Construct a Suffix TreeExample of Suffix Tree ConstructionFaster ConstructionMemory to Store Suffix TreeApplication: find all matches between x, ySlide 17Application: String search on a databaseApplication: common substrings of k stringsSlide 20The Problem: Find a Chain of Local AlignmentsQuadratic Time SolutionSlide 23Sparse Dynamic ProgrammingSlide 25Sparse Dynamic Programming – L.I.S.Sparse LCS expressed as LISSlide 28Sparse Dynamic Programming for LISSlide 30Sparse DP for rectangle chainingSlide 32ExampleTime AnalysisRapid Global AlignmentsHow to align genomic sequences in (more or less) linear timeMotivation•Genomic sequences are very long:Human genome = 3 x 109 –longMouse genome = 2.7 x 109 –long•Aligning genomic regions is useful for revealing common gene structureUseful to compare regions > 1,000,000-longMain IdeaGenomic regions of interest contain ordered islands of similarity, such as genes1. Find local alignments2. Chain an optimal subset of themOutline•Methods to FIND Local AlignmentsSorting k-long wordsSuffix Trees•Methods to CHAIN Local AlignmentsDynamic ProgrammingSparse Dynamic ProgrammingMethods to FIND Local Alignments1. Sorting K-long wordsBLAST, BLAT, and the like2. Suffix TreesFinding Local Alignments: Sorting k-long wordsGiven sequences x, y:1. Write down all (w, 0, i):w = xi+1…xi+k(z, 1, j): z = yj+1…yj+k2. Sort them lexicographically3. Deduce all k-long matches between x and y4. Extend to local alignmentsSorting k-long words: exampleLet x, y be matched with 3-long words:x = caggc: (cag,0,0), (agg,0,1), (ggc,0,2)y = ggcag: (ggc,1,0), (gca,1,1), (cag,1,2)Sorted: (agg,0,1),(cag,0,0),(cag,1,2),(ggc,0,2),(ggc,1,0),(gca,1,1)Matches:1. cag: x1x2x3 = y3y4y52. ggc: x3x4x5 = y1y2y3Running time•Worst case: O(NxM)•In practice: a large value of k results in a short list of matchesTradeoff: Low k: worse running timeHigh k: significant alignments missedPatternHunter:Sampling non-consecutive positions increases the likelihood to detect a conserved region, for a fixed value of k – refer to Lecture 3Suffix Trees•Suffix trees are a method to find all maximal matches between two strings (and much more)Example: x = dabdacd a b d a ccabdaccccadb142563Definition of a Suffix TreeDefinition:For string x = x1…xm, a suffix tree is:A rooted tree with m leavesLeaf i: xi…xmEach edge is a substringNo two edges out of a node, start with same letterIt follows, every substring corresponds to an initial part of a path from root to a leafConstructing a Suffix Tree•Naïve algorithm: O( N2 ) time•Better algorithms: O( N ) time(outside the scope of this class – too technical and not so interesting)Memory: O( N ) but with a significant constantNaïve Algorithm to Construct a Suffix Tree1. Initialize tree T: a single root node r2. Insert special symbol $ at end of x3. For j = 1 to m•Find longest match of xi…xm to T, starting from r•Split edge where match stops: new node w•Create edge (w, j), and label with unmatched portion of xi…xmExample of Suffix Tree Construction1x = d a b d a $d a b d a $1. Insert d a b d a $abda$22. Insert a b d a $$adb33. Insert b d a $$44. Insert d a $$55. Insert a $$66. Insert $Faster ConstructionSeveral algorithmsO( N ) time, O( N ) memory with a big constantTechnical but not deep, outside the scope of this courseOptional: Gusfield, chapter 6Memory to Store Suffix Tree•Can store in O( N ) memory!•Every edge is labeled with (i, j):(i,j) denotes xi…xj•Tree has O( N ) nodesProof:1. # leafs  # nodes – 12. # leafs = |x|Application: find all matches between x, y1. Build suffix tree for x, mark nodes with x2. Insert y in suffix tree, mark all nodes y “passes from” with yThe path label of every node marked both 0 and 1, is a common substring1x = d a b d a $y = a b a d a $d a b d a $1. Construct tree for xabda$2$adb3$4$5$6xxx6. Insert a $566. Insert $4. Insert a d a $da$35. Insert d a $y42. Insert a b a d a $ayda$1yyx3. Insert b a d a $ady2a$xExample of Suffix Tree constructionApplication: String search on a databaseSay we have a database D = { s1, s2, …sn }(e.g., proteins)Question: Given new string x, find all matches of x to database1. Build suffix tree for {s1,…, sn}2. All new queries x take O( |x| ) time (somewhat like BLAST)Application: common substrings of k stringsTo find the longest common substring of s1, s2, …sn 1. Build suffix tree for s1,…, sn2. All nodes labeled {si1, …, sik} represent a match between si1, …, sikMethods to CHAIN Local AlignmentsSparse Dynamic ProgrammingO(N log N)The Problem: Find a Chain of Local Alignments(x,y)  (x’,y’)requiresx < x’y < y’Each local alignment has a weightFIND the chain with highest total weightQuadratic Time Solution•Build Directed Acyclic Graph (DAG):Nodes: local alignments [(xa,xb)  (ya,yb)] & scoreDirected edges: local alignments that can be chained•edge ( (xa, xb, ya, yb) , (xc, xd, yc, yd) )•xa < xb < xc < xd•ya < yb < yc < ydEach local alignmentis a node vi with alignment score siQuadratic Time SolutionDynamic programming:Initialization:Find each node va s.t. there is no edge (u,v0)Set score of V(a) to be saIteration:For each vi, optimal path ending in vi has total score:V(i) = max ( weight(vj, vi) + V(j) )Termination:Optimal global chain: j = argmax ( V(j) ); trace chain from vjWorst case time: quadraticSparse Dynamic ProgrammingBack to the LCS problem:•Given two sequencesx = x1, …, xmy = y1, …, yn•Find the longest common subsequenceQuadratic solution with DP•How about when “hits” xi = yj are sparse?Sparse Dynamic Programming15 3 24 16 20 4 24 3 11 1842024311151141820•Imagine a situation where the number of hits is much smaller than O(nm) – maybe O(n) insteadSparse Dynamic Programming – L.I.S. •Longest Increasing Subsequence•Given a sequence over an ordered alphabetx = x1, …, xm•Find a subsequences = s1, …, sks1 < s2 < … < skSparse LCS expressed as LISCreate a sequence w•Every matching point x-to-y, (i, j), is inserted into a sequence as follows:•For each position j of x, from smallest to largest, insert in z the


View Full Document

Stanford CS 262 - Rapid Global Alignments

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Rapid Global Alignments
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Rapid Global Alignments and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Rapid Global Alignments 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?