U of I CS 466 - Sequence alignment

This preview shows page 1-2-3-21-22-23-43-44-45 out of 45 pages.



Dynamic Programming: Sequence alignment
CS 466
Saurabh Sinha

DNA Sequence Comparison: First Success Story
• Finding sequence similarities with genes of known function is a common approach to infer a newly sequenced gene's function
• In 1984 Russell Doolittle and colleagues found similarities between a cancer-causing gene and the normal growth factor (PDGF) gene
• A normal growth gene switched on at the wrong time causes cancer!

Cystic Fibrosis
• Cystic fibrosis (CF) is a chronic and frequently fatal genetic disease of the body's mucus glands. CF primarily affects the respiratory system in children.
• The search for the CF gene was narrowed to ~1 Mbp, and the region was sequenced.
• A database was scanned for matches to known genes. A segment in this region matched the gene for some ATP binding protein(s). These proteins are part of the ion transport channel, and CF involves sweat secretions with abnormal sodium content!

Role for Bioinformatics
• Gene similarities between two genes with known and unknown function alert biologists to some possibilities
• Computing a similarity score between two genes tells how likely it is that they have similar functions
• Dynamic programming is a technique for revealing similarities between genes

Motivating Dynamic Programming

Dynamic programming example: Manhattan Tourist Problem
Imagine seeking a path (from source to sink) to travel (only eastward and southward) with the largest number of attractions (*) in the Manhattan grid
[Figure: Manhattan grid with attractions (*), source and sink]

Manhattan Tourist Problem: Formulation
Goal: Find a "most weighted" path in a weighted grid.
Input: A weighted grid G with two distinct vertices, one labeled "source" and the other labeled "sink"
Output: A "most weighted" path in G from "source" to "sink"

MTP: An Example
[Figure: weighted grid with i coordinate and j coordinate axes, source and sink; edge weights and vertex scores not recoverable]

MTP: Greedy Algorithm Is Not Optimal
[Figure: weighted grid from source to sink; the greedy path makes a promising start, but leads to bad choices!]

MTP: Simple Recursive Program
MT(n, m)
    if n = 0 or m = 0
        return the score of the only path from the source to (n, m), which runs along the border
    x ← MT(n-1, m) + length of the edge from (n-1, m) to (n, m)
    y ← MT(n, m-1) + length of the edge from (n, m-1) to (n, m)
    return max{x, y}
What's wrong with this approach?

Here's what's wrong
• MT(n, m) needs MT(n, m-1) and MT(n-1, m)
• Both of these need MT(n-1, m-1)
• So MT(n-1, m-1) will be computed at least twice
• Dynamic programming: the same idea as this recursive algorithm, but keep all intermediate results in a table and reuse them

MTP: Dynamic Programming
• Calculate the optimal path score for each vertex in the graph
• Each vertex's score is the maximum, over its predecessor vertices, of the predecessor's score plus the weight of the connecting edge
[Figure: first step, $S_{1,0} = 5$, $S_{0,1} = 1$]

MTP: Dynamic Programming (cont'd)
[Figures: the score table is filled in one diagonal at a time; the edge weights are not recoverable]
• $S_{2,0} = 8$, $S_{1,1} = 4$, $S_{0,2} = 3$
• $S_{3,0} = 8$, $S_{2,1} = 9$, $S_{1,2} = 13$, $S_{0,3} = 8$
• $S_{3,1} = 9$, $S_{2,2} = 12$, $S_{1,3} = 8$ (greedy alg. fails!)
• $S_{3,2} = 9$, $S_{2,3} = 15$
• $S_{3,3} = 16$ (showing all back-traces). Done!

MTP: Recurrence
Computing the score for a point (i, j) by the recurrence relation:

$$
s_{i,j} = \max
\begin{cases}
s_{i-1,j} + \text{weight of the edge between } (i-1,j) \text{ and } (i,j) \\
s_{i,j-1} + \text{weight of the edge between } (i,j-1) \text{ and } (i,j)
\end{cases}
$$

The running time is proportional to n × m for an n by m grid (n = # of rows, m = # of columns)

Traveling in the Grid
• By the time the vertex x is analyzed, the values $s_y$ for all its predecessors y should be computed; otherwise we are in trouble.
• We need to traverse the vertices in some order
• For a grid, we can traverse vertices row by row, column by column, or diagonal by diagonal

Traversing the Manhattan Grid
• 3 different strategies:
– a) Column by column
– b) Row by row
– c) Along diagonals
[Figure: panels a), b), c) showing the three traversal orders]

Alignment

Aligning DNA Sequences
V = ATCTGATG
W = TGCATAC
n = 8
m = 7
[Figure: an alignment of V and W with its columns labeled match, mismatch, insertion and deletion (insertions and deletions together are indels); counts as extracted: 4, 1, 2, 2 for matches, mismatches, deletions, insertions]
Alignment: 2 × k matrix (k ≥ m, n)

Longest Common Subsequence (LCS) – Alignment without Mismatches
• Given two sequences $v = v_1 v_2 \ldots v_m$ and $w = w_1 w_2 \ldots w_n$
• The LCS of v and w is a sequence of positions in
  v: $1 \le i_1 < i_2 < \ldots < i_t \le m$
  and a sequence of positions in
  w: $1 \le j_1 < j_2 < \ldots < j_t \le n$
  such that the $i_k$-th letter of v equals the $j_k$-th letter of w for every k, and t is maximal

LCS: Example
  i coords:       0  1  2  2  3  3  4  5  6  7  8
  elements of v:    A  T  -  C  -  T  G  A  T  C
  elements of w:    -  T  G  C  A  T  -  A  -  C
  j coords:       0  0  1  2  3  4  5  5  6  6  7
Matches (shown in red on the slide):
  positions in v: 2 < 3 < 4 < 6 < 8
  positions in w: 1 < 3 < 5 < 6 < 7
Every common subsequence is a path in a 2-D grid:
(0,0) (1,0) (2,1) (2,2) (3,3) (3,4) (4,5) (5,5) (6,6) (7,6) (8,7)

Computing LCS
Let $v_i$ = prefix of v of length i: $v_1 \ldots v_i$
and $w_j$ = prefix of w of length j: $w_1 \ldots w_j$
The length of LCS($v_i$, $w_j$) is computed by:

$$
s_{i,j} = \max
\begin{cases}
s_{i-1,j} \\
s_{i,j-1} \\
s_{i-1,j-1} + 1 & \text{if } v_i = w_j
\end{cases}
$$

LCS Problem as Manhattan Tourist Problem
[Figure: grid with rows i = 0..7 labeled T G C A T A C and columns j = 0..8 labeled A T C T G A T C]

Edit Graph for LCS Problem
[Figure: the same grid with diagonal edges added]
• Every path is a common subsequence.
• Every diagonal edge adds an extra element to the common subsequence
• LCS Problem: Find a path with the maximum number of diagonal edges

Backtracking
• $s_{i,j}$ allows us to compute the length of the LCS for $v_i$ and $w_j$
• $s_{m,n}$ gives us the length of the LCS for v and w
• How do we print the actual LCS?
• At each step, we chose an optimal decision $s_{i,j}$ = max (…)
• Record which of the choices was made in order to obtain this max

Printing LCS: Backtracking
PrintLCS(b, v, i, j)
    if i = 0 or j = 0
        return
    if b_{i,j} = "↖"
        PrintLCS(b, v, i-1, j-1)
        print v_i
    else
        if b_{i,j} = "↑"
            PrintLCS(b, v, i-1, j)
        else
            PrintLCS(b, v, i, j-1)

From LCS to Alignment
• The Longest Common
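
The LCS recurrence and the backtracking idea described above can be combined in one short Python sketch. It assumes v indexes rows and w indexes columns, stores the backtracking choices as the strings "up", "left" and "diag" (stand-ins for the arrows in the pseudocode), and walks the pointers iteratively rather than recursively; the function name `lcs` is hypothetical.

```python
def lcs(v, w):
    """Return (length, one longest common subsequence) of strings v and w."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]   # s[i][j]: LCS length of prefixes
    b = [[""] * (m + 1) for _ in range(n + 1)]  # b[i][j]: which choice gave the max

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j], b[i][j] = s[i - 1][j], "up"
            if s[i][j - 1] > s[i][j]:
                s[i][j], b[i][j] = s[i][j - 1], "left"
            if v[i - 1] == w[j - 1] and s[i - 1][j - 1] + 1 > s[i][j]:
                s[i][j], b[i][j] = s[i - 1][j - 1] + 1, "diag"

    # Follow the recorded choices back from (n, m); diagonal moves are matches.
    letters = []
    i, j = n, m
    while i > 0 and j > 0:
        if b[i][j] == "diag":
            letters.append(v[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return s[n][m], "".join(reversed(letters))


length, subsequence = lcs("ATCTGATC", "TGCATAC")
print(length, subsequence)
```

For the example sequences ATCTGATC and TGCATAC the length is 5, matching the five matched positions listed in the LCS example. Several different subsequences of that length can exist, and which one is returned depends on how ties between the three choices are broken.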

