10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Adam Smith Algorithm Design and Analysis LECTURE 18 Dynamic Programming •(Segmented LS recap) •Longest Common SubsequenceSegmented Least Squares Least squares. Foundational problem in statistic and numerical analysis. Given n points in the plane: (x1, y1), (x2, y2) , . . . , (xn, yn). Find a line y = ax + b that minimizes the sum of the squared error: Solution. Calculus min error is achieved when SSE = ( yi axi b)2i=1n a =nxiyi ( xi)i( yi)iinxi2 ( xi)2ii, b =yi axiiinx y O(n) timeSegmented Least Squares Segmented least squares. Points lie roughly on a sequence of several line segments. Given n points in the plane (x1, y1), (x2, y2) , . . . , (xn, yn) with x1 < x2 < ... < xn, find a sequence of lines that minimizes: – E = sum of the sums of the squared errors in all segments – L = number of lines Tradeoff via objective function: E + c L, for some constant c > 0. x y Note: discontinuous functions permitted!Dynamic Programming: Multiway Choice Notation. OPT(j) = minimum cost for points p1, pi+1 , . . . , pj. e(i, j) = minimum sum of squares for points pi, pi+1 , . . . , pj. To compute OPT(j): Last segment uses points pi, pi+1 , . . . , pj for some i. Cost = e(i, j) + c + OPT(i-1). OPT( j) =0 if j = 0min1 i je(i, j) + c + OPT(i 1){}otherwise Segmented Least Squares: Algorithm Running time. O(n3). Bottleneck = computing e(i, j) for O(n2) pairs, O(n) per pair using previous formula. INPUT: n, p1,…,pN , c Segmented-Least-Squares() { M[0] = 0 for j = 1 to n for i = 1 to j compute the least square error eij for the segment pi,…, pj for j = 1 to n M[j] = min 1 i j (eij + c + M[i-1]) return M[n] } can be improved to O(n2) by pre-computing various statisticsLeast Common Subsequence A.k.a. “sequence alignment” “edit distance” … 10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Longest Common Subsequence (LCS) •Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x: A B C B D A B y: B D C A B A “a” not “the” BCBA = LCS(x, y)10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Longest Common Subsequence (LCS) •Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x: A B C B D A B y: B D C A B A “a” not “the” BCBA = LCS(x, y) BCABMotivation • Approximate string matching [Levenshtein, 1966] – Search for “occurance”, get results for “occurrence” • Computational biology [Needleman-Wunsch, 1970’s] – Simple measure of genome similarity cgtacgtacgtacgtacgtacgtatcgtacgtacgtacgtacgtacgtacgtacgtacgtacgt10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. WayneMotivation • Approximate string matching [Levenshtein, 1966] – Search for “occurance”, get results for “occurrence” • Computational biology [Needleman-Wunsch, 1970’s] – Simple measure of genome similarity acgtacgtacgtacgtcgtacgtatcgtacgtaacgtacgtacgtacgtcgtacgta cgtacgt•n – length(LCS(x,y)) is called the “edit distance” 10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Brute-force LCS algorithm Check every subsequence of x[1 . . m] to see if it is also a subsequence of y[1 . . n]. Analysis •Checking = O(n) time per subsequence. •2m subsequences of x (each bit-vector of length m determines a distinct subsequence of x). Worst-case running time = O(n2m) = exponential time.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Dynamic programming algorithm Simplification: 1. Look at the length of a longest-common subsequence. 2. Extend the algorithm to find the LCS itself. Strategy: Consider prefixes of x and y. •Define c[i, j] = | LCS(x[1 . . i], y[1 . . j]) |. •Then, c[m, n] = | LCS(x, y) |. Notation: Denote the length of a sequence s by | s |.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Recursive formulation c[i, j] = c[i–1, j–1] + 1 if x[i] = y[j], max{c[i–1, j], c[i, j–1]} otherwise. Base case: c[i,j]=0 if i=0 or j=0. Case x[i] = y[ j]: 1 2 i m 1 2 j n x: y: = The second case is similar.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Dynamic-programming hallmark #1 Optimal substructure An optimal solution to a problem (instance) contains optimal solutions to subproblems. If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Recursive algorithm for LCS LCS(x, y, i, j) // ignoring base cases if x[i] = y[ j] then c[i, j] LCS(x, y, i–1, j–1) + 1 else c[i, j] max{LCS(x, y, i–1, j), LCS(x, y, i, j–1)} return c[i, j] Worse case: x[i] y[ j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Recursion tree Height = m + n work potentially exponential. m = 7, n = 6: 7,6 6,6 7,5 6,5 5,5 6,4 6,5 5,5 6,4 5,6 4,6 5,5 7,4 6,4 7,3 m+n10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne same subproblem , but we’re solving subproblems already solved! Recursion tree Height = m + n work potentially exponential. Height = m + n work potentially exponential. m = 7, n = 6: 7,6 6,6 7,5 6,5 5,5 6,4 6,5 5,5 6,4 5,6 4,6 5,5 7,4 6,4 7,3 m+n10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Dynamic-programming hallmark #2 Overlapping subproblems A recursive solution contains a “small” number of distinct subproblems repeated many times. The number of distinct LCS subproblems for two strings of lengths m and n is only m n.10/6/2008 A. Smith; based on slides by E.
View Full Document