New version page

PSU CSE 565 - Dynamic Programming

Upgrade to remove ads
Upgrade to remove ads
Unformatted text preview:

10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Adam Smith Algorithm Design and Analysis LECTURE 18 Dynamic Programming •(Segmented LS recap) •Longest Common SubsequenceSegmented Least Squares Least squares.  Foundational problem in statistic and numerical analysis.  Given n points in the plane: (x1, y1), (x2, y2) , . . . , (xn, yn).  Find a line y = ax + b that minimizes the sum of the squared error: Solution. Calculus  min error is achieved when SSE = ( yi axi b)2i=1n a =nxiyi ( xi)i( yi)iinxi2 ( xi)2ii, b =yi axiiinx y O(n) timeSegmented Least Squares Segmented least squares.  Points lie roughly on a sequence of several line segments.  Given n points in the plane (x1, y1), (x2, y2) , . . . , (xn, yn) with  x1 < x2 < ... < xn, find a sequence of lines that minimizes: – E = sum of the sums of the squared errors in all segments – L = number of lines  Tradeoff via objective function: E + c L, for some constant c > 0. x y Note: discontinuous functions permitted!Dynamic Programming: Multiway Choice Notation.  OPT(j) = minimum cost for points p1, pi+1 , . . . , pj.  e(i, j) = minimum sum of squares for points pi, pi+1 , . . . , pj. To compute OPT(j):  Last segment uses points pi, pi+1 , . . . , pj for some i.  Cost = e(i, j) + c + OPT(i-1). OPT( j) =0 if j = 0min1  i  je(i, j) + c + OPT(i 1){}otherwise    Segmented Least Squares: Algorithm Running time. O(n3).  Bottleneck = computing e(i, j) for O(n2) pairs, O(n) per pair using previous formula. INPUT: n, p1,…,pN , c Segmented-Least-Squares() { M[0] = 0 for j = 1 to n for i = 1 to j compute the least square error eij for the segment pi,…, pj for j = 1 to n M[j] = min 1  i  j (eij + c + M[i-1]) return M[n] } can be improved to O(n2) by pre-computing various statisticsLeast Common Subsequence A.k.a. “sequence alignment” “edit distance” … 10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Longest Common Subsequence (LCS) •Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x: A B C B D A B y: B D C A B A “a” not “the” BCBA = LCS(x, y)10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Longest Common Subsequence (LCS) •Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x: A B C B D A B y: B D C A B A “a” not “the” BCBA = LCS(x, y) BCABMotivation • Approximate string matching [Levenshtein, 1966] – Search for “occurance”, get results for “occurrence” • Computational biology [Needleman-Wunsch, 1970’s] – Simple measure of genome similarity cgtacgtacgtacgtacgtacgtatcgtacgtacgtacgtacgtacgtacgtacgtacgtacgt10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. WayneMotivation • Approximate string matching [Levenshtein, 1966] – Search for “occurance”, get results for “occurrence” • Computational biology [Needleman-Wunsch, 1970’s] – Simple measure of genome similarity  acgtacgtacgtacgtcgtacgtatcgtacgtaacgtacgtacgtacgtcgtacgta cgtacgt•n – length(LCS(x,y)) is called the “edit distance” 10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Brute-force LCS algorithm Check every subsequence of x[1 . . m] to see if it is also a subsequence of y[1 . . n]. Analysis •Checking = O(n) time per subsequence. •2m subsequences of x (each bit-vector of length m determines a distinct subsequence of x). Worst-case running time = O(n2m) = exponential time.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Dynamic programming algorithm Simplification: 1. Look at the length of a longest-common subsequence. 2. Extend the algorithm to find the LCS itself. Strategy: Consider prefixes of x and y. •Define c[i, j] = | LCS(x[1 . . i], y[1 . . j]) |. •Then, c[m, n] = | LCS(x, y) |. Notation: Denote the length of a sequence s by | s |.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Recursive formulation c[i, j] = c[i–1, j–1] + 1 if x[i] = y[j], max{c[i–1, j], c[i, j–1]} otherwise. Base case: c[i,j]=0 if i=0 or j=0. Case x[i] = y[ j]: 1 2 i m 1 2 j n x: y: = The second case is similar.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Dynamic-programming hallmark #1 Optimal substructure An optimal solution to a problem (instance) contains optimal solutions to subproblems. If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Recursive algorithm for LCS LCS(x, y, i, j) // ignoring base cases if x[i] = y[ j] then c[i, j]  LCS(x, y, i–1, j–1) + 1 else c[i, j]  max{LCS(x, y, i–1, j), LCS(x, y, i, j–1)} return c[i, j] Worse case: x[i]  y[ j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Recursion tree Height = m + n  work potentially exponential. m = 7, n = 6: 7,6 6,6 7,5 6,5 5,5 6,4 6,5 5,5 6,4 5,6 4,6 5,5 7,4 6,4 7,3 m+n10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne same subproblem , but we’re solving subproblems already solved! Recursion tree Height = m + n  work potentially exponential. Height = m + n  work potentially exponential. m = 7, n = 6: 7,6 6,6 7,5 6,5 5,5 6,4 6,5 5,5 6,4 5,6 4,6 5,5 7,4 6,4 7,3 m+n10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Dynamic-programming hallmark #2 Overlapping subproblems A recursive solution contains a “small” number of distinct subproblems repeated many times. The number of distinct LCS subproblems for two strings of lengths m and n is only m n.10/6/2008 A. Smith; based on slides by E.


View Full Document
Download Dynamic Programming
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Dynamic Programming and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Dynamic Programming 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?