PSU CSE 565 - Dynamic Programming - D2459988

Home> Schools> Penn State University> Computer Science and Engineering (CSE) > CSE 565> Dynamic Programming

PSU CSE 565 - Dynamic Programming

Course Cse 565- Algorithm Design and Analysis

Pages 24

Download Save

Unformatted text preview:

10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Adam Smith Algorithm Design and Analysis LECTURE 18 Dynamic Programming •(Segmented LS recap) •Longest Common SubsequenceSegmented Least Squares Least squares.  Foundational problem in statistic and numerical analysis.  Given n points in the plane: (x1, y1), (x2, y2) , . . . , (xn, yn).  Find a line y = ax + b that minimizes the sum of the squared error: Solution. Calculus  min error is achieved when SSE = ( yi axi b)2i=1n a =nxiyi ( xi)i( yi)iinxi2 ( xi)2ii, b =yi axiiinx y O(n) timeSegmented Least Squares Segmented least squares.  Points lie roughly on a sequence of several line segments.  Given n points in the plane (x1, y1), (x2, y2) , . . . , (xn, yn) with  x1 < x2 < ... < xn, find a sequence of lines that minimizes: – E = sum of the sums of the squared errors in all segments – L = number of lines  Tradeoff via objective function: E + c L, for some constant c > 0. x y Note: discontinuous functions permitted!Dynamic Programming: Multiway Choice Notation.  OPT(j) = minimum cost for points p1, pi+1 , . . . , pj.  e(i, j) = minimum sum of squares for points pi, pi+1 , . . . , pj. To compute OPT(j):  Last segment uses points pi, pi+1 , . . . , pj for some i.  Cost = e(i, j) + c + OPT(i-1). OPT( j) =0 if j = 0min1  i  je(i, j) + c + OPT(i 1){}otherwise    Segmented Least Squares: Algorithm Running time. O(n3).  Bottleneck = computing e(i, j) for O(n2) pairs, O(n) per pair using previous formula. INPUT: n, p1,…,pN , c Segmented-Least-Squares() { M[0] = 0 for j = 1 to n for i = 1 to j compute the least square error eij for the segment pi,…, pj for j = 1 to n M[j] = min 1  i  j (eij + c + M[i-1]) return M[n] } can be improved to O(n2) by pre-computing various statisticsLeast Common Subsequence A.k.a. “sequence alignment” “edit distance” … 10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Longest Common Subsequence (LCS) •Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x: A B C B D A B y: B D C A B A “a” not “the” BCBA = LCS(x, y)10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Longest Common Subsequence (LCS) •Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x: A B C B D A B y: B D C A B A “a” not “the” BCBA = LCS(x, y) BCABMotivation • Approximate string matching [Levenshtein, 1966] – Search for “occurance”, get results for “occurrence” • Computational biology [Needleman-Wunsch, 1970’s] – Simple measure of genome similarity cgtacgtacgtacgtacgtacgtatcgtacgtacgtacgtacgtacgtacgtacgtacgtacgt10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. WayneMotivation • Approximate string matching [Levenshtein, 1966] – Search for “occurance”, get results for “occurrence” • Computational biology [Needleman-Wunsch, 1970’s] – Simple measure of genome similarity  acgtacgtacgtacgtcgtacgtatcgtacgtaacgtacgtacgtacgtcgtacgta cgtacgt•n – length(LCS(x,y)) is called the “edit distance” 10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Brute-force LCS algorithm Check every subsequence of x[1 . . m] to see if it is also a subsequence of y[1 . . n]. Analysis •Checking = O(n) time per subsequence. •2m subsequences of x (each bit-vector of length m determines a distinct subsequence of x). Worst-case running time = O(n2m) = exponential time.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Dynamic programming algorithm Simplification: 1. Look at the length of a longest-common subsequence. 2. Extend the algorithm to find the LCS itself. Strategy: Consider prefixes of x and y. •Define c[i, j] = | LCS(x[1 . . i], y[1 . . j]) |. •Then, c[m, n] = | LCS(x, y) |. Notation: Denote the length of a sequence s by | s |.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Recursive formulation c[i, j] = c[i–1, j–1] + 1 if x[i] = y[j], max{c[i–1, j], c[i, j–1]} otherwise. Base case: c[i,j]=0 if i=0 or j=0. Case x[i] = y[ j]: 1 2 i m 1 2 j n x: y: = The second case is similar.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Dynamic-programming hallmark #1 Optimal substructure An optimal solution to a problem (instance) contains optimal solutions to subproblems. If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Recursive algorithm for LCS LCS(x, y, i, j) // ignoring base cases if x[i] = y[ j] then c[i, j]  LCS(x, y, i–1, j–1) + 1 else c[i, j]  max{LCS(x, y, i–1, j), LCS(x, y, i, j–1)} return c[i, j] Worse case: x[i]  y[ j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Recursion tree Height = m + n  work potentially exponential. m = 7, n = 6: 7,6 6,6 7,5 6,5 5,5 6,4 6,5 5,5 6,4 5,6 4,6 5,5 7,4 6,4 7,3 m+n10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne same subproblem , but we’re solving subproblems already solved! Recursion tree Height = m + n  work potentially exponential. Height = m + n  work potentially exponential. m = 7, n = 6: 7,6 6,6 7,5 6,5 5,5 6,4 6,5 5,5 6,4 5,6 4,6 5,5 7,4 6,4 7,3 m+n10/6/2008 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Dynamic-programming hallmark #2 Overlapping subproblems A recursive solution contains a “small” number of distinct subproblems repeated many times. The number of distinct LCS subproblems for two strings of lengths m and n is only m n.10/6/2008 A. Smith; based on slides by E.

View Full Document


School:
Email:
New Password:
Confirm Password:

PSU CSE 565 - Dynamic Programming

Sign up for free to view:

Please select your school