6 Dynamic Programming Those who cannot remember the past are condemned to repeat it Santayana Algorithm Design by va Tardos and Jon Kleinberg Copyright 2005 Addison Wesley Slides by Kevin Wayne Algorithmic Paradigms Greed Build up a solution incrementally myopically optimizing some local criterion Divide and conquer Break up a problem into two sub problems solve each sub problem independently and combine solution to sub problems to form solution to original problem Dynamic programming Break up a problem into a series of overlapping sub problems and build up solutions to larger and larger sub problems 2 6 6 Sequence Alignment Algorithm Design by va Tardos and Jon Kleinberg Copyright 2005 Addison Wesley Slides by Kevin Wayne String Similarity How similar are two strings ocurrance occurrence o c u r r o c c u r a n c e r e n c e 5 mismatches 1 gap o c u r o c c u r r a n c e r e n c e 1 mismatch 1 gap o c u r o c c u r r r e a n c e n c e 0 mismatches 3 gaps 4 Edit Distance Applications Basis for Unix diff Speech recognition Computational biology Edit distance Levenshtein 1966 Needleman Wunsch 1970 Smith Waterman 1981 Gap penalty mismatch penalty pq Cost sum of gap and mismatch penalties C T G A C C T A C C T C T G A C C T A C C T C C T G A C T A C A T C C T G A C T A C A T TC GT AG 2 CA 2 CA 5 Sequence Alignment Goal Given two strings X x1 x2 xm and Y y1 y2 yn find alignment of minimum cost Def An alignment M is a set of ordered pairs xi yj such that each item occurs in at most one pair and no crossings Def The pair xi yj and xi yj cross if i i but j j cost M x y i x y M j 1i4j243 mismatch i x unmatched j y unmatched j 14i444 4244 444 3 Ex CTACCG vs TACATG Sol M x2 y1 x3 y2 x4 y3 x5 y4 x6 y6 gap x1 x2 x3 x4 x5 C T A C C G T A C A T G y1 y2 y3 y4 y5 y6 x6 6 Sequence Alignment Problem Structure Def OPT i j min cost of aligning strings x1 x2 xi and y1 y2 yj Case 1 OPT matches xi yj pay mismatch for xi yj min cost of aligning two strings x1 x2 xi 1 and y1 y2 yj 1 Case 2a OPT leaves xi unmatched pay gap for xi and min cost of aligning x1 x2 xi 1 and y1 y2 yj Case 2b OPT leaves yj unmatched pay gap for yj and min cost of aligning x1 x2 xi and y1 y2 yj 1 j xi y j OPT i 1 j 1 OPT i j min OPT i 1 j OPT i j 1 i if i 0 otherwise if j 0 7 Sequence Alignment Algorithm Sequence Alignment m n x1x2 xm y1y2 yn for i 0 to m M 0 i i for j 0 to n M j 0 j for i 1 to m for j 1 to n M i j min xi yj M i 1 j 1 M i 1 j M i j 1 return M m n Analysis mn time and space English words or sentences m n 10 Computational biology m n 100 000 10 billions ops OK but 10GB array 8 6 7 Sequence Alignment in Linear Space Algorithm Design by va Tardos and Jon Kleinberg Copyright 2005 Addison Wesley Slides by Kevin Wayne Sequence Alignment Linear Space Q Can we avoid using quadratic space Easy Optimal value in O m n space and O mn time Compute OPT i from OPT i 1 No longer a simple way to recover alignment itself Theorem Hirschberg 1975 Optimal alignment in O m n space and O mn time Clever combination of divide and conquer and dynamic programming Inspired by idea of Savitch from complexity theory 10 Sequence Alignment Linear Space Edit distance graph Let f i j be shortest path from 0 0 to i j Observation f i j OPT i j y1 y2 y3 y4 y5 y6 0 0 x1 xi y j x2 x3 i j m n 11 Sequence Alignment Linear Space Claim f i j OPT i j Pf by induction on i j Base case f 0 0 OPT 0 0 0 Inductive step assume f i j OPT i j for all i j i j Last edge on path to i j is either from i 1 j 1 i 1 j or i j 1 y1 y2 y3 y4 y5 y6 0 0 x1 xi y j x2 i j x3 m n 12 Sequence Alignment Linear Space Edit distance graph Let f i j be shortest path from 0 0 to i j Can compute f j for any j in O mn time and O m n space j y1 y2 y3 y4 y5 y6 0 0 x1 x2 x3 i j m n 13 Sequence Alignment Linear Space Edit distance graph Let g i j be shortest path from i j to m n Can compute by reversing the edge orientations and inverting the roles of 0 0 and m n y1 y2 y3 y4 y5 y6 0 0 i j x1 xi y j x2 x3 m n 14 Sequence Alignment Linear Space Edit distance graph Let g i j be shortest path from i j to m n Can compute g j for any j in O mn time and O m n space j x1 y1 y2 y3 y4 y5 y6 0 0 i j x2 x3 m n 15 Sequence Alignment Linear Space Observation 1 The cost of the shortest path that uses i j is f i j g i j x1 y1 y2 y3 y4 y5 y6 0 0 i j x2 x3 m n 16 Sequence Alignment Linear Space Observation 2 let q be an index that minimizes f q n 2 g q n 2 Then the shortest path from 0 0 to m n uses q n 2 n 2 x1 y1 y2 y3 y4 y5 y6 0 0 q i j x2 x3 m n 17 Sequence Alignment Linear Space Divide find index q that minimizes f q n 2 g q n 2 using DP Align xq and yn 2 Conquer recursively compute optimal alignment in each piece n 2 x1 y1 y2 y3 y4 y5 y6 0 0 q i j x2 x3 m n 18 Sequence Alignment Running Time Analysis Warmup Theorem Let T m n max running time of algorithm on strings of length at most m and n T m n O mn log n T m n 2T m n 2 O mn T m n O mn log n Remark Analysis is not tight because two sub problems are …
View Full Document