Lecture 20 Dynamic Programming III of IV 6.006 Fall 2009Lecture 20: Dynamic Programming III: TextJustification, Knapsack, Pseudopolynomial TimeLecture Overview• Review• Bottom-Up Implementation• Parent Pointers• Text Justification• Knapsack• Pseudopolynomial TimeReadingsCLRS 15Review:* DP is all about subproblems & guessing* 5 easy steps:(a) define subproblems: count ] subprobs.(b) relate subproblem solutions, usually by guessing (part of the solution): count ]choicesIMPORTANT: check that subproblem solutions are related acyclically—recall theproblem with the obvious shortest path recursion in the last lecture!(c) recurse + memoize(d) time = ] subprobs × time/subprob.= ] subprobs × ] guesses/subpr. × overhead for combining solutions(e) check if original problem = a subproblem or solvable from DP table ( =⇒ extratime)* for sequences, good subproblems are often prefixes OR suffixes OR substrings1Lecture 20 Dynamic Programming III of IV 6.006 Fall 2009Bottom-up implementation of DP (Repetition From Previous Lecture):Alternative to recursion• subproblem dependencies form DAG (see Figure 4)—if not, we need a better recursiveformulation of the problem• imagine topological sorting the dependency graph• iterate through subproblems in that order=⇒ when solving a subproblem, have already solved all dependencies• often: “solve smaller subproblems first”Figure 1: DAG.F0F1F2F3F4F5F6F7… Figure 2: Subproblem Dependency Graph for Fibonacci Numbers.Example.Fibonacci:for k in range(n + 1): fib[k] = ···Shortest Paths:for k in range(n): for v in V : d[k, v, t] = ···Crazy Eights:for i in reversed(range(n)): trick[i] = ···Longest Common Subsequence:c(i,j) = length of the LCS(x[i:],y[j:])Recall Recursive formula:c(i, j) =(1 + c(i + 1, j + 1), if x[i] = y[j]max{c(i + 1, j), c(i, j + 1)}, if x[i] 6= y[j](1)2Lecture 20 Dynamic Programming III of IV 6.006 Fall 2009base cases: c(| x |, j) = c(i, | y |) = ∅Figure 3 shows Bottom-Up Strategies for LCS.∅∅|x||y|ijc[i, j]c[i +1,j]c[i, j + 1]c[i +1,j+ 1]∅∅∅∅∅∅∅∅∅if x[i] != y[j]if x[i]=y[j]∅∅|x||y|∅∅∅∅∅∅∅∅∅… ∅∅|x||y|∅∅∅∅∅∅∅∅… ∅∅∅|x||y|∅∅∅∅∅∅∅∅… Figure 3: Subproblem Dependency Structure for Longest Common Subsequence, and differ-ent Bottom-Up Computation Strategies.Parent Pointers- Often straightforward DP returns the value of the optimal solution.- To find the solution achieving this value, a bit more book-keeping is required.- It is usually sufficient to remember for each subproblem what guess resulted in theoptimal solution of the subproblem.- E.g., in the LCS problem it is enough to remember, for all pairs i, j, the direction“right”, “down”, or “diagonal” achieving equality in (1).- If we have these “pointers”, we can just follow them starting at position (0, 0) of thetable to reconstruct the optimal solution.3Lecture 20 Dynamic Programming III of IV 6.006 Fall 2009Text Justification:Split text into “good lines”• obvious (MS Word/Open Office) algorithm: put as many words fit on first line, repeat• but this can make very bad linesblah blah blah blah blahb l a hreallylongword reallylongwordblah blahvs.. .. .Figure 4: Good vs. Bad JustificationMathematically the line justification problem:• Input: Given array of words w[0 : n].• Scoring Rule: Suppose we are considering a line ` containing the words w[i] throughw[j]. Define the badness(`) for the line of words ` ≡ w[i : j + 1] to be, e.g.,(+∞, if total_length(`) > page_width(page_width - total_length(`))3, otherwise• Goal: Split words into lines `1= w[0 : i1], `2= w[i1: i2], etc. to minPibadness(`i).Subproblem structure:1. subproblem DP[i]= min badness for suffix words w[i :]=⇒ ] subproblems = Θ(n) where n = ] words2. guessing = where to end first line, in the optimal justification of words w[i : n]=⇒ ] choices = n −i + 1 = O(n)3. relation:• DP[i] = min(badness(i, j) + DP[j] for j in range(i + 1, n + 1))• DP[n] = ∅=⇒ time per subproblem = O(n)4. total time = O(n2)5. solution = DP[∅](& use parent pointers to recover split)4Lecture 20 Dynamic Programming III of IV 6.006 Fall 2009Knapsack:Knapsack of size S you want to pack with a subset of n items,• each item i has integer size si& real valuevi• goal: choose subset of items of maximum total value subject to total size ≤ STrivial Algorithm:Try all possible subsets of the items =⇒ runtime exponential in the number of items.First Attempt:1. subproblem DP [i] = value for suffix [i:] of items DOESN’T WORK, see below2. guessing = whether to include item i =⇒ ] choices = 23. relation:• DP [i] = max(DP [i + 1], vi+ DP [i + 1] ifsi≤ S?!)• not enough information to know whether item i fits - how much space is left?GUESS!Second Attempt, keeping more info:1. subproblem DP [i, X] = value for suffix [i :] of items, given knapsack of size X=⇒ ] subproblems = O(nS) !2. guessing: whether to include i or not in the optimal knapsack of size X3. relation:• DP [i, X] = max(DP [i + 1, X], vi+ DP [i + 1, X − si] if si≤ X)• DP [n, X] = ∅=⇒ time per subproblem = O(1)4. total time = O(nS)5. solution = DP [∅, S](& use parent pointers to recover subset)AMAZING: effectively trying all possible subsets!Knapsack is in fact NP-complete! =⇒ suspect no polynomial-time1algorithm (polynomialin length of input).1More on NP-completeness later in the term. For now, NP-complete problems is a family of hard problemsfor which no polynomial-time algorithm is known.5Lecture 20 Dynamic Programming III of IV 6.006 Fall 2009Why isn’t the above algorithm polynomial time?• here input =< S, s0, ··· , sn−1, v0, ··· , vn−1>• length in binary: O(lg S + lg s0+ ···) ≈ O(n lg . . .)• so O(nS) is not “polynomial-time”, because S is exponential in log S, an it could bethat log S dominates the size of the input• O(nS) still pretty good if S is small• “pseudopolynomial time”: polynomial in length of input & integers in the inputRemember:polynomial - GOODexponential - BADpseudopoly - SO
View Full Document