Unformatted text preview:

UC Berkeley—CS 170: Efficient Algorithms and Intractable Problems Handout 13Lecturer: David Wagner March 13, 2003Notes 13 for CS 1701 Introduction to Dynamic ProgrammingRecall our first algorithm for computing the n-th Fibonacci number Fn; it just recursivelyapplied the definition Fn= Fn−1+ Fn−2, so that a function call to compute Fnresulted intwo functions calls to compute Fn−1and Fn−2, and so on. The problem with this approachwas that it was very expensive, because it ended up calling a function to compute Fjfor eachj < n possibly very many times, even after Fjhad already been computed. We improvedthis algorithm by building a table of values of Fibonacci numbers, computing Fnby lookingup Fn−1and Fn−2in the table and simply adding them. This lowered the cost of computingFnfrom exponential in n to just linear in n.This worked because we could sort the problems of computing Fnsimply by increasingn, and compute and store the Fibonacci numbers with small n before computing those withlarge n.Dynamic programming uses exactly the same idea:1. Express the solution to a problem in terms of solutions to smaller problems.2. Solve all the smallest problems first and put their solutions in a table, then solve thenext larger problems, putting their solutions into the table, solve and store the nextlarger problems, and so on, up to the problem one originally wanted to solve. Eachproblem should be easily solvable by looking up and combining solutions of smallerproblems in the table.For Fibonacci numbers, how to compute Fnin terms of smaller problems Fn−1andFn−2was obvious. For more interesting problems, figuring out how to break big problemsinto smaller ones is the tricky part. Once this is done, the the rest of algorithm is usuallystraightforward to produce. We will illustrate by a sequence of examples, starting with“one-dimensional” problems that are most analogous to Fibonacci.2 String ReconstructionSuppose that all blanks and punctuation marks have been inadvertently removed from atext file, and its beginning was polluted with a few extraneous characters, so the file lookssomething like ”lionceuponatimeinafarfarawayland...” You want to reconstruct the file usinga dictionary.This is a typical problem solved by dynamic programming. We must define what isan appropriate notion of subproblem. Subproblems must be ordered by size, and eachsubproblem must be easily solvable, once we have the solutions to all smaller subproblems.Once we have the right notion of a subproblem, we write the appropriate recursive equationexpressing how a subproblem is solved based on solutions to smaller subproblems, and theNotes number 13 2program is then trivial to write. The complexity of the dynamic programming algorithmis precisely the total number of subproblems times the number of smaller subproblems wemust examine in order to solve a subproblem.In this and the next few examples, we do dynamic programming on a one-dimensionalobject—in this case a string, next a sequence of matrices, then a set of strings alphabeticallyordered, etc. The basic observation is this: A one-dimensional object of length n has aboutn2sub-objects (substrings, etc.), where a sub-object is defined to span the range from i toj, where i, j ≤ n. In the present case a subproblem is to tell whether the substring of thefile from character i to j is the concatenation of words from the dictionary. Concretely, letthe file be f[1 . . . n], and consider a 2-D array of Boolean variables T (i, j), where T (i, j)is true if and only if the string f [i . . . j] is the concatenation of words from the dictionary.The recursive equation is this:T (i, j) = dict(x[i . . . j]) ∨_i≤k<j[T (i, k) ∧ T (k + 1, j)]In principle, we could write this equation verbatim as a recursive function and execute it.The problem is that there would be exponentially many recursive calls for each short string,and 3ncalls overall.Dynamic programming can be seen as a technique of implementing such recursive pro-grams, that have heavy overlap between the recursion trees of the two recursive calls, sothat the recursive function is called once for each distinct argument; indeed the recursionis usually “unwound” and disappears altogether. This is done by modifying the recursiveprogram so that, in place of each recursive call a table is consulted. To make sure theneeded answer is in the table, we note that the lengths of the strings on the right hand sideof the equation above are k − i + 1 and j − k, a both of which are shorter than the string onthe left (of length j − i + 1). This means we can fill the table in increasing order of stringlength.for d := 0 to n − 1 do ... d + 1 is the size (string length) of the subproblem being solvedfor i := 1 to n − d do ... the start of the subproblem being solvedj = i + dif dict(x[i . . . j]) then T (i, j) :=true elsefor k := i to j − 1 doif T (i, k) =true and T (k + 1, j) =true then do {T (i, j) :=true}The complexity of this program is O(n3): three nested loops, ranging each roughly over nvalues.Unfortunately, this program just returns a meaningless Boolean, and does not tell ushow to reconstruct the text. Here is how to reconstruct the text. Just expand the innermostloop (the last assignment statement) to{T [i, j] :=true, first[i, j] := k, exit for}where first is an array of pointers initialized to nil. Then if T [i, j] is true, so that thesubstring from i to j is indeed a concatenation of dictionary words, then first[i, j] pointsto a breaking point in the interval i, j. Notice that this improves the running time, byexiting the for loop after the first match; more optimizations are possible. This is typicalof dynamic programming algorithms: Once the basic algorithm has been derived usingNotes number 13 3dynamic programming, clever modifications that exploit the structure of the problem speedup its running time.3 Edit Distance3.1 DefinitionWhen you run a spell checker on a text, and it finds a word not in the dictionary, it normallyproposes a choice of possible corrections.If it finds stell it might suggest tell, swell, stull, still, steel, steal, stall,spell, smell, shell, and sell.As part of the heuristic used to propose alternatives, words that are “close” to themisspelled word are proposed. We will now see a formal definition of “distance” betweenstrings, and a simple but efficient algorithm to compute such distance.The distance between two strings x = x1· · · xnand y = y1· · · ymis


View Full Document

Berkeley COMPSCI 170 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?