Natural Language Processing
Lecture 5, 9/10/2013
Jim Martin

01/13/19, Speech and Language Processing - Jurafsky and Martin


Today

  - Minimum edit distance and spelling correction
  - Dynamic programming
  - Language modeling, N-grams
  - N-gram intro
  - The chain rule
  - Model evaluation


Spelling Correction

We can detect spelling errors (spell check) by building an FST-based lexicon and noting any strings that are rejected.

But how do I fix "graffe"? That is, how do I come up with suggested corrections?

  - Search through all words in my lexicon: graft, craft, grail, giraffe, crafted, etc.
  - Pick the one that's closest to "graffe".
  - But what does "closest" mean? We need a distance metric.
  - The simplest one: minimum edit distance, as in the Unix diff command.


Edit Distance

The minimum edit distance between two strings is the minimum number of editing operations (insertion, deletion, substitution) that one would need to transform one string into the other.


Note

The following discussion has 2 goals:

  1. Learn the minimum edit distance computation and algorithm.
  2. Introduce dynamic programming.


Why "Dynamic Programming"?

Where did the name "dynamic programming" come from?

"The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word 'research'. I'm not using the term lightly; I'm using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term 'research' in his presence. You can imagine how he felt, then, about the term 'mathematical'. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation.

What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But 'planning' is not a good word for various reasons. I decided therefore to use the word 'programming'. I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying. I thought, let's kill two birds with one stone. Let's take a word that has an absolutely precise meaning, namely 'dynamic', in the classical physical sense. It also has a very interesting property as an adjective, and that is it's impossible to use the word 'dynamic' in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible. Thus, I thought 'dynamic programming' was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities."

Richard Bellman, "Eye of the Hurricane: An Autobiography", 1984


Min Edit Example

[figure]


Minimum Edit Distance

  - If each operation has a cost of 1, the distance between the two example strings is 5.
  - If substitutions cost 2 (Levenshtein), the distance between them is 8.


Min Edit As Search

That's all well and good, but how did we find that particular (minimum) set of operations for those two strings?

We can view edit distance as a search for a path (a sequence of edits) that gets us from the start string to the final string:

  - Initial state is the word we're transforming.
  - Operators are insert, delete, substitute.
  - Goal state is the word we're trying to get to.
  - Path cost is what we're trying to minimize: the number of edits.

[figure]

But that generates a huge search space. Navigating that space in a naive backtracking fashion would be incredibly wasteful.

Why? Lots of distinct paths wind up at the same state. But there is no need to keep track of them all. We only care about the shortest path to each of those revisited states.


Defining Min Edit Distance

For two strings, S1 of length n and S2 of length m, distance(i, j), or D(i, j), is the min edit distance of S1[1..i] and S2[1..j]. That is, the minimum number of edit operations needed to transform the first i characters of S1 into the first j characters of S2.

The edit distance of S1, S2 is D(n, m).

We compute D(n, m) by computing D(i, j) for all i (0 ≤ i ≤ n) and j (0 ≤ j ≤ m).

Base conditions:

  D(i, 0) = i
  D(0, j) = j

Recurrence relation:

\[
D(i, j) = \min
\begin{cases}
D(i-1, j) + 1 \\
D(i, j-1) + 1 \\
D(i-1, j-1) +
  \begin{cases}
  2 & \text{if } S_1(i) \neq S_2(j) \\
  0 & \text{if } S_1(i) = S_2(j)
  \end{cases}
\end{cases}
\]


Dynamic Programming

  - A tabular computation of D(n, m).
  - Bottom-up: we compute D(i, j) for small i, j, and compute larger D(i, j) based on previously computed smaller values.


The Edit Distance Table

Initial table (base conditions only):

  N    9
  O    8
  I    7
  T    6
  N    5
  E    4
  T    3
  N    2
  I    1
       0   1   2   3   4   5   6   7   8   9
           E   X   E   C   U   T   I   O   N

Completed table:

  N    9   8   9  10  11  12  11  10   9   8
  O    8   7   8   9  10  11  10   9   8   9
  I    7   6   7   8   9  10   9   8   9  10
  T    6   5   6   7   8   9   8   9  10  11
  N    5   4   5   6   7   8   9  10  11  10
  E    4   3   4   5   6   7   8   9  10   9
  T    3   4   5   6   7   8   7   8   9   8
  N    2   3   4   5   6   7   8   7   8   7
  I    1   2   3   4   5   6   7   6   7   8
       0   1   2   3   4   5   6   7   8   9
           E   X   E   C   U   T   I   O   N


Min Edit Distance

Note that the result isn't all that informative. For a pair of strings …
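
The base conditions, recurrence, and bottom-up table fill described in these slides can be sketched in Python roughly as follows. This is a minimal illustration, not code from the lecture: the function names `min_edit_distance` and `closest_words` and the `sub_cost` parameter are hypothetical, and the candidate list is simply the handful of words the Spelling Correction slide mentions. Insertions and deletions are assumed to cost 1, as in the recurrence.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 2) -> int:
    """Bottom-up minimum edit distance between source and target.

    Insertions and deletions cost 1; a substitution costs `sub_cost`
    (2 matches the recurrence in the slides, 1 gives unit-cost edits).
    """
    n, m = len(source), len(target)
    # D[i][j] holds the distance between source[:i] and target[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]

    # Base conditions: D(i, 0) = i and D(0, j) = j.
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j

    # Fill the table from small prefixes to large ones.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            D[i][j] = min(
                D[i - 1][j] + 1,                          # deletion
                D[i][j - 1] + 1,                          # insertion
                D[i - 1][j - 1] + (0 if same else sub_cost),  # substitution or match
            )
    return D[n][m]


def closest_words(word: str, lexicon: list[str], sub_cost: int = 1) -> list[str]:
    """Return the lexicon sorted by edit distance to `word` (closest first)."""
    return sorted(lexicon, key=lambda w: min_edit_distance(word, w, sub_cost))


if __name__ == "__main__":
    print(min_edit_distance("intention", "execution", sub_cost=2))
    print(min_edit_distance("intention", "execution", sub_cost=1))
    # Hypothetical lexicon: only the candidates named in the slides.
    candidates = ["graft", "craft", "grail", "giraffe", "crafted"]
    print(closest_words("graffe", candidates))
```

For "intention" and "execution", the letters on the table's axes, this should reproduce the two distances quoted on the Minimum Edit Distance slide: 8 when substitutions cost 2, 5 when every operation costs 1. The table uses O(nm) space for clarity; only the previous row is needed if just the final distance matters.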


CU-Boulder CSCI 5832 - Lecture 5
