DOC PREVIEW
UMass Amherst CMPSCI 591N - Computational Linguistics

This preview shows page 1-2-3-18-19-37-38-39 out of 39 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Andrew McCallum, UMass Amherst, including material from William CohenString Edit Distance(and intro to dynamic programming)Lecture #4Computational LinguisticsCMPSCI 591N, Spring 2006University of Massachusetts AmherstAndrew McCallumAndrew McCallum, UMass Amherst, including material from William CohenDynamic Programming• (Not much to do with “programming” in theCS sense.)• Dynamic programming is efficient in findingoptimal solutions for cases with lots ofoverlapping sub-problems.• It solves problems by recombining solutionsto sub-problems, when the sub-problemsthemselves may share sub-sub-problems.Andrew McCallum, UMass Amherst, including material from William CohenFibonacci Numbers1 1 2 3 5 8 13 21 34 ...Andrew McCallum, UMass Amherst, including material from William Cohen1Andrew McCallum, UMass Amherst, including material from William CohenCalculating Fibonacci NumbersF(n) = F(n-1) + F(n-2), where F(0)=0, F(1)=1.Non-Dynamic Programming implementationFor fib(8), how many calls to function fib(n)?def fib(n): if n == 0 or n == 1: return n else: return fib(n-1) + fib(n-2)Andrew McCallum, UMass Amherst, including material from William CohenDP Example:Calculating Fibonacci Numberstable = {}def fib(n): global table if table.has_key(n): return table[n] if n == 0 or n == 1: table[n] = n return n else: value = fib(n-1) + fib(n-2) table[n] = value return valueDynamic Programming: avoid repeated calls by rememberingfunction values already calculated.Andrew McCallum, UMass Amherst, including material from William CohenDP Example:Calculating Fibonacci Numbersdef fib(n): table = [0] * (n+1) table[0] = 0 table[1] = 1 for i in range(2,n+1): table[i] = table[i-2] + table[i-1] return table[n]...or alternately, in a list instead of a dictionary...We will see this pattern many more times in this course:1. Create a table (of the right dimensions to describe our problem.2. Fill the table, re-using solutions to previous sub-problems.Andrew McCallum, UMass Amherst, including material from William CohenString Edit DistanceAndrewAmdrewz1. substitute m to n2. delete the zDistance = 2Given two strings (sequences) return the “distance”between the two strings as measured by......the minimum number of “character edit operations”needed to turn one sequence into the other.Andrew McCallum, UMass Amherst, including material from William CohenString distance metrics: Levenshtein• Given strings s and t– Distance is shortest sequence of editcommands that transform s to t, (or equivalently sto t).– Simple set of operations:• Copy character from s over to t (cost 0)• Delete a character in s (cost 1)• Insert a character in t (cost 1)• Substitute one character for another (cost 1)• This is “Levenshtein distance”Andrew McCallum, UMass Amherst, including material from William CohenLevenshtein distance - example• distance(“William Cohen”, “Willliam Cohon”)2SOECCCCCCCCICCCC2111111110000NHOC_MAILLLIWNHOC_MAILLIWsteditopcostso far...alignmentgapAlignment is a little bit like a parse.Andrew McCallum, UMass Amherst, including material from William CohenFinding the MinimumAnother fine day in the parkAnybody can see him pick the ballNot so easy.... not so clear.Not only are the strings, longer, but is isn’t immediatelyobvious where the alignments should happen.What if we consider all possible alignments by brute force?How many alignments are there?What is the minimum number of operations for....?Andrew McCallum, UMass Amherst, including material from William CohenDynamic Program Table for String EditEKcijAPSKRAPMeasure distance between strings PARKSPAKEcij =the number of editoperations neededto align PA withSPA.Andrew McCallum, UMass Amherst, including material from William CohenDynamic Programming to the Rescue!• Given some partial solution, it isn’t hard to figureout what a good next immediate step is.• Partial solution =“This is the cost for aligning s up to position i with t up to position j.• Next step =“In order to align up to positions x in s and y in t, should the last operation be a substitute, insert, or delete?How to take our big problem and chop it into building-block pieces.Andrew McCallum, UMass Amherst, including material from William CohenDynamic Program Table for String EditEKAPSKRAPMeasure distance between strings PARKSPAKEdeleteinsertsubstituteEdit operationsfor turningSPAKEintoPARKAndrew McCallum, UMass Amherst, including material from William CohenDynamic Program Table for String EditEK???c31c30Ac24c23c22c21c20Pc14c13c12c11c10Sc05c04c03c00KRAPMeasure distance between strings PARKSPAKEc02Andrew McCallum, UMass Amherst, including material from William CohenDynamic Program Table for String EditEK???c31c30Ac24c23c22c21c20Pc14c13c12c11c10Sc05c04c03c00KRAPc02D(i,j) = score of best alignment from s1..si to t1..tj= minD(i-1,j-1), if si=tj //copyD(i-1,j-1)+1, if si!=tj //substituteD(i-1,j)+1 //insertD(i,j-1)+1 //deletedeleteinsertsubstAndrew McCallum, UMass Amherst, including material from William CohenComputing Levenshtein distance - 2D(i,j) = score of best alignment from s1..si to t1..tj= minD(i-1,j-1) + d(si,tj) //subst/copyD(i-1,j)+1 //insertD(i,j-1)+1 //delete(simplify by letting d(c,d)=0 if c=d, 1 else)also let D(i,0)=i (for i inserts) and D(0,j)=jAndrew McCallum, UMass Amherst, including material from William CohenDynamic Program Table Initialized5E4K3A2P?1S43210KRAPD(i,j) = score of best alignment from s1..si to t1..tj= minD(i-1,j-1)+d(si,tj) //substituteD(i-1,j)+1 //insertD(i,j-1)+1 //deleteAndrew McCallum, UMass Amherst, including material from William CohenDynamic Program Table ... filling in5E4K3A2P11S43210KRAPD(i,j) = score of best alignment from s1..si to t1..tj= minD(i-1,j-1)+d(si,tj) //substituteD(i-1,j)+1 //insertD(i,j-1)+1 //deleteAndrew McCallum, UMass Amherst, including material from William CohenDynamic Program Table ... filling in5E4K3A?2P 4 3211S43210KRAPD(i,j) = score of best alignment from s1..si to t1..tj= minD(i-1,j-1)+d(si,tj) //substituteD(i-1,j)+1 //insertD(i,j-1)+1 //deleteAndrew McCallum, UMass Amherst, including material from William CohenDynamic Program Table ... filling in5E4K3A12P 4 3211S43210KRAPD(i,j) = score of best alignment


View Full Document

UMass Amherst CMPSCI 591N - Computational Linguistics

Download Computational Linguistics
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Computational Linguistics and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Computational Linguistics 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?