Villanova CSC 9010 - Natural Language Processing

Unformatted text preview:

Parsing with Context Free Grammars CSC 9010 Natural Language Processing Paula Matuszek and Mary-Angela Papalaskari This slide set was adapted from: Jim Martin (after Dan Jurafsky) from U. Colorado Rada Mihalcea, University of North Texas http://www.cs.unt.edu/~rada/CSCE5290/ Robert Berwick, MIT Bonnie Dorr, University of MarylandParsingProgramming languagesNatural LanguagesSome assumptions..Top-Down ParsingTop Down SpaceBottom-Up ParsingBottom-Up SpaceTop-Down VS. Bottom-UpTop-Down, Depth-First, Left-to-Right SearchExample (cont’d)Slide 13Slide 14Bottom-Up FilteringPossible Problem: Left-RecursionSolution: Rule OrderingAvoiding Repeated WorkSlide 19Slide 20Slide 21Slide 22Dynamic ProgrammingEarley ParsingStatesStates/LocationsGraphicallyEarleySlide 29Earley and Left RecursionPredictorScannerCompleterEarley: how do we know we are done?Slide 35Slide 36ExampleSlide 38Slide 39Slide 40A simple exampleWhat is it?Converting Earley from Recognizer to ParserAugmenting the chart with structural informationRetrieving Parse Trees from ChartSlide 46Earley and Left Recursion: 1Earley and Left Recursion: 2Dynamic Programming ApproachesParsing with Context Free GrammarsCSC 9010 Natural Language ProcessingPaula Matuszek and Mary-Angela PapalaskariThis slide set was adapted from:•Jim Martin (after Dan Jurafsky) from U. Colorado•Rada Mihalcea, University of North Texas http://www.cs.unt.edu/~rada/CSCE5290/• Robert Berwick, MIT•Bonnie Dorr, University of MarylandSlide 1ParsingMapping from strings to structured representation•Parsing with CFGs refers to the task of assigning correct trees to input strings•Correct here means a tree that covers all and only the elements of the input and has an S at the top•It doesn’t actually mean that the system can select the correct tree from among the possible trees•As with everything of interest, parsing involves a search which involves the making of choices•We’ll start with some basic methods before moving on to more complex onesSlide 1Programming languages max = min = grade; // Read and process the rest of the grades while (grade >= 0) { count++; sum += grade; if (grade > max) max = grade; else if (grade < min) min = grade; System.out.print ("Enter the next grade (-1 to quit): "); grade = Keyboard.readInt (); } • Easy to parse• Designed that way!Slide 1Natural Languages max = min = grade; // Read and process the rest of the grades while (grade >= 0) count++; sum += grade; if (grade > max) max = grade; else if (grade < min) min = grade; System.out.print ("Enter the next grade (-1 to quit): "); grade = Keyboard.readInt ();}• No {} ( ) [ ] to indicate scope and precedence• Lots of overloading (arity varies)• Grammar isn’t known in advance!•Context-free grammar is not the best formalismSlide 1Some assumptions..•You have all the words already in some buffer•The input isn’t pos tagged•We won’t worry about morphological analysis•All the words are knownSlide 1Top-Down Parsing•Since we’re trying to find trees rooted with an S (Sentences) start with the rules that give us an S.•Then work your way down from there to the words.Slide 1Top Down SpaceSlide 1Bottom-Up Parsing•Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.•Then work your way up from there.Slide 1Bottom-Up SpaceSlide 1Top-Down VS. Bottom-Up•Top-down–Only searches for trees that can be answers–But suggests trees that are not consistent with the words–Guarantees that tree starts with S as root–Does not guarantee that tree will match input words•Bottom-up–Only forms trees consistent with the words–Suggest trees that make no sense globally–Guarantees that tree matches input words–Does not guarantee that parse tree will lead to S as a root•Combine the advantages of the two by doing a search constrained from both sides (top and bottom)Slide 1Top-Down, Depth-First, Left-to-Right SearchSlide 1Example (cont’d)Slide 1Example (cont’d)flightflightSlide 1Example (cont’d)flightflightSlide 1Bottom-Up FilteringSlide 1Possible Problem: Left-RecursionWhat happens in the following situationS -> NP VPS -> Aux NP VPNP -> NP PPNP -> Det Nominal…With the sentence starting withDid the flight…Slide 1Solution: Rule OrderingS -> Aux NP VPS -> NP VPNP -> Det NominalNP -> NP PPThe key for the NP is that you want the recursive option after any base case.Slide 1Avoiding Repeated WorkParsing is hard, and slow. It’s wasteful to redo stuff over and over and over.Consider an attempt to top-down parse the following as an NPA flight from Indianapolis to Houston on TWASlide 1flightSlide 1flightflightSlide 1Slide 1Slide 1Dynamic Programming•We need a method that fills a table with partial results that–Does not do (avoidable) repeated work–Does not fall prey to left-recursion–Solves an exponential problem in (approximately) polynomial timeSlide 1Earley ParsingFills a table in a single sweep over the input wordsTable is length N+1; N is number of wordsTable entries representCompleted constituents and their locationsIn-progress constituentsPredicted constituentsSlide 1StatesThe table-entries are called states and are represented with dotted-rules.S -> · VP A VP is predictedNP -> Det · Nominal An NP is in progressVP -> V NP · A VP has been foundSlide 1States/LocationsIt would be nice to know where these things are in the input so…S -> · VP [0,0] A VP is predicted at the start of the sentenceNP -> Det · Nominal [1,2] An NP is in progress; the Det goes from 1 to 2VP -> V NP · [0,3] A VP has been found starting at 0 and ending at 3Slide 1GraphicallySlide 1Earley•As with most dynamic programming approaches, the answer is found by looking in the table in the right place.•In this case, there should be an S state in the final column that spans from 0 to n+1 and is complete.•If that’s the case you’re done.–S – α · [0,n+1]•So sweep through the table from 0 to n+1…–New predicted states are created by states in current chart–New incomplete states are created by advancing existing states as new constituents are discovered–New complete states are created in the same way.Slide 1Earley•More specifically…–Predict all the states you can upfront–Read a word–Extend states based on matches–Add new predictions–Go to 2–Look at N+1 to


View Full Document

Villanova CSC 9010 - Natural Language Processing

Documents in this Course
Lecture 2

Lecture 2

48 pages

Lecture 2

Lecture 2

46 pages

Load more
Download Natural Language Processing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Natural Language Processing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Natural Language Processing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?