Unformatted text preview:

CSCI 5832 Natural Language Processing Jim Martin Lecture 14 01 15 19 1 Today 3 4 Parsing CKY again Earley 2 01 15 19 Sample Grammar 3 01 15 19 Dynamic Programming DP methods fill tables with partial results and Do not do too much avoidable repeated work Solve exponential problems in polynomial time sort of Efficiently store ambiguous structures with shared sub parts 4 01 15 19 CKY Parsing First we ll limit our grammar to epsilonfree binary rules more later Consider the rule A BC If there is an A in the input then there must be a B followed by a C in the input If the A spans from i to j in the input then there must be some k st i k j Ie The B splits from the C someplace 5 01 15 19 CKY So let s build a table so that an A spanning from i to j in the input is placed in cell i j in the table So a non terminal spanning an entire string will sit in cell 0 n If we build the table bottom up we ll know that the parts of the A must go from i to k and from k to j 6 01 15 19 CKY Meaning that for a rule like A B C we should look for a B in i k and a C in k j In other words if we think there might be an A spanning i j in the input AND A B C is a rule in the grammar THEN There must be a B in i k and a C in k j for some i k j 7 01 15 19 CKY So to fill the table loop over the cell i j values in some systematic way What constraint should we put on that For each cell loop over the appropriate k values to search for things to add 8 01 15 19 CKY Table 9 01 15 19 CKY Algorithm 10 01 15 19 CKY Parsing Is that really a parser 11 01 15 19 Note We arranged the loops to fill the table a column at a time from left to right bottom to top This assures us that whenever we re filling a cell the parts needed to fill it are already in the table to the left and below 12 01 15 19 Example 13 01 15 19 Other Ways to Do It Are there any other sensible ways to fill the table that still guarantee that the cells we need are already filled 14 01 15 19 Other Ways to Do It 15 01 15 19 Sample Grammar 16 01 15 19 Problem What if your grammar isn t binary As in the case of the TreeBank grammar Convert it to binary any arbitrary CFG can be rewritten into Chomsky Normal Form automatically What does this mean The resulting grammar accepts and rejects the same set of strings as the original grammar But the resulting derivations trees are different 17 01 15 19 Problem More specifically rules have to be of the form A B C Or A w That is rules can expand to either 2 nonterminals or to a single terminal 18 01 15 19 Binarization Intuition Eliminate chains of unit productions Introduce new intermediate non terminals into the grammar that distribute rules with length 2 over several rules So S A B C Turns into S X C X AB Where X is a symbol that doesn t occur anywhere else in the the grammar 19 01 15 19 CNF Conversion 20 01 15 19 CKY Algorithm 21 01 15 19 Example Filling column 5 22 01 15 19 Example 23 01 15 19 Example 24 01 15 19 Example 25 01 15 19 Example 26 01 15 19 CKY Notes Since it s bottom up CKY populates the table with a lot of phantom constituents Segments that by themselves are constituents but cannot really occur in the context in which they are being suggested To avoid this we can switch to a top down control strategy or We can add some kind of filtering that blocks constituents where they can not happen in a final analysis 27 01 15 19 Break Quiz pushed back to Tues 3 18 Schedule Today CKY and Earley Thursday Partial parsing chunking and more on statistical sequence processing Next week statistical parsing 28 01 15 19 Earley Parsing Allows arbitrary CFGs Top down control Fills a table in a single sweep over the input words Table is length N 1 N is number of words Table entries represent Completed constituents and their locations In progress constituents Predicted constituents 29 01 15 19 States The table entries are called states and are represented with dotted rules S VP A VP is predicted NP Det Nominal An NP is in progress VP V NP A VP has been found 30 01 15 19 States Locations S VP 0 0 NP Det Nominal 1 2 VP V NP 0 3 A VP is predicted at the start of the sentence An NP is in progress the Det goes from 1 to 2 A VP has been found starting at 0 and ending at 3 31 01 15 19 Graphically 32 01 15 19 Earley As with most dynamic programming approaches the answer is found by looking in the table in the right place In this case there should be an S state in the final column that spans from 0 to n and is complete If that s the case you re done S 0 n 33 01 15 19 Earley So sweep through the table from 0 to n New predicted states are created by starting top down from S New incomplete states are created by advancing existing states as new constituents are discovered New complete states are created in the same way 34 01 15 19 Earley More specifically 1 Predict all the states you can upfront 2 Read a word 1 Extend states based on matches 2 Generate new predictions 3 Go to step 2 3 Look at n to see if you have a winner 35 01 15 19 Earley Code 36 01 15 19 Earley Code 37 01 15 19 Example Book that flight We should find an S from 0 to 3 that is a completed state 38 01 15 19 Example 39 01 15 19 Add To Chart 40 01 15 19 Example 41 01 15 19 Example 42 01 15 19 Efficiency For such a simple example there seems to be a lot of useless stuff in there Why It s predicting things that aren t consistent with the input That s the flipside to the CKY problem 43 01 15 19 Details As with CKY that isn t a parser until we add the backpointers so that each state knows where it came from 44 01 15 19 Back to Ambiguity Did we solve it 45 01 15 19 Ambiguity 46 01 15 19 Ambiguity No Both CKY and Earley will result in multiple S structures for the 0 n …


View Full Document

CU-Boulder CSCI 5832 - Parsing

Loading Unlocking...
Login

Join to view Parsing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Parsing and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?