Unformatted text preview:

Natural Language Processing Lecture 16 10 24 2013 Jim Martin Today Statistical parsing Sources of problems Improvements Grammar rewriting Lexicalized grammars 01 14 19 Speech and Language Processing Jurafsky and Martin 2 CFG Parsing We left CFG parsing CKY with the problem of selecting the right parse out of all the possible parses Now if we define right parse as most probable parse we get our old friend Argmax P Parse Words 01 14 19 Speech and Language Processing Jurafsky and Martin 3 Example 01 14 19 Speech and Language Processing Jurafsky and Martin 4 Probabilistic CFGs 1 The framework Model How to assign probabilities to parse trees 2 Training the model Learning How to acquire estimates for the probabilities specified by the model 3 Parsing with probabilities Decoding Given an input sentence and a model how can we efficiently find the best or N best tree s for that input 01 14 19 Speech and Language Processing Jurafsky and Martin 5 Simple Probability Model A derivation tree consists of the collection of grammar rules that are in the tree The probability of a tree is the product of the probabilities of the rules in the derivation 01 14 19 Speech and Language Processing Jurafsky and Martin 6 Example How many rules are in this derivation 01 14 19 Speech and Language Processing Jurafsky and Martin 7 Rule Probabilities So What s the probability of a rule Start at the top A tree should have an S at the top So given that we know we need an S we can ask about the probability of each particular S rule in the grammar That is P particular S rule S is what I need So in general we need For each rule in the grammar 01 14 19 Speech and Language Processing Jurafsky and Martin 8 Training the Model We can get the estimates we need from an annotated database i e a treebank For example to get the probability for a particular VP rule just count all the times the rule is used and divide by the number of VPs overall 01 14 19 Speech and Language Processing Jurafsky and Martin 9 Parsing Decoding So to get the best most probable parse for a given input 1 Enumerate all the trees for a sentence 2 Assign a probability to each using the model 3 Return the best argmax 01 14 19 Speech and Language Processing Jurafsky and Martin 10 Example Consider Book the dinner flight 01 14 19 Speech and Language Processing Jurafsky and Martin 11 Examples These trees consist of the following rules 01 14 19 Speech and Language Processing Jurafsky and Martin 12 Dynamic Programming Of course as with normal parsing we don t really want to do it that way Instead we need to exploit dynamic programming For the parsing as with CKY And for computing the probabilities and returning the best parse as with Viterbi and HMMs 01 14 19 Speech and Language Processing Jurafsky and Martin 13 Probabilistic CKY Alter CKY so that the probabilities of constituents are stored in the table as they are derived Probability of a new constituent A derived from the rule A B C P A B C A P B P C Where P B and P C are already in the table given the way that CKY operates What we store is the MAX probability over all the A rules 01 14 19 Speech and Language Processing Jurafsky and Martin 14 Probabilistic CKY 01 14 19 Speech and Language Processing Jurafsky and Martin 15 Probabilistic CFGs 1 The framework Model Product of probabilities of rules in a derivation 2 Training the model Learning MLE counts derived from a treebank 3 Parsing with probabilities Decoding CKY with max probabilities added in 01 14 19 Speech and Language Processing Jurafsky and Martin 16 HW Questions 01 14 19 Speech and Language Processing Jurafsky and Martin 17 Problems with PCFGs The probability model we re using is just based on the the bag of rules in the derivation 1 Does not take the actual words into account in any useful way 2 Does not take into account where in the derivation a rule is used 3 Does not work terribly well 01 14 19 That is the most probable parse isn t usually the right one the one in the treebank test set Speech and Language Processing Jurafsky and Martin 18 Common Sources of Problems Attachment ambiguities PP attachment Coordination problems 01 14 19 Speech and Language Processing Jurafsky and Martin 19 PP Attachment 01 14 19 Speech and Language Processing Jurafsky and Martin 20 Coordination Most grammars have a rule implicitly of the form X X and X This leads to massive ambiguity problems 01 14 19 Speech and Language Processing Jurafsky and Martin 21 Improved Approaches There are two approaches to overcoming these shortcomings 1 Rewrite the grammar to better capture the dependencies among rules 2 Integrate lexical dependencies into the model 1 And come up with the independence assumptions needed to make it work 01 14 19 Speech and Language Processing Jurafsky and Martin 22 Solution 1 Rule Rewriting The grammar rewriting approach attempts to capture local tree information by rewriting the grammar so that the rules capture the regularities we want By splitting and merging the non terminals in the grammar Example split NPs into different classes that is split the NP rules into separate rules 01 14 19 Speech and Language Processing Jurafsky and Martin 23 Example NPs Our CFG rules for NPs don t condition on where in a tree the rule is applied But we know that not all the rules occur with equal frequency in all contexts Consider NPs that involve pronouns vs those that don t 01 14 19 Speech and Language Processing Jurafsky and Martin 24 Example NPs So that comes down to NP Pronoun Gets replaced with something like NP Subj Pronoun NP Obj Pronoun Separate rules with different counts in the treebank and therefore different probabilities 01 14 19 Speech and Language Processing Jurafsky and Martin 25 Rule Rewriting Three approaches 1 Use linguistic knowledge to directly rewrite rules by hand 1 NP Obj and the NP Subj approach 2 Automatically rewrite the rules using context to capture some of what we want 1 Ie Incorporate context into a context free approach 3 Search through the space of all rewrites for the grammar that maximizes the probability of the training set 01 14 19 Speech and Language Processing Jurafsky and Martin 26 Local Context Approach Condition the rules based on their parent nodes Splitting based on tree context captures some of the linguistic intuitions we saw with the NP example 01 14 19 Speech and Language Processing Jurafsky and Martin 27 Parent Annotation Now we have non terminals NP S and NP VP that should capture the subject object and pronoun full


View Full Document

CU-Boulder CSCI 5832 - Statistical parsing

Loading Unlocking...
Login

Join to view Statistical parsing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Statistical parsing and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?