Lecture 8: Parsing tricks; beyond context-free grammars
Professor Robert C. Berwick
[email protected]
6.863J/9.611J SP11

The Menu Bar
• Administrivia: Lab 3…
• The Earley algorithm
• Some speedup tricks
• How do the actual 'large scale systems' do?
• Pluses and minuses of context-free grammars

Earley parsing
• Top-down instead of bottom-up
• Uses left-context, and optionally right-context, to constrain search so we waste less time building impossible things
• No restrictions on the form of the grammar: e.g., NP → Det N with John PP is an OK rule, thanks to a clever 'on the fly' transformation to binary-branching rules ("dotted rules")
• Again O(n³) time (but watch out for grammar size!)

Overview of Earley's Algorithm
• Finds constituents and partial constituents in the input
• A → B C . D E is partial: only the first half of the A has been found
[Figure: attaching a completed D to the partial constituent A → B C . D E yields A → B C D . E]

Overview of Earley's Algorithm (cont.)
• Proceeds incrementally, left to right
• Before it reads word 5, it has already built all hypotheses that are consistent with the first 4 words
• Reads word 5 and attaches it to the immediately preceding hypotheses. This might yield new constituents that are then attached to hypotheses immediately preceding them…
• E.g., attaching D to A → B C . D E gives A → B C D . E
• Attaching E to that gives A → B C D E .
• Now we have a complete A that we can attach to hypotheses immediately preceding the A, etc.

Motivation
• We will carry out a deterministic simulation of a nondeterministic machine that can keep track of 'all the possible next states' Si it can be in after reading each word
• So we imagine a sequence of state sets, S0, S1, …
• Representation as a 'chart' of 'columns': Column i, or chart(i), will contain all of the possible machine states after reading the i-th word (NB. indexing: we start at 0, before the first word of the sentence)
• Compare this to how we can solve 'fruit flies like a banana' in a nondeterministic finite-state automaton…

FSA analog
[Figure: an NFA with states 0–6 for 'fruit flies like a banana', with ambiguous edges fruit (Adj) / fruit (Noun), flies (Noun) / flies (Verb), like (Verb) / like (Adv), then a and banana]

Parsing with State Sets
'fruit flies like a banana' — each word triggers a 'Scan':
S0, S1 = {1,2}, S2 = {2,3}, S3 = {3,4}, S4 = {5}, S5 = {6}

State set construction as a sequence of machine states, extending paths…
Initial: state set S0 = where the machine could be after reading zero words
Apply all edge operations until closure: S0 → S1 → … → Sn

State Set machine
• Need to specify the initial state
• Need to specify how to construct State Set Si, given State Set Si−1
• Need to specify the final state (accept or reject)
• For getting from one state to the next, we need only calculate how to scan under closure, word by word
• For the linear, or finite-state, case this is all easy
• For the hierarchical, or context-free, case we need a more complex representation of 'state', and of how to get from one state set to the next

Table, or 'chart', as a sequence of State Sets
Column j holds a State Set. What is a "state"?
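In the finite-state case, this state-set simulation takes only a few lines of Python. The sketch below is illustrative only: the transition table `DELTA` is a guessed reconstruction of the slide's automaton (whose figure did not survive extraction), chosen so the simulation reproduces the state sets S1 = {1,2}, S2 = {2,3}, S3 = {3,4}, S4 = {5}, S5 = {6} listed above.

```python
# Deterministic simulation of a nondeterministic FSA by tracking state
# *sets*. DELTA is a hypothetical reconstruction of the slide's
# 'fruit flies like a banana' automaton over states 0-6.
DELTA = {
    (0, "fruit"):  {1, 2},   # fruit as Adj (-> 1) or Noun (-> 2)
    (1, "flies"):  {2},      # flies as Noun (completing 'fruit flies')
    (2, "flies"):  {3},      # flies as Verb
    (2, "like"):   {3},      # like as Verb
    (3, "like"):   {4},      # like as Adv after the verb
    (4, "a"):      {5},
    (5, "banana"): {6},
}

def run(words, start=frozenset({0}), final=frozenset({6})):
    states = set(start)      # S0: where we can be before any word
    for w in words:          # each word is a 'Scan' step building the next Si
        states = set().union(*(DELTA.get((q, w), set()) for q in states))
    return bool(states & final)   # accept iff some final state survives

print(run("fruit flies like a banana".split()))  # True
```

Note that a 'state' here is just an integer, so a set of states is cheap to carry along; the context-free case below needs a richer notion of state.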
Now we need a bit more info…
[Table: a chart with columns 0–3 for a short sentence ("I shot …"), each column a State Set of dotted-rule items — e.g. column 0: S → • NP VP, NP → • I; column 1: NP → I •, S → NP • VP, VP → • V NP, V → • shot; column 2: V → shot •, VP → V • NP, NP → • Det N; column 3: VP → V NP •, S → NP VP •]

But not much more info… Earley's algorithm
• Use the table as before, but with 1 column per word as a 'bag' (a 'chart')
• The column entries are called states or items and are represented with dotted rules, plus information about how much of the rule has been found.
A dotted rule is of 3 types:
  S → • VP         — a VP is predicted
  NP → Det • Noun  — an NP is in progress
  VP → V NP •      — a VP has been found
To the dotted rule we add 2 indices that denote where the phrase on the left of the arrow starts (its left edge) and how far to the right the phrase has been constructed so far.
These two elements constitute a 'state'.
So, e.g., [NP → Det • Noun, (1, 2)] means that we started building an NP at word 1 of the input, and we've found the first word, a Det.
A collection of states in a given column (e.g., col. 2) is called a State Set.

Dotted rules: run-time conversion to binary branching
• S → • VP [0,0] — a VP is predicted at the start of the sentence
• NP → Det • Noun [1,2] — an NP is in progress; the Det goes from 1 to 2
• VP → V NP • [0,3] — a VP has been found, starting at 0 and ending at 3

What are…
• The initial state? S0 = {Start → • S, [0,0]}
• The final state? Sn = {Start → S •, [0,n]}
• How do we get from one State Set to the next? We apply three operations on the items in a State Set until closure — not just 'scan', but 2 more, to start and stop the building of trees.

Earley's algorithm uses 3 basic operations, corresponding to how we construct 'trees':
• Predictor (push)
• Scanner
• Completer (attach, pop)

[Slide: Earley's algorithm, as given in the textbook]
[Slide: the 3 operations — Predictor ('wishor'), Scanner, Completer]

The algorithm is just a loop around this:
1. Add Start → • S to column 0
2. For each j from 0 to n:
   i. For each dotted rule in column j (including those we add as we go), look at what's after the dot:
      a. If a word, Scan
      b. If a nonterminal, Predict (until closure)
      c. If nothing is after the dot, Complete (until closure)
   ii. Output 'yes' if Start → S • is in column n
Note: don't add duplicates! (AWP; but this is tricky to do in practice!)
Question: where is the stack in all of this? (Since we need a stack…)

An Example…
ROOT → S
S → NP VP      NP → Papa
NP → Det N     N → caviar
NP → NP PP     N → spoon
VP → VP PP     V → ate
VP → V NP      P → with
PP → P NP      Det → the
               Det → a
Papa ate the caviar with a spoon

Time complexity
• The algorithm iterates for every word of the input (n iterations)
• How many items can be created and processed in Si?
  – Each item in Si has the form [A → X1 … • C … Xm, j], 0 ≤ j ≤ i
  – Thus O(n) items per column (the grammar contributes only a constant factor)
• The scanner and predictor operations on an item take at worst constant time (Why?)
• The completer operation on an item adds items of the form [B → X1 … • A … Xm, l] to Si, with 0 ≤ l ≤ i, so it may require up to O(n) time for each processed item
• The time required for each iteration over Si is thus O(n) items × O(n) work = O(n²), giving O(n³) over all n columns — matching the O(n³) bound quoted earlier
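The loop above can be sketched as a minimal Earley recognizer (an illustrative reconstruction from the slides' description, not the course's own code). An item is a tuple `(lhs, rhs, dot, origin)` — a dotted rule plus its left edge — and the three operations appear as the three branches inside the loop; duplicate items are never added, per the "AWP" note.

```python
# Minimal Earley recognizer for the lecture's example grammar.
# chart[j] is the State Set (column) after reading j words.
GRAMMAR = {
    "ROOT": [("S",)],
    "S":    [("NP", "VP")],
    "NP":   [("Papa",), ("Det", "N"), ("NP", "PP")],
    "VP":   [("VP", "PP"), ("V", "NP")],
    "PP":   [("P", "NP")],
    "N":    [("caviar",), ("spoon",)],
    "V":    [("ate",)],
    "P":    [("with",)],
    "Det":  [("the",), ("a",)],
}

def earley_recognize(words, grammar=GRAMMAR, start="ROOT"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(j, item):               # never add duplicates (the AWP note)
        if item not in chart[j]:
            chart[j].append(item)

    for rhs in grammar[start]:      # seed column 0: Start -> . S
        add(0, (start, rhs, 0, 0))

    for j in range(n + 1):
        i = 0
        while i < len(chart[j]):    # columns grow as we process them
            lhs, rhs, dot, origin = chart[j][i]
            i += 1
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                 # nonterminal: Predict
                    for prod in grammar[sym]:
                        add(j, (sym, prod, 0, j))
                elif j < n and words[j] == sym:    # word: Scan
                    add(j + 1, (lhs, rhs, dot + 1, origin))
            else:                                  # dot at the end: Complete
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(j, (l2, r2, d2 + 1, o2))

    return any(l == start and d == len(r) and o == 0
               for (l, r, d, o) in chart[n])       # Start -> S . in column n?

print(earley_recognize("Papa ate the caviar with a spoon".split()))  # True
```

As for the slide's question about the stack: it is implicit in the items themselves — Predict plays the role of a push (starting a subtree), and Complete plays the role of a pop (attaching a finished subtree back to the item whose dot predicted it).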
MIT 6 863J - Parsing tricks