6.863J Natural Language Processing Lecture 7: parsing with hierarchical structures – context-free parsing Robert C. Berwick The Menu Bar • Administrivia: • Schedule alert: Lab2 due today; Lab 3 out late Weds. (context-free parsing) • Lab time today, tomorrow • (on web) Please read notes3.pdf, englishgrammar.pdf • Agenda: • Chart parsing summary; time complexity • How to write grammars 6.863J/9.611J Lecture 8 Sp03Chart parsing summary • Data structures & Algorithm • Data structures • A chart parser has three data structures: , which holds the words of the• an input stackinput sentence (in order) , which holds completed phrases• a chartorganized by starting position and length , organized by ending position.• a set of edges6.863J/9.611J Lecture 8 Sp03 Input sentence stack • The input • Positions in the input sentence will be numbered starting with zero and will be the positions between successive words. For example: 0 I 1 shot 2 an 3 elephant 4 in 5 my 6 pajamas 7 For now, assume POS already assigned, words consumed l-to-r 6.863J/9.611J Lecture 8 Sp03We have presented the chart graphically so far… 6.863J/9.611J Lecture 8 Sp03 Chart • Example of chart S VP NP I shot an elephant in my pajamas n v d n p d n NP NPNP VPS PP6.863J/9.611J Lecture 8 Sp03Chart in graphical form 6543210start ndpndvn1 NPNP2 PPVP3 S4 NP5 VP6 S7 p length Position 6.863J/9.611J Lecture 8 Sp03 At the end… length 6543210start ndpndvn1 NPNP2 PPVP3 S4 NP5 VP6 S7 p Position 6.863J/9.611J Lecture 8 Sp03Corresponding to Marxist analysis NP S I VP V NP NP shot Det N PP an elephant P Det N in my pj’s 6.863J/9.611J Lecture 8 Sp03 The chart • A cell in the chart can contain more than one phrase (e.g., n & np) • With each constituent is frequently stored information about which parsing rule was used to generate it and what smaller constituents make it up (to recover the parse) • Used to prevent redundant work if 2 or more possible internal structures for a single phrase (“blue socks and shoes”) 6.863J/9.611J Lecture 8 Sp03Edges • Each edge consists of a grammar rule, plus info about how it matches up against the input, specifically: • A rule, e.g., S(entence)fi NP VP • The position up to which we’ve matched the rule to the input, indicated by a dot (• ), e.g., S fiNP • VP • The starting position of the edge (the first input word matched) (e.g., VP ‘shot…’ starts at position 1 • The # of input words matched so far • Edges organized by ending position (the last input word matching against their rule so far) • Edges are added but never deleted 6.863J/9.611J Lecture 8 Sp03 Edges, cont’d Start 0 S fi • NP VP NP fi • Det N NPfi • N NPfi • NP PP 1 NP fi N • S fi NP • VP NPfi NP • PP VP fi • V NP VP fi • V NP PP PP fi • P NP Etc… 6.863J/9.611J Lecture 8 Sp03State-set construction S0‹initial state set= initialInitialize: Loop: Final: state edge [Start fi • S , 0, n] ¨ e-closure of this set under predict, complete For word i=1,…,n Si computed from Si-1 (using scan, predict, complete) try scan; then predict, complete Is a final edge in Sn? [Start fi S• , 0, n]˛ Sn? 6.863J/9.611J Lecture 8 Sp03 The overall algorithm • Suppose there are n words in the input • Set up chart of height and width n • Add input words onto stack, last word at bottom • For each ending position i in the input, 0 through n, set up two sets, S and Di (“Start”,i “Done”) S0 ‹ all rules expanding start node of grammar Si ‹˘ for i „ 0 Si will be treated as search queue (BFS or DFS). Edges will be extracted one by one from Si & put into Di . When Si becomes empty, remove 1st word from stack & go to next ending position 6.863J/9.611J Lecture 8 Sp03i +1Or: • Loop until Si is empty • Remove first edge e from Si • Add e to Di • to e, using the 3Apply 3 extension operations operators: scan, complete, predict (which may produce new edges) • New edges added to Si or Si+1, if they are not already in Si, Di, or Di+1 • Pop first word off input stack • When all ending positions processed, chart contains all complete phrases found 6.863J/9.611J Lecture 8 Sp03 Adding edges by the 3 operations Predict (start a phrase) Complete (finish phrase) – like extending a search Scan (match words) 6.863J/9.611J Lecture 8 Sp03Another way to view it NP Predict Complete phrase Phrase DT NN scan scan the guy 6.863J/9.611J Lecture 8 Sp03 6.863J/9.611J Lecture 8 Sp03 Connecting the dots Sentence The guy NP Det Noun Dotted rule form NPDot at beginning= just started building a phrase of this type Predict fi •Det Noun6.863J/9.611J Lecture 8 Sp03 Sentence The guy NP Det Noun Dotted rule form NPfiDet • Noun Scan 6.863J/9.611J Lecture 8 Sp03 Dot at end Sentence The guy NP Det Noun Dotted rule form NPfiDet • Noun Advance in input = scan NPfi • (finished building phrase) The dots – in the middle Det NounThe three ops add edges in our full chart representation … • Loops (Predict) – start a phrase • Skips (Scan) – build phrase • Pastes – glue 2 edges to make a third, larger one (Complete) – finish a phrase 6.863J/9.611J Lecture 8 Sp03 Picture: Predict adds the ‘loops’ S fi • NP VP I shot an elephant in my pajamas 6.863J/9.611J Lecture 8 Sp03Picture: Scan adds the ‘jumps’ I shot an elephant in my pajamas S fi NP fi N NP fi NP fi •NP VP • D • N • NP PP NP fi N • 6.863J/9.611J Lecture 8 Sp03 Picture: Complete combines edges S fi NP VP • VP fi V NP • VP fi VP • PP I shot an elephant in my pajamas NP fi N • NP fi D N • NP fi • NP PP S fi NP • VP 6.863J/9.611J Lecture 8 Sp03The ops • 3 ops: scan, predict, complete; or scan, push, pop move1.Scan: forward, consuming a token (word class) -what if this is a phrase name, though? start building a phrase (tree) at this point2.Predict (push): in the input; or jump to subnetwork; finish building a phrase (tree) at this point; pop stack and return from subnet (which also says attached) Scan = linear precedence; Predict, complete: dominance 3.Complete (pop): where the subphrase gets 6.863J/9.611J Lecture 8 Sp03 Another way to view it Push NP Pop NP d n Scan NP 6.863J/9.611J Lecture 8 Sp03Definitions – words & symbols • Scan Suppose current edge e is not finished & part of speech tag X follows the dot in the rule for e Scan examines next word in input If word has
View Full Document