Intro to Statistical Parsing
Presenter: Benyah Shaparenko
CS 674, 3/14/2005

Statistical Parsing
- Parsing + statistics
- Choose the "best" parse

First Paper
- Eugene Charniak, "Statistical Parsing with a Context-free Grammar and Word Statistics," AAAI-1997

Charniak 1997 Parser
- Penn Treebank grammar, with probabilities
- Uses probabilities to pick a good parse
- Exhaustive search impractical
- Smoothing

Probabilistic Model
- s: sentence, π: parse
- Probabilities calculated bottom-up
- System not guaranteed to find the best (highest-probability) parse

An Example Parse

Probabilities of Constituents
- Constituent: e.g. np, vp, s
- 3 steps (top-down):
  1. Calculate the probability of the head
  2. Calculate the probability of the constituent given the head
  3. Recurse down the parse tree

Step 1: Head Probability
- h(s): head of constituent s
- t: type of s
- h: head of the parent of s
- l: type of the parent of s

Step 1: Example
- p(profits | rose, np, s) is based on…
- p(profits | rose, np, s) = 0
- p(profits | class of rose, np, s) = .00352223
- p(profits | np, s) = .0006274
- p(profits | np) = .000556527

Step 2: Pr(Constituent | Head)
- r: rule used to rewrite the constituent
- h, t, l as before: head, type, parent type

Step 2: Example
- r is (np → adj plural-n), as in "corporate profits"
- p(r | profits, np, s) is based on…
- p(r | profits, np, s) = .1707
- p(r | profits, np) = .1875
- p(r | class of profits, np) = .1192
- p(r | np, s) = .0176
- p(r | np) = .0255

Algorithm
- Get grammar and statistics from the treebank
- Use only constituents likely to appear in probable parses
  - Based on the p(r | t) distribution
- Find the best parse among the probable parses

Experimental Setup
- Training: Sections 2-21 of the Penn Treebank corpus (1 million words)
- Testing: Section 23 (50K words)
- Preliminary testing/tuning: Section 24

Systems Tested
- PCFG: only use the probability p(r | t)
- Minimal: use observed
  p̂(r | h, t, l)
- No classes: all equations except the p̂'s conditioned on c_h (classes of head words)
- Basic: all equations as described earlier
- Full: Basic + 30M words used for unsupervised learning

Metrics
- Labeled recall (LR): #right / #possible
- Labeled precision (LP): #right / #marked
- LR2/LP2: LR/LP ignoring punctuation and collapsing ADVP and PRT
- Crossing brackets (CB): constituents violating correct boundaries
- CB0: no crossing brackets
- CB2: no more than 2 crossing brackets

Results

Comparison w/ Previous Work

Comparison Explanations
- Non-factors:
  - p(s, π) vs. p(π | s)
  - POS tags
  - Formal/explicit grammar?
  - Non-occurrence of grammar rules in data
- Factors: statistics and smoothing
  - Decision tree vs. smoothing equations
  - Word counts vs. classes
  - Unsupervised data: ~.5% LR2

Second Paper
- Michael Collins, "Three Generative, Lexicalised Models for Statistical Parsing," ACL/EACL-1997

Three Generative Models
- Model 1: Collins 1996, but generative
  - Generative: the same T maximizes both p(S, T) and p(T | S)
  - Note: S is Charniak's s, T is Charniak's π
- Model 2: adds a probabilistic complement/adjunct distinction
- Model 3: adds probabilistic wh-movement

Notation
- PCFG: rewrite rules with probabilities
- Lexicalized PCFG: P(h) → L_n(l_n)…L_1(l_1) H(h) R_1(r_1)…R_m(r_m)
- X(x): nonterminal X with <word, POS tag> pair x
- P: parent nonterminal
- H: head child of P, with h being the head word
- L_i and R_i are modifiers of H

An Example Parse

Model 1
- Step 1: generate the head with probability p_H(H | P, h)
- Steps 2, 3: generate left/right modifiers with probabilities
  - p_L(L_i(l_i) | P, h, H)
  - p_R(R_i(r_i) | P, h, H)
- Uses a 0th-order Markov assumption
- Can use distance, e.g.
  p_L(L_i(l_i) | P, h, H, distance_l(i-1))

Model 1 Example
- Top rewrite rule in the example parse
- Head prob.: p_H(VP | S, bought)
- Left prob.: p_L(NP(marks) | S, VP, bought) * p_L(…
- Right prob.: p_R(STOP | S, VP, bought)

Model 2: Subcategorization
- Model 1 + a complement/adjunct distinction
- Example…

Model 2 Motivation
- Parsing info useful for marking complements
- May help accuracy
- Some rules for treebank data: a complement must be NP, SBAR, S, etc. under an S; cannot be ADV, VOC, etc.

Model 2 Probabilities
- Head probability the same: p_H(H | P, h)
- Left/right subcat frames LC and RC, with probabilities p_lc(LC | P, h, H) and p_rc(RC | P, h, H)
- Frames specify the needed complements…
- Left/right probabilities depend on LC/RC, e.g. p_L(L_i(l_i) | P, h, H, distance_l(i-1), LC)

Model 2 Example
- p_L(NP(marks) | S, VP, bought) * p_L(NP(week) | S, VP, bought) * p_L(STOP | S, VP, bought) becomes…
- p_lc({NP-C} | S, VP, bought) * p_L(NP-C(marks) | S, VP, bought, {NP-C}) * p_L(NP(week) | S, VP, bought, {}) * p_L(STOP | S, VP, bought, {})

Model 3: Wh-Movement
- Refers to the effects of a wh-word (e.g. which, where) on a clause
- Parsing solution: a +gap feature which must be matched with a TRACE
  1. +gap is passed to the head of the phrase
  2.
  +gap passed to a left/right modifier, or output as a TRACE

Model 3 Probabilities
- p_G(G | P, h, H), with G one of Head, Left, or Right
- If G = Head, propagate +gap to the head
- If G = Left/Right, add +gap to the left/right subcat variable

Implementation Details
- Smoothing used: backoff… pretty standard
- Unknown words: words occurring < 5 times replaced by "UNKNOWN"
- POS tags: only use tags that appear in the training data

Experimental Results
- Setup: same as Charniak
- Metrics: same, except only "LR2" and "LP2," which are now called LR and LP

The Numbers…

Still Not the End of the Story…
- Collins (1998) applied these techniques to "semantic tagging"
  - Management succession: outgoing manager, new manager, the position
- Charniak (2000) built a maximum-entropy parser
  - Just over 90% LP/LR
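The backoff chains on the two Charniak example slides (Step 1 and Step 2) can be sketched as a linear interpolation of progressively less specific estimates. A minimal sketch: the component estimates below are the numbers from the "Step 1: Example" slide, but the interpolation weights (lambdas) are purely illustrative assumptions; the paper derives its smoothing weights from training data.

```python
def smoothed_head_prob(estimates, lambdas):
    """Linearly interpolate a chain of backed-off estimates.

    estimates: most-specific first, e.g.
        [p(h | parent head, t, l), p(h | class of parent head, t, l),
         p(h | t, l), p(h | t)]
    lambdas: one weight per estimate, summing to 1.
    """
    assert abs(sum(lambdas) - 1.0) < 1e-9
    return sum(lam * p for lam, p in zip(lambdas, estimates))

# Estimates from the slide, for p(profits | rose, np, s):
# the fully conditioned count is 0, so smoothing is what keeps
# the parse from being assigned zero probability.
estimates = [0.0, 0.00352223, 0.0006274, 0.000556527]
lambdas = [0.4, 0.3, 0.2, 0.1]  # hypothetical weights, for illustration

p = smoothed_head_prob(estimates, lambdas)
```

With these (made-up) weights the smoothed probability is small but nonzero, which is exactly the point: an unseen head/parent-head pair does not zero out the whole parse.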