CORNELL CS 674 - Intro to Statistical Parsing


Slide 1: Intro to Statistical Parsing
- Presenter: Benyah Shaparenko
- CS 674, 3/14/2005

Slide 2: Statistical Parsing
- Parsing + statistics
- Choose the "best" parse

Slide 3: First Paper
- Eugene Charniak, "Statistical Parsing with a Context-free Grammar and Word Statistics", AAAI-1997

Slide 4: Charniak 1997 Parser
- Penn Treebank grammar, with probabilities
- Uses the probabilities to find a good parse
- Exhaustive search is impractical
- Smoothing

Slide 5: Probabilistic Model
- s: sentence; π: parse
- Probabilities are calculated bottom-up
- The system is not guaranteed to find the best (highest-probability) parse

Slide 6: An Example Parse
- (parse-tree figure)

Slide 7: Probabilities of Constituents
- Constituent: e.g. np, vp, s
- Three steps, applied top-down:
  1. Calculate the probability of the head
  2. Calculate the probability of the constituent given the head
  3. Recurse down the parse tree

Slide 8: Step 1: Head Probability
- For a constituent s, estimate the probability of its head given:
  - t: type of s
  - h: head of the parent of s
  - l: type of the parent of s

Slide 9: Step 1: Example
- p(profits | rose, np, s) is based on:
  - p(profits | rose, np, s) = 0
  - p(profits | class of rose, np, s) = .00352223
  - p(profits | np, s) = .0006274
  - p(profits | np) = .000556527

Slide 10: Step 2: Pr(Constituent | Head)
- r: the rule used to rewrite the constituent
- h, t, l as before: head, type, parent type

Slide 11: Step 2: Example
- r is (np -> adj plural-n) in "corporate profits"
- p(r | profits, np, s) is based on:
  - p(r | profits, np, s) = .1707
  - p(r | profits, np) = .1875
  - p(r | class of profits, np) = .1192
  - p(r | np, s) = .0176
  - p(r | np) = .0255

Slide 12: Algorithm
- Get the grammar and statistics from the treebank
- Keep only constituents likely to appear in probable parses, based on the p(r | t) distribution
- Find the best parse among the probable parses

Slide 13: Experimental Setup
- Training: sections 2-21 of the Penn Treebank corpus (1 million words)
- Testing: section 23 (50K words)
- Preliminary testing/tuning: section 24

Slide 14: Systems Tested
- PCFG: uses only the probability p(r | t)
- Minimal: uses the observed p̂(r | h, t, l)
- No classes: everything except the p̂'s conditioned on c_h, the class of h
- Basic: all equations as described earlier
- Full: Basic + 30M words used for unsupervised learning

Slide 15: Metrics
- Labeled recall (LR): #right / #possible
- Labeled precision (LP): #right / #marked
- LR2/LP2: LR/LP ignoring punctuation and collapsing ADVP with PRT
- Crossing brackets (CB): constituents violating correct boundaries
- CB0: parses with no crossing brackets
- CB2: parses with no more than 2 crossing brackets

Slide 16: Results
- (results table)

Slide 17: Comparison w/ Previous Work
- (comparison table)

Slide 18: Comparison Explanations
- Non-factors:
  - p(s, π) vs. p(π | s)
  - POS tags
  - Formal/explicit grammar?
  - Non-occurrence of grammar rules in the data
- Factors: statistics and smoothing
  - Decision trees vs. smoothing equations
  - Word counts vs. classes
  - Unsupervised data: ~.5% LR2
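The backoff chains on slides 9 and 11 above are the heart of Charniak's smoothing: when the most specific estimate is zero or unreliable, progressively coarser contexts (word class, then dropping the parent head, then the parent type) fill in. Below is a minimal sketch of interpolated backoff over such a chain; the count tables, the lambda weights, and the class name "VERB_CLASS" are hypothetical stand-ins, and the paper derives its interpolation weights from held-out data rather than fixing them by hand.

```python
# Sketch of interpolated backoff over the contexts on slides 9 and 11.
# Count tables and lambda weights are hypothetical stand-ins.

def backoff_prob(outcome, contexts, counts, lambdas):
    """Blend relative-frequency estimates, most specific context first.

    contexts: conditioning tuples from most to least specific, e.g.
              [(h, t, l), (class_of_h, t, l), (t, l), (t,)]
    counts:   dict mapping a context tuple -> {outcome: count}
    lambdas:  one interpolation weight per context level
    """
    prob, remaining = 0.0, 1.0
    for ctx, lam in zip(contexts, lambdas):
        table = counts.get(ctx, {})
        total = sum(table.values())
        p_hat = table.get(outcome, 0) / total if total else 0.0
        prob += remaining * lam * p_hat   # weight this level's estimate
        remaining *= 1.0 - lam            # mass left for coarser levels
    return prob

# Slide 9's chain for p(profits | rose, np, s): back off from the parent
# head "rose" to its word class, then drop the head, then the parent type.
chain = [("rose", "np", "s"), ("VERB_CLASS", "np", "s"), ("np", "s"), ("np",)]
```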
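Slide 15's metrics all reduce to comparing the constituents a parser proposes against the treebank's. Here is a sketch assuming constituents are (label, start, end) spans; the punctuation and ADVP/PRT adjustments that distinguish LR2/LP2 are omitted, and the crossing test is my reading of "violating correct boundaries".

```python
# Sketch of the labeled precision/recall and crossing-brackets metrics
# from slide 15, over constituents represented as (label, start, end).

def labeled_pr(gold, guess):
    right = len(set(gold) & set(guess))
    recall = right / len(gold)        # LR: #right / #possible
    precision = right / len(guess)    # LP: #right / #marked
    return recall, precision

def crossing_brackets(gold, guess):
    """Count guessed spans that cross some gold constituent boundary."""
    def crosses(a, b):
        # spans overlap but neither contains the other
        return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]
    return sum(
        any(crosses((gs, ge), (s, e)) for (_, gs, ge) in gold)
        for (_, s, e) in guess
    )

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
guess = [("S", 0, 5), ("NP", 0, 3), ("VP", 3, 5)]
lr, lp = labeled_pr(gold, guess)      # 1/3 each: only the S span matches
cb = crossing_brackets(gold, guess)   # 1: guessed NP crosses the gold VP
```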
classes Unsupervised data: ~.5% LR2CS 674, 3/14/2005 19Second Paper Michael Collins, “Three Generative, Lexicalised Models for Statistical Parsing” ACL/EACL-1997CS 674, 3/14/2005 20Three Generative Models Model 1: Collins 1996, but generative Generative: the same T maximizes both p(S, T) and p(T | S) Note: S is Charniak’s s, T is Charniak’s π Model 2: adds probabilistic complement/adjunct distinction Model 3: adds probabilistic wh-movementCS 674, 3/14/2005 21Notation PCFG: rewrite rules w/ probabilities Lexicalized PCFG P(h) -> Ln(ln)…L1(l1)H(h)R1(r1)…Rm(rm) X(x) with nonterminal X and <word, POS tag> pair x P: parent nonterminal H: head child of P, with h being head word Liand Riare modifiers of HCS 674, 3/14/2005 22An Example ParseCS 674, 3/14/2005 23Model 1 Step 1: Generate head probability pH(H | P, h) Steps 2,3: Generate left,right probability pL(Li(li) | P, h, H) pR(Ri(ri) | P, h, H) Uses 0-order Markov assumption Can use distance, e.g. pL(Li(li) | P, h, H, distancel(i-1))CS 674, 3/14/2005 24Model 1 Example Top rewrite rule in example parse Head prob.: pH(VP | S, bought) Left prob.: pL(NP(marks) | S, VP, bought) * pL(… Right prob.: pR(STOP | S, VP, bought)CS 674, 3/14/2005 25Model 2: Subcategorization Model 1 + complement/adjunct division Example…CS 674, 3/14/2005 26Model 2 Motivation Parsing info useful for marking complements May help accuracy Some rules for treebank data: must be NP, SBAR, S, etc under an S, cannot be ADV, VOC, etcCS 674, 3/14/2005 27Model 2 Probabilities Head probability same: pH(H | P, h) L/R subcat frames LC and RC w/ probs. plc(LC | P, h, H) and prc(RC | P, h, H) Frames specify needed complements… L/R probabilities depend on LC/RC, e.g. pL(Li(li) | P, h, H, distancel(i-1), LC)CS 674, 3/14/2005 28Model 2 Example pL(NP(marks) | S, VP, bought)*pL(NP(week) | S, VP, bought)*pL(STOP | S, VP, bought) becomes… plc({NP-C} | S, VP, brought)*pL(NP-C(marks) | S, VP, bought, {NP-C})*pL(NP(week) | S, VP, bought, {})*pL(STOP | S, VP, bought, {})CS 674, 3/14/2005 29Model 3: Wh-Movement Refers to the effects of a wh-word, e.g. which, where, on a clause Parsing solution: using a +gap feature which must be matched with a TRACE 1. +gap passed to head of phrase 2. +gap passed to L/R modifiers or output as a TRACECS 674, 3/14/2005 30Model 3 Probabilities pG(G | P, h, H) G either Head, Left, or Right If G = Head, propagate +gap to head If G = Left/Right, add +gap to Left/Right subcat variableCS 674, 3/14/2005 31Implementation Details Smoothing used Backoff… pretty standard Unknown words Words < 5 times replaced by “UNKNOWN” POS tags Only use tags that appear in training dataCS 674, 3/14/2005 32Experimental Results Setup: same as Charniak Metrics: Same, except only “LR2” and “LP2,” which are now called LR and LPCS 674, 3/14/2005 33The Numbers…CS 674, 3/14/2005 34Still Not the End of the Story… Collins (1998) applied techniques for “semantic tagging” Management succession: outgoing manager, new manager, the position Charniak (2000) made an max entropy parser Just over 90% LP /


Slide 25: Model 2: Subcategorization
- Model 1 + a complement/adjunct division
- Example…

Slide 26: Model 2 Motivation
- Parsing information is useful for marking complements
- May help accuracy
- Some rules for treebank data: under an S, a complement must be NP, SBAR, S, etc., and cannot be ADV, VOC, etc.

Slide 27: Model 2 Probabilities
- Head probability is the same: p_H(H | P, h)
- Left/right subcat frames LC and RC, with probabilities p_lc(LC | P, h, H) and p_rc(RC | P, h, H)
- Frames specify the complements still needed…
- L/R probabilities depend on LC/RC, e.g. p_L(L_i(l_i) | P, h, H, distance_l(i-1), LC)

Slide 28: Model 2 Example
- p_L(NP(Marks) | S, VP, bought) * p_L(NP(week) | S, VP, bought) * p_L(STOP | S, VP, bought) becomes…
- p_lc({NP-C} | S, VP, bought) * p_L(NP-C(Marks) | S, VP, bought, {NP-C}) * p_L(NP(week) | S, VP, bought, {}) * p_L(STOP | S, VP, bought, {})
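Slide 28 shows the practical difference subcategorization makes: the left subcat frame {NP-C} is generated once, each generated complement is struck from it, and STOP is conditioned on the (now empty) frame. A sketch of that bookkeeping, with hypothetical probability tables and frames represented as multisets:

```python
# Sketch of Model 2's subcat handling (slides 27-28): modifier
# probabilities condition on the shrinking subcat frame.

from collections import Counter

def left_sequence_prob(parent, head_label, head_word, mods, lc, p_lc, p_l):
    """mods: [(label, word)] nearest-the-head first; lc: e.g. {"NP-C": 1}."""
    frame = Counter(lc)
    prob = p_lc.get((frozenset(frame.items()), parent, head_word,
                     head_label), 0.0)
    for label, word in mods + [("STOP", None)]:
        key = ((label, word), parent, head_word, head_label,
               frozenset(frame.items()))   # condition on the current frame
        prob *= p_l.get(key, 0.0)
        if label.endswith("-C"):
            frame[label] -= 1   # this complement satisfies part of the frame
            frame = +frame      # drop entries that reached zero
    return prob

# Slide 28: mods = [("NP-C", "Marks"), ("NP", "week")], lc = {"NP-C": 1}
```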
Slide 29: Model 3: Wh-Movement
- Refers to the effects of a wh-word (e.g. which, where) on a clause
- Parsing solution: a +gap feature that must be matched with a TRACE
  1. +gap is passed to the head of the phrase, or
  2. +gap is passed to a L/R modifier, or is output as a TRACE

Slide 30: Model 3 Probabilities
- p_G(G | P, h, H), where G is either Head, Left, or Right
- If G = Head, propagate +gap to the head
- If G = Left/Right, add +gap to the left/right subcat variable
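Slides 29-30 describe gap propagation as a three-way probabilistic choice at each node. Below is a sketch of the bookkeeping for one such choice; the set-valued subcat frames, the p_g table, and the max (rather than summing or sampling over routings, as a generative model would) are hypothetical simplifications for illustration.

```python
# Sketch of Model 3's gap routing (slides 29-30): a parent carrying +gap
# sends it to the head child or into a subcat frame, where it must later
# be discharged by a modifier that itself carries +gap or by a TRACE.

def route_gap(parent, head_label, head_word, lc, rc, p_g):
    """Return (choice, head_has_gap, lc, rc) for the likeliest routing."""
    choice = max(("Head", "Left", "Right"),
                 key=lambda g: p_g.get((g, parent, head_word, head_label),
                                       0.0))
    if choice == "Left":
        lc = lc | {"+gap"}    # gap must be filled on the left
    elif choice == "Right":
        rc = rc | {"+gap"}    # gap must be filled on the right
    return choice, choice == "Head", lc, rc
```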
Slide 31: Implementation Details
- Smoothing used; backoff… pretty standard
- Unknown words: words occurring < 5 times are replaced by "UNKNOWN"
- POS tags: only tags that appear in the training data are used
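The unknown-word rule on slide 31 is straightforward to reproduce as a preprocessing pass; a sketch, with the threshold of 5 taken from the slide:

```python
# Sketch of slide 31's unknown-word handling: words seen fewer than
# 5 times in training are rewritten as "UNKNOWN", so the parser has
# usable statistics for rare words at test time.

from collections import Counter

def replace_rare_words(sentences, threshold=5):
    counts = Counter(word for sent in sentences for word in sent)
    return [[w if counts[w] >= threshold else "UNKNOWN" for w in sent]
            for sent in sentences]
```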
Slide 32: Experimental Results
- Setup: same as Charniak
- Metrics: same, except only "LR2" and "LP2" are used, now called LR and LP

Slide 33: The Numbers…
- (results table)

Slide 34: Still Not the End of the Story…
- Collins (1998) applied the techniques to "semantic tagging"
  - Management succession: the outgoing manager, the new manager, the position
- Charniak (2000) built a maximum-entropy parser
  - Just over 90% LP /
