CS 294-5: Statistical Natural Language Processing
Grammar Induction
Dan Klein

Assignment 3 Honors
- Best F1: Christine Hodges (84.1, 2H/2V, bin first); David Rosenberg (84.2, 2V, bin first?)
- Best Exact Match: Roger Bock (36.1%, 2H/2V, pre-tagged)
- Observations:
  - Bug in my lexicon (Rosenberg)
  - V/H order has subtle issues (Maire)
  - Short test sentences can be parsed almost as well with short training sentences only (Barrett, Petrov)
  - Rare rules slow parsing and hurt accuracy (Latham)
  - Unary issues (Nakov, Bock)
  - Exact match can be at odds with F1 (why?)

Idea: Lexical Affinity Models
- Words select other words on syntactic grounds.
- Idea: link up pairs with high mutual information.
  - [Yuret, 1998]: greedy linkage
  - [Paskin, 2001]: iterative re-estimation with EM
- Evaluation: compare linked pairs to a gold standard.
- Example: congress narrowly passed the amended bill

Lexical Affinity Models
- Generative model of [Paskin, 2001]: the link structure is uniform, the word-to-word attachments (e.g., passed ... bill) are empirical:

    P(\text{words}, \text{graph}) = P(\text{length}) \, P(\text{graph}) \prod_{(head,\, arg)} P(arg \mid head, dir)

Problem: Non-Syntactic Affinity
- Mutual information between words does not necessarily indicate syntactic selection:
  - a new year begins in new york
  - expect brushbacks but no beanballs
  - congress narrowly passed the amended bill

    Method          Accuracy
    Paskin, 2001    39.7
    Random          41.7

Idea: Word Classes
- Individual words like congress are entwined with semantic facts about the world.
- Syntactic classes, like NOUN and ADVERB, are bleached of word-specific semantics.
- Automatic word classes are more likely to look like DAYS-OF-WEEK or PERSON-NAME.
- We could build dependency models over word classes. [cf. Carroll and Charniak, 1992]
- Example: congress narrowly passed the amended bill, tagged NOUN ADVERB VERB DET PARTICIPLE NOUN

A Word-Class Model

    P(\text{words}, \text{classes}, \text{graph}) = P(\text{length}) \, P(\text{graph}) \prod_{(head,\, arg)} P(c(arg) \mid c(head), dir) \prod_{word} P(word \mid c(word))

- Example: congress narrowly passed the amended bill, tagged NOUN ADVERB VERB DET PARTICIPLE NOUN

Problems: Word-Class Models
- Issues:
  - Too simple a model: doesn't work much better even supervised
  - No representation of valence (number of arguments)
- Example: stock prices fell (NOUN NOUN VERB), shown with two competing attachment structures

    Method                      Accuracy
    Random                      41.7
    Carroll and Charniak, 92    44.7
    Adjacent Words              53.2

Issue: Local Representations

    Model                     Local Factor           Word classes?   Distance?
    Paskin 01                 P(a | h)               no              no
    Carroll & Charniak 92     P(c(a) | c(h))         yes             no
    This model (DMV)          P(c(a) | c(h), d)      yes             yes
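A minimal sketch (not from the lecture; the toy dependency tuples and the helper name relative_freq are invented for illustration) of the three local factors in the table above, each estimated by relative frequency from annotated head-argument pairs:

```python
from collections import Counter

# Toy, hand-invented dependency tuples:
# (head word, argument word, head class, argument class, direction, linear distance)
deps = [
    ("passed", "congress", "VERB", "NOUN",       "left",  2),
    ("passed", "narrowly", "VERB", "ADVERB",     "left",  1),
    ("passed", "bill",     "VERB", "NOUN",       "right", 3),
    ("bill",   "the",      "NOUN", "DET",        "left",  2),
    ("bill",   "amended",  "NOUN", "PARTICIPLE", "left",  1),
]

def relative_freq(pairs):
    """Estimate P(outcome | context) by relative frequency over (context, outcome) pairs."""
    joint = Counter(pairs)
    totals = Counter(ctx for ctx, _ in pairs)
    return {(ctx, out): joint[(ctx, out)] / totals[ctx] for ctx, out in joint}

# Paskin 01: condition the argument word on the head word only.
p_word = relative_freq([(h, a) for h, a, _, _, _, _ in deps])

# Carroll & Charniak 92: condition the argument class on the head class.
p_class = relative_freq([(hc, ac) for _, _, hc, ac, _, _ in deps])

# DMV-style row of the table: also condition on the attachment direction
# (adjacency/valence enters the full DMV through its STOP factors, not this factor).
p_class_dir = relative_freq([((hc, d), ac) for _, _, hc, ac, d, _ in deps])

print(p_word[("passed", "bill")])               # P(bill | passed)      -> 0.33...
print(p_class[("VERB", "NOUN")])                # P(NOUN | VERB)        -> 0.66...
print(p_class_dir[(("VERB", "left"), "NOUN")])  # P(NOUN | VERB, left)  -> 0.5
```

Each row of the table conditions on progressively more structure: word identities only, word classes, and word classes plus directional/distance information.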
A Head-Outward Model (DMV)
- Supervised statistical parsers benefit from modeling tree distributions implicitly [e.g., Collins, 99].
- A head-outward model with word classes and valence/adjacency:

    P_h(t) = \prod_{dir \in \{l, r\}} \Big[ \prod_{a \in args(h, dir)} P(\neg\mathrm{STOP} \mid c(h), dir, adj) \, P(c(a) \mid c(h), dir) \, P(a \mid c(a)) \, P_a(t_a) \Big] \, P(\mathrm{STOP} \mid c(h), dir, adj)

- [Figure: a head h generates arguments a1, a2 outward in each direction, then STOP]

Results: Dependencies

    Method            Accuracy
    Adjacent Words    55.9
    Our Model (DMV)   62.7

- The model is re-estimated with EM:
  - A cubic dynamic program is run over each sentence.
  - Expected counts of each modeled configuration are aggregated.
- Initialization: initial parameters come from simple heuristics:

    P(a \mid h, dir) \propto \sum_{dist} \frac{1}{dist} \, \mathrm{count}(h, a, dist, dir)

Common Errors: Dependency

    Overproposed dependencies        Underproposed dependencies
    3474  DET ← N                    3079  DET → N
    2096  N-PROP ← N-PROP            1898  N-PROP → N-PROP
     760  NUM → NUM                   838  PREP ← N
     735  PREP ← DET                  714  N → V-PRES
     696  DET ← N-PL                  672  DET → N-PL
     627  DET → PREP                  669  N ← PREP
     470  DET → V-PAST                 54  NUM ← NUM
     420  DET → V-PRES                 54  N → V-PAST

Early Approaches: Structure Search
- Incremental grammar learning, chunking [Wolff 88, Langley 82, many others]
- Can recover synthetic grammars
- An (extremely good) result of incremental structure search: [figure]
- Looks good, ... but can't parse in the wild.

Issues with Chunk/Merge Systems
- Hard to recover from initial choices (cf. EM, where the issue is the initial state).
- Hard to make local decisions which will interact well with each other (e.g., grouping verb-preposition and preposition-determiner: both wrong, and not consistent with each other).
- Good local heuristics often don't have a well-formed global objective that can be evaluated for the target grammar.

Idea: Learn PCFGs with EM
- Classic experiments on learning PCFGs with Expectation-Maximization [Lari and Young, 1990]:
  - Full binary grammar over n symbols {X_1, X_2, ..., X_n}, with rules of the form X_i -> X_j X_k
  - Parse uniformly/randomly at first
  - Re-estimate rule expectations off of the parses
  - Repeat
- Their conclusion: it doesn't really work.

Re-estimation of PCFGs
- Basic quantity needed for re-estimation with EM is the posterior that symbol X_k spans (i, j):

    P(X_k : (i,j) \mid S) = \frac{\sum_{T :\, X_k(i,j) \in T,\ \mathrm{yield}(T) = S} P(T)}{\sum_{T :\, \mathrm{yield}(T) = S} P(T)}

- Can calculate in cubic time with the Inside-Outside algorithm.
- Consider an initial grammar where all productions have equal weight:

    P(X_b X_c \mid X_a) = 1 / n^2

- Then all trees have equal probability initially.
- Therefore, after one round of EM, the posterior over trees will (in the absence of random perturbation) be approximately uniform over all trees, and symmetric over symbols.

Problem: "Uniform" Posteriors
- [Figure: posterior bracket distributions under a tree-uniform model vs. a split-uniform model]
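A minimal sketch (not from the lecture; the symbol count n, the lexical weight, and the example sentence are arbitrary choices) of why an all-uniform grammar gives "uniform" posteriors: every binary bracketing of a sentence uses the same number of rules, so every tree gets the same probability and the first E-step spreads its posterior evenly over bracketings.

```python
from fractions import Fraction

n = 8                               # number of nonterminal symbols (arbitrary)
rule_prob = Fraction(1, n * n)      # every binary production has weight 1/n^2
lex_prob = Fraction(1, 50)          # pretend every X -> word rewrite has equal weight too

def bracketings(words):
    """Enumerate all binary bracketings of a word sequence as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    trees = []
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                trees.append((left, right))
    return trees

def tree_prob(tree):
    """One binary rule per internal node, one lexical rewrite per leaf.
    (Symbol labelings contribute the same factor to every bracketing, so they are omitted.)"""
    if isinstance(tree, str):
        return lex_prob
    left, right = tree
    return rule_prob * tree_prob(left) * tree_prob(right)

sentence = "congress narrowly passed the bill".split()
trees = bracketings(sentence)
probs = [tree_prob(t) for t in trees]
total = sum(probs)

print(len(trees))                      # 14 bracketings (Catalan number for 5 words)
print(len(set(probs)) == 1)            # True: every tree has exactly the same probability
print([p / total for p in probs][:3])  # posterior over bracketings is uniform: each 1/14
```

Nothing in the expected counts distinguishes one bracketing (or one symbol) from another, which is exactly the starting point EM then gets stuck near.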
Problem: Model Symmetries
- Symmetries: the induced symbols are interchangeable a priori.
- How does this relate to trees?
- [Figure: which induced symbol (X1? X2?) should label each position over NOUN VERB ADJ NOUN, and the corresponding candidate bracketings]

Other Approaches
- Evaluation: fraction of nodes in gold trees correctly posited in proposed trees (unlabeled recall).
- Some recent work in learning constituency:
  - [Adriaans, 99]: language grammars aren't general PCFGs
  - [Clark, 01]: mutual-information filters detect constituents, then an MDL-guided search assembles them
  - [van Zaanen, 00]: finds low edit-distance sentence pairs and extracts their differences

    Method              Unlabeled Recall
    Adriaans, 1999      16.8
    Clark, 2001         34.6
    van Zaanen, 2000    35.6

Right-Branching Baseline
- English trees tend to be right-branching, not balanced.
- A simple (English-specific) baseline is to choose the right-chain structure for each sentence.
- Example: they were unwilling to agree to new terms

    Method              Unlabeled Recall
    van Zaanen, 00      35.6
    Right-Branch        46.4

Desiderata: Practical Learnability
- To be practically learnable, models should:
  - Be as simple as possible
  - Make symmetries self-breaking whenever possible
  - Avoid hidden structures which are not directly coupled to surface phenomena
- [Figure: how much hidden structure (NP? S, VP, V) to posit above congress / NOUN]

Inspiration: Distributional Clustering
- [Figure: words such as governor, president, said, reported, the, a paired with the local contexts they occur in, e.g., "the __ of", "sources __ ♦", "president __ that", "the __ appointed", "the __ said"]
- Example: ♦ the president said that the downturn was over ♦
- [Finch and Chater 92, Schütze 93, many others]

    P(S, C) = \prod_i P(c_i \mid c_{i-1}) \, P(w_i \mid c_i)
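A minimal sketch (toy probabilities invented for illustration; a start marker <s> is assumed for the first transition, which the slide's formula leaves implicit) of scoring a sentence together with a class sequence under the class-conditional model above:

```python
import math

# Invented class-transition and emission tables for a tiny vocabulary.
trans = {  # P(c_i | c_{i-1}); "<s>" marks the sentence start
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.4,
    ("DET", "NOUN"): 0.9, ("DET", "DET"): 0.1,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.5,
}
emit = {  # P(w_i | c_i); remaining mass per class sits on words not listed
    ("DET", "the"): 0.5, ("DET", "a"): 0.5,
    ("NOUN", "president"): 0.2, ("NOUN", "downturn"): 0.1,
    ("NOUN", "congress"): 0.1, ("NOUN", "said"): 0.05,
    ("VERB", "said"): 0.3, ("VERB", "passed"): 0.2,
}

def log_joint(words, classes):
    """log P(S, C) = sum_i [ log P(c_i | c_{i-1}) + log P(w_i | c_i) ]."""
    logp = 0.0
    prev = "<s>"
    for w, c in zip(words, classes):
        logp += math.log(trans[(prev, c)]) + math.log(emit[(c, w)])
        prev = c
    return logp

words = ["the", "president", "said"]
print(log_joint(words, ["DET", "NOUN", "VERB"]))  # plausible class sequence
print(log_joint(words, ["DET", "NOUN", "NOUN"]))  # implausible class sequence scores lower
```

Summing (or maximizing) this joint score over class sequences is what lets distributional clustering assign words to syntactically coherent classes from raw text alone.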

