CS 294-5: Statistical Natural Language Processing
Grammar Induction
Dan Klein

Assignment 3 Honors
- Best F1: Christine Hodges (84.1, 2H/2V, bin first); David Rosenberg (84.2, 2V, bin first?)
- Best Exact Match: Roger Bock (36.1%, 2H/2V, pre-tagged)
- Observations:
  - Bug in my lexicon (Rosenberg)
  - V/H order has subtle issues (Maire)
  - Short test sentences can be parsed almost as well with short training sentences only (Barrett, Petrov)
  - Rare rules slow parsing and hurt accuracy (Latham)
  - Unary issues (Nakov, Bock)
  - Exact match can be at odds with F1 (why?)

Idea: Lexical Affinity Models
- Words select other words on syntactic grounds.
- Idea: link up pairs with high mutual information.
  - [Yuret, 1998]: greedy linkage
  - [Paskin, 2001]: iterative re-estimation with EM
- Evaluation: compare linked pairs to a gold standard.
- Example: congress narrowly passed the amended bill

Lexical Affinity Models
- Generative model of [Paskin, 2001]: the link structure is uniform, the word-to-word attachments (e.g., passed ... bill) are empirical:

    P(\text{words}, \text{graph}) = P(\text{length}) \, P(\text{graph}) \prod_{(head,\, arg)} P(arg \mid head, dir)

Problem: Non-Syntactic Affinity
- Mutual information between words does not necessarily indicate syntactic selection:
  - a new year begins in new york
  - expect brushbacks but no beanballs
  - congress narrowly passed the amended bill

    Method          Accuracy
    Paskin, 2001    39.7
    Random          41.7

Idea: Word Classes
- Individual words like congress are entwined with semantic facts about the world.
- Syntactic classes, like NOUN and ADVERB, are bleached of word-specific semantics.
- Automatic word classes are more likely to look like DAYS-OF-WEEK or PERSON-NAME.
- We could build dependency models over word classes. [cf. Carroll and Charniak, 1992]
- Example: congress narrowly passed the amended bill, tagged NOUN ADVERB VERB DET PARTICIPLE NOUN

A Word-Class Model

    P(\text{words}, \text{classes}, \text{graph}) = P(\text{length}) \, P(\text{graph}) \prod_{(head,\, arg)} P(c(arg) \mid c(head), dir) \prod_{word} P(word \mid c(word))

- Example: congress narrowly passed the amended bill, tagged NOUN ADVERB VERB DET PARTICIPLE NOUN

Problems: Word-Class Models
- Issues:
  - Too simple a model: doesn't work much better even supervised
  - No representation of valence (number of arguments)
- Example: stock prices fell (NOUN NOUN VERB), shown with two competing attachment structures

    Method                      Accuracy
    Random                      41.7
    Carroll and Charniak, 92    44.7
    Adjacent Words              53.2

Issue: Local Representations

    Model                     Local Factor           Word classes?   Distance?
    Paskin 01                 P(a | h)               no              no
    Carroll & Charniak 92     P(c(a) | c(h))         yes             no
    This model (DMV)          P(c(a) | c(h), d)      yes             yes
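A minimal sketch (not from the lecture; the toy dependency tuples and the helper name relative_freq are invented for illustration) of the three local factors in the table above, each estimated by relative frequency from annotated head-argument pairs:

```python
from collections import Counter

# Toy, hand-invented dependency tuples:
# (head word, argument word, head class, argument class, direction, linear distance)
deps = [
    ("passed", "congress", "VERB", "NOUN",       "left",  2),
    ("passed", "narrowly", "VERB", "ADVERB",     "left",  1),
    ("passed", "bill",     "VERB", "NOUN",       "right", 3),
    ("bill",   "the",      "NOUN", "DET",        "left",  2),
    ("bill",   "amended",  "NOUN", "PARTICIPLE", "left",  1),
]

def relative_freq(pairs):
    """Estimate P(outcome | context) by relative frequency over (context, outcome) pairs."""
    joint = Counter(pairs)
    totals = Counter(ctx for ctx, _ in pairs)
    return {(ctx, out): joint[(ctx, out)] / totals[ctx] for ctx, out in joint}

# Paskin 01: condition the argument word on the head word only.
p_word = relative_freq([(h, a) for h, a, _, _, _, _ in deps])

# Carroll & Charniak 92: condition the argument class on the head class.
p_class = relative_freq([(hc, ac) for _, _, hc, ac, _, _ in deps])

# DMV-style row of the table: also condition on the attachment direction
# (adjacency/valence enters the full DMV through its STOP factors, not this factor).
p_class_dir = relative_freq([((hc, d), ac) for _, _, hc, ac, d, _ in deps])

print(p_word[("passed", "bill")])               # P(bill | passed)      -> 0.33...
print(p_class[("VERB", "NOUN")])                # P(NOUN | VERB)        -> 0.66...
print(p_class_dir[(("VERB", "left"), "NOUN")])  # P(NOUN | VERB, left)  -> 0.5
```

Each row of the table conditions on progressively more structure: word identities only, word classes, and word classes plus directional/distance information.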
A Head-Outward Model (DMV)
- Supervised statistical parsers benefit from modeling tree distributions implicitly [e.g., Collins, 99].
- A head-outward model with word classes and valence/adjacency:

    P_h(t) = \prod_{dir \in \{l, r\}} \Big[ \prod_{a \in args(h, dir)} P(\neg\mathrm{STOP} \mid c(h), dir, adj) \, P(c(a) \mid c(h), dir) \, P(a \mid c(a)) \, P_a(t_a) \Big] \, P(\mathrm{STOP} \mid c(h), dir, adj)

- [Figure: a head h generates arguments a1, a2 outward in each direction, then STOP]

Results: Dependencies

    Method            Accuracy
    Adjacent Words    55.9
    Our Model (DMV)   62.7

- The model is re-estimated with EM:
  - A cubic dynamic program is run over each sentence.
  - Expected counts of each modeled configuration are aggregated.
- Initialization: initial parameters come from simple heuristics:

    P(a \mid h, dir) \propto \sum_{dist} \frac{1}{dist} \, \mathrm{count}(h, a, dist, dir)

Common Errors: Dependency

    Overproposed dependencies        Underproposed dependencies
    3474  DET ← N                    3079  DET → N
    2096  N-PROP ← N-PROP            1898  N-PROP → N-PROP
     760  NUM → NUM                   838  PREP ← N
     735  PREP ← DET                  714  N → V-PRES
     696  DET ← N-PL                  672  DET → N-PL
     627  DET → PREP                  669  N ← PREP
     470  DET → V-PAST                 54  NUM ← NUM
     420  DET → V-PRES                 54  N → V-PAST

Early Approaches: Structure Search
- Incremental grammar learning, chunking [Wolff 88, Langley 82, many others]
- Can recover synthetic grammars
- An (extremely good) result of incremental structure search: [figure]
- Looks good, ... but can't parse in the wild.

Issues with Chunk/Merge Systems
- Hard to recover from initial choices (cf. EM, where the issue is the initial state).
- Hard to make local decisions which will interact well with each other (e.g., grouping verb-preposition and preposition-determiner: both wrong, and not consistent with each other).
- Good local heuristics often don't have a well-formed global objective that can be evaluated for the target grammar.

Idea: Learn PCFGs with EM
- Classic experiments on learning PCFGs with Expectation-Maximization [Lari and Young, 1990]:
  - Full binary grammar over n symbols {X_1, X_2, ..., X_n}, with rules of the form X_i -> X_j X_k
  - Parse uniformly/randomly at first
  - Re-estimate rule expectations off of the parses
  - Repeat
- Their conclusion: it doesn't really work.

Re-estimation of PCFGs
- Basic quantity needed for re-estimation with EM is the posterior that symbol X_k spans (i, j):

    P(X_k : (i,j) \mid S) = \frac{\sum_{T :\, X_k(i,j) \in T,\ \mathrm{yield}(T) = S} P(T)}{\sum_{T :\, \mathrm{yield}(T) = S} P(T)}

- Can calculate in cubic time with the Inside-Outside algorithm.
- Consider an initial grammar where all productions have equal weight:

    P(X_b X_c \mid X_a) = 1 / n^2

- Then all trees have equal probability initially.
- Therefore, after one round of EM, the posterior over trees will (in the absence of random perturbation) be approximately uniform over all trees, and symmetric over symbols.

Problem: "Uniform" Posteriors
- [Figure: posterior bracket distributions under a tree-uniform model vs. a split-uniform model]
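A minimal sketch (not from the lecture; the symbol count n, the lexical weight, and the example sentence are arbitrary choices) of why an all-uniform grammar gives "uniform" posteriors: every binary bracketing of a sentence uses the same number of rules, so every tree gets the same probability and the first E-step spreads its posterior evenly over bracketings.

```python
from fractions import Fraction

n = 8                               # number of nonterminal symbols (arbitrary)
rule_prob = Fraction(1, n * n)      # every binary production has weight 1/n^2
lex_prob = Fraction(1, 50)          # pretend every X -> word rewrite has equal weight too

def bracketings(words):
    """Enumerate all binary bracketings of a word sequence as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    trees = []
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                trees.append((left, right))
    return trees

def tree_prob(tree):
    """One binary rule per internal node, one lexical rewrite per leaf.
    (Symbol labelings contribute the same factor to every bracketing, so they are omitted.)"""
    if isinstance(tree, str):
        return lex_prob
    left, right = tree
    return rule_prob * tree_prob(left) * tree_prob(right)

sentence = "congress narrowly passed the bill".split()
trees = bracketings(sentence)
probs = [tree_prob(t) for t in trees]
total = sum(probs)

print(len(trees))                      # 14 bracketings (Catalan number for 5 words)
print(len(set(probs)) == 1)            # True: every tree has exactly the same probability
print([p / total for p in probs][:3])  # posterior over bracketings is uniform: each 1/14
```

Nothing in the expected counts distinguishes one bracketing (or one symbol) from another, which is exactly the starting point EM then gets stuck near.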
Problem: Model Symmetries
- Symmetries: the induced symbols are interchangeable a priori.
- How does this relate to trees?
- [Figure: which induced symbol (X1? X2?) should label each position over NOUN VERB ADJ NOUN, and the corresponding candidate bracketings]

Other Approaches
- Evaluation: fraction of nodes in gold trees correctly posited in proposed trees (unlabeled recall).
- Some recent work in learning constituency:
  - [Adriaans, 99]: language grammars aren't general PCFGs
  - [Clark, 01]: mutual-information filters detect constituents, then an MDL-guided search assembles them
  - [van Zaanen, 00]: finds low edit-distance sentence pairs and extracts their differences

    Method              Unlabeled Recall
    Adriaans, 1999      16.8
    Clark, 2001         34.6
    van Zaanen, 2000    35.6

Right-Branching Baseline
- English trees tend to be right-branching, not balanced.
- A simple (English-specific) baseline is to choose the right-chain structure for each sentence.
- Example: they were unwilling to agree to new terms

    Method              Unlabeled Recall
    van Zaanen, 00      35.6
    Right-Branch        46.4

Desiderata: Practical Learnability
- To be practically learnable, models should:
  - Be as simple as possible
  - Make symmetries self-breaking whenever possible
  - Avoid hidden structures which are not directly coupled to surface phenomena
- [Figure: how much hidden structure (NP? S, VP, V) to posit above congress / NOUN]

Inspiration: Distributional Clustering
- [Figure: words such as governor, president, said, reported, the, a paired with the local contexts they occur in, e.g., "the __ of", "sources __ ♦", "president __ that", "the __ appointed", "the __ said"]
- Example: ♦ the president said that the downturn was over ♦
- [Finch and Chater 92, Schütze 93, many others]

    P(S, C) = \prod_i P(c_i \mid c_{i-1}) \, P(w_i \mid c_i)
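A minimal sketch (toy probabilities invented for illustration; a start marker <s> is assumed for the first transition, which the slide's formula leaves implicit) of scoring a sentence together with a class sequence under the class-conditional model above:

```python
import math

# Invented class-transition and emission tables for a tiny vocabulary.
trans = {  # P(c_i | c_{i-1}); "<s>" marks the sentence start
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.4,
    ("DET", "NOUN"): 0.9, ("DET", "DET"): 0.1,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.5,
}
emit = {  # P(w_i | c_i); remaining mass per class sits on words not listed
    ("DET", "the"): 0.5, ("DET", "a"): 0.5,
    ("NOUN", "president"): 0.2, ("NOUN", "downturn"): 0.1,
    ("NOUN", "congress"): 0.1, ("NOUN", "said"): 0.05,
    ("VERB", "said"): 0.3, ("VERB", "passed"): 0.2,
}

def log_joint(words, classes):
    """log P(S, C) = sum_i [ log P(c_i | c_{i-1}) + log P(w_i | c_i) ]."""
    logp = 0.0
    prev = "<s>"
    for w, c in zip(words, classes):
        logp += math.log(trans[(prev, c)]) + math.log(emit[(c, w)])
        prev = c
    return logp

words = ["the", "president", "said"]
print(log_joint(words, ["DET", "NOUN", "VERB"]))  # plausible class sequence
print(log_joint(words, ["DET", "NOUN", "NOUN"]))  # implausible class sequence scores lower
```

Summing (or maximizing) this joint score over class sequences is what lets distributional clustering assign words to syntactically coherent classes from raw text alone.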

