UMass Amherst CMPSCI 591N - Probabilistic Parsing in Practice


Probabilistic Parsing in Practice
Lecture #15
Computational Linguistics
CMPSCI 591N, Spring 2006
Andrew McCallum
(including slides from Michael Collins, Chris Manning, Jason Eisner, Mary Harper)
Andrew McCallum, UMass

Today’s Main Points
• Training data
• How to evaluate parsers
• Limitations of PCFGs, enhancements & alternatives
  - Lexicalized PCFGs
  - Structure sensitivity
  - Left-corner parsing
  - Faster parsing with beam search
  - Dependency parsers
• Current state of the art

Treebanks
• Pure grammar induction approaches tend not to produce the parse trees that people want
• Solution
  - Give some examples of the parse trees that we want
  - Make a learning tool learn a grammar
• Treebank
  - A collection of such example parses
  - The Penn Treebank is the most widely used

Treebanks
• Penn Treebank
  - Trees are represented via bracketing
  - Fairly flat structures for noun phrases: (NP Arizona real estate loans)
  - Tagged with grammatical and semantic functions (-SBJ, -LOC, ...)
  - Uses empty nodes (*) to indicate understood subjects and extraction gaps

( ( S ( NP-SBJ The move )
      ( VP followed
           ( NP ( NP a round )
                ( PP of
                     ( NP ( NP similar increases )
                          ( PP by ( NP other lenders ) )
                          ( PP against ( NP Arizona real estate loans ) ) ) ) )
           ,
           ( S-ADV ( NP-SBJ * )
                   ( VP reflecting
                        ( NP a continuing decline )
                        ( PP-LOC in ( NP that market ) ) ) ) ) )
    . )

Treebanks
• Many people have argued that it is better to have linguists construct treebanks than grammars
• Because it is easier
  - to work out the correct parse of individual sentences
• than
  - to try to determine what all possible manifestations of a certain rule or grammatical construct are

Treebanking Issues
• Type of data
  - Task dependent (newspaper, journals, novels, technical manuals, dialogs, email)
• Size
  - The more the better! (Resource-limited)
• Parse representation
  - Dependency vs. parse tree
  - Attributes: what to encode? Words, morphology, syntax, semantics...
  - Reference & bookkeeping: date, time, who did what

Organizational Issues
• Team
  - 1 team leader; bookkeeping/hiring
  - 1 guideline person
  - 1 linguistic issues person
  - 3-5 annotators
  - 1-2 technical staff/programming
  - 2 checking persons
• Double annotation if possible.

Treebanking Plan
• The main points (after getting funding)
  - Planning
  - Basic guidelines development
  - Annotation & guidelines refinement
  - Consistency checking, guidelines finalization
  - Packaging and distribution
• Time needed
  - on the order of 2 years per 1 million words
  - only about 1/3 of the total effort is annotation

Parser Evaluation

Evaluation
• The ultimate goal is to build systems for IE, QA, MT
  - People are rarely interested in syntactic analysis for its own sake
  - Evaluate the system to evaluate the parser
• For simplicity, modularization, and convenience
  - Compare the parses from a parser with the result of hand-parsing a sentence (gold standard)
• What is the objective criterion that we are trying to maximize?

Evaluation
• Tree accuracy (exact match)
  - It is a very tough standard!
  - But in many ways it is a sensible one to use
• PARSEVAL measures
  - For some purposes, partially correct parses can be useful
  - Originally for non-statistical parsers
  - Evaluate the component pieces of a parse
  - Measures: precision, recall, crossing brackets

Evaluation
• (Labeled) precision
  - How many brackets in the parse match those in the correct tree (gold standard)?
• (Labeled) recall
  - How many of the brackets in the correct tree are in the parse?
• Crossing brackets
  - Average of how many constituents in one tree cross over constituent boundaries in the other tree

[Figure: four brackets B1-B4 drawn over the words w1 w2 w3 w4 w5 w6 w7 w8; the bracket extents are not recoverable from this text.]

Problems with PARSEVAL
• Even a vanilla PCFG performs quite well
• It measures success at the level of individual decisions
• You must make many consecutive decisions correctly to be correct on the entire tree.

Problems with PARSEVAL (2)
• The back story
  - The structure of the Penn Treebank
    Flat → few brackets → low crossing brackets
    Troublesome brackets are avoided → high precision/recall
• The errors in precision and recall are minimal
  - In some cases a wrong PP attachment penalizes precision, recall, and crossing-bracket accuracy only minimally.
  - On the other hand, when attaching low instead of high, every node in the right-branching tree will be wrong: serious harm

Evaluation
• Do PARSEVAL measures succeed in real tasks?
  - Many small parsing mistakes might not affect tasks of semantic interpretation
• (Bonnema 1996, 1997)
  - Tree accuracy of the parser: 62%
  - Correct semantic interpretations: 88%
• (Hermjakob and Mooney 1997)
  - English to German translation
• At the moment, people feel PARSEVAL measures are adequate for comparing parsers

Lexicalized Parsing

Limitations of PCFGs
• PCFGs assume:
  - Place invariance
  - Context free: P(rule) independent of
    • words outside span
    • also, words with overlapping derivation
  - Ancestor free: P(rule) independent of
    • non-terminals above.
• Lack of sensitivity to lexical information
• Lack of sensitivity to structural frequencies

Lack of Lexical Dependency
Means that
    P(VP → V NP NP)
is independent of the particular verb involved!
... but much more likely with ditransitive verbs (like gave).
    He gave the boy a ball.
    He ran to the store.

The Need for Lexical Dependency
• Probabilities dependent on lexical words
• Problem 1: Verb subcategorization
  - VP expansion is independent of the choice of verb
  - However ...

                 come    take    think   want
VP -> V          9.5%    2.6%    4.6%    5.7%
VP -> V NP       1.1%   32.1%    0.2%   13.9%
VP -> V PP      34.5%    3.1%    7.1%    0.3%
VP -> V SBAR     6.6%    0.3%   73.0%    0.2%
VP -> V S        2.2%    1.3%    4.8%   70.8%

  - Including actual word information when making decisions about tree structure is necessary

Weakening the independence assumption of PCFG
• Probabilities dependent on lexical words
• Problem 2: Phrasal attachment
  - Lexical content of phrases provides information for the decision
  - Syntactic category of the phrases provides very little information
  - Standard PCFG is worse than n-gram models

Another case of PP attachment ambiguity

Another Case of PP Attachment
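The PARSEVAL measures from the evaluation slides (labeled precision, labeled recall, crossing brackets) can be sketched in a few lines. This is a minimal illustration, not a standard evaluation tool: the function names are hypothetical, brackets are compared as a set of (label, start, end) spans, and words are assumed to sit directly under phrase labels as in the treebank example in the slides. The example sentence is invented for illustration; the candidate parse attaches the PP to the VP instead of the NP.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end is exclusive


def tokenize(bracketed: str) -> List[str]:
    """Split a bracketed tree string into parentheses, labels, and words."""
    return bracketed.replace("(", " ( ").replace(")", " ) ").split()


def labeled_spans(bracketed: str) -> Set[Span]:
    """Collect one (label, start, end) span per labeled constituent."""
    spans: Set[Span] = set()
    stack: List[Tuple[str, int]] = []  # open constituents: (label, start)
    pos = 0                            # number of words consumed so far
    expect_label = False
    for tok in tokenize(bracketed):
        if tok == "(":
            if expect_label:           # previous "(" had no label (outer bracket)
                stack.append(("", pos))
            expect_label = True
        elif tok == ")":
            label, start = stack.pop()
            if label:                  # skip the unlabeled outer bracket
                spans.add((label, start, pos))
        elif expect_label:
            stack.append((tok, pos))   # first token after "(" is the label
            expect_label = False
        else:
            pos += 1                   # an ordinary word
    return spans


def precision_recall(cand: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Labeled precision and recall of candidate brackets against gold."""
    matched = len(cand & gold)
    precision = matched / len(cand) if cand else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans overlap without one containing the other."""
    (_, s1, e1), (_, s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def crossing_brackets(cand: Set[Span], gold: Set[Span]) -> int:
    """Number of candidate brackets that cross at least one gold bracket."""
    return sum(any(crosses(c, g) for g in gold) for c in cand)


gold = labeled_spans(
    "(S (NP He) (VP saw (NP (NP the man) (PP with (NP a telescope)))))")
cand = labeled_spans(
    "(S (NP He) (VP saw (NP the man) (PP with (NP a telescope))))")
p, r = precision_recall(cand, gold)
print(p, r, crossing_brackets(cand, gold))
```

Here the wrong attachment loses only one gold bracket (the larger NP), so precision stays perfect and no brackets cross, which is the kind of minimal penalty the "Problems with PARSEVAL (2)" slide describes.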

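The verb subcategorization table can also be read as a conditional distribution P(VP expansion | head verb). The sketch below copies the percentages from the slides and contrasts them with a verb-blind estimate, which is what a plain PCFG effectively uses. The names `SUBCAT`, `best_expansion`, and `verb_blind` are hypothetical, and the verb-blind figure assumes the four verbs are equally frequent, which the slides do not state.

```python
from typing import Dict

# Percentages copied from the subcategorization table in the slides:
# P(VP -> expansion | head verb) for the five expansions listed there.
# Columns do not sum to 100 because other expansions are not shown.
SUBCAT: Dict[str, Dict[str, float]] = {
    "come":  {"V": 9.5, "V NP": 1.1,  "V PP": 34.5, "V SBAR": 6.6,  "V S": 2.2},
    "take":  {"V": 2.6, "V NP": 32.1, "V PP": 3.1,  "V SBAR": 0.3,  "V S": 1.3},
    "think": {"V": 4.6, "V NP": 0.2,  "V PP": 7.1,  "V SBAR": 73.0, "V S": 4.8},
    "want":  {"V": 5.7, "V NP": 13.9, "V PP": 0.3,  "V SBAR": 0.2,  "V S": 70.8},
}


def best_expansion(verb: str) -> str:
    """Most probable listed VP expansion for a given head verb."""
    dist = SUBCAT[verb]
    return max(dist, key=lambda rule: dist[rule])


def verb_blind(expansion: str) -> float:
    """Average percentage over the four verbs.

    Assumption for illustration only: each verb is equally frequent.
    """
    return sum(dist[expansion] for dist in SUBCAT.values()) / len(SUBCAT)


for verb in SUBCAT:
    print(verb, "-> VP ->", best_expansion(verb))
print("verb-blind P(VP -> V SBAR):", verb_blind("V SBAR"))
```

Each verb prefers a different expansion, while the verb-blind average assigns every verb the same number, which is the loss of information the "Lack of Lexical Dependency" slide points to.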
