Unformatted text preview:

CMSC 723 Computational Linguistics I Session 6 Syntax and Context Free Grammars Jimmy Lin The iSchool University of Maryland Wednesday October 7 2009 Today s Agenda Words structure meaning Formal Grammars Context free grammar Grammars for English Treebanks Dependency grammars Next week parsing algorithms Grammar and Syntax By grammar or syntax we mean implicit knowledge of a native speaker Acquired by around three years old without explicit instruction It s already inside our heads we re just trying to formally capture it Not the kind of stuff you were later taught in school Don t split infinitives Don t end sentences with prepositions Syntax Why should you care Syntactic analysis is a key component in many applications Grammar checkers Conversational agents Question answering Information extraction Machine translation Constituency Basic idea groups of words act as a single unit Constituents form coherent classes that behave similarly With respect to their internal structure e g at the core of a noun phrase is a noun With respect to other constituents e g noun phrases generally occur before verbs Constituency Example The following are all noun phrases in English Why They can all precede verbs They can all be preposed Grammars and Constituency For a particular language Answer not obvious and difficult What are the right set of constituents What rules govern how they combine That s why there are so many different theories of grammar and competing analyses of the same data Approach here Very generic Focus primarily on the machinery Doesn t correspond to any modern linguistic theory of grammar Context Free Grammars Context free grammars CFGs Aka phrase structure grammars Aka Backus Naur form BNF Consist of Rules Terminals Non terminals Context Free Grammars Terminals Non Terminals We ll take these to be words for now The constituents in a language e g noun phrase Rules Consist of a single non terminal on the left and any number of terminals and non terminals on the right Some NP Rules Here are some rules for our noun phrases Rules 1 2 describe two kinds of NPs One that consists of a determiner followed by a nominal Another that consists of proper names Rule 3 illustrates two things An explicit disjunction A recursive definition L0 Grammar CFG Formal definition Three fold View of CFGs Generator Acceptor Parser Derivations and Parsing A derivation is a sequence of rules applications that Covers all tokens in the input string Covers only the tokens in the input string Parsing given a string and a grammar recover the derivation Derivation can be represented as a parse tree Multiple derivations Parse Tree Example Note equivalence between parse trees and bracket notation Natural vs Programming Languages Wait don t we do this for programming languages What s similar What s different An English Grammar Fragment Sentences Noun phrases Issue agreement Verb phrases Issue subcategorization Sentence Types Declaratives A plane left S NP VP Imperatives Leave S VP Yes No Questions Did the plane leave S Aux NP VP WH Questions When did the plane leave S WH NP Aux NP VP Noun Phrases Let s consider these rules in detail NPs are a bit more complex than that Consider All the morning flights from Denver to Tampa leaving before 10 A Complex Noun Phrase stuff that comes after stuff that comes before head central most critical part of the NP Determiners Noun phrases can start with determiners Determiners can be Simple lexical items the this a an etc e g a car Or simple possessives e g John s car Or complex recursive versions thereof e g John s sister s husband s son s car Premodifiers Come before the head Examples Cardinals ordinals etc e g three cars Adjectives e g large car Ordering constraints three large cars vs large three cars Postmodifiers Naturally come after the head Three kinds Prepositional phrases e g from Seattle Non finite clauses e g arriving before noon Relative clauses e g that serve breakfast Similar recursive rules to handle these Nominal Nominal PP Nominal Nominal GerundVP Nominal Nominal RelClause A Complex Noun Phrase Revisited Agreement Agreement constraints that hold among various constituents Example number agreement in English This flight Those flights One flight Two flights This flights Those flight One flights Two flight Problem Our NP rules don t capture agreement constraints Accepts grammatical examples this flight Also accepts ungrammatical examples these flight Such rules overgenerate We ll come back to this later Verb Phrases English verb phrases consists of Head verb Zero or more following constituents called arguments Sample rules Subcategorization Not all verbs are allowed to participate in all VP rules We can subcategorize verbs according to argument patterns sometimes called frames Modern grammars may have 100s of such classes This is a finer grained articulation of traditional notions of transitivity Subcategorization Sneeze John sneezed Find Please find a flight to NY NP Give Give me NP a cheaper fare NP Help Can you help me NP with a flight PP Prefer I prefer to leave earlier TO VP Told I was told United has a flight S Subcategorization Subcategorization at work But some verbs can participate in multiple frames John sneezed the book I prefer United has a flight Give with a flight I ate I ate the apple How do we formally encode these constraints Why As presented the various rules for VPs overgenerate John sneezed the book NP Allowed by the second rule Possible CFG Solution Encode agreement in non terminals SgS SgNP SgVP PlS PlNP PlVP SgNP SgDet SgNom PlNP PlDet PlNom PlVP PlV NP SgVP SgV Np Can use the same trick for verb subcategorization Possible CFG Solution Critique It works But it s ugly And it doesn t scale explosion of rules Alternatives Multi pass solutions Three fold View of CFGs Generator Acceptor Parser The Point CFGs have about just the right amount of machinery to account for basic syntactic structure in English Lot s of issues though Good enough for many applications But there are many alternatives out there Treebanks Treebanks are corpora in which each sentence has been paired with a parse tree These are generally created Hopefully the right one By first parsing the collection with an automatic parser And then having human annotators correct each parse as necessary But Detailed annotation guidelines are needed Explicit instructions for dealing with particular constructions Penn Treebank Penn TreeBank is a widely used treebank 1 million words from the


View Full Document

UMD CMSC 723 - Syntax and Context-Free Grammars

Documents in this Course
Lecture 9

Lecture 9

12 pages

Smoothing

Smoothing

15 pages

Load more
Download Syntax and Context-Free Grammars
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Syntax and Context-Free Grammars and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Syntax and Context-Free Grammars 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?