CSE 842: Natural Language Processing
Lecture 10: Parsing (2)
Spring 2011, MSU

Syntactic Parsing
• Declarative formalisms like CFGs define the legal strings of a language, but they don't specify how to recognize strings or assign structure to them.
• Parsing algorithms specify how to recognize the strings of a language and how to assign each string one or more syntactic structures.
• Parse trees are useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition... and almost every task in NLP.

Comparing Top-Down and Bottom-Up
• Top-down parsers never explore illegal parses (e.g., trees that can't form an S), but they waste time on trees that can never match the input.
• Bottom-up parsers never explore trees inconsistent with the input, but they waste time exploring illegal parses (trees with no S root).
• In both cases we have assumed the alternatives are processed in parallel, which is not practical.
• In practice: search with backtracking.

Backtracking
• One approach is backtracking:
– Make a choice; if it works out, fine.
– If not, back up and make a different choice.
• Backtracking methods are doomed because of two problems:
– Ambiguity
– Shared subproblems

Dynamic Programming
• Create a table of solutions to subproblems (e.g., subtrees) as the parse proceeds.
• Look up subtrees for each constituent rather than re-parsing, avoiding repeated work.
• Since all parses are implicitly stored, all of them are available for later disambiguation.
• We will look at two approaches, corresponding to bottom-up and top-down parsing:
– CKY: Cocke-Kasami-Younger (1960)
– Earley (1970)

CKY Parsing
• Limit the grammar to Chomsky Normal Form (CNF):
– A → B C (two non-terminals)
– A → a (one terminal)
• Consider the rule A → B C:
– If there is an A somewhere in the input, then there must be a B followed by a C in the input.
– If the A spans from i to j in the input, then there must be some k such that i < k < j.
– That is, A is split into a B followed by a C somewhere in its span.

Problem
• What if your grammar isn't binary?
– As in the case of the TreeBank grammar?
• Convert it to binary: any arbitrary CFG can be rewritten into Chomsky Normal Form automatically.
• What does this mean?
– The resulting grammar accepts (and rejects) the same set of strings as the original grammar.
– But the resulting derivations (trees) are different.
– Weak equivalence.

Problem
• More specifically, we want our rules to be of the form
A → B C
or
A → a
• That is, rules can expand to either two non-terminals or a single terminal.

Conversion to CNF
• Replace terminals with non-terminals in rules that mix terminals and non-terminals on the RHS.
• Eliminate chains of unit productions.
• Introduce new intermediate non-terminals that distribute rules with RHS length > 2 over several rules.
– So S → A B C turns into S → X C and X → A B, where X is a symbol that doesn't occur anywhere else in the grammar.

Sample L1 Grammar
[figure]

CNF Conversion
[figure]

CKY Intuition
• Build a table so that each cell [i, j] stores the constituents (e.g., an A) spanning positions i to j of the input.
• A non-terminal spanning the entire string will sit in cell [0, n].
– Hopefully an S.
• If we build the table bottom-up, we will know that the parts of the A must go from i to k and from k to j, for some k.
• In other words, if we think there might be an A spanning [i, j] in the input, we need to check whether there is a rule A → B C where B is in [i, k] and C is in [k, j] for some i < k < j.

CKY
• We arrange the loops to fill the table a column at a time, from left to right, bottom to top.
– This assures us that whenever we fill a cell, the parts needed to fill it are already in the table (to the left and below).
– It is somewhat natural in that it processes the input left to right, a word at a time.
• Known as online.

Example
[figure]

CKY Algorithm
[figure]

CKY Parsing
• Is that really a parser?
• What needs to be changed to turn this algorithm into a parser?

Example: Filling Column 5
[figure sequence]

CKY Summary
• Problems with constructing parse trees using CKY in practice:
– Post-process the resulting trees to restore the original (pre-CNF) structure.
– Change the CKY algorithm to allow consideration of unit productions (how? Homework assignment 2).
• Since it is bottom-up, CKY populates the table with many phantom constituents:
– Segments that are constituents by themselves but cannot actually occur in the context of the given sentence.
– To avoid this, we can switch to a top-down control strategy.
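As a concrete illustration of the table-filling described above, here is a minimal CKY recognizer sketch. It is not the lecture's code: the toy CNF grammar, the lexicon, and all names (cky_recognize, binary, lexical) are made up for the example; a real implementation would use a treebank-derived rule set.

# Minimal CKY recognizer (illustrative sketch; toy grammar and lexicon).
from collections import defaultdict

binary = {                     # binary rules: (B, C) -> set of A with A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
lexical = {                    # lexical rules: word -> set of A with A -> word
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "saw": {"V"},
}

def cky_recognize(words):
    n = len(words)
    table = defaultdict(set)   # table[(i, j)] = non-terminals spanning words i..j-1
    for i, w in enumerate(words):
        table[(i, i + 1)] = set(lexical.get(w, set()))
    # Fill a column at a time, left to right, bottom to top, so the cells
    # to the left and below are always ready when a cell needs them.
    for j in range(2, n + 1):
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):            # split point: i < k < j
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= binary.get((B, C), set())
    return "S" in table[(0, n)]

print(cky_recognize("the dog saw a cat".split()))    # True

This is only a recognizer. One standard answer to the "what needs to be changed to turn this into a parser?" question is to store backpointers (which rule and which split point k produced each table entry) and then read the parse trees off the completed table.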
Earley Parsing
• Allows arbitrary CFGs.
• Top-down control.
• Fills a table in a single sweep over the input.
– The table is of length N+1, where N is the number of words.
– Think of chart entries as sitting between words in the input string, keeping track of the states of the parse at these positions.
– For each word position, the chart contains a set of states representing all partial parse trees generated to date:
• completed constituents and their locations,
• in-progress constituents,
• predicted constituents.

States
• The table entries are called states and are represented with dotted rules:
S → · VP (a VP is predicted)
NP → Det · Nominal (an NP is in progress)
VP → V NP · (a VP has been found)

States/Locations
• S → · VP [0,0]
• NP → Det · Nominal [1,2]
• VP → V NP · [0,3]
• A VP is predicted at the start of the sentence.
• An NP is in progress; the Det goes from 1 to 2.
• A VP has been found starting at 0 and ending at 3.
– [x, y] tells us where the state begins (x) and where the dot lies (y) with respect to the input.

• S → · VP, [0,0]
– The first 0 means the S constituent begins at the start of the input.
– The second 0 means the dot also lies at position 0, so nothing of this rule has been recognized yet.
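To make the dotted-rule states concrete, here is a compact Earley recognizer sketch. It is illustrative only, not the lecture's algorithm listing: the toy grammar, the part-of-speech lookup pos_of, and the GAMMA dummy start state are assumptions made for this example.

# Compact Earley recognizer (illustrative sketch; toy grammar and lexicon).
grammar = {
    "S": [("NP", "VP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("N",)],
    "VP": [("V", "NP")],
}
pos_of = {"the": "Det", "dog": "N", "cat": "N", "bit": "V"}  # toy POS lookup

def earley_recognize(words):
    n = len(words)
    # chart[i] holds states (lhs, rhs, dot, start): a dotted rule lhs -> rhs
    # whose span begins at `start` and whose dot currently sits at position i.
    chart = [set() for _ in range(n + 1)]
    chart[0].add(("GAMMA", ("S",), 0, 0))            # dummy start state
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs) and rhs[dot] in grammar:       # PREDICTOR
                for prod in grammar[rhs[dot]]:
                    state = (rhs[dot], prod, 0, i)
                    if state not in chart[i]:
                        chart[i].add(state); agenda.append(state)
            elif dot < len(rhs):                             # SCANNER
                if i < n and pos_of.get(words[i]) == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, start))
            else:                                            # COMPLETER
                for l2, r2, d2, s2 in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        state = (l2, r2, d2 + 1, s2)
                        if state not in chart[i]:
                            chart[i].add(state); agenda.append(state)
    return ("GAMMA", ("S",), 1, 0) in chart[n]

print(earley_recognize("the dog bit the cat".split()))       # True

Each pass over chart position i applies the three standard Earley operations: the predictor adds predicted constituents, the scanner advances the dot over the next input word, and the completer advances any state that was waiting for a constituent that has just finished. This corresponds to the predicted, in-progress, and completed constituents described on the slides.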

