Columbia COMS W4705 - The Earley Algorithm



Lecture 10: slide titles
•Review
•Have all the problems been solved?: Left Recursion
•Structural ambiguity
•Dynamic Programming
•Earley’s Algorithm
•0 Book 1 that 2 flight 3
•Successful Parse
•Parsing Procedure for the Earley Algorithm
•Predictor
•Scanner
•Completer
•Book that flight (Chart[0])
•CFG for Fragment of English
•Chart[1]
•How do we retrieve the parses at the end?
•Useful Properties
•Error Handling
•Alternative Control Strategies
•Summing Up

CS 4705
Lecture 10
The Earley Algorithm

Review
•Top-Down vs. Bottom-Up Parsers
–Both generate too many useless trees
–Combine the two to avoid over-generation: Top-Down Parsing with Bottom-Up look-ahead
•Left-corner table provides more efficient look-ahead
–Pre-compute all POS that can serve as the leftmost POS in the derivations of each non-terminal category

Have all the problems been solved?: Left Recursion
•Depth-first search will never terminate if the grammar is left recursive (e.g. NP --> NP PP)
•Solutions:
–Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive
    NP --> NP PP | Nom    (Nom PP+)
    ---------------------------
    NP --> Nom NP’
    NP’ --> PP NP’ | ε
•This may make rules unnatural
•Harder to eliminate non-immediate left recursion
–NP --> Nom PP
–Nom --> NP
–Fix depth of search explicitly
–Rule ordering: non-recursive rules first
    NP --> Det Nom
    NP --> NP PP
    The cat in the hat...

Structural ambiguity:
•Multiple legal structures
–Attachment (e.g. I saw a man on a hill with a telescope)
–Coordination (e.g. younger cats and dogs)
–NP bracketing (e.g. Spanish language teachers)
•Solution?
–Return all possible parses and disambiguate using “other methods”

Inefficient Re-Parsing of Subtrees

Dynamic Programming
•Create a table of solutions to sub-problems (e.g. subtrees) as the parse proceeds
•Look up subtrees for each constituent rather than re-parsing
•Since all parses are implicitly stored, all are available for later disambiguation
•Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms

Earley’s Algorithm
•Uses dynamic programming to do parallel top-down search in (worst case) O(N^3) time
•First, a left-to-right pass fills out a chart with N+1 entries (N: the number of words in the input)
–Think of chart entries as sitting between words in the input string, keeping track of states of the parse at these positions
–For each word position, the chart contains the set of states representing all partial parse trees generated to date. E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence
•Chart entries represent three types of constituents:
–predicted constituents
–in-progress constituents
–completed constituents
•Progress in the parse is represented by Dotted Rules
–Position of • indicates type of constituent
–0 Book 1 that 2 flight 3
    S --> • VP, [0,0] (predicting VP)
    NP --> Det • Nom, [1,2] (finding NP)
    VP --> V NP •, [0,3] (found VP)
–[x,y] tells us where the state begins (x) and where the dot lies (y) with respect to the input

S --> • VP, [0,0]
–First 0 means the S constituent begins at the start of the input
–Second 0 means the dot is here too
–So, this is a top-down prediction

NP --> Det • Nom, [1,2]
–the NP begins at position 1
–the dot is at position 2
–so, Det has been successfully parsed
–Nom predicted next

0 Book 1 that 2 flight 3
VP --> V NP •, [0,3]
–Successful VP parse of entire input

Successful Parse
•Final answer found by looking at the last entry in the chart
•If an entry resembles S --> α • [0,N] then the input was parsed successfully
•But note that the chart will also contain a record of all possible parses of the input string, given the grammar, not just the successful one(s)

Parsing Procedure for the Earley Algorithm
•Move through each set of states in order, applying one of three operators to each state:
–predictor: add predictions to the chart
–scanner: read input and add corresponding state to chart
–completer: move dot to right when new constituent found
•Results (new states) added to current or next set of states in chart
•No backtracking and no states removed: keep complete history of parse

Predictor
•Intuition: new states represent top-down expectations
•Applied when non-part-of-speech non-terminals are to the right of a dot
    S --> • VP [0,0]
•Adds new states to current chart
–One new state for each expansion of the non-terminal in the grammar
    VP --> • V [0,0]
    VP --> • V NP [0,0]

Scanner
•New states for predicted part of speech
•Applicable when a part of speech is to the right of a dot
    VP --> • V NP [0,0] ‘Book…’
•Looks at current word in input
•If match, adds state(s) to next chart
    VP --> V • NP [0,1]

Completer
•Intuition: parser has discovered a constituent, so it must find and advance all states that were waiting for this
•Applied when the dot has reached the right end of a rule
    NP --> Det Nom • [1,3]
•Find all states with the dot at 1 and expecting an NP
    VP --> V • NP [0,1]
•Adds new (completed) state(s) to current chart
    VP --> V NP • [0,3]

Book that flight (Chart[0])
•Seed chart with top-down predictions for S from grammar
    γ --> • S            [0,0]  Dummy start state
    S --> • NP VP        [0,0]  Predictor
    S --> • Aux NP VP    [0,0]  Predictor
    S --> • VP           [0,0]  Predictor
    NP --> • Det Nom     [0,0]  Predictor
    NP --> • PropN       [0,0]  Predictor
    VP --> • V           [0,0]  Predictor
    VP --> • V NP        [0,0]  Predictor

CFG for Fragment of English
    S --> NP VP            Det --> that | this | a
    S --> Aux NP VP        N --> book | flight | meal | money
    S --> VP               V --> book | include | prefer
    NP --> Det Nom         Aux --> does
    NP --> PropN           Prep --> from | to | on
    Nom --> N              PropN --> Houston | TWA
    Nom --> N Nom
    Nom --> Nom PP
    VP --> V
    VP --> V NP
    PP --> Prep NP

•When the dummy start state is processed, it’s passed to Predictor, which produces states representing every possible expansion of S, and adds these and every expansion of the left corners of these trees to the bottom of Chart[0]
•When VP --> • V, [0,0] is reached, Scanner is called, which consults the first word of input, Book, and adds the first state to Chart[1], V --> book •, [0,1]
•Note: When VP --> • V NP, [0,0] is reached in Chart[0], Scanner does not need to add V --> book •, [0,1] again to Chart[1]

Chart[1]
    V --> book •         [0,1]  Scanner
    VP --> V •           [0,1]  Completer
    VP --> V • NP        [0,1]  Completer
    S --> VP •           [0,1]  Completer
    NP --> • Det Nom     [1,1]  Predictor
    NP --> • PropN       [1,1]  Predictor

V --> book • is passed to Completer, which finds 2 states in Chart[0] whose left corner is V and adds them to Chart[1],
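
The predictor, scanner and completer steps described above can be sketched as a small recognizer. This is a minimal illustration rather than the lecture's own code: it assumes the fragment grammar from the slides, lowercase pre-tokenized input, and no empty (ε) rules. The names `State`, `GRAMMAR`, `LEXICON`, `earley`, `recognizes` and the start symbol `GAMMA` (standing in for the dummy start state) are all made up for this example.

```python
from collections import namedtuple

# A state is a dotted rule plus the span [start, end] it covers.
# rhs[:dot] has been found; rhs[dot:] is still expected.
State = namedtuple("State", "lhs rhs dot start end")

# Fragment grammar from the slides (phrase-structure rules).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["N", "Nom"], ["Nom", "PP"]],
    "VP": [["V"], ["V", "NP"]],
    "PP": [["Prep", "NP"]],
}
# Part-of-speech categories and the words they cover.
LEXICON = {
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"Houston", "TWA"},
}


def next_symbol(state):
    """Symbol right of the dot, or None if the rule is complete."""
    return state.rhs[state.dot] if state.dot < len(state.rhs) else None


def add(chart, k, state):
    """Add a state to chart[k] unless it is already there."""
    if state not in chart[k]:
        chart[k].append(state)


def earley(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # N+1 entries, one per position
    add(chart, 0, State("GAMMA", ("S",), 0, 0, 0))  # dummy start state
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] can grow while we scan it
            st = chart[k][i]
            i += 1
            sym = next_symbol(st)
            if sym is None:
                # Completer: advance every state that was waiting for st.lhs
                # at the position where st began.
                for w in list(chart[st.start]):
                    if next_symbol(w) == st.lhs:
                        add(chart, k, State(w.lhs, w.rhs, w.dot + 1, w.start, k))
            elif sym in LEXICON:
                # Scanner: if the current word has this part of speech,
                # add the completed POS state to the next chart entry.
                if k < n and words[k] in LEXICON[sym]:
                    add(chart, k + 1, State(sym, (words[k],), 1, k, k + 1))
            else:
                # Predictor: one new state per expansion of the non-terminal.
                for rhs in GRAMMAR[sym]:
                    add(chart, k, State(sym, tuple(rhs), 0, k, k))
    return chart


def recognizes(words):
    """True if the last chart entry holds a completed start state [0, N]."""
    chart = earley(words)
    return State("GAMMA", ("S",), 1, 0, len(words)) in chart[-1]


if __name__ == "__main__":
    for k, entry in enumerate(earley(["book", "that", "flight"])):
        print(f"Chart[{k}]")
        for s in entry:
            rhs = list(s.rhs[:s.dot]) + ["•"] + list(s.rhs[s.dot:])
            print(f"    {s.lhs} --> {' '.join(rhs)}  [{s.start},{s.end}]")
```

Because states are only appended and never removed, the chart keeps the full history of the parse. This sketch only recognizes the input; recovering parse trees would need each state to also record which completed states advanced its dot, which is left out here.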

