CORNELL CS 674 - Lecture Notes

Today's Papers
• Example Selection for Bootstrapping Statistical Parsers (2003)
  • Mark Steedman, Rebecca Hwa, Stephen Clark, Miles Osborne, Anoop Sarkar, Julia Hockenmaier, Paul Ruhlen, Steven Baker, Jeremiah Crim
• Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank (2001)
  • Dan Klein and Christopher D. Manning

Co-training
• Idea: multiple classifiers (here, parsers) train each other
• Assumes the parsers use independent models
• During each co-training iteration:
  • Select a small cache of unlabeled sentences
  • Run parsers A and B on the cache
  • Score each parse output from each parser
  • Select some of A's parses to add to B's training set, and some of B's parses to add to A's
  • Retrain both A and B
• Corrected co-training: a human checks and corrects parser output before it is added to the training set

Co-training (2)
• Problem 1: how do we score the output of a parser?
• Problem 2: how do we do sample selection?
  • Intuitively, we should use only accurate output
  • But we should also choose examples with high training utility
• This paper: sample selection for co-training
• Opposing goals: we want training samples that have both high training utility and high accuracy

Scoring & selecting training examples
• Parse scoring
  • Optimal: comparison to human-labeled ground truth
  • Practical: likelihood of the parse given the model
• Training example selection (a code sketch of these heuristics appears at the end of this paper's notes)
  • Above-n (selects high-quality samples): (score of teacher's parse) > n
  • Difference (selects high-utility samples): (score of teacher's parse) - (score of student's parse) > n
  • Intersection (selects high-quality, high-utility samples): (score of teacher's parse is in the highest n percentile) and (score of student's parse is in the lowest n percentile)

Experimental protocol
• Two parsers:
  • A lexicalized context-free grammar parser [Collins99]
  • A lexicalized tree-adjoining grammar parser [Sarkar02]
• Seed (labeled) training data: 1,000 sentences
• Unlabeled training data: ~38,000 sentences
• Cache size: 500 sentences
• Test data: independent, ~2,400 sentences

Results, ideal scoring function
• In the "relaxed" setting (~85% accuracy), the Difference method with n = 10% does best
• In the "strict" setting (~95% accuracy), Difference with n = 10% also does best, but Intersection with n = 30% may do better given more data
• Conclusion: utility is more important than accuracy
• But:
  • Is the difference statistically significant (e.g., error bars)?
  • What about other values of n?

Results, practical scoring function
• ~1 percentage point gain using diff-30% or int-30% (roughly 20 test sentences; is that statistically significant?)
• Again, utility is more important than accuracy

Corrected co-training, ideal scoring
• Human effort is measured two ways: the number of sentences that must be checked by a human, and the number of constituents that must be corrected by a human
• Intersection (n = 30%) may be the best choice based on its growth rate (more data is needed to confirm)
• A bad selection choice can require more human effort than plain supervised learning

Corrected co-training, practical scoring
• Corrected co-training still saves human effort (in sentences checked and constituents corrected)
• Again, utility is more important than accuracy (assuming the results are statistically significant)
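
The three selection heuristics above are easy to state in code. Below is a minimal, hypothetical Python sketch of one co-training round using them. The parser interface (parse, add_training_examples, retrain), the score_fn argument, and all names here are assumptions made for illustration; this is not the authors' implementation.

import random


def above_n(scored, n):
    """Above-n: keep sentences whose teacher-parse score exceeds n
    (favors high-accuracy examples)."""
    return [s for s in scored if s["teacher_score"] > n]


def difference(scored, n):
    """Difference: keep sentences where the teacher's parse outscores
    the student's by more than n (favors high training utility)."""
    return [s for s in scored if s["teacher_score"] - s["student_score"] > n]


def intersection(scored, n):
    """Intersection: teacher score in the highest n percentile AND student
    score in the lowest n percentile (high accuracy and high utility)."""
    k = max(1, int(len(scored) * n / 100))
    by_teacher = sorted(range(len(scored)),
                        key=lambda i: scored[i]["teacher_score"], reverse=True)
    by_student = sorted(range(len(scored)),
                        key=lambda i: scored[i]["student_score"])
    keep = set(by_teacher[:k]) & set(by_student[:k])
    return [scored[i] for i in sorted(keep)]


def cotrain_iteration(parser_a, parser_b, unlabeled, score_fn,
                      cache_size=500, select=difference, n=10):
    """One co-training round: each parser in turn acts as the teacher, and
    its selected parses are added to the other parser's training set.
    score_fn(parse) is the parse-scoring function (ideal: comparison with
    ground truth; practical: parse likelihood under the model)."""
    cache = random.sample(unlabeled, min(cache_size, len(unlabeled)))
    for teacher, student in ((parser_a, parser_b), (parser_b, parser_a)):
        scored = []
        for sentence in cache:
            t_parse, s_parse = teacher.parse(sentence), student.parse(sentence)
            scored.append({"parse": t_parse,
                           "teacher_score": score_fn(t_parse),
                           "student_score": score_fn(s_parse)})
        student.add_training_examples([s["parse"] for s in select(scored, n)])
    parser_a.retrain()
    parser_b.retrain()

Note that n means different things per method, as on the slides: a raw score threshold for above_n, a score gap for difference, and a percentile for intersection.
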
Conclusions
• Selection methods that emphasize high training utility do best, even at the expense of lower accuracy
• The quality of the scoring function matters:
  • With the ideal scoring function, co-training significantly improved parser performance (2-3 percentage points)
  • With the practical scoring function, co-training improved performance only marginally (< 1 percentage point), so better scoring functions are needed
• Corrected co-training with high-training-utility selection further increases performance, with less human effort than a supervised method
• Future work: try other pairs (or sets) of parsers

Paper by Klein & Manning
• Idea: build a grammar by reading rules off the hand-labeled parses of the Penn Treebank
• But this can create huge grammars:
  • [Charniak96] found 10,605 rules in the Penn Treebank, and fewer than 40% of them occurred more than once
• How fast (or slow) are chart parsers that use these grammars?
• Which parameters affect parsing speed and memory use, and how can we model these effects?

Parameters (1)
• Tree transforms: None, NoEmpties, NoUnaries

Parameters (2)
• Grammar encoding: list, trie, or minimized DFA
  • All encodings are equivalent: they do not affect the parser's output
  • (The slide shows the noun-phrase rules encoded as a DFA; gray and white states are accepting)
• Rule ordering: top-down or bottom-up
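
To make the encoding options concrete, here is a small, hypothetical Python sketch (not from the paper) that compacts a rule list into a trie keyed on the right-hand-side symbols, so that rules sharing a prefix also share a parser state. The ADJ rule is invented for illustration and is not part of the slides' example grammar.

def build_trie(rules):
    """rules: iterable of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Returns nested dicts; the special key "ACCEPT" collects the left-hand
    sides of rules completed at that node."""
    root = {}
    for lhs, rhs in rules:
        node = root
        for symbol in rhs:
            node = node.setdefault(symbol, {})
        node.setdefault("ACCEPT", []).append(lhs)
    return root

# Invented noun-phrase rules for illustration:
rules = [
    ("NP", ("ART", "N")),
    ("NP", ("ART", "ADJ", "N")),  # shares the ART prefix with the rule above
    ("NP", ("N",)),
]
trie = build_trie(rules)

With a list encoding the parser tracks one dotted rule per grammar rule; in the trie, the two ART-initial rules share a single state after ART is consumed, and a minimized DFA would additionally merge equivalent suffix states. All three encodings accept exactly the same rules, which is why the choice cannot affect parser output, only speed and memory.
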
Background: chart parsers (1)
• Notation: C categories and S states in the grammar DFA
• A chart consists of nodes and edges
  • Node: placed between each word of the sentence (so n + 1 nodes for a sentence of n words)
  • Span: a range of words, e.g. [0,2] refers to "The old" (there are O(n^2) possible spans)
• Running example on these slides: the sentence "The old man" (nodes 0-3) with the grammar
  S -> NP VP, NP -> ART N, VP -> V NP | V, V -> man, N -> man | old, ART -> the

Background: chart parsers (2)
• Edge: associates a category with a span (e.g. N:[1,2])
• Passive edge (e.g. ART:[0,1]): means the span belongs to the category
  • There are O(Cn^2) possible passive edges (~2% of edges)
• Active edge (e.g. NP -> ART . N:[0,1]): means that if we find some node k such that [1,k] is a noun, then [0,k] is a noun phrase
  • There are O(Sn^2) possible active edges (~98% of edges)

Background: chart parsers (3)
• Saturation of a span: the number of edges over that span
• Traversal: combining an active edge with a passive edge to form a new edge
  • e.g. (NP -> ART . N:[0,1]) + (N:[1,2]) => NP:[0,2]
• The number of traversals is bounded by O(SCn^3): an active edge is one of O(Sn^2) choices, and a passive edge extending it adds one of C categories ending at one of O(n) nodes
• Computation cost is proportional to the number of traversals
• Memory use is proportional to the number of active edges, O(Sn^2)
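
As a concrete companion to these definitions, below is a minimal agenda-based chart parser in Python for the running example. It is a sketch of the standard algorithm (passive edges, dotted active edges, and traversals via the fundamental rule) under bottom-up rule introduction; it is not the implementation measured in the paper.

# Toy grammar from the slides: lhs -> list of right-hand sides.
GRAMMAR = {
    "S":   [("NP", "VP")],
    "NP":  [("ART", "N")],
    "VP":  [("V", "NP"), ("V",)],
    "V":   [("man",)],
    "N":   [("man",), ("old",)],
    "ART": [("the",)],
}

def parse(words):
    passive = set()    # passive edges: (category, i, j)
    active = set()     # active edges: (lhs, rhs, dot, i, j)
    # Seed the agenda with the words themselves as passive edges.
    agenda = [("P", (w, i, i + 1)) for i, w in enumerate(words)]
    traversals = 0
    while agenda:
        kind, edge = agenda.pop()
        if kind == "P":
            cat, i, j = edge
            if (cat, i, j) in passive:
                continue
            passive.add((cat, i, j))
            # Bottom-up rule introduction: start rules whose RHS begins with cat.
            for lhs, rhss in GRAMMAR.items():
                for rhs in rhss:
                    if rhs[0] == cat:
                        agenda.append(("A", (lhs, rhs, 1, i, j)))
            # Fundamental rule: extend active edges that end where this edge starts.
            for (lhs, rhs, dot, a, b) in list(active):
                if b == i and rhs[dot] == cat:
                    traversals += 1
                    agenda.append(("A", (lhs, rhs, dot + 1, a, j)))
        else:
            lhs, rhs, dot, i, j = edge
            if dot == len(rhs):
                # Rule complete: the active edge becomes a passive edge.
                agenda.append(("P", (lhs, i, j)))
                continue
            if (lhs, rhs, dot, i, j) in active:
                continue
            active.add((lhs, rhs, dot, i, j))
            # Fundamental rule, other direction: look for passive edges at j.
            for (cat, a, b) in list(passive):
                if a == j and rhs[dot] == cat:
                    traversals += 1
                    agenda.append(("A", (lhs, rhs, dot + 1, i, b)))
    return passive, traversals

edges, n_traversals = parse(["the", "old", "man"])
print(("S", 0, 3) in edges)   # True: the sentence is recognized as an S
print(n_traversals)           # computation cost is proportional to this count

Running this on "the old man" performs the slides' example traversal (NP -> ART . N:[0,1]) + (N:[1,2]) => NP:[0,2] on the way to S:[0,3], and the passive and active sets show directly why memory is dominated by the O(Sn^2) active edges in realistic grammars.
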