Berkeley COMPSCI 294 - Lecture Notes

CS 294-5: Statistical Natural Language Processing
Dan Klein
MF 1-2:30pm, Soda Hall 310

Last Time
- Language models for text categorization (Naïve-Bayes, conditional LMs)
- Generative models: break a complex structure down into derivation steps
  - Each step is a multinomial choice, conditioned on some history
  - We estimate those multinomials by collecting counts and smoothing
- Backbone of statistical NLP until very recently
- Today: maximum entropy, a discriminative approach
[Diagram: class c generating words w_1 w_2 … w_n; START]

Word Senses
- Words have multiple distinct meanings, or senses:
  - Plant: living plant, manufacturing plant, …
  - Title: name of a work, ownership document, form of address, material at the start of a film, …
- Many levels of sense distinctions
  - Homonymy: totally unrelated meanings (river bank, money bank)
  - Polysemy: related meanings (star in sky, star on TV)
  - Systematic polysemy: productive meaning extensions (organizations to their buildings) or metaphor
- Sense distinctions can be extremely subtle (or not)
- The granularity of senses needed depends a lot on the task
- Why is it important to model word senses?
  - Translation, parsing, information retrieval?

Word Sense Disambiguation
- Example: living plant vs. manufacturing plant
  "The manufacturing plant which had previously sustained the town's economy shut down after an extended labor strike."
- How do we tell these senses apart? "Context"
- Maybe it's just text categorization
  - Each word sense represents a topic
  - Run the Naive-Bayes classifier from last class?
- Bag-of-words classification works OK for noun senses
  - 90% on classic, shockingly easy examples (line, interest, star)
  - 80% on SensEval-1 nouns
  - 70% on SensEval-1 verbs

Verb WSD
- Why are verbs harder?
  - Verbal senses are less topical
  - More sensitive to structure and argument choice
- Verb example: "serve"
  - [function] The tree stump serves as a table
  - [enable] The scandal served to increase his popularity
  - [dish] We serve meals for the homeless
  - [enlist] He served his country
  - [jail] He served six years for embezzlement
  - [tennis] It was Agassi's turn to serve
  - [legal] He was served by the sheriff
- Rest of today: a maximum entropy approach

Various Approaches to WSD
- Unsupervised learning
  - Bootstrapping (Yarowsky 95)
  - Clustering
- Indirect supervision
  - From thesauri
  - From WordNet
  - From parallel corpora
- Supervised learning
  - Most systems do some kind of supervised learning
  - Many competing classification technologies perform about the same (it's all about the knowledge sources you tap)
  - Problem: training data is available for only a few words

Resources
- WordNet
  - Hand-built (but large) hierarchy of word senses
  - Basically a hierarchical thesaurus
- SensEval
  - A WSD competition, of which there have been three iterations
  - Training / test sets for a wide range of words, difficulties, and parts of speech
  - Bake-off where lots of labs tried lots of competing approaches
- SemCor
  - A big chunk of the Brown corpus annotated with WordNet senses
- Other resources
  - The Open Mind Word Expert
  - Parallel texts
  - Flat thesauri

Knowledge Sources
- So what do we need to model to handle "serve"?
- There are distant topical cues:
  …. point … court ………………… serve ……… game …
- The Naive-Bayes model from last time:
  P(c, w_1, w_2, …, w_n) = P(c) ∏_i P(w_i | c)
  [Diagram: class c generating words w_1 w_2 … w_n]
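As a concrete reminder of how that model is used for WSD, here is a minimal Python sketch (not from the lecture; the class name, smoothing choice, and toy training data are invented for illustration). It estimates P(c) and P(w | c) from counts with add-alpha smoothing and picks the sense maximizing log P(c) + Σ_i log P(w_i | c).

```python
from collections import Counter, defaultdict
import math

class NaiveBayesWSD:
    """Bag-of-words Naive-Bayes sense classifier:
    P(c, w_1, ..., w_n) = P(c) * prod_i P(w_i | c)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # add-alpha smoothing
        self.sense_counts = Counter()             # sense -> count
        self.word_counts = defaultdict(Counter)   # sense -> word -> count
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (sense, context_words) pairs."""
        for sense, words in examples:
            self.sense_counts[sense] += 1
            for w in words:
                self.word_counts[sense][w] += 1
                self.vocab.add(w)

    def classify(self, words):
        """Return the sense maximizing log P(c) + sum_i log P(w_i | c)."""
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[sense].values()) + self.alpha * len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[sense][w] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

# Toy usage with invented training contexts for "plant":
clf = NaiveBayesWSD()
clf.train([
    ("living-plant", "the flowering plant needs water and sunlight".split()),
    ("manufacturing-plant", "the plant shut down after the labor strike".split()),
])
print(clf.classify("workers at the plant went on strike".split()))
```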
Weighted Windows with NB
- Distance conditioning
  - Some words are important only when they are nearby
    …. as …. point … court ………………… serve ……… game …
    ………………………………………… serve as ……………
  - P(c, w_{-k}, …, w_{-1}, w_{+1}, …, w_{+k'}) = P(c) ∏_{i=-k}^{k'} P(w_i | c, bin(i))
- Distance weighting
  - Nearby words should get a larger vote
    … court …… serve as ……… game …… point
  - P(c, w_{-k}, …, w_{-1}, w_{+1}, …, w_{+k'}) = P(c) ∏_{i=-k}^{k'} P(w_i | c)^{boost(i)}
  - boost(i) decreases with the relative position i

Better Features
- There are smarter features:
  - Argument selectional preference: serve NP[meals] vs. serve NP[papers] vs. serve NP[country]
  - Subcategorization:
    - [function] serve PP[as]
    - [enable] serve VP[to]
    - [tennis] serve <intransitive>
    - [food] serve NP {PP[to]}
  - Can capture these poorly (but robustly) with local windows
  - … but we can also use a parser and get these features explicitly
- Other constraints (Yarowsky 95)
  - One-sense-per-discourse (only true for broad topical distinctions)
  - One-sense-per-collocation (pretty reliable when it kicks in: manufacturing plant, flowering plant)

Complex Features with NB?
- Example: "Washington County jail served 11,166 meals last month - a figure that translates to feeding some 120 people three times daily for 31 days."
- So we have a decision to make based on a set of cues:
  - context:jail, context:county, context:feeding, …
  - local-context:jail, local-context:meals
  - subcat:NP, direct-object-head:meals
- Not clear how to build a generative derivation for these:
  - Choose a topic, then decide on having a transitive usage, then pick "meals" to be the object's head, then generate the other words?
  - What about the words that appear in multiple features?
  - Hard to make this work (though maybe possible)
  - No real reason to try

A Discriminative Approach
- View WSD as a discrimination task (regression, really):
  P(sense | context:jail, context:county, context:feeding, …,
            local-context:jail, local-context:meals,
            subcat:NP, direct-object-head:meals, …)
- We have to estimate a multinomial (over senses) where there is a huge number of things to condition on
- The history is too complex to treat this as a smoothing / back-off problem
- There are many feature-based classification techniques out there
- We tend to need ones that output distributions over classes (why?)

Feature Representations
- Features are indicator functions f_i which count the occurrences of certain patterns in the input
- We map each input d to a vector of feature predicate counts {f_i(d)}
- For the "Washington County jail served 11,166 meals …" example:
  - context:jail = 1, context:county = 1, context:feeding = 1, context:game = 0, …
  - local-context:jail = 1, local-context:meals = 1, …
  - subcat:NP = 1, subcat:PP = 0, …
  - object-head:meals = 1, object-head:ball = 0

Linear Classifiers
- For a pair (c, d), we take a weighted vote for each class
- There are many ways to set these weights
  - Perceptron: find a currently misclassified example, and nudge …
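To make the feature representation and the per-class weighted vote concrete, here is a small Python sketch (not from the notes; the feature templates, weight values, and sense labels are hypothetical). It maps an input to indicator features and normalizes the class scores with a softmax, so the classifier outputs a distribution over senses in the spirit of the maximum-entropy approach the lecture is building toward.

```python
import math
from collections import defaultdict

def extract_features(doc):
    """Map an input to indicator feature predicates, in the style of the
    'Feature Representations' slide. The templates here are a simplified
    stand-in: context words, local-context words, subcategorization, and
    the direct object's head word."""
    feats = defaultdict(float)
    for w in doc["context"]:
        feats["context:" + w] = 1.0
    for w in doc.get("local_context", []):
        feats["local-context:" + w] = 1.0
    if "subcat" in doc:
        feats["subcat:" + doc["subcat"]] = 1.0
    if "object_head" in doc:
        feats["object-head:" + doc["object_head"]] = 1.0
    return feats

def class_distribution(weights, feats, classes):
    """Weighted vote for each class, then a softmax so the classifier
    outputs a distribution over senses rather than a single label."""
    scores = {c: sum(weights.get((c, f), 0.0) * v for f, v in feats.items())
              for c in classes}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}

# Hypothetical weights and input for the "serve" example above:
weights = {
    ("dish", "local-context:meals"): 1.2, ("dish", "object-head:meals"): 2.0,
    ("jail", "context:jail"): 1.5, ("jail", "context:county"): 0.3,
}
doc = {
    "context": ["jail", "county", "feeding"],
    "local_context": ["jail", "meals"],
    "subcat": "NP",
    "object_head": "meals",
}
print(class_distribution(weights, extract_features(doc), ["dish", "jail"]))
```

The perceptron mentioned on the last slide would set these weights by repeatedly nudging them toward misclassified examples; the maximum-entropy approach previewed above instead fits them so the model's predicted feature expectations match those observed in the training data.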

