Word Sense Disambiguation
CMSC 723: Computational Linguistics I ― Session #11
Jimmy Lin
The iSchool, University of Maryland
Wednesday, November 11, 2009
Material drawn from slides by Saif Mohammad and Bonnie Dorr

Progression of the Course
- Words: finite-state morphology; part-of-speech tagging (TBL + HMM)
- Structure: CFGs + parsing (CKY, Earley); n-gram language models
- Meaning!

Today's Agenda
- Word sense disambiguation
- Beyond lexical semantics
- Semantic attachments to syntax
- Shallow semantics: PropBank

Recap: Word Sense
From WordNet, the senses of "pipe":
Noun
- {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
- {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
- {pipe, tube} (a hollow cylindrical shape)
- {pipe} (a tubular wind instrument)
- {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
Verb
- {shriek, shrill, pipe up, pipe} (utter a shrill cry)
- {pipe} (transport by pipeline) "pipe oil, water, and gas into the desert"
- {pipe} (play on a pipe) "pipe a tune"
- {pipe} (trim with piping) "pipe the skirt"

Word Sense Disambiguation
- Task: automatically select the correct sense of a word
  - Lexical sample
  - All-words
- Theoretically useful for many applications:
  - Semantic similarity (remember from last time?)
  - Information retrieval
  - Machine translation
  - ...
- Solution in search of a problem? Why?
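As an aside (not in the original slides), a sense inventory like the one above is easy to inspect programmatically. A minimal sketch, assuming NLTK is installed and the WordNet data has been downloaded:

```python
# List the WordNet senses of "pipe" with NLTK.
# Assumes: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

for syn in wn.synsets('pipe'):
    # Each synset is one sense: its member lemmas, POS tag, and gloss.
    lemmas = ', '.join(lemma.name() for lemma in syn.lemmas())
    print(f"[{syn.pos()}] {{{lemmas}}}: {syn.definition()}")
```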
How big is the problem?
- Most words in English have only one sense
  - 62% in Longman's Dictionary of Contemporary English (LDOCE)
  - 79% in WordNet
- But the others tend to have several senses
  - Average of 3.83 senses per word in LDOCE
  - Average of 2.96 in WordNet
- Ambiguous words are more frequently used
  - In the British National Corpus, 84% of word instances have more than one sense
- Some senses are more frequent than others

Ground Truth
- Which sense inventory do we use?
- Issues there?
- Application specificity?

Corpora
- Lexical sample:
  - line-hard-serve corpus (4k sense-tagged examples)
  - interest corpus (2,369 sense-tagged examples)
  - ...
- All-words:
  - SemCor (234k words, subset of the Brown Corpus)
  - Senseval-3 (2,081 tagged content words from 5k total words)
  - ...
- Observations about the size?

Evaluation
- Intrinsic: measure accuracy of sense selection with respect to ground truth
- Extrinsic: integrate WSD into a bigger end-to-end system (e.g., machine translation or information retrieval) and compare performance with and without it

Baseline + Upper Bound
- Baseline: most frequent sense
  - Equivalent to "take the first sense" in WordNet
  - Does surprisingly well: 62% accuracy in this case!
- Upper bound:
  - Fine-grained WordNet senses: 75-80% human agreement
  - Coarser-grained inventories: 90% human agreement possible
- What does this mean?

WSD Approaches
- Depending on use of manually created knowledge sources:
  - Knowledge-lean
  - Knowledge-rich
- Depending on use of labeled data:
  - Supervised
  - Semi- or minimally supervised
  - Unsupervised

Lesk's Algorithm
- Intuition: note the word overlap between the context and the dictionary entries
- Unsupervised, but knowledge-rich
- Example: disambiguate "bank" in "The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities." against its WordNet glosses

Lesk's Algorithm (continued)
- Simplest implementation: count overlapping content words between glosses and context (see the sketch below)
- Lots of variants:
  - Include the examples in dictionary definitions
  - Include hypernyms and hyponyms
  - Give more weight to larger overlaps (e.g., bigrams)
  - Give extra weight to infrequent words (e.g., idf weighting)
  - ...
- Works reasonably well!
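A minimal sketch of the simplest implementation (not from the slides): score each WordNet sense of the target word by how many content words its gloss shares with the context. The stopword list here is a crude illustrative stand-in for real content-word filtering.

```python
# Simplified Lesk: pick the sense whose gloss shares the most content
# words with the target word's context. Assumes NLTK with WordNet data.
from nltk.corpus import wordnet as wn

# Crude stopword list, standing in for real content-word filtering.
STOPWORDS = {'the', 'a', 'an', 'of', 'in', 'on', 'to', 'it', 'and', 'or',
             'that', 'is', 'are', 'will', 'can', 'because', 'for', 'by'}

def simplified_lesk(word, context_sentence):
    context = {w.strip('.,').lower() for w in context_sentence.split()}
    context -= STOPWORDS
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split()) - STOPWORDS
        overlap = len(gloss & context)  # count shared content words
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sentence = ("The bank can guarantee deposits will eventually cover future "
            "tuition costs because it invests in adjustable-rate mortgage "
            "securities.")
sense = simplified_lesk('bank', sentence)
print(sense, '->', sense.definition())
```

For comparison, NLTK also ships its own variant as nltk.wsd.lesk.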
Supervised WSD: NLP meets ML
- WSD as a supervised classification task
- Train a separate classifier for each word
- Three components of a machine learning problem:
  - Training data (corpora)
  - Representations (features)
  - Learning method (algorithm, model)

Supervised Classification
[Diagram: at training time, labeled training data passes through a representation function into a supervised machine learning algorithm, which produces a classifier; at testing time, the classifier assigns one of the labels (label1, label2, label3, label4) to an unlabeled document.]

Three Laws of Machine Learning
- Thou shalt not mingle training data with test data
- Thou shalt not mingle training data with test data
- Thou shalt not mingle training data with test data
- But what do you do if you need more test data?

Features
- Possible features:
  - POS and surface form of the word itself
  - Surrounding words and their POS tags
  - Positional information of surrounding words and POS tags
  - Same as above, but with n-grams
  - Grammatical information
  - ...
- Richness of the features?
  - Richer features = the ML algorithm does less of the work
  - More impoverished features = the ML algorithm does more of the work

Classifiers
- Once we cast the WSD problem as supervised classification, many learning techniques are possible:
  - Naïve Bayes (the thing to try first)
  - Decision lists
  - Decision trees
  - MaxEnt
  - Support vector machines
  - Nearest-neighbor methods
  - ...

Classifier Tradeoffs
- Which classifier should I use? It depends on:
  - Number of features
  - Types of features
  - Number of possible values for a feature
  - Noise
  - ...
- General advice:
  - Start with Naïve Bayes
  - Use decision trees/lists if you want to understand what the classifier is doing
  - SVMs often give state-of-the-art performance
  - MaxEnt methods also work well

Naïve Bayes
- Pick the sense that is most probable given the context
- Context represented by a feature vector $\vec{f}$:
  $\hat{s} = \arg\max_{s \in S} P(s \mid \vec{f})$
- By Bayes' Theorem:
  $\hat{s} = \arg\max_{s \in S} \frac{P(\vec{f} \mid s)\,P(s)}{P(\vec{f})} = \arg\max_{s \in S} P(\vec{f} \mid s)\,P(s)$
  (the denominator $P(\vec{f})$ is constant across senses, so it can be dropped)
- Problem: data sparsity! (see the sketch below)
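To make the decision rule concrete, here is a minimal runnable sketch (not from the slides) of a Naïve Bayes classifier for one ambiguous word, using bag-of-words context features, the standard naïve independence factorization $P(\vec{f} \mid s) \approx \prod_i P(f_i \mid s)$, and add-one smoothing as one common answer to the data sparsity problem. The training examples and sense labels below are hypothetical.

```python
# Naïve Bayes WSD sketch: s_hat = argmax_s P(s) * prod_i P(f_i | s),
# computed in log space to avoid floating-point underflow.
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)  # sense -> word -> count
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def classify(words, sense_counts, word_counts, vocab):
    total = sum(sense_counts.values())
    best_sense, best_logp = None, float('-inf')
    for sense, count in sense_counts.items():
        logp = math.log(count / total)  # log prior P(s)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in words:
            # Add-one (Laplace) smoothed P(w | s): unseen words get a
            # small nonzero probability instead of zeroing the product.
            logp += math.log((word_counts[sense][w] + 1) / denom)
        if logp > best_logp:
            best_sense, best_logp = sense, logp
    return best_sense

# Hypothetical sense-tagged examples for "bank".
train_data = [
    (['deposits', 'invests', 'mortgage'], 'bank/financial'),
    (['loan', 'interest', 'deposits'], 'bank/financial'),
    (['river', 'water', 'fishing'], 'bank/river'),
    (['muddy', 'river', 'shore'], 'bank/river'),
]
model = train(train_data)
print(classify(['interest', 'mortgage'], *model))  # -> bank/financial
```

Without the smoothing step, any context word unseen with a given sense would drive that sense's score to zero, which is exactly why sparsity is flagged as the problem here.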