Word Sense Disambiguation
CMSC 723: Computational Linguistics I ― Session #11
Jimmy Lin
The iSchool, University of Maryland
Wednesday, November 11, 2009
Material drawn from slides by Saif Mohammad and Bonnie Dorr

Progression of the Course
- Words: finite-state morphology; part-of-speech tagging (TBL + HMM)
- Structure: CFGs + parsing (CKY, Earley); n-gram language models
- Meaning!

Today's Agenda
- Word sense disambiguation
- Beyond lexical semantics
- Semantic attachments to syntax
- Shallow semantics: PropBank

Recap: Word Sense
From WordNet, the senses of "pipe":
Noun
- {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
- {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
- {pipe, tube} (a hollow cylindrical shape)
- {pipe} (a tubular wind instrument)
- {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
Verb
- {shriek, shrill, pipe up, pipe} (utter a shrill cry)
- {pipe} (transport by pipeline) "pipe oil, water, and gas into the desert"
- {pipe} (play on a pipe) "pipe a tune"
- {pipe} (trim with piping) "pipe the skirt"

Word Sense Disambiguation
- Task: automatically select the correct sense of a word
  - Lexical sample
  - All-words
- Theoretically useful for many applications:
  - Semantic similarity (remember from last time?)
  - Information retrieval
  - Machine translation
  - ...
- Solution in search of a problem? Why?
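As an aside (not in the original slides), a sense inventory like the one above is easy to inspect programmatically. A minimal sketch, assuming NLTK is installed and the WordNet data has been downloaded:

```python
# List the WordNet senses of "pipe" with NLTK.
# Assumes: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

for syn in wn.synsets('pipe'):
    # Each synset is one sense: its member lemmas, POS tag, and gloss.
    lemmas = ', '.join(lemma.name() for lemma in syn.lemmas())
    print(f"[{syn.pos()}] {{{lemmas}}}: {syn.definition()}")
```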
How big is the problem?
- Most words in English have only one sense
  - 62% in Longman's Dictionary of Contemporary English (LDOCE)
  - 79% in WordNet
- But the others tend to have several senses
  - Average of 3.83 senses per word in LDOCE
  - Average of 2.96 in WordNet
- Ambiguous words are more frequently used
  - In the British National Corpus, 84% of word instances have more than one sense
- Some senses are more frequent than others

Ground Truth
- Which sense inventory do we use?
- Issues there?
- Application specificity?

Corpora
- Lexical sample:
  - line-hard-serve corpus (4k sense-tagged examples)
  - interest corpus (2,369 sense-tagged examples)
  - ...
- All-words:
  - SemCor (234k words, subset of the Brown Corpus)
  - Senseval-3 (2,081 tagged content words from 5k total words)
  - ...
- Observations about the size?

Evaluation
- Intrinsic: measure accuracy of sense selection with respect to ground truth
- Extrinsic: integrate WSD into a bigger end-to-end system (e.g., machine translation or information retrieval) and compare performance with and without it

Baseline + Upper Bound
- Baseline: most frequent sense
  - Equivalent to "take the first sense" in WordNet
  - Does surprisingly well: 62% accuracy in this case!
- Upper bound:
  - Fine-grained WordNet senses: 75-80% human agreement
  - Coarser-grained inventories: 90% human agreement possible
- What does this mean?

WSD Approaches
- Depending on use of manually created knowledge sources:
  - Knowledge-lean
  - Knowledge-rich
- Depending on use of labeled data:
  - Supervised
  - Semi- or minimally supervised
  - Unsupervised

Lesk's Algorithm
- Intuition: note the word overlap between the context and the dictionary entries
- Unsupervised, but knowledge-rich
- Example: disambiguate "bank" in "The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities." against its WordNet glosses

Lesk's Algorithm (continued)
- Simplest implementation: count overlapping content words between glosses and context (see the sketch below)
- Lots of variants:
  - Include the examples in dictionary definitions
  - Include hypernyms and hyponyms
  - Give more weight to larger overlaps (e.g., bigrams)
  - Give extra weight to infrequent words (e.g., idf weighting)
  - ...
- Works reasonably well!
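A minimal sketch of the simplest implementation (not from the slides): score each WordNet sense of the target word by how many content words its gloss shares with the context. The stopword list here is a crude illustrative stand-in for real content-word filtering.

```python
# Simplified Lesk: pick the sense whose gloss shares the most content
# words with the target word's context. Assumes NLTK with WordNet data.
from nltk.corpus import wordnet as wn

# Crude stopword list, standing in for real content-word filtering.
STOPWORDS = {'the', 'a', 'an', 'of', 'in', 'on', 'to', 'it', 'and', 'or',
             'that', 'is', 'are', 'will', 'can', 'because', 'for', 'by'}

def simplified_lesk(word, context_sentence):
    context = {w.strip('.,').lower() for w in context_sentence.split()}
    context -= STOPWORDS
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split()) - STOPWORDS
        overlap = len(gloss & context)  # count shared content words
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sentence = ("The bank can guarantee deposits will eventually cover future "
            "tuition costs because it invests in adjustable-rate mortgage "
            "securities.")
sense = simplified_lesk('bank', sentence)
print(sense, '->', sense.definition())
```

For comparison, NLTK also ships its own variant as nltk.wsd.lesk.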
Supervised WSD: NLP meets ML
- WSD as a supervised classification task
- Train a separate classifier for each word
- Three components of a machine learning problem:
  - Training data (corpora)
  - Representations (features)
  - Learning method (algorithm, model)

Supervised Classification
[Diagram: at training time, labeled training data passes through a representation function into a supervised machine learning algorithm, which produces a classifier; at testing time, the classifier assigns one of the labels (label1, label2, label3, label4) to an unlabeled document.]

Three Laws of Machine Learning
- Thou shalt not mingle training data with test data
- Thou shalt not mingle training data with test data
- Thou shalt not mingle training data with test data
- But what do you do if you need more test data?

Features
- Possible features:
  - POS and surface form of the word itself
  - Surrounding words and their POS tags
  - Positional information of surrounding words and POS tags
  - Same as above, but with n-grams
  - Grammatical information
  - ...
- Richness of the features?
  - Richer features = the ML algorithm does less of the work
  - More impoverished features = the ML algorithm does more of the work

Classifiers
- Once we cast the WSD problem as supervised classification, many learning techniques are possible:
  - Naïve Bayes (the thing to try first)
  - Decision lists
  - Decision trees
  - MaxEnt
  - Support vector machines
  - Nearest-neighbor methods
  - ...

Classifier Tradeoffs
- Which classifier should I use? It depends on:
  - Number of features
  - Types of features
  - Number of possible values for a feature
  - Noise
  - ...
- General advice:
  - Start with Naïve Bayes
  - Use decision trees/lists if you want to understand what the classifier is doing
  - SVMs often give state-of-the-art performance
  - MaxEnt methods also work well

Naïve Bayes
- Pick the sense that is most probable given the context
- Context represented by a feature vector $\vec{f}$:
  $\hat{s} = \arg\max_{s \in S} P(s \mid \vec{f})$
- By Bayes' Theorem:
  $\hat{s} = \arg\max_{s \in S} \frac{P(\vec{f} \mid s)\,P(s)}{P(\vec{f})} = \arg\max_{s \in S} P(\vec{f} \mid s)\,P(s)$
  (the denominator $P(\vec{f})$ is constant across senses, so it can be dropped)
- Problem: data sparsity! (see the sketch below)
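To make the decision rule concrete, here is a minimal runnable sketch (not from the slides) of a Naïve Bayes classifier for one ambiguous word, using bag-of-words context features, the standard naïve independence factorization $P(\vec{f} \mid s) \approx \prod_i P(f_i \mid s)$, and add-one smoothing as one common answer to the data sparsity problem. The training examples and sense labels below are hypothetical.

```python
# Naïve Bayes WSD sketch: s_hat = argmax_s P(s) * prod_i P(f_i | s),
# computed in log space to avoid floating-point underflow.
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)  # sense -> word -> count
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def classify(words, sense_counts, word_counts, vocab):
    total = sum(sense_counts.values())
    best_sense, best_logp = None, float('-inf')
    for sense, count in sense_counts.items():
        logp = math.log(count / total)  # log prior P(s)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in words:
            # Add-one (Laplace) smoothed P(w | s): unseen words get a
            # small nonzero probability instead of zeroing the product.
            logp += math.log((word_counts[sense][w] + 1) / denom)
        if logp > best_logp:
            best_sense, best_logp = sense, logp
    return best_sense

# Hypothetical sense-tagged examples for "bank".
train_data = [
    (['deposits', 'invests', 'mortgage'], 'bank/financial'),
    (['loan', 'interest', 'deposits'], 'bank/financial'),
    (['river', 'water', 'fishing'], 'bank/river'),
    (['muddy', 'river', 'shore'], 'bank/river'),
]
model = train(train_data)
print(classify(['interest', 'mortgage'], *model))  # -> bank/financial
```

Without the smoothing step, any context word unseen with a given sense would drive that sense's score to zero, which is exactly why sparsity is flagged as the problem here.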