CS 6322: Information Retrieval
Sanda Harabagiu
Lecture 8: Question Answering 2

Part II. Structure of main QA modules
- Question processing
- Document retrieval
- Answer extraction
Part III. Advanced topics in QA
- Keyword alternations
- Question caching
- Special cases of questions
- Statistical methods
- Question Treebanks
- Logic forms
- Semantic indexing

An Example
Q733: Who was the first Russian astronaut to walk in space?
POS tags: Who/WP was/VBD the/DT first/JJ Russian/NNP astronaut/NN to/TO walk/VB in/IN space/NN
[Figure: parse tree of the question (NP, PP, VP and S nodes) with the head words walk, space and astronaut propagated upward, and the resulting semantic representation connecting first, Russian, astronaut, walk and space. The answer type is PERSON.]

Detecting the Answer Type
1. Determine the category(ies) of the question stem
2. Select answer type nodes {A} having the same category as the question stem
3. Select node N that
   (a) is connected to the question stem
   (b) has highest connectivity in the semantic representation
4. Search for the word in node N along Answer hierarchies
5. Return the answer type as the top of the hierarchy found when N was located

Possible Answer Types
[Figure: answer type hierarchy.]
- TOP: PERSON, LOCATION, DATE, TIME, PRODUCT, NUMERICAL VALUE, MONEY, ORGANIZATION, MANNER, REASON
- Sub-types: DEGREE, DIMENSION, RATE, DURATION, PERCENTAGE, COUNT
- Concepts shown under the type nodes: time of day; midnight; prime time; clock time; hockey team; team, squad; institution, establishment; financial institution; educational institution; numerosity, multiplicity; integer, whole number; population; denominator; thickness; width, breadth; distance, length; altitude; wingspan

Examples
- What is the name of the actress that played in Shine?
  Semantic representation: What, name, actress, played, Shine. Answer type: PERSON
- What does the BMW company produce?
  Semantic representation: What, BMW, company, produce. Answer type: PRODUCT

Question Taxonomy
[Figure: mapping from the Question Taxonomy to the Answer Taxonomy. Labels: Question Taxonomy; Question Taxonomy Node; Question Semantic Representation (answer type, Question-STEM, word 1, word 2, word 3, ...); Question Reformulation Rules; Question Word Alternations; Mapping Answer Type; Answer Taxonomy (type i, type j, type k, answer type n).]

Named Entity Categories
- Named entity categories: date, time, organization, town, product, price, country, money, human, disease, phone number, continent, percent, province, other location, plant, mammal, alphabet, airport code, game, bird, reptile, University, dog breed, number, quantity, attraction
- Top Answer Taxonomy: DATE, TIME, ORGANIZATION, REASON, MANNER, NATIONALITY, PRODUCT, MONEY, LANGUAGE, MAMMAL, GAME, DOG BREED, LOCATION, REPTILE, NUMERICAL VALUE, QUOTATION, ALPHABET, PERCENTAGE

Mapping answer types into named entity categories
[Figure: links between the two columns below.]
- Answer Type: Person, Money, Speed, Duration, Amount
- Named Entity Category: human, money, price, quantity, number

Document Retrieval
Main approaches used so far: traditional IR and some NLP extensions
Indexing
- Word-based
- Named entities (terms and variants)
- Conceptual indexing
- Paragraph indexing
Retrieval
- Retrieve documents, then rank them
- Retrieve documents, extract passages, then rank passages
- Retrieve passages directly and rank them
Retrieval methods
- Vector model
- Boolean model

LIMSI
[Figure: architecture of the LIMSI system (Ferret et al., TREC-9, 2000). Labels: Q, NL Q. Analysis, Term Extraction, Search Engine, Doc, Indexing, Named Entity, Question/Sentence Matching, Ranking, Answer.]

Terminological variants for document selection
LIMSI's QALC System
- high-level indexes, comprising terms and variants
- 2-step procedure:
  1) automatic term extraction from questions (uses POS tagging and pattern matching)
  2) automatic document indexing (uses term and variant recognition)

Term Extraction in QALC
- Questions are tagged with Tree Tagger (Schmid 1999)
- Patterns of symbolic categories are used to extract terms from the tagged questions.
- The pattern used to extract terms is:
  (((JJ | NN | NP | VBG)? (JJ | NN | NP | VBG) (NP | NN)) | (VBD) | (NN) | (NP) | (CD))

Extraction Example
name/NN of/IN the/DT US/NP helicopter/NN pilot/NN shot/VBD down/PP
4 terms are acquired:
- US helicopter pilot
- helicopter pilot
- pilot
- shot

Variant Recognition
Uses FASTR (Jacquemin, ACL '99), a transformational shallow parser for the recognition of term occurrences and variants.
How?
- Terms are transformed into grammar rules, and the single words building these terms are extracted and linked to their morphological and semantic families.

Morphological and Semantic Families
The morphological family of w is M(w), the words returned by the CELEX database as having the same root morpheme as w.
Example: M(maker) = {maker, make, to make, to remake}
The semantic family of w is S(w), all the WordNet synsets containing w.
Example: S(maker) = {maker, manufacturer, shaper, manufacturing business} (2 senses!)

Variants
Variant patterns that rely on morphological and semantic families are generated through METARULES.
Example: the pattern N to SemArg
  V M(‘maker’) RP? PREP? (ART (NN | NP)? PREP)? ART? (JJ | NN | NP | VBD | VBG){0-3} N S(‘car’)
extracts ‘making many automobiles’ as a variant of ‘car manufacturer’.
Problem: some incorrect variants are extracted as well, e.g. ‘make those cuts in auto’.

Document Selection
The result of NLP-based indexing is a list of term occurrences composed of:
- a document
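As a rough illustration of the QALC term-extraction pattern, the Python sketch below applies it to a POS-tagged question. The enumeration strategy (longest match at each position, then every head-final suffix of that match) is an assumption, chosen because it produces 'US helicopter pilot', 'helicopter pilot' and 'pilot' from a single noun phrase; the slides do not say how QALC enumerates matches, and the function and variable names are hypothetical.

```python
# Tag classes read off the slide's pattern:
# ((JJ|NN|NP|VBG)? (JJ|NN|NP|VBG) (NP|NN)) | VBD | NN | NP | CD
MODIFIER_TAGS = {"JJ", "NN", "NP", "VBG"}
HEAD_TAGS = {"NP", "NN"}
SINGLE_TAGS = {"VBD", "NN", "NP", "CD"}


def match_length(tags, i):
    """Length of the longest pattern match starting at index i (0 if none)."""
    for n in (3, 2):
        window = tags[i:i + n]
        if (len(window) == n
                and all(t in MODIFIER_TAGS for t in window[:-1])
                and window[-1] in HEAD_TAGS):
            return n
    return 1 if tags[i] in SINGLE_TAGS else 0


def extract_terms(tagged):
    """Assumed strategy: longest match first, then its head-final suffixes."""
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    terms, i = [], 0
    while i < len(tagged):
        n = match_length(tags, i)
        if n == 0:
            i += 1
            continue
        # Every suffix of a match still ends in the head noun and
        # still satisfies the pattern, so each one is kept as a term.
        for start in range(i, i + n):
            terms.append(" ".join(words[start:i + n]))
        i += n
    return terms


question = [("name", "NN"), ("of", "IN"), ("the", "DT"), ("US", "NP"),
            ("helicopter", "NN"), ("pilot", "NN"), ("shot", "VBD"),
            ("down", "PP")]
print(extract_terms(question))
# ['name', 'US helicopter pilot', 'helicopter pilot', 'pilot', 'shot']
```

On the slide's example this sketch also returns 'name', which the slide's list of four terms omits; whatever step filters it out is not described in the slides.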
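The variant metarule from the Variants slide can be sketched in Python as a regular expression over a tag sequence, combined with family lookups for the first and last word. This is a minimal sketch, not FASTR itself. M('maker') is copied from the slides; S('car') is an assumed subset of the WordNet synset for 'car'. The hand-assigned lemmas and tags, the coarse V and N tags, and the name `is_variant` are all hypothetical.

```python
import re

# M('maker') as given on the slide; S('car') is an assumed subset of the
# WordNet synset for 'car' (the slides do not list it).
M = {"maker": {"maker", "make", "to make", "to remake"}}
S = {"car": {"car", "auto", "automobile"}}

# The slide's pattern, written over a space-separated tag string:
# V RP? PREP? (ART (NN|NP)? PREP)? ART? (JJ|NN|NP|VBD|VBG){0-3} N
METARULE = re.compile(
    r"V( RP)?( PREP)?( ART( (NN|NP))? PREP)?( ART)?"
    r"( (JJ|NN|NP|VBD|VBG)){0,3} N"
)


def is_variant(tokens):
    """tokens: list of (lemma, tag) pairs, lemmatized and tagged by hand.

    True when the tag sequence fits the metarule, the verb's lemma is in
    M('maker') and the final noun's lemma is in S('car').
    """
    tags = " ".join(tag for _, tag in tokens)
    if not METARULE.fullmatch(tags):
        return False
    return tokens[0][0] in M["maker"] and tokens[-1][0] in S["car"]


# 'making many automobiles': the correct variant from the slide.
good = [("make", "V"), ("many", "JJ"), ("automobile", "N")]
# 'make those cuts in auto': the incorrect variant the slide warns about.
bad = [("make", "V"), ("those", "ART"), ("cut", "NN"),
       ("in", "PREP"), ("auto", "N")]

print(is_variant(good))  # True
print(is_variant(bad))   # True: the pattern alone cannot reject it
```

Both phrases satisfy the pattern, which matches the slide's point that purely syntactic metarules over word families also admit some incorrect variants.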