Question-Answering by Predictive Annotation

John Prager, Eric Brown, Anni Coden
IBM T.J. Watson Research Center, Yorktown Heights, N.Y. 10598
jprager/ewb/anni@us.ibm.com

Dragomir Radev
University of Michigan, Ann Arbor, Michigan
radev@umich.edu

Abstract

We present a new technique for question answering called Predictive Annotation. Predictive Annotation identifies potential answers to questions in text, annotates them accordingly and indexes them. This technique, along with a complementary analysis of questions, passage-level ranking and answer selection, produces a system effective at answering natural-language fact-seeking questions posed against large document collections. Experimental results show the effects of different parameter settings and lead to a number of general observations about the question-answering problem.

1. Introduction

Question-answering is an area of Information Retrieval (IR) that is attracting increasingly more attention, as evidenced by new tracks in conferences such as AAAI [1] and TREC [5,14], and several Web sites. A Question-answering system searches a large text collection and finds a short phrase or sentence that precisely answers a user's question. To solve the Question-answering problem, we might first turn to traditional IR techniques, which have been applied successfully to large-scale text search problems [6]. Unfortunately, traditional text search engines typically return lists of documents in response to a user's query, and therefore provide inappropriate solutions to this problem. Alternatively, the Natural Language Processing (NLP) and Information Extraction (IE) communities have developed techniques for extracting very precise answers from text. However, these communities use domain-specific techniques applied to relatively small text collections. It would appear that an approach combining the strengths of IR and NLP/IE might provide an appropriate way to answer questions using large bodies of text.

The extent to which information extraction or deep parsing techniques must be applied to solve the Question-answering problem is an open question. We explore this question and present results showing that combining shallow NLP/IE techniques with a custom text search produces an effective Question-answering system. In particular, we describe a new text-processing technique called Predictive Annotation (PA) and analyze its effectiveness using the TREC8 benchmark. Our analysis reveals a number of characteristics of the Question-answering problem, leading to a variety of widely applicable system parameter settings.

This paper is organized into five sections. In Section 2 we describe the search system we implemented. In Section 3 we analyze the performance of the system and attempt to determine the limits of our approach. We discuss related work in Section 4, and in Section 5 we conclude and discuss future work.
2. The GuruQA System

Our approach is based on the following observations about fact-seeking questions:

• Questions can be classified by the kind of answer they are seeking.
• Answers are usually in the form of phrases.
• Answer phrases can be classified by the same scheme as the questions.
• Answers can be extracted from text using shallow parsing techniques.
• The context of the answer phrase that validates it as an answer to the question is usually a small fraction of the document it is embedded in.

We took our existing prototype search engine Guru [19] and modified it in three significant ways. We modified the query analysis to detect the question type and to modify the question accordingly. We modified the indexing process to perform shallow linguistic analysis of the text and to identify and annotate appropriate phrases with class labels. We modified the search engine to rank passages instead of documents, and to use a simple ranking formula. These steps combined to produce a system that performed well in the TREC8 QA track. The complete GuruQA system is depicted in Figure 1. Two components, Query Analysis and Textract/Resporator, involve the generation of QA-Tokens, which we now describe.

[Figure 1. Block diagram of the GuruQA system: Query Analysis, Textract/Resporator, Indexer, the GuruQA search engine, Ranked HitList, AnSel (answer selection), Hit List.]

QA-Tokens

We observed earlier that both questions and answer phrases could be classified. The question immediately arises of what the most appropriate granularity of such a classification is. The coarsest useful granularity would be according to the so-called "wh-words"; a much finer level would be to use, at least for noun phrases, the basic object categories of Rosch et al. [11]. We chose an intermediate level of about 20 categories which correspond fairly closely to the named-entity types of [13]: each such category is identified by a construct we call a QA-Token. The QA-Token serves both as a category label and as a text string used in the search process. Table 1 lists our QA-Tokens, along with associated question types and representative matching text phrases.

QA-Token     Question type                   Example
PLACE$       Where                           In the Rocky Mountains
COUNTRY$     Where/What country              United Kingdom
STATE$       Where/What state                Massachusetts
PERSON$      Who                             Albert Einstein
ROLE$        Who                             Doctor
NAME$¹       Who/What/Which/Where/Name the   Shakespeare Festival
ORG$         Who/What                        The US Post Office
DURATION$    How long                        For 5 centuries
AGE$         How old                         30 years old
YEAR$        When/What year                  1999
TIME$        When                            In the afternoon
DATE$        When/What date                  July 4th, 1776
VOLUME$      How big                         3 gallons
AREA$        How big                         4 square inches
LENGTH$      How big/long/high               5 miles
WEIGHT$      How big/heavy                   25 tons
NUMBER$      How many                        1,234.5
METHOD$      How                             By rubbing
RATE$        How much                        50 per cent
MONEY$       How much                        $4 million

Table 1. List of QA-Tokens used in GuruQA.

¹ The NAME$ token was used for proper names that Textract was unable to subclassify as Person, Place or Organization.
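The mapping in Table 1 can be made concrete with a small sketch. The Python fragment below is a hypothetical simplification of the query-analysis step, not the actual Textract/Resporator code: the pattern list and the classify_question() helper are assumptions introduced only for illustration. It matches the leading words of a question against hand-built patterns and returns candidate QA-Tokens, falling back to the catch-all NAME$ when nothing more specific applies.

```python
import re

# Ordered (regex, QA-Tokens) pairs loosely derived from Table 1; more
# specific question patterns come before more general ones. This pattern
# set is an illustrative assumption, not the paper's actual rules.
QUESTION_PATTERNS = [
    (r"^what country\b",          ["COUNTRY$"]),
    (r"^what state\b",            ["STATE$"]),
    (r"^what year\b",             ["YEAR$"]),
    (r"^what date\b",             ["DATE$"]),
    (r"^how long\b",              ["DURATION$", "LENGTH$"]),
    (r"^how old\b",               ["AGE$"]),
    (r"^how many\b",              ["NUMBER$"]),
    (r"^how much\b",              ["MONEY$", "RATE$"]),
    (r"^how (big|high|heavy)\b",  ["VOLUME$", "AREA$", "LENGTH$", "WEIGHT$"]),
    (r"^how\b",                   ["METHOD$"]),
    (r"^when\b",                  ["DATE$", "TIME$", "YEAR$"]),
    (r"^where\b",                 ["PLACE$", "COUNTRY$", "STATE$"]),
    (r"^who\b",                   ["PERSON$", "ROLE$", "NAME$", "ORG$"]),
    (r"^(what|which|name the)\b", ["NAME$", "ORG$"]),
]

def classify_question(question):
    """Return candidate QA-Tokens for a fact-seeking question."""
    q = question.lower().strip()
    for pattern, tokens in QUESTION_PATTERNS:
        if re.match(pattern, q):
            return tokens
    return ["NAME$"]  # catch-all for questions we cannot subclassify

if __name__ == "__main__":
    print(classify_question("How long did the Hundred Years' War last?"))
    # ['DURATION$', 'LENGTH$']
    print(classify_question("Who discovered penicillin?"))
    # ['PERSON$', 'ROLE$', 'NAME$', 'ORG$']
```

Because a QA-Token doubles as a text string used in the search process, a classifier along these lines lets query analysis rewrite the question into a query that asks the engine for passages containing phrases annotated with the corresponding token(s).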
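Predictive Annotation itself happens on the indexing side: shallow analysis finds phrases that could answer some class of question, labels them with QA-Tokens, and indexes the tokens alongside the ordinary words. The sketch below is again a hedged approximation under assumed details; the few regular expressions and the annotate() and index_terms() helpers are invented stand-ins for Textract/Resporator, shown only to make the idea of indexing predicted answers concrete.

```python
import re

# Toy shallow annotators: each QA-Token is paired with a regex that finds
# phrases likely to answer questions of that type. These patterns are
# illustrative stand-ins for Textract/Resporator's named-entity analysis.
ANNOTATORS = [
    ("YEAR$",     re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")),
    ("MONEY$",    re.compile(r"\$\s?\d[\d,.]*(\s(million|billion))?")),
    ("DURATION$", re.compile(r"\bfor \d+ (years?|centuries|months?|days?)\b", re.I)),
    ("RATE$",     re.compile(r"\b\d+(\.\d+)? per cent\b", re.I)),
]

def annotate(text):
    """Return (start, end, qa_token, phrase) spans found in the text."""
    spans = []
    for token, pattern in ANNOTATORS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), token, m.group(0)))
    return sorted(spans)

def index_terms(text):
    """Yield (term, offset) pairs: ordinary words plus QA-Token annotations.

    Indexing the QA-Token at the phrase's offset lets a query containing,
    say, DURATION$ match any passage where a duration phrase was predicted
    to be a potential answer.
    """
    for m in re.finditer(r"\w+", text):
        yield (m.group(0).lower(), m.start())
    for start, _end, token, _phrase in annotate(text):
        yield (token, start)

if __name__ == "__main__":
    passage = "The Hundred Years' War lasted for 116 years and ended in 1453."
    for term, offset in index_terms(passage):
        print(offset, term)
```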

