A Hybrid Approach for Answering Definitional Questions

Sasha Blair-Goldensohn and Kathleen R. McKeown and Andrew Hazen Schlaikjer
Department of Computer Science
Columbia University
New York, NY 10027
{sashabg,kathy,hazen}@cs.columbia.edu

Abstract

We present DefScriber, a fully implemented system that combines knowledge-based and statistical methods in forming multi-sentence answers to open-ended definitional questions of the form, “What is X?” We show how a set of definitional predicates proposed as the knowledge-based side of our approach can be used to guide the selection of definitional sentences. Finally, we present results of an evaluation of definitions generated by DefScriber from Internet documents.

1 Introduction

Question answering (QA) systems have reached a remarkably high level of performance (NIS, 2002) due to the integration of techniques from computational linguistics and information retrieval. Much of the effort in QA until now has gone into building short answer QA systems, which answer questions for which the correct answer is a single word or short phrase. Many questions are not in this class; they are better answered with a longer description or explanation. Producing these kinds of answers is the focus of long-answer QA, an area still in early stages of development but already the subject of several recent pilot studies (ARD, 2002).

Our work is concerned specifically with definitional QA - answering questions of the form, “What is X?” with multi-sentence responses which we provisionally call definitional descriptions. Definitional descriptions can be thought of as longer and more descriptive than dictionary definitions, while shorter than definitions found in an encyclopedia.
DefScriber is a fully implemented system that generates these descriptions using an innovative combination of top-down and bottom-up techniques.

Top-down techniques in DefScriber are based on key elements of definitions as identified in the literature and in our own empirical study of definitions. One such element is information on the term’s category (Genus) and/or important properties (Species) (Sager and L’Homme, 1994). For instance, category, or Genus, information about the term “Hajj” is given in the sentence “The Hajj is a type of ritual.” DefScriber specifically searches for sentences that convey these definitional information types, or predicates, in building a definitional description.

Since relevant information for a given definition may not be entirely modeled by predicates, we complement our top-down approach with data-driven techniques adapted from work in multi-document summarization. These techniques take advantage of redundancy on the web to identify good definitional sentences. Using centroid-based metrics and clustering, DefScriber finds similarities in documents that focus on a given term and includes them in the response. These techniques allow us to include core information in the definition even when we don’t have a specific predicate to model its semantic type.

Lastly, we give evaluation results which demonstrate the promise of this combined approach for generating definitions of ad hoc terms from a large and heterogenous document collection, the Internet.

2 Related Work

Our work on generation of definitions builds on research in summarization and in generation. Previous work in multi-document summarization has developed solutions that identify similarities across documents as the basis for summary content (Carbonell and Goldstein, 1998; Radev et al., 2000; Hovy and Lin, 1997; Mani and Bloedorn, 1997).
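As a rough illustration of what a centroid-based, similarity-driven ranking of candidate sentences can look like, the sketch below scores each sentence by its cosine similarity to the summed term-count vector of all candidates. This is not DefScriber's implementation: the function names (tokenize, cosine, rank_by_centroid), the raw term-count weighting, and the absence of stopword filtering and clustering are all simplifying assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_centroid(sentences):
    """Return the sentences ordered from most to least similar to the centroid.

    The centroid here is simply the sum of all sentence term-count vectors
    (an assumption for this sketch; weighted variants such as TF-IDF are common).
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)
    scored = [(cosine(vec, centroid), s) for vec, s in zip(vectors, sentences)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored]


if __name__ == "__main__":
    candidates = [
        "The Hajj is a pilgrimage to Mecca.",
        "The Hajj is a pilgrimage that begins in the twelfth month.",
        "Rochester is a city in New York.",
    ]
    for sentence in rank_by_centroid(candidates):
        print(sentence)
```

In this toy input, the sentence that shares the fewest terms with the other candidates ends up last, which is the intuition behind using redundancy across documents to favor core definitional content.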
Whether similarities are included through sentence extraction or information fusion (Barzilay et al., 1999), all of these approaches are data-driven because similarities in the data determine content.

Figure 1: DefScriber creates a descriptive definition of the term “Hajj”
Pipeline stages: Document Retrieval, Predicate Identification, Data-Driven Analysis, Definition Creation
Input: T = “What is the Hajj?”, N = 20, L = 8
Intermediate results: 11 Web documents, 1127 total sentences; 383 Non-specific Definitional sentences; 9 Genus-Species Sentences; Sentence clusters, importance ordering
Genus-Species sentences shown in the figure:
The Hajj, or pilgrimage to Makkah (Mecca), is the central duty of Islam.
The Hajj is a milestone event in a Muslim's life.
The hajj is one of five pillars that make up the foundation of Islam.
The Hajj is a week-long pilgrimage that begins in the 12th month of the Islamic lunar calendar.
Hajj is the highest of all Muslim practices, even if less than 10% of all Muslims ever manage to perform it. ...
Sentences shown in the figure:
The Hajj, or pilgrimage to Makkah [Mecca], is the central duty of Islam. More than two million Muslims are expected to take the Hajj this year. Muslims must perform the hajj at least once in their lifetime if physically and financially able. The Hajj is a milestone event in a Muslim's life. The annual hajj begins in the twelfth month of the Islamic year (which is lunar, not solar, so that hajj and Ramadan fall sometimes in summer, sometimes in winter). The Hajj is a week-long pilgrimage that begins in the 12th month of the Islamic lunar calendar. Another ceremony, which was not connected with the rites of the Ka'ba before the rise of Islam, is the Hajj, the annual pilgrimage to 'Arafat, about two miles east of Mecca, toward Mina. The hajj is one of five pillars that make up the foundation of Islam. Not only was the kissing of this stone incorporated into Islam, but the whole form of the Hajj Pilgrimage today is fundamentally that of the Arabs before Islam. Rana Mikati of Rochester will make a pilgrimage, or Hajj, to the holy site of Mecca next week.

Top-down approaches are more often found in generation. Schemas (McKeown, 1985), rhetorical structure theory (Marcu, 1997; Moore and Paris, 1992) and plan-based approaches (Reiter and Dale, 2000) are examples of top-down approaches, where the schema or plan specifies the kind of information to include in a generated text. In early work, schemas were used to generate definitions (McKeown, 1985), but the information for the definitional text was found in a knowledge base. In more recent work, information extraction is used to create a top-down approach to summarization (Radev and McKeown, 1998) by searching for specific types of information which can be extracted from the input texts (e.g., perpetrator in a news article on terrorism). Here, the summary briefs the user on domain-specific information assumed a priori to be of interest.

Other long-answer QA approaches (ARD, 2002) are still in early stages and,

