USING SEMANTIC REPRESENTATIONS IN QUESTION ANSWERING

Sameer S. Pradhan, Valerie Krugler, Wayne Ward, Dan Jurafsky and James H. Martin
Center for Spoken Language Research
University of Colorado
Boulder, CO 80309-0594, USA

This work was supported by ARDA through the AQUAINT program.

ABSTRACT

This paper describes the architecture of a Question Answering system for answering complex questions that require the integration of various techniques such as Robust Semantics, Event Detection, Information Fusion and Summarization. The focus of the paper is on a general Semantic Representation and how it can be used in the question answering process. We describe the vision for the system, report on its current state of development, and evaluate its accuracy on TREC-9 and TREC-10 questions. We also discuss a confidence annotation scheme and evaluate it using the NIST scoring metric.

1. INTRODUCTION

The Center for Spoken Language Research (CSLR) at the University of Colorado and Columbia University are collaborating to develop a new technology for question answering. This project is supported by the ARDA AQUAINT (Advanced QUestion and Answering for INTelligence) program. The project proposes to integrate robust semantics, event detection, information fusion, and summarization technologies to enable a multimedia question answering system. The goal is to develop a system capable of answering complex questions: questions that require interacting with the user to refine and clarify the context of the question, whose answers may be located in non-homogeneous databases of speech and text, and for which presenting the answer requires combining and summarizing information from multiple sources and over time. Generating a satisfactory answer to complex questions requires the ability to collect all relevant answers from multiple documents in different media, weigh their relative importance, and generate a coherent summary of the multiple facts and opinions reported.

We propose to integrate four core technologies:

- Semantic annotation (CSLR) – We use a shallow, domain-independent semantic representation and a statistical semantic parser for building this representation. The semantic representation is a basic building block for dialog management, event detection, and information fusion.

- Context management (CSLR) – We are developing a dialogue interface to allow the system to carry on a focused dialogue with users to answer queries. The interface will maintain context through the interaction to allow followup questions and will conduct clarification dialogues with the user to resolve ambiguities and refine queries.

- Event recognition and information tracking (Columbia) – An event is an activity with a starting and ending point, involving a fixed set of participants. We propose to identify atomic events within each input document by extracting information about named entities that play a prominent role in the document and about the time period that the text covers. We will rely on the semantic representation of documents to identify participants and their functions.

- Information fusion and summary generation (Columbia) – Rather than listing a set of relevant responses to a question, we will investigate techniques to integrate summarization and language generation to produce a brief, coherent, and fluent answer. Critical to this task is the problem of selecting fragments of text from different documents that should be included in the answer and determining how to combine them, removing redundancy and integrating complementary information fluently.

This paper will focus on the domain-independent semantic representation and how we propose to use it for question answering applications.

2. CUAQ SYSTEM

2.1. Semantic Representation

The novel feature of our approach is the use of shallow semantic representations to enhance potential answer identification. Most successful systems first identify a list of potential answer candidates using pure word-based metrics. Varying granularity of syntactic and semantic information is then used to re-rank those candidates [8, 6]. However, most of these semantic units are quite specific in what they label. We identify a small set of thematic roles – viz., agent, patient, manner, degree, cause, result, location, temporal, force, goal, path, percept, proposition, source, state, and topic – in the candidate answer sentences, using a statistical classifier [5]. The classifier is trained on the FrameNet database [2], an online lexical resource for English based on frame semantics. It contains hand-tagged semantic annotations of example sentences from a large text corpus, and covers thousands of lexical items – verbs, nouns, and adjectives – representative of a wide range of semantic domains. We map the more specific frames and the corresponding frame elements to this reduced, more general set before training the classifier.
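The paper does not give code for this mapping step. The fragment below is a minimal sketch, under the assumption of a hand-built lookup table from (frame, frame element) pairs to the reduced role set; the table entries and all identifiers (FRAME_ELEMENT_TO_ROLE, collapse_roles) are illustrative, not taken from the authors' system.

```python
# Minimal sketch (not from the paper): collapsing specific FrameNet frame
# elements to the reduced set of thematic roles before training the role
# classifier. The mapping entries below are illustrative examples only.

REDUCED_ROLES = {
    "agent", "patient", "manner", "degree", "cause", "result", "location",
    "temporal", "force", "goal", "path", "percept", "proposition",
    "source", "state", "topic",
}

# (frame, frame element) -> reduced thematic role; hypothetical entries.
FRAME_ELEMENT_TO_ROLE = {
    ("Commerce_buy", "Buyer"): "agent",
    ("Commerce_buy", "Goods"): "patient",
    ("Arriving", "Goal"): "goal",
    ("Arriving", "Time"): "temporal",
}


def collapse_roles(annotated_constituents):
    """Map (text span, frame, frame element) triples to (text span, role) pairs.

    Constituents whose frame element has no entry in the table are dropped;
    a full table would cover the FrameNet frame-element inventory.
    """
    examples = []
    for span, frame, frame_element in annotated_constituents:
        role = FRAME_ELEMENT_TO_ROLE.get((frame, frame_element))
        if role in REDUCED_ROLES:
            examples.append((span, role))
    return examples


if __name__ == "__main__":
    annotations = [("Chuck", "Commerce_buy", "Buyer"),
                   ("a new car", "Commerce_buy", "Goods")]
    print(collapse_roles(annotations))
    # [('Chuck', 'agent'), ('a new car', 'patient')]
```

The resulting (span, role) pairs would then serve as training examples for the statistical classifier described above.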
2.2. Architecture

We plan to use two generally available Information Retrieval search engines: 1) mg (Managing Gigabytes) [9], and 2) Google [3]. The following sequence of actions will be taken in response to an input query (a schematic sketch of these steps is given at the end of this section):

1. Question type classification – Identify the Named Entity and Thematic Role of the expected answer type. This also defines a set of answer type patterns, and includes named entity tagging and parsing the question for thematic roles.

2. Focus identification – Identify salient words/phrases in the question that are very likely to be present, in one form or another, in the answer string.

3. Extract a set of query words from the question, and apply semantic expansion to them.

4. Submit the query words to the IR engine and get back a rank-ordered set of documents.

5. Keep the top N (approximately 500) documents and prune the rest.

6. Segment the documents into paragraphs and prune all but the top N′ paragraphs.

7. Generate scoring features for the paragraphs, including named entity tagging and parsing of the paragraphs to add thematic roles.

8. Re-rank documents based on the set of features that we compute, including answer type patterns. Some of the answer type patterns are based on the semantic labels.

9. Compute a confidence score for each paragraph (that it contains some relevant information). This includes the N-best count as one of the features.

10. Send tagged paragraphs that exceed a confidence threshold for summarization.

For the problem of question answering, we are more concerned with precision than recall, so we have to be careful in expanding the query words to get
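The ten steps above are described only in prose; the following is a minimal, self-contained sketch of that control flow, not the authors' implementation. Every heuristic and identifier here is an assumption added for illustration: stopword-based focus words, word-overlap scoring, and a toy in-memory corpus standing in for the mg/Google retrieval step and for answer-type pattern matching.

```python
# Minimal, runnable sketch of the ten-step pipeline in Section 2.2.
# NOT the authors' system: each step uses a placeholder heuristic so that
# the end-to-end control flow can execute.

import re
from dataclasses import dataclass, field

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "what", "who",
             "when", "where", "which", "did", "does", "to", "and"}


@dataclass
class Paragraph:
    text: str
    features: dict = field(default_factory=dict)
    confidence: float = 0.0


def tokens(text):
    return [w.lower() for w in re.findall(r"[A-Za-z0-9']+", text)]


def classify_question_type(question):
    # Step 1 (placeholder): map the question word to an expected answer type.
    first = tokens(question)[0]
    return {"who": "PERSON", "when": "TEMPORAL", "where": "LOCATION"}.get(first, "OTHER")


def identify_focus(question):
    # Step 2 (placeholder): salient content words likely to recur in answers.
    return [w for w in tokens(question) if w not in STOPWORDS]


def expand_query(words):
    # Step 3 (placeholder): conservative semantic expansion; identity here,
    # since the paper stresses precision over recall.
    return list(words)


def answer_question(question, corpus, top_n_docs=500, top_n_paras=50, threshold=0.5):
    answer_type = classify_question_type(question)          # step 1
    focus = identify_focus(question)                        # step 2
    query = expand_query(focus)                             # step 3

    # Steps 4-5: rank documents by query-word overlap and keep the top N.
    ranked = sorted(corpus, key=lambda d: len(set(query) & set(tokens(d))),
                    reverse=True)[:top_n_docs]

    # Step 6: segment documents into paragraphs, prune to the top N'.
    paras = [Paragraph(p) for d in ranked
             for p in d.split("\n\n") if p.strip()][:top_n_paras]

    # Steps 7-9: generate simple scoring features, estimate a per-paragraph
    # confidence, and re-rank (answer-type pattern matching omitted).
    for p in paras:
        overlap = len(set(focus) & set(tokens(p.text)))
        p.features = {"focus_overlap": overlap, "answer_type": answer_type}
        p.confidence = overlap / max(len(focus), 1)
    paras.sort(key=lambda p: p.confidence, reverse=True)

    # Step 10: hand confident paragraphs to the summarizer (joined here).
    return " ".join(p.text for p in paras if p.confidence >= threshold)


if __name__ == "__main__":
    docs = ["The Wright brothers flew at Kitty Hawk in 1903.\n\n"
            "They built their aircraft in Dayton, Ohio."]
    print(answer_question("Where did the Wright brothers fly?", docs))
```

In the real system, the placeholder heuristics would be replaced by named entity tagging, the thematic-role parser of Section 2.1, answer-type pattern matching, and the mg/Google retrieval engines.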

