An Introduction to Information Retrieval and Question Answering

Jimmy Lin
College of Information Studies
University of Maryland
Wednesday, December 8, 2004

Outline
- The Information Retrieval Cycle
- Supporting the Search Process
- Types of Information Needs
- IR is an Experimental Science!
- What experiments?
- IR Test Collections
- Where do they come from?
- Roots of Question Answering
- Information Retrieval (IR)
- Information Extraction (IE)
- Central Idea of Factoid QA
- An Example
- Generic QA Architecture
- Question analysis
- Using WordNet
- Extracting Named Entities
- More Named Entities
- How do we extract NEs?
- Answer Type Hierarchy
- Does it work?
- Limitations?
- Conclusion

The Information Retrieval Cycle

[Figure: the information retrieval cycle. Labels: Source Selection, Query Formulation, Search, Selection, Examination, Delivery; Resource, Query, Ranked List, Documents; feedback loops: query reformulation, vocabulary learning, relevance feedback; source reselection]

Supporting the Search Process

[Figure: the same cycle (Source Selection, Query Formulation, Search, Selection, Examination, Delivery; Resource, Query, Ranked List, Documents), extended with Acquisition, Collection, Indexing, Index]

Types of Information Needs

- Ad hoc retrieval: find me documents “like this”
  - Identify positive accomplishments of the Hubble telescope since it was launched in 1991.
  - Compile a list of mammals that are considered to be endangered, identify their habitat and, if possible, specify what threatens them.
- Question answering
  - “Factoid”
    - Who discovered Oxygen?
    - When did Hawaii become a state?
    - Where is Ayer’s Rock located?
    - What team won the World Series in 1992?
  - “List”
    - What countries export oil?
    - Name U.S. cities that have a “Shubert” theater.
  - “Definition”
    - Who is Aaron Copland?
    - What is a quasar?

IR is an Experimental Science!

- Formulate a research question, the hypothesis
- Design an experiment to answer the question
- Perform the experiment
- Compare with a baseline “control”
- Does the experiment answer the question?
- Are the results significant?
- Report the results!
- Rinse, repeat…

What experiments?

- Example “questions”:
  - Does morphological analysis improve retrieval performance?
  - Does expanding the query with synonyms improve retrieval performance?
- Corresponding experiments:
  - Build a “stemmed” index and compare against “unstemmed” baseline
  - Expand queries with synonyms and compare against baseline unexpanded query.
- What’s missing here?

IR Test Collections

- Three components of a test collection:
  - Collection of documents (corpus)
  - Set of information needs (topics)
  - Sets of documents that satisfy the information needs (relevance judgments)
- Metrics for assessing “performance”
  - Precision
  - Recall
  - Other measures derived therefrom

Where do they come from?

- TREC = Text REtrieval Conferences
  - Series of annual evaluations, started in 1992
  - Organized into “tracks”
- Test collections are formed by “pooling”
  - Gather results from all participants
  - Corpus/topics/judgments can be reused

Roots of Question Answering

- Information Retrieval (IR)
- Information Extraction (IE)

Information Retrieval (IR)

- Can substitute “document” for “information”
- IR systems
  - Use statistical methods
  - Rely on frequency of words in query, document, collection
  - Retrieve complete documents
  - Return ranked lists of “hits” based on relevance
- Limitations
  - Answers questions indirectly
  - Does not attempt to understand the “meaning” of user’s query or documents in the collection

Information Extraction (IE)

- IE systems
  - Identify documents of a specific type
  - Extract information according to pre-defined templates
  - Place the information into frame-like database records
- Templates = pre-defined questions
- Extracted information = answers
- Limitations
  - Templates are domain dependent and not easily portable
  - One size does not fit all!
- Example template, Weather disaster: Type, Date, Location, Damage, Deaths, ...

Central Idea of Factoid QA

- Determine the semantic type of the expected answer
  - “Who won the Nobel Peace Prize in 1991?” is looking for a PERSON
- Retrieve documents that have keywords from the question
  - Retrieve documents that have the keywords “won”, “Nobel Peace Prize”, and “1991”
- Look for named-entities of the proper type near keywords
  - Look for a PERSON near the keywords “won”, “Nobel Peace Prize”, and “1991”

An Example

Question: Who won the Nobel Peace Prize in 1991?

Passage 1: But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991.

Passage 2: The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989.

Passage 3: The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

Generic QA Architecture

[Figure: QA pipeline. Components: Question Analyzer, Document Retriever, Passage Retriever, Answer Extractor. Data labels: NL question, IR Query, Documents, Passages, Answers, Answer Type]

Question analysis

- Question word cues
  - Who -> person, organization, location (e.g., city)
  - When -> date
  - Where -> location
  - What/Why/How -> ??
- Head noun cues
  - What city, which country, what year...
  - Which astronaut, what blues band, ...
- Scalar adjective cues
  - How long, how fast, how far, how old, ...

Using WordNet

[Figure: WordNet terms wingspan, length, diameter, radius, altitude, ceiling]

- Question: What is the service ceiling of an U-2?
- Answer type: NUMBER

Extracting Named Entities

- Person: Mr. Hubert J. Smith, Adm. McInnes, Grace Chan
- Title: Chairman, Vice President of Technology, Secretary of State
- Country: USSR, France, Haiti, Haitian Republic
- City: New York, Rome, Paris, Birmingham, Seneca Falls
- Province: Kansas, Yorkshire, Uttar Pradesh
- Business: GTE Corporation, FreeMarkets Inc., Acme
- University: Bryn Mawr College, University of Iowa
- Organization: Red Cross, Boys and Girls Club

More Named Entities

- Currency: 400 yen, $100, DM 450,000
- Linear: 10 feet, 100 miles, 15 centimeters
- Area: a square foot, 15 acres
- Volume: 6 cubic feet, 100 gallons
- Weight: 10 pounds, half a ton, 100 kilos
- Duration:
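
The question-analysis cues from the slides (question word, head noun, scalar adjective) can be illustrated with a small sketch. This is not the system described in the talk: the function name `guess_answer_types`, the cue tables, and the type labels assigned to head nouns and scalar adjectives are all assumptions made for illustration, and a real question analyzer would use parsing and a resource such as WordNet rather than looking only at the first two words.

```python
import re

# Hypothetical cue tables, loosely based on the slide's examples.
# The type labels for head nouns and scalar adjectives are assumptions.
QUESTION_WORD_TYPES = {
    "who": ["PERSON", "ORGANIZATION", "LOCATION"],
    "when": ["DATE"],
    "where": ["LOCATION"],
}
HEAD_NOUN_TYPES = {
    "city": "CITY",
    "country": "COUNTRY",
    "year": "DATE",
    "astronaut": "PERSON",
}
SCALAR_ADJECTIVES = {"long", "fast", "far", "old"}


def guess_answer_types(question):
    """Return candidate answer types using only the first two words."""
    tokens = re.findall(r"[a-z0-9'-]+", question.lower())
    if not tokens:
        return []
    first = tokens[0]
    second = tokens[1] if len(tokens) > 1 else ""

    # Scalar adjective cue: "how long", "how fast", ...
    if first == "how" and second in SCALAR_ADJECTIVES:
        return ["NUMBER"]
    # Head noun cue: "what city", "which country", ...
    if first in ("what", "which") and second in HEAD_NOUN_TYPES:
        return [HEAD_NOUN_TYPES[second]]
    # Question word cue; unknown cases (What/Why/How) yield no guess.
    return list(QUESTION_WORD_TYPES.get(first, []))


if __name__ == "__main__":
    for q in [
        "Who won the Nobel Peace Prize in 1991?",
        "When did Hawaii become a state?",
        "What city hosted the games?",
        "How far is the Moon?",
        "What is a quasar?",
    ]:
        print(q, guess_answer_types(q))
```

The empty result for "What is a quasar?" mirrors the "??" entry on the slide: the question word alone does not determine an answer type, so further analysis is needed.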