UMD CMSC 723 - An Introduction to Information Retrieval and Question Answering

An Introduction to Information Retrieval and Question Answering
Jimmy Lin
College of Information Studies
University of Maryland
Wednesday, December 8, 2004

The Information Retrieval Cycle
[Diagram. Stages: Source Selection, Query Formulation, Search, Selection, Examination, Delivery. Intermediate products: Resource, Query, Ranked List, Documents. Feedback loops: query reformulation, vocabulary learning, relevance feedback; source reselection.]

Supporting the Search Process
[Diagram. The same cycle as above, with supporting components added: Acquisition, Collection, Indexing, Index.]

Types of Information Needs
- Ad hoc retrieval: find me documents “like this”
  - Identify positive accomplishments of the Hubble telescope since it was launched in 1991.
  - Compile a list of mammals that are considered to be endangered, identify their habitat and, if possible, specify what threatens them.
- Question answering
  - “Factoid”
    - Who discovered Oxygen?
    - When did Hawaii become a state?
    - Where is Ayer’s Rock located?
    - What team won the World Series in 1992?
  - “List”
    - What countries export oil?
    - Name U.S. cities that have a “Shubert” theater.
  - “Definition”
    - Who is Aaron Copland?
    - What is a quasar?

IR is an Experimental Science!
- Formulate a research question, the hypothesis
- Design an experiment to answer the question
- Perform the experiment
- Compare with a baseline “control”
- Does the experiment answer the question?
- Are the results significant?
- Report the results!
- Rinse, repeat…

What experiments?
- Example “questions”:
  - Does morphological analysis improve retrieval performance?
  - Does expanding the query with synonyms improve retrieval performance?
- Corresponding experiments:
  - Build a “stemmed” index and compare against “unstemmed” baseline
  - Expand queries with synonyms and compare against baseline unexpanded query.
- What’s missing here?

IR Test Collections
- Three components of a test collection:
  - Collection of documents (corpus)
  - Set of information needs (topics)
  - Sets of documents that satisfy the information needs (relevance judgments)
- Metrics for assessing “performance”
  - Precision
  - Recall
  - Other measures derived therefrom

Where do they come from?
- TREC = Text REtrieval Conferences
  - Series of annual evaluations, started in 1992
  - Organized into “tracks”
- Test collections are formed by “pooling”
  - Gather results from all participants
  - Corpus/topics/judgments can be reused

Roots of Question Answering
- Information Retrieval (IR)
- Information Extraction (IE)

Information Retrieval (IR)
- Can substitute “document” for “information”
- IR systems
  - Use statistical methods
  - Rely on frequency of words in query, document, collection
  - Retrieve complete documents
  - Return ranked lists of “hits” based on relevance
- Limitations
  - Answers questions indirectly
  - Does not attempt to understand the “meaning” of user’s query or documents in the collection

Information Extraction (IE)
- IE systems
  - Identify documents of a specific type
  - Extract information according to pre-defined templates
  - Place the information into frame-like database records
- Templates = pre-defined questions
- Extracted information = answers
- Limitations
  - Templates are domain dependent and not easily portable
  - One size does not fit all!
- Example template: Weather disaster
  - Type
  - Date
  - Location
  - Damage
  - Deaths
  - ...

Central Idea of Factoid QA
- Determine the semantic type of the expected answer
- Retrieve documents that have keywords from the question
- Look for named-entities of the proper type near keywords
- Example:
  - “Who won the Nobel Peace Prize in 1991?” is looking for a PERSON
  - Retrieve documents that have the keywords “won”, “Nobel Peace Prize”, and “1991”
  - Look for a PERSON near the keywords “won”, “Nobel Peace Prize”, and “1991”

An Example
Question: Who won the Nobel Peace Prize in 1991?

Passage 1: But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991.

Passage 2: The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989.

Passage 3: The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

Generic QA Architecture
[Diagram. Components: Question Analyzer, Document Retriever, Passage Retriever, Answer Extractor. Data passed between them: NL question, IR Query, Answer Type, Documents, Passages, Answers.]

Question analysis
- Question word cues
  - Who → person, organization, location (e.g., city)
  - When → date
  - Where → location
  - What/Why/How → ??
- Head noun cues
  - What city, which country, what year...
  - Which astronaut, what blues band, ...
- Scalar adjective cues
  - How long, how fast, how far, how old, ...

Using WordNet
[Diagram. WordNet fragment with the nodes wingspan, length, diameter, radius, altitude, ceiling. The question “What is the service ceiling of an U-2?” is mapped to the answer type NUMBER.]

Extracting Named Entities
- Person: Mr. Hubert J. Smith, Adm. McInnes, Grace Chan
- Title: Chairman, Vice President of Technology, Secretary of State
- Country: USSR, France, Haiti, Haitian Republic
- City: New York, Rome, Paris, Birmingham, Seneca Falls
- Province: Kansas, Yorkshire, Uttar Pradesh
- Business: GTE Corporation, FreeMarkets Inc., Acme
- University: Bryn Mawr College, University of Iowa
- Organization: Red Cross, Boys and Girls Club

More Named Entities
- Currency: 400 yen, $100, DM 450,000
- Linear: 10 feet, 100 miles, 15 centimeters
- Area: a square foot, 15 acres
- Volume: 6 cubic feet, 100 gallons
- Weight: 10 pounds, half a ton, 100 kilos
- Duration: 10 day, five minutes, 3 years, a millennium
- Frequency: daily, biannually, 5 times, 3 times a day
- Speed: 6 miles per hour, 15 feet per second, 5 kph
- Age: 3 weeks old, 10-year-old, 50 years of age

How do we extract NEs?
- Heuristics and patterns
- Fixed-lists (gazetteers)
- Machine learning approaches

Answer Type Hierarchy

Does it work?
- Where do lobsters like to live?
  - on a Canadian airline
- Where do hyenas live?
  - in Saudi Arabia
  - in the back of pick-up trucks
- Where are
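
The "IR Test Collections" slide names precision and recall as the basic metrics but does not define them. As a minimal sketch, assuming the standard set-based definitions (precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that were retrieved), they could be computed as follows. The function name and the document IDs are hypothetical and are not taken from the slides.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one topic.

    retrieved: document IDs returned by the system.
    relevant:  document IDs judged relevant (the relevance judgments).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so the sketch never divides by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical run: four documents retrieved, three judged relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d7"})
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Comparing a "stemmed" index against an "unstemmed" baseline, as the slides suggest, would then amount to computing these numbers for both systems on the same topics and judgments.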

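
The three steps under "Central Idea of Factoid QA" can be sketched roughly as below, using the Nobel Peace Prize question and part of the first example passage. This is an illustration under stated assumptions, not the system described in the lecture: the cue table follows the question-word cues on the "Question analysis" slide, while the stopword list, the one-entry PERSON gazetteer, the character window, and all function names are hypothetical.

```python
import re

# Question-word cues from the slides; anything else falls back to UNKNOWN.
QUESTION_WORD_TYPES = {
    "who": ["PERSON", "ORGANIZATION", "LOCATION"],
    "when": ["DATE"],
    "where": ["LOCATION"],
}
# Hypothetical stopword list, just large enough for this example.
STOPWORDS = {"who", "when", "where", "what", "the", "in", "a", "an", "of", "is", "did"}
# Hypothetical toy gazetteer standing in for a real named-entity recognizer.
PERSON_GAZETTEER = ["Aung San Suu Kyi"]

QUESTION = "Who won the Nobel Peace Prize in 1991?"
PASSAGE = (
    "the continued detention of Ms Aung San Suu Kyi, the opposition leader "
    "who won the Nobel Peace Prize in 1991."
)


def expected_answer_types(question):
    """Step 1: guess the semantic type of the answer from the question word."""
    words = re.findall(r"\w+", question.lower())
    if not words:
        return ["UNKNOWN"]
    return QUESTION_WORD_TYPES.get(words[0], ["UNKNOWN"])


def question_keywords(question):
    """Step 2: keep the content words that a retrieval query would use."""
    return [w for w in re.findall(r"\w+", question.lower()) if w not in STOPWORDS]


def rank_candidates(passage, question, gazetteer, window=80):
    """Step 3: score each gazetteer hit by the question keywords found nearby."""
    keywords = question_keywords(question)
    scored = []
    for name in gazetteer:
        for match in re.finditer(re.escape(name), passage):
            start = max(0, match.start() - window)
            context = passage[start:match.end() + window].lower()
            context_words = set(re.findall(r"\w+", context))
            score = sum(1 for k in keywords if k in context_words)
            scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(expected_answer_types(QUESTION))
print(rank_candidates(PASSAGE, QUESTION, PERSON_GAZETTEER))
```

The window is measured in characters for simplicity; a token-based or sentence-based window would be an equally reasonable choice.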

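
Two of the approaches listed under "How do we extract NEs?", patterns and fixed lists, are simple enough to sketch in a few lines. The regular expressions and the small country list below are hypothetical and cover only a handful of the entity types and examples shown on the slides; they are not the patterns used in any actual system.

```python
import re

# Hypothetical patterns for three of the measure-style entity types.
NE_PATTERNS = {
    "CURRENCY": re.compile(r"\$\d[\d,]*|\b\d[\d,]*\s+yen\b"),
    "LINEAR": re.compile(r"\b\d+\s+(?:feet|miles|centimeters)\b"),
    "SPEED": re.compile(r"\b\d+\s+(?:miles per hour|feet per second|kph)\b"),
}
# Hypothetical fixed list (gazetteer) for one name-style entity type.
COUNTRY_GAZETTEER = {"USSR", "France", "Haiti"}


def extract_entities(text):
    """Return (label, surface string) pairs found by patterns, then by the gazetteer."""
    found = []
    for label, pattern in NE_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
    for country in sorted(COUNTRY_GAZETTEER):
        if re.search(r"\b" + re.escape(country) + r"\b", text):
            found.append(("COUNTRY", country))
    return found


print(extract_entities("The plane flew 100 miles from France at 5 kph and cost $100."))
```

Patterns like these overlap (the LINEAR pattern also fires inside "6 miles per hour"), which is one reason the slides list machine learning approaches as a further option.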