CS 294-5: Statistical Natural Language Processing
Question Answering
Dan Klein, UC Berkeley
(from Chris Manning's slides, which include slides originally borrowed from Sanda Harabagiu, ISI, and Nicholas Kushmerick)

Assignment 3 Honors
- Best F1:
  - Christine Hodges (84.1, 2H/2V, bin first)
  - David Rosenberg (84.2, 2V, bin first?)
- Best Exact Match:
  - Roger Bock (36.1%, 2H/2V, pre-tagged)
- Observations:
  - Bug in my lexicon (Rosenberg)
  - V/H order has subtle issues (Maire)
  - Short test sentences can be parsed almost as well with short training sentences only (Barrett, Petrov)
  - Rare rules slow parsing, hurt accuracy (Latham)
  - Unary issues (Nakov, Bock)
  - Exact match can be at odds with F1 (why?)

Project Presentations
- By popular demand: in-class presentations on the last class (Friday 12/10, unless we prefer Wednesday 12/8)
- You've got 6-8 minutes! Tell us:
  - The problem: why do we care?
  - Your concrete task: input, output, evaluation
  - A simple baseline for the task
  - Your method (half the time here)
  - Any serious surprises, challenges, etc.?
  - Headline results (if any)
- Put your slides (if any) on the web before class

Question Answering from Text
- Question Answering: give the user a (short) answer to their question, perhaps supported by evidence.
- An idea originating from the IR community:
  - With massive collections of full-text documents, simply finding relevant documents is of limited use: we want answers from textbases.
- The common person's view? [From a novel]
  "I like the Internet. Really, I do. Any time I need a piece of shareware or I want to find out the weather in Bogota … I'm the first guy to get the modem humming. But as a source of information, it sucks. You got a billion pieces of data, struggling to be heard and seen and downloaded, and anything I want to know seems to get trampled underfoot in the crowd."
  M. Marshall, The Straw Men,
  HarperCollins Publishers, 2002.

People Want to Ask Questions?
- Examples from the AltaVista query log:
  - who invented surf music?
  - how to make stink bombs
  - where are the snowdens of yesteryear?
  - which english translation of the bible is used in official catholic liturgies?
  - how to do clayart
  - how to copy psx
  - how tall is the sears tower?
- Examples from the Excite query log (12/1999):
  - how can i find someone in texas
  - where can i find information on puritan religion?
  - what are the 7 wonders of the world
  - how can i eliminate stress
  - What vacuum cleaner does Consumers Guide recommend
- Around 10–15% of query logs

AskJeeves
- Probably the most hyped example of "question answering"
- It largely does pattern matching to match your question to their own knowledge base of questions
- If that works, you get the human-curated answers to that known question
- If that fails, it falls back to regular web search
- A potentially interesting middle ground, but a fairly weak shadow of real QA

A Brief (Academic) History
- Question answering is not a new research area
- Question answering systems can be found in many areas of NLP research, including:
  - Natural language database systems
    - A lot of early NLP work on these
  - Spoken dialog systems
    - Currently very active and commercially relevant
- The focus on open-domain QA is new:
  - MURAX (Kupiec 1993): encyclopedia answers
  - Hirschman: reading comprehension tests
  - TREC QA competition: 1999–

Question Answering at TREC
- The question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g., "When was Mozart born?"
- For the first three years, systems were allowed to return 5 ranked answer snippets (50/250 bytes) for each question.
  - IR think: Mean Reciprocal Rank (MRR) scoring: 1, 0.5, 0.33, 0.25, 0.2, 0 for a correct answer at rank 1, 2, 3, 4, 5, 6+
- Mainly Named Entity answers (person, place, date, …)
- From 2002, systems are only allowed to return a single exact answer, and the notion of confidence has been introduced.

The TREC Document Collection
- The current collection uses news articles from the following sources:
  - AP newswire, 1998-2000
  - New York Times newswire, 1998-2000
  - Xinhua News Agency newswire, 1996-2000
- In total there are 1,033,461 documents in the collection (3 GB of text).
- Clearly this is too much text to process entirely using advanced NLP techniques, so the systems usually consist of an initial information retrieval phase followed by more advanced processing.
- Many supplement this text with use of the web and other knowledge bases.

Sample TREC Questions
1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
2. What was the monetary value of the Nobel Peace Prize in 1989?
3. What does the Peugeot company manufacture?
4. How much did Mercury spend on advertising in 1993?
5. What is the name of the managing director of Apricot Computer?
6. Why did David Koresh ask the FBI for a word processor?
7. What debts did Qintex group leave?
8. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?

Top Performing Systems
- Currently the best performing systems at TREC can answer approximately 70% of the questions
- Approaches and successes have varied a fair deal:
  - Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000 and 2001
    - Notably Harabagiu, Moldovan et al.
      - SMU/UTD/LCC
  - The AskMSR system stressed how much could be achieved by very simple methods with enough text (and now various copycats)
  - A middle ground is to use a large collection of surface matching patterns (ISI)

Online QA System Examples
- AnswerBus is an open-domain question answering system: www.answerbus.com
- Ionaut: http://www.ionaut.com:8400/
- LCC: http://www.languagecomputer.com/
- EasyAsk, AnswerLogic, AnswerFriend, Start, Quasm, Mulder, Webclopedia, etc.
- ISI TextMap: http://brahms.isi.edu:8080/textmap/

Webclopedia Architecture
[Architecture diagram not reproduced in the text.]

The Google Answer #1
- Include question words etc. in your stop-list
- Do standard IR
- Sometimes this (sort of) works:
  - Question: Who was the prime minister of Australia during the Great Depression?
  - Answer: James Scullin (Labor), 1929–31.
  - [Slide shows retrieved result pages: a page about Curtin (WW II Labor Prime Minister) from which the answer can be deduced; a page about Curtin that lacks the answer; a page about Chifley (Labor Prime Minister) from which the answer can be deduced]
- But often it doesn't…
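The MRR scoring rule used in early TREC QA (a correct answer at rank r earns 1/r, and nothing beyond rank 5) can be sketched in a few lines. This is a minimal illustration, not TREC's actual scoring software; the function names and example data are invented here.

```python
# Mean Reciprocal Rank (MRR), TREC-QA style: a question scores 1/r if
# the first correct answer appears at rank r among the top 5 returned
# snippets, and 0 otherwise; the system score is the mean over questions.

def reciprocal_rank(ranked_answers, is_correct, max_rank=5):
    """Return 1/r for the first correct answer within max_rank, else 0."""
    for r, answer in enumerate(ranked_answers[:max_rank], start=1):
        if is_correct(answer):
            return 1.0 / r
    return 0.0

def mean_reciprocal_rank(runs, is_correct, max_rank=5):
    """Average the reciprocal rank over all questions."""
    return sum(reciprocal_rank(a, is_correct, max_rank) for a in runs) / len(runs)

# Toy example: correct answer "1756" found at rank 1, rank 2, and not at all.
gold = "1756"
runs = [["1756", "1791"], ["1762", "1756"], ["1770", "1827"]]
print(mean_reciprocal_rank(runs, lambda a: a == gold))  # (1 + 0.5 + 0) / 3 = 0.5
```

Note how the 1, 0.5, 0.33, 0.25, 0.2, 0 sequence on the MRR slide falls directly out of the 1/r rule with a cutoff at rank 5.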
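The surface-matching-pattern middle ground mentioned above (the ISI approach) can be illustrated with a toy sketch: hand-written regular expressions that map a question type to likely answer contexts in retrieved text. The patterns, function name, and example passage below are hypothetical, not ISI's actual pattern set.

```python
import re

# For a "When was X born?" question, look for surface contexts such as
# "X (YEAR-" or "X was born in YEAR" in retrieved passages.
BIRTH_PATTERNS = [
    r"{name} \((\d{{4}})\b",
    r"{name} was born (?:on [^,]+, )?(?:in )?(\d{{4}})",
]

def find_birth_year(name, text):
    """Return the first year matched by any birth pattern, else None."""
    for template in BIRTH_PATTERNS:
        m = re.search(template.format(name=re.escape(name)), text)
        if m:
            return m.group(1)
    return None

passage = "Wolfgang Amadeus Mozart (1756-1791) was a prolific composer."
print(find_birth_year("Wolfgang Amadeus Mozart", passage))  # 1756
```

Real systems of this kind use large pattern collections (often learned from the web rather than hand-written) and a preceding IR phase to supply candidate passages, but the core idea is exactly this kind of shallow string matching.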