Wright CS 707 - Query Operations

Pages 47

Slide titles:
Query Operations
Recap: Unranked retrieval evaluation: Precision and Recall
Evaluation of large search engines
This lecture
Relevance Feedback
Relevance Feedback
Relevance feedback
Relevance Feedback: Example
Results for Initial Query
Slide 10
Results after Relevance Feedback
Initial query/results
Expanded query after relevance feedback
Results for expanded query
Key concept: Centroid
Rocchio Algorithm
The Theoretically Best Query
Rocchio 1971 Algorithm (SMART)
Subtleties to note
Relevance feedback on initial query
Relevance Feedback in vector spaces
Positive vs Negative Feedback
Relevance Feedback: Assumptions
Violation of A1
Violation of A2
Relevance Feedback: Problems
Evaluation of relevance feedback strategies
Evaluation of relevance feedback
Evaluation: Caveat
Excite Relevance Feedback
Pseudo relevance feedback
Relevance Feedback: Summary
Other Uses of Relevance Feedback
Indirect relevance feedback
Query Expansion
Query Expansion
Query assist: Example
Query expansion: Example
Query assist: Example
How do we augment the user query?
Example of manual thesaurus
Thesaurus-based query expansion
Automatic Thesaurus Generation
Co-occurrence Thesaurus
Automatic Thesaurus Generation Example
Automatic Thesaurus Generation Discussion
Query assist

Prasad L11QueryOps 1

Query Operations
Adapted from lectures by Prabhakar Raghavan (Yahoo, Stanford) and Christopher Manning (Stanford)

Recap: Unranked retrieval evaluation: Precision and Recall
- Precision: fraction of retrieved docs that are relevant = P(relevant | retrieved)
- Recall: fraction of relevant docs that are retrieved = P(retrieved | relevant)
- Precision P = tp/(tp + fp)
- Recall R = tp/(tp + fn)

                 Relevant   Nonrelevant
Retrieved        tp         fp
Not retrieved    fn         tn

Evaluation of large search engines
- Search engines have test collections of queries and hand-ranked results.
- Recall is difficult to measure on the web.
- Search engines often use precision at top k, e.g., k = 10 ...
- ... or measures that reward you more for getting rank 1 right than for getting rank 10 right.
  - NDCG (Normalized Discounted Cumulative Gain)
- Search engines also use non-relevance-based measures.
  - Clickthrough on first result
    - Not very reliable if you look at a single clickthrough … but pretty reliable in the aggregate.
  - Studies of user behavior in the lab
  - A/B testing

This lecture
- Improving results
  - For high recall. E.g., a search for aircraft doesn’t match plane, nor does thermodynamic match heat.
  - For gleaning user intent from queries
- The complete landscape
  - Global methods
    - Query expansion
      - Thesauri
      - Automatic thesaurus generation
  - Local methods
    - Relevance feedback
    - Pseudo relevance feedback

Relevance Feedback
- Idea: it may be difficult to formulate a good query when you don’t know the collection well, or cannot express your information need, but you can judge the relevance of a result. So iterate …
- User feedback on relevance of docs in initial set of results
  - The user issues a (short, simple) query.
  - The user marks some results as relevant or non-relevant.
  - The system computes a better representation of the information need based on the feedback.
  - Relevance feedback can go through one or more iterations.

Relevance feedback
- We will use ad hoc retrieval to refer to regular retrieval without relevance feedback.
- We now look at examples of relevance feedback that highlight different aspects.

Relevance Feedback: Example
- Image search engine http://nayana.ece.ucsb.edu/imsearch/imsearch.html

Results for Initial Query (9.1.1)
[Figure: Results for Initial Query]

Relevance Feedback
[Figure: Relevance Feedback]

Results after Relevance Feedback
[Figure: Results after Relevance Feedback]

Initial query/results
Initial query: New space satellite applications
1. 0.539, 08/13/91, NASA Hasn’t Scrapped Imaging Spectrometer
2. 0.533, 07/09/91, NASA Scratches Environment Gear From Satellite Plan
3. 0.528, 04/04/90, Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
4. 0.526, 09/09/91, A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
5. 0.525, 07/24/90, Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
6. 0.524, 08/22/90, Report Provides Support for the Critics Of Using Big Satellites to Study Climate
7. 0.516, 04/13/87, Arianespace Receives Satellite Launch Pact From Telesat Canada
8. 0.509, 12/02/87, Telecommunications Tale of Two Companies
User then marks relevant documents with “+” (three documents are marked).

Expanded query after relevance feedback
 2.074 new           15.106 space
30.816 satellite      5.660 application
 5.991 nasa           5.196 eos
 4.196 launch         3.972 aster
 3.516 instrument     3.446 arianespace
 3.004 bundespost     2.806 ss
 2.790 rocket         2.053 scientist
 2.003 broadcast      1.172 earth
 0.836 oil            0.646 measure

Results for expanded query
1. 0.513, 07/09/91, NASA Scratches Environment Gear From Satellite Plan (rank 2 in the initial results)
2. 0.500, 08/13/91, NASA Hasn’t Scrapped Imaging Spectrometer (rank 1 in the initial results)
3. 0.493, 08/07/89, When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
4. 0.493, 07/31/89, NASA Uses ‘Warm’ Superconductors For Fast Circuit
5. 0.492, 12/02/87, Telecommunications Tale of Two Companies (rank 8 in the initial results)
6. 0.491, 07/09/91, Soviets May Adapt Parts of SS-20 Missile For Commercial Use
7. 0.490, 07/12/88, Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
8. 0.490, 06/14/90, Rescue of Satellite By Space Agency To Cost $90 Million

Key concept: Centroid
- The centroid is the center of mass of a set of points.
- Recall that we represent documents as points in a high-dimensional space.
- Definition: Centroid
  $$\vec{\mu}(C) = \frac{1}{|C|} \sum_{\vec{d} \in C} \vec{d}$$
  where C is a set of documents.

Rocchio Algorithm
- The Rocchio algorithm uses the vector space model to pick a relevance feedback query.
- Rocchio seeks the query q_opt that maximizes
  $$\vec{q}_{opt} = \arg\max_{\vec{q}} \left[ \cos(\vec{q}, \vec{\mu}(C_r)) - \cos(\vec{q}, \vec{\mu}(C_{nr})) \right]$$
- Tries to separate docs marked relevant and non-relevant:
  $$\vec{q}_{opt} = \frac{1}{|C_r|} \sum_{\vec{d}_j \in C_r} \vec{d}_j - \frac{1}{|C_{nr}|} \sum_{\vec{d}_j \notin C_r} \vec{d}_j$$
- Problem: we don’t know the truly relevant docs.

The Theoretically Best Query
[Figure: The Theoretically Best Query. Scatter of documents with the optimal query marked; x = non-relevant documents, o = relevant documents.]

Rocchio 1971 Algorithm (SMART)
- Used in practice:
- D_r = set of known relevant doc vectors
- D_nr = set of known irrelevant doc vectors
  - Different from C_r and C_nr
- q_m = modified query vector; q_0 = original query vector; α, β, γ: weights
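
The symbols defined for the 1971 algorithm fit the standard statement of Rocchio's update, $\vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j$. The update formula itself did not survive in the text above, so treat this as the textbook form and not a quotation from the slide. A minimal sketch of that update in plain Python follows. The function names `centroid` and `rocchio`, the default weights, and the clipping of negative term weights to zero are illustrative assumptions, not something the preview specifies.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(docs: Sequence[Sequence[float]], dim: int) -> Vector:
    """Center of mass of a set of document vectors.

    Returns the zero vector when the set is empty (an assumption made
    here so that missing feedback simply contributes nothing).
    """
    if not docs:
        return [0.0] * dim
    return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]


def rocchio(
    q0: Sequence[float],
    relevant: Sequence[Sequence[float]],
    nonrelevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight on the original query (illustrative default)
    beta: float = 0.75,   # weight on the relevant centroid (illustrative default)
    gamma: float = 0.15,  # weight on the non-relevant centroid (illustrative default)
) -> Vector:
    """Return q_m = alpha*q0 + beta*centroid(D_r) - gamma*centroid(D_nr)."""
    dim = len(q0)
    mu_r = centroid(relevant, dim)
    mu_nr = centroid(nonrelevant, dim)
    qm = [alpha * q0[i] + beta * mu_r[i] - gamma * mu_nr[i] for i in range(dim)]
    # Assumption: negative term weights are clipped to zero.
    return [max(0.0, w) for w in qm]


# Toy 3-term vocabulary: the query uses term 0 only; the user marks two
# documents as relevant and one as non-relevant.
q0 = [1.0, 0.0, 0.0]
d_r = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
d_nr = [[0.0, 0.0, 2.0]]
print(rocchio(q0, d_r, d_nr, alpha=1.0, beta=0.5, gamma=0.25))
# prints [1.25, 0.5, 0.0]
```

In the toy run, term 1 enters the query because both marked-relevant documents contain it, while term 2, which appears only in the non-relevant document, would receive a negative weight and is clipped to zero.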
