LBSC 796/INFM 718R: Week 8
Relevance Feedback
Jimmy Lin
College of Information Studies, University of Maryland
Monday, March 27, 2006

The IR Black Box
[Diagram: Query -> Search -> Ranked List]

Anomalous State of Knowledge
- Basic paradox:
  - Information needs arise because the user doesn't know something: "an anomaly in his state of knowledge with respect to the problem faced"
  - Search systems are designed to satisfy these needs, but the user needs to know what he is looking for
  - However, if the user knows what he's looking for, there may not be a need to search in the first place
- Implication: computing "similarity" between queries and documents is fundamentally wrong
- How do we resolve this paradox?
Nicholas J. Belkin. (1980) Anomalous States of Knowledge as a Basis for Information Retrieval. Canadian Journal of Information Science, 5, 133-143.

The Information Retrieval Cycle
[Diagram: Source Selection (Resource) -> Query Formulation (Query) -> Search (Ranked List) -> Selection (Documents) -> Examination (Documents) -> Delivery, with feedback loops for source reselection, system discovery, vocabulary discovery, concept discovery, and document discovery]

Upcoming Topics
[Same cycle diagram, annotated with which stages are covered today and which next week]

Different Types of Interactions
- System discovery: learning the capabilities of the system
  - Playing with different types of query operators
  - "Reverse engineering" a search system
- Vocabulary discovery: learning collection-specific terms that relate to your information need
  - The literature on aerodynamics refers to aircraft, but you query on planes
  - How do you know what terms the collection uses?
- Concept discovery: learning the concepts that relate to your information need
  - What's the name of the disease that Reagan had?
  - How is this different from vocabulary discovery?
- Document discovery: learning about the types of documents that fulfill your information need
  - Were you looking for a news article, a column, or an editorial?

Relevance Feedback
- Take advantage of user relevance judgments in the retrieval process:
  - The user issues a (short, simple) query and gets back an initial hit list
  - The user marks hits as relevant or non-relevant
  - The system computes a better representation of the information need based on this feedback
  - Single or multiple iterations (although little is typically gained after one iteration)
- Idea: you may not know what you're looking for, but you'll know it when you see it

Outline
- Explicit feedback: users explicitly mark relevant and irrelevant documents
- Implicit feedback: the system attempts to infer user intentions based on observable behavior
- Blind feedback: feedback in the absence of any evidence, explicit or otherwise

Why relevance feedback?
- You may not know what you're looking for, but you'll know it when you see it
- Query formulation may be difficult; simplify the problem through iteration
- Facilitate vocabulary and concept discovery
- Boost recall: "find me more documents like this..."

Relevance Feedback Example
- Image search engine: http://nayana.ece.ucsb.edu/imsearch/imsearch.html
- [Screenshots: initial results, then revised results after relevance feedback]

Updating Queries
- Let's assume that there is an optimal query
  - The goal of relevance feedback is to bring the user query closer to the optimal query
- How does relevance feedback actually work?
  - Use relevance information to update the query
  - Use the updated query to retrieve a new set of documents
- What exactly do we "feed back"?
  - Boost the weights of terms from relevant documents
  - Add terms from relevant documents to the query
  - Note that this is hidden from the user

Picture of Relevance Feedback
[Figure: in the vector space, the initial query is shifted toward the relevant documents (o) and away from the non-relevant documents (x), yielding the revised query]

Rocchio Algorithm
- Used in practice
- The new query:
  - Moves toward relevant documents
  - Moves away from irrelevant documents

  \vec{q}_m = \alpha \vec{q}_0 + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \frac{\gamma}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j

  where q_m = modified query vector; q_0 = original query vector; α, β, γ = weights (hand-chosen or set empirically); D_r = set of known relevant document vectors; D_nr = set of known irrelevant document vectors

Rocchio in Pictures
- New query = α · (original query vector) + β · (positive feedback vector) − γ · (negative feedback vector)
- With α = 1.0, β = 0.5, γ = 0.25:

  Original query:          0  4  0  8  0   0    × α = 1.0   ->   0  4  0  8  0  0
  Positive feedback (+):   2  4  8  0  0   2    × β = 0.5   ->   1  2  4  0  0  1
  Negative feedback (−):   8  0  4  4  0  16    × γ = 0.25  ->   2  0  1  1  0  4
  New query:              -1  6  3  7  0  -3

- Typically, γ < β
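To make the Rocchio update concrete, here is a minimal Python sketch (the function name, list-based vectors, and centroid helper are illustrative choices, not from the original slides) that applies the formula above and reproduces the worked example:

    # Minimal sketch of the Rocchio update (illustrative, not from the slides).
    # Vectors are plain Python lists of term weights over a shared vocabulary.

    def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.5, gamma=0.25):
        """q_m = alpha*q0 + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
        dims = len(q0)

        def centroid(docs):
            # Average of the document vectors; zero vector if the set is empty.
            if not docs:
                return [0.0] * dims
            return [sum(d[i] for d in docs) / len(docs) for i in range(dims)]

        pos, neg = centroid(relevant), centroid(nonrelevant)
        return [alpha * q0[i] + beta * pos[i] - gamma * neg[i] for i in range(dims)]

    # Worked example from the slide: one relevant and one non-relevant document.
    q0 = [0, 4, 0, 8, 0, 0]
    relevant = [[2, 4, 8, 0, 0, 2]]
    nonrelevant = [[8, 0, 4, 4, 0, 16]]
    print(rocchio(q0, relevant, nonrelevant))  # [-1.0, 6.0, 3.0, 7.0, 0.0, -3.0]

Many implementations also clip negative weights to zero before re-running the search, since a negative term weight has no natural interpretation in most retrieval models.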
Relevance Feedback: Assumptions
- A1: The user has sufficient knowledge to form a reasonable initial query
- A2: Relevance prototypes are "well-behaved"

Violation of A1
- The user does not have sufficient initial knowledge
- Not enough relevant documents are retrieved by the initial query
- Examples:
  - Misspellings (Brittany Speers)
  - Cross-language information retrieval
  - Vocabulary mismatch (e.g., cosmonaut/astronaut)

Relevance Prototypes
- Relevance feedback assumes that relevance prototypes are "well-behaved":
  - All relevant documents are clustered together, or
  - There are different clusters of relevant documents, but they have significant vocabulary overlap
- In other words:
  - Term distributions in relevant documents will be similar
  - Term distributions in non-relevant documents will be different from those in relevant documents

Violation of A2
- There are several clusters of relevant documents
- Examples:
  - Burma/Myanmar
  - Contradictory government policies
  - Opinions

Evaluation
- Compute standard measures with q_0
- Compute standard measures with q_m:
  - Using all documents in the collection: spectacular improvements, but it's cheating, because the user already selected the relevant documents
  - Using the residual collection (the set of documents minus those already assessed relevant), as sketched at the end of these notes: a more realistic evaluation in which relative performance can be validly compared
- Empirically, one iteration of relevance feedback produces significant improvements; more iterations don't help

Relevance Feedback: Cost
- Speed and efficiency issues:
  - The system needs to spend time analyzing documents
  - Longer queries are usually slower
- Users are often reluctant to provide explicit feedback
- It is often harder to understand why a particular document was retrieved

Koenemann and Belkin's Work
- Well-known study of relevance feedback in information retrieval
- Questions asked:
  - Does relevance feedback improve results?
  - Is user control over relevance feedback helpful?
  - How do different levels of user control affect results?
Jürgen Koenemann and Nicholas J. Belkin. (1996) A Case For Interaction: A Study of Interactive Information Retrieval Behavior and Effectiveness. Proceedings of CHI 1996.
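To make the residual-collection evaluation mentioned above concrete, here is a small Python sketch (the precision-at-k metric, function names, and toy document ids are assumptions for illustration, not from the slides): documents already assessed relevant during feedback are removed from both the ranking and the judgments before scoring q_m.

    # Illustrative sketch of residual-collection evaluation (names, the
    # precision@k metric, and the toy data are assumptions, not from the slides).

    def precision_at_k(ranking, relevant, k=10):
        """Fraction of the top-k ranked document ids that are relevant."""
        top = ranking[:k]
        return sum(1 for doc_id in top if doc_id in relevant) / max(len(top), 1)

    def residual_evaluation(ranking_qm, assessed_relevant, relevant, k=10):
        """Score the feedback query q_m on the residual collection: drop the
        documents the user already assessed relevant from both the ranking
        and the relevance judgments, then score what remains."""
        residual_ranking = [d for d in ranking_qm if d not in assessed_relevant]
        residual_relevant = relevant - assessed_relevant
        return precision_at_k(residual_ranking, residual_relevant, k)

    # Example: the user marked d1 relevant during feedback.
    ranking_qm = ["d1", "d3", "d7", "d2", "d9", "d4"]  # ranking produced by q_m
    assessed = {"d1"}
    relevant = {"d1", "d2", "d4"}
    # 0.333...: of the top 3 residual documents (d3, d7, d2), only d2 is relevant.
    print(residual_evaluation(ranking_qm, assessed, relevant, k=3))

For a valid comparison, q_0 should be scored on the same residual collection rather than on the full collection.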

