LBSC 796/INFM 718R: Week 8
Relevance Feedback
Jimmy Lin
College of Information Studies
University of Maryland
Monday, March 27, 2006

The IR Black Box
[Figure: Query → Search → Ranked List]

Anomalous State of Knowledge
Basic paradox:
- Information needs arise because the user doesn’t know something: “an anomaly in his state of knowledge with respect to the problem faced”
- Search systems are designed to satisfy these needs, but the user needs to know what he is looking for
- However, if the user knows what he’s looking for, there may not be a need to search in the first place
Implication: computing “similarity” between queries and documents is fundamentally wrong
How do we resolve this paradox?
Nicholas J. Belkin. (1980) Anomalous States of Knowledge as a Basis for Information Retrieval.
Canadian Journal of Information Science, 5, 133-143.

The Information Retrieval Cycle
[Figure: the IR cycle (Source Selection, Query Formulation, Search, Selection, Examination, Delivery; passing along Resource, Query, Ranked List, Documents), with feedback loops for source reselection and for system, vocabulary, concept, and document discovery]

Upcoming Topics
[Figure: the same IR cycle, with the stages covered Today and Next Week marked]

Different Types of Interactions
- System discovery – learning the capabilities of the system
  - Playing with different types of query operators
  - “Reverse engineering” a search system
- Vocabulary discovery – learning collection-specific terms that relate to your information need
  - The literature on aerodynamics refers to aircraft, but you query on planes
  - How do you know what terms the collection uses?

Different Types of Interactions
- Concept discovery – learning the concepts that relate to your information need
  - What’s the name of the disease that Reagan had?
  - How is this different from vocabulary discovery?
- Document discovery – learning about the types of documents that fulfill your information need
  - Were you looking for a news article, a column, or an editorial?

Relevance Feedback
Take advantage of user relevance judgments in the retrieval process:
- User issues a (short, simple) query and gets back an initial hit list
- User marks hits as relevant or non-relevant
- The system computes a better representation of the information need based on this feedback
- Single or multiple iterations (although little is typically gained after one iteration)
Idea: you may not know what you’re looking for, but you’ll know when you see it

Outline
- Explicit feedback: users explicitly mark relevant and irrelevant documents
- Implicit feedback: the system attempts to infer user intentions based on observable behavior
- Blind feedback: feedback in the absence of any evidence, explicit or otherwise

Why relevance feedback?
- You may not know what you’re looking for, but you’ll know when you see it
- Query formulation may be difficult; simplify the problem through iteration
- Facilitate vocabulary and concept discovery
- Boost recall: “find me more documents like this…”

Relevance Feedback Example
Image Search Engine
http://nayana.ece.ucsb.edu/imsearch/imsearch.html

Initial Results
[Screenshot]

Relevance Feedback
[Screenshot]

Revised Results
[Screenshot]

Updating Queries
- Let’s assume that there is an optimal query
  - The goal of relevance feedback is to bring the user query closer to the optimal query
- How does relevance feedback actually work?
  - Use relevance information to update the query
  - Use the updated query to retrieve a new set of documents
- What exactly do we “feed back”?
  - Boost weights of terms from relevant documents
  - Add terms from relevant documents to the query
  - Note that this is hidden from the user

Picture of Relevance Feedback
[Figure: initial query and revised query plotted among documents; x = non-relevant documents, o = relevant documents]

Rocchio Algorithm
Used in practice:

$$\vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j$$

The new query moves toward relevant documents (the β term) and away from irrelevant documents (the γ term).
q_m = modified query vector; q_0 = original query vector; α, β, γ: weights (hand-chosen or set empirically); D_r = set of known relevant document vectors; D_nr = set of known irrelevant document vectors

Rocchio in Pictures
                             vector           weight      weighted vector
Original query vector        0 4 0 8 0 0      α = 1.0     0 4 0 8 0 0
Positive feedback vector     2 4 8 0 0 2      β = 0.5     1 2 4 0 0 1     (+)
Negative feedback vector     8 0 4 4 0 16     γ = 0.25    2 0 1 1 0 4     (−)
New query vector                                          -1 6 3 7 0 -3
Typically, γ < β

Relevance Feedback: Assumptions
- A1: User has sufficient knowledge for a reasonable initial query
- A2: Relevance prototypes are “well-behaved”

Violation of A1
User does not have sufficient initial knowledge, so not enough relevant documents are retrieved by the initial query
Examples:
- Misspellings (Brittany Speers)
- Cross-language information retrieval
- Vocabulary mismatch (e.g., cosmonaut/astronaut)

Relevance Prototypes
Relevance feedback assumes that relevance prototypes are “well-behaved”:
- All relevant documents are clustered together, or
- There are different clusters of relevant documents, but they have significant vocabulary overlap
In other words:
- Term distributions in relevant documents will be similar
- Term distributions in non-relevant documents will be different from those in relevant documents

Violation of A2
There are several clusters of relevant
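The Rocchio update and the explicit feedback loop described in these slides can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's implementation: documents and queries are dense term-weight vectors stored as plain lists, ranking uses cosine similarity, and a simulated user judges the top k hits, with every top-k hit not marked relevant treated as non-relevant. The helper names (`centroid`, `rocchio`, `rank`, `feedback_round`) and the four-term toy collection are hypothetical; the default weights α = 1.0, β = 0.5, γ = 0.25 are taken from the "Rocchio in Pictures" example.

```python
import math
from typing import List, Sequence, Set

# Minimal sketch of Rocchio relevance feedback over dense term-weight vectors.
# All names and the toy collection below are hypothetical, for illustration only.
Vector = List[float]


def centroid(docs: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean of the document vectors (zero vector if there are none)."""
    if not docs:
        return [0.0] * dim
    return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]


def rocchio(q0: Sequence[float],
            relevant: Sequence[Sequence[float]],
            nonrelevant: Sequence[Sequence[float]],
            alpha: float = 1.0, beta: float = 0.5, gamma: float = 0.25) -> Vector:
    """q_m = alpha*q0 + beta*centroid(D_r) - gamma*centroid(D_nr).

    Default weights match the slide's worked example.
    """
    dim = len(q0)
    pos = centroid(relevant, dim)
    neg = centroid(nonrelevant, dim)
    return [alpha * q0[i] + beta * pos[i] - gamma * neg[i] for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: Sequence[float], docs: Sequence[Sequence[float]]) -> List[int]:
    """Document indices by decreasing cosine similarity (ties keep index order)."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)


def feedback_round(query: Sequence[float], docs: Sequence[Sequence[float]],
                   judged_relevant: Set[int], k: int, **weights: float) -> Vector:
    """One explicit-feedback iteration: the simulated user judges the top-k hits;
    any top-k hit not in judged_relevant is treated as non-relevant."""
    top = rank(query, docs)[:k]
    rel = [docs[i] for i in top if i in judged_relevant]
    nonrel = [docs[i] for i in top if i not in judged_relevant]
    return rocchio(query, rel, nonrel, **weights)


if __name__ == "__main__":
    # Worked example from "Rocchio in Pictures".
    print(rocchio([0, 4, 0, 8, 0, 0], [[2, 4, 8, 0, 0, 2]], [[8, 0, 4, 4, 0, 16]]))
    # -> [-1.0, 6.0, 3.0, 7.0, 0.0, -3.0]

    # Hypothetical vocabulary: [plane, aircraft, wing, cooking]
    docs = [
        [1, 0, 1, 0],  # d0: plane, wing
        [0, 2, 1, 0],  # d1: aircraft, wing (on topic, but never says "plane")
        [0, 1, 0, 0],  # d2: aircraft
        [1, 0, 0, 2],  # d3: plane, cooking (off topic)
    ]
    q0 = [1, 0, 0, 0]  # the user queries on "plane"
    print(rank(q0, docs))  # -> [0, 3, 1, 2]
    q1 = feedback_round(q0, docs, judged_relevant={0}, k=2)
    print(q1)              # -> [1.25, 0.0, 0.5, -0.5]
    print(rank(q1, docs))  # -> [0, 1, 3, 2]
```

In the toy run, d1 shares no term with the query "plane" and initially ranks below the off-topic d3. After one round of feedback, the weight on "wing" borrowed from the relevant d0 lifts d1 above d3, a small instance of the vocabulary-discovery effect the slides describe. The updated query also carries a negative weight; some systems clip such weights to zero, but this sketch leaves them as computed.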