UMD LBSC 796 - Relevance Feedback

Preview of pages 1-4, 26-28, and 53-56 (of 56 pages).

LBSC 796/INFM 718R: Week 8 - Relevance Feedback

Slide titles: The IR Black Box; Anomalous State of Knowledge; The Information Retrieval Cycle; Upcoming Topics; Different Types of Interactions; Slide 7; Relevance Feedback; Outline; Why relevance feedback?; Relevance Feedback Example; Initial Results; Slide 13; Revised Results; Updating Queries; Picture of Relevance Feedback; Rocchio Algorithm; Rocchio in Pictures; Relevance Feedback: Assumptions; Violation of A1; Relevance Prototypes; Violation of A2; Evaluation; Relevance Feedback: Cost; Koenemann and Belkin's Work; What's the best interface?; Query Interface; Penetrable Interface; Study Details; Sample Topic; Procedure; Precision Results; Relevance feedback works!; Number of Iterations; Behavior Results; Implicit Feedback; Observable Behavior; Discussion Point; So far...; Blind Relevance Feedback; BRF Experiment; BRF Example; Results; The Complete Landscape; Local vs. Global; User Involvement; Query Expansion Techniques; Global Methods; Using Controlled Vocabulary; Thesauri; Using Manual Thesauri; Automatic Thesauri Generation; Automatic Thesauri: Example; Automatic Thesauri: Discussion; Key Points; One Minute Paper

LBSC 796/INFM 718R: Week 8
Relevance Feedback
Jimmy Lin
College of Information Studies, University of Maryland
Monday, March 27, 2006

The IR Black Box
(Diagram: Query -> Search -> Ranked List)

Anomalous State of Knowledge
Basic paradox:
- Information needs arise because the user doesn't know something: "an anomaly in his state of knowledge with respect to the problem faced"
- Search systems are designed to satisfy these needs, but the user needs to know what he is looking for
- However, if the user knows what he's looking for, there may not be a need to search in the first place
Implication: computing "similarity" between queries and documents is fundamentally wrong
How do we resolve this paradox?

Nicholas J. Belkin. (1980). Anomalous States of Knowledge as a Basis for Information Retrieval. Canadian Journal of Information Science, 5, 133-143.
The Information Retrieval Cycle
(Diagram: Source Selection -> Query Formulation -> Search -> Selection over the Ranked List -> Examination of Documents -> Delivery, with feedback arrows for source reselection, system discovery, vocabulary discovery, concept discovery, and document discovery)

Upcoming Topics
(The same cycle diagram, with the stages split between "Today" and "Next Week")

Different Types of Interactions
- System discovery: learning the capabilities of the system
  - Playing with different types of query operators
  - "Reverse engineering" a search system
- Vocabulary discovery: learning collection-specific terms that relate to your information need
  - The literature on aerodynamics refers to aircraft, but you query on planes
  - How do you know what terms the collection uses?

Different Types of Interactions
- Concept discovery: learning the concepts that relate to your information need
  - What's the name of the disease that Reagan had?
  - How is this different from vocabulary discovery?
- Document discovery: learning about the types of documents that fulfill your information need
  - Were you looking for a news article, a column, or an editorial?

Relevance Feedback
Take advantage of user relevance judgments in the retrieval process:
- The user issues a (short, simple) query and gets back an initial hit list
- The user marks hits as relevant or non-relevant
- The system computes a better representation of the information need based on this feedback
- Single or multiple iterations (although little is typically gained after one iteration)
Idea: you may not know what you're looking for, but you'll know it when you see it

Outline
- Explicit feedback: users explicitly mark relevant and irrelevant documents
- Implicit feedback: the system attempts to infer user intentions based on observable behavior
- Blind feedback: feedback in the absence of any evidence, explicit or otherwise
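One round of the explicit-feedback loop described above can be sketched in Python. This is a toy illustration, not code from the course: the term-vector representation, cosine-similarity ranking, and all names (`rank`, `feedback_iteration`, `judge`) are my own assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Return doc ids ordered by similarity to the query (the 'black box')."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)

def feedback_iteration(query, docs, judge, beta=0.5):
    """One explicit-feedback round: retrieve, collect the user's
    judgments, then boost the query with terms from relevant documents."""
    hits = rank(query, docs)
    relevant = [d for d in hits if judge(d)]  # the user marks hits
    if not relevant:
        return query
    # Add beta times the centroid of the relevant documents to the query.
    centroid = [sum(ws) / len(relevant)
                for ws in zip(*(docs[d] for d in relevant))]
    return [q + beta * c for q, c in zip(query, centroid)]
```

For example, with `docs = {"d1": [1, 0, 1], "d2": [0, 1, 0]}` and a judge who marks `d1` relevant, `feedback_iteration([1, 0, 0], docs, lambda d: d == "d1")` returns a query that now weights the third term, pulling future retrievals toward `d1`-like documents — which is exactly the "hidden from the user" update the slides describe.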
Why relevance feedback?
- You may not know what you're looking for, but you'll know when you see it
- Query formulation may be difficult; simplify the problem through iteration
- Facilitate vocabulary and concept discovery
- Boost recall: "find me more documents like this..."

Relevance Feedback Example
Image search engine: http://nayana.ece.ucsb.edu/imsearch/imsearch.html

Initial Results
(Screenshot of the initial result set)

Relevance Feedback
(Screenshot: the user marks relevant results)

Revised Results
(Screenshot of the revised result set)

Updating Queries
- Let's assume that there is an optimal query
- The goal of relevance feedback is to bring the user's query closer to the optimal query
How does relevance feedback actually work?
- Use relevance information to update the query
- Use the query to retrieve a new set of documents
What exactly do we "feed back"?
- Boost the weights of terms from relevant documents
- Add terms from relevant documents to the query
Note that this is hidden from the user

Picture of Relevance Feedback
(Diagram: the initial query sits among non-relevant documents (x) and relevant documents (o); the revised query moves toward the cluster of relevant documents)

Rocchio Algorithm
Used in practice. The new query moves toward relevant documents and away from irrelevant documents:

  q_m = α·q_0 + β·(1/|D_r|)·Σ_{d_j ∈ D_r} d_j - γ·(1/|D_nr|)·Σ_{d_j ∈ D_nr} d_j

where q_m is the modified query vector; q_0 is the original query vector; α, β, γ are weights (hand-chosen or set empirically); D_r is the set of known relevant document vectors; and D_nr is the set of known irrelevant document vectors.

Rocchio in Pictures
new query vector = original query vector + positive feedback vector - negative feedback vector

  Original query:     ( 0  4  0  8  0   0) x 1.0  = ( 0  4  0  8  0   0)
  Positive feedback:  ( 2  4  8  0  0   2) x 0.5  = ( 1  2  4  0  0   1)
  Negative feedback:  ( 8  0  4  4  0  16) x 0.25 = ( 2  0  1  1  0   4)
  New query:                                         (-1  6  3  7  0  -3)

Typically, γ < β.

Relevance Feedback: Assumptions
- A1: The user has sufficient knowledge for a reasonable initial query
- A2: Relevance prototypes are "well-behaved"

Violation of A1
The user does not have sufficient initial knowledge, so not enough relevant documents are retrieved by the initial query. Examples:
- Misspellings (Brittany Speers)
- Cross-language information retrieval
- Vocabulary mismatch (e.g., cosmonaut/astronaut)
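The Rocchio update and the worked example above can be checked with a short Python sketch. This is my own illustration, not code from the slides; it treats each feedback row as a one-document set, which matches the slide's arithmetic.

```python
def rocchio(q0, rel_docs, nonrel_docs, alpha=1.0, beta=0.5, gamma=0.25):
    """Rocchio update: move the query toward the centroid of the known
    relevant documents and away from the centroid of the non-relevant ones."""
    def centroid(docs):
        if not docs:
            return [0.0] * len(q0)
        return [sum(ws) / len(docs) for ws in zip(*docs)]
    r = centroid(rel_docs)       # (1/|D_r|) * sum of relevant vectors
    nr = centroid(nonrel_docs)   # (1/|D_nr|) * sum of non-relevant vectors
    return [alpha * q + beta * cr - gamma * cn
            for q, cr, cn in zip(q0, r, nr)]

# The slide's numbers (alpha = 1.0, beta = 0.5, gamma = 0.25):
q0 = [0, 4, 0, 8, 0, 0]
pos = [[2, 4, 8, 0, 0, 2]]    # positive feedback
neg = [[8, 0, 4, 4, 0, 16]]   # negative feedback
print(rocchio(q0, pos, neg))  # -> [-1.0, 6.0, 3.0, 7.0, 0.0, -3.0]
```

In practice, negative term weights are often clipped to zero; the slide keeps them to show the direction of the update.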
Relevance Prototypes
Relevance feedback assumes that relevance prototypes are "well-behaved":
- All relevant documents are clustered together, or
- There are different clusters of relevant documents, but they have significant vocabulary overlap
In other words:
- Term distributions in relevant documents will be similar
- Term distributions in non-relevant documents will be different from those in relevant documents

Violation of A2
There are several clusters of relevant
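The "well-behaved prototypes" assumption above can be made concrete with a tiny sketch (my own toy numbers, not from the slides): when the relevant documents form two clusters with no vocabulary overlap, their centroid lies between the clusters and resembles neither, so a single Rocchio prototype averages both clusters away.

```python
def centroid(vectors):
    """Mean of a set of equal-length term vectors."""
    return [sum(ws) / len(vectors) for ws in zip(*vectors)]

# Two clusters of relevant documents with disjoint vocabularies:
cluster_a = [[4, 5, 0, 0], [5, 4, 0, 0]]  # uses only terms 1-2
cluster_b = [[0, 0, 4, 5], [0, 0, 5, 4]]  # uses only terms 3-4
print(centroid(cluster_a + cluster_b))    # -> [2.25, 2.25, 2.25, 2.25]
```

The resulting uniform vector matches no actual relevant document, which is why violating A2 degrades relevance feedback.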

