Stanford CS 276 - Lecture 10 Probabilistic relevance feedback

CS276A Text Retrieval and Mining
Lecture 10

Outline
- Recap of the last lecture
- Probabilistic relevance feedback
- Why probabilities in IR?
- Probabilistic IR topics
- The document ranking problem
- Recall a few probability basics
- The Probability Ranking Principle
- Probability Ranking Principle (PRP)
- Probabilistic Retrieval Strategy
- Probabilistic Ranking
- Binary Independence Model
- Estimation – key challenge
- Iteratively estimating $p_i$
- Probabilistic Relevance Feedback
- PRP and BIR
- Removing term independence
- Food for thought
- Good and Bad News
- Bayesian Networks for Text Retrieval (Turtle and Croft 1990)
- Bayesian Networks
- Toy Example
- Independence Assumptions
- Chained inference
- Model for Text Retrieval
- Bayesian Nets for IR: Idea
- Bayesian Nets for IR
- Bayesian nets for text retrieval
- Link matrices and probabilities
- Example: “reason trouble –two”
- Extensions
- Computational details
- Bayes Nets in IR
- Resources

Recap of the last lecture
- Improving search results
  - Especially for high recall.
    E.g., searching for aircraft so it matches with plane; thermodynamic with heat
- Options for improving results…
  - Global methods
    - Query expansion
      - Thesauri
      - Automatic thesaurus generation
    - Global indirect relevance feedback
  - Local methods
    - Relevance feedback
    - Pseudo relevance feedback

Probabilistic relevance feedback
- Rather than reweighting in a vector space…
- If user has told us some relevant and some irrelevant documents, then we can proceed to build a probabilistic classifier, such as a Naive Bayes model:

  $$P(t_k \mid R) = \frac{|D_{rk}|}{|D_r|}$$

  $$P(t_k \mid NR) = \frac{|D_{nrk}|}{|D_{nr}|}$$

- $t_k$ is a term; $D_r$ is the set of known relevant documents; $D_{rk}$ is the subset that contain $t_k$; $D_{nr}$ is the set of known irrelevant documents; $D_{nrk}$ is the subset that contain $t_k$.

Why probabilities in IR?
[Diagram: the user information need is mapped to a query representation, and the documents are mapped to a document representation. How to match the two? Annotations: understanding of user need is uncertain; uncertain guess of whether document has relevant content.]
- In traditional IR systems, matching between each document and query is attempted in a semantically imprecise space of index terms.
- Probabilities provide a principled foundation for uncertain reasoning. Can we use probabilities to quantify our uncertainties?

Probabilistic IR topics
- Classical probabilistic retrieval model
  - Probability ranking principle, etc.
- (Naïve) Bayesian Text Categorization
- Bayesian networks for text retrieval
- Language model approach to IR
  - An important emphasis in recent work
- Probabilistic methods are one of the oldest but also one of the currently hottest topics in IR.
  - Traditionally: neat ideas, but they’ve never won on performance.
    It may be different now.

The document ranking problem
- We have a collection of documents
- User issues a query
- A list of documents needs to be returned
- Ranking method is core of an IR system:
  - In what order do we present documents to the user?
  - We want the “best” document to be first, second best second, etc.
- Idea: Rank by probability of relevance of the document w.r.t. information need
  - $P(\text{relevant} \mid \text{document}_i, \text{query})$

Recall a few probability basics
- For events $a$ and $b$:

  $$p(a, b) = p(a \cap b) = p(a \mid b)\,p(b) = p(b \mid a)\,p(a)$$

  $$p(\bar{a} \mid b)\,p(b) = p(b \mid \bar{a})\,p(\bar{a})$$

- Bayes’ Rule (posterior $p(a \mid b)$, prior $p(a)$):

  $$p(a \mid b) = \frac{p(b \mid a)\,p(a)}{p(b)} = \frac{p(b \mid a)\,p(a)}{\sum_{x \in \{a, \bar{a}\}} p(b \mid x)\,p(x)}$$

- Odds:

  $$O(a) = \frac{p(a)}{p(\bar{a})} = \frac{p(a)}{1 - p(a)}$$

The Probability Ranking Principle
“If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data.”
[1960s/1970s] S. Robertson, W.S. Cooper, M.E. Maron; van Rijsbergen (1979:113); Manning & Schütze (1999:538)

Probability Ranking Principle
- Let $x$ be a document in the collection. Let $R$ represent relevance of a document w.r.t. given (fixed) query and let $NR$ represent non-relevance.

  $$p(R \mid x) = \frac{p(x \mid R)\,p(R)}{p(x)}$$

  $$p(NR \mid x) = \frac{p(x \mid NR)\,p(NR)}{p(x)}$$

- $p(x \mid R)$, $p(x \mid NR)$: probability that if a relevant (non-relevant) document is retrieved, it is $x$.
- Need to find $p(R \mid x)$: probability that a document $x$ is relevant.
- $p(R)$, $p(NR)$: prior probability of retrieving a (non) relevant document
- $p(R \mid x) + p(NR \mid x) = 1$
- Notation: $R = \{0, 1\}$ vs.
  NR/R

Probability Ranking Principle (PRP)
- Simple case: no selection costs or other utility concerns that would differentially weight errors
- Bayes’ Optimal Decision Rule
  - $x$ is relevant iff $p(R \mid x) > p(NR \mid x)$
- PRP in action: Rank all documents by $p(R \mid x)$
- Theorem: Using the PRP is optimal, in that it minimizes the loss (Bayes risk) under 1/0 loss
  - Provable if all probabilities correct, etc. [e.g., Ripley 1996]

Probability Ranking Principle
- More complex case: retrieval costs.
  - Let $d$ be a document
  - $C$: cost of retrieval of relevant document
  - $C'$: cost of retrieval of non-relevant document
- Probability Ranking Principle: if

  $$C \cdot p(R \mid d) + C' \cdot (1 - p(R \mid d)) \le C \cdot p(R \mid d') + C' \cdot (1 - p(R \mid d'))$$

  for all $d'$ not yet retrieved, then $d$ is the next document to be retrieved
- We won’t further consider loss/utility from now on

Probability Ranking Principle
- How do we compute all those probabilities?
  - Do not know exact probabilities, have to use estimates
  - Binary Independence Retrieval (BIR), which we discuss later today, is the simplest model
- Questionable assumptions
  - “Relevance” of each document is independent of relevance of other documents.
    - Really, it’s bad to keep on returning duplicates
  - Boolean model of relevance
  - That one has a single step information need
    - Seeing a range of results might let user refine query

Probabilistic Retrieval Strategy
- Estimate how terms contribute to relevance
  - How do things like tf, df, and length influence your judgments about document relevance?
    - One answer is the Okapi formulae (S. Robertson)
- Combine to find document relevance probability
- Order documents by decreasing probability

Probabilistic Ranking
- Basic concept:
  "For a given query, if we know some documents that are relevant, terms that
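
The estimates on the "Probabilistic relevance feedback" slide, P(tk|R) = |Drk| / |Dr| and P(tk|NR) = |Dnrk| / |Dnr|, amount to counting how many judged documents contain each term. The following is a minimal sketch of that counting, assuming documents are represented as sets of terms. The function name `term_probabilities` and the toy documents are hypothetical and are not taken from the lecture.

```python
from typing import Dict, Iterable, List, Set


def term_probabilities(judged_docs: List[Set[str]],
                       vocabulary: Iterable[str]) -> Dict[str, float]:
    """For each term, return the fraction of judged documents containing it.

    Applied to the known relevant documents this gives |D_rk| / |D_r|;
    applied to the known irrelevant documents it gives |D_nrk| / |D_nr|.
    """
    n = len(judged_docs)
    if n == 0:
        raise ValueError("need at least one judged document")
    return {t: sum(1 for d in judged_docs if t in d) / n for t in vocabulary}


# Hypothetical feedback: three documents judged relevant, two judged irrelevant.
relevant = [{"aircraft", "engine"}, {"plane", "engine"}, {"aircraft", "wing"}]
nonrelevant = [{"plane", "ticket"}, {"ticket", "price"}]
vocab = ["aircraft", "plane", "engine", "ticket"]

p_t_given_R = term_probabilities(relevant, vocab)      # P(t_k | R)
p_t_given_NR = term_probabilities(nonrelevant, vocab)  # P(t_k | NR)
```

With this toy data, "aircraft" occurs in two of the three relevant documents and in neither irrelevant one. The unsmoothed ratios can be exactly 0 or 1 when few documents have been judged, which is why practical systems usually smooth such counts; the sketch follows the slide's formulas as written and does not smooth.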
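
The Probability Ranking Principle slides combine two steps: Bayes' rule to get p(R|x) from p(x|R), p(x|NR) and the prior p(R), and then an ordering of documents, either by decreasing p(R|x) or, in the cost-based case, by increasing expected cost C·p(R|d) + C'·(1 − p(R|d)). The sketch below illustrates both under stated assumptions: the likelihoods and prior are made-up numbers, and the names `posterior_relevance`, `expected_cost`, and `rank` are hypothetical.

```python
from typing import Dict, List


def posterior_relevance(p_x_given_R: float, p_x_given_NR: float,
                        p_R: float) -> float:
    """Bayes' rule: p(R|x) = p(x|R) p(R) / p(x), where
    p(x) = p(x|R) p(R) + p(x|NR) p(NR) and p(NR) = 1 - p(R)."""
    p_NR = 1.0 - p_R
    p_x = p_x_given_R * p_R + p_x_given_NR * p_NR
    if p_x == 0.0:
        raise ValueError("p(x) is zero, so the posterior is undefined")
    return p_x_given_R * p_R / p_x


def expected_cost(p_rel: float, cost_rel: float, cost_nonrel: float) -> float:
    """C * p(R|d) + C' * (1 - p(R|d)) from the retrieval-cost slide."""
    return cost_rel * p_rel + cost_nonrel * (1.0 - p_rel)


def rank(posteriors: Dict[str, float], cost_rel: float = 0.0,
         cost_nonrel: float = 1.0) -> List[str]:
    """Order documents by increasing expected retrieval cost.

    When cost_rel < cost_nonrel the cost is strictly decreasing in p(R|d),
    so this is the same order as decreasing probability of relevance.
    """
    return sorted(posteriors,
                  key=lambda d: expected_cost(posteriors[d], cost_rel, cost_nonrel))


# Hypothetical likelihoods (p(x|R), p(x|NR)) per document and a prior p(R).
likelihoods = {"d1": (0.02, 0.01), "d2": (0.01, 0.03), "d3": (0.05, 0.01)}
p_R = 0.1

posteriors = {d: posterior_relevance(pr, pnr, p_R)
              for d, (pr, pnr) in likelihoods.items()}
ranking = rank(posteriors)
```

Here the default costs (0 for a relevant document, 1 for a non-relevant one) correspond to the 1/0 loss of the simple case, so `ranking` is just the documents in order of decreasing p(R|x). Passing other costs with C < C' leaves the order unchanged, since the expected cost equals C' + (C − C')·p(R|d).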

