CORNELL CS 630 - Lecture 9: End of Probabilistic Retrieval


CS630 Lecture 9: End of Probabilistic Retrieval, Introduction to Relevance Feedback
Lecture by Lillian Lee
Scribed by Randy Au, Nick Gerner, Blazej Kot
February 23, 2006

In this lecture, we finish treating the basic retrieval paradigms and move on to introducing the relevance feedback paradigm.

1 Meta-lessons

During the past several lectures, we have been trying to develop the following skills and important meta-tools for mathematical modeling. These are tools with broad application within and outside of information retrieval.

• Bayes flip: Apply Bayes' Rule when appropriate.
• (Re)-factorizations: Consider independent random variables and decompose into products thereof. This rephrases the problem so as to eliminate terms and appropriately simplify a derivation.
• Insert R.V.'s: Capture the system you are trying to model by inserting (in a mathematically appropriate way) random variables.
• Ground models: Incorporate semantic knowledge and scenarios to yield intuition and take reasonable next steps (sometimes making appropriate assumptions).
• Aesthetic sense: Many choices during a derivation may seem arbitrary, but must be made with care to move toward a desirable result. These choices rely on your intuition and "aesthetic sense".
• Check/challenge preconceptions: Important, interesting, and valuable new results are often achieved by questioning previous assumptions and models (consider pivoted length normalization or the language modeling approach to IR ranking).
• Math formalisms: By formalizing a model mathematically, assumptions and preconceptions are made explicit, making the applicability of a technique clear and suggesting directions for future work.
Also consider taking advantage of previous formalisms (see challenging preconceptions above).

2 Simple re-derivation of a probabilistic scoring function

Recall we have the topic models T_D → D and independent query models T_Q → Q. We can now define R = y to mean that the topic model that generates a document is the same as the topic model that generates the query. Our scoring function (relying on the above intuition for R = y) is:

    P(R = y | D = d, Q = q)

Again, relevance status would seem to be determined by d and q, and so apparently there is no "room" for probabilities. To address this issue (which we have faced before in probabilistic models of information retrieval), we consider the contents of documents rather than the documents themselves. Here, two documents might have been generated by different topic models but have the same content. This gives us a binning intuition, which we used in the RSJ model to yield a random choice. We can rewrite our scoring function as follows:

    \sum_{t, t'} P(R = y, T_Q = t, T_D = t' | D = d, Q = q)

Now we want to move Q to the left-hand side, so we use a Bayes flip:

    \sum_{t, t'} \frac{P(Q = q | R = y, T_Q = t, T_D = t', D = d) \, P(R = y, T_Q = t, T_D = t' | D = d)}{P(Q = q | D = d)}

Note that the term P(R = y, T_Q = t, T_D = t' | D = d) is zero if t ≠ t', since our interpretation of relevance is that these two topic models must be the same (equal the same t). Also, by the independence of the query and the document, the denominator becomes P(Q = q), which is constant across documents, so it can be ignored under rank-equivalence. This now gives:

    \sum_{t} P(Q = q | R = y, T_Q = t, T_D = t, D = d) \, P(R = y, T_Q = t, T_D = t | D = d)

Firstly, notice that R = y is implied in both factors by T_Q = t, T_D = t — that is our very definition of relevance.
Therefore, we can drop this term from both parts of the equation. Secondly, notice that Q = q does not depend on the document choice at all; therefore we can remove T_D = t and D = d from its conditioning. Together, this now gives:

    \sum_{t} P(Q = q | T_Q = t) \, P(T_Q = t, T_D = t | D = d)

Now, notice that in the right-most term T_Q = t is independent of D = d; however, we still have the annoying term T_D = t. From basic probability, the right-most term can be rewritten in a split form:

    P(T_Q = t, T_D = t | D = d) = P(T_Q = t | T_D = t, D = d) \, P(T_D = t | D = d)

By the independence of the query topic model from the document, P(T_Q = t | T_D = t, D = d) = P(T_Q = t). Plugging this back into our scoring function gives:

    \sum_{t} P(Q = q | T_Q = t) \, P(T_Q = t) \, P(T_D = t | D = d)

In practice, we take t*(d) to be an MLE/MAP estimate of the topic model of d and assume that P(T_D = t*(d) | D = d) = 1, removing the need to sum over all possible topic models for d. This gives:

    P(Q = q | T_Q = t*(d)) \, P(T_Q = t*(d))

Note that this model has a problem: it looks for an exact match between the topic model for the document and the topic model for the query. We might address this with some notion of scoring by the distance between the models, to allow for partial matching.

3 Relevance Feedback (prelude)

3.1 Introduction to Relevance Feedback

In our coverage of classic probabilistic information retrieval we had to overcome the issue of missing relevance information: we always needed some way to gather statistics conditioned on the (hidden) value of R. Now we will consider the case where some of this relevance information is available. Vector space models have no a priori method of incorporating this information and must rely on further ad hoc methods. The probabilistic retrieval framework, by contrast, depends on a probabilistic scoring function that can (and does) include R = y explicitly.

3.2 Relevance Feedback Intuition

For a given query, we assume that some relevance-labeled documents are available.
This may seem counter-intuitive (it assumes that the user, rather than the system, identifies relevant documents!). Here are some possible scenarios in which the availability of such information is plausible:

• The user is very interested in recall, e.g. for novelty verification (paper publishing or plagiarism detection).
• Mitigating sample bias: to use a returned set of relevant documents as a random sample, we must account for ranking bias.
• If a system is returning no relevant documents, but some partially relevant documents, feedback can improve results, and so the user might be willing to provide it. This may arise if the system was unable to interpret the query (whether because of user or system error is left to the reader's judgement).

4 Questions

The scoring function derived in Section 2 results in the same function as derived in the previous lecture. Specifically, we can use the same multinomial estimation model for document and query generation. Recall an important parameter of this model: µ. Let's consider how these two derivations differ qualitatively by exploring this parameter. Imagine a world in which we have the following corpus. We've included topic models which can generate the documents; however, notice how we have not been concrete in defining the topic models. They are, for your reference in parameter decisions and to
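As a concrete illustration of the µ-parameterized multinomial estimation model referenced above, here is a minimal sketch of query-likelihood scoring under Dirichlet smoothing — assuming that this is the smoothing scheme the previous lecture's model refers to; the function and variable names are illustrative, not from the lecture.

```python
import math
from collections import Counter

def dirichlet_score(query_terms, doc_terms, corpus_lm, mu=2000.0):
    """Log query likelihood under a multinomial topic model estimated
    from the document and smoothed toward the corpus model with
    Dirichlet parameter mu (illustrative sketch)."""
    tf = Counter(doc_terms)
    dlen = len(doc_terms)
    score = 0.0
    for w in query_terms:
        # Smoothed estimate: (tf(w,d) + mu * P(w|corpus)) / (|d| + mu)
        p = (tf[w] + mu * corpus_lm.get(w, 1e-9)) / (dlen + mu)
        score += math.log(p)
    return score

# Toy corpus; the background model is estimated from pooled term counts.
docs = [["topic", "model", "retrieval"],
        ["relevance", "feedback", "model"]]
all_terms = [w for d in docs for w in d]
corpus_lm = {w: c / len(all_terms) for w, c in Counter(all_terms).items()}

query = ["topic", "model"]
ranked = sorted(range(len(docs)),
                key=lambda i: dirichlet_score(query, docs[i], corpus_lm),
                reverse=True)
```

Qualitatively, larger µ pulls every document model toward the background corpus model, washing out differences between documents, while smaller µ trusts the document's own term counts more.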

