CORNELL CS 630 - Lecture 10: Relevance Feedback Methods

CS630 Lecture 10: Relevance Feedback Methods
Date: February 28th, 2006
Lecturer: Lillian Lee
Scribes: Ari Rabkin and Victoria Krafft

In this lecture, we'll examine the idea of applying relevance feedback (RF) to the three models we've studied so far: classic probabilistic retrieval, the vector space model, and the language model.

For now, let (RF) = explicit binary judgments on each of the top k retrieved documents. This means we don't have to deal with a lack of feedback, or with documents classified as "maybe relevant".

Contents

1 Relevance Feedback in Classic Probabilistic Model
2 Relevance Feedback in the VSM
3 Relevance Feedback for LM-based IR
4 Conclusions
5 Question

1 Relevance Feedback in Classic Probabilistic Model

We'll start with the case of probabilistic retrieval, because this case is the clearest. This is actually not the order of historical development; query expansion RF methods for the VSM are significantly older.

Recall the RSJ "weight" for term v^{(j)}, assuming binary attributes:

    RSJ_j = \frac{P(A_j = 1 \mid R = y)}{P(A_j = 1)} \cdot \frac{P(A_j = 0)}{P(A_j = 0 \mid R = y)}

The parts which depend on R = y use the "positive" feedback from RF (documents judged relevant) to re-estimate the probability that a document is relevant.

What about negative feedback here? We can argue that it's in the model indirectly. Note that in the odds-ratio version of RSJ, the P(A_j = 1) is actually denoted P(A_j = 1 | R = n). In practice, the general collection is used to estimate this value, rather than the collection of not-relevant documents. The argument for this is that the general collection is so much larger, and overwhelmingly not relevant, that it's more meaningful to use it than to use the handful of known-to-be-irrelevant documents.

However, when we display the top k documents, negative feedback is interesting. These are documents our search engine put in the top k, and the fact that they're not relevant seems like it ought to be useful. That said, they don't get used in most RF schemes for probabilistic retrieval.
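The feedback-based re-estimation above can be sketched in code as follows. This is a minimal sketch, not code from the lecture: the +0.5 smoothing is the standard RSJ-style estimate, the helper names are ours, and, as discussed above, P(A_j = 1) is estimated from the general collection in place of P(A_j = 1 | R = n).

```python
import math

def rsj_weight(r_j, R, n_j, N):
    """RSJ term weight re-estimated from relevance feedback.

    r_j : number of judged-relevant documents containing term j
    R   : total number of documents judged relevant
    n_j : number of documents in the whole collection containing term j
    N   : collection size

    Uses +0.5 smoothing so the weight stays finite even when
    r_j is 0 or equals R.
    """
    p = (r_j + 0.5) / (R + 1.0)   # estimate of P(A_j = 1 | R = y)
    q = (n_j + 0.5) / (N + 1.0)   # estimate of P(A_j = 1) from the collection
    return (p / q) * ((1.0 - q) / (1.0 - p))

def rsj_score(doc_terms, query_terms, weights):
    """Document score: product of RSJ_j over terms in both doc and query."""
    score = 1.0
    for t in query_terms & doc_terms:
        score *= weights[t]
    return score
```

Terms seen in many judged-relevant documents get a larger weight, so re-ranking after feedback promotes documents containing them.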
It's possible that we may not normally see significant numbers of documents in the top k which are not relevant, so these don't occur frequently enough to be useful.

The general scheme for RF is as follows:

• Initially: present some set of documents, using the Croft & Harper estimate for P(A_j = 1 | R = y), or using q as the relevant document.
• Get RF.
• Re-rank with the new RF information.
• Repeat until the user throws stuff at us, or goes away, potentially satisfied.

How effective is this in scenarios of interest to us? For example, suppose there is some d = "autos", q = "cars", and only relevant documents contain "autos". Eventually, you want the score for d to be high enough that it is retrieved. What is the score for the document? Recall that the RSJ score for d given query q is

    \prod_{j : q_j > 0,\, a_j(d) > 0} RSJ_j

This only uses terms which appear in both d and q, so we'll never look at "autos". No matter what we do, this query simply isn't going to get us documents whose words are not contained in the query. One solution: add indicative terms to the query. This is called automatic query expansion (abbreviated AQE).

Here's a scheme proposed by Robertson in 1990: rank terms by

    selectscore(v^{(j)}) = RSJ_j \times [P(A_j = 1 \mid R = y) - P(A_j = 1)]

This is known in the literature as the wpq score, for reasons that will become clear in the next paragraph. We add the terms with the highest selectscore to the query, then re-score the documents using the RSJ score, updated to take RF into account. Robertson presents a mathematical derivation of this approach.

Isn't the second term in the selectscore redundant with respect to RSJ_j? Suppose we have two terms, v and v', with corresponding attribute variables A and A' (we avoid superscripts and subscripts for notational clarity). Let

    p = P(A = 1 \mid R = y),  q = P(A = 1)

and likewise

    p' = P(A' = 1 \mid R = y),  q' = P(A' = 1).

Suppose that

    RSJ = \frac{p/(1-p)}{q/(1-q)} < RSJ' = \frac{p'/(1-p')}{q'/(1-q')}    (1)

so that v' is preferred by the RSJ ranking.
Is it possible to simultaneously have the selectscore values satisfy

    RSJ \cdot (p - q) > RSJ' \cdot (p' - q')    (2)

Yes. Equation (1) implies p/q < p'/q', and hence equation (2) requires p − q > p' − q'. This says that the relative difference between the probability of the term appearing in a relevant document and the probability of it appearing in a general document is bigger for v' than for v, but the absolute difference is smaller. For example, consider p = 1/2 and q = 1/4, while p' = 7/100 and q' = 1/100. In this case, the rankings differ because the two measures treat the difference differently: the RSJ weight selects only for the relative difference in term frequency between relevant and not-relevant documents, while selectscore also accounts for the absolute difference.

A large absolute difference "requires" that the term appear frequently, which means selection is biased towards more frequent terms; moreover, small changes in the absolute frequency between relevant and not-relevant documents do not result in words being put into the query.

It turns out that automatic query expansion is significantly older than this; indeed, it was used in VSM systems as long ago as 1971.

2 Relevance Feedback in the VSM

[Rocchio '71, Ide & Salton '71]

Consider the vectors shown in Figure 1. The query vector \vec{q} is closer to \vec{d}', which is not relevant, than to \vec{d}, which is relevant, so the wrong ranking will result (assuming, say, a cosine retrieval function). If we add \vec{d} to \vec{q}, we get a new vector \vec{q}\,' which is closer to the vector for the relevant document d. We can also take \vec{q} + \vec{d} and subtract \vec{d}', giving us

    \vec{q}\,'' = \vec{q} + \vec{d} - \vec{d}'

which is farther from documents which are not relevant, and closer to documents which are relevant. Note that \vec{q}\,'' could have negative components. This would mean that we would actively discourage the system from handing us documents containing the corresponding terms.
This is nice, since it lets you get some of the power of boolean queries, but negative components are commonly zeroed in practice.

The Rocchio formula is

    \vec{q}_{new} = \alpha \vec{q} + \beta \frac{1}{|R|} \sum_{d^{(i)} \in R} \vec{d}^{(i)} - \gamma \frac{1}{|NR|} \sum_{d^{(i)} \in NR} \vec{d}^{(i)}

where R is the set of documents judged relevant, and NR is the set of documents judged not relevant. Taking the average of the [ir-]relevant documents reduces class-size bias effects. Otherwise, if we got a large number of (nearly) identical documents and marked them all as relevant, that might swamp other information (like the original query!). The constants α, β, and γ are free parameters.
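The Rocchio update can be sketched directly from the formula. This is a minimal sketch on plain term-weight vectors; the default values for α, β, and γ are illustrative choices, not values given in the lecture, and negative components are zeroed as noted above.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance-feedback update on term-weight vectors.

    query       : query vector (list of term weights)
    relevant    : list of judged-relevant document vectors (set R)
    nonrelevant : list of judged-not-relevant document vectors (set NR)

    Returns q_new = alpha*q + beta*centroid(R) - gamma*centroid(NR),
    with negative components zeroed out.
    """
    def centroid(docs):
        # Averaging reduces class-size bias: many near-duplicate judged
        # documents count no more than one would.
        if not docs:
            return [0.0] * len(query)
        return [sum(d[i] for d in docs) / len(docs) for i in range(len(query))]

    c_rel, c_non = centroid(relevant), centroid(nonrelevant)
    q_new = [alpha * q_i + beta * r_i - gamma * n_i
             for q_i, r_i, n_i in zip(query, c_rel, c_non)]
    return [max(0.0, w) for w in q_new]
```

With feedback, weight flows toward terms common in the judged-relevant documents and away from terms common in the judged-not-relevant ones, while α keeps the original query from being swamped.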
