CS 630 Notes: Lecture 9
Lecturer: Lillian Lee
Notes by Matt Connolly and Danish Mujeeb
February 23, 2006

1 Meta-Note on Modeling Tools

Thus far in the course we've been using some important tools for simple mathematical modeling. These are worth keeping in your "toolbox" as useful problem-solving techniques:

• Bayes flip
• (re)factorization
• Inserting random variables (as long as it's done appropriately)
• Knowing how to ground models in scenarios
• Maintaining good aesthetic sense
• Checking and challenging preconceptions (which we've seen result in the technique of pivoted normalization and, via a "different" Bayes flip, the language modeling approach)

2 A Basic LM Retrieval Paradigm (Continued)

Recall from last time that we had begun to develop a topic-based language modeling approach in which documents and queries were created from topic models. Thus,

    (document topic models) T_D → D
    (query topic models)    T_Q → Q

(where T_D and T_Q are independent).

In this model, relevance is equivalent to a match between a document's and a query's topic model:

    (R = y) ≡ (T_Q = T_D)

We can score documents with the function P(R = y | D = d, Q = q), since d and q are the values we have available to us. However, once again we have to ask what this probability value means: why is the value of P not simply 0 or 1, since it appears that relevance is fixed if we know the identity of both the query and the document?

Consider the case of two documents d and d' that are copies of each other, but with d generated by t_1 and d' generated by t_2. The two documents contain identical content but were generated by different topic models, so knowing the text does not determine the topic model: we're really looking at content, not the physical document itself. (This could be considered a form of binning.)

Can we develop a useful scoring function from our probability term?

    P(R = y | D = d, Q = q)                                                (1)

(Introduce our new random variables; because a summation is used, all possible document and query topic models must be countable.)

    = Σ_{t,t'} P(R = y, T_Q = t, T_D = t' | D = d, Q = q)                  (2)

(Perform a Bayes flip.)

    = Σ_{t,t'} P(Q = q | R = y, T_Q = t, T_D = t', D = d)
               · P(R = y, T_Q = t, T_D = t' | D = d) / P(Q = q | D = d)    (3)

(Here note that the denominator equals P(Q = q) by independence and does not affect ranking. Moreover, because R = y implies that T_Q = T_D, every summand with t ≠ t' is zero, so the double sum collapses:)

    rank= Σ_t P(Q = q | R = y, T_Q = t, T_D = t, D = d)
              · P(R = y, T_Q = t, T_D = t | D = d)                         (4)

(We can now see that most of the conditioning terms in the first factor do not affect Q = q: given its topic model T_Q = t, the query is generated independently of R, T_D, and D. In the second factor, the event R = y is made redundant by T_Q = T_D = t. Dropping the redundant terms:)

    rank= Σ_t P(Q = q | T_Q = t) · P(T_Q = t, T_D = t | D = d)             (5)

(Furthermore, we want to separate the second term into its two component events:)

    P(T_Q = t, T_D = t | D = d)
        = P(T_Q = t | T_D = t, D = d) · P(T_D = t | D = d)
        = P(T_Q = t) · P(T_D = t | D = d)    [by independence]             (6)

(Note that in lecture the term P(T_Q = t) was mistakenly dropped at this point, but that cannot be done unless P(T_Q = t) = P(T_Q = t') for all t, t'. So we're left with:)

    rank= Σ_t P(Q = q | T_Q = t) · P(T_Q = t) · P(T_D = t | D = d)         (7)

(Assume a function t*(d) giving a [MLE/MAP] estimate of a single "best" model to be inferred from d, where P(T_D = t*(d) | D = d) = 1.)

    = P(Q = q | T_Q = t*(d)) · P(T_Q = t*(d))                              (8)

This is a form that resembles one we've seen before, namely P(Q = q | D = d, R = y) · P(R = y | D = d) in the first language model derivation. The second term, P(T_Q = t*(d)), is a prior on whether T_Q = t*(d). If a uniform prior can be assumed (which requires that the topic set is finite), then the value of P(Q = q | T_Q = t*(d)) can be recovered as our final score.
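To make equation (8) concrete, here is a minimal sketch in Python (not from the lecture): t*(d) is taken to be the add-alpha-smoothed maximum-likelihood unigram model of d, the query is scored by its likelihood under that model, and the prior is left as a pluggable constant. The smoothing constant, the function names, and the uniform prior are illustrative assumptions.

from collections import Counter

def t_star(doc_tokens, vocab, alpha=0.1):
    """One illustrative choice of t*(d): the add-alpha-smoothed MLE
    unigram model of the document (the lecture leaves the estimator
    open as [MLE/MAP])."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def score(query_tokens, doc_tokens, vocab, prior=1.0):
    """Equation (8): P(Q = q | T_Q = t*(d)) * P(T_Q = t*(d)).
    Under a uniform prior over a finite topic set, `prior` is the same
    constant for every document and does not affect the ranking."""
    model = t_star(doc_tokens, vocab)
    p_q = 1.0
    for w in query_tokens:
        p_q *= model.get(w, 0.0)  # query terms treated as i.i.d. draws
    return p_q * prior

# Toy usage: d1 shares the query's topic vocabulary, so it outranks d2.
docs = {"d1": "topic models generate documents".split(),
        "d2": "vector space cosine similarity".split()}
vocab = {w for toks in docs.values() for w in toks}
query = "topic models".split()
print(sorted(docs, key=lambda d: score(query, docs[d], vocab), reverse=True))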
An extension common in practice is to consider the "distance" between the distributions of T_Q and T_D (e.g., KL divergence) as a scoring function.

With the completion of this derivation, we've ended up with the following perspective on query/document modeling for the standard ad hoc retrieval setting: there are two basic models, the Vector Space Model and the probabilistic model. The latter can further be subdivided into two "flavors": "classic" probabilistic modeling and the Language Model approach.

3 Relevance Feedback

To set the stage for next time, consider the following observations about relevance:

• In probabilistic retrieval, we had problems with missing relevance information. For example, in the case of P(A_j | R = y), we'd be set if only we had knowledge about relevance!
• The Vector Space Model makes no explicit mention of relevance; if we had relevance information, we would have to decide what to do with it.
• In the Language Modeling approach, we had relevance explicitly represented at one point but then got rid of it! (Though this depends on which scoring function is used.)

Relevance feedback means that we have relevance-labeled documents for a given query available to us (Ruthven and Lalmas '03). Why would a user be willing to provide this feedback after having seen some relevant documents (presumably satisfying her stated information need)?

• The user might be very interested in recall and want to see all the documents relevant to the query. This could be for the purpose of novelty (or lack thereof) detection: checking extant research for a certain idea, or watching for plagiarism. Another reason might be to mitigate (some) sample bias.
• The user might be browsing the document collection.
• The system might be returning no wholly relevant documents (but some partially relevant ones), and the user really needs the information he asked for.

4 Questions

4.1 Incorporating Relevance Feedback in the VSM

What could be one way of incorporating the user's relevance feedback into a ranking engine that uses the vector space model?

Solution

We propose a possible solution as follows. (Please refer to Figure 1 for a visual representation.)

1. Initially, a document clustering algorithm is employed beforehand to split the corpus into groups of similar documents. In our example, we have cluster A and cluster B.

2. A query q is issued by the user. Let's assume this query is closer to cluster B than to cluster A. However, the one document in cluster A that is closest to q is retrieved in addition to all the documents in cluster B. Moreover, this single retrieved document from cluster A is given a weak ranking, due to its distance from q.

3. The user provides relevance feedback. From the feedback, we find that the sole retrieved document from cluster A is, in fact, very relevant. Hence, more of the documents from cluster A, if not all, should be retrieved in the refined search.

4. In order to incorporate the user's feedback, the search is refined so that the remaining documents of cluster A are also retrieved, or promoted in the ranking (a sketch of one such re-ranking follows below).
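Here is a minimal sketch of this cluster-boosting scheme, assuming cosine similarity over term-count vectors and precomputed cluster assignments; the boost multiplier and all names below are illustrative assumptions, not details from the lecture.

import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[w] * v.get(w, 0) for w in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def rerank_with_feedback(query_vec, doc_vecs, cluster_of, relevant_ids,
                         boost=3.0):
    """Steps 2-4: score documents by cosine similarity to the query,
    then promote every document whose cluster contains a document the
    user marked relevant. `boost` is an illustrative multiplier."""
    relevant_clusters = {cluster_of[d] for d in relevant_ids}
    scores = {}
    for d, vec in doc_vecs.items():
        s = cosine(query_vec, vec)
        if cluster_of[d] in relevant_clusters:
            s *= boost
        scores[d] = s
    return sorted(scores, key=scores.get, reverse=True)

# Toy usage: without feedback, b1 would rank first; once a1 is marked
# relevant, the rest of cluster A is promoted in the refined search.
doc_vecs = {"a1": Counter("apple orchard fruit".split()),
            "a2": Counter("apple pie fruit".split()),
            "b1": Counter("stock market fruit".split())}
cluster_of = {"a1": "A", "a2": "A", "b1": "B"}
query_vec = Counter("fruit market".split())
print(rerank_with_feedback(query_vec, doc_vecs, cluster_of, {"a1"}))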