CORNELL CS 630 - Lecture 15: Preference Implicit Feedback


CS630 Lecture 15: Preference Implicit Feedback
Lecture by Lillian Lee
Scribed by Randy Au, Nick Gerner
March 28, 2006

In this lecture, we outline a preference implicit feedback mechanism described by Joachims in 2002 and by Radlinski and Joachims in 2005. One key insight we've seen before is to use clickthrough data to identify pairwise document-summary preferences for queries. If document summary d(i) is listed before summary d(j) and d(j) was clicked on, we can say that d(j) is preferred to d(i).

1 Insights and Problems

This can be extended to summaries over several queries. Suppose q_k yields a set of document summaries {s_1, s_2, ...} and the user doesn't click on any of them. Then suppose the user reformulates their query to q_{k+1} with summaries {s'_1, s'_2, ..., s'_i, ...} and clicks on s'_i. In general we might like to say that s'_i is preferred to all of the s_j and to all s'_l for l < i. However, we can't guarantee that the user actually considered all of the s_j. Going back to the eye-tracking research we've seen before, we can say that with high probability (in some settings) the user did consider s_1 and s_2. So we say that the user prefers s'_i over s_1 and s_2. This is a highly accurate heuristic (recall the preference results from the eye-tracking study), but it provides very sparse data.

We have two important problems, though. First, while the data is highly accurate, it isn't in the same format as the implicit feedback data we've used before: we only have relative information between documents, not absolute relevance judgments. Second, the relevance feedback we collect this way may not generalize to other queries, since the judgments are with respect to the original query or query chain.

2 Generalizing Preference Implicit Feedback

The idea Radlinski and Joachims present is to group the queries with the documents. Consider a vector space. Traditionally we had the dimensions represent features of the documents.
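The query-chain heuristic above can be sketched in code: the clicked summary s'_i in the reformulated query's results is preferred to the top two (probably examined) summaries of the earlier, click-free result list, and to the summaries ranked above it in the new list. The function and variable names below are illustrative, not from the notes.

```python
# Sketch of the query-chain preference heuristic (names are hypothetical).

def chain_preferences(prev_results, new_results, clicked_rank):
    """Return (preferred, other) summary pairs; clicked_rank is 0-based.

    prev_results: summaries shown for q_k (no clicks)
    new_results:  summaries shown for the reformulation q_{k+1}
    """
    clicked = new_results[clicked_rank]
    # Per the eye-tracking results, assume the user examined only the
    # top two summaries of the earlier list: s'_i > s_1, s_2.
    prefs = [(clicked, s) for s in prev_results[:2]]
    # Within the new list, s'_i > s'_l for l < i.
    prefs += [(clicked, s) for s in new_results[:clicked_rank]]
    return prefs

prefs = chain_preferences(["s1", "s2", "s3"], ["t1", "t2", "t3"], clicked_rank=2)
# prefs == [("t3", "s1"), ("t3", "s2"), ("t3", "t1"), ("t3", "t2")]
```

Note how little data one interaction yields: four preference pairs, which matches the "highly accurate but very sparse" characterization above.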
Now we will consider the dimensions incorporating query features along with document features. Instead of representing a document by a vector in the vector space, we represent the pair (q, d) by the vector φ(q, d). For example, one coordinate of φ(q, d) might be the cosine between the traditional query and document vectors from the VSM. Some other possible coordinate rules follow:

    φ_i(q, d) = 1 if d is the CU homepage and q contains "big red", 0 otherwise

    φ_i(q, d) = 1 if d is a Finnish website, 0 otherwise

    φ_i(q, d) = 1 if d is ranked in the top ten for q according to Google or some other external search engine, 0 otherwise

This model is at least as expressive as the vector space model, since we can represent the VSM's ranking method as a single feature. The new model allows us to incorporate many more query-specific features. But notice that the second example above was also expressible in the original VSM, since it depends only on d.

However, on the one hand, we now have a very high-dimensional space: at least m × n dimensions result just from features of the first kind above (one per query/document pair). On the other hand, the vectors themselves will probably be very sparse.
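Because the vectors are sparse, a natural representation is a map from feature names to nonzero values. The sketch below implements indicator features in the spirit of the examples above; the toy cosine, the feature names, and the document representation (a dict with "url" and "text") are all assumptions for illustration.

```python
import math

# Illustrative sparse joint feature map φ(q, d). Feature names, the toy
# set-based cosine, and the document format are hypothetical choices.

def cosine(q_text, d_text):
    # Toy cosine between the query's and document's term sets.
    q_set, d_set = set(q_text.split()), set(d_text.split())
    if not q_set or not d_set:
        return 0.0
    return len(q_set & d_set) / math.sqrt(len(q_set) * len(d_set))

def phi(q, d):
    # Only nonzero coordinates are stored, since φ(q, d) is very sparse.
    feats = {"cosine(q,d)": cosine(q, d["text"])}
    if d["url"] == "www.cornell.edu" and "big red" in q:
        feats["cu_homepage_and_big_red"] = 1
    if d["url"].endswith(".fi"):
        feats["finnish_website"] = 1
    return feats

feats = phi("big red bears", {"url": "www.cornell.edu", "text": "big red campus"})
# The indicator fires and the cosine coordinate is 2/3.
```

Features indexed by (query, document) pairs, like the "big red" example, are what produce the m × n dimension count, while the dict stores only the handful that are nonzero for any given pair.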
3 Training

Given that we have a model for representing queries and documents, we need a way to represent our preference implicit feedback in this model. Specifically, suppose we know that with respect to q, d is preferred to d′, and that with respect to q̂, d′ is preferred to d. Our goal is to produce a vector w such that this preference information is encoded in the lengths of the projections onto w of the different query/document vectors. For our example, we want w to preserve

    w · φ(q, d) > w · φ(q, d′)    and    w · φ(q̂, d′) > w · φ(q̂, d).

[Figure: the vectors φ(q, d), φ(q, d′), φ(q̂, d), φ(q̂, d′) and their projections onto w.]

In general, satisfying all such constraints exactly is an NP-complete constraint satisfaction problem. However, in practice we try to minimize the constraint violations (the support vector machine approach).

To compute a ranking for a new query q̂̂, we compute φ(q̂̂, d) for all d and rank the documents in descending order of w · φ(q̂̂, d).

4 Exercise

Recall that our major problem with clickthrough data was that it is not the same sort of data as the relevance data we had access to in other feedback schemes. However, one new piece of information we felt we could rely on was preference data from clicks, provided we had a heuristic to determine what a user had probably looked at but didn't click on; and so a new scheme was required to encode the relative positions of query/document pairs in a high-dimensional vector space.

Now consider this search situation. A user has a difficult search task that stretches their knowledge and ability to articulate their information need. It is reasonable to assume that the user makes a "best effort" attempt, providing the best query they are able to produce.

In such a situation, the user's ability to judge the relevance of a document from a summary is obviously limited, which partly explains why there is very low correlation between clickthroughs and relevance data. Another potential explanation is that summaries are not representative enough of the underlying documents.

1) Now consider the relationship between the documents the user clicked and supposedly browsed through, judged inadequate, and then reformulated their query around. What has our class discussion stated about the relationship between documents and query reformulations, a priori?

- Our class discussion makes no claim whatsoever about the relationship between the two other than assuming that they're within the same search context.
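Each preference "d over d′ for q" is equivalent to a constraint on a difference vector: w · (φ(q, d) − φ(q, d′)) > 0. The notes call for an SVM that minimizes constraint violations; as a simplification, the sketch below uses a perceptron-style update on the difference vectors, with small dense toy vectors standing in for the sparse φ's. All names and data here are made up for illustration.

```python
# Sketch of learning w from pairwise constraints and ranking with it.
# A perceptron update substitutes for the SVM of the notes; the toy
# feature vectors are hypothetical.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train(pairs, dim, epochs=50):
    """pairs: list of (phi_preferred, phi_other) for known preferences."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0:  # constraint w·φ(better) > w·φ(worse) violated
                w = [wi + di for wi, di in zip(w, diff)]
    return w

def rank(w, candidates):
    """Rank (doc, phi) candidates for a new query by descending w·φ."""
    return sorted(candidates, key=lambda c: dot(w, c[1]), reverse=True)

# Two toy preferences, e.g. d over d′ with respect to one query, and a
# second constraint from another query.
pairs = [([1.0, 0.0], [0.0, 1.0]),
         ([0.8, 0.1], [0.2, 0.6])]
w = train(pairs, dim=2)
ordered = rank(w, [("d'", [0.0, 1.0]), ("d", [1.0, 0.0])])
# "d" is ranked first, consistent with the training preferences.
```

The reduction to difference vectors is the key point: once preferences become signed vectors, any linear classifier (the SVM in the notes, the perceptron here) can search for a w that violates as few constraints as possible.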
It instead immediately starts analyzing the user's click behavior.

2) Now, taking into account our assumption that users could do no better with their initial query, but that after their initial search they reformulate their query into a supposedly "better" one, how could this come about?

- In such a situation, it would seem that the only source of inspiration for a better query would be the documents that were supposedly looked at. Users either learn terms that would potentially make a better query, or learn terms to exclude in order to filter out irrelevant documents. We would therefore have some material link between the query reformulation and the documents/summaries that the user has clicked on.

3) And finally, how can we integrate this scheme into our extended vector space?

- This link between the new query and the documents suggests a possible φ(q, d) for the new vector space, since we can incorporate a condition such as
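As one purely hypothetical illustration, in the spirit of the indicator features from Section 2 and not necessarily the condition the scribes intended, the material link could become an indicator that fires when d was clicked earlier in the same query chain:

```python
# Hypothetical indicator feature linking a reformulated query q to the
# documents clicked earlier in its query chain. `chain_clicks` maps each
# query in the chain to the set of documents clicked while it was shown.
# The whole scheme is an illustration, not from the original notes.

def phi_chain_click(q, d, chain_clicks):
    """1 if d was clicked for some earlier query in q's chain, else 0."""
    earlier = (docs for prev_q, docs in chain_clicks.items() if prev_q != q)
    return 1 if any(d in docs for docs in earlier) else 0

chain_clicks = {"cornell mascot": {"bigred.example.org"},
                "big red bear cornell": set()}
phi_chain_click("big red bear cornell", "bigred.example.org", chain_clicks)  # 1
```

A feature of this shape ties the reformulated query to the documents that plausibly inspired it, which is exactly the kind of query-specific coordinate the extended vector space was designed to admit.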

