CS276B Web Search and Mining
Winter 2005, Lecture 6

Outline
- Recap: Recommendation Systems
- Implementation
- Plan for Today
- Extensions
- Privacy
- Utility formulation of RS
- User types
- Intuitive picture (exaggerated)
- Matrix reconstruction
- Matrix reconstruction: Achlioptas/McSherry
- How do we reconstruct U from Û?
- Achlioptas/McSherry theorem
- Norms of matrices
- What is the SVD doing?
- Iterating to get other user types
- Achlioptas/McSherry again
- Probabilistic Model-based RS
- McLaughlin & Herlocker 2004
- Novelty versus Trust
- Common Prediction Accuracy Metric
- Results from SIGIR 2004 Paper
- Resources

Recap: Recommendation Systems
- What they are and what they do
- A couple of algorithms
  - Classical Collaborative Filtering (CF): nearest-neighbor-based approaches
  - Going beyond simple behavior: context
- How do you measure their quality?

Implementation
- We worked in terms of matrices, but we don't really want to maintain this gigantic (and sparse) vector space
  - Dimension reduction
  - Fast nearest neighbors
- Incremental versions
  - Update as new transactions arrive
  - Typically done in batch mode
  - Incremental dimension reduction, etc.

Plan for Today
- Issues related to last time
  - Extensions
  - Privacy
- Model-based RS approaches: learn a model from the database, and make predictions from the model rather than iterating over users each time
  - Utility formulation: matrix reconstruction for low-rank matrices
  - Model-based probabilistic formulations
- Evaluation and a modified NN formulation

Extensions
- Amazon's "Why was I recommended this?": see where the "evidence" came from
- Clickstreams: do sequences matter?
  - HMMs (next IE lecture) can be used to infer user type from a browse sequence, e.g., how likely is the user to make a purchase?
  - Meager improvement from using the sequence relative to looking only at the last page

Privacy
- What information does a recommendation leak? E.g., you're looking for illicit content, and the system reveals me as an expert
- What about compositions of recommendations?
  - "These films are popular among your colleagues"
  - "People who bought this book in your dept also bought ..."
- "Aggregates" are not good enough
- Poorly understood

Utility formulation of RS
- Microeconomic view: assume that each user has a real-valued utility for each item
- An m × n matrix U of utilities for each of m users for each of n items; not all utilities are known in advance
- Predict which (unseen) utilities are highest for each user

User types
- If users are arbitrary, all bets are off
- Typically, assume the matrix U is of low rank, say a constant k independent of m and n; some perturbation is allowable
- I.e., users (almost) belong to k well-separated types: most users' utility vectors are close to one of k well-separated vectors

Intuitive picture (exaggerated)
- [Figure: the users × items matrix, with rows grouped into Type 1, Type 2, ..., Type k, plus a few atypical users]

Matrix reconstruction
- Given some utilities from the matrix, reconstruct the missing entries
- It suffices to predict the biggest missing entries for each user
- In fact, it suffices to predict (close to) the biggest, for most users (not the atypical ones)

Intuitive picture
- [Figure: the same users × items matrix, with sampled entries scattered across the types and the atypical users]

Matrix reconstruction: Achlioptas/McSherry
- Let Û be obtained from U by the following sampling: for each i, j,
  Ûij = Uij with probability 1/s, and Ûij = 0 with probability 1 − 1/s
- The sampling parameter s must satisfy some technical conditions, but think of it as a constant like 100
- Interpretation: Û is the sample of user utilities that we've managed to get our hands on, from past transactions (that's a lot of samples)

How do we reconstruct U from Û?
- First the "succinct" way, then the (equivalent) intuition
- Find the best rank-k approximation to sÛ, using the SVD (best by what measure?)
- Call this Ûk, and output Ûk as the reconstruction of U
- Pick off the top entries of each row as recommendations, etc.

Achlioptas/McSherry theorem
- With high probability, the reconstruction error is small (see the paper for the detailed statement)
- What's "high probability"? It is over the samples, not over the matrix entries
- What's "error", and how do you measure it?

Norms of matrices
- The Frobenius norm of a matrix M: |M|F² = sum of the squares of the entries of M
- Let Mk be the rank-k approximation computed by the SVD
- Then for any other rank-k matrix X, we know |M − Mk|F ≤ |M − X|F
- Thus the SVD gives the best rank-k approximation for each k

Norms of matrices, contd.
- The L2 norm is defined as |M|2 = max |Mx|, taken over all unit vectors x
- Then for any other rank-k matrix X, we know |M − Mk|2 ≤ |M − X|2
- Thus the SVD also gives the best rank-k approximation under the L2 norm
- What is it doing in the process? We will avoid the language of eigenvectors and eigenvalues

What is the SVD doing?
- Consider the vector v defining the L2 norm of U: |U|2 = |Uv|
- Then v measures the "dominant direction" amongst the rows of U (i.e., the users)
- The ith coordinate of Uv is the projection of the ith user onto v
- |U|2 = |Uv| captures the users' tendency to align with v

What is the SVD doing, contd.
- U1 (the rank-1 approximation to U) is given by UvvT
- If all rows of U are collinear, i.e., rank(U) = 1, then U = U1: the error of approximating U by U1 is zero
- In general, of course, there are user types not captured by v left over in the residual matrix U − U1 (Type 2, ..., Type k, and the atypical users)

Iterating to get other user types
- Now repeat the above process with the residual matrix U − U1: find the dominant user type in U − U1, etc.
- This gives us a second user type, and so on
- Iterating, we get successive approximations U2, U3, ..., Uk

Achlioptas/McSherry again
- SVD of Û, the uniformly sampled version of U: find the rank-k SVD of Û
- The result Ûk is close to the best rank-k approximation to U
- Is it reasonable to sample uniformly? Probably not: e.g., we're unlikely to know much about your fragrance preferences if you're a sports fan

Probabilistic Model-based RS
- Breese et al., UAI 1998: similar to Achlioptas/McSherry, but probabilistic
  - Assume a latent set of k classes, never observed
  - These generate the observed votes as a Naïve Bayes model (recall cs276a)
  - Learn the best model using the EM algorithm
- Bayesian network model: learn probabilistic decision trees for predicting liking for each item based on liking for other items
- They concluded that in many (but not all!) circumstances, the Bayesian DT model works best

McLaughlin & Herlocker 2004
- Argues that current well-known algorithms give a poor user experience
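The deflation idea from the "What is the SVD doing?" slides, finding the dominant direction v, subtracting the rank-1 piece UvvT, and repeating on the residual, can also be demonstrated directly. This is a sketch under assumed toy data (two hand-picked user types); the power-iteration helper is a standard way to find the L2-norm direction without eigenvector language, and is not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small utility matrix with two clear user types (rows are users).
type_a = np.array([5.0, 4.0, 1.0, 1.0])
type_b = np.array([1.0, 1.0, 4.0, 5.0])
U = np.vstack([type_a] * 6 + [type_b] * 6) + rng.normal(0, 0.01, (12, 4))

def dominant_direction(M, iters=200):
    """Power iteration on M^T M: returns a unit v maximizing |Mv| (the L2-norm direction)."""
    v = rng.normal(size=M.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = M.T @ (M @ v)
        v /= np.linalg.norm(v)
    return v

# Rank-1 step plus deflation: add (residual) v v^T, then recurse on the residual.
approx = np.zeros_like(U)
residual = U.copy()
for _ in range(2):                       # two user types -> two iterations
    v = dominant_direction(residual)
    step = residual @ np.outer(v, v)     # rank-1 approximation of the residual
    approx += step
    residual -= step

# The iterated deflation reproduces the rank-2 SVD truncation of U.
W, sig, Vt = np.linalg.svd(U, full_matrices=False)
U2 = (W[:, :2] * sig[:2]) @ Vt[:2]
print(np.allclose(approx, U2, atol=1e-6))
```

Each pass peels off one user-type direction, so after k passes the accumulated approximation equals the best rank-k approximation Uk, matching the "Iterating to get other user types" slide.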