CS276B Web Search and Mining
Winter 2005, Lecture 6

Outline
- Recap: Recommendation Systems
- Implementation
- Plan for Today
- Extensions
- Privacy
- Utility formulation of RS
- User types
- Intuitive picture (exaggerated)
- Matrix reconstruction
- Matrix reconstruction: Achlioptas/McSherry
- How do we reconstruct U from Û?
- Achlioptas/McSherry theorem
- Norms of matrices
- What is the SVD doing?
- Iterating to get other user types
- Achlioptas/McSherry again
- Probabilistic Model-based RS
- McLaughlin & Herlocker 2004
- Novelty versus Trust
- Common Prediction Accuracy Metric
- Results from SIGIR 2004 Paper
- Resources

Recap: Recommendation Systems
- What they are and what they do
- A couple of algorithms
  - Classical Collaborative Filtering (CF): nearest-neighbor-based approaches
  - Going beyond simple behavior: context
- How do you measure their quality?

Implementation
- We worked in terms of matrices, but we don't really want to maintain this gigantic (and sparse) vector space
  - Dimension reduction
  - Fast nearest neighbors
- Incremental versions
  - Update as new transactions arrive
  - Typically done in batch mode
  - Incremental dimension reduction, etc.

Plan for Today
- Issues related to last time
  - Extensions
  - Privacy
- Model-based RS approaches: learn a model from the database, and make predictions from the model rather than iterating over users each time
  - Utility formulation: matrix reconstruction for low-rank matrices
  - Model-based probabilistic formulations
- Evaluation and a modified NN formulation

Extensions
- Amazon's "Why was I recommended this?": see where the "evidence" came from
- Clickstreams: do sequences matter?
  - HMMs (next IE lecture) can be used to infer user type from a browse sequence, e.g., how likely is the user to make a purchase?
  - Meager improvement from using the sequence relative to looking only at the last page

Privacy
- What information does a recommendation leak? E.g., you're looking for illicit content, and the system reveals me as an expert
- What about compositions of recommendations?
  - "These films are popular among your colleagues"
  - "People who bought this book in your dept also bought ..."
- "Aggregates" are not good enough
- Poorly understood

Utility formulation of RS
- Microeconomic view: assume that each user has a real-valued utility for each item
- An m × n matrix U of utilities for each of m users for each of n items; not all utilities are known in advance
- Predict which (unseen) utilities are highest for each user

User types
- If users are arbitrary, all bets are off
- Typically, assume the matrix U is of low rank, say a constant k independent of m and n; some perturbation is allowable
- I.e., users (almost) belong to k well-separated types: most users' utility vectors are close to one of k well-separated vectors

Intuitive picture (exaggerated)
- [Figure: the users × items matrix, with rows grouped into Type 1, Type 2, ..., Type k, plus a few atypical users]

Matrix reconstruction
- Given some utilities from the matrix, reconstruct the missing entries
- It suffices to predict the biggest missing entries for each user
- In fact, it suffices to predict (close to) the biggest, for most users (not the atypical ones)

Intuitive picture
- [Figure: the same users × items matrix, with sampled entries scattered across the types and the atypical users]

Matrix reconstruction: Achlioptas/McSherry
- Let Û be obtained from U by the following sampling: for each i, j,
  Ûij = Uij with probability 1/s, and Ûij = 0 with probability 1 − 1/s
- The sampling parameter s must satisfy some technical conditions, but think of it as a constant like 100
- Interpretation: Û is the sample of user utilities that we've managed to get our hands on, from past transactions (that's a lot of samples)

How do we reconstruct U from Û?
- First the "succinct" way, then the (equivalent) intuition
- Find the best rank-k approximation to sÛ, using the SVD (best by what measure?)
- Call this Ûk, and output Ûk as the reconstruction of U
- Pick off the top entries of each row as recommendations, etc.

Achlioptas/McSherry theorem
- With high probability, the reconstruction error is small (see the paper for the detailed statement)
- What's "high probability"? It is over the samples, not over the matrix entries
- What's "error", and how do you measure it?

Norms of matrices
- The Frobenius norm of a matrix M: |M|F² = sum of the squares of the entries of M
- Let Mk be the rank-k approximation computed by the SVD
- Then for any other rank-k matrix X, we know |M − Mk|F ≤ |M − X|F
- Thus the SVD gives the best rank-k approximation for each k

Norms of matrices, contd.
- The L2 norm is defined as |M|2 = max |Mx|, taken over all unit vectors x
- Then for any other rank-k matrix X, we know |M − Mk|2 ≤ |M − X|2
- Thus the SVD also gives the best rank-k approximation under the L2 norm
- What is it doing in the process? We will avoid the language of eigenvectors and eigenvalues

What is the SVD doing?
- Consider the vector v defining the L2 norm of U: |U|2 = |Uv|
- Then v measures the "dominant direction" amongst the rows of U (i.e., the users)
- The ith coordinate of Uv is the projection of the ith user onto v
- |U|2 = |Uv| captures the users' tendency to align with v

What is the SVD doing, contd.
- U1 (the rank-1 approximation to U) is given by UvvT
- If all rows of U are collinear, i.e., rank(U) = 1, then U = U1: the error of approximating U by U1 is zero
- In general, of course, there are user types not captured by v left over in the residual matrix U − U1 (Type 2, ..., Type k, and the atypical users)

Iterating to get other user types
- Now repeat the above process with the residual matrix U − U1: find the dominant user type in U − U1, etc.
- This gives us a second user type, and so on
- Iterating, we get successive approximations U2, U3, ..., Uk

Achlioptas/McSherry again
- SVD of Û, the uniformly sampled version of U: find the rank-k SVD of Û
- The result Ûk is close to the best rank-k approximation to U
- Is it reasonable to sample uniformly? Probably not: e.g., we're unlikely to know much about your fragrance preferences if you're a sports fan

Probabilistic Model-based RS
- Breese et al., UAI 1998: similar to Achlioptas/McSherry, but probabilistic
  - Assume a latent set of k classes, never observed
  - These generate the observed votes as a Naïve Bayes model (recall cs276a)
  - Learn the best model using the EM algorithm
- Bayesian network model: learn probabilistic decision trees for predicting liking for each item based on liking for other items
- They concluded that in many (but not all!) circumstances, the Bayesian DT model works best

McLaughlin & Herlocker 2004
- Argues that current well-known algorithms give a poor user experience
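The deflation idea from the "What is the SVD doing?" slides, finding the dominant direction v, subtracting the rank-1 piece UvvT, and repeating on the residual, can also be demonstrated directly. This is a sketch under assumed toy data (two hand-picked user types); the power-iteration helper is a standard way to find the L2-norm direction without eigenvector language, and is not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small utility matrix with two clear user types (rows are users).
type_a = np.array([5.0, 4.0, 1.0, 1.0])
type_b = np.array([1.0, 1.0, 4.0, 5.0])
U = np.vstack([type_a] * 6 + [type_b] * 6) + rng.normal(0, 0.01, (12, 4))

def dominant_direction(M, iters=200):
    """Power iteration on M^T M: returns a unit v maximizing |Mv| (the L2-norm direction)."""
    v = rng.normal(size=M.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = M.T @ (M @ v)
        v /= np.linalg.norm(v)
    return v

# Rank-1 step plus deflation: add (residual) v v^T, then recurse on the residual.
approx = np.zeros_like(U)
residual = U.copy()
for _ in range(2):                       # two user types -> two iterations
    v = dominant_direction(residual)
    step = residual @ np.outer(v, v)     # rank-1 approximation of the residual
    approx += step
    residual -= step

# The iterated deflation reproduces the rank-2 SVD truncation of U.
W, sig, Vt = np.linalg.svd(U, full_matrices=False)
U2 = (W[:, :2] * sig[:2]) @ Vt[:2]
print(np.allclose(approx, U2, atol=1e-6))
```

Each pass peels off one user-type direction, so after k passes the accumulated approximation equals the best rank-k approximation Uk, matching the "Iterating to get other user types" slide.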