1CS276BWeb Search and MiningWinter 2005Lecture 5(includes slides borrowed from Jon Herlocker)Recap: Project and PracticumWe hope you’ve been thinking about projects!Revised concrete project plan due todayInitial project presentation: Thursday and TuesdayAbout 10 minutes per groupAbout 5 minutes presentations and a few minutes discussionA chance to explain and focus what you are doing and why it’s interestingPlan for TodayRecommendation Systems (RS)The most prominent type of which goes under the name Collaborative Filtering (CF)What are they are and what do they do?A couple of algorithmsGoing beyond simple behavior: contextHow do you measure them?Recommendation SystemsGiven a set of users and itemsItems could be documents, products, other users …Recommend items to a user based onPast behavior of this and other usersWho has viewed/bought/liked what?Additional information on users and itemsBoth users and items can have known attributes [age, genre, price, …]What do RSs achieve?Help people make decisionsExamples: Where to spend attentionWhere to spend moneyHelp maintain awarenessExamples:New productsNew informationSample ApplicationsEcommerceProduct recommendations - amazonCorporate IntranetsRecommendation, finding domain experts, …Digital LibrariesFinding pages/books people will likeMedical ApplicationsMatching patients to doctors, clinical trials, …Customer Relationship ManagementMatching customer problems to internal experts2Well-known recommender systems: Amazon and NetflixCorporate intranets - document recommendationCorporate intranets - “expert”findingInputs to intranet systemBehaviorusers’ historical “transactions”Contextwhat the user appears to be doing nowUser/domain attributesadditional info about users, documents …Inputs - more detailPast transactions from users:which docs viewedcontent/attributes of documentswhich products purchasedpages bookmarkedexplicit ratings (movies, books … )Current context:browsing historysearch(es) issuedExplicit role/domain info:Role in an enterpriseDocument taxonomiesInterest profilesExample - behavior onlyUsers Docs viewedU1U2d1d2d3U1 viewed d1, d2, d3.U2 views d1, d2.Recommend d3 to U2.?3Expert finding - simple exampleRecommend U1 to U2 as someone to talk to?U1U2d1d2d3Simplest Algorithm: Naïve k Nearest NeighborsU viewed d1, d2, d5.Look at who else viewed d1, d2 or d5.Recommend to U the doc(s) most “popular” among these users.UVWd1d2d5Simple algorithm - shortcomingTreats all other users as equally importantIgnores the fact that some users behaved more like me in the pastTypical RS issuesLarge item space Usually with item attributesLarge user baseUsually with user attributes (age, gender, city, …)Some evidence of customer preferencesExplicit ratings (powerful, but harder to elicit)Observations of user activity (purchases, page views, emails, what was printed, …)Typically extremely sparse, even when user has an opinionUsers ItemsObserved preferencesThe RS SpaceItem-ItemLinksUser-UserLinksLinks derived from similar attributes, similar content, explicit cross referencesLinks derived from similar attributes, explicit connections(Ratings, purchases, page views, laundry lists, play lists)DefinitionsA recommendation system is any system which provides a recommendation/prediction/opinion to a user on itemsRule-based systems use manual rules to do thisAn item similarity/clustering system uses item links to recommend items like ones you likeA classic collaborative filtering system uses the links between users and items as the basis of recommendationsCommonly one has hybrid systems which use all three kinds of links in the previous picture4Link typesUser attributes-based RecommendationMale, 18-35: Recommend The MatrixContent SimilarityYou liked The Matrix: recommend The Matrix ReloadedCollaborative FilteringPeople with interests like yours also liked Kill BillRule-based recommendationsIn practice – rule-based systems are common in commerce enginesMerchandizing interfaces allow product managers to promote itemsCriteria include inventory, margins, etc.Must reconcile these with algorithmic recommendationsMeasuring collaborative filteringHow good are the predictions?How much of previous opinion do we need?Computation.How do we motivate people to offer their opinions?Matrix viewA = UsersDocsAij = 1 if user i viewed doc j,= 0 otherwise.AAt : Entries give # of docs commonly viewed by pairs of users.Voting AlgorithmRow i of AAt: Vector whose jthentry is the # of docs viewed by both i and j.Call this row ri’e.g., (0, 7, 1, 13, 0, 2, ….)What’s on the diagonal of AAt?Voting AlgorithmThen ri °A is a vector whose kthentry gives a weighted vote count to doc kemphasizes users who have high weights in ri.Recommend doc(s) with highest vote counts.How does this differ fromthe simple algorithm?riA5Voting Algorithm - implementation issuesWouldn’t implement using matrix operationsuse weight-propagation on compressed adjacency listsNeed to log and maintain “user views doc”relationship. typically, log into databaseupdate vote-propagating structures periodically.For efficiency, discard all but the heaviest weights in each rionly in fast structures, not in back-end database.WritepseudocodeForward pointerThere are connections between CF and web link analysis:The voting algorithm may be viewed as oneiteration of the Hubs/Authorities algorithm Different setting/algorithmEach user i rates some docs (products, … )say a real-valued rating vikfor doc kin practice, one of several ratings on a formThus we have a ratings vector vi for each user(with lots of zeros)Compute a correlation coefficient between every pair of users i,jdot product of their ratings vectors (symmetric, scalar) measure of how much user pair i,j agrees: wijPredict user i’s utility for doc kSum (over users j such that vjkis non-zero) wijvjkOutput this as the predicted utility for user i on doc k.So how does this differfrom the voting algorithm?It really doesn’t …Same algorithm, different scenarioImplicit (user views doc) vs. Explicit (user assigns rating to doc)Boolean vs. real-valued utilityIn practice,
View Full Document