Contents

- High Dimensional Data
- Example: Recommender Systems
- Recommendations
- From Scarcity to Abundance
- Sidenote: The Long Tail
- Physical vs. Online
- Types of Recommendations
- Formal Model
- Utility Matrix
- Key Problems
- (1) Gathering Ratings
- (2) Extrapolating Utilities
- Content-based Recommendations
- Plan of Action
- Item Profiles
- Sidenote: TF-IDF
- User Profiles and Prediction
- Pros: Content-based Approach
- Cons: Content-based Approach
- Collaborative Filtering
- Finding “Similar” Users
- Similarity Metric
- Rating Predictions
- Item-Item Collaborative Filtering
- Item-Item CF (|N|=2)
- CF: Common Practice
- Item-Item vs. User-User
- Pros/Cons of Collaborative Filtering
- Hybrid Methods
- Remarks & Practical Tips
- Evaluation
- Evaluating Predictions
- Problems with Error Measures
- Collaborative Filtering: Complexity
- Tip: Add Data

Recommender Systems: Content-based Systems & Collaborative Filtering
Mining of Massive Datasets
Jure Leskovec, Anand Rajaraman, Jeff Ullman
Stanford University
http://www.mmds.org

Note to other teachers and users of these slides: We would be delighted if you found our material useful for your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org

High Dimensional Data
- High dim. data: Locality sensitive hashing; Clustering; Dimensionality reduction
- Graph data: PageRank, SimRank; Community Detection; Spam Detection
- Infinite data: Filtering data streams; Web advertising; Queries on streams
- Machine learning: SVM; Decision Trees; Perceptron, kNN
- Apps: Recommender systems; Association Rules; Duplicate document detection

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

Example: Recommender Systems

- Customer X buys a Metallica CD, then buys a Megadeth CD
- Customer Y does a search on Metallica
- The recommender system suggests Megadeth to Customer Y from data collected about Customer X

Recommendations

[Diagram: a large set of items reaches users through two channels, search and recommendations]
- Examples: products, web sites, blogs, news items, …

From Scarcity to Abundance

- Shelf space is a scarce commodity for traditional retailers (also: TV networks, movie theaters, …)
- The Web enables near-zero-cost dissemination of information about products: from scarcity to abundance
- More choice necessitates better filters: recommendation engines
- How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html

Sidenote: The Long Tail

[Figure: the long-tail plot of product popularity. Source: Chris Anderson (2004)]

Physical vs. Online
- Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!

Types of Recommendations

- Editorial and hand curated: lists of favorites; lists of “essential” items
- Simple aggregates: Top 10, Most Popular, Recent Uploads
- Tailored to individual users: Amazon, Netflix, …

Formal Model

- X = set of Customers
- S = set of Items
- Utility function u: X × S → R
- R = set of ratings; R is a totally ordered set (e.g., 0-5 stars, or a real number in [0, 1])

Utility Matrix

         Avatar   LOTR   Matrix   Pirates
Alice      1               0.2
Bob                0.5               0.3
Carol     0.2                        1
David                      0.4

Key Problems

- (1) Gathering “known” ratings for the matrix: how to collect the data in the utility matrix
- (2) Extrapolating unknown ratings from the known ones: we are mainly interested in high unknown ratings; we are not interested in knowing what you don’t like but what you like
- (3) Evaluating extrapolation methods: how to measure the success/performance of recommendation methods

(1) Gathering Ratings

- Explicit: ask people to rate items. This doesn’t work well in practice: people can’t be bothered
- Implicit: learn ratings from user actions, e.g., a purchase implies a high rating. What about low ratings?

(2) Extrapolating Utilities

- Key problem: the utility matrix U is sparse; most people have not rated most items
- Cold start: new items have no ratings, and new users have no history
- Three approaches to recommender systems:
  1) Content-based
  2) Collaborative
  3) Latent factor based
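The formal model above (a sparse utility function u: X × S → R) can be sketched in a few lines of Python. The entries below mirror a few ratings from the utility-matrix example; the representation as a dict is an illustrative choice, not part of the slides.

```python
# Minimal sketch of the formal model: the utility function u: X × S -> R
# stored sparsely as a dict keyed by (customer, item). Only a few
# illustrative entries are filled in; most pairs are unrated.
utility = {
    ("Alice", "Avatar"): 1.0,
    ("Alice", "Matrix"): 0.2,
    ("Bob", "LOTR"): 0.5,
}

customers = {"Alice", "Bob", "Carol"}
items = {"Avatar", "LOTR", "Matrix", "Pirates"}

def u(x, s):
    """Known rating of item s by customer x, or None if unrated."""
    return utility.get((x, s))

# Sparsity in action: only 3 of the 12 (customer, item) pairs are known.
known = sum(1 for x in customers for s in items if u(x, s) is not None)
sparsity = known / (len(customers) * len(items))
print(sparsity)  # 0.25
```

The extrapolation problem is exactly filling in the `None` entries, and mainly the high ones.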
Today: Content-based Recommender Systems

Content-based Recommendations

- Main idea: recommend to customer x items similar to previous items rated highly by x
- Example: movie recommendations: recommend movies with the same actor(s), director, genre, …
- Example: websites, blogs, news: recommend other sites with “similar” content

Plan of Action

[Diagram: from the items a user likes, build item profiles; from those, infer a user profile; match the user profile against other items to recommend]

Item Profiles

- For each item, create an item profile
- A profile is a set (vector) of features
  - Movies: author, title, actor, director, …
  - Text: the set of “important” words in the document
- How to pick important features? The usual heuristic from text mining is TF-IDF (Term Frequency × Inverse Document Frequency)
  - Term … Feature; Document … Item

Sidenote: TF-IDF

- f_ij = frequency of term (feature) i in doc (item) j
- TF_ij = f_ij / max_k f_kj
- n_i = number of docs that mention term i
- N = total number of docs
- IDF_i = log(N / n_i)
- TF-IDF score: w_ij = TF_ij × IDF_i
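The TF-IDF heuristic above can be sketched as follows, assuming the standard normalization TF_ij = f_ij / max_k f_kj and IDF_i = log(N / n_i). The toy documents are invented for illustration, and the log base is a convention (natural log here).

```python
import math

# Toy corpus: each "document" is a list of its terms (invented for
# illustration; in the slides, documents are items and terms are features).
docs = {
    "d1": ["the", "matrix", "sequel", "the", "matrix"],
    "d2": ["the", "pirates", "sequel"],
    "d3": ["the", "matrix"],
}

N = len(docs)  # total number of docs

# n_i: number of docs that mention term i
n = {}
for words in docs.values():
    for term in set(words):
        n[term] = n.get(term, 0) + 1

def tf_idf(term, doc_id):
    """w_ij = TF_ij * IDF_i for term i in doc j."""
    words = docs[doc_id]
    f_ij = words.count(term)
    max_f = max(words.count(w) for w in set(words))  # most frequent term in j
    tf = f_ij / max_f
    idf = math.log(N / n[term])
    return tf * idf

# "the" appears in every doc, so IDF = log(3/3) = 0 and its weight vanishes:
print(tf_idf("the", "d1"))  # 0.0
# "pirates" appears in only one doc, so it gets a high weight in d2:
print(tf_idf("pirates", "d2"))
```

This is why TF-IDF picks out “important” words for an item profile: terms that occur everywhere (like stop words) score zero, while terms frequent in one document but rare overall score highest.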