Slide titles:
• Project Presentations
• ICS 278: Data Mining Lecture 18: Analysis of Web User Data
• Outline
• Further Reading
• Introduction to Web Mining
• Advertising Applications
• Data Sources for Web Mining
• How our Web navigation is recorded…
• Web Server Log Files
• Example of Web Log entries
• Routine Server Log Analysis
• Visualization of Web Log Data over Time
• Descriptive Summary Statistics
• Web data measurement issues
• A time-series plot of ICS Website data
• Example: Web Traffic from Commercial Site (slide from Ronny Kohavi, Amazon)
• Robot / human identification
• Fractions of Robot Data (from Tan and Kumar, 2002)
• From Tan and Kumar, 2002: overall accuracies of around 90% were obtained using decision tree classifiers trained on sessions of lengths 1, 2, 3, 4, ...
• Page requests, caching, and proxy servers
• Identifying individual users from Web server logs
• Sessionizing
• Client-side data
• Modeling Clickrate Data
• Markov-Poisson Model (Scott and Smyth, 2003)
• Early studies from 1995 to 1997
• The Cockburn and McKenzie study from 2002
• Markov models for page prediction
• Parameter estimation for Markov model transitions
• Parameter estimation for Markov models
• Predicting page requests with Markov models
• Mixtures of Markov Chains
• Modeling Web Page Requests with Markov chain mixtures
• Clusters of Finite State Machines
• Learning Problem
• Sketch of EM Algorithm for Mixtures of Markov Chains
• Prediction with Markov mixtures
• Experimental Methodology
• Timing Results
• WebCanvas
• Insights from WebCanvas for MSNBC data
• Possible Extensions
• Related Work
• Analysis of Search Engine Query Logs
• Main Results
• Xie and O'Hallaron Study (2002)
• Power-law Characteristics of Common Queries
• Ecommerce Data
• Recommender Systems
• Examples of Recommender Systems
• Treatment of Zeros in Ratings Data
• Different recommender algorithms
• Additional Aspects of Recommender Systems
• General Issues
• Evaluation of Recommender Systems
• Additional Reading on Recommender Systems

Data Mining Lectures: Analysis of Web User Data. Padhraic Smyth, UC Irvine

Project Presentations
• On Thursday next week, each student will give a 4-minute presentation on their project in class (with 1 or 2 minutes for questions)
• Email me your PowerPoint or PDF slides, named with your name (e.g., joesmith.ppt), before 10am next Thursday
• Suggested content:
  – Definition of the task/goal
  – Description of data sets
  – Description of algorithms
  – Experimental results and conclusions
  – Be visual where possible! (e.g., use figures, graphs, etc.)
• The final project report is due by 12 noon on Tuesday of finals week; more details to come later

ICS 278: Data Mining
Lecture 18: Analysis of Web User Data
Padhraic Smyth
Department of Information and Computer Science
University of California, Irvine

Outline
• Basic concepts in Web mining
• Analyzing user navigation or clickstream data
• Predictive modeling of Web navigation behavior
  – Markov modeling methods
• Analyzing search engine data
• Ecommerce aspects of Web log mining
  – Automated recommender systems

Further Reading
• Modeling the Internet and the Web, P. Baldi, P. Frasconi, P. Smyth, Wiley, 2003.
• ACM Transactions on Internet Technology (ACM TOIT), which can be accessed via the ACM Digital Library (available from UCI IP addresses).
• Annual WebKDD workshops at the ACM SIGKDD conferences.
• Papers on Web page prediction
  – Selective Markov models for predicting Web page accesses, M. Deshpande, G. Karypis, ACM Transactions on Internet Technology, May 2004.
  – Model-based clustering and visualization of navigation patterns on a Web site, Cadez et al., Data Mining and Knowledge Discovery, 2003.

Introduction to Web Mining
• Web mining is useful for studying human digital behavior; e.g., search engine data can be used for
  – Exploration, e.g., how many queries are there per session?
  – Modeling, e.g., is there any time-of-day dependence?
  – Prediction, e.g., which pages are relevant?
• Applications
  – Understanding the social implications of Web usage
  – Designing better tools for information access
  – E-commerce applications

Advertising Applications
• The revenue of many Internet companies is driven by advertising
• Key problem:
  – Given user data:
    • Pages browsed
    • Keywords used in search
    • Demographics
  – Determine the most relevant ads (in real time)
  – Currently about 50% of keyword searches cannot be matched effectively to any ads
  – (Other aspects include bidding/pricing of ads)
• Another major problem: “click fraud”
  – Need algorithms that can automatically detect when online advertisements are being manipulated (this is a major problem for Internet advertising)
• Understanding the user is key to these types of applications

Data Sources for Web Mining
• Web content
  – Text and HTML content on Web pages, e.g., categorization of content
• Web connectivity
  – Hyperlink/directed-graph structure of the Web
  – e.g., using PageRank to infer the importance of Web pages
  – e.g., using links to improve the accuracy of Web page classification
• Web user data
  – Data on how users interact with the Web
    • Navigation data, aka “clickstream” data
    • Search query data (keywords for users)
    • Online transaction data (e.g., purchases at an e-commerce store)
  – Volume of data?
    • Large portals (e.g., Yahoo!, MSN) report hundreds of millions of users per month

Figure: Flowchart of a typical Web mining process (from Cooley, ACM TOIT, 2003)

How our Web navigation is recorded…
• Web logs
  – Record activity between the client browser and a specific Web server
  – Easily available
  – Can be augmented with cookies (which provide a notion of “state”)
• Search engine records
  – Text in queries, which pages were viewed,
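The Web logs bullet above describes server-side records of client requests. As a rough illustration of how such records become clickstream data, the sketch below parses lines in the NCSA Common Log Format (an assumption: many servers are configured with other formats) and splits each client's requests into sessions using a 30-minute inactivity timeout (a commonly used heuristic, assumed here rather than taken from these slides). It identifies clients by IP address only, which caching, proxies, and shared machines can confound. The names `parse_clf_line`, `sessionize`, and `SAMPLE_LOG`, and the log lines themselves, are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_clf_line(line):
    """Return (host, timestamp, path) for a CLF line, or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), ts, m.group("path")


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group (host, timestamp, path) records into per-host sessions.

    A new session starts when the gap since that host's previous request
    exceeds `timeout` (an assumed default). Returns a list of
    (host, [paths...]) tuples ordered by each session's first request.
    """
    sessions = []
    last_seen = {}  # host -> (time of its last request, index of its open session)
    for host, ts, path in sorted(records, key=lambda r: r[1]):
        prev = last_seen.get(host)
        if prev is None or ts - prev[0] > timeout:
            sessions.append((host, [path]))  # start a new session
            idx = len(sessions) - 1
        else:
            idx = prev[1]
            sessions[idx][1].append(path)  # continue the open session
        last_seen[host] = (ts, idx)
    return sessions


# Hypothetical example log lines
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:58:02 -0700] "GET /courses.html HTTP/1.0" 200 1045',
    '10.0.0.2 - - [10/Oct/2000:14:01:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:15:10:00 -0700] "GET /index.html HTTP/1.0" 304 -',
]

if __name__ == "__main__":
    records = [r for r in map(parse_clf_line, SAMPLE_LOG) if r is not None]
    for host, pages in sessionize(records):
        print(host, pages)
```

Run on the sample, this prints three sessions: two for 10.0.0.1 (split by the gap of more than an hour before its last request) and one for 10.0.0.2. Sorting all records by time first keeps the logic simple; a streaming version would need to handle out-of-order log lines differently.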
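The Outline lists Markov modeling methods for predicting Web navigation, and the Further Reading cites work on Markov models for page prediction. The sketch below shows only the basic first-order Markov chain, not the selective Markov models of Deshpande and Karypis or the Markov-chain mixtures of Cadez et al. It estimates P(next page | current page) by counting transitions within sessions (the maximum-likelihood estimate when `alpha=0`, with optional additive smoothing) and predicts the most probable successor. The `START` pseudo-state, the function names, and the toy sessions are hypothetical.

```python
from collections import Counter, defaultdict

START = "<start>"  # hypothetical pseudo-state marking the beginning of a session


def fit_markov(sessions, alpha=0.0):
    """Estimate first-order transition probabilities P(next | current).

    With alpha=0 this is the maximum-likelihood estimate
    count(i -> j) / count(i -> anything); alpha > 0 adds additive
    (Laplace-style) smoothing over the observed page vocabulary.
    """
    counts = defaultdict(Counter)
    pages = set()
    for session in sessions:
        prev = START
        for page in session:
            counts[prev][page] += 1
            pages.add(page)
            prev = page
    probs = {}
    for state, successors in counts.items():
        total = sum(successors.values()) + alpha * len(pages)
        # sorted() makes iteration order, and hence tie-breaking, deterministic
        probs[state] = {
            p: (successors[p] + alpha) / total
            for p in sorted(pages)
            if successors[p] + alpha > 0
        }
    return probs


def predict_next(probs, current):
    """Return the most probable next page after `current`, or None if it has no estimates."""
    successors = probs.get(current)
    if not successors:
        return None
    return max(successors, key=successors.get)


# Hypothetical toy sessions (each a sequence of page categories)
SESSIONS = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["home", "sports"],
    ["news", "sports"],
]

if __name__ == "__main__":
    probs = fit_markov(SESSIONS)
    print(predict_next(probs, "home"))  # news
```

With these toy sessions, the estimated P(news | home) is 2/3, so `predict_next(probs, "home")` returns "news". Pages seen only at the end of sessions (here "sports" and "weather") have no outgoing estimates and yield None; adding an explicit end-of-session state is one way to model this.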
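The Data Sources for Web Mining slide mentions using PageRank to infer the importance of Web pages from the hyperlink graph. The following is a minimal power-iteration sketch. It assumes a damping factor of 0.85 (a common choice) and treats pages with no out-links as linking uniformly to every page, which is one of several conventions for dangling nodes. The function name and the toy link graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank.

    `links` maps each page to the list of pages it links to. Pages with no
    out-links (dangling nodes) spread their rank uniformly over all pages.
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling_mass = sum(rank[p] for p in pages if not links.get(p))
        # Teleportation share plus redistributed dangling mass, for every page
        new_rank = {p: (1.0 - damping) / n + damping * dangling_mass / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy link graph
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items()):
        print(page, round(score, 4))
```

On this graph, page "c", which receives links from three other pages, ends up with the highest score, while page "d", which has no in-links, keeps only the teleportation share (1 - 0.85)/4 = 0.0375.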