
Web Mining & Pattern Discovery

Paul George
CSE 8331
Submitted to: Dr. M. Dunham
Department of Computer Science
Southern Methodist University, Texas, USA

TABLE OF CONTENTS

ABSTRACT
INTRODUCTION
  Web Content Mining
    Agent-Based Approach
      Intelligent Search Agents
      Information Filtering/Categorization
      Personalized Web Agents
    Database Approach
      Multilevel Databases
      Web Query Systems
    Overview of Crawlers
    Personalization
  Web Usage Mining
PATTERN DISCOVERY
  Preprocessing Tasks
    Data Cleaning
    Transaction Identification
  Discovery Techniques
    Path Analysis
    Association Rules
    Clustering and Classification
    Sequential Patterns
WEB USAGE MINING ARCHITECTURE – WEBMINER
  Tools
BENEFITS
APPLICATIONS
RESEARCH AREAS
CONCLUSION
BIBLIOGRAPHY

ABSTRACT

Web mining has been, and remains, the focus of many research papers. It can be classified into three categories: Web content mining, which is the process of discovering information from the various resources available on the WWW; Web structure mining, which is the process of discovering knowledge from the interconnections of hypertext documents; and Web usage mining, which is the process of discovering and analyzing patterns in Web usage data. In this paper I briefly discuss the approaches to Web content mining and concentrate on Web usage mining, in particular on pattern discovery. I also list some applications of Web mining and briefly cover WEBMINER, a system used for Web usage mining. I conclude by listing the open issues and research directions for the topics covered.

INTRODUCTION

First, let us define data mining. Data mining can be defined as finding hidden information in a database; it has also been called exploratory data analysis, data-driven discovery, and deductive learning [1].

Web mining is data mining applied to the World Wide Web, i.e., the mining of data related to the World Wide Web.

Web data can be any of the following:
- Web page content
- HTML/XML code
- Automatically generated data, stored as server access logs, referrer logs, and cookies residing on the client
- E-commerce transaction data

When data mining is applied to the Web, it can perform several functions:
- Information extraction: acquiring and interpreting useful information from Web data, which may lead to business intelligence.
- Resource discovery: discovering the locations of unfamiliar files on the network, which may or may not be relevant.
- Generalization: discovering information patterns [4].

Fig. 1 shows the Web mining classification. As it shows, Web mining is divided into Web content mining, Web structure mining, and Web usage mining. In the following sections I discuss the types of approaches in Web content mining, cover Web structure mining very briefly, and concentrate on pattern discovery in Web usage mining.

Web mining can be classified as shown below:

[Figure: tree diagram. WEB MINING branches into WEB CONTENT MINING, WEB STRUCTURE MINING, and WEB USAGE MINING; WEB CONTENT MINING branches into Agent Based Approach and Database Approach.]

Fig. 1: Web mining classification [3].


