UH COSC 6340 - COSC 6340 Lecture Notes



Christoph F. Eick: Introduction Knowledge Discovery and Data Mining (KDD)

Slide titles:
1. Promising “Newer” Technologies to Cope with the Information Flood
2. Knowledge Discovery in Data [and Data Mining] (KDD)
3. Making Sense of Data: Knowledge Discovery and Data Mining
4. Data Mining: Confluence of Multiple Disciplines
5. Slide 5
6. Popular KDD-Tasks
7. Slide 7
8. Slide 8
9. Why Do We Need so many Data Mining / Analysis Techniques?
10. Motivation: “Necessity is the Mother of Invention”
11. Why Data Mining? Potential Applications
12. Market Analysis and Management
13. Fraud Detection and Management
14. Other Applications
15. Data Mining and Business Intelligence
16. Architecture of a Typical Data Mining System
17. Example: Decision Tree Approach
18. Decision Tree Approach 2
19. Decision Trees
20. One Possibility
21. Another Possibility
22. Example: Nearest Neighbor Approach
23. Clustering
24. Another Example: Text
25. Issues
26. Association Rule Mining
27. Characteristics and Assumptions of Popular Data Mining/Analysis Techniques
28. Summary KDD
29. Where to Find References?

Slide 1: Promising “Newer” Technologies to Cope with the Information Flood
- Knowledge Discovery and Data Mining (KDD)
- Agent-based Technologies
- Ontologies and Knowledge Brokering
- Non-traditional data analysis techniques
Model generation is used as an example to explain and discuss these technologies.

Slide 2: Knowledge Discovery in Data [and Data Mining] (KDD)
Let us find something interesting!
- Definition := “KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad)
- Frequently, the term data mining is used to refer to KDD.
- Many commercial and experimental tools and tool suites are available (see http://www.kdnuggets.com/siftware.html).
- The field is dominated more by industry than by research institutions.

Slide 3: Making Sense of Data: Knowledge Discovery and Data Mining
2005 Lectures
1. Introduction to KDD
2. Similarity Assessment
3. Clustering
4. Classification (very very brief)
5. Association Rule Mining
6. Spatial Databases and Spatial Data Mining
7. Data Warehouses and OLAP

Slide 4: Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines.]

Slide 5: General KDD Steps
[Figure: Data sources -> (Select/preprocess) -> Selected/Preprocessed data -> (Transform) -> Transformed data -> (Data mine) -> Extracted information -> (Interpret/Evaluate/Assimilate) -> Knowledge. Additional figure label: Data preparation.]

Slide 6: Popular KDD-Tasks
- Classification (try to learn how to classify)
- Clustering (finding groups of similar objects)
- Estimation and Prediction (try to learn a function that predicts the value of a continuous output variable based on a set of input variables)
- Bayesian and Dependency Networks
- Deviation and Fraud Detection
- Text Mining
- Web Mining
- Visualization
- Transformation and Data Cleaning

Slide 7: KDD and Classical Data Analysis
- KDD is less focused than data analysis in that it looks for interesting patterns in data; classical data analysis centers on analyzing particular relationships in data. The notion of interestingness is a key concept in KDD. Classical data analysis centers more on generating and testing pre-structured hypotheses with respect to a given sample set.
- KDD is more centered on analyzing large volumes of data (many fields, many tuples, many tables, …).
- In a nutshell, the KDD process consists of preprocessing (generating a target data set), data mining (finding something interesting in the data set), and postprocessing (representing the found patterns in understandable form and evaluating their usefulness in a particular domain); classical data analysis is less concerned with the preprocessing step.
- KDD involves collaboration between multiple disciplines: namely, statistics, AI, visualization, and databases.
- KDD employs non-traditional data analysis techniques (neural networks, association rules, decision trees, fuzzy logic, evolutionary computing, …).

Slide 8: Generating Models as an Example
- The goal of model generation (sometimes also called predictive data mining) is the creation, evaluation, and use of models to make predictions and to understand the relationships between the variables described in a data collection. Typical example applications include:
  - generate a model that predicts a student's academic performance based on the applicant's data, such as the applicant's past grades, test scores, past degree, …
  - generate a model that predicts (based on economic data) which stocks to sell, hold, and buy.
  - generate a model that predicts whether a patient suffers from a particular disease based on the patient's medical and other data.
- Model generation centers on deriving a function that can predict a variable using the values of other variables: v = f(a1, …, an)
- Neural networks, decision trees, naïve Bayesian classifiers and networks, regression analysis and many other statistical techniques, fuzzy logic and neuro-fuzzy systems, and association rules are the most popular model generation tools in the KDD area.
- All model generation tools and environments employ the basic train-evaluate-predict cycle.

Slide 9: Why Do We Need so many Data Mining / Analysis Techniques?
- No generally good technique exists.
- Different methods make different assumptions with respect to the data set to be analyzed (to be discussed on the next transparency).
- Cross-fertilization between different methods is desirable and frequently helpful in obtaining a deeper understanding of the analyzed data set.

Slide 10: Motivation: “Necessity is the Mother of Invention”
- Data explosion problem: automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
- We are drowning in data, but
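
Illustration (not part of the original slides): the train-evaluate-predict cycle and the idea of deriving a function v = f(a1, …, an), both named on the model generation slide, can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the toy data, the attribute choice (past grade, test score), the labels, and the function names train, evaluate, and predict are all hypothetical, and a 1-nearest-neighbor rule stands in for whichever model generation technique would actually be used.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical toy data, invented for illustration only:
# attributes (past_grade, test_score) -> performance label.
TRAIN = [((3.9, 90.0), "good"), ((3.5, 80.0), "good"),
         ((2.1, 55.0), "poor"), ((2.5, 60.0), "poor")]
TEST = [((3.8, 88.0), "good"), ((2.3, 58.0), "poor")]


def train(examples):
    """Train step: a 1-nearest-neighbor 'model' simply stores the examples."""
    return list(examples)


def predict(model, attributes):
    """Predict step, v = f(a1, ..., an): label of the closest stored example."""
    _, label = min(model, key=lambda example: dist(example[0], attributes))
    return label


def evaluate(model, examples):
    """Evaluate step: fraction of held-out examples predicted correctly."""
    hits = sum(predict(model, attrs) == label for attrs, label in examples)
    return hits / len(examples)


model = train(TRAIN)                      # train on the known examples
accuracy = evaluate(model, TEST)          # evaluate on held-out examples
prediction = predict(model, (3.0, 75.0))  # predict for an unseen applicant
print(accuracy, prediction)
```

Keeping the three steps as separate functions mirrors the cycle itself: the evaluation uses examples that were not seen during training, and any other technique from the slide (a decision tree, a regression model) could replace the body of train and predict without changing the overall loop.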

