
Data Mining Overview
Slide 2
Data Mining is …
Data Mining is … (2)
Data Mining - Alternative Names?
What is Data Mining? Real Example from the NBA
Data Mining Defining Characteristics
Data Mining, circa 1963
Since 1963
Slide 10
Why Data Mining?
Multidisciplinary
What Is Data Mining?
Confusing Terminology
Required Expertise
Nuggets
Data Mining: History of the Field
Knowledge Discovery in Databases: Process
Steps of a KDD Process
Data Mining and Business Intelligence
Multi-Dimensional View of Data Mining
Why Mining in Data Warehouses?
Ingredients of an Effective KDD Process
Potential Applications
Market Analysis and Management
Corporate Analysis & Risk Management
Fraud Detection & Mining Unusual Patterns
Other Applications
Example: Retailing
Example: Aviation Safety
Data Mining & Individual Predictions
More cartoons
Data Mining: Classification Schemes
What Can Data Mining Do?
Frequent Patterns & Association Rules
Sequential Patterns/Associations
More Pattern/Association Uses
Clustering
Deviation Detection
More Uses for Clusters/Outliers
Classification
Slide 43
More Classification Uses
War Stories: Warehouse Product Allocation
War Stories: Inventory Forecasting
Necessity for Data Mining
Data Mining Complications
Major Issues in Data Mining
Are All the “Discovered” Patterns Interesting?
Can We Find All and Only Interesting Patterns?
Related Techniques: OLAP On-Line Analytical Processing
Related Techniques: Visualization
Data Mining and Visualization
Research Issues in Data Mining
Effectiveness
Efficiency
Applications
Theory: Foundation for Data Mining
Acknowledgements/Sources


Data Mining Overview

Data Mining is …
• “advanced methods for exploring and modeling relationships in large amounts of data.” (SAS)
• “the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.” (Gartner Group)
• the “extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data.” (Clifton)

Data Mining is … (2)
• “the exploration and analysis, by automatic and semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules” (Michael Berry and Gordon Linoff)
• “the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad, Piatetsky-Shapiro, Smyth)

Data Mining - Alternative Names?
• Data Mining
• Knowledge Mining
• Knowledge Discovery in Databases
• Data Archaeology
• Data Dredging
• Database Mining
• Knowledge Extraction
• Data Pattern Processing
• Information Harvesting
• Siftware

What is Data Mining? Real Example from the NBA
• Play-by-play information recorded by teams
  • Who is on the court
  • Who shoots
  • Results
• Coaches want to know what works best
  • Plays that work well against a given team
  • Good/bad player matchups
• Advanced Scout (from IBM Research) is a data mining tool to answer these questions
• Starks+Houston+Ward playing

Data Mining Defining Characteristics
1. The Data
  • Massive, operational, and opportunistic
2. The Users and Sponsors
  • Business decision support
3. The Methodology
  • Computer-intensive “ad hockery”
  • Multidisciplinary lineage

Data Mining, circa 1963
• IBM 7090
• 600 cases
• “Machine storage limitations restricted the total number of variables which could be considered at one time to 25.”

Since 1963
• Moore’s Law: The information density on silicon-integrated circuits doubles every 18 to 24 months.
• Cost of storage
• Cost of processing power
• Parallel computing
• Advances in DBMS and Data Warehousing
• Advances in AI
• Advances in computing algorithms
• Advances in statistics

Data Deluge
• electronic point-of-sale data
• hospital patient registries
• catalog orders
• bank transactions
• remote sensing images
• tax returns
• airline reservations
• credit card charges
• stock trades
• OLTP
• telephone calls

Why Data Mining?
• Evolution of database technology
  • To collect a large amount of data → primitive file processing
  • To store and query data efficiently → DBMS
• New challenges: huge amount of data, how to analyze and understand?
  • Data mining

Multidisciplinary
• Databases
• Statistics
• Pattern Recognition
• KDD
• Machine Learning
• AI
• Neurocomputing
• Data Mining

What Is Data Mining?
• IT: Complicated database queries
• ML: Inductive learning from examples
• Stat: What we were taught not to do

Confusing Terminology
“Bias”
• Statistics: the expected difference between an estimator and what is being estimated
• Neurocomputing: the constant term in a linear combination
• Machine Learning: a reason for favoring any model that does not fit the data perfectly

Required Expertise
• The domain expert understands the particulars of the business or scientific problem; the relevant background knowledge, context, and terminology; and the strengths and deficiencies of the current solution (if a current solution exists).
• The data expert understands the structure, size, and format of the data.
• The analytical expert understands the capabilities and limitations of the methods that may be relevant to the problem.

Nuggets
“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun. You really need people who understand what it is they are looking for – and what they can do with it once they find it.” (Herb Edelstein)

Data Mining: History of the Field
• The term “data mining” has been around since at least 1983 – as a pejorative term in the statistics community
• Knowledge Discovery in Databases workshops started in 1989
  • Now a conference under the auspices of ACM SIGKDD
• IEEE conference series started 2001

Knowledge Discovery in Databases: Process
Jian Pei; adapted from: U. Fayyad, et al. (1995), “From Knowledge Discovery to Data Mining: An Overview”

Steps of a KDD Process
• Learning the application domain
  • relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing: (may take 60% of effort!)
• Data reduction and transformation
  • Find useful features, dimensionality/variable reduction, invariant representation.
• Choosing functions of data mining
  • Summarization, classification, regression, association, clustering.
• Choosing the mining algorithm(s)
• Data
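
The early KDD steps listed above (data selection, cleaning, reduction/transformation, and a summarization function) can be sketched as a tiny pipeline. This is only an illustrative sketch, not anything taken from the slides: the records, field names, and function names below are all hypothetical.

```python
from statistics import mean

# Hypothetical point-of-sale records; every field name and value is invented.
RAW_RECORDS = [
    {"store": "A", "item": "milk", "amount": "3.50"},
    {"store": "A", "item": "bread", "amount": "2.00"},
    {"store": "B", "item": "milk", "amount": None},  # missing value
    {"store": "B", "item": "milk", "amount": "4.00"},
    {"store": "A", "item": "milk", "amount": "3.50"},
]


def select(records, item):
    """Creating a target data set: keep only the records for one item."""
    return [r for r in records if r["item"] == item]


def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def transform(records):
    """Data reduction and transformation: keep two fields, parse amounts."""
    return [(r["store"], float(r["amount"])) for r in records]


def summarize(pairs):
    """Mining function (summarization): mean amount per store."""
    by_store = {}
    for store, amount in pairs:
        by_store.setdefault(store, []).append(amount)
    return {store: mean(amounts) for store, amounts in by_store.items()}


def run_pipeline(records, item):
    """Chain the steps in the order the KDD process lists them."""
    return summarize(transform(clean(select(records, item))))


if __name__ == "__main__":
    print(run_pipeline(RAW_RECORDS, "milk"))
```

Each step is a separate function so that any one of them (for example, a different cleaning rule or a different mining function) could be swapped out without touching the others.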



UNCC MBAD 6201 - Data Mining - Overview
