DATA MINING
Team #1
Kristen Durst
Mark Gillespie
Banan Mandura
University of Dayton, MBA 664, 13 APR 09

Data Mining: Outline
• Introduction
• Applications / Issues
• Products
• Process
• Techniques
• Example

Introduction
• Data Mining Definition
  – Analysis of large amounts of digital data
  – Identify unknown patterns, relationships
  – Draw conclusions AND predict future
• Data Mining Growth
  – Increase in computer processing speed
  – Decrease in cost of data storage

Introduction
• High Level Process
  – Summarize the Data
  – Generate Predictive Model
  – Verify the Model
• Analyst Must Understand
  – The business
  – Data and its origins
  – Analysis methods and results
  – Value provided

Applications / Issues
• Applications
  – Telecommunications
    • Cell phone contract turnover
  – Credit Card
    • Fraud identification
  – Finance
    • Corporate performance
  – Retail
    • Targeting products to customers
• Legal and Ethical Issues
  – Aggregation of data to track individual behavior

Data Mining Products
• Angoss Software (www.angoss.com)
  – Knowledge Seeker/Studio
  – Strategy Builder
• Infor Global Solutions (www.infor.com)
  – Infor CRM Epiphany
• Portrait Software (www.portraitsoftware.com)
• SAS Institute (www.sas.com)
  – SAS Enterprise Miner
  – SAS Analytics
• SPSS Inc (www.spss.com)
  – Clementine

Angoss Knowledge Studio
[No slide text extracted]

SAS Institute
[No slide text extracted]

SPSS Inc.
[No slide text extracted]

Data Mining Process
• No uniformly accepted practice
• 2002 www.KDnuggets.com survey
  – SPSS CRISP-DM
  – SAS SEMMA

Data Mining Process
• SPSS CRISP-DM
  – CRoss Industry Standard Process for Data Mining
  – Consortium: Daimler-Chrysler, SPSS, NCR
  – Hierarchical Process – Cyclical and Iterative

Data Mining Process
• CRISP-DM
[No slide text extracted]

Data Mining Process
• SAS SEMMA
  – Model development is focus
  – User defines problem, conditions data outside SEMMA
    • Sample – portion data, statistically
    • Explore – view, plot, subgroup
    • Modify – select, transform, update
    • Model – fit data, any technique
    • Assess – evaluate for usefulness

Data Mining Process
• Common Steps in Any DM Process
  – 1. Problem Definition
  – 2. Data Collection
  – 3. Data Review
  – 4. Data Conditioning
  – 5. Model Building
  – 6. Model Evaluation
  – 7. Documentation / Deployment

Data Mining Techniques
• Statistical Methods (Sample Statistics, Linear Regression)
• Nearest Neighbor Prediction
• Neural Network
• Clustering/Segmenting
• Decision Tree

Statistical Methods
• Sample Statistics
  – Quick look at the data
  – Ex: Minimum, Maximum, Mean, Median, Variance
• Linear Regression
  – Easy to apply and works for simple problems
  – More complex problems may need a different method

Example: Linear Regression
[Figure: linear regression plot; recoverable axis label: Customer Income]

Nearest Neighbor Prediction
• Easy to understand
• Used for predicting
• Works best with few predictor variables
• Based on the idea that an item will behave the same way as the items “near” it
• Can also show level of confidence in prediction

Example: Nearest Neighbor
[Figure: Product Sales by Population of City and Distance from Competitor; axes: Distance from Competitor, Population of City; plotted point labels: A, B, C and U]
A: > 200 units
B: 100 – 200 units
C: < 100 units

Neural Network
• Contains input, hidden and output layers
• Used when there is a large number of predictor variables
• Model can be used again and again once confirmed successful
• Can be hard to interpret
• Extremely time consuming to format the data

Example: Neural Network
[Figure: network diagram with inputs Population of City and Distance from Competitor, weights W1 = .36 and W2 = .64, output Product Sales Prediction, and the value 0.736]

Clustering/Segmenting
• Not used for prediction
• Forms groups that are very similar or very different
• Gives an overall view of the data
• Can also be used to identify potential problems if there is an outlier

Example: Clustering/Segmenting
[Figure: cluster plot; labels: < 40 years, >= 40 years, Red = Female, Blue = Male, Dimension A]

Decision Trees
• Uses categorical variables
• Determines which variable causes the greatest “split” in the data
• Easy to interpret
• Not much data formatting
• Can be used for many different situations

Example: Decision Trees
[Figure: decision tree on “Change from original score”; root .14 (n = 115), split on Baseline < 3.75 vs. Baseline >= 3.75 into nodes .58 (n = 67) and -.46 (n = 48), with further splits by sex (M/F) and body type (Large/Small); remaining node values: .76 (n = 51), .47 (n = 28), 1.11 (n = 23), -.63 (n = 24), -.29 (n = 24)]

Data Mining Example
1. Problem Definition
• Improve On-Time Delivery of New Products
[Figure: “On Time Delivery” plot; y-axis: Probability, 0 to 0.4; x-axis: -50 to 96; series: Delivery Actual - fit, Delivery Required]

Data Mining Example
2. Collect Data
[Figure labels: Brainstorm Variation Sources, Data Collection Plan]

Data Mining Example
3. Data Review
• Data Segments

TOTAL LEAD TIME by Part Type: p < .05

Level       N     Mean    StDev
BRACKET   520    x6.76    x3.14
DUCT      138    x6.70    x0.40
MANIFOLD   44    x9.95    x4.68
TUBE       47    x3.60    x2.79

Pooled StDev = 68.47

Data Mining Example
5. Build Model
[Figure: “Main Effects” plot with SHIP-DUE axis; factor labels and tick values garbled in extraction]
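The Statistical Methods bullets (a quick look through sample statistics, then a simple linear regression) can be illustrated with a minimal Python sketch. The income and purchase figures, and the function names `summarize` and `fit_line`, are hypothetical and chosen only for illustration; they are not taken from the deck.

```python
from statistics import mean, median, variance


def summarize(values):
    """Quick-look sample statistics for one variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "variance": variance(values),  # sample variance (n - 1 denominator)
    }


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: customer income (thousands) and units purchased.
income = [20, 30, 40, 50, 60]
purchases = [5, 7, 9, 11, 13]

stats = summarize(income)
slope, intercept = fit_line(income, purchases)

# Predict purchases for a customer with an income of 45 (thousands).
predicted = intercept + slope * 45
print(stats)
print(slope, intercept, predicted)
```

A straight-line fit like this is only adequate when the relationship is roughly linear, which is why the deck notes that harder problems may call for a different method.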
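The nearest neighbor idea from the Techniques slides (predict from how “near” items behave, and report a level of confidence) can be sketched as follows. This is an illustrative sketch, not the method of any product named in the deck: the training points are hypothetical, both predictors are assumed to be pre-scaled to the range 0 to 1, and confidence is taken to be the share of the k nearest neighbors that agree with the prediction.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return (label, confidence) based on the k nearest training points.

    train: list of ((distance_from_competitor, city_population), label)
    query: (distance_from_competitor, city_population) for the unknown item
    """
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    # Confidence here is simply the fraction of neighbors that voted for label.
    return label, count / len(nearest)


# Hypothetical, pre-scaled data. Labels follow the deck's legend:
# A: > 200 units, B: 100 - 200 units, C: < 100 units.
train = [
    ((0.9, 0.8), "A"), ((0.8, 0.9), "A"), ((0.7, 0.7), "A"),
    ((0.5, 0.5), "B"), ((0.4, 0.6), "B"), ((0.6, 0.4), "B"),
    ((0.1, 0.2), "C"), ((0.2, 0.1), "C"),
]

clear_case = knn_predict(train, (0.75, 0.75), k=3)
mixed_case = knn_predict(train, (0.6, 0.6), k=3)
print(clear_case)
print(mixed_case)
```

The first query sits among A points only, so all three neighbors agree; the second sits between the A and B groups, so the vote is split and the confidence drops. Scaling matters because an unscaled population count would otherwise dominate the distance.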


Dayton MIS 385 - DATA MINING
