Data Stream Mining Applications: Toward Inductive DSMS
CS240B Notes by Carlo Zaniolo
UCLA Computer Science Department
Spring 2008
21-Mar-08, http://wis.cs.ucla.edu

Outline
- Data Stream Mining Applications: Toward Inductive DSMS
- Data Stream Mining and DSMS
- Road Map for Next Three Weeks
- The DM Experience for DBMS: from dreams to reality
- DB2 Intelligent Miner
- Oracle Data Miner
- OLE DB for DM (DMX)
- Defining a Mining Model
- Training
- Prediction Join
- OLE DB for DM (DMX) (cont.)
- Summary of Vendors’ Approaches
- PMML
- PMML Example
- The Data Mining Software Vendors Market Competition
- Disclaimer
- Major Data Mining Vendors
- Competition
- Major DM
- ORACLE
- SAS
- Our View ...
- Weka
- References

Data Stream Mining and DSMS
- Mining data streams: an emerging area with important applications
- Many fast & light algorithms have been developed for mining data streams: Ensembles, Moment, SWIM, etc.
- Deployment of these algorithms on data streams is a challenge
  - They must deal with bursty arrivals, synopses, QoS, scheduling
  - Analysts want to focus on high-level mining tasks, leaving such lower-level issues to the DSMS
- Integration of mining methods and DSMS technology is needed, but it faces difficult research challenges:
  - Data mining: a big problem for SQL-based DBMS

Road Map for Next Three Weeks
- Data Mining query languages and systems
  - The Inductive DBMS dream and the reality: Oracle, IBM DB2, MS DMX, Weka
- Fast & Light Algorithms for Mining Data Streams
  - Classifiers and Classifier Ensembles, Clustering methods, Association Rules, Time series
- Supporting these Algorithms in a DSMS
  - Data Mining Query Languages and support for the mining process

The DM Experience for DBMS: from dreams to reality
- Initial attempts to support mining queries in relational DBMS: unsuccessful
  - OR-DBMS do not fare much better [Sarawagi ’98].
- In 1996, a ‘high-road’ approach was proposed by Imielinski & Mannila, who called for a quantum leap in functionality based on:
  - High-level declarative languages for Data Mining (DM)
  - Technology breakthrough in DM query optimization.
- The research area of Inductive DBMS was thus born
  - Inspiring significant work: DMQL, Mine Rule, MSQL, …
  - These suffer from limited generality and performance issues.

DB2 Intelligent Miner
- Model creation
- Training:
    CALL IDMMX.DM_buildClasModelCmd(
        'IDMMX.CLASTASKS', 'TASK', 'ID', 'HeartClasTask',
        'IDMMX.CLASSIFMODELS', 'MODEL', 'MODELNAME', 'HeartClasModel');
- Prediction
- Stored procedures and virtual mining views
- Outside the DBMS (like Cache Mining)
- Data transfer delays
- http://www-306.ibm.com/software/data/iminer/

Oracle Data Miner
- Algorithms
  - Adaptive Naïve Bayes
  - SVM regression
  - K-means clustering
  - Association rules, text mining, etc.
- PL/SQL with extensions for mining
- Models as first class objects
  - Create_Model, Prediction, Prediction_Cost, Prediction_Details, etc.
- http://www.oracle.com/technology/products/bi/odm/index.html

OLE DB for DM (DMX)
- Model creation
    Create mining model MemCard_Pred (
        CustomerId long key,
        Age long continuous,
        Profession text discrete,
        Income long continuous,
        Risk text discrete predict)
    Using Microsoft_Decision_Trees;
- Training
    Insert into MemCard_Pred
    OpenRowSet("'sqloledb', 'sa', 'mypass'",
        'SELECT CustomerId, Age, Profession, Income, Risk from Customers')
- Prediction Join
    Select C.Id, C.Risk, PredictProbability(MemCard_Pred.Risk)
    From MemCard_Pred AS MP Prediction Join Customers AS C
    Where MP.Profession = C.Profession AND MP.Income = C.Income AND MP.Age = C.Age;

Defining a Mining Model

Define
- The format of “training cases” (top-level entity)
- Attributes, input/output type, distribution
- Algorithms and parameters

Example
    CREATE MINING MODEL CollegePlanModel (
        StudentID LONG KEY,
        Gender TEXT DISCRETE,
        ParentIncome LONG NORMAL CONTINUOUS,
        Encouragement TEXT DISCRETE,
        CollegePlans TEXT DISCRETE PREDICT)
    USING Microsoft_Decision_Trees

Training
    INSERT INTO CollegePlanModel
        (StudentID, Gender, ParentIncome, Encouragement, CollegePlans)
    OPENROWSET('<provider>', '<connection>',
        'SELECT StudentID, Gender, ParentIncome, Encouragement, CollegePlans
         FROM CollegePlansTrainData')

Prediction Join
    SELECT t.ID, CPModel.Plan
    FROM CPModel PREDICTION JOIN
        OPENQUERY(…, 'SELECT * FROM NewStudents') AS t
    ON CPModel.Gender = t.Gender AND CPModel.IQ = t.IQ

[Figure: CPModel (ID, Gender, IQ, Plan) prediction-joined with NewStudents (ID, Gender, IQ)]

OLE DB for DM (DMX) (cont.)
- Mining objects as first class objects
- Schema rowsets
  - Mining_Models
  - Mining_Model_Content
  - Mining_Functions
- Other features
  - Column value distribution
  - Nested cases
- http://research.microsoft.com/dmx/DataMining/

Summary of Vendors’ Approaches
- Built-in library of mining methods
- Script language or GUI tools
- Limitations
  - Closed systems (internals hidden from users)
  - Adding new algorithms or customizing old ones is difficult
  - Poor integration with SQL
  - Limited interoperability across DBMSs
- Predictive Model Markup Language (PMML) as a palliative

PMML
- Predictive Model Markup Language
- XML-based language for vendor-independent definition of statistical and data mining models
- Share models among PMML-compliant products
- A descriptive language
- Supported by all major vendors

PMML Example
[Figure: PMML example]

The Data Mining Software Vendors Market Competition

The Data Mining World According to

Disclaimer
This presentation contains preliminary information that may be changed substantially prior to final commercial release of the software described herein. The information contained in this presentation
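
As a hedged illustration of the PMML slides above (an XML description of a model that any compliant consumer can load and apply), the sketch below scores records against a small hand-written decision tree. The XML is a simplified PMML-style fragment, not output from any vendor product: the namespace declaration and Header element are omitted, and the model name RiskTree, the 30000 threshold, and the helper functions matches and score are assumptions made for illustration. The Income and Risk fields are borrowed from the MemCard_Pred example.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML-style document (illustrative assumption;
# real PMML also carries an xmlns declaration and a Header element).
PMML_DOC = """
<PMML version="4.1">
  <DataDictionary numberOfFields="2">
    <DataField name="Income" optype="continuous" dataType="double"/>
    <DataField name="Risk" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="RiskTree" functionName="classification">
    <MiningSchema>
      <MiningField name="Income"/>
      <MiningField name="Risk" usageType="predicted"/>
    </MiningSchema>
    <Node score="low">
      <True/>
      <Node score="high">
        <SimplePredicate field="Income" operator="lessThan" value="30000"/>
      </Node>
      <Node score="low">
        <SimplePredicate field="Income" operator="greaterOrEqual" value="30000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

# Only the two comparison operators used in the fragment are supported here.
OPS = {
    "lessThan": lambda a, b: a < b,
    "greaterOrEqual": lambda a, b: a >= b,
}


def matches(node, record):
    """Return True if the node's predicate holds for the record."""
    if node.find("True") is not None:
        return True
    pred = node.find("SimplePredicate")
    if pred is None:
        return False
    op = OPS[pred.get("operator")]
    return op(float(record[pred.get("field")]), float(pred.get("value")))


def score(node, record):
    """Descend to the deepest matching node and return its score."""
    for child in node.findall("Node"):
        if matches(child, record):
            return score(child, record)
    return node.get("score")


root = ET.fromstring(PMML_DOC)
tree_root = root.find("TreeModel/Node")

print(score(tree_root, {"Income": 25000}))
print(score(tree_root, {"Income": 50000}))
```

Because the model is plain XML, the same document could in principle be exported by one tool and consumed by another, which is the interoperability point made on the PMML slide. A full consumer would also need to handle namespaces, compound predicates, and missing values.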