Data Stream Mining Applications: Toward Inductive DSMS
CS240B Notes by Carlo Zaniolo
UCLA Computer Science Department
Spring 2008
21-Mar-08, http://wis.cs.ucla.edu

Slides: Data Stream Mining and DSMS; Road Map for Next Three Weeks; The DM Experience for DBMS: from dreams to reality; DB2 Intelligent Miner; Oracle Data Miner; OLE DB for DM (DMX); Defining a Mining Model; Training; Prediction Join; OLE DB for DM (DMX) (cont.); Summary of Vendors’ Approaches; PMML; PMML Example; The Data Mining Software Vendors Market Competition; Disclaimer; Major Data Mining Vendors; Competition; Major DM; ORACLE; SAS; Our View ...; Weka; References

Data Stream Mining and DSMS
- Mining data streams: an emerging area of important applications
- Many fast & light algorithms have been developed for mining data streams: Ensembles, Moment, SWIM, etc.
- Deployment of these algorithms on data streams is a challenge
  - Bursty arrivals, synopses, QoS, and scheduling must all be dealt with
- Analysts want to focus on high-level mining tasks, leaving such lower-level issues to the DSMS
- Integration of mining methods and DSMS technology is needed, but it faces difficult research challenges:
  - Data mining: a big problem for SQL-based DBMS

Road Map for Next Three Weeks
- Data mining query languages and systems
  - The Inductive DBMS dream and the reality: Oracle, IBM DB2, MS DMX, Weka
- Fast & light algorithms for mining data streams
  - Classifiers and classifier ensembles
  - Clustering methods
  - Association rules
  - Time series
- Supporting these algorithms in a DSMS
  - Data mining query languages and support for the mining process

The DM Experience for DBMS: from dreams to reality
- Initial attempts to support mining queries in relational DBMS: unsuccessful
- OR-DBMS do not fare much better [Sarawagi'98]
- In 1996, a "high-road" approach was proposed by Imielinski & Mannila, who called for a quantum leap in functionality based on:
  - High-level declarative languages for data mining (DM)
  - A technology breakthrough in DM query optimization
- The research area of Inductive DBMS was thus born
  - It inspired significant work: DMQL, Mine Rule, MSQL, …
  - These proposals suffer from limited generality and performance issues

DB2 Intelligent Miner
- Model creation
- Training:
    CALL IDMMX.DM_buildClasModelCmd('IDMMX.CLASTASKS', 'TASK', 'ID', 'HeartClasTask',
                                    'IDMMX.CLASSIFMODELS', 'MODEL', 'MODELNAME', 'HeartClasModel');
- Prediction
- Stored procedures and virtual mining views
- Outside the DBMS (like Cache Mining)
  - Data transfer delays
http://www-306.ibm.com/software/data/iminer/

Oracle Data Miner
- Algorithms
  - Adaptive Naïve Bayes
  - SVM regression
  - K-means clustering
  - Association rules, text mining, etc.
- PL/SQL with extensions for mining
- Models as first-class objects
  - Create_Model, Prediction, Prediction_Cost, Prediction_Details, etc.
http://www.oracle.com/technology/products/bi/odm/index.html

OLE DB for DM (DMX)
- Model creation
    Create mining model MemCard_Pred (
        CustomerId  long key,
        Age         long continuous,
        Profession  text discrete,
        Income      long continuous,
        Risk        text discrete predict)
    Using Microsoft_Decision_Trees;
- Training
    Insert into MemCard_Pred (CustomerId, Age, Profession, Income, Risk)
    OpenRowSet("'sqloledb', 'sa', 'mypass'",
               'SELECT CustomerId, Age, Profession, Income, Risk from Customers')
- Prediction join
    Select C.CustomerId, C.Risk, PredictProbability(MP.Risk)
    From MemCard_Pred AS MP Prediction Join Customers AS C
    On MP.Profession = C.Profession AND MP.Income = C.Income AND MP.Age = C.Age;

Defining a Mining Model
- Define:
  - The format of "training cases" (top-level entity)
  - Attributes, input/output type, distribution
  - Algorithms and parameters
- Example
    CREATE MINING MODEL CollegePlanModel (
        StudentID      LONG KEY,
        Gender         TEXT DISCRETE,
        ParentIncome   LONG NORMAL CONTINUOUS,
        Encouragement  TEXT DISCRETE,
        CollegePlans   TEXT DISCRETE PREDICT
    ) USING Microsoft_Decision_Trees

Training
    INSERT INTO CollegePlanModel (StudentID, Gender, ParentIncome, Encouragement, CollegePlans)
    OPENROWSET('<provider>', '<connection>',
               'SELECT StudentID, Gender, ParentIncome, Encouragement, CollegePlans
                FROM CollegePlansTrainData')

Prediction Join
    SELECT t.ID, CPModel.Plan
    FROM CPModel PREDICTION JOIN
         OPENQUERY(…, 'SELECT * FROM NewStudents') AS t
    ON CPModel.Gender = t.Gender AND CPModel.IQ = t.IQ
[Figure: CPModel and NewStudents, with columns ID, Gender, IQ and the predicted Plan]

OLE DB for DM (DMX) (cont.)
- Mining objects as first-class objects
  - Schema rowsets: Mining_Models, Mining_Model_Content, Mining_Functions
- Other features
  - Column value distribution
  - Nested cases
http://research.microsoft.com/dmx/DataMining/

Summary of Vendors’ Approaches
- Built-in library of mining methods
- Script language or GUI tools
- Limitations:
  - Closed systems (internals hidden from users)
  - Adding new algorithms or customizing old ones is difficult
  - Poor integration with SQL
  - Limited interoperability across DBMSs
- Predictive Model Markup Language (PMML) as a palliative

PMML
- Predictive Model Markup Language
- XML-based language for the vendor-independent definition of statistical and data mining models
- Share models among PMML-compliant products
- A descriptive language
- Supported by all major vendors

PMML Example

The Data Mining Software Vendors Market Competition
The Data Mining World According to

Disclaimer
This presentation contains preliminary information that may be changed substantially prior to final commercial release of the software described herein. The information contained in this presentation
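
The DMX workflow on these slides has three steps: CREATE MINING MODEL declares the key, input, and PREDICT columns; INSERT INTO trains the model; and PREDICTION JOIN matches new cases against the trained model to produce predictions. The Python below is a minimal, hypothetical in-memory sketch of that lifecycle, not Microsoft's implementation. The class ToyMiningModel and its methods are invented for illustration, its "algorithm" is a simple lookup table of target counts per combination of input values (standing in for Microsoft_Decision_Trees), and it matches columns by name instead of through an explicit ON clause.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Hypothetical, in-memory stand-in for a DMX mining model.

    Mirrors the three DMX steps:
      CREATE MINING MODEL -> __init__ (key, input, and PREDICT columns)
      INSERT INTO         -> train()
      PREDICTION JOIN     -> prediction_join()
    The "algorithm" is a lookup table of target counts per combination of
    input values, used here instead of Microsoft_Decision_Trees.
    """

    def __init__(self, key, inputs, predict):
        self.key = key
        self.inputs = list(inputs)
        self.predict = predict
        self.counts = defaultdict(Counter)  # input-value tuple -> target counts
        self.overall = Counter()            # fallback for unseen input tuples

    def train(self, rows):
        """Like INSERT INTO: consume training cases (dicts keyed by column)."""
        for row in rows:
            signature = tuple(row[c] for c in self.inputs)
            self.counts[signature][row[self.predict]] += 1
            self.overall[row[self.predict]] += 1

    def prediction_join(self, cases):
        """Like PREDICTION JOIN ... ON: match each new case to the model by
        input column name and return (key, predicted value, probability);
        the probability plays the role of PredictProbability."""
        results = []
        for case in cases:
            signature = tuple(case[c] for c in self.inputs)
            # .get() does not create entries in the defaultdict.
            dist = self.counts.get(signature) or self.overall
            value, count = dist.most_common(1)[0]
            results.append((case[self.key], value, count / sum(dist.values())))
        return results


# Usage with made-up rows shaped like the CollegePlanModel example.
model = ToyMiningModel(key="StudentID",
                       inputs=["Gender", "Encouragement"],
                       predict="CollegePlans")
model.train([
    {"StudentID": 1, "Gender": "F", "Encouragement": "Yes", "CollegePlans": "Plans"},
    {"StudentID": 2, "Gender": "F", "Encouragement": "Yes", "CollegePlans": "Plans"},
    {"StudentID": 3, "Gender": "F", "Encouragement": "Yes", "CollegePlans": "NoPlans"},
    {"StudentID": 4, "Gender": "M", "Encouragement": "No", "CollegePlans": "NoPlans"},
    {"StudentID": 5, "Gender": "M", "Encouragement": "No", "CollegePlans": "NoPlans"},
])
for student_id, plan, prob in model.prediction_join([
        {"StudentID": 10, "Gender": "F", "Encouragement": "Yes"},
        {"StudentID": 11, "Gender": "M", "Encouragement": "Yes"}]):
    print(student_id, plan, round(prob, 2))
# Output: "10 Plans 0.67" then "11 NoPlans 0.6"
```

Student 11 has an input combination never seen in training, so this sketch falls back to the overall majority class; a real decision tree would instead route the case through its learned splits.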

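To give a feel for PMML's vendor-independent model descriptions, the Python below turns the CollegePlanModel column list into the DataDictionary part of a PMML document using the standard library's xml.etree.ElementTree. This is a sketch under stated assumptions: the function name dmx_columns_to_pmml is hypothetical, the mapping from DMX types to PMML attributes (DISCRETE to categorical, LONG to integer, and so on) is an illustrative choice, and the PMML 4.4 namespace is assumed. A complete file would also contain a model element (for example a TreeModel whose MiningSchema marks CollegePlans as the predicted field); that part is omitted, as are the NORMAL and PREDICT flags.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"  # assumed PMML version

# Assumed, illustrative mapping from DMX column types to PMML DataField attributes.
DMX_CONTENT_TO_OPTYPE = {"DISCRETE": "categorical", "CONTINUOUS": "continuous"}
DMX_TYPE_TO_DATATYPE = {"TEXT": "string", "LONG": "integer", "DOUBLE": "double"}


def dmx_columns_to_pmml(columns):
    """Hypothetical helper: return a PMML string holding a Header and a
    DataDictionary for (name, dmx_type, content_type) column triples.
    KEY columns only identify cases, so they are skipped."""
    fields = [c for c in columns if c[2] != "KEY"]
    root = ET.Element("PMML", {"xmlns": PMML_NS, "version": "4.4"})
    ET.SubElement(root, "Header", {"description": "Generated from a DMX column list"})
    dictionary = ET.SubElement(root, "DataDictionary",
                               {"numberOfFields": str(len(fields))})
    for name, dmx_type, content in fields:
        ET.SubElement(dictionary, "DataField", {
            "name": name,
            "optype": DMX_CONTENT_TO_OPTYPE[content],
            "dataType": DMX_TYPE_TO_DATATYPE[dmx_type],
        })
    return ET.tostring(root, encoding="unicode")


# Columns of CollegePlanModel; the NORMAL and PREDICT flags are dropped here.
COLLEGE_PLAN_COLUMNS = [
    ("StudentID", "LONG", "KEY"),
    ("Gender", "TEXT", "DISCRETE"),
    ("ParentIncome", "LONG", "CONTINUOUS"),
    ("Encouragement", "TEXT", "DISCRETE"),
    ("CollegePlans", "TEXT", "DISCRETE"),
]
print(dmx_columns_to_pmml(COLLEGE_PLAN_COLUMNS))
```

Because the output is plain XML, any PMML-aware tool could read the same field definitions regardless of which DBMS built the model, which is the interoperability argument the PMML slide makes.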
