CMU CS 15826 - Data Mining - DB concepts


CMU SCS
15-826: Multimedia Databases and Data Mining
Data Mining - DB concepts
C. Faloutsos
15-826 Copyright: C. Faloutsos (2005)

Outline
Goal: 'Find similar / interesting things'
• Intro to DB
• Indexing - similarity search
• Data Mining

Data Mining - Detailed outline
• Statistics
• AI - decision trees
• DB
  – data warehouses; data cubes; OLAP
  – classifiers
  – association rules
  – misc. topics:
    • reconstruction of info
    • network databases; time sequence forecasting

Data Warehousing + OLAP
Problem:
Given: multiple data sources
Find: patterns
sales(p-id, c-id, date, $price)
customers(c-id, age, income, ...)
[Figure: data sources at NY, SF, PGH; ???]

Data Warehousing
Problem:
Given: multiple data sources
Find: patterns (such as?)
• classifiers ('supervised learning')
• 'association rules' (e.g., bread, milk -> butter); clusters ('unsup. learning')

Data Warehousing
Sub-problems:
P1: how to collect the data (-> Data Warehousing)
P1.1: how to collect counts (-> OLAP; datacubes)
P2: Decision trees
P3: Association rules
P4: Clustering

Data Warehousing
P1: how to collect the data?
A: one solution: make a local (summarized) copy
[Figure: sales and customers data from NY, SF, PGH]
• how often to update?
• what/how to summarize?
• 'wrappers' and 'mediators': s/w modules to automate conversions and smooth discrepancies
• Q: how about a 'virtual' D/W?

Data Warehousing
Q: how about a 'virtual' D/W? (i.e., 'views')
A: may delay the OLTP machines (but: 'Cerebellum')

D/W - OLAP
(OLAP = On-Line Analytical Processing)
Sub-problems:
P1: how to collect the data (-> Data Warehousing)
P1.1: how to collect counts (-> OLAP; datacubes)
Problem: "is it true that shirts in large sizes sell better in dark colors?"

sales:
c-id  p-id   Size  Color  $
C10   Shirt  L     Blue   30
C10   Pants  XL    Red    50
C20   Shirt  XL    White  20
...

Color/Size   S   M   L   TOT
Red         20   3   5    28
Blue         3   3   8    14
Gray         0   0   5     5
TOT         23   6  18    47

DataCubes
'color', 'size': DIMENSIONS
'count': MEASURE
(counts as in the Color/Size table above)
[Figure: lattice of group-bys: φ; color; size; color, size; together they form the DataCube]

DataCubes
SQL query to generate the DataCube:
• Naively (and painfully):
    select size, color, count(*)
    from sales where p-id = 'shirt'
    group by size, color

    select size, count(*)
    from sales where p-id = 'shirt'
    group by size
    ...
• with the 'cube by' keyword:
    select size, color, count(*)
    from sales
    where p-id = 'shirt'
    cube by size, color

DataCubes
(some additional concepts:
• concept hierarchy: e.g., time: hour -> day -> month -> year
  (Q: other concept hierarchies?)
• 'star' schema ('snow-flake', 'constellation' etc.))

DataCubes
Q1: How to store a dataCube?
Q2: What operations should we support?
Q3: How to index a dataCube?

Q1: How to store a dataCube?
A1: Relational (R-OLAP)
Color  Size   count
'all'  'all'  47
Blue   'all'  14
Blue   M      3
…
A2: Multi-dimensional (M-OLAP)
A3: Hybrid (H-OLAP)

Pros/Cons:
ROLAP strong points: (DSS, Metacube)
• use existing RDBMS technology
• scale up better with dimensionality
MOLAP strong points: (EssBase/hyperion.com)
• faster indexing
  (careful with: high dimensionality; sparseness)
HOLAP: (MS SQL Server OLAP Services)
• detail data in ROLAP; summaries in MOLAP

Q2: What operations should we support?
(on the Color/Size datacube: φ; color; size; color, size)
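
The 'cube by' query above asks for every group-by in the φ / color / size / color, size lattice at once. The sketch below shows what that computes, under stated assumptions: it uses SQLite through Python's sqlite3 module, and since SQLite has no CUBE operator it falls back on the naive plan from the slides, one GROUP BY per subset of the dimensions, glued together with UNION ALL. The sales rows are synthesized from the counts in the Color/Size table (one row per shirt sold), the product column is renamed p_id because p-id is not a legal unquoted SQL identifier, and build_sales and cube_query are hypothetical helper names. Each aggregated-away dimension is reported as 'all', which gives the R-OLAP layout shown under Q1.

```python
import sqlite3
from itertools import combinations

# Cell counts from the Color/Size table on the slides (shirts only).
COUNTS = {
    ("Red", "S"): 20, ("Red", "M"): 3, ("Red", "L"): 5,
    ("Blue", "S"): 3, ("Blue", "M"): 3, ("Blue", "L"): 8,
    ("Gray", "S"): 0, ("Gray", "M"): 0, ("Gray", "L"): 5,
}


def build_sales(conn):
    """Create a toy sales table with one (hypothetical) row per shirt sold."""
    conn.execute("CREATE TABLE sales (p_id TEXT, size TEXT, color TEXT)")
    rows = [("shirt", size, color)
            for (color, size), n in COUNTS.items()
            for _ in range(n)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def cube_query(dims):
    """Emulate 'cube by': one GROUP BY per subset of dims, joined by UNION ALL.

    A dimension that is aggregated away is reported as the literal 'all'.
    """
    parts = []
    for k in range(len(dims), -1, -1):  # finest group-by first, φ last
        for subset in combinations(dims, k):
            cols = [d if d in subset else f"'all' AS {d}" for d in dims]
            sql = (f"SELECT {', '.join(cols)}, COUNT(*) AS cnt "
                   "FROM sales WHERE p_id = 'shirt'")
            if subset:
                sql += " GROUP BY " + ", ".join(subset)
            parts.append(sql)
    return "\nUNION ALL\n".join(parts)


conn = sqlite3.connect(":memory:")
build_sales(conn)
# R-OLAP layout: one (color, size) -> count entry per cell of every group-by.
cube = {(c, s): n for c, s, n in conn.execute(cube_query(("color", "size")))}
print(cube[("all", "all")], cube[("Blue", "all")], cube[("Blue", "M")])  # prints: 47 14 3
```

With d dimensions this issues 2^d separate passes over sales, which is why the slides call the naive approach painful; a built-in cube operator at least gives the engine the chance to derive coarse group-bys (such as color alone) from finer ones (color, size) instead of rescanning the table.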

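For the M-OLAP alternative, here is a minimal sketch assuming the whole cube fits in memory as a dense array (a Python list of lists standing in for a real multi-dimensional store; COLOR_POS, SIZE_POS, build_molap and lookup are hypothetical names). Each dimension value maps to a fixed position, with one extra 'TOT' slot per dimension for the margins, so a question such as "how many Blue shirts?" is answered by index arithmetic rather than by searching rows.

```python
# Dimension values as they appear on the slides.
COLORS = ["Red", "Blue", "Gray"]
SIZES = ["S", "M", "L"]

# Position of each dimension value; "TOT" maps to one extra slot per dimension.
COLOR_POS = {c: i for i, c in enumerate(COLORS + ["TOT"])}
SIZE_POS = {s: j for j, s in enumerate(SIZES + ["TOT"])}

# Counts from the slides' table: rows = colors, columns = sizes.
CELLS = [
    [20, 3, 5],
    [3, 3, 8],
    [0, 0, 5],
]


def build_molap(cells):
    """Dense (n+1) x (m+1) array whose last row and column hold the TOT margins."""
    cube = [row[:] + [sum(row)] for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    cube.append(col_totals + [sum(col_totals)])
    return cube


def lookup(cube, color="TOT", size="TOT"):
    """Answer a count by position arithmetic, with no search over tuples."""
    return cube[COLOR_POS[color]][SIZE_POS[size]]


cube = build_molap(CELLS)
print(lookup(cube), lookup(cube, color="Blue"), lookup(cube, size="L"))  # prints: 47 14 18
# The dense layout also stores empty cells: 2 of the 9 base cells are 0.
print(sum(v == 0 for row in CELLS for v in row))  # prints: 2
```

The same array shows the caveat noted under MOLAP: every cell is stored, including the empty Gray/S and Gray/M cells, so with many dimensions and sparse data the array grows with the product of the dimension sizes, whereas a relational (R-OLAP) table can simply omit empty cells.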

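A concept hierarchy lets the same measure be rolled up to coarser levels. As a minimal sketch for the time hierarchy hour -> day -> month -> year, assume each level is obtained by truncating a timestamp; the sale timestamps below are made up for illustration, and LEVELS and roll_up are hypothetical names.

```python
from collections import Counter
from datetime import datetime

# Each level of the time hierarchy truncates a timestamp to a coarser key.
LEVELS = {
    "hour": lambda t: t.strftime("%Y-%m-%d %H:00"),
    "day": lambda t: t.strftime("%Y-%m-%d"),
    "month": lambda t: t.strftime("%Y-%m"),
    "year": lambda t: t.strftime("%Y"),
}


def roll_up(timestamps, level):
    """Count sales per value of the chosen level of the time hierarchy."""
    key = LEVELS[level]
    return Counter(key(t) for t in timestamps)


# Hypothetical sale timestamps (not from the slides).
sales_times = [
    datetime(2005, 1, 10, 9, 15),
    datetime(2005, 1, 10, 9, 40),
    datetime(2005, 1, 11, 14, 5),
    datetime(2005, 2, 3, 9, 15),
    datetime(2006, 2, 3, 9, 15),
]

for level in ["hour", "day", "month", "year"]:
    print(level, dict(roll_up(sales_times, level)))
```

Because each coarser key here is a prefix of the finer one (for example '2005-01-10'[:7] == '2005-01'), month counts could also be computed by re-aggregating the day counts instead of rescanning the raw sales.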