NYU CSCI-GA 2433 - Data Analysis and Mining


Chapter 18: Data Analysis and Mining
  Decision Support Systems
  Data Analysis and OLAP
  Data Warehousing
  Data Mining

Database System Concepts, 5th Edition, Aug 26, 2005. ©Silberschatz, Korth and Sudarshan. See www.db-book.com for conditions on re-use.

Decision Support Systems
Decision-support systems are used to make business decisions, often based on data collected by on-line transaction-processing systems.
Examples of business decisions:
  What items to stock?
  What insurance premium to charge?
  To whom to send advertisements?
Examples of data used for making decisions:
  Retail sales transaction details
  Customer profiles (income, age, gender, etc.)

Decision-Support Systems: Overview
Data analysis tasks are simplified by specialized tools and SQL extensions. Example tasks:
  For each product category and each region, what were the total sales in the last quarter, and how do they compare with the same quarter last year?
  As above, for each product category and each customer category.
Statistical analysis packages (e.g., S++) can be interfaced with databases.
  Statistical analysis is a large field, but it is not covered here.
Data mining seeks to discover knowledge automatically, in the form of statistical rules and patterns, from large databases.
A data warehouse archives information gathered from multiple sources and stores it under a unified schema, at a single site.
  Important for large businesses that generate data from multiple divisions, possibly at multiple sites.
  Data may also be purchased externally.

Data Analysis and OLAP
Online Analytical Processing (OLAP): interactive analysis of data, allowing data to be summarized and viewed in different ways in an online fashion (with negligible delay).
Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.
  Measure attributes measure some value and can be aggregated upon, e.g., the attribute number of the sales relation.
  Dimension attributes define the dimensions on which measure attributes (or aggregates thereof) are viewed, e.g., the attributes item_name, color, and size of the sales relation.

Cross Tabulation of sales by item_name and color
The table above is an example of a cross-tabulation (cross-tab), also referred to as a pivot table.
  Values for one of the dimension attributes form the row headers.
  Values for another dimension attribute form the column headers.
  Other dimension attributes are listed on top.
  Values in individual cells are aggregates of the measure attribute over the tuples whose dimension-attribute values specify the cell.

Relational Representation of Cross-tabs
Cross-tabs can be represented as relations.
  The value all is used to represent aggregates.
  The SQL:1999 standard actually uses null values in place of all, despite the confusion with regular null values.

Data Cube
A data cube is a multidimensional generalization of a cross-tab.
  It can have n dimensions; we show 3 below.
  Cross-tabs can be used as views on a data cube.

Online Analytical Processing
Pivoting: changing the dimensions used in a cross-tab.
Slicing: creating a cross-tab for fixed values only.
  Sometimes called dicing, particularly when values for multiple dimensions are fixed.
Rollup: moving from finer-granularity data to a coarser granularity.
Drill down: the opposite operation, that of moving from coarser-granularity data to finer-granularity data.

Hierarchies on Dimensions
A hierarchy on dimension attributes lets dimensions be viewed at different levels of detail.
  E.g., the dimension DateTime can be used to aggregate by hour of day, date, day of week, month, quarter, or year.

Cross Tabulation With Hierarchy
Cross-tabs can be easily extended to deal with hierarchies.
  Can drill down or roll up on a hierarchy.

OLAP Implementation
The earliest OLAP systems used multidimensional arrays in memory to store data cubes; they are referred to as multidimensional OLAP (MOLAP) systems.
OLAP implementations using only relational database features are called relational OLAP (ROLAP) systems.
Hybrid

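The relational representation of a data cube, with the value all marking dimensions that have been aggregated out, can be sketched as follows. This is a minimal brute-force sketch under stated assumptions: the sales tuples are invented, the names SALES, DIMS, and data_cube are hypothetical, and the string "all" is used as on the slides, whereas SQL:1999 would use null instead. For n dimension attributes it produces one group-by for each of the 2^n subsets of dimensions, so the three dimensions here yield 8 group-bys.

```python
from collections import defaultdict
from itertools import combinations

ALL = "all"  # marker for an aggregated-out dimension (SQL:1999 uses null)

# Hypothetical sales relation (invented values): (item_name, color, size, number).
SALES = [
    ("skirt", "dark", "small", 2),
    ("skirt", "pastel", "medium", 4),
    ("dress", "dark", "large", 5),
    ("dress", "white", "small", 3),
    ("shirt", "pastel", "medium", 7),
    ("shirt", "white", "large", 1),
]
DIMS = ("item_name", "color", "size")


def data_cube(rows, n_dims):
    """Sum the measure (column n_dims) for every subset of the n_dims dimensions.

    Returns a dict mapping a tuple of n_dims dimension values, with ALL in each
    aggregated-out position, to the summed measure.
    """
    cube = defaultdict(int)
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):  # dimensions kept in this group-by
            for row in rows:
                key = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                cube[key] += row[n_dims]
    return dict(cube)


if __name__ == "__main__":
    cube = data_cube(SALES, len(DIMS))
    print(cube[(ALL, ALL, ALL)])       # grand total
    print(cube[("skirt", ALL, ALL)])   # total for skirts
    print(cube[(ALL, "pastel", ALL)])  # total for pastel items
```

Each group-by rescans the base tuples to keep the 2^n structure visible; a practical implementation would instead compute coarser aggregates from finer ones that have already been computed.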

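Rolling up along a dimension hierarchy amounts to mapping each value to its coarser level and re-aggregating. The sketch below assumes a date-valued dimension and invented (date, number) tuples; the names SALES, LEVELS, and roll_up are hypothetical, and hour of day is omitted because the sample data carries no times. Day of week is not on the same path as month, quarter, and year: a date rolls up to either one, so the hierarchy is not a single chain.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base data (invented values): (sale date, number sold).
SALES = [
    (date(2005, 1, 14), 3),
    (date(2005, 2, 2), 5),
    (date(2005, 4, 30), 2),
    (date(2005, 11, 8), 6),
    (date(2006, 1, 20), 4),
]

# Hypothetical level names: each maps a date to its value at that level.
# date -> month -> quarter -> year is one path; date -> day_of_week is another.
LEVELS = {
    "date": lambda d: d.isoformat(),
    "day_of_week": lambda d: d.strftime("%A"),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "quarter": lambda d: f"{d.year}-Q{(d.month - 1) // 3 + 1}",
    "year": lambda d: str(d.year),
}


def roll_up(rows, level):
    """Aggregate (date, number) tuples to the given hierarchy level."""
    totals = defaultdict(int)
    for d, n in rows:
        totals[LEVELS[level](d)] += n
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for level in ("month", "quarter", "year"):
        print(level, roll_up(SALES, level))
```

Drilling down is the reverse direction and cannot be recovered from the rolled-up totals alone; it needs the finer-grained data, which is why roll_up always starts from the base tuples.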