Chapter 18: Data Analysis and Mining

  Decision Support Systems
  Decision-Support Systems: Overview
  Data Analysis and OLAP
  Cross Tabulation of sales by item-name and color
  Relational Representation of Cross-tabs
  Data Cube
  Online Analytical Processing
  Hierarchies on Dimensions
  Cross Tabulation With Hierarchy
  OLAP Implementation
  OLAP Implementation (Cont.)
  Extended Aggregation in SQL:1999
  Extended Aggregation (Cont.)
  Ranking
  Ranking (Cont.)
  Windowing
  Windowing (Cont.)
  Data Warehousing
  Design Issues
  More Warehouse Design Issues
  Warehouse Schemas
  Data Warehouse Schema
  Data Mining
  Data Mining (Cont.)
  Classification Rules
  Decision Tree
  Construction of Decision Trees
  Best Splits
  Best Splits (Cont.)
  Finding Best Splits
  Decision-Tree Construction Algorithm
  Other Types of Classifiers
  Naïve Bayesian Classifiers
  Regression
  Association Rules
  Association Rules (Cont.)
  Finding Association Rules
  Finding Support
  Other Types of Associations
  Clustering
  Hierarchical Clustering
  Clustering Algorithms
  Collaborative Filtering
  Other Types of Mining

Database System Concepts - 5th Edition, Aug 26, 2005
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

Chapter 18: Data Analysis and Mining
  Decision Support Systems
  Data Analysis and OLAP
  Data Warehousing
  Data Mining

Decision Support Systems
Decision-support systems are used to make business decisions, often based on data collected by on-line transaction-processing systems.
Examples of business decisions:
  What items to stock?
  What insurance premium to charge?
  To whom to send advertisements?
Examples of data used for making decisions:
  Retail sales transaction details
  Customer profiles (income, age, gender, etc.)

Decision-Support Systems: Overview
Data analysis tasks are simplified by specialized tools and SQL extensions.
  Example tasks:
    For each product category and each region, what were the total sales in the last quarter, and how do they compare with the same quarter last year?
    As above, for each product category and each customer category.
Statistical analysis packages (e.g., S++) can be interfaced with databases.
  Statistical analysis is a large field, but not covered here.
Data mining seeks to discover knowledge automatically in the form of statistical rules and patterns from large databases.
A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site.
  Important for large businesses that generate data from multiple divisions, possibly at multiple sites.
  Data may also be purchased externally.

Data Analysis and OLAP
Online Analytical Processing (OLAP)
  Interactive analysis of data, allowing data to be summarized and viewed in different ways in an online fashion (with negligible delay).
Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.
  Measure attributes
    measure some value
    can be aggregated upon
    e.g. the attribute number of the sales relation
  Dimension attributes
    define the dimensions on which measure attributes (or aggregates thereof) are viewed
    e.g.
the attributes item_name, color, and size of the sales relation

Cross Tabulation of sales by item-name and color
[Table: cross-tabulation of sales by item-name and color]
The table above is an example of a cross-tabulation (cross-tab), also referred to as a pivot-table.
  Values for one of the dimension attributes form the row headers.
  Values for another dimension attribute form the column headers.
  Other dimension attributes are listed on top.
  Values in individual cells are (aggregates of) the values of the measure attribute for the dimension-attribute values that specify the cell.

Relational Representation of Cross-tabs
Cross-tabs can be represented as relations.
  The value all is used to represent aggregates.
  The SQL:1999 standard actually uses null values in place of all, despite confusion with regular null values.

Data Cube
A data cube is a multidimensional generalization of a cross-tab.
Can have n dimensions; we show 3 below.
Cross-tabs can be used as views on a data cube.

Online Analytical Processing
Pivoting: changing the dimensions used in a cross-tab.
Slicing: creating a cross-tab for fixed values only.
  Sometimes called dicing, particularly when values for multiple dimensions are fixed.
Rollup: moving from finer-granularity data to a coarser granularity.
Drill down: the opposite operation, that of moving from coarser-granularity data to finer-granularity data.

Hierarchies on Dimensions
Hierarchy on dimension attributes: lets dimensions be viewed at different levels of
detail.
  E.g. the dimension DateTime can be used to aggregate by hour of day, date, day of week, month, quarter or year.

Cross Tabulation With Hierarchy
Cross-tabs can be easily extended to deal with hierarchies.
  Can drill down or roll up on a hierarchy.

OLAP Implementation
The earliest OLAP systems used multidimensional arrays in memory to store data cubes, and are referred to as multidimensional OLAP (MOLAP) systems.
OLAP implementations using only relational database features are called relational OLAP (ROLAP) systems.
Hybrid
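
The relational representation of a cross-tab described earlier can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the slides' own method: the sales tuples are made-up data, and the names SALES, ALL, cross_tab_relation and slice_by are hypothetical. Following the slides, the sketch marks aggregated dimensions with the value all rather than with the null values that SQL:1999 uses.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sales tuples: (item_name, color, size, number).
# item_name, color and size are dimension attributes; number is the measure.
SALES = [
    ("skirt", "dark", "small", 2),
    ("skirt", "dark", "large", 6),
    ("skirt", "pastel", "medium", 35),
    ("dress", "dark", "medium", 20),
    ("dress", "white", "small", 5),
    ("shirt", "white", "large", 28),
]

ALL = "all"  # marker meaning "aggregated over this dimension"


def cross_tab_relation(rows):
    """Return {(item_name, color): total number}, including 'all' rows.

    Each sales tuple contributes to four groups: its own (item, color)
    cell, the item subtotal, the color subtotal, and the grand total.
    The size dimension is aggregated away entirely.
    """
    totals = defaultdict(int)
    for item_name, color, _size, number in rows:
        for key in product((item_name, ALL), (color, ALL)):
            totals[key] += number
    return dict(totals)


def slice_by(rows, size):
    """Slice: keep only the tuples with a fixed value of the size dimension."""
    return [row for row in rows if row[2] == size]


if __name__ == "__main__":
    # Print the cross-tab as a relation, one (item_name, color, total) per line.
    for (item_name, color), total in sorted(cross_tab_relation(SALES).items()):
        print(item_name, color, total)
```

Each (item_name, all) row is the rollup of that item's per-color cells, and applying slice_by before cross_tab_relation corresponds to slicing the cube on a fixed size.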