Unformatted text preview:

CSCI585 Multidimensional Databases Cyrus Shahabi Computer Science Department University of Southern California shahabi usc edu C Shahabi CSCI585 Outline Definitions from A R 20 and 21 Focus Application OLAP Prefix Sum from A R 16 Dynamic Data Cube from A R 17 Iterative Data Cube from A R 18 Wavelet based approaches Compact Data Cube from A R 19 ProPolyne from A R 22 and 23 C Shahabi Definitions CSCI585 During the past decade the multidimensional data model emerged for use when the objective is to analyze data rather than to perform online transactions In contrast to previous technologies these databases view data as multidimensional cubes that are particularly well suited for data analysis Multidimensional data models have three important application areas within data analysis C Shahabi Data warehouses are large repositories that integrate data from several sources in an enterprise for analysis Online analytical processing OLAP systems provide fast answers for queries that aggregate large amounts of detail data to find overall trends Data mining applications seek to discover knowledge by searching semi automatically for previously unknown patterns and relationships in multidimensional databases CSCI585 Definitions Multidimensional databases view data as cubes that generalize spreadsheets to any number of dimensions In addition cubes support hierarchies in dimensions and formulas without duplicating their definitions A collection of related cubes comprises a multidimensional database or data warehouse Dimensions are used for selecting and aggregating data at the desired level of detail A dimension is organized into a containment like hierarchy composed of numerous levels each representing a level of detail required by the desired analyses Each instance of the dimension or dimension value belongs to a particular level C Shahabi Definitions CSCI585 Facts represent the subject the interesting pattern or event in the enterprise that must be analyzed In most multidimensional data models facts are implicitly defined by their combination of dimension values a fact exists only if there is a nonempty cell for a particular combination of values A measure consists of two components a fact s numerical property such as the sales price or profit a formula usually a simple aggregation function such as sum that can combine several measure values into one In a multidimensional database measures generally represent the properties of the fact that the user wants to optimize Measures then take on different values for various dimension combinations The property and formula are chosen to provide a meaningful value for all combinations of aggregation C Shahabi levels Focus Application On Line Analytical Processing OLAP Multidimensional data sets Average sale of shoes in CA in 2001 Number of jackets sold in Seattle in Sep 2001 Tougher queries C Shahabi Store Product Location LA NY Date Sale Covariance of sale and price of jackets in CA in 2001 correlation Variance of price of jackets in 2001 in Seattle Price Shoes Jan 01 21 500 85 99 Jacket June 01 28 700 45 99 Range sum queries Dimension attributes e g Store Product Data Measure attributes e g Sale Price Market Relation Avg sale d in 2001 To oS low CSCI585 s in CA p shoe Market Relation CSCI585 Range Queries in OLAP Data Cubes Ching Tien Ho Rakesh Agrawal Nimord Megiddo Rammakrishnan Srikant IBM Almaden Research Center C Shahabi CSCI585 Outline C Shahabi Introduction The Basic Range Sum Algorithm The Blocked Range Sum Algorithm The Batch Update Algorithm for RangeSum Queries Choosing Dimension Cuboids and Block Sizes Conclusion CSCI585 Introduction Example for MDDB Assume the data cube has four functional attributes dimensions age year state and insurance type The data cube will have age year state type cells with each cell containing the total revenue the measure attribute for the corresponding combination of these four attributes C Shahabi CSCI585 Introduction example Dimensional Attribute Date C Shahabi Measure Attribute Customer age Index 17 18 19 20 21 22 23 10 24 300 400 700 500 250 600 150 10 25 120 130 100 240 300 100 300 10 26 110 170 200 600 500 500 300 10 27 800 550 790 800 470 500 250 10 28 250 230 600 600 250 230 170 10 29 170 390 440 500 250 360 190 10 30 370 500 800 950 560 590 390 CSCI585 Introduction Range Queries That apply a given aggregation operation over selected cells where the selection is specified as continuous ranges in the domains of some of the attributes Example Sum Max Count and Average C Shahabi CSCI585 Introduction The C Shahabi Basic Range Sum Algorithm The Blocked Range Sum Algorithm The Batch Update Algorithm for RangeSum Queries Conclusion CSCI585 The Basic Range Sum Algorithm Pre computed Prefix Sum array P Let P be a d dimensional array of size N n1 n2 nd x y P x y Sum 0 x 0 y A i j i 0 j 0 C Shahabi The Basic Range Sum Algorithm CSCI585 Example Array A original array Array P Prefix Sum array Index 0 1 2 3 4 5 Index 0 1 2 3 0 3 5 1 2 2 3 0 3 8 9 11 13 16 1 7 3 2 6 8 2 1 10 18 21 29 39 44 18 2 2 4 2 3 3 5 2 12 24 P 1 1 A 0 0 A 0 1 A 1 0 A 1 1 C Shahabi 4 5 29 40 53 63 63 CSCI585 The Basic Range Sum Algorithm Theorem 1 Proves how any range sum of A can be computed from up to 2d appropriate elements of P C Shahabi CSCI585 The Basic Range Sum Algorithm Question How can we get the sum of this area Array A original array C Shahabi Index 0 1 2 3 4 5 0 3 5 1 2 2 3 1 7 3 2 6 8 2 2 2 4 2 3 3 5 CSCI585 The Basic Range Sum Algorithm When d 2 the range sum Sum l1 h1 l2 h2 can be obtained in three computation steps as Sum l1 h1 l2 h2 P h1 h2 P h1 l2 1 P l1 1 h2 P l1 1 l2 1 22 1 3 steps C Shahabi CSCI585 The Basic Range Sum Algorithm Region B in Array P Region C Region A in Original Array C Shahabi Region D Region E CSCI585 The Basic Range Sum Algorithm For instance The range sum Sum 2 3 1 2 P 3 2 P 3 0 P 1 2 P 1 0 40 11 24 8 13 Array A original array Array P Index 0 1 2 3 4 5 Index 0 1 2 3 0 3 5 1 2 2 3 0 3 8 9 11 13 16 1 7 3 2 6 8 2 1 10 18 21 29 39 44 2 2 4 2 3 3 5 2 12 24 29 40 53 63 C Shahabi 4 5 CSCI585 Introduction The Basic Range Sum Algorithm The C Shahabi Blocked Range Sum Algorithm The Batch Update Algorithm for RangeSum Queries Conclusion CSCI585 The Blocked Range Sum Algorithm C Shahabi To save space as a trade off to time is to keep Prefix Sums at a coarser grained block level Store the Prefix …


View Full Document

USC CSCI 585 - Session17

Loading Unlocking...
Login

Join to view Session17 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Session17 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?