CSCI585CSCI585C. ShahabiMultidimensional DatabasesCyrus ShahabiComputer Science DepartmentUniversity of Southern [email protected]. ShahabiOutlineDefinitions (from A.R. 20 and 21)Focus Application: OLAPPrefix-Sum (from A.R. 16)Dynamic Data Cube (from A.R. 17)Iterative Data Cube (from A.R. 18)Wavelet-based approachesCompact Data Cube (from A.R. 19)ProPolyne (from A.R. 22 and 23)CSCI585CSCI585C. ShahabiDefinitionsDuring the past decade, the multidimensional data model emerged for use when the objective is to analyze data rather than to perform online transactions. In contrast to previous technologies, these databases view data as multidimensional cubes that are particularly well suited for data analysis.Multidimensional data models have three important application areas within data analysis:Data warehouses are large repositories that integrate data from several sources in an enterprise for analysis.Online analytical processing (OLAP) systems provide fast answers for queries that aggregate large amounts of detail data to find overall trends.Data mining applications seek to discover knowledge by searching semi-automatically for previously unknown patterns and relationships in multidimensional databases.CSCI585CSCI585C. ShahabiDefinitions…Multidimensional databases view data as cubes that generalize spreadsheets to any number of dimensions.In addition, cubes support hierarchies in dimensions and formulas without duplicating their definitions. A collection of related cubes comprises a multidimensional database or data warehouse.Dimensions are used for selecting and aggregating data at the desired level of detail. A dimension is organized into a containment-like hierarchy composed of numerous levels, each representing a level of detail required by the desired analyses. Each instance of the dimension, or dimension value, belongs to a particular level.CSCI585CSCI585C. ShahabiDefinitions …Facts represent the subject—the interesting pattern or event in the enterprise that must be analyzed In most multidimensional data models, facts are implicitly defined by their combination of dimension values; a fact exists only if there is a nonempty cell for a particular combination of values.A measure consists of two components:a fact’s numerical property, such as the sales price or profita formula, usually a simple aggregation function such as sum , that can combine several measure values into one.In a multidimensional database, measures generally represent the properties of the fact that the user wants to optimize. Measures then take on different values for various dimension combinations. The property and formula are chosen to provide a meaningful value for all combinations of aggregation levels.CSCI585CSCI585C. ShahabiFocus Application: On-Line Analytical Processing (OLAP)Multidimensional data sets: Dimension attributes (e.g., Store, Product, Data) Measure attributes (e.g., Sale, Price)Range-sum queries Average sale of shoes in CA in 2001 Number of jackets sold in Seattle in Sep. 2001Tougher queries: Covariance of sale and price of jackets in CA in 2001 (correlation) Variance of price of jackets in 2001 in SeattleStoreLocationProductDate SaleLA Shoes Jan. 01 $21,500 $85.99NY Jacket June 01 $28,700 $45.99Price...............Market-RelationMarket-Relationσ(p=shoe)σ(s <in> CA)σ(d <in> 2001)Avg (sale)Too Slow!Too Slow!CSCI585CSCI585C. ShahabiRange Queries in OLAP Data CubesChing-Tien Ho Rakesh AgrawalNimord Megiddo Rammakrishnan SrikantIBM Almaden Research CenterCSCI585CSCI585C. ShahabiOutlineIntroductionThe Basic Range-Sum AlgorithmThe Blocked Range-Sum AlgorithmThe Batch-Update Algorithm for Range-Sum QueriesChoosing Dimension,Cuboids and Block SizesConclusionCSCI585CSCI585C. ShahabiIntroductionExample for MDDBAssume the data cube has four functional attributes (dimensions): age,year,state and insurance type. The data cube will have age*year * state* type cells, with each cell containing the total revenue (the measure attribute) for the corresponding combination of these four attributesCSCI585CSCI585C. ShahabiMeasure AttributeIntroduction(example)Customer_ageDateDimensional Attribute$390$590$560$950$800$500$37010/30$190$360$250$500$440$390$17010/29$170$230$250$600$600$230$25010/28$250$500$470$800$790$550$80010/27$300$500$500$600$200$170$11010/26$300$100$300$240$100$130$12010/25$150$600$250$500$700$400$30010/2423222120191817IndexCSCI585CSCI585C. ShahabiIntroductionRange QueriesThat apply a given aggregation operation over selected cells where the selection is specified as continuous ranges in the domains of some of the attributes.Example: Sum, Max, Count and AverageCSCI585CSCI585C. ShahabiIntroduction The Basic Range-Sum AlgorithmThe Blocked Range-Sum AlgorithmThe Batch-Update Algorithm for Range-Sum QueriesConclusionCSCI585CSCI585C. ShahabiThe Basic Range-Sum Algorithm Pre-computed Prefix-Sum array PLet P be a d-dimensional array of sizeN = n1 * n2 * ……* nd∑∑= ===xiyjjiAyxSumyxP0 0],[):0,:0(],[CSCI585CSCI585C. ShahabiThe Basic Range-Sum Algorithm533242228623713221530543210IndexArray A (original array)635340292412244392921181011613119830543210IndexArray P (Prefix-Sum array)ExampleP(1,1)=A(0,0)+A(0,1)+A(1,0)+A(1,1)6318CSCI585CSCI585C. ShahabiThe Basic Range-Sum AlgorithmTheorem 1:Proves how any range-sum of A can be computed from up to 2dappropriate elements of P.CSCI585CSCI585C. ShahabiThe Basic Range-Sum AlgorithmQuestion:How can we get the sum of this area?533242228623713221530543210IndexArray A (original array)CSCI585CSCI585C. ShahabiThe Basic Range-Sum AlgorithmWhen d=2, the range-sum Sum(l1:h1,l2:h2) can be obtained in three computation steps asSum(l1:h1,l2:h2) = P [h1,h2]-P [h1,l2-1]-P [l1-1,h2]+P [l1-1,l2-1]22-1=3 stepsCSCI585CSCI585C. ShahabiThe Basic Range-Sum AlgorithmRegion A in Original ArrayRegion C=Region B in Array P+Region ERegion DCSCI585CSCI585C. ShahabiThe Basic Range-Sum AlgorithmFor instance:The range-sum Sum(2:3,1:2)=P [3,2]-P [3,0]-P [1,2]+P [1,0]=40-11-24+8=13635340292412244392921181011613119830543210IndexArray P4029242921181198533242228623713221530543210IndexArray A (original array)CSCI585CSCI585C. ShahabiIntroductionThe Basic Range-Sum Algorithm The Blocked Range-Sum AlgorithmThe Batch-Update Algorithm for Range-Sum
View Full Document