0018-9162/01/$17.00 © 2001 IEEE40 ComputerMultidimensionalDatabase TechnologyThe relational data model, which was intro-duced by E.F. Codd in 1970 and earned himthe Turing award a decade later, served as thefoundation of today’s multibillion-dollardatabase industry. During the past decade,the multidimensional data model emerged for usewhen the objective is to analyze data rather than toperform online transactions. Multidimensional data-base technology is a key factor in the interactive analy-sis of large amounts of data for decision-makingpurposes. In contrast to previous technologies, thesedatabases view data as multidimensional cubes thatare particularly well suited for data analysis. Multidimensional models categorize data either asfacts with associated numerical measures or as textualdimensions that characterize the facts. In the case of aretail business, a purchase would be a fact and the pur-chase amount and price would be measures; the typeof product being bought and the purchase time andlocation would be dimensions. Queries aggregate mea-sure values over a range of dimension values to provideresults such as total sales per month of a given prod-uct. Multidimensional data models have three impor-tant application areas within data analysis. • Data warehouses are large repositories that inte-grate data from several sources in an enterprisefor analysis. • Online analytical processing (OLAP) systems pro-vide fast answers for queries that aggregate largeamounts of detail data to find overall trends.• Data mining applications seek to discover knowl-edge by searching semiautomatically for previ-ously unknown patterns and relationships in mul-tidimensional databases. Academic researchers have proposed formal math-ematical models of multidimensional databases, whileindustry has implicitly specified proposals via the con-crete software tools that implement them.1,2The“Multidimensional Database History” sidebar de-scribes the evolution of the multidimensional datamodel and how it has benefited from the use of seman-tic as well as scientific and statistical data models. SPREADSHEETS AND RELATIONSA spreadsheet such as that shown in Table 1 is a use-ful tool for analyzing sales data such as product sold,number of purchases, and city of sale. A pivot table isa two-dimensional spreadsheet with associated subto-tals and totals that supports viewing more complexdata by nesting several dimensions on the x- or y-axisand displaying data on multiple pages. Pivot tablesgenerally support interactively selecting data subsetsand changing the displayed level of detail. Spreadsheets are an inadequate tool for managingand storing multidimensional data because they tie datastorage too tightly to the presentation—they do notseparate the structural information from the desiredviews of the information. Thus, adding a third dimen-sion such as time or grouping the data into higher-levelproduct types requires a considerably more complexsetup. The obvious solution is to use a separate spread-sheet for each dimension, but this will work only to alimited extent because analyzing the additional valuesof the extra dimension quickly becomes unwieldy. Multidimensional databases model data as either facts, dimensions, ornumerical measures for use in the interactive analysis of large amounts of data for decision-making purposes. Torben BachPedersen Christian S.JensenAalborg UniversityCOVER FEATUREUsing a Structured Query Language database man-agement system offers considerable flexibility in struc-turing data. However, formulating many desirablecomputations such as cumulative aggregates (sales inyear to date), combining totals and subtotals, or deter-mining rankings such as the top 10 selling products isdifficult if not impossible in standard SQL. Also, trans-posing rows and columns requires manually specify-ing and combining multiple views. Although SQLextensions such as the data cube operator3and querywindows4will remedy some of these problems, theSQL-based relational model does not handle hierar-chical dimensions satisfactorily.Spreadsheets and relational databases provide ade-quate support for a small volume of data that has onlya few nonhierarchical dimensions, but they do notfully support the requirements for advanced dataanalysis. The only robust solution is to use databasetechnology that offers inherent support for the fullrange of multidimensional data modeling.CUBESMultidimensional databases view data as cubes thatgeneralize spreadsheets to any number of dimensions.In addition, cubes support hierarchies in dimensionsand formulas without duplicating their definitions. Acollection of related cubes comprises a multidimen-sional database or data warehouse.Because dimensions in a cube are first-class, built-in concepts with associated domains, cubes can easilymanage the addition of new dimension values.Although the term implies three dimensions, a cubecan theoretically have any number of dimensions; infact, most real-world cubes have four to 12 dimen-sions.5,6Current tools often experience performanceproblems when a so-called hypercube contains morethan 10 to 15 dimensions.Combinations of dimension values define a cube’scells. Depending on the specific application, the cellsin a cube range from sparse to dense. Cubes tend tobecome sparser as dimensionality increases and as thedimension values’ granularities become finer. Figure 1 shows a cube capturing the sales for thetwo Danish cities in Table 1 with the additionaldimension of time. The corresponding cells store thenumber of sales. The example has a fact—a nonemptycell that contains a number of associated numericalmeasures—for each combination of time, product,and city where at least one sale was made. The cellsstore numerical values associated with a fact—in thiscase, the number of sales is the only measure.Generally, a cube supports viewing only two or threedimensions simultaneously, but it can show up to fourlow-cardinality dimensions by nesting one dimensionwithin another on the axes. Thus, cube dimensional-ity is reduced at query time by projecting it down toDecember 2001 41Table 1. Sample sales spreadsheet.Product Number of purchases by cityAalborg Copenhagen Los Angeles New York CityMilk 123 555 145 5,001 Bread 102 250 54 2,010 Jeans 20 89 32 345Light bulbs 22 213 32 9,4502000 2001123 12757 455667211CopenhagenAalborgBreadMilkFigure 1. Sample cube capturing sales data. Data cubes support viewing of up to fourlow-cardinality
View Full Document