Modeling Multidimensional Databases, Cubes and Cube OperationsPanos VassiliadisNational Technical University of AthensAbstractOn-Line Analytical Processing (OLAP) is a trend indatabase technology, which was recently introduced andhas attracted the interest of a lot of research work.OLAP is based on the multidimensional view of data,supported either by multidimensional databases(MOLAP) or relational engines (ROLAP).In this paper we propose a model formultidimensional databases. Dimensions, dimensionhierarchies and cubes are formally introduced. We alsointroduce cube operations (changing of levels in thedimension hierarchy, function application, navigationetc.). The approach is based on the notion of the basecube, which is used for the calculation of the results ofcube operations. We focus our approach on the supportof series of operations on cubes (i.e. the preservation ofthe results of previous operations and the applicability ofaggregate functions in a series of operations).Furthermore, we provide a mapping of themultidimensional model to the relational model and tomultidimensional arrays.1. IntroductionIn recent database trends, data warehouses come tofill a gap in the field of querying large, distributed andfrequently updated systems. Most researchers anddevelopers share the same general vision of what a datawarehouse is [19], [3]. Data are extracted from severaldata sources, cleansed, customized and inserted into thedata warehouse. The logical structure and semantics ofthe data, or else Enterprise Model, is stored in anInformation Directory. Next, the data warehouse datacan be filtered, aggregated and stored in smallerspecialized data stores, usually called data marts. Usersquery the data marts and/or the data warehouse, mostlythrough On Line Analytical Processing (OLAP)applications. The main characteristics of suchapplications are (a) multidimensional view of data, and(b) data analysis, through interactive and/or navigationalquerying of data [6].The multidimensional view of data considers thatinformation is stored in a multi-dimensional array(sometimes called a Hypercube, or Cube). A Cube is agroup of data cells arranged by the dimensions of thedata [13]. A dimension is defined in [13] as "a structuralattribute of a cube that is a list of members, all of whichare of a similar type in the user's perception of the data".Each dimension has an associated hierarchy of levels ofaggregated data i.e. it can be viewed from different levelsof detail (for example, Time can be detailed as Year,Month, Week, or Day). Measures (which are also knownas variables, metrics, or facts) represent the realmeasured values [6].To motivate the work describing this paper, let ususe a running example of a bookstore company. Whenconsidering the sales of this company, three are the majordimensions: Time, Geography and Item, while weconsider Sales as the measure of the multidimensionalcube. The dimensions, along with their dimension levelsare depicted in Figure 1, where the upper levels of eachhierarchy point to the lower levels:GeographyRegion Country CityItemCategory Type ProductTimeYear Month DaySalesSalesWeekFigure 1. Dimensions and dimension levelsConsider, now, the way dimension level hierarchiesare instantiated in the real world (we consider theinstantiation for dimension Time, to be obvious):Category Type ProductBooks Literature “Report to El Greco” N. Kazantzakis“Karamazof brothers” F. DostoiewskyPhilosophy “Zarathustra”, F. W. Nietzsche“Symposium”, PlatoMusic HeavyMetal“Piece of Mind”, Iron Maiden“Ace of Spades”, MotorheadFigure 2. Item dimensionRegion Country CityEurope Hellas AthensRhodesFrance ParisAsia Israel Tel AvivJapan TokyoFigure 3. Geography dimensionNavigation is a term used to describe the processesemployed by users to explore a cube interactively, bymanipulating the multidimensionally viewed data [6],[13]. Possible operations which can be applied are:Aggregation (or Consolidation, or Roll-up) whichcorresponds to summarization of data for the higher levelof a hierarchy, Roll Down (or Drill down, or Drillthrough) which allows for navigation among levels ofdata ranging from higher level summary (up) to lowerlevel summary or detailed data (down), Selection (orScreening, or Filtering or Dicing) whereby a criterion isevaluated against the data or members of a dimension inorder to restrict the set of retrieved data, Slicing whichallows for the selection of all data satisfying a conditionalong a particular dimension and Pivoting (or Rotation)throughout which one can change of the dimensionalorientation of the cube, e.g. swapping the rows andcolumns, or moving one of the row dimensions into thecolumn dimension, etc. [6], [13].Two are the basic architectures for storing data in anOLAP database: ROLAP and MOLAP. ROLAP(Relational OLAP) [3] is based on a relational databaseserver, extended with capabilities such as extendedaggregation and partitioning of data [8]. The schema ofthe database can be a star, snowflake, or factconstellation schema [3]. On the other hand, MOLAP(Multidimensional OLAP) is based on "pure"Multidimensional Databases (MDDs), which logicallystore data in multidimensional arrays, which are heavilycompressed and indexed, in the physical level, for spaceand performance reasons.The main motivation of this paper is to provide aformal model for multidimensional databases. Sincemultidimensional databases are defined in terms ofdimensions (which are organized in dimensionhierarchies), the model represents them formally.Furthermore, classical OLAP operations, such as roll-up,slice, dice etc. are also represented by the model. We alsoprovide a mapping to relational databases andmultidimensional arrays. We make a serious designchoice: since querying is done in an interactive way, wegive emphasis to the tracking of series of operations,performed in a navigational way.The major contribution of the paper is the modelingof cubes, dimensions and cube operations, in the contextof series of operations. This formalization is currentlyused, in this paper, for a direct modeling of the usualOLAP operations. Instead of mapping OLAP operationsto complex and complicated "relational", or "calculus-like" queries, we directly model them, in astraightforward fashion. To our knowledge, the modelingof the drill-down operation is introduced for the first timein our model. Since engines are based on relationaltechnology, or multidimensional arrays, we also providea direct mapping
View Full Document