1 Introduction2 The Relational Interval Tree2.1 Relational Storage and Extensible Indexing2.2 Dynamic Data Structure2.3 Intersection Query Processing3 Algorithms for General Interval Relationships3.1 Query Generation Schema3.2 Query Processing for the Traversal Classes3.3 Extended Indexes for the Singleton Classes3.4 Heuristics to Scan the Fork Node3.5 Closing Gaps for the Range Classes3.6 Complete Query Processing4 Experimental Evaluation5 ConclusionsObject-Relational Indexing for General Interval RelationshipsHans -Peter Kriegel, Marco Pötke , and Thomas SeidlUniversity of Munich, Institute for Compu ter ScienceOetti ngenstr. 67, 80538 Munich, Ger many{kriegel,poetke,seidl}@dbs.informatik.uni-muenchen.deAbstract . Intervals represen t a fundamental data type for temporal , scientific, andspatia l databases where time stamp s and point data are extended to time spans andrange data, res pectively. For O LTP and OLAP a pplications on large amounts ofdata, no t only intersection que r ies have to be processed e fficiently but a lso generalinterval rela tionships including before, meets, overlaps, starts, finishes, contains,equals, during, startedBy, finishedBy , overlappedBy, metBy, and after. Our newalgorithms use the Relationa l Interva l Tree, a purely SQL-base d and object-relationally wrapped index str uctur e. The technique therefore pre serves theindust rial strength of the underly ing RDBMS including stab ility, transactions , andperfor mance. The efficiency of our a ppro ach is demonstrat ed by an experim entalevaluation on a rea l weblog data se t conta ining one million sessions.1 IntroductionModern database a pplications often manag e extended data i ncluding time spans for thevalidity of stored facts [TCG +93], tolerance ranges for imprecisely measured values inscientific databases, or approximate values i n local caches of distributed databases[OLW01]. Onlin e analytical processing fo r data wa rehouses, for example on the tempo-ral coherence of marketing activities and the sales volume, requires intervals as a basicdatatype . Moreover, the practica l relevance of intervals ha s been strongly emphasizedby the introductio n of the corresponding datatypes and predicates into the newSQL:1999 standar d, forme rly known as SQL 3 [EM99]. In SQL, a n inter val comprisinga rang e between two ordered boundarie s is termed a “PER IOD” and denotes an an-chored duration on the linear tim e line. Dates can be encod ed by a basic temporaldatatype or an in teger. In addition to plain interval intersection queries, mor e refined re-lationships have to be supported for many appl ications, e.g. sequenced temporal integ-rity checking [LSD +01]. Compulsory p redicates on PERIODs include th e plain inte rvalintersect ion as well as a subset of Allen ’s 13 general interval relationsh ips [All83] (cf.Figur e1), namely “PRECEDES” ( =before), “SUCCEDES” ( =after), “MEETS”(=meets∪metBy ), and “CON TAINS” (=contains orduring, resp.) [Sno00]. In order to bring the expressive pow er of interval predic ates and SQ L:1999 to life, arobus t access method for intervals and their predicates is required . As this componenthas to be integrated into commercial database servers to complement the interval datamodel with e fficient quer y execution, a maximal exploit ation of the generic functional-C.S. Jensen et al. (Eds.): SSTD 2001, LNCS 2121, pp. 522−542, 2001. Springer-Verlag Berlin Heidelberg 2001ity of existing RDBMS is essential [JS 99] [TJS 98]. In this paper, we therefore proposea new technique to efficiently evaluate not only the five interval predicates ofSQL:1999, but the full set of Allen’s general relationships on top of any object-relationaldatabase kernel. We follow the layered approach of temporal database systems by trans-forming the DDL and DML on interval data into conventional statements executed bythe underlying DBMS. At the same time, its industrial strength, including stability,transactions, and performance is fully preserved by our proposed method.Whereas point data has been supported very efficiently for a long time, the problemof managing intervals is not addressed by commercial database systems up to now. Sev-eral index structures have been proposed that immediately are built on top of relationaldatabase systems. They use the SQL level of an ordinary RDBMS as virtual storage me-dium, and, therefore, we call them relational storage structures. Among them, the Rela-tional Interval Tree1 (RI-tree) [KPS 00] provides the optimal complexities of O(n/b)space to store n intervals on disk blocks of size b, O(logbn) I/O operations for insertionor deletion of a single interval, and O(h·logbn + r/b) I/Os to process an interval intersec-tion query producing r results. The parameter h depends on the extension and the gran-ularity of the data space but not on the number n of intervals. As a competing method,the linear quadtree [Sam 90] as used in Oracle or DB2 for spatial objects maps a main-memory structure onto built-in relational indexes, too, and may be called linear segmenttree in the one-dimensional case. Unfortunately, the decomposition of intervals into seg-ments yields a potentially high redundancy in the database in contrast to the RI-tree.The MAP21 transformation [ND 99] or the H-, V-, or D-order interval spatial trans-form [GLOT 96] refine the idea to employ a composite index on the interval bounds andorder the intervals lexicographically by (lower, upper) or (upper, lower). Finally, thewindow list technique [Ram 97] is very efficient but may degenerate for dynamic datasets. An additional broad variety of secondary storage structures for intervals has beenproposed in the literature. Since these approaches rely on augmentations of built-in in-dexes structures, they are not suitable to be used in industrial applications unless they areintegrated into the kernel software by the database vendors. A detailed discussion ofthese aspects is provided in [KPS 00].What we propose in this paper are extensions of the RI-tree algorithms that efficient-ly support the general interval relationships of Allen. After recalling the Relational In-terval Tree in Section 2, we present our new algorithms in Section 3. We suggest an ef-1. Patent pendingFig. 1. The 13 general interval relationships according to Allen [All 83]containsoverlappedByfinishedBy startedBybefore meets overlapsfinishesstartsduringmetBy afterequals523Object-Relational Indexing
View Full Document