U of M CSCI 8715 - Performance Analysis of R-Trees with Arbitrary Node Extents

Unformatted text preview:

Performance Analysis of R*-Treeswith Arbitrary Node ExtentsYufei Tao and Dimitris PapadiasAbstract—Existing analysis for R-trees is inadequate for several traditional and emerging applications including, for example,temporal, spatio-temporal, and multimedia databases because it is based on the assumption that the extents of a node are identical onall dimensions, which is not satisfied in these domains. In this paper, we propose analytical models that can accurately predict R*-treeperformance without this assumption. Our derivation is based on the novel concept of extent regression function, which computes thenode extents as a function of the number of node splits. Detailed experimental evaluation reveals that the proposed models areaccurate, even in cases where previous methods fail completely.Index Terms—Database, spatial database, R-tree, cost model.æ1INTRODUCTIONTHE R-tree [14] is one of the most popular multidimen-sional access methods, currently incorporated in var-ious commercial products such as Oracle and Informix.Object minimum bounding rectangles (MBRs) are groupedtogether in leaf nodes according to their spatial proximity,which are then recursively grouped in higher levels untilthe root contains a single node. Fig. 1 shows a simple 2Dexample where five objects ða; b; c; d; eÞ are clustered intotwo leaf nodes N1and N2that constitute the two entries inthe root R. R-trees (like most spatial access methods) weremotivated by the need to efficiently process windowqueries. The R-tree answers the query q in Fig. 1 as follows:The root is first retrieved and the entries (e.g., N2) thatintersect the range are recursively searched because theymay contain qualifying objects (object e). Nonintersectingentries (e.g., N1) are skipped.In addition to “traditional” spatial applications, R-treeshave also been widely used to index data from otherdomains. As an example, consider a banking system thatrecords the historical changes of account balances as a resultof withdrawals and deposits. Old versions of the recordsare not removed since possible queries may inquire aboutany time in history (e.g., find accounts with balances greaterthan 5k dollars during May or June). Each record ismodeled as an interval, e.g., in Fig. 2a segments b0, b1, b2,b3indicate that three deposits have been made to account b(current balance 5k dollars). Similarly, account d incurs onewithdrawal, while a and c do not change during therecorded period. Fig. 2a also shows the MBRs of the leafnodes N1, N2, N3in the corresponding R-tree. Thisapproach (i.e., viewing each record as a 2D interval) hasbeen adopted in numerous R-tree-based indexes [36], [21],[22], [7], [40] for temporal databases.In spatio-temporal and multimedia databases (thatmanage large volume of moving objects), the most basicform of R-trees is the 3D R-tree, which considers time as justanother dimension and integrates it in the tree constructionalong with the other dimensions. The movements of2D objects are modeled as distinct boxes in the three-dimensional space. The temporal projection denotes theperiod when the corresponding object remains static, whilethe spatial projection of the box corresponds to the object’sposition and extent during this period. Whenever an objectmoves to another position, a separate box is created torepresent its new static period, position, and extents. Fig. 2bextends the example of Fig. 1 assuming that at time t1,objects d, and e issue location updates, which result in newversions d0and e0, leading to another leaf node N02.Variations of the above idea have been exploited forindexing multimedia objects [48], trajectory [32], andhistorical information retrieval [38] for real-world data.Furthermore, the 3D R-tree has been used as a part of amultitree structure for aggregate processing in the contextof spatio-temporal data warehouses [35].A number of analytical models (reviewed in the nextsection) have been proposed to capture the performance ofR-trees. One basic assumption in these models is that the extentsof a node are similar on all dimensions which, however, does nothold for temporal, spatio-temporal, and multimedia objects.Asshown in Fig. 2, for example, object “lifespans” (i.e., theprojection of intervals/3D boxes on the time dimension)depend on the lengths of the periods that the correspondingobjects retain their attributes (i.e., salary and positions inFigs. 2a and 2b, respectively). In particular, if a record doesnot change for a long time, then its lifespan will force itsparent node to have (much) longer extent (on the temporaldimension) than those nodes containing only objects withshort lifespans. In such situations, the existing R-tree analysis issimply inapplicable. Although the “elongated-node-extent”problem is well-known [22], [7], [40], predicting the R-treeperformance in this scenario remains an open problem,IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 16, NO. 6, JUNE 2004 653. Y. Tao is with the Department of Computer Science, City University ofHong Kong, Tat Chee Avenue, Hong Kong. E-mail: [email protected].. D. Papadias is with the Department of Computer Science, Hong KongUniversity of Science and Technology, Clear Water Bay, Hong Kong.E-mail: [email protected] received 9 Oct. 2003; revised 5 Feb. 2004; accepted 10 Feb. 2004.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TKDE-0197-1003.1041-4347/04/$20.00 ß 2004 IEEE Published by the IEEE Computer Societywhich seriously limits query optimization in the relatedsystems.This paper settles the problem for the R*-tree [10], the mostefficient and widely-used variant of R-trees. In particular, wepropose a cost model, based on the novel concept of extentregression functions (ERFs), that accuratelycaptures the R*-treeperformance for arbitrary data extents, independently of theconcrete application domain. ERFs provide considerableinsight into the behavior of R*-trees and motivate newoptimization techniques, even in well-studied domains(e.g., spatial databases). Further, our analytical frameworkcan be also applied to other data partition methods (e.g., theBKD-[28], X-[8], A-[41] trees, etc.).The rest of the paper is organized as follows: Section 2surveys existing approaches for R-tree analysis andelaborates why they fail to produce correct estimation forgeneral data sets. Section 3 discusses our models, andSection 4 applies them to


View Full Document

U of M CSCI 8715 - Performance Analysis of R-Trees with Arbitrary Node Extents

Documents in this Course
Load more
Download Performance Analysis of R-Trees with Arbitrary Node Extents
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Performance Analysis of R-Trees with Arbitrary Node Extents and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Performance Analysis of R-Trees with Arbitrary Node Extents 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?