U of M CSCI 8715 - Efficient Cost Models for Spatial Queries Using R-Trees - D2699378

Home> Schools> University of Minnesota- Twin Cities> Computer Science (CSCI) > CSCI 8715> Efficient Cost Models for Spatial Queries Using R-Trees

U of M CSCI 8715 - Efficient Cost Models for Spatial Queries Using R-Trees

School name University of Minnesota- Twin Cities

Course Csci 8715- Spatial Databases and Applications

Pages 14

Download Save

Unformatted text preview:

Efficient Cost Models for Spatial QueriesUsing R-TreesYannis Theodoridis, Member, IEEE, Emmanuel Stefanakis, and Timos Sellis, Member, IEEEAbstractÐSelection and join queries are fundamental operations in Data Base Management Systems (DBMS). Support fornontraditional data, including spatial objects, in an efficient manner is of ongoing interest in database research. Toward this goal,access methods and cost models for spatial queries are necessary tools for spatial query processing and optimization. In this paper,we present analytical models that estimate the cost (in terms of node and disk accesses) of selection and join queries using R-tree-based structures. The proposed formulae need no knowledge of the underlying R-tree structure(s) and are applicable to uniform-likeand nonuniform data distributions. In addition, experimental results are presented which show the accuracy of the analyticalestimations when compared to actual runs on both synthetic and real data sets.Index TermsÐSpatial databases, access methods, query optimization, cost models, R-trees.æ1INTRODUCTIONSUPPORTING large volumes of multidimensional (spatial)data is an inherent characteristic of modern databaseapplications, such as Geographical Information Systems(GIS), Computer-Aided Design (CAD), Image and multi-media databases. Such databases need underlying systemswith extended features (query languages, data models,indexing methods) as compared to traditional databases,mainly due to the complexity of representing and retrievingspatial data. Spatial Data Base Management Systems(SDBMS), in general, should 1) offer appropriate data typesand query language to support spatial data and 2) provideefficient indexing methods and cost models on the execu-tion of specialized spatial operations, for query processingand optimization purposes [15].In the particular field of spatial query processing andoptimization, during the last two decades, several datastructures have been developed for point and nonpointmultidimensional objects in low-dimensional space to meetneeds in a wide area of applications, including the GIS andCAD domains. Due to the large number of spatial datastructures proposed (an exhaustive survey can be found in[14]), active research in this field has recently turned to thedevelopment of analytical models that could make accuratecost predictions for a wide set of spatial queries. Powerfulanalytical models are useful in three ways:1. Structure evaluation: This allows us to better under-stand the behavior of a data structure under variousinput data sets and sizes.2. Benchmarking: This can play the role of an objectivecomparison point when various proposals forefficient spatial indexing are compared to each other3. Query optimization: This can be used by a queryoptimizer in order to evaluate the cost of a complexspatial query and its execution procedure.Spatial queries addressed by users of SDBMS usuallyinvolve selection (point or range) and join operations. In theliterature, most efforts toward the analytical prediction ofthe performance of spatial data structures have focused onpoint and range queries [13], [19], [28], [11], [37], while onlyrecently on spatial join queries [17], [38]. Some proposalssupport both uniform-like and nonuniform data distribu-tions, which is an important advantage, keeping in mindthat modern database applications handle large amounts ofreal (usually nonuniform) multidimensional data.In this paper, we focus on the derivation of analyticalformulae for range and join queries based on R-trees [16],[4]; such models support data sets of various distributionsand make cost prediction based on data properties only.The proposed formulae are shown to be efficient for severaldistributions of synthetic and real data sets with the relativeerror being around 10-15 percent for any type of data setsused in our experiments.The rest of the paper is organized as follows: In Section 2,we provide background information about hierarchical treestructures for spatial data, in particular R-tree-based ones,and related work on cost analysis for R-tree-based methods.Section 3 presents cost models for the prediction of the R-tree performance for selection and join queries. In Section 4,comparison results of the proposed models are presentedwith respect to efficient R-tree implementations for differentdata distributions. An extended survey of related workappears in Section 5, while Section 6 concludes the paper.2BACKGROUNDMultidimensional data was first involved in geographicalapplications (GIS, cadastral systems, etc.), CAD, and VLSIIEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 12, NO. 1, JANUARY/FEBRUARY 2000 19. Y. Theodoridis is with the Computer Technology Institute, PO Box 1122,GR-26110, Patras, Hellas. E-mail: [email protected].. E. Stefanakis and T. Sellis are with the Department of Electrical andComputer Engineering, National Technical University of Athens, Zogra-fou 15773, Athens, Hellas.E-mail: {stefanak, timos}@dbnet.ece.ntua.gr.Manuscript accepted 28 Sept. 1999.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number 110165.1041-4347/00/$10.00 ß 2000 IEEEdesign. Recently, especially due to the development ofextensible DBMS [34], spatial data management techniqueshave been applied to a wide area of applications, fromimage and multimedia databases [9] to data mining andwarehousing [10], [31]. Fig. 1 is motivated by a geographicalapplication, where the database consists of several relations(or classes, etc.) about Europe. In particular, Fig. 1 illustratescapitals and regions of European countries (i.e., point andregion data).The most common queries on such databases includepoint or range queries on a specified relation (e.g., ªfind allcountries that contain a user-defined pointº or ªfind allcountries that overlap a user-defined query windowº) orjoin queries on pairs of relations (e.g., ªfind all pairs ofcountries and motorways that overlap each otherº).2.1 Spatial Queries and Spatial Data StructuresThe result of the select operation on a relation REL1using aquery window q contains those tuples in REL1, with thespatial attribute standing in some relation  to q. On theother hand, the result of the join operation between a relationREL1and a relation REL2contains those tuples in theCartesian product REL1 REL2, where the ith column ofREL1stands in some relation  to the jth column of REL2.In traditional

View Full Document


School:
Email:
New Password:
Confirm Password:

U of M CSCI 8715 - Efficient Cost Models for Spatial Queries Using R-Trees

Sign up for free to view:

Please select your school