The R*-tree: An Efficient and Robust Access Method for Points and Rectangles+ Norbert Beckmann, Hans-Peter begel Ralf Schneider, Bernhard Seeger Praktuche Informatlk, Umversltaet Bremen, D-2800 Bremen 33, West Germany Abstract The R-tree, one of the most popular access methods for rectangles, IS based on the heurlstlc optlmlzatlon of the area of the enclosmg rectangle m each mner node By running numerous experiments m a standardized testbed under highly varying data, queries and operations, we were able to design the R*-tree which mcorporates a combined optlmlzatlon of area, margin and overlap of each enclosmg rectangle m the directory Using our standardized testbed m an exhaustive performance comparison, It turned out that the R*-tree clearly outperforms the exlstmg R-tree varmnts Guttman’s linear and quadratic R-tree and Greene’s variant of the R-tree This superlorlty of the R*-tree holds for different types of queries and operations, such as map overlay. for both rectangles and multldlmenslonal points m all experiments From a practical pomt of view the R*-tree 1s very attractive because of the followmg two reasons 1 It effrclently supports pomt and spattal data at the same time and 2 Its lmplementatlon cost 1s only slightly higher than that of other R-trees l.Introduction In this paper we will consider spatial access methods (SAMs) which are based on the approxlmatlon of a complex spatial object by the mmlmum boundmg rectangle with the sides of the rectangle parallel to the axes of the data space yIp--- + This work was supported by grant no Kr 670/4-3 from the Deutsche Forschun&iememschaft (German Research Society) and by the Mmlstry of Environmental and Urban Planning of Bremen Pemxss~on to copy wthout fee all or part of this maternal IS granted prowded that the copses are not made or dlstnbuted for dwzct commeraal advantage, the ACM copy&t notice and the title of the pubbcatlon and its date appear, and notw IS gwn that cqymg II by pemuwon of the Assoaatlon for Computmg Machmq To copy othemw, or to repubbsh requ,res a fee and/or speoflc pemllsslon 0 1990 ACM 089791365 5/!90/0@35/0322 $150 The most important property of this simple approxlmatlon 1s that a complex object 1s represented by a limited number of bytes Although a lot of mformatlon 1s lost, mnumum bounding rectangles of spatial oblects preserve the most essential geometric properties of the object, 1 e the location of the oblect and the extension of the object in each axis In [SK 881 we showed that known SAMs organlzmg (mmlmum bounding) rectangles are based on an underlymg point access method (PAM) using one of the followmg three techniques cllpplng, transformation and overlapping regions The most popular SAM for storing rectangles 1s the R- tree [Gut 841 Followmg our classlflcatlon, the R-tree 1s based on the PAM B+-tree [Knu 731 usmg the technique over-lapping regions Thus the R-tree can be easily implemented which considerably contributes to Its popularity The R-tree 1s based on a heurlstlc optlmlzatlon The optlmlzatton crlterlon which It persues, 1s to mmlmlze the area of each enclosing rectangle m the mner nodes This crlterlon IS taken for granted and not shown to be the best possible Questions arise such as Why not mnumlze the margin or the overlap of such mlnlmum bounding rectangles Why not optimize storage utlllzatlon? Why not optunlze all of these criteria at the same hme? Could these criteria mteract in a negative way? Only an engineering approach will help to find the best possible combmatlon of optimization criteria Necessary condltlon for such an engmeermg approach 1s the avallablhty of a standardized testbed which allows us to run large volumes of experiments with highly varying data, queries and operations We have implemented such a standardized testbed and used It for performance comparisons parucularly of pomt access methods [KSSS 891 As the result of our research we designed a new R-tree varmnt, the R*-tree, which outperforms the known R-tree variants under all experiments For many reallstlc profiles of data and operations the gam m performance 1s quite considerable Additionally to the usual point query, 322rectangle mtersectton and rectangle enclosure query, we have analyzed our new R*-tree for the map overlay operation. also called spatial lout. which 1s one of the most rmportant operatrons m geographic and envrronmental database systems This paper is organized as follows In sectron 2, we tntroduce the prrncrples of R-trees rncludrng their optimizatron criteria In section 3 we present the existing R-tree variants of Guttman and Greene Section 4 describes rn detail the design our new R*-tree The results of the comparrsons of the R*-tree wrth the other R-tree varmnts are reported m section 5 Section 6 concludes the paper 2. Principles of R-trees and possible optimization criteria An R-tree 1s a B+-tree like structure which stores multrdrm- ensional rectangles as complete ObJects without clipping them or transformmg them to higher drmensronal points before A non-leaf node contarns entries of the form (cp, Rectangle) where cp 1s the address of a child node m the R-tree and Rectangle 1s the mnumum boundmg rectangle of all rectangles which are entries m that child node A leaf node contams entries of the form (Old, Rectangle) where Old refers to a record m the database, describing a spatial oblect and Rectangle 1s the enclosrng rectangle of that spatial oblect Leaf nodes containing entries of the form (datuob.tect, Rectangle) are also possrble This wrll not affect the basic structure of the R-tree In the followmg we wrll not consider such leaf nodes Let M be the maximum number of entries that will fit m one node and let m be a parameter specrfymg the mrmmum number of entries m a node (2 5 m < M/2) An R-tree satisfies the following properties l The root has at least two children unless rt 1s a leaf l Every non-leaf node has between m and M children unless it is the
View Full Document