SWARTHMORE CS 97 - The Priority R-Tree- A Practically Efficient and Worst-Case Optimal R-Tree - D1006529

Home> Schools> Swarthmore College> (CS) > CS 97> The Priority R-Tree- A Practically Efficient and Worst-Case Optimal R-Tree

DOC PREVIEW

SWARTHMORE CS 97 - The Priority R-Tree- A Practically Efficient and Worst-Case Optimal R-Tree

School name Swarthmore College

Course Cs 97- Computer Perception

Pages 12

This preview shows page 1-2-3-4 out of 12 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

The Priority R-Tree: A Practically Efficientand Worst-Case Optimal R-TreeLars Arge∗Department of Computer ScienceDuke University, Box 90129Durham, NC [email protected] de BergDepartment of Computer ScienceTU Eindhoven, P.O.Box 5135600 MB EindhovenThe [email protected] J. Haverkort†Institute of Informationand Computing SciencesUtrecht University, PO Box 80 0893508 TB Utrecht, The [email protected] Yi∗Department of Computer ScienceDuke University, Box 90129Durham, NC 27708-0129, [email protected] present the Priority R-tree, or PR-tree, which is the firstR-tree variant that always answers a window query usingO((N/B)1−1/d+ T/B) I/Os, where N is the number of d-dimensional (hyper-) rectangles stored in the R-tree, B isthe disk block size, and T is the output size. This is provablyasymptotically optimal and significantly better than other R-tree variants, where a query may visit all N/B leaves inthe tree even when T = 0. We also present an extensiveexperimental study of the practical performance of the PR-tree using both real-life and synthetic data. This study showsthat the PR-tree performs similar to the best known R-treevariants on real-life and relatively nicely distributed data, butoutperforms them significantly on more extreme data.1. INTRODUCTIONSpatial data naturally arise in numerous applications, in-cluding geographical information systems, computer-aideddesign, computer vision and robotics. Therefore spatialdatabase systems designed to store, manage, and manipulatespatial data have received considerable attention over theyears. Since these databases often involve massive datasets,disk based index structures for spatial data have been re-searched extensively—see e.g. the survey by Gaede and G¨un-∗Supported in part by the National Science Foundationthrough RI grant EIA–9972879, CAREER grant CCR–9984099, ITR grant EIA–0112849, and U.S.–Germany Co-operative Research Program grant INT–0129182.†Supported by the Netherlands’ Organization for ScientificResearch (NWO).Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage, and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.SIGMOD 2004 June 13-18, 2004, Paris, France.Copyright 2004 ACM 1-58113-859-8/04/06 . . . $5.00.ther [11]. Especially the R-tree [13] and its numerous vari-ants (see e.g. the recent survey by Manolopoulos et al. [19])have emerged as practically efficient indexing methods. Inthis paper we present the Priority R-tree, or PR-tree, whichis the first R-tree variant that is not only practically efficientbut also provably asymptotically optimal.1.1 Background and previous resultsSince objects stored in a spatial database can be rathercomplex they are often approximated by simpler objects,and spatial indexes are then built on these approximations.The most commonly used approximation is the minimalbounding box: the smallest axis-parallel (hyper-) rectanglethat contains the object. The R-tree, originally proposed byGuttman [13], is an index for such rectangles. It is a height-balanced multi-way tree similar to a B-tree [5, 9], whereeach node (except for the root) has degree Θ(B). Each leafcontains Θ(B) data rectangles (each possibly with a pointerto the original data) and all leaves are on the same level ofthe tree; each internal node v contains pointers to its Θ(B)children, as well as for each child a minimal bounding boxcovering all rectangles in the leaves of the subtree rooted inthat child. If B is the number of rectangles that fits in adisk block, an R-tree on N rectangles occupies Θ(N/B) diskblocks and has height Θ(logBN). Many types of queriescan be answered efficiently using an R-tree, including thecommon query called a window query: Given a query rect-angle Q, retrieve all rectangles that intersect Q. To answersuch a query we simply start at the root of the R-tree andrecursively visit all nodes with minimal bounding boxes in-tersecting Q; when encountering a leaf l we report all datarectangles in l intersecting Q.Guttman gave several algorithms for updating an R-treein O(logBN) I/Os using B-tree-like algorithms [13]. Sincethere is no unique R-tree for a given dataset, and becausethe window query performance intuitively depends on theamount of overlap between minimal bounding boxes in thenodes of the tree, it is natural to try to minimize boundingbox overlap during updates. This has led to the developmentof many heuristic update algorithms; see for example [6, 16,23] or refer to the surveys in [11, 19]. Several specializedalgorithms for bulk-loading an R-tree have also been devel-oped [7, 10, 12, 15, 18, 22]. Most of these algorithms useO(NBlogM/BNB) I/Os (the number of I/Os needed to sortN elements), where M is the number of rectangles that fitsin main memory, which is much less than the O(N logBN)I/Os needed to build the index by repeated insertion. Fur-thermore, they typically produce R-trees with better spaceutilization and query performance than R-trees built usingrepeated insertion. For example, while experimental resultshave shown that the average space utilization of dynamicallymaintained R-trees is between 50% and 70% [6], most bulk-loading algorithms are capable of obtaining over 95% spaceutilization. After bulk-loading an R-tree it can of coursebe updated using the standard R-tree updating algorithms.However, in that case its query efficiency and space utiliza-tion may degenerate over time.One common class of R-tree bulk-loading algorithms workby sorting the rectangles according to some global one-di-mensional criterion, placing them in the leaves in that order,and then building the rest of the index bottom-up level-by-level [10, 15, 18]. In two dimensions, the so-called packedHilbert R-tree of Kamel and Faloutsos [15], which sorts therectangles according to the Hilbert values of their centers,has been shown to be especially query-efficient in practice.The Hilbert value of a point p is the length of the frac-tal Hilbert space-filling curve from the origin to p. TheHilbert curve is very good at clustering spatially close rect-angles together, leading to a good index. A variant of thepacked Hilbert R-tree, which also takes

View Full Document