DOC PREVIEW
USC CSCI 585 - nn

This preview shows page 1-2-3 out of 9 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Nearest Neighbor Queries *Nick Roussopoulos Stephen Kelley Fr6d6ric VincentDepartment of Computer ScienceUniversity of MarylandCollege Park, MD 20742AbstractA frequently encountered type of query in GeographicInformation Systems is to find the k nearest neighborobjects to a given point in space. Processing such queriesrequires substantially different search algorithms than thosefor location or range queries. In this paper we present anefficient branch-and-bound R-tree traversaJ algorithm to findthe nearest neighbor object to a point, and then generalize itto finding the k nearest neighbors. We also discuss metricsfor an optimistic and a pessimistic search ordering strategyas well as for pruning. Finally, we present the results ofseveral experiments obtained using the implementation ofour algorithm and examine the behavior of the metrics andthe scalabfity of the algorithm.1INTRODUCTIONThe efficient implementation of Nearest Neighbor (NN)queries is of a particular interest in Geographic Informa-tion Systems (GIS). For example, a user may point to aspecific location or an object on the screen, and requestthe system to find the five nearest objects to it in thedatabase. Another situation where NN query is usefulis when the user is not familiar withthe layout of thespatial objects. In the case of an astrophysics database,finding the nearest star to a given point in the sky couldinvolve multiple unsuccessful searches with varying win-dow sizes if we were to use a more traditional 2D rangequery. Another even more complex query that could behandled by an NN technique is to find the four neareststars which are at least ten light-years away.The versatility of k nearest neighbors search increasessubstantially if we consider all variations of it, such asthe k furthest neighbors, or when it is combined with*This research was sponsored partially by the National ScienceFoundation under grant BIR 9318183, by ARPA under contract003195 Ve100ID, and by NASA/USRA under contract 5555-09.Permission to copy without fee all or part of this material isgranted provided that the copies are not made or distributed fordirect commercial advantage, the ACM copyright notice and thetitle of the publication and its date appear, and notice is giventhat copyin is by permission of the Association of Computing?Machinery. o copy otherwise, or to republish, requiresa fee and/or specific permission.SIGMOD’95,San Jose, CA USAQ 1995 ACM 0-89791-731 -6/95/0005.. $3.50other spatial queries such as find the k NN to the East ofa location, or even spatial joins with NN join predicate,such as find the three closest restaurants for each of twodifferent movie theaters,Efficient processing of NN queries requires spatialdata structures which capitalize on the proximity ofthe objects to focus the search of potential neighborsonly, There is a wide variety of spatial access methods[Same89]. However, very few have been used for NN.In [Same90], heuristics are provided to find objects inquadtrees. The exact k-NN problem, is also posedfor hierarchical spatial data structures such as the PMquadtree. The proposed solution is a top-down recursivealgorithm which first goes down the quadtree, exploringthe subtree that contains the query point, in orderto get a first estimate of the NN location.Thenit backtracks and explores remaining subtrees whichpotentially contain NN until no subtree needs be visited.In [FBF77] a NN algorithm for k-d-trees was proposedwhich was later refined in [Spro91].R-trees [Gutt84], Packed R-trees [Rous85], [Kame94],R-tree variations [SRF87], [Beck90] have been primarilyused for overlap/containment range queries and spatialjoin queries [BKS93] based on overlap/containment.In this paper, we provide an efficient branch-and-bound search algorithm for processing exact k-NNqueries for the R-trees, introduce several metrics forordering and pruning the search tree, and performseveral experiments on synthetic and real-world datato demonstrate the performance and scalability of ourapproach. To the best of our knowledge, neither NNalgorithms have been developed for R-trees, nor similarmetrics for NN search. We would also like to point outthat, although the algorithm and these metrics are inthe context of R-trees, they are directly applicable toall other spatial data structures.Section 2 of the paper contains the theoreticalfoundation for the nearest neighbor search. Section 3describes the algorithm and the metrics for orderingthe search and pruning during it. Section 4 has theexperiments with the implementation of the algorithm.The conclusion is in section 5.712NEARESTNEIGHBOR SEARCHUSING R-TREESR-trees were proposed as a natural extension of B-trees in higher than one dimensions[Gutt84]. Theycombine most of the nice features of both B-trees andquadtrees. Like B-trees, they remain balanced, whilethey maintain the flexibility of dynamically adjustingtheir grouping to deal with eitherdead-space or denseareas, like the quad trees do. The decomposition used inR-trees is dynamic, driven by the spatial data objects.And with appropriate split algorithms, if a region ofan n-dimensional space includes dead-space, no entry inthe R-tree will be introduced.Leaf nodes of the R-tree cent ain entries of the form(RECT, oid) where oid is an object-identifier and isused as a a pointer to a data object andRECT isan n-dimensional Minimal Bounding Rectangle (MBR)which bounds the corresponding object. For example,in a 2-dimensional space, an entry RECTwill be ofthe form (z~~~, z~ig~, y[~~, yhigh ) whichrepresents thecoordinates of the lower-left and upper-right corner ofthe rectangle. The possibly composite spatial objectsstored at the leaf level are considered atomic, and arenot further decomposed into their spatial primitives,i.e. quadrants, triangles, trapezoids, line segments, orpixels. Non-leaf R-tree nodes contain entries of the form(RECT, p)where p is a pointer to a successor node inthe next level of the R-tree, andREC’T is a minimalrectangle which bounds all the entries in the descendentnode.The termbranching factor (or fan-out) can be used tospecify the maximum number of entries that a node canhave; each node of an R-tree with branching factor fifty,for example, points to a maximum of fifty descendantsor leaf objects. To illustrate the way an R-tree is definedon some space, Figure 1 shows a collection of rectanglesand Figure 2 the corresponding tree.Performanceof an R-tree search is measured by the number ofdisk accesses (reads)


View Full Document

USC CSCI 585 - nn

Download nn
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view nn and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view nn 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?