Unformatted text preview:

Top-k Spatial JoinsManli Zhu, Dimitris Papadias, Jun Zhang, and Dik Lun LeeAbstract—Given two spatial data sets A and B, a top-k spatial join retrieves the k objects from A or B that intersect the largest numberof objects from the other data set. Depending on the application requirements, there exist several variations of the problem. Forinstance, B may be a point data set, and the goal may be to retrieve the regions of A that contain the maximum number of points. Theprocessing of such queries with conventional spatial join algorithms is expensive. However, several improvements are possible basedon the fact that we only require a small subset of the result (instead of all intersection/containments pairs). In this paper, we proposeoutput-sensitive algorithms for top-k spatial joins that utilize a variety of optimizations for reducing the overhead.Index Terms—Database, spatial database, spatial joins.æ1INTRODUCTIONAspatial join retrieves the pairs of objects that satisfysome spatial predicate (most often intersection). Joinshave been studied quite extensively in the spatial databaseliterature because they are involved in several commonqueries (e.g., map overlays) and they incur high executioncost, which calls for performance optimizations. Conse-quently, various algorithms have been proposed (seeSection 2), covering all possible combinations of indexedand nonindexed data sets. In most cases, the result of aspatial join is rather large (for typical geographic layers,linear to the cardinality of the participating data sets). Inseveral applications, however, the user may be interested inonly a small subset of the output. This paper studies such aquery type, called the top-k spatial join (top-k SJ).Given two data sets A and B, the top-k SJ retrieves thek objects in data set A or B that intersect the maximumnumber of objects from the other data set. For example, inVLSI design, a circuit layer consists of numerous wires,represented as rectangles. When two layers are placedtogether, the intersection between the wires of differentlayers may cause electro-magnetic interference. A top-kspatial join can be applied to retrieve the wires intersectingthe largest number of wires from the other layer. The resultindicates the positions where the topology of the circuit canbe improved to reduce interference.Similarly, the top-k spatial semijoin returns the k objects ofA with the maximum intersection or containment counts. Forexample, consider a traffic supervision system, where dataset A represents urban regions and B stands for vehiclelocations monitored through sensor networks [7]. In orderto balance the traffic, we should have knowledge about theregions with the maximum number of cars. Similar queriesare relevant for systems monitoring mobile devices (e.g.,find the area with the highest user density) and otherspatial decision making applications.Top-k spatial joins have some similarity with multi-dimensional range aggregate queries. Given a set A and awindow q in a multidimensional vector space, a rangeaggregate (RA) query returns a single value that sum-marizes the subset R  A of objects intersecting (or inside) qaccording to an aggregation function (e.g., the number ofobjects qualifying q instead of their concrete ids). Severaltechniques [8], [14], [24], [35] have been proposed or theefficient processing of RA queries in spatial databases (see[34] for an overview of related work). However, theapplication of these techniques to top-k spatial (semi) joinswould incur large computational overhead because thenumber of RA queries increases linearly with the cardinal-ities jAj and jBj of the participating data sets. In particular,the processing of a top-k semijoin requires the application ofjAj RA queries (one for each object of A), while a full joinrequires jAjþjBj RA queries.Top-k joins in relational databases retrieve the k tuples ofthe join result with the highest value of a scoring function.However, existing relational methods (e.g., [13], [22]) areinapplicable to top-k SJ for the same reasons that equi-joinalgorithms are inapplicable to spatial joins (a lack of one-dimensional ordering of tuples, intersection as opposed toequality join condition, etc.). A naı¨ve approach to evaluatetop-k SJ is to: 1) apply a conventional spatial join algorithmon the two data sets A and B, 2) count the number of outputpairs in which each object participates, and 3) return thek objects with the maximum intersection counts. However,this method may cause a significant amount of redundantwork, especially if the value of k is small.Assuming that, in practice, the number k of requestedobjects is very small compared to the entire spatial joinresult, we propose a set of output-sensitive algorithms thatapply the branch-and-bound framework for pruning thesearch space. Following most of the related work in thespatial database literature, we consider that the spatial datasets are indexed by R-trees [9] (or their variations [2], [31]),but the general methodology is applicable to any data-partition method (e.g., X-trees [3]). The remainder of theIEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 4, APRIL 2005 567. M. Zhu, D. Papadias, and D.L. Lee are with the Department of ComputerScience, Hong Kong University of Science and Technology, Clear WaterBay, Hong Kong. E-mail: {cszhuml, dimitris, dlee}@cs.ust.hk.. J. Zhang is with the Division of Information Systems, ComputerEngineering School, Nanyang Technological University, Singapore.E-mail: [email protected] received 4 Dec. 2003; revised 9 June 2004; accepted 27 Aug. 2004;published online 17 Feb. 2005.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TKDE-0249-1203.1041-4347/05/$20.00 ß 2005 IEEE Published by the IEEE Computer Societypaper is structured as follows: Section 2 reviews the relatedwork, focusing mostly on spatial joins. Section 3 proposesthe basic top-k SJ algorithm and discusses its properties.Section 4 describes several optimizations for reducing theI/O overhead, and studies variations of the problem.Section 5 contains an experimental evaluation of theproposed techniques and Section 6 concludes the paper.2RELATED WORKEarly spatial join methods apply transformation of objectsin order to overcome the difficulties raised by their spatialextent and dimensionality. The first known algorithm [23]uses a grid to regularly divide the multidimensional


View Full Document

U of M CSCI 8715 - Top-k Spatial Joins

Documents in this Course
Load more
Download Top-k Spatial Joins
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Top-k Spatial Joins and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Top-k Spatial Joins 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?