9/28/20091Reverse kNN search in Arbitrary DimensionalityInfoLab.usc.edu Geospatial Information Management (Fall 2009)Seyed Jalal KazemitabarOriginal paper by Y. Tao, D. Papadias, and X. LianNearest Neighbor QueriesWhat are the two nearest stars to Andromeda?InfoLab.usc.edu Geospatial Information Management (Fall 2009)Where is the nearest restaurant?Where is the nearest….Algorithms for finding NN Elementary methods:Search AlgorithmIndexing Data StructureNN solutionInfoLab.usc.edu Geospatial Information Management (Fall 2009) More advanced methods:BFDFS R-tree R*-treeSearch AlgorithmBranch & Bound MethodsIndexing Data StructureNN solutionMindistMaxdistMinmaxdistReverse Nearest Neighbors QueriesWhat are the fireplaces I’m nearest to?InfoLab.usc.eduWhich houses I’m the closest restaurant to?Geospatial Information Management (Fall 2009) A data point p is the reverse nearest neighbor of query point q, if there is no point p’ such that dist(p’, p)< dist(q, p), i.e. q is the NN of p.NN(p2)=NN(p3)=qRNN Definitionp2p3qVicinity circlesInfoLab.usc.eduRNN(q)= {p2, p3} In our example, p2,p3 are the houses for which q is the nearest restaurant Is RNN a symmetric relation?Geospatial Information Management (Fall 2009)p1p4p5Related WorkMain ideaRNN AlgorithmsPre- Filter/ InfoLab.usc.edu Geospatial Information Management (Fall 2009)MethodsMain ideacomputingKM YLrefinementSAA SFT9/28/20092 Original RNN methodFor all p:1. Pre-compute NN(p)2. Represent p as a vicinity circle3. Index the MBR of all circles by an R-tree(Named RNN-tree)KMp3p4p2qp1InfoLab.usc.edu()4. RNN(q)= all circles that contain q Needs two trees: RNN-tree & R-treeGeospatial Information Management (Fall 2009)R-treeMBRMBRRNN-treeMBR MBRp5 YL: Merges the trees What happens if we insert p5?1. RNN(p5)=?Find all points that have p5 as their new NN 2. Update the vicinity circles of those points in the index3. Compute NN(p5) and insert the correspondingYLp3p4p2p1InfoLab.usc.edup(p)pgcircle in the index Drawbacks?Geospatial Information Management (Fall 2009)R-treeMBRMBRRNN-treeMBR MBRRdNNtreeMBR MBRTechniques that rely on pre-processing cannot deal efficiently with updatesS1S2S3 Elimination of the need for pre-computing all NNs in filter/ refinement methods SAA: Divide the space around query into six equal regions Find NN(q) in all regions (candidate keys)SAAqp2p1p4FilterRefineInfoLab.usc.eduS6S4S5 Either (i) or (ii) holds for each candidate key p (i) p is in RNN(q) (ii) No RNN(q) in Si RNN(q)= {p6} Any Drawbacks?Geospatial Information Management (Fall 2009)p3p5p7p6RefineThe number of regions increases exponentially with the dimensionality1. Find the kNNs of the query q (k candidates)2. Eliminate the points that are closer to other candidates than q.3. Apply Boolean range queries to determine the actual RNNs A Boolean range query terminates as the first N1SFTqFilterRefinep1p2p3Boolean range for p2InfoLab.usc.edudata point is found Drawbacks?Geospatial Information Management (Fall 2009)p7p6p5p4Boolean range for p6False missesChoosing a proper k Concluding former methods:Dynamic dataArbitrary dimensionalityExact resultKM YLNYYInfoLab.usc.edu Geospatial Information Management (Fall 2009)KM, YLNoYesYesSAA Yes No YesSFT Yes Yes No Can p’ be closer to q than p can be?Half-plane pruningInfoLab.usc.edu If p1, p2,…, pnare n data points, then any node whose MBR falls inside Ui=1..n Plpi(p3 ,q) cannot contain any RNN result.Geospatial Information Management (Fall 2009)9/28/20093 Pruning an R-tree MBR:InfoLab.usc.edu Drawbacks?Geospatial Information Management (Fall 2009)processing time in terms of bisector trimming for computing Computation of intersections does not scale with dimensionality Approximating the residual MBRInfoLab.usc.edu Geospatial Information Management (Fall 2009) An MBR can be pruned if its residual region is empty The approximation is a superset of the real residual region We can prune an MBR if its approximate residual is emptyInfoLab.usc.edu Good news:Geospatial Information Management (Fall 2009)processing time for computing No more hyper-polyhedrons to make the intersection computation complexTPL Algorithm The big picture Uses best-first search Utilizes one R-tree as the data structure Includes filtering/ refinement phases Uses candidate points to prune entries Filters visited entries to obtain the set Scnd of candidates Adds pruned entries to set SrfnInfoLab.usc.edu Srfn is used in the refinement step to eliminate false hitsGeospatial Information Management (Fall 2009)Search AlgorithmBranch & Bound MethodsIndexing Data StructureRNN solutionTPL ExampleInfoLab.usc.edu * Figures of this example are obtained from [2]Geospatial Information Management (Fall 2009)Filtering stepN3N6N11N12N4N5N5N2N1N3N4N6N10N11N12p1p2p5p7p1p3p2p6contents omittedp6p8p4data R-treep3InfoLab.usc.eduAction Heap Scnd SrfnVisit root {N10, N11, N12}{} {}Geospatial Information Management (Fall 2009)qN1N2N10p5p7p5p7p1p3p2p6contents omitted....p4p89/28/20094N3N6N11N12N4N5N5N2N1N3N4N6N10N11N12p1p2p5p7p1p3p2p6contents omittedp6p8p4data R-treep3InfoLab.usc.edu Geospatial Information Management (Fall 2009)qN1N2N10p5p7p5p7p1p3p2p6contents omitted....p4p8Action Heap Scnd SrfnVisit N10{N3, N11,N2, N1,N12}{} {}InfoLab.usc.edu Geospatial Information Management (Fall 2009)Action Heap Scnd SrfnVisit N3{N11,N2, N1,N12}{p1} {p3}InfoLab.usc.edu Geospatial Information Management (Fall 2009)Action Heap Scnd SrfnVisit N11{N5,N2, N1,N12}{p1}{p3, N4,N6}InfoLab.usc.edu Geospatial Information Management (Fall 2009)Action Heap Scnd SrfnVisit N5{N2, N1,N12}{p1,p2}{p3, N4,N6, p6}InfoLab.usc.edu Geospatial Information Management (Fall 2009)Action Heap Scnd SrfnVisit N1{N12}{p1,p2,p5}{p3, N4,N6, p6,N2, p7}InfoLab.usc.edu Geospatial Information Management (Fall 2009)Action Heap Scnd Srfn{} {p1,p2,p5}{p3, N4,N6, p6,N2, p7,N12}9/28/20095Refinement Heuristics Let Prfn be the set of points and Nrfn be the set of nodes in Srfn A point p from Scnd can be discarded as a false hit if there is a point such that either of the following hold:(i)(ii) There is a node MBR such that InfoLab.usc.edu A candidate point can be eliminated if it is closer to another candidate point than to the query A point p from Scndcan be reported as an actual result if the following conditions hold:(i) There is no point such that(ii) For every node If none of the
View Full Document