CSCI585CSCI585Nearest Neighbor QueriesInstructor: Cyrus ShahabiCSCI585CSCI5852Nearest Neighbor Search• Retrieve the nearest neighbor of query point Q• Simple Strategy:– convert the nearest neighbor search to range search.– Guess a range around Q that contains at least one object say O• if the current guess does not include any answers, increase range size until an object found.– Compute distance d’ between Q and O – re-execute the range query with the distance d’ around Q. – Compute distance of Q from each retrieved object. The object at minimum distance is the nearest neighbor!!!CSCI585CSCI585Naïve ApproachABCEFGHQuery Point QIssues: how to guess range? The retrieval may be sub-optimal if incorrect range guessed. Would be a problem in high dimensional spaces.B CEFG HA:B:C:4 5 6 1 2 310 11 127 89E F GH672314589101112CSCI585CSCI5854• Given a query location q, find the nearest object.• Depth First and Best-First Search using R-trees • Goal: avoid visiting nodes that cannot contain resultsqaCSCI585CSCI5855Basic Pruning Metric: MINDIST• Minimum distance between q and an MBR.• mindist(E1,q) is a lower bound of d(o,q) for every object o in E1.• If we have found a candidate NN p, we can prune every MBR whose mindist > d(p, q).E1pqmindist(E1,q)CSCI585CSCI585MINDIST Property• MINDIST is a lower bound of any k-NN distance(s1, s2)(t1, t2)(p1, p2)(p1, p2)(p1, p2)(p1, p2)(p1, p2)6CSCI585CSCI5857Depth-First (DF) NN AlgorithmRoussoulos et al., SIGMOD, 1995204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013Note: distances not actually storedinside nodes. Only for illustration5Main ideaStarting from the root visit nodesaccording to their mindist in depth-first mannerCSCI585CSCI5858DF Search – Visit E1204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E725CSCI585CSCI5859DF Search – Find Candidate NN aFirst Candidate NN: a with distance 1 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E725204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E75CSCI585CSCI58510DF Search – Backtrack to Root and Visit E2First Candidate NN: a with distance 1 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E725204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E75CSCI585CSCI58511DF Search – Find Actual NN iFirst Candidate NN: a with distance 1 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E725204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E7Actual NN: i with distance 25CSCI585CSCI58512Optimality204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E7• Question: Which is the minimal set of nodes that must be visited by any NN algorithm?• Answer: The set of nodes whose MINDIST is smaller than or equal to the distance between qand its NN (e.g., E1, E2, E6).CSCI585CSCI58513A Better Strategy for KNN search• A sorted priority queue based on MINDIST;• Nodes traversed in order;• Stops when there is an object at the top of the queue; (1-NN found)• k-NN can be computed incrementally;I/O optimalCSCI585CSCI585Priority QueueB CEFG HA:B:C:4 561 2 3 10 11 127 89E F GHABCE FC67231458911ABCEFGHQuery Point QC5 6 4 FH 5 G 6 4 F7 5 8 9G 6 4 F1NN141012CSCI585CSCI58515Best-First (BF) NN Algorithm (Optimal)Hjaltason and Samet, TODS, 1998• Keep a heap H of index entries and objects, ordered by MINDIST.• Initially, H contains the root.• While H ≠φ– Extract the element with minimum MINDIST• If it is an index entry, insert its children into H.• If it is an object, return it as NN.• End whileCSCI585CSCI58516BF Search – Visit rootE11E22Visit RootAction Heap204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58517BF Search – Visit E1E11E22Visit Rootfollow E1E22E53E55E94Action Heap204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58518BF Search – Visit E2E11E22Visit Rootfollow E1E22E53E55E94Action Heapfollow E2E26E53E55E94E137204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58519BF Search – Visit E6E11E22Visit Rootfollow E1E22E53E55E94Action Heapfollow E2E26E53E55E94E137follow E6j10i2E53E55E94E137k13204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58520DF Search – Find Actual NN iE11E22Visit Rootfollow E1E22E53E55E94Action Heapfollow E2E26E53E55E94E137follow E6Report i and terminatej10i2E53E55E94E137k13204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58521Generalizations• Both DF and BF can be easily adapted to (i) extended (instead of point) objects and (ii) retrieval of k (>1) NN. • BF can be made incremental; i.e., it can report the NN in increasing order of distance without a given value of k.– Example: find the 10 closest cities to HK with population more than 1 million. We may have to retrieve many (>>10) cities around Hong Kong in order to answer the query.CSCI585CSCI58522Generalize to k-NN • Keep a sorted buffer of at most k current nearest neighbors• Pruning is done according to the distance of the furthest nearest neighbor in this buffer• Example:RThe k-th object in the bufferMINDISTPActual_distCSCI585CSCI585Another filter idea based on MBR Face Property• MBR is an n-dimensional Minimal Bounding Rectangle used in R trees, which is the minimal bounding n-dimensional rectangle bounds its corresponding objects. • MBR face property: Every face of any MBR contains at least one point of some object in the database. 23CSCI585CSCI585MBR Face Property – 2D24CSCI585CSCI585MBR Face Property – 3DRectangle R25CSCI585CSCI58526Improving the KNN Algorithm• While the MinDist based algorithm is I/O optimal, its performance may be further improved by pruning nodes from the priority queue.CSCI585CSCI585Properties of MINMAXDIST• MINMAXDIST(P,R) is the minimum over all dimensions distances from P to the furthest point of the closest face of R.• MINMAXDIST is the smallest possible upper bound of distances from the point P to the rectangle R.• MINMAXDIST guarantees there is an object within the R at a distance to P less than or equal to minmaxdist.• MINMAXDIST is an upper bound of the
View Full Document