DOC PREVIEW
USC CSCI 585 - Session16

This preview shows page 1-2-17-18-19-36-37 out of 37 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 37 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CSCI585CSCI585Nearest Neighbor QueriesInstructor: Cyrus ShahabiCSCI585CSCI5852Nearest Neighbor Search• Retrieve the nearest neighbor of query point Q• Simple Strategy:– convert the nearest neighbor search to range search.– Guess a range around Q that contains at least one object say O• if the current guess does not include any answers, increase range size until an object found.– Compute distance d’ between Q and O – re-execute the range query with the distance d’ around Q. – Compute distance of Q from each retrieved object. The object at minimum distance is the nearest neighbor!!!CSCI585CSCI585Naïve ApproachABCEFGHQuery Point QIssues: how to guess range? The retrieval may be sub-optimal if incorrect range guessed. Would be a problem in high dimensional spaces.B CEFG HA:B:C:4 5 6 1 2 310 11 127 89E F GH672314589101112CSCI585CSCI5854• Given a query location q, find the nearest object.• Depth First and Best-First Search using R-trees • Goal: avoid visiting nodes that cannot contain resultsqaCSCI585CSCI5855Basic Pruning Metric: MINDIST• Minimum distance between q and an MBR.• mindist(E1,q) is a lower bound of d(o,q) for every object o in E1.• If we have found a candidate NN p, we can prune every MBR whose mindist > d(p, q).E1pqmindist(E1,q)CSCI585CSCI585MINDIST Property• MINDIST is a lower bound of any k-NN distance(s1, s2)(t1, t2)(p1, p2)(p1, p2)(p1, p2)(p1, p2)(p1, p2)6CSCI585CSCI5857Depth-First (DF) NN AlgorithmRoussoulos et al., SIGMOD, 1995204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013Note: distances not actually storedinside nodes. Only for illustration5Main ideaStarting from the root visit nodesaccording to their mindist in depth-first mannerCSCI585CSCI5858DF Search – Visit E1204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E725CSCI585CSCI5859DF Search – Find Candidate NN aFirst Candidate NN: a with distance 1 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E725204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E75CSCI585CSCI58510DF Search – Backtrack to Root and Visit E2First Candidate NN: a with distance 1 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E725204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E75CSCI585CSCI58511DF Search – Find Actual NN iFirst Candidate NN: a with distance 1 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E725204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E7Actual NN: i with distance 25CSCI585CSCI58512Optimality204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E7• Question: Which is the minimal set of nodes that must be visited by any NN algorithm?• Answer: The set of nodes whose MINDIST is smaller than or equal to the distance between qand its NN (e.g., E1, E2, E6).CSCI585CSCI58513A Better Strategy for KNN search• A sorted priority queue based on MINDIST;• Nodes traversed in order;• Stops when there is an object at the top of the queue; (1-NN found)• k-NN can be computed incrementally;I/O optimalCSCI585CSCI585Priority QueueB CEFG HA:B:C:4 561 2 3 10 11 127 89E F GHABCE FC67231458911ABCEFGHQuery Point QC5 6 4 FH 5 G 6 4 F7 5 8 9G 6 4 F1NN141012CSCI585CSCI58515Best-First (BF) NN Algorithm (Optimal)Hjaltason and Samet, TODS, 1998• Keep a heap H of index entries and objects, ordered by MINDIST.• Initially, H contains the root.• While H ≠φ– Extract the element with minimum MINDIST• If it is an index entry, insert its children into H.• If it is an object, return it as NN.• End whileCSCI585CSCI58516BF Search – Visit rootE11E22Visit RootAction Heap204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58517BF Search – Visit E1E11E22Visit Rootfollow E1E22E53E55E94Action Heap204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58518BF Search – Visit E2E11E22Visit Rootfollow E1E22E53E55E94Action Heapfollow E2E26E53E55E94E137204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58519BF Search – Visit E6E11E22Visit Rootfollow E1E22E53E55E94Action Heapfollow E2E26E53E55E94E137follow E6j10i2E53E55E94E137k13204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58520DF Search – Find Actual NN iE11E22Visit Rootfollow E1E22E53E55E94Action Heapfollow E2E26E53E55E94E137follow E6Report i and terminatej10i2E53E55E94E137k13204 6810246810x axisy axisbcaE3defghijklmqueryE4E5E1E2E6E71 259 5213abcdeE1E2E3E4E5RootE1E2E3E4fghE5lmE7ijkE6E6E721013CSCI585CSCI58521Generalizations• Both DF and BF can be easily adapted to (i) extended (instead of point) objects and (ii) retrieval of k (>1) NN. • BF can be made incremental; i.e., it can report the NN in increasing order of distance without a given value of k.– Example: find the 10 closest cities to HK with population more than 1 million. We may have to retrieve many (>>10) cities around Hong Kong in order to answer the query.CSCI585CSCI58522Generalize to k-NN • Keep a sorted buffer of at most k current nearest neighbors• Pruning is done according to the distance of the furthest nearest neighbor in this buffer• Example:RThe k-th object in the bufferMINDISTPActual_distCSCI585CSCI585Another filter idea based on MBR Face Property• MBR is an n-dimensional Minimal Bounding Rectangle used in R trees, which is the minimal bounding n-dimensional rectangle bounds its corresponding objects. • MBR face property: Every face of any MBR contains at least one point of some object in the database. 23CSCI585CSCI585MBR Face Property – 2D24CSCI585CSCI585MBR Face Property – 3DRectangle R25CSCI585CSCI58526Improving the KNN Algorithm• While the MinDist based algorithm is I/O optimal, its performance may be further improved by pruning nodes from the priority queue.CSCI585CSCI585Properties of MINMAXDIST• MINMAXDIST(P,R) is the minimum over all dimensions distances from P to the furthest point of the closest face of R.• MINMAXDIST is the smallest possible upper bound of distances from the point P to the rectangle R.• MINMAXDIST guarantees there is an object within the R at a distance to P less than or equal to minmaxdist.• MINMAXDIST is an upper bound of the


View Full Document

USC CSCI 585 - Session16

Download Session16
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Session16 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Session16 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?