UMD CMSC 420 - Generalized, incremental NN, range searching, kd-tree variants

Unformatted text preview:

kd-Trees ContinuedGeneralized, incremental NN, range searching, kd-tree variantskd-tree Variants•How do you pick the cutting dimension?-kd-trees cycle through them, but may be better to pick a different dimension-e.g. Suppose your 3d-data points all have same Z-coordinate in a give region:•How do you pick the cutting value?-kd-trees pick a key value to be the cutting value, based on the order of insertion-optimal kd-trees: pick the key-value as the median-Don’t need to use key values => like PR Quadtrees => PR kd-trees•What is the size of leaves?-if you allow more than 1 key in a cell: bucket kd-trees•kd-trees: discriminator = (hyper)plane; quadtrees (and higher dim) discriminator complexity grows with dSliding Midpoint kd-trees•PR kd-tree: split in the midpoint, along the current cutting dimension•May result in trivial splits: if all points lie to one side of the median•Solution: if you get a trivial split, slide the split so that it cuts off at least one point:Sliding midpoint kd-treeAvoids empty cellsTends to put boundaries around bounding boxes of clusters of pointskd-Trees vs. Quadtrees, another viewgp1 p1c3 c4c2c1Consider a 3-d data set Octtreekd-treekd-tree splits the decision up over d levels don’t have to represent levels (pointers) that you don’t needQuadtrees: one point determines all splitskd-trees: flexibility in how splits are chosenxyzPath-compressed PR kd-treesab ef gdcxyxegfabcdhhXXXEmpty subtreesEmpty regionsaefdhPath compressed PR kd-trees=LLs=LStrings of Ls and Rs tell the decisions skipped that would lead to this nodeGeneralized Nearest Neighbor Search•Saw last time: nearest neighbor search in kd-trees.•What if you want the k-nearest neighbors?•What if you don’t know k?-E.g.: Find me the closest gas station with price < $3.25 / gallon.-Approach: go through points (gas stations) in order of distance from me until I find one that meets the $ criteria•Need a NN search that will find points in order of their distance from a query point q.•Same idea as the kd-tree NN search, just more generalGeneralized NN Search•A feature of all spatial DS we’ve seen so far: decompose space hierarchically. No matter what the DS, we get something like this:e1e2 e3e6 e7e5e4Let the items in the hierarchy be e1,e2,e3...Items may represent points, or bounding boxes, or ...Let Type(e) be an abstract “type” of the object: we use the type to determine which distance function to use E.g: if Type = “bounding box” then we’d use the point-to-rectangle distance function.A concrete example: in a Quadtree: internal nodes have type “bounding box”Leaves would have type “point”e8Generalized, Incremental NNHeapInsert(H, root, 0)while not Empty(H): e := ExtractMin(H) if IsLeaf(e): output e as next nearest else foreach c in Children(e): t = Type(c) HeapInsert(H, c, dt(q,c))Let IsLeaf(), Children(), and Type() represent the decomposition treeLet dt(q,et) be the distance function appropriate to compare points with elements of type t.Idea: keep a priority queue that contains all elements visited so far (points, bounding boxes)Priority queue (heap) is ordered by distance to the query pointWhen you dequeue a point (leaf), it will be the next closestdt(q,c) may be the distance to the bounding box represented by c, e.g.BSASIncremental, Generalized NN ExampleHeapInsert(H, root, 0)while not Empty(H): e := ExtractMin(H) if IsLeaf(e): output e as next nearest else foreach c in Children(e): t = Type(c) HeapInsert(H, c, dt(q,c))TLTRTBQcabcqTQSAQabL,R = left, rightA,B = above, belowSome spatial data structure:It’s spatial decomposition (NOT the actual data structure)BSASIncremental, Generalized NN ExampleHeapInsert(H, root, 0)while not Empty(H): e := ExtractMin(H) if IsLeaf(e) && IsPoint(e): output e as next nearest else foreach c in Children(e): t = Type(c) HeapInsert(H, c, dt(q,c))TLTRTBQcabcqTQSAQabL,R = left, rightA,B = above, belowSome spatial data structure:Its spatial decomposition (NOT the actual data structure)H = []H = [T]H = [LT RT]H = [AQ RT BQ]H = [RT BQ]H = [BS AS BQ ]H = [AS a BQ ]H = [c a BQ]H = [c a b]H = [a b]H = [b]H = []Range SearchingCMSC 420Range Searching in kd-trees•Range Searches: another extremely common type of query.•Orthogonal range queries:-Given axis-aligned rectangle-Return (or count) all the points inside it•Example: find all peoplebetween 20 and 30 years oldwho are between 5’8” and 6’ tall.Range Searching in kd-trees•Basic algorithmic idea:-traverse the whole tree, BUT• prune if bounding box doesn’t intersect with Query• stop recursing or print all points in subtree if bounding box is entirely inside QuerykRange Searching Exampleabcdiefghjabdh fgijkIf query box doesn’t overlap bounding box, stop recursionIf bounding box is a subset of query box, report all the points in current subtreeIf bounding box overlaps query box, recurse left and right.mmecRange Query Count PseudoCodedef RangeQueryCount(Q, T): if T == NULL: return 0 if BB(T) doesn’t overlap Query: return 0 if Query subset of BB(T): return T.size count = 0 if T.data in Query: count++ count += RangeQuery(Q, t.left) count += RangeQuery(Q, t.right) return count(For clarity, omitting the cutting dimension, and the BB(T) parameters that would be passed into the function)Range Query PseudoCodedef RangeQuery(Q, T): if T == NULL: return empty_set() if BB(T) doesn’t overlap Query: return 0 if Query subset of BB(T): return AllNodesUnder(T) set = empty_set() if T.data in Query: set.union({T.data}) set.union(RangeQuery(Q, T.left)) set.union(RangeQuery(Q, T.right)) return setExpected # of Nodes to Visit•Completely process a node only if query box intersects bounding box of the node’s cell:•In other words, one of the edges of Q must cut through the cell.•# of cells a vertical line will pass through ≥ the number of cells cut by the left edge of Q.•Top, bottom, right edges are the same, so bounding # of cells cut by a vertical line is sufficient.Cell uQ# of Stabbed Nodes = O(√n)acbConsider a node a with cutting dimension = xVertical line can intersect exactly one of a’s children (say c)But will intersect both of c’s children.Thus, line will intersect at most 2 of a’s grandchildren.abc2112343 4# of Stabbed Nodes = O(√n)So: you at most double # of cut nodes every 2 levelsIf kd-tree is balanced, has O(log n) levelsCells cut = 2(log n)/2 = 2log √n = √nabc213


View Full Document

UMD CMSC 420 - Generalized, incremental NN, range searching, kd-tree variants

Download Generalized, incremental NN, range searching, kd-tree variants
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Generalized, incremental NN, range searching, kd-tree variants and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Generalized, incremental NN, range searching, kd-tree variants 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?