CSCI585 Spatial Index Structures Instructor Cyrus Shahabi Outline CSCI585 Grid File Z ordering Hilbert Curve Quad Tree PM PR R Tree next session R Tree R Tree CSCI585 Grid File Hashing methods for multidimensional points extension of Extensible hashing Idea Use a grid to partition the space each cell is associated with one page Two disk access principle exact match CSCI585 Grid File Start with one bucket for the whole space Select dividers along each dimension Partition space into cells Dividers cut all the way CSCI585 Grid File Each cell corresponds to 1 disk page Many cells can point to the same page Cell directory potentially exponential in the number of dimensions CSCI585 Grid File Implementation Dynamic structure using a grid directory Grid array a 2 dimensional array with pointers to buckets this array can be large disk resident G 0 nx 1 0 ny 1 Linear scales Two 1 dimensional arrays that used to access the grid array main memory X 0 nx 1 Y 0 ny 1 Example CSCI585 Buckets Disk Blocks Grid Directory Linear scale Y Linear scale X CSCI585 The scales do not need to be uniform Y1 Y2 Y3 X1 X2 X3 X4 Scales Grid file buckets Grid File Search CSCI585 Exact Match Search at most 2 I Os assuming linear scales fit in memory First use liner scales to determine the index into the cell directory access the cell directory to retrieve the bucket address may cause 1 I O if cell directory does not fit in memory access the appropriate bucket 1 I O Range Queries use linear scales to determine the index into the cell directory Access the cell directory to retrieve the bucket addresses of buckets to visit Access the buckets CSCI585 Grid File Insertions Determine the bucket into which insertion must occur If space in bucket insert Else split bucket how to choose a good dimension to split ans create convex regions for buckets If bucket split causes a cell directory to split do so and adjust linear scales insertion of these new entries potentially requires a complete reorganization of the cell directory expensive CSCI585 Grid File Deletions Deletions may decrease the space utilization Merge buckets We need to decide which cells to merge and a merging threshold Buddy system and neighbor system A bucket can merge with only one buddy in each dimension Merge adjacent regions if the result is a rectangle CSCI585 One dimensional embedding Example Z Ordering or bit interleaving Basic assumption Finite precision in the representation of each coordinate K bits 2K values The address space is a square image and represented as a 2K x 2K array Each element is called a pixel Z Ordering CSCI585 Find a linear order for the cells of the grid while maintaining locality i e cells close to each other in space are also close to each other in the linear order Define this order recursively for a grid that is obtained by hierarchical subdivision of space 11 1 01 11 10 1110 01 0 00 0 10 00 1 00 01 10 11 Z ordering CSCI585 Impose a linear ordering on the pixels of the image 1 dimensional problem A ZA shuffle xA yA shuffle 01 11 11 0111 7 10 ZB shuffle 01 01 0011 10 01 00 00 01 10 11 B CSCI585 Z ordering Given a point x y and the precision K find the pixel for the point and then compute the z value Given a set of points use a B tree to index the z values A range rectangular query in 2 d is mapped to a set of ranges in 1 d Queries CSCI585 Find the z values that contained in the query and then the ranges QA QA range 4 7 11 QB ranges 2 3 and 8 9 10 01 00 00 01 10 11 QB CSCI585 Hilbert Curve We want points that are close in 2d to be close in the 1d Note that in 2d there are 4 neighbors for each point where in 1d only 2 Z curve has some jumps that we would like to avoid Hilbert curve avoids the jumps Hilbert Curve example CSCI585 It has been shown that in general Hilbert is better than the other space filling curves for retrieval Hi order i Hilbert curve for 2ix2i array 15 12 11 14 13 8 10 9 N 1 1 0 2 3 7 4 N 2 N 3 N 4 6 5 H 0 2N 1 d 0 2Nd 1 H V Jagadish Linear Clustering of Objects with Multiple Atributes Atributes ACM SIGMOD Conference 1990 332332 342 CSCI585 Quad Trees Region Quadtree The blocks are required to be disjoint Have standard sizes squares whose sides are power of two At standard locations Based on successive subdivision of image array into four equal size quadrants If the region does not cover the entire array subdivide into quadrants sub quadrants etc A variable resolution data structure CSCI585 Example of Region Quadtree 2 33 4 5 A 1 NW 1 6 11 11 7 8 13 2 15 16 17 18 B SW SE C E 14 9 10 12 NE 3 4 6 D 11 12 5 13 14 F 19 19 7 8 9 10 15 16 17 18 CSCI585 PR Quadtree PR Point Region quadtree Regular decomposition similar to Region quadtree Independent of the order in which data points are inserted into it if two points are very close decomposition can be very deep CSCI585 Example of PR Quadtree 0 100 100 100 Toronto 62 77 A Buffalo 82 65 Seattle 1 55 Atlanta 85 15 Mobile 52 10 Miami 90 5 0 0 E Seattle 1 55 Denver 5 45 Omaha 27 35 C B Chicago 35 42 F D Toronto 62 77 Buffalo Denver 82 65 5 45 Mobile 52 10 100 0 Subdivide into quadrants until the two points are located in different regions Chicago 35 42 Omaha 27 35 Miami 90 5 Atlanta 85 15 PM Quadtree CSCI585 PM Polygonal Map quadtree family PM1 quadtree PM2 quadtree PM3 quadtree PMR quadtree etc PM1 quadtree Based on regular decomposition of space Vertex based implementation Criteria At most one vertex can lie in a region represented by a quadtree leaf If a region contains a vertex it can contain no partial edge that does not include that vertex If a region contains no vertices it can contain at most one partial edge CSCI585 PM Quadtree PM1 quadtree PM2 quadtree PM3 quadtree CSCI585 Example of PM1 Quadtree 0 100 0 0 100 100 100 0 Each node in a PM quadtree is a collection of partial edges and a vertex Each point record has two field x y Each partial edge has four field starting point ending point left region right region CSCI585 References National Technical University of Athens Theoretical Computer Science II Advanced Data Structures J rg Nievergelt Hans Hinterberger Kenneth C Sevcik The Grid File An Adaptable Symmetric Multikey File Structure ACM Trans Database Syst 9 1 38 71 1984 H V Jagadish Linear Clustering of Objects with Multiple Atributes ACM SIGMOD Conference 1990 332 342
View Full Document
Unlocking...