DOC PREVIEW
U of I CS 511 - Indexing: R-Trees

This preview shows page 1-2-3-26-27-28 out of 28 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Lecture 5 Indexing R Trees Sept 7 2007 ChengXiang Zhai Most slides are adapted from Kevin Chang s lecture slides CS511 Advanced Database Management Systems 1 Efficient Processing of Declarative Queries From declarative to procedural query compilation Physical query plan operators Query optimization access path selection improves efficiency Implementation of physical query plan operators Access methods procedural support for relational operators Indexing improves efficiency CS511 Advanced Database Management Systems 2 Access Methods Indexing What is an index partition data into buckets label each bucket use label to determine if buckets are relevant to query One dimensional indexing hashing B trees B trees we ll simply call them B trees What are the differences pros cons Indexes are designed to serve queries check each index methods w r t typical queries CS511 Advanced Database Management Systems 3 Multidimensional Data GIS geographic info systems database e g map databases CAD database circuit layout of rectangles Data cube systems sale data date time store item price a tuple as a point in multi dimensional space queries typically retrieve a cube e g date between 1990 1999 store supermarket price 100 CS511 Advanced Database Management Systems 4 Example of Multidimensional Data Values may be multi dimensional CS511 Advanced Database Management Systems 5 Multidimensional Queries Point queries constrain values of some dimensions Range queries constrain ranges of some dimensions Nearest neighbor e g the closest city to Champaign Where am I query regions related to a query region given a point find out where it is located given a region find overlapping containing contained regions Others CS511 Advanced Database Management Systems 6 B Tree for Nearest Neighbor Queries Object represented by X Y X indexed by a B tree Y indexed by a B tree Given an object x0 y0 find its NN object let d 1 find objs in X d Y d to X d Y d if there are some objs find NN among them done else d d delta repeat Problems CS511 Advanced Database Management Systems 7 B Tree for Nearest Neighbor Queries Intersecting multiple indexes not efficient Need to guess the distance d and delta too small no objs in the bounding box too large may be too many in bounding box May actually miss the NN point closer point may be outside the d range wrong answer d CS511 Advanced Database Management Systems 8 Indexing Techniques How to reach the right buckets quickly Hash like schemes lookup by k h k grid files partitioned indexing functions Tree like schemes lookup by tree traversal multiple key indexes kd trees quad trees R trees See demo http www cs umd edu brabec quadtree CS511 Advanced Database Management Systems 9 Grid Files 250K 200K 150K Salary 80K 20K 0 30 40 70 100 Age CS511 Advanced Database Management Systems 10 Grid Files Each region corresponds to a bucket Why is this hash based indexing Good for partial match range NN Buckets may be empty or overfull over time points aren t distributed uniformly Number of buckets grows exponentially with D Reorganization requires repartitioning space need to move the grid lines CS511 Advanced Database Management Systems 11 Partitioned Hash Functions Hash function produces K bits a bucket Bits partitioned among attributes buckets identified by b1b2b3b4b5b6 age produces b1b2b3 salary produces b4b5b6 Differences from grid files can simulate a grid file generalization of grid files range NN queries bucket utilization CS511 Advanced Database Management Systems 12 Multiple Key Indexes Each level is an index for one attribute Partial match will work if the prefix of a path is specified But what if only salary is specified NN queries age CS511 Advanced Database Management Systems salary 13 kd Tree k dimensional Search Tree binary tree interior nodes attr div value attributes alternate at different levelsSalary binary search to find records 150 Age 60 Salary 80 Age 38 25 60 70 110 85 140 Age 47 50 275 Salary 300 60 260 50 100 50 120 45 60 30 260 25 400 45 350 50 75 CS511 Advanced Database Management Systems 14 Quad Trees Interior nodes represent square regions with at most M entries Split into four quadrants if more than M entries 200K Salary 0 CS511 Advanced Database Management Systems Age 100 15 Quad Trees cont In terms of space partitioning How is it different from a grid file How is it different from a kd tree CS511 Advanced Database Management Systems 16 R Trees Region Trees B tree related balanced tree by dynamic restructuring guaranteed utilization of nodes between half and completely full Support regions Coverage of siblings may overlap CS511 Advanced Database Management Systems 17 R Trees Structure like B tree but keys are minimum bounding rectangle MBR regions does B tree use 1 D MBR e g keys 57 81 95 Interior node contains sets of regions Regions can overlap Regions do not necessarily fully cover the entire parent region Region can be a point or a shape Utilization of nodes unless root max capacity M How to determine M min capacity m M 2 Why CS511 Advanced Database Management Systems 18 R Trees Region Trees How is it different from others CS511 Advanced Database Management Systems 19 R Tree Where Am I Queries Given point P return data regions containing P Start with root Find subregions S containing P if S is a data region return S else recursively search S CS511 Advanced Database Management Systems 20 R Tree Split Node Objectives minimize covered area of the containing region Why Is this the only objective What else CS511 Advanced Database Management Systems 21 R Tree Split Node Minimal covered area split is exponential need to consider all binary partitioning M 1 entries 2M 1 approximately Use heuristics not exhaustive quadratic algorithm linear algorithm CS511 Advanced Database Management Systems 22 R Tree Quadratic Split PickSeeds select two seeds for two groups heuristics pair that wastes the largest area waste area pair area I area J If can be further partitioned enough entries assignable PickNext heuristics greedily select the next with max preference d1 d2 area increase if entry E added to group 1 2 select E with max d1 d2 Add to the group with least enlargement Quadratic in M PickSeeds consider every pair What about PickNext CS511 Advanced Database Management Systems 23 R Tree Linear Split PickSeeds select two seeds for two groups heuristics pair that is the most separated in any dimension separation distance of the near sides normalized by the length of the dimension If


View Full Document

U of I CS 511 - Indexing: R-Trees

Download Indexing: R-Trees
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Indexing: R-Trees and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Indexing: R-Trees and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?