FSU CIS 5930r - Advanced Topics in Data Management

Unformatted text preview:

Feifei LiFall 2008(Many slides were made available by Ke Yi)CIS 5930 Advanced Topics in Data ManagementExternal Memory Data Structures• Names:– I/O-efficient data structures– Disk-based data structures (index structures) used in DB– Disk-resilient data structures (index structures) used in DB– Secondary indexes used in DB• Other Data structures– Queue, stack* O(N/B) space, O(1/B) push, O(1/B) pop– Priority queue* O(N/B) space, O(1/B ∙ logM/BN/B) insert, delete-maxMainly used in algorithmsExternal Memory Data Structures• General-purpose data structures– Space: linear or near-linear (very important)– Query: logarithmic in B or 2 for any query (very important)– Update: logarithmic in B or 2 (important)• In some sense, more useful than I/O-algorithms– Structure stored in disk most of the time– DB typically maintains many data structures for many different data sets: can’t load all of them to memory– Nearly all index structures in large DB are disk based– If nodes stored arbitrarily on disk Search in I/Os Rangesearch in I/Os• Binary search tree:– Standard method for search among N elements– We assume elements in leaves – Search traces at least one root-leaf pathExternal Search Trees)(log2NO)(log2N)(log2TNO External Search Trees• Bottom-up BFS blocking:– Block height– Output elements blocked Range query in I/Os• Optimal: O(N/B) space and query)(log2B)(B)(log)(log/)(log22NOBONOB)(logBTBN )(logBTBN • Maintaining BFS blocking during updates?– Balance normally maintained in search trees using rotations• Seems very difficult to maintain BFS blocking during rotation– Also need to make sure output (leaves) is blocked!External Search TreesxyxyB-trees• BFS-blocking naturally corresponds to tree with fan-out• B-trees balanced by allowing node degree to vary– Rebalancing performed by splitting and merging nodes)(B• (a,b)-tree uses linear space and has heightChoosing a,b = each node/leaf stored in one disk block(N/B) space and query(a,b)-tree• T is an (a,b)-tree (a≥2 and b≥2a-1)– All leaves on the same level (contain between a and b elements)– Except for the root, all nodes have degree between a and b– Root has degree between 2 and b)(log NOa)(logBTBN )(B(2,4)-tree(a,b)-Tree Insert• Insert:Search and insert element in leaf vDO v has b+1 elements/childrenSplit v:make nodes v’ and v’’ with and elementsinsert element (ref) in parent(v)(make new root if necessary)v=parent(v)• Insert touch nodes bb21 ab21)(log Navv’ v’’ 21b 21b1b(a,b)-Tree Insert(a,b)-Tree Delete• Delete:Search and delete element from leaf vDO v has a-1 elements/childrenFuse v with sibling v’:move children of v’ to vdelete element (ref) from parent(v)(delete root if necessary)If v has >b (and ≤ a+b-1<2b) children split vv=parent(v)• Delete touch nodes )(log NOavv1-a12 - a(a,b)-Tree Delete13External Searching: B-Tree• Each node (except root) has fan-out between B/2 and B• Size: O(N/B) blocks on disk• Search: O(logBN) I/Os following a root-to-leaf path• Insertion and deletion: O(logBN) I/OsSummary/Conclusion: B-tree• B-trees: (a,b)-trees with a,b = – O(N/B) space– O(logBN+T/B) query– O(logBN) update• B-trees with elements in the leaves sometimes called B+-tree– Now B-tree and B+tree are synonyms• Construction in I/Os– Sort elements and construct leaves– Build tree level-by-level bottom-up)(B)log(BNBNBMO2D Range Searchingq3q2q1q4Quadtree• No worst-case bound!• Hard to block!kd-tree• kd-tree:– Recursive subdivision of point-set into two half using vertical/horizontal line– Horizontal line on even levels, vertical on uneven levels– One point in each leafLinear space and logarithmic heightkd-Tree: Query• Query– Recursively visit nodes corresponding to regions intersecting query– Report point in trees/nodes completely contained in query• Query analysis– Horizontal line intersect Q(N) = 2+2Q(N/4) = regions– Query covers T regions I/Os worst-case)( NO)( TNO kdB-tree• kdB-tree:– Bottom-up BFS blocking– Same as B-tree• Query as before– Analysis as before but each region now contains Θ(B) pointsI/O query)(BTBNO Construction of kdB-tree• Simple algorithm– Find median of y-coordinates (construct root)– Distribute point based on median– Recursively build subtrees– Construct BFS-blocking top-down (can compute the height in advance)• Idea in improved algorithm– Construct levels at a time using O(N/B) I/Os)log(2BNBNO)log(BNBNBMOBMlogConstruction of kdB-tree• Sort N points by x- and by y-coordinates using I/Os• Building levels ( nodes) in O(N/B) I/Os:1. Construct by gridwith points in each slab2. Count number of points in eachgrid cell and store in memory3. Find slab s with median x-coordinate4. Scan slab s to find median x-coordinate and construct node5. Split slab containing median x-coordinate and update counts6. Recurse on each side of median x-coordinate using grid (step 3) Grid grows to during algorithm Each node constructed in I/OsBMlog)log(BNBNBMOBMBMBMNBM)(BMBMBMBM))/(( BNOBMkdB-tree• kdB-tree:– Linear space– Query in I/Os – Construction in O(sort(N)) I/Os– Height • Dynamic?– Difficult to do splits/merges or rotations …)(BTBNO )(log


View Full Document

FSU CIS 5930r - Advanced Topics in Data Management

Download Advanced Topics in Data Management
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Advanced Topics in Data Management and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Advanced Topics in Data Management 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?