DOC PREVIEW
UW CSE 332 - More B-Trees

This preview shows page 1-2-24-25 out of 25 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CSE332: Data AbstractionsLecture 10: More B-TreesTyler RobisonSummer 20101B-Tree Review: Another dictionary2 Overall idea: Large data sets won’t fit entirely in memory Disk access is slow Set up tree so we do one disk access per node in tree Then our goal is to keep tree shallow as possible Balanced binary tree is a good start, but we can do better than log2n height In an M-ary tree, height drops to logMn Why not set M really really high? Height 1 tree… Instead, set M so that each node fits in a disk blockB-Tree Review3 M-ary tree with room for L data items at each leaf All data kept at leaves Order property:Subtree between keys x and ycontains only data that is  xand < y (notice the ) Balance property:All nodes and leaves at least half full, and all leaves at same height find and insert efficient insert uses splitting to handle overflow, which may require splitting parent, and so on recursively31214151615183032 40323638184045M=3, L=3Horizontal: Internal, Vertical: LeafThere are different variants of B-Trees you can use (adoption, etc.)Insert(16)31415183018 323236M = 3 L = 331415183018 32323616183018 323236314151615415 3218What now?Split the internal node (in this case, the root)B-Tree Insertion ExampleB-Tree Insertion Algorithm Overview51. Traverse from the root to the proper leaf. Insert the data in its leaf in sorted order2. If the leaf now has L+1 items, overflow! Split the leaf into two leaves: Attach the new child to the parent3. If an internal node has M+1 children, overflow! Split the node into two nodes Attach the new child to the parentSplitting at a node (step 3) could make the parent overflow too So repeat step 3 up the tree until a node doesn’t overflow If the root overflows, make a new root with two childrenDelete(32)31214151615183032 40323638184045And Now for Deletion…6M = 3 L = 331214151615183040184045363836Easy case: Leaf still has enough data; just removeDelete(15)31214151615183036 403638184045M = 3 L = 3312141616183036 403638184045Underflow in the leaf7312141614183036 403638184045M = 3 L = 3312141616183036 4036381840458Adoption: grab a data item from neighboring leafDelete(16)312141614183036 403638184045M = 3 L = 314183036 40363818404531214Uh-oh, neighbors at their minimum!9M = 3 L = 314183036 4036381840453121431214183036 403638184045Merge the two nodes together. This causes underflow in the parent1031214183036 403638184045M = 3 L = 33121418183040363836404511Now grab a leaf node from parent’s neighborDelete(14)31214181830403638364045312181830403638364045M = 3 L = 312Easy case againDelete(18)312181830403638364045M = 3 L = 3312183040363836404513Leaf underflow; no neighbors with enough to steal from…31230403638364045M = 3 L = 3312183040363836404514Merge leaves…3123040363836404536 4031230336384045M = 3 L = 315Can’t steal leaf from parent’s neighbor; too few leaves. Instead merge parent w/ parent’s neighbor36 403123036384045M = 3 L = 336 403123033638404516Which causes an underflow in root; replace rootDeletion Algorithm171. Remove the data from its leaf2. If the leaf now has L/2 - 1, underflow! If a neighbor has > L/2 items, adopt and update parent Else merge node with neighbor Guaranteed to have a legal number of items Parent now has one less node3. If step (2) caused the parent to have M/2 - 1children, underflow! …Deletion algorithm continued183. If an internal node has M/2 - 1 children If a neighbor has > M/2 items, adopt and update parent Else merge node with neighbor Guaranteed to have a legal number of items Parent now has one less node, may need to continue up the treeIf we merge all the way up through the root, that’s fine unless the root went from 2 children to 1 In that case, delete the root and make child the root This is the only case that decreases tree heightEfficiency of delete19 Find correct leaf: Remove from leaf: Adopt/merge from/with neighbor leaf: Adopt or merge all the way up to root:Worst-case Delete: O(L + M logMn)But it’s not that bad: Merges are not that common Remember disk accesses were the name of the game:O(logMn)O(log2M logMn)O(L)O(L)O(M logMn)Insert vs delete comparison20Insert Find correct leaf: Insert in leaf: Split leaf: Split parents all the way up to root:Delete Find correct leaf: Remove from leaf: Adopt/merge from/with neighbor leaf: Adopt or merge all the way up to root:O(log2M logMn)O(L)O(L)O(M logMn)O(log2M logMn)O(L)O(L)O(M logMn)Aside: Limitations of B-Trees in Java21For most of our data structures, we have encouraged writing high-level, reusable code, such as in Java with genericsIt is worth knowing enough about “how Java works” to understand why this is probably a bad idea for B-Trees Assuming our goal is efficient number of disk accesses Java has many advantages, but it wasn’t designed for this If you just want a balanced tree with worst-case logarithmic operations, no problemThe problem is extra levels of indirection…One approach22Even if we assume data items have int keys, you cannot get the data representation you want for “really big data” interface Keyed<E> {int key(E);}class BTreeNode<E implements Keyed<E>> {static final int M = 128;int[] keys = new int[M-1];BTreeNode<E>[] children = new BTreeNode[M];int numChildren = 0;…}class BTreeLeaf<E> {static final int L = 32;E[] data = (E[])new Object[L];int numItems = 0;…}What that looks like23BTreeNode (3 objects with “header words”)M-1122045M70BTreeLeaf (data objects not in contiguous memory)20… (larger array)… (larger array)L… (larger array)All the red references indicate unnecessary indirectionThe moral24 The whole idea behind B trees was to keep related data in contiguous memory But that’s “the best you can do” in Java Again, the advantage is generic, reusable code But for your performance-critical web-index, not the way to implement your B-Tree for terabytes of data C# may have better support for “flattening objects into arrays” C and C++ definitely do Levels of indirection matter!Conclusion: Balanced Trees25 Balanced trees make good dictionaries because they guarantee logarithmic-time find, insert, and delete Essential and beautiful computer science But only if you can maintain balance within the time bound AVL trees maintain balance by tracking height and allowing all children to differ


View Full Document

UW CSE 332 - More B-Trees

Download More B-Trees
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view More B-Trees and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view More B-Trees 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?