Contents
- Slide 1 (title)
- BTrees
- Our goal
- M-ary Search Tree
- Problems with how to proceed
- B+ Trees (we and the book say "B Trees")
- Find
- Structure Properties
- Example
- Balanced enough
- Disk Friendliness
- Maintaining balance
- Building a B-Tree (insertions)
- Slides 14-17 (untitled)
- Insertion Algorithm
- Insertion algorithm continued
- Efficiency of insert
- Another option (we won't use)
- Adoption example
- Same example, with splitting

CSE332: Data Abstractions
Lecture 9: B-Trees
Tyler Robison, Summer 2010

Slide 2: BTrees
(Diagram: an example B-tree; the same tree, with root keys 12 and 44, reappears on the "Example" slide.)

Slide 3: Our goal
- Problem: What if our dictionary has so much data that most of it resides on disk, which is very slow to access?
- Say we had to do a disk access for each node access:
  - Unbalanced BST with n nodes: n disk accesses (worst case)
  - AVL tree: log2 n accesses (worst case)
  - log2(2^30) = 30 disk accesses is still a bit slow; an improvement, but we can do better
- Idea: a balanced tree (logarithmic height) that is even shallower than an AVL tree, so that we can minimize disk accesses and exploit the disk-block size
  - Increase the branching factor to decrease the height
  - Gives us height log3 n, log10 n, log50 n, etc., based on the branching factor
  - Asymptotically still O(log n) height, though

Slide 4: M-ary Search Tree
- Build some kind of search tree with branching factor M:
  - Say, each node has an array of M sorted children (Node[])
  - Choose M to fit a node snugly into a disk block (1 access per node)
- Number of hops for find: if balanced, log_M n instead of log2 n
  - If M = 256, that's an 8x improvement
  - Example: with M = 256 and n = 2^40, that's 5 hops instead of 40
- To decide which branch to take, divide the keys into portions
  - Binary tree: less than the node value, or greater?
  - M-ary: in range 1? In range 2? In range 3? ... In range M?
- Runtime of find if balanced: O(log2 M * log_M n)
  - Hmmm... log_M n is the height we traverse. Why the log2 M multiplier?
  - log2 M: at each step, find the correct child branch to take using binary search

Slide 5: Problems with how to proceed
- What should the order property be?
- How do we decide the 'portion' each child will take?
- How would you rebalance (ideally without more disk accesses)?
- Storing real data at inner nodes (as in a BST) seems kind of wasteful: to access a node, we have to load its data from disk, even though most of the time we won't use it

Slide 6: B+ Trees (we and the book say "B Trees")
- Two types of nodes: internal nodes and leaves
- Each internal node has room for up to M-1 keys and M children (in the example on the right, M = 7)
- No other data; all data is at the leaves!
- Think of the M-1 keys stored in internal nodes as 'signposts' used to find a path to a leaf
- Order property: the subtree between keys x and y contains only data that is ≥ x and < y (notice the ≥)
- Leaf nodes (not shown here) have up to L sorted data items
- Remember: leaves store data; internal nodes are 'signposts'
(Diagram: an internal node with keys 3, 7, 12, 21; its children cover x < 3, 3 ≤ x < 7, 7 ≤ x < 12, 12 ≤ x < 21, and 21 ≤ x. Empty cells at the end are currently unused, but may get filled in later.)
- What's the 'B' for? Wikipedia quotes a 1979 textbook: "The origin of "B-tree" has never been explained by the authors. As we shall see, "balanced," "broad," or "bushy" might apply. Others suggest that the "B" stands for Boeing.
Because of his contributions, however, it seems appropriate to think of B-trees as "Bayer"-trees."

Slide 7: Find
- Different from a BST in that we don't store values at internal nodes
- But find is still an easy recursive algorithm, starting at the root:
  - At each internal node, do a binary search on the (up to) M-1 keys to find the branch to take
  - At the leaf, do a binary search on the (up to) L data items
- But to get logarithmic running time, we need a balance condition...
(Diagram: the same internal node with keys 3, 7, 12, 21, whose children cover x < 3, 3 ≤ x < 7, 7 ≤ x < 12, 12 ≤ x < 21, and 21 ≤ x.)

Slide 8: Structure Properties
(Remember: M = max number of children; L = max items in a leaf.)
- Non-root internal nodes: have between ⌈M/2⌉ and M children (inclusive), i.e., are at least half full
- Leaf nodes: all leaves are at the same depth; each has between ⌈L/2⌉ and L data items (inclusive), i.e., is at least half full
- Root (special case):
  - If the tree has ≤ L items, the root is a leaf (occurs when starting up; otherwise unusual) and can have any number of items ≤ L
  - Else it has between 2 and M children
- (Any M > 2 and L will work; they are picked based on the disk-block size)
- Uh, why these bounds for internal nodes and children?
  - The upper bounds make sense: we don't want one long list, and we pick them to fit in a block
  - Why the lower bounds? They ensure the tree is sufficiently 'filled in' (no unnecessary height, for instance) and guarantee 'broadness'
  - We'll see more when we cover insert and delete

Slide 9: Example
- Suppose M = 4 (max children) and L = 5 (max items at a leaf)
- All internal nodes have at least 2 children
- All leaves have at least 3 data items (only keys are shown)
- All leaves are at the same depth
(Diagram: an example B-tree with M = 4, L = 5, and root keys 12 and 44. Note: one internal node shows only 1 key, but it has 2 children, so it's okay.)
- Note on notation: inner nodes are drawn horizontally and leaves vertically, to distinguish them.
- Empty cells are included in the drawings.

Slide 10: Balanced enough
- It is not too difficult to show that the height h is logarithmic in the number of data items n
- Let M > 2 (if M = 2, then a list tree is legal; no good!)
- Because all nodes are at least half full (except the root, which may have only 2 children) and all leaves are at the same level, the minimum number of data items n for a tree of height h > 0 is:
  n ≥ 2 * ⌈M/2⌉^(h-1) * ⌈L/2⌉
  (2 * ⌈M/2⌉^(h-1) is the minimum number of leaves; ⌈L/2⌉ is the minimum data per leaf)
- This is exponential in the height because ⌈M/2⌉ > 1, so h is O(log n)

Slide 11: Disk Friendliness
- What makes B-trees so disk friendly?
- Many keys are stored in one node:
  - All are brought into memory in one disk access, IF we pick M wisely
  - This makes the binary search over the M-1 keys totally worth it (insignificant compared to disk-access times)
- Internal nodes contain only keys:
  - Any find wants only one data item; it would be wasteful to load unnecessary items along with internal nodes
  - So we bring only one leaf of data items into memory
  - Data-item size doesn't affect what M is

Slide 12: Maintaining balance
- So this seems like a great data structure (and it is)
- But we haven't implemented the other dictionary operations yet: insert and delete
- As with AVL trees, the hard part is maintaining the structure properties
  - Example: for insert, there might not be room at the correct leaf
- Unlike AVL trees, there are no rotations

Slide 13: Building a B-Tree (insertions)
- Start with the empty B-tree (the root will be a leaf)
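The minimum-size bound on the "Balanced enough" slide can be checked numerically. This is a small sketch; the function name `min_items` is an illustrative choice, not from the slides.

```python
import math

def min_items(M, L, h):
    """Minimum number of data items in a B-tree of height h > 0:
    the root has at least 2 children, every other internal node has
    at least ceil(M/2) children, so there are at least
    2 * ceil(M/2)**(h-1) leaves, each holding at least ceil(L/2) items."""
    return 2 * math.ceil(M / 2) ** (h - 1) * math.ceil(L / 2)

# With the slides' example parameters M = 4, L = 5, a height-3 tree
# holds at least 2 * 2**2 * 3 = 24 items.
print(min_items(4, 5, 3))  # prints 24
```

Because the bound grows exponentially in h, the height of a tree with n items is O(log n), as the slide states.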
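The find algorithm from the "Find" slide (binary search over the keys at each internal node, then binary search within a leaf) can be sketched as below. The class and field names (`Internal`, `Leaf`, `keys`, `children`, `items`) are illustrative choices, not from the slides.

```python
import bisect

class Leaf:
    """Leaf node: up to L sorted data items."""
    def __init__(self, items):
        self.items = items

class Internal:
    """Internal node: up to M-1 sorted signpost keys and up to M children.
    Child i covers keys x with keys[i-1] <= x < keys[i] (the order property)."""
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def find(node, x):
    # Descend: bisect_right counts keys <= x, which is exactly the index
    # of the child whose range contains x under the >=/< order property.
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, x)]
    # At the leaf: binary search the (up to L) sorted data items.
    i = bisect.bisect_left(node.items, x)
    return i < len(node.items) and node.items[i] == x

# The signpost node from the slides: keys 3, 7, 12, 21 split the data into
# x < 3, 3 <= x < 7, 7 <= x < 12, 12 <= x < 21, and 21 <= x.
root = Internal([3, 7, 12, 21],
                [Leaf([1, 2]), Leaf([3, 5]), Leaf([7, 9]),
                 Leaf([12, 14]), Leaf([21, 30])])
print(find(root, 9), find(root, 4))  # prints: True False
```

With a balanced structure this does O(log2 M) work at each of O(log_M n) levels, matching the O(log2 M * log_M n) find cost given on the "M-ary Search Tree" slide.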