CMSC 424 – Database designLecture 14B+-treesHashingMihai PopAdministrative•Project questions?•HW2 answers•HW3 postponed till after midterm2Indexing...recap•Index – helps find/process records fast (surrogate for sorting the file)•Dense/sparse index•Multi-level indexing (inner/outer index)•Clustering/non-clustering index•Primary/secondary index•Key elements:–speed of access–space overhead–speed/ease of insertion/deletion–access type (find exact, find range, etc.)•Many insertion/deletions may lead to inefficient structure – indices may need to be rebuilt to improve performanceB+-trees•A variant of multi-level indexing•Extension of binary search tree concept•Optimize I/O efficiency – node size = disk block size•Balanced tree structure – all leaves are equidistant from rootExample B+ Tree• Search begins at root, and key comparisons direct it to a leaf.•Search for 5*, 15*, all data entries >= 24* ...☛ Based on the search for 15*, we know it is not in the tree!Root1724302*3* 5*7* 14* 16*19* 20*22* 24* 27*29* 33* 34*38*39*13B+ Tree - Properties•Balanced• Every node except root must be at least ½ full.•Order: the minimum number of keys/pointers in a non-leaf node• Fanout of a node: the number of pointers out of the nodeB+ Trees in Practice• Typical order: 100. Typical fill-factor: 67%.– average fanout = 133• Typical capacities:– Height 3: 1333 = 2,352,637 entries–Height 4: 1334 = 312,900,700 entries•Can often hold top levels in buffer pool:–Level 1 = 1 page = 8 Kbytes– Level 2 = 133 pages = 1 Mbyte– Level 3 = 17,689 pages = 133 MBytesB+ Trees: Summary• Searching:–logd(n) – Where d is the order, and n is the number of entries• Insertion:– Find the leaf to insert into– If full, split the node, and adjust index accordingly–Similar cost as searching• Deletion– Find the leaf node– Delete– May not remain half-full; must adjust the index accordinglyInsert 23*Root1724302*3* 5*7* 14* 16*19* 20*22* 24* 27*29* 33* 34*38*39*13Root1724302*3* 5*7* 14* 16*19* 20*22* 24* 27*29* 33* 34*38*39*1323*No splitting required.Insert 8*Root1724302*3* 5*7* 14* 16*19* 20*22* 24* 27*29* 33* 34*38*39*132* 3*Root17243014* 16*19* 20* 22* 24* 27*29* 33* 34*38*39*1357*5* 8*Example B+ Tree - Inserting 8*❖ Notice that root was split, leading to increase in height.❖ In this example, we can avoid split by re-distributing entries; however, this is usually not done in practice.Root1724302*3* 5*7* 14* 16*19* 20*22* 24* 27*29* 33* 34*38*39*132* 3*Root17243014* 16*19* 20* 22* 24* 27*29* 33* 34*38*39*1357*5* 8*Data vs. Index Page Split (from previous example of inserting “8”)• Observe how minimum occupancy is guaranteed in both leaf and index pg splits.•Note difference between copy-up and push-up; be sure you understand the reasons for this.2*3* 5*7*5Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)s copied up and2*3*5*7*8*…Data Page Split8*524 3013appears once in the index. Contrast17Entry to be inserted in parent node.(Note that 17 is pushed up and onlythis with a leaf split.)17243013Index Page Split5Root1724302*3* 5*7* 14* 16*19* 20*22* 24* 27*29* 33* 34*38*39*132* 3*Root17243014* 16*19* 20* 22* 24* 27*29* 33* 34*38*39*1357*5* 8*Delete 19*2* 3*Root17243014* 16*20* 22* 24* 27*29* 33* 34*38*39*1357*5* 8*Delete 20* ...2* 3*Root17243014* 16*20* 22* 24* 27*29* 33* 34*38*39*1357*5* 8*2* 3*Root17273014* 16*22* 24* 27* 29*33* 34*38*39*1357*5* 8*Delete 19* and 20* ...• Deleting 19* is easy.•Deleting 20* is done with re-distribution. Notice how middle key is copied up.•Further deleting 24* results in more drastic changesDelete 24* ...2* 3*Root17273014* 16*22* 24* 27* 29*33* 34*38*39*1357*5* 8*2* 3*Root17273014* 16*22* 27* 29*33* 34*38*39*1357*5* 8*No redistribution fromneighbors possibleDeleting 24*• Must merge.•Observe `toss’ of index entry (on right), and `pull down’ of index entry (below).3022* 27*29* 33* 34*38*39*2*3*7*14* 16*22*27*29*33* 34*38*39*5* 8*Root3013517Example of Non-leaf Re-distribution• Tree is shown below during deletion of 24*. (What could be a possible initial tree?)• In contrast to previous example, can re-distribute entry from left child of root to right child. Root13517 20223014* 16*17* 18*20* 33* 34*38*39*22* 27* 29*21*7*5* 8*3*2*After Re-distribution• Intuitively, entries are re-distributed by `pushing through’ the splitting entry in the parent node.• It suffices to re-distribute index entry with key 20; we’ve re-distributed 17 as well for illustration.14* 16*33* 34*38*39*22* 27* 29*17* 18*20* 21*7*5* 8*2*
View Full Document