CMSC 424 Database design Lecture 14 B trees Hashing Mihai Pop Administrative Project questions HW2 answers HW3 postponed till after midterm2 Indexing recap Index helps find process records fast surrogate for sorting the file Dense sparse index Multi level indexing inner outer index Clustering non clustering index Primary secondary index Key elements speed of access space overhead speed ease of insertion deletion access type find exact find range etc Many insertion deletions may lead to inefficient structure indices may need to be rebuilt to improve performance B trees A variant of multi level indexing Extension of binary search tree concept Optimize I O efficiency node size disk block size Balanced tree structure all leaves are equidistant from root Example B Tree Search begins at root and key comparisons direct it to a leaf Search for 5 15 all data entries 24 Root 13 2 3 5 7 14 16 17 24 19 20 22 30 24 27 29 33 34 38 39 Based on the search for 15 we know it is not in the tree B Tree Properties Balanced Every node except root must be at least full Order the minimum number of keys pointers in a nonleaf node Fanout of a node the number of pointers out of the node B Trees in Practice Typical order 100 Typical fill factor 67 average fanout 133 Typical capacities Height 3 1333 2 352 637 entries Height 4 1334 312 900 700 entries Can often hold top levels in buffer pool 1 page 8 Kbytes Level 1 Level 2 133 pages 1 Mbyte Level 3 17 689 pages 133 MBytes B Trees Summary Searching logd n Where d is the order and n is the number of entries Insertion Find the leaf to insert into If full split the node and adjust index accordingly Similar cost as searching Deletion Find the leaf node Delete May not remain half full must adjust the index accordingly Insert 23 Root 13 2 3 5 7 17 24 30 19 20 22 14 16 24 27 29 33 34 38 39 No splitting required Root 13 2 3 5 7 14 16 17 30 24 19 20 22 23 24 27 29 33 34 38 39 Insert 8 Root 13 3 2 5 7 17 24 30 19 20 22 14 16 24 27 29 33 34 38 39 Root 17 5 2 3 24 13 5 7 8 14 16 19 20 22 30 24 27 29 33 34 38 39 Example B Tree Inserting 8 Root Root 13 5 2 2 3 3 5 7 17 17 24 30 24 13 5 14 16 7 8 14 16 24 19 20 22 19 20 22 30 27 29 24 27 29 33 34 38 39 33 34 38 39 Notice that root was split leading to increase in height In this example we can avoid split by re distributing entries however this is usually not done in practice Data vs Index Page Split from previous example of inserting 8 Data Page Split Observe how minimum occupancy is guaranteed in both leaf and index pg splits Note difference between copy up and push up be sure you understand the reasons for this 2 3 5 7 5 8 13 17 24 30 Entry to be inserted in parent node Note that 17 is pushed up and only appears once in the index Contrast this with a leaf split 17 13 7 8 Index Page Split 5 5 Entry to be inserted in parent node Note that 5 iss copied up and continues to appear in the leaf 5 2 3 24 30 Delete 19 Root 17 Root 5 24 13 13 2 2 3 3 5 5 7 7 8 17 24 19 20 22 14 16 30 19 20 22 14 16 30 24 24 27 29 27 29 33 34 38 39 33 34 38 39 Root 17 5 2 3 24 13 5 7 8 14 16 20 22 30 24 27 29 33 34 38 39 Delete 20 Root 17 5 2 3 24 13 5 7 8 20 22 14 16 30 24 27 29 33 34 38 39 Root 17 5 2 3 27 13 5 7 8 14 16 22 24 30 27 29 33 34 38 39 Delete 19 and 20 Deleting 19 is easy Deleting 20 is done with re distribution Notice how middle key is copied up Further deleting 24 results in more drastic changes Delete 24 Root 17 5 2 3 27 13 5 7 8 22 24 14 16 30 27 29 No redistribution from neighbors possible Root 17 5 2 3 27 13 5 7 8 14 16 33 34 38 39 22 30 27 29 33 34 38 39 Deleting 24 Must merge Observe toss of index entry on right and pull down of index entry below 30 22 27 29 33 34 38 39 Root 5 2 3 5 7 8 13 14 16 17 30 22 27 29 33 34 38 39 Example of Non leaf Re distribution Tree is shown below during deletion of 24 What could be a possible initial tree In contrast to previous example can re distribute entry from left child of root to right child Root 22 5 2 3 5 7 8 13 14 16 17 30 20 17 18 20 21 22 27 29 33 34 38 39 After Re distribution Intuitively entries are re distributed by pushing through the splitting entry in the parent node It suffices to re distribute index entry with key 20 we ve redistributed 17 as well for illustration Root 17 5 2 3 5 7 8 13 14 16 20 17 18 20 21 22 30 22 27 29 33 34 38 39
View Full Document
Unlocking...