Mt Holyoke CS 341 - Tree-Structured Indexes

Unformatted text preview:

Tree-Structured IndexesIntroductionRange SearchesISAMComments on ISAMExample ISAM TreeAfter Inserting 23*, 48*, 41*, 42* ...... Then Deleting 42*, 51*, 97*B+ Tree: Most Widely Used IndexExample B+ TreeB+ Trees in PracticeInserting a Data Entry into a B+ TreeInserting 8* into Example B+ TreeExample B+ Tree After Inserting 8*Deleting a Data Entry from a B+ TreeExample Tree After (Inserting 8*, Then) Deleting 19* and 20* ...... And Then Deleting 24*Example of Non-leaf Re-distributionAfter Re-distributionPrefix Key CompressionBulk Loading of a B+ TreeBulk Loading (Contd.)Summary of Bulk LoadingA Note on `Order’SummarySummary (Contd.)Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1Tree-Structured IndexesChapter 10Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2IntroductionAs for any index, 3 alternatives for data entries k*: Data record with key value k <k, rid of data record with search key value k> <k, list of rids of data records with search key k>Choice is orthogonal to the indexing technique used to locate data entries k*.Tree-structured indexing techniques support both range searches and equality searches.ISAM: static structure; B+ tree: dynamic, adjusts gracefully under inserts and deletes.Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3Range Searches``Find all students with gpa > 3.0’’If data is in sorted file, do binary search to find first such student, then scan to find others.Cost of binary search can be quite high.Simple idea: Create an `index’ file. Can do binary search on (smaller) index file!Page 1Page 2Page NPage 3Data Filek2kNk1Index FileDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4ISAMIndex file may still be quite large. But we can apply the idea repeatedly! Leaf pages contain data entries.P0K1P1K2P2KmPmindex entryNon-leafPagesPagesOverflow pagePrimary pagesLeafDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5Comments on ISAMFile creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages.Index entries: <search key value, page id>; they `direct’ search for data entries, which are in leaf pages.Search: Start at root; use key comparisons to go to leaf. Cost log F N ; F = # entries/index pg, N = # leaf pgsInsert: Find leaf data entry belongs to, and put it there.Delete: Find and remove from leaf; if empty overflow page, de-allocate.  Static tree structure: inserts/deletes affect only leaf pages.Data PagesIndex PagesOverflow pagesDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6Example ISAM TreeEach node can hold 2 entries; no need for `next-leaf-page’ pointers. (Why?)10* 15* 20* 27* 33* 37* 40*46*51*55*63*97*20 33 51 6340RootDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7After Inserting 23*, 48*, 41*, 42* ...10* 15* 20* 27* 33* 37* 40*46*51*55*63*97*20 33 51 6340Root23*48*41*42*OverflowPagesLeafIndexPagesPagesPrimaryDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8 ... Then Deleting 42*, 51*, 97* Note that 51* appears in index levels, but not in leaf!10* 15* 20* 27* 33* 37* 40*46* 55*63*20 33 51 6340Root23*48*41*Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9B+ Tree: Most Widely Used IndexInsert/delete at log F N cost; keep tree height-balanced. (F = fanout, N = # leaf pages)Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree.Supports equality and range-searches efficiently.Index EntriesData Entries("Sequence set")(Direct search)Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10Example B+ TreeSearch begins at root, and key comparisons direct it to a leaf (as in ISAM).Search for 5*, 15*, all data entries >= 24* ... Based on the search for 15*, we know it is not in the tree!Root17 24302*3* 5*7* 14* 16*19* 20* 22* 24* 27*29* 33* 34*38*39*13Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11B+ Trees in PracticeTypical order: 100. Typical fill-factor: 67%.average fanout = 133Typical capacities:Height 4: 1334 = 312,900,700 recordsHeight 3: 1333 = 2,352,637 recordsCan often hold top levels in buffer pool:Level 1 = 1 page = 8 KbytesLevel 2 = 133 pages = 1 MbyteLevel 3 = 17,689 pages = 133 MBytesDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12Inserting a Data Entry into a B+ TreeFind correct leaf L. Put data entry onto L.If L has enough space, done!Else, must split L (into L and a new node L2)•Redistribute entries evenly, copy up middle key.•Insert index entry pointing to L2 into parent of L.This can happen recursivelyTo split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)Splits “grow” tree; root split increases height. Tree growth: gets wider or one level taller at top.Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13Inserting 8* into Example B+ TreeObserve how minimum occupancy is guaranteed in both leaf and index pg splits.Note difference between copy-up and push-up; be sure you understand the reasons for this.2*3* 5*7*8*5Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)s copied up andappears once in the index. Contrast5 24 301713Entry to be inserted in parent node.(Note that 17 is pushed up and onlythis with a leaf split.)Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14Example B+ Tree After Inserting 8* Notice that root was split, leading to increase in height. In this example, we can avoid split by re-distributing entries; however, this is usually not done in practice.2* 3*Root17243014* 16*19* 20* 22* 24* 27*29* 33* 34*38*39*1357*5* 8*Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15Deleting a Data Entry from a B+ TreeStart at root, find leaf L where entry belongs.Remove the entry.If L is at least half-full, done! If L has only d-1 entries,•Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).•If re-distribution fails, merge L and sibling.If merge occurred, must delete entry (pointing to L or sibling) from parent of L.Merge could propagate to root, decreasing height.Database Management Systems 3ed, R.


View Full Document

Mt Holyoke CS 341 - Tree-Structured Indexes

Download Tree-Structured Indexes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Tree-Structured Indexes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Tree-Structured Indexes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?