CORNELL CS 432 - Hash-Based Indexes


(Slide text from Database Management Systems, R. Ramakrishnan and J. Gehrke.)

Introduction
- As for any index, there are 3 alternatives for data entries k*:
  - the data record itself, with key value k
  - <k, rid of data record with search key value k>
  - <k, list of rids of data records with search key k>
- Hash-based indexes are best for equality selections.
  - They provide constant-time searches,
  - but they cannot support range searches.
- Static and dynamic hashing techniques exist; the trade-offs are similar to ISAM vs. B+ trees.

Static Hashing
- The number of primary pages is fixed; they are allocated sequentially and never de-allocated. Overflow pages are used if needed.
- h(k) mod N = the bucket to which the data entry with key k belongs (N = # of buckets).
- (Figure: the key is hashed with h and taken mod N to select one of the primary bucket pages 0 .. N-1; each primary page may have a chain of overflow pages.)

Static Hashing (Contd.)
- Buckets contain data entries.
- The hash function works on the search key field of record r and must distribute values over the range 0 ... N-1.
  - h(key) = (a * key + b) usually works well.
  - a and b are constants; a lot is known about how to tune h.
- Long overflow chains can develop and degrade performance.
  - Extendible and Linear Hashing are dynamic techniques that fix this problem.

Extendible Hashing
- Main idea: if a bucket (primary page) becomes full, why not re-organize the file by doubling the number of buckets?
  - That essentially "splits" buckets.
- But reading and writing all buckets is expensive!
  - Idea: use a directory of pointers to buckets.
  - Double the number of buckets by doubling the directory, splitting just the bucket that overflowed.
  - The directory is much smaller than the file, so doubling it is much cheaper.
  - No overflow pages!

Insert h(r) = 14
(Figure: a directory with global depth 2 points to Buckets A-D, each with local depth 2: A = 32* 16* 4* 12*, B = 1* 5* 21* 13*, C = 10*, D = 15* 7* 19*. Since 14 ends in bits 10, 14* goes into Bucket C, which has room, so no split is needed.)

Insert h(r) = 20
(Figure: 20 ends in bits 00, so 20* belongs to Bucket A, which is full. Bucket A is split into A and its `split image' A2, and because A's local depth equalled the global depth, the directory is doubled to 3 bits. After the split, A holds 32* 16* (low bits 000) and A2 holds 4* 12* 20* (low bits 100).)

Insert h(r) = 32
(Figure: a smaller example starting from global depth 0 with a single Bucket A = 1* 10* 4* 12*. Inserting 32* overflows the bucket, so it is split on one bit: Bucket A (bit 0) = 10* 32* 4* 12* and Bucket B (bit 1) = 1*, with global and local depth 1.)

Insert h(r) = 16
(Figure: 16 ends in bit 0, so it belongs to Bucket A, which is full. A is split on two bits and the directory doubles to global depth 2: Bucket A (00) = 32* 16* 4* 12*, new Bucket C (10) = 10*, while Bucket B (odd keys) keeps local depth 1 with 1*.)
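The splitting logic that these insert examples step through (and that the examples below continue) can be written down compactly. The following is a minimal in-memory sketch in Python, not code from the slides: the class names, the bucket capacity of 4 entries, and the use of a key's low-order bits as its hash value are assumptions chosen to match the figures.

# Minimal in-memory sketch of extendible hashing insertion (illustration only).
# Assumptions: bucket capacity of 4 data entries (as in the figures) and a key's
# low-order bits used directly as its hash value; names are made up for this sketch.

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []                      # data entries k* (just the keys here)

class ExtendibleHashIndex:
    CAPACITY = 4                               # entries per bucket (primary page)

    def __init__(self):
        self.global_depth = 0
        self.directory = [Bucket(local_depth=0)]

    def _slot(self, key):
        # Directory slot = low-order global_depth bits of the hash value.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.entries) < self.CAPACITY:
            bucket.entries.append(key)
            return
        # Bucket is full: split it. If its local depth equals the global depth,
        # the directory must be doubled first (new slots copy the old pointers).
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Create the split image and redistribute entries on one more bit.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        bit = 1 << (bucket.local_depth - 1)    # the bit that now tells the two apart
        old, bucket.entries = bucket.entries, []
        for k in old:
            (image if k & bit else bucket).entries.append(k)
        # Re-point the directory slots that should now refer to the split image.
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & bit:
                self.directory[slot] = image
        # Retry; many entries with the same hash value can force repeated splits,
        # which is the skewed-distribution problem noted later in the slides.
        self.insert(key)

    def search_equal(self, key):
        # Equality search: one bucket lookup via the directory.
        return [k for k in self.directory[self._slot(key)].entries if k == key]

Driving this sketch with the keys of the small example (1, 10, 4, 12, then 32, 16, 20, 5, 15, 7, 19) reproduces the directory doublings and bucket contents shown in the figures.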
Insert h(r) = 20 (continuing the small example)
(Figure: 20 ends in bits 00, so it belongs to Bucket A, which is full again. A is split on three bits and the directory doubles to global depth 3: Bucket A (000) = 32* 16*, split image A2 (100) = 4* 12* 20*; Buckets B and C are unchanged.)

Insert h(r) = 5, 15, 7, 19
(Figure: all four keys are odd, so they go to Bucket B, which has local depth 1. After 5*, 15*, and 7* the bucket is full, so inserting 19* splits it on two bits: Bucket B (01) = 1* 5*, split image B2 (11) = 15* 7* 19*. B and B2 now have local depth 2; since 2 < global depth 3, the directory does not grow.)

Deletions
- Deletion is the inverse of insertion.
- If removing a data entry makes a bucket empty, the bucket can be merged with its `split image'.
- If each directory element points to the same bucket as its split image, the directory can be halved.

Comments on Extendible Hashing
- If the directory fits in memory, an equality search is answered with one disk access; otherwise two.
  - Example: a 100MB file with 100-byte records and 4K pages contains 1,000,000 records (as data entries); at roughly 40 entries per page that is about 25,000 buckets, and hence 25,000 directory elements. Chances are high that the directory will fit in memory.
- The directory grows in spurts, and, if the distribution of hash values is skewed, the directory can grow large.
  - Multiple entries with the same hash value cause problems!
  - When would this happen?

Linear Hashing
- Another dynamic hashing scheme; an alternative to Extendible Hashing.
- LH handles the problem of long overflow chains without using a directory, and it handles duplicates.
- Main idea: split one bucket at a time, in rounds.

Inserting h(r) = 43
(Figure: Level = 2, N = 4, Next = 0. The file has four primary buckets addressed by h2, the low 2 bits of the key: 00 = 32* 44* 36*, 01 = 9* 25* 5*, 10 = 14* 18* 10* 30*, 11 = 31* 35* 11* 7*. The h3 values, the low 3 bits, are shown for illustration only. 43 ends in bits 11, so it belongs to bucket 11, which is full.)

Example (Inserting h(r) = 43, contd.)
(Figure: 43* is placed on an overflow page of bucket 11, and the bucket that Next points to, bucket 00, is split using h3: 32* stays in bucket 000 while 44* and 36* move to the new bucket 100. Next advances to 1. Note that the bucket that is split is not the bucket that overflowed.)

Inserting h(r) = 50 (End of a Round)
(Figure: by this point Next = 3, the last bucket of the round. Inserting 50* causes an overflow, which triggers the split of bucket 11 into buckets 011 and 111. Splitting the last of the original buckets ends the round: Level increases from 2 to 3 and Next is reset to 0.)

Overview of LH File
- In the middle of a round:
(Figure: h_Level addresses the buckets that existed at the beginning of the round. Next marks the bucket to be split; buckets below Next have already been split in this round, and their `split images', created through splitting of other buckets, sit at the end of the file. If h_Level(search key value) falls below Next, h_Level+1(search key value) must be used to decide between the original bucket and its split image.)
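The Linear Hashing behaviour illustrated above (a Next pointer, splitting the bucket Next points to when an overflow occurs, and switching from h_Level to h_Level+1 for already-split buckets) can likewise be sketched in Python. Again this is an illustrative sketch, not the slides' code: the names are made up, a bucket's primary and overflow entries are kept in a single list, and "split whenever an insert needs an overflow page" is one common split policy, assumed here because it is consistent with the example.

# Minimal in-memory sketch of Linear Hashing (illustration only). Assumptions:
# names are made up, a bucket's primary and overflow entries live in one list,
# capacity is 4 entries per primary page, and a split is triggered whenever an
# insert needs an overflow page (one common policy, consistent with the example).

class LinearHashFile:
    CAPACITY = 4                                   # entries per primary page

    def __init__(self, level=2):
        # Level = 2 gives 2**2 = 4 initial buckets, as in the example above.
        self.level = level
        self.next = 0                              # next bucket to be split in this round
        self.buckets = [[] for _ in range(2 ** level)]

    def _h(self, level, key):
        # h_level(key): the low-order `level` bits of the key (identity hash for clarity).
        return key % (2 ** level)

    def _bucket_of(self, key):
        b = self._h(self.level, key)
        if b < self.next:                          # this bucket was already split this round:
            b = self._h(self.level + 1, key)       # h_{level+1} chooses original vs. split image
        return b

    def insert(self, key):
        b = self._bucket_of(key)
        self.buckets[b].append(key)
        if len(self.buckets[b]) > self.CAPACITY:   # an overflow page was needed ...
            self._split_next()                     # ... so split the bucket Next points to

    def _split_next(self):
        # The bucket that is split is the one Next points to, not necessarily the one
        # that overflowed; its entries are redistributed between it and its split image.
        assert len(self.buckets) == 2 ** self.level + self.next
        self.buckets.append([])                    # split image lives at next + 2**level
        old, self.buckets[self.next] = self.buckets[self.next], []
        for k in old:
            self.buckets[self._h(self.level + 1, k)].append(k)
        self.next += 1
        if self.next == 2 ** self.level:           # every original bucket split: end of round
            self.level += 1
            self.next = 0

    def search_equal(self, key):
        # Equality search probes exactly one bucket (plus its overflow entries).
        return [k for k in self.buckets[self._bucket_of(key)] if k == key]

Keeping primary and overflow entries in one Python list hides the page structure; a real implementation would chain fixed-size pages, but the split arithmetic is the same.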

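As a quick check, the sketch can be driven with the file contents from the h(r) = 43 example. Exactly when splits fire depends on the capacity and split policy assumed above, so the printed states are illustrative rather than a reproduction of every figure.

# Hypothetical driver for the LinearHashFile sketch, loaded with the example's keys.
lh = LinearHashFile(level=2)
for k in [32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 11, 7]:
    lh.insert(k)                          # builds the initial Level=2, Next=0 file
lh.insert(43)                             # bucket 11 overflows; bucket 00 (Next) is split
print(lh.level, lh.next)                  # -> 2 1
print(lh.search_equal(43))                # -> [43]
print(lh.buckets[0], lh.buckets[4])       # -> [32] [44, 36]: bucket 000 and its split image 100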
