Unformatted text preview:

CS 245: Database System PrinciplesPowerPoint PresentationSlide 3Slide 4Example hash functionSlide 6Within a bucket:Next: example to illustrate inserts, overflows, deletesEXAMPLE 2 records/bucketSlide 10Rule of thumb:How do we cope with growth?Extensible hashing: two ideasSlide 14Example: h(k) is 4 bits; 2 keys/bucketSlide 16Slide 17Extensible hashing: deletionDeletion example:Extensible hashingLinear hashingExample b=4 bits, i =2, 2 keys/bucketSlide 23Example Continued: How to grow beyond this? When do we expand file?Linear HashingExample: BAD CASESlide 28Next:Slide 30Slide 31Index definition in SQLSlide 33Slide 34Strategy I:Strategy II:Strategy III:ExampleFor which queries is this index good?Interesting application:Queries:Slide 43QueriesSlide 45Two more types of multi key indexesGrid IndexCLAIMSlide 49Solution: Use IndirectionWith indirection:Can also index grid on value rangesGrid filesSlide 54EX:Slide 56Slide 57Slide 58Slide 59Reading Chapter 5The BIG picture….CS 245 Notes 5 1CS 245: Database System PrinciplesHector Garcia-MolinaNotes 5: Hashing and MoreCS 245 Notes 5 2key  h(key)Hashing<key>Buckets(typically 1disk block)CS 245 Notes 5 3...Two alternativesrecords...(1) key  h(key)CS 245 Notes 5 4(2) key  h(key)Indexrecordkey 1Two alternatives•Alt (2) for “secondary” search keyCS 245 Notes 5 5Example hash function•Key = ‘x1 x2 … xn’ n byte character string•Have b buckets•h: add x1 + x2 + ….. xn– compute sum modulo bCS 245 Notes 5 6 This may not be best function … Read Knuth Vol. 3 if you reallyneed to select a good function.Good hash  Expected number of function: keys/bucket is thesame for all bucketsCS 245 Notes 5 7Within a bucket:•Do we keep keys sorted?•Yes, if CPU time critical & Inserts/Deletes not too frequentCS 245 Notes 5 8Next: example to illustrateinserts, overflows, deletesh(K)CS 245 Notes 5 9EXAMPLE 2 records/bucketINSERT:h(a) = 1h(b) = 2h(c) = 1h(d) = 00123dacbh(e) = 1eCS 245 Notes 5 100123abcedEXAMPLE: deletionDelete:effgmaybe move“g” upcdCS 245 Notes 5 11Rule of thumb:•Try to keep space utilizationbetween 50% and 80% Utilization = # keys used total # keys that fit•If < 50%, wasting space•If > 80%, overflows significantdepends on how good hashfunction is & on # keys/bucketCS 245 Notes 5 12How do we cope with growth?•Overflows and reorganizations•Dynamic hashing•Extensible•LinearCS 245 Notes 5 13Extensible hashing: two ideas(a) Use i of b bits output by hash function b h(K)  use i  grows over time….00110101CS 245 Notes 5 14(b) Use directoryh(K)[i ] to bucketCS 245 Notes 5 15Example: h(k) is 4 bits; 2 keys/bucketi =111000110011100Insert 1010111001010New directory200011011i =22CS 245 Notes 5 161000121001101021100Insert:01110000000110112i =Example continued011100000111000122CS 245 Notes 5 17000110112i =2100110102110020111200000001Insert:1001Example continued1001100110100000010100111001011101113i =33CS 245 Notes 5 18Extensible hashing: deletion•No merging of blocks•Merge blocks and cut directory if possible(Reverse insert procedure)CS 245 Notes 5 19Deletion example:•Run thru insert example in reverse!CS 245 Notes 5 20Extensible hashingCan handle growing files- with less wasted space- with no full reorganizationsSummary+Indirection(Not bad if directory in memory)Directory doubles in size(Now it fits, now it does not)--CS 245 Notes 5 21Linear hashing•Another dynamic hashing schemeTwo ideas:(a) Use i low order bits of hash01110101growsbi(b) File grows linearlyCS 245 Notes 5 22Example b=4 bits, i =2, 2 keys/bucket00 01 10 110101111100001010m = 01 (max used block)FuturegrowthbucketsIf h(k)[i ]  m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ] - 2i -1Rule0101• can have overflow chains!• insert 0101CS 245 Notes 5 23Example b=4 bits, i =2, 2 keys/bucket00 01 10 110101111100001010m = 01 (max used block)Futuregrowthbuckets1010100101• insert 01011111110101CS 245 Notes 5 24Example Continued: How to grow beyond this?00 01 10 1111111010010101010000m = 11 (max used block)i = 20 0 0 0100 101 110 1113. . .10010010110101010101CS 245 Notes 5 25•If U > threshold then increase m(and maybe i ) When do we expand file?•Keep track of: # used slots total # of slots= UCS 245 Notes 5 26Linear Hashing Can handle growing files- with less wasted space- with no full reorganizations No indirection like extensible hashingSummary++Can still have overflow chains-CS 245 Notes 5 27Example: BAD CASEVery fullVery empty Need to movem here…Would wastespace...CS 245 Notes 5 28Hashing- How it works- Dynamic hashing- Extensible- LinearSummaryCS 245 Notes 5 29Next:•Indexing vs Hashing•Index definition in SQL•Multiple key accessCS 245 Notes 5 30•Hashing good for probes given keye.g., SELECT … FROM RWHERE R.A = 5Indexing vs HashingCS 245 Notes 5 31•INDEXING (Including B Trees) good forRange Searches:e.g., SELECTFROM RWHERE R.A > 5Indexing vs HashingCS 245 Notes 5 32Index definition in SQL•Create index name on rel (attr)•Create unique index name on rel (attr)defines candidate key•Drop INDEX nameCS 245 Notes 5 33 CANNOT SPECIFY TYPE OF INDEX(e.g. B-tree, Hashing, …) OR PARAMETERS(e.g. Load Factor, Size of Hash,...) ... at least in SQL...NoteCS 245 Notes 5 34 ATTRIBUTE LIST  MULTIKEY INDEX(next) e.g., CREATE INDEX foo ON R(A,B,C)NoteCS 245 Notes 5 35 Motivation: Find records where DEPT = “Toy” AND SAL > 50kMulti-key IndexCS 245 Notes 5 36Strategy I:•Use one index, say Dept.•Get all Dept = “Toy” records and check their salaryI1CS 245 Notes 5 37•Use 2 Indexes; Manipulate PointersToy Sal> 50kStrategy II:CS 245 Notes 5 38•Multiple Key IndexOne idea: Strategy III:I1I2I3CS 245 Notes 5 39ExampleExampleRecordDeptIndexSalaryIndexName=JoeDEPT=SalesSAL=15kArtSalesToy10k15k17k21k12k15k15k19kCS 245 Notes 5 40For which queries is this index good?Find RECs Dept = “Sales” SAL=20kFind RECs Dept = “Sales” SAL > 20kFind RECs Dept = “Sales”Find RECs SAL = 20kCS 245 Notes 5 41Interesting application:•Geographic DataDATA: <X1,Y1, Attributes> <X2,Y2, Attributes>xy. . .CS 245 Notes 5 42Queries:•What city is at <Xi,Yi>?•What is within 5 miles from <Xi,Yi>?•Which is closest point to <Xi,Yi>?CS 245 Notes 5 43 hnbiacod10 2010 20Exampleegfmlkj2515 35 2040302010h ia bcd efgn omlj k• Search points near f• Search points near b51515CS 245 Notes 5


View Full Document

NCSU CSC 440 - Hashing and More

Download Hashing and More
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hashing and More and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hashing and More 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?