CS 245: Database System PrinciplesPowerPoint PresentationSlide 3Slide 4Example hash functionSlide 6Within a bucket:Next: example to illustrate inserts, overflows, deletesEXAMPLE 2 records/bucketSlide 10Rule of thumb:How do we cope with growth?Extensible hashing: two ideasSlide 14Example: h(k) is 4 bits; 2 keys/bucketSlide 16Slide 17Extensible hashing: deletionDeletion example:Extensible hashingLinear hashingExample b=4 bits, i =2, 2 keys/bucketSlide 23Example Continued: How to grow beyond this? When do we expand file?Linear HashingExample: BAD CASESlide 28Next:Slide 30Slide 31Index definition in SQLSlide 33Slide 34Strategy I:Strategy II:Strategy III:ExampleFor which queries is this index good?Interesting application:Queries:Slide 43QueriesSlide 45Two more types of multi key indexesGrid IndexCLAIMSlide 49Solution: Use IndirectionWith indirection:Can also index grid on value rangesGrid filesSlide 54EX:Slide 56Slide 57Slide 58Slide 59Reading Chapter 5The BIG picture….CS 245 Notes 5 1CS 245: Database System PrinciplesHector Garcia-MolinaNotes 5: Hashing and MoreCS 245 Notes 5 2key h(key)Hashing<key>Buckets(typically 1disk block)CS 245 Notes 5 3...Two alternativesrecords...(1) key h(key)CS 245 Notes 5 4(2) key h(key)Indexrecordkey 1Two alternatives•Alt (2) for “secondary” search keyCS 245 Notes 5 5Example hash function•Key = ‘x1 x2 … xn’ n byte character string•Have b buckets•h: add x1 + x2 + ….. xn– compute sum modulo bCS 245 Notes 5 6 This may not be best function … Read Knuth Vol. 3 if you reallyneed to select a good function.Good hash Expected number of function: keys/bucket is thesame for all bucketsCS 245 Notes 5 7Within a bucket:•Do we keep keys sorted?•Yes, if CPU time critical & Inserts/Deletes not too frequentCS 245 Notes 5 8Next: example to illustrateinserts, overflows, deletesh(K)CS 245 Notes 5 9EXAMPLE 2 records/bucketINSERT:h(a) = 1h(b) = 2h(c) = 1h(d) = 00123dacbh(e) = 1eCS 245 Notes 5 100123abcedEXAMPLE: deletionDelete:effgmaybe move“g” upcdCS 245 Notes 5 11Rule of thumb:•Try to keep space utilizationbetween 50% and 80% Utilization = # keys used total # keys that fit•If < 50%, wasting space•If > 80%, overflows significantdepends on how good hashfunction is & on # keys/bucketCS 245 Notes 5 12How do we cope with growth?•Overflows and reorganizations•Dynamic hashing•Extensible•LinearCS 245 Notes 5 13Extensible hashing: two ideas(a) Use i of b bits output by hash function b h(K) use i grows over time….00110101CS 245 Notes 5 14(b) Use directoryh(K)[i ] to bucketCS 245 Notes 5 15Example: h(k) is 4 bits; 2 keys/bucketi =111000110011100Insert 1010111001010New directory200011011i =22CS 245 Notes 5 161000121001101021100Insert:01110000000110112i =Example continued011100000111000122CS 245 Notes 5 17000110112i =2100110102110020111200000001Insert:1001Example continued1001100110100000010100111001011101113i =33CS 245 Notes 5 18Extensible hashing: deletion•No merging of blocks•Merge blocks and cut directory if possible(Reverse insert procedure)CS 245 Notes 5 19Deletion example:•Run thru insert example in reverse!CS 245 Notes 5 20Extensible hashingCan handle growing files- with less wasted space- with no full reorganizationsSummary+Indirection(Not bad if directory in memory)Directory doubles in size(Now it fits, now it does not)--CS 245 Notes 5 21Linear hashing•Another dynamic hashing schemeTwo ideas:(a) Use i low order bits of hash01110101growsbi(b) File grows linearlyCS 245 Notes 5 22Example b=4 bits, i =2, 2 keys/bucket00 01 10 110101111100001010m = 01 (max used block)FuturegrowthbucketsIf h(k)[i ] m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ] - 2i -1Rule0101• can have overflow chains!• insert 0101CS 245 Notes 5 23Example b=4 bits, i =2, 2 keys/bucket00 01 10 110101111100001010m = 01 (max used block)Futuregrowthbuckets1010100101• insert 01011111110101CS 245 Notes 5 24Example Continued: How to grow beyond this?00 01 10 1111111010010101010000m = 11 (max used block)i = 20 0 0 0100 101 110 1113. . .10010010110101010101CS 245 Notes 5 25•If U > threshold then increase m(and maybe i ) When do we expand file?•Keep track of: # used slots total # of slots= UCS 245 Notes 5 26Linear Hashing Can handle growing files- with less wasted space- with no full reorganizations No indirection like extensible hashingSummary++Can still have overflow chains-CS 245 Notes 5 27Example: BAD CASEVery fullVery empty Need to movem here…Would wastespace...CS 245 Notes 5 28Hashing- How it works- Dynamic hashing- Extensible- LinearSummaryCS 245 Notes 5 29Next:•Indexing vs Hashing•Index definition in SQL•Multiple key accessCS 245 Notes 5 30•Hashing good for probes given keye.g., SELECT … FROM RWHERE R.A = 5Indexing vs HashingCS 245 Notes 5 31•INDEXING (Including B Trees) good forRange Searches:e.g., SELECTFROM RWHERE R.A > 5Indexing vs HashingCS 245 Notes 5 32Index definition in SQL•Create index name on rel (attr)•Create unique index name on rel (attr)defines candidate key•Drop INDEX nameCS 245 Notes 5 33 CANNOT SPECIFY TYPE OF INDEX(e.g. B-tree, Hashing, …) OR PARAMETERS(e.g. Load Factor, Size of Hash,...) ... at least in SQL...NoteCS 245 Notes 5 34 ATTRIBUTE LIST MULTIKEY INDEX(next) e.g., CREATE INDEX foo ON R(A,B,C)NoteCS 245 Notes 5 35 Motivation: Find records where DEPT = “Toy” AND SAL > 50kMulti-key IndexCS 245 Notes 5 36Strategy I:•Use one index, say Dept.•Get all Dept = “Toy” records and check their salaryI1CS 245 Notes 5 37•Use 2 Indexes; Manipulate PointersToy Sal> 50kStrategy II:CS 245 Notes 5 38•Multiple Key IndexOne idea: Strategy III:I1I2I3CS 245 Notes 5 39ExampleExampleRecordDeptIndexSalaryIndexName=JoeDEPT=SalesSAL=15kArtSalesToy10k15k17k21k12k15k15k19kCS 245 Notes 5 40For which queries is this index good?Find RECs Dept = “Sales” SAL=20kFind RECs Dept = “Sales” SAL > 20kFind RECs Dept = “Sales”Find RECs SAL = 20kCS 245 Notes 5 41Interesting application:•Geographic DataDATA: <X1,Y1, Attributes> <X2,Y2, Attributes>xy. . .CS 245 Notes 5 42Queries:•What city is at <Xi,Yi>?•What is within 5 miles from <Xi,Yi>?•Which is closest point to <Xi,Yi>?CS 245 Notes 5 43 hnbiacod10 2010 20Exampleegfmlkj2515 35 2040302010h ia bcd efgn omlj k• Search points near f• Search points near b51515CS 245 Notes 5
View Full Document