1CS 277 – Spring 2002 Notes 5 1CS 277: Database SystemImplementationArthur KellerNotes 5: Hashing and MoreCS 277 – Spring 2002 Notes 5 2key → h(key)Hashing<key>...Buckets(typically 1disk block)CS 277 – Spring 2002 Notes 5 3...Two alternativesrecords...(1) key → h(key)CS 277 – Spring 2002 Notes 5 4(2) key → h(key)Indexrecordkey 1Two alternatives• Alt (2) for “secondary” search keyCS 277 – Spring 2002 Notes 5 5Example hash function•Key = ‘x1 x2 … xn’ n byte character string• Have b buckets• h: add x1 + x2 + ….. xn– compute sum modulo bCS 277 – Spring 2002 Notes 5 6˝ This may not be best function …˝ Read Knuth Vol. 3 if you reallyneed to select a good function.Good hash ? Expected number of function: keys/bucket is thesame for all buckets2CS 277 – Spring 2002 Notes 5 7Within a bucket:• Do we keep keys sorted?•Yes, if CPU time critical & Inserts/Deletes not too frequentCS 277 – Spring 2002 Notes 5 8Next: example to illustrateinserts, overflows, deletesh(K)CS 277 – Spring 2002 Notes 5 9EXAMPLE 2 records/bucketINSERT:h(a) = 1h(b) = 2h(c) = 1h(d) = 00123dacbh(e) = 1eCS 277 – Spring 2002 Notes 5 100123abcedEXAMPLE: deletionDelete:effgmaybe move“g” upcdCS 277 – Spring 2002 Notes 5 11Rule of thumb:• Try to keep space utilizationbetween 50% and 80% Utilization = # keys used total # keys that fit• If < 50%, wasting space• If > 80%, overflows significantdepends on how good hashfunction is & on # keys/bucketCS 277 – Spring 2002 Notes 5 12How do we cope with growth?• Overflows and reorganizations• Dynamic hashing•Extensible• Linear3CS 277 – Spring 2002 Notes 5 13Extensible hashing: two ideas(a) Use i of b bits output by hash function b h(K) → use i → grows over time….00110101CS 277 – Spring 2002 Notes 5 14(b) Use directoryh(K)[i ] to bucket......CS 277 – Spring 2002 Notes 5 15Example: h(k) is 4 bits; 2 keys/bucketi = 111000110011100Insert 1010111001010New directory200011011i =22CS 277 – Spring 2002 Notes 5 161000121001101021100Insert:01110000000110112i =Example continued011100000111000122CS 277 – Spring 2002 Notes 5 17000110112i =2100110102110020111200000001Insert:1001Example continued1001100110100000010100111001011101113i =33CS 277 – Spring 2002 Notes 5 18Extensible hashing: deletion• No merging of blocks•Merge blocks and cut directory if possible(Reverse insert procedure)4CS 277 – Spring 2002 Notes 5 19Deletion example:• Run thru insert example in reverse!CS 277 – Spring 2002 Notes 5 20Extensible hashingCan handle growing files- with less wasted space- with no full reorganizationsSummary+Indirection(Not bad if directory in memory)Directory doubles in size(Now it fits, now it does not)--CS 277 – Spring 2002 Notes 5 21Linear hashing•Another dynamic hashing schemeTwo ideas:(a) Use i low order bits of hash01110101growsbi(b) File grows linearlyCS 277 – Spring 2002 Notes 5 22Example b=4 bits, i =2, 2 keys/bucket00 01 10 110101111100001010m = 01 (max used block)FuturegrowthbucketsIf h(k)[i ] ≤ m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ] - 2i -1Rule0101• can have overflow chains!• insert 0101CS 277 – Spring 2002 Notes 5 23Example b=4 bits, i =2, 2 keys/bucket00 01 10 110101111100001010m = 01 (max used block)Futuregrowthbuckets1010100101• insert 01011111110101CS 277 – Spring 2002 Notes 5 24Example Continued: How to grow beyond this?00 01 10 1111111010010101010000m = 11 (max used block)i = 200 00100 101 110 1113. . .100100101101010101015CS 277 – Spring 2002 Notes 5 25• If U > threshold then increase m(and maybe i )* When do we expand file?•Keep track of: # used slots total # of slots= UCS 277 – Spring 2002 Notes 5 26Linear Hashing Can handle growing files- with less wasted space- with no full reorganizations No indirection like extensible hashingSummary++Can still have overflow chains-CS 277 – Spring 2002 Notes 5 27Example: BAD CASEVery fullVery empty Need to movem here…Would wastespace...CS 277 – Spring 2002 Notes 5 28Hashing- How it works- Dynamic hashing- Extensible- LinearSummaryCS 277 – Spring 2002 Notes 5 29Next:• Indexing vs Hashing•Index definition in SQL• Multiple key accessCS 277 – Spring 2002 Notes 5 30• Hashing good for probes given keye.g., SELECT … FROM RWHERE R.A = 5Indexing vs Hashing6CS 277 – Spring 2002 Notes 5 31• INDEXING (Including B Trees) good forRange Searches:e.g., SELECTFROM RWHERE R.A > 5Indexing vs HashingCS 277 – Spring 2002 Notes 5 32Index definition in SQL• Create index name on rel (attr)• Create unique index name on rel (attr)defines candidate key• Drop INDEX nameCS 277 – Spring 2002 Notes 5 33 CANNOT SPECIFY TYPE OF INDEX(e.g. B-tree, Hashing, …) OR PARAMETERS(e.g. Load Factor, Size of Hash,...) ... at least in SQL...NoteCS 277 – Spring 2002 Notes 5 34 ATTRIBUTE LIST ⇒ MULTIKEY INDEX(next) e.g., CREATE INDEX foo ON R(A,B,C)NoteCS 277 – Spring 2002 Notes 5 35Motivation: Find records where DEPT = “Toy” AND SAL > 50kMulti-key IndexCS 277 – Spring 2002 Notes 5 36Strategy I:• Use one index, say Dept.• Get all Dept = “Toy” records and check their salaryI17CS 277 – Spring 2002 Notes 5 37•Use 2 Indexes; Manipulate PointersToy Sal> 50kStrategy II:CS 277 – Spring 2002 Notes 5 38•Multiple Key IndexOne idea:Strategy III:I1I2I3CS 277 – Spring 2002 Notes 5 39ExampleExampleRecordDeptIndexSalaryIndexName=JoeDEPT=SalesSAL=15kArtSalesToy10k15k17k21k12k15k15k19kCS 277 – Spring 2002 Notes 5 40For which queries is this index good?Find RECs Dept = “Sales” SAL=20kFind RECs Dept = “Sales” SAL > 20kFind RECs Dept = “Sales”Find RECs SAL = 20kCS 277 – Spring 2002 Notes 5 41Interesting application:• Geographic DataDATA: <X1,Y1, Attributes> <X2,Y2, Attributes>xy. . .CS 277 – Spring 2002 Notes 5 42Queries:• What city is at <Xi,Yi>?• What is within 5 miles from <Xi,Yi>?• Which is closest point to <Xi,Yi>?8CS 277 – Spring 2002 Notes 5 43hnbiacod10 2010 20Exampleegfmlkj2515 35 2040302010h ia bcd efgn omlj k• Search points near f• Search points near b51515CS 277 – Spring 2002 Notes 5 44Queries•Find points with Yi > 20•Find points with Xi < 5• Find points “close” to i = <12,38>•Find points “close” to b = <7,24>CS 277 – Spring 2002 Notes 5 45• Many types of geographic index
View Full Document