Unformatted text preview:

1CS 277 – Spring 2002 Notes 5 1CS 277: Database SystemImplementationArthur KellerNotes 5: Hashing and MoreCS 277 – Spring 2002 Notes 5 2key → h(key)Hashing<key>...Buckets(typically 1disk block)CS 277 – Spring 2002 Notes 5 3...Two alternativesrecords...(1) key → h(key)CS 277 – Spring 2002 Notes 5 4(2) key → h(key)Indexrecordkey 1Two alternatives• Alt (2) for “secondary” search keyCS 277 – Spring 2002 Notes 5 5Example hash function•Key = ‘x1 x2 … xn’ n byte character string• Have b buckets• h: add x1 + x2 + ….. xn– compute sum modulo bCS 277 – Spring 2002 Notes 5 6˝ This may not be best function …˝ Read Knuth Vol. 3 if you reallyneed to select a good function.Good hash ? Expected number of function: keys/bucket is thesame for all buckets2CS 277 – Spring 2002 Notes 5 7Within a bucket:• Do we keep keys sorted?•Yes, if CPU time critical & Inserts/Deletes not too frequentCS 277 – Spring 2002 Notes 5 8Next: example to illustrateinserts, overflows, deletesh(K)CS 277 – Spring 2002 Notes 5 9EXAMPLE 2 records/bucketINSERT:h(a) = 1h(b) = 2h(c) = 1h(d) = 00123dacbh(e) = 1eCS 277 – Spring 2002 Notes 5 100123abcedEXAMPLE: deletionDelete:effgmaybe move“g” upcdCS 277 – Spring 2002 Notes 5 11Rule of thumb:• Try to keep space utilizationbetween 50% and 80% Utilization = # keys used total # keys that fit• If < 50%, wasting space• If > 80%, overflows significantdepends on how good hashfunction is & on # keys/bucketCS 277 – Spring 2002 Notes 5 12How do we cope with growth?• Overflows and reorganizations• Dynamic hashing•Extensible• Linear3CS 277 – Spring 2002 Notes 5 13Extensible hashing: two ideas(a) Use i of b bits output by hash function b h(K) → use i → grows over time….00110101CS 277 – Spring 2002 Notes 5 14(b) Use directoryh(K)[i ] to bucket......CS 277 – Spring 2002 Notes 5 15Example: h(k) is 4 bits; 2 keys/bucketi = 111000110011100Insert 1010111001010New directory200011011i =22CS 277 – Spring 2002 Notes 5 161000121001101021100Insert:01110000000110112i =Example continued011100000111000122CS 277 – Spring 2002 Notes 5 17000110112i =2100110102110020111200000001Insert:1001Example continued1001100110100000010100111001011101113i =33CS 277 – Spring 2002 Notes 5 18Extensible hashing: deletion• No merging of blocks•Merge blocks and cut directory if possible(Reverse insert procedure)4CS 277 – Spring 2002 Notes 5 19Deletion example:• Run thru insert example in reverse!CS 277 – Spring 2002 Notes 5 20Extensible hashingCan handle growing files- with less wasted space- with no full reorganizationsSummary+Indirection(Not bad if directory in memory)Directory doubles in size(Now it fits, now it does not)--CS 277 – Spring 2002 Notes 5 21Linear hashing•Another dynamic hashing schemeTwo ideas:(a) Use i low order bits of hash01110101growsbi(b) File grows linearlyCS 277 – Spring 2002 Notes 5 22Example b=4 bits, i =2, 2 keys/bucket00 01 10 110101111100001010m = 01 (max used block)FuturegrowthbucketsIf h(k)[i ] ≤ m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ] - 2i -1Rule0101• can have overflow chains!• insert 0101CS 277 – Spring 2002 Notes 5 23Example b=4 bits, i =2, 2 keys/bucket00 01 10 110101111100001010m = 01 (max used block)Futuregrowthbuckets1010100101• insert 01011111110101CS 277 – Spring 2002 Notes 5 24Example Continued: How to grow beyond this?00 01 10 1111111010010101010000m = 11 (max used block)i = 200 00100 101 110 1113. . .100100101101010101015CS 277 – Spring 2002 Notes 5 25• If U > threshold then increase m(and maybe i )* When do we expand file?•Keep track of: # used slots total # of slots= UCS 277 – Spring 2002 Notes 5 26Linear Hashing Can handle growing files- with less wasted space- with no full reorganizations No indirection like extensible hashingSummary++Can still have overflow chains-CS 277 – Spring 2002 Notes 5 27Example: BAD CASEVery fullVery empty Need to movem here…Would wastespace...CS 277 – Spring 2002 Notes 5 28Hashing- How it works- Dynamic hashing- Extensible- LinearSummaryCS 277 – Spring 2002 Notes 5 29Next:• Indexing vs Hashing•Index definition in SQL• Multiple key accessCS 277 – Spring 2002 Notes 5 30• Hashing good for probes given keye.g., SELECT … FROM RWHERE R.A = 5Indexing vs Hashing6CS 277 – Spring 2002 Notes 5 31• INDEXING (Including B Trees) good forRange Searches:e.g., SELECTFROM RWHERE R.A > 5Indexing vs HashingCS 277 – Spring 2002 Notes 5 32Index definition in SQL• Create index name on rel (attr)• Create unique index name on rel (attr)defines candidate key• Drop INDEX nameCS 277 – Spring 2002 Notes 5 33 CANNOT SPECIFY TYPE OF INDEX(e.g. B-tree, Hashing, …) OR PARAMETERS(e.g. Load Factor, Size of Hash,...) ... at least in SQL...NoteCS 277 – Spring 2002 Notes 5 34 ATTRIBUTE LIST ⇒ MULTIKEY INDEX(next) e.g., CREATE INDEX foo ON R(A,B,C)NoteCS 277 – Spring 2002 Notes 5 35Motivation: Find records where DEPT = “Toy” AND SAL > 50kMulti-key IndexCS 277 – Spring 2002 Notes 5 36Strategy I:• Use one index, say Dept.• Get all Dept = “Toy” records and check their salaryI17CS 277 – Spring 2002 Notes 5 37•Use 2 Indexes; Manipulate PointersToy Sal> 50kStrategy II:CS 277 – Spring 2002 Notes 5 38•Multiple Key IndexOne idea:Strategy III:I1I2I3CS 277 – Spring 2002 Notes 5 39ExampleExampleRecordDeptIndexSalaryIndexName=JoeDEPT=SalesSAL=15kArtSalesToy10k15k17k21k12k15k15k19kCS 277 – Spring 2002 Notes 5 40For which queries is this index good?Find RECs Dept = “Sales” SAL=20kFind RECs Dept = “Sales” SAL > 20kFind RECs Dept = “Sales”Find RECs SAL = 20kCS 277 – Spring 2002 Notes 5 41Interesting application:• Geographic DataDATA: <X1,Y1, Attributes> <X2,Y2, Attributes>xy. . .CS 277 – Spring 2002 Notes 5 42Queries:• What city is at <Xi,Yi>?• What is within 5 miles from <Xi,Yi>?• Which is closest point to <Xi,Yi>?8CS 277 – Spring 2002 Notes 5 43hnbiacod10 2010 20Exampleegfmlkj2515 35 2040302010h ia bcd efgn omlj k• Search points near f• Search points near b51515CS 277 – Spring 2002 Notes 5 44Queries•Find points with Yi > 20•Find points with Xi < 5• Find points “close” to i = <12,38>•Find points “close” to b = <7,24>CS 277 – Spring 2002 Notes 5 45• Many types of geographic index


View Full Document

UCSC CS 277 - Database System Implementation

Documents in this Course
Load more
Download Database System Implementation
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Database System Implementation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Database System Implementation 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?