DOC PREVIEW
CMU CS 15826 - Lecture

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CMU SCS15-826: Multimedia Databasesand Data MiningPrimary key indexing – hashingC. Faloutsos15-826 Copyright: C. Faloutsos (2005) 2CMU SCSOutlineGoal: ‘Find similar / interesting things’• Intro to DB• Indexing - similarity search• Data Mining15-826 Copyright: C. Faloutsos (2005) 3CMU SCSIndexing - Detailed outline• primary key indexing– B-trees and variants– (static) hashing– extendible hashing• secondary key indexing• spatial access methods• text• ...15-826 Copyright: C. Faloutsos (2005) 4CMU SCS(Static) HashingProblem: “find EMP record with ssn=123”What if disk space was free, and time was atpremium?15-826 Copyright: C. Faloutsos (2005) 5CMU SCSHashingA: Brilliant idea: key-to-address transformation:#0 page#123 page#999,999,999123; Smith; Main str15-826 Copyright: C. Faloutsos (2005) 6CMU SCSHashingSince space is NOT free:• use M, instead of 999,999,999 slots• hash function: h(key) = slot-id#0 page#123 page#999,999,999123; Smith; Main str215-826 Copyright: C. Faloutsos (2005) 7CMU SCSHashingTypically: each hash bucket is a page, holdingmany records:#0 page#h(123)M123; Smith; Main str15-826 Copyright: C. Faloutsos (2005) 8CMU SCSHashingNotice: could have clustering, or non-clusteringversions:#0 page#h(123)M123; Smith; Main str.15-826 Copyright: C. Faloutsos (2005) 9CMU SCS123...HashingNotice: could have clustering, or non-clusteringversions:#0 page#h(123)M...EMP file123; Smith; Main str....234; Johnson; Forbes ave345; Tompson; Fifth ave...15-826 Copyright: C. Faloutsos (2005) 10CMU SCSHashing - design decisions?• eg., IRS, 200M tax returns, by SSN15-826 Copyright: C. Faloutsos (2005) 11CMU SCSIndexing- overview• B-trees• hashing– hashing functions– size of hash table– collision resolution• Hashing vs B-trees• Indices in SQL15-826 Copyright: C. Faloutsos (2005) 12CMU SCSDesign decisions1) formula h() for hashing function2) size of hash table M3) collision resolution method315-826 Copyright: C. Faloutsos (2005) 13CMU SCSDesign decisions - functions• Goal: uniform spread of keys over hashbuckets• Popular choices:– Division hashing– Multiplication hashing15-826 Copyright: C. Faloutsos (2005) 14CMU SCSDivision hashingh(x) = (a*x+b) mod M• eg., h(ssn) = (ssn) mod 1,000– gives the last three digits of ssn• M: size of hash table - choose a primenumber, defensively (why?)15-826 Copyright: C. Faloutsos (2005) 15CMU SCS• eg., M=2; hash on driver-license number(dln), where last digit is ‘gender’ (0/1 = M/F)• in an army unit with predominantly malesoldiers• Thus: avoid cases where M and keys havecommon divisors - prime M guards againstthat!Division hashing15-826 Copyright: C. Faloutsos (2005) 16CMU SCSMultiplication hashingh(x) = [ fractional-part-of ( x * ) ] * M• : golden ratio ( 0.618... = ( sqrt(5)-1)/2 )• in general, we need an irrational number• advantage: M need not be a prime number• but must be irrational15-826 Copyright: C. Faloutsos (2005) 17CMU SCSOther hashing functions• quadratic hashing (bad)• ...• conclusion: use division hashing15-826 Copyright: C. Faloutsos (2005) 18CMU SCSDesign decisions1) formula h() for hashing function2) size of hash table M3) collision resolution method415-826 Copyright: C. Faloutsos (2005) 19CMU SCSSize of hash table• eg., 50,000 employees, 10 employee-records / page• Q: M=?? pages/buckets/slots15-826 Copyright: C. Faloutsos (2005) 20CMU SCSSize of hash table• eg., 50,000 employees, 10 employees/page• Q: M=?? pages/buckets/slots• A: utilization ~ 90% and– M: prime numberEg., in our case: M= closest prime to50,000/10 / 0.9 = 5,55515-826 Copyright: C. Faloutsos (2005) 21CMU SCSDesign decisions1) formula h() for hashing function2) size of hash table M3) collision resolution method15-826 Copyright: C. Faloutsos (2005) 22CMU SCSCollision resolution• Q: what is a ‘collision’?• A: ??15-826 Copyright: C. Faloutsos (2005) 23CMU SCSCollision resolution#0 page#h(123)M123; Smith; Main str.15-826 Copyright: C. Faloutsos (2005) 24CMU SCSCollision resolution• Q: what is a ‘collision’?• A: ??• Q: why worry about collisions/overflows?(recall that buckets are ~90% full)• A: ‘birthday paradox’515-826 Copyright: C. Faloutsos (2005) 25CMU SCSCollision resolution• open addressing– linear probing (ie., put to next slot/bucket)– re-hashing• separate chaining (ie., put links to overflowpages)15-826 Copyright: C. Faloutsos (2005) 26CMU SCSCollision resolution#0 page#h(123)M123; Smith; Main str.linear probing:15-826 Copyright: C. Faloutsos (2005) 27CMU SCSCollision resolution#0 page#h(123)M123; Smith; Main str.re-hashingh1()h2()15-826 Copyright: C. Faloutsos (2005) 28CMU SCSCollision resolution123; Smith; Main str.separate chaining15-826 Copyright: C. Faloutsos (2005) 29CMU SCSDesign decisions - conclusions• function: division hashing– h(x) = ( a*x+b ) mod M• size M: ~90% util.; prime number.• collision resolution: separate chaining– easier to implement (deletions!);– no danger of becoming full15-826 Copyright: C. Faloutsos (2005) 30CMU SCSIndexing- overview• B-trees• hashing– Hashing vs B-trees• Indices in SQL• extendible hashing615-826 Copyright: C. Faloutsos (2005) 31CMU SCSHashing vs B-trees:Hashing offers• speed ! ( O(1) avg. search time)..but:15-826 Copyright: C. Faloutsos (2005) 32CMU SCSHashing vs B-trees:..but B-trees give:• key ordering:– range queries– proximity queries– sequential scan• O(log(N)) guarantees for search, ins./del.• graceful growing/shrinking15-826 Copyright: C. Faloutsos (2005) 33CMU SCSHashing vs B-trees:thus:• B-trees are implemented in most systemsfootnotes:• hashing is rarely implemented (why not?)• ‘dbm’ and ‘ndbm’ of UNIX: offer one or both15-826 Copyright: C. Faloutsos (2005) 34CMU SCSIndexing- overview• B-trees• hashing– Hashing vs B-trees• Indices in SQL• extendible hashing15-826 Copyright: C. Faloutsos (2005) 35CMU SCSIndexing in SQL• create index <index-name> on <relation-name> (<attribute-list>)• create unique index <index-name> on<relation-name> (<attribute-list>)• drop index <index-name>15-826 Copyright: C. Faloutsos (2005) 36CMU SCSIndexing in SQL• eg.,create index ssn-indexon STUDENT (ssn)• or (eg., on TAKES(ssn,cid, grade) ):create index sc-indexon TAKES (ssn, c-id)715-826 Copyright: C. Faloutsos (2005) 37CMU SCSIndexing- overview• B-trees• hashing• Indices in SQL• extensible hashing– ‘extendible’ hashing


View Full Document

CMU CS 15826 - Lecture

Download Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?