WSU CPTS 223 - Hashing - D1149169

Home> Schools> Washington State University> (CPTS) > CPTS 223> Hashing

DOC PREVIEW

WSU CPTS 223 - Hashing

School name Washington State University

Course Cpts 223- Advanced Data Structures C/C++

Pages 51

This preview shows page 1-2-3-24-25-26-27-49-50-51 out of 51 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 51 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

111111Hashing22222OverviewHashingTechnique supporting insertion, deletion and search in average-case constant timeOperations requiring elements to be sorted (e.g., FindMin) are not efficiently supportedHash table ADTImplementationsAnalysisApplicationsHash TableOne approachHash table is an array of fixed size TableSizeArray elements indexed by a key, which is mapped to an array index (0…TableSize-1)Mapping (hash function) h from key to indexE.g., h(“john”) = 33Hash TableInsertT [h(“john”] = <“john”,25000>DeleteT [h(“john”)] = NULLSearchReturn T [h(“john”)]What if h(“john”) = h(“joe”) ?4Hash FunctionMapping from key to array index is called a hash functionTypically, many-to-one mappingDifferent keys map to different indicesDistributes keys evenly over tableCollision occurs when hash function maps two keys to same array index5Hash FunctionSimple hash functionh(Key) = Key mod TableSizeAssumes integer keysFor random keys, h() distributes keys evenly over tableWhat if TableSize = 100 and keys are multiples of 10?Better if TableSize is a prime numberNot too close to powers of 2 or 106Hash Function for String KeysApproach 1Add up character ASCII values (0-127) to produce integer keysSmall strings may not use all of tableStrlen(S) * 127 < TableSizeAnagrams will map to the same indexApproach 2Treat first 3 characters of string as base-27 integer (26 letters plus space)Key = S[0] + (27 * S[1]) + (272 * S[2])Assumes first 3 characters randomly distributedNot true of English7Hash Function for String KeysApproach 3Use all N characters of string as an N-digit base-K integerChoose K to be prime number larger than number of different digits (characters)I.e., K = 29, 31, 37If L = length of string S, thenUse Horner’s rule to compute h(S)Limit L for long strings8TableSizeiLSShLiimod37]1[)(10∗−−=∑−=Collision ResolutionWhat happens when h(k1) = h(k2)?Collision resolution strategiesChainingStore colliding keys in a linked listOpen addressingStore colliding keys elsewhere in the table9ChainingCollision resolution approach #1Collision Resolution by ChainingHash table T is a vector of listsOnly singly-linked lists needed if memory is tightKey k is stored in list at T[h(k)]E.g., TableSize = 10h(k) = k mod 10Insert first 10 perfect squares11Implementation of Chaining Hash Table12Generic hash functions for integers and keysImplementation of Chaining Hash Table1314Each of these operations takes time linear in the length of the list.15Later, but essentially doubles size of list and reinserts current elements.16All hash objects must define == and != operators.Hash function to handle Employee object typeCollision Resolution by Chaining: AnalysisLoad factor λ of a hash table TN = number of elements in TM = size of Tλ = N/MAverage length of a chain is λUnsuccessful search O(λ)Successful search O(λ/2)Ideally, want λ ≈ 1 (not a function of N)I.e., TableSize = number of elements you expect to store in the table17Open AddressingCollision resolution approach #2Collision Resolution byOpen AddressingWhen a collision occurs, look elsewhere in the table for an empty slotAdvantages over chainingNo need for addition list structuresNo need to allocate/deallocate memory during insertion/deletion (slow)DisadvantagesSlower insertion – May need several attempts to find an empty slotTable needs to be bigger (than chaining-based table) to achieve average-case constant-time performanceLoad factor λ ≈ 0.519Collision Resolution byOpen AddressingProbe sequenceSequence of slots in hash table to searchh0(x), h1(x), h2(x), …Needs to visit each slot exactly onceNeeds to be repeatable (so we can find/delete what we’ve inserted)Hash functionhi(x) = (h(x) + f(i)) mod TableSizef(0) = 020Linear Probingf(i) is a linear function of iE.g., f(i) = iExample: h(x) = x mod TableSizeh0(89) = (h(89)+f(0)) mod 10 = 9h0(18) = (h(18)+f(0)) mod 10 = 8h0(49) = (h(49)+f(0)) mod 10 = 9 (X)h1(49) = (h(49)+f(1)) mod 10 = 021Linear Probing Example22Linear Probing: AnalysisProbe sequences can get longPrimary clusteringKeys tend to cluster in one part of tableKeys that hash into cluster will be added to the end of the cluster (making it even bigger)23Linear Probing: AnalysisExpected number of probes for insertion or unsuccessful searchExpected number of probes for successful searchExample (λ = 0.5)Insert / unsuccessful search2.5 probesSuccessful search1.5 probesExample (λ = 0.9)Insert / unsuccessful search50.5 probesSuccessful search5.5 probes24−+2)1(1121λ−+)1(1121λRandom Probing: AnalysisRandom probing does not suffer from clusteringExpected number of probes for insertion or unsuccessful search:Exampleλ = 0.5: 1.4 probesλ = 0.9: 2.6 probes25λλ−11ln1Linear vs. Random Probing26Load factor λ# probesLinear probingRandom probingQuadratic ProbingAvoids primary clusteringf(i) is quadratic in iE.g., f(i) = i2Exampleh0(58) = (h(58)+f(0)) mod 10 = 8 (X)h1(58) = (h(58)+f(1)) mod 10 = 9 (X)h2(58) = (h(58)+f(2)) mod 10 = 227Quadratic Probing Example28Quadratic Probing: AnalysisDifficult to analyzeTheorem 5.1New element can always be inserted into a table that is at least half empty and TableSize is primeOtherwise, may never find an empty slot, even is one existsEnsure table never gets half fullIf close, then expand it29Quadratic ProbingOnly M (TableSize) different probe sequencesMay cause “secondary clustering”DeletionEmptying slots can break probe sequenceLazy deletionDifferentiate between empty and deleted slotSkip deleted slotsSlows operations (effectively increases λ)30Quadratic Probing: Implementation31Quadratic Probing: Implementation32Lazy deletionQuadratic Probing: Implementation33Ensure table size is primeQuadratic Probing: Implementation34Quadratic probe sequence (really)FindSkip DELETED;No duplicatesQuadratic Probing: Implementation35InsertRemoveNo deallocationneededNo duplicatesDouble HashingCombine two different hash functionsf(i) = i * h2(x)Good choices for h2(x) ?Should never evaluate to 0h2(x) = R – (x mod

View Full Document