Unformatted text preview:

CSCI 2720HashingImplementing Dynamic DictionariesHash tables vs. Other Data StructuresArray Approach – exampleHash TablesHash Tables – Conceptual ViewSlide 8Hash index/valueHash Functions & insert(…)Hash FunctionThe mod functionHash Tables: Insert ExampleDealing with CollisionsHashing with ChainingSlide 16Choosing a Hash Function – 1Choosing a Hash Function – 2Prime Number DistributionChoosing Hash Function – 3Hash Tables – Open AddressingHashing with Open AddressingOpen AddressingProbe Algorithms (Collision Resolution)Double HashingLoad FactorSlide 27Slide 28Open Addressing vs. Separate ChainingHash Tables in Java1CSCI 2720HashingSpring 20052HashingMotivationTechniquesHash functions3Implementing Dynamic DictionariesWant a data structure in which finds/searches are very fastAs close to O(1) as possibleminimum number of executed instructions per methodInsert and Deletes should be fast tooObjects in dictionary have unique keysA key may be a single property/attribute valueOr may be created from multiple properties/values4Hash tables vs. Other Data StructuresWe want to implement the dictionary operations Insert(), Delete() and Search()/Find() efficiently.Arrays: can accomplish in O(1) time but are not space efficient (assumes we leave empty space for keys not currently in dictionary)Binary search treescan accomplish in O(log n) time are space efficient. Hash Tables: A generalization of an array that under some reasonable assumptions is O(1) for Insert/Delete/Search of a key5Array Approach – exampleA social security application keeping track of people where the primary search key is a person’s social security number (SSN)You can use an array to hold references to all the person objectsUse an array with range 0 - 999,999,999Using the SSN as a key, you have O(1) access to any person objectUnfortunately, the number of active keys (Social Security Numbers) is much less than the array size (1 billion entries)Est. US population, Oct. 20th 2004: 294,564,209Over 60% of the array would be unused6Hash TablesVery useful data structureGood for storing and retrieving key-value pairsNot good for iterating through a list of itemsExample applications:Storing objects according to ID numbersWhen the ID numbers are widely spread outWhen you don’t need to access items in ID order7Hash Tables – Conceptual ViewObj5key=1obj1key=15Obj4key=2Obj2key=3076543210tableObj3key=4bucketshash value/index8Hash Tables solve these problems by using a much smaller array and mapping keys with a hash function. Let universe of keys U and an array of size m. A hash function h is a function from U to 0…m, that is: h : U 0…m Hash TablesU(universe of keys)k1 k2 k3 k4 k601234567h (k2)=2h (k1)=h (k3)=3h (k6)=5h (k4)=79Hash index/valueA hash value or hash index is used to index the hash table (array)A hash function takes a key and returns a hash value/index The hash index is a integer (to index an array)The key is specific value associated with a specific object being stored in the hash tableIt is important that the key remain constant for the lifetime of the object10Hash Functions & insert(…)Usage summary:int hashValue = hashFunction (int key);Or hashValue = hashFunction (String key);Or hashValue = hashFunction (itemType item);Insert method:public void insert (int key, itemType item) { hashValue = hashFunction (key); table[hashValue] = item;}11Hash FunctionYou want a hash function/algorithm that is:FastCreates a good distribution of hash values so that the items (based on their keys) are distributed evenly through the arrayHash functions can use as inputInteger key valuesString key valuesMultipart key valuesMultipart fields, and/orMultiple fields12The mod functionStands for moduloWhen you divide x by y, you get a result and a remainderMod is the remainder8 mod 5 = 39 mod 5 = 410 mod 5 = 015 mod 5 = 0Thus for key-value mod M, multiples of M give the same result, 0But multiples of other numbers do not give the same resultSo what happens when M is a prime number where the keys are not multiples of M?13Hash Tables: Insert ExampleFor example, if we hash keys 0…1000 into a hash table with 5 entries and use h(key) = key mod 5 , we get the following sequence of events: 01234key dataInsert 2 2 …01234key dataInsert 21 2 …21 …01234key dataInsert 34 2 …21 …34 …Insert 54 There is a collision atarray entry #4???14A problem arises when we have two keys that hash in the same array entry – this is called a collision.There are two ways to resolve collision:Hashing with Chaining (a.k.a. “Separate Chaining”): every hash table entry contains a pointer to a linked list of keys that hash in the same entryHashing with Open Addressing: every hash table entry contains only one key. If a new key hashes to a table entry which is filled, systematically examine other table entries until you find one empty entry to place the new keyDealing with Collisions15Hashing with ChainingThe problem is that keys 34 and 54 hash in the same entry (4). We solve this collision by placing all keys that hash in the same hash table entry in a chain (linked list) or bucket (array) pointed by this entry:01234 other key key dataInsert 54 2 2154 34CHAIN01234Insert 101 2 2154 3410116What is the running time to insert/search/delete?Insert: It takes O(1) time to compute the hash function and insert at head of linked listSearch: It is proportional to max linked list lengthDelete: Same as search Therefore, in the unfortunate event that we have a “bad” hash function all n keys may hash in the same table entry giving an O(n) run-time! So how can we create a “good” hash function?Hashing with Chaining17Choosing a Hash Function – 1The performance of the hash table depends on having a hash function that evenly distributes the keys: uniform hashing is the ideal targetChoosing a good hash function requires taking into account the kind of data that will be used.The statistics of the key distribution needs to be accounted forE.g., Choosing the first letter of a last name will likely cause lots of collisions depending on the nationality of the population Most programming languages (including java) have hash functions built in18Choosing a Hash


View Full Document

UGA CSCI 2720 - hashtables

Documents in this Course
Load more
Download hashtables
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view hashtables and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view hashtables 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?