Unformatted text preview:

CSE 326: Data Structures Part 5 HashingMidtermDictionary & Search ADTsImplementations So FarHash Tables: Basic IdeaApplicationsHow could you use hash tables to…Properties of Good Hash FunctionsInteger KeysSlide 10Strings as KeysHashing StringsMaking the String Hash Easy to ComputeHow Can You Hash…Slide 15Optimal Hash FunctionCollisions and their ResolutionA Rose by Any Other Name…Hashing with Separate ChainingLoad Factor with Separate ChainingSlide 21Alternative Strategy: Closed HashingCollision Resolution by Closed HashingClosed Hashing I: Linear ProbingLinear Probing ExampleDrawbacks of Linear ProbingLoad Factor in Linear ProbingOptimal vs LinearClosed Hashing II: Quadratic ProbingQuadratic Probing ExampleProblem With Quadratic ProbingLoad Factor in Quadratic ProbingClosed Hashing III: Double HashingDouble Hashing ExampleSlide 35Load Factor in Double HashingDeletion with Separate ChainingDeletion in Closed HashingLazy DeletionThe Squished Pigeon PrincipleRehashing ExampleRehashing Amortized AnalysisRehashing without StretchingCase StudySolutionsStorageAnalysisApproximate HashingSlide 49ExampleRough Error CalculationExact Error CalculationA Random Hash…PuzzlerDatabases1CSE 326: Data StructuresPart 5HashingHenry KautzAutumn 20022Midterm•Monday November 4th •Will cover everything through hash tables•No homework due that day, but a study sheet and practice problems on trees and hashing will be distributed•50 minutes, in class•You may bring one page of notes to refer to3Dictionary & Search ADTs•Operations–create–destroy–insert–find–delete•Dictionary: Stores values associated with user-specified keys–keys may be any (homogenous) comparable type–values may be any (homogenous) type–implementation: data field is a struct with two parts•Search ADT: keys = values•kim chi–spicy cabbage•kreplach–tasty stuffed dough•kiwi–Australian fruitinsertfind(kreplach)•kohlrabi - upscale tuber• kreplach - tasty stuffed dough4Implementations So FarunsortedlistsortedarrayTreesBST – averageAVL – worst casesplay – amortizedArray of size n where keys are 0,…,n-1insertfind+(1) (n) (log n)find(n) (log n) (log n)deletefind+(1) (n) (log n)5Hash Tables: Basic Idea•Use a key (arbitrary string or number) to index directly into an array – O(1) time to access records–A[“kreplach”] = “tasty stuffed dough”–Need a hash function to convert the key to an integerKey Data0kim chi spicy cabbage1kreplach tasty stuffed dough2kiwi Australian fruit6Applications•When log(n) is just too big…–Symbol tables in interpreters–Real-time databases (in core or on disk)•air traffic control•packet routing•When associative memory is needed…–Dynamic programming•cache results of previous computationf(x) if ( Find(x) ) then Find(x) else f(x)•Chess endgames–Many text processing applications – e.g. Web$Status{$LastURL} = “visited”;7How could you use hash tables to…•Implement a linked list of unique elements?•Create an index for a book?•Convert a document to a Sparse Boolean Vector (where each index represents a different word)?8Properties of Good Hash Functions•Must return number 0, …, tablesize•Should be efficiently computable – O(1) time•Should not waste space unnecessarily–For every index, there is at least one key that hashes to it–Load factor lambda  = (number of keys / TableSize)•Should minimize collisions= different keys hashing to same index9Integer Keys•Hash(x) = x % TableSize•Good idea to make TableSize prime. Why?10Integer Keys•Hash(x) = x % TableSize•Good idea to make TableSize prime. Why?–Because keys are typically not randomly distributed, but usually have some pattern•mostly even•mostly multiples of 10•in general: mostly multiples of some k–If k is a factor of TableSize, then only (TableSize/k) slots will ever be used!–Since the only factor of a prime number is itself, this phenomena only hurts in the (rare) case where k=TableSize11Strings as Keys•If keys are strings, can get an integer by adding up ASCII values of characters in key for (i=0;i<key.length();i++) hashVal += key.charAt(i);•Problem 1: What if TableSize is 10,000 and all keys are 8 or less characters long? •Problem 2: What if keys often contain the same characters (“abc”, “bca”, etc.)?12Hashing Strings•Basic idea: consider string to be a integer (base 128):Hash(“abc”) = (‘a’*1282 + ‘b’*1281 + ‘c’) % TableSize•Range of hash large, anagrams get different values•Problem: although a char can hold 128 values (8 bits), only a subset of these values are commonly used (26 letters plus some special characters)–So just use a smaller “base” –Hash(“abc”) = (‘a’*322 + ‘b’*321 + ‘c’) % TableSize13Making the String HashEasy to Compute•Horner’s Rule•Advantages:int hash(String s) { h = 0; for (i = s.length() - 1; i >= 0; i--) { h = (s.keyAt(i) + h<<5) % tableSize; } return h; }What is happening here???14How Can You Hash…•A set of values – (name, birthdate) ?•An arbitrary pointer in C?•An arbitrary reference to an object in Java?15How Can You Hash…•A set of values – (name, birthdate) ? (Hash(name) ^ Hash(birthdate))% tablesize•An arbitrary pointer in C? ((int)p) % tablesize•An arbitrary reference to an object in Java? Hash(obj.toString()) or just obj.hashCode() % tablesizeWhat’s this?16Optimal Hash Function•The best hash function would distribute keys as evenly as possible in the hash table•“Simple uniform hashing”–Maps each key to a (fixed) random number–Idealized gold standard–Simple to analyze–Can be closely approximated by best hash functions17Collisions and their Resolution•A collision occurs when two different keys hash to the same value–E.g. For TableSize = 17, the keys 18 and 35 hash to the same value–18 mod 17 = 1 and 35 mod 17 = 1•Cannot store both data records in the same slot in array!•Two different methods for collision resolution:–Separate Chaining: Use a dictionary data structure (such as a linked list) to store multiple items that hash to the same slot–Closed Hashing (or probing): search for empty slots using a second function and store item in first empty slot that is found18A Rose by Any Other Name…•Separate chaining = Open hashing•Closed hashing = Open addressing193210654a de bcHashing with Separate Chaining•Put a little dictionary at


View Full Document

UW CSE 326 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?