U-M EECS 281 - Lecture Note

Announcements
• HW1 PAST DUE
• HW2 online: 7 questions, 60 points
• Nat'l Inst visit Thu, ok?

Last time:
• Continued PA1 walk-through
• Dictionary ADT: unsorted
• Hashing

Today:
• Finish up hashing
• Sorted Dictionary ADT: binary search, divide-and-conquer
• Recursive functions and recurrence relations

Hash Table and Hashing

Reference items in a table (the hash table) by keys.
Use arithmetic operations to transform keys into table addresses (buckets).

Two issues in hashing:
1. construction of the hash function
2. policy for collision resolution

Components of Hash Functions

Hash function h(): transforms the search key into a bucket index
• hash code: t(key) → intmap
  maps the key (which could be a string, etc.) into an integer (the intmap)
• compression map: c(intmap) → bucket
  maps the intmap into an address within the table, i.e., into the range [0, M − 1] for a table of size M

Given a key: h(key) = c(t(key)) → bucket/index

Hash Functions

Common parts of a hash function:
• truncation (hash code): remove parts common across keys, or extract bits from keys that are more likely to be unique across keys. Examples:
  o 734 763 1583
  o 141.213.8.193
  o cello.eecs.umich.edu
• folding (hash code): partition the key into several parts and combine them in a "convenient" way
• modulo arithmetic (compression map)

Example of Folding: Polynomial Hash Code

A polynomial hash code t takes positional information into account:
t(x[]) = x[0]·a^(k−1) + x[1]·a^(k−2) + ... + x[k−2]·a + x[k−1]

If a = 33, the intmaps are:
• t("tops") =
• t("pots") =

Good choices of a for English words: {33, 37, 39, 41}
What does it mean for a to be a good choice?
Why are these particular values good?
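A minimal C++ sketch of this polynomial hash code, evaluated with Horner's rule; the name polyHash, the 64-bit unsigned accumulator, and the separate division-method compression map are illustrative choices, not from the slides or Preiss:

    #include <cstddef>
    #include <cstdint>
    #include <string>

    // Polynomial hash code with multiplier a (e.g., 33):
    //   t(x) = x[0]*a^(k-1) + x[1]*a^(k-2) + ... + x[k-1]
    // Evaluated with Horner's rule; unsigned arithmetic wraps mod 2^64,
    // which is fine because a compression map is applied afterward.
    uint64_t polyHash(const std::string& key, uint64_t a = 33) {
        uint64_t t = 0;
        for (char c : key) {
            t = t * a + static_cast<unsigned char>(c);
        }
        return t;
    }

    // Compression map (division method): bucket index in [0, M-1], M ideally prime.
    std::size_t compress(uint64_t intmap, std::size_t M) {
        return static_cast<std::size_t>(intmap % M);
    }

    // Note: polyHash("tops") and polyHash("pots") differ, because the positional
    // weighting distinguishes permutations of the same letters.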
Compression Mapping
• division method: |intmap| mod M, with M the table size (prime)
• MAD (multiply, add, and divide) method: |a·intmap + b| mod M, with a and b non-negative integers

Hashing: Collision Resolution

With hashing, collisions are inevitable.
What do you do when two items hash into the same bucket/bin?

Methods of collision resolution:
• separate chaining
• scatter table
  o coalesced chaining
  o open addressing:
    - linear probing
    - quadratic probing
    - double hashing
• extensible hashing: dynamic hashing

Collision Resolution

Separate chaining:
• let each bin point to a linked list of elements (see Fig. 8.3; a C++ sketch appears after this slide)

Coalesced chaining:
• store the linked lists in the unused portion of the table
• if an item hashes to an already occupied bin, follow the linked list and add the item at the end (see Fig. 8.4)
• the hash table can hold only as many items as the table size

Open addressing: if there is a collision, apply another hash function repeatedly until there is no collision. Set of hash functions: {h_0, h_1, h_2, ...}
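A minimal separate-chaining sketch, assuming std::hash as the hash code and the division method as the compression map; the class and member names (ChainedHashTable, insert, find) are illustrative, not from Preiss:

    #include <cstddef>
    #include <functional>
    #include <list>
    #include <string>
    #include <vector>

    // Each bucket holds a linked list of the entries that hashed to it.
    // N may exceed M; chains simply grow (load factor can be > 1).
    class ChainedHashTable {
    public:
        explicit ChainedHashTable(std::size_t M) : buckets_(M) {}

        void insert(const std::string& key, int value) {
            buckets_[bucket(key)].push_back(Entry{key, value});
        }

        // Returns a pointer to the stored value, or nullptr if key is absent.
        int* find(const std::string& key) {
            for (Entry& e : buckets_[bucket(key)])
                if (e.key == key) return &e.value;
            return nullptr;
        }

    private:
        struct Entry { std::string key; int value; };

        // hash code (std::hash) followed by the division-method compression map
        std::size_t bucket(const std::string& key) const {
            return std::hash<std::string>{}(key) % buckets_.size();
        }

        std::vector<std::list<Entry>> buckets_;
    };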
Open Addressing

To probe ::= to look at an entry and compare its key with the target

Linear probing: h_i(key) = (h_0(key) + i) mod M
• do a linear search from h_0(key) until you find an empty slot (see Fig. 8.6, compare against Fig. 8.4)

Clustering: when contiguous bins are all occupied
Why is clustering undesirable?
Assuming input size N and table size 2N:
• What is the best-case cluster distribution?
• What is the worst-case cluster distribution?
• What is the average time to find an empty slot in both cases?

Open Addressing (cont.)

Quadratic probing: h_i(key) = (h_0(key) + i^2) mod M
• less likely to form clusters, but only works while the table is less than half full, because it cannot hit every possible table address

Double hashing: h_i(key) = (h(key) + i·h_0(key)) mod M
• uses 2 distinct hash functions

Removal on scatter tables is complicated: we don't want to move an element up the table beyond its actual hash location, so we must either rehash the rest of the chain/cluster or mark the deleted entry as "deleted" (see Fig. 8.5).
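To make the three probe sequences above concrete, a small C++ sketch that computes only the i-th probe index; the function names and the start/step parameters are illustrative, and a real scatter table would wrap these in a search/insert loop:

    #include <cstddef>
    #include <cstdint>

    // i-th probe index for a table of size M. 'start' is the initial bucket
    // from the first hash of the key; 'step' (double hashing only) comes from
    // a second, distinct hash of the same key and must be nonzero.

    std::size_t linearProbe(std::uint64_t start, std::uint64_t i, std::size_t M) {
        return static_cast<std::size_t>((start + i) % M);        // (h_0 + i) mod M
    }

    std::size_t quadraticProbe(std::uint64_t start, std::uint64_t i, std::size_t M) {
        return static_cast<std::size_t>((start + i * i) % M);    // (h_0 + i^2) mod M
    }

    std::size_t doubleHashProbe(std::uint64_t start, std::uint64_t step,
                                std::uint64_t i, std::size_t M) {
        return static_cast<std::size_t>((start + i * step) % M); // (start + i*step) mod M
    }

    // A search/insert loop tries i = 0, 1, 2, ... until it finds an empty slot
    // (insert) or the matching key (search), giving up after M failed probes.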
Timing Analysis

Differentiate between successful (S()) and unsuccessful (U()) search.
Example: passwd lookup wants success to be quick; failure can be (is desirable to be) slow.

What is the time complexity of search for separate chaining?
Worst case:
• S() =
• U() =

Average Case: Load Factor

Load factor: λ = N/M, where N = number of items and M = hash table size
• empty table: λ = 0
• half full: λ = 0.5
• full: λ = 1
• can λ > 1?

If the hash function is good, the average chain length in separate chaining is λ = N/M (the number of comparisons is reduced by a factor of M when using separate chaining).

Given λ,
• average unsuccessful search time: U(λ) = 1 + λ
• average successful search time: S(λ) = 1 + λ/2

Average Case Search Times (FYI)

Coalesced chaining:
• U(λ) = 1 + (1/4)(e^(2λ) − 1 − 2λ); for λ = 1, U(1) ≈ 2.1
• S(λ) = 1 + (1/(8λ))(e^(2λ) − 1 − 2λ) + λ/4; for λ = 1, S(1) ≈ 1.8

Linear probing:
• U(λ) = (1/2)(1 + (1/(1 − λ))^2)
• S(λ) = (1/2)(1 + 1/(1 − λ))

Quadratic probing and double hashing:
• U(λ) = 1/(1 − λ)
• S(λ) = (1/λ) log(1/(1 − λ))

Hashing Search Time

Separate chaining: search time increases gradually as the table fills.
Scatter table: search time increases dramatically as the table fills; when the table is full, no more keys may be inserted.

Extensible hashing (dynamic hashing):
• double the table size when it is half full
• old entries must be re-hashed into the larger table
• this is an expensive, but hopefully infrequent, operation
• the doubling cost is amortized over individual searches/inserts

When NOT to Use Hashing

• Rank search: return the k-th largest item
• Range search: return values between h and k
• Sort: return the values in order

Readings: done with Preiss Ch. 8

Dictionary ADT

Table lookup: look up a key in a table.
The key is associated with some other information (the value).
The key space is usually more structured/regular than the value space, so it is easier to search.
An entry is a <key, value> pair, for example <title, mp3 file>.
Usually the ADT stores pointers to entries.

Runtime

Unsorted list:
• insert: O()
• search: O()

Sorted list:
• insert: O()
• search: O()

Binary Search: Iterative Version

Given a sorted list with unique keys, perform a divide-and-conquer algorithm to find a key.
Find 31 in a[] = [20 27 29 31 35 38 42 53 59 63 67 78]

Write an iterative version of binary search:
ibinsearch(int *a, int n, int key), a[] sorted, n the array size, ...
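The slide leaves the body of ibinsearch as an exercise; the following is one possible iterative version, assuming the function returns the index of key in a[0..n-1], or -1 when the key is absent:

    // Iterative binary search over a sorted array a[0..n-1] with unique keys.
    // Returns the index of key, or -1 if key is not present.
    int ibinsearch(int *a, int n, int key) {
        int lo = 0;
        int hi = n - 1;
        while (lo <= hi) {                  // search range [lo, hi] is non-empty
            int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
            if (a[mid] == key)
                return mid;                 // found
            else if (a[mid] < key)
                lo = mid + 1;               // key, if present, is in the right half
            else
                hi = mid - 1;               // key, if present, is in the left half
        }
        return -1;                          // key not in a[]
    }

    // The slide's example, finding 31 in {20,27,29,31,35,38,42,53,59,63,67,78},
    // probes index 5 (38), then 2 (29), then 3 (31) and returns 3.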