MIT 6.006 - Resizing Hash Tables
6.006 Intro to Algorithms Recitation 07 February 25, 2011

Resizing Hash Tables

Hash tables perform well if the number of elements in the table remains proportional to the size of the table. If we knew exactly how many inserts/deletes were going to be performed on a table, we could set the table size appropriately at initialization. However, it is often the case that we won't know what series of operations will be performed on a table. We need a strategy to deal with a varying number of elements in the hash table while preserving average O(1) access, insertion, and removal operations.

To keep the load balance from getting too large (slow search, insert, delete) or too small (wasted memory), we will increase the size of the hash table if it gets too full and decrease the size of the hash table if it gets too empty.

Resizing a hash table consists of choosing a new hash function to map to the new size, creating a hash table of the new size, iterating through the elements of the old table, and inserting them into the new table.

Consider a hash table that resolves collisions using the chaining method. We will double the size of the hash table whenever we make an insert operation that results in the load balance exceeding 1, i.e. n > m. We will halve the size of the hash table whenever we make a delete operation that results in the load balance falling beneath 1/4, i.e. n < m/4. In the next sections, we will analyze this approach and show that the average runtime of each insertion and deletion is still O(1), even factoring in the time it takes to resize the table.

Increasing Table Size

After doubling the table size due to an insert, n = m/2 and the load balance is 1/2. We will need at least m/2 insert operations before the next time we double the size of the hash table.
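Before continuing the analysis, the doubling/halving policy just described can be sketched in code. This is a minimal illustration of a chaining table that doubles when n > m and halves when n < m/4 (class and method names here are my own, not from the recitation):

```python
class ResizingHashTable:
    def __init__(self, size=8):
        self.size = size                      # m, the number of slots
        self.n = 0                            # number of stored elements
        self.slots = [[] for _ in range(size)]

    def _hash(self, key):
        # The new hash function after a resize is implied by the new size.
        return hash(key) % self.size

    def _resize(self, new_size):
        # Choose the new size, build a fresh table, and re-insert everything.
        items = [kv for chain in self.slots for kv in chain]
        self.size = new_size
        self.slots = [[] for _ in range(new_size)]
        for key, value in items:
            self.slots[self._hash(key)].append((key, value))

    def insert(self, key, value):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)       # overwrite an existing key
                return
        chain.append((key, value))
        self.n += 1
        if self.n > self.size:                # load balance exceeds 1: double
            self._resize(2 * self.size)

    def delete(self, key):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain.pop(i)
                self.n -= 1
                # Load balance fell below 1/4: halve (keeping a minimum size).
                if self.size > 8 and self.n < self.size // 4:
                    self._resize(self.size // 2)
                return
        raise KeyError(key)

    def search(self, key):
        for k, value in self.slots[self._hash(key)]:
            if k == key:
                return value
        raise KeyError(key)
```

Note that every resize rebuilds the whole table, which is exactly the O(2m) (or O(m/2)) cost charged in the analysis below.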
The next resizing will take O(2m) time, as that's how long it takes to create a table of size 2m.

On average, since the number of elements is proportional to the size of the table at all times, each of the m/2 inserts before resizing will still take O(1) time. The last insert will take O(2m) time, as we need to factor in the time it takes to resize the table. We can use amortized analysis to argue that the average runtime of all the insertions is O(1). The last insert before resizing costs O(2m) time, but we needed m/2 inserts before actually paying that cost. We can imagine spreading the O(2m) cost across those m/2 inserts evenly, which adds an additional average amortized cost of O(2m / (m/2)), or O(1), per insert. Since the cost of insertion before was O(1), adding an additional O(1) amortized cost to each insert doesn't affect the asymptotic runtime, and insertions still take O(1) time on average.

Decreasing Table Size

Similarly, after halving the table size due to a deletion, n = m/2. We will need at least m/4 delete operations before the next time we halve the size of the hash table. The cost of the next halving is O(m/2) to make a table of size m/2.

The m/4 deletes take O(1) time each, and the resizing cost of O(m/2) can be split evenly across those m/4 deletes. Each deletion has an additional average amortized cost of O((m/2) / (m/4)), or O(1). This results in maintaining the O(1) average cost per deletion.

Performance of Open Addressing

Recall that searching, inserting, and deleting an element using open addressing requires a probe sequence (e.g. linear probing, quadratic probing, double hashing). To analyze the performance of operations in open addressing, we must determine on average how many probes it takes before we can execute the operation. Before, we made the simple uniform hashing assumption (SUHA), which meant a hash function mapped to any slot from 0 to m − 1 with equal probability.
Now, we make the uniform hashing assumption (UHA), which is a slight extension of SUHA. UHA assumes that the probe sequence is a random permutation of the slots 0 to m − 1. In other words, each probe looks like we're examining a random slot that we haven't examined before.

If the table has load balance α, there is a p = 1 − α probability that the first probe will find an empty slot under UHA. If the first probe is a collision, note that the probability that the second probe will find an empty slot is greater than p, since there are an equal number of empty slots that we could insert into, but we're choosing randomly from a pool of fewer slots. In general, after each collision, there is a probability of at least p that we will probe into an empty slot.

Using principles of probability, if there is exactly a probability p that we will find an empty slot at each probe, then we expect to probe 1/p times before we succeed. For example, if p = 1/4, we expect to probe 4 times before we find an empty slot. Since in our case the probability of success actually increases after each probe, 1/p is an overestimate of how many times we probe before we succeed. Since p = 1 − α, we expect to probe at most 1/(1 − α) times. Looking at the behavior of the 1/(1 − α) graph, it is clear that with open addressing, performance is fairly good until α approaches too close to 1.

Universal Hashing

With a fixed hash function, an adversary could select a series of keys to insert into the hash table that all collide, giving the hash table worst-case performance. Universal hashing is the idea that we select the hash function randomly from a group of hash functions. This means an adversary can no longer choose keys that he knows will give worst-case performance, since the adversary doesn't even know what hash function will be chosen for the table.
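The 1/(1 − α) probe bound derived above can be checked empirically. The sketch below (my own experiment, not from the recitation) models UHA directly by drawing a fresh random permutation of the slots for each probe sequence and counting probes until an empty slot is found:

```python
import random

def average_probes(m, alpha, trials=3000, seed=6006):
    """Average number of probes to find an empty slot at load balance alpha,
    assuming UHA: each probe sequence is a random permutation of the m slots."""
    rng = random.Random(seed)
    occupied = [False] * m
    for i in rng.sample(range(m), int(alpha * m)):
        occupied[i] = True                    # fill a fraction alpha of slots
    total = 0
    for _ in range(trials):
        for probes, slot in enumerate(rng.sample(range(m), m), start=1):
            if not occupied[slot]:            # stop at the first empty slot
                break
        total += probes
    return total / trials

# At alpha = 0.75 the bound predicts at most 1 / (1 - 0.75) = 4 probes;
# the simulated average comes out close to (and slightly under) that bound.
avg = average_probes(m=500, alpha=0.75)
```

As the analysis predicts, the observed average sits just below 1/(1 − α), because the success probability improves after each collision.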
If we form the group of hash functions carefully, we can ensure that the expected time for each operation is O(1), even if there is an adversary who is trying to achieve worst-case performance.

For universal hashing to work, the group of hash functions H must be universal. This means that for each pair of distinct keys k, l in the universe of keys, the number of hash functions in the group for which h(k) = h(l) is at most |H|/m. In other words, for each pair of distinct keys, the chance of picking a hash function in which they collide is at most 1/m, which is the same probability given by the simple uniform hashing assumption.
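The recitation doesn't name a concrete universal group, but a standard example (an addition here, not from the handout) is the family h_{a,b}(k) = ((a·k + b) mod p) mod m, where p is a fixed prime larger than every key and a, b are chosen at random:

```python
import random

# p must be a prime larger than any key we will hash;
# here we use the Mersenne prime 2^31 - 1, covering all 31-bit keys.
P = 2_147_483_647

def make_universal_hash(m, rng=random):
    """Pick one hash function at random from the universal family
    h_{a,b}(k) = ((a*k + b) mod P) mod m."""
    a = rng.randrange(1, P)                   # a drawn from {1, ..., p-1}
    b = rng.randrange(0, P)                   # b drawn from {0, ..., p-1}
    return lambda k: ((a * k + b) % P) % m

# The table picks its hash function at startup; an adversary choosing
# keys in advance cannot know which (a, b) was drawn.
h = make_universal_hash(m=16)
```

For any two distinct keys, the fraction of (a, b) choices that make them collide is at most about 1/m, which is exactly the universality condition above.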

