DOC PREVIEW
MIT 6 006 - Problem Set 2

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Introduction to Algorithms: 6.006Massachusetts Institute of Technology February 15, 2011Professors Erik Demaine, Piotr Indyk, and Manolis Kellis Problem Set 2Problem Set 2Both theory and programming questions are due Monday, February 28 at 11:59PM.Solutions should be turned in through the course website. Your solutions to questions that provokea written response should be in PDF format and typeset using LATEX˙Your solutions to problemsthat ask you to write code should be valid Python files which run from the command line. Atemplate for writing up solutions in LATEX is available on the course website. Remember, yourgoal is to communicate. Full credit will be given only to a correct solution which is describedclearly. Convoluted and obtuse descriptions might receive low marks, even when they are correct.Also, aim for concise solutions, as it will save you time spent on write-ups, and also help youconceptualize the key idea of the problem.Problem 2-1. [20 points] Hash functions and load(a) [3 points] Imagine that an algorithm requires us to hash strings containing Englishphrases. Knowing that strings are stored as sequences of characters, Alyssa P. Hackerdecides to simply use the sum of those character values (modulo the size of her hashtable) as the string’s hash.Will the performance of her implementation match the expected value shown in lec-ture? Argue that it will or provide a compelling reason why it will not (and propose asolution).(b) [3 points] Imagine that we plan to hash the addresses of small, in-memory data struc-tures. After much optimization (and reading over Alyssa’s shoulder), Ben Bitdiddlemanages to get this data struture to fit in 32 bytes (one cache line of the computershe’s using). He then writes an algorithm that allocates a contiguous block of memorycontaining many of these data structures and inserts some of their addresses into ahash table whose size is some appropriate power of two.Will the performance of his implementation match the expected value shown in lec-ture? Argue that it will or provide a compelling reason why it will not (and propose asolution).(c) [3 points] Ben then decides that he needs to hash unordered sets of plain integers. Hegives implementing this a try, but something seems to be wrong. His code is given insetHash. Explain what the problem is and provide an implementation that is a validhash function producing a plain integer. (You may not use any built-in Python hashfunctions in your solution.)(d) [5 points] The ability to quantitatively measure the performance of a hash functionwill be useful; we’d like to be able to validate our intuition regarding which hashfunctions are good and which ones are not.2 Problem Set 2In hasheval.py, you’ll find a stub called hashEval. It takes a hash function, a“hash table size”, and a collection of values and produce some statistics about whata hash table of the given size using the given function and storing the given valueswould look like.A simple wrapper called printHashEval that pretty-prints your results is included.The six statistics that your implementation of hashEval should return are listedabove the stub; they are (in order) the load factor, the proportion of nonempty slots,the number of values that collide with another value, the mean size of the nonemptyslots, the median size of the nonempty slots, and the number of items in the most-filledslot.You should also implement randInts, which generates a uniformly-distributed se-quence of random integers that we can use to experiment with hashEval.(e) [3 points] A hash function is provided that is based on Python’s own hashing schemefor integers (that is, the hash is the integer itself, modulo table size). Start with a tablesize of 8 and use hashEval to evaluate the provided hash function on a set of 4 valuesgenerated with randInt. Experiment with different table sizes and ratios of slots tovalues. Does this strategy seem to work well? What load factor, approximately, canyou reliably achieve? Provide a few statistics to support your conclusion.(f) [3 points] As you’ve just seen in a practical setting, the simple uniform hashing as-sumption has two parts; it requires that a particular value have an equal chance ofbeing placed in any slot (uniformity) and that the slot a particular value is placed in beunaffected by which slot any other value is placed in (independence). Briefly describea situation in which each is violated but the other holds.Problem 2-2. [25 points] Collision resolution and dynamic resizingIn this problem, we’ll consider two very serious, real-world challenges that face hash table imple-mentations: collision resolution and dynamic resizing. A simple hash table implementation calledFootable is provided in footable.py. As provided, it cannot dynamically resize itself, andit will raise an exception if a collision occurs.(The implementation of Footable uses a list internally, for which lookup is O(n). We’re goingto ignore this–that is, treat it as though it were O(1)–for the purposes of this problem; dictionariesare so central to the Python way of thinking that it’s not otherwise possible to do what we wantwithout using them!)(a) [2 points] If we have a collision resolution strategy, why do we need to considerdynamically resizing our hash table? It sounds like a lot of work, after all—both forus and for our computers!(b) [2 points] What about the other way around? If we’re going to bother dynamicallyresizing our hash table, why do we need a collision resolution strategy? Why don’twe just resize whenever a collision occurs?Problem Set 2 3(c) [3 points] One simple method for resolving collisions is linear probing. Implementthis strategy in a subclass named FootableLinear. Provide in a function namedbadLinear a test case demonstrating a situation in which this strategy is not good,and explain what undesirable behavior it produces and why.(d) [3 points] Another simple method for resolving collisions is chaining. Implement thisstrategy in a sublcass named FootableChaining.(e) [3 points] Implement quadratic probing in a subclass called FootableQuadratic.(f) [3 points] Implement double hashing in a subclass called FootableDouble.(g) [3 points] Compare and constrast, in terms of the problems that they can prevent andthe problems that they can cause, the four strategies you’ve implemented. (Say twothings about each method.)(h) [3 points] Now think about dynamically resizing your hash table. Provide a


View Full Document

MIT 6 006 - Problem Set 2

Documents in this Course
Quiz 1

Quiz 1

7 pages

Quiz 2

Quiz 2

12 pages

Quiz 2

Quiz 2

9 pages

Quiz 1

Quiz 1

10 pages

Quiz 2

Quiz 2

11 pages

Quiz 1

Quiz 1

12 pages

Graphs

Graphs

27 pages

Load more
Download Problem Set 2
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Problem Set 2 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Problem Set 2 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?