Unformatted text preview:

Introduction to Algorithms 6 006 Massachusetts Institute of Technology Professors Erik Demaine Piotr Indyk and Manolis Kellis February 15 2011 Problem Set 2 Problem Set 2 Both theory and programming questions are due Monday February 28 at 11 59PM Solutions should be turned in through the course website Your solutions to questions that provoke a written response should be in PDF format and typeset using LATEXY our solutions to problems that ask you to write code should be valid Python files which run from the command line A template for writing up solutions in LATEX is available on the course website Remember your goal is to communicate Full credit will be given only to a correct solution which is described clearly Convoluted and obtuse descriptions might receive low marks even when they are correct Also aim for concise solutions as it will save you time spent on write ups and also help you conceptualize the key idea of the problem Problem 2 1 20 points Hash functions and load a 3 points Imagine that an algorithm requires us to hash strings containing English phrases Knowing that strings are stored as sequences of characters Alyssa P Hacker decides to simply use the sum of those character values modulo the size of her hash table as the string s hash Will the performance of her implementation match the expected value shown in lecture Argue that it will or provide a compelling reason why it will not and propose a solution b 3 points Imagine that we plan to hash the addresses of small in memory data structures After much optimization and reading over Alyssa s shoulder Ben Bitdiddle manages to get this data struture to fit in 32 bytes one cache line of the computer she s using He then writes an algorithm that allocates a contiguous block of memory containing many of these data structures and inserts some of their addresses into a hash table whose size is some appropriate power of two Will the performance of his implementation match the expected value shown in lecture Argue that it will or provide a compelling reason why it will not and propose a solution c 3 points Ben then decides that he needs to hash unordered sets of plain integers He gives implementing this a try but something seems to be wrong His code is given in setHash Explain what the problem is and provide an implementation that is a valid hash function producing a plain integer You may not use any built in Python hash functions in your solution d 5 points The ability to quantitatively measure the performance of a hash function will be useful we d like to be able to validate our intuition regarding which hash functions are good and which ones are not 2 Problem Set 2 In hasheval py you ll find a stub called hashEval It takes a hash function a hash table size and a collection of values and produce some statistics about what a hash table of the given size using the given function and storing the given values would look like A simple wrapper called printHashEval that pretty prints your results is included The six statistics that your implementation of hashEval should return are listed above the stub they are in order the load factor the proportion of nonempty slots the number of values that collide with another value the mean size of the nonempty slots the median size of the nonempty slots and the number of items in the most filled slot You should also implement randInts which generates a uniformly distributed sequence of random integers that we can use to experiment with hashEval e 3 points A hash function is provided that is based on Python s own hashing scheme for integers that is the hash is the integer itself modulo table size Start with a table size of 8 and use hashEval to evaluate the provided hash function on a set of 4 values generated with randInt Experiment with different table sizes and ratios of slots to values Does this strategy seem to work well What load factor approximately can you reliably achieve Provide a few statistics to support your conclusion f 3 points As you ve just seen in a practical setting the simple uniform hashing assumption has two parts it requires that a particular value have an equal chance of being placed in any slot uniformity and that the slot a particular value is placed in be unaffected by which slot any other value is placed in independence Briefly describe a situation in which each is violated but the other holds Problem 2 2 25 points Collision resolution and dynamic resizing In this problem we ll consider two very serious real world challenges that face hash table implementations collision resolution and dynamic resizing A simple hash table implementation called Footable is provided in footable py As provided it cannot dynamically resize itself and it will raise an exception if a collision occurs The implementation of Footable uses a list internally for which lookup is O n We re going to ignore this that is treat it as though it were O 1 for the purposes of this problem dictionaries are so central to the Python way of thinking that it s not otherwise possible to do what we want without using them a 2 points If we have a collision resolution strategy why do we need to consider dynamically resizing our hash table It sounds like a lot of work after all both for us and for our computers b 2 points What about the other way around If we re going to bother dynamically resizing our hash table why do we need a collision resolution strategy Why don t we just resize whenever a collision occurs Problem Set 2 3 c 3 points One simple method for resolving collisions is linear probing Implement this strategy in a subclass named FootableLinear Provide in a function named badLinear a test case demonstrating a situation in which this strategy is not good and explain what undesirable behavior it produces and why d 3 points Another simple method for resolving collisions is chaining Implement this strategy in a sublcass named FootableChaining e 3 points Implement quadratic probing in a subclass called FootableQuadratic f 3 points Implement double hashing in a subclass called FootableDouble g 3 points Compare and constrast in terms of the problems that they can prevent and the problems that they can cause the four strategies you ve implemented Say two things about each method h 3 points Now think about dynamically resizing your hash table Provide a description of an algorithm that would turn a FooTable of size n into a FooTable of size n m What is the asymptotic time complexity of your algorithm i 3


View Full Document

MIT 6 006 - Problem Set 2

Documents in this Course
Quiz 1

Quiz 1

7 pages

Quiz 2

Quiz 2

12 pages

Quiz 2

Quiz 2

9 pages

Quiz 1

Quiz 1

10 pages

Quiz 2

Quiz 2

11 pages

Quiz 1

Quiz 1

12 pages

Graphs

Graphs

27 pages

Load more
Loading Unlocking...
Login

Join to view Problem Set 2 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Problem Set 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?