DOC PREVIEW
MIT 6 006 - Problem Set 2

This preview shows page 1-2 out of 5 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Introduction to Algorithms: 6.006Massachusetts Institute of Technology February 17rd, 2009Professors Sivan Toledo and Alan Edelman Handout 3Problem Set 2This problem set is divided into two parts: Part A problems are programming tasks, andPart B problems are theory questions.Part A questions are due Tuesday, March 3rd at 11:59PM.Part B questions are due Thursday, March 5th at 11:59PM.Solutions should be turned in through the course website in PDF form using LATEX orscanned handwritten solutions.A template for writing up solutions in LATEX is available on the course website.Remember, your goal is to communicate. Full credit will be given only to the correctsolution which is described clearly. Convoluted and obtuse descriptions might receivelow marks, even when they are correct. Also, aim for concise solutions, as it will save youtime spent on write-ups, and also help you conceptualize the key idea of the problem.Exercises are for extra practice and should not be turned in.Exercises:• CLRS 11.2-1 (page 228)• CLRS 11.2-2 (page 229)• CLRS 11.3-1 (page 236)• CLRS 11.3-3 (page 236)• Prove that red-black trees are balanced, i.e., if a red-black tree contains n nodes, thenits height is O(log n). Red-black trees are binary search trees satisfying the followingproperties:When a node does not have a left (or right) child, we say it has a NIL pointer instead.1. Each node is augmented with a bit signifying whether the node is red or black.2. If a node is red, then both of its children are black.3. The paths from the root to any NIL contain the same number of black nodes.2 Handout 3: Problem Set 2Part A: Due Tuesday, March 3rd1. (50 points) Longest Common SubstringHumans have 23 pairs of chromosomes, while other primates like chimpanzeeshave 24 pairs. Biologists claim that human chromosome #2 is a fusion of two pri-mate chromosomes that they call 2a and 2b. We wish to verify this claim by locatinglong nucleotide chains shared between the human and primate chromosomes.We define the longest common substring of two strings to be the longest contiguousstring that is a substring of both strings e.g. the longest common substring of DEAD-BEEF and EA7BEEF is BEEF.1If there is a tie for longest common substring, we justwant to find one of them.Download ps2-dna.zip from the class website. The problem set FAQ section of thecourse web site contains additional hints for this problem.(a) (2 points)Ben Bitdiddle wrote substring1.py. What is the asymptotic running time ofhis code? Assume |s| = |t| = n.(b) (2 points)Alyssa P Hacker realized that by only comparing substrings of the same length,and by saving substrings in a hash table (in this case, a Python set), she couldvastly speed up Ben’s code.Alyssa wrote substring2.py. What is the asymptotic running time of hercode?(c) (12 points) Recall binary search from Problem Set 1. Using binary search onthe length of the string, implement an O(n2log n) solution. You should be ableto copy Alyssa’s k substring code without changing it, and just rewrite theouter loop longest substring.Check that your code is faster than substring2.py for chr2 first 10000and chr2a first 10000.Put your solution in substring3.py, and submit it to the class website.(d) (30 points)Rabin-Karp string searching is traditionally used to search for a particular sub-string in a large string. This is done by first hashing the substring, and thenusing a rolling hash to quickly compute the hashes of all the substrings of thesame length in the large string.For this problem, we have two large strings, so we can use a rolling hashon both of them. Using this method, implement an O(n log n) solution for1Do not confuse this with the longest common subsequence, in which the characters do not need to becontiguous. The longest common subsequence of DEADBEEF and EA7BEEF is EABEEF.Handout 3: Problem Set 2 3longest substring. You should be able to copy over your outer looplongest substring from part (c) without changing it, and just rewritek substring.Your code should work given any two Python strings (see test-substring.pyfor examples). The comparison should be case-sensitive. We recommend usingthe ord function to convert a character to its ascii value.Check that your code is faster than substring3.py for chr2 first 10000and chr2a first 10000.Put your solution in substring4.py, and submit it to the class website.Remember to thoroughly comment your code, including an explanation of anyparameters chosen for the hash function, and what you do about collisions.(e) (4 points)The human chromosome 2 and the chimp chromosomes 2a and 2b are quitelarge (over 100,000,000 nucleotides each) so we took the first and last millionnucleotides of each chromosome and put them in separate files.chr2 first 1000000 contains the first million nucleotides of human chro-mosome 2, and chr2a first 1000000 contains the first million nucleotidesof chimpanzee chromosome 2a. Note: these files contain both uppercase andlowercase letters that are used by biologists to distinguish between parts of thechromosomes called introns and extrons.Run substring4.py on the following DNA pairs, and submit the lengths ofthe substrings.Warning: This part may take a while depending on your implementation of theRabin-Karp rolling hash. (Leave more than an hour for this part):chr2 first 1000000 and chr2a first 1000000chr2 first 1000000 and chr2b first 1000000chr2 last 1000000 and chr2a last 1000000chr2 last 1000000 and chr2b last 1000000If your code works, and biologists are correct, then the first million codons ofchr2 and chr2a should have much longer substrings in common than the firstmillion codons of chr2 and chr2b. The opposite should be true for the lastmillion codons.(f) Optional: Make your code run in O(n log k) time, where k is the length of thelongest common substring.4 Handout 3: Problem Set 2Part B: Due Thursday, March 5th1. (15 points) Building a Balanced Search Tree from a Sorted ListYou are given a sorted Python list containing n numbers.(a) (5 points) Show how to construct a binary search tree containing the samenumbers. The tree should be roughly balanced (its height should be O(log n))and the running time of your algorithm should be O(n).(b) (5 points) Argue that your algorithm returns a tree of height O(log n). Note: Itis probably easier to prove an absolute bound (such as 1 + log n or 2 log n) thanto use asymptotic notation in the argument.(c) (5 points) Argue that the algorithm runs in O(n) time (here it is hard to


View Full Document

MIT 6 006 - Problem Set 2

Documents in this Course
Quiz 1

Quiz 1

7 pages

Quiz 2

Quiz 2

12 pages

Quiz 2

Quiz 2

9 pages

Quiz 1

Quiz 1

10 pages

Quiz 2

Quiz 2

11 pages

Quiz 1

Quiz 1

12 pages

Graphs

Graphs

27 pages

Load more
Download Problem Set 2
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Problem Set 2 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Problem Set 2 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?