DOC PREVIEW
CMU BSC 03711 - Homework

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

03-511/711 Computational Genomics and Molecular Biology, Fall 2010 1Problem Set 2 Due October 5thCollaboration is allowed on this homework. You must hand in homeworks individually and listthe names of the people you worked with.You may not use a program to do this homework. Turn in your handwritten answers on theattached sheets. Extra alignment templates will be available on the website.1. Progressive alignment is a heuristic method that builds up a series of partial multiple align-ments. At each step, alignments from the previous step are merged to yield a larger multiplealignment, until a multiple alignment containing all sequences in the data set is obtained.The most common approach to merging alignments is to align them as though they weresequences over a larger alphabet. For example, a pairwise global alignment yields a sequenceover the 24 letter alphabe t Σ∗× Σ∗− {()} (i.e., {AA, AC, AG, AT, CA, . . .}). Durbin callsthis a profile.Suppose you want to align a a profile s[1, .., m] with a sequence t[1, .., n], where elements ins[i] are of the form x y, xor y, and elements in t[i] are of the form z, where x, y, z ∈ Σ.(a) Write down the recurrence relation for calculating the alignment matrix a[i, j], for thecase where s[i] is of the form x y, in terms of p(·, ·), g, x, y and t[·].(b) Write down the recurrence relation for calculating a[i, j], for the c ase where s[i] is of theform xor y.03-511/711 Computational Genomics and Molecular Biology, Fall 2010 22. Consider the following multiple sequence ali gnment:A: I R E L S M F N I D MB: L K D M S V W Q I E LC: L N E L C V W Q V E VD: V K D L S L F H V Q V(a) Determine the parsimony score each of the three unrooted topologies for the taxa A, B,C and D, by summing the parsimony scores for each column. Which one(s) are mostparsimonious?(b) How many sites are informative ?(c) Compute the pairwise edit distance for all six pairs in the data set to obtain a pairwisedistance matrix.03-511/711 Computational Genomics and Molecular Biology, Fall 2010 3(d) Does your matrix fit a tree? How do you know?(e) Are all sequences in this data set changing at the same rate? How do you know?(f) Which of the three unrooted topologies with four leaves is preferred by this distancematrix? (Hint: to find just the preferred topology, without inferring the branch lengths,you do not need to apply an algorithm.)(g) Based on comparing your answers for (a) and (b) with your answer in (f), which approachis better for this data set: distances or parsimony? Why?03-511/711 Computational Genomics and Molecular Biology, Fall 2010 43. Consider a tree enumeration scheme for generating all unrooted trees with k leaves thatiterates from i = 4 to i = k. At the i’th step, it generates trees with i leaves by attachingthe ith taxon to each tree with i − 1 leaves at each possible edge. The figure below shows theenumeration process for i = 3, 4 and 5 for a tree with five taxa: A, B, C, D, E.03-511/711 Computational Genomics and Molecular Biology, Fall 2010 5(a) You wish to find all topologies with minimum parsimony score, given the data A = “T”,B = “G”, C = “T”, D = “G”, E = “G”. As a first step, give the parsimony score forthe trees with three leaves and four leaves. Show the internal labels and the edges onwhich change occurs. (There may be more than one set of optimal labels. It is sufficientto show one.)(b) Suppose you have a topology at step i − 1 with score s and you attach the ith taxon toone of its edges. What is the possible range of scores, in terms of s, that the resultingtree may have?(c) Based on your labels of trees of size i = 4, what is the minimum number of topologiesof size 5 must you score in order to ensure that you have found all most parsimonioustrees of size 5?(d) Show the set of all most parsimonious topologies with 5 leaves.03-511/711 Computational Genomics and Molecular Biology, Fall 2010 64. This problem illustrates the strengths and weaknesses of different approaches to tree recon-struction. Remember that while in this case you know the tree from which the distancematrix was derived, when analyzing real data, you will only have the matrix, not the tree.(a) Compute the distance matrix for the following rooted tree:912622126RabbitArmadilloMouseHumanFrog03-511/711 Computational Genomics and Molecular Biology, Fall 2010 7(b) UPGMA and Neighbor Joining (NJ) are greedy algorithms that build trees by iterativelyidentifying neighboring nodes and merging them to form a new subtree. How does theUPGMA algorithm select the taxa that will be neighbors in the next iteration?(c) Working only from the distance matrix (forget you know the tree), which nodes willUPGMA select as the first pair of neighbors? Why?(d) Draw the first subtree with an intermediate node x linking the two neighbors. What arethe branch lengths in this subtree?03-511/711 Computational Genomics and Molecular Biology, Fall 2010 8(e) How does the NJ algorithm select the taxa that will be neighbors in the next iteration?(f) Working only from the distance matrix, which nodes will Neighbor Joining select as thefirst pair of neighbors? Show your calculations for the first iteration of the algorithm.(g) Draw the first subtree constructed by the Neighbor Joining algorithm. What are thebranch lengths in this subtree?(h) Which algorithm is a better choice for reconstructing this tree? Why?(i) How would you determine the root of the reconstructed tree? Describe the procedure indetail.03-511/711 Computational Genomics and Molecular Biology, Fall 2010 95. There are three rooted trees with three leaves:BACCB AA B CFor each of these trees, we can build a tree with four leaves, by adding one more edge at fivedifferent locations:A B CDDA CBDA CBA B CDA B CDA B C03-511/711 Computational Genomics and Molecular Biology, Fall 2010 10(a) Give Ek, the number of edges in a rooted tree with k leaves, and Tk, the number ofrooted tress with k leaves, for k = 3, k = 4 and k = 5.(b) Write down the expression for Ekin terms of k.(c) Write down the recurrence relation for the number of rooted trees with k leaves in termsof Ek−1and Tk−1(d) Write down the expression for Tkin terms of k; i.e., solve your recurrence rel


View Full Document

CMU BSC 03711 - Homework

Documents in this Course
lecture

lecture

8 pages

Lecture

Lecture

3 pages

Homework

Homework

10 pages

Lecture

Lecture

17 pages

Delsuc05

Delsuc05

15 pages

hmwk1

hmwk1

2 pages

lecture

lecture

6 pages

Lecture

Lecture

10 pages

barnacle4

barnacle4

15 pages

review

review

10 pages

Homework

Homework

10 pages

Midterm

Midterm

12 pages

lecture

lecture

11 pages

lecture

lecture

32 pages

Lecture

Lecture

7 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

Lecture

Lecture

21 pages

Lecture

Lecture

11 pages

Lecture

Lecture

28 pages

Homework

Homework

13 pages

Logistics

Logistics

11 pages

lecture

lecture

11 pages

Lecture

Lecture

8 pages

Lecture

Lecture

9 pages

lecture

lecture

8 pages

Problem

Problem

6 pages

Lecture

Lecture

9 pages

Problem

Problem

7 pages

hmwk4

hmwk4

7 pages

Problem

Problem

6 pages

lecture

lecture

16 pages

Problem

Problem

8 pages

Problem

Problem

6 pages

Problem

Problem

13 pages

lecture

lecture

9 pages

Problem

Problem

11 pages

Notes

Notes

7 pages

Lecture

Lecture

7 pages

Lecture

Lecture

10 pages

Lecture

Lecture

9 pages

Homework

Homework

15 pages

Lecture

Lecture

16 pages

Problem

Problem

15 pages

Load more
Download Homework
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Homework and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Homework 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?