03-511/711 Computational Genomics and Molecular Biology, Fall 2002

Problem Set 4

Collaboration is allowed on this homework. You must hand in homeworks individually and list the names of the people you worked with.

Due in class on Tuesday, December 3rd

1. Under the maximum parsimony criterion, we say a column, or site, in a multiple sequence alignment is informative if it favors one tree topology over another. If the parsimony score at a given site in the alignment is the same for all topologies, then the site is uninformative.

   (a) For each site in the following alignment,

            1 2 3 4 5 6 7 8 9
       X.   A A G A G T G C A
       Y.   A G C C G T C C C
       Z.   A G A T A T C C A
       W.   A G A G A T G C C

       state
         i. if it is an informative site
        ii. if so, which of the tree topologies, (XY,ZW), (XZ,YW) or (XW,YZ), does it favor?
       iii. if not, what is the parsimony score for this site?

   (b) Show the most parsimonious tree(s).

   (c) What is the maximum parsimony score for this data set?

2. (a) What is the parsimony score of the following tree? Show your work.

       [Tree figure not recoverable from the extracted text; tip labels, in order of appearance: T, G, C, A, A, C, A, G]

   (b) If you do not compute ancestral states, what is the complexity, in terms of the number of taxa, k, of the algorithm required to compute the maximum parsimony score?

3. Consider the following matrix of observed distances between five species, A, B, C, D and E:

             B    C    D    E
       A     6    8   12   14
       B          4    8   10
       C               8   10
       D                   10

   (a) Reconstruct a tree for this matrix using Neighbor Joining. Show your work and the topology and branch lengths obtained.

   (b) Reconstruct a tree for this matrix using UPGMA. Show your work and the topology and branch lengths obtained.

   (c) Compare the trees for both reconstructions. Which tree is more plausible? Why?

4. (a) Suppose that the observed distances between species A, B, C and D are given by the following matrix, O:

             B    C    D
       A    20   10   30
       B         28   41
       C              26

       Is it additive?

   (b) Given the following tree with branch lengths:

       A           C
        \         /
         3       2
          \     /
           \_4_/
           /   \
          /     \
        18       22
        /         \
       B           D

       compute the matrix of tree distances. Is it additive?

   (c) According to these distance matrices, are the four species changing at the same rate?

   (d) Define the error associated with any pair of species to be the absolute value of the difference between the tree distance and the observed distance between these species. Compute the matrix of errors between all species pairs. Which pair of species is associated with the largest error? What is the average error?

5. (a) How many rooted tree topologies are there for six species, A, B, C, D, E and F?

   (b) Suppose you know that species A and F are neighbors (that is, A and F are connected through a single internal node). Under this constraint, how many alternate rooted tree hypotheses are there for A, B, C, D, E and
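
The definition of an informative site in Problem 1 can be checked mechanically for four taxa. The following is a minimal sketch, assuming unit cost per substitution and exactly four sequences; the function names (fitch_join, quartet_score, site_scores, is_informative) are hypothetical, and the example columns are made up for illustration rather than taken from the alignment above. It scores one site on each of the three quartet topologies using Fitch-style set intersection and union, then reports the site as informative when the three scores are not all equal.

```python
def fitch_join(s, t):
    """Combine two Fitch state sets; return (parent set, added cost)."""
    common = s & t
    if common:
        return common, 0      # shared state available: no substitution needed
    return s | t, 1           # disjoint sets: one substitution is required


def quartet_score(a, b, c, d):
    """Parsimony score of one site on the quartet topology (ab,cd)."""
    left, cost_left = fitch_join({a}, {b})
    right, cost_right = fitch_join({c}, {d})
    _, cost_mid = fitch_join(left, right)   # across the internal edge
    return cost_left + cost_right + cost_mid


def site_scores(x, y, z, w):
    """Scores of one site on each of the three four-taxon topologies."""
    return {
        "(XY,ZW)": quartet_score(x, y, z, w),
        "(XZ,YW)": quartet_score(x, z, y, w),
        "(XW,YZ)": quartet_score(x, w, y, z),
    }


def is_informative(x, y, z, w):
    """A site is informative if the topologies do not all score the same."""
    return len(set(site_scores(x, y, z, w).values())) > 1


if __name__ == "__main__":
    # Hypothetical columns, listed in the order X, Y, Z, W.
    print(site_scores("A", "A", "G", "G"))     # scores differ across topologies
    print(is_informative("A", "A", "G", "G"))  # True
    print(is_informative("A", "C", "G", "T"))  # False: every topology scores 3
```

Scoring all three topologies directly mirrors the definition in the problem statement; for four taxa this is equivalent to the usual shortcut of asking whether at least two states each occur at least twice in the column.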