DOC PREVIEW
CMU BSC 03711 - Homework

This preview shows page 1-2-3 out of 9 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

03-511/711 Computational Genomics and Molecular Biology, Fall 2009 1Problem Set 2 Due October 6thCollaboration is allowed on this homework. You must hand in homeworks individually and listthe names of the people you worked with.1. Given the distance function, M = 0, m = 1, g = 1, what is the relationship (<, ≤, =,etc.) between the quantities X and Y in each of the following cases? Gi ve a one sentenceexplanation of your answer.(a) X = the sum-of-pairs (SP) score of a multiple sequence alignment.Y = the tree alignment score of the same multiple sequence alignment.(b) X = the SP sc ore of a multiple sequence alignment obtained by dynamic programming.Y = the SP score of a multiple alignment of the same sequences obtained by progressivealignment.(c) X = the SP score of the optimal multiple alignment of sequences S1. . . Sk.Y =Pki=1Pkj>iD(Si, Sj), where D(Si, Sj) is the optimal pairwise score between seq-uences Siand Sj.03-511/711 Computational Genomics and Molecular Biology, Fall 2009 22. Under the maximum parsimony criterion, we say a column, or site, in a multiple sequencealignment is informative, if it favors one tree topology over another. If the parsimony scoreat a given site in the alignment is the same for all topologies, then the site in uniformative.(a) For each site in the following alignment of sequences from four taxa,1 2 3 4 5 6 7 8 9X. C C G T A G G A CY. A C C T G T G T CZ. A G A T G T G C CW. A G T T A G G C Cstatei. if it is an informative siteii. if so, which of the possible tree top ol ogies for four taxa does it favor?iii. if not, what is the parsimony score for this site?(b) Show the most parsimonious tree(s).(c) What is the maximum parsimony score for this data set?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 33. What is the parsimony score of the following tree? Show your work.A CCGAT GTCG03-511/711 Computational Genomics and Molecular Biology, Fall 2009 44. Consider a pair of DNA sequence that are changing according to the Jukes-Cantor model.The residues in these sequences are changing at two different rates: Half the sites change atα1= 8 · 10−4changes per site per million years, while the other half of the sites are changingfive times as fast (α2= 5α1). Unfortunately, we don’t know which of the sites are changingat the fast rate.(a) Give an expression for Pmismatch, the expected fraction of sites with observable differ-ences between two sequences, as a function of α1, α2and t.(b) Suppose we incorrectly assume that all sites are following a Jukes-Cantor model ofchange with the same parameter α = (α1+ α2)/2. Give an expression forˆPmismatch,the expected fraction of observable differences as a function of α and t in this case.(c) Use your favorite plotting program, plotˆPmismatchand Pmismatchas a function of t,where t, given in units of millions of years, varies from 0 to 3000.(d) What are the consequences of ignoring rate variation and using the average rate instead?How does this effect vary with t? Can you explain why the impact is higher at somevalues of t than at others?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 55. This problem illustrates the strengths and weaknesses of different distance-based methodsfor tree reconstruction. Remember that while in this case you know the tree from which thedistance matrix was derived, when analyzing real data, you will only have the matrix, notthe tree.(a) Compute the distance matrix for the following rooted tree: A, B, C, D and E:CBADE101252210503-511/711 Computational Genomics and Molecular Biology, Fall 2009 6(b) UPGMA and Neighbor Joining (NJ) are greedy algorithms that build trees by iterativelyidentifying neighboring nodes and merging them to form a new subtree. How does theUPGMA algorithm select the taxa that will be neighbors in the next iteration?(c) Working only from the distance matrix (forget you know the tree), which nodes willUPGMA select as the first pair of neighbors? Why?(d) Draw the first subtree with an intermediate node x linking the two neighbors. What arethe branch lengths in this subtree?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 7(e) How does the NJ algorithm select the taxa that will be neighbors in the next iteration?(f) Working only from the distance matrix, which nodes will Neighbor Joining select as thefirst pair of neighbors? Show your calculations for the first iteration of the algorithm.(g) Draw the first subtree constructed by the Neighbor Joining algorithm. What are thebranch lengths in this subtree?(h) Which algorithm is a better choice for reconstructing this tree? Why?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 86. (a) Consider the following matrix of observed distances between four taxa, A, B, C and D:B C DA 21 6 42B 26 61C 45You wish to reconstruct a tree for the above matrix. Which algorithm would you use toreconstruct the topology and why? How would you determine the location of the root?(b) Consider the following matrix of observed distances between four taxa, A, B, C and D:B C DA 5 8 35B 12 26C 22You wish to reconstruct a tree for the above matrix. Which algorithm would you use toreconstruct the topology and why? How would you determine the location of the root?03-511/711 Computational Genomics and Molecular Biology, Fall 2009 9(c) Consider the following matrix of observed distances between four taxa, A, B, C and D:B C DA 9 18 19B 19 20C 5You wish to reconstruct a tree for the above matrix. Which algorithm would you use toreconstruct the topology and why? How would you determine the location of the


View Full Document

CMU BSC 03711 - Homework

Documents in this Course
lecture

lecture

8 pages

Lecture

Lecture

3 pages

Homework

Homework

10 pages

Lecture

Lecture

17 pages

Delsuc05

Delsuc05

15 pages

hmwk1

hmwk1

2 pages

lecture

lecture

6 pages

Lecture

Lecture

10 pages

barnacle4

barnacle4

15 pages

review

review

10 pages

Homework

Homework

10 pages

Midterm

Midterm

12 pages

lecture

lecture

11 pages

lecture

lecture

32 pages

Lecture

Lecture

7 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

Lecture

Lecture

21 pages

Lecture

Lecture

11 pages

Lecture

Lecture

28 pages

Homework

Homework

13 pages

Logistics

Logistics

11 pages

lecture

lecture

11 pages

Lecture

Lecture

8 pages

Lecture

Lecture

9 pages

lecture

lecture

8 pages

Problem

Problem

6 pages

Homework

Homework

10 pages

Lecture

Lecture

9 pages

Problem

Problem

7 pages

hmwk4

hmwk4

7 pages

Problem

Problem

6 pages

lecture

lecture

16 pages

Problem

Problem

8 pages

Problem

Problem

6 pages

Problem

Problem

13 pages

lecture

lecture

9 pages

Problem

Problem

11 pages

Notes

Notes

7 pages

Lecture

Lecture

7 pages

Lecture

Lecture

10 pages

Lecture

Lecture

9 pages

Homework

Homework

15 pages

Lecture

Lecture

16 pages

Problem

Problem

15 pages

Load more
Download Homework
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Homework and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Homework 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?