DOC PREVIEW
Stanford CS 262 - Lecture Notes

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Lecture 13 Scribe: Vinayak Ganeshan Introduction In this lecture we will discuss Phylogenetic tree reconstruction and Multiple Sequence Alignment Phylogenetic Tree Reconstruction Recall our discussion on the importance of being able to construct phylogenetic trees starting from multiple alignments. We have the following methods to do so Using pairwise distances Pairwise distances between sequence x and y can be computed by aligning x with y and counting the number of edits between the two sequences. This value is trivially non-negative, symmetric, satisfies the identity relationship (if x,y have no edits then they are the same) and obeys triangle inequality (x,y have p edits and y,z have q edits then x,z can’t have more than p+q edits) and is hence a metric. We have the following 2 methods using the pairwise distances 1. UPGMA – This method correctly constructs the phylogenetic tree UPGMA constructs the right tree if distances satisfy ultrametric property – i.e. all species have distances equal to twice the length of time from a common ancestor. The ultrametric property can also be identified by grounding the tree and observing the height of any node to be one half the sums of distances of any pair of leaves coming down from this node. However distances are often not ultrametric. For example: the rodent lineage runs must faster than humans (a possible cause for this is that speed by which mutations accumulate in DNA depends not only on chronological time but also on generation time since reproduction can accumulate mutations). 2. Neighbor join – Every pair of species is leaf and distances equal length of path. This procedure of tree construction works if distances are additive. Alternative methods These methods do not require computation of distances between pairs of sequences beforehand 1. Parsimony Scoring – In this method we minimize the total number of substitutions over the all trees which satisfy the given alignments. If we consider alignment column U, and leaves A, C, T or G, we wish to label nodes connecting these leaves such that the entire tree has minimal substitutions going from a parent to its children – a ‘Parsimonious Tree’. We have thefollowing algorithm going up from the leaves to generate the minimum cost tree for a given column Tree construction algorithm Initialization: Set cost C = 0; node k = 2N – 1 (last leaf) Iteration: If k is a leaf, set Rk = { xk[u] } If k is not a leaf, Let i, j be the daughter nodes; Set Rk = Ri Rj if intersection is nonempty Set Rk = Ri Rj, and C += 1, if intersection is empty Termination: Minimal cost of tree for column u, = C In the above algorithm, Rk stores all possible letters at leaves to still have the minimum parsimony score in the tree. We encounter the following 2 cases when building the tree – a) If two daughter nodes have non empty intersection then each of the daughter nodes can take a value from their intersection and still have minimal parsimony score and hence we can assign the intersection to their parent (we can later propagate one of the elements in the intersection down). The substitution score remains unaltered. For example if nodes are N1 = {A, C, G} and N2 = {A, C, T} then every node below N1 has parsimony score equal for a selection of A, C or G and N2 will have equal score for a selection of A, C or T. So the parent will select A or C and keep parsimony score as is. b) If the daughter nodes are disjoint, we place their union at their parent. This will increase the substitution score by 1. For example if Nodes are N1 = {A, C} and N2 = {T} then there has to be an additional substitution and the parent can pick A, C or T. Consider the following example where the represents a choice of substitutions when we pick B at the top. At the left substitution instance, if B is picked at the parent node the left child incurs a substitution and the right child node incurs a match. On the other hand if A is picked at this node, the left child incurs a Parsimony tree construction algorithmsubstitution and the right child node gets matched. Multiple Columns - When multiple columns are present, each is independent and hence the cost of the tree for all columns is the sum of costs for each column. Traceback - After constructing the parsimony tree, we will trace back from the top by picking a nucleotide at the set at the root node and using the following algorithm 1. Choose an arbitrary nucleotide from R2N – 1 for the root 2. Having chosen nucleotide r for parent k, If r Ri choose r for daughter i Else, choose arbitrary nucleotide from Ri Problems with Traceback - The traceback algorithm will construct the minimum substitution score path for some choices of nucleotides at the root, but it can generate a tree with more than optimal substitutions in other cases. The following parsimonious tree illustrates this point. We have an inadmissible traceback substitution when we make the choice B at each node. Sample Tree Parsimony Speed – All operations involved are very fast and hence this method is very popular to generate tree phylogeny. The process of constructing the minimum cost phylogeny is an NP hard problem and often various heuristics are used. This is an active area of research when large trees are encountered. 2. Probabilistic Methods In this method we start off with a given topology and have branch lengths associated with all branches in the tree. For example if x1 and x2 represented humans and chimpanzees respectively, the branch lengths t1, t2 could denote the number of generations. If we assume humans and chimpanzees don’t mingle then we would want Admissible substitution Inadmissible substitutiontheir evolutions to be independent. (Remark: A recent paper suggests humans and chimpanzees may have interbred). The independence of events gives P(x1, x2, xroot | t1, t2) = P(xroot) P(x1 | t1, xroot) P(x2 | t2, xroot) In the Jukes Cantor example, x1= xroot = A, x2=C, t1 = t2 = 1 and substituting for the conditional probabilities from last lecture, if α is the Jukes Cantor parameter P(x1, x2, xroot | t1, t2) = pA¼(1 + 3e-4α) ¼(1 – e-4α) = (¼)3(1 + 3e-4α)(1 – e-4α) Observe that we assume that we know the label for the root node. Assume that we know all internal labels in the tree and then define P(x1, x2, …, xN, xN+1, …, x2N-1 | T, t) = P(xroot)j rootP(xj | xparent(j), tj, parent(j)) P above will give the required


View Full Document

Stanford CS 262 - Lecture Notes

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?