DOC PREVIEW
Stanford CS 262 - Phylogeny Tree Reconstruction

This preview shows page 1-2-24-25 out of 25 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Phylogeny Tree ReconstructionInferring PhylogeniesModeling EvolutionSlide 4Slide 5Slide 6Phylogeny and sequence comparisonDistance between two sequencesA simple clustering method for building treeAlgorithm: UPGMAUltrametric Distances & UPGMAWeakness of UPGMAAdditive DistancesNeighbor-JoiningAlgorithm: Neighbor-joiningParsimonyParsimony ScoringExampleTraceback to find ancestral nucleotidesSlide 20Search through tree topologies: Branch and BoundBootstrapping to get the best treesProbabilistic MethodsFelsenstein’s Likelihood AlgorithmSlide 25Phylogeny Tree Reconstruction1432514235Inferring PhylogeniesTrees can be inferred by several criteria:Morphology of the organismsSequence comparisonExample:Orc: ACAGTGACGCCCCAAACGTElf: ACAGTGACGCTACAAACGTDwarf: CCTGTGACGTAACAAACGAHobbit: CCTGTGACGTAGCAAACGAHuman: CCTGTGACGTAGCAAACGAModeling EvolutionDuring infinitesimal time t, there is not enough time for two substitutions to happen on the same nucleotideSo we can estimate P(x | y, t), for x, y  {A, C, G, T}Then letP(A|A, t) …… P(A|T, t)S(t) = … …P(T|A, t) …… P(T|T, t)Modeling EvolutionReasonable assumption: multiplicative (implying a stationary Markov process)S(t+t’) = S(t)S(t’)That is, P(x | y, t+t’) = z P(x | z, t) P(z | y, t’)Jukes-Cantor: constant rate of evolution 1 - 3   For short time , S() =  1 - 3     1 - 3     1 - 3Modeling EvolutionJukes-Cantor:For longer times,r(t) s(t) s(t) s(t)S(t) = s(t) r(t) s(t) s(t)s(t) s(t) r(t) s(t)s(t) s(t) s(t) r(t)Where we can derive:r(t) = ¼ (1 + 3 e-4t)s(t) = ¼ (1 – e-4t)Modeling EvolutionKimura:Transitions: A/G, C/TTransversions: A/T, A/C, G/T, C/GTransitions (rate ) are much more likely than transversions (rate )r(t) s(t) u(t) s(t)S(t) = s(t) r(t) s(t) u(t)u(t) s(t) r(t) s(t)s(t) u(t) s(t) r(t)Where s(t) = ¼ (1 – e-4t)u(t) = ¼ (1 + e-4t – e-2(+)t)r(t) = 1 – 2s(t) – u(t)Phylogeny and sequence comparisonBasic principles:•Degree of sequence difference is proportional to length of independent sequence evolution•Only use positions where alignment is pretty certain – avoid areas with (too many) gapsDistance between two sequencesGiven (portion of) sequences xi, xj,Define dij = distance between the two sequencesOne possible definition:dij = fraction f of sites u where xi[u]  xj[u]Better model (Jukes-Cantor):dij = - ¾ log(1 – 4f / 3)A simple clustering method for building treeUPGMA (unweighted pair group method using arithmetic averages)Given two disjoint clusters Ci, Cj of sequences, 1 dij = ––––––––– {p Ci, q Cj}dpq |Ci|  |Cj|Note that if Ck = Ci  Cj, then distance to another cluster Cl is: dil |Ci| + djl |Cj| dkl = –––––––––––––– |Ci| + |Cj|Algorithm: UPGMAInitialization:Assign each xi into its own cluster CiDefine one leaf per sequence, height 0Iteration:Find two clusters Ci, Cj s.t. dij is minLet Ck = Ci  CjDefine node connecting Ci, Cj, & place it at height dij/2Delete Ci, CjTermination:When two clusters i, j remain, place root at height dij/21432514235Ultrametric Distances & UPGMAUPGMA is guaranteed to build the correct tree if distance is ultrametricProof:1. The tree topology is unique, given that the tree is binary2. UPGMA constructs a tree obeying the pairwise distances14235Weakness of UPGMAMolecular clock: implied time is constant for all speciesHowever, certain species (e.g., mouse, rat) evolve much fasterExample where UPGMA messes up:23411432Correct treeUPGMAAdditive DistancesGiven a tree, a distance measure is additive if the distance between any pair of leaves is the sum of lengths of edges connecting themGiven a tree T & additive distances dij, can uniquely reconstruct edge lengths:•Find two neighboring leaves i, j, with common parent k•Place parent node k at distance dkm = ½ (dim + djm – dij) from any node m12345678910121113d1,4Neighbor-Joining•Guaranteed to produce the correct tree if distance is additive•May produce a good tree even when distance is not additiveStep 1: Finding neighboring leavesDefineDij = dij – (ri + rj)Where 1 ri = –––––k dik |L| - 2Claim: The above “magic trick” ensures that Dij is minimal iff i, j are neighborsProof: Too little time today, please read Durbin et al.!12430.10.1 0.10.40.4Algorithm: Neighbor-joiningInitialization:Define T to be the set of leaf nodes, one per sequenceLet L = TIteration:Pick i, j s.t. Dij is minimalDefine a new node k, and set dkm = ½ (dim + djm – dij) for all m  LAdd k to T, with edges of lengths dik = ½ (dij + ri – rj)Remove i, j from L; Add k to LTermination:When L consists of two nodes, i, j, and the edge between them of length dijParsimony•One of the most popular methodsIdea: Find the tree that explains the observed sequences with a minimal number of substitutionsTwo computational subproblems:1. Find the parsimony cost of a given tree (easy)2. Search through all tree topologies (hard)Parsimony Scoring Given a tree, and an alignment columnLabel internal nodes to minimize the number of required substitutionsInitialization:Set cost C = 0; k = 2N – 1Iteration:If k is a leaf, set Rk = { xk[u] }If k is not a leaf,Let i, j be the daughter nodes;Set Rk = Ri  Rj if intersection is nonemptySet Rk = Ri  Rj, and C += 1, if intersection is emptyTermination:Minimal cost of tree for column u, = CExampleABAB{A, B}C+=1{A, B}C+=1{A}{A}{B}{A}{B}Traceback:1. Choose an arbitrary nucleotide from R2N – 1 for the root 2. Having chosen nucleotide r for parent k, If r  Ri choose r for daughter iElse, choose arbitrary nucleotide from RiEasy to see that this traceback produces some assignment of cost CTraceback to find ancestral nucleotidesExampleABAB{A, B}{A, B}{A}{A}{B}{A}{B}ABABAA AxxABABAB AxxABABBB BxxAdmissible with TracebackStill optimal, but inadmissible with TracebackSearch through tree topologies: Branch and BoundObservation: adding an edge to an existing tree can only increase the parsimony costEnumerate all unrooted trees with at most n leaves:[i3][i5][i7]……[i2N–5]]where each ik can take values from 0 (no edge) to kAt each point keep C = smallest cost so far for a complete treeStart B&B with tree [1][0][0]……[0]Whenever cost of current tree T is > C, then:T is not optimalAny tree extending T with more edges is not


View Full Document

Stanford CS 262 - Phylogeny Tree Reconstruction

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Phylogeny Tree Reconstruction
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Phylogeny Tree Reconstruction and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Phylogeny Tree Reconstruction 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?