Molecular Evolution and Phylogenetic Tree ReconstructionPhylogenetic TreesInferring Phylogenetic TreesSlide 4Distance Between Two SequencesSlide 6OutlineMolecular EvolutionSlide 9Slide 10Evolutionary ModelsEstimating DistancesSlide 13A simple clustering method for building treeAlgorithm: Average LinkageAverage Linkage ExampleUltrametric Distances and Molecular ClockUltrametric Distances & Average LinkageWeakness of Average LinkageAdditive DistancesReconstructing Additive Distances Given TSlide 23Slide 24Slide 25Neighbor-JoiningAlgorithm: Neighbor-JoiningSlide 28ParsimonyExample: Parsimony Cost of One ColumnParsimony ScoringExampleParsimony TracebackAnother Parsimony AlgorithmProbabilistic MethodsSlide 37Slide 39Slide 40Slide 41Current popular methodsMolecular Evolution and Phylogenetic Tree Reconstruction1432514235Phylogenetic Trees•Nodes: species•Edges: time of independent evolution•Edge length represents evolution timeAKA genetic distanceNot necessarily chronological timeInferring Phylogenetic TreesTrees can be inferred by several criteria:Morphology of the organisms•Can lead to mistakes!Sequence comparisonExample:Mouse: ACAGTGACGCCCCAAACGTRat: ACAGTGACGCTACAAACGTBaboon: CCTGTGACGTAACAAACGAChimp: CCTGTGACGTAGCAAACGAHuman: CCTGTGACGTAGCAAACGAInferring Phylogenetic Trees•Sequence-based methodsDeterministic (Parsimony)Probabilistic (SEMPHY)•Distance-based methodsUPGMANeighbor-Joining•Can compute distances from sequencesDistance Between Two SequencesBasic principles:•Degree of sequence difference is proportional to length of independent sequence evolution•Only use positions where alignment is certain – avoid areas with (too many) gapsDistance Between Two SequencesGiven sequences xi, xj,Define dij = distance between the two sequencesOne possible definition:dij = fraction f of sites u where xi[u] xj[u]Better scores are derived by modeling evolution as a continuous change processOutline•Molecular Evolution•Distance MethodsUPGMA / Average LinkageNeighbor-Joining•Sequence MethodsDeterministic (Parsimony)Probabilistic (SEMPHY)Molecular EvolutionQ: How can we model evolution on nucleotide level? (ignore gaps, focus on substitutions)A: Consider what happens at a specific position for small time interval Δt•P(t) = vector of probabilities of {A,C,G,T} at time t•μAC = rate of transition from A to C per unit time•μA = μAC + μAG + μAT rate of transition out of A•pA(t+Δt) = pA(t) – pA(t) μA Δt + pC(t) μCA Δt + …Molecular EvolutionIn matrix/vector notation, we getP(t+Δt) = P(t) + Q P(t) Δtwhere Q is the substitution rate matrixMolecular Evolution•This is a differential equation:P’(t) = Q P(t)•A substitution rate matrix Q implies a probability distribution over {A,C,G,T} at each position, including stationary (equilibrium) frequencies πA, πC, πG, πT•Each Q is an evolutionary model (some work better than others)Evolutionary Models•Jukes-Cantor•Kimura•Felsenstein•HKYEstimating Distances•Solve the differential equation and compute expected evolutionary time given sequences•Jukes-Cantor •KimuraOutline•Molecular Evolution•Distance MethodsUPGMA / Average LinkageNeighbor-Joining•Sequence MethodsDeterministic (Parsimony)Probabilistic (SEMPHY)A simple clustering method for building treeUPGMA (unweighted pair group method using arithmetic averages)Or the Average Linkage MethodGiven two disjoint clusters Ci, Cj of sequences, 1dij = ––––––––– {p Ci, q Cj}dpq|Ci| |Cj|Claim that if Ck = Ci Cj, then distance to another cluster Cl is:dil |Ci| + djl |Cj| dkl = –––––––––––––– |Ci| + |Cj|Algorithm: Average LinkageInitialization:Assign each xi into its own cluster CiDefine one leaf per sequence, height 0Iteration:Find two clusters Ci, Cj s.t. dij is minLet Ck = Ci CjDefine node connecting Ci, Cj, and place it at height dij/2Delete Ci, CjTermination:When two clusters i, j remain, place root at height dij/21432514235Average Linkage Examplev w x y zv0 6 8 8 8w0 8 8 8x0 4 4y0 2z0y zxwv1234v w x yzv0 6 8 8w0 8 8x0 4yz0v w xyzv0 6 8w0 8xyz0vw xyzvw0 8xyz0Ultrametric Distances and Molecular ClockDefinition:A distance function d(.,.) is ultrametric if for any three distances dij dik dij, it is true that dij dik = dijThe Molecular Clock:The evolutionary distance between species x and y is 2 the Earth time to reach the nearest common ancestorThat is, the molecular clock has constant rate in all species1 4 2 3 5yearsThe molecular clock results in ultrametric distancesUltrametric Distances & Average LinkageAverage Linkage is guaranteed to reconstruct correctly a binary tree with ultrametric distancesProof: Exercise1 4 2 35Weakness of Average LinkageMolecular clock: all species evolve at the same rate (Earth time)However, certain species (e.g., mouse, rat) evolve much fasterExample where UPGMA messes up:23411432Correct treeAL treeAdditive DistancesGiven a tree, a distance measure is additive if the distance between any pair of leaves is the sum of lengths of edges connecting themGiven a tree T & additive distances dij, can uniquely reconstruct edge lengths:•Find two neighboring leaves i, j, with common parent k•Place parent node k at distance dkm = ½ (dim + djm – dij) from any node m i, j12345678910121113d1,4Reconstructing Additive Distances Given Txyzwv5473346v w x y zv0 10 17 16 16w0 15 14 14x0 9 15y0 14z0TIf we know T and D, but do not know the length of each leaf, we can reconstruct those lengthsDReconstructing Additive Distances Given Txyzwvv w x y zv0 10 17 16 16w0 15 14 14x0 9 15y0 14z0TDReconstructing Additive Distances Given Txyzwvv w x y zv0 10 17 16 16w0 15 14 14x0 9 15y0 14z0TDa x y za0 11 10 10x0 9 15y0 14z0aD1dax = ½ (dvx + dwx – dvw)day = ½ (dvy + dwy – dvw)daz = ½ (dvz + dwz – dvw)Reconstructing Additive Distances Given TxyzwvTa x y za0 11 10 10x0 9 15y0 14z0aD1a b za0 6 10b0 10z0D2bca ca0 3c0D3d(a, c) = 3d(b, c) = d(a, b) – d(a, c) = 3d(c, z) = d(a, z) – d(a, c) = 7d(b, x) = d(a, x) – d(a, b) = 5d(b, y) = d(a, y) – d(a, b) = 4d(a, w) = d(z, w) – d(a, z) = 4d(a, v) = d(z, v) – d(a, z) = 6Correct!!!5473346Neighbor-Joining•Guaranteed to produce the correct tree if distance is additive•May produce a good tree even when distance is not additiveStep 1: Finding neighboring leavesDefineDij = (N – 2) dij – ki dik – kj djk Claim: The
View Full Document