DOC PREVIEW
U of I CS 466 - Evolutionary tree reconstruction

This preview shows page 1-2-3-4-26-27-28-53-54-55-56 out of 56 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 56 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Evolutionary treereconstructionEarly Evolutionary Studies• Anatomical features were the dominant criteriaused to derive evolutionary relationshipsbetween species since Darwin till early 1960s• The evolutionary relationships derived fromthese relatively subjective observations wereoften inconclusive. Some of them were laterproved incorrectEvolution and DNA Analysis:the Giant Panda Riddle• For roughly 100 years scientists were unable to figureout which family the giant panda belongs to• Giant pandas look like bears but have features thatare unusual for bears and typical for raccoons, e.g.,they do not hibernate• In 1985, Steven O’Brien and colleagues solved thegiant panda classification problem using DNAsequences and algorithmsEvolutionary Tree of Bears and RaccoonsOut of Africa Hypothesis• DNA-based reconstruction of thehuman evolutionary tree led to the Outof Africa Hypothesis that claims ourmost ancient ancestor lived in Africaroughly 200,000 years agoEvolutionary tree• A tree with leaves = species, and edgelengths representing evolutionary time• Internal nodes also species: theancestral species• Also called “phylogenetic tree”• How to construct such trees from data?Rooted and Unrooted TreesIn the unrooted tree the position ofthe root (“oldest ancestor”) isunknown. Otherwise, they are likerooted treesDistances in Trees• Edges may have weights reflecting:– Number of mutations on evolutionary pathfrom one species to another– Time estimate for evolution of one speciesinto another• In a tree T, we often compute dij(T) - the length of a path between leaves i and j• This may be based on direct comparison ofsequence between i and jDistance in Trees: an Example d1,4 = 12 + 13 + 14 + 17 + 13 = 69ijFitting Distance Matrix• Given n species, we can compute then x n distance matrix Dij• Evolution of these species is describedby a tree that we don’t know.• We need an algorithm to construct atree that best fits the distance matrix Dij• That is, find tree T such that dij(T) = Dijfor all i,jReconstructing a 3 Leaved Tree• Tree reconstruction for any 3x3 matrix isstraightforward• We have 3 leaves i, j, k and a center vertex cObserve:dic + djc = Dijdic + dkc = Dikdjc + dkc = DjkReconstructing a 3 Leaved Tree(cont’d) dic + djc = Dij dic + dkc = Dik 2dic + djc + dkc = Dij + Dik2dic + Djk = Dij + Dikdic = (Dij + Dik – Djk)/2Similarly,djc = (Dij + Djk – Dik)/2dkc = (Dki + Dkj – Dij)/2Trees with > 3 Leaves• Any tree with n leaves has 2n-3 edges• This means fitting a given tree to a distancematrix D requires solving a system of “nchoose 2” equations with 2n-3 variables• This is not always possible to solve for n > 3Additive Distance MatricesMatrix D isADDITIVE if thereexists a tree T withdij(T) = DijNON-ADDITIVEotherwiseDistance Based PhylogenyProblem• Goal: Reconstruct an evolutionary tree from adistance matrix• Input: n x n distance matrix Dij• Output: weighted tree T with n leaves fitting D• If D is additive, this problem has a solutionand there is a simple algorithm to solve itSolution 1Degenerate Triples• A degenerate triple is a set of three distinctelements 1≤i,j,k≤n where Dij + Djk = Dik• Element j in a degenerate triple i,j,k lies onthe evolutionary path from i to k (or isattached to this path by an edge of length0).Looking for Degenerate Triples• If distance matrix D has a degenerate triple i,j,kthen j can be “removed” from D thus reducingthe size of the problem.• If distance matrix D does not have adegenerate triple i,j,k, one can “create” adegenerate triple in D by shortening all hangingedges (edge leading to a leaf) in the tree.Shortening Hanging Edges toProduce Degenerate Triples• Shorten all “hanging” edges (edges thatconnect leaves) until a degenerate tripleis foundFinding Degenerate Triples• If there is no degenerate triple, all hanging edges arereduced by the same amount δ, so that all pair-wisedistances in the matrix are reduced by 2δ.• Eventually this process collapses one of the leaves(when δ = length of shortest hanging edge), forming adegenerate triple i,j,k and reducing the size of thedistance matrix D.• The attachment point for j can be recovered in thereverse transformations by saving Dij for eachcollapsed leaf.Reconstructing Trees for Additive Distance MatricesAdditivePhylogeny Algorithm1. AdditivePhylogeny(D)2. if D is a 2 x 2 matrix3. T = tree of a single edge of length D1,24. return T5. if D is non-degenerate6. δ = trimming parameter of matrix D7. for all 1 ≤ i ≠j ≤ n8. Dij = Dij - 2δ9. else10. δ = 0AdditivePhylogeny (cont’d)11. Find a triple i, j, k in D such that Dij + Djk = Dik12. x = Dij13. Remove jth row and jth column from D14. T = AdditivePhylogeny(D)15. Add a new vertex v to T at distance x from i to k16. Add j back to T by creating an edge (v,j) of length 017. for every leaf l in T18. if distance from l to v in the tree ≠Dl,v19. output “matrix is not additive”20. return21. Extend all “hanging” edges by length δ22. return TAdditivePhylogeny (Cont’d)• This algorithm checks if the matrix D isadditive, and if so, returns the tree T.• How to compute the trimmingparameter δ ?Solution 2UPGMA: Unweighted Pair GroupMethod with Arithmetic Mean• UPGMA is a clustering algorithm that:– computes the distance between clustersusing average pairwise distance– assigns a height to every vertex in thetree• Does not require an additive distancematrix. Output tree does not necessarilymatch the distance matrix for every pair ofnodesClustering in UPGMA• Given two disjoint clusters Ci, Cj ofsequences, 1 dij = ––––––––– Σ{p ∈Ci, q ∈Cj}dpq |Ci| × |Cj|• Algorithm is a variant of the hierarchicalclustering algorithmUPGMA AlgorithmInitialization:Assign each xi to its own cluster CiDefine one leaf per sequence, each at height 0Iteration:Find two clusters Ci and Cj such that dij is minLet Ck = Ci ∪ CjAdd a vertex connecting Ci, Cj and place it at height dij /2 Length of edge (Ci,Ck) = h(Ck) - h(Ci) Length of edge (Cj,Ck) = h(Ck) - h(Cj)Delete clusters Ci and CjTermination:When a single cluster remainsUPGMA Algorithm (cont’d)143251 4235UPGMA’s Weakness• The algorithm produces an ultrametric tree :the distance from the root to any leaf is thesame• UPGMA assumes a constant molecular clock:all species represented by the


View Full Document

U of I CS 466 - Evolutionary tree reconstruction

Download Evolutionary tree reconstruction
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Evolutionary tree reconstruction and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Evolutionary tree reconstruction 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?