
CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 14: phylogenetic trees
CMSC423, Fall 2008


Phylogeny questions

Given several organisms and a set of features (usually sequence, but also morphological: wing shape, color):

A. Given a phylogenetic tree, figure out what the ancestors looked like, i.e. what are the features of the internal nodes?
   [Figure: tree annotated with the features wings, feathers, teeth, claws, no wings, fur]
B. Find the phylogenetic tree that best describes the common evolutionary heritage of the organisms.
   [Figure: three alternative trees on organisms A, B, C]

A is easy(ish): it can be done with dynamic programming.
B is hard: there are many possible trees. The number of rooted trees with n leaves is

$$\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$


Scoring a tree: Sankoff's algorithm

Assumption: we try to minimize the number of state changes from root to leaves (the parsimony approach).

- Small parsimony: given a tree whose leaves are labeled with m-character strings, find labels at the internal nodes such that the number of state transitions is minimized.
- Weighted small parsimony: same as small parsimony, except that state transitions are assigned weights; minimize the overall weight of the tree.

[Figure: example tree whose leaves and internal nodes are labeled with binary characters 0/1]

At each node v in the tree, store s_v(t): the best parsimony score for the subtree rooted at v if the character stored at v is t.

Traverse the tree in post-order and update s_v(t) as follows (assume node v has children u and w):

$$s_v(t) = \min_i \{ s_u(i) + \mathrm{score}(i, t) \} + \min_j \{ s_w(j) + \mathrm{score}(j, t) \}$$

The character at the root will be the one that minimizes s_root(t).

Note: this solves the weighted version. For the unweighted version set score(i, i) = 0 and score(i, j) = 1 for any i ≠ j.


Trees as clustering

Start with a distance matrix: the distance (e.g. alignment distance) between any two sequences (leaves). Intuitively, we want to cluster together the most similar sequences.

UPGMA (Unweighted Pair Group Method using Arithmetic averages):

- Build a pairwise distance matrix (e.g. from a multiple alignment).
- Pick the pair of sequences that are closest to each other and cluster them: create an internal node that has the sequences as children.
- Repeat, including the newly created internal nodes in the distance matrix.

Key element: we must be able to quickly compute the distance between clusters (internal nodes). UPGMA uses the average pairwise distance:

$$D(cl_1, cl_2) = \frac{1}{|cl_1|\,|cl_2|} \sum_{p \in cl_1} \sum_{q \in cl_2} D(p, q)$$

Note that UPGMA does not estimate individual branch lengths: rates are assumed equal, so all leaves in a cluster sit at the same distance from their common ancestor.

Neighbor joining: the distance between two sequences is not sufficient; we must also know how each sequence compares to every other sequence.

$$NJdist(i, j) = D(i, j) - (r_i + r_j)$$

where r_i and r_j are correction factors and m is the number of nodes currently in the distance matrix:

$$r_i = \frac{1}{m - 2} \sum_k D(i, k)$$


Neighbor joining

- Pick the two nodes i, j with NJdist(i, j) minimal.
- Create a parent k such that, for every other node x:

$$D(k, x) = 0.5\,\bigl(D(i, x) + D(j, x) - D(i, j)\bigr)$$

- Length of the branch between i and k:

$$D(i, k) = 0.5\,\bigl(D(i, j) + r_i - r_j\bigr)$$

- Length of the branch between j and k:

$$D(j, k) = 0.5\,\bigl(D(i, j) + r_j - r_i\bigr)$$

Note that both UPGMA and NJ assume the distance matrix is additive, i.e. D(i, j) + D(j, k) = D(i, k) when j lies on the path between i and k. This is usually not true, but close.

Also, when the matrix is additive, NJ can be proven to build the optimal tree. But simple alignment distance is not a good metric.


Maximum likelihood

For every branch S → T of length t, compute P(T | S, t): the likelihood that sequence S could have evolved in time t into sequence T. Find the tree that maximizes the likelihood.

Note that the likelihood of a given tree can be computed with an algorithm similar to Sankoff's. However, there is no simple way to find a tree given the sequences, so most approaches use heuristic search techniques. Often they start with the NJ tree, then tweak it to improve the likelihood.


Tree analysis/display

Three types of trees

- Cladogram: branch lengths have no meaning.
- Phylogram: branch lengths represent genetic change.
- Ultrametric tree: branch lengths represent time.

All show the same evolutionary relationships, or branching orders, between the taxa.

[Figure: the same four taxa (Taxon A, B, C, D) drawn as a cladogram, a phylogram and an ultrametric tree; from www.albany.edu/faculty/cs812/StewartTalk2.ppt]

Different tree views

[Figure: different tree views, from http://www-ab.informatik.uni-tuebingen.de/software/dendroscope/welcome.html]


Drawing trees

Trees are easy to draw: we just need to figure out how much space the leaves will take.

- Step 1: calculate how much space each node will take (how many leaves hang below the current node).
- Step 2: spread out the nodes according to their number of leaves.

There are many ways of optimizing the layout, e.g. for width or area. For large trees:

- 3D displays (there is more room in 3D)
- interactive displays (expand/contract nodes as needed)


Analysis example

- Build a multiple alignment (e.g. Muscle, ClustalW).
- Clean up the alignment (manual editing, filters, pre-defined structure information).
- Build the tree: PAUP (parsimony, others), Phylip (maximum likelihood), Tree-Puzzle (maximum likelihood), etc. (many packages).
- Integrated system: ARB (www.arb-home.de).


Antibiotic resistance in Staphylococcus aureus

[Figure: phylogenetic tree of S. aureus strains]

- Green boxes: individual strains in a phylogenetic tree
- Red diamonds / yellow triangle: acquisition of resistance
- Hexagon: loss of resistance


Questions

- Why do you need a multiple alignment for phylogeny?
- What is the running time of the neighbor joining algorithm, given k sequences of length L?
- What is the parsimony score of the following tree, and what are the labels at the internal nodes?
  [Figure: tree with leaves labeled C, T, G, T, A, T]
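
The Sankoff recurrence from the scoring slides can be tried out on a small tree. The following is a minimal sketch, not code from the course: the nested-tuple tree encoding, the names `sankoff`, `parsimony_score` and `unit_cost`, and the DNA alphabet are all assumptions made for illustration. It handles one character per leaf; for m-character strings the same computation would be repeated per position.

```python
from math import inf

ALPHABET = "ACGT"  # assumed alphabet; any set of states works


def unit_cost(a, b):
    """Unweighted parsimony: score(i, i) = 0, score(i, j) = 1 otherwise."""
    return 0 if a == b else 1


def sankoff(node, cost=unit_cost, alphabet=ALPHABET):
    """Return {t: s_v(t)} for the subtree rooted at `node`.

    A leaf is a one-character string; an internal node is a
    (left_child, right_child) tuple (hypothetical encoding).
    """
    if isinstance(node, str):
        # A leaf can only carry its observed character.
        return {t: (0 if t == node else inf) for t in alphabet}
    left, right = node
    s_u = sankoff(left, cost, alphabet)   # post-order: children first
    s_w = sankoff(right, cost, alphabet)
    return {
        t: min(s_u[i] + cost(i, t) for i in alphabet)
        + min(s_w[j] + cost(j, t) for j in alphabet)
        for t in alphabet
    }


def parsimony_score(tree, cost=unit_cost, alphabet=ALPHABET):
    """Return (best score, root character minimizing s_root(t))."""
    s_root = sankoff(tree, cost, alphabet)
    best = min(s_root, key=s_root.get)
    return s_root[best], best


tree = (("A", "C"), ("A", "G"))
print(parsimony_score(tree))
```

Passing a different `cost` function gives the weighted version. Recovering the labels at every internal node would need a second, top-down pass that records which child states achieved each minimum; that pass is omitted here.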
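
A single neighbor joining step, as described in the clustering slides, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `nj_step`, the dict-of-dicts distance matrix and the four-leaf example are all hypothetical. The example distances were generated from a tree with branch lengths A=2, B=3, C=4, D=2 and an internal edge of length 1, so the matrix is additive.

```python
from itertools import combinations


def nj_step(D, nodes):
    """Perform one neighbor joining step.

    D is a symmetric dict of dicts of distances, `nodes` the current
    node names (at least 3). Returns the chosen pair (i, j), the branch
    lengths from i and j to their new parent k, and the distances from
    k to every other node.
    """
    m = len(nodes)
    # Correction factors r_i = 1/(m - 2) * sum_k D(i, k).
    r = {i: sum(D[i][k] for k in nodes if k != i) / (m - 2) for i in nodes}
    # Pick the pair minimizing NJdist(i, j) = D(i, j) - (r_i + r_j).
    i, j = min(
        combinations(nodes, 2),
        key=lambda p: D[p[0]][p[1]] - (r[p[0]] + r[p[1]]),
    )
    len_i = 0.5 * (D[i][j] + r[i] - r[j])  # branch i to parent k
    len_j = 0.5 * (D[i][j] + r[j] - r[i])  # branch j to parent k
    # Distance from the new parent k to every other node x.
    to_parent = {
        x: 0.5 * (D[i][x] + D[j][x] - D[i][j])
        for x in nodes
        if x not in (i, j)
    }
    return (i, j), len_i, len_j, to_parent


D = {
    "A": {"B": 5, "C": 7, "D": 5},
    "B": {"A": 5, "C": 8, "D": 6},
    "C": {"A": 7, "B": 8, "D": 6},
    "D": {"A": 5, "B": 6, "C": 6},
}
pair, len_i, len_j, to_parent = nj_step(D, ["A", "B", "C", "D"])
print(pair, len_i, len_j, to_parent)
```

On this additive matrix the step recovers the branch lengths 2 and 3 used to build it. A full implementation would add the parent to the matrix and repeat until only two nodes remain.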


UMD CMSC 423 - Lecture 14
