Berkeley STATISTICS 246 - Inferring trees and estimating rate matrices

Inferring trees and estimating rate matrices
Statistics 246, Week 6, Spring 2006, Lecture 2

What is a tree in this context?
Typically labelled, binary, unrooted.
Taxa (plural of taxon) at the tips (aka leaves).
The topology is usually what counts.
A root may be added later.

Tree topologies
[Figure: two drawings marked "Identical" (leaf orders A E C D B F and F B D C E A) and two marked "Not identical" (leaf orders A E C D B F and A E D C B F).]

All 3 rooted binary tip-labelled trees on 3 taxa (just 1 if unrooted)
[Figure: three rooted trees with leaf orders A B C, B A C and C A B.]

15 rooted binary tip-labelled trees on 4 taxa (just 3 if unrooted)
[Figure: "All trees, 4 taxa": the 15 rooted trees on leaves A, B, C, D.]

In general, for any strictly bifurcating rooted tree with n species there are $\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$ different topologies.

n     trees
5     105
15    213,458,046,676,875
20    8,200,794,532,637,891,559,375

For unrooted trees it's only $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$.

Tree inference: some methods
Clustering methods: UPGMA, Neighbor joining, WPGMA, Single linkage, Complete linkage.
Objective criterion-based methods: Least squares distance, Maximum parsimony, Minimum evolution, Maximum likelihood.

Building trees: distance methods
There are many ways of building trees using distance methods. All start by computing the pairwise distances between the sequences to be at the tips of the tree, usually along the lines we discussed in the last lecture, i.e. ML distance using a rate matrix.
One of the oldest distance methods, still widely used though rather discredited in the molecular evolutionary context, is UPGMA. This stands for unweighted pair group method with arithmetic means. It is easy to understand quickly, and so I will describe it verbally. I don't recommend it. For more details see http://www.icp.ucl.ac.be/~opperd/private/upgma.html

Revisiting beta-globins
[Figure: multiple alignment of the beta-globin sequences BG human, BG macaque, BG bovine, BG platypus, BG chicken and BG shark, with column rulers from 10 to 140. Legend: one symbol means "same as reference sequence", another means "deletion"; the symbols themselves did not survive extraction.]

Beta-globins: uncorrected pairwise distances
Distances between protein sequences, calculated over residues 1 to 147.
Below diagonal: observed number of differences.
Above diagonal: number of differences per 100 amino acids.

       hum   mac   bov   pla   chi   sha
hum      -     5    16    23    31    65
mac      7     -    17    23    30    62
bov     23    24     -    27    37    65
pla     34    34    39     -    29    64
chi     45    44    52    42     -    61
sha     91    88    91    90    87     -

Beta-globins: corrected pairwise distances
Distances between protein sequences, calculated over residues 1 to 147.
Below diagonal: observed number of differences.
Above diagonal: estimated number of substitutions per 100 amino acids.
Correction method: Jukes-Cantor.

       hum   mac   bov   pla   chi   sha
hum      -     5    17    27    37   108
mac      7     -    18    27    36   102
bov     23    24     -    32    46   110
pla     34    34    39     -    34   106
chi     45    44    52    42     -    98
sha     91    88    91    90    87     -

UPGMA tree for beta-globins
[Figure: UPGMA tree with leaves, top to bottom, BG shark, BG chicken, BG platypus, BG bovine, BG macaque, BG human.]

UPGMA tree: alternate form
[Figure: the same tree in an alternate layout with leaves BG shark, BG chicken, BG platypus, BG bovine, BG human, BG macaque.]

Example where UPGMA fails
An example of a tree where the minimal d_ij is not achieved by neighboring leaves. Using UPGMA with just the pairwise distances here would not lead us to the correct tree. Distances are additive, but the topology defeats UPGMA, which likes a molecular clock.
[Figure: unrooted four-leaf tree on A, B, C, D with branch lengths 1, 1, 1, 3, 3; A and B drawn at the top, D and C at the bottom.]

Building trees: distance methods
A more recent and much more satisfactory method in molecular evolution is the neighbour-joining approach (abbrev. NJ). It takes longer to explain, but I sketch it. There are many places where the details of this and other methods are given, including Durbin et al. (1998) and the recent excellent book by the master of this topic, Joseph Felsenstein: Inferring Phylogenies (Sinauer, 2004).

Neighbor Joining
Neighbor Joining assumes that the true tree is additive, but does not require a common evolutionary rate in all branches of the tree.
Define the following quantities:
L: nodes in the current tree
d_ij: distance between nodes i and j
$r_i = \frac{1}{|L| - 2} \sum_{j \in L} d_{ij}$
$D_{ij} = d_{ij} - (r_i + r_j)$

Neighbor Joining (continued)
An example of a tree where the minimal d_ij is not achieved by neighboring leaves.
[Figure: the same four-leaf tree as in the UPGMA example.]
Assuming additivity, however, $r_A = r_B = 6$, $r_C = r_D = 8$ and $D_{AD} < D_{AB}$, and NJ still infers the correct tree topology.
Exercise: Apply the following …
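
The four-leaf example used for UPGMA and neighbor joining can be checked numerically. The sketch below rests on an assumed reading of the garbled figure, chosen because it reproduces the quoted values r_A = r_B = 6 and r_C = r_D = 8: A and D are neighbors, B and C are neighbors, A and B sit on branches of length 1, C and D on branches of length 3, and the internal branch has length 1. All function and variable names are illustrative and do not come from the lecture.

```python
from itertools import combinations

# Assumed reconstruction of the slide's figure (not stated explicitly there):
# leaf branches A=1, B=1, C=3, D=3, internal branch = 1,
# with A next to D on one side and B next to C on the other.
leaf = {"A": 1, "B": 1, "C": 3, "D": 3}
internal = 1
side = {"A": 0, "D": 0, "B": 1, "C": 1}  # which end of the internal branch

def path_length(i, j):
    """Additive distance: sum of branch lengths on the path from i to j."""
    crosses = side[i] != side[j]
    return leaf[i] + leaf[j] + (internal if crosses else 0)

leaves = sorted(leaf)
d = {frozenset(p): path_length(*p) for p in combinations(leaves, 2)}

def r(i):
    """r_i = (1 / (|L| - 2)) * sum over j in L of d_ij."""
    total = sum(d[frozenset((i, j))] for j in leaves if j != i)
    return total / (len(leaves) - 2)

def D(i, j):
    """Neighbor-joining criterion D_ij = d_ij - (r_i + r_j)."""
    return d[frozenset((i, j))] - (r(i) + r(j))

# Pair with the smallest raw distance (what a clustering method joins first)
closest = min(combinations(leaves, 2), key=lambda p: d[frozenset(p)])
# Pair with the smallest D_ij (what neighbor joining joins first)
nj_pick = min(combinations(leaves, 2), key=lambda p: D(*p))

print("smallest d_ij:", closest)
print("smallest D_ij:", nj_pick)
print({i: r(i) for i in leaves})
```

Under these assumed branch lengths the smallest raw distance is between A and B, which are not neighbors, while D_AD and D_BC tie for the smallest value of the criterion, and both are true neighbor pairs.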
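
The topology counts quoted earlier in the lecture (3 and 15 rooted trees on 3 and 4 taxa, and the table for n = 5, 15, 20) can be reproduced with a few lines of code. This is a minimal sketch assuming the standard counting formulas for binary tip-labelled trees; the function names are illustrative.

```python
from math import factorial

def rooted_topologies(n: int) -> int:
    """Number of rooted, binary, tip-labelled trees on n >= 2 taxa."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def unrooted_topologies(n: int) -> int:
    """Number of unrooted, binary, tip-labelled trees on n >= 3 taxa."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

for n in (3, 4, 5, 15, 20):
    print(n, rooted_topologies(n), unrooted_topologies(n))
```

Integer division is exact here, so Python's arbitrary-precision integers give the large counts without rounding. The unrooted count on n taxa equals the rooted count on n - 1 taxa.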
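
The corrected beta-globin table names Jukes-Cantor as the correction method but does not give the formula. The sketch below assumes the usual 20-state (amino acid) analogue, d = -(19/20) ln(1 - (20/19) p), where p is the observed proportion of differing sites. It also assumes p is the difference count divided by 147; the lecture's software may have excluded gapped positions, so not every table entry need be reproduced exactly. The function name is illustrative.

```python
from math import log

def jukes_cantor_protein(p: float, states: int = 20) -> float:
    """Expected substitutions per site, given the observed proportion p
    of differing sites, under an equal-rates model with `states` states."""
    b = (states - 1) / states
    if not 0 <= p < b:
        raise ValueError("p must lie in [0, (states - 1) / states)")
    return -b * log(1 - p / b)

# Observed differences between human and macaque, bovine, chicken
for differences in (7, 23, 45):
    p = differences / 147  # assumed denominator: all 147 residues
    print(differences, round(100 * p), round(100 * jukes_cantor_protein(p)))
```

For these three human-row entries the sketch gives 5, 17 and 37 substitutions per 100 amino acids, in line with the corrected table. The correction grows quickly as p approaches 19/20, which is why the shark distances exceed 100.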

