Integrative Biology 200B
University of California, Berkeley
"Ecology and Evolution", Spring 2011
by Nat Hallinan, revised by Nick Matzke

Lab 11: Testing for Clade Imbalance

Today we are going to look at several different ways of testing clade imbalance. Mesquite has two limitations on its ability to test for this property: it can only simulate trees that have the same number of taxa as are in your character matrix, and it has no statistic to compare the bottom two nodes. Therefore, we will do what we can in Mesquite and then turn to more hands-on methods of making these calculations.

Colless's Imbalance

Colless (1982) proposed a way of measuring imbalance in a tree. It does not compare two clades but instead measures the overall imbalance throughout the tree. It is calculated as:

$$ I = \frac{\sum_{i \in \text{nodes}} \left| n_{li} - n_{ri} \right|}{(n-1)(n-2)/2} $$

where n_{li} and n_{ri} are the number of taxa descended from the left and right branches of node i respectively, and n is the total number of taxa in the tree. (n-1)(n-2)/2 is the maximum possible value of the sum, so this statistic runs from 0 to 1, with 0 being perfectly even and 1 completely lateralized.

Download the Colless_example.nex file from the website. In Mesquite, open the stored tree and calculate Colless's imbalance by hand: move up the tree and add up the difference between the right and left clades at each node, then divide by the denominator.

Is this value statistically significant? To find out, we have to use simulations to compare it to values from a null distribution of trees. Select Analysis > New Bar & Line Chart for > Trees, Simulated Trees. You will now be offered several different simulation options, which are discussed below. Essentially, each option represents a different null distribution of possible trees. Let's start with Uniform Speciation. 10 is good for the tree depth. In the Value to calculate for trees window select Show secondary choices and Colless's Imbalance.
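The hand calculation of Colless's imbalance described above can be cross-checked with a short script. This is a minimal sketch in Python (an assumption, since the lab itself uses Mesquite). It assumes the tree is fully bifurcating and written as nested pairs, with leaves given as taxon names. The function name `colless_imbalance` and the example trees are hypothetical and are not the tree in Colless_example.nex.

```python
def colless_imbalance(tree):
    """Colless's imbalance for a rooted binary tree written as nested 2-tuples.

    Leaves are any non-tuple values (for example taxon names as strings).
    """
    def walk(node):
        # A leaf contributes one taxon and no imbalance.
        if not isinstance(node, tuple):
            return 1, 0
        left, right = node
        n_left, sum_left = walk(left)
        n_right, sum_right = walk(right)
        # Add this node's |n_left - n_right| to the sums from its descendants.
        return n_left + n_right, sum_left + sum_right + abs(n_left - n_right)

    n, total = walk(tree)
    if n < 3:
        raise ValueError("the statistic needs at least three taxa")
    # Divide by the maximum possible sum, (n-1)(n-2)/2.
    return total / ((n - 1) * (n - 2) / 2)


# Hypothetical four-taxon examples: fully lateralized and perfectly even.
pectinate = ("a", ("b", ("c", "d")))
balanced = (("a", "b"), ("c", "d"))
print(colless_imbalance(pectinate))  # 1.0
print(colless_imbalance(balanced))   # 0.0
```

The two example trees sit at the extremes of the statistic, which makes them a convenient check before trying a larger tree.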
Let's do 999 simulations for a little power. That's a pretty chart, but we want a p-value. Use the text tab to get actual counts and use these numbers to calculate a p-value. The p-value is the proportion of simulated trees with a Colless's imbalance equal to or higher than that of your tree (the upper tail, if you want to show that the tree is particularly imbalanced): count those trees and divide by the total number of simulated trees. Is it significant?

Repeat this analysis, but this time use equiprobable trees for your null distribution. Are the p-values similar? What effect did the null distribution have on your p-values?

Clade Imbalance

For the rest of the lab we will be trying to determine whether two sister clades are of statistically different sizes. This will be a comparison for a single tree in which one clade is larger and has n taxa and the other clade has m (<n) taxa. We will be testing whether m is significantly different from n using a number of different null distributions.

Something to keep in mind throughout this lab is whether to do a one-tailed or a two-tailed test. A one-tailed test would be appropriate if you had a hypothesis that clades with a certain character (e.g., environment or morphology) should have more taxa than clades which do not have that character, and you are comparing a pair of sister clades in which one has the character and the other does not. However, in most situations you will first have identified that one clade is larger than the other, and only after the fact will you hypothesize a reason why. In this case a two-tailed test is appropriate.

Random Partition Trees

Random Partition Trees are those trees found by randomly dividing the taxa into either clade 1 or clade 2, then proceeding up the tree randomly dividing the taxa in the same way until you have a fully branching tree. Therefore there is a 50% chance of each taxon ending up in each of the bottom two clades, and it is very easy to calculate the probability that you will have a difference between clades as great as you do.
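Because every assignment of taxa to the two clades is equally likely under this model, that probability can also be found by direct counting. The sketch below is in Python (an assumption, since the lab itself works in a spreadsheet). It counts the assignments in which a designated clade receives no more taxa than the observed small clade, leaving out the two assignments that put every taxon in a single clade, because those would not produce a node. The function name and its arguments are hypothetical.

```python
from math import comb


def random_partition_p_one_tailed(small, big):
    """One-tailed probability that a designated clade gets `small` or fewer taxa.

    Each taxon is assigned to either clade with probability 0.5, and the
    assignments that leave one of the two clades empty are excluded.
    """
    if small < 1 or big < 1:
        raise ValueError("both clades must contain at least one taxon")
    total = small + big
    # Ways for the designated clade to receive 1, 2, ..., `small` taxa.
    favourable = sum(comb(total, k) for k in range(1, small + 1))
    # All 2**total assignments minus the two with an empty clade.
    possible = 2 ** total - 2
    return favourable / possible


# Hypothetical example: clades of 2 and 8 taxa give 55/1022.
print(random_partition_p_one_tailed(2, 8))
```

For a two-tailed test the lab doubles the one-tailed value.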
It should fit a binomial distribution with one exception. A binomial distribution includes the possibility that all the coin flips end up heads or tails, but in this case if all the taxa ended up in one clade, then we would not have a node. It is easy to calculate the probability of all taxa ending up in one particular clade, as (0.5)^n, where n here is the total number of taxa in the two clades combined. We can therefore calculate an appropriate p-value as:

$$ p_{\text{1-tailed}} = \frac{\text{binomial} - (0.5)^{n}}{1 - 2\,(0.5)^{n}} $$

We subtract the probability of them all ending up in one clade from the one tail, and the probability of them all ending up in either clade from the total probability.

Download and open the Excel spreadsheet labeled Comparing_Clades.xls. First we will calculate the one-tailed binomial distribution. In the cell labeled binomial type "=BINOMDIST(B2,A2+B2,0.5,TRUE)". A2 and B2 are the taxon totals for the big clade and small clade respectively, so that A2+B2 is the total number of attempts. 0.5 is the probability of ending up in each clade, and TRUE makes it calculate the cumulative binomial distribution, in other words the chance of getting a value that big or smaller.

Now calculate the p-value for our one-tailed distribution. First, in the box labeled 1/2^n type "=POWER(0.5,A2+B2)". In the box labeled p-1 tail type "=(B6-B7)/(1-2*B7)". Do you see how this fits the formula above? Your two-tailed p-value is now twice that value. Mess around with the values of your big and small clades to see how they affect your p-value. What's the p-value for our tree from the previous example?

Equiprobable Trees

Another possibility is that every single labeled topology is equally likely. There is an important distinction between labeled and unlabeled topologies. Unlabeled topologies are just the trees without any taxa, while labeled topologies have the taxa assigned to the branches.
Therefore, two unlabeled topologies can have different numbers of labeled topologies associated with them, and thus have different probabilities under an equiprobable trees null distribution.

For example, consider the two following unlabeled topologies for four taxa. How many possible labeled topologies can you count on each one? Remember that rotating a branch does not change the topology.

One consequence of this is that imbalanced unlabeled topologies have more possible labeled topologies associated with them than balanced ones do. Thus your p-value for imbalance from a labeled distribution will be lowered relative to one that uses only
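The four-taxon counts asked for above can be checked by brute force once you have tried them by hand. The following is a minimal sketch in Python (an assumption; the lab does not use it) that lists every rooted, fully bifurcating labeled topology on a set of taxa and groups them by unlabeled shape. The helper names `labeled_trees` and `shape`, and the taxon names a to d, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def labeled_trees(taxa):
    """Yield every rooted binary labeled topology on `taxa`, each exactly once."""
    taxa = tuple(taxa)
    if len(taxa) == 1:
        yield taxa[0]
        return
    first, rest = taxa[0], taxa[1:]
    # Keep the first taxon in the left clade so that rotating a branch
    # (swapping left and right) is never counted as a new topology.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(t for t in rest if t not in extra)
            for left_tree in labeled_trees(left):
                for right_tree in labeled_trees(right):
                    yield (left_tree, right_tree)


def shape(tree):
    """Return a canonical string for the unlabeled topology of `tree`."""
    if not isinstance(tree, tuple):
        return "*"
    a, b = sorted(shape(child) for child in tree)
    return "(" + a + b + ")"


counts = Counter(shape(t) for t in labeled_trees("abcd"))
for unlabeled, number in sorted(counts.items()):
    print(unlabeled, number)
```

For four taxa this prints 3 labeled topologies for the balanced shape `((**)(**))` and 12 for the imbalanced shape `(((**)*)*)`, 15 in total.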

