Integrative Biology 200B
University of California, Berkeley, Spring 2009
"Ecology and Evolution"
NM Hallinan

Lab 6: Independent Contrasts

Today we're going to use R to derive and analyze independent contrasts for continuous data. First we're going to learn about some statistical models in R by analyzing the raw data from our last lab. Then we'll generate some phylogenetically independent contrasts and see how they affect our analyses. Finally, in the last section you will be given a second data set to analyze and explain.

Several other programs can do PIC, including CAIC, PDAP and the PDAP:PDTree module in Mesquite. We will stick with R for now, but you may want to explore some other options on your own.

A word to the wary: I will use two abbreviations in this lab, PC (Principal Component) and PIC (Phylogenetically Independent Contrast). These are very different things and you should be sure to keep them straight.

Statistical Analysis in R

First open your workspace from the last lab. Type "ls()" to see all your objects.

Statistical Distributions

R has a number of statistical distributions built into the software. These can be very useful for exploring and analyzing data. To see a list of distributions, look at the Intro to R documentation online. I will use the normal distribution as an example, which we are all familiar with. Later we will use the bivariate normal distribution and the binomial distribution. For any distribution you can add a prefix before the name of the distribution to create different functions (see the sketch at the end of this section):

r---(n, parameters) returns n random draws from the distribution.
q---(p, parameters) returns the value of the distribution at the given cumulative probability p.
p---(q, parameters) returns the cumulative probability for a given value q.
d---(x, parameters) returns the probability density at a given value x.

Let's compare our snout-vent length data to a normal distribution. One assumption of linear regression is that the variables are normally distributed.

hist(Anole.ordered[,1])
mn<-mean(Anole.ordered[,1])
stand<-sd(Anole.ordered[,1])

That looks pretty crappy. It looks like it has tons of positive skew. Let's see what the p-values look like for some of our biggest numbers:

quant<-pnorm(sort(Anole.ordered[,1]),mn,stand)
quant[28:30]

Considering that we only have 30 values here, the fact that 3 of them fall above the 98.7th percentile is more than a little suspicious. (What percentiles would you expect them to fall at?)

The p-value of a data point can be thought of as the expected fraction of data points that should be less than that value under the fitted distribution. Meanwhile, the empirical cumulative probabilities of the sampled data points, which are spread evenly between 0 and 1, give the fraction of the sample that actually is less than each value. If our data were in fact normal, the p-values should match up with those evenly spaced cumulative probabilities. Let's plot our p-values against some evenly distributed fractions and draw a straight line with slope equal to 1 to see if they are the same:

plot((1:30)/31,quant)
abline(0,1)

No, definitely not. On the flip side, let's see what value we expect at the 90th percentile:

qnorm(0.90,mn,stand)

While we actually got:

sort(Anole.ordered[,1],decreasing=TRUE)[3]

For the normal distribution you can make plots of these comparisons automatically using qqnorm, but you could make these plots yourself for any distribution.
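To pull this section together, here is a minimal self-contained sketch of the four prefix functions and the manual P-P plot. The vector x is a hypothetical stand-in for a skewed trait like snout-vent length (I simulate it with rexp), so the object names here are placeholders, not the lab's data:

# Sketch of the r/q/p/d prefix functions plus a manual P-P plot.
# x is a hypothetical, positively skewed stand-in for the real data.
set.seed(1)
x <- rexp(30, rate = 0.1)   # 30 skewed "measurements"
mn <- mean(x)
stand <- sd(x)

rnorm(5, mn, stand)         # r: five random draws from the fitted normal
qnorm(0.90, mn, stand)      # q: the value at the 90th percentile
pnorm(max(x), mn, stand)    # p: cumulative probability of the largest value
dnorm(mn, mn, stand)        # d: probability density at the mean

# Manual P-P plot: fitted cumulative probabilities against even fractions.
quant <- pnorm(sort(x), mn, stand)
plot((1:30)/31, quant, xlab = "expected fraction", ylab = "fitted p-value")
abline(0, 1)                # points far from this line suggest non-normality

For skewed data like this, the upper points bow away from the line, just as with the anole data above.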
Finally, let's look at what this distribution would look like if it had the same parameters but was actually normal:

hist(rnorm(30,mn,stand))
hist(rnorm(3000,mn,stand))

Parametric Correlation of Variables

Now let's plot all our data against each other to see what it looks like:

pairs(Anole.ordered)

As you can see, all these variables seem to be positively correlated. This is not surprising, since they are all measures of body size. For our initial analysis we are going to look at mass as a function of snout-vent length, so plot them against each other:

plot(Anole.ordered[,1],Anole.ordered[,2])

To test for this correlation we will use the cor.test function, based on Pearson's product-moment correlation:

cor.test(Anole.ordered[,1],Anole.ordered[,2],alternative="g")

alternative="g" makes the test one-tailed for a positive correlation between the variables. We can make this test one-tailed because we can a priori assume a positive relationship between measures of size. If you could not make this assumption, then you should keep the test two-tailed. A bunch of information will appear on the screen. At the bottom you are given the calculated correlation (the covariance divided by the product of the standard deviations) and a 95% confidence interval. Above that you see statistics testing the null hypothesis that there is no correlation between these data. The t-statistic has 28 degrees of freedom, and the null hypothesis is completely rejected.

Assumptions of Pearson's Correlation

That looks great, right? Low p-value, high R2? No, it sucks ass. The problem is that the data do not match the assumptions of the model; in particular, the data are not drawn from a bivariate normal distribution. Bivariate normality means not only that each variable is normally distributed, but that for every value of one variable the other variable is normally distributed. First let's test the assumption that each variable is normally distributed:

qqnorm(Anole.ordered[,1])
qqline(Anole.ordered[,1])

This is a plot of the quantiles of your data against the expected quantiles of a normal distribution. qqline puts a line through the first and third quartiles. If the distribution were close to normal, then all the dots would lie close to the line. As you can see they do not: the terminal values fall far from the line. Repeat this for your mass data as well.

Since neither variable is normally distributed, you know that together they cannot fit a bivariate normal distribution, but let's check anyway just for fun. This is a little tricky, so I wrote a function that plots the cumulative probability of our data against their p-values from the multivariate normal distribution. The values from these distributions should match exactly, so we'll add a line of slope 1 through the origin for comparison (a sketch of an equivalent check appears after this section). Load the mnormt package, open test.mvn.R, run it, and then:

qqmvn(Anole.ordered[,1:2])
abline(0,1)

Wow, that really sucks.

Transforming Data

Luckily we
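The code for test.mvn.R is not shown in this preview, so here is a hedged sketch of one standard, roughly equivalent check for multivariate normality: the squared Mahalanobis distances of multivariate normal data follow a chi-squared distribution with degrees of freedom equal to the number of variables. The function name qqmvn2 and the simulated data are my own placeholders, not necessarily what the lab's function does:

# Assumption-laden stand-in for a multivariate-normality check;
# test.mvn.R itself is not shown in the preview.
qqmvn2 <- function(X) {
  # Squared Mahalanobis distances of MVN data ~ chi-squared, df = ncol(X)
  d2 <- mahalanobis(X, colMeans(X), cov(X))
  n <- nrow(X)
  quant <- pchisq(sort(d2), df = ncol(X))
  plot((1:n)/(n + 1), quant,
       xlab = "empirical cumulative probability",
       ylab = "chi-squared p-value")
}

# Hypothetical usage with skewed stand-in data:
set.seed(1)
XY <- cbind(rexp(30, 0.1), rexp(30, 0.05))
qqmvn2(XY)
abline(0, 1)   # bivariate normal data would hug this line

For genuinely bivariate normal input, e.g. cbind(rnorm(30), rnorm(30)), the points fall along the line; for skewed data like the sketch above (or the anole measurements), they do not.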