Statistics 572: Statistical Methods for Bioscience II
Cécile Ané
Department of Statistics, University of Wisconsin-Madison
Spring 2010

Outline
1. Introduction
   - Course Information
   - Overview of Linear models
   - Expectations
   - Short review of simple linear regression

Course Information
www.stat.wisc.edu/courses/st572-ane
- Read the entire syllabus carefully.
- Complete the survey sheet.
- Switching sections.
- Late homework.
- Block the dates and times for the exams NOW: Tuesday, March 2; Tuesday, April 6-13; Friday, May 14, 12:25-2:25pm.
- No discussion this week.
- Possible room change for discussions.

Get help beyond lectures: reading materials, course website, forum, discussion sections, office hours, etc.
Your feedback is highly appreciated; your evaluations are most valuable to me.
Ask questions and get involved: forum on Learn@UW.

Why R? Why not Microsoft Excel?
Limitations of Microsoft Excel:
- 65K-row data size limit, little data protection, little or no tracking of changes.
- XL2000 has many errors, without warning: it can get negative correlation coefficients wrong, pie charts wrong, and the paired t-test with missing values wrong; it does not accept categorical predictors in multiple regression, etc.
- In XL2003, some bugs are fixed and new bugs are created; it still doesn't get distributions right. Many errors have been known for over 10 years without fixes.
References:
- McCullough & Wilson (2005). On the accuracy of statistical procedures in Microsoft Excel 2003. Computational Statistics & Data Analysis 49(4):1244-1252.
- Foresight: The International Journal of Applied Forecasting, issue 3 (2006):
  R. Hesse, "Incorrect Nonlinear Trend Curves in Excel";
  B. McCullough, "The Unreliability of Excel's Statistical Procedures";
  P. Fields, "On the Use and Abuse of Microsoft Excel".

Overview of Linear models
The course will cover multiple regression, multi-way ANOVA (ANOVA with multiple factors), linear models with random and mixed effects, and standard experimental designs. All these
are examples of linear models. They can provide biological insight: observations are treated as realizations from a model.
"All models are wrong, some are useful." (George Box)
In other words, no model accounts for all aspects of the underlying biology, but an appropriately selected model can be very useful.

Typically, we want to know how a response variable is related to one or more explanatory variables. These variables can be
- quantitative: discrete or on a continuous scale, or
- categorical: counts in each category.
A linear combination of the variables $X_1, \dots, X_k$ takes the form $\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$. Linear and generalized linear models use linear combinations of explanatory variables to explain the response variable.

Examples of Linear models

Linear regression
- example: soybean yield explained by hours of daylight and amount of nitrogen
- response: quantitative variable $Y$
- explanatory: one or more quantitative variables $X_1, \dots, X_k$
- model: $y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + e_i$
- error distribution: Normal, $e_i \sim N(0, \sigma^2)$

ANOVA
- example: nitrogen level in manure explained by diet treatment, period, and their interaction
- response: quantitative variable
- explanatory: one or more categorical variables
- model, for instance: $y_i = \mu + \alpha_{j(i)} + \beta_{k(i)} + (\alpha\beta)_{j(i)k(i)} + e_i$, where $j(i)$ and $k(i)$ are the levels of the two factors for observation $i$
- error distribution: Normal, $e_i \sim N(0, \sigma^2)$

Linear regression with both types
- example: milk yield explained by diet (4 treatments) and days
- response: quantitative variable
- explanatory: both quantitative and categorical variables
- model: $y_i = \beta_0 + \alpha_{j(i)} + \beta_1 X_{i1} + e_i$
- error distribution: Normal, $e_i \sim N(0, \sigma^2)$

Polynomial regression
- example: bacterial colonies (in log CFU) explained by temperature
- response: quantitative variable $Y$
- explanatory: one quantitative variable
- model: $y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_k X_i^k + e_i$
- error distribution: Normal, $e_i \sim N(0, \sigma^2)$

Mixed models
- example: percentage cover of vegetation explained by site (modeled as a random effect) and soil moisture
- response: quantitative variable
- explanatory: variables with both fixed and random effects
- model: $y_i = \beta_0 + a_{j(i)} + \beta_1 X_i + e_i$ and
  $a_j \sim N(0, \sigma_a^2)$
- error distribution: Normal, $e_i \sim N(0, \sigma^2)$

Repeated measures
- example: hormone concentration explained by individual and day
- response: quantitative variable
- explanatory: one or more variables, including a random effect for individual
- error distribution: Normal

Logistic regression
- example: seed germination explained by temperature and treatment
- response: categorical variable with 2 levels
- explanatory: one or more variables
- model: $P(y_i = 1)$ is a function of $\beta_0 + a_{j(i)} + \beta_1 X_{i1} + \beta_2 X_{i2}$
- error distribution: Binomial

Poisson regression
- example: number of seeds produced explained by treatment, light intensity, and age of the plant
- response: discrete variable (non-negative, integer-valued)
- explanatory: one or more variables
- model: $P(y_i = k)$ is a function of $\beta_0 + a_{j(i)} + \beta_1 X_{i1} + \beta_2 X_{i2}$
- error distribution: Poisson

Data request
I will illustrate each type of model with an example. Case studies will be most interesting if the examples relate to your own research. If you or someone in your lab has data that fall within the scope of these models, and you are willing and able to share them, please contact me.

Expectations: Computing using R
I will assume basic skills. The course webpage has resources.
R is extended with many packages developed by many people. We will use these, and possibly others:
- lattice has functions for graphics;
- lme4 and nlme have functions for mixed-effects models.
It is easy to install a package if the computer is connected to the internet. Start R, then at R's command line type install.packages("lattice") (only once). At each session, when you need the package, type library(lattice).

Good practice: keep assignments and projects in separate folders. Keep a plain text file (.r extension) with the list of commands needed to replicate what you have done.
Being able to use computing software is essential for analyzing your own data when the time comes. I expect you to experiment with R and try things on your own, so as to get a good understanding of how R works. Getting error and warning
messages is normal while experimenting. Don't get stuck: get help (forum, friends, TA, instructor).

Expectations: Assignments
Assignments must be written clearly.
- When including R commands and output, don't leave them on their own. Add comments explaining in English what the commands are doing, and interpret the results.
- When using graphs, include axis labels, a legend if necessary, etc.
Your second take-home midterm exam should look like a well-written report that a colleague in the field would be able to understand.

FEV data set 654
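Every linear model in the overview above is fitted by estimating the coefficients of the linear combination. With Normal errors, this estimation is ordinary least squares. The sketch below illustrates the idea in Python using only the standard library, rather than in R, which is the course software. The helper names `solve` and `least_squares` and the data are hypothetical. The sketch solves the normal equations $(X^\top X)\hat\beta = X^\top y$. It uses polynomial regression to show that a model containing $X_i$ and $X_i^2$ is still linear in the coefficients. In R, `lm()` fits the same kind of model, but it uses a QR decomposition rather than the normal equations.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(design: Matrix, y: List[float]) -> List[float]:
    """Estimate beta by solving the normal equations (X^T X) beta = X^T y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical, noise-free data from y = 1 + 2x + 3x^2 (not course data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]

# Polynomial regression is still a linear model: the columns 1, x, x^2
# enter the linear combination beta_0 + beta_1 x + beta_2 x^2.
design = [[1.0, x, x ** 2] for x in xs]
beta_hat = least_squares(design, ys)
print([round(b, 6) for b in beta_hat])  # prints [1.0, 2.0, 3.0]
```

With noise-free data, the estimates recover the true coefficients. Forming $X^\top X$ squares the condition number of the design matrix. For that reason, the normal-equations route serves here only as an illustration.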
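For the logistic and Poisson regression examples above, the slides say only that $P(y_i = 1)$ or $P(y_i = k)$ "is a function of" the linear combination. A common choice is the logit link for logistic regression and the log link for Poisson regression. These are also the defaults in R's `glm()` for the `binomial` and `poisson` families. The Python sketch below assumes those two links. The function names and coefficient values are hypothetical illustrations, not estimates from any data set.

```python
import math
from typing import Sequence


def linear_predictor(beta: Sequence[float], x: Sequence[float]) -> float:
    """Linear combination eta = beta_0 + beta_1*x_1 + ... + beta_k*x_k.

    beta[0] is the intercept; x holds the k explanatory values.
    """
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


def logistic_prob(eta: float) -> float:
    """Logistic regression, assuming the logit link: P(y = 1) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def poisson_prob(eta: float, k: int) -> float:
    """Poisson regression, assuming the log link: mean = exp(eta); returns P(y = k)."""
    mean = math.exp(eta)
    return mean ** k * math.exp(-mean) / math.factorial(k)


# Hypothetical coefficients: intercept -2.0, slope 0.1 per degree of temperature.
beta = [-2.0, 0.1]
eta = linear_predictor(beta, [20.0])
print(eta)                              # 0.0
print(logistic_prob(eta))               # 0.5
print(round(poisson_prob(eta, 0), 4))   # P(y = 0) when the mean is 1: 0.3679
```

With these hypothetical values, the linear combination at a temperature of 20 is 0. The predicted germination probability is therefore 0.5. Under the log link, the same linear combination gives a Poisson mean of $e^0 = 1$.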