UW-Madison STAT 572 - Model Modifications - Handouts

Model Modifications
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin-Madison
February 6, 2007
Statistics 572 (Spring 2007)

The Big Picture: Remedies after Model Diagnostics

Residual plots can indicate lack of model fit. There are several possible remedies, including:

1. Transform one or both variables and check if the standard assumptions are reasonable for the transformed variable(s).
   - Might be useful when residual plots indicate non-linearity and/or heteroscedasticity.
   - Conventional transformations include logarithms and square roots.
   - The Box-Cox family of transformations is also useful.
2. Use weighted least squares when there is explainable heteroscedasticity but the linear model is otherwise fine.
3. Use polynomial regression when there is non-linearity (curvature) but variances are close to constant.

Example: Bacteria Count

Data consist of the number of surviving bacteria after exposure to X-rays for different time periods.
- Time denotes time measured in six-minute intervals.
- N denotes the number of survivors, in hundreds.

    Time    N      Time    N
       1  355         9   56
       2  211        10   38
       3  197        11   36
       4  166        12   32
       5  142        13   21
       6  166        14   19
       7  104        15   15
       8   60

Example (cont.)

- Begin by plotting data.
- Fit a linear model.
- Assess fit informally with residual plots.

Scatterplot

    > Time <- 1:15
    > N <- c(355, 211, 197, 166, 142, 166, 104, 60, 56, 38, 36, 32, 21, 19, 15)
    > par(las = 1, pch = 16)
    > plot(Time, N)
    > fit1 <- lm(N ~ Time)
    > plot(fit1, which = 1)

[Figure: scatterplot of N against Time.]

Residual Plot

[Figure: residuals against fitted values for fit1, with observations 1, 8, and 15 labeled.]

- Scatterplot shows lack of linearity.
- Residual plot also shows increasing variance.
- Observation 1 is a bit of an outlier.
- Consider transforming variables to see if the model fits better.

The Log Transformation: Review of Exponentiation and Logarithms

- The constant $e \approx 2.718$ is the base of the natural logarithm.
- Recall from calculus: $e$ is the unique base where the derivative equals the function, $\frac{d}{dx} e^x = e^x$.
- I will use log, not ln, to stand for the natural logarithm.
- Products of exponentials are exponentials of sums: $e^a e^b = e^{a+b}$.
- The natural logarithm of $e$ is one: $\log(e) = 1$.
- Any logarithm of 1 is zero: $\log(1) = 0$.
- Rule for exponents: $\log(a^b) = b \log(a)$.
- Logarithms of products: $\log(ab) = \log(a) + \log(b)$.
- $e^x$ exists for all $x$, and $e^x > 0$.
- $\log(x)$ is defined only for $x > 0$.

Exponential Model

Here there is a theoretical model:

$$n_t = n_0 e^{\beta_1 t} \times E$$

where
- $t$ is time;
- $n_t$ is the number of bacteria at time $t$;
- $n_0$ is the number of bacteria at time $t = 0$;
- $\beta_1 < 0$ is a decay rate;
- $E$ is some multiplicative error.

Take natural logs of both sides of the model:

$$\log(n_t) = \log(n_0 e^{\beta_1 t} \times E) = \log(n_0) + \log(e^{\beta_1 t}) + \log(E) = \log(n_0) + \beta_1 t + \log(E) = \beta_0 + \beta_1 t + e$$

where $\beta_0 = \log(n_0)$ and the error term is $e = \log(E)$ (not the constant $e$). That is, we log-transformed $n_t$ and the result is a usual linear model if the error $E$ on the original scale is multiplicative and its logarithm is normally distributed.

Example: Scatterplot

    > par(las = 1, pch = 16)
    > plot(Time, log(N))
    > fit2 <- lm(log(N) ~ Time)
    > plot(fit2, which = 1)

[Figure: scatterplot of log(N) against Time.]

Example: Residual Plot

[Figure: residuals against fitted values for fit2, with observations 2, 6, and 10 labeled.]

- Diagnostics are consistent with a model that fits well.
- There is no obvious non-linearity.
- There is no obvious heteroscedasticity.
- Residual plot has no large deviations from random scatter.
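The log-linearization of the exponential model can be checked with a small simulation. The sketch below is not from the original slides: the values of `n0`, `beta1`, and the error standard deviation are hypothetical choices made only for illustration. It generates counts with multiplicative log-normal error and then fits a straight line to the logged counts; the intercept should land near log(n0) and the slope near beta1.

```r
# Simulate n_t = n0 * exp(beta1 * t) * E with log(E) ~ Normal(0, sd),
# then fit a linear model on the log scale.
set.seed(572)

t     <- 1:15
n0    <- 400     # hypothetical starting count (assumption)
beta1 <- -0.2    # hypothetical decay rate (assumption)
sd_e  <- 0.15    # hypothetical sd of log(E) (assumption)

E <- exp(rnorm(length(t), mean = 0, sd = sd_e))  # multiplicative error
n <- n0 * exp(beta1 * t) * E

fit <- lm(log(n) ~ t)
est <- coef(fit)

# Compare estimates with the values used to simulate.
print(rbind(estimate = est, truth = c(log(n0), beta1)))
```

Because the error enters multiplicatively, the spread of `n` shrinks as the counts decay, while the spread of `log(n)` around the line stays roughly constant; this is the pattern the two residual plots in the bacteria example show.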
Example: Fitted Model for Log Transformed Data

    > summary(fit2)

    Call:
    lm(formula = log(N) ~ Time)

    Residuals:
          Min        1Q    Median        3Q       Max
    -0.233578 -0.091798 -0.007255  0.050165  0.413068

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)  6.028695   0.088259   68.31  < 2e-16 ***
    Time        -0.221629   0.009707  -22.83  7.1e-12 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.1624 on 13 degrees of freedom
    Multiple R-Squared: 0.9757,     Adjusted R-squared: 0.9738
    F-statistic: 521.3 on 1 and 13 DF,  p-value: 7.103e-12

Fitted Model

- $\hat\beta_0 = 6.03$, $\hat\beta_1 = -0.222$.
- On the original scale, $\exp(\hat\beta_0) \approx 415$.
- $\hat{y} = 415\, e^{-0.222 x}$, where $y$ is the bacteria count in hundreds and $x$ is time in 6-minute intervals.

Example: Confidence and Prediction Intervals

    > t0 <- data.frame(Time = 10)
    > predict(fit2, t0, interval = "c")
           fit     lwr      upr
    1 3.812403 3.71256 3.912246
    > predict(fit2, t0, interval = "p")
           fit     lwr      upr
    1 3.812403 3.44756 4.177246
    > exp(predict(fit2, t0, interval = "c"))
           fit      lwr      upr
    1 45.25907 40.95853 50.01116
    > exp(predict(fit2, t0, interval = "p"))
           fit      lwr     upr
    1 45.25907 31.42363 65.1861

- The point estimate for the mean bacteria count (in hundreds) in the 10th time interval is 45.3.
- A 95% confidence interval goes from 41 to 50.
- A 95% prediction interval goes from 31.4 to 65.2.

Other Transformations

We could transform either $y$ or $x$ or both.

Common transformations include:
- natural log (ln or log)
- log base 10 (log10)
- square root
- reciprocal ($1/y$)

Less common transformations include:
- Squaring ($y^2$)
- Reciprocal squaring ($1/y^2$)
- Cube root ($y^{1/3}$)
- Arcsin transformation ($\arcsin\sqrt{y}$), useful when $y$ is a proportion

Box-Cox Transformations

- Box-Cox transformations are a continuous family of power transformations.
- The log transformation corresponds to a power of 0.
- The Box-Cox transformation is

$$y^{(\lambda)} = \begin{cases} \dfrac{y^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log y & \text{if } \lambda = 0 \end{cases}$$

- In R this transformation is in the library MASS, in a function boxcox.
- In theory we can estimate $\lambda$ as a continuous parameter.
- In practice we might select a common transformation power close to the continuous estimate.

…
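As an illustration of the Box-Cox approach, here is a minimal sketch that applies `boxcox` from MASS to the bacteria data. It is not taken from the slides: the grid of λ values, the data frame name `bacteria`, and the likelihood-based interval are choices made here for illustration. With `plotit = FALSE`, `boxcox` returns the λ grid (`x`) and the profile log-likelihood (`y`) instead of drawing the plot.

```r
library(MASS)

# Bacteria data from the example (N in hundreds, Time in 6-minute intervals).
bacteria <- data.frame(
  Time = 1:15,
  N    = c(355, 211, 197, 166, 142, 166, 104, 60, 56, 38, 36, 32, 21, 19, 15)
)

# Profile log-likelihood over a grid of lambda values (grid is an assumption).
bc <- boxcox(N ~ Time, data = bacteria,
             lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Continuous estimate: the grid value that maximizes the profile log-likelihood.
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambda values whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum.
in_ci     <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[in_ci])

cat("lambda estimate:", lambda_hat, "\n")
cat("approximate 95% interval:", lambda_ci, "\n")
```

In practice one would then pick a conventional power inside or near this interval, such as 0 (log) or 0.5 (square root), rather than the raw estimate.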

