Model Modifications
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin–Madison
February 6, 2007

Statistics 572 (Spring 2007), Model Modifications, February 6, 2007

Outline
- The Big Picture
  - Remedies after Model Diagnostics
- Example
  - Bacteria Count Example
- The Log Transformation
  - Review
  - Exponential Model
  - Example
  - Box-Cox Transformations
- Weighted Regression
  - Example

The Big Picture: Remedies after Model Diagnostics

Residual plots can indicate lack of model fit. There are several possible remedies, including:

1. Transform one or both variables and check whether the standard assumptions are reasonable for the transformed variable(s).
   - This might be useful when residual plots indicate non-linearity and/or heteroscedasticity.
   - Conventional transformations include logarithms and square roots.
   - The Box-Cox family of transformations is also useful.
2. Use weighted least squares when there is explainable heteroscedasticity but the linear model is otherwise fine.
3. Use polynomial regression when there is non-linearity (curvature) but variances are close to constant.

Bacteria Count Example

- Data consist of the number of surviving bacteria after exposure to X-rays for different time periods.
- Time denotes time measured in six-minute intervals.
- N denotes the number of survivors in hundreds.

Time    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
N     355  211  197  166  142  166  104   60   56   38   36   32   21   19   15

Example (cont.)

- Begin by plotting the data.
- Fit a linear model.
- Assess fit informally with residual plots.

Bacteria Count Example: Scatterplot

    > Time = 1:15
    > N = c(355, 211, 197, 166, 142, 166, 104, 60, 56, 38,
    +     36, 32, 21, 19, 15)
    > par(las = 1, pch = 16)
    > plot(Time, N)
    > fit1 = lm(N ~ Time)
    > plot(fit1, which = 1)

[Figure: scatterplot of N against Time]
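As a numeric companion to the residual plot requested by `plot(fit1, which = 1)`, the following sketch tabulates the fitted values and residuals of the straight-line fit. The data and `fit1` come from the slides; the data frame name `resid_table`, its column names, and the helper variables `ends` and `middle` are introduced here for illustration only.

```r
# Bacteria data from the slides: Time in six-minute intervals, N in hundreds.
Time <- 1:15
N <- c(355, 211, 197, 166, 142, 166, 104, 60, 56, 38,
       36, 32, 21, 19, 15)

# Straight-line fit on the original scale, as in the slides.
fit1 <- lm(N ~ Time)

# resid_table is a hypothetical name used only in this sketch.
resid_table <- data.frame(
  Time     = Time,
  fitted   = round(fitted(fit1), 1),
  residual = round(residuals(fit1), 1)
)
print(resid_table)

# Residuals at the two ends of the time range versus the middle.
ends   <- residuals(fit1)[c(1, 15)]
middle <- residuals(fit1)[8]
print(ends)
print(middle)
```

If a straight line were adequate, the residual signs would show no pattern in Time. Here the residuals at both ends of the range are positive while the one at Time = 8 is negative, which is the kind of curvature the scatterplot suggests.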
Bacteria Count Example: Residual Plot

- The scatterplot shows a lack of linearity.
- The residual plot also shows increasing variance.
- Observation #1 is a bit of an outlier.
- Consider transforming variables to see if the model fits better.

[Figure: residuals against fitted values for fit1; observations 1, 8, and 15 are labeled]

Review of Exponentiation and Logarithms

- The constant $e \approx 2.718$ is the base of the natural logarithm.
- Recall from calculus that $e$ is the unique base for which the derivative equals the function: $\frac{d}{dx}(e^x) = e^x$.
- I will use log, not ln, to stand for the natural logarithm.
- Products of exponentials are exponentials of sums: $e^a \times e^b = e^{a+b}$.
- The natural logarithm of $e$ is one: $\log e = 1$.
- Any logarithm of 1 is zero: $\log 1 = 0$.
- Rule for exponents: $\log a^b = b \log a$.
- Logarithms of products: $\log(ab) = \log(a) + \log(b)$.
- $e^x$ exists for all $x$, and $e^x > 0$.
- $\log x$ is defined only for $x > 0$.

Exponential Model

Here there is a theoretical model:

$$n_t = n_0 e^{\beta_1 t} \times E,$$

where
- $t$ is time,
- $n_t$ is the number of bacteria at time $t$,
- $n_0$ is the number of bacteria at time $t = 0$,
- $\beta_1 < 0$ is a decay rate,
- $E$ is some multiplicative error.

Take natural logs of both sides of the model:

$$\log(n_t) = \log(n_0 e^{\beta_1 t} E) = \log(n_0) + \log(e^{\beta_1 t}) + \log(E) = \log(n_0) + \beta_1 t + \log(E) = \beta_0 + \beta_1 t + e,$$

where $\beta_0 = \log(n_0)$ and the error term is $e = \log(E)$ (here $e$ denotes the error, not the constant $e \approx 2.718$).

That is, we log-transformed $n_t$, and the result is the usual straight-line model, provided that the error $E$ on the original scale is multiplicative and its logarithm is normally distributed.

Log Transformation Example: Scatterplot

    > par(las = 1, pch = 16)
    > plot(Time, log(N))
    > fit2 = lm(log(N) ~ Time)
    > plot(fit2, which = 1)

[Figure: scatterplot of log(N) against Time]
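The log-linearization on the Exponential Model slide can be checked with a small simulation: generate counts from the multiplicative-error model with normally distributed log errors, then fit a straight line to the logged counts. This is only a sketch. The parameter values (`n0 = 400`, `beta1 = -0.2`, `sigma = 0.1`), the seed, and all variable names are assumptions chosen for illustration, not values taken from the slides.

```r
set.seed(572)  # arbitrary seed so the sketch is reproducible

# Hypothetical parameter values (assumptions, not estimates from the data).
n0    <- 400    # count at time 0
beta1 <- -0.2   # decay rate per time unit
sigma <- 0.1    # standard deviation of log(E)

t_sim <- 1:15
E     <- exp(rnorm(length(t_sim), mean = 0, sd = sigma))  # multiplicative error
n_sim <- n0 * exp(beta1 * t_sim) * E                      # exponential model

# On the log scale the model is a straight line:
# log(n_t) = log(n0) + beta1 * t + log(E)
fit_sim <- lm(log(n_sim) ~ t_sim)
est <- coef(fit_sim)

# Compare the estimates with the values used to simulate.
print(c(intercept = unname(est[1]), log_n0 = log(n0)))
print(c(slope = unname(est[2]), beta1 = beta1))
```

With a small error standard deviation, the fitted intercept and slope should land close to `log(n0)` and `beta1`, which is what the derivation predicts when the error is multiplicative on the original scale.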
Log Transformation Example: Residual Plot

- Diagnostics are consistent with a model that fits well.
- There is no obvious non-linearity.
- There is no obvious heteroscedasticity.
- The residual plot has no large deviations from random scatter.

[Figure: residuals against fitted values for fit2; observations 2, 6, and 10 are labeled]

Fitted Model for Log-Transformed Data

    > summary(fit2)

    Call:
    lm(formula = log(N) ~ Time)

    Residuals:
          Min        1Q    Median        3Q       Max
    -0.233578 -0.091798 -0.007255  0.050165  0.413068

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)  6.028695   0.088259   68.31  < 2e-16 ***
    Time        -0.221629   0.009707  -22.83  7.1e-12 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.1624 on 13 degrees of freedom
    Multiple R-Squared: 0.9757,     Adjusted R-squared: 0.9738
    F-statistic: 521.3 on 1 and 13 DF,  p-value: 7.103e-12

- $\hat\beta_0 = 6.03$, $\hat\beta_1 = -0.222$.
- On the original scale, $\exp(\hat\beta_0) = 415$.
- Fitted model: $y = 415 \times e^{-0.222x}$, where
  - $y$ = bacteria count in hundreds,
  - $x$ = time in 6-minute intervals.

Confidence and Prediction Intervals

    > t0 = data.frame(Time = 10)
    > predict(fit2, t0, interval = "c")
              fit     lwr      upr
    [1,] 3.812403 3.71256 3.912246
    > predict(fit2, t0, interval = "p")
              fit     lwr      upr
    [1,] 3.812403 3.44756 4.177246
    > exp(predict(fit2, t0, interval = "c"))
              fit      lwr      upr
    [1,] 45.25907 40.95853 50.01116
    > exp(predict(fit2, t0, interval = "p"))
              fit      lwr     upr
    [1,] 45.25907 31.42363 65.1861

- The point estimate for the mean bacteria count in the 10th time interval is 45.3.
- A 95% confidence interval goes from 41 to 50.
- A 95% prediction interval goes from 31.4 to 65.2.

Other Transformations

We could transform either y or x or both.

Common transformations include:
- natural log, ln or log
- log base 10, $\log_{10}$
- square root, $\sqrt{\cdot}$
- reciprocal, $1/y$

Less common transformations include:
- squaring, $y^2$
- reciprocal squaring, $1/y^2$
- cube root, $y^{1/3}$
- arcsine transformation, $\arcsin\sqrt{y}$, useful when $y$ is a proportion

Box-Cox Transformations

Box-Cox transformations are a continuous family of power transformations. The