UW-Madison STAT 572 - Model Modifications - Handouts

Model Modifications
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin–Madison
Statistics 572 (Spring 2007), February 6, 2007

Outline
- The Big Picture: Remedies after Model Diagnostics
- Example: Bacteria Count Example
- The Log Transformation: Review; Exponential Model; Example; Box-Cox Transformations
- Weighted Regression: Example

The Big Picture

Residual plots can indicate lack of model fit. There are several possible remedies, including:

1. Transform one or both variables and check whether the standard assumptions are reasonable for the transformed variable(s).
   - This might be useful when residual plots indicate non-linearity and/or heteroscedasticity.
   - Conventional transformations include logarithms and square roots.
   - The Box-Cox family of transformations is also useful.
2. Use weighted least squares when there is explainable heteroscedasticity but the linear model is otherwise fine.
3. Use polynomial regression when there is non-linearity (curvature) but the variances are close to constant.

Bacteria Count Example

The data consist of the number of surviving bacteria after exposure to X-rays for different time periods.
- Time denotes time measured in six-minute intervals.
- N denotes the number of survivors, in hundreds.

    Time    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
    N     355  211  197  166  142  166  104   60   56   38   36   32   21   19   15

Example (cont.)
- Begin by plotting the data.
- Fit a linear model.
- Assess the fit informally with residual plots.

Scatterplot

    > Time = 1:15
    > N = c(355, 211, 197, 166, 142, 166, 104, 60, 56, 38,
    +   36, 32, 21, 19, 15)
    > par(las = 1, pch = 16)
    > plot(Time, N)
    > fit1 = lm(N ~ Time)
    > plot(fit1, which = 1)

[Figure: scatterplot of N versus Time.]

Residual Plot

[Figure: residuals versus fitted values for fit1.]

- The scatterplot shows a lack of linearity.
- The residual plot also shows increasing variance.
- Observation #1 is a bit of an outlier.
- Consider transforming variables to see whether the model fits better.

Review of Exponentiation and Logarithms

- The constant $e \approx 2.718$ is the base of the natural logarithm.
- Recall from calculus that $e$ is the unique base for which the derivative equals the function: $\frac{d}{dx}\left(e^x\right) = e^x$.
- I will use log, not ln, to stand for the natural logarithm.
- Products of exponentials are exponentials of sums: $e^a \times e^b = e^{a+b}$.
- The natural logarithm of $e$ is one: $\log e = 1$.
- Any logarithm of 1 is zero: $\log 1 = 0$.
- Rule for exponents: $\log a^b = b \log a$.
- Logarithms of products: $\log(ab) = \log(a) + \log(b)$.
- $e^x$ exists for all $x$, and $e^x > 0$.
- $\log x$ is defined only for $x > 0$.

Exponential Model

Here there is a theoretical model:

$$n_t = n_0 e^{\beta_1 t} \times E,$$

where
- $t$ is time,
- $n_t$ is the number of bacteria at time $t$,
- $n_0$ is the number of bacteria at time $t = 0$,
- $\beta_1 < 0$ is a decay rate, and
- $E$ is some multiplicative error.

Take natural logs of both sides of the model, using $\log(ab) = \log(a) + \log(b)$ and $\log\left(e^{\beta_1 t}\right) = \beta_1 t \log e = \beta_1 t$:

$$\begin{aligned}
\log(n_t) &= \log\left(n_0 e^{\beta_1 t} E\right) = \log(n_0) + \log\left(e^{\beta_1 t}\right) + \log(E) \\
&= \log(n_0) + \beta_1 t + \log(E) \\
&= \beta_0 + \beta_1 t + e,
\end{aligned}$$

where $\beta_0 = \log(n_0)$ and $e = \log(E)$. That is, log-transforming $n_t$ yields the usual simple linear regression model, provided the error $E$ on the original scale is multiplicative and its logarithm is normally distributed.

Scatterplot (Log Transformation)

    > par(las = 1, pch = 16)
    > plot(Time, log(N))
    > fit2 = lm(log(N) ~ Time)
    > plot(fit2, which = 1)

[Figure: scatterplot of log(N) versus Time.]
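As a quick numerical check of the exponential model, here is a minimal R sketch (an illustration, not part of the original handout) that fits the straight line to log(N), back-transforms the intercept to estimate $n_0$, and compares the implied curve $\hat{n}_0 e^{\hat{\beta}_1 t}$ with the observed counts. The names `beta0_hat`, `beta1_hat`, `n0_hat`, `exp_curve`, and `comparison` are hypothetical helpers introduced only for this sketch.

```r
# Bacteria data from the slides (Time in six-minute intervals, N in hundreds)
Time = 1:15
N = c(355, 211, 197, 166, 142, 166, 104, 60, 56, 38,
      36, 32, 21, 19, 15)

# Straight-line fit on the log scale: log(N) = beta0 + beta1 * Time + e
fit2 = lm(log(N) ~ Time)
beta0_hat = coef(fit2)[["(Intercept)"]]
beta1_hat = coef(fit2)[["Time"]]

# Undo the log: beta0 = log(n0), so n0 is estimated by exp(beta0_hat)
n0_hat = exp(beta0_hat)

# Exponential curve n0 * exp(beta1 * t) implied by the log-scale fit
exp_curve = function(t) n0_hat * exp(beta1_hat * t)

# Compare observed counts with the back-transformed fitted curve
comparison = data.frame(Time = Time, N = N,
                        fitted = round(exp_curve(Time), 1))
print(comparison)
```

Because $\exp(\hat{\beta}_0 + \hat{\beta}_1 t) = e^{\hat{\beta}_0} e^{\hat{\beta}_1 t}$, exponentiating the fitted values of `fit2` reproduces this curve exactly, so the straight line on the log scale and the exponential curve on the original scale are two views of the same fit.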
Residual Plot (Log Transformation)

[Figure: residuals versus fitted values for fit2.]

- The diagnostics are consistent with a model that fits well.
- There is no obvious non-linearity.
- There is no obvious heteroscedasticity.
- The residual plot shows no large deviations from random scatter.

Fitted Model for Log-Transformed Data

    > summary(fit2)

    Call:
    lm(formula = log(N) ~ Time)

    Residuals:
          Min        1Q    Median        3Q       Max
    -0.233578 -0.091798 -0.007255  0.050165  0.413068

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)  6.028695   0.088259   68.31  < 2e-16 ***
    Time        -0.221629   0.009707  -22.83  7.1e-12 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.1624 on 13 degrees of freedom
    Multiple R-Squared: 0.9757,  Adjusted R-squared: 0.9738
    F-statistic: 521.3 on 1 and 13 DF,  p-value: 7.103e-12

The estimates are $\hat{\beta}_0 = 6.03$ and $\hat{\beta}_1 = -0.222$. On the original scale, $\exp(\hat{\beta}_0) = 415$.

Fitted model:

$$\hat{y} = 415 \times e^{-0.222x},$$

where
- $y$ = bacteria count, in hundreds, and
- $x$ = time, in 6-minute intervals.

Confidence and Prediction Intervals

    > t0 = data.frame(Time = 10)
    > predict(fit2, t0, interval = "c")
              fit     lwr      upr
    [1,] 3.812403 3.71256 3.912246
    > predict(fit2, t0, interval = "p")
              fit     lwr      upr
    [1,] 3.812403 3.44756 4.177246
    > exp(predict(fit2, t0, interval = "c"))
              fit      lwr      upr
    [1,] 45.25907 40.95853 50.01116
    > exp(predict(fit2, t0, interval = "p"))
              fit      lwr     upr
    [1,] 45.25907 31.42363 65.1861

- The back-transformed point estimate for the typical (median) bacteria count in the 10th time interval is 45.3 (hundred). Exponentiating a mean on the log scale estimates a median, not a mean, on the original scale.
- A 95% confidence interval for this median goes from 41 to 50.
- A 95% prediction interval for a new count goes from 31.4 to 65.2.

Other Transformations

We could transform $y$, $x$, or both. Common transformations include:
- natural log, ln or log
- log base 10, log10
- square root, $\sqrt{\cdot}$
- reciprocal, $1/y$

Less common transformations include:
- squaring, $y^2$
- reciprocal squaring, $1/y^2$
- cube root, $y^{1/3}$
- arcsine transformation, $\arcsin\sqrt{y}$, useful when $y$ is a proportion.

Box-Cox Transformations

Box-Cox transformations are a continuous family of power transformations.
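As a rough illustration of how a Box-Cox power can be chosen, the R sketch below profiles the log-likelihood over a grid of powers $\lambda$ for the bacteria data, using the straight-line model in Time. It assumes the common geometric-mean-scaled form of the transformation, $z^{(\lambda)} = (y^\lambda - 1)/(\lambda \dot{y}^{\lambda - 1})$ for $\lambda \neq 0$ and $z^{(0)} = \dot{y}\log y$, where $\dot{y}$ is the geometric mean of $y$; under that assumption the profile log-likelihood is $-\tfrac{n}{2}\log(\mathrm{RSS}/n)$ up to a constant. The function `boxcox_loglik` and the grid are hypothetical choices for this sketch; the `MASS` package's `boxcox()` function offers a packaged version of the same idea.

```r
# Bacteria data from the slides (Time in six-minute intervals, N in hundreds)
Time = 1:15
N = c(355, 211, 197, 166, 142, 166, 104, 60, 56, 38,
      36, 32, 21, 19, 15)

# Profile log-likelihood of the Box-Cox power lambda, up to an additive constant.
# Assumes the geometric-mean-scaled transform, so RSS values are comparable
# across different lambda.
boxcox_loglik = function(y, x, lambda) {
  n = length(y)
  gm = exp(mean(log(y)))  # geometric mean of y (requires y > 0)
  z = if (abs(lambda) < 1e-8) {
    gm * log(y)                                  # lambda = 0: scaled log transform
  } else {
    (y^lambda - 1) / (lambda * gm^(lambda - 1))  # scaled power transform
  }
  rss = sum(resid(lm(z ~ x))^2)
  -n / 2 * log(rss / n)
}

# Evaluate on a grid of powers and keep the maximizer
lambdas = seq(-1, 1, by = 0.05)
ll = sapply(lambdas, function(l) boxcox_loglik(N, Time, l))
lambda_hat = lambdas[which.max(ll)]
print(lambda_hat)
```

A value of `lambda_hat` near 0 would point toward the log transformation, and a value near 1 toward leaving N untransformed; the grid from -1 to 1 in steps of 0.05 is an arbitrary choice.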

