IES 612/STA 4-573/STA 4-576
Spring 2005
Week 03 – IES612-lecture-week03.doc
07:13 Monday, January 14, 2019

Checking Model Assumptions (OL 13.4) – an initial visit

RECALL: Basic Model

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$   ["simple linear regression"]
$\varepsilon_i \sim$ indep. $N(0, \sigma^2)$

Definition:

[Def 1] (Raw) Residuals = observed response - predicted response, or $e_i = y_i - \hat{y}_i$
[Def 2] (Standardized Residuals) $e_i / \sqrt{MSE} = e_i / \sqrt{s^2}$ = residual / std. dev.
[Def 3] (Studentized Residuals) $e_i / \sqrt{MSE\,(1 - h_{ii})} = e_i / \sqrt{s^2 (1 - h_{ii})}$ = residual / adj. SD

Assumption / Diagnostic? How do you check the assumption? / Remediation?

1. $E(\varepsilon_i) = 0$ -> $E(Y_i) = \beta_0 + \beta_1 X_i$ -> line is a reasonable model for describing mean change as a function of x
   Diagnostic:
     D1.1: Plot $e_i$ vs. $\hat{y}_i$
     D1.2: Plot $e_i$ vs. $x_i$ [check to see if pattern exists]
     D1.3: Plot $Y_i$ vs. $x_i$ and superimpose plot of $\hat{y}_i$ vs. $x_i$
     D1.4: Large $R^2$ / signif. slope
   Remediation:
     Curvature? Polynomial regression model or nonlinear regression model
     Smooth regression? LOWESS
     Transformation? Log/square root

2. $V(\varepsilon_i) = \sigma^2$ -> $V(Y_i) = \sigma^2$ -> constant variance -> scatter about the line is the same regardless of the value of x
   Diagnostic:
     D2.1: Plot $e_i$ vs. $\hat{y}_i$ [check to see if you have a constant band about zero]
   Remediation:
     Weighted Least Squares?
     Transformation

3. $\varepsilon_i \sim$ Normal
   Diagnostic:
     D3.1: Normal probability plot of $e_i$ [see if linear]
     D3.2: Histogram of residuals [bell-shaped?]
   Remediation:
     Transformation?
     Generalized Linear Models (e.g. logistic/probit regression for dichotomous responses; Poisson regression for count responses)

4. $\varepsilon_i$ independent
   Diagnostic:
     D4.1: Generally examining the design can suggest if this is true
     D4.2: Durbin-Watson test
   Remediation:
     Correlated regression models?
     Time series/spatial methods

5.* no important omitted variables {relates to pt. 1}
   Diagnostic:
     D5.1: Plot $e_i$ vs. omitted variables [see if pattern]
   Remediation:
     Add omitted variable to a model (multiple regression)

6.* no points exerting undue influence
   Diagnostic:
     D6.1: Look at statistics that quantify influence (e.g. DFBETAS, DFFITS, etc.)
     D6.2: Look for extreme X values (break in stemplots of X)
   Remediation:
     Smooth model-robust fitting procedure (e.g. Least Absolute Value regression)

7.* no extreme outliers impacting inference
   Diagnostic:
     D7.1: Large residual (e.g. |standardized/studentized residual| > 3, or > 2?)
     D7.2: Break in stemplot of residuals
   Remediation:
     Check to see if data sheet correct - fix? Don't simply omit. Report analysis both including/excluding point?

Example: Manatee Deaths predicted from Number of Boats Registered

    options ls=75;

    * columns: year (2-digit), boats registered, manatees killed;
    data example1;
      input year nboats manatees;
      cards;
    77 447 13
    78 460 21
    79 481 24
    80 498 16
    81 513 24
    82 512 20
    83 526 15
    84 559 34
    85 585 33
    86 614 33
    87 645 39
    88 675 43
    89 711 50
    90 719 47
    ;

    ODS RTF;
    *file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\linreg-output.rtf';

    proc reg;
      title 'Number of Manatees killed regressed on the number of boats registered in Florida';
      * p = predicted values, r = residual analysis;
      * cli/clm = limits for an individual prediction / for the mean response;
      model manatees = nboats / p r cli clm;
      plot manatees*nboats p.*nboats / overlay;   * data with fitted line superimposed;
      plot r.*nboats r.*p.;                       * residuals vs x and yhat;
      plot r.*nqq.;                               * normal qqplot;
    run;

    ODS RTF CLOSE;

Residuals plot – model adequate? Constant variance?

* now in Excel

[Figure: Plot of Residuals vs. Predicted. x-axis: Yhat (predicted response); y-axis: Residual]

[Figure: SAS plot of manatees*nboats with PRED*nboats overlaid. Fitted line: manatees = -41.43 + 0.1249 nboats; N = 14, Rsq = 0.8864, AdjRsq = 0.8769, RMSE = 4.2764]

* now in Excel

[Figure: Scatterplot of Manatee Deaths with superimposed fit. x-axis: Number of Boats (1000s); y-axis: Manatees Killed]

Studentized Residuals – outliers?

Output Statistics

    Obs   -2 -1  0  1  2   Cook's D
      1   |     |     |     0.017
      2   |     |**   |     0.178
      3   |     |**   |     0.149
      4   |   **|     |     0.091
      5   |     |     |     0.006
      6   |    *|     |     0.021
      7   | ****|     |     0.244
      8   |     |**   |     0.073
      9   |     |     |     0.005
     10   |    *|     |     0.015
     11   |     |     |     0.000
     12   |     |     |     0.000
     13   |     |*    |     0.091
     14   |     |     |     0.027

[Figure: Studentized Residuals by Observation (bar chart)]

Normal errors?
- Normal quantile-quantile plot

[Figure: normal quantile-quantile plot of the residuals from the fit manatees = -41.43 + 0.1249 nboats (N = 14, Rsq = 0.8864, AdjRsq = 0.8769, RMSE = 4.2764). x-axis: Normal Quantile; y-axis: Residual]

Multiple Regression (OL Chapter 12)

* More than one predictor variable

Example: Lung function in miners exposed to coal dust

$FEV_1 = \beta_0 + \beta_1\,COAL + \beta_2\,AGE + \beta_3\,HT + \beta_4\,SMOKING + \varepsilon$

Example: Polynomial regression

$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$
or
$Y = \beta_0 + \beta_1 (X - \bar{X}) + \beta_2 (X - \bar{X})^2 + \varepsilon$

Example: Indicator variables – e.g. different lines in different groups

$Y = \beta_0 + \beta_1 I_{group2} + \beta_2 X + \beta_3 I_{group2} X + \varepsilon$

where $I_{group2} = 1$ (group 2) and $I_{group2} = 0$ (group 1)

GROUP 1: $Y = \beta_0 + \beta_2 X + \varepsilon$
GROUP 2: $Y = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) X + \varepsilon$

So, GROUP 2 INTERCEPT differs from GROUP 1 intercept by $\beta_1$
GROUP 2 SLOPE differs from GROUP 1 slope by $\beta_3$

GENERAL FORM:

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i$
$\varepsilon_i \sim N(0, \sigma^2)$
$i = 1, \ldots, n$ (observations)
$n > k$ (variables)

Comments:

1. "LINEAR" model because the regression coefficients enter the model in a linear way – compare
   $Y = \beta_0 + \beta_1 X^3 + \beta_2 \sin(X) + \varepsilon$ and $Y = \beta_0 X^{\beta_1} + \varepsilon$

So, how does a multiple regression model (MR) differ from simple linear regression (SLR)?

i. SLR is the equation of a LINE; MR is the equation of a (hyper-)PLANE
ii. $\beta_0$ is the mean response when X = 0 in SLR while $\beta_0$ is the mean response when ALL X's = 0 in MR
iii. 2 regression coefficients in SLR; k+1 regression coefficients in MR
iv. interpretation of coefficients? Partial coefficients in MR
v. Model scope (space covered by the Xs)

Estimating regression coefficients

Least squares – minimize

$\sum_{i=1}^{n} [Y_i - E(Y_i)]^2 = \sum_{i=1}^{n} [Y_i - (\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik})]^2$

Estimate of $\sigma^2$

$s^2 = \hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - (k+1)} = \dfrac{\text{sum of squared residuals}}{(\text{number of observations}) - (\text{number of reg. parms})}$

F Test of any relationship between Y and set of predictor variables

H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
Ha: at least one of $\beta_i \neq 0$
TS: Fobs = [SS(Reg)/k] / [SS(Resid)/(n-k-1)] = MS(Reg)/MS(Resid)
RR: Reject H0 if Fobs > $F_{\alpha,\,k,\,\ldots}$
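
To make the least-squares estimate, the estimate of σ², and the F statistic above concrete, here is a minimal sketch for the k = 1 case, using the manatee data from the SAS example earlier in these notes. It is written in plain Python rather than SAS so that it runs without extra software. The function name `slr_fit` and the dictionary keys are illustrative choices, not part of the original notes. The printed values should agree, up to rounding, with the SAS fit reported earlier (manatees = -41.43 + 0.1249 nboats, Rsq 0.8864, RMSE 4.2764).

```python
# Manatee data copied from the SAS data step earlier in these notes.
nboats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
manatees = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]


def slr_fit(x, y):
    """Least-squares fit of y = b0 + b1*x (k = 1 predictor).

    Hypothetical helper for illustration; returns the estimates together
    with the sums of squares used in the overall F test.
    """
    n, k = len(x), 1
    xbar = sum(x) / n
    ybar = sum(y) / n

    # Closed-form least-squares estimates for a single predictor.
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    # Sums of squares: total = regression + residual.
    yhat = [b0 + b1 * xi for xi in x]
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    ss_total = sum((yi - ybar) ** 2 for yi in y)
    ss_reg = ss_total - ss_resid

    # s^2 = SS(Resid) / (n - (k + 1)); Fobs = MS(Reg) / MS(Resid).
    s2 = ss_resid / (n - (k + 1))
    f_obs = (ss_reg / k) / s2

    return {
        "b0": b0,
        "b1": b1,
        "r2": ss_reg / ss_total,
        "rmse": s2 ** 0.5,
        "f_obs": f_obs,
        "df": (k, n - k - 1),
    }


fit = slr_fit(nboats, manatees)
print("intercept = %.2f, slope = %.4f" % (fit["b0"], fit["b1"]))
print("Rsq = %.4f, RMSE = %.4f" % (fit["r2"], fit["rmse"]))
print("Fobs = %.2f on %d and %d df" % ((fit["f_obs"],) + fit["df"]))
```

With more than one predictor the same sums of squares apply, but the coefficients come from solving the normal equations rather than from the single-predictor closed form used here.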