Stat 401 B – Lecture 31

Model Selection
Response: Highway MPG
Explanatory: 13 explanatory variables
Indicator variables for types of car: Sports Car, SUV, Wagon, Minivan

Explanatory Variables
Engine size (liters)
Cylinders (number)
Horsepower
Weight
Wheel Base
Length
Width

“Best” Model
The 7-variable model with SUV, Minivan, All Wheel, Engine, Horsepower, Weight and Wheel Base appears to be the “best” model.

Prediction Equation
Predicted Highway MPG = 30.74 – 3.15*SUV – 3.28*Minivan – 2.08*All Wheel – 1.65*Engine – 0.0226*Horsepower – 0.0027*Weight + 0.163*Wheel Base

Summary
All variables add significantly.
R² = 0.705
adj R² = 0.682
RMSE = 3.430786
Cp = 4.9011

[Figure: Best Model residual plot, Residual vs. Predicted Highway MPG]

[Figure: Histogram and normal quantile plot of the residuals]

[Figure: Histogram and normal quantile plot of the standardized residuals]

Box Plot – Potential Outliers
Vehicle Name                      Highway MPG   Predicted MPG   Residual   Standardized Residual, z
Volkswagen Jetta GLS TDI 4dr               46         33.4267    12.5733                     3.6648
Toyota Prius 4dr (gas/electric)            51         35.3093    15.6907                     4.5735
Toyota Echo 2dr manual                     43         35.5299     7.4701                     2.1774
Honda Civic HX 2dr                         44         35.3787     8.6213                     2.5129

Bonferroni Correction
Adjust what counts as a small P-value, because all 100 residuals are being tested:
$$\frac{0.05}{\text{\# of residuals}} = \frac{0.05}{100} = 0.0005$$
If a P-value is less than 0.0005, then the standardized residual is statistically significant.

Standardized Residual
Vehicle Name                      Standardized Residual, z   Prob > |z|
Volkswagen Jetta GLS TDI 4dr                        3.6648      0.00025
Toyota Prius 4dr (gas/electric)                     4.5735      0.00000
Toyota Echo 2dr manual                              2.1774      0.02945
Honda Civic HX 2dr                                  2.5129      0.01197

Outliers
Both the Toyota Prius and the Volkswagen Jetta have standardized residuals so extreme that they are considered statistically significant (P-value < 0.0005).

Leverage
Because we have multiple explanatory variables, there is not an easy formula for leverage, h.
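Although there is no simple hand formula for h once there are several explanatory variables, regression software computes each h as a diagonal element of the hat matrix $H = X(X^\top X)^{-1}X^\top$, where X is the design matrix including a column of ones for the intercept. The NumPy sketch below shows one way to do this on simulated data (not the car data); `leverages` is a hypothetical helper name. It also illustrates a useful property: the h values always sum to p + 1, so the average leverage is (p + 1)/n.

```python
import numpy as np


def leverages(X):
    """Leverage h_i for each row: the diagonal of H = X (X'X)^{-1} X'.

    X holds the n x p explanatory variables (no intercept column); a column
    of ones is prepended so the fitted model has p + 1 coefficients.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    # (X'X)^{-1} X' via a linear solve rather than an explicit inverse
    xtx_inv_xt = np.linalg.solve(design.T @ design, design.T)
    # h_i = x_i' (X'X)^{-1} x_i, i.e. the diagonal of design @ xtx_inv_xt
    return np.einsum("ij,ji->i", design, xtx_inv_xt)


# Hypothetical simulated data: 100 observations, 7 explanatory variables
rng = np.random.default_rng(0)
n, p = 100, 7
X = rng.normal(size=(n, p))
X[0, :] = 6.0  # make row 0 extreme in every explanatory variable

h = leverages(X)
print(f"sum of h = {h.sum():.4f}")      # equals p + 1
print(f"average h = {h.mean():.4f}")    # equals (p + 1) / n
print(f"h for row 0 = {h[0]:.4f}, largest other h = {np.max(h[1:]):.4f}")
```

Because h depends on the whole row of explanatory variables, a vehicle can have high leverage by being unusual in one variable or moderately unusual in several at once.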
The leverage value, h, takes into account all of the explanatory variables.

Rule of Thumb
High leverage value if
$$h > 2\left(\frac{p+1}{n}\right)$$
With n = 100 and p = 7,
$$2\left(\frac{p+1}{n}\right) = 2\left(\frac{8}{100}\right) = 0.16$$

Leverage
There are 10 vehicles that have leverage, h, greater than 0.16. Of these, 2 have F-statistics large enough to produce P-values smaller than 0.0005.

Leverage
Vehicle Name                              h       F   Prob > F
Porsche 911 GT2 2 dr                 0.4084   8.852    0.00000
Chevrolet Corvette convertible 2 dr  0.2532   4.280    0.00039

Leverage
What makes the leverage so high? We have to look for extreme values of the explanatory variables.

Leverage
Chevy Corvette: has the 2nd largest engine of all the vehicles (5.7 liters) and the 2nd highest horsepower (350 horsepower).
Porsche 911: has the highest horsepower of all the vehicles (477 horsepower).

Influence – Cook’s D
None of the vehicles has a value of Cook’s D greater than 1. The largest value of Cook’s D is 0.16, for the Toyota Prius 4dr (gas/electric). The second largest is 0.12, for the VW Jetta.

Influence
Even though no vehicle has a Cook’s D greater than 1, you should still look at the studentized residuals.

[Figure: Histogram of the studentized residuals]

Studentized Residual
Vehicle Name                      Studentized Residual, rs   Prob > |rs|
Volkswagen Jetta GLS TDI 4dr                        3.7843       0.00027
Toyota Prius 4dr (gas/electric)                     4.5031       0.00001
Toyota Echo 2dr manual                              2.2471       0.02702
Honda Civic HX 2dr                                  2.5663       0.01189

Outliers
Both the Toyota Prius and the Volkswagen Jetta have studentized residuals so extreme that they are considered statistically significant (P-value < 0.0005).

Summary
Toyota Prius and Volkswagen Jetta are statistically significant outliers.
Chevy Corvette and Porsche 911 are statistically significant high leverage values.
Toyota Prius and Volkswagen Jetta exert statistically significant influence.

Response Highway MPG

Summary of Fit
RSquare                       0.70486
RSquare Adj                   0.682404
Root Mean Square Error        3.430786
Mean of Response              27.7
Observations (or Sum Wgts)    100

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       7        2586.1328       369.448   31.3881    <.0001*
Error      92        1082.8672        11.770
C. Total   99        3669.0000

Parameter Estimates
Term         Estimate    Std Error  t Ratio   Prob>|t|
Intercept     30.735611  6.190658      4.96   <.0001*
SUV          -3.147224   1.385628     -2.27   0.0255*
Minivan      -3.283013   1.436711     -2.29   0.0246*
All Wheel    -2.081883   1.01624      -2.05   0.0433*
Engine       -1.654325   0.738966     -2.24   0.0276*
Horsepower   -0.022587   0.008684     -2.60   0.0108*
Weight       -0.002688   0.001221     -2.20   0.0302*
Wheel Base    0.1632806  0.075565      2.16   0.0333*

Prediction Equation – All 100 vehicles
Predicted Highway MPG = 30.74 – 3.15*SUV – 3.28*Minivan – 2.08*All Wheel – 1.65*Engine – 0.0226*Horsepower – 0.0027*Weight + 0.163*Wheel Base

Response Highway MPG

Summary of Fit
RSquare                       0.770087
RSquare Adj                   0.752205
Root Mean Square Error        2.661825
Mean of Response              27.27551
Observations (or Sum Wgts)    98

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       7        2135.8830       305.126   43.0646    <.0001*
Error      90         637.6782         7.085
C. Total   97        2773.5612

Parameter Estimates
Term         Estimate    Std Error  t Ratio   Prob>|t|
Intercept     29.554037  4.84856       6.10   <.0001*
SUV          -2.543539   1.07867      -2.36   0.0205*
Minivan      -2.412591   1.12014      -2.15   0.0339*
All Wheel    -1.649593   0.790945     -2.09   0.0398*
Engine       -1.146582   0.57808      -1.98   0.0504
Horsepower   -0.016019   0.00682      -2.35   0.0210*
Weight       -0.00368    0.000959     -3.84   0.0002*
Wheel Base    0.1740468  0.059234      2.94   0.0042*

Prediction Equation – Excluding Prius and Jetta
Predicted Highway MPG = 29.55 – 2.54*SUV – 2.41*Minivan – 1.65*All Wheel – 1.15*Engine – 0.0160*Horsepower – 0.0037*Weight + 0.174*Wheel Base

Comment
Note that Engine has a P-value of 0.0504 and so is not significant at the 0.05 level. This suggests that there is a different “best” model if we exclude the Prius and the Jetta.
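The outlier and leverage screens that singled out these vehicles can be re-checked from the tabled values alone. The Python sketch below (standard library only) divides each residual by the RMSE, converts z to a two-sided normal P-value, and compares it with the Bonferroni cutoff 0.05/100 = 0.0005. For leverage it uses the statistic F = ((n − p − 1)/p)(h − 1/n)/(1 − h) on p and n − p − 1 degrees of freedom. The lecture does not state that formula, so treat it as an assumption; it does reproduce the tabled F values of 8.852 and 4.280. All helper names are hypothetical.

```python
import math

# Values reported in the lecture for the full-data "best" model
N_OBS = 100                       # number of vehicles
P_VARS = 7                        # explanatory variables in the model
RMSE = 3.430786                   # root mean square error
BONFERRONI_CUTOFF = 0.05 / N_OBS  # 0.0005


def standardized_residual(observed, predicted, rmse=RMSE):
    """Residual divided by the RMSE, matching the 'Potential Outliers' table."""
    return (observed - predicted) / rmse


def two_sided_p(z):
    """Two-sided normal P-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def leverage_f(h, n=N_OBS, p=P_VARS):
    """ASSUMED leverage F statistic with (p, n - p - 1) degrees of freedom.

    Not stated in the lecture; it reproduces the tabled F values.
    """
    return ((n - p - 1) / p) * (h - 1 / n) / (1 - h)


# (vehicle, observed Highway MPG, predicted MPG) from the outlier table
outlier_rows = [
    ("Volkswagen Jetta GLS TDI 4dr", 46, 33.4267),
    ("Toyota Prius 4dr (gas/electric)", 51, 35.3093),
    ("Toyota Echo 2dr manual", 43, 35.5299),
    ("Honda Civic HX 2dr", 44, 35.3787),
]

outliers = {}
for name, mpg, predicted in outlier_rows:
    z = standardized_residual(mpg, predicted)
    p_value = two_sided_p(z)
    significant = p_value < BONFERRONI_CUTOFF
    outliers[name] = (z, p_value, significant)
    print(f"{name}: z = {z:.4f}, P = {p_value:.5f}, significant: {significant}")

# (vehicle, leverage h) from the leverage table
leverage_rows = [
    ("Porsche 911 GT2 2 dr", 0.4084),
    ("Chevrolet Corvette convertible 2 dr", 0.2532),
]

cutoff = 2 * (P_VARS + 1) / N_OBS  # rule of thumb: 0.16
leverage_f_values = {}
for name, h in leverage_rows:
    f_stat = leverage_f(h)
    leverage_f_values[name] = f_stat
    print(f"{name}: h = {h} (> {cutoff:.2f}: {h > cutoff}), F = {f_stat:.3f}")
```

Only the Jetta and the Prius fall below the 0.0005 cutoff, in line with the lecture. The P-value for the leverage F statistic is the upper tail of an F distribution with 7 and 92 degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, 7, 92)` computes it.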
Comment
A similar thing happens if you exclude the Porsche 911 and the Chevy Corvette.

Multicollinearity
High correlation among explanatory variables is called multicollinearity. Multicollinearity inflates the standard errors of the coefficient estimates relative to what they would be if the explanatory variables were uncorrelated.

Variance Inflation Factor
A general measure of the effect of multicollinearity is the variance inflation factor, VIF:
$$VIF_i = \frac{1}{1 - R_i^2}$$

Multiple $R_i^2$
$R_i^2$ is the multiple $R^2$ obtained when explanatory variable i is regressed on the other k – 1 explanatory variables.
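The VIF definition translates directly into code: regress each explanatory variable on the others, take that regression's R², and compute 1/(1 − R²). The NumPy sketch below does this on simulated data; x1, x2 and x3 are hypothetical stand-ins, not variables from the car data, and x2 is deliberately built to be strongly correlated with x1. As a cross-check it also uses a standard identity: the VIFs equal the diagonal of the inverse of the correlation matrix of the explanatory variables.

```python
import numpy as np


def vif(X):
    """VIF_i = 1 / (1 - R_i^2) for each column i of X.

    R_i^2 is the R^2 from regressing column i on the other k - 1 columns
    (plus an intercept) by least squares.
    """
    n, k = X.shape
    factors = np.empty(k)
    for i in range(k):
        y = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        factors[i] = 1 / (1 - r2)
    return factors


# Hypothetical simulated explanatory variables (not the car data)
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)  # strongly correlated with x1
x3 = rng.normal(size=n)                   # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

factors = vif(X)
print(np.round(factors, 2))  # x1 and x2 inflated, x3 close to 1

# True: VIF_i is also the i-th diagonal element of the inverse correlation matrix
print(np.allclose(factors, np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))))
```

Since the variance of a coefficient estimate is multiplied by its VIF, a VIF near 10 for x1 and x2 means their standard errors are roughly three times (the square root of 10) what they would be with uncorrelated explanatory variables.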