Stat 401 B – Lecture 20

1. Quadratic Model
In order to account for curvature in the relationship between an explanatory and a response variable, one often adds the square of the explanatory variable to the simple linear model.

2. Quadratic Model
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$
Conditions on $\varepsilon$:
Independent
Identically distributed
Normally distributed with common standard deviation, $\sigma$

3. Example
Response, Y: Population of the U.S. (millions)
Explanatory, X: Year the census was taken.

4. Quadratic Model
Predicted Population = 21006.1 – 23.3785*Year + 0.00651*Year²
We cannot interpret the estimated slope coefficients because we cannot change Year by 1 while holding Year² constant.

5. Model Utility
F = 8050.89, P-value < 0.0001
The small P-value indicates that the quadratic model relating population to Year and Year² is statistically significant (useful).

6. Statistical Significance
Year (added to Year²): t = –33.48, P-value < 0.0001
The P-value is small, therefore the addition of Year is statistically significant.

7. Statistical Significance
Year² (added to Year): t = 35.22, P-value < 0.0001
The P-value is small, therefore the addition of Year² is statistically significant.

8. Quadratic Model
R² = 0.999, or 99.9% of the variation in population can be explained by the quadratic model.
RMSE = 2.77

9. Summary - Quadratic
The model is useful.
Each term is a statistically significant addition.
99.9% of the variation in population is explained by the quadratic model.

10. Prediction
Year 2000
Predicted Population = 21006.1 – 23.3785*(2000) + 0.0065063*(2000)² = 274.3 million
Not bad, as the actual figure in 2000 was 281.422 million.

11. Prediction
Year 1800
Predicted Population = 21006.1 – 23.3785*(1800) + 0.0065063*(1800)² = 5.212 million
Very close to the actual value of 5.308 million.

12. [Figure: Population versus Year, 1750 to 2000]

13. [Figure: Residual versus Year, 1750 to 2000]

14. Plot of Residuals
The residuals wiggle around the zero line.
Hard to say whether this is a pattern or not.
The residuals for 1940 and 1950 stick out. The quadratic model overpredicts for these years.

15. Can we do better?
Could try higher order polynomial terms like Year³ or Year⁴.
Year³ is not statistically significant in a cubic model.
Year⁴ is not statistically significant in a quartic model.

16. Quadratic Model
There is still the issue of trying to interpret the coefficients in the quadratic model.
Again, creating a new explanatory variable, Year², has introduced multicollinearity into the quadratic model.

17. [Figure: scatterplot matrix of Year and YearSqr]

18. Correlation
Year and Year²
Correlation: r = 0.9999
For the values that Year takes on, there is an extremely strong positive linear correlation with Year².

19. Centering
Center Year by subtracting off the mean before constructing the squared term in the quadratic model.
Mean year is 1890.

20. Quadratic Model
Predicted Population = –2235.197 + 1.215*Year + 0.00651*(Year – 1890)²
Note that the estimated slope for Year is exactly the same as in the simple linear model.

21. [Figure: scatterplot matrix of Year and YearCtrSqr]

22. Correlation
Year and (Year – 1890)²
Correlation: r = –0.0000
For the values that Year takes on, there is no linear correlation with (Year – 1890)².

23. Centering
Centering has completely removed the multicollinearity resulting from the inclusion of the quadratic term in the quadratic model.

24. Quadratic Model
Predicted Population = 61.926 + 1.215*(Year – 1890) + 0.00651*(Year – 1890)²
The predicted population in 1890 is 61.926 million.

25. Quadratic Model
Predicted Population = 61.926 + 1.215*(Year – 1890) + 0.00651*(Year – 1890)²
For each additional year, the population goes up, on average, 1.215 million.

26. Quadratic Model
Predicted Population = 61.926 + 1.215*(Year – 1890) + 0.00651*(Year – 1890)²
In addition to the average change per year, there is a bigger adjustment to this rate of change the further away you are from 1890.

27. One Year Change
1880: Pred = 50.427 million
1890: Pred = 61.926 million
Difference of 11.499
1980: Pred = 224.007
1990: Pred = 248.526
Difference of
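
The centering argument and the predictions from the centered model can be checked numerically. The sketch below is only an illustration, not the software used in the lecture: it assumes the data are the decennial census years 1790 through 1990 (an assumption chosen because those years average to the stated mean of 1890), and the helper names `predict_centered` and `corr` are hypothetical. The coefficients are the rounded values shown on the slides.

```python
import numpy as np

# Assumption: one census every 10 years from 1790 to 1990 (mean year 1890).
years = np.arange(1790, 2000, 10, dtype=float)


def predict_centered(year, center=1890.0):
    """Centered quadratic fit from the slides; population in millions."""
    d = year - center
    return 61.926 + 1.215 * d + 0.00651 * d ** 2


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Multicollinearity before and after centering the squared term.
r_raw = corr(years, years ** 2)
r_centered = corr(years, (years - years.mean()) ** 2)

# Change in predicted population over two different decades.
change_1880s = predict_centered(1890.0) - predict_centered(1880.0)
change_1980s = predict_centered(1990.0) - predict_centered(1980.0)

print(f"corr(Year, Year^2)            = {r_raw:.4f}")
print(f"corr(Year, (Year - mean)^2)   = {r_centered:.4f}")
print(f"predicted change 1880 to 1890 = {change_1880s:.3f} million")
print(f"predicted change 1980 to 1990 = {change_1980s:.3f} million")
```

Because the assumed years are symmetric about their mean, the centered squared term is uncorrelated with Year, while the uncentered square is almost perfectly correlated with it. The two decade changes differ because the quadratic term adds more the further the year is from 1890.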