IES 612/STA 4-573/STA 4-576
Spring 2005
Week 04 – IES612-lecture-week04.doc

F Test of any relationship between Y and the set of predictor variables

H0: $\beta_1 = \beta_2 = \dots = \beta_k = 0$
Ha: at least one $\beta_i \neq 0$
TS: $F_{obs} = \dfrac{SS(Reg)/k}{SS(Resid)/(n-k-1)} = \dfrac{MS(Reg)}{MS(Resid)}$
RR: Reject H0 if $F_{obs} > F_{\alpha,\,k,\,n-k-1}$

Example: Life Expectancy across different countries – any association?

$\text{LifeExpWomen} = \beta_0 + \beta_1\,\text{liter} + \beta_2 \log \text{GNP}$

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               2       5678.11397    2839.05698    151.54   <.0001
    Error              74       1386.40551      18.73521
    Corrected Total    76       7064.51948

H0: $\beta_1 = \beta_2 = 0$ [LIFE EXPECTANCY is not related to either LITERACY or LOGGNP]
H1: Either $\beta_1 \neq 0$ or $\beta_2 \neq 0$ or BOTH ($\beta_1 \neq 0$ AND $\beta_2 \neq 0$)
TS: $F_{obs} = 151.54$
P-value < 0.0001
Conclusion: Reject H0 and conclude LIFE EXPECTANCY is related to either LITERACY or LOGGNP or both.

10:11 Monday, January 14, 2019

(Partial) Test of $\beta_j$

H0: $\beta_j = 0$
Ha: $\beta_j \neq 0$ (two-sided), $\beta_j < 0$ (lower-tailed), or $\beta_j > 0$ (upper-tailed)
TS: $t_{obs} = \dfrac{b_j - 0}{SE(b_j)}$
RR: Reject H0 if
    $|t_{obs}| > t_{\alpha/2,\,n-k-1}$ for Ha: $\beta_j \neq 0$
    $t_{obs} < -t_{\alpha,\,n-k-1}$ for Ha: $\beta_j < 0$
    $t_{obs} > t_{\alpha,\,n-k-1}$ for Ha: $\beta_j > 0$
Conclusions: Reject/Fail-to-reject H0?
P-value:
    $2\,P(t_{n-k-1} > |t_{obs}|)$ for Ha: $\beta_j \neq 0$
    $P(t_{n-k-1} < t_{obs})$ for Ha: $\beta_j < 0$
    $P(t_{n-k-1} > t_{obs})$ for Ha: $\beta_j > 0$

Example: Life Expectancy across different countries – testing single regression parameters

$\text{LifeExpWomen} = \beta_0 + \beta_1\,\text{liter} + \beta_2 \log \text{GNP}$

    Parameter Estimates
    Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept     1             23.51270          2.96162      7.94     <.0001
    liter         1              0.20117          0.02678      7.51     <.0001
    loggnp        1              8.86394          1.22709      7.22     <.0001

H0: $\beta_2 = 0$ [the prediction of LIFE EXPECTANCY is NOT improved by adding LOGGNP to a model already containing LITERACY]
H1: $\beta_2 \neq 0$ [LOGGNP is needed in addition to LITERACY for predicting LIFE EXPECTANCY]
TS: $t_{obs} = 7.22$
P-value < 0.0001
Conclusion: Reject H0 and conclude that LOGGNP is a significant variable for modeling LIFE EXPECTANCY that adds to a model already containing LITERACY.

Testing a subset of the predictors [General Linear Test]

H0: $\beta_{g+1} = \beta_{g+2} = \dots = \beta_k = 0$ [implies that only g of the k predictor variables, i.e. g+1 of the k+1 regression parameters, are needed]
Ha: not H0 [more than the REDUCED model is needed]
TS: $F_{obs} = \dfrac{[SS(Reg, Complete) - SS(Reg, Reduced)]/(k-g)}{SS(Resid, Complete)/[n-k-1]}$
    $F_{obs} = \dfrac{\text{Extra SS}/(\#\text{ parameters constrained})}{MSE(Complete)}$

Example: Life Expectancy across different countries – all 5 variables needed?

"Complete"/"Full" model ->
$\text{LifeExpWomen} = \beta_0 + \beta_1\,\text{liter} + \beta_2 \log \text{GNP} + \beta_3 \log \text{AREA} + \beta_4 \log \text{POPN} + \beta_5\,\text{PCTURBAN}$

"Reduced" model ->
$\text{LifeExpWomen} = \beta_0 + \beta_1\,\text{liter} + \beta_2 \log \text{GNP}$

H0: $\beta_3 = \beta_4 = \beta_5 = 0$ [LogAREA, LogPOPN and PCTURBAN do not add to a model already containing LITERACY and LOGGNP]
H1: at least one of ($\beta_3$, $\beta_4$, $\beta_5$) $\neq 0$
TS: $F_{obs} = 0.30$ [see SAS output below]
P-value = 0.8223
Conclusion: Fail to Reject H0 and conclude that LogAREA, LogPOPN and PCTURBAN do not appear to significantly improve a LIFE EXPECTANCY model that already contains LITERACY and LOGGNP as predictor variables.

    /* SAS code for testing a subset of parameters in a model */
    data country;
      title 'country data analysis';
      infile "\\Casnov5\MST\MSTLab\Baileraj\country.data";  * reads a data file;
      input name $ area popnsize pcturban lang $ liter lifemen lifewom pcGNP;
      logarea = log10(area);
      logpopn = log10(popnsize);
      loggnp  = log10(pcGNP);
      drop area popnsize pcgnp;

    proc reg;
      title 'LIFEWOM predicted from PCTURBAN LITER LOGAREA LOGPOPN LOGGNP';
      model lifewom = pcturban liter logarea logpopn loggnp;
      test pcturban=logarea=logpopn=0;   ****** for testing subset;
    run;

    LIFEWOM predicted from PCTURBAN LITER LOGAREA LOGPOPN LOGGNP

    The REG Procedure
    Model: MODEL1
    Dependent Variable: lifewom

    Number of Observations Read                          79
    Number of Observations Used                          67
    Number of Observations with Missing Values           12

COMMENT: Some observations had missing values on one or more of the predictor variables. SAS deletes records that are not complete on ALL variables. You will see that the regression model with only LITER and LOGGNP as predictors has a different number of observations.

    Analysis of Variance
    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               5       4473.89310     894.77862     43.96   <.0001
    Error              61       1241.74869      20.35654
    Corrected Total    66       5715.64179

    Root MSE            4.51182    R-Square    0.7827
    Dependent Mean     64.77612    Adj R-Sq    0.7649
    Coeff Var           6.96525

    Parameter Estimates
    Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept     1             27.79999          4.53708      6.13     <.0001
    pcturban      1              0.02241          0.03757      0.60     0.5530
    liter         1              0.19211          0.03180      6.04     <.0001
    logarea       1             -0.41442          0.93342     -0.44     0.6586
    logpopn       1             -0.26259          1.06069     -0.25     0.8053
    loggnp        1              7.73888          1.81985      4.25     <.0001

COMMENT: The partial (single parameter) tests also cast doubt on whether PCTURBAN, LOGAREA and LOGPOPN add to the model; however, these don't test ALL of these variables simultaneously.

    Model: MODEL1
    Test 1 Results for Dependent Variable lifewom
    Source         DF   Mean Square   F Value   Pr > F
    Numerator       3       6.19064      0.30   0.8223
    Denominator    61      20.35654

How stable is the model fit? Are the predictor variables highly correlated?

Collinearity refers to the predictor variables being highly correlated – i.e. do the variables provide redundant information? It can be assessed in several ways:

1. Does a scatterplot (or pairwise r) of $X_i$ vs. $X_j$ suggest high correlation?
2. Is the $R^2$ when $X_i$ is predicted from $X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_k$ large? Or Tolerance = $1 - R^2$ small? Or VIF = $1/(1 - R^2)$ large? (say
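
The pairwise correlations in item 1 and the tolerance and VIF values in item 2 can be requested directly from SAS. The following is a minimal sketch and is not part of the original notes. It assumes the COUNTRY data set built by the data step earlier in this lecture is still available, and it uses PROC CORR together with the TOL and VIF options of the PROC REG MODEL statement, applied here to the complete five-predictor model.

```sas
/* Sketch only: collinearity checks for the complete model.          */
/* Assumes the COUNTRY data set from the earlier data step exists.   */

/* 1. Pairwise correlations among the five predictors */
proc corr data=country;
  var pcturban liter logarea logpopn loggnp;
run;

/* 2. Tolerance (1 - R2) and VIF (1/(1 - R2)) for each predictor,    */
/*    where R2 comes from regressing that predictor on the others    */
proc reg data=country;
  model lifewom = pcturban liter logarea logpopn loggnp / tol vif;
run;
quit;
```

With these options PROC REG reports a tolerance and a variance inflation value for each predictor alongside the parameter estimates. By default PROC CORR uses every observation available for each pair of variables, whereas PROC REG keeps only records that are complete on all model variables, so the two procedures may be working from different numbers of observations.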

