IES 612/STA 4-573/STA 4-576
Spring 2005
Week 05 – IES612-lecture-week05.doc

Model-Building Comments

* Brain-first, computer-second – pick variables for inclusion in the model that are important based on your knowledge of the system. This doesn't exclude the possibility of exploratory analyses; it simply asks for careful consideration of the variables that are candidates for inclusion in a model.

* Hierarchical Principle: if you include a higher-order term, include all of its lower-order constituent terms. For example, the linear term "X" should be in the model if the quadratic term "X^2" is present, and the main effects "X1" and "X2" should be in the model if the interaction term "X1*X2" is included.

Aside: interactions allow the relationship between the response variable "Y" and one predictor, say "X1", to vary with the level of another predictor, "X2". We will discuss this in more detail when we consider factorial designs in ANOVA models.

SYNERGY = the observed response is greater than would be predicted from the additive effects of the two variables [e.g., lung cancer as a function of smoking and asbestos exposure].

ANTAGONISM = the observed response is less than would be predicted from the additive effects of the two variables [e.g., competition for the same binding sites].

* Principle of Parsimony: the model should be no more complicated than required to describe the response.

Variable selection methods

i. ALL POSSIBLE REGRESSIONS (SAS Proc RSQUARE)

Suppose you have 5 possible predictor variables – X1, X2, X3, X4, X5 – then you have 2^5 - 1 = 31 possible models:

# variables present   models
1                     X1; X2; X3; X4; X5
2                     X1 X2; X1 X3; X1 X4; X1 X5; X2 X3; X2 X4; X2 X5; X3 X4; X3 X5; X4 X5
3                     X1 X2 X3; X1 X2 X4; X1 X2 X5; X1 X3 X4; X1 X3 X5; X1 X4 X5; X2 X3 X4; X2 X3 X5; X2 X4 X5; X3 X4 X5
4                     X1 X2 X3 X4; X1 X2 X3 X5; X1 X2 X4 X5; X1 X3 X4 X5; X2 X3 X4 X5
5                     X1 X2 X3 X4 X5

Which model do you select?

1. Look for models where adding another variable yields only a small incremental increase in R2 (or a small incremental decrease in SSE) – i.e., look for the "elbow" in these indices.
2. Adjusted R2 = 1 - [(n-1)/(n-p)]*SSE/SST and MSE = SSE/(n-p) adjust for the number of parameters in the model and are frequently used.
3. Cp is an index of model bias – best if it is close in value to the number of parameters in the model.
4. PRESS statistic = PREdiction Sum of Squares criterion. Similar to SSE; however, each point is predicted from the remaining n-1 observations (i.e., delete the ith case and then predict it).
5. After you select a model, you still need to do all of the adequacy checks discussed previously.
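Written out – using the standard textbook definitions, which the notes do not spell out beyond the adjusted R2 expression above – these criteria are, with n observations and p parameters (including the intercept):

    R^2_{adj} = 1 - \frac{n-1}{n-p} \cdot \frac{SSE}{SST}

    C_p = \frac{SSE_p}{MSE_{full}} - (n - 2p)
    % SSE_p: error sum of squares of the p-parameter candidate model;
    % MSE_{full}: mean squared error of the model with all predictors;
    % for an approximately unbiased candidate model, E[C_p] \approx p.

    PRESS = \sum_{i=1}^{n} \left( y_i - \hat{y}_{i(i)} \right)^2
    % \hat{y}_{i(i)}: prediction of the ith response from the model fit with case i deleted.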
Example: Country data set – see country_mreg_varsel_27jan2003.doc

proc rsquare data=country outest=est mse cp sse;
   title2 'all possible regressions';
   model lifewom = logarea logpopn pcturban liter loggnp;
run;

OR

data country;
   title 'country data analysis';
   infile "\\Casnov5\MST\MSTLab\Baileraj\country.data";   * reads a data file;
   input name $ area popnsize pcturban lang $ liter lifemen lifewom pcGNP;
   logarea = log10(area);
   logpopn = log10(popnsize);
   loggnp = log10(pcGNP);
   ienglish = (lang="English");
   drop area popnsize pcgnp;

proc reg data=country;
   model lifewom = logarea logpopn pcturban liter loggnp / selection=rsquare;
   plot cp.*np. rsq.*np. mse.*np.;
run;

[output – Edited]

The REG Procedure
Model: MODEL1
Dependent Variable: lifewom
R-Square Selection Method

Number of Observations Read                 79
Number of Observations Used                 67
Number of Observations with Missing Values  12

Number in Model    R-Square    Variables in Model
      1            0.6583      liter
      1            0.6299      loggnp
      1            0.4643      pcturban
      1            0.0161      logpopn
      1            0.0146      logarea
---------------------------------------------------------------
      2            0.7795      liter loggnp
      2            0.7100      pcturban liter
      2            0.6611      logpopn liter
      2            0.6610      logarea liter
      2            0.6449      pcturban loggnp
      2            0.6335      logarea loggnp
      2            0.6330      logpopn loggnp
      2            0.4930      logarea pcturban
      2            0.4843      logpopn pcturban
      2            0.0183      logarea logpopn
---------------------------------------------------------------
      3            0.7813      logarea liter loggnp
      3            0.7810      logpopn liter loggnp
      3            0.7801      pcturban liter loggnp
      3            0.7179      logarea pcturban liter
      3            0.7158      logpopn pcturban liter
      3            0.6616      logarea logpopn liter
      3            0.6523      logarea pcturban loggnp
      3            0.6503      logpopn pcturban loggnp
      3            0.6339      logarea logpopn loggnp
      3            0.4943      logarea logpopn pcturban
---------------------------------------------------------------
      4            0.7825      logarea pcturban liter loggnp
      4            0.7820      logpopn pcturban liter loggnp
      4            0.7815      logarea logpopn liter loggnp
      4            0.7183      logarea logpopn pcturban liter
      4            0.6527      logarea logpopn pcturban loggnp
---------------------------------------------------------------
      5            0.7827      logarea logpopn pcturban liter loggnp

[Plot of R2 vs. number of parameters in the model. The plot annotation gives the full-model fit: lifewom = 27.8 - 0.4144 logarea - 0.2626 logpopn + 0.0224 pcturban + 0.1921 liter + 7.7389 loggnp; N = 67, Rsq = 0.7827, Adj Rsq = 0.7649, RMSE = 4.5118.]

[Plot of MSE vs. number of parameters in the model, carrying the same full-model annotation.]

ii. STEPWISE and other automatic variable selection methods

Backwards elimination: start with all predictor variables in the model and eliminate those that are not important (i.e., that do not meet a significance level to stay in the model – SLSTAY in SAS).

Forward selection: start with none of the predictor variables in the model and add those that are important (i.e., that meet a significance level to enter the model – SLENTRY in SAS).
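A minimal SAS sketch of how these automatic methods are requested, reusing the country-data model from above (the SLSTAY/SLENTRY thresholds shown are illustrative choices, not values taken from the notes):

proc reg data=country;
   * backwards elimination: repeatedly drop the variable with the
     largest p-value until every remaining p-value is <= SLSTAY;
   model lifewom = logarea logpopn pcturban liter loggnp
         / selection=backward slstay=0.10;

   * forward selection: repeatedly add the variable with the
     smallest p-value, as long as that p-value is <= SLENTRY;
   model lifewom = logarea logpopn pcturban liter loggnp
         / selection=forward slentry=0.10;

   * stepwise: forward steps, with a backward check after each
     addition so earlier entries can be removed;
   model lifewom = logarea logpopn pcturban liter loggnp
         / selection=stepwise slentry=0.15 slstay=0.15;
run;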

