PSU STAT 501 - Best subsets regression - D2914528

School name Penn State University

Course Stat 501- Regression Methods

Pages 23

Model selection: Best subsets regression

Statement of problem
• A common problem is that there is a large set of candidate predictor variables.
• The goal is to choose a small subset from the larger set so that the resulting regression model is simple, yet has good predictive ability.

Example: Cement data
• Response y: heat evolved during hardening of cement, in calories per gram
• Predictor x1: % of tricalcium aluminate
• Predictor x2: % of tricalcium silicate
• Predictor x3: % of tetracalcium alumino ferrite
• Predictor x4: % of dicalcium silicate

[Figure: scatter plot matrix of y, x1, x2, x3, x4]

Two basic methods of selecting predictors
• Stepwise regression: Enter and remove predictors, in a stepwise manner, until there is no justifiable reason to enter or remove more.
• Best subsets regression: Select the subset of predictors that does the best at meeting some well-defined objective criterion.

Why best subsets regression?

# of predictors (p-1)   # of regression models
1                       2:  ( ) (x1)
2                       4:  ( ) (x1) (x2) (x1, x2)
3                       8:  ( ) (x1) (x2) (x3) (x1, x2) (x1, x3) (x2, x3) (x1, x2, x3)
4                       16: 1 none, 4 one, 6 two, 4 three, 1 four

Why best subsets regression? (continued)
• If there are p-1 possible predictors, then there are 2^(p-1) possible regression models containing the predictors.
• For example, 10 predictors yield 2^10 = 1024 possible regression models.
• A best subsets algorithm determines the best subsets of each size, so that the researcher can make the choice of the final model.

What is used to judge "best"?
• R-squared
• Adjusted R-squared
• MSE (or S = square root of MSE)
• Mallows' Cp

R-squared

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$

Use the R-squared values to find the point where adding more predictors is not worthwhile because it leads to only a very small increase in R-squared.

Adjusted R-squared or MSE

$$R_a^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO} = 1 - \left(\frac{n-1}{SSTO}\right) MSE$$

Adjusted R-squared increases only if MSE decreases, so adjusted R-squared and MSE provide equivalent information.

Find a few subsets for which MSE is smallest (or adjusted R-squared is largest), or so close to the smallest (largest) that adding more predictors is not worthwhile.

Mallows' Cp criterion

The goal is to minimize the total standardized mean square error of prediction:

$$\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} E\left[\left(\hat{Y}_{ip} - E(Y_i)\right)^2\right]$$

which equals:

$$\Gamma_p = \frac{1}{\sigma^2} \left\{ \sum_{i=1}^{n} \left[E(\hat{Y}_{ip}) - E(Y_i)\right]^2 + \sum_{i=1}^{n} Var(\hat{Y}_{ip}) \right\}$$

which in English is:

$$\Gamma_p = \text{some bias} + \text{some variance}$$

Mallows' Cp criterion (continued)

Mallows' Cp statistic

$$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{p-1})} - (n - 2p)$$

estimates \Gamma_p, where:
• SSE_p is the error sum of squares for the fitted (subset) regression model with p parameters.
• MSE(X_1, ..., X_{p-1}) is the MSE of the model containing all p-1 predictors. It is an unbiased estimator of \sigma^2.
• p is the number of parameters in the (subset) model.

Facts about Mallows' Cp
• Subset models with small Cp values have a small total standardized MSE of prediction.
• When the Cp value is ...
  - near p, the bias is small (next to none),
  - much greater than p, the bias is substantial,
  - below p, it is due to sampling error; interpret as no bias.
• For the largest model with all possible predictors, Cp = p (always).

Using the Cp criterion
• So, identify subsets of predictors for which:
  - the Cp value is smallest, and
  - the Cp value is near p (if possible).
• In general, though, don't always choose the largest model just because it yields Cp = p.

Best Subsets Regression: y versus x1, x2, x3, x4
Response is y

Vars   R-Sq   R-Sq(adj)     C-p        S   Predictors
   1   67.5        64.5   138.7   8.9639   x4
   1   66.6        63.6   142.5   9.0771   x2
   2   97.9        97.4     2.7   2.4063   x1, x2
   2   97.2        96.7     5.5   2.7343   x1, x4
   3   98.2        97.6     3.0   2.3087   x1, x2, x4
   3   98.2        97.6     3.0   2.3121   x1, x2, x3
   4   98.2        97.4     5.0   2.4460   x1, x2, x3, x4

Stepwise Regression: y versus x1, x2, x3, x4
Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15
Response is y on 4 predictors, with N = 13

Step             1        2        3        4
Constant    117.57   103.10    71.65    52.58

x4          -0.738   -0.614   -0.237
T-Value      -4.77   -12.62    -1.37
P-Value      0.001    0.000    0.205

x1                     1.44     1.45     1.47
T-Value               10.40    12.41    12.10
P-Value               0.000    0.000    0.000

x2                             0.416    0.662
T-Value                         2.24    14.44
P-Value                        0.052    0.000

S             8.96     2.73     2.31     2.41
R-Sq         67.45    97.25    98.23    97.87
R-Sq(adj)    64.50    96.70    97.64    97.44
C-p          138.7      5.5      3.0      2.7

Example: Modeling PIQ

[Figure: scatter plot matrix of PIQ, MRI, Height, Weight]

Best Subsets Regression: PIQ versus MRI, Height, Weight
Response is PIQ

Vars   R-Sq   R-Sq(adj)    C-p        S   Predictors
   1   14.3        11.9    7.3   21.212   MRI
   1    0.9         0.0   13.8   22.810   Height
   2   29.5        25.5    2.0   19.510   MRI, Height
   2   19.3        14.6    6.9   20.878   MRI, Weight
   3   29.5        23.3    4.0   19.794   MRI, Height, Weight

Stepwise Regression: PIQ versus MRI, Height, Weight
Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15
Response is PIQ on 3 predictors, with N = 38

Step             1         2
Constant     4.652   111.276

MRI           1.18      2.06
T-Value       2.45      3.77
P-Value      0.019     0.001

Height                 -2.73
T-Value                -2.75
P-Value                0.009

S             21.2      19.5
R-Sq         14.27     29.49
R-Sq(adj)    11.89     25.46
C-p            7.3
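
The four criteria used to judge "best" (R-squared, adjusted R-squared, S, and Cp) can be computed for every one of the 2^(p-1) candidate models with a short exhaustive search. The following is a minimal sketch in plain Python, not the Minitab procedure that produced the output above. The data are simulated for illustration (they are not the cement or PIQ data), and the helper names sse_of_fit and best_subsets are hypothetical, introduced only for this example.

```python
from itertools import combinations
import random

def sse_of_fit(rows, y, cols):
    """SSE of the least-squares fit of y on an intercept plus columns `cols`."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1  # number of parameters, intercept included
    # Augmented normal equations [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, x))) ** 2
               for x, yi in zip(X, y))

def best_subsets(rows, y):
    """Fit all 2^k subsets of the k predictors and report each criterion."""
    n, k = len(y), len(rows[0])
    ybar = sum(y) / n
    ssto = sum((v - ybar) ** 2 for v in y)
    # MSE of the model with all predictors, used as the estimate of sigma^2
    mse_full = sse_of_fit(rows, y, tuple(range(k))) / (n - (k + 1))
    table = []
    for size in range(k + 1):
        for cols in combinations(range(k), size):
            p = size + 1
            sse = sse_of_fit(rows, y, cols)
            table.append({
                "cols": cols,
                "p": p,
                "r2": 1 - sse / ssto,
                "r2_adj": 1 - (n - 1) / (n - p) * sse / ssto,
                "s": (sse / (n - p)) ** 0.5,
                "cp": sse / mse_full - (n - 2 * p),
            })
    return table

# Simulated data (an assumption for illustration): only x1 and x2 matter.
random.seed(501)
rows = [[random.uniform(0, 20) for _ in range(4)] for _ in range(30)]
y = [50 + 1.5 * r[0] + 0.5 * r[1] + random.gauss(0, 2) for r in rows]

table = best_subsets(rows, y)
for m in sorted(table, key=lambda m: (m["p"], m["cp"])):
    names = ", ".join(f"x{c + 1}" for c in m["cols"]) or "(none)"
    print(f'{m["p"] - 1}  {100 * m["r2"]:5.1f}  {100 * m["r2_adj"]:5.1f}  '
          f'{m["cp"]:7.1f}  {m["s"]:7.4f}  {names}')
```

In this sketch the model containing all predictors has Cp equal to p by construction, which matches the fact stated above. The search fits every subset, so it is only practical for a modest number of predictors.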