Lecture 11: Multicollinearity
BMTRY 701, Biostatistical Methods II


Multicollinearity Introduction

Some common questions we ask in MLR:
• What is the relative importance of the effects of the different covariates?
• What is the magnitude of the effect of a given covariate on the response?
• Can any covariate be dropped from the model because it has little or no effect on the outcome?
• Should any covariates not yet included in the model be considered for possible inclusion?

Easy answers?

If the candidate covariates are uncorrelated with one another, these are simple questions. If the candidate covariates are correlated with one another, they are not easy at all.

Most commonly:
• observational studies have correlated covariates
• we need to adjust for these when assessing relationships
• "adjusting" for confounders

Experimental designs are less problematic:
• patients are randomized in common designs
• no confounding exists because factors are 'balanced' across arms

Multicollinearity

Also called "intercorrelation," multicollinearity refers to the situation in which the covariates are related both to each other and to the outcome of interest. The idea is much like confounding, but the statistical term emphasizes the effects the correlation has on regression modeling.

No Multicollinearity Example: Mouse experiment

Mouse   Dose A   Dose B   Diet   Tumor size
1       100      25       0      45
2       200      25       0      56
3       300      25       0      25
4       100      50       0      15
5       200      50       0      17
6       300      50       0      10
7       100      25       1      30
8       200      25       1      28
9       300      25       1      20
10      100      50       1      10
11      200      50       1      5
12      300      50       1      3

Linear modeling

We are interested in seeing which factors influence tumor size in mice. Notice that the experiment is perfectly balanced: each level of each factor appears equally often within every combination of the other factors, so the predictors are mutually uncorrelated. What does that mean for the regressions?

Dose of Drug A on Tumor

> reg.a <- lm(Tumor.size ~ Dose.A, data=data)
> summary(reg.a)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  32.50000   12.29041   2.644   0.0246 *
Dose.A       -0.05250    0.05689  -0.923   0.3779
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.09 on 10 degrees of freedom
Multiple R-Squared: 0.07847, Adjusted R-squared: -0.01368
F-statistic: 0.8515 on 1 and 10 DF, p-value: 0.3779

Dose of Drug B on Tumor

> reg.b <- lm(Tumor.size ~ Dose.B, data=data)
> summary(reg.b)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  58.0000     9.4956   6.108 0.000114 ***
Dose.B       -0.9600     0.2402  -3.996 0.002533 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.4 on 10 degrees of freedom
Multiple R-Squared: 0.6149, Adjusted R-squared: 0.5764
F-statistic: 15.97 on 1 and 10 DF, p-value: 0.002533

Diet on Tumor

> reg.diet <- lm(Tumor.size ~ Diet, data=data)
> summary(reg.diet)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   28.000      6.296   4.448  0.00124 **
Diet         -12.000      8.903  -1.348  0.20745
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15.42 on 10 degrees of freedom
Multiple R-Squared: 0.1537, Adjusted R-squared: 0.06911
F-statistic: 1.817 on 1 and 10 DF, p-value: 0.2075

All in the model together

> reg.all <- lm(Tumor.size ~ Dose.A + Dose.B + Diet, data=data)
> summary(reg.all)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  74.50000    8.72108   8.543 2.71e-05 ***
Dose.A       -0.05250    0.02591  -2.027 0.077264 .
Dose.B       -0.96000    0.16921  -5.673 0.000469 ***
Diet        -12.00000    4.23035  -2.837 0.021925 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.327 on 8 degrees of freedom
Multiple R-Squared: 0.8472, Adjusted R-squared: 0.7898
F-statistic: 14.78 on 3 and 8 DF, p-value: 0.001258

Correlation matrix of predictors and outcome

> cor(data[,-1])
               Dose.A     Dose.B       Diet Tumor.size
Dose.A      1.0000000  0.0000000  0.0000000 -0.2801245
Dose.B      0.0000000  1.0000000  0.0000000 -0.7841853
Diet        0.0000000  0.0000000  1.0000000 -0.3920927
Tumor.size -0.2801245 -0.7841853 -0.3920927  1.0000000

Result

For perfectly balanced designs, adjusting does not affect the coefficients; however, it can affect the significance. Compare Dose.A: its estimate is -0.05250 in both the simple and full models, but its standard error drops from 0.05689 to 0.02591 after adjustment. Why?
• the residual sum of squares is affected
• if you explain more of the variance in the outcome, less is left to chance/error
• when you adjust for another factor that is related to the outcome, you will likely improve the significance

The other extreme: perfect collinearity

Mouse   Dose A   Dose C   Diet   Tumor size
1       100      100      0      45
2       200      300      0      56
3       300      500      0      25
4       100      100      0      15
5       200      300      0      17
6       300      500      0      10
7       100      100      1      30
8       200      300      1      28
9       300      500      1      20
10      100      100      1      10
11      200      300      1      5
12      300      500      1      3

Here Dose C is an exact linear function of Dose A (Dose C = 2 * Dose A - 100).

The model has infinitely many solutions

There is too much flexibility: any change in the coefficient for Dose A can be exactly offset by a compensating change in the coefficient for Dose C, so no unique least-squares solution exists. What happens in practice? The fitting algorithm usually gives you some indication of this:
• it will not fit the model and gives an error, or
• it drops one of the predictors
"Perfectly collinear" = "perfect confounding."
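A minimal R sketch of what this looks like in practice. The data frame name mouse and its construction are my own (the slides only show output from a data frame called data); the behavior shown is R's default.

# Reconstruct the perfectly collinear mouse data from the table above.
# Dose.C is an exact linear function of Dose.A: Dose.C = 2*Dose.A - 100.
mouse <- data.frame(
  Dose.A     = rep(c(100, 200, 300), times = 4),
  Diet       = rep(c(0, 1), each = 6),
  Tumor.size = c(45, 56, 25, 15, 17, 10, 30, 28, 20, 10, 5, 3)
)
mouse$Dose.C <- 2 * mouse$Dose.A - 100

# The model matrix is rank-deficient, so lm() aliases Dose.C:
# summary() prints "1 not defined because of singularities"
# and reports NA for the Dose.C coefficient.
fit <- lm(Tumor.size ~ Dose.A + Dose.C + Diet, data = mouse)
summary(fit)

Other software behaves differently: some packages refuse to fit the model and return an error rather than silently dropping a column, which is the other behavior mentioned above.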
Effects of Multicollinearity

The most common result:
• two covariates are each independently associated with Y in simple linear regression models
• in an MLR model containing both covariates, one or both becomes insignificant
• the magnitudes of the regression coefficients are attenuated

Why? Recall the adjusted variable plot: if the two covariates are related, removing the systematic part of one of them from Y may leave too little variation for the other to explain.

Effects of Multicollinearity

Other situations occur as well:
• neither covariate is significant alone, but both are significant together (somewhat rare)
• both are significant alone and both retain significance in the model
• the regression coefficient for one of the covariates changes direction
• the magnitude of a coefficient increases (in absolute value)

It is usually hard to predict exactly what will happen when both covariates are in the model.

Implications in inference

The interpretation of a regression coefficient, as measuring the change in the mean response per unit increase in that covariate with the other covariates held constant, is not fully applicable when the covariates are highly correlated.
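A minimal simulation sketch of the most common pattern described above; the variable names, coefficients, and sample size here are my own illustration, not from the slides. Two correlated predictors each look strongly associated with the outcome on their own, but entering them together inflates the standard errors and can wipe out the individual significance.

set.seed(701)
n  <- 50
x1 <- rnorm(n)
x2 <- 0.95 * x1 + 0.30 * rnorm(n)        # x2 strongly correlated with x1
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)

cor(x1, x2)          # correlation between the predictors is high

# Each predictor is clearly significant in a simple model ...
summary(lm(y ~ x1))
summary(lm(y ~ x2))

# ... but jointly, the shared information inflates the standard
# errors, attenuating the apparent effect of each predictor.
summary(lm(y ~ x1 + x2))

Increasing the correlation (for example, shrinking the 0.30 noise term) inflates the joint standard errors further, while the overall F-test for the model typically remains significant: the pair explains Y well even when neither coefficient does so on its own.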

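The slides shown here do not give a numerical diagnostic, but a standard one for this situation is the variance inflation factor, VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing predictor k on the other predictors. A minimal sketch using the simulated data above; the use of the car package is my assumption, not something the slides reference.

# VIF near 1 indicates little multicollinearity; values above
# about 10 are a common warning sign.
library(car)                     # assumes the car package is installed
vif(lm(y ~ x1 + x2))

# The same quantity computed by hand for x1:
1 / (1 - summary(lm(x1 ~ x2))$r.squared)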
