MUSC BMTRY 701 - lect16

Lecture 16: Logistic Regression
Goodness of Fit, Information Criteria, ROC Analysis
BMTRY 701: Biostatistical Methods II

Outline
• Goodness of fit
• Set up as a hypothesis test
• Goodness of fit test
• GoF test for the prostate cancer model
• More goodness of fit
• Information criteria
• Akaike Information Criterion (AIC)
• AIC versus BIC
• Prostate cancer models
• AIC vs. BIC (N = 380)
• AIC vs. BIC if N is multiplied by 10 (N = 3800)
• ROC curve analysis
• Fitted probabilities
• ROC curve
• How to interpret?
• AUC of the ROC curve
• Utility in model selection
• ROC curves of models 1, 2, and 3
• Sensitivity and specificity
• phat = 0.50 cutoff
• phat = 0.25 cutoff

Goodness of Fit
• A test of how well the model explains the data
• Applies to linear models and generalized linear models
• How to do it? It is simply a comparison of the "current" model to a perfect model:
  • What would the estimated likelihood function be in a perfect model?
  • What would the estimated log-likelihood function be in a perfect model?

Set up as a hypothesis test
• H0: current model
• H1: perfect model
• Recall the G² statistic comparing models: G² = Dev(0) - Dev(1)
• How many parameters are there in the null model? How many are there in the perfect model?

Goodness of Fit test
• Perfect model: assumed to be "saturated" in most cases, i.e., there is a parameter for each combination of predictors
• In our model, that is likely to be close to N because of the number of continuous variables
• Define c = number of parameters in the saturated model
• Deviance goodness of fit: Dev(0)

Goodness of Fit test
• Deviance goodness of fit: Dev(0)
• If Dev(H0) < χ²(c - p, 1 - α), conclude H0
• If Dev(H0) > χ²(c - p, 1 - α), conclude H1
• Why aren't we subtracting deviances? The saturated model fits the data perfectly, so its deviance is 0 and the difference in deviances reduces to Dev(H0) itself.

GoF test for Prostate Cancer Model

> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family = binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family = binomial)
> mreg1

Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1        AIC: 391.1

• Test statistic: 377.1 ~ χ²(380 - 7) = χ²(373)
• Threshold: χ²(373, 1 - α) = 419.0339
• p-value = 0.43

More Goodness of Fit
• There are a lot of options! Deviance GoF is just one:
  • Pearson chi-square
  • Hosmer-Lemeshow
  • etc.
• The principles, however, are essentially the same
• GoF is not that commonly seen in medical research because it is rarely very important
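The slide above compares the residual deviance 377.1 to a χ² threshold and reports p = 0.43. As a minimal sketch, assuming the mreg1 fit from the output above is still in the workspace and taking the degrees of freedom as c - p = 373 exactly as on the slide, the same numbers can be obtained in R:

# deviance goodness-of-fit test for mreg1 (sketch; df taken from the slide)
dev0 <- deviance(mreg1)        # Dev(H0): residual deviance, 377.1
df   <- 373                    # c - p = 380 - 7, as on the slide
crit <- qchisq(0.95, df)       # threshold chi-square(373, 0.95) = 419.03
pval <- 1 - pchisq(dev0, df)   # upper-tail p-value, about 0.43
c(deviance = dev0, threshold = crit, p.value = pval)

The large p-value (0.43) means the deviance is well below the threshold, so we conclude H0: there is no evidence of lack of fit.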
Information Criteria
• An information criterion (IC) is a measure of the goodness of fit of an estimated statistical model. It is grounded in the concept of entropy:
  • it offers a relative measure of the information lost
  • it describes the trade-off between the precision and the complexity of the model
• An IC is not a test on the model in the sense of hypothesis testing; it is a tool for model selection
• Given a data set, several competing models may be ranked according to their IC
• The model with the lowest IC is chosen as the "best"

Information Criteria
• An IC rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters
• This penalty discourages overfitting
• The IC methodology attempts to find the model that best explains the data with a minimum of free parameters
• More traditional approaches such as the LRT start from a null hypothesis; an IC judges a model by how close its fitted values tend to be to the true values
• The AIC value assigned to a model is only meant to rank competing models and tell you which is the best among the given alternatives

Akaike Information Criterion (AIC)
AIC = -2 log(Lik) + 2p

Akaike, Hirotugu (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control 19 (6): 716–723.

Bayesian Information Criterion (BIC)
BIC = -2 log(Lik) + p ln(N)

Schwarz, Gideon E. (1978). "Estimating the dimension of a model". Annals of Statistics 6 (2): 461–464.

AIC versus BIC
• BIC and AIC are similar
• Different penalty for the number of parameters: 2p vs. p ln(N)
• BIC penalizes free parameters more strongly than AIC does
• Implications:
  • BIC tends to choose smaller models
  • The larger N is, the more likely AIC and BIC are to disagree on model selection

Prostate cancer models
We looked at different forms for volume:
• A: volume as continuous
• B: volume as binary (detectable vs. undetectable)
• C: 4 categories of volume
• D: 3 categories of volume
• E: linear + squared term for volume

AIC vs. BIC (N = 380)

                   p   -2logLik     AIC     BIC
A: continuous      8      376.0   392.0   423.5
B: binary          8      375.2   391.2   422.7
C: 4 categories   10      373.6   393.6   433.0
D: 3 categories    9      375.2   393.2   428.6
E: quadratic       9      376.0   394.0   429.4

AIC vs. BIC if N is multiplied by 10 (N = 3800)

                   p   -2logLik      AIC      BIC
A: continuous      8     3760.0   3776.0   3825.9
B: binary          8     3752.0   3768.0   3817.9
C: 4 categories   10     3736.0   3756.0   3818.4
D: 3 categories    9     3751.9   3769.9   3826.1
E: quadratic       9     3760.0   3778.0   3834.2

(An R sketch of this kind of AIC/BIC tabulation appears at the end of these notes.)

ROC curve analysis
• Receiver Operating Characteristic (ROC) curve analysis
• Traditionally, looks at the sensitivity and specificity of a "model" for predicting an outcome
• Question: based on our model, can we accurately predict whether a prostate cancer patient has capsular penetration?

ROC curve analysis
• Association between predictors and outcome is not enough; we need a "stronger" relationship
• Classic interpretation of sensitivity and specificity:
  • a binary test and a binary outcome
  • sensitivity = P(test + | true disease)
  • specificity = P(test - | true no disease)
• What is "test +" in our dataset? What does the model provide for us?

ROC curve analysis
[Figure: sensitivity and specificity plotted against the probability cutoff, each on a 0.00 to 1.00 scale]

Fitted probabilities
• The fitted probabilities are the probability that a NEW patient with the same "covariate profile" will be a "case" (e.g., capsular penetration, disease, etc.)
• We select a probability "threshold" to determine whether a patient is classified as a case or not
• Some options:
  • high sensitivity (e.g., cancer screens)
  • high specificity (e.g., PPD skin test for TB)
  • maximize the sum of sensitivity and specificity

ROC curve
. xi: logit capsule i.dpros detected gleason logpsa
i.dpros           _Idpros_1-4
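The lecture switches to Stata (the "xi: logit" command above) for the ROC analysis. As a rough R sketch of the same ideas, assuming the earlier R fit mreg1 (binary outcome cap.inv) is available, the fitted probabilities, the sensitivity and specificity at the phat = 0.50 and phat = 0.25 cutoffs named in the outline, and the area under the ROC curve could be computed as follows; the helper sens_spec and the rank-based AUC formula are illustrative, not taken from the lecture.

# ROC-type summaries for mreg1 (sketch)
phat <- fitted(mreg1)    # fitted probabilities
y    <- mreg1$y          # observed 0/1 outcome used in the fit

# sensitivity and specificity at a given probability cutoff
sens_spec <- function(cutoff) {
  c(cutoff      = cutoff,
    sensitivity = mean(phat[y == 1] >= cutoff),   # P(test + | case)
    specificity = mean(phat[y == 0] <  cutoff))   # P(test - | non-case)
}
sens_spec(0.50)
sens_spec(0.25)

# AUC via the rank (Mann-Whitney) formula: the probability that a randomly
# chosen case has a higher fitted probability than a randomly chosen non-case
r  <- rank(phat)
n1 <- sum(y == 1)
n0 <- sum(y == 0)
(sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

Sweeping the cutoff from 0 to 1 and plotting sensitivity against 1 - specificity traces out the ROC curve itself; the earlier figure of sensitivity and specificity against the probability cutoff is the same calculation displayed one threshold at a time.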

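Returning to the AIC vs. BIC comparison tabulated earlier: the following is a minimal sketch of how such a table could be assembled in R. The fits fitA, fitB, and fitE are hypothetical stand-ins for volume codings A, B, and E; the coding of "detectable" volume as vol > 0 is an assumption, and these covariates are not guaranteed to reproduce the table's exact numbers.

# hypothetical refits for three of the volume codings (sketch)
fitA <- glm(cap.inv ~ gleason + log(psa) + factor(dpros) + vol,
            family = binomial)                 # A: volume as continuous
fitB <- glm(cap.inv ~ gleason + log(psa) + factor(dpros) + I(vol > 0),
            family = binomial)                 # B: volume as binary (assumed coding)
fitE <- glm(cap.inv ~ gleason + log(psa) + factor(dpros) + vol + I(vol^2),
            family = binomial)                 # E: linear + squared term

fits <- list(A = fitA, B = fitB, E = fitE)

# p, -2 log(Lik), AIC = -2 log(Lik) + 2p, BIC = -2 log(Lik) + p ln(N)
data.frame(
  p        = sapply(fits, function(f) length(coef(f))),
  minus2LL = sapply(fits, function(f) -2 * as.numeric(logLik(f))),
  AIC      = sapply(fits, AIC),
  BIC      = sapply(fits, BIC)
)

The two criteria differ only in the penalty, 2p for AIC versus p ln(N) for BIC, so multiplying N by 10 raises the BIC penalty from about 5.94p (ln 380) to about 8.24p (ln 3800) while the AIC penalty is unchanged; this is why BIC favors smaller models more strongly as N grows, as the second table illustrates.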
