Lecture 17: Regression for Case-control Studies
BMTRY 701
Biostatistical Methods II

Old business: Comparing AUCs
Good reference: Hanley and McNeil, “Comparing AUCs for ROC curves based on the same data”. See the class website for the pdf.

Additional Reading in Logistic Regression
Hosmer and Lemeshow, Applied Logistic Regression
http://en.wikipedia.org/wiki/Logistic_regression
http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html
http://www.statgun.com/tutorials/logistic-regression.html
http://www.bus.utk.edu/stat/Stat579/Logistic%20Regression.pdf
Etc.: Google “logistic regression”

Case Control Studies in Logistic Regression
http://www.oxfordjournals.org/our_journals/tropej/online/ma_chap11.pdf
How is a case-control study performed?
What is the outcome and what is the predictor in the regression setting?

Recall the simple 2x2 example
The odds ratio from a 2x2 table can be used in case-control studies.
Similarly, the logistic regression model can be used, treating ‘case’ status as the outcome.
It has been shown that the estimated odds ratios (the slope coefficients) do not depend on the sampling design (i.e., cohort vs.
case-control study); only the intercept depends on how cases and controls were sampled.

Example: Case control study of HPV and Oropharyngeal Cancer
Gillison et al. (http://content.nejm.org/cgi/content/full/356/19/1944)
100 cases with oropharyngeal cancer and 200 controls
How was the sampling done?

Data on Case vs. HPV
> table(data$hpv16ser, data$control)

      0   1
  0 186  43
  1  14  57

> epitab(data$hpv16ser, data$control)$tab
         Outcome
Predictor   0   p0  1   p1 oddsratio   lower    upper      p.value
        0 186 0.93 43 0.43   1.00000      NA       NA           NA
        1  14 0.07 57 0.57  17.61130 8.99258 34.49041 4.461359e-21

(Rows are hpv16ser = 0/1. The column totals are 200 for level 0 and 100 for level 1, so level 1 of the column variable marks the 100 cases.)

Multiple Logistic Regression
This is not a randomized study: there are lots of other predictors that may be associated with the cancer.
Examples:
• smoking
• alcohol
• age
• gender

Fit the model:
Write down the model
• assume main effects of tobacco and alcohol, and their interaction
What is the likelihood function?
What are the MLEs?

How do we interpret the results?
Is there an effect of tobacco?
Is there an effect of alcohol?
Is there an interaction?

Interpreting the interaction
What is the OR for a smoker/non-drinker versus a non-smoker/non-drinker?
What is the OR for a smoker/drinker versus a non-smoker/drinker?

How can we assess whether the effect of smoking differs by HPV status?

How likely is it that someone who smokes and drinks will get oropharyngeal cancer?
How can we estimate the chance?

Matched case control studies
References:
• Hosmer and Lemeshow, Applied Logistic Regression
• http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mcc.pdf
• http://staff.pubhealth.ku.dk/~bxc/Talks/Nested-Matched-CC.pdf
• http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap49/sect35.htm
• http://www.ats.ucla.edu/stat/sas/library/logistic.pdf (beginning page 5)

Matched design
Matching on important factors is common
OP (oropharyngeal) cancer:
• age
• gender
Why?
• forces the distribution of those variables to be the same in cases and controls
• removes the effects of those variables from the case-control comparison (so their own effects can no longer be estimated)
• eliminates confounding by the matching variables

1-to-M matching
For each ‘case’, there is a matched ‘control’
The process usually dictates that the case is enrolled first, and then a control is identified
For particularly rare diseases or
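when a large N is required, more than one control per case is often used

Logistic regression for matched case control studies
Recall independence: $y_i \stackrel{\text{ind}}{\sim} \mathrm{Bern}(p_i)$, with $p_i = \frac{e^{\beta_0+\beta_1 x_i}}{1+e^{\beta_0+\beta_1 x_i}}$
But if cases and controls are matched, are they still independent?

Solution: treat each matched set as a stratum
one-to-one matching: 1 case and 1 control per stratum
one-to-M matching: 1 case and M controls per stratum
Logistic model per stratum: within a stratum, independence holds.
We assume that the OR for x and y is constant across strata:
$p_k(x_{ik}) = \frac{e^{\alpha_k+\beta x_{ik}}}{1+e^{\alpha_k+\beta x_{ik}}}$
where $\alpha_k$ is the intercept for stratum k and $\beta$ is shared by all strata.

How many parameters is that?
Assume the sample size is 2n and we have 1-to-1 matching:
n strata + p covariates = n + p parameters
This is problematic:
• as n gets large, so does the number of parameters
• too many parameters to estimate precisely: each intercept is based on only two subjects, and the unconditional MLE of β is biased (with 1-to-1 matching and a binary exposure, it converges to twice the true log OR)
But do we really care about the stratum-specific intercepts?
“NUISANCE PARAMETERS”

Conditional logistic regression
To avoid estimating the intercepts, we can condition on the study design.
Huh?
Think about each stratum:
• how many cases and controls?
• what is the probability that the case is the case and the control is the control?
• what is the probability that the control is the case and the case is the control?
For each stratum, the likelihood contribution is based on this conditional probability.

Conditioning
For 1-to-1 matching: two individuals in stratum k, where y indicates case status (1 = case, 0 = control) and individual 1 is the case.
Write as a likelihood contribution for stratum k:
$$
\begin{aligned}
L_k &= \frac{P(y_{1k}=1,\,y_{2k}=0)}{P(y_{1k}=1,\,y_{2k}=0)+P(y_{1k}=0,\,y_{2k}=1)}\\
&= \frac{P(y_{1k}=1\mid x_{1k})\,P(y_{2k}=0\mid x_{2k})}{P(y_{1k}=1\mid x_{1k})\,P(y_{2k}=0\mid x_{2k})+P(y_{1k}=0\mid x_{1k})\,P(y_{2k}=1\mid x_{2k})}
\end{aligned}
$$

Likelihood function for CLR
Substitute in our logistic representation of p and simplify:
$$
\begin{aligned}
L_k &= \frac{\dfrac{e^{\alpha_k+\beta x_{1k}}}{1+e^{\alpha_k+\beta x_{1k}}}\cdot\dfrac{1}{1+e^{\alpha_k+\beta x_{2k}}}}{\dfrac{e^{\alpha_k+\beta x_{1k}}}{1+e^{\alpha_k+\beta x_{1k}}}\cdot\dfrac{1}{1+e^{\alpha_k+\beta x_{2k}}}+\dfrac{1}{1+e^{\alpha_k+\beta x_{1k}}}\cdot\dfrac{e^{\alpha_k+\beta x_{2k}}}{1+e^{\alpha_k+\beta x_{2k}}}}\\
&= \frac{e^{\alpha_k+\beta x_{1k}}}{e^{\alpha_k+\beta x_{1k}}+e^{\alpha_k+\beta x_{2k}}}
= \frac{e^{\beta x_{1k}}}{e^{\beta x_{1k}}+e^{\beta x_{2k}}}
\end{aligned}
$$
The stratum intercept $\alpha_k$ cancels, so it never has to be estimated.

Likelihood function for CLR
Now,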
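For 1-to-1 matching, the conditional likelihood contribution of stratum k is $L_k = e^{\beta x_{1k}}/(e^{\beta x_{1k}}+e^{\beta x_{2k}})$, with individual 1 the case. A minimal R sketch of maximizing $\sum_k \log L_k$, assuming hypothetical pair counts (30 pairs where only the case is exposed, 10 where only the control is exposed, 60 concordant pairs), not the HPV study data. The helper `make_pairs()` is made up for this illustration; `clogit()` and `strata()` come from the survival package. The sketch also fits the unconditional model with one intercept per pair, to show the nuisance-parameter problem.

```r
library(survival)  # clogit() and strata(); ships with standard R installations

# Hypothetical helper: n matched pairs, one row per subject (case row first).
make_pairs <- function(n, x_case, x_control) {
  data.frame(case = rep(c(1, 0), times = n),
             x    = rep(c(x_case, x_control), times = n))
}

# Hypothetical data: 100 pairs with a binary exposure x.
d <- rbind(
  make_pairs(30, 1, 0),  # discordant: case exposed, control unexposed
  make_pairs(10, 0, 1),  # discordant: case unexposed, control exposed
  make_pairs(25, 1, 1),  # concordant: both exposed
  make_pairs(35, 0, 0)   # concordant: both unexposed
)
d$pair <- rep(seq_len(nrow(d) / 2), each = 2)  # stratum index k

# Conditional log-likelihood: sum_k [beta * x_case - log(sum_j exp(beta * x_jk))]
cond_loglik <- function(beta, data) {
  eta <- beta * data$x
  sum(eta[data$case == 1]) - sum(log(tapply(exp(eta), data$pair, sum)))
}

# 1. Direct maximization of the conditional likelihood over beta
opt <- optimize(cond_loglik, interval = c(-10, 10), data = d,
                maximum = TRUE, tol = 1e-8)
beta_manual <- opt$maximum

# 2. The same model via survival::clogit
fit_c <- clogit(case ~ x + strata(pair), data = d)
beta_clogit <- unname(coef(fit_c)["x"])

# 3. Unconditional logistic regression with one intercept per pair
fit_u <- glm(case ~ x + factor(pair), family = binomial, data = d)
beta_uncond <- unname(coef(fit_u)["x"])

round(c(manual = beta_manual, clogit = beta_clogit,
        unconditional = beta_uncond), 4)
```

With these counts, the direct maximization and `clogit()` should both give $\log(30/10) = \log 3$ (OR = 3), the ratio of the two kinds of discordant pairs; concordant pairs contribute a constant 1/2 each and carry no information about $\beta$. The pair-intercept `glm()` fit returns about $2\log 3$ (OR near 9), which is why the stratum intercepts are handled by conditioning rather than estimated.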
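Back on the 2x2 HPV example, a short base-R check that the cross-product odds ratio, a logistic regression of case status on HPV16 serostatus, and a Wald interval reproduce the `epitab()` row for hpv16ser = 1 (OR 17.61, 95% CI 8.99 to 34.49). It uses the counts printed by `table(data$hpv16ser, data$control)` on the "Data on Case vs. HPV" slide and assumes, from the column totals, that column level 1 holds the 100 cases. The vector names below are illustrative; `epitab()` (epitools package) is not needed.

```r
# Counts from the table(data$hpv16ser, data$control) output.
# Rows: HPV16 serostatus (0 = negative, 1 = positive).
hpv      <- c(0, 1)
cases    <- c(43, 57)    # column level 1 (column total 100 = the cases)
controls <- c(186, 14)   # column level 0 (column total 200 = the controls)

# Cross-product odds ratio from the 2x2 table
or_table <- (cases[2] * controls[1]) / (cases[1] * controls[2])

# Same OR from logistic regression with case status as the outcome;
# cbind(successes, failures) fits grouped binomial data.
fit <- glm(cbind(cases, controls) ~ hpv, family = binomial)
or_glm <- unname(exp(coef(fit)["hpv"]))

# Wald 95% CI on the OR scale: exp(log OR +/- qnorm(0.975) * SE)
wald_ci <- unname(exp(confint.default(fit)["hpv", ]))

round(c(or_table = or_table, or_glm = or_glm,
        lower = wald_ci[1], upper = wald_ci[2]), 3)
```

The slope reproduces the table OR exactly because this model is saturated. The intercept, by contrast, gives exp(coefficient) = 43/186, the case-to-control odds among seronegatives in this sample; it reflects the 100:200 sampling rather than population risk, so estimating an absolute chance of disease from these data would need outside information on how cases and controls were sampled.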