Lecture 18
Matched Case Control Studies
BMTRY 701
Biostatistical Methods II

Matched case control studies
References:
• Hosmer and Lemeshow, Applied Logistic Regression
• http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mcc.pdf
• http://staff.pubhealth.ku.dk/~bxc/Talks/Nested-Matched-CC.pdf
• http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap49/sect35.htm
• http://www.ats.ucla.edu/stat/sas/library/logistic.pdf (beginning page 5)

Matched design
Matching on important factors is common
OP cancer:
• age
• gender
Why?
• forces the distribution of those variables to be the same in cases and controls
• removes any effects of those variables on the outcome
• eliminates confounding by those variables

1-to-M matching
For each ‘case’, there is a matched ‘control’
Process usually dictates that the case is enrolled, then a control is identified
For particularly rare diseases or when large N is required, often use more than one control per case

Logistic regression for matched case control studies
Recall independence:
\[
y_i \overset{iid}{\sim} \mathrm{Bern}(p_i), \qquad
p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}
\]
But, if cases and controls are matched, are they still independent?

Solution: treat each matched set as a stratum
one-to-one matching: 1 case and 1 control per stratum
one-to-M matching: 1 case and M controls per stratum
Logistic model per stratum: within stratum, independence holds.
We assume that the OR for x and y is constant across strata
\[
p(x_{ik}) = \frac{e^{\alpha_k + \beta x_{ik}}}{1 + e^{\alpha_k + \beta x_{ik}}}
\]
where \alpha_k is the intercept for stratum k and x_{ik} is the covariate for individual i in stratum k.

How many parameters is that?
Assume sample size is 2n and we have 1-to-1 matching:
n strata + p covariates = n+p parameters
This is problematic:
• as n gets large, so does the number of parameters
• too many parameters to estimate and a problem of precision
But do we really care about the strata-specific intercepts?
“NUISANCE PARAMETERS”

Conditional logistic regression
To avoid estimation of the intercepts, we can condition on the study design.
Huh?
Think about each stratum:
• how many cases and controls?
• what is the probability that the case is the case and the control is the control?
• what is the probability that the control is the case and the case the control?
For each stratum, the likelihood contribution is based on this conditional probability

Conditioning
For 1-to-1 matching: two individuals in stratum k, where y indicates case status (1 = case, 0 = control)
\[
P(y_{1k}=1,\, y_{2k}=0 \mid y_{1k}+y_{2k}=1)
= \frac{P(y_{1k}=1,\, y_{2k}=0)}{P(y_{1k}=1,\, y_{2k}=0) + P(y_{1k}=0,\, y_{2k}=1)}
\]
Write as a likelihood contribution for stratum k:
\[
L_k = \frac{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k})}
{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k}) + P(y_{1k}=0 \mid x_{1k})\, P(y_{2k}=1 \mid x_{2k})}
\]

Likelihood function for CLR
Substitute in our logistic representation of p and simplify:
\[
\begin{aligned}
L_k &= \frac{\dfrac{e^{\alpha_k + \beta x_{1k}}}{1 + e^{\alpha_k + \beta x_{1k}}} \cdot \dfrac{1}{1 + e^{\alpha_k + \beta x_{2k}}}}
{\dfrac{e^{\alpha_k + \beta x_{1k}}}{1 + e^{\alpha_k + \beta x_{1k}}} \cdot \dfrac{1}{1 + e^{\alpha_k + \beta x_{2k}}}
+ \dfrac{1}{1 + e^{\alpha_k + \beta x_{1k}}} \cdot \dfrac{e^{\alpha_k + \beta x_{2k}}}{1 + e^{\alpha_k + \beta x_{2k}}}} \\
&= \frac{e^{\alpha_k + \beta x_{1k}}}{e^{\alpha_k + \beta x_{1k}} + e^{\alpha_k + \beta x_{2k}}}
= \frac{e^{\beta x_{1k}}}{e^{\beta x_{1k}} + e^{\beta x_{2k}}}
\end{aligned}
\]

Likelihood function for CLR
Now, take the product over all the strata for the full likelihood
\[
L(\beta) = \prod_{k=1}^{n} L_k = \prod_{k=1}^{n} \frac{e^{\beta x_{1k}}}{e^{\beta x_{1k}} + e^{\beta x_{2k}}}
\]
This is the likelihood for the matched case-control design
Notice:
• there are no strata-specific parameters
• cases are defined by subscript ‘1’ and controls by subscript ‘2’
Theory for 1-to-M follows similarly (but not shown here)

Interpretation of β
Same as in ‘standard’ logistic regression
β represents the log odds ratio of disease for a one-unit difference in x

When to use matched vs. unmatched?
Some papers use both for a matched design
Tradeoffs:
• bias
• precision
Sometimes matched design to ensure balance, but then unmatched analysis
They WILL give you different answers
Gillison paper

Another approach to matched data
use random effects models
CLR is elegant and simple
can identify the estimates using a ‘transformation’ of logistic regression results
But, with new age of computing, we have other approaches
Random effects models:
• allow strata specific intercepts
• not problematic estimation process
• additional assumptions: intercepts follow normal distribution
• Will NOT give identical results

. xi: clogit control hpv16ser, group(strata) or

Iteration 0:   log likelihood = -72.072957
Iteration 1:   log likelihood = -71.803221
Iteration 2:   log likelihood = -71.798737
Iteration 3:   log likelihood = -71.798736

Conditional (fixed-effects) logistic regression   Number of obs   =        300
                                                  LR chi2(1)      =      76.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -71.798736                       Pseudo R2       =     0.3465

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |   13.16616   4.988492     6.80   0.000      6.26541    27.66742
------------------------------------------------------------------------------

. xi: logistic control hpv16ser

Logistic regression                               Number of obs   =        300
                                                  LR chi2(1)      =      90.21
                                                  Prob > chi2     =     0.0000
Log likelihood = -145.8514                        Pseudo R2       =     0.2362

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |    17.6113   6.039532     8.36   0.000     8.992582     34.4904
------------------------------------------------------------------------------

. xi: gllamm control hpv16ser, i(strata) family(binomial)

number of level 1 units = 300
number of level 2 units = 100

Condition Number = 2.4968508

gllamm model

log likelihood = -145.8514
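
To make the conditional likelihood for 1-to-1 matching from earlier in the lecture concrete, here is a minimal Python sketch that maximizes it by Newton-Raphson for a single covariate. It is an illustration only and not the lecture's own code: the function names (`conditional_loglik`, `fit_clr`) and the pair counts are hypothetical and are not the HPV data behind the Stata output, which has 300 observations in 100 strata and so is not 1-to-1 matched. Each stratum contributes e^{βx_1k} / (e^{βx_1k} + e^{βx_2k}), so no stratum intercepts appear anywhere in the code.

```python
import math


def sigmoid(d):
    """Numerically stable logistic function e^d / (1 + e^d)."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def conditional_loglik(beta, pairs):
    """Conditional log-likelihood for 1-to-1 matched pairs.

    pairs is a list of (x_case, x_control) tuples, one per stratum.
    Each stratum contributes log[e^{b*x1} / (e^{b*x1} + e^{b*x2})],
    which equals -log(1 + e^{d}) with d = b * (x2 - x1).
    """
    total = 0.0
    for x_case, x_control in pairs:
        d = beta * (x_control - x_case)
        # stable evaluation of log(1 + e^d)
        total -= max(d, 0.0) + math.log1p(math.exp(-abs(d)))
    return total


def fit_clr(pairs, tol=1e-10, max_iter=50):
    """Maximize the conditional likelihood over beta by Newton-Raphson."""
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log-likelihood
        info = 0.0   # minus the second derivative
        for x_case, x_control in pairs:
            diff = x_case - x_control
            s = sigmoid(-beta * diff)  # equals 1 - L_k for this stratum
            score += diff * s
            info += diff * diff * s * (1.0 - s)
        if info == 0.0:
            raise ValueError("no discordant pairs: beta is not identifiable")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data with a binary exposure (NOT from the lecture):
# 30 pairs with only the case exposed, 10 with only the control exposed,
# and 60 concordant pairs that carry no information about beta.
pairs = [(1, 0)] * 30 + [(0, 1)] * 10 + [(1, 1)] * 25 + [(0, 0)] * 35

beta_hat = fit_clr(pairs)
print("beta_hat:", beta_hat)
print("odds ratio:", math.exp(beta_hat))
print("log-likelihood at beta_hat:", conditional_loglik(beta_hat, pairs))
```

With a binary exposure, the score equation reduces to e^β = (number of pairs with only the case exposed) / (number of pairs with only the control exposed), so the fitted odds ratio here should agree with 30/10 = 3 up to numerical tolerance. Concordant pairs contribute a constant factor of 1/2 to the likelihood whatever the value of β.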