UNC-Chapel Hill BIOS 662 - Categorical Data- Contingency Tables


Categorical Data: Contingency Tables
Bios 662
Michael G. Hudgens, [email protected]
http://www.bios.unc.edu/~mhudgens
2006-10-17 17:11

BIOS 662, Categorical Data

Contingency Tables
• Two-way (r × c) contingency table:

  i \ j   1      2      ···   c
  1       n_11   n_12   ···   n_1c
  2       n_21   n_22   ···   n_2c
  ⋮       ⋮      ⋮            ⋮
  r       n_r1   n_r2   ···   n_rc

• Notation:
  \[ n_{i\cdot} = \sum_{j=1}^{c} n_{ij}, \qquad n_{\cdot j} = \sum_{i=1}^{r} n_{ij} \]

Contingency Tables
• Two scenarios where r × c tables arise:
  1. Sample from a population and measure two characteristics, say X and Y:
     \[ \Pr[X = i, Y = j] = \pi_{ij}; \qquad \sum_{i=1}^{r} \sum_{j=1}^{c} \pi_{ij} = 1 \]
  2. Each row corresponds to a sample from a different population, so for each row i:
     \[ \sum_{j=1}^{c} \pi_{ij} = 1 \]

Contingency Table: Example
• A survey of physicians asked about the size of the community in which they were reared and the size of the community in which they practice

                     Practice
  Reared     <5k   5-49k   50-99k   100k+   Total
  <5k         40      38       32      37     147
  5-49k       26      42       35      33     136
  50-99k      24      26       34      31     115
  100k+       30      39       53      60     182
  Total      120     145      154     161     580

Contingency Table: Example
• A case-control study was conducted to investigate the relationship between age at first birth and breast cancer

                      Age at 1st birth
             <20   20-24   25-29   30-34   ≥35    Total
  Case       320    1206    1011     463   220     3220
  Control   1422    4432    2893    1092   406    10245
  Total     1742    5638    3904    1555   626    13465

Contingency Tables
• Physician's example. H0: size of place of practice is independent of size of place of rearing
  \[ H_0: \pi_{ij} = \pi_{i\cdot}\, \pi_{\cdot j} \]
• Test of independence

Contingency Tables
• Breast cancer example. H0: distribution across ages is the same for cases and controls
  \[ H_0: \pi_{ij} = \pi_{i'j}; \quad j = 1, 2, \ldots, c \]
• Test of homogeneity/association

Test of Independence or Association
• Under H0, the expected frequency in the (i, j) cell is
  \[ E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{N} \]
  where $N = \sum_{i} \sum_{j} n_{ij}$ is the total count
• Consider the breast cancer example
  – If H0 is true, would expect the proportion of women < 20 to be
    \[ \hat{\pi}_{\cdot 1} = \frac{n_{11} + n_{21}}{N} = \frac{n_{\cdot 1}}{N} \]
  – There are $n_{1\cdot}$ cases, so we would expect
    \[ E_{11} = n_{1\cdot} \cdot \frac{n_{\cdot 1}}{N} = \frac{n_{1\cdot}\, n_{\cdot 1}}{N} \]
    cases to be < 20 years old

Test of Independence
• Under H0, the expected frequency in the (i, j) cell is
  \[ E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{N} \]
• Let
  \[ X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
  i.e.
  \[ X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - n_{i\cdot} n_{\cdot j}/N)^2}{n_{i\cdot} n_{\cdot j}/N} \]

Test of Independence
• Under H0,
  \[ X^2 \sim \chi^2_{(r-1)(c-1)} \]
• Physician's example:
  \[ (r - 1)(c - 1) = 3 \times 3 = 9 \]
  \[ C_{.05} = \{X^2 : X^2 > \chi^2_{.95,9} = 16.92\} \]

Physician's Example
• Expected values

                     Practice
  Reared     <5k    5-49k   50-99k   100k+   Total
  <5k        30.4    36.8     39.0    40.8     147
  5-49k      28.1    34.0     36.1    37.8     136
  50-99k     23.8    28.8     30.5    31.9     115
  100k+      37.7    45.5     48.3    50.5     182
  Total      120     145      154     161      580

Physician's Example
• Calculate test statistic
  \[ X^2 = \frac{(40 - 30.4)^2}{30.4} + \frac{(38 - 36.8)^2}{36.8} + \cdots + \frac{(60 - 50.5)^2}{50.5} = 12.81 \]
• Do not reject H0, since 12.81 < 16.92
• There is not enough evidence to say that place of practice and place of rearing are dependent

Breast Cancer Example
• Underlying probabilities

                      Age at 1st birth
             <20    20-24   25-29   30-34   ≥35    Total
  Case       π_11   π_12    π_13    π_14    π_15   1
  Control    π_21   π_22    π_23    π_24    π_25   1

Breast Cancer Example
• Null hypothesis
  \[ H_0: \pi_{1j} = \pi_{2j} \quad \text{for } j = 1, 2, 3, 4, 5 \]
• Can use same statistic
  \[ X^2 = \sum_{i=1}^{2} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{c-1} \]

Breast Cancer Example
• Expected frequencies

                      Age at 1st birth
             <20      20-24    25-29    30-34    ≥35     Total
  Case       416.6   1348.3    933.6    371.9   149.7     3220
  Control   1325.4   4289.7   2970.4   1183.1   476.3    10245
  Total     1742     5638     3904     1555     626      13465

Breast Cancer Example
• Test statistic
  \[ X^2 = \frac{(320 - 416.6)^2}{416.6} + \cdots + \frac{(406 - 476.3)^2}{476.3} = 130.3 \]
• Rejection region
  \[ C_{.05} = \{X^2 : X^2 > \chi^2_{.95,4} = 9.49\} \]
• Reject H0
• The age distributions are not the same
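The two worked examples above can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, not part of the original slides: the function name `pearson_chi_square` and the variable names are illustrative assumptions, and the critical values 16.92 and 9.49 are simply copied from the slides rather than computed. Because the code uses unrounded expected counts, the physician statistic comes out close to, but not exactly equal to, the slide's 12.81, which was computed from expected counts rounded to one decimal.

```python
def pearson_chi_square(table):
    """Return (X2, df, expected) for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # N
    # Expected count under H0: E_ij = n_i. * n_.j / N
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
    # Pearson statistic: sum over cells of (O_ij - E_ij)^2 / E_ij
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, expected


# Observed counts copied from the slides.
physicians = [
    [40, 38, 32, 37],
    [26, 42, 35, 33],
    [24, 26, 34, 31],
    [30, 39, 53, 60],
]
breast_cancer = [
    [320, 1206, 1011, 463, 220],    # cases
    [1422, 4432, 2893, 1092, 406],  # controls
]

x2_phys, df_phys, exp_phys = pearson_chi_square(physicians)
x2_bc, df_bc, exp_bc = pearson_chi_square(breast_cancer)

# Critical values taken from the slides (chi-square .95 quantiles).
CRIT_DF9 = 16.92
CRIT_DF4 = 9.49

print(f"Physicians:    X2 = {x2_phys:.2f}, df = {df_phys}, reject H0: {x2_phys > CRIT_DF9}")
print(f"Breast cancer: X2 = {x2_bc:.1f}, df = {df_bc}, reject H0: {x2_bc > CRIT_DF4}")
```

The same function serves both the test of independence and the test of homogeneity, since the slides use the same statistic for each; only the interpretation of H0 differs.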
Asymptotic Approximation
• Note the χ² distribution for X² is an approximation
• The approximation works well if $E_{ij} \ge 5$ for all i, j
• If some $E_{ij} < 5$, a generalization of Fisher's exact test can be employed or categories combined

Test of Independence
• For r = c = 2, can show
  \[ X^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(n_{ij} - n_{i\cdot} n_{\cdot j}/N)^2}{n_{i\cdot} n_{\cdot j}/N} \]
  equals
  \[ X^2 = \frac{N (n_{11} n_{22} - n_{12} n_{21})^2}{n_{1\cdot}\, n_{\cdot 1}\, n_{2\cdot}\, n_{\cdot 2}} \]
• Pearson chi-square statistic

Test for Trend
• Consider a 2 × c table
• The χ² test for homogeneity does not tell us how the probabilities differ
• Rather, just if they differ
• If the categories of the column variable are ordered, a more powerful test is possible

Test for Trend
• Suppose columns = exposure are ordered
• Rows = disease (yes/no)
• Interested in detecting alternatives where the probability of disease is proportional to exposure
• I.e., looking for a monotonic dose-response type relationship

Test for Trend
• Breast cancer example, alternative of interest: the probability of cancer increases as age at first birth increases
• Let $\rho_j$ denote the conditional probability of being in row 1 given in column j
• For the breast cancer example, $\rho_j$ is the probability of being a case conditional on being in the jth age category

Test for Trend
• Test
  \[ H_0: \rho_1 = \rho_2 = \cdots = \rho_c \]
  versus
  \[ H_A: \rho_1 \le \rho_2 \le \cdots \le \rho_c \]
  with at least one strict inequality, or
  \[ H_A: \rho_1 \ge \rho_2 \ge \cdots \ge \rho_c \]
  with at least one strict inequality

Test for Trend
• Numerical scores must be assigned to categories:
  \[ x_j : j = 1, 2, \ldots, c \]
• Example: use midrange of age categories
  \[ x_1 = 17.5,\; x_2 = 22.5,\; x_3 = 27.5,\; x_4 = 32.5,\; x_5 = 37.5 \]
• When scores are equidistant, can use $x_j = j$ WLOG

Test for Trend
• Let
  \[ [n_1 x] \equiv \sum_{j=1}^{c} n_{1j} x_j - \frac{n_{1\cdot} \sum_{j=1}^{c} n_{\cdot j} x_j}{N} \]
  \[ [x^2] \equiv \sum_{j=1}^{c} n_{\cdot j} x_j^2 - \frac{\left(\sum_{j=1}^{c} n_{\cdot j} x_j\right)^2}{N} \]
  and
  \[ p \equiv \frac{n_{1\cdot}}{N} \]
• Then the chi-square test for trend (p. 215 of the text) is
  \[ X^2_{\text{trend}} \equiv \frac{[n_1 x]^2}{[x^2]\, p(1 - p)} \]

Test for Trend
• Huh? Intuitive development:
• Compute average score
  \[ \bar{x} \equiv \sum_{j=1}^{c} \frac{n_{1j} x_j}{n_{1\cdot}} \]
• Compute finite sample expected value under the null
  \[ E(\bar{x}) \equiv \sum_{j=1}^{c} \frac{n_{\cdot j} x_j}{N} \]

Test for Trend
• Compute finite sample variance
  \[ V(\bar{x}) \equiv \frac{1 - f}{n_{1\cdot}} \left[ E(\bar{x}^2) - \{E(\bar{x})\}^2 \right] \]
  where $f = n_{1\cdot}/N$ is the sampling fraction and
  \[ E(\bar{x}^2) \equiv \sum_{j=1}^{c} \frac{n_{\cdot j} x_j^2}{N} \]
• Then the chi-square test for trend can equivalently be written
  \[ X^2_{\text{trend}} = \frac{\{\bar{x} - E(\bar{x})\}^2}{V(\bar{x})} \]

Test for Trend
• Under H0,
  \[ X^2_{\text{trend}} \sim \chi^2_1 \]
  \[ C_\alpha = \{X^2 : X^2 > \chi^2_{1,1-\alpha}\} \]
  \[ \text{p-value} = \Pr[\chi^2_1 > x^2] \]
  where $x^2$ is the observed value of the statistic
• Note degrees of freedom equal 1 regardless of c

Test for Trend
• Breast …
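The two forms of the trend statistic defined above, the bracket form and the mean-score form, can be checked against each other numerically. The following is a minimal sketch in plain Python, not taken from the slides: the function names `trend_bracket_form` and `trend_mean_form` are illustrative assumptions, the data are the breast cancer counts from the earlier example, and the cutoff 3.84 is the standard .95 quantile of a chi-square distribution with 1 degree of freedom, which the visible slides do not state. No expected output is quoted because the preview ends before the slides report the result.

```python
def trend_bracket_form(row1, col_totals, scores):
    """X2_trend = [n1 x]^2 / ([x^2] p (1 - p)), using the bracket notation."""
    n = sum(col_totals)   # N
    n1 = sum(row1)        # n_1.
    sum_nx = sum(m * x for m, x in zip(col_totals, scores))
    n1x = sum(a * x for a, x in zip(row1, scores)) - n1 * sum_nx / n
    x2_bracket = sum(m * x * x for m, x in zip(col_totals, scores)) - sum_nx ** 2 / n
    p = n1 / n
    return n1x ** 2 / (x2_bracket * p * (1 - p))


def trend_mean_form(row1, col_totals, scores):
    """X2_trend = (xbar - E(xbar))^2 / V(xbar), the intuitive form."""
    n = sum(col_totals)
    n1 = sum(row1)
    xbar = sum(a * x for a, x in zip(row1, scores)) / n1          # mean score in row 1
    e_xbar = sum(m * x for m, x in zip(col_totals, scores)) / n   # null expectation
    e_x2 = sum(m * x * x for m, x in zip(col_totals, scores)) / n
    f = n1 / n                                                     # sampling fraction
    var = (1 - f) / n1 * (e_x2 - e_xbar ** 2)
    return (xbar - e_xbar) ** 2 / var


# Breast cancer counts from the slides.
cases = [320, 1206, 1011, 463, 220]
controls = [1422, 4432, 2893, 1092, 406]
col_totals = [a + b for a, b in zip(cases, controls)]

integer_scores = [1, 2, 3, 4, 5]
midrange_scores = [17.5, 22.5, 27.5, 32.5, 37.5]

x2_bracket = trend_bracket_form(cases, col_totals, integer_scores)
x2_mean = trend_mean_form(cases, col_totals, integer_scores)
x2_midrange = trend_bracket_form(cases, col_totals, midrange_scores)

CRIT_DF1 = 3.84  # assumed .95 quantile of chi-square with 1 df

print(f"bracket form:    {x2_bracket:.2f}")
print(f"mean-score form: {x2_mean:.2f}")
print(f"midrange scores: {x2_midrange:.2f}")
print(f"reject H0 at the .05 level: {x2_bracket > CRIT_DF1}")
```

Running the midrange scores alongside the integer scores illustrates the "WLOG" remark: equidistant scores differ only by a shift and a rescaling, which cancels in the ratio.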

