MIT OpenCourseWare
http://ocw.mit.edu

18.443 Statistics for Applications
Spring 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

18.443 TWO BY TWO CONTINGENCY TABLES: HYPERGEOMETRIC PROBABILITIES AND ODDS RATIOS

1. Hypergeometric probabilities. These give a way of testing the independence hypothesis (for the grand multinomial model) or the homogeneity hypothesis (for the $\pi$ model) in $2 \times 2$ tables when at least one of the expected numbers $n_{i\cdot} n_{\cdot j}/n$ is less than 5, so a $\chi^2$ test is not recommended. Rice mentions hypergeometric distributions in subsection 2.1.4, p. 42 and Fisher's exact test in Section 13.2, pp. 514-516. In a first probability course such as 18.440 you would also very likely have encountered hypergeometric distributions.

Suppose we have a collection of $N$ objects, of which $m$ have a property $A$. We take a random sample of $k$ of the $N$ objects, with probability $1/\binom{N}{k}$ for each possible choice of the sample. Then the probability that the sample contains exactly $j$ objects having $A$ is

\[ h(j, k, m, N) = \frac{\binom{m}{j}\binom{N-m}{k-j}}{\binom{N}{k}}. \]

A binomial coefficient $\binom{a}{b}$ is defined for nonnegative integers $a$ and $b$ as $a!/(b!(a-b)!)$, or as 0 if $b < 0$ or $b > a$. For given $N$, $m$ and $k$ let $Y$ be the number of objects in the sample having $A$, so that $P(Y = j) = h(j, k, m, N)$ for each possible $j$. The maximum possible value of $j$ is $\min(k, m)$ and the minimum possible value is $\max(0, m + k - N)$. Here $j \ge m + k - N$ because the number of objects not in $A$ and not in the sample is $N - k - m + j$, and this must be $\ge 0$.

Let's define the cumulative probability, analogous to $E$ for the binomial distribution,

\[ H(i, k, m, N) = P(Y \ge i) = \sum_{j=i}^{\min(k,m)} h(j, k, m, N). \]

It is known that $EY = km/N$ and $\mathrm{var}(Y) = k(N-k)m(N-m)/[N^2(N-1)]$, but we will not need the mean or variance in this application.

Suppose we have $n_0$ independent trials with probability $p_0$ of success on each trial and observe $Y_0$ successes.
Suppose we also have $n_1$ independent trials with probability $p_1$ of success on each, also independent of the $n_0$ trials, and observe $Y_1$ successes. Viewing this in terms of a $2 \times 2$ table, we can write $n_{0\cdot} = n_0$ and $n_{1\cdot} = n_1$, which may not be random. If we consider outcomes in the first column as "successes" then we will have $\pi_{00} = p_0$, $n_{00} = Y_0$, $\pi_{10} = p_1$, and $n_{10} = Y_1$. Suppose we want to test the hypothesis that $p_0 = p_1 = p$ for some unknown value of $p$, in other words $\pi_{00} = \pi_{10} = p$, implying $\pi_{01} = \pi_{11} = 1 - p$, which gives the homogeneity hypothesis $H_0$ in the $\pi$ model. If $n_0$ and $n_1$ are large and the estimated value $\hat{p} = (Y_0 + Y_1)/(n_0 + n_1)$ is not close to 0 or 1, we can apply normal approximations. But what if $n_0$ and $n_1$ are not so large?

Assuming the null hypothesis $p_0 = p_1$, if we define the set $A$ as consisting of the $m = Y_0 + Y_1$ successful trials, and the sample as the set of $k = n_0$ trials, then the sample can be considered as being chosen randomly from $N = n_0 + n_1$ objects, since under the null hypothesis whether a trial is a success is independent of whether it is one of the $n_0$ trials or one of the $n_1$. Thus, "Fisher's exact test" (Rice, §13.2) of the hypothesis $p_0 = p_1$ at level $\alpha$ is, for a one-sided test, to reject it in favor of the alternative $p_0 > p_1$ if $Y_0$ is too large, specifically, if $H(Y_0, k, m, N) < \alpha$. Or if the alternative is $p_0 < p_1$, we would reject the null hypothesis in favor of that if $H(Y_1, n_1, m, N) < \alpha$. For a two-sided test, we would reject the null hypothesis $p_0 = p_1$ in favor of $p_0 \ne p_1$ if either $H(Y_0, k, m, N) < \alpha/2$ or $H(Y_1, n_1, m, N) < \alpha/2$.

As an example, suppose $n_0 = n_1 = 5$, $Y_0 = 4$ and $Y_1 = 1$, so that $k = 5$, $m = 5$ and $N = 10$. Then $H(4, 5, 5, 10) = h(4, 5, 5, 10) + h(5, 5, 5, 10) = (25 + 1)/252 \approx 0.103$, so the null hypothesis would not be rejected at level $\alpha = 0.05$. The normal approximation doesn't work very well in this case since $n_0$ and $n_1$ are not large.

2. Odds ratios. Section 13.6 of Rice is about these. Suppose we are given a two by two contingency table with probabilities $p_{ij}$ for $i, j = 0, 1$.
Then the (true) odds ratio is defined as

\[ \Delta = \frac{p_{00}\, p_{11}}{p_{01}\, p_{10}}, \]

supposing that all $p_{ij} \ne 0$. It's easily seen that if the independence hypothesis $H_0$ holds then the odds ratio equals 1, as its numerator and denominator both equal $p_{0\cdot}\, p_{\cdot 0}\, p_{1\cdot}\, p_{\cdot 1}$. Moreover the four equations $p_{ij} = p_{i\cdot}\, p_{\cdot j}$ that define $H_0$ are all equivalent, as for example $p_{00} = p_{0\cdot}\, p_{\cdot 0}$ implies $p_{01} = p_{0\cdot} - p_{00} = p_{0\cdot}(1 - p_{\cdot 0}) = p_{0\cdot}\, p_{\cdot 1}$, and likewise moving on to the entries in the second row.

Actually the odds ratio equaling 1 implies $H_0$ and so is equivalent to it, as follows. Suppose $\Delta = 1$. Then $p_{00}/p_{01} = p_{10}/p_{11}$. Adding 1 to both sides gives $p_{0\cdot}/p_{01} = p_{1\cdot}/p_{11}$, and inverting both sides gives $p_{01}/p_{0\cdot} = p_{11}/p_{1\cdot}$; call this ratio $\lambda$. Then $p_{01} = \lambda p_{0\cdot}$ and $p_{11} = \lambda p_{1\cdot}$, so subtracting both sides of the first equation from $p_{0\cdot}$ and of the second from $p_{1\cdot}$ gives $p_{00} = (1 - \lambda) p_{0\cdot}$ and $p_{10} = (1 - \lambda) p_{1\cdot}$. Adding within columns, and using $p_{0\cdot} + p_{1\cdot} = 1$, it then follows that $1 - \lambda = p_{\cdot 0}$ and $\lambda = p_{\cdot 1}$. Thus $H_0$ holds as claimed.

If the null hypothesis $H_0$ is rejected (by a chi-squared or Fisher's exact test) one can then estimate the true odds ratio (different from 1) by the estimator given in the last display of Rice p. 528, which is

\[ \hat{\Delta} = \frac{n_{00}\, n_{11}}{n_{01}\, n_{10}}. \]

This is the MLE of the true odds ratio $\Delta$. For large $n_{ij}$ the log (to base $e$) of the estimated odds ratio is approximately normally distributed. One can estimate the mean by the log of the above estimate. The variance is estimated by

\[ \sum_{i=0}^{1} \sum_{j=0}^{1} \frac{1}{n_{ij}}. \]

The square root of that gives a standard error for the ln odds ratio, so one can get confidence intervals for it. Then one can exponentiate the endpoints to get confidence intervals for the odds ratio itself.
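
The calculations in Sections 1 and 2 can be sketched in a few lines of Python. This is only an illustration, not part of the original handout: the function names are hypothetical, the counts passed to the odds ratio function are made up, and the multiplier z = 1.96 (for an approximate 95% interval) is an assumption, since the text does not fix a confidence level. The first two functions follow the definitions of h and H from Section 1, and the third follows the large-sample recipe for the log odds ratio in Section 2.

```python
from math import comb, exp, log, sqrt


def hypergeom_pmf(j, k, m, N):
    """h(j, k, m, N): chance that a sample of k out of N objects
    contains exactly j of the m objects having property A."""
    # math.comb raises ValueError for negative arguments, so the cases
    # where a binomial coefficient is defined to be 0 are handled here.
    if j < 0 or j > m or k - j < 0 or k - j > N - m:
        return 0.0
    return comb(m, j) * comb(N - m, k - j) / comb(N, k)


def hypergeom_upper_tail(i, k, m, N):
    """H(i, k, m, N) = P(Y >= i), summing h over j = i, ..., min(k, m)."""
    return sum(hypergeom_pmf(j, k, m, N)
               for j in range(max(i, 0), min(k, m) + 1))


def odds_ratio_ci(n00, n01, n10, n11, z=1.96):
    """Estimated odds ratio with an approximate confidence interval.

    Assumes all four counts are positive and large enough for the
    normal approximation to the log odds ratio; z = 1.96 is an assumed
    multiplier giving roughly 95% coverage.
    """
    estimate = (n00 * n11) / (n01 * n10)
    # Standard error of the log odds ratio: sqrt of the sum of 1/n_ij.
    se_log = sqrt(1 / n00 + 1 / n01 + 1 / n10 + 1 / n11)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Example from Section 1: n0 = n1 = 5, Y0 = 4, Y1 = 1,
# so k = 5, m = 5, N = 10 and the one-sided p-value is H(4, 5, 5, 10).
p_value = hypergeom_upper_tail(4, 5, 5, 10)
print(round(p_value, 3))  # 0.103

# Hypothetical table of counts (not from the handout).
estimate, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(estimate, lower, upper)
```

For a two-sided Fisher test as described in Section 1, one would compare both hypergeom_upper_tail(Y0, n0, m, N) and hypergeom_upper_tail(Y1, n1, m, N) with α/2.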

