MIT 18.443 - THE LIMIT DISTRIBUTION OF THE X2 STATISTIC
MIT OpenCourseWare http://ocw.mit.edu
18.443 Statistics for Applications, Spring 2009
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

18.443 THE LIMIT DISTRIBUTION OF THE $X^2$ STATISTIC

Suppose we have a multinomial $(n, p_1, \ldots, p_k)$ distribution, where $p_j$ is the probability of the $j$th of $k$ possible outcomes on each of $n$ independent trials. Thus $p_j \ge 0$ and $\sum_{j=1}^k p_j = 1$. The probability that the $j$th outcome occurs $n_j$ times for each $j$ in the $n$ trials (so that necessarily $n_1 + \cdots + n_k = n$) is the multinomial probability
\[
\binom{n}{n_1, \ldots, n_k} \, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}.
\]
If values $n_1, n_2, \ldots, n_k$ are observed, and a simple hypothesis $H_0$ specifies values of $p_1, \ldots, p_k$, then the $X^2$ statistic for testing $H_0$ is
\[
X^2 = \sum_{j=1}^k \frac{(n_j - np_j)^2}{np_j}.
\]

Theorem. If the hypothesis $H_0$ is true, then as $n \to \infty$, the distribution of $X^2$ converges to that of $\chi^2(k-1)$, i.e. $\chi^2$ with $k-1$ degrees of freedom.

Proof. Under $H_0$, the random vector $(n_1, \ldots, n_k)$ has a multinomial $(n, p_1, \ldots, p_k)$ distribution. Let's find the covariance of $n_i$ and $n_j$ for $i \neq j$. If we can do that for $i = 1$ and $j = 2$, we can extend the result to any $i$ and $j$. For each $j$, the (marginal) distribution of $n_j$ is binomial $(n, p_j)$. Let $q_1 := 1 - p_1$. Given $n_1$, the conditional distribution of $n_2$ is binomial $(n - n_1, p_2/q_1)$. Thus $E(n_2 \mid n_1) = (n - n_1)p_2/q_1$ and
\[
E(n_1 n_2) = E\bigl(n_1 E(n_2 \mid n_1)\bigr) = n^2 p_1 p_2 / q_1 - p_2 q_1^{-1} E(n_1^2).
\]
Since $E(n_1^2) = \mathrm{Var}(n_1) + (E n_1)^2 = np_1q_1 + n^2p_1^2$, we get
\[
E(n_1 n_2) = n^2 p_1 p_2 q_1^{-1} - np_1p_2 - n^2 p_1^2 p_2 q_1^{-1} = (n^2 - n)p_1p_2,
\]
which is symmetric in $p_1$ and $p_2$, as it should be. It follows that $\mathrm{Cov}(n_1, n_2) = E(n_1n_2) - (En_1)(En_2) = -np_1p_2$. It's natural that this covariance should be negative, since for larger $n_1$, $n_2$ will tend to be smaller. For $1 \le i < j \le k$ we have likewise $\mathrm{Cov}(n_i, n_j) = -np_ip_j$.

Let $Y_j := (n_j - np_j)/\sqrt{np_j}$ for $j = 1, \ldots, k$. Then $X^2 = Y_1^2 + \cdots + Y_k^2$. For each $j$ we have $EY_j = 0$ and $EY_j^2 = q_j := 1 - p_j$. We also have, for $i \neq j$,
\[
EY_iY_j = \mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(n_i, n_j)/(n\sqrt{p_ip_j}) = -\sqrt{p_ip_j}.
\]
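The moment calculations above can be spot-checked by simulation. A minimal sketch in Python (the probabilities, sample size, and replication count below are arbitrary choices for illustration, not values from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: k = 3 categories with arbitrary probabilities.
p = np.array([0.2, 0.3, 0.5])
n, reps = 1000, 200_000

counts = rng.multinomial(n, p, size=reps)   # each row is (n_1, n_2, n_3)

# Empirical Cov(n_1, n_2) should be close to -n p_1 p_2 = -60.
emp_cov = np.cov(counts[:, 0], counts[:, 1])[0, 1]
print(round(emp_cov, 1), round(-n * p[0] * p[1], 1))

# With Y_j = (n_j - n p_j)/sqrt(n p_j), E(Y_1 Y_2) should be -sqrt(p_1 p_2).
y = (counts - n * p) / np.sqrt(n * p)
print(round((y[:, 0] * y[:, 1]).mean(), 3), round(-np.sqrt(p[0] * p[1]), 3))
```

With 200,000 replications the empirical values agree with the formulas to within simulation noise.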
Recall that $\delta_{ij} = 1$ for $i = j$ and $0$ for $i \neq j$. As a matrix, $I_{ij} = \delta_{ij}$ gives the $k \times k$ identity matrix. We have
\[
C_{ij} := EY_iY_j = \mathrm{Cov}(Y_i, Y_j) = \delta_{ij} - \sqrt{p_ip_j}
\]
for all $i, j = 1, \ldots, k$. Let $u_p$ be the column vector $(\sqrt{p_1}, \ldots, \sqrt{p_k})'$. This vector has length 1. We can then write $C = I - u_pu_p'$ as a matrix. Let's make a change of basis in which $u_p$ becomes one of the basis vectors, say the first, $e_1$, and $e_2, \ldots, e_k$ are any other vectors of unit length perpendicular to each other and to $e_1$. In this basis $C$ becomes $D = I - e_1e_1'$, which is a diagonal matrix; in other words, $D_{ij} = 0$ for $i \neq j$. Also $D_{11} = 0$, and $D_{jj} = 1$ for $j = 2, \ldots, k$. Let the vector $Y = (Y_1, \ldots, Y_k)$ in the new coordinates become $Z = (Z_1, \ldots, Z_k)$, where $EZ_j = 0$ for all $j$ and the $Z_j$ have covariance matrix $D$.

As $n \to \infty$, by the multidimensional central limit theorem (proved in 18.175, e.g. in Professor Panchenko's OCW version of the course, Spring 2007), $(Z_1, Z_2, \ldots, Z_k)$ asymptotically have a multivariate normal distribution with mean 0 and covariance matrix $D$; in other words, $Z_1 \equiv 0$ and $Z_2, \ldots, Z_k$ are asymptotically i.i.d. $N(0, 1)$. Thus
\[
X^2 = |Y|^2 = |Z|^2 = Z_2^2 + \cdots + Z_k^2
\]
has asymptotically a $\chi^2(k-1)$ distribution as $n \to \infty$. Q.E.D.

Chi-squared tests of composite hypotheses. In doing a chi-squared test of a composite hypothesis indexed by an $m$-dimensional parameter $\theta$, two kinds of adjustment may be made. One is to estimate $\theta$ by some $\hat\theta$ and compute the chi-squared statistic
\[
\hat X^2 = \sum_{j=1}^k \frac{(n_j - np_j(\hat\theta))^2}{np_j(\hat\theta)}.
\]
The usual rule is that for $n$ large enough, this should have approximately a $\chi^2$ distribution with $k - 1 - m$ degrees of freedom. For that to be valid, $\hat\theta$ should be a suitable function of the statistics $n_1, \ldots, n_k$. Two suitable estimators are the minimum chi-squared estimate, where $\hat\theta$ is chosen to minimize $\hat X^2$, and the maximum likelihood estimate $\hat\theta_{MLE}$ based on the given data $n_1, \ldots, n_k$.
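Both halves of the argument just given can be checked numerically. A hedged sketch (the probabilities, $n$, and replication count are arbitrary choices): the matrix $C = I - u_pu_p'$ should have one eigenvalue 0 and $k-1$ eigenvalues 1, matching the diagonalized $D$, and simulated values of $X^2$ should have approximately the mean $k-1$ and variance $2(k-1)$ of $\chi^2(k-1)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: k = 4 categories with arbitrary probabilities.
p = np.array([0.1, 0.2, 0.3, 0.4])
k = len(p)

# Covariance matrix C = I - u_p u_p' of (Y_1,...,Y_k); its eigenvalues
# are those of the diagonal matrix D: one 0 and k-1 ones.
u = np.sqrt(p)
C = np.eye(k) - np.outer(u, u)
eigs = np.sort(np.linalg.eigvalsh(C))
print(np.round(eigs, 8))

# Simulate X^2 under H_0 and compare moments with chi^2(k-1) = chi^2(3),
# which has mean 3 and variance 6.
n, reps = 2000, 50_000
counts = rng.multinomial(n, p, size=reps)
x2 = ((counts - n * p) ** 2 / (n * p)).sum(axis=1)
print(round(x2.mean(), 2), round(x2.var(), 2))
```

The eigenvalue check is exact; the moment check agrees up to simulation noise and a small finite-$n$ correction.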
Another adjustment that's made is that if the expected numbers $np_j(\hat\theta)$ in some categories are less than 5, we can combine categories until all the expectations are larger than 5. Combining categories will certainly happen if we start with infinitely many possible outcomes, as in a Poisson or geometric distribution, where the outcome can be any nonnegative integer. Then, when we come to do the test, the $n_j$ will no longer be the original data, which may be called the ungrouped data, but what are called grouped data. Another way data can be grouped is that the original observations $X_1, \ldots, X_n$ might be real numbers, for example, and we want to test by $\chi^2$ whether they have a normal $N(\mu, \sigma^2)$ distribution, so we have an $m = 2$ dimensional parameter. One can decompose the real line into $k$ intervals (the leftmost and rightmost being half-lines) and do a $\chi^2$ test. Here the numbers $n_i$ of observations in each interval are already grouped data.

It tends to be very convenient to estimate the parameters based on ungrouped data, for all the cases mentioned (normal, Poisson, geometric), and hard to estimate them using grouped data. Unfortunately, though, using estimates based on ungrouped data but doing a chi-squared test on grouped data violates the conditions for the $X^2$ statistic to have a $\chi^2$ distribution with $k - 1 - m$ degrees of freedom, as many textbooks have claimed it does, although Rice, third ed., p. 359, correctly points out the issue. The problem is that the ungrouped data have more information in them than the grouped data do, and consequently, if the hypothesis $H_0$ is true, an estimate $\tilde\theta$ based on the ungrouped data tends to be closer to the true value $\theta_0$ of the parameter than the estimate $\hat\theta$ based on the grouped data.
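The Poisson case of grouping can be sketched as follows. This is an invented example, not from the notes: $\lambda$ is estimated from the ungrouped data by its MLE (the sample mean), the nonnegative integers are grouped into the six cells $0, 1, 2, 3, 4, \ge 5$, and $\hat X^2$ is computed on the grouped counts. Per the discussion above, referring this $\hat X^2$ to $\chi^2$ with $k - 1 - m = 4$ degrees of freedom is only an approximation, since $\lambda$ was not estimated from the grouped counts.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(2)

# Invented example data: n = 500 observations, true lambda = 2 under H_0.
x = rng.poisson(2.0, size=500)
n = len(x)
lam = x.mean()  # MLE of lambda from the UNGROUPED data

# Group the outcomes 0, 1, 2, ... into k = 6 cells: 0, 1, 2, 3, 4, >= 5.
cells = [0, 1, 2, 3, 4]
obs = np.array([np.sum(x == j) for j in cells] + [np.sum(x >= 5)])

pj = np.array([exp(-lam) * lam**j / factorial(j) for j in cells])
pj = np.append(pj, 1.0 - pj.sum())  # tail cell P(X >= 5)

x2 = ((obs - n * pj) ** 2 / (n * pj)).sum()
print(round(x2, 2))
# The naive rule compares x2 to chi^2 with k - 1 - m = 6 - 1 - 1 = 4
# degrees of freedom; because lam came from the ungrouped data, the
# actual limiting distribution lies between chi^2(4) and chi^2(5),
# which is the point made in the text.
```

With $n = 500$ and $\lambda \approx 2$ every cell's expected count exceeds 5, so no further combining of cells is needed here.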