252chisqnote 2/29/08

Explanation of Chi-squared Formulas

1. The Relationship of the Chi-squared Formula to other Formulas involving Proportions

The purpose of this note is to try to explain the relationship of the formula $\chi^2 = \sum \frac{(O-E)^2}{E}$ to the formula for the test ratio used to test the null hypothesis $H_0: p = p_0$. This ratio is $z = \frac{\bar{p} - p_0}{\sigma_p}$, where $\sigma_p = \sqrt{p_0 q_0 / n}$, so that

$$z = \frac{\bar{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}, \qquad \bar{p} = \frac{x}{n}.$$

Here $\bar{p}$ is the observed proportion of successes in a sample and $p_0$ is the expected proportion of successes. As always, $q = 1 - p$.

Look at the O and E tables in our chi-squared example. The rows are age groups and the columns are the Days categories.

O                   Column 1   Column 2   Column 3   Column 4   Total   pr
Row 1 (Age 15-25)      10         15         15         10        50    50/150 = 1/3
Row 2 (Age 26-49)       5         10         15         10        40    40/150 = 4/15
Row 3 (Age 50 up)      15         15         10         20        60    60/150 = 2/5
Total                  30         40         40         40       150    1
pc                    30/150     40/150     40/150     40/150      1
                      = 1/5      = 4/15     = 4/15     = 4/15

E                   Column 1   Column 2   Column 3   Column 4   Total   pr
Row 1                  10        13 1/3     13 1/3     13 1/3      50    1/3
Row 2                   8        10 2/3     10 2/3     10 2/3      40    4/15
Row 3                  12        16         16         16          60    2/5
Total                  30        40         40         40         150    1
pc                    1/5        4/15       4/15       4/15         1

The O table can be looked at as a set of observed proportions multiplied by a column sum. We sum across the rows to get the row proportions, which are weighted averages of the proportions in their respective rows. If we use $p_{i0}$ for the weighted average proportion we have computed in row $i$, and $p_{ij}$ for the observed proportion in row $i$ and column $j$, we can write the O table as

$$O:\quad \begin{array}{c|cccc|c|c}
 & \text{Col } 1 & \text{Col } 2 & \text{Col } 3 & \text{Col } 4 & \text{Total} & p_r \\ \hline
\text{Row } 1 & p_{11} n_1 & p_{12} n_2 & p_{13} n_3 & p_{14} n_4 & p_{10} n & p_{10} \\
\text{Row } 2 & p_{21} n_1 & p_{22} n_2 & p_{23} n_3 & p_{24} n_4 & p_{20} n & p_{20} \\
\text{Row } 3 & p_{31} n_1 & p_{32} n_2 & p_{33} n_3 & p_{34} n_4 & p_{30} n & p_{30} \\ \hline
\text{Total} & n_1 & n_2 & n_3 & n_4 & n & 1
\end{array}$$

(For an example of Minitab computations (a) using the Minitab chi-squared routines, (b) simulating our computations and (c) working with proportions, see 252chisqx2.)

We then get the E table by multiplying the column sums by the proportions in each row. We can write the E table as

$$E:\quad \begin{array}{c|cccc|c|c}
 & \text{Col } 1 & \text{Col } 2 & \text{Col } 3 & \text{Col } 4 & \text{Total} & p_r \\ \hline
\text{Row } 1 & p_{10} n_1 & p_{10} n_2 & p_{10} n_3 & p_{10} n_4 & p_{10} n & p_{10} \\
\text{Row } 2 & p_{20} n_1 & p_{20} n_2 & p_{20} n_3 & p_{20} n_4 & p_{20} n & p_{20} \\
\text{Row } 3 & p_{30} n_1 & p_{30} n_2 & p_{30} n_3 & p_{30} n_4 & p_{30} n & p_{30} \\ \hline
\text{Total} & n_1 & n_2 & n_3 & n_4 & n & 1
\end{array}$$

Our next step in a chi-squared test is to write the O and E columns and compute $\chi^2 = \sum \frac{(O-E)^2}{E}$. But look at the element of this computation that comes from the upper left corner of the O and E tables. It can be written as $\frac{(p_{11} n_1 - p_{10} n_1)^2}{p_{10} n_1}$. If we remove $n_1$ from the parentheses, we have $\frac{n_1^2 (p_{11} - p_{10})^2}{p_{10} n_1} = \frac{(p_{11} - p_{10})^2}{p_{10}/n_1}$. We can thus say that

$$\chi^2 = \sum_i \sum_j \frac{(p_{ij} - p_{i0})^2}{p_{i0}/n_j}.$$

Now let us look at a problem that only involves one set of proportions. In this case we test $H_0: p_1 = p_2 = p_3 = p_4$. Instead of using terms like $p_{20}$ for the probabilities in the second row, we can realize that, because there are only two rows, we can call our $p_{10}$ simply $p_0$ and realize that $p_{20} = 1 - p_{10} = q_0$. This means that we can write the O table as

$$O:\quad \begin{array}{cccc|c|c}
 & & & & \text{Total} & p_r \\ \hline
p_1 n_1 & p_2 n_2 & p_3 n_3 & p_4 n_4 & p_0 n & p_0 \\
q_1 n_1 & q_2 n_2 & q_3 n_3 & q_4 n_4 & q_0 n & q_0 \\ \hline
n_1 & n_2 & n_3 & n_4 & n & 1
\end{array}$$

and the E table as

$$E:\quad \begin{array}{cccc|c|c}
 & & & & \text{Total} & p_r \\ \hline
p_0 n_1 & p_0 n_2 & p_0 n_3 & p_0 n_4 & p_0 n & p_0 \\
q_0 n_1 & q_0 n_2 & q_0 n_3 & q_0 n_4 & q_0 n & q_0 \\ \hline
n_1 & n_2 & n_3 & n_4 & n & 1
\end{array}$$

The expression for the chi-squared statistic becomes

$$\chi^2 = \sum_j \frac{(p_j - p_0)^2}{p_0/n_j} + \sum_j \frac{(q_j - q_0)^2}{q_0/n_j}.$$

But note that $(q_j - q_0)^2 = \left[(1 - p_j) - (1 - p_0)\right]^2 = (p_j - p_0)^2$. This implies that

$$\chi^2 = \sum_j \frac{(p_j - p_0)^2}{p_0/n_j} + \sum_j \frac{(p_j - p_0)^2}{q_0/n_j} = \sum_j (p_j - p_0)^2 \, n_j \left(\frac{1}{p_0} + \frac{1}{q_0}\right).$$

Now note that $\frac{1}{p_0} + \frac{1}{q_0} = \frac{q_0 + p_0}{p_0 q_0} = \frac{1}{p_0 q_0}$, since $p_0 + q_0 = 1$. This means that

$$\chi^2 = \sum_j \frac{(p_j - p_0)^2 \, n_j}{p_0 q_0} = \sum_j \frac{(p_j - p_0)^2}{p_0 q_0 / n_j}.$$

Finally, remember that we started with the test ratio for an individual sample, which we could write as $z_j = \frac{p_j - p_0}{\sqrt{p_0 q_0 / n_j}}$. We can now see that our $\chi^2$ is simply $\sum_j z_j^2$. Note that if $j$ goes from 1 to $c$, the degrees of freedom are $c - 1$, because the last square, $z_c^2$, can be predicted from our knowledge of the sample sizes and the fact that the proportions must average to $p_0$.

2. The Equivalence of the two formulas for computing Chi-squared

The two formulas usually given for computing chi-squared in a chi-squared test are $\chi^2 = \sum \frac{(O-E)^2}{E}$ or $\chi^2 = \sum \frac{O^2}{E} - n$. These are properly written as

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad \text{and} \qquad \chi^2 = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - n.$$

We assume that the O and E columns each consist of $k$ entries and each add to $n$. Then each entry in the column where $\chi^2$ is computed is

$$\frac{(O_i - E_i)^2}{E_i} = \frac{O_i^2 - 2 O_i E_i + E_i^2}{E_i} = \frac{O_i^2}{E_i} - 2 O_i + E_i.$$

This means that

$$\sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - 2 \sum_{i=1}^{k} O_i + \sum_{i=1}^{k} E_i = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - 2n + n = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - n.$$

3. Degrees of Freedom for computing chi-squared for a test of homogeneity or independence

Usually, if we have a column of $k$ numbers that are constrained by the fact that they must add to some number $n$, we say that we have $k - 1$ degrees of freedom. When we put together something like E and then compute $\chi^2$, we are limited by much more than a constraint on the entire sum. The rule of thumb used is that every time we estimate a parameter, we lose a degree of freedom. Across each row we compute a row sum and then divide it by n to …
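As a check on sections 1 and 2, here is a minimal Python sketch (not part of the original note, which works in Minitab) that rebuilds the E table from the example O table by multiplying each row proportion $p_{i0}$ by the column sum $n_j$, then computes $\chi^2$ three ways: $\sum (O-E)^2/E$, $\sum O^2/E - n$, and the proportion form $\sum_i \sum_j (p_{ij} - p_{i0})^2 / (p_{i0}/n_j)$. Exact fractions are used so the three results can be compared for equality; the function names are illustrative only.

```python
from fractions import Fraction

# Observed counts from the O table in section 1 (3 rows x 4 columns).
O = [
    [10, 15, 15, 10],
    [5, 10, 15, 10],
    [15, 15, 10, 20],
]

def expected_table(obs):
    """E_ij = p_i0 * n_j: row proportion times column sum."""
    n = sum(sum(row) for row in obs)
    col_sums = [sum(col) for col in zip(*obs)]
    row_props = [Fraction(sum(row), n) for row in obs]
    return [[p * nj for nj in col_sums] for p in row_props]

def chi2_long(obs, exp):
    """Sum of (O - E)^2 / E over every cell."""
    return sum((o - e) ** 2 / e
               for orow, erow in zip(obs, exp)
               for o, e in zip(orow, erow))

def chi2_short(obs, exp):
    """Sum of O^2 / E over every cell, minus n."""
    n = sum(sum(row) for row in obs)
    return sum(Fraction(o * o) / e
               for orow, erow in zip(obs, exp)
               for o, e in zip(orow, erow)) - n

def chi2_from_proportions(obs):
    """Sum of (p_ij - p_i0)^2 / (p_i0 / n_j), with p_ij = O_ij / n_j."""
    n = sum(sum(row) for row in obs)
    col_sums = [sum(col) for col in zip(*obs)]
    total = Fraction(0)
    for row in obs:
        p_i0 = Fraction(sum(row), n)
        for o, nj in zip(row, col_sums):
            p_ij = Fraction(o, nj)
            total += (p_ij - p_i0) ** 2 / (p_i0 / nj)
    return total

E = expected_table(O)
print(chi2_long(O, E), chi2_short(O, E), chi2_from_proportions(O))
# 265/32 265/32 265/32
```

All three forms give $265/32 = 8.28125$. The first row of `E` comes out as 10, 40/3, 40/3, 40/3, which matches the 13 1/3 entries in the E table.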