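252chisqnote 2/29/08

Explanation of Chi-squared Formulas

1) The Relationship of the Chi-squared Formula to other Formulas involving Proportions

The purpose of this note is to explain the relationship of the formula $\chi^2 = \sum \frac{(O-E)^2}{E}$ to the formula for the test ratio used to test the null hypothesis $H_0: p = p_0$. This ratio is $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$, where $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}}$, so that $z = \frac{\bar{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$. Here $\bar{p} = \frac{x}{n}$ is the observed proportion of successes in a sample and $p_0$ is the expected proportion of successes. As always, $q = 1 - p$.

Look at the O and E tables in our chi-squared example.

$$\begin{array}{l|cccc|c|c}
O \quad \text{Days} & 0 & 1 & 2 & 3 & \text{Total} & p_r \\ \hline
\text{Age 15-25} & 20 & 10 & 15 & 15 & 60 & \frac{60}{150} = \frac{2}{5} \\
\text{Age 26-49} & 10 & 15 & 10 & 5 & 40 & \frac{40}{150} = \frac{4}{15} \\
\text{Age 50 up} & 10 & 15 & 15 & 10 & 50 & \frac{50}{150} = \frac{1}{3} \\ \hline
\text{Total} & 40 & 40 & 40 & 30 & 150 & 1 \\
p_c & \frac{40}{150} = \frac{4}{15} & \frac{40}{150} = \frac{4}{15} & \frac{40}{150} = \frac{4}{15} & \frac{30}{150} = \frac{1}{5} & \frac{150}{150} = 1 &
\end{array}$$

$$\begin{array}{l|cccc|c|c}
E \quad \text{Column} & 1 & 2 & 3 & 4 & \text{Total} & p_r \\ \hline
\text{Row 1} & 16 & 16 & 16 & 12 & 60 & \frac{60}{150} = \frac{2}{5} \\
\text{Row 2} & 10\frac{2}{3} & 10\frac{2}{3} & 10\frac{2}{3} & 8 & 40 & \frac{40}{150} = \frac{4}{15} \\
\text{Row 3} & 13\frac{1}{3} & 13\frac{1}{3} & 13\frac{1}{3} & 10 & 50 & \frac{50}{150} = \frac{1}{3} \\ \hline
\text{Total} & 40 & 40 & 40 & 30 & 150 & 1 \\
p_c & \frac{40}{150} = \frac{4}{15} & \frac{40}{150} = \frac{4}{15} & \frac{40}{150} = \frac{4}{15} & \frac{30}{150} = \frac{1}{5} & \frac{150}{150} = 1 &
\end{array}$$

The O table can be looked at as a set of observed proportions, each multiplied by its column sum $n_j$. We sum across each row and divide by $n$ to get the row proportions $p_r$. Each of these is a weighted average of the proportions in its row, weighted by the column sums. If we use $p_{i0}$ for the weighted average proportion we have computed in row $i$ and use $p_{ij}$ for the observed proportion in row $i$ and column $j$, so that $p_{i0} = \frac{1}{n}\sum_j p_{ij} n_j$, we can write the O table as

$$\begin{array}{l|cccc|c|c}
O \quad \text{Column} & 1 & 2 & 3 & 4 & \text{Total} & p_r \\ \hline
\text{Row 1} & p_{11}n_1 & p_{12}n_2 & p_{13}n_3 & p_{14}n_4 & p_{10}n & p_{10} \\
\text{Row 2} & p_{21}n_1 & p_{22}n_2 & p_{23}n_3 & p_{24}n_4 & p_{20}n & p_{20} \\
\text{Row 3} & p_{31}n_1 & p_{32}n_2 & p_{33}n_3 & p_{34}n_4 & p_{30}n & p_{30} \\ \hline
\text{Total} & n_1 & n_2 & n_3 & n_4 & n & 1
\end{array}$$

For an example of Minitab computations a) using the Minitab chi-squared routines, b) simulating our computations and c) working with proportions, see 252chisqx2.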
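A minimal Python sketch of how the E table follows from the O table, assuming nothing beyond the standard library: each row proportion $p_{i0}$ (row total divided by $n$) is multiplied by each column sum $n_j$. Exact fractions from the `fractions` module are used so that entries such as $10\frac{2}{3}$ come out exactly. The function names are hypothetical illustrations, not part of the Minitab workflow the note refers to.

```python
from fractions import Fraction

# Observed counts from the example. Rows: Age 15-25, Age 26-49, Age 50 up.
# Columns: Days 0, 1, 2, 3.
O = [[20, 10, 15, 15],
     [10, 15, 10, 5],
     [10, 15, 15, 10]]

def margins(table):
    """Return the row totals, the column totals n_j and the grand total n."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)

def observed_proportions(table):
    """p_ij = O_ij / n_j: the share of column j that falls in row i."""
    _, cols, _ = margins(table)
    return [[Fraction(o, n_j) for o, n_j in zip(row, cols)] for row in table]

def expected_table(table):
    """E_ij = p_i0 * n_j, where p_i0 = (row i total) / n."""
    rows, cols, n = margins(table)
    return [[Fraction(r_i, n) * n_j for n_j in cols] for r_i in rows]

P = observed_proportions(O)
E = expected_table(O)
_, cols, n = margins(O)

# Each O_ij is an observed proportion times its column sum ...
assert all(P[i][j] * cols[j] == O[i][j] for i in range(3) for j in range(4))
# ... and each row proportion p_i0 is the n_j-weighted average of that row's p_ij.
assert all(sum(P[i][j] * cols[j] for j in range(4)) / n == Fraction(sum(O[i]), n)
           for i in range(3))

for row in E:
    print(" ".join(str(x) for x in row))
```

The printed rows are `16 16 16 12`, `32/3 32/3 32/3 8` and `40/3 40/3 40/3 10`, which is the E table of the example with $10\frac{2}{3} = 32/3$ and $13\frac{1}{3} = 40/3$.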
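We then get the E table by multiplying the column sums by the proportions in each row. We can write the E table as

$$\begin{array}{l|cccc|c|c}
E \quad \text{Column} & 1 & 2 & 3 & 4 & \text{Total} & p_r \\ \hline
\text{Row 1} & p_{10}n_1 & p_{10}n_2 & p_{10}n_3 & p_{10}n_4 & p_{10}n & p_{10} \\
\text{Row 2} & p_{20}n_1 & p_{20}n_2 & p_{20}n_3 & p_{20}n_4 & p_{20}n & p_{20} \\
\text{Row 3} & p_{30}n_1 & p_{30}n_2 & p_{30}n_3 & p_{30}n_4 & p_{30}n & p_{30} \\ \hline
\text{Total} & n_1 & n_2 & n_3 & n_4 & n & 1
\end{array}$$

Our next step in a chi-squared test is to write the O and E columns and compute $\sum \frac{(O-E)^2}{E}$. But look at the element of this computation that comes from the upper left corner of the O and E tables. It can be written as $\frac{(p_{11}n_1 - p_{10}n_1)^2}{p_{10}n_1}$. If we remove $n_1$ from the parentheses we have $\frac{n_1^2(p_{11} - p_{10})^2}{p_{10}n_1} = \frac{(p_{11} - p_{10})^2}{p_{10}}n_1$. We can thus say that

$$\chi^2 = \sum_i \sum_j \frac{(p_{ij} - p_{i0})^2}{p_{i0}} n_j.$$

Now let us look at a problem that involves only one set of proportions. In this case we test $H_0: p_1 = p_2 = p_3 = p_4$. Instead of using terms like $p_{20}$ for the probabilities in the second row, we can realize that because there are only two rows, we can call our $p_{10}$ simply $p_0$ and that $p_{20} = 1 - p_0 = q_0$. In the same way, we write $p_j$ for $p_{1j}$ and $q_j = 1 - p_j$ for $p_{2j}$. This means that we can write the O table as

$$\begin{array}{l|cccc|c|c}
O \quad \text{Column} & 1 & 2 & 3 & 4 & \text{Total} & p_r \\ \hline
\text{Row 1} & p_1n_1 & p_2n_2 & p_3n_3 & p_4n_4 & p_0n & p_0 \\
\text{Row 2} & q_1n_1 & q_2n_2 & q_3n_3 & q_4n_4 & q_0n & q_0 \\ \hline
\text{Total} & n_1 & n_2 & n_3 & n_4 & n & 1
\end{array}$$

and the E table as

$$\begin{array}{l|cccc|c|c}
E \quad \text{Column} & 1 & 2 & 3 & 4 & \text{Total} & p_r \\ \hline
\text{Row 1} & p_0n_1 & p_0n_2 & p_0n_3 & p_0n_4 & p_0n & p_0 \\
\text{Row 2} & q_0n_1 & q_0n_2 & q_0n_3 & q_0n_4 & q_0n & q_0 \\ \hline
\text{Total} & n_1 & n_2 & n_3 & n_4 & n & 1
\end{array}$$

The expression for the chi-squared statistic becomes

$$\chi^2 = \sum_j \frac{(p_j - p_0)^2}{p_0} n_j + \sum_j \frac{(q_j - q_0)^2}{q_0} n_j.$$

But note that $(q_j - q_0)^2 = \left[(1 - p_j) - (1 - p_0)\right]^2 = (p_0 - p_j)^2 = (p_j - p_0)^2$. This implies that

$$\chi^2 = \sum_j \frac{(p_j - p_0)^2}{p_0} n_j + \sum_j \frac{(p_j - p_0)^2}{q_0} n_j = \sum_j (p_j - p_0)^2 \left(\frac{n_j}{p_0} + \frac{n_j}{q_0}\right).$$

Now note that since $p_0 + q_0 = 1$,

$$\frac{n_j}{p_0} + \frac{n_j}{q_0} = n_j\left(\frac{q_0}{p_0 q_0} + \frac{p_0}{p_0 q_0}\right) = n_j \frac{q_0 + p_0}{p_0 q_0} = \frac{n_j}{p_0 q_0}.$$

This means that

$$\chi^2 = \sum_j \frac{(p_j - p_0)^2 n_j}{p_0 q_0} = \sum_j \frac{(p_j - p_0)^2}{\frac{p_0 q_0}{n_j}}.$$

Finally, remember that we started with the test ratio for an individual sample, which we could write as $z_j = \frac{p_j - p_0}{\sqrt{\frac{p_0 q_0}{n_j}}}$. We can now see that our $\chi^2$ is simply $\sum_j z_j^2$. Note that if $j$ goes from 1 to $c$, the degrees of freedom are $c - 1$, because the last square $z_c^2$ can be predicted from our knowledge of the sample sizes and the fact that the proportions must average to $p_0$ when weighted by the sample sizes, that is, $\sum_j n_j p_j = n p_0$.

2) The Equivalence of the two formulas for computing Chi-squared

The two formulas usually given for computing chi-squared in a chi-squared test are $\chi^2 = \sum \frac{(O-E)^2}{E}$ and $\chi^2 = \sum \frac{O^2}{E} - n$. These are properly written as $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$ and $\chi^2 = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - n$. We assume that the O and E columns each consist of $k$ entries and that each column adds to $n$.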
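As a numerical check of the forms of $\chi^2$ given so far, the following minimal Python sketch evaluates each one on the example O table using exact fractions, so equal results compare exactly rather than up to rounding. The function names are hypothetical. The two-row table is our own regrouping of the note's counts (Age 15-25 against the other two age groups combined), used only so that the one-set-of-proportions result $\chi^2 = \sum_j z_j^2$ can be tried on real numbers; $z_j^2$ is computed as $(p_j - p_0)^2 n_j / (p_0 q_0)$ to avoid a square root.

```python
from fractions import Fraction

# Example O table: rows Age 15-25, Age 26-49, Age 50 up; columns Days 0-3.
O = [[20, 10, 15, 15],
     [10, 15, 10, 5],
     [10, 15, 15, 10]]

def margins(table):
    """Row totals, column totals (the n_j) and grand total n."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)

def chi_square_definition(table):
    """Sum of (O - E)^2 / E with E_ij = p_i0 * n_j."""
    rows, cols, n = margins(table)
    total = Fraction(0)
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = Fraction(rows[i], n) * cols[j]
            total += (o - e) ** 2 / e
    return total

def chi_square_shortcut(table):
    """Sum of O^2 / E, minus n."""
    rows, cols, n = margins(table)
    return sum(o * o / (Fraction(rows[i], n) * cols[j])
               for i, row in enumerate(table)
               for j, o in enumerate(row)) - n

def chi_square_from_proportions(table):
    """Sum over i, j of n_j (p_ij - p_i0)^2 / p_i0, with p_ij = O_ij / n_j."""
    rows, cols, n = margins(table)
    total = Fraction(0)
    for i, row in enumerate(table):
        p_i0 = Fraction(rows[i], n)
        for j, o in enumerate(row):
            p_ij = Fraction(o, cols[j])
            total += cols[j] * (p_ij - p_i0) ** 2 / p_i0
    return total

def sum_of_z_squares(successes, sizes):
    """Sum of z_j^2 where z_j = (p_j - p0) / sqrt(p0 q0 / n_j) and p0 is pooled."""
    p0 = Fraction(sum(successes), sum(sizes))
    q0 = 1 - p0
    return sum((Fraction(x, m) - p0) ** 2 * m / (p0 * q0)
               for x, m in zip(successes, sizes))

# Our own regrouping into two rows: Age 15-25 versus everyone else.
two_row = [O[0], [a + b for a, b in zip(O[1], O[2])]]
sizes = margins(two_row)[1]

print(chi_square_definition(O), chi_square_shortcut(O), chi_square_from_proportions(O))
print(chi_square_definition(two_row), sum_of_z_squares(two_row[0], sizes))
```

The first line prints 265/32 (8.28125) three times and the second prints 325/48 twice, so all forms agree on these counts.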
Then each entry in the column where $\chi^2$ is computed is

$$\frac{(O_i - E_i)^2}{E_i} = \frac{O_i^2 - 2O_iE_i + E_i^2}{E_i} = \frac{O_i^2}{E_i} - \frac{2O_iE_i}{E_i} + \frac{E_i^2}{E_i} = \frac{O_i^2}{E_i} - 2O_i + E_i.$$

This means that

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \sum \frac{O_i^2}{E_i} - 2\sum O_i + \sum E_i = \sum \frac{O_i^2}{E_i} - 2n + n = \sum \frac{O_i^2}{E_i} - n.$$

3) Degrees of Freedom for computing chi-squared for a test of homogeneity or independence

Usually, if we have a column of $k$ numbers that are constrained by the fact that they must add to some number $n$, we say that we have $k - 1$ degrees of freedom. When we put together something like E and then compute $\chi^2$, we are limited by much more than a constraint on the entire sum. The rule of thumb is that every time we estimate a parameter, we lose a degree of freedom. Across each row, we compute a row sum and then divide it by $n$ to figure out the proportion of each column that must be in each row. Before, we used $p_{ij}$ for the observed proportion in row $i$ and column $j$, and $p_{i0}$ for the weighted average proportion we computed in row $i$. To continue this notation, let $p_{0j}$ be the weighted average proportion in column $j$. If there are $r$ rows, it must be true that $\sum_{i=1}^{r} p_{i0} = 1$, and if there are $c$ columns, it must be true that $\sum_{j=1}^{c} p_{0j} = 1$. So once we have estimated $r - 1$ values of $p_{i0}$ and $c - 1$ values of $p_{0j}$, we do not need to bother to estimate the last value, since we can get it by subtracting from 1. We thus start with $rc$ numbers in the columns and thus $rc - 1$ degrees of freedom, but we must subtract $r - 1$ and $c - 1$ from it to get $(rc - 1) - (r - 1) - (c - 1) = rc - r - c + 1 = (r - 1)(c - 1)$ degrees of freedom.
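The same count can be seen from the other direction, in a sketch of our own that goes slightly beyond the note: once the row and column totals are fixed, only the top-left $(r-1) \times (c-1)$ block of a table can be chosen freely, and every remaining cell follows by subtraction. The minimal Python sketch below (function names hypothetical) rebuilds the example O table from its $2 \times 3$ block and its margins, and evaluates the count $(rc - 1) - (r - 1) - (c - 1)$.

```python
def complete_table(free_block, row_totals, col_totals):
    """Rebuild an r x c table of counts from its top-left (r-1) x (c-1)
    block and its margins; every other cell is forced by subtraction.
    The margins must be consistent (both add to n) for the result to fit."""
    r, c = len(row_totals), len(col_totals)
    table = [list(row) for row in free_block]
    # Last column: what each of the first r-1 rows still needs to reach its total.
    for i in range(r - 1):
        table[i].append(row_totals[i] - sum(table[i]))
    # Last row: what each column still needs to reach its total.
    table.append([col_totals[j] - sum(table[i][j] for i in range(r - 1))
                  for j in range(c)])
    return table

def degrees_of_freedom(r, c):
    """rc cells, less 1 for the grand total, r-1 row and c-1 column proportions."""
    return (r * c - 1) - (r - 1) - (c - 1)

rebuilt = complete_table([[20, 10, 15], [10, 15, 10]], [60, 40, 50], [40, 40, 40, 30])
for row in rebuilt:
    print(row)
print(degrees_of_freedom(3, 4), (3 - 1) * (4 - 1))   # both equal 6
print(degrees_of_freedom(2, 4))                      # c - 1 = 3 for two rows
```

The rebuilt rows match the example O table, and with $r = 2$ the count reduces to $c - 1$, the figure given for the one-set-of-proportions case in section 1.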