Topic 10. Contingency Tables (Ch. 15)

1) Chi-Square Test

• A contingency table is a tabular arrangement of nominal data from multiple populations.

1.1 2x2 tables

• One way to analyze such data is the chi-square test. Suppose one takes a random sample of n units and then categorizes the units on the basis of 2 (or more) categorical variables. For simplicity, let's first consider a 2x2 table, where there are two categorical variables, each with two possible outcomes.

• As an example, consider a random sample of n = 793 people involved in bicycle accidents. The accident report specifies whether or not each person 1) was wearing a helmet and 2) suffered a head injury. Suppose 147 were wearing helmets and 646 were not. In the group with helmets, 17 (or $\hat{p}_1 = .116$) had a head injury and 130 (or $1 - \hat{p}_1 = .884$) did not, whereas in the group without helmets, 218 (or $\hat{p}_2 = .337$) had a head injury and 428 (or $1 - \hat{p}_2 = .663$) did not. The data in the form of a 2x2 contingency table are:

                        Helmet
                      Yes     No   Total
  Head Injury  Yes     17    218     235
               No     130    428     558
  Total               147    646     793     (1)

Arranged as proportions (each column divided by its column total), the data are:

                        Helmet
                      Yes     No   Total
  Head Injury  Yes   .116   .337    .296
               No    .884   .663    .704
  Total               1.0    1.0     1.0

• An obvious question is whether there is any association between the incidence of head injury and the use of helmets among those involved in bicycle accidents. The generic null hypothesis is $H_0$: variable A is independent of variable B, or, in the context of this problem, $H_0$: the incidence of head injuries is independent of (or has no association with) the use of helmets.

• An equivalent way to state the hypotheses (as given in the text) is $H_0: p_1 = p_2$ vs. $H_A: p_1 \neq p_2$, where $p_1$ and $p_2$ are the proportions of head injuries for those with helmets and those without helmets, respectively. Note that if $p_1 = p_2$, there is no association between the two variables.

• Recall that we had a test for $p_1 = p_2$ in the previous chapter.
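As a check on the proportions above, here is a minimal Python sketch that recomputes $\hat{p}_1$, $\hat{p}_2$ and the pooled injury rate from the counts in (1). It also computes a two-sample z statistic for $H_0: p_1 = p_2$, assuming (these notes do not show it) that the previous chapter's test is the pooled-variance z test for two proportions; the function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2.

    Assumption: this is the form of the previous chapter's test.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common injury rate estimated under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p1_hat, p2_hat, pooled, (p1_hat - p2_hat) / se


# Table (1): 17 of 147 helmet wearers and 218 of 646 non-wearers had a head injury
p1_hat, p2_hat, pooled, z = two_proportion_z(17, 147, 218, 646)
print(f"p1_hat = {p1_hat:.3f}, p2_hat = {p2_hat:.3f}, pooled = {pooled:.3f}")
print(f"z = {z:.2f}, z^2 = {z ** 2:.1f}")
```

It prints p1_hat = 0.116, p2_hat = 0.337, pooled = 0.296 and z = -5.32, z^2 = 28.3. For any 2x2 table, the square of this pooled z statistic equals the uncorrected chi-square statistic (5) defined below, so the two approaches reach the same conclusion.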
• A chi-square test gives us an alternative way to solve the problem, and the method generalizes to more than 2 categories for one or both of the variables.

• To carry out the test, consider the following notation. Let $O_{ij}$ denote the observed count in row $i$ and column $j$; $O_{i\cdot}$ the total in row $i$, for $i = 1, \ldots, r$; $O_{\cdot j}$ the total in column $j$, for $j = 1, \ldots, c$; and $n$ the grand total of all observations. This gives the table

$$
\begin{array}{c|cccc|c}
\text{Row} \backslash \text{Column} & 1 & 2 & \cdots & c & \text{Total} \\ \hline
1 & O_{11} & O_{12} & \cdots & O_{1c} & O_{1\cdot} \\
2 & O_{21} & O_{22} & \cdots & O_{2c} & O_{2\cdot} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
r & O_{r1} & O_{r2} & \cdots & O_{rc} & O_{r\cdot} \\ \hline
\text{Total} & O_{\cdot 1} & O_{\cdot 2} & \cdots & O_{\cdot c} & n
\end{array}
\qquad (2)
$$

• To calculate the test statistic, one must first calculate the expected counts assuming $H_0$ is true, i.e., assuming the use of helmets has no association with the incidence of head injuries. If $H_0$ is true, then $p_1 = p_2$, and the best estimate of the common injury rate is the total number of head injuries, $O_{1\cdot} = O_{11} + O_{12}$, divided by the total sample size $n$. For these data, that is $O_{1\cdot}/n = 235/793 = 0.296$. Recall that $O_{\cdot 1} = 147$ people wore helmets. Therefore, assuming independence, the expected number of helmet wearers who would sustain a head injury is $O_{\cdot 1} \times (O_{1\cdot}/n)$. Labeling the expected number in row 1 and column 1 as $E_{11}$, one has $E_{11} = O_{1\cdot} O_{\cdot 1}/n$. For these data, $E_{11} = 235 \times 147/793 = 43.6$. By similar reasoning, $E_{21}$, the expected number of helmet users not sustaining a head injury, is $E_{21} = 558 \times 147/793 = 103.4$. This could also be found by subtraction, i.e., $E_{21} = O_{\cdot 1} - E_{11}$. In general, one can find the expected counts as

$$E_{ij} = \frac{O_{i\cdot}\, O_{\cdot j}}{n}. \qquad (3)$$

Using formula (3) for the data in (1), the table of expected counts is

                        Helmet
                      Yes     No   Total
  Head Injury  Yes   43.6  191.4     235
               No   103.4  454.6     558
  Total               147    646     793     (4)

Note that the row and column totals in (1) and (4) are the same, but in (4) the counts are redistributed to give the expected values under $H_0$.

• The test statistic is

$$\chi^2_{(r-1)(c-1)} = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}. \qquad (5)$$

Under $H_0$, this statistic has approximately a chi-square distribution with $(r-1)(c-1)$ df, provided 1) all $E_{ij} > 1$, and 2) no more than 20% of the $E_{ij}$ are less than 5.

• The $\chi^2$ distribution is illustrated in Figure 15.1.

• As an example, for the present data, one has

$$\chi^2_1 = \frac{(17-43.6)^2}{43.6} + \frac{(130-103.4)^2}{103.4} + \frac{(218-191.4)^2}{191.4} + \frac{(428-454.6)^2}{454.6} = 28.3.$$

To determine whether this is large under $H_0$, one can find the RR in Table A.8. This is a one-sided (upper-tail) test, with critical value $\chi^2_{1,.05} = 3.84$. Therefore, for these data, one would reject $H_0$, with $p < 0.001$. How would you interpret the result?

• The hypothesis testing framework is
1) $H_0$: the variables are independent
2) $H_A$: the variables have some association
3) TS: $\chi^2_{(r-1)(c-1)}$ in (5)
4) RR: $\chi^2 > \chi^2_{\alpha}$
5) Calculations: find the $E_{ij}$ from (3) and substitute the data into (5).

• For the case of a 2x2 table with small n, the use of the continuity correction improves the approximation. The test statistic with the correction is

$$\chi^2 = \sum_{i}\sum_{j} \frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}}.$$

The correction reduces the $\chi^2$ statistic (strictly, whenever every $|O_{ij} - E_{ij}| > 0.25$, which holds in any table where the test is of practical interest). When used with the bicycle helmet data, the new value of the statistic is $\chi^2_1 = 27.3$ (computed from the rounded expected counts in (4); the unrounded counts give 27.2).

• Another test procedure for 2x2 tables is called Fisher's exact test. It is computationally intensive; though it is available in many statistical software packages, we will not develop it.

1.2 r x c tables

• Consider a table in which r and/or c exceeds 2. This is called an r x c table. The general table layout is given in (2).

• As an example, consider the example from the text, in which 575 death certificates are investigated and classified according to 2 variables.
One variable is type of hospital, with outcomes A and B denoting community and university hospitals, respectively. The other is death certificate accuracy, with 3 possible outcomes. The data are:

                                Certificate Status
                 Accurate  Incomplete  Needs Change  Total
  Hospital  A         157          18            54    229
            B         268          44            34    346
  Total               425          62            88    575

• The general hypothesis that one could test is $H_0$: death certificate status is independent of hospital type.

• This hypothesis is sometimes expressed in an equivalent but more technical way. Let $p_{ij}$ denote the proportion of certificates from hospital $i$ with certificate status $j$. This gives the table

$$
\begin{array}{c|ccc|c}
\text{Hospital} \backslash \text{Status} & 1 & 2 & 3 & \text{Total} \\ \hline
1 & p_{11} & p_{12} & p_{13} & 1.0 \\
2 & p_{21} & p_{22} & p_{23} & 1.0
\end{array}
$$

The null hypothesis is that the proportions in each certificate status are the same for both hospitals, i.e., $H_0: p_{11} = p_{21},\ p_{12} = p_{22},\ \text{and}\ p_{13} = p_{23}$.
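The full procedure can be sketched in Python using only the standard library: expected counts from (3), the statistic (5) with $(r-1)(c-1)$ df, the optional continuity correction, and the conditions on the $E_{ij}$ stated with (5). The function names are hypothetical. In place of Table A.8, the p-value is computed from the standard recurrence for the chi-square upper-tail probability with integer df.

```python
import math


def chi_square_upper_tail(x, df):
    """P(X > x) for X ~ chi-square with a positive integer number of df.

    Starts from the closed forms for df = 1 and df = 2 and steps up by 2 with
    Q(k + 2) = Q(k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi_square_test(observed, continuity=False):
    """Chi-square test of independence for an r x c table (list of rows of counts).

    Returns (statistic, df, expected counts, upper-tail p-value).
    continuity=True applies the 0.5 correction, intended for 2x2 tables.
    """
    r, c = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]                       # O_i.
    col_totals = [sum(row[j] for row in observed) for j in range(c)]  # O_.j
    n = sum(row_totals)

    # Formula (3): E_ij = O_i. * O_.j / n
    expected = [[row_totals[i] * col_totals[j] / n for j in range(c)]
                for i in range(r)]

    # Formula (5), optionally with |O_ij - E_ij| reduced by 0.5
    stat = 0.0
    for i in range(r):
        for j in range(c):
            dev = abs(observed[i][j] - expected[i][j])
            if continuity:
                dev -= 0.5
            stat += dev ** 2 / expected[i][j]

    df = (r - 1) * (c - 1)
    return stat, df, expected, chi_square_upper_tail(stat, df)


def approximation_ok(expected):
    """Conditions stated with (5): all E_ij > 1, and at most 20% of E_ij below 5."""
    cells = [e for row in expected for e in row]
    return all(e > 1 for e in cells) and sum(e < 5 for e in cells) <= 0.2 * len(cells)


# Table (1): rows = head injury (yes, no), columns = helmet (yes, no)
helmet = [[17, 218],
          [130, 428]]
stat, df, expected, p = chi_square_test(helmet)
stat_cc, _, _, p_cc = chi_square_test(helmet, continuity=True)

# Death certificates: rows = hospital (A, B), columns = accurate, incomplete, needs change
certificates = [[157, 18, 54],
                [268, 44, 34]]
stat_rc, df_rc, expected_rc, p_rc = chi_square_test(certificates)

print(f"helmet: chi2 = {stat:.2f}, df = {df}, p = {p:.2g}, ok = {approximation_ok(expected)}")
print(f"helmet with continuity correction: chi2 = {stat_cc:.2f}, p = {p_cc:.2g}")
print(f"certificates: chi2 = {stat_rc:.2f}, df = {df_rc}, p = {p_rc:.2g}, ok = {approximation_ok(expected_rc)}")
```

For the helmet data this gives $\chi^2 \approx 28.26$ on 1 df (27.20 with the continuity correction), in line with the hand calculation above. For the death-certificate data it gives $\chi^2 \approx 21.52$ on $(2-1)(3-1) = 2$ df, well above $\chi^2_{2,.05} = 5.99$, so $H_0$ of independence would be rejected at the .05 level. Library routines such as `scipy.stats.chi2_contingency` perform the same test; by default it applies a Yates-type correction when the table has 1 df.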

