R x C Contingency Tables

Tables with R rows and C columns that display the relationship between two variables. The row variable has R categories and the column variable has C categories.

These tables can be used to conduct a test of homogeneity; i.e., to compare the proportion with some characteristic in 3 or more populations. An example is the proportion of obese children in Louisiana vs. Mississippi vs. Alabama vs. Arkansas.

They can also be used to conduct a test of independence; i.e., to see whether, in a single population, the distribution of one variable is the same no matter what the value of the other variable. An example of the latter is hair color and gender.

                       Variable 2
Variable 1     1      2      …      c      Total
1              n11    n12    …      n1c    n1.
2              n21    n22    …      n2c    n2.
…
r              nr1    nr2    …      nrc    nr.
Total          n.1    n.2    …      n.c    n..

The above are the observed frequencies and can be labeled O11, O12, …, O1c, O21, O22, …, O2c, …, Orc.

Homogeneity:
H0: The proportion with some characteristic is the same in each population
H1: The proportion with some characteristic is not the same in each population

Independence:
H0: In a population, the 2 variables are independent
H1: In a population, the 2 variables are not independent

Generalizing from the 2 X 2 situation, the expected table can be found in a similar way. The expected number of units in the (i, j) cell is

$$E_{ij} = \frac{(\text{total in the } i\text{th row}) \times (\text{total in the } j\text{th column})}{\text{grand total}}$$

The sum of the expected values for any row or column must equal the sum for the corresponding row or column in the observed table.

Test Statistic to Compare Observed with Expected

$$X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}$$

H0 will be rejected for large values of X2; that is, the more dissimilar the observed and expected counts are, the more likely we are to reject H0.

Note: No continuity correction is used for contingency tables larger than 2 X 2. The test should not be used, however, if the expected values of the cells are too small. It is appropriate only if
a. No more than 1/5 of the cells have expected values < 5, and
b. No cell has an expected value < 1.

For a test at level of significance α, reject H0 if $X^2 > \chi^2_{(r-1)(c-1),\,1-\alpha}$.

p-value: The approximate p-value is given by the area to the right of X2 under the $\chi^2_{(r-1)(c-1)}$ distribution.

Example: In data from the Honolulu Heart Study, we want to see if smoking status and educational level are independent (one sample; two variables).

Education Level    Smoker    Non-smoker    Total
None                  9         16           25
Primary              15         17           32
Intermediate         12         12           24
High                  1         18           19
Total                37         63          100

H0: Smoking status and education level are independent
H1: Smoking status and education level are not independent
α = 0.05

Test Statistic:

$$X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(4-1)(2-1)} = \chi^2_3$$

Reject H0 if $X^2 > \chi^2_{3,\,0.95} = 7.81$.

$$E_{11} = \frac{25 \times 37}{100} = 9.25, \quad E_{12} = \frac{25 \times 63}{100} = 15.75, \quad E_{21} = \frac{32 \times 37}{100} = 11.84$$

Table of Expected Frequencies

Education Level    Smoker    Non-smoker    Total
None                9.25      15.75          25
Primary            11.84      20.16          32
Intermediate        8.88      15.12          24
High                7.03      11.97          19
Total              37         63            100

$$X^2 = \frac{(9-9.25)^2}{9.25} + \frac{(16-15.75)^2}{15.75} + \frac{(15-11.84)^2}{11.84} + \frac{(17-20.16)^2}{20.16} + \frac{(12-8.88)^2}{8.88} + \frac{(12-15.12)^2}{15.12} + \frac{(1-7.03)^2}{7.03} + \frac{(18-11.97)^2}{11.97}$$

$$= 0.0068 + 0.0040 + 0.8434 + 0.4953 + 1.0962 + 0.6438 + 5.1722 + 3.0377 \approx 11.30$$

Decision: Reject H0
p-value: 0.01 < p < 0.025
Conclusion: There is evidence that smoking status and education level are not independent.

Example: Patients in 3 age groups who had an unprovoked seizure were studied for 22 months to determine the risk of later seizure. 63 individuals less than 20 years old, 82 individuals ages 20-39 years, and 93 individuals 40 years and older had a later seizure. These types of seizures can be classified into two groups, idiopathic or remote symptomatic. Does the seizure type vary by age group?
(Three populations, one variable)

                 Seizure Type
Age Group      I       RS     Total
<20            52      11       63
20-39          66      16       82
40+            55      38       93
Total         173      65      238

H0: p1 = p2 = p3
H1: At least two of the proportions with idiopathic seizure type differ

Reject H0 if $X^2 > \chi^2_{2,\,0.95} = 5.99$.

$$\hat{p}_1 = \frac{52}{63} = 0.825, \quad \hat{p}_2 = \frac{66}{82} = 0.805, \quad \hat{p}_3 = \frac{55}{93} = 0.591$$

Expected Table

                 Seizure Type
Age Group      I        RS      Total
<20            45.79    17.21     63
20-39          59.61    22.39     82
40+            67.60    25.40     93
Total         173       65       238

Expected: For cell (I, <20),

$$E_{11} = \frac{63 \times 173}{238} = 45.79$$

Therefore,

$$X^2 = \frac{(52-45.79)^2}{45.79} + \frac{(66-59.61)^2}{59.61} + \dots + \frac{(38-25.40)^2}{25.40} = 14.1906$$

Reject H0 since $X^2 > \chi^2_{2,\,0.95} = 5.99$; p < 0.001

Example: Kodama et al. studied the relationship between age and several prognostic factors in squamous cell carcinoma of the cervix. Among the data collected were the frequencies of histologic cell types in four age groups. Assume we have a random sample from each of the four populations of interest. We want to test whether the populations represented by the four age-group samples are homogeneous with respect to cell type. Let α = 0.05.

H0: p1 = p2 = p3 = p4
H1: At least two of the proportions differ. The four populations are not homogeneous with respect to cell type.

Observed Frequency of Histologic Cell Type by Age Group

                                   Cell Type
Age group    Large cell                         Small cell
(years)      non-keratinizing    Keratinizing   non-keratinizing    Total
30-39              18                  7               9              34
40-49              56                 29              12              97
50-59              83                 38              23             144
60-69              62                 25              18             105
Total             219                 99              62             380

Reject H0 if $X^2 > \chi^2_{(4-1)(3-1),\,0.95} = \chi^2_{6,\,0.95} = 12.59$.

$$E_{11} = \frac{34 \times 219}{380} = 19.59$$

Expected Frequency of Histologic Cell Type by Age Group

                                   Cell Type
Age group    Large cell                         Small cell
(years)      non-keratinizing    Keratinizing   non-keratinizing    Total
30-39             19.59               8.86            5.55            34
40-49             55.90              25.27           15.83            97
50-59             82.99              37.52           23.49           144
60-69             60.51              27.36           17.13           105
Total            219                 99              62              380

$$X^2 = \frac{(18-19.59)^2}{19.59} + \frac{(7-8.86)^2}{8.86} + \dots + \frac{(18-17.13)^2}{17.13} = 0.129 + 0.390 + \dots + 0.044 = 4.444$$

Do not reject H0.

$$\chi^2_{6,\,0.25} = 3.45, \quad \chi^2_{6,\,0.50} = 5.35$$

Therefore, (1 - 0.50) < p < (1 - 0.25), or 0.50 < p < 0.75.

We conclude that the four populations (age groups) may be homogeneous with respect to cell type.
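The hand calculations in the three examples above can be checked with a short script. What follows is only a minimal sketch in plain Python, not part of the original notes: the function names `chi_square_rxc` and `expected_counts_ok` and the dictionary layout are hypothetical choices made for illustration, and the critical values are copied from the text rather than computed. Because the script keeps unrounded expected counts, its statistics can differ from the hand-worked values in the last decimal places.

```python
def chi_square_rxc(observed):
    """Return (expected, statistic, df) for an R x C table of observed counts.

    Hypothetical helper for illustration; `observed` is a list of rows.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # E_ij = (row i total) * (column j total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # X^2 = sum over all cells of (O_ij - E_ij)^2 / E_ij
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


def expected_counts_ok(expected):
    """Check the rule of thumb from the notes: at most 1/5 of the cells
    have expected values < 5, and no cell has an expected value < 1."""
    cells = [e for row in expected for e in row]
    n_small = sum(1 for e in cells if e < 5)
    return n_small <= len(cells) / 5 and min(cells) >= 1


# Observed tables and 0.95 critical values, both copied from the examples.
examples = {
    "Honolulu Heart Study": ([[9, 16], [15, 17], [12, 12], [1, 18]], 7.81),
    "Seizure type by age": ([[52, 11], [66, 16], [55, 38]], 5.99),
    "Cell type by age": (
        [[18, 7, 9], [56, 29, 12], [83, 38, 23], [62, 25, 18]],
        12.59,
    ),
}

results = {}
for name, (table, critical_value) in examples.items():
    expected, statistic, df = chi_square_rxc(table)
    results[name] = (expected, statistic, df)
    decision = "reject H0" if statistic > critical_value else "do not reject H0"
    print(f"{name}: X2 = {statistic:.4f}, df = {df}, "
          f"rule of thumb ok = {expected_counts_ok(expected)}, {decision}")
```

The critical values are hard-coded so that the sketch needs no statistical library; with one available, the chi-square quantiles and p-values could be computed instead of read from a table.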