Chi-squared test
FPP 28

More types of inference for nominal variables
Nominal data are categorical with more than two categories.
Compare observed frequencies of a nominal variable to hypothesized probabilities:
    one categorical variable with more than two categories
    chi-squared goodness of fit test
Test whether two nominal variables are independent:
    two categorical variables, at least one having more than two categories
    chi-squared test of independence

Goodness of fit test
Do people admit themselves to hospitals more frequently close to their birthday?
Data from a random sample of 200 people admitted to hospitals:

Days from birthday    Number of admissions
within 7              11
8-30                  24
31-90                 69
91+                   96

Goodness of fit test
Assume there is no birthday effect, that is, people admit randomly. Then each probability is the share of the 365 days of the year that falls in the category:
Pr(within 7) = 15/365 = .0411
Pr(8-30) = 46/365 = .1260
Pr(31-90) = 120/365 = .3288
Pr(91+) = 184/365 = .5041
So, in a sample of 200 people, we'd expect
200 × .0411 = 8.2 to be in "within 7"
200 × .1260 = 25.2 to be in "8-30"
200 × .3288 = 65.8 to be in "31-90"
200 × .5041 = 100.8 to be in "91+"

Goodness of fit test
If admissions are random, we expect the sample frequencies and the hypothesized probabilities to be similar.
But, as always, the sample frequencies are affected by chance error.
So we need to see whether the sample frequencies could plausibly have resulted from chance error if the hypothesized probabilities are true. Let's build a hypothesis test.

Goodness of fit test: Hypothesis
Claim (alternative hyp.)
is that admission probabilities change according to days since birthday.
Opposite of claim (null hyp.) is that the probabilities are in accordance with random admissions.
H0: Pr(within 7) = .0411, Pr(8-30) = .1260, Pr(31-90) = .3288, Pr(91+) = .5041
HA: probabilities different from those in H0.

Goodness of fit test: Test statistic
Chi-squared test statistic:
$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

Cell        Obs    Exp       Dif      Dif^2    Dif^2/Exp
Within 7    11     8.22      2.78     7.73     0.94
8-30        24     25.20     -1.20    1.44     0.057
31-90       69     65.76     3.24     10.50    0.16
91+         96     100.82    -4.82    23.23    0.23

$X^2 = .94 + .057 + .16 + .23 \approx 1.39$
The JMP output below uses hypothesized probabilities rounded to three decimals (.041, .126, .329, .504) and reports X^2 = 1.397, the value used from here on.

Goodness of fit test: Calculate p-value
X^2 has a chi-squared distribution with degrees of freedom equal to the number of categories minus 1. In this case, df = 4 - 1 = 3.
To get a p-value, calculate the area under the chi-squared curve to the right of 1.397.
Using JMP, this area is 0.706. If the null hypothesis is true, there is about a 70% chance of observing a value of X^2 as extreme as or more extreme than 1.397.
Using the table, the p-value is between 0.70 and 0.90.

Chi-squared table (figure)

JMP output admissions
Distributions: Days

Frequencies
Level       Count    Prob
31 - 90     69       0.34500
8 - 30      24       0.12000
91+         96       0.48000
Within 7    11       0.05500
Total       200      1.00000
4 Levels

Test Probabilities
Level       Estim Prob    Hypoth Prob
31 - 90     0.34500       0.32900
8 - 30      0.12000       0.12600
91+         0.48000       0.50400
Within 7    0.05500       0.04100

Test                ChiSquare    DF    Prob>Chisq
Likelihood Ratio    1.3063       3     0.7276
Pearson             1.3974       3     0.7061
Method: Fix hypothesized values, rescale omitted

Goodness of fit test: Judging p-value
The p-value of about .70 is large, indicating that the difference between the observed and expected counts could well occur by random chance when the null hypothesis is true. Therefore, we cannot reject the null hypothesis.
There is not enough evidence to conclude that admission rates change according to days from birthday.

Independence test
Is birth order related to delinquency?
Nye (1958) randomly sampled 1154 high school girls and asked if they had been "delinquent".

Birth order    Delinquent    Not delinquent
Eldest         24            450
In Between     29            312
Youngest       35            211
Only           23            70

Sample of conditional frequencies
Proportion delinquent for each birth order status:

Eldest        .05
In Between    .085
Youngest      .14
Only          .25

Based on the conditional frequencies, it appears that youngest and only children are more often delinquent than eldest or in-between children.
Could these sample frequencies have plausibly occurred by chance if there is no relationship between birth order and delinquency?

Test of independence
Hypotheses
We want to show that there is some relationship between birth order and delinquency.
The opposite is that there is no relationship.
H0: birth order and delinquency are independent.
HA: birth order and delinquency are dependent.

Implications of independence
Expected counts
Under independence, Pr(eldest and delinquent) = Pr(eldest) × Pr(delinquent).
Estimate Pr(eldest) as the marginal frequency of eldest: 474/1154.
Estimate Pr(delinquent) as the marginal frequency of delinquent: 111/1154.
Hence, estimate Pr(eldest and delinquent) as (474/1154) × (111/1154) ≈ .0395.
The expected number of eldest and delinquent, under independence, equals 1154 × .0395 ≈ 45.59.
This is repeated for all the other cells in the table.

Test of independence
Expected counts

Birth order    Delinquent    Not delinquent
Eldest         45.59         428.41
In Between     32.80         308.20
Youngest       23.66         222.34
Only           8.95          84.05

Next we compare the observed counts with the expected counts to get a test statistic.
Use the X^2 statistic as the test statistic:
$$X^2 = \frac{(24-45.59)^2}{45.59} + \frac{(450-428.41)^2}{428.41} + \frac{(29-32.80)^2}{32.80} + \frac{(312-308.2)^2}{308.2} + \frac{(35-23.66)^2}{23.66} + \frac{(211-222.34)^2}{222.34} + \frac{(23-8.95)^2}{8.95} + \frac{(70-84.05)^2}{84.05} = 42.245$$

Test of independence: Calculate the p-value
X^2 has a chi-squared distribution with degrees of freedom
df = (number of rows - 1) × (number of columns - 1).
In the delinquency problem, df = (4 - 1) × (2 - 1) = 3.
The area under the chi-squared curve to the right of 42.245 is less than .0001.
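If the null hypothesis is true, there is only a very small chance of getting an X^2 as extreme as or more extreme than 42.245.

JMP output for chi-squared test
Freq: Column 3
Each cell reports Count, Total %, Col %, Row %, Expected, and Cell Chi^2.

Birth Order    Delinquent    Count    Total %    Col %    Row %    Expected    Cell Chi^2
Eldest         No            450      38.99      43.14    94.94    428.407     1.0883
Eldest         Yes           24       2.08       21.62    5.06     45.5927     10.2263
In Between     No            312      27.04      29.91    91.50    308.2       0.0468
In Between     Yes           29       2.51       26.13    8.50     32.7998     0.4402
Only Child     No            70       6.07       6.71     75.27    84.0546     2.3500
Only Child     Yes           23       1.99       20.72    24.73    8.94541     22.0819
Youngest       No            211      18.28      20.23    85.77    222.338     0.5782
Youngest       Yes           35       3.03       31.53    14.23    23.662      5.4327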
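As a check on the test of independence, the expected counts, X^2, and the p-value can be recomputed from the observed birth-order table. The following is a minimal Python sketch, not the JMP procedure. The variable names and the helper `chi2_sf_df3` are hypothetical, and the helper relies on a closed-form upper-tail area that holds only for a chi-squared distribution with 3 degrees of freedom.

```python
import math

# Observed counts from the birth-order table: (delinquent, not delinquent).
observed = {
    "Eldest": (24, 450),
    "In Between": (29, 312),
    "Youngest": (35, 211),
    "Only": (23, 70),
}


def chi2_sf_df3(x):
    """Area to the right of x under the chi-squared curve with df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


n = sum(sum(row) for row in observed.values())  # 1154 girls in total
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]

expected = {}
x2 = 0.0
for level, row in observed.items():
    row_total = sum(row)
    # Under independence: expected count = row total * column total / n.
    expected[level] = [row_total * c / n for c in col_totals]
    x2 += sum((o - e) ** 2 / e for o, e in zip(row, expected[level]))

df = (len(observed) - 1) * (len(col_totals) - 1)
assert df == 3, "chi2_sf_df3 is only valid for 3 degrees of freedom"
p_value = chi2_sf_df3(x2)

for level, exp_row in expected.items():
    print(f"{level:<11} expected: {exp_row[0]:8.2f} {exp_row[1]:8.2f}")
print(f"X^2 = {x2:.3f}, df = {df}, p-value = {p_value:.2e}")
```

Run as a script, this should give an expected count of about 45.59 for eldest and delinquent and an X^2 of about 42.24, in line with the values above. For other table shapes the degrees of freedom change, so a general chi-squared tail function from a statistics library would be needed in place of the df = 3 helper.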
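The goodness-of-fit calculation for the hospital admissions data can also be checked with a short script. This is a minimal Python sketch under stated assumptions: the names `goodness_of_fit` and `chi2_sf_df3` are hypothetical, the tail-area helper is valid only for 3 degrees of freedom, and the two probability lists are the four-decimal values from H0 and the three-decimal values that appear in the JMP "Hypoth Prob" column.

```python
import math

# Observed admissions, in the order: within 7, 8-30, 31-90, 91+.
observed = [11, 24, 69, 96]

# Hypothesized probabilities as stated in H0 (four decimals).
h0_probs = [0.0411, 0.1260, 0.3288, 0.5041]
# The same probabilities as they appear in the JMP output (three decimals).
jmp_probs = [0.041, 0.126, 0.329, 0.504]


def chi2_sf_df3(x):
    """Area to the right of x under the chi-squared curve with df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def goodness_of_fit(obs, probs):
    """Return (X^2, p-value) for four categories, so df = 4 - 1 = 3."""
    n = sum(obs)
    expected = [n * p for p in probs]
    x2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    return x2, chi2_sf_df3(x2)


x2_h0, p_h0 = goodness_of_fit(observed, h0_probs)
x2_jmp, p_jmp = goodness_of_fit(observed, jmp_probs)

print(f"H0 probabilities:  X^2 = {x2_h0:.4f}, p-value = {p_h0:.4f}")
print(f"JMP probabilities: X^2 = {x2_jmp:.4f}, p-value = {p_jmp:.4f}")
```

The two runs differ only in how the hypothesized probabilities are rounded. Either way the p-value is close to .70, so the conclusion of the goodness of fit test does not depend on the rounding.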