Stat 13, UCLA, Ivo Dinov

Slide 1
UCLA STAT 13
Introduction to Statistical Methods for the Life and Health Sciences
Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology
Teaching Assistants: Jacquelina Dacosta & Chris Barr
University of California, Los Angeles, Fall 2006
http://www.stat.ucla.edu/~dinov/courses_students.html

Slide 2
Chapter 10 Chi-Square Test
Relative Risk/Odds Ratios

Slide 3
The χ² Goodness of Fit Test
• Let's start by considering the analysis of a single sample of categorical data.
• This is a hypothesis test, so we will go over the four major HT parts:
• #1 The general form of the hypotheses:
    Ho: probabilities are equal to some specified values
    Ha: probabilities are not equal to some specified values
• #2 The Chi-Square test statistic (p.393):
    $\chi^2_s = \sum \frac{(O - E)^2}{E}$
    O = observed frequency
    E = expected frequency (according to Ho)
    For the goodness of fit test, df = (# of categories) - 1

Slide 4
The χ² Goodness of Fit Test
• Like other test statistics, a smaller value of $\chi^2_s$ indicates that the data agree with Ho.
• If the data disagree with Ho, the test statistic will be large because the differences between the observed and expected values are large.
• #3 P-value: Table 9, p.686, or http://socr.stat.ucla.edu/htmls/SOCR_Distributions.html
    - Uses df (similar idea to the t table). After the first n-1 categories have been specified, the last can be determined because the proportions must add to 1.
    - One-tailed distribution, not symmetric (different from the t table)
• #4 Conclusion: similar to other conclusions (TBD)

Slide 5
The χ² Goodness of Fit Test
Example: Mendel's pea experiment. Suppose a tall offspring is the event of interest and that the true proportion of tall peas (based on a 3:1 phenotypic ratio) is 3/4, or p = 0.75. He would like to show that his data follow this 3:1 phenotypic ratio.
The hypotheses (#1):
    Ho: P(tall) = 0.75, P(dwarf) = 0.25 (no effect, follows a 3:1 phenotypic ratio)
    Ha: P(tall) ≠ 0.75, P(dwarf) ≠ 0.25

Slide 6
The χ² Goodness of Fit Test
Suppose the data were:
    N = 1064 (Total)
    Tall = 787
    Dwarf = 277
These are the O's (observed values).
To calculate the E's (expected values), take the hypothesized proportions under Ho and multiply them by the total sample size:
    Tall = (0.75)(1064) = 798
    Dwarf = (0.25)(1064) = 266
These are the E's (expected values). Quick check: 798 + 266 = 1064, the total.

Slide 7
The χ² Goodness of Fit Test
Next calculate the test statistic (#2):
    $\chi^2_s = \frac{(787 - 798)^2}{798} + \frac{(277 - 266)^2}{266} = 0.152 + 0.455 = 0.607$
The p-value (#3):
    df = 2 - 1 = 1
    P > 0.20, fail to reject Ho
CONCLUSION: These data provide no significant evidence that the true proportions of tall and dwarf offspring differ from their hypothesized values of 0.75 and 0.25, respectively. In other words, these data are reasonably consistent with the Mendelian 3:1 phenotypic ratio.

Slide 8
The χ² Goodness of Fit Test
• Tips for calculating χ² (p.393):
    - Use the SOCR Resource (www.socr.ucla.edu).
    - The table of observed frequencies must include ALL categories, so that the sum of the observed counts equals the total number of observations.
    - The O's must be absolute, rather than relative, frequencies (i.e., counts, not percentages).
    - Each part can be rounded to a minimum of 2 decimal places if you aren't using your calculator's memory.

Slide 9
Compound Hypotheses
• The hypotheses for the t-test contained one assertion: that the means were equal or not.
• The goodness of fit test can contain more than one assertion (e.g., a = ao, b = bo, …, c = co); this is called a compound hypothesis.
    - The alternative hypothesis is non-directional: it measures deviations in all directions (at least one probability differs from its hypothesized value).

Slide 10
Directionality
• RECALL: dichotomous means having two categories.
• If the categorical variable is dichotomous, Ho is not compound, so we can specify a directional alternative: when one category goes up, the other must go down.
    - RULE OF THUMB: when df = 1, the alternative can be specified as directional.

Slide 11
Directionality
Example: A hotspot is defined as a 10 km² area that is species rich (heavily populated by the species of interest). Suppose that in a study of butterfly hotspots in a particular region, 165 of a sample of 2,588 areas of 10 km² each are butterfly hotspots. In theory, 5% of the areas should be butterfly hotspots. Do the data provide evidence to suggest that the number of butterfly hotspots is increasing from the theoretical standards? Test using α = 0.01.

Slide 12
Directionality
Ho: p(hotspot) = 0.05, p(other spot) = 0.95
Ha: p(hotspot) > 0.05, p(other spot) < 0.95

            Hotspot                 Other spot               Total
Observed    165                     2423                     2588
Expected    (0.05)(2588) = 129.4    (0.95)(2588) = 2458.6    2588

$\chi^2_s = \frac{(165 - 129.4)^2}{129.4} + \frac{(2423 - 2458.6)^2}{2458.6} = 9.79 + 0.52 = 10.31$

Slide 13
Directionality
df = 2 - 1 = 1
0.001 < p < 0.01; however, because the alternative is directional, the p-value needs to be divided by 2 (* see note at top of Table 9). Halving is appropriate here because the observed hotspot count (165) exceeds the expected count (129.4), so the data deviate in the direction specified by Ha.
Therefore, 0.0005 < p < 0.005. Since p < α = 0.01, reject Ho.
CONCLUSION: These data provide evidence that in this region the number of butterfly hotspots is increasing from theoretical standards (i.e., greater than 5%).

Slide 14
Goodness of Fit Test, in general
• The expected cell counts can be determined by:
    - Pre-specified proportions set up in the experiment. For example: 5% hot spots, 95% other spots.
    - Implied proportions. For example: Of 250 births at a local hospital, is there evidence of a difference between the proportions of males and females? Without further information, this implies that we are looking for P(males) = 0.50 and P(females) = 0.50.

Slide 15
Goodness of Fit Test, in general
• Goodness of fit tests can be compound (i.e., have more than 2 categories):
    - For example: Of 250 randomly selected CP college students, is there evidence of a difference in area of home residence, defined as Northern California (north of SLO), Southern California (in SLO or south of SLO), or Out of State? Without further information, this implies that we are looking for P(N.Cal) = P(S.Cal) = P(Out of State) = 1/3 (about 0.33).
http://socr.stat.ucla.edu/Applets.dir/SOCRCurveFitter.html
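
The two worked examples (Mendel's peas and the butterfly hotspots) can be reproduced with a few lines of Python. This is a minimal sketch and not part of the original slides: the function names are hypothetical, and instead of the Table 9 lookup used in the slides it computes an exact p-value from the identity P(χ² > x) = erfc(√(x/2)), which holds only for df = 1.

```python
import math

def expected_counts(proportions, total):
    """Expected counts under Ho: hypothesized proportion times sample size."""
    return [p * total for p in proportions]

def chi_square_stat(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(chi2, directional=False):
    """Upper-tail p-value, valid for df = 1 only (assumption of this sketch).

    With df = 1 the statistic is the square of a standard normal, so
    P(chi-square > x) = erfc(sqrt(x / 2)). A directional alternative halves
    it, which is valid only when the data deviate in the direction of Ha.
    """
    p = math.erfc(math.sqrt(chi2 / 2))
    return p / 2 if directional else p

# Mendel's peas: Ho is P(tall) = 0.75, P(dwarf) = 0.25 (non-directional Ha)
pea_obs = [787, 277]
pea_exp = expected_counts([0.75, 0.25], sum(pea_obs))
pea_chi2 = chi_square_stat(pea_obs, pea_exp)
pea_p = p_value_df1(pea_chi2)

# Butterfly hotspots: Ho is p(hotspot) = 0.05; Ha is p(hotspot) > 0.05
hot_obs = [165, 2423]
hot_exp = expected_counts([0.05, 0.95], sum(hot_obs))
hot_chi2 = chi_square_stat(hot_obs, hot_exp)
hot_p = p_value_df1(hot_chi2, directional=True)

print(f"peas:     chi2 = {pea_chi2:.3f}, p = {pea_p:.3f}")
print(f"hotspots: chi2 = {hot_chi2:.2f}, one-sided p = {hot_p:.4f}")
```

The computed statistics should match the hand calculations (0.607 and 10.31), and the p-values should fall inside the ranges read from Table 9 (P > 0.20 for the peas, 0.0005 < p < 0.005 for the hotspots). For more than two categories the df = 1 shortcut no longer applies, and a table or a chi-square distribution routine is needed.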