
STAT 13, UCLA, Ivo Dinov

Slide 1
UCLA STAT 13: Introduction to Statistical Methods for the Life and Health Sciences
• Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
• Teaching Assistants: Tom Daula and Ming Zheng
UCLA Statistics, University of California, Los Angeles, Winter 2003
http://www.stat.ucla.edu/~dinov/courses_students.html

Slide 2
Chapter 11: Tables of Counts
We discussed means and mean differences in Ch. 10 and developed a statistical toolbox for analyzing quantitative variables. Now we want to develop a similar approach for analyzing qualitative variables.
• Tables of measurements → tables of counts
• Means → proportions
• T/F-tests for inference on quantitative variables → Chi-square (χ²) tests for categorical data

Slide 3
Chapter 11: Tables of Counts
• One-dimensional tables and goodness of fit
• Two-way tables of counts
    Chi-square test of homogeneity
    Chi-square test of independence
    2 by 2 tables
• The perils of collapsing tables

Slide 4
1-dimensional tables: classify n individuals in J categories

Category          Cat. 1   Cat. 2   ...   Cat. j   ...   Cat. J
Probability       p_1      p_2      ...   p_j      ...   p_J
Observed count    O_1      O_2      ...   O_j      ...   O_J
Expected count    E_1      E_2      ...   E_j      ...   E_J

$E_j = n p_j$

Qualitative (factor) or class variables define class/group membership (marital status, blood type, etc.).
Frequency tables can be used to summarize discrete/qualitative variables.

Slide 5
1-dimensional tables cont.
expected cell count = total × specified cell probability
The T and F statistics are used for inference about quantitative variables.
The χ² statistic is used for the analysis of categorical data.
• When H0 specifies the probabilities of landing in each cell completely (no parameters to be estimated): P(cell 1) = p_1, P(cell 2) = p_2, …, P(cell J) = p_J, and Σ p_k = 1.
• Thus, fixing J-1 of the probabilities determines the last probability.
df = number of categories - 1

Slide 6
Chi-Square Test: goodness-of-fit test
• The Chi-square test statistic (χ²) has observed value
  $x_0^2 = \sum_{\text{all cells in the table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
• The P-value for the test is
  $P\text{-value} = \mathrm{pr}(X^2 \ge x_0^2)$, where $X^2 \sim \text{Chi-square}(df)$
[Figure: Chi-square(df) density curve, with the P-value marked as the probability beyond x_0^2.]
To test a null hypothesis, H0, we compare the observed counts in the table to the expected (theoretical) counts. For this reason the test is called a goodness-of-fit test: it measures how well the observed counts fit the expected counts.

Slide 7
Example of 1D table: three blood types

TABLE 11.1.1 Proportions of Three Blood Types
                      A       AB      B       Total
No. observed          39      70      42      151
Proportion observed   0.258   0.464   0.278   1.000

Slide 8
Example of 1D table: rolling a single die

TABLE 11.1.2 210 Rolls of a Die
Outcome      1       2       3       4       5       6       Total
Count        26      40      37      26      43      38      210
Proportion   0.124   0.190   0.176   0.124   0.205   0.181   1.000

Why aren't these proportions all equal? Are they supposed to be? What are the expected probabilities (1/6)?
The χ² statistic is x_0^2 = 7.54, df = 5, P-value = 0.18.

Slide 9
(a) Table of exit-poll sample and population age distributions
Age group                  18-29   30-44   45-59   60+   Total
Sample (percentages)       13      29      30      28    100
Population (percentages)   22      32      24      22    100

Exit poll: sampling voters as they leave polling booths. Exit polls of 10,000 voters.
Are there differences between the population age groups and the exit-poll sample age groups?
Younger voters are underrepresented. Real differences or just due to sampling variation?

Slide 10
Exit poll: bar plot of population/sample groups
[Figure: (b) Plot of exit-poll sample and population age distributions; bars for "Sample of voters" and "Adult population" in each age group (18-29, 30-44, 45-59, 60+).]
H0: The true proportions in the 4 age groups in the voter sample and the whole population are the same!

Slide 11
Exit poll: bar plot of population/sample groups
H0: p_{18-29} = 0.22; p_{30-44} = 0.32; p_{45-59} = 0.24; p_{60+} = 0.22

(c) Table of observed and expected counts
Age group        18-29   30-44   45-59   60+    Total
Observed count   1300    2900    3000    2800   10,000
Expected count   2200    3200    2400    2200   10,000
(Note: Counts approximate due to the rounding of percentages in the report.)
Figure 11.1.1 Comparing the age distributions for voters and the population.

Slide 12
Exit poll: bar plot of population/sample groups
$x_0^2 = \sum_{\text{all cells in the table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = 709.94$
df = number of groups - 1 = 4 - 1 = 3
P-value = 0.000, very small, indicating extremely strong evidence against the null hypothesis.
The 95% CIs for the age groups (in percent) are:
[12.3 : 13.7]; [28.1 : 30.0]; [29.1 : 30.9]; [27.1 : 28.9]

Slide 13
The Chi-square distribution
[Figure 11.1.2: Chi-square(df) p.d.f. curves for df = 2, 4, 7, 10.]
[Figure 11.1.3: The χ²_df(prob) notation.]

Slide 14
TABLE 11.1.3 Frequency of Winning Numbers in LOTTO
Format: number (frequency)
 1 (7)    2 (10)   3 (8)    4 (9)    5 (13)   6 (8)    7 (12)   8 (16)   9 (11)  10 (6)
11 (13)  12 (10)  13 (9)   14 (11)  15 (11)  16 (6)   17 (11)  18 (13)  19 (6)   20 (13)
21 (7)   22 (9)   23 (8)   24 (12)  25 (6)   26 (4)   27 (10)  28 (8)   29 (14)  30 (12)
31 (11)  32 (12)  33 (9)   34 (11)  35 (6)   36 (8)   37 (14)  38 (10)  39 (15)  40 (10)

Lotto after 399 numbers have been drawn: do some numbers appear more frequently in LOTTO?
[Figure 11.1.4: Frequency of LOTTO winning numbers, plotted against the number on the ball (1 to 40), with the expected frequency 9.975 marked.]

Slide 15
Lotto after 399 numbers have been drawn: do some numbers appear more frequently in LOTTO?
Number range: [1:40]
Number of balls selected at each draw: 7
Number of samples: 57
Total number of balls selected: 57 × 7 = 399
Expected frequency of each number: 399/40 = 9.975
Observed χ² statistic: x_0^2 = 30.97
df = 40 - 1 = 39
P-value = 0.817
Conclusion: No evidence for departure from the null hypothesis.

Slide 16
Review
1. The test statistic for the Chi-square test compares observed and expected frequencies. In what sense are the expected frequencies expected? (Expected frequencies are the frequencies expected if H0 were true.)
2. What shape does the Chi-square distribution generally have? What happens to its shape as the degrees of freedom increase? (Skewed and unimodal; it becomes more symmetric, and the Normal distribution approximates it well for large df.)
3. What values of the Chi-square test statistic (large or small) provide evidence against the null hypothesis? Why? (Large values, since they give a small P-value. See the density curve.)

Slide 17
Review
4. For one-dimensional tables, how do you compute the degrees of freedom df?
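
The goodness-of-fit numbers quoted for the die, exit-poll, and LOTTO examples can be checked with a short sketch. This is only an illustration under stated assumptions, not the course's own code: the function names `chi2_sf` and `goodness_of_fit` are hypothetical, and the P-value is computed with a standard series / continued-fraction evaluation of the incomplete gamma function rather than with a statistics package. The counts and null probabilities are copied from Tables 11.1.2, 11.1.3 and Figure 11.1.1(c).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability pr(X^2 >= x) for X^2 ~ Chi-square(df)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # log of the common prefactor z^a * exp(-z) / Gamma(a)
    log_pref = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized incomplete gamma function
        term = total = 1.0 / a
        n = a
        while abs(term) > abs(total) * 1e-15:
            n += 1.0
            term *= z / n
            total += term
        return max(0.0, 1.0 - total * math.exp(log_pref))
    # Continued fraction (modified Lentz method) for the upper function
    tiny = 1e-30
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_pref)


def goodness_of_fit(observed, probs):
    """Return (x0^2, df, P-value) when H0 fully specifies the cell probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]  # E_j = n * p_j
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories - 1
    return stat, df, chi2_sf(stat, df)


# Data copied from the slides.
die = goodness_of_fit([26, 40, 37, 26, 43, 38], [1 / 6] * 6)
poll = goodness_of_fit([1300, 2900, 3000, 2800], [0.22, 0.32, 0.24, 0.22])
lotto_counts = [
    7, 10, 8, 9, 13, 8, 12, 16, 11, 6,
    13, 10, 9, 11, 11, 6, 11, 13, 6, 13,
    7, 9, 8, 12, 6, 4, 10, 8, 14, 12,
    11, 12, 9, 11, 6, 8, 14, 10, 15, 10,
]
lotto = goodness_of_fit(lotto_counts, [1 / 40] * 40)

for name, (stat, df, p) in [("die", die), ("exit poll", poll), ("LOTTO", lotto)]:
    print(f"{name}: x0^2 = {stat:.2f}, df = {df}, P-value = {p:.3g}")
```

The printed statistics can be compared with the values on the slides (7.54 with df = 5, 709.94 with df = 3, and 30.97 with df = 39). With a statistics library available, its Chi-square upper-tail function would be a natural replacement for the hand-written `chi2_sf`.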

