UF STA 6166 - CATEGORICAL DATA – Chi-Square Tests for Univariate Data

Recall that a categorical variable is one whose possible values are categories or groupings. We have already seen one such variable: the binary variable, with only two possible outcomes, success or failure. In this topic we test hypotheses about categorical variables with MORE than two outcomes.

EXAMPLE
Consider an experiment in which two different tomato phenotypes are crossed and the resulting offspring are observed. The parent types are tall cut-leaf tomatoes and dwarf potato-leaf tomatoes.
Variable: Offspring Phenotype
Possible Values: 1) tall cut-leaf, 2) tall potato-leaf, 3) dwarf cut-leaf, and 4) dwarf potato-leaf.
If Mendel's laws of inheritance hold, the population proportions in the offspring would be 1) 9/16, 2) 3/16, 3) 3/16, and 4) 1/16. One might hypothesize that Mendel's laws do not hold for these genes. In an experiment to test this, the researcher observed the sample proportions 1) 0.575, 2) 0.179, 3) 0.182, and 4) 0.065, based on a sample of 1611 offspring.

EXAMPLE
Consider an observational study of the types of insects that feed on the nectar of a certain flower. The scientist randomly selects hours during the day over several days of the summer season, selects several different plants, and counts how many insects of each kind feed at the plants during the study.
Variable: Insect Family
Possible Values: 1) bees, 2) wasps, or 3) flies
One might hypothesize that this flower attracts the different insect families in unequal proportions.

Important Point: Testing procedures for hypotheses of this form are called Goodness-of-Fit tests. These tests compare the sample proportions to the hypothesized proportions to see how "good the fit is".

Important Point: The categories must be mutually exclusive and exhaustive, so that every sample unit falls into exactly one category.
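The "exhaustive" requirement can be checked numerically once the proportions are written down. The Python sketch below stores Mendel's hypothesized proportions for the tomato example as exact fractions and confirms that they, and the reported sample proportions, add to 1. The helper names and the 0.005 tolerance (needed because the sample proportions are rounded to three decimals) are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction

# Hypothesized offspring proportions under Mendel's laws (tomato example).
mendel = {
    "tall cut-leaf": Fraction(9, 16),
    "tall potato-leaf": Fraction(3, 16),
    "dwarf cut-leaf": Fraction(3, 16),
    "dwarf potato-leaf": Fraction(1, 16),
}

# Sample proportions reported for n = 1611 offspring, rounded to 3 decimals.
observed_props = {
    "tall cut-leaf": 0.575,
    "tall potato-leaf": 0.179,
    "dwarf cut-leaf": 0.182,
    "dwarf potato-leaf": 0.065,
}


def is_valid_hypothesis(props):
    """Hypothetical helper: each proportion is strictly between 0 and 1
    and, using exact fractions, they sum to exactly 1 (exhaustive)."""
    return all(0 < p < 1 for p in props.values()) and sum(props.values()) == 1


def sums_to_one(props, tol=0.005):
    """Hypothetical helper: rounded sample proportions may miss 1 slightly."""
    return abs(sum(props.values()) - 1) <= tol


print(is_valid_hypothesis(mendel))   # True
print(sum(observed_props.values()))  # about 1.001 because of rounding
print(sums_to_one(observed_props))   # True
```

Mutual exclusivity cannot be verified from the proportions alone; it depends on how the categories are defined, so that each offspring is counted under exactly one phenotype.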
Notation: k = number of possible categories that the variable of interest can have.

Category | True population proportion | Sample proportion | Hypothesized population proportion
1 | $\pi_1$ | $\hat{\pi}_1$ | $\pi_{01}$
2 | $\pi_2$ | $\hat{\pi}_2$ | $\pi_{02}$
… | … | … | …
k | $\pi_k$ | $\hat{\pi}_k$ | $\pi_{0k}$

"Exhaustive" means that $\sum_{i=1}^{k} \pi_i = 1$, $\sum_{i=1}^{k} \hat{\pi}_i = 1$, and $\sum_{i=1}^{k} \pi_{0i} = 1$.

EXAMPLE Tomatoes and Mendel's Laws. k = 4

Category | True population proportion | Sample proportion | Hypothesized population proportion
tall cut-leaf | $\pi_1$ | $\hat{\pi}_1 = 0.575$ | $\pi_{01} = 9/16$
tall potato-leaf | $\pi_2$ | $\hat{\pi}_2 = 0.179$ | $\pi_{02} = 3/16$
dwarf cut-leaf | $\pi_3$ | $\hat{\pi}_3 = 0.182$ | $\pi_{03} = 3/16$
dwarf potato-leaf | $\pi_4$ | $\hat{\pi}_4 = 0.065$ | $\pi_{04} = 1/16$

Now, for a sample of size n and a set of proportions hypothesized under the null hypothesis, I can calculate how many sample units should fall in each category (if there were no sampling variability, of course). These numbers are called the EXPECTED CELL COUNTS under the null hypothesis; for category (cell) i the expected count is $n\pi_{0i}$, the sample size times the hypothesized proportion for that cell. The OBSERVED CELL COUNTS, $n\hat{\pi}_i$, are the actual counts seen in each category during the experiment.

Category | Expected count | Observed count
1 | $n\pi_{01}$ | $n\hat{\pi}_1$
2 | $n\pi_{02}$ | $n\hat{\pi}_2$
… | … | …
k | $n\pi_{0k}$ | $n\hat{\pi}_k$

Important Point: This test procedure is valid only if the sample size and hypothesized proportions are such that virtually every cell has an expected count of 5 or more. If they are not, you must use a different test procedure.

EXAMPLE Tomatoes and Mendel's Laws. n = 1611

Category | Expected count | Observed count
tall cut-leaf | $n\pi_{01} = 1611(9/16) = 906.2$ | $n\hat{\pi}_1 = 926$
tall potato-leaf | $n\pi_{02} = 1611(3/16) = 302.1$ | $n\hat{\pi}_2 = 288$
dwarf cut-leaf | $n\pi_{03} = 1611(3/16) = 302.1$ | $n\hat{\pi}_3 = 293$
dwarf potato-leaf | $n\pi_{04} = 1611(1/16) = 100.7$ | $n\hat{\pi}_4 = 104$

Hypotheses:
$H_0$: $\pi_1 = 9/16$, $\pi_2 = 3/16$, $\pi_3 = 3/16$, and $\pi_4 = 1/16$
$H_A$: not $H_0$ ($H_0$ is not true)

Important Point: Note how uninformative the alternative hypothesis is in a goodness-of-fit test. These tests compare the sample data against a specific set of hypothesized proportions.
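The expected-count calculation and the "expected count of 5 or more" check can be sketched in a few lines of Python using the tomato data. The function names are hypothetical, and `large_enough` applies the rule to every cell, which is a stricter reading of "virtually every cell".

```python
from fractions import Fraction

n = 1611
labels = ["tall cut-leaf", "tall potato-leaf", "dwarf cut-leaf", "dwarf potato-leaf"]
hypothesized = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
observed = [926, 288, 293, 104]


def expected_counts(n, props):
    """Expected cell count for each category: n times its hypothesized proportion."""
    return [n * p for p in props]


def large_enough(expected, minimum=5):
    """Rule of thumb from the notes, applied strictly: every expected count >= 5."""
    return all(e >= minimum for e in expected)


expected = expected_counts(n, hypothesized)
for label, e, o in zip(labels, expected, observed):
    print(f"{label:18s} expected {float(e):7.1f}  observed {o}")
print("observed counts sum to n:", sum(observed) == n)
print("expected counts all >= 5:", large_enough(expected))
```

Run as written, it prints expected counts of 906.2, 302.1, 302.1, and 100.7, matching the table, and confirms that all four are well above 5.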
If the null hypothesis is rejected, one cannot tell what the true proportions are, only that they are not the ones listed in the null hypothesis.

Significance Level: let's choose α = 0.04.

Test Statistic: a summary of the comparison of the observed and expected cell counts. Its form is

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

This is called the CHI-SQUARE or GOODNESS-OF-FIT STATISTIC.

Important Point: The closer the expected and observed counts are to each other, the smaller the value of $\chi^2$. Small values of $\chi^2$ support the null hypothesis, and large values support $H_A$.

EXAMPLE Tomatoes and Mendel's Laws.

Category | Expected count | Observed count | $(n\hat{\pi}_i - n\pi_{0i})^2 / (n\pi_{0i})$
tall cut-leaf | $n\pi_{01} = 906.2$ | $n\hat{\pi}_1 = 926$ | 0.433
tall potato-leaf | $n\pi_{02} = 302.1$ | $n\hat{\pi}_2 = 288$ | 0.658
dwarf cut-leaf | $n\pi_{03} = 302.1$ | $n\hat{\pi}_3 = 293$ | 0.274
dwarf potato-leaf | $n\pi_{04} = 100.7$ | $n\hat{\pi}_4 = 104$ | 0.108

So $\chi^2 = 0.433 + 0.658 + 0.274 + 0.108 = 1.473$. (These contributions use the expected counts rounded to one decimal place; carrying full precision gives $\chi^2 \approx 1.469$, a negligible difference.)

P-value: Under the null hypothesis, the test statistic $\chi^2$ has a sampling distribution known as the CHI-SQUARE DISTRIBUTION. Like the t-distribution, the shape of the chi-square distribution depends on its degrees of freedom. Here, df = k - 1.

Important Point: The degrees of freedom for the chi-square goodness-of-fit test are the number of categories (k) minus 1, NOT the sample size minus 1.

The p-value is the area under the chi-square distribution with k - 1 df to the right of the observed value of the test statistic, $P(\chi^2_{k-1} \ge \chi^2_{\text{obs}})$.

To find the p-value, first calculate $\chi^2$ and the df. Then go to Table 8 (page 686 of the text). Find the row labeled with the df for your test. Go across the values in that row until you find two values that bracket your value of $\chi^2$. Read the p-value bounds from the tops of the columns containing the two bracketing values.

EXAMPLE Tomatoes: df = 4 - 1 = 3 and $\chi^2$ = 1.473. So, on page 686, go to the row labeled "df = 3" and find the values closest to 1.47.
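A short Python sketch of the goodness-of-fit statistic, assuming the tomato counts from the example (`chi_square_statistic` is a hypothetical helper name). It computes the statistic twice: once with the expected counts rounded to one decimal, as in the table, which reproduces 1.473, and once at full precision, which gives about 1.469.

```python
n = 1611
observed = [926, 288, 293, 104]
hypothesized = [9 / 16, 3 / 16, 3 / 16, 1 / 16]


def chi_square_statistic(observed, expected):
    """Sum over all cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected_exact = [n * p for p in hypothesized]                # 906.1875, 302.0625, ...
expected_rounded = [round(e, 1) for e in expected_exact]      # 906.2, 302.1, 302.1, 100.7

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected_rounded)]
print([round(c, 3) for c in contributions])                       # [0.433, 0.658, 0.274, 0.108]
print(round(chi_square_statistic(observed, expected_rounded), 3))  # 1.473
print(round(chi_square_statistic(observed, expected_exact), 3))    # 1.469

df = len(observed) - 1  # k - 1 categories, not n - 1
print(df)               # 3
```

If SciPy is available, `scipy.stats.chisquare(f_obs=observed, f_exp=expected_exact)` should return the same full-precision statistic; by default it uses k - 1 degrees of freedom for its p-value.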
It is bracketed by the values 0.5844 to the left and 6.251 to the right. The column headers for these two values are 0.90 (left) and 0.10 (right). This says that the p-value falls between 0.10 and 0.90.
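Instead of bracketing the p-value with Table 8, the upper-tail area can be computed directly. The sketch below does this with only the Python standard library, using the series expansion of the regularized incomplete gamma function (a technique not used in the notes, chosen here so the example needs no extra packages); `chi2_sf` and `chi2_critical` are hypothetical names. It also inverts the tail area by bisection to reproduce the df = 3 table entries 0.5844 and 6.251.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(a, y)
    with a = df/2 and y = x/2; the upper-tail area is 1 - P(a, y).
    Intended for moderate x (roughly x < 100), which covers table-style use.
    """
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the sum y^n / (a (a+1) ... (a+n))
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= y / (a + k)
        total += term
    lower = math.exp(a * math.log(y) - y - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_critical(alpha, df, hi=100.0):
    """Value c with upper-tail area alpha, found by bisection (chi2_sf decreases in x)."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


chi_sq, df, alpha = 1.473, 3, 0.04
p_value = chi2_sf(chi_sq, df)
print(round(p_value, 3))
print(round(chi2_critical(0.90, df), 4), round(chi2_critical(0.10, df), 3))  # 0.5844 6.251
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

For the tomato data the computed p-value is about 0.69, inside the bracket read from the table and well above α = 0.04, so the sketch reports "fail to reject H0". With SciPy, `scipy.stats.chi2.sf(1.473, 3)` and `scipy.stats.chi2.isf(0.10, 3)` should give the same quantities.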

