UW-Madison STAT 371 - Analysis of Variance

Analysis of Variance
Bret Larget
Department of Statistics
University of Wisconsin-Madison
December 1, 2003
Statistics 371, Fall 2003

Analysis of Variance

Analysis of variance (ANOVA) is a statistical procedure for analyzing data that may be treated as multiple independent samples, with a single quantitative measurement for each sampled individual. ANOVA is a generalization of the methods we saw earlier in the course for two independent samples.

The bucket of balls model is that we have $k$ different buckets of balls, each of which contains numbered balls. The population means and standard deviations of the numbers in each bucket are $\mu_i$ and $\sigma_i$, respectively, for $i = 1, \ldots, k$. In ANOVA we often assume that all of the population standard deviations are equal.

Cuckoo Birds

Cuckoo birds have a behavior in which they lay their eggs in other birds' nests. The other birds then raise and care for the newly hatched cuckoos. Cuckoos return year after year to the same territory and lay their eggs in the nests of a particular host species. Furthermore, cuckoos appear to mate only within their territory. Therefore, geographical sub-species have developed, each with a dominant foster-parent species. A general question is: are the eggs of the different sub-species distinct, so that they are adapted to a particular foster-parent species? Specifically, we can ask: are the mean lengths of the cuckoo eggs the same in the different sub-species?

Display of Cuckoo Bird Egg Lengths

Here is a plot of the lengths (mm) of cuckoo bird eggs, categorized by the species of the host bird.

[Figure: egg length in mm (axis 20 to 25) by host species (birdSpecies): HedgeSparrow, MeadowPipet, PiedWagtail, Robin, TreePipet, Wren]

A Dotplot of the Data

[Figure: dotplot of egg length in mm (axis 20 to 25) for each host species: HedgeSparrow, MeadowPipet, PiedWagtail, Robin, TreePipet, Wren]

The Big Picture

ANOVA is a statistical procedure in which we test the null hypothesis that all population means are equal versus the alternative hypothesis that they are not all equal.

The test statistic is a ratio of the variability among sample means over the variability within samples. A large value of this ratio is evidence against the null hypothesis.

The test statistic will have a different form than what we have previously seen. The null distribution is an F distribution, named after Ronald Fisher.

An ANOVA table is an accounting method for computing the test statistic.

We introduce a lot of new notation on the way.

Notation

This notation is used to describe calculations of variability within samples and variability among samples, although for historical reasons of poor grammar the term "between samples" is more commonly used.

$y_{ij}$: the $j$th observation in the $i$th group
$I$: the number of groups
$n_i$: the $i$th sample size
$\bar{y}_i$: the mean of the $i$th sample
$n = \sum_{i=1}^{I} n_i$: the total number of observations
$\bar{y} = \frac{1}{n} \sum_{i=1}^{I} \sum_{j=1}^{n_i} y_{ij}$: the grand mean

Sums of Squares within Groups

We measure variability by sums of squared deviations. The sum of squares within groups, or $SS_{\text{within}}$, is a combined measure of the variability within all groups:

$$SS_{\text{within}} = \sum_{i=1}^{I} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 = \sum_{i=1}^{I} (n_i - 1) s_i^2$$

where $s_i^2$ is the variance of the $i$th sample. Notice that this measure of variability is a weighted sum of the sample variances, where the weights are the degrees of freedom of the respective samples.

Sums of Squares Between (Among) Means

We measure variability by sums of squared deviations. The sum of squares between groups, or $SS_{\text{between}}$, is a measure of the variability among sample means:

$$SS_{\text{between}} = \sum_{i=1}^{I} n_i (\bar{y}_i - \bar{y})^2$$

Notice that this measure of variability is a weighted sum of the squared deviations of the sample means from the grand mean, where the weights are the sample sizes $n_i$.

Degrees of Freedom

The degrees of freedom between samples is simply the number of groups minus one:

$$df_{\text{between}} = I - 1$$

The degrees of freedom within samples is simply the sum of the degrees of freedom for each sample. This is equal to the total number of observations minus the number of groups:

$$df_{\text{within}} = \sum_{i=1}^{I} (n_i - 1) = n - I$$

Mean Square Within

In ANOVA, a mean square is the ratio of a sum of squares over the corresponding degrees of freedom:

$$MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} = \frac{(n_1 - 1) s_1^2 + \cdots + (n_I - 1) s_I^2}{n - I}$$

In other words, the mean square within is a weighted average of the sample variances, where the weights are the degrees of freedom within each sample. It estimates the common variance of the $I$ populations, so its square root is the estimate of the common standard deviation:

$$s_{\text{pooled}} = \sqrt{MS_{\text{within}}}$$

Mean Square Between

In ANOVA, a mean square is the ratio of a sum of squares over the corresponding degrees of freedom:

$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}}$$

Total Sum of Squares

If we treated all observations as coming from a single population (which would be the case if all population means were equal and all population standard deviations were equal as well), then it would make sense to measure deviations from the grand mean. This is the total sum of squares:

$$SS_{\text{total}} = \sum_{i=1}^{I} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$$

It turns out that the total sum of squares can be decomposed into the sum of squares within and the sum of squares between:

$$SS_{\text{total}} = SS_{\text{within}} + SS_{\text{between}}$$

Similarly, the total degrees of freedom would be $n - 1$, and there is a matching decomposition:

$$df_{\text{total}} = df_{\text{within}} + df_{\text{between}}, \qquad n - 1 = (n - I) + (I - 1)$$

ANOVA Table for the Cuckoo Example

    > fit <- aov(eggLength ~ birdSpecies)
    > anova(fit)
    Analysis of Variance Table

    Response: eggLength
                 Df Sum Sq Mean Sq F value    Pr(>F)
    birdSpecies   5 42.940   8.588  10.388 3.152e-08 ***
    Residuals   114 94.248   0.827
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In R, the columns are in an unconventional order, and there is no row for totals. R names the "between" row after the categorical variable (here birdSpecies) and names the "within" row Residuals. There are six groups and 120 total observations, which explains the degrees of freedom column: $I - 1 = 5$ and $n - I = 114$. Each mean square is the ratio of the corresponding sum of squares and degrees of freedom.

The F Statistic

The F statistic is the ratio of the mean square between over the mean square within:

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$

If the populations are normal, the population means are all equal, the standard deviations are all equal, and all observations are independent, then the F statistic has an F distribution with $I - 1$ and $n - I$ degrees of freedom.

An F distribution is positive and skewed right, like the chi-square distribution, but it has two…
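The definitions of the sums of squares, degrees of freedom, mean squares, and F statistic can be turned directly into a short computation. The following is a minimal Python sketch of a one-way ANOVA on raw data. The function name `one_way_anova` and the three small groups of numbers are hypothetical illustrations and are not the cuckoo measurements. The sketch also checks two identities stated in the notes: $SS_{\text{total}} = SS_{\text{within}} + SS_{\text{between}}$, and $SS_{\text{within}} = \sum (n_i - 1) s_i^2$.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Hypothetical helper: one-way ANOVA quantities for a list of samples."""
    all_obs = [y for g in groups for y in g]
    n, I = len(all_obs), len(groups)
    grand_mean = mean(all_obs)  # y-bar

    # SS within: squared deviations of each observation from its own group mean
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    # SS between: squared deviations of group means from the grand mean, weighted by n_i
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SS total: squared deviations of every observation from the grand mean
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

    df_between, df_within = I - 1, n - I
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_within": ss_within,
        "ss_between": ss_between,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical toy data: three small groups (not the cuckoo egg lengths)
groups = [[21.0, 22.0, 23.0], [22.5, 23.5], [24.0, 25.0, 26.0, 25.0]]
res = one_way_anova(groups)

# SS_total = SS_within + SS_between, up to floating-point rounding
assert abs(res["ss_total"] - (res["ss_within"] + res["ss_between"])) < 1e-9
# SS_within is the df-weighted sum of the sample variances
assert abs(res["ss_within"] - sum((len(g) - 1) * variance(g) for g in groups)) < 1e-9
print(res)
```

In the notes this work is done by R's `aov` and `anova`. The explicit version is shown only to make each term of the ANOVA table traceable to its formula.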

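The entries of the cuckoo ANOVA table can be checked by hand from the definitions of degrees of freedom and mean squares. The raw egg lengths are not reproduced in these notes, so this small Python sketch starts from the two sums of squares in the R output and the counts $I = 6$ and $n = 120$. It then recomputes the degrees of freedom, mean squares, F statistic, pooled standard deviation, and the total sum of squares that R leaves out.

```python
from math import sqrt

# Values taken from the R ANOVA table for the cuckoo egg data
I, n = 6, 120
ss_between, ss_within = 42.940, 94.248

df_between = I - 1   # 5
df_within = n - I    # 114
df_total = n - 1     # 119 = 114 + 5

ms_between = ss_between / df_between  # 8.588
ms_within = ss_within / df_within     # about 0.827
f_stat = ms_between / ms_within       # about 10.388
s_pooled = sqrt(ms_within)            # estimated common SD, about 0.909 mm
ss_total = ss_between + ss_within     # 137.188 (R prints no totals row)

print(f"df: between={df_between}, within={df_within}, total={df_total}")
print(f"MS between={ms_between:.3f}, MS within={ms_within:.3f}, F={f_stat:.3f}")
print(f"s_pooled={s_pooled:.3f} mm, SS total={ss_total:.3f}")
```

Because the sums of squares in the table are rounded to three decimals, the recomputed values match R's to about three decimals rather than exactly.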

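The `Pr(>F)` column is the upper-tail probability of the F distribution with $I - 1$ and $n - I$ degrees of freedom, evaluated at the observed F statistic. In R this is `pf(10.388, 5, 114, lower.tail = FALSE)`, and in Python with SciPy it is `scipy.stats.f.sf(10.388, 5, 114)`. The dependency-free sketch below is an illustration under stated assumptions. It uses the standard identity $P(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\, d_1/2)$, where $I_x(a, b)$ is the regularized incomplete beta function. That function is evaluated by a continued fraction adapted from the `betacf` routine in Numerical Recipes. The helper names are hypothetical.

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for I_x(a, b), evaluated by the modified Lentz method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even-numbered term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd-numbered term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    # The continued fraction converges fast for x < (a+1)/(a+b+2); otherwise use symmetry
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with df1 and df2 degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


p = f_upper_tail(10.388, 5, 114)
print(f"P(F > 10.388) with 5 and 114 df: {p:.3e}")
```

The printed value should agree closely with the `Pr(>F)` entry in the cuckoo table. It will not agree exactly, because 10.388 is the rounded F statistic. For real analyses, the library functions named above are the safer choice.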