PSU STAT 401 - ONE WAY ANOVA AND TWO SAMPLE COMPARISON - D376445



ONE WAY ANOVA AND TWO SAMPLE COMPARISON

1 The basic procedure

ANOVA is an acronym for Analysis Of Variance. It is a methodology for testing hypotheses regarding population means in various contexts. One-way ANOVA refers to the methodology for testing the equality of two or more population means based on independent samples from each of the populations. Thus, if mu1, mu2, ..., muk denote the means of k populations, one-way ANOVA is a methodology for testing H0: mu1 = mu2 = ... = muk against the alternative that not all are equal.

Here we will demonstrate the use of Minitab for carrying out this test procedure, using the data from www stat psu edu mga 401 labs 05 lab6 anova fe data txt. The data are about total Fe for four types of iron formation (1 = carbonate, 2 = silicate, 3 = magnetite, 4 = hematite).

If the data from the different populations (also called factor levels in the ANOVA jargon) are given in different columns, then use the sequence of commands:

Stat > ANOVA > One-way (Unstacked) > Enter C1-C4 for Response, 95 for confidence level > OK

If the data from all factor levels are stored in one column, there must also be a second column which indicates the group membership of each observation in the first column. Call this second column formation. The sequence of commands in this case is:

Stat > ANOVA > One-way > Enter Fe as Response and formation as Factor > OK

The output that Minitab produces (other software packages produce similar outputs) is:

One-way ANOVA: C1, C2, C3, C4

Source  DF      SS     MS      F      P
Factor   3   509.1  169.7  10.85  0.000
Error   36   563.1   15.6
Total   39  1072.3

S = 3.955   R-Sq = 47.48%   R-Sq(adj) = 43.10%

Level   N    Mean  StDev
C1     10  26.080  3.391
C2     10  24.690  4.425
C3     10  29.950  2.854
C4     10  33.840  4.831

[Character plot: Individual 95% CIs For Mean Based on Pooled StDev; axis from 24.0 to 36.0]

Pooled StDev = 3.955

The ANOVA table gives the decomposition of the total sum of squares into a sum of squares due to the population differences (Factor) and a sum of squares due to the intrinsic error. Thus 509.1 + 563.1 = 1072.3 (not exactly, due to rounding), MS = SS/DF, and the F statistic is the ratio of the MS for Factor over the MS for Error. Typically, statistics books also have F tables where the value of the F statistic can be looked up. We will not learn how to do that, because Minitab produces the p-value and does so with much greater accuracy than we could achieve from the F tables. (The F distribution is characterized by two degrees of freedom, here 3 and 36, and thus the F tables contain only selected percentiles.) Because the p-value is small, the hypothesis of equality of the population means is rejected.

Following the ANOVA table there is information about the estimate of the standard deviation, which is assumed to be the same in all populations (here the estimate is S = 3.955, and it is also given in the last line of the output), and the R-Sq, which has the same significance as explained in the activity for regression. The individual sample means, sample standard deviations, and 95% CI for each population mean are also given.

2 Multiple Comparisons for One Way ANOVA

When the null hypothesis is rejected, it means that the data strongly suggest that at least one of the population means is different from the others. When k > 2, additional testing needs to be done to identify which means appear to be different. This additional testing is called multiple comparisons. It involves performing all pairwise comparisons, i.e., testing the null hypothesis of equality of each possible pair of means, in such a way that the probability of committing a type I error in any of these pairwise test procedures does not exceed the designated level of significance alpha.

One of the ways of doing multiple comparisons is to perform the aforementioned ANOVA test for each pairwise comparison at an adjusted level of significance. The adjusted level equals the designated alpha divided by the total number of pairwise comparisons. This is called the Bonferroni method. Here we will demonstrate a different method, called the Tukey method, for doing pairwise comparisons in such a way that the overall level of significance is alpha:

Stat > ANOVA > One-way (Unstacked) > Enter C1-C4 for Response, 95 for confidence level > Click Comparisons, select Tukey's, enter family error rate 5 (for an overall level of significance of 0.05) > OK > OK

The additional Minitab output (with my comments in brackets) is:

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.93%

C1 subtracted from:
      Lower  Center   Upper
C2   -6.155  -1.390   3.375
C3   -0.895   3.870   8.635
C4    2.995   7.760  12.525

[These are simultaneous CIs for the differences mu2 - mu1, mu3 - mu1, and mu4 - mu1. If a CI does not contain 0, the two means are declared significantly different. Thus mu1 is significantly different from mu4, but not significantly different from mu2 or from mu3.]

C2 subtracted from:
      Lower  Center   Upper
C3    0.495   5.260  10.025
C4    4.385   9.150  13.915

[These are simultaneous CIs for mu3 - mu2 and mu4 - mu2. None of these CIs contains zero, and thus mu2 is significantly different from both mu3 and mu4.]

C3 subtracted from:
      Lower  Center   Upper
C4   -0.875   3.890   8.655

[This is a simultaneous CI for mu4 - mu3. The CI contains zero, and thus mu3 is not significantly different from mu4.]

3 A Nonparametric Test for Comparing k Means

The ANOVA methodology is exact (i.e., the F statistic has the F distribution) only if the population distributions are normal and have the same variance (they are homoscedastic), but it is approximately valid without the normality assumption if the sample sizes are large, provided the populations are homoscedastic. Moreover, the ANOVA methodology has more power (i.e., rejects the null hypothesis when it is not true with higher probability) than any other test only when the k population distributions are normal and homoscedastic. An alternative test procedure, which is nearly as powerful as ANOVA under normality and homoscedasticity but can be much more powerful than ANOVA when the population distributions are non-normal, is the Kruskal-Wallis test.

Roughly speaking, the Kruskal-Wallis procedure consists of combining the data from the k populations and ranking the combined data set from smallest to largest. Each of the original observations is then replaced by its rank, and the ranks are used instead of the original observations in the ANOVA test statistic. The actual Kruskal-Wallis test statistic is somewhat different from the procedure just described, but the difference gets smaller as the sample sizes increase. In this activity we …
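Returning to the ANOVA table of Section 1: its entries can be checked by hand from the per-level summary (N, Mean, StDev) that Minitab prints. The following Python sketch is not part of the Minitab procedure; it is a minimal illustration, assuming only the summary values quoted in the output, and the function name anova_from_summary is our own. It rebuilds SS, MS, F, the pooled StDev and R-Sq; the p-value is not computed because that requires the F distribution.

```python
import math

# Summary statistics copied from the Minitab output (levels C1..C4).
n = [10, 10, 10, 10]
means = [26.080, 24.690, 29.950, 33.840]
stdevs = [3.391, 4.425, 2.854, 4.831]


def anova_from_summary(n, means, stdevs):
    """Rebuild the one-way ANOVA table from per-level N, Mean and StDev.

    Hypothetical helper written for illustration only.
    """
    k = len(n)
    total_n = sum(n)
    grand_mean = sum(ni * m for ni, m in zip(n, means)) / total_n

    # Factor SS: weighted squared deviations of level means from the grand mean.
    ss_factor = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, means))
    # Error SS: (n_i - 1) * s_i^2 summed over the levels.
    ss_error = sum((ni - 1) * s ** 2 for ni, s in zip(n, stdevs))

    df_factor, df_error = k - 1, total_n - k
    ms_factor = ss_factor / df_factor   # MS = SS / DF
    ms_error = ss_error / df_error

    return {
        "df_factor": df_factor,
        "df_error": df_error,
        "ss_factor": ss_factor,
        "ss_error": ss_error,
        "ss_total": ss_factor + ss_error,
        "ms_factor": ms_factor,
        "ms_error": ms_error,
        "f": ms_factor / ms_error,             # F = MS Factor / MS Error
        "pooled_stdev": math.sqrt(ms_error),   # the S of the output
        "r_sq": ss_factor / (ss_factor + ss_error),
    }


table = anova_from_summary(n, means, stdevs)
for name, value in table.items():
    print(f"{name:>13}: {value:.3f}")
```

Because the inputs are themselves rounded, the reconstructed values agree with the Minitab table only up to rounding, which is the same effect noted above for 509.1 + 563.1.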
