UF STA 6166 - Means Comparisons


Topic 22 - ANOVA (II) 22-1

Topic 22 – Inference of Means in a One-Way ANOVA When Variance is Constant for All Treatments

Once we have rejected the null hypothesis that all means are equal, and we have checked the assumptions of the testing procedure, we usually wish to do some specific tests that can elucidate the relationships among the means.

Note how this differs from regression. There, we assumed the relationship among the means was linear, so the only test of interest was of the slope. Here we made no such assumption, mainly because X is categorical and so regression would make no sense. So, we want to do specific comparisons between the levels of X. These are variously called multiple comparisons tests, contrasts, or tests of linear combinations of means.

A priori Hypotheses: hypotheses about population means that are decided during the planning of the experiment and prior to any data analysis. They are the reason for performing the experiment!

A posteriori Hypotheses: hypotheses generated as a result of looking at the data after the experiment has been performed. Also called data snooping or data dredging. This is almost ALWAYS inappropriate and to be avoided. The only valid reason for doing so is as an exploratory analysis that will guide future experimentation.

Example (a posteriori testing): suppose a 1-way ANOVA is performed and the results are obtained. The analyst looks over the results and decides to test 2 means because they appear to be very different (e.g. the smallest and largest ones). Now, the apparent effect could be due to a real difference in population means, or to random sampling variation that merely makes them appear different. Investigating only the comparisons for which the effect appears large leads to a true confidence level for a conclusion that is lower than the stated confidence level. In other words, you are more likely than the stated α suggests to reject H0 (no difference) when it is in fact true. It can be shown that the actual confidence is only about 60%
when 6 levels are used in an experiment and the statistical analysis always includes testing the difference between the largest and smallest means using a stated 95% confidence. Note also that the treatments compared each time need not be the same ones, since the largest and smallest means could belong to different treatments.

There are times when it is possible to do a posteriori testing, BUT the statistical method needs to be modified appropriately to account for the data snooping (see later).

1) Estimation of a Treatment Mean

The population mean for the ith treatment, $\mu_i$, is estimated using the sample mean

$$\hat{\mu}_i = \bar{y}_{i\cdot}$$

with a standard error of

$$SE(\bar{y}_{i\cdot}) = \sqrt{\frac{MSE}{n_i}}$$

Under our assumptions of normality and random sampling, the (1–α)100% Confidence Interval of the ith population mean is

$$\bar{y}_{i\cdot} \pm t_{\alpha/2,\,N-t}\; SE(\bar{y}_{i\cdot})$$

where $t_{\alpha/2,\,N-t}$ is the upper-tail α/2 critical value of a t-distribution on N – t degrees of freedom (N is the total number of observations and t is the number of treatments).

Hypothesis testing of a single mean against a constant ($\mu_0$) is done using a t-test, as is usual for a single population mean.

2) Estimation of the Difference Between 2 Means

The unbiased estimator of the difference between 2 population means, $D = \mu_i - \mu_k$, is

$$\hat{D}_{ik} = \bar{y}_{i\cdot} - \bar{y}_{k\cdot}$$

which has a standard error of

$$SE(\hat{D}_{ik}) = \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_k}\right)}$$

assuming the variances are homogeneous (which we did assume and checked of course!).

Under our assumptions of normality and random sampling, a (1–α)100% Confidence Interval of the difference of two population means ($\mu_i - \mu_k$) is

$$\hat{D}_{ik} \pm t_{\alpha/2,\,N-t}\; SE(\hat{D}_{ik})$$

where $t_{\alpha/2,\,N-t}$ is the upper-tail α/2 critical value of a t-distribution on N – t degrees of freedom.

Again, hypothesis testing is done using the t-test for two independent samples that we reviewed earlier this semester, here with the standard error and degrees of freedom given above.

EXAMPLE: Rehabilitation Therapy. A researcher is interested in the relationship between physical fitness in persons prior to knee surgery and the time required in physical therapy after surgery to obtain successful rehabilitation.
Twenty-four male subjects with a similar type of knee surgery during the past year were randomly selected from the patient records at the rehabilitation center. The number of days required for successful rehabilitation and prior physical fitness status were recorded for each patient. The patients were categorized into one of three levels of prior fitness.

The hypotheses of interest are:
1) the mean time to recovery will differ among the three groups;
2) the above-average fitness group will have a shorter recovery period than either the below-average or average group; and
3) the average group will have a shorter recovery than the below-average group.

In other words:

1) H0: $\mu_{below} = \mu_{average} = \mu_{above}$
   HA: at least one mean differs

2) H0: $\mu_{above} = \mu_{average}$
   HA: $\mu_{above} < \mu_{average}$

   H0: $\mu_{above} = \mu_{below}$
   HA: $\mu_{above} < \mu_{below}$

3) H0: $\mu_{average} = \mu_{below}$
   HA: $\mu_{average} < \mu_{below}$

The SAS code and output for analyzing the dataset are:

    data fitness;
      input prior_fit $ recovery;
      datalines;
    below 29
    below 42
    below 38
    below 40
    below 43
    below 40
    below 30
    below 42
    average 30
    average 35
    average 39
    average 28
    average 31
    average 31
    average 29
    average 35
    above 26
    above 32
    above 21
    above 20
    above 23
    above 22
    above 25
    above 23
    ;

    proc boxplot;
      plot recovery*prior_fit;
    quit;

    proc glm data=fitness;
      class prior_fit;
      model recovery = prior_fit;
      lsmeans prior_fit / pdiff;  * the pdiff option does pair-wise tests of means;
    quit;

The output from Proc Boxplot:

[Figure: box plots of recovery (vertical axis, 20 to 45) for each level of prior_fit (below, average, above)]

The ANOVA table is:

    Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model               2           792.333     396.166667      20.42    <.0001
    Error              21           407.500      19.404762
    Corrected Total    23          1199.833

So, we reject the null hypothesis that the means are all equal, i.e. there is sufficient evidence that at least one treatment mean differs from the others. But this is not conclusive until we check the assumptions.
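As a cross-check on the ANOVA table, and to illustrate the standard-error and confidence-interval formulas from sections 1 and 2, here is a minimal Python sketch. It is not part of the original SAS analysis. The names `recovery`, `mean_ci`, `diff_ci` and `T_CRIT` are hypothetical, and the critical value 2.080 (upper 2.5% point of t on 21 df) is an assumption read from a standard t-table rather than computed.

```python
from math import sqrt
from statistics import mean

# Recovery times (days), copied from the datalines in the SAS data step.
recovery = {
    "below":   [29, 42, 38, 40, 43, 40, 30, 42],
    "average": [30, 35, 39, 28, 31, 31, 29, 35],
    "above":   [26, 32, 21, 20, 23, 22, 25, 23],
}

N = sum(len(y) for y in recovery.values())  # total number of observations
t = len(recovery)                           # number of treatments
grand_mean = mean(v for y in recovery.values() for v in y)
group_means = {g: mean(y) for g, y in recovery.items()}

# Sums of squares and mean squares for the one-way ANOVA table.
ss_model = sum(len(y) * (group_means[g] - grand_mean) ** 2
               for g, y in recovery.items())
ss_error = sum((v - group_means[g]) ** 2
               for g, y in recovery.items() for v in y)
ms_model = ss_model / (t - 1)
mse = ss_error / (N - t)
f_value = ms_model / mse

# ASSUMPTION: t_{0.025, 21} taken from a t-table (95% two-sided intervals).
T_CRIT = 2.080


def mean_ci(group):
    """95% CI for one treatment mean: ybar_i +/- t * sqrt(MSE / n_i)."""
    se = sqrt(mse / len(recovery[group]))
    m = group_means[group]
    return m - T_CRIT * se, m + T_CRIT * se


def diff_ci(i, k):
    """Estimate, SE and 95% CI for mu_i - mu_k using the pooled MSE."""
    d_hat = group_means[i] - group_means[k]
    se = sqrt(mse * (1 / len(recovery[i]) + 1 / len(recovery[k])))
    return d_hat, se, (d_hat - T_CRIT * se, d_hat + T_CRIT * se)


print(f"SS model = {ss_model:.3f}, SS error = {ss_error:.3f}")
print(f"MSE = {mse:.6f}, F = {f_value:.2f}")
for g in recovery:
    lo, hi = mean_ci(g)
    print(f"{g:8s} mean = {group_means[g]:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
d_hat, se, (lo, hi) = diff_ci("above", "below")
print(f"above - below = {d_hat:.2f}, SE = {se:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because every group has 8 observations, all three pairwise differences share the same standard error, so only the point estimates differ between comparisons. Note that these intervals are two-sided, whereas hypotheses 2 and 3 are one-sided.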
Before analyzing the means let’s quickly check the assumptions (SAS code not shown above):

[Figure: residuals (resid, vertical axis from -9 to 8) plotted against fitted values (yhat, horizontal axis from 24 to 38)]

    Tests for Normality

    Test            --Statistic---    -----p Value------
    Shapiro-Wilk    W      0.98123    Pr < W      0.9174
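Since the SAS code for the assumption checks is not shown, the following Python sketch indicates one way the quantities behind the residual plot could be reproduced. It is an illustration under stated assumptions, not the code used in the notes: the names `fitted`, `residuals` and `group_variances` are hypothetical, and the Shapiro-Wilk step runs only if SciPy happens to be installed (its result should be comparable to, though not necessarily identical with, the SAS value).

```python
from statistics import mean, variance

# Recovery times (days), copied from the datalines in the SAS data step.
recovery = {
    "below":   [29, 42, 38, 40, 43, 40, 30, 42],
    "average": [30, 35, 39, 28, 31, 31, 29, 35],
    "above":   [26, 32, 21, 20, 23, 22, 25, 23],
}

# In a one-way ANOVA the fitted value (yhat) is the treatment mean,
# and the residual is the observation minus that mean.
fitted = {g: mean(y) for g, y in recovery.items()}
residuals = [v - fitted[g] for g, y in recovery.items() for v in y]

# Informal look at constant variance: compare the sample variances by group.
group_variances = {g: variance(y) for g, y in recovery.items()}

print("fitted values:", fitted)
print("residual range:", min(residuals), "to", max(residuals))
for g, s2 in group_variances.items():
    print(f"{g:8s} sample variance = {s2:.2f}")

# Optional normality test on the residuals; skipped if SciPy is unavailable.
try:
    from scipy.stats import shapiro
    w_stat, p_value = shapiro(residuals)
    print(f"Shapiro-Wilk W = {w_stat:.5f}, p = {p_value:.4f}")
except ImportError:
    print("SciPy not installed; skipping the Shapiro-Wilk test.")
```

The residual range and the three fitted values line up with the axis ranges of the residual plot, which is a quick sanity check that the plot was built from this model.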

