ANOVA (II)

One Way Analysis of Variance (II)

Two issues still to be dealt with: a) checking the assumptions of the model, and b) inference on individual means or combinations of means.

1. Estimation
   a. Predicted Values or LSMEANS ("Least Squares Means")
      i. The best estimators of the "cell" means μ_i are the sample means ȳ_i·.
      ii. The variance of the mean estimators is estimated using SE(ȳ_i·) = √(MSE/n_i).
   b. Residuals
      i. The estimators of the error terms ε_ij are the residuals e_ij = y_ij − ȳ_i·.
      ii. The residuals always sum to 0 (i.e., Σ_i Σ_j e_ij = 0) and have variance estimated by MSE.
      iii. Under the assumptions, the residuals have a Normal distribution with mean 0 and constant variance.

Example: the comparison of the effects of safelights on plant height

   proc mixed;
     class tmt;
     model height = tmt / outp=resids;
   quit;

   proc print data=resids;
     var tmt height pred resid stderrpred;
   quit;

   Obs  tmt  height     Pred    Resid  StdErrPred
     1   D    32.94  34.0200  -1.0800     0.52244
     2   D    35.98  34.0200   1.9600     0.52244
     3   D    34.76  34.0200   0.7400     0.52244
     4   D    32.40  34.0200  -1.6200     0.52244
     5   AL   30.55  31.9000  -1.3500     0.52244
     6   AL   32.64  31.9000   0.7400     0.52244
     7   AL   32.37  31.9000   0.4700     0.52244
     8   AL   32.04  31.9000   0.1400     0.52244
     9   AH   31.23  30.8400   0.3900     0.52244
    10   AH   31.09  30.8400   0.2500     0.52244
    11   AH   30.62  30.8400  -0.2200     0.52244
    12   AH   30.42  30.8400  -0.4200     0.52244
    13   BL   34.41  34.3075   0.1025     0.52244
    14   BL   34.88  34.3075   0.5725     0.52244
    15   BL   34.07  34.3075  -0.2375     0.52244
    16   BL   33.87  34.3075  -0.4375     0.52244
    17   BH   35.61  34.2925   1.3175     0.52244
    18   BH   35.00  34.2925   0.7075     0.52244
    19   BH   33.65  34.2925  -0.6425     0.52244
    20   BH   32.91  34.2925  -1.3825     0.52244

2. Checking the Assumptions of the Model
   a. Constant Variance
      i. Graphically – do box plots of the residuals for each treatment and look for similar variabilities.
      ii. Hypothesis testing of the sample variances s_i² using Levene's test or Hartley's test.
   b. Normality
      i. Graphically
         1.
do a stem-and-leaf plot, a histogram, or something similar using the residuals to check the shape of the distribution and to look for outliers
         2. do a normal probability plot of the residuals

NOTE: usually, normality is NOT reviewed or tested until after any problems with variance are corrected. Obviously, if the variances are unequal, it is highly likely that the distribution of the residuals will look platykurtic.

   c. Independence and Random Selection/Allocation
      i. This is something that is controlled and decided by the scientist when planning and executing the experiment.
      ii. Important points to consider: in addition to randomly selecting experimental units for inclusion in the study and randomly allocating those units to treatments, one should also randomly order the laboratory analyses of the units after the experiment is over.

For example, in the study of the height of plants as affected by light regime, the scientist should measure the plants in random order rather than measure plants from the same treatment sequentially. Subtle changes in the way the measurements are done could be occurring that might influence the results.

Remedial Measures

Many different methods:
1. change the model to account for the non-independence
2. change the model to account for the unequal variance
3. transform the data to address unequal variance and non-normality
4. use a non-parametric test for severely non-normal data
   a. Kruskal–Wallis test
   b. Bootstrapping

Estimation in a One-Way ANOVA

Once we have rejected the null hypothesis that all means are equal, and we have checked the assumptions of the testing procedure, we usually wish to do some specific tests that can elucidate the relationships among the means. These tests are variously called multiple comparisons, contrasts, or estimation of linear combinations of means.

A priori hypotheses: hypotheses about population means that are decided upon during the planning of the experiment.
They are the reason for performing the experiment!

A posteriori hypotheses: hypotheses generated by looking at the data after the experiment has been performed. Also called data snooping or data dredging. This is almost ALWAYS inappropriate and to be avoided. The only valid reason for doing so is as an exploratory analysis that will guide future experimentation.

Example (a posteriori testing): suppose a one-way ANOVA is performed and the results are obtained. The analyst looks over the results and decides to test 2 means because they appear to be very different. Now, the apparent effect could be due to a real difference in population means or to random sampling variation that makes the means appear different. Investigating only comparisons for which the effect appears large implies that the true confidence level for a conclusion is lower than the stated confidence level when there is no difference. In other words, you are more likely to reject H0: no difference. It can be shown that the actual confidence is 60% (!!!!) when 6 levels are used in an experiment and the statistical analysis always includes testing the difference between the largest and smallest means at a stated 95% confidence (note that these need not be the same treatment means each time).

There are times when it is possible to do a posteriori testing – BUT the statistical method needs to be modified appropriately to account for the data snooping (see later).

1) Estimation Of A Treatment Mean

The population mean for the ith treatment, μ_i, is estimated using the sample mean μ̂_i = ȳ_i· with a standard error of

   SE(ȳ_i·) = √(MSE/n_i)

Under our assumptions of normality and random sampling, the (1 − α)100% confidence interval for the population mean is

   ȳ_i· ± t_{α/2, N−t} · SE(ȳ_i·)

where t_{α/2, N−t} is the critical value for the upper tail of a t-distribution on N − t df.
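The single-mean quantities can be cross-checked against the SAS printout for the safelight data. Below is a minimal pure-Python sketch (variable names such as `groups` and `t_crit` are ours, not from the notes) that recomputes ȳ_i·, MSE, SE(ȳ_i·), and a 95% confidence interval for each treatment mean:

```python
# Illustrative cross-check of the one-way ANOVA estimation formulas
# using the safelight plant-height data from the SAS printout.
from math import sqrt

groups = {  # plant heights by treatment
    "D":  [32.94, 35.98, 34.76, 32.40],
    "AL": [30.55, 32.64, 32.37, 32.04],
    "AH": [31.23, 31.09, 30.62, 30.42],
    "BL": [34.41, 34.88, 34.07, 33.87],
    "BH": [35.61, 35.00, 33.65, 32.91],
}

means = {t: sum(y) / len(y) for t, y in groups.items()}   # ybar_i.
sse = sum((yij - means[t]) ** 2                           # sum of squared residuals
          for t, y in groups.items() for yij in y)
N = sum(len(y) for y in groups.values())                  # 20 observations
t_levels = len(groups)                                    # 5 treatments
mse = sse / (N - t_levels)                                # error df = 15
se = sqrt(mse / 4)                                        # n_i = 4 for every treatment

t_crit = 2.131  # t(alpha/2 = 0.025, 15 df), from a t-table
for tmt, m in means.items():
    lo, hi = m - t_crit * se, m + t_crit * se
    print(f"{tmt}: mean = {m:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The computed standard error matches the StdErrPred column (0.52244) because every treatment has the same sample size, n_i = 4.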
Hypothesis testing is done using a t-test, as is usual for a single population mean.

2) Estimation Of The Difference Between 2 Treatment Means

The unbiased estimator of the difference between 2 population means
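The notes break off mid-sentence here. As a hedged sketch of the standard approach that this section introduces (standard one-way ANOVA theory, not taken from these notes; the function name `diff_and_se` is ours), the difference μ_i − μ_j is estimated by ȳ_i· − ȳ_j· with standard error √(MSE·(1/n_i + 1/n_j)):

```python
# Hedged sketch (standard theory; the handout is cut off before this point):
# estimate mu_i - mu_j by ybar_i. - ybar_j., with
# SE = sqrt(MSE * (1/n_i + 1/n_j)), then t-test on N - t df.
from math import sqrt

def diff_and_se(yi, yj, mse):
    """Difference of two sample means and its standard error."""
    d = sum(yi) / len(yi) - sum(yj) / len(yj)
    se = sqrt(mse * (1 / len(yi) + 1 / len(yj)))
    return d, se

# Safelight example: treatments D and AL; MSE comes from the full 5-group fit
d, se = diff_and_se([32.94, 35.98, 34.76, 32.40],
                    [30.55, 32.64, 32.37, 32.04], mse=1.0918)
t_stat = d / se  # compare against t(0.025, 15 df) = 2.131
```

Note that the MSE (and its N − t error degrees of freedom) is pooled over all treatments, not just the two being compared; this is what distinguishes the ANOVA comparison from an ordinary two-sample t-test.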