Chapter 8 continued

Chapter 8: Checking model assumptions

Estimation: Recall
a. Predicted Values or LSMEANS (“Least Squares Means”)
   i. The best estimators of the treatment means $\mu_i$ are the sample means $\bar{y}_{i\cdot}$.
   ii. The standard error of the mean estimator $\bar{y}_{i\cdot}$ is estimated by $SE(\bar{y}_{i\cdot}) = \sqrt{MSE / n_i}$.
b. Residuals
   i. The estimators of the error terms $\varepsilon_{ij}$ are the residuals under the full model, $e_{ij} = y_{ij} - \bar{y}_{i\cdot}$.
   ii. The residuals always sum to 0, and their standard deviation is estimated by $\sqrt{MSE}$ (the “Root Mean Squared Error”).
   iii. Under the assumptions, the residuals have a Normal distribution with mean 0 and constant variance $\sigma_\varepsilon^2$.

EXAMPLE: comparison of the effects of safelights on plant height
(Predicted = treatment mean $\bar{y}_{i\cdot}$; Residual = Height - Predicted)

Treatment  Height  Predicted  Residual
D          32.94   34.02      -1.08
D          35.98   34.02       1.96
D          34.76   34.02       0.74
D          32.40   34.02      -1.62
AL         30.55   31.90      -1.35
AL         32.64   31.90       0.74
AL         32.37   31.90       0.47
AL         32.04   31.90       0.14
AH         31.23   30.84       0.39
AH         31.09   30.84       0.25
AH         30.62   30.84      -0.22
AH         30.42   30.84      -0.42
BL         34.41   34.3075     0.1025
BL         34.88   34.3075     0.5725
BL         34.07   34.3075    -0.2375
BL         33.87   34.3075    -0.4375
BH         35.61   34.2925     1.3175
BH         35.00   34.2925     0.7075
BH         33.65   34.2925    -0.6425
BH         32.91   34.2925    -1.3825

Checking the Assumptions of the Model
a. Constant Variance
   i. Graphically: make box plots of the residuals for each treatment and look for similar variability.
      [Figures: Plot of Residual Height by Treatment; Box Plots of Residual Height by Treatment (y-axis: Residual Height; x-axis: Treatment AH, AL, BH, BL, D)]
   ii. Test the equality of the treatment variances, $H_0: \sigma_i^2 = \sigma^2$ for $i = 1, 2, \ldots, t$, using Levene’s test or a similar test.
   iii. The equal-variances assumption is less critical when the sample sizes are equal. Remember this when designing your experiments.
b. Normality
   i. Graphically:
      1. Make a stem-and-leaf plot, a histogram, or something similar of the residuals to check the shape of the distribution and to look for outliers.
      2. Make a normal probability (quantile) plot of the residuals.
         A Normal quantile plot graphs the observed values of the dataset (X-axis) against the expected values of a set of n random selections from a Normal distribution with the mean and variance of the sample data. To interpret: the points fall on a straight line when the data are normally distributed. They should definitely fall between the 95% confidence limits around the straight line (with a slope of 1).
   NOTE: usually, normality is NOT reviewed or tested until after any problems with variance have been corrected.

Example: A study was performed to determine whether the mean weight of migrating warblers varied across different habitats of pine trees and hardwoods (dp, ep, hw, mw). A total of 174 birds were collected and weighed.

Results of the ANOVA:
Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       3         246.6366       82.2122    9.8080     <.0001
Error     170        1424.9719        8.3822
C. Total  173        1671.6085

[Figure: Residual by Predicted Plot (y-axis: weight Residual; x-axis: weight Predicted)]
This plot is useful for checking constant variance and for spotting outliers.

Tests that the Variances are Equal
Test             F Ratio   DFNum   DFDen   Prob > F
O'Brien[.5]       0.2729       3     170     0.8449
Brown-Forsythe    0.6891       3     170     0.5599
Levene            1.0748       3     170     0.3613
Bartlett          0.4077       3       .     0.7475

Conclusion: every p-value exceeds 0.05, so there is insufficient evidence to reject the null hypothesis that the variances are equal.

So, now let’s check for normality:
[Figure: Distribution of Residual weight: histogram with Normal Quantile Plot and fitted Normal(-5e-15, 2.86999) overlay]

Fitted Normal Parameter Estimates
Type         Parameter   Estimate   Lower 95%   Upper 95%
Location     Mu          -0.00000    -0.42944    0.429440
Dispersion   Sigma        2.86999     2.59681    3.207908

Based on the quantile plot and the overlay of a Normal distribution on the histogram, there is not much evidence that the assumption that the error terms are Normally distributed is reasonable.
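The warbler weights themselves are not listed in these notes, so the sketch below applies the same checks to the safelight plant-height data from the Estimation example instead. It is a minimal, standard-library-only Python sketch that computes the LSMEANS, the residuals, MSE and $SE(\bar{y}_{i\cdot}) = \sqrt{MSE/n_i}$, the Levene F ratio (ANOVA on absolute deviations from the treatment means) and the Brown-Forsythe F ratio (ANOVA on absolute deviations from the treatment medians), and the coordinates of a Normal quantile plot. The helper `one_way_f` is a hypothetical name, and the plotting positions $(k - 0.375)/(n + 0.25)$ (Blom's) are an assumption; statistical packages use slightly different ones. Turning the F ratios into p-values needs an F distribution, for example `scipy.stats.levene(*samples, center="mean")` or `center="median"`.

```python
import math
from statistics import NormalDist, mean, median, stdev

# Safelight plant-height data from the Estimation example (4 plants per treatment).
heights = {
    "D": [32.94, 35.98, 34.76, 32.40],
    "AL": [30.55, 32.64, 32.37, 32.04],
    "AH": [31.23, 31.09, 30.62, 30.42],
    "BL": [34.41, 34.88, 34.07, 33.87],
    "BH": [35.61, 35.00, 33.65, 32.91],
}


def one_way_f(groups):
    """Hypothetical helper: one-way ANOVA F ratio for {treatment: [values]}."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(values)
    n, t = len(values), len(groups)
    ss_model = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    ss_error = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    return (ss_model / (t - 1)) / (ss_error / (n - t))


# Predicted values (LSMEANS) and residuals e_ij = y_ij - ybar_i.
lsmeans = {trt: mean(ys) for trt, ys in heights.items()}
residuals = {trt: [y - lsmeans[trt] for y in ys] for trt, ys in heights.items()}

# MSE = SSE / (N - t) and SE(ybar_i.) = sqrt(MSE / n_i).
n_total = sum(len(ys) for ys in heights.values())
t = len(heights)
sse = sum(e ** 2 for es in residuals.values() for e in es)
mse = sse / (n_total - t)
se = {trt: math.sqrt(mse / len(ys)) for trt, ys in heights.items()}

# Levene: ANOVA F on |y - treatment mean|; Brown-Forsythe: on |y - treatment median|.
levene_f = one_way_f({trt: [abs(y - mean(ys)) for y in ys] for trt, ys in heights.items()})
brown_forsythe_f = one_way_f(
    {trt: [abs(y - median(ys)) for y in ys] for trt, ys in heights.items()}
)

# Normal quantile plot: sorted residuals vs. expected quantiles of a Normal
# with the residuals' mean and SD (Blom plotting positions are an assumption).
observed = sorted(e for es in residuals.values() for e in es)
fitted = NormalDist(mean(observed), stdev(observed))
expected = [fitted.inv_cdf((k - 0.375) / (n_total + 0.25)) for k in range(1, n_total + 1)]

for trt in heights:
    print(f"{trt}: LSMEAN={lsmeans[trt]:.4f}  SE={se[trt]:.4f}")
print(f"MSE={mse:.5f}  Levene F={levene_f:.4f}  Brown-Forsythe F={brown_forsythe_f:.4f}")
for exp_q, obs in zip(expected, observed):
    print(f"{exp_q:8.4f}  {obs:8.4f}")  # points near a straight line suggest normality
```

Because every treatment has n_i = 4, all five standard errors are equal here, which is one reason equal sample sizes make the equal-variance assumption less critical.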
A Shapiro-Wilk test of the warbler residuals could also be used to check normality.
c. Independence and random selection/allocation
   i. This is something that is controlled and decided by the scientist when planning and executing the experiment.
   ii. Important points to consider: in addition to randomly selecting experimental units for inclusion in the study and randomly allocating those units to treatments, one should also randomly order the laboratory analyses of the units after the experiment is over.
      For example, in the study of plant height as affected by light regime, the scientist should measure the plants in a random order rather than measuring plants from the same treatment one after another. Subtle changes in the way measurements are made could occur over time and might influence the results.

Remedial Measures
There are many different methods:
1. change the model to account for the non-independence
2. change the model to account for the unequal variance
3. transform the data to address unequal variance and non-normality
4. use a non-parametric test for severely non-normal data
   a. Kruskal-Wallis test
   b.
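For the Kruskal-Wallis test listed under the remedial measures, here is a minimal standard-library Python sketch, applied purely as an illustration to the safelight plant-height data from the Estimation example. It ranks all N observations together, computes $H = \frac{12}{N(N+1)} \sum_i R_i^2 / n_i - 3(N+1)$, where $R_i$ is the rank sum of treatment i, and compares H with a chi-square distribution on t - 1 degrees of freedom. The helper names are hypothetical; the tie correction is omitted (these data have no ties), and the closed-form chi-square tail used here is valid only for an even number of degrees of freedom. With only four plants per treatment the chi-square approximation is rough; `scipy.stats.kruskal(*samples)` is the usual library route and does apply a tie correction.

```python
import math

# Safelight plant-height data from the Estimation example.
heights = {
    "D": [32.94, 35.98, 34.76, 32.40],
    "AL": [30.55, 32.64, 32.37, 32.04],
    "AH": [31.23, 31.09, 30.62, 30.42],
    "BL": [34.41, 34.88, 34.07, 33.87],
    "BH": [35.61, 35.00, 33.65, 32.91],
}


def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # 1-based average rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Hypothetical helper: Kruskal-Wallis H (no tie correction) for {treatment: [values]}."""
    labels = [trt for trt, ys in groups.items() for _ in ys]
    values = [y for ys in groups.values() for y in ys]
    ranks = average_ranks(values)
    n = len(values)
    rank_sums = dict.fromkeys(groups, 0.0)
    for trt, r in zip(labels, ranks):
        rank_sums[trt] += r
    total = sum(rank_sums[trt] ** 2 / len(ys) for trt, ys in groups.items())
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


h = kruskal_wallis_h(heights)
p_value = chi2_sf_even_df(h, len(heights) - 1)  # df = t - 1 = 4
print(f"H = {h:.4f}, approximate p-value = {p_value:.4f}")
```

For these data the rank sums are 13 (AH), 24 (AL), 55 (D), 59 (BH) and 59 (BL), which gives H of about 13.66 on 4 degrees of freedom.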