Lecture 13

Outline:
• ANOVA = Analysis of Variance
• Examples
• Inference for One-Way ANOVA
• Nematodes and plant growth
• The Model
• The ANOVA F-test
• Checking our assumptions
• Do nematodes affect plant growth?
• Smoking influence on sleep
• Using Table E
• Yogurt preparation and taste
• Computation details
• Note: Two-sample t-test and ANOVA
• R-first indication
• Multiple comparisons
• The Bonferroni procedure
• R pairwise test procedure
• Contrasts
• Nematodes and plant growth
• Nematodes: planned comparison
• Relaxing the variance assumption

One-Way ANOVA (Ch 12)

• Recall:
  ◦ A categorical variable = factor.
  ◦ Its values = levels.
• ANOVA in general studies the effect of categorical variables on a quantitative variable (the response).
• One-way = only one factor, with several levels.
• This is similar to testing whether two population means are equal, except that we have more than two populations.

Example 1: Number of days for healing a standard wound (in an animal) under several treatments.
Example 2: Wages of different ethnic groups in a company.
Example 3: Lifetimes of different brands of tires.

• If comparing the means of two groups, ANOVA is equivalent to a two-sample (two-sided) pooled t-test.
• ANOVA allows for 3 or more groups.

• We first examine the multiple populations or multiple treatments to test for overall statistical significance, as evidence of any difference among the parameters we want to compare. → ANOVA F-test
• If that overall test shows statistical significance, then a detailed follow-up analysis is legitimate.
  ◦ If we planned our experiment with specific alternative hypotheses in mind (before gathering the data), we can test them using contrasts.
  ◦ If we do not have specific alternatives, we can examine all pairwise parameter comparisons to determine which parameters differ from which, using multiple comparisons procedures.

Do nematodes affect plant growth?
A botanist prepares 16 identical planting pots and adds different numbers of nematodes to the pots. Seedling growth (in mm) is recorded two weeks later.

Nematodes and plant growth

Nematodes    Seedling growth (mm)         Mean $\bar{x}_i$
0            10.8    9.1   13.5    9.2    10.65
1,000        11.1   11.1    8.2   11.3    10.43
5,000         5.4    4.6    7.4    5.0     5.6
10,000        5.8    5.3    3.2    7.5     5.45
Overall mean: 8.03

Hypotheses: all $\mu_i$ are the same ($H_0$) versus not all $\mu_i$ are the same ($H_a$).

Random sampling always produces chance variations. Any “factor effect” would thus show up in our data as the factor-driven differences plus chance variations (“error”):

Data = fit (“factor/groups”) + residual (“error”)

The one-way ANOVA model analyses situations where chance variations are normally distributed $N(0, \sigma)$, so that:
• We have $I$ independent SRSs, from $I$ populations or treatments.
• The $i$th population has a normal distribution with unknown mean $\mu_i$.
• All $I$ populations have the same standard deviation $\sigma$, unknown.

The ANOVA F statistic tests:

$H_0$: $\mu_1 = \mu_2 = \dots = \mu_I$
$H_a$: not all the $\mu_i$ are equal.

$$F = \frac{\mathrm{SSG}/(I-1)}{\mathrm{SSE}/(N-I)}$$

When $H_0$ is true, F has the F distribution with $I - 1$ (numerator) and $N - I$ (denominator) degrees of freedom.

The ANOVA F-statistic compares variation due to specific sources (levels of the factor) with variation among individuals who should be similar (individuals in the same sample):

$$F = \frac{\text{variation among sample means}}{\text{variation among individuals in the same sample}}$$

• Difference in means small relative to overall variability → F tends to be small
• Difference in means large relative to overall variability → F tends to be large

Larger F-values typically yield more significant results. How large depends on the degrees of freedom ($I - 1$ and $N - I$).

The ANOVA F-test requires that all populations have the same standard deviation $\sigma$. Since $\sigma$ is unknown, this can be hard to check.
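As a rough illustration of the F statistic defined above, the following Python sketch computes SSG, SSE, and F for the nematode data in the table. The function name `one_way_anova_f` and its return layout are hypothetical choices made for this example, not part of the lecture. Only the standard library is used, so no p-value is computed here.

```python
from statistics import mean

# Seedling growth (mm) per nematode count, copied from the table above.
growth = {
    0: [10.8, 9.1, 13.5, 9.2],
    1000: [11.1, 11.1, 8.2, 11.3],
    5000: [5.4, 4.6, 7.4, 5.0],
    10000: [5.8, 5.3, 3.2, 7.5],
}


def one_way_anova_f(groups):
    """Return (SSG, SSE, F) for a list of samples (hypothetical helper)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    n_total = len(all_values)   # N, total number of observations
    n_groups = len(groups)      # I, number of groups
    # Variation among sample means, weighted by group size.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation among individuals within the same sample.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msg = ssg / (n_groups - 1)
    mse = sse / (n_total - n_groups)
    return ssg, sse, msg / mse


ssg, sse, f_stat = one_way_anova_f(list(growth.values()))
print(f"SSG = {ssg:.2f}, SSE = {sse:.2f}, F = {f_stat:.2f}")
```

Run as written, this should give F ≈ 12.08 on 3 and 12 degrees of freedom. Whether that value is "large" still has to be judged against the F distribution with those degrees of freedom.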
Practically: The results of the ANOVA F-test are approximately correct when the largest sample standard deviation is no more than twice as large as the smallest sample standard deviation. (Equal sample sizes also make ANOVA more robust to deviations from the equal-$\sigma$ rule.)

Each of the $I$ populations must be normally distributed (check with histograms or normal quantile plots). But the test is robust to deviations from normality for large enough sample sizes, thanks to the central limit theorem.

Group               Seedling growth (mm)         $\bar{x}_i$   $s_i$
0 nematodes         10.8    9.1   13.5    9.2    10.65         2.053
1,000 nematodes     11.1   11.1    8.2   11.3    10.425        1.486
5,000 nematodes      5.4    4.6    7.4    5.0     5.6          1.244
10,000 nematodes     5.8    5.3    3.2    7.5     5.45         1.771

Conditions required:
• Equal variances: check that the largest $s_i$ is no more than twice the smallest $s_i$. Largest $s_i$ = 2.053; smallest $s_i$ = 1.244. The ratio is 2.053/1.244 ≈ 1.65, which is below 2, so the condition is met.
• Independent SRSs: the four groups are obviously independent.
• Distributions “roughly” normal: it is hard to assess normality with only four points per condition. But the pots in each group are identical, and there is no reason to suspect skewed distributions.

A study of the effect of smoking classifies subjects as nonsmokers, moderate smokers, and heavy smokers. The investigators interview a random sample of 200 people in each group and ask “How many hours do you sleep on a typical night?”

1. Study design?
2. Hypotheses?
3. ANOVA assumptions?
4. Degrees of freedom?

1. This is an observational study.
   Explanatory variable: smoking, with 3 levels (nonsmokers, moderate smokers, heavy smokers).
   Response variable: number of hours of sleep per night.
2. $H_0$: all 3 $\mu_i$ are equal (versus not all equal).
3. Three obviously independent SRSs. A sample size of 200 should accommodate any departure from normality. It would still be good to compare the smallest and largest sample standard deviations ($s_{\min}$ and $s_{\max}$).
4.
$I = 3$, $n_1 = n_2 = n_3 = 200$, and $N = 600$, so there are $I - 1 = 2$ (numerator) and $N - I = 597$ (denominator) degrees of freedom.

The ANOVA table

Source of variation | Sum of squares (SS) | DF | Mean square (MS) | F | P-value | F crit
Among or between “groups” | SSG = $\sum_i n_i(\bar{x}_i - \bar{x})^2$ | $I - 1$ | SSG/DFG | MSG/MSE | Tail area above F | Value of F for α
Within groups or “error” | SSE = $\sum_i (n_i - 1)s_i^2$ | $N - I$ | SSE/DFE | | |
Total | SST = SSG + SSE = $\sum_{i,j}(x_{ij} - \bar{x})^2$ | $N - 1$ | | | |

$R^2$ = SSG/SST: coefficient of determination
$\sqrt{\mathrm{MSE}} = s_p$: pooled standard deviation

The sums of squares represent variation in the data: SST = SSG + SSE. The degrees of freedom likewise reflect the ANOVA model: DFT = DFG + DFE.

Data (“Total”) = fit (“Groups”) + residual (“Error”)

Here, the calculated F-value (12.08) is larger than the critical F-value (3.49) for α = 0.05 (or just look at the p-value directly). Thus, the test is significant at α = 5% → Not all mean seedling lengths are the same; nematode amount is an influential factor.

The F distribution is asymmetrical and has two distinct degrees of freedom. This was discovered
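The entries of the ANOVA table can be reproduced for the nematode example with a short Python sketch. It assumes the four samples listed in the lecture's data table. The helper name `anova_table` and the dictionary keys are hypothetical, chosen only for this illustration. SSE is computed from the sample standard deviations, as in the table's within-groups formula, and the critical value and p-value are left out because the standard library has no F distribution.

```python
from math import isclose, sqrt
from statistics import mean, stdev

# Nematode data from the lecture example (seedling growth in mm).
groups = [
    [10.8, 9.1, 13.5, 9.2],   # 0 nematodes
    [11.1, 11.1, 8.2, 11.3],  # 1,000 nematodes
    [5.4, 4.6, 7.4, 5.0],     # 5,000 nematodes
    [5.8, 5.3, 3.2, 7.5],     # 10,000 nematodes
]


def anova_table(samples):
    """Build one-way ANOVA table entries as a dict (hypothetical helper)."""
    values = [x for s in samples for x in s]
    grand = mean(values)
    dfg = len(samples) - 1             # I - 1
    dfe = len(values) - len(samples)   # N - I
    ssg = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    sse = sum((len(s) - 1) * stdev(s) ** 2 for s in samples)
    sst = sum((x - grand) ** 2 for x in values)
    msg, mse = ssg / dfg, sse / dfe
    return {
        "DFG": dfg, "DFE": dfe, "DFT": dfg + dfe,
        "SSG": ssg, "SSE": sse, "SST": sst,
        "MSG": msg, "MSE": mse, "F": msg / mse,
        "R2": ssg / sst,   # coefficient of determination
        "sp": sqrt(mse),   # pooled standard deviation
    }


table = anova_table(groups)
# The decomposition SST = SSG + SSE should hold up to rounding error.
assert isclose(table["SST"], table["SSG"] + table["SSE"])
for name, value in table.items():
    print(f"{name:>4}: {value:.4g}")
```

Computing SST directly from the raw data, rather than as SSG + SSE, keeps the decomposition as an independent check on the other two sums.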


STEVENS MA 331 - MA 331 Lecture 13
