ANOVA and linear regression
July 15, 2004

Slides in this deck:
- ANOVA for comparing means between more than 2 groups
- ANOVA (ANalysis Of VAriance)
- ANOVA Assumptions
- The “F-test”
- Group means and standard deviations
- The F-Test
- The F-distribution
- ANOVA Table
- ANOVA = t-test
- ANOVA summary
- Question: Why not just do 3 pairwise t-tests?
- Multiple comparisons
- Correction for multiple comparisons
- Non-parametric ANOVA
- Linear regression
- Outline
- Review: what is “Linear”?
- Review: what’s slope?
- Example
- Birth-weight depends on gestation time (hypothetical data)
- Linear regression equation:
- Prediction
- At 30 weeks…
- And, if X=20, 30, or 40…
- Mean values fall on the line
- Assumptions (or the fine print)
- Non-homogeneous variance
- A t-test is linear regression!
- Multiple Linear Regression
- ANOVA is linear regression!
- Example: ANOVA = linear regression
- Functions of multivariate analysis:
- Multiple linear regression caveats
- Other types of multivariate regression
- Reading for this week
- Note: Midterm next week

ANOVA for comparing means between more than 2 groups

ANOVA (ANalysis Of VAriance)
- Idea: For two or more groups, test the difference between means, for quantitative, normally distributed variables.
- Just an extension of the t-test (an ANOVA with only two groups is mathematically equivalent to a t-test).
- Like the t-test, ANOVA is a “parametric” test: it assumes that the outcome variable is roughly normally distributed, with a mean and standard deviation (parameters) that we can estimate.

ANOVA Assumptions
- Normally distributed outcome variable
- Homogeneity of variances (like the t-test)

The “F-test”

$$F = \frac{\text{Variability between groups}}{\text{Variability within groups}}$$

Is the difference in the means of the groups more than background noise (= variability within groups)?

[Figure: Spine bone density vs. menstrual regularity. Spine BMD (vertical axis, 0.7 to 1.2) for the amenorrheic, oligomenorrheic, and eumenorrheic groups, annotated to show the between-group variation and the within-group variability of each group.]

Group means and standard deviations
- Amenorrheic group (n=11):
  – Mean spine BMD = .92 g/cm²
  – Standard deviation = .10 g/cm²
- Oligomenorrheic group (n=11):
  – Mean spine BMD = .94 g/cm²
  – Standard deviation = .08 g/cm²
- Eumenorrheic group (n=11):
  – Mean spine BMD = 1.06 g/cm²
  – Standard deviation = .11 g/cm²

The F-Test

$$s^2_{between} = n\,s^2_{\bar{x}} = 11 \times \frac{(.92-.97)^2 + (.94-.97)^2 + (1.06-.97)^2}{3-1} = .063$$

- The factor 11 is the size of the groups.
- Each term in the numerator is the difference of a group’s mean from the overall mean (.97).
- The result is the between-group variation.

$$s^2_{within} = \text{avg}\ s^2 = \frac{1}{3}\left(.10^2 + .08^2 + .11^2\right) = .0095$$

- Each squared term is one group’s variance.
- The result is the average amount of variation within groups.

$$F_{2,30} = \frac{s^2_{between}}{s^2_{within}} = \frac{.063}{.0095} = 6.6$$

A large F value indicates that the between-group variation exceeds the within-group variation (= the background noise).

The F-distribution
- The F-distribution is a continuous probability distribution that depends on two parameters, n and m (the numerator and denominator degrees of freedom, respectively).

The F-distribution
- A ratio of sample variances follows an F-distribution.

$$H_0: \sigma^2_{between} = \sigma^2_{within} \qquad H_a: \sigma^2_{between} \neq \sigma^2_{within}$$

- The F-test tests the hypothesis that two variances are equal.
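The arithmetic on the F-Test slide above can be reproduced from the three group means and standard deviations alone. What follows is a minimal sketch, assuming equal group sizes (n = 11 per group, as on the slides); the function name `f_from_summary` and the variable names are illustrative and not part of the lecture. It uses the unrounded grand mean rather than .97, so intermediate values may differ slightly from the hand calculation.

```python
from statistics import mean, variance


def f_from_summary(group_means, group_sds, n):
    """One-way ANOVA F statistic from per-group summaries (equal group size n)."""
    k = len(group_means)
    # Between-group variance: group size times the sample variance of the group means.
    s2_between = n * variance(group_means)
    # Within-group variance: the average of the group variances (valid for equal n).
    s2_within = mean([sd ** 2 for sd in group_sds])
    df_between, df_within = k - 1, k * (n - 1)
    return s2_between / s2_within, s2_between, s2_within, (df_between, df_within)


# Spine BMD (g/cm^2) for the amenorrheic, oligomenorrheic, and eumenorrheic groups.
means = [0.92, 0.94, 1.06]
sds = [0.10, 0.08, 0.11]

f_stat, s2_between, s2_within, dfs = f_from_summary(means, sds, n=11)
print(f"F{dfs} = {f_stat:.2f} (between = {s2_between:.4f}, within = {s2_within:.4f})")
```

A p-value would then come from the upper tail of the F distribution with 2 and 30 degrees of freedom, for example via `scipy.stats.f.sf` if SciPy is available.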
- F will be close to 1 if the two variances are equal.

$$\frac{\sigma^2_{between}}{\sigma^2_{within}} \sim F_{n,m}$$

ANOVA Table

| Source of variation | d.f. | Sum of squares | Mean sum of squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Between (k groups) | k-1 | SSB (sum of squared deviations of group means from grand mean) | SSB/(k-1) | $\frac{SSB/(k-1)}{SSW/(nk-k)}$ | Go to $F_{k-1,\,nk-k}$ chart |
| Within (n individuals per group) | nk-k | SSW (sum of squared deviations of observations from their group mean) | s² = SSW/(nk-k) | | |
| Total variation | nk-1 | TSS (sum of squared deviations of observations from grand mean) | | | |

TSS = SSB + SSW

ANOVA = t-test

| Source of variation | d.f. | Sum of squares | Mean sum of squares | F-statistic | p-value |
|---|---|---|---|---|---|
| Between (2 groups) | 1 | SSB (proportional to the squared difference in means: $n(\bar{X}-\bar{Y})^2/2$) | Squared difference in means (same as SSB, since d.f. = 1) | $\frac{(\bar{X}-\bar{Y})^2}{s_p^2\left(\frac{1}{n}+\frac{1}{n}\right)} = \left(\frac{\bar{X}-\bar{Y}}{s_p\sqrt{2/n}}\right)^2 = (t_{2n-2})^2$ | Go to $F_{1,\,2n-2}$ chart; notice values are just $(t_{2n-2})^2$ |
| Within | 2n-2 | SSW (equivalent to the numerator of the pooled variance) | Pooled variance $s_p^2$ | | |
| Total variation | 2n-1 | TSS | | | |

ANOVA summary
- A statistically significant ANOVA (F-test) only tells you that at least two of the groups differ, but not which ones differ.
- Determining which groups differ (when it’s unclear) requires more sophisticated analyses to correct for the problem of multiple comparisons…

Question: Why not just do 3 pairwise t-tests?
- Answer: Because, at an error rate of 5% per test, you have an overall chance of up to $1-(.95)^3 = 14\%$ of making a type-I error (if all 3 comparisons were independent).
- If you wanted to compare 6 groups, you’d have to do ${}_6C_2 = 15$ pairwise t-tests, which would give you a high chance of finding something significant just by chance (if all tests were independent with a type-I error rate of 5% each): probability of at least one type-I error = $1-(.95)^{15} = 54\%$.

Multiple comparisons
- With 18 independent comparisons, we have a 60% chance of at least 1 false positive.

Multiple comparisons
- With 18 independent comparisons, we expect about 1 false positive.

Correction for multiple comparisons
How to correct for multiple comparisons post-hoc…
- Bonferroni’s correction (adjusts the p-value cutoff by the most conservative amount; assuming all tests independent, divide the significance cutoff, e.g. .05, by the number of tests)
- Holm/Hochberg (gives p-cutoff beyond which not significant)
- Tukey’s (adjusts p)
- Scheffe’s (adjusts p)

Non-parametric ANOVA
- Kruskal-Wallis one-way ANOVA
- Extension of the Wilcoxon Rank-Sum test for 2 groups; based on ranks
- Proc NPAR1WAY in SAS

Linear regression

Outline
1. Simple linear regression and prediction
2. Multiple linear regression and multivariate analysis
3. Dummy coding categorical predictors

Review: what is “Linear”?
- Remember this: Y = mX + B? (m is the slope of the line and B is its intercept.)

Review: what’s slope?
- A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Example
What’s the relationship between gestation time and
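Returning to the multiple-comparisons slides earlier in the deck, the quoted error rates (14%, 54%, 60%) all follow from one formula: with m independent tests at level α, the chance of at least one type-I error is 1 − (1 − α)^m. Here is a minimal sketch of that arithmetic, assuming independent tests as the slides do; the function names `familywise_error` and `bonferroni_cutoff` are illustrative and not from the lecture.

```python
from math import comb


def familywise_error(num_tests, alpha=0.05):
    """Chance of at least one type-I error across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_cutoff(num_tests, alpha=0.05):
    """Per-test significance cutoff under Bonferroni's correction."""
    return alpha / num_tests


for groups in (3, 6):
    tests = comb(groups, 2)  # number of pairwise comparisons among the groups
    print(groups, "groups:", tests, "tests, familywise error =",
          round(familywise_error(tests), 2))

print("18 tests: familywise error =", round(familywise_error(18), 2),
      "expected false positives =", round(18 * 0.05, 1))
print("Bonferroni cutoff for 3 tests =", round(bonferroni_cutoff(3), 4))
```

Running each of the 3 pairwise tests at the Bonferroni cutoff instead of .05 keeps the overall chance of a false positive just under 5% in this independent-tests setting.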