
ANOVA and linear regression
July 15, 2004

ANOVA for comparing means between more than 2 groups

ANOVA (ANalysis Of VAriance)
Idea: For two or more groups, test the difference between means, for quantitative, normally distributed variables.
ANOVA is just an extension of the t-test (an ANOVA with only two groups is mathematically equivalent to a t-test).
Like the t-test, ANOVA is a "parametric" test: it assumes that the outcome variable is roughly normally distributed, with a mean and standard deviation (parameters) that we can estimate.

ANOVA Assumptions
Assumptions: normally distributed outcome variable; homogeneity of variances (like the t-test).

The “F-test”
$$F = \frac{\text{variability between groups}}{\text{variability within groups}}$$
Is the difference in the means of the groups more than background noise (= variability within groups)?

[Figure: Spine bone density vs. menstrual regularity. Spine BMD (SPINE) for the amenorrheic, oligomenorrheic, and eumenorrheic groups, annotated with the between-group variation and the within-group variability.]

Group means and standard deviations
Amenorrheic group (n=11):
– Mean spine BMD = .92 g/cm²
– Standard deviation = .10 g/cm²
Oligomenorrheic group (n=11):
– Mean spine BMD = .94 g/cm²
– Standard deviation = .08 g/cm²
Eumenorrheic group (n=11):
– Mean spine BMD = 1.06 g/cm²
– Standard deviation = .11 g/cm²

The F-Test
$$s^2_{between} = n\,s^2_{\bar{x}} = 11 \times \frac{(.92-.97)^2 + (.94-.97)^2 + (1.06-.97)^2}{3-1} = .063$$
$$s^2_{within} = \text{avg } s^2 = \frac{1}{3}\left(.10^2 + .08^2 + .11^2\right) = .0095$$
$$F_{2,30} = \frac{s^2_{between}}{s^2_{within}} = \frac{.063}{.0095} = 6.6$$
where:
– n = 11 is the size of each group;
– each squared term is the difference of a group's mean from the overall mean (.97), squared;
– s²_between is the between-group variation;
– s²_within is the average amount of variation within groups;
– the degrees of freedom are k-1 = 3-1 = 2 and nk-k = 33-3 = 30.
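The F-test arithmetic above can be reproduced from the slide's summary statistics. This is a minimal sketch that assumes equal group sizes (n = 11), as on the slide. It also adds a p-value using the closed-form upper tail that the F-distribution has when the numerator has exactly 2 degrees of freedom, P(F > x) = (1 + 2x/m)^(-m/2); the lecture itself reads the p-value from an F chart instead, and the variable names are illustrative.

```python
# Summary statistics from the slide: spine BMD (g/cm^2) by menstrual-regularity group.
groups = {
    "amenorrheic": (0.92, 0.10),      # (mean, standard deviation)
    "oligomenorrheic": (0.94, 0.08),
    "eumenorrheic": (1.06, 0.11),
}
n = 11               # subjects per group (equal group sizes, as on the slide)
k = len(groups)      # number of groups

means = [m for m, _ in groups.values()]
sds = [s for _, s in groups.values()]
grand_mean = sum(means) / k  # with equal n, the grand mean is the mean of the group means

# Between-group variance: n times the variance of the group means (k - 1 d.f.).
s2_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)

# Within-group variance: the average of the group variances (valid for equal n).
s2_within = sum(s ** 2 for s in sds) / k

F = s2_between / s2_within
df_num, df_den = k - 1, n * k - k

# Closed-form upper tail, valid only when the numerator d.f. is exactly 2:
# P(F_{2,m} > x) = (1 + 2x/m)^(-m/2)
assert df_num == 2
p_value = (1 + 2 * F / df_den) ** (-df_den / 2)

print(f"s2_between = {s2_between:.4f}, s2_within = {s2_within:.4f}")
print(f"F({df_num}, {df_den}) = {F:.2f}, p = {p_value:.4f}")
```

Run as-is, it should report F ≈ 6.64 on (2, 30) d.f. and p ≈ 0.004; the slide's 6.6 comes from rounding intermediate values. For other numerator d.f. there is no such simple closed form, and a statistics library (for example SciPy's `scipy.stats.f.sf`) would be needed.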
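Since s²_within averages each group's variance, it measures the background noise. A large F value indicates that the between-group variation exceeds the within-group variation (= the background noise).

The F-distribution
The F-distribution is a continuous probability distribution that depends on two parameters, n and m (the numerator and denominator degrees of freedom, respectively).

Under the null hypothesis, a ratio of sample variances follows an F-distribution:
$$H_0: \sigma^2_{between} = \sigma^2_{within} \qquad H_a: \sigma^2_{between} > \sigma^2_{within}$$
The F-test tests the hypothesis that two variances are equal; F will be close to 1 if they are equal.
$$\frac{s^2_{between}}{s^2_{within}} \sim F_{n,m}$$

ANOVA Table
– Between (k groups): d.f. = k-1; sum of squares SSB (sum of squared deviations of the group means from the grand mean, counted once for each observation, i.e., n times per group); mean sum of squares = SSB/(k-1); F-statistic = $\frac{SSB/(k-1)}{SSW/(nk-k)}$; p-value: go to the F(k-1, nk-k) chart.
– Within (n individuals per group): d.f. = nk-k; sum of squares SSW (sum of squared deviations of observations from their group mean); mean sum of squares s² = SSW/(nk-k).
– Total variation: d.f. = nk-1; sum of squares TSS (sum of squared deviations of observations from the grand mean).
TSS = SSB + SSW

ANOVA = t-test
With two groups (X and Y) of n observations each and pooled variance $s_p^2$:
$$t_{2n-2}^2 = \frac{(\bar{X}-\bar{Y})^2}{\frac{s_p^2}{n}+\frac{s_p^2}{n}} = \frac{n(\bar{X}-\bar{Y})^2/2}{s_p^2} = F_{1,2n-2}$$
The ANOVA table for two groups:
– Between (2 groups): d.f. = 1; sum of squares SSB = $n(\bar{X}-\bar{Y})^2/2$, a multiple of the squared difference in means; mean sum of squares = SSB/1, again a multiple of the squared difference in means; p-value: go to the F(1, 2n-2) chart, and notice the values are just $(t_{2n-2})^2$.
– Total variation: d.f. = 2n-1; sum of squares TSS.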
– Within: d.f. = 2n-2; sum of squares SSW, which is equivalent to the numerator of the pooled variance; mean sum of squares = SSW/(2n-2) = the pooled variance $s_p^2$.

ANOVA summary
A statistically significant ANOVA (F-test) only tells you that at least two of the groups differ, not which ones differ.
Determining which groups differ (when it's unclear) requires more sophisticated analyses that correct for the problem of multiple comparisons.

Question: Why not just do 3 pairwise t-tests?
Answer: because, at an error rate of 5% for each test, you have an overall chance of up to 1-(.95)^3 = 14% of making a type-I error (if all 3 comparisons were independent).
If you wanted to compare 6 groups, you'd have to do 6C2 = 15 pairwise t-tests, which would give you a high chance of finding something significant just by chance (if all tests were independent with a type-I error rate of 5% each): the probability of at least one type-I error = 1-(.95)^15 = 54%.

Multiple comparisons
With 18 independent comparisons, we have a 60% chance of at least 1 false positive.
With 18 independent comparisons, we expect about 1 false positive (18 × .05 = 0.9).

Correction for multiple comparisons
How to correct for multiple comparisons post hoc:
– Bonferroni's correction (adjusts p by the most conservative amount: divide the significance cutoff by the number of tests, or equivalently multiply each p-value by the number of tests; it is valid whether or not the tests are independent)
– Holm/Hochberg (gives a p-cutoff beyond which results are not significant)
– Tukey's (adjusts p)
– Scheffé's (adjusts p)

Non-parametric ANOVA
Kruskal-Wallis one-way ANOVA: an extension of the Wilcoxon rank-sum test for 2 groups; based on ranks. Proc NPAR1WAY in SAS.

Linear regression
Outline
1. Simple linear regression and prediction
2. Multiple linear regression and multivariate analysis
3. Dummy coding categorical predictors

Review: what is “Linear”?
Remember this: Y = mX + B?
[Figure: a straight line with its slope m and intercept B labeled.]

Review: what’s slope?
A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Example
What’s the relationship between gestation time and birth weight?
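Returning to the ANOVA table: the identity TSS = SSB + SSW and, for two groups, the equivalence F = t² can be checked numerically. This is a minimal sketch using only Python's standard library; the helper names `one_way_anova` and `pooled_t` are hypothetical, and the data are made-up illustration values, not the lecture's data. It allows unequal group sizes, so the within d.f. is written as (total observations - k), which equals nk-k when every group has n observations.

```python
import math
import statistics


def one_way_anova(groups):
    """Return SSB, SSW, TSS, F and its d.f. for a list of samples (lists of floats)."""
    all_obs = [y for g in groups for y in g]
    grand_mean = statistics.fmean(all_obs)
    means = [statistics.fmean(g) for g in groups]
    k, total_n = len(groups), len(all_obs)
    # SSB: each observation contributes (its group mean - grand mean)^2.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSW: squared deviations of observations from their own group mean.
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # TSS: squared deviations of observations from the grand mean.
    tss = sum((y - grand_mean) ** 2 for y in all_obs)
    df_between, df_within = k - 1, total_n - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, tss, f_stat, df_between, df_within


def pooled_t(x, y):
    """Two-sample t statistic using the pooled (equal-variance) standard error."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 / nx + sp2 / ny)


# Hypothetical measurements (not from the lecture): two groups of n = 5.
x = [0.91, 0.85, 1.02, 0.95, 0.88]
y = [1.04, 1.10, 0.99, 1.12, 1.05]

ssb, ssw, tss, f_stat, df_b, df_w = one_way_anova([x, y])
t = pooled_t(x, y)
print(f"TSS = {tss:.5f}; SSB + SSW = {ssb + ssw:.5f}")
print(f"F({df_b}, {df_w}) = {f_stat:.4f}; t^2 = {t ** 2:.4f}")
```

Both printed pairs should agree up to floating-point rounding, with F on (1, 8) d.f. for two groups of 5. Passing three or more groups to `one_way_anova` gives the general k-group table.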


Stanford STATS 210 - Lecture 8
