Basic Analysis of Variance and the General Linear Model
Psy 420
Andrew Ainsworth

Why is it called analysis of variance anyway?
If we are interested in group mean differences, why are we looking at variance?
t-test: only one place to look for variability
More groups, more places to look
Variance of group means around a central tendency (grand mean, ignoring group membership) really tells us, on average, how much each group is different from the central tendency (and each other)
Average variability of the group means around the GM needs to be compared to average variability of scores around each group mean
Variability in any distribution can be broken down into conceptual parts: total variability = (variability of each group mean around the grand mean) + (variability of each person’s score around their group mean)

General Linear Model (GLM)
The basis for most inferential statistics (e.g. 420, 520, 524, etc.)
Simple form of the GLM:
score = grand mean + independent variable + error
$$Y = \mu + \alpha + \varepsilon$$
The basic idea is that everyone in the population has the same score (the grand mean) that is changed by the effects of an independent variable (A) plus just random noise (error)
Some levels of A raise scores from the GM, other levels lower scores from the GM and yet others have no effect.
Error is the “noise” caused by other variables you aren’t measuring, haven’t controlled for or are unaware of.
Error, like A, will have different effects on scores, but this happens independently of A.
If error gets too large it will mask the effects of A and make it impossible to analyze them.
Most of the effort in research design goes into minimizing error, to make sure the effect of A is not “buried” in the noise.
The error term is important because it gives us a “yard stick” with which to measure the variability caused by the A effect. We want to make sure that the variability attributable to A is greater than the naturally occurring variability (error).

GLM
Example of GLM – ANOVA backwards
We can generate a data set using the GLM formula
We start off with every subject at the GM (e.g. $\mu = 5$)

a1                a2
Case  Score       Case  Score
s1    5           s6    5
s2    5           s7    5
s3    5           s8    5
s4    5           s9    5
s5    5           s10   5

Then we add in the effect of A (a1 adds 2 points and a2 subtracts 2 points)

a1                    a2
Case  Score           Case  Score
s1    5 + 2 = 7       s6    5 - 2 = 3
s2    5 + 2 = 7       s7    5 - 2 = 3
s3    5 + 2 = 7       s8    5 - 2 = 3
s4    5 + 2 = 7       s9    5 - 2 = 3
s5    5 + 2 = 7       s10   5 - 2 = 3

$\sum Y_{a_1} = 35$, $\sum Y_{a_2} = 15$
$\sum Y^2_{a_1} = 245$, $\sum Y^2_{a_2} = 45$
$\bar{Y}_{a_1} = 7$, $\bar{Y}_{a_2} = 3$

Changes produced by the treatment represent deviations around the GM
$$n\sum_j (\bar{Y}_j - GM)^2 = n[(7 - 5)^2 + (3 - 5)^2] = 5(2)^2 + 5(-2)^2 \text{ or } 5[(2)^2 + (-2)^2] = 40$$

Now if we add in some random variation (error)

a1                        a2
Case  Score               Case  Score
s1    5 + 2 + 2 = 9       s6    5 - 2 + 0 = 3
s2    5 + 2 + 0 = 7       s7    5 - 2 - 2 = 1
s3    5 + 2 - 1 = 6       s8    5 - 2 + 0 = 3
s4    5 + 2 + 0 = 7       s9    5 - 2 + 1 = 4
s5    5 + 2 - 1 = 6       s10   5 - 2 + 1 = 4

$\sum Y_{a_1} = 35$, $\sum Y_{a_2} = 15$, SUM: $\sum Y = 50$
$\sum Y^2_{a_1} = 251$, $\sum Y^2_{a_2} = 51$, SUM: $\sum Y^2 = 302$
$\bar{Y}_{a_1} = 7$, $\bar{Y}_{a_2} = 3$, $\bar{Y} = 5$

Now if we calculate the variance for each group:
$$s^2_{a_1} = \frac{\sum Y^2 - \frac{(\sum Y)^2}{N}}{N - 1} = \frac{251 - \frac{35^2}{5}}{4} = 1.5$$
$$s^2_{a_2} = \frac{\sum Y^2 - \frac{(\sum Y)^2}{N}}{N - 1} = \frac{51 - \frac{15^2}{5}}{4} = 1.5$$
The average variance in this case is also going to be 1.5, i.e. (1.5 + 1.5) / 2

We can also calculate the total variability in the data regardless of treatment group
$$s^2 = \frac{\sum Y^2 - \frac{(\sum Y)^2}{N}}{N - 1} = \frac{302 - \frac{50^2}{10}}{9} = 5.78$$
The average variability of the two groups is smaller than the total variability.

Analysis – deviation approach
The total variability can be partitioned into between group variability and error.
$$(Y_{ij} - GM) = (Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - GM)$$
If you ignore group membership and calculate the mean of all subjects, this is the grand mean, and total variability is the deviation of all subjects around this grand mean
Remember that raw deviations around a mean sum to zero, so the deviations are squared before summing:
$$\sum_i \sum_j (Y_{ij} - GM)^2 = n\sum_j (\bar{Y}_j - GM)^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_j)^2$$
$$SS_{total} = SS_{bg} + SS_{wg}$$
$$SS_{total} = SS_A + SS_{S/A}$$

A     Score   (Y_ij - GM)^2   (Ybar_j - GM)^2   (Y_ij - Ybar_j)^2
a1    9       16              (7 - 5)^2 = 4     4
      7       4                                 0
      6       1                                 1
      7       4                                 0
      6       1                                 1
a2    3       4               (3 - 5)^2 = 4     0
      1       16                                4
      3       4                                 0
      4       1                                 1
      4       1                                 1
Sum   50      52              8                 12

$\sum Y = 50$, $\sum Y^2 = 302$, $\bar{Y} = 5$
$n(8) = 5(8) = 40$
52 = 40 + 12

Degrees of freedom
DF_total = N - 1 = 10 - 1 = 9
DF_A = a - 1 = 2 - 1 = 1
DF_S/A = a(S - 1) = a(n - 1) = an - a = N - a = 2(5) - 2 = 8

Variance or Mean square
MS_total = 52/9 = 5.78
MS_A = 40/1 = 40
MS_S/A = 12/8 = 1.5

Test statistic
F = MS_A / MS_S/A = 40/1.5 = 26.67
Critical value is looked up with df_A, df_S/A and alpha. The test is always non-directional.

ANOVA summary table

Source   SS   df   MS    F
A        40   1    40    26.67
S/A      12   8    1.5
Total    52   9

Analysis – computational approach
Equations
Under each part of the equations, you divide by the number of scores it took to get the number in the numerator
$$SS_Y = SS_T = \sum Y^2 - \frac{(\sum Y)^2}{N} = \sum Y^2 - \frac{T^2}{an}$$
$$SS_A = \frac{\sum (\sum a_j)^2}{n} - \frac{T^2}{an}$$
$$SS_{S/A} = \sum Y^2 - \frac{\sum (\sum a_j)^2}{n}$$
Here $\sum a_j$ is the sum of the scores in group $j$ and $T$ is the sum of all scores.

Analysis of sample problem
$$SS_T = 302 - \frac{50^2}{10} = 52$$
$$SS_A = \frac{35^2 + 15^2}{5} - \frac{50^2}{10} = 40$$
$$SS_{S/A} = 302 - \frac{35^2 + 15^2}{5} = 12$$

Analysis – regression approach

Levels of A   Cases   Y     X     YX
a1            S1      9     1     9
              S2      7     1     7
              S3      6     1     6
              S4      7     1     7
              S5      6     1     6
a2            S6      3     -1    -3
              S7      1     -1    -1
              S8      3     -1    -3
              S9      4     -1    -4
              S10     4     -1    -4
Sum                   50    0     20
Squares summed        302   10
N                     10
Mean                  5

$$Y = a + bX + e$$
e …
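The worked example can be checked with a short script. What follows is a minimal sketch, not the course's own code. The names `one_way_anova`, `groups`, `EFFECT` and `ERROR` are hypothetical, and the data are the ten scores from the example. The slope and intercept at the end use the ordinary least-squares formulas, which is an assumption about where the truncated regression slides are heading.

```python
# Rebuild the example scores "backwards" from the GLM:
# score = grand mean + effect of A + error (values copied from the example).
GM = 5
EFFECT = {"a1": 2, "a2": -2}
ERROR = {"a1": [2, 0, -1, 0, -1], "a2": [0, -2, 0, 1, 1]}
groups = {g: [GM + EFFECT[g] + e for e in ERROR[g]] for g in EFFECT}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def one_way_anova(groups):
    """Deviation-approach one-way ANOVA (hypothetical helper)."""
    scores = [y for ys in groups.values() for y in ys]
    grand_mean = mean(scores)
    # Total: every score around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in scores)
    # Between groups: each group mean around the grand mean, weighted by n.
    ss_a = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Within groups: every score around its own group mean.
    ss_sa = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_a = len(groups) - 1
    df_sa = len(scores) - len(groups)
    ms_a = ss_a / df_a
    ms_sa = ss_sa / df_sa
    return {
        "SS_total": ss_total, "SS_A": ss_a, "SS_S/A": ss_sa,
        "df_A": df_a, "df_S/A": df_sa,
        "MS_A": ms_a, "MS_S/A": ms_sa, "F": ms_a / ms_sa,
    }


result = one_way_anova(groups)

# Computational approach for the total: sum(Y^2) - T^2 / N.
all_scores = groups["a1"] + groups["a2"]
N = len(all_scores)
ss_total_computational = sum(y * y for y in all_scores) - sum(all_scores) ** 2 / N

# Regression approach: code a1 as X = +1 and a2 as X = -1, as in the table.
X = [1] * len(groups["a1"]) + [-1] * len(groups["a2"])
sum_x = sum(X)
sum_yx = sum(x * y for x, y in zip(X, all_scores))
sum_x2 = sum(x * x for x in X)
# Ordinary least-squares slope and intercept (assumed, see the lead-in).
slope = (sum_yx - sum_x * sum(all_scores) / N) / (sum_x2 - sum_x ** 2 / N)
intercept = mean(all_scores) - slope * mean(X)

for name, value in result.items():
    print(f"{name}: {value:.2f}")
print(f"b = {slope:.2f}, a = {intercept:.2f}")
```

With this coding the slope equals the 2-point treatment effect and the intercept equals the grand mean of 5, which ties the regression table back to the GLM the data were generated from.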