# Purdue PSY 20100 - Lecture notes


Introduction to Statistics in Psychology
PSY 201
Professor Greg Francis
Lecture 12: correlation
Does TV make you drink?

## INTERPRETATION OF r

If we calculate a value of r:

- How do we know what it means?
- How do we compare r values for different data sets?

Rule of thumb:

| \|r\| | Interpretation |
|---|---|
| 0.9 to 1.0 | Very high correlation |
| 0.7 to 0.9 | High correlation |
| 0.5 to 0.7 | Moderate correlation |
| 0.3 to 0.5 | Low correlation |
| 0.0 to 0.3 | Little if any correlation |

## SCALE OF r

Values of r are ordinal measures of correlation:

- higher r values indicate larger correlation
- equal spacings of r values may not indicate equal spacings of correlation

Thus, r = 0.90 is not twice as correlated as r = 0.45.

The difference in correlation between r = 0.90 and r = 0.75 is not the same as the difference in correlation between r = 0.60 and r = 0.45.

## VARIANCE

We can interpret r in terms of variance.

- The correlation coefficient indicates relationships between variables.
- It also indicates the proportion of individual differences in one variable that can be associated with individual differences in another variable.

## VARIANCE

The idea is embedded in mathematical models.

Assume you want to predict the final exam score when you know the SAT score. A line predicts the score (the prediction could go in reverse too).

[Figure: Final Exam Grade plotted against Quantitative SAT Scores]

## VARIATION

The deviation of a final exam score from the mean value can be due to deviation accounted for by SAT scores, or due to something else.

[Figure: Final Exam Grade plotted against Quantitative SAT Scores]

## VARIATION

It turns out that

$$r^2 = \frac{s_a^2}{s_y^2}$$

- $s_y^2$ = the total variance in Y
- $s_a^2$ = the variance in Y associated with variance in X

Thus, $r^2$ is the proportion of variance in Y accounted for by variance in X. It is called the coefficient of determination.

We are skipping the mathematical details (thank you!).

## CORRELATION

We have studied the correlation coefficient called the Pearson r.

One limitation is that it only works for quantitative data (interval or ratio).

Sometimes we want to calculate correlations of ordinal (ranked) data. For example, does ranking near the top of SAT scores correlate with ranking near the top of Final Exam scores? We might not know the actual scores.

## Spearman ρ

- A special case of the Pearson r.
- The formula looks much different:

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where

- n = number of paired ranks
- d = difference between paired ranks

## CALCULATION

Ties take the average rank.

| X | Y | X rank | Y rank | d |
|---|---|---|---|---|
| 595 | 68 | 4 | 1 | 3 |
| 520 | 55 | 8 | 7 | 1 |
| 715 | 65 | 1 | 2 | -1 |
| 405 | 42 | 14 | 12 | 2 |
| 680 | 64 | 2 | 3 | -1 |
| 490 | 45 | 11 | 10 | 1 |
| 565 | 56 | 6.5 | 5.5 | 1 |
| 580 | 59 | 5 | 4 | 1 |
| 615 | 56 | 3 | 5.5 | -2.5 |
| 435 | 42 | 13 | 12 | 1 |
| 440 | 38 | 12 | 14 | -2 |
| 515 | 50 | 9 | 9 | 0 |
| 380 | 37 | 15 | 15 | 0 |
| 510 | 42 | 10 | 12 | -2 |
| 565 | 53 | 6.5 | 8 | -1.5 |

## CALCULATION

Important points:

1. Ranking order must be the same for both sets of data (highest-to-lowest or lowest-to-highest).
2. Tied ranks take the average value of the rank positions.
3. If we calculated the r value for the ranks, it would equal ρ (as long as there are no tied ranks).

$$\rho = 1 - \frac{6\left(3^2 + 1^2 + (-1)^2 + \dots\right)}{15(225 - 1)}$$

$$\rho = 1 - \frac{6(36.50)}{3360} = 0.93$$

## SPEARMAN ρ

If we calculated r from the ranked data we would get r = 0.93.

When we calculated r from the raw scores we got r = 0.90.

(The text is misleading.)

[Figure: graph of ranked data, ρ = 0.93; Final Exam Grade Rank plotted against SAT Score Rank]

## CAUSALITY

In the behavioral sciences we often look for causalities to try to determine how to make decisions.

SAT scores and Final Exam scores in a statistics class may be highly correlated, but no one would claim that doing well on the SAT causes someone to get a good grade in statistics.

- Getting a good grade in statistics may be caused by being smart.
- Getting a high SAT score may be caused by being smart.
- Being smart causes both variables to be highly correlated.

## CAUSATION

For example, in some places (war) there is a high correlation between bullets in the brain and being dead.

That does not mean that being dead causes bullets to form in the brain (in fact it is probably the other way around).

Causation cannot be established through quantitative methods. Establishing causation requires understanding of the variables and their roles.

## TV, Teens, Drinking

“High school students who watch lots of television and music videos are more likely to start drinking alcohol than other youngsters while those who rent movies are at less risk.” Associated Press, 3 November 1998.

- The study found a correlation between TV and music video watching and drinking among 9th graders.
- It implied that glamorization of drinking on TV led to increased drinking in viewers.

## TV, Teens, Drinking

This may be true, but parents and peers are known to have a very big influence, and this study did not control for those influences.

Correlation does not necessarily imply causation.

USA Today got it right. They quoted ABC’s Julie Hoover, who likened the conclusions to saying “gynecologists make women pregnant because... there are so many pregnant women in their offices”.

Notice, this does not mean that TV programs do not influence drinking. It just means that a single correlation cannot demonstrate causality. (Some TV programs do influence drinking behavior.)

## CORRELATION

Other coefficients of correlation:

- Nominal data (φ, C coefficient, Cramer’s V, λ)
- Ordinal data (rank-biserial, tetrachoric)
- Interval/ratio and nominal (point-biserial)
- Nonlinear (η)

Other uses of correlation:

- Inferential statistics (sampling theory)
- Predict scores (linear regression)

## CONCLUSIONS

- Pearson r
- size
- interpretation
- coefficient of determination

## NEXT TIME

- probability
- rules
- significance

Why casinos make
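
The Spearman ρ calculation from the CALCULATION slides can be checked with a short script. This is only a sketch: the function names (`average_ranks`, `spearman_rho`, `pearson_r`) and the variable names `sat` and `exam` are made up for illustration and do not come from the lecture. The data are the 15 (X, Y) pairs from the table, ranked highest-to-lowest with tied scores sharing the average rank.

```python
from math import sqrt

def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of rank positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho from the slide formula: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def pearson_r(x, y):
    """Plain Pearson r, used here on the ranks for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Data copied from the X and Y columns of the CALCULATION table.
sat = [595, 520, 715, 405, 680, 490, 565, 580, 615, 435, 440, 515, 380, 510, 565]
exam = [68, 55, 65, 42, 64, 45, 56, 59, 56, 42, 38, 50, 37, 42, 53]

rho = spearman_rho(sat, exam)
r_on_ranks = pearson_r(average_ranks(sat), average_ranks(exam))

print(round(rho, 2))         # 0.93
print(round(r_on_ranks, 2))  # 0.93
```

Because this data set contains tied ranks, the slide formula and the Pearson r computed on the ranks are not exactly identical; they agree only after rounding to two decimals, which is consistent with important point 3.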
