Introductory Statistics Lectures
Linear correlation
Testing two variables for a linear relationship

Anthony Tanbakuchi
Department of Mathematics
Pima Community College
MAT167

Redistribution of this material is prohibited without written permission of the author
© 2009 (Compile date: Tue May 19 14:51:18 2009)

Contents
1 Linear correlation
  1.1 Introduction
  1.2 Linear Correlation
      Use
      Computation
      A complete example
      Cautions
      Confidence Interval Belt Graphs
  1.3 Summary
  1.4 Further examples

1 Linear correlation

1.1 Introduction

Motivation
Is there a relationship (a correlation) between your height and ... (1) your mother's height? (2) your forearm height? (3) your work hours per week? (4) your commute distance?

[Figure: four scatter plots of height (y-axis) against height_mother, forearm, work_hours, and credits (x-axes).]

Example 1. How much of an individual's height is explained by their mother's height? Use our class data to determine whether there is a linear relationship between a mother's height and her child's height (your height), and how much of the variation in the child's height can be explained by the mother's height.

R: height = class.data$height
R: height_mother = class.data$height_mother

The first few data points are:

  height height_mother
1     65            65
2     68            67
3     71            63
4     66            64
5     68            65
6     65            62

Use a scatter plot to see if a relationship exists:

R: plot(height_mother, height, main = "Height in inches")

[Figure: scatter plot titled "Height in inches", with height_mother on the x-axis and height on the y-axis.]

Question 1.
Does it look like there is a linear relationship? Draw a best-fit line through the data.

Definition 1.1 (Paired data). A set of (x_i, y_i) data where each pair is related (dependent samples). Example: mother's height, child's height.

1.2 Linear Correlation

Definition 1.2 (Correlation). Exists when two variables have a relationship with one another.

[Figure: two scatter plots of y against x, titled "linear correlation" and "non-linear correlation".]

[Figure: four scatter plots of y against x, titled "perfect positive", "strong positive", "positive", and "no correlation".]

[Figure: four scatter plots of y against x, titled "perfect negative", "strong negative", "negative", and "no correlation".]

USE
Often used to help answer:
1. Is there a linear relationship between X and Y?
2. Can X be used to predict Y?
3. How much of the variation in Y can be predicted with X?

COMPUTATION
Definition 1.3 (Linear correlation coefficient). The linear correlation coefficient for a population is denoted with ρ.
We can estimate ρ via a sample and calculate Pearson's linear correlation coefficient r:

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} \qquad (1) $$

where n is the number of pairs of data points (the length of x or y).

• Measures the strength of the linear relationship between x and y.
• Larger values of |r| indicate a stronger linear relationship. [1]
• Positive r indicates positive slope; negative r indicates negative slope.

[1] Larger |r| does not indicate a steeper slope. We will find the slope later using regression.

Examples of r

[Figure: four scatter plots of y against x, titled "r = 1", "r = 0.99", "r = 0.86", and "r = −0.1".]

[Figure: four scatter plots of y against x, titled "r = −1", "r = −0.99", "r = −0.88", and "r = −0.1".]

Properties of r (ρ for populations)
1. −1 ≤ r ≤ +1
2. r is scale invariant.
3. r is invariant if x and y are interchanged.
4. r only measures the strength of linear relationships.

Definition 1.4 (Coefficient of determination, explained variation). r² is the proportion of linear variation in y that is explained by x.
• 0 ≤ r² ≤ 1
• The closer r² is to 1, the stronger the linear relationship and, likewise, the more variation in y that can be explained by x.

Example of r²

[Figure: scatter plot of y against x, titled "r = −0.88 r^2 = 0.78".]

Definition 1.5 (Hypothesis test for linear correlation).
requirements: (1) simple paired (x, y) random samples, (2) pairs of (x, y) have a bivariate normal distribution, (3) correlation is linear.
null hypothesis: ρ = 0 (no linear correlation)
alternative hypothesis: ρ ≠ 0 (a linear correlation exists)

Always make a scatter plot first to see if the
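
The computation in equation (1), the listed properties of r, and the test of ρ = 0 can be sketched in R as follows. This is an illustration under stated assumptions, not the notes' own worked example: it uses only the six (height_mother, height) pairs printed in the table above rather than the full class.data, so its result says nothing about the class as a whole. It also assumes that s_x and s_y in equation (1) are the sample standard deviations returned by sd(), and it uses the base R functions cor() and cor.test(), which this section of the notes does not itself mention.

```r
# Illustration only: the six pairs shown in the notes' table,
# NOT the full class.data set.
height_mother = c(65, 67, 63, 64, 65, 62)  # x
height        = c(65, 68, 71, 66, 68, 65)  # y

n = length(height_mother)  # number of (x, y) pairs

# Equation (1): sum of cross-deviations divided by (n - 1) * s_x * s_y,
# assuming s_x and s_y are sample standard deviations (sd()).
r_manual = sum((height_mother - mean(height_mother)) *
               (height - mean(height))) /
  ((n - 1) * sd(height_mother) * sd(height))

# Built-in Pearson correlation, for comparison with the manual value.
r_builtin = cor(height_mother, height)

# Coefficient of determination (Definition 1.4).
r_squared = r_manual^2

# Property 2 (scale invariance): converting x from inches to centimetres
# should leave r unchanged.
r_rescaled = cor(2.54 * height_mother, height)

# Property 3: interchanging x and y should leave r unchanged.
r_swapped = cor(height, height_mother)

# Two-sided test of the null hypothesis rho = 0 (Definition 1.5).
test = cor.test(height_mother, height)

print(c(r_manual = r_manual, r_builtin = r_builtin, r_squared = r_squared))
print(c(r_rescaled = r_rescaled, r_swapped = r_swapped))
print(test$p.value)
```

With so few pairs the test has very little power, so the p-value here only shows where the number comes from; the requirements in Definition 1.5 still have to be checked against a scatter plot of the real data.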