Front Back
Correlation
Relationship between 2 variables Linear relationship
Correlation Coefficient
Measure used to express extent or strength of relationship Represented by r
Positive Correlation
0 < r ≤ 1; score high on 1 variable and score high on the other; score low on 1 variable and score low on the other; positive slope; r = 1.0 is a perfect positive correlation
Negative Correlation
-1 ≤ r < 0; score high on 1 variable and score low on the other; negative slope; r = -1 is a perfect negative correlation
Zero
r = 0: no correlation; no linear relationship
Linear Relationship
Looking for a linear relationship; others may exist (e.g., U-shaped); correlation only measures linear relationships
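A small sketch of the card above: Pearson r for a perfectly U-shaped data set (Y = X squared on symmetric X values) comes out at zero even though Y is completely determined by X. The data and the helper name `pearson_r` are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # perfect U-shaped (non-linear) relationship
r_u = pearson_r(xs, ys)     # no linear component, so r is 0
```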
Correlation RULE
Correlation does not = Causation!!! Just means that there was a relationship
|r| < 0.3
Small correlation / weak relationship
|r| 0.3 - 0.49
Medium correlation / moderate relationship
|r| 0.5 - 1.0
Large correlation / strong relationship
Scatter diagram
Graphic means to show data points and correlation and (later) regression
Centroid
(Mean of X, Mean of Y) This will be the central point (X,Y) of 2 variables
When to use Pearson r
Interval and ratio data
Pearson r Z Score method
r = sum(Zx * Zy) / N; good if you already have Z scores; answer must be between -1 and 1
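A minimal sketch of the Z score method, assuming the Z scores are computed with the population standard deviation (dividing by N), which is what makes the result stay between -1 and 1 when the sum is divided by N. The data and function names are hypothetical.

```python
import math

def z_scores(values):
    """Convert raw scores to Z scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r_from_z(zx, zy):
    """r = sum(Zx * Zy) / N."""
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r_from_z(z_scores(x), z_scores(y))   # roughly 0.77
```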
Correlation Coefficient Pearson r Raw score method
Will have to find 8 sums
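The card does not give the raw score formula itself. The sketch below assumes the common computational form r = (N*sum(XY) - sum(X)*sum(Y)) / sqrt((N*sum(X^2) - sum(X)^2) * (N*sum(Y^2) - sum(Y)^2)). The function name and data are hypothetical, and the sums tabulated here may not line up one-to-one with the 8 sums the card refers to.

```python
import math

def pearson_r_raw(xs, ys):
    """Pearson r from raw-score sums (assumed computational formula)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y            # can be negative
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2)
                            * (n * sum_y2 - sum_y ** 2))  # never negative
    return numerator / denominator

r_raw = pearson_r_raw([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # roughly 0.77
```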
Covariance
Numerator of the Pearson r formula; degree to which 2 variables share common variance; the numerator can be negative, but the denominator cannot
High Covariance
More linear; r closer to +/- 1
Low Covariance
Less linear; r closer to 0
If r = +/- 1
All data fall in a line
If |r| < 1
Data are scattered
3 types of Variation
Total= explained (r2) + unexplained (k2)
Total Variation
Graph with all arrows pointing at the mean line (middle horizontal line)
Explained Variation
Double arrows pointing from Mean line to regression line in 2 spots on either side of the centroid
Unexplained Variation
All arrows point toward regression line Weighs how far away data is from where the regression line is
If r = +/- 1
All is explained
If r= 0
All is unexplained
r2 Coefficient of Determination
The proportion of variance in 1 variable explained by the other
k2 Coefficient of non-determination
The proportion of variance in 1 variable not explained by the other
Total = 1 or 100%
1 = r2 + k2, so k2 = 1 - r2
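To make the three types of variation concrete, the sketch below splits the total variation of Y into explained and unexplained parts around a least-squares line and checks that the two proportions add up to 1. The data are made up, and the sums-of-squares variable names are assumptions, not notation from the cards.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares regression line Y' = intercept + slope * X
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in xs]

ss_total = sum((y - mean_y) ** 2 for y in ys)                     # data vs. mean line
ss_explained = sum((p - mean_y) ** 2 for p in predicted)          # regression line vs. mean line
ss_unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted)) # data vs. regression line

r_squared = ss_explained / ss_total     # coefficient of determination
k_squared = ss_unexplained / ss_total   # coefficient of non-determination
```

With these numbers about 60% of the variation in Y is explained and about 40% is unexplained.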
Cautions with pearson r
Measures linearity, so a low r means not linear; there could still be a non-linear relationship. The distributions need not be normal but must be unimodal and not badly skewed. If the data are truncated (restricted range), you will typically get a spuriously low r
Spearman r
Used with ordinal data; symbol rs; both variables must be rank ordered
Non parametric test
looks at ranks only
Parametric Test
Uses actual numbers
D
rank x- rank y
Sum of D
0
Tied Scores and Spearman r
If scores are tied, take this into account to be fair: take the mean of the ranks the tied scores would occupy and assign that mean rank to each of them. If more than 2 scores are tied, the mean rank is the midpoint of the tied positions
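A minimal sketch of ranking with ties, assuming ranks run from high to low (rank 1 = highest score). Tied scores share the mean of the rank positions they occupy, and the differences D = rank X - rank Y sum to 0. The scores and the helper name `mean_ranks` are hypothetical.

```python
def mean_ranks(values):
    """Rank high to low; tied scores each get the mean of their rank positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first position occupied by this score
        count = ordered.count(v)                # how many scores are tied at this value
        ranks.append(first + (count - 1) / 2)   # mean of positions first..first+count-1
    return ranks

x = [90, 85, 85, 70, 60]
y = [50, 40, 45, 30, 30]
rank_x = mean_ranks(x)   # the two 85s share rank 2.5
rank_y = mean_ranks(y)   # the two 30s share rank 4.5
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d = sum(d)           # always 0 when both variables are fully ranked
```

For three tied scores in positions 2, 3, and 4, the same rule gives each of them rank 3, the middle number.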
Correlation Matrix
Table to visualize many correlations; correlate the most = r closest to 1 or -1; correlate the least = r closest to 0
If r is 0 or very low
Does not mean there is no relationship at all; it means there is no linear relationship
Share common variance
If high relationship: more linear and r closer to 1 or -1; if low relationship: less linear and r closer to 0
X Rank
High to low
Regression
Allows you to predict scores on 1 variable from scores on the other, based on their relationship
Regression Analysis Equation
Y = a + by X; X, Y = variables; by = slope (m, the tilt); a = Y intercept (b, where the line hits the Y axis)
r = +/- 1
It's easy to predict and draw the line
|r| < 1
You must draw a "best fit" line
Properties of the regression line
Squared deviations around the line are minimal; sum of deviations = 0; new symbols X' and Y' are for predictions
To find the regression line equation
Use the formula with 3 formulas in it
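The card does not list the three formulas. One common set, assumed here, is: slope by = r * (sy / sx), intercept a = mean of Y - by * mean of X, and then Y' = a + by X. The sketch below follows those steps; the function name and data are hypothetical.

```python
import math

def regression_of_y_on_x(xs, ys):
    """Return (a, b_y) for the line Y' = a + b_y * X (assumed three-step recipe)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    b_y = r * sd_y / sd_x          # step 1: slope
    a = mean_y - b_y * mean_x      # step 2: Y intercept
    return a, b_y                  # step 3: plug into Y' = a + b_y * X

a, b_y = regression_of_y_on_x([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this data the line works out to approximately Y' = 2.2 + 0.6 X.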
To draw the regression line for Y= a + by X
1.) Pick 2 reasonable values for X 2.) Put in the equation and solve for Y 3.) plot the 2 pairs of X,Y points 4.) Connect the dots with a line
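The four steps above can be sketched as follows. The coefficients a = 2.2 and by = 0.6 are hypothetical values chosen for illustration, as are the two X values.

```python
a, b_y = 2.2, 0.6              # hypothetical intercept and slope

def predict_y(x):
    """Y' = a + b_y * X."""
    return a + b_y * x

x_values = [1, 5]                                   # step 1: pick 2 reasonable X values
points = [(x, predict_y(x)) for x in x_values]      # step 2: solve for Y at each X
# Steps 3 and 4: plot the two (X, Y) pairs, approximately (1, 2.8) and (5, 5.2),
# and connect them with a straight line.
```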
X= a +bx Y
In regression analysis you can also find X = a + bx Y and get 2 regression lines whose angle depends on r: r = 1, the lines fall on top of each other; r = 0.75, the lines cross at a narrow angle; r = 0.25, the lines cross at a wide angle; r = 0, the lines are perpendicular
r = +/- 1
Superimposed
r = 0
Perpendicular
Intersection point
Mean of x, Mean of Y = Centroid
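A short sketch of the two regression lines, using made-up data: both the line predicting Y from X and the line predicting X from Y pass through the centroid, and the product of their slopes equals r squared, which is why the lines coincide when r = +/- 1. The helper name `slope_intercept` is hypothetical.

```python
def slope_intercept(predictor, criterion):
    """Least-squares line: criterion' = a + b * predictor. Returns (a, b)."""
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_c = sum(criterion) / n
    b = (sum((p - mean_p) * (c - mean_c) for p, c in zip(predictor, criterion))
         / sum((p - mean_p) ** 2 for p in predictor))
    return mean_c - b * mean_p, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

a_y, b_y = slope_intercept(xs, ys)   # Y' = a_y + b_y * X
a_x, b_x = slope_intercept(ys, xs)   # X' = a_x + b_x * Y

y_at_mean_x = a_y + b_y * mean_x     # equals mean of Y
x_at_mean_y = a_x + b_x * mean_y     # equals mean of X
slope_product = b_y * b_x            # equals r squared
```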
Standard Error of the Estimate
sesty: estimate of the standard deviation of the data around the regression line; k2 was a version of this, but not expressed in standard deviation units
r = +/- 1 and sesty
sesty = 0 means no errors/ deviation
r = 0 and sesty
sesty is maximal; a lot of deviation
Larger sesty means
Less accurate predictions
Y' and Y true
Recall Y' was a prediction, not a fact; using sesty we can find an interval (Y' +/- 1 sesty) where we are 68% sure that true Y will be
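A minimal sketch of the 68% interval, assuming sesty is computed as the square root of the mean squared deviation around the regression line (dividing by N; some textbooks divide by N - 2 instead). The data, the new X value, and the variable names are hypothetical.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares line Y' = a + b_y * X
b_y = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
       / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b_y * mean_x

# Assumption: divide by N; an N - 2 denominator would give a slightly larger value.
residual_ss = sum((y - (a + b_y * x)) ** 2 for x, y in zip(xs, ys))
s_est_y = math.sqrt(residual_ss / n)

x_new = 3
y_pred = a + b_y * x_new                              # Y' for the new X
interval_68 = (y_pred - s_est_y, y_pred + s_est_y)    # Y' +/- 1 sesty
```

A larger sesty widens this interval, which is the sense in which predictions become less accurate.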
Sesty and Y true
are influenced by magnitude of X and Y
Variance and sesty
Low variance leads to a better (lower) sesty, which leads to a better estimate of Y true
Homoscedasticity
Where variance of 1 variable is constant at all levels of the other variable
Heteroscedasticity
Where variance of 1 variable is not constant at all levels of the other variable
Post Hoc Fallacy
Assuming a cause and effect relationship from correlation data
