BIOM301 Chapter 13: Linear Correlation and Regression Analysis (REVIEW)

LINEAR CORRELATION

- A correlation exists between 2 variables when one of them is related to the other in a linear manner
    o Measures the direction and strength of the linear relationship between 2 quantitative variables
    o Linear Correlation Coefficient: r
- Correlation Goal: measuring the linear dependency of one variable on another
    o Is there a relationship?
    o What direction is the relationship?
    o How strong is the relationship?
- Mathematically, we are interested in the covariance between the 2 variables

Ex: Graph of Bivariate Data
- To think about covariance, we need to find the central point in the data set
    o Average the x values and average the y values: (avg x, avg y) is your central point
    o This forms four quadrants
- We then look at how each point varies from the center (also called the centroid)
    o Each deviation is either + or −
- Calculating covar(x, y)
    o The average joint variability of the data set around the centroid

Ex: Data and Covariance (graphs, from left to right)
    o Upward trend: positive covariance
    o Downward trend: negative covariance
    o No trend: no correlation

What is r?
- It asks how large the covariance (strength of the relationship) is relative to the variability in the data set
    o If it is large, then you have a significant correlation

Hypothesis Testing
- Asking if there is a relationship between bivariate quantitative data in the population
- Sample statistic: r. Population parameter: $\rho$
    o Can be a 1-tailed (directional) or 2-tailed (nondirectional) question
    o Test with a t test, df = n − 2

EX: For the state of Maryland, you want to know if there is a correlation between the average test score at a school and the number of students that graduate. You have data for all high schools in 2010.
- State your Question / ID your population
    o For Maryland high schools, is there a relationship between test scores and graduate numbers in 2010?
- State your null and alternative hypotheses (define symbols)
    o $H_0: \rho = 0$
    o $H_A: \rho \neq 0$
      where $\rho$ = 2010 Maryland high school correlation between average test scores and number of students graduating
- Analyze your data statistically
    o You will be given a t value and are NOT expected to calculate it yourself. You DO need to know df = n − 2
    o For this problem, assume t = 0.76, n = 237; t_crit for df = 237 − 2 = 235, alpha = 0.05 in 2 tails, is approximately 1.97
- State your Conclusion
    o For the state of Maryland, test scores and number of graduates in a school are not significantly correlated (t = 0.76, n = 237, p > 0.05)
    o Results would also include a graph with the correlation coefficient value and NO LINE

Assumptions
- Good samples
- Normality of distribution of values
- 2-way normal distribution of values for both x and y

LINEAR REGRESSION

- Goal: predict a value for y (the output or dependent variable) given a value of x (the input or independent variable)
- Regression Line Equation
    o For any value of x, you can predict a value of y given the regression equation $\hat{y} = b_0 + b_1 x$

Hypotheses
- Sample statistics: $b_0$ (intercept) and $b_1$ (slope)
    o Population parameters are $\beta_0$ and $\beta_1$
- More common to test on the slope: is Y a function of X?
    o Can do 1- or 2-tailed tests
    o Uses the t distribution with df = n − 2
- Calculation of $b_1$: $b_1 = SS_{xy} / SS_x$

The Error
- Estimating the variability in the data set
- The Line of Best Fit minimizes deviation in the Y direction
    o E.g., how much of the variability in the data set is explained by variability in X?

EX: You want to know if commute time is a function of distance from work for the Acme pencil sharpener company. You have bivariate data for 15 employees.
- State your Question / ID your population
    o For Acme employees, is the length of the commute to work a function of distance to work?
- State your null and alternative hypotheses (define symbols)
    o $H_0: \beta_1 = 0$
    o $H_A: \beta_1 \neq 0$
      where $\beta_1$ = slope relating length of commute as a function of distance to work at the Acme company
- Analyze your data statistically
    o You will be given a t value and are NOT expected to calculate it yourself. You DO need to know df = n − 2
    o For this problem, assume t = 3.10, n = 15; t_crit for df = 15 − 2 = 13, alpha = 0.05 in 2 tails, is 2.16
- State your Conclusion
    o For the Acme company, time of commute significantly increases as a function of the distance to work (t = 3.10, n = 15, p < 0.05)
    o Results would also include a graph with the complete equation for the line, the R² value, and WITH a regression line shown

Assumptions
- Good samples
- Y is a function of X
- Normal distribution of Y values for every X

For Regression we also:
- Typically include the standard error of the slope, $SE_{b_1}$
    o Why? Small changes in the slope estimate can mean big changes in the relationship
- AND can add a Confidence Belt
    o The range of values that we are confident, at some chosen level, will include the true population values

What Should You Do?
- Read Chapter 13
- You do NOT need to be able to calculate statistics or estimates, so the problems in the book are not very helpful
- You DO need to be able to complete analyses if given calculated values
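
The notes describe the centroid, covariance, r, and the slope calculation (SS xy divided by SS x) only in words, so a small worked sketch may help make them concrete. The Python below is an illustration under stated assumptions, not the course's method: the function names and the five data points are hypothetical, covariance is taken as the sample covariance (divided by n − 1), and the intercept uses the standard least-squares result b0 = avg y − b1 × avg x, which the notes do not state. The last function only applies the decision step from the examples, comparing a given t to a given t_crit.

```python
import math


def centroid(xs, ys):
    """Return (avg x, avg y), the central point of the bivariate data."""
    n = len(xs)
    return sum(xs) / n, sum(ys) / n


def sums_of_squares(xs, ys):
    """Return (SS_x, SS_y, SS_xy), built from deviations around the centroid."""
    x_bar, y_bar = centroid(xs, ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    # Each product is + or - depending on the quadrant the point falls in.
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_x, ss_y, ss_xy


def covariance(xs, ys):
    """Sample covariance (assumption: divide by n - 1)."""
    _, _, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / (len(xs) - 1)


def correlation(xs, ys):
    """r: covariance scaled by the variability in x and y."""
    ss_x, ss_y, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


def regression_line(xs, ys):
    """Return (b0, b1) with b1 = SS_xy / SS_x and b0 = avg y - b1 * avg x."""
    x_bar, y_bar = centroid(xs, ys)
    ss_x, _, ss_xy = sums_of_squares(xs, ys)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def is_significant(t, t_crit):
    """Two-tailed decision: reject H0 when |t| exceeds the critical value."""
    return abs(t) > t_crit


# Hypothetical data, made up for illustration only (not from the notes).
distance = [1, 2, 3, 4, 5]
commute = [2, 4, 5, 4, 5]

b0, b1 = regression_line(distance, commute)
print(f"covariance = {covariance(distance, commute):.2f}")   # 1.50
print(f"r = {correlation(distance, commute):.2f}")           # 0.77
print(f"y-hat = {b0:.2f} + {b1:.2f} x")                      # 2.20 + 0.60 x
print(f"prediction at x = 6: {b0 + b1 * 6:.2f}")             # 5.80

# Decision step with the values given in the Acme example (t = 3.10, t_crit = 2.16).
print(is_significant(3.10, 2.16))                            # True
```

Keeping the sums of squares in one helper mirrors the way the notes build everything from deviations around the centroid: r and b1 share the same SS xy numerator and differ only in what they divide by.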