Course: Multiple Regression
Topic: Bivariate Correlation/Regression

BIVARIATE CORRELATION/REGRESSION

This semester we will discuss multiple regression/correlation analysis (MRC). MRC is a flexible analysis strategy that enables us to examine the association between a continuous (and, at times, categorical) dependent variable and multiple continuous and/or categorical independent variables. Indeed, analysis of variance, when examined in terms of the general linear model, can be considered a special case of MRC.

When we specify the models used in MRC, we assume that the independent variable(s) (IV) cause (i.e., produce changes in) the dependent variable (DV). Keep in mind, however, that MRC only indicates whether there is an association among the variables. The extent to which we can infer that the direction of the association flows from the IV to the DV, and is causal in nature, is strictly limited by the methodology used to collect the data. We can more comfortably infer causal direction to the extent that the data were collected using an experimental method in which (a) the independent variables were manipulated, (b) participants or observations were randomly assigned to levels of the independent variable, and (c) extraneous (i.e., confounding) variables were controlled. The less fully criteria (a), (b), and (c) are satisfied, the less comfortably we can infer causation, and the more we are limited to talking about associations among variables.

Furthermore, we must keep in mind that the statistical analyses are typically performed on sample data in an attempt to make inferences about population-level associations. Consequently, issues of sampling distributions and hypothesis testing continue to be relevant.

Today we will discuss the simple situation in which we examine the relationship between two variables. It's likely that you covered much of today's material in an introductory statistics class, in which case this will serve as a review. As we will discuss in future classes, bivariate associations can be very misleading
when the data are not collected using a strict experimental method, because a portion of the relationship between the independent variable (or predictor) and the dependent variable (or criterion) is usually shared with other predictor variables. Procedures for handling such complications must wait for another day.

BRIEF REVIEW OF BASIC STATISTICS AND NOTATION

Before venturing into the world of bivariate associations, it might be fruitful to quickly review standard deviation and z scores. Such statistics are frequently used in MRC. Recall that there are separate, yet related, formulas for determining the standard deviation (i.e., variation) of a population and of a sample:

$$ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} $$

The sample formula is typically used when we wish to estimate the population standard deviation based on sample data. The major difference between the formulas is that the sample formula uses $n - 1$ in the denominator, as opposed to $N$, to adjust for the tendency of samples to underestimate population variability. The numerator of each formula, which sums the squared deviations of the scores from the mean of the distribution, is often abbreviated SS (sum of squares). So standard deviation can also be expressed as

$$ \sigma = \sqrt{\frac{SS}{N}} \qquad \text{and} \qquad s = \sqrt{\frac{SS}{n - 1}} $$

Variance is simply the standard deviation squared.

The authors of our textbook (Cohen and Cohen) use slightly different notation to represent standard deviation. They use $sd$ to represent the population standard deviation and $\widetilde{sd}$ to represent the sample standard deviation (i.e., when estimating from the sample to the population). Furthermore, the authors represent deviation scores (i.e., $X - \bar{X}$) with a lowercase letter (e.g., $x$). Consequently, SS is represented as $\sum x^2$.

When a formula requires that the standard deviation of variable Y be divided by the standard deviation of variable X, it does not matter whether we use the sample or the population formula when the sample sizes are equal, because the denominators of each formula cancel and we essentially divide $\sqrt{SS_y}$ by $\sqrt{SS_x}$:

$$ \frac{\sqrt{SS_y / N_y}}{\sqrt{SS_x / N_x}} = \sqrt{\frac{SS_y}{N_y} \cdot \frac{N_x}{SS_x}} = \sqrt{\frac{SS_y}{SS_x}} = \frac{\sqrt{SS_y}}{\sqrt{SS_x}} $$

and

$$ \frac{\sqrt{SS_y / (n_y - 1)}}{\sqrt{SS_x / (n_x - 1)}} = \sqrt{\frac{SS_y}{n_y - 1} \cdot \frac{n_x - 1}{SS_x}} = \sqrt{\frac{SS_y}{SS_x}} = \frac{\sqrt{SS_y}}{\sqrt{SS_x}} $$

When comparing variables that are measured on different scales (e.g., Celsius and miles per hour), it is often useful to transform the variables into a z-score metric, which indicates the number of standard deviations by which a score deviates from the mean of the distribution:

$$ z = \frac{X - \mu}{\sigma} $$

If all of the scores in a distribution are transformed into z scores, the transformed z distribution will have a mean of 0, a standard deviation of 1, and the same shape as the original distribution.

Now let's proceed to our discussion of bivariate associations.

A DATA SET

For the next few classes we will use the bogus data set on academic salary provided by Cohen and Cohen (1983, p. 99). The data set is reproduced in the following table.

Subject   Salary   PhD   Pubs   Sex   Citations
   1      18000      1      2     0       1
   2      19961      2      4     0       0
   3      19828      5      5     1       1
   4      17030      7     12     1       0
   5      19925     10      5     0       0
   6      19041      4      9     0       1
   7      27132      3      3     1       0
   8      27268      8      1     0       1
   9      32483      4      8     0       0
  10      27029     16     12     1       4
  11      25362     15      9     0       0
  12      28463     19      4     0       3
  13      32931      8      8     0       5
  14      28270     14     11     0       0
  15      38362     28     21     0       3

PhD = years since PhD; Pubs = number of publications; Sex: 0 = male, 1 = female; Citations = number of times a publication was cited.
Data are from Cohen & Cohen (1983), Applied multiple regression/correlation analysis for the behavioral sciences, p. 99. Hillsdale, New Jersey.

GRAPHIC REPRESENTATION OF BIVARIATE RELATIONS

A useful and relatively easy way of assessing the nature of a bivariate relationship is to examine it visually in a scatter plot. For example, we can visualize the relationship between salary and number of publications by plotting each individual's salary against his or her number of publications. The following table creates a SAS data set and uses the proc plot procedure to plot the salary by publications data.

Plot Procedure in SAS

    proc plot;
      plot salary*pubs;
    run;

Such a procedure plots
salary on the Y axis and publications on the X axis. Simply reverse the order of the variables to reverse the axes on which they are plotted. Additional information about the plot procedure can be found using the help feature in SAS (e.g., click Help, click SAS System Help, click Index, type "plot procedure", and double-click).

The following table contains the corresponding SPSS code and the output in SAS and SPSS.

Plot Procedure in SPSS

    DATA LIST / salary 1-5 phd 7-8 pubs 10-11 sex 13 citations 15.
    BEGIN
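
Returning to the review of standard deviation and z scores earlier in these notes, the claims made there can be checked numerically against the salary data set. The following is only a minimal sketch in Python, which is not one of the packages used in this course; the function names (sum_of_squares, sd_population, sd_sample, z_scores) are hypothetical helpers written for this illustration, and the z scores are computed with the population (N) formula as an assumption.

```python
import math

# Salary and number of publications for the 15 subjects in the
# Cohen & Cohen (1983, p. 99) data set reproduced in these notes.
salary = [18000, 19961, 19828, 17030, 19925, 19041, 27132, 27268,
          32483, 27029, 25362, 28463, 32931, 28270, 38362]
pubs = [2, 4, 5, 12, 5, 9, 3, 1, 8, 12, 9, 4, 8, 11, 21]


def sum_of_squares(scores):
    """SS: sum of squared deviations of the scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def sd_population(scores):
    """Standard deviation with N in the denominator."""
    return math.sqrt(sum_of_squares(scores) / len(scores))


def sd_sample(scores):
    """Standard deviation with n - 1 in the denominator."""
    return math.sqrt(sum_of_squares(scores) / (len(scores) - 1))


def z_scores(scores):
    """Transform scores to z scores (assumption: population formula)."""
    mean = sum(scores) / len(scores)
    sd = sd_population(scores)
    return [(x - mean) / sd for x in scores]


# With equal sample sizes the denominators cancel, so both ratios
# reduce to sqrt(SS_y / SS_x).
ratio_population = sd_population(salary) / sd_population(pubs)
ratio_sample = sd_sample(salary) / sd_sample(pubs)
ratio_ss = math.sqrt(sum_of_squares(salary) / sum_of_squares(pubs))

# A z-transformed distribution should have mean 0 and sd 1.
z_salary = z_scores(salary)
z_mean = sum(z_salary) / len(z_salary)
z_sd = sd_population(z_salary)

print(f"salary sd (N): {sd_population(salary):.2f}")
print(f"salary sd (n - 1): {sd_sample(salary):.2f}")
print(f"sd ratios: {ratio_population:.4f} {ratio_sample:.4f} {ratio_ss:.4f}")
print(f"z mean: {z_mean:.4f}  z sd: {z_sd:.4f}")
```

The three printed ratios should agree up to floating-point rounding, which is the cancellation described in the notation review, and the sample standard deviation should come out slightly larger than the population one because of the n - 1 denominator.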