9. Linear Regression and Correlation

• Linear Relationships
• Example: Economic Level and CO2 Emissions
• Effect of variable coding?
• Probabilistic Models
• Estimating the linear equation
• Data (some)
• Example: What causes b > 0 or b < 0?
• Motivation for formulas
• Results for anxiety/externalizing data set
• Interpretations
• Residuals (prediction errors)
• Prediction equation has “least squares” property
• The Linear Regression Model
• Software shows sums of squares in an “ANOVA” (analysis of variance) table
• Example (text, p. 267; study in an undergraduate research journal by a student at Indiana Univ. of South Bend)
• Measuring association: The correlation
• Properties of the correlation
• Examples
• Correlation implies that predictions regress toward the mean
• r² = proportional reduction in error
• Example: high school GPA and TV watching
• Properties of r²
• Inference about slope (β) and correlation (ρ)
• Test of independence of x and y
• Example: Anxiety/externalizing behavior revisited
• Confidence interval for slope
• What if we reverse the roles of the variables? (Now y = externalizing behavior, x = anxiety.) The prediction equation changes, the correlation stays the same, and the result of the t test is the same.
• Some comments
• Example of effect of outlier
• Software reports SS values and test results in an ANOVA (analysis of variance) table. The F statistic in the ANOVA table is the square of the t statistic for testing H0: β = 0, and it has the same P-value as the two-sided t test. This is a more general statistic that we’ll need when a hypothesis contains more than one regression parameter (Chap. 11).

Data:
• y: a quantitative response variable
• x: a quantitative explanatory variable
(Chap. 8: Recall that both variables were categorical.)

For example (Wagner et al., Amer. J. Community Health, vol. 16, p. 189):
• y = mental health, measured with the Hopkins Symptom List (presence or absence of 57 psychological symptoms)
• x = stress level (a measure of negative events, weighted by the reported frequency and the subject’s subjective estimate of the impact of each event)

We consider:
• Is there an association? (test of independence)
• How strong is the association? (uses correlation)
• How can we describe the nature of the relationship, e.g., by using x to predict y? (regression equation, residuals)

Linear Relationships
Linear function (straight-line relation): y = α + βx expresses y as a linear function of x with slope β and y-intercept α.
For each 1-unit increase in x, y increases by β units.
• β > 0: line slopes upward
• β = 0: horizontal line
• β < 0: line slopes downward

Example: Economic Level and CO2 Emissions
OECD (Organisation for Economic Co-operation and Development, www.oecd.org): advanced industrialized nations “committed to democracy and the market economy.”
The oecd-data file (from 2004) is on p. 62 of the text and at the text website www.stat.ufl.edu/~aa/social/
• Let y = carbon dioxide emissions (per capita, in metric tons).
  Ranges from 5.6 in Portugal to 22.0 in Luxembourg; mean = 10.4, standard dev. = 4.6
• x = GDP (thousands of dollars, per capita).
  Ranges from 19.6 in Portugal to 70.0 in Luxembourg; mean = 32.1, standard dev. = 9.6
The relationship between x and y can be approximated by y = 0.42 + 0.31x.
• At x = 0, predicted CO2 level y = 0.42 + 0.31(0) = 0.42
• At x = 39.7 (value for U.S.), predicted CO2 level y = 0.42 + 0.31(39.7) = 12.7 (actual = 19.8 for U.S.)
• For each increase of 1 thousand dollars in per capita GDP, CO2 use is predicted to increase by 0.31 metric tons per capita.
• But this linear equation is just an approximation, and the correlation between x and y for the OECD nations was 0.64, not 1.0.
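A minimal sketch of the OECD calculations, assuming only the approximate equation y = 0.42 + 0.31x and the U.S. values quoted above; `predict_co2` and the constant names are hypothetical. It also computes the U.S. residual (observed minus predicted), the prediction error listed among the slide topics.

```python
# Sketch: evaluate the approximate OECD prediction equation y = 0.42 + 0.31x
# (x = per capita GDP in thousands of dollars, y = per capita CO2 in metric tons).
INTERCEPT = 0.42  # intercept from the slide's approximate equation
SLOPE = 0.31      # slope from the slide's approximate equation


def predict_co2(gdp_thousands: float) -> float:
    """Hypothetical helper: predicted per capita CO2 (metric tons) at a given per capita GDP."""
    return INTERCEPT + SLOPE * gdp_thousands


if __name__ == "__main__":
    us_gdp = 39.7     # U.S. per capita GDP (thousands of dollars), from the slide
    us_actual = 19.8  # observed U.S. per capita CO2, from the slide
    pred = predict_co2(us_gdp)
    print(f"x = 0:    predicted y = {predict_co2(0):.2f}")
    print(f"x = 39.7: predicted y = {pred:.2f} (actual {us_actual})")
    # Residual (prediction error) = observed - predicted
    print(f"U.S. residual = {us_actual - pred:.2f}")
    # A 1-unit (1 thousand dollar) increase in x changes the prediction by the slope
    print(f"change per +1 in x = {predict_co2(21) - predict_co2(20):.2f}")
```

Note that x = 0 lies far below the observed GDP range (19.6 to 70.0), so the intercept is an extrapolation rather than a meaningful prediction for any OECD nation.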
Scatterplot on next page.

Effect of variable coding?
Slope and intercept depend on units of measurement.
• If x = GDP measured in dollars (instead of thousands of dollars), then y = 0.42 + 0.00031x, because a change of $1 has only 1/1000 the impact of a change of $1000 (so the slope is multiplied by 0.001).
• If y = CO2 output in kilograms instead of metric tons (1 metric ton = 1000 kilograms), with x in dollars, then y = 420 + 0.31x.
Suppose x changes from U.S. dollars to British pounds, and 1 pound = 2 dollars. What happens?

Probabilistic Models
• In practice, the relationship between y and x is not “perfect,” because y is not completely determined by x. Other sources of variation exist.
  – We let α + βx represent the mean of the y-values, as a function of x.
  – We replace the equation y = α + βx by E(y) = α + βx (for the population).
  (Recall that E(y) is the “expected value of y,” which is the mean of its probability distribution.)
  e.g., if y = income and x = no. of years of education, we regard E(y) = α + β(12) as the mean income for everyone in the population having 12 years of education.
• A regression function is a mathematical function that describes how the mean of the response variable y changes according to the value of an explanatory variable x.
• A linear regression function is part of a model (a simple representation of reality) for summarizing a relationship.
• In practice, we use data to check whether a particular model is plausible (e.g., by looking at a scatterplot) and to estimate model parameters.

Estimating the linear equation
• A scatterplot is a plot of the n values of (x, y) for the n subjects in the sample.
• Looking at the scatterplot is the first step of the analysis, to check whether a linear model seems plausible.
Example: Are externalizing behaviors in adolescents (e.g., acting out in negative ways, such as causing fights) associated with feelings of anxiety? (Nolan et al., J. Personality and Social Psych., 2003)

Data (some)
Subject   Externalizing (x)   Anxiety (y)
   1              9               37
   2              7               23
   3              7               26
   4              3               21
   5             11               42
   6              6               33
   7              2               26
   8              6               35
   9              6               23
  10              9               28
As an exercise, conduct the analyses with x and y reversed.

Variable     Anxiety (y)   Externalizing (x)
Mean            29.4              6.6
Std. dev.        7.0              2.7

• How do we choose the line that “best fits” the data?
  – Criterion: Choose the line that minimizes the sum of squared vertical distances from the observed data points to the line. This is called the least squares prediction equation.
Solution (using calculus): Denote the estimate of α by a, estimate
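The least squares solution is cut off in this preview. A minimal sketch, assuming the standard closed-form least squares estimates b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄, and assuming the ten listed rows are the whole sample (their means and standard deviations agree with the summary table). It also computes the correlation via r = b(s_x / s_y) and checks the ANOVA relation F = t² for H0: β = 0. The helper names `least_squares`, `correlation`, and `slope_t_and_f` are hypothetical.

```python
import math
from statistics import mean, stdev

# Data from the "Data (some)" slide (assumed here to be the full sample):
# x = externalizing behavior, y = anxiety, one pair per subject.
x = [9, 7, 7, 3, 11, 6, 2, 6, 6, 9]
y = [37, 23, 26, 21, 42, 33, 26, 35, 23, 28]


def least_squares(x, y):
    """Return (a, b) for the line a + b*x that minimizes the sum of
    squared vertical distances from the points (x, y) to the line."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def correlation(x, y):
    """Sample correlation, via the identity r = b * (s_x / s_y)."""
    _, b = least_squares(x, y)
    return b * stdev(x) / stdev(y)


def slope_t_and_f(x, y):
    """t statistic for H0: beta = 0 and the ANOVA F statistic (df 1 and n - 2)."""
    n = len(x)
    a, b = least_squares(x, y)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    ssr = sum((a + b * xi - ybar) ** 2 for xi in x)              # regression SS
    mse = sse / (n - 2)
    t = b / math.sqrt(mse / sxx)  # slope divided by its standard error
    f = ssr / mse                 # regression mean square / error mean square
    return t, f


if __name__ == "__main__":
    print(f"means: y = {mean(y):.1f}, x = {mean(x):.1f}")
    print(f"std. devs: y = {stdev(y):.1f}, x = {stdev(x):.1f}")
    a, b = least_squares(x, y)
    print(f"prediction equation: y-hat = {a:.2f} + {b:.2f}x")
    print(f"correlation r = {correlation(x, y):.2f}")
    t, f = slope_t_and_f(x, y)
    print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")
```

Because each fitted deviation a + b·xᵢ − ȳ equals b(xᵢ − x̄), the regression sum of squares is b²·Σ(x − x̄)², which is why F and t² agree up to rounding.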


UF STATISTICS 101 - Linear Regression and Correlation
