UGA STAT 4210 - Chapter 3

Chapter 3 – Association, Correlation, and Regression

Chapter 2 described univariate properties and descriptives. This chapter looks at describing bivariate relationships (that is, relationships between two variables).

Definition: an association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.

We assess association between two categorical variables by comparing conditional proportions. If the conditional proportions differ across the values of the variable being conditioned on, the two variables have an association.

Definition: a conditional proportion is the relative frequency of observations for a value of one variable, computed only among the observations that share a given value of the other variable.

This is made easier by first displaying the data in a contingency table, which displays categorical variables in rows and columns. The entries are the frequencies of observations in the sample corresponding to each combination of categories.

Example: A sample of 100 pet owners was taken, and their gender and preference for dogs or cats were observed, resulting in the following contingency table. Is there an association between gender and the type of pet preferred?

          Dogs   Cats   Total
Males       42     10      52
Females      9     39      48
Total       51     49     100

Proportion of men who prefer dogs: 42/52 = 0.808
Proportion of women who prefer dogs: 9/48 = 0.188

The above proportions are conditioned on gender. The overall proportion of people who prefer dogs is 51/100 = 0.51.

Because the proportions of men and women who prefer dogs are not the same, we say there is an association between a person's gender and preference for pet type.
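The conditional proportions in the pet example can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary layout and the function name `conditional_proportions` are illustrative choices, not part of the course notes. The counts are the ones given in the contingency table above.

```python
# Counts from the pet-preference contingency table (rows: gender, columns: pet).
table = {
    "Males": {"Dogs": 42, "Cats": 10},
    "Females": {"Dogs": 9, "Cats": 39},
}


def conditional_proportions(counts):
    """Return the proportion of each column category within each row."""
    result = {}
    for row, cells in counts.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result


props = conditional_proportions(table)

# Overall (marginal) proportion preferring dogs, ignoring gender.
grand_total = sum(sum(cells.values()) for cells in table.values())
overall_dogs = sum(cells["Dogs"] for cells in table.values()) / grand_total

for gender, row in props.items():
    print(f"{gender}: P(Dogs | {gender}) = {row['Dogs']:.3f}")
print(f"Overall: P(Dogs) = {overall_dogs:.2f}")
```

If the two conditional proportions were equal, they would both match the overall proportion, and the table would show no association.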
If there weren't a difference in preference for dogs between genders, the conditional proportions would be the same and the two variables (gender and preference) would be independent.

When we have two quantitative variables, we assess their association using the correlation coefficient, r, which is a measure of the strength and direction of linear association. It summarizes the direction of the association via its sign (positive or negative) and the strength of the relationship by its magnitude (distance from zero). It always satisfies -1 ≤ r ≤ 1.

The correlation coefficient can be calculated from standardized values or from raw scores:

$$ r = \frac{1}{n-1} \sum z_x z_y = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) $$

For the cricket chirp data, r = 0.835, which indicates a strong, positive linear association between the number of chirps/sec and the temperature.

The correlation coefficient can't do anything beyond that. It doesn't predict anything and it doesn't explain anything; it only describes the direction and strength of the linear relationship. If you want to predict, you have to fit a regression equation, which is of the form

$$ \hat{y} = a + bx, $$

where y is the response variable (the thing you are predicting), x is the explanatory variable (the predictor), a is the y-intercept, and b is the slope.

For the cricket chirp data,

$$ \hat{y} = -0.309 + 0.2119x $$

b = slope = 0.2119 chirps/sec per Fahrenheit degree
r = 0.835

How are b and r similar? How do they differ?
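The standardized-value formula for r can be sketched in Python as follows. The cricket chirp observations are not reproduced in these notes, so the five (x, y) pairs below are made-up illustrative data, and the function name `correlation` is an assumption for this example.

```python
from statistics import mean, stdev  # stdev uses the n - 1 denominator


def correlation(xs, ys):
    """Correlation r as the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)


# Hypothetical data, NOT the cricket chirp data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(f"r = {r:.3f}")
```

For these made-up points r is about 0.775: the sign is positive because larger x values tend to go with larger y values, and the magnitude is fairly close to 1 because the points fall reasonably near a line.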
The slope b and the correlation r are both positive, which makes sense, because they both capture the direction of association between the two variables.

We use the least squares estimation equations to get a and b:

$$ b = r \frac{s_y}{s_x} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} $$

$$ a = \bar{y} - b\bar{x} $$

where s is the standard deviation (of x or y) and r is the correlation between x and y.

Definition: extrapolation occurs when one uses the regression equation to predict values of the response variable using values of the explanatory variable that fall outside the range of those used to create the regression line.

(See birthweights data set and plot.)

In the birthweights plot, the response variable (y) is the weight of a newborn baby (in grams) and the explanatory variable (x) is the weight of the mother at the time of the baby's conception (in pounds). The regression equation is:

$$ \hat{y} = 2369.672 + 4.429x $$

a = 2369.672 grams
b = 4.429 grams/lb

The y-intercept, by definition, is the value of y when x = 0. Therefore, we interpret it in this case as: a mother who weighs 0 lbs at the time she conceives her baby is expected to have a baby that weighs 2369.672 grams at birth.

This is, technically, a correct interpretation. It is also physically impossible. That is the danger of extrapolation. We didn't observe any mothers with a weight of 0 (in fact, we couldn't), so it's irresponsible to use the regression equation to predict that far away from the values we did collect.

This often happens with the y-intercept. But we have to include it in the equation (that is, we always estimate it) to get the best fit line for the
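The least squares formulas and the extrapolation warning above can be combined in a short Python sketch. The data points are made up for illustration (the birthweights data are not reproduced in these notes), and the helper names `fit_line` and `predict` are hypothetical. The guard simply treats any x outside the observed range as extrapolation.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least squares intercept a and slope b, using b = r * s_y / s_x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (
        (n - 1) * s_x * s_y
    )
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


def predict(a, b, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for a single x value."""
    is_extrapolation = not (x_min <= x <= x_max)
    return a + b * x, is_extrapolation


# Hypothetical data, NOT the birthweights data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(f"y_hat = {a:.3f} + {b:.3f} x")

# x = 0 lies outside the observed range [1, 5], so the intercept is itself
# an extrapolated prediction here.
y_hat, flagged = predict(a, b, 0, min(xs), max(xs))
print(f"prediction at x = 0: {y_hat:.3f} (extrapolation: {flagged})")
```

Returning the flag alongside the prediction, rather than refusing to predict, mirrors the point in the notes: the intercept is still estimated and still part of the equation, but a prediction at x = 0 should be read with caution.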

