UCLA STATS 10 - Regression Analysis Part2

Lecture 6 - Chapter 4, Part 2 (4/21/2012)
Regression Analysis: Exploring Association Between Numeric Variables

Review of Last Time
- A scatterplot is a graphical display of two numeric variables.
- Correlation is the numeric summary of the association between two numeric variables.

Modeling Statistical Trends
- Using two variables (such as x and y), a tool for graphical examination (such as a scatterplot), and a measure of association (such as a correlation), we can ask whether one variable can be used to predict another.
- Models are equations that summarize the trend in the data.

The Regression Line
- The regression line is used to make predictions.
- The regression line is a way to summarize a linear relationship: by fitting a line to the data, we can use the line to make predictions.
- The simplest form is a straight line with an intercept and a slope. The equation resembles the algebra equation y = mx + b, where y is the outcome/predicted variable, m is the slope, and b is the intercept. In statistics we rewrite this as y = a + bx, where y is the predicted variable, a is the intercept, and b is the slope.
  - y is called the predicted variable because the line generates PREDICTIONS of its values, not the actual values. These predictions come close to the real-world numbers but are not necessarily exact; there is some error.

The Least Squares Line
- The regression line is the "best fit" for the data, where best fit means minimizing the average of the squared vertical distances between the points and the line.
- This is only useful when a linear model is appropriate for the data.

Finding the Best Fit
- This is usually done by computer.
- First find the slope b, which is the correlation coefficient times the ratio of the two variables' standard deviations:
  b = r * (sy / sx)
  where b is the slope, r is the correlation coefficient, and sy and sx are the standard deviations of y and x.
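The slope and intercept formulas above can be sketched in Python. The data here are made up purely for illustration, and the hand-computed coefficients are checked against NumPy's own least-squares fit:

```python
import numpy as np

# Small made-up dataset (hypothetical values, for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Correlation coefficient r and the standard deviations of x and y.
r = np.corrcoef(x, y)[0, 1]
sx = x.std(ddof=1)
sy = y.std(ddof=1)

# Slope: b = r * sy / sx
b = r * sy / sx

# Intercept: a = y-bar - b * x-bar
a = y.mean() - b * x.mean()

# np.polyfit fits the same least-squares line directly (slope first).
b_check, a_check = np.polyfit(x, y, 1)
print(b, a)              # slope and intercept from the formulas
print(b_check, a_check)  # should agree with the values above
```

This matches the note that the computation is "usually done by the computer": in practice one calls a fitting routine, but the formulas show where the numbers come from.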
  - If the correlation coefficient is near 0, then the slope will also be near 0.
- Next, find the intercept a using the value you calculated for the slope b:
  a = ybar - b * xbar
  where a is the intercept, ybar is the mean of the y values, b is the slope, and xbar is the mean of the x values.
- Finally, put the calculated values into the equation for the predicted value:
  predicted y = a + bx

Interpreting the Slope
- If x increases, y is also predicted to increase by the value of the slope, on average; i.e., for each one-unit increase in x, we predict y to change by the value of the slope, on average.
- If the slope is negative, y is predicted to decrease as x increases.
- The slope can only be interpreted this way when a linear model fits the data.

Interpreting the Intercept
- The y-intercept is the predicted value of y when x is 0.
- The y-intercept is used to interpret the data only when:
  - it makes sense to have a value of 0 for x;
  - the y-intercept value is meaningful (for example, if the outcome is an age, a negative intercept is obviously not meaningful);
  - the data include x values equal to or close to 0.

More Regression
- The equation changes if x and y are switched (because the standard deviations are different; if they are the same, switching has no effect on the slope).
- Statistics language:
  - The x variable is known as the "explanatory," "predictor," "treatment," or "independent" variable.
  - The y variable is the "predicted," "outcome," or "dependent" variable.
  - It is important to know these terms because in statistics it is important to understand phrasing.

More Language
- In statistics, the phrasing "please predict a woman's BMI from her waist-hip ratio (WHR)" tells us that the y variable is BMI and the x variable is WHR.

Evaluating the Model
- General rule: do not fit a linear model to non-linear data. If there is not a clear line, don't try to find one.

Outliers
- Outliers have a strong effect on both the correlation and the equation of the regression line.
- A point with a strong effect on the regression line is known as an influential point.
- If there is an influential point, perform the regression analysis both with and without that point and compare.

Aggregate Data vs. Non-Aggregate Data
- Aggregate data in regression means that each point represents the mean of a group of y values.
- Aggregating eliminates a lot of variability. That is not necessarily bad, but it is important to acknowledge that by summarizing or averaging data in this way, some variability is lost.

Beware of Extrapolation
- Only use the regression line to predict y values for x values that are within the linear range of the data.
- The range is the maximum minus the minimum value; the x value you predict for must be within the observed range.
- Predicting for x values outside this range can produce nonsensical results.

Coefficient of Determination: r^2 (r squared)
- r squared measures how much of the variation in y (the response variable) can be accounted for by x (the explanatory variable).
- r squared helps determine which explanatory variable x would be best at making predictions about y.
- We can only account for SOME of the variation in y; the rest remains unexplained.
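The r-squared idea can be sketched numerically (again with made-up data): the square of the correlation coefficient equals the fraction of the variation in y that the least-squares line accounts for, which is one minus the ratio of residual variation to total variation.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2  # fraction of the variation in y accounted for by x

# Equivalent check from the fitted line:
# r^2 = 1 - (sum of squared residuals) / (total sum of squares about the mean)
b, a = np.polyfit(x, y, 1)   # slope, intercept
y_hat = a + b * x            # predicted values
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared_from_fit = 1 - ss_res / ss_tot

print(r_squared, r_squared_from_fit)  # the two values should agree
```

Because r_squared is below 1 for real data, some variation in y is always left unexplained, which is the point the notes close on.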

