Stat 13, UCLA, Ivo Dinov

Slide 1
UCLA STAT 13: Introduction to Statistical Methods for the Life and Health Sciences
Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology
Teaching Assistants: Fred Phoa, Kirsten Johnson, Ming Zheng & Matilda Hsieh
University of California, Los Angeles, Fall 2005
http://www.stat.ucla.edu/~dinov/courses_students.html

Slide 2
Chapter 13: Regression & Correlation

Slide 3: Linear Relationships
- Analyze the relationship, if any, between variables x and y by fitting a straight line to the data.
  If a relationship exists, we can use our analysis to make predictions.
- Data for regression consist of (x, y) pairs for each observation.
  For example: the height and weight of individuals.

Slide 4: Lines in 2D (Regression and Correlation)
- Vertical lines
- Horizontal lines
- Oblique lines
- Increasing/decreasing
- Slope of a line
- Intercept
Math equation for the line? Y = αX + β, in general.

Slide 5: Lines in 2D (Regression and Correlation)
Draw the following lines:
- Y = 2X + 1
- Y = -3X - 5
- Line through (X1, Y1) and (X2, Y2): (Y - Y1)/(Y2 - Y1) = (X - X1)/(X2 - X1).
Math equation for the line?

Slide 6: Correlation Coefficient
Correlation coefficient (-1 <= R <= 1): a measure of linear association, or clustering around a line, of multivariate data.
The relationship between two variables (X, Y) can be summarized by (µX, σX), (µY, σY) and the correlation coefficient, R.
- R = 1: perfect positive correlation (straight-line relationship)
- R = 0: no correlation (random cloud scatter)
- R = -1: perfect negative correlation
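As a rough check on the two extreme cases, the sketch below computes R for points lying exactly on the lines Y = 2X + 1 and Y = -3X - 5 from Slide 5. It assumes the sample SD with an N - 1 divisor and averages the standardized products with the same divisor. The function name `correlation` and the x-values are illustrative choices, not part of the course materials.

```python
import math


def correlation(xs, ys):
    """Sample correlation: standardize each variable, multiply, then average.

    Assumption: sample SDs and the final average both use the N - 1 divisor.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)


# Hypothetical x-values; the y-values lie exactly on the Slide 5 lines.
xs = [0, 1, 2, 3, 4]
r_up = correlation(xs, [2 * x + 1 for x in xs])     # increasing line
r_down = correlation(xs, [-3 * x - 5 for x in xs])  # decreasing line

print("R for Y = 2X + 1:  ", round(r_up, 6))
print("R for Y = -3X - 5: ", round(r_down, 6))
```

Points on an increasing line should give R = 1 and points on a decreasing line R = -1, up to floating-point rounding. Real data scattered around a line fall strictly between these extremes.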
Computing R(X,Y): (standardize, multiply, average)

$$R(X,Y) = \frac{1}{N-1} \sum_{k=1}^{N} \left(\frac{x_k - \mu_X}{\sigma_X}\right) \left(\frac{y_k - \mu_Y}{\sigma_Y}\right)$$

where X = {x1, x2, …, xN}, Y = {y1, y2, …, yN}, and (µX, σX), (µY, σY) are the sample means / SDs.

Slide 7: Correlation Coefficient
Example: R(X,Y) is computed with the formula from Slide 6.

Slide 8: Correlation Coefficient
Example:

$$\mu_X = \frac{966}{6} = 161\ \text{cm}, \qquad \mu_Y = \frac{332}{6} \approx 55\ \text{kg}$$

$$\sigma_X = \sqrt{\frac{216}{5}} = 6.573, \qquad \sigma_Y = \sqrt{\frac{215.3}{5}} = 6.563$$

$$\mathrm{Corr}(X,Y) = R(X,Y) = 0.904$$

Slide 9: Correlation Coefficient - Properties
Correlation is invariant w.r.t. linear transformations of X or Y with positive scale factors:

$$R(aX+b,\ cY+d) = R(X,Y) \quad \text{for } a, c > 0,$$

since

$$\frac{(a x_k + b) - \mu_{aX+b}}{\sigma_{aX+b}} = \frac{a x_k + b - (a\mu_X + b)}{|a|\,\sigma_X} = \frac{a\,(x_k - \mu_X)}{|a|\,\sigma_X} = \frac{x_k - \mu_X}{\sigma_X}.$$

For a negative scale factor the standardized value changes sign, so in general R(aX+b, cY+d) = sign(ac) R(X,Y).

Slide 10: Correlation Coefficient - Properties
Correlation is symmetric:

$$R(X,Y) = \frac{1}{N-1} \sum_{k=1}^{N} \left(\frac{x_k - \mu_X}{\sigma_X}\right) \left(\frac{y_k - \mu_Y}{\sigma_Y}\right) = R(Y,X)$$

Correlation measures linear association, NOT association in general! So Corr(X,Y) can be misleading when X and Y are related in a non-linear fashion.

Slide 11: Correlation Coefficient - Properties
1. R measures the extent of linear association between two continuous variables.
2. Association does not imply causation: both variables may be affected by a third variable; age was a confounding variable.

Slide 12: Linear Relationships
Example: The data below are airfares ($) and distances (miles) to various US cities from Baltimore, Maryland.

Destination    Distance (miles)   Airfare ($)
Atlanta              576              178
Boston               370              138
Chicago              612               94
Dallas              1216              278
Detroit              409              158
Denver              1502              258
Miami                946              198
New Orleans          998              188
New York             189               98
Orlando              787              179
Pittsburgh           210              138
St. Louis            737               98

Slide 13: Linear Relationships
- Until now we have described data using statistics such as the sample mean.
- What seems to be missing from this one-sample view of the data?

Descriptive Statistics: Distance, Airfare

Variable    N   N*   Mean    SE Mean   StDev   Minimum   Q1      Median   Q3      Maximum
Distance    12  0    713     116       403     189       380     675      985     1502
Airfare     12  0    166.9   17.2      59.5    94.0      108.0   168.0    195.5   278.0

Slide 14: Linear Relationships
- This scatterplot gives us a view of how the dependent variable, airfare (y), changes with the independent variable, distance (x).
- From these data there appears to be a linear trend, but the data do not fall on an exact straight line.
  It may still be reasonable to fit a line to these data.

[Figure: Scatterplot of Airfare vs Distance; x-axis: Distance, y-axis: Airfare]

Slide 15: Linear Relationships
- Two contexts for regression:
1. Y is an observed variable and X is specified by the researcher.
   Ex. Y is hair growth after 2 months, for individuals at certain dose levels of hair growth cream (X).
2. X and Y are observed variables.
   Ex.
Height (Y) and weight (X) for 20 randomly selected individuals

Slide 16: The Fitted Regression Line
- Suppose we have n pairs (x, y).
- If a scatterplot of the data suggests a general linear trend, it would be reasonable to fit a line to the data.
- The question is: which is the best line?
Example: Airfare (cont'd)
  We can see from the scatterplot that greater distance is associated with higher airfare.
  In other words, airports farther from Baltimore tend to have more expensive airfares.
- To decide on the best-fitting line, we use the least-squares method to fit the least-squares (regression) line.

Slide 17: Equation of the Regression Line
- RECALL: y = mx + b
- In statistics we call this Y = b0 + b1 X, where
    Y is the dependent variable
    X is the independent variable
    b0 is the y-intercept
    b1 is the slope of the line

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

Slide 18: LS Estimates for the Linear Parameters

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

1. The least-squares line passes through the points (x = 0, ŷ = ?) and (x = x̄, ŷ = ?). Supply the missing values.

Slide 19: Hands-on worksheet!
1. X = {-1, 2, 3, 4}, Y = {0, -1, 1, 2}

  X     Y    x - x̄   y - ȳ   (x - x̄)^2   (y - ȳ)^2   (x - x̄)(y - ȳ)
 -1     0
  2    -1
  3     1
  4     2

Slide 20: Hands-on worksheet!
1. X = {-1, 2, 3, 4}, Y = {0, -1, 1, 2}; x̄ = 2, ȳ = 0.5

  X     Y    x - x̄   y - ȳ   (x - x̄)^2   (y - ȳ)^2   (x - x̄)(y - ȳ)
 -1     0     -3     -0.5       9          0.25          1.5
  2    -1      0     -1.5       0          2.25          0
  3     1      1      0.5       1          0.25          0.5
  4     2      2      1.5       4          2.25          3
Mean:   2    0.5            Sum: 14
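The worksheet columns can be reproduced with a short script. This is a minimal sketch of the least-squares formulas from Slide 18 applied to the worksheet data; the function name `least_squares_line` is an illustrative choice, not part of the course materials.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations and sum of squared x-deviations,
    # i.e. the totals of the last and third-to-last worksheet columns.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Worksheet data from Slides 19 and 20.
xs = [-1, 2, 3, 4]
ys = [0, -1, 1, 2]

b0, b1 = least_squares_line(xs, ys)
x_bar = sum(xs) / len(xs)

print("slope b1:", b1)
print("intercept b0:", b0)
print("fitted value at x = 0:", b0)
print("fitted value at x = x-bar:", b0 + b1 * x_bar)
```

For these data the cross-product column sums to 5 and the squared x-deviation column to 14, so the slope is 5/14 (about 0.357) and the intercept is 0.5 - 2(5/14) (about -0.214). The last two printed values bear on the Slide 18 question: at x = 0 the line gives the intercept, and at x = x̄ it gives ȳ = 0.5, because the intercept is defined as ȳ minus slope times x̄.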