STAT 13, UCLA, Ivo Dinov

Slide 1
UCLA STAT 13
Introduction to Statistical Methods
- Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
- Teaching Assistants: Tom Daula and Kaiding Zhu, UCLA Statistics
University of California, Los Angeles, Fall 2002
http://www.stat.ucla.edu/~dinov/

Slide 2
Chapter 12: Lines in 2D (Regression and Correlation)
Math equation for the line?
- Vertical lines
- Horizontal lines
- Oblique lines
- Increasing/Decreasing
- Slope of a line
- Intercept
- Y = αX + β, in general.

Slide 3
Chapter 12: Lines in 2D (Regression and Correlation)
Math equation for the line?
- Draw the following lines:
- Y = 2X + 1
- Y = -3X - 5
- Line through (X1, Y1) and (X2, Y2):
- (Y - Y1)/(Y2 - Y1) = (X - X1)/(X2 - X1).

Slide 4
Approaches for modeling data relationships: Regression and Correlation
- There are random and nonrandom variables.
- Correlation applies if both variables (X/Y) are random (e.g., the previous example of systolic vs. diastolic blood pressure, SISVOL/DIAVOL) and are treated symmetrically.
- Regression applies when you want to single out one of the variables (the response variable, Y) and use the other variable as a predictor (explanatory variable, X), which explains the behavior of the response variable, Y.

Slide 5
Causal relationship? Infant death rate (per 1,000) in 14 countries
[Figure: scatterplots; axes include % breast feeding at 6 months and % access to safe water.]
Predict the behavior of Y (response) based on the values of X (explanatory var.). Strategies for uncovering the reasons (causes) for an observed effect.
Strong evidence (linear pattern) of death rate increase with increasing level of breastfeeding (BF)? Naïve conclusion: breast feeding is bad?
But high rates of BF are associated with lower access to safe water (H2O).

Slide 6
Regression relationship = trend + residual scatter
[Figure: panel (a) Sales/income, plotted against disposable income ($), axis range 9000 to 12000.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.
- Regression is a way of studying relationships between variables (random/nonrandom) for predicting or explaining the behavior of one variable (the response) in terms of others (explanatory variables or predictors).

Slide 10
Looking vertically
[Figure 3.1.8: Educating the eye to look vertically. (a) Which line? (b) Flatter line gives better predictions.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
The flatter line gives better predictions, since it goes approximately through the middle of the Y-range for each fixed x-value (vertical line).

Slide 16
Correlation Coefficient
The correlation coefficient R (-1 <= R <= 1) is a measure of linear association, or clustering around a line, of multivariate data. The relationship between two variables (X, Y) can be summarized by (µX, σX), (µY, σY) and the correlation coefficient, R.
- R = 1: perfect positive correlation (straight-line relationship)
- R = 0: no correlation (random cloud scatter)
- R = -1: perfect negative correlation
Computing R(X,Y): standardize, multiply, average.
\[ R(X,Y) = \frac{1}{N-1} \sum_{k=1}^{N} \left( \frac{x_k - \mu_X}{\sigma_X} \right) \left( \frac{y_k - \mu_Y}{\sigma_Y} \right) \]
Here X = {x1, x2, …, xN}, Y = {y1, y2, …, yN}, and (µX, σX), (µY, σY) are the sample means and SDs.
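The "standardize, multiply, average" recipe for R(X,Y) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `correlation` and the height/weight values are hypothetical, and the sample SD is assumed to use the N-1 divisor, matching the 1/(N-1) factor in the formula.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation R(X, Y): standardize, multiply, average with 1/(N-1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample SDs (N-1 divisor)
    # Step 1: standardize each observation.
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    # Steps 2 and 3: multiply the paired z-scores, then average with 1/(N-1).
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical data (made up for illustration, not taken from the slides).
heights_cm = [150, 155, 160, 165, 170, 175]
weights_kg = [48, 52, 53, 58, 60, 66]

r = correlation(heights_cm, weights_kg)
print(round(r, 3))
```

Swapping the two arguments, or rescaling one of them by a positive constant, should leave the result unchanged up to floating-point rounding.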

Slide 17
Correlation Coefficient Example:
\[ R(X,Y) = \frac{1}{N-1} \sum_{k=1}^{N} \left( \frac{x_k - \mu_X}{\sigma_X} \right) \left( \frac{y_k - \mu_Y}{\sigma_Y} \right) \]

Slide 18
Correlation Coefficient Example:
\[ \mu_X = \frac{966}{6} = 161 \text{ cm}, \quad \mu_Y = \frac{332}{6} \approx 55 \text{ kg}, \]
\[ \sigma_X = \sqrt{\frac{216}{5}} \approx 6.573, \quad \sigma_Y = \sqrt{\frac{215.3}{5}} \approx 6.563, \]
\[ \mathrm{Corr}(X,Y) = R(X,Y) = 0.904 \]

Slide 19
Correlation Coefficient - Properties
Correlation is invariant w.r.t. linear transformations of X or Y (with positive scale factors a, c > 0):
\[ R(X,Y) = \frac{1}{N-1} \sum_{k=1}^{N} \left( \frac{x_k - \mu_X}{\sigma_X} \right) \left( \frac{y_k - \mu_Y}{\sigma_Y} \right) = R(aX+b,\; cY+d), \]
since
\[ \frac{(a x_k + b) - \mu_{aX+b}}{\sigma_{aX+b}} = \frac{a x_k + b - (a \mu_X + b)}{a \times \sigma_X} = \frac{a (x_k - \mu_X)}{a \times \sigma_X} = \frac{x_k - \mu_X}{\sigma_X}. \]
In general \(\sigma_{aX+b} = |a| \sigma_X\), so the sign of R flips when a and c have opposite signs.

Slide 20
Correlation Coefficient - Properties
Correlation is symmetric:
\[ R(X,Y) = \frac{1}{N-1} \sum_{k=1}^{N} \left( \frac{x_k - \mu_X}{\sigma_X} \right) \left( \frac{y_k - \mu_Y}{\sigma_Y} \right) = R(Y,X) \]
Correlation measures linear association, NOT association in general! So Corr(X,Y) could be misleading for X and Y related in a non-linear fashion.

Slide 21
Correlation Coefficient - Properties
1. R measures the extent of linear association between two continuous variables.
2. Association does not imply causation: both variables may be affected by a third variable (age was a confounding variable).

Slide 22
Essential Points
6. If the experimenter has control of the levels of X used, how should these levels be allocated to the available experimental units? At random! Example: testing the hardness of concrete, Y, based on the level of cement, X, incorporated.
Factors affecting Y: amount of H2O, ratio of stone chips to sand, drying conditions, etc. To prevent uncontrolled differences between batches of concrete from confounding our impression of cement effects, we should choose at random which batch (H2O level, sand, drying conditions) gets what amount of cement! Then investigate X-effects in the Y observations. If some significance test indicates observed trend is significantly different from a random
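
The random allocation of cement levels to concrete batches described above can be sketched as follows. This is only an illustration under stated assumptions: the helper `allocate_at_random`, the batch labels, the three cement levels, and the balanced design (equal number of batches per level) are all hypothetical and do not come from the slides.

```python
import random


def allocate_at_random(units, levels, seed=None):
    """Randomly assign treatment levels to experimental units, equal counts per level."""
    if len(units) % len(levels) != 0:
        raise ValueError("number of units must be a multiple of the number of levels")
    reps = len(units) // len(levels)
    assignment = list(levels) * reps   # each level repeated equally often
    rng = random.Random(seed)          # seed only to make the plan reproducible
    rng.shuffle(assignment)            # the randomization step
    return dict(zip(units, assignment))


# Hypothetical experiment: 12 batches, 3 cement levels (percent by weight, made up).
batches = [f"batch_{i}" for i in range(1, 13)]
cement_levels = [10, 15, 20]

plan = allocate_at_random(batches, cement_levels, seed=2002)
for batch, level in plan.items():
    print(batch, level)
```

Because the shuffle ignores water content, sand ratio, and drying conditions, those uncontrolled factors are not systematically tied to any one cement level.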