STAT 10, UCLA, Ivo Dinov

Slide 1
UCLA STAT 10: Introduction to Statistical Reasoning
Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
Teaching Assistants: Yan Xiong, Will Anderson, UCLA Statistics
University of California, Los Angeles, Winter 2002
http://www.stat.ucla.edu/~dinov/

Slide 2
Chapters 7-10: Lines in 2D (Regression and Correlation)
Vertical lines
Horizontal lines
Oblique lines
Increasing/Decreasing
Slope of a line
Intercept
Y = αX + β, in general.
Math Equation for the Line?

Slide 3
Chapters 7-10: Lines in 2D (Regression and Correlation)
Draw the following lines:
Y = 2X + 1
Y = -3X - 5
Line through (X1, Y1) and (X2, Y2): (Y - Y1)/(Y2 - Y1) = (X - X1)/(X2 - X1).
Math Equation for the Line?

Slide 4
Approaches for modeling data relationships: Regression and Correlation
There are random and nonrandom variables.
Correlation applies if both variables (X/Y) are random (e.g., the earlier example of systolic vs. diastolic blood pressure, SISVOL/DIAVOL) and are treated symmetrically.
Regression applies when you want to single out one of the variables (response variable, Y) and use the other variable as a predictor (explanatory variable, X), which explains the behavior of the response variable, Y.

Slide 5
Causal relationship? Infant death rate (per 1,000) in 14 countries
[Figure: scatter plots for 14 countries; axis labels: % Breast feeding at 6 months, % Access to safe water]
Predict the behavior of Y (response) based on the values of X (explanatory variable).
Strategies for uncovering the reasons (causes) for an observed effect.
Strong evidence (linear pattern) of death rate increasing with increasing level of breastfeeding (BF)?
Naive conclusion: breast feeding is bad?
But high rates of BF are associated with lower access to safe water.

Slide 6
Regression relationship = trend + residual scatter
[Figure (a) Sales/income; horizontal axis: Disposable income ($)]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.
Regression is a way of studying relationships between variables (random/nonrandom) for predicting or explaining the behavior of one variable (response) in terms of others (explanatory variables or predictors).

Slide 7
[Figure (b) Oxygen uptake; horizontal axis: Ventilation]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.
Trend (does not have to be linear) + scatter (could be of any type/distribution)

Slide 8
[Figure (c) Liver lengths; horizontal axis: Gestational age (wk)]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.
Trend + scatter (fetus liver length in mm)
Change of scatter with age

Slide 9
Trend + scatter
Dotted curves (confidence intervals) represent the extent of the scatter.
[Figure 3.1.7: Displacement versus weight for 74 models of automobile; panels: (a) Scatter plot, (b) With trend plus scatter; horizontal axis: Weight (lbs); outliers marked]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Slide 10
Looking vertically
The flatter line gives better predictions, since it goes approximately through the middle of the Y-range for each fixed x-value (vertical line).
[Figure 3.1.8: Educating the eye to look vertically; panels: (a) Which line? (b) Flatter line gives better predictions; axes: x, y]
From Chance Encounters by C.J. Wild and G.A.F.
Seber, © John Wiley & Sons, 2000.

Slide 11
Outliers: odd, atypical observations (errors, B, or real data, A)
[Figure 3.1.9: Scatter plot from the heart attack data; horizontal axis: Diastolic volume; points A and B marked]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Slide 12
A weak relationship
58 abused children are rated (by non-abusive parents and teachers) on a psychological disturbance measure.
How do we quantify a weak vs. a strong relationship?
[Figure 3.1.10: Parent's rating versus teacher's rating for abused children]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Slide 13
A note of caution!
In observational data, strong relationships are not necessarily causal. It is virtually impossible to conclude a cause-and-effect relationship between variables using observational data!

Slide 14
Essential Points
1. What essential difference is there between the correlation and regression approaches to a relationship between two variables? (In correlation, neither variable is singled out as depending on the other and the two are treated symmetrically; in regression, the response variable depends on the explanatory variable.)
2. What are the most common reasons why people fit regression models to data? (To predict Y, or to unravel the reasons/causes of its behavior.)
3. Can you conclude that changes in X caused the changes in Y seen in a scatter plot if you have data from an observational study? (No. There could be lurking variables, i.e., hidden effects/predictors that are also associated with the predictor X itself; time is often a lurking variable. It may also be that changes in Y cause changes in X, instead of the other way around.)

Slide 15
Essential Points
5. When can you reliably conclude that changes in X cause the changes in Y?
(Only when controlled randomized experiments are used: levels of X are randomly assigned to the available experimental units, or experimental conditions, including time, are identical across the different levels of X.)

Slide 16
Correlation Coefficient
Correlation coefficient (-1 <= R <= 1): a measure of linear association, or clustering around a line, of multivariate data.
The relationship between two variables (X, Y) can be summarized by (µX, σX), (µY, σY) and the correlation coefficient, R.
R = 1: perfect positive correlation (straight-line relationship).
R = 0: no correlation (random cloud scatter).
R = -1: perfect negative correlation.
Computing R(X,Y): (standardize, multiply, average)

$$R(X,Y) = \frac{1}{N-1} \sum_{k=1}^{N} \left(\frac{x_k - \mu_X}{\sigma_X}\right) \left(\frac{y_k - \mu_Y}{\sigma_Y}\right)$$

where X = {x1, x2, …, xN}, Y = {y1, y2, …, yN}, and (µX, σX), (µY, σY) are the sample means and SDs.
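The "standardize, multiply, average" recipe for R(X,Y) can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function names (`mean`, `sample_sd`, `correlation`) are hypothetical, and it assumes the sample SD and the final average both use the N-1 divisor. The example data reuse the two lines from Slide 3, Y = 2X + 1 and Y = -3X - 5, evaluated at a few arbitrary X values.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (assumed N-1 divisor)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation coefficient R(X, Y): standardize, multiply, average."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    if sx == 0 or sy == 0:
        raise ValueError("R is undefined when a variable has zero spread")
    # Step 1: standardize each observation.
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    # Steps 2 and 3: multiply pairwise, then average with the N-1 divisor.
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]       # arbitrary X values, for illustration only
ys_up = [2 * x + 1 for x in xs]      # points on Y = 2X + 1 (increasing line)
ys_down = [-3 * x - 5 for x in xs]   # points on Y = -3X - 5 (decreasing line)

print(correlation(xs, ys_up))
print(correlation(xs, ys_down))
```

Because both data sets lie exactly on a line, the two printed values should be 1 and -1 up to floating-point rounding, matching the "perfect positive" and "perfect negative" cases on the slide. The size of the slope (2 versus -3) does not affect R; only its sign does.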