UCLA STATS 10 - Ch07

STAT 10, UCLA, Ivo Dinov

Slide 1
UCLA STAT 10
Introduction to Statistical Reasoning
Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
Teaching Assistants: Yan Xiong, Will Anderson
UCLA Statistics
University of California, Los Angeles, Winter 2002
http://www.stat.ucla.edu/~dinov/

Slide 2
Chapters 7-10: Lines in 2D (Regression and Correlation)
Vertical lines
Horizontal lines
Oblique lines
Increasing/decreasing
Slope of a line
Intercept
Math equation for the line? Y = αX + β, in general.

Slide 3
Chapters 7-10: Lines in 2D (Regression and Correlation)
Draw the following lines:
Y = 2X + 1
Y = -3X - 5
Line through (X1, Y1) and (X2, Y2).
Math equation for the line? (Y - Y1)/(Y2 - Y1) = (X - X1)/(X2 - X1).

Slide 4
Approaches for modeling data relationships: Regression and Correlation
There are random and nonrandom variables.
Correlation applies if both variables (X/Y) are random and are treated symmetrically (e.g., the earlier example of systolic vs. diastolic blood pressure, SISVOL/DIAVOL).
Regression applies when you want to single out one of the variables (the response variable, Y) and use the other variable as a predictor (the explanatory variable, X), which explains the behavior of the response variable, Y.

Slide 5
Causal relationship? Infant death rate (per 1,000) in 14 countries
[Figure: scatter plots; axis labels include "% Breast feeding at 6 months" and "% Access to safe water".]
Predict the behavior of Y (response) based on the values of X (explanatory var.).
Strategies for uncovering the reasons (causes) for an observed effect.
Strong evidence (linear pattern) of death rate increasing with increasing level of breast feeding (BF)?
Naïve conclusion: breast feeding is bad?
But high rates of BF are associated with lower access to safe water (H2O).

Slide 6
Regression relationship = trend + residual scatter
[Figure: (a) Sales/income; x-axis: Disposable income ($).]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.
Regression is a way of studying relationships between variables (random/nonrandom) for predicting or explaining the behavior of one variable (the response) in terms of others (explanatory variables or predictors).

Slide 7
[Figure: (b) Oxygen uptake; x-axis: Ventilation.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.
Trend (does not have to be linear) + scatter (could be of any type/distribution).

Slide 8
[Figure: (c) Liver lengths; x-axis: Gestational age (wk).]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.
Trend + scatter (fetus liver length in mm).
The scatter changes with age.

Slide 9
Trend + scatter
Dotted curves (confidence intervals) represent the extent of the scatter.
[Figure 3.1.7: Displacement versus weight for 74 models of automobile. Panels: (a) Scatter plot, (b) With trend plus scatter; x-axis: Weight (lbs); outliers marked.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Slide 10
Looking vertically
The flatter line gives better predictions, since it goes approximately through the middle of the Y-range for each fixed x-value (vertical line).
[Figure 3.1.8: Educating the eye to look vertically. Panels: (a) Which line? (b) Flatter line gives better predictions.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Slide 11
Outliers: odd, atypical observations (errors, B, or real data, A)
[Figure 3.1.9: Scatter plot from the heart attack data; x-axis: Diastolic volume; points A and B marked.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Slide 12
A weak relationship
58 abused children are rated (by non-abusive parents and teachers) on a psychological disturbance measure.
How do we quantify a weak vs. a strong relationship?
[Figure 3.1.10: Parent's rating versus teacher's rating for abused children; x-axis: Parent's rating.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.

Slide 13
A note of caution!
In observational data, strong relationships are not necessarily causal. It is virtually impossible to conclude a cause-and-effect relationship between variables using observational data!

Slide 14
Essential Points
1. What essential difference is there between the correlation and regression approaches to a relationship between two variables? (In correlation the variables are treated symmetrically, neither depending on the other; in regression the response variable depends on the explanatory variable.)
2. What are the most common reasons why people fit regression models to data? (To predict Y, or to unravel the reasons/causes of its behavior.)
3. Can you conclude that changes in X caused the changes in Y seen in a scatter plot if you have data from an observational study? (No. There could be lurking variables, i.e., hidden effects/predictors that are also associated with the predictor X itself; time is often a lurking variable. It may also be that changes in Y cause changes in X, instead of the other way around.)

Slide 15
Essential Points
5. When can you reliably conclude that changes in X cause the changes in Y? (Only when controlled randomized experiments are used: levels of X are randomly assigned to the available experimental units, or the experimental conditions, including time, are identical for the different levels of X.)

Slide 16
Correlation Coefficient
Correlation coefficient (-1 <= R <= 1): a measure of linear association, or clustering around a line, of multivariate data.
The relationship between two variables (X, Y) can be summarized by (µX, σX), (µY, σY) and the correlation coefficient, R.
R = 1: perfect positive correlation (straight-line relationship).
R = 0: no correlation (random cloud scatter).
R = -1: perfect negative correlation.
Computing R(X,Y) (standardize, multiply, average):
R(X,Y) = \frac{1}{N-1} \sum_{k=1}^{N} \left( \frac{x_k - \mu_X}{\sigma_X} \right) \left( \frac{y_k - \mu_Y}{\sigma_Y} \right)
where X = {x1, x2, ..., xN}, Y = {y1, y2, ..., yN}, and (µX, σX), (µY, σY) are the sample means and SDs.
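
The "standardize, multiply, average" recipe for R can be sketched in a few lines of Python. This is only an illustration, not code from the course: the function name `correlation` is hypothetical, and it assumes the sample SD (divisor N-1) paired with an average over N-1 terms, which is one common convention. The test points are taken from the two lines on Slide 3, so R should come out very close to 1 and -1.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient R(X, Y) via standardize, multiply, average.

    Assumption (not stated in the slides): sample SDs with divisor N-1,
    and the products are averaged over N-1 as well.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")

    mu_x, mu_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)  # sample standard deviations
    if sd_x == 0 or sd_y == 0:
        raise ValueError("R is undefined when one variable is constant")

    # Step 1: standardize each observation.
    zx = [(x - mu_x) / sd_x for x in xs]
    zy = [(y - mu_y) / sd_y for y in ys]

    # Steps 2 and 3: multiply the paired standardized values, then average.
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Points lying exactly on Y = 2X + 1 (perfect positive linear relationship).
xs = [1, 2, 3, 4]
r_up = correlation(xs, [2 * x + 1 for x in xs])

# Points lying exactly on Y = -3X - 5 (perfect negative linear relationship).
r_down = correlation(xs, [-3 * x - 5 for x in xs])

print(r_up, r_down)
```

Because R depends only on standardized values, changing the slope or intercept of an exact line changes nothing but the sign of R; only scatter around the line pulls R toward 0.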