VCU STAT 210 - Exam 3 Study Guide

STAT 210 1st Edition
Exam # 3 Study Guide
Lectures: 14-17

Lecture 14 (September 24)
- Causal Relationship
  o One variable causes a change in the other variable(s).
- Association
  o Two variables are related to one another, but one does not necessarily cause a change in the other.
- Independent Variable
  o X; the explanatory variable; a measurement variable that does not have any restraints and is used to explain Y.
- Dependent Variable
  o Y; the response variable; a measurement variable that measures the effect of X.
- Lurking Variables
  o Variables that affect the relationship between X and Y but are NOT among the variables being examined.
  o When a lurking variable exists, the effect of X on Y is said to be confounded with the effect of the lurking variable.
- When describing the relationship between the independent and dependent variables, you must specify the direction, form, and strength of the relationship. Doing so requires a graph and a numerical descriptor.
  o Direction: two variables are positively associated if small values of X are associated with small values of Y and large values of X with large values of Y, so there is an upward trend from left to right. Two variables are negatively associated if small values of X are associated with large values of Y and vice versa, so there is a downward trend from left to right.
  o Form: linear when the points fall close to a straight line. Quadratic when the points follow a parabola. Exponential when the points follow a curved pattern (exponential growth when the curve trends upward, exponential decay when it trends downward).
  o Strength: measures the amount of scatter around the linear trend; the closer the points fall to a straight line, the stronger the linear relationship between the two variables.
- Scatterplot
  o A graph that displays the relationship between two variables. X is plotted along the horizontal axis and Y along the vertical axis.
- Correlation Coefficient
  o A numerical measure of the direction and strength of the linear relationship between two variables.
  o Measures the amount of scatter around the regression line.
  o The population correlation coefficient (ρ) is a parameter, while the sample correlation coefficient (r) is a statistic.
  o The correlation coefficient is always between -1 and +1.
  o Negative r means a negative association between X and Y, while positive r means a positive association between X and Y.
  o r close to 0 means a very weak linear relationship.
  o The strength of the linear relationship increases as r moves away from 0 toward -1 or +1. If r is near -1 or +1, the linear relationship is strong. If r is exactly -1 or +1, the linear relationship is perfect and all points fall exactly on a straight line.
  o The correlation coefficient is affected by extreme values in either the X or Y direction.
  o The sign of the correlation coefficient depends on the sign of Sxy.
  o Xbar = mean of X data; Ybar = mean of Y data
  o Sx = standard deviation of X data; Sy = standard deviation of Y data

Lecture 15 (September 26)
- Correlation Coefficient
  o Sxx = Σx² − (Σx)²/n
  o Syy = Σy² − (Σy)²/n
  o Sxy = Σxy − (Σx)(Σy)/n
- Regression Line
  o The equation that best explains the relationship between X and Y.
- Equation of a Line
  o Y = intercept + slope(X)
     Intercept: predicted Y value when X = 0.
     Slope: amount of change in Y (increase/decrease) when X increases by 1 unit.
- Prediction Equation
  o Once the intercept and slope are determined, the equation of the line can be used to predict Y values given X values.
  o ŷ = intercept + slope(X)
     y is the observed dependent variable value.
     ŷ is the predicted dependent variable value.
     y − ŷ is called a residual.
     The goal is to have residuals be as small as possible.
- Know how to apply the method of least squares, which chooses the intercept and slope that minimize the sum of the squared residuals.

Lecture 16 (September 29)
- Regression Line Equation: Y = intercept + slope(X)
  o Can predict the value of Y for any value of X by substituting the value of X into the equation.
- Extrapolation
  o Predicting outside the range of the original X data. Should be avoided; the X value used for a prediction should fall within the range of the original X data.
- Residual is the vertical deviation of a data point from the regression line.
  o Residuals can be used to assess the quality of the regression line by making a residual plot. To do so:
     Compute the residual for each observation.
     Make a scatterplot with the independent variable X on the horizontal axis and the residuals on the vertical axis.
  o An ideal residual plot has points randomly scattered around 0 without any obvious pattern.
- Outliers are observations that are significantly smaller or larger than the majority of the data, or observations that fall within the range of the data in the horizontal direction but lie far from the regression line in the vertical direction (which produces a large residual).
- Influential Observation
  o Observations that stand out from the other observations in the horizontal (X) direction. They usually have a large influence on the position of the regression line.
- Coefficient of Determination
  o Measures the proportion/fraction of the total variation in the Y values that can be explained using the X values. Want the coefficient of determination to be as large as possible.
  o r² = Sxy² / (Sxx · Syy)
     r² will always be between 0 and +1.
     Close to 1 implies X explains most of the variation in Y, so the regression line predicts Y values from X well.
     Close to 0 implies the regression line is of little use for prediction.

Lecture 17 (October 1)
- Qualitative/Categorical Variables
  o Variables which vary in name but not magnitude, meaning that they cannot be ranked.
     Can only name the categories and count the number of observations falling in each category.
- Categorical data displayed in a two-way table
  o Make sure you understand how to set up these types of tables.
- Marginal Distribution
  o Lists the categories of the variable with the frequency (count) or relative frequency (percentage) of observations in each category.
- Conditional Distribution
  o Distribution conditioned on the category of variable 1.
  o For a specific category of variable 1, calculate the conditional distribution of the other variable; this can be done for each category of variable 1 and can be expressed in frequencies (counts) or relative frequencies (percentages).
  o If the conditional distributions of variable 2 are nearly the same for each category of variable 1, there is no association between the two variables. If the conditional distributions of variable 2 differ significantly across the categories of variable 1, there is an association between the two variables.
- Simpson's Paradox
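
As a rough illustration of the marginal and conditional distributions described under Lecture 17, the sketch below works from a small two-way table. The counts, the category labels, and the function names are all hypothetical, made up for this example and not taken from the course. Rows stand for the categories of variable 1 and columns for the categories of variable 2.

```python
# Hypothetical two-way table (counts invented for illustration only).
# Outer keys: categories of variable 1. Inner keys: categories of variable 2.
table = {
    "Group A": {"Yes": 30, "No": 20},
    "Group B": {"Yes": 15, "No": 35},
}


def marginal_distribution_var1(table):
    """Relative frequency of each category of variable 1 (row totals / grand total)."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return {cat: sum(row.values()) / grand_total for cat, row in table.items()}


def conditional_distributions(table):
    """Distribution of variable 2 within each category of variable 1 (counts / row total)."""
    result = {}
    for cat, row in table.items():
        row_total = sum(row.values())
        result[cat] = {level: count / row_total for level, count in row.items()}
    return result


marginal = marginal_distribution_var1(table)
conditional = conditional_distributions(table)

for cat, dist in conditional.items():
    formatted = ", ".join(f"{level}: {share:.0%}" for level, share in dist.items())
    print(f"{cat} -> {formatted}")
# Group A -> Yes: 60%, No: 40%
# Group B -> Yes: 30%, No: 70%
```

In this made-up table the conditional distributions differ noticeably between the two groups (60% "Yes" versus 30% "Yes"), which is the pattern the guide describes as an association between the two variables.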
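
Going back to Lectures 15 and 16, the Sxx, Syy, and Sxy formulas and the coefficient of determination can be tried out on a small data set. The sketch below is only an illustration: the data values and the function name `least_squares_summary` are hypothetical. It also assumes the standard least-squares results slope = Sxy / Sxx and intercept = Ybar − slope · Xbar, and takes r = Sxy / sqrt(Sxx · Syy), which matches the guide's r² formula but is not written out in the guide itself.

```python
import math


def least_squares_summary(x, y):
    """Compute Sxx, Syy, Sxy, r, r^2, and the least-squares line from raw data."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)

    # Sums of squares and cross-products, following the study guide's formulas.
    sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    syy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n

    # Assumed standard formulas (not stated explicitly in the guide).
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = sum_y / n - slope * sum_x / n

    # Residual = observed y minus predicted y for each observation.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    return {
        "Sxx": sxx,
        "Syy": syy,
        "Sxy": sxy,
        "r": r,
        "r_squared": sxy ** 2 / (sxx * syy),
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
    }


# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

fit = least_squares_summary(x, y)
print(f"slope={fit['slope']:.3f} intercept={fit['intercept']:.3f}")
print(f"r={fit['r']:.3f} r^2={fit['r_squared']:.3f}")
# slope=0.600 intercept=2.200
# r=0.775 r^2=0.600
```

Here Sxy is positive, so r is positive, in line with the rule that the sign of the correlation coefficient depends on Sxy. The residuals from a least-squares line with an intercept sum to zero (up to rounding), and plotting them against X would give the residual plot described in Lecture 16.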

