QMB FINAL

Topic 1: CHI-SQUARE TESTS

The chi-square test of independence is used to determine whether two categorical variables are independent of one another.

Step 1: Identify the null hypothesis
Step 2: Calculate the expected frequencies
Step 3: Calculate the chi-square test statistic (χ²)
Step 4: Determine the chi-square critical value (χ²α)
Step 5: Compare the test statistic χ² with the critical value χ²α
Step 6: State the conclusion

Step 1: Identify the null hypothesis
H0: The averages and class times are independent of one another.
H1: The averages and class times are not independent of one another.

Step 2: Calculate the expected frequencies
fe = (Row Total × Column Total) / Total Number of Observations

Step 3: Calculate the chi-square test statistic
χ² = Σ (fo - fe)² / fe

Step 4: Determine the chi-square critical value
df = (r - 1)(c - 1)
χ²α = χ²0.05 when α = 0.05

Step 5: Compare the test statistic with the critical value
If χ² > χ²α: Reject H0.
If χ² ≤ χ²α: Fail to reject H0.

Step 6: State the conclusion
Example: We do not reject the null hypothesis, so we conclude that the production shift and production quality are independent of one another.

Topic 2: CORRELATION ANALYSIS

Correlation analysis is used to measure both the strength and direction of a linear relationship between two variables. A relationship is linear if the scatter plot of the independent and dependent variables has a straight-line pattern.

Construct a table to provide values for future calculations. The six-number summary for correlation and simple regression (SNSCSR): Σx, Σy, Σxy, Σx², Σy², n.
n represents the number of ordered pairs in the table.

The correlation coefficient for a random sample (r):
r = [nΣxy - (Σx)(Σy)] / sqrt{[nΣx² - (Σx)²][nΣy² - (Σy)²]}
r indicates the strength and direction of the linear relationship between the independent and dependent variables. Values range from -1.0 (a strong negative relationship) to +1.0 (a strong positive relationship). When r = 0, there is no linear relationship between variables x and y.

The correlation coefficient for a population (ρ): refers to the correlation between all values of two variables of interest in a population. Use a hypothesis test to determine whether the population correlation coefficient is significantly different from zero.
H0: ρ = 0
H1: ρ ≠ 0
Test statistic: t = r / sqrt[(1 - r²) / (n - 2)], with df = n - 2.
Requires a two-tail test with α/2 in each tail. If H0: ρ ≤ 0 and H1: ρ > 0, use a one-tail test instead.

SIMPLE REGRESSION ANALYSIS

Used to determine a straight line that best fits a series of ordered pairs (x, y). An independent variable (x) is a variable used to predict, explain, or forecast a dependent variable (y). This technique is known as simple regression because we are using only one independent variable; multiple regression, which includes more than one independent variable, is discussed in the next chapter.

Formula for the equation describing a straight line through ordered pairs (the regression equation): ŷ = a + bx
o ŷ = the predicted value of y given a value of x
o x = the independent variable
o a = the y-intercept of the straight line
o b = the slope of the straight line

The residual (ei): ei = yi - ŷi, the difference between the actual data value and the predicted value.

LEAST SQUARES METHOD

Identifies the linear equation that best fits a set of ordered pairs. Used to find the values for a (the y-intercept) and b (the slope of the line). The resulting best-fit line is called the regression line.
GOAL: Minimize the total squared error between the values of y and the predicted values ŷ.
Sum of squares error: SSE = Σ (yi - ŷi)², summed over i = 1 to n.
Regression slope: b = [nΣxy - (Σx)(Σy)] / [nΣx² - (Σx)²]
y-intercept: a = Σy/n - b(Σx/n)
Total sum of squares (SST): measures the total variation in the dependent variable.
Example (interpreting the slope): On average, each additional TV ad increases the number of cars sold by 3.8947 per week.
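The calculations in Topics 1 and 2 can be reproduced numerically. Below is a minimal Python sketch of the chi-square test of independence, assuming a hypothetical observed-frequency table for the production shift / production quality example (the notes do not give the actual counts); scipy's chi2_contingency computes the expected frequencies and test statistic exactly as in Steps 2 and 3.

```python
# Sketch of the chi-square test of independence (Topic 1).
# The observed counts below are invented for illustration only.
import numpy as np
from scipy import stats

# Rows: production shift (day, night); columns: quality rating (good, fair, poor)
observed = np.array([
    [60, 30, 10],
    [50, 35, 15],
])

# Steps 2-3: expected frequencies and chi-square statistic
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

# Step 4: critical value at alpha = 0.05 with df = (r - 1)(c - 1)
alpha = 0.05
critical = stats.chi2.ppf(1 - alpha, dof)

# Steps 5-6: compare and state the conclusion
print(f"chi2 = {chi2_stat:.4f}, df = {dof}, critical value = {critical:.4f}")
if chi2_stat > critical:
    print("Reject H0: the two variables are not independent.")
else:
    print("Fail to reject H0: conclude the variables are independent.")
```

A companion sketch for the correlation coefficient, its t test, and the least squares slope and intercept follows, again on invented data (x = TV ads per week and y = cars sold are assumed values, so the slope will not match the 3.8947 figure above); the hand formulas built from the six-number summary are cross-checked against scipy's linregress.

```python
# Sketch of correlation and simple least squares regression (Topic 2).
import numpy as np
from scipy import stats

x = np.array([3, 6, 4, 5, 7, 2], dtype=float)        # hypothetical TV ads per week
y = np.array([13, 31, 19, 25, 33, 10], dtype=float)  # hypothetical cars sold per week
n = len(x)

# Six-number summary used by the hand formulas
sum_x, sum_y = x.sum(), y.sum()
sum_xy = (x * y).sum()
sum_x2, sum_y2 = (x ** 2).sum(), (y ** 2).sum()

# Sample correlation coefficient r, slope b, and intercept a
r = (n * sum_xy - sum_x * sum_y) / np.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# t test for H0: rho = 0 (two-tail, df = n - 2)
t_stat = r / np.sqrt((1 - r ** 2) / (n - 2))
t_crit = stats.t.ppf(1 - 0.05 / 2, n - 2)

# Cross-check against scipy's built-in simple regression
fit = stats.linregress(x, y)
print(f"r = {r:.4f}, t = {t_stat:.4f}, two-tail critical t = {t_crit:.4f}")
print(f"y-hat = {a:.4f} + {b:.4f}x   (scipy: {fit.intercept:.4f} + {fit.slope:.4f}x)")
```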
Coefficient of determination (R²):
R² = SSR / SST
R² measures the percentage of the total variation in the dependent variable that is explained by the independent variable, based on a sample. It varies from 0% to 100%; higher values are more desirable because we would like to explain as much of the variation in the dependent variable as possible. R² is equal to the square of r.

The population coefficient of determination (ρ²): unknown, so we must use a hypothesis test to determine whether ρ² is significantly different from zero, based on R².
o H0: ρ² = 0 (none of the variation in y is explained by x)
o H1: ρ² > 0 (x does explain a significant portion of the variation in y)
The F test statistic is used:
o F = SSR / [SSE / (n - 2)]
o df: D1 = 1, D2 = n - 2
o If F > Fα: REJECT H0 and conclude that ρ² is greater than 0.
o If F ≤ Fα: FAIL TO REJECT H0.

Constructing a confidence interval around the point estimate:
Standard error of the estimate: se = sqrt[SSE / (n - 2)]
se measures the amount of dispersion of the observed data around the regression line.
The confidence interval for an average value of y uses the average value of x, x̄ = Σx / n.
Prediction interval: the corresponding interval for an individual value of y.

Population regression slope (β): unknown; we must use a hypothesis test to find out whether it is significantly different from 0.
o H0: β = 0 (there is no relationship between the independent and dependent variables)
o H1: β ≠ 0 (there is a relationship between x and y)
t test statistic:
o t = b / sb
o sb = se / sqrt(Σx² - n x̄²)
o df = n - 2
o Two-tail test.
Confidence interval for the slope of a regression: b ± tα/2 · sb

ASSUMPTIONS
Assumption 1: The relationship between the independent and dependent variables is linear. If the relationship is not linear, the estimated y values will be too high for low and high values of x, and the estimated y values for x's in the middle of the x range will be too low.
Assumption 2: The residuals exhibit no patterns across values of the independent variable.
Assumption 3: Homoscedasticity (constant variance of the residuals). A residual plot can be used to evaluate this assumption.

Multiple Regression and Model Building

This chapter takes into consideration more than one independent variable.
Regression equation using k independent variables: ŷ = a + b1x1 + b2x2 + ... + bkxk
Regression (slope) coefficients b1, b2, ..., bk: each predicts the change in the dependent variable due to a one-unit increase in its independent variable, with all other variables held constant.

Explaining the variation of the dependent variable
GOAL: Determine the amount of the variation in y that is due to variation in the independent variables. The equation for R² is the same: R² = SSR / SST. Whatever R² comes out to is multiplied by 100 to give the percentage of the variation in y that is explained.

Testing the significance of the overall regression model
Goal: Decide whether the relationship between the independent and dependent variables is statistically significant. Sample values such as b1 are estimates of the population regression coefficients β1. If the population slopes β1 and β2 are both zero, then x1 and x2 have no effect on y, and we conclude there is no relationship between the dependent and independent variables.
H0: β1 = β2 = 0 (there is no relationship between the dependent and independent variables)
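As a concrete illustration of the multiple regression quantities above, here is a minimal sketch using statsmodels on invented data (the two predictors and all numbers are assumptions, not from the notes). It reports R² = SSR / SST, the overall F test of H0: β1 = β2 = 0, and the individual coefficient t tests, with each slope interpreted holding the other variable constant.

```python
# Sketch of multiple regression with two independent variables.
# All data values and variable roles here are invented for illustration.
import numpy as np
import statsmodels.api as sm

x1 = np.array([2, 4, 5, 7, 8, 10, 11, 13], dtype=float)
x2 = np.array([1, 3, 2, 5, 4, 6, 7, 8], dtype=float)
y = np.array([8, 15, 17, 26, 28, 36, 40, 46], dtype=float)

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept term a
model = sm.OLS(y, X).fit()

# R^2 = SSR / SST: share of the variation in y explained by x1 and x2
print(f"R^2 = {model.rsquared:.4f}  (as a percentage: {100 * model.rsquared:.1f}%)")

# Overall significance: F test of H0: beta1 = beta2 = 0
print(f"F = {model.fvalue:.4f}, p-value = {model.f_pvalue:.4g}")

# Individual coefficients (a, b1, b2) and the p-values of their t tests
print(model.params)
print(model.pvalues)
```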