
**Applied Business Statistics, Week 2: Descriptive Statistics**

Topics: variance, standard deviation, covariance, correlation.

**Descriptive Statistics**

- Variance: what is its importance?
- Standard deviation: what is its importance?
- Covariance: why are we focusing on it?
- Correlation: why are we focusing on it?

The variance and the standard deviation give us numerical measures of the spread of data sets. We are also interested in the relationship between 2 numerical variables. Knowing the relationship between variables is quite helpful in making business decisions.

**Variance and Standard Deviation**

We use all of these measures to describe the spread of the data. The mean tells us the central location, but the variance and standard deviation tell us more about the spread, or variation, within the dataset.

Note that the formulas for variance and standard deviation here are for samples.

**Variance**

Objective: describe the spread of the data.

Lowest observation: 19. Highest observation: 110. Mean: 59. Deviation of the lowest observation: 19 − 59 = −40.

Strategy: find the average squared deviation from the mean (59).

**Why divide by n − 1?**

Answer: Variance is the average squared deviation from the population mean.

Sample data (from 365 days of steps):

| Day | Steps taken |
|---|---|
| 17 | 1113 |
| 113 | 3498 |

The sample mean is just one POSSIBLE position for the true population mean. A smaller denominator adjusts the variance estimate upwards.

Answer: We need to subtract 1 so that we appropriately inflate our sample estimate of the population variance.

**Degrees of Freedom**

Degrees of freedom appear in descriptive statistics, in regression, and in chi-squared tests.

Obs: 1, 2, 3. X: 25, 38, 42. X − μ: −8, 5, 9. Result: 3 degrees of freedom.

Obs: 1, 2, 3. X: 44, 34. Deviations: 3, −7 (only two values are shown). Result: 2 degrees of freedom.

**Covariance**

Sample data of stock x and y prices on 5 days:

| Day | x | x − x̄ | y | y − ȳ | (x − x̄)(y − ȳ) |
|---|---|---|---|---|---|
| 1 | 30 | −3 | 6 | −1 | 3 |
| 2 | 35 | 2 | 9 | 2 | 4 |
| 3 | 40 | 7 | 5 | −2 | −14 |
| 4 | 25 | −8 | 7 | 0 | 0 |
| 5 | 35 | 2 | 8 | 1 | 2 |
| Mean | 33 | | 7 | | |

Our objective:
To see statistically whether the x and y stock prices are related to one another.

The products of the deviations, (x − x̄)(y − ȳ), are 3, 4, −14, 0 and 2, which sum to −5. Dividing by n − 1 = 4 gives COV(x, y) = −1.25, so X and Y are negatively related.

[Formulas shown on the slide: COV(x, y) and VAR(x).]

**Correlation**

Correlation measures the linear relationship between two variables. It lies between −1 and 1. The correlation coefficient shows the direction and strength of the relationship:

- No relationship: 0
- Weak: 0 to 0.3
- Moderate: 0.4 to 0.6
- Strong: 0.7 and above

The value shown in the scatterplot is the equation of the line, not the correlation coefficient.

CORR(x, y):

- −1: perfectly negatively correlated
- 0: no correlation
- 1: perfectly positively correlated

For the stock data, Covariance = −1.25 (Excel: COVARIANCE.S) and Correlation = −0.13868 (Excel: CORREL). The correlation is a measure of the strength of the x-y relationship.

**Coefficient of Variation (CV)**

[CV formula]

Standard deviation is in absolute terms; the coefficient of variation is in proportionate terms. CV is a measure of variation that takes the scale of the dataset into account.

Sample data sets: X = 101, 102, 103; Y = 1, 2, 3.

**Standard Error (SE)**

The SE of the sample mean is reported as "Standard Error" in the Descriptive Statistics summary table from Excel.

[SE formula]

On your way out of class, you ask 5 students how long (in seconds) it takes them to walk to the Student Union from JFF LL105. How confident can we be about the
estimate of the true mean?

[Figure: SE of the sample mean for n = 5, n = 50, and n = 500]

The higher the standard error, the more uncertainty we have about the location of the true population mean. As n increases, the standard error decreases: the higher the number of observations (n), the lower the SE. The lower the SE, the more confident you can be that your sample mean is a good estimate of your population mean.
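
As a rough check of the numbers in these slides, the following Python sketch recomputes the sample covariance and correlation for the stock data (x = 30, 35, 40, 25, 35; y = 6, 9, 5, 7, 8), the coefficient of variation for the two sample data sets, and the standard error at n = 5, 50 and 500. The function names are illustrative and not from the slides, the CV is taken as standard deviation divided by mean (one common convention, assumed here), and the standard deviation of 10 seconds used in the SE illustration is a made-up value.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    # Average squared deviation from the sample mean, dividing by n - 1.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sample_std(values):
    return sqrt(sample_variance(values))


def sample_covariance(xs, ys):
    # Sum of the products of deviations, divided by n - 1
    # (the same convention as Excel's COVARIANCE.S).
    mx, my = mean(xs), mean(ys)
    products = [(x - mx) * (y - my) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


def correlation(xs, ys):
    # Covariance rescaled by both standard deviations; lies in [-1, 1].
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


def coefficient_of_variation(values):
    # Assumed convention: standard deviation relative to the mean.
    return sample_std(values) / mean(values)


# Stock prices on 5 days, as given in the slides.
x = [30, 35, 40, 25, 35]
y = [6, 9, 5, 7, 8]
cov_xy = sample_covariance(x, y)   # -1.25
corr_xy = correlation(x, y)        # roughly -0.1387

# Same spread (standard deviation 1), very different scale.
cv_X = coefficient_of_variation([101, 102, 103])  # 1 / 102
cv_Y = coefficient_of_variation([1, 2, 3])        # 0.5

# Standard error s / sqrt(n) for a HYPOTHETICAL s of 10 seconds.
s = 10.0
se_by_n = {n: s / sqrt(n) for n in (5, 50, 500)}

print(cov_xy, round(corr_xy, 5), round(cv_X, 4), cv_Y)
print({n: round(se, 3) for n, se in se_by_n.items()})
```

Both data sets X and Y have a standard deviation of 1, yet their CVs differ by a factor of about 50, which is the sense in which CV takes the scale of the data into account. The SE values shrink as n grows even though s is held fixed.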

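The formula images did not survive in this preview. For reference, the standard sample definitions are sketched below; they are consistent with the Excel functions the slides name (COVARIANCE.S, CORREL), but the notation (x̄, s, n) and the exact form of the CV (some texts multiply it by 100%) are assumptions rather than the slides' own.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
s_x^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,
  \qquad s_x = \sqrt{s_x^2} \\
\operatorname{COV}(x, y) &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
\operatorname{CORR}(x, y) &= \frac{\operatorname{COV}(x, y)}{s_x \, s_y} \\
\mathrm{CV} &= \frac{s_x}{\bar{x}} \\
\mathrm{SE}(\bar{x}) &= \frac{s_x}{\sqrt{n}}
\end{aligned}
```

Note that COV(x, x) reduces to the sample variance of x, and that SE falls in proportion to the square root of n.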