Chapter 7: Statistical Inference

I. Populations and Samples
  a. Data about the population consists of data for every possible relevant case.
  b. Data drawn from a sample is a subset of cases that are drawn from an underlying population.
  c. Samples of convenience are nonrandom samples.
  d. Statistical inference: we use what we know to be true about one thing (the sample) to infer what is likely to be true about another thing (the population).

II. The Normal Distribution
  a. Also known as the Bell Curve.
  b. The Normal Distribution is symmetrical about its mean, such that the mode, median, and mean are the same.
  c. The Normal Distribution has a predictable area under the curve within specified distances of the mean (the 68-95-99 Rule).
    i. Starting from the mean and going one standard deviation in each direction will capture 68% of the area under the curve.
    ii. Going one additional standard deviation in each direction will capture a shade over 95% of the total area under the curve.
    iii. Going a third standard deviation in each direction will capture more than 99% of the total area under the curve.
  d. How to find the Normal Distribution
    i. Calculate the mean, $\bar{Y}$.
    ii. Find the standard deviation: $s_Y = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$
    iii. Find the standard error of the mean: $\sigma_{\bar{Y}} = \dfrac{s_Y}{\sqrt{n}}$
    iv. In this formula:
      1. $\sigma_{\bar{Y}}$ = standard error of the mean
      2. $s_Y$ = the standard deviation of the original distribution
      3. $\sqrt{n}$ = root of the sample size
    v. $\bar{Y} \pm 2 \times \sigma_{\bar{Y}}$ is your answer.
      1. You are 95% confident that the population mean lies somewhere between the two answers.

III. The Effects of Sample Size
  a. The smaller the standard errors, the tighter our resulting confidence intervals will be.
  b. The larger the standard errors, the wider our resulting confidence intervals will be.
  c. When estimating population values based on samples, you want tighter confidence intervals (smaller standard errors).
  d. How big does my sample need to be? It depends on how tight you want your confidence intervals to be.

Chapter 8: Bivariate Hypothesis Testing

I. Bivariate Hypothesis Tests and Establishing Causal Relationships
  a. Bivariate hypothesis tests help us to answer the question "Are X and Y related?"
  b. They cannot help us find spurious relationships or confounding factors.

  Variable types and appropriate bivariate hypothesis tests

                                 Independent variable type
  Dependent variable type        Categorical            Continuous
  Categorical                    Tabular analysis       Probit/logit
  Continuous                     Difference of means    Correlation coefficient

II. All Roads Lead to p
  a. The p-value is a probability value ranging between 0 and 1.
  b. The Logic of p-Values
    i. Bivariate hypothesis tests all compare the actual relationship between X and Y in sample data with what we would expect to find if X and Y were not related in the underlying population.
    ii. The lower the p-value, the greater confidence we have that there is a systematic relationship.
  c. Limitations of p-Values
    i. When a p-value is very close to 0, this does not indicate that the relationship is strong, simply that we can be more confident that a relationship exists.
    ii. The further we are from a truly random sample, the less confidence we should have in our p-value.
  d. Statistical Significance
    i. Lower p-values, and thus increased confidence that there is indeed a relationship between two variables, lead to statistical significance.
    ii. The general standard for a p-value is 0.05: if p is less than 0.05, the relationship is considered statistically significant.
  e. Also, we can use the p-value to convey the level of confidence with which we can reject the null hypothesis.

III. Bivariate Hypothesis Test 1: Tabular Analysis
  a. This uses only two variables.
    i. A possibly spurious third variable Z is never taken into account.
    ii. The relationship can only be X → Y.
    iii. Both variables are categorical.
  b. The dependent variable is placed in the rows; the independent variable is placed in the columns.
  c. Ex.: We want to find out whether gender (X) and presidential vote (Y) are related.

  Table 8.4 Gender and vote in the 2004 presidential election: Expectations for hypothetical scenario if there were no relationship

  Candidate       Female     Male       Row total
  Kerry           49.20      49.20      49.20
  Bush            50.80      50.80      50.80
  Column total    100.00     100.00     100.00
  Note: Cell entries are column percentages.

  Table 8.5 Gender and vote in the 2004 presidential election

  Candidate       Female     Male       Row total
  Kerry                                 399
  Bush                                  412
  Column total    437        374        811
  Note: Cell entries are number of respondents.

  Table 8.6 Gender and vote in the 2004 presidential election: Calculating the expected cell values if gender and presidential vote are unrelated

  Candidate       Female                 Male
  Kerry           0.492 x 437 = 215      0.492 x 374 = 184
  Bush            0.508 x 437 = 222      0.508 x 374 = 190
  Note: Cell entries are expectation calculations if these two variables are unrelated.

  Table 8.7 Gender and vote in the 2004 presidential election

  Candidate                    Female     Male       Row total (proportion)
  Kerry                        229        170        0.492
  Bush                         208        204        0.508
  Column total (proportion)    0.5388     0.4612     1.0
  Note: Cell entries are number of respondents.

  Table 8.8 Gender and vote in the 2004 presidential election

  Candidate       Female               Male
  Kerry           O = 229; E = 215     O = 170; E = 184
  Bush            O = 208; E = 222     O = 204; E = 190
  Note: Cell entries are the number observed (O); the number expected if there were no relationship (E).

  d. Now we want to know whether or not these differences are statistically significant.
  e. Chi-squared ($\chi^2$) tests for tabular association
    i. If a cell's observed value exactly equaled the value expected under no relationship, we would get a contribution of 0 from that cell to the overall formula.
    ii. Math (see Tables 8.7 and 8.8 for numbers):
      $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
      $\chi^2 = \dfrac{(229 - 215)^2}{215} + \dfrac{(208 - 222)^2}{222} + \dfrac{(170 - 184)^2}{184} + \dfrac{(204 - 190)^2}{190}$
      $\chi^2 = \dfrac{196}{215} + \dfrac{196}{222} + \dfrac{196}{184} + \dfrac{196}{190}$
      $\chi^2 = 0.912 + 0.883 + 1.065 + 1.032$
      $\chi^2 = 3.892$
    iii. Now we need to compare our $\chi^2$ with the critical value of $\chi^2$.
    iv. Degrees of freedom: $df = (r - 1)(c - 1)$
      1. r = number of rows, c = number of columns. For the 2 x 2 table here, $df = (2 - 1)(2 - 1) = 1$.

IV. Bivariate Hypothesis Test 3: Correlation Coefficient
  a. Independent and dependent variables must be continuous.
  b. Tests whether there is a positive or negative relationship (X → Y).
  c. Covariance is a statistical way of summarizing the general pattern of association (or the lack thereof) between two variables.
    i. $\mathrm{cov}_{XY} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}$
    ii. Individual cases $X_i$ and $Y_i$ enter in terms of their values relative to their means $\bar{X}$ and $\bar{Y}$; n = number of cases (32 in example).
    iii. If both $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$ are > 0, or if both are < 0, the relationship is positive.
    iv. If one is positive and the other is negative, the relationship is negative.
    v. But this only answers the question "Is the relationship positive or negative?"
  d. To determine the confidence of the relationship, we use Pearson's r.
    i. $r = \dfrac{\mathrm{cov}_{XY}}{\sqrt{\mathrm{var}_X \, \mathrm{var}_Y}}$
    ii. Example (next page): Table 8 …
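
The three calculations in these notes (the two-standard-error confidence interval from Chapter 7, the chi-squared statistic for the gender and vote table, and Pearson's r) can be sketched in a few lines of Python. This is only an illustrative sketch, not the textbook's own procedure: the function names and the small sample lists are hypothetical, and the p-value shortcut via `math.erfc` is an assumption that holds only for 1 degree of freedom. The observed counts are the ones reported in Table 8.7. Because the code does not round the expected cell values to whole numbers, its chi-squared statistic differs very slightly from the hand calculation.

```python
import math


def confidence_interval_95(values):
    """Approximate 95% interval: mean +/- 2 standard errors (rule of thumb)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation, dividing by n - 1.
    s = math.sqrt(sum((y - mean) ** 2 for y in values) / (n - 1))
    se = s / math.sqrt(n)  # standard error of the mean
    return mean - 2 * se, mean + 2 * se


def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            e = row_totals[i] * col_totals[j] / total
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail chi-squared probability; valid for df = 1 ONLY."""
    return math.erfc(math.sqrt(stat / 2))


def pearson_r(xs, ys):
    """Pearson's r = cov_XY / sqrt(var_X * var_Y), all dividing by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical sample, used only to show the interval calculation.
    print(confidence_interval_95([2, 4, 4, 4, 5, 5, 7, 9]))

    # Observed counts from Table 8.7: rows are Kerry, Bush;
    # columns are Female, Male.
    stat, df = chi_squared([[229, 170], [208, 204]])
    print(stat, df, p_value_df1(stat))

    # Hypothetical paired data, used only to show the sign of r.
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Dividing the covariance and both variances by n matches the covariance formula in the notes; using n - 1 throughout instead would leave r unchanged, since the denominators cancel.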