Testing hypotheses about proportions

Tests of significance

- The reasoning of significance tests
- Stating hypotheses
- The P-value
- Statistical significance
- Tests for a population proportion
- Confidence intervals to test hypotheses

Reasoning of Significance Tests

Example: A coin is tossed 500 times. It lands heads 275 times, which is a bit more than we expect. Is the coin fair or not?

- Is the somewhat higher number of heads due to chance variation?
- Is it evidence that the coin is not fair?

Stating Hypotheses

Situation: We observe some effect, and we have two explanations for it:

1. the effect is due to chance variation;
2. the effect is due to something significant.

How to decide?

Statement 1, the null hypothesis $H_0$: the coin is fair.
Statement 2, the alternative hypothesis $H_a$: the coin is not fair.

The null hypothesis is a very specific statement about a parameter of the population(s). It is labeled $H_0$ and states the status quo, previous knowledge, or no effect: the observed difference is due to chance. It is the one which we want to reject.

The alternative hypothesis is a more general statement about a parameter of the population(s) that is the opposite of the null hypothesis. It is labeled $H_a$ and is the one we try to find evidence for.

Coin tossing example:

$H_0: p = 1/2$ ($p$ is the probability that the coin lands heads)
$H_a: p \neq 1/2$ ($p$ is either larger or smaller)

Analogy with a criminal trial: $H_0$: the defendant is innocent. If sufficient evidence is presented, the jury will reject this hypothesis and conclude that $H_a$: the defendant is guilty.

One-sided and two-sided tests

A two-tail, or two-sided, test of the population proportion has these null and alternative hypotheses:

$H_0: p = p_0$ (a specific proportion)
$H_a: p \neq p_0$

A one-tail, or one-sided, test of a population proportion has these null and alternative hypotheses:

$H_0: p = p_0$ (a specific proportion)
$H_a: p < p_0$

OR

$H_0: p = p_0$ (a specific proportion)
$H_a: p > p_0$

What determines the choice of a one-sided versus a two-sided test is what we know about the problem before we perform a test of statistical significance. It is
important to make the choice before performing the test, or else you could make a choice of convenience.

The P-value

Example (cont'd): A coin is tossed 500 times. It lands heads 275 times.

$H_0: p = 1/2$ vs. $H_a: p \neq 1/2$

What is the chance of observing something like what we observed if $H_0$ is true?

Tests of statistical significance quantify the chance of obtaining a random sample result at least as extreme as the one observed if the null hypothesis were true. This quantity is the P-value.

This is a way of assessing the "believability" of the null hypothesis given the evidence provided by a random sample.

Interpreting a P-value

Could random variation alone account for the difference between the null hypothesis and observations from a random sample?

- A small P-value implies that random variation due to the sampling process alone is not likely to account for the observed difference.
- With a small P-value we reject $H_0$. The true property of the population is significantly different from what was stated in $H_0$.

Thus, small P-values are strong evidence AGAINST $H_0$.

Oftentimes, a P-value of 0.05 or less is considered significant: the phenomenon observed is unlikely to be entirely due to chance events from the random sampling.

Test for a population proportion p

The sampling distribution of $\hat{p}$ is approximately normal for large sample sizes, and its shape depends solely on $p$ and $n$.

Thus, we can easily test the null hypothesis $H_0: p = p_0$ (a given value we are testing).

If $H_0$ is true, the sampling distribution is known: it is approximately normal with mean $p_0$ and standard deviation $\sqrt{p_0(1 - p_0)/n}$.

The likelihood of our sample proportion given the null hypothesis depends on how far from $p_0$ our $\hat{p}$ is, in units of standard deviation:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

This is valid when both expected counts, expected successes $np_0$ and expected failures $n(1 - p_0)$, are each 10 or larger.

P-values and one- or two-sided hypotheses

- For $H_a: p > p_0$, the P-value is $P(Z \geq z)$.
- For $H_a: p < p_0$, the P-value is $P(Z \leq z)$.
- For $H_a: p \neq p_0$, the P-value is $2P(Z \geq |z|)$.

And as always, if the P-value is as small as or smaller than the significance level $\alpha$, then the difference is statistically significant and we reject $H_0$.

Example: A national survey by the National Institute for Occupational Safety and Health
on restaurant employees found that 75% said that work stress had a negative impact on their personal lives. You investigate a restaurant chain to see if the proportion of all their employees negatively affected by work stress differs from the national proportion $p_0 = 0.75$.

$H_0: p = p_0 = 0.75$ vs. $H_a: p \neq 0.75$ (two-sided alternative)

In your SRS of 100 employees, you find that 68 answered "Yes" when asked, "Does work stress have a negative impact on your personal life?"

The expected counts are $100 \times 0.75 = 75$ and 25. Both are greater than 10, so we can use the z-test.

The test statistic is:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.68 - 0.75}{\sqrt{\dfrac{(0.75)(0.25)}{100}}} \approx -1.62$$

From the standard normal table, we find the area to the left of $z = 1.62$ is 0.9474. Thus $P(Z \geq 1.62) = 1 - 0.9474$, or 0.0526, and by symmetry this is also $P(Z \leq -1.62)$. Since the alternative hypothesis is two-sided, the P-value is the area in both tails, and $P = 2 \times 0.0526 = 0.1052$, which is larger than 5%.

The chain restaurant data are not significantly different from the national survey results.

Four steps of hypothesis testing

1. Define the hypotheses to test and the required significance level $\alpha$.
2. Calculate the value of the test statistic.
3. Find the P-value based on the observed data.
4. State the conclusion: reject the null hypothesis if the P-value $\leq \alpha$; if the P-value $> \alpha$, the data do not provide sufficient evidence to reject the null.

The significance level $\alpha$

The significance level, $\alpha$, is the largest P-value tolerated for rejecting a true null hypothesis (how much evidence against $H_0$ we require). This value is decided arbitrarily before conducting the test.

- If the P-value is equal to or less than $\alpha$ ($P \leq \alpha$), then we reject $H_0$.
- If the P-value is greater than $\alpha$ ($P > \alpha$), then we fail to reject $H_0$.

When the z score falls within the rejection region (shaded area on the tail side), the P-value is smaller than $\alpha$ and you have shown statistical significance.

[Figure: rejection regions on the standard normal curve. Panels: one-sided test, $\alpha$ = 5% (critical value $|z| = 1.645$); two-sided test, $\alpha$ = 1%.]

Rejection region for a two-tail test of p with $\alpha$ = 0.05 (5%)

A two-sided test means that $\alpha$ is spread between both tails of the curve, thus:

- a middle area $C$ of $1 - \alpha$ = 95%, and
- an upper tail area of $\alpha/2$ = 0.025.

The traditional z critical values from the Normal model cut off an area of 0.025 in each tail; for $\alpha$ = 0.05 these are $z = \pm 1.96$.

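The restaurant calculation above can be reproduced in a few lines. This is only a minimal sketch: the function name `one_proportion_z_test` and its `alternative` argument are illustrative assumptions, not something taken from the slides, and the standard normal areas come from Python's standard-library `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0 using the large-sample z test.

    The name and signature are hypothetical, chosen for this sketch.
    """
    # The normal approximation needs both expected counts to be at least 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("expected counts n*p0 and n*(1-p0) must both be >= 10")

    p_hat = successes / n
    # Distance of p_hat from p0 in units of the null standard deviation.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


# Restaurant example: 68 of 100 employees said yes, H0: p = 0.75, two-sided.
z, p_value = one_proportion_z_test(68, 100, 0.75)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

Because the code does not round z to 1.62 before looking up the tail area, its P-value comes out close to 0.106, not exactly the table-based 0.1052. The conclusion at the 5% level is the same.
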
Example: A marketing company claims that it receives 8% responses from its mailing. To test this claim, a random sample of 500 were surveyed with 25 responses. Test at the $\alpha$ = 0.05 significance level.

Check: $np_0 = 500(0.08) = 40$ and $n(1 - p_0) = 500(0.92) = 460$. Both are $\geq 10$, so the normal assumption is OK.

[Figure: rejection region]

Solution:

$H_0: p = 0.08$
$H_a: p \neq 0.08$
$\alpha = 0.05$, $n = 500$, $\hat{p} = 0.05$ …
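As a rough sketch of the rejection-region approach, the numbers stated in this example can be pushed through the z formula and compared with a critical value. The variable names are illustrative assumptions, a two-sided alternative is assumed, and the critical value is taken from Python's standard-library `statistics.NormalDist` in place of a table.

```python
from math import sqrt
from statistics import NormalDist

# Values stated in the example: 25 responses out of 500, claimed rate 0.08.
n, responses, p0, alpha = 500, 25, 0.08, 0.05

# Normal-approximation check: both expected counts should be at least 10.
expected_yes = n * p0        # about 40
expected_no = n * (1 - p0)   # about 460
assert expected_yes >= 10 and expected_no >= 10

p_hat = responses / n        # sample proportion, 0.05
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sided rejection region: |z| at or beyond the upper alpha/2 critical value.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) >= z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```

Comparing $|z|$ with the critical value is equivalent to comparing the two-sided P-value with $\alpha$, so either route leads to the same decision.
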