UF STA 6166 - More About Tests

Chapter 5: More About Tests

Contents:
Statistical significance versus practical significance
A large P-value is not strong evidence that the null hypothesis is true
Correspondence between confidence intervals and two-sided tests
Type I and Type II errors

The P-value is a measure of the strength of the evidence against the null hypothesis and in favor of the alternative hypothesis. How small a P-value is convincing evidence that the alternative hypothesis is true? It depends on the context: how much you believe the null hypothesis to begin with, and what the costs are if you make a mistake. It would probably take a much smaller P-value to convince you that someone had ESP than to convince you that someone wasn't guessing on a multiple-choice test.

However, sometimes we set a threshold for what constitutes convincing evidence. If the P-value falls below the threshold, the result is said to be statistically significant. The threshold is denoted α (alpha) and is called the significance level. Common values are .10, .05, and .01. If the P-value is less than .05, for example, the result is said to be statistically significant at the .05 level. We may also say we reject the null hypothesis in favor of the alternative hypothesis at the .05 level. If the P-value is not less than .05, we say the result is not statistically significant at the .05 level, or that we fail to reject the null hypothesis at the .05 level. (Note: some will say "we accept the null hypothesis at the .05 level," but this is misleading, since a large P-value does not constitute evidence that the null hypothesis is true; it simply means we have no evidence that it is false.)

Significance levels are rather arbitrary, however, and what constitutes convincing evidence for me may not be the same as for you. Therefore, you should always report the exact P-value so that readers can draw their own conclusions.

Statistical significance versus practical significance

In the moose use-availability example, we were testing the hypotheses

H0: p = .34
HA: p ≠ .34

where p is the actual proportion of time moose spent in the interior of the burn during this winter. Suppose now that the sample data on the moose had come out differently. Compute the P-value in each of the following cases:

1) 100 observations, of which 36 were in the burn
2) 1,000 observations, of which 360 were in the burn
3) 10,000 observations, of which 3,600 were in the burn

What is your conclusion in each case?

This illustrates that a P-value does not measure the size of the difference between the actual and hypothesized p. You should always compute a confidence interval for the unknown parameter to see how big or small the difference could be. A large P-value is not strong evidence that the null hypothesis is true; it only indicates a lack of evidence that it is false. A small P-value indicates only that the sample result is unlikely to have occurred if the null hypothesis were true; it does not necessarily mean that the difference between the true p and p0 (the effect size) is large. Unfortunately, you will see authors misinterpret P-values this way. For example, you might read "There was no difference (P = 0.23) between availability and use of interior burn by moose," or "Moose used the interior burn significantly less (P < .05) than availability," without any indication that this is "statistical significance" and does not necessarily indicate a difference of practical importance. A confidence interval will give you insight as to how large the difference might be.
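For checking your work on these three cases, here is a minimal Python sketch (not part of the original notes; it assumes SciPy is available) of the one-proportion z-test. All three samples give the same p̂ = .36, yet the P-values change dramatically as n grows, which is the point of the exercise.

```python
# Minimal sketch (not from the notes): two-sided one-proportion z-test
# P-values for the three moose scenarios. Assumes SciPy is installed.
from math import sqrt
from scipy.stats import norm

p0 = 0.34  # hypothesized proportion under H0

for n, x in [(100, 36), (1000, 360), (10000, 3600)]:
    p_hat = x / n                    # sample proportion (.36 in every case)
    se = sqrt(p0 * (1 - p0) / n)     # standard error computed under H0
    z = (p_hat - p0) / se            # one-proportion z statistic
    p_value = 2 * norm.sf(abs(z))    # two-sided P-value for HA: p != .34
    print(f"n = {n:>6}: z = {z:5.2f}, P-value = {p_value:.4f}")
```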
Correspondence between confidence intervals and two-sided tests

A confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α level of (100 - C)%. Hence, the null hypothesis H0: p = p0 will generally be rejected at the α level of significance in favor of HA: p ≠ p0 only if p0 is not in the C% confidence interval for p (see footnote 3 on p. 417 of the text for why we say "generally"). For example, if p0 is in the 99% confidence interval, then H0: p = p0 would not be rejected in favor of the two-sided alternative at the .01 level of significance. If p0 is not in the 99% confidence interval, then the null hypothesis would be rejected at the .01 level of significance.

Compute 95% confidence intervals for p for each of the three cases in the previous problem.

Type I and Type II errors

If we conduct a hypothesis test with a fixed level of significance α, where we reject H0 if the P-value is less than α and fail to reject it otherwise, then we can make an error. There are two types of errors:

Type I error: H0 is true, but we mistakenly reject it.
Type II error: H0 is false, but we fail to reject it.

If H0 is true, what is the probability that we make a Type I error and reject it? That is exactly α, because we have said we will reject H0 only if we get a result that would occur with probability α or less if H0 were true. Thus, we can reduce the probability of a Type I error by reducing α; that is, by requiring stronger evidence before we are willing to reject H0.

What is the probability of a Type II error, that is, failing to reject H0 when it is false and HA is true? This probability is denoted by β, but its value depends on p. That's because H0 is false for a lot of different values of p, and we need to know p in order to compute β. So there is not just one value of β; there is a different value for every value of p in HA.

Example: In the ESP example with 100 trials, suppose we decide to reject H0: p = .2 in favor of HA: p > .2 if the P-value is less than .05. The test statistic for this test is

z = \frac{\hat{p} - .2}{\sqrt{.2(.8)/100}} = \frac{\hat{p} - .2}{.04}

If we reject H0 only when the P-value is less than .05, that is equivalent to saying that we will reject H0 only if the test statistic is greater than 1.645. Why? So we will reject H0 only if

\frac{\hat{p} - .2}{.04} > 1.645, that is, \hat{p} > .2 + 1.645(.04) = .266

To compute β, the probability of a Type II error, we need to compute the probability of obtaining a value of p̂ < .266 (that is, the probability of not rejecting H0) for any given value of p in HA. For example, if the true p were .25 (in other words, someone could actually get 25% right in the long run), then the probability of a Type II error is

P(\hat{p} < .266 \mid p = .25) = P\left(Z < \frac{.266 - .25}{\sqrt{.25(.75)/100}}\right) = P(Z < .37) \approx .644

There's a 64% probability that we will fail to reject H0 when the true p is .25, even though H0 is false.
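For concreteness, here is a minimal Python sketch (not part of the original notes; it assumes SciPy is available) of this β calculation. Only p = .25 is worked out in the notes; the other true-p values are illustrative additions showing that β is different for every value of p in HA.

```python
# Minimal sketch (not from the notes): Type II error probability for the
# ESP example, H0: p = .2 vs HA: p > .2, n = 100 trials, alpha = .05.
from math import sqrt
from scipy.stats import norm

n, p0, alpha = 100, 0.2, 0.05
z_crit = norm.ppf(1 - alpha)       # 1.645 for a one-sided .05-level test
se0 = sqrt(p0 * (1 - p0) / n)      # .04, the standard error under H0
cutoff = p0 + z_crit * se0         # reject H0 only if p_hat > about .266

# p = .25 reproduces the ~.644 above (up to rounding of the cutoff);
# the other values are illustrative assumptions, not from the notes.
for p_true in [0.25, 0.30, 0.35]:
    se_true = sqrt(p_true * (1 - p_true) / n)     # SE at the true p
    beta = norm.cdf((cutoff - p_true) / se_true)  # P(p_hat < cutoff | true p)
    print(f"true p = {p_true:.2f}: beta = {beta:.3f}")
```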

