Chapter 18

Inference for One Numerical Population: Continued

Chapter 17 did most of the heavy lifting for inference for one numerical population. By comparison, this chapter is pretty user-friendly.

18.1 A Test of Hypotheses for $\mu$

This section is very similar to Section 12.5, which presented a test of hypotheses for a binomial $p$. Again, the idea is that of all of the possible values of $\mu$, there is one value of special interest to the researcher. This known special value of interest is denoted by $\mu_0$, and the null hypothesis is that $\mu = \mu_0$. As in Chapter 12, the justification for the value $\mu_0$ is history, theory, or contracts or law. As in Chapter 12, the test of this section is not terribly useful in science. Recall that the very important McNemar's test of Chapter 16 was a special case of the not-so-important test of Chapter 12. Similarly, Chapter 20 will present an important use for the test of this section.

As usual, there are three possibilities for the alternative: $\mu > \mu_0$, $\mu < \mu_0$, or $\mu \neq \mu_0$.

As in Chapter 17, we assume that our data will consist of $n$ i.i.d. random variables $X_1, X_2, X_3, \ldots, X_n$, with summary random variables $\bar{X}$ and $S$, the mean and standard deviation of these variables. The observed values of these guys are $x_1, x_2, x_3, \ldots, x_n$, $\bar{x}$ and $s$.

Because our hypotheses involve the mean of the population, the obvious and natural choice for the test statistic is $\bar{X}$, with observed value $\bar{x}$. In order to obtain an approximate sampling distribution, we standardize $\bar{X}$ and obtain

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$

We don't yet have our test statistic; there is a flaw inherent in this $Z$: we don't know the values $\mu$ and $\sigma$. Handling $\sigma$ is easy enough; we replace it in $Z$ with $S$, giving

$$Z' = \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

We will follow our approach of Chapter 17 and use Gosset's t-curve with $df = n - 1$ to obtain approximate probabilities for $Z'$. But $Z'$ is not a test statistic, because we don't know the value of $\mu$. Just in time, we recall that we want to know how the test statistic behaves on the assumption that the null hypothesis is true. Given that the null hypothesis is true, we can replace the unknown $\mu$ in $Z'$ with the known
$\mu_0$. The result is our test statistic

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}; \tag{18.1}$$

after data are collected, the observed value of $T$ is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}. \tag{18.2}$$

The three rules for finding the P-value are similar to earlier rules and are summarized in the following result. The website we are using in these Course Notes gives areas to the left under a t-curve. In the items listed below, I include an equivalent area-to-the-left rule where the basic rule is stated as an area to the right, and vice versa.

Result 18.1. In the formulas below, $t$ is given in Equation 18.2 and areas are computed under the t-curve with $df = n - 1$.

1. For the alternative $\mu > \mu_0$, the approximate P-value equals the area to the right of $t$. If you prefer, the approximate P-value equals the area to the left of $-t$.

2. For the alternative $\mu < \mu_0$, the approximate P-value equals the area to the left of $t$. If you prefer, the approximate P-value equals the area to the right of $-t$.

3. For the alternative $\mu \neq \mu_0$, the approximate P-value equals twice the area to the right of $|t|$. If you prefer, the approximate P-value equals twice the area to the left of $-|t|$.

I will illustrate the use of these rules. Suppose that we have $\mu_0 = 20$, $n = 16$, $\bar{x} = 23.00$ and $s = 8.00$. First, we use Equation 18.2 to obtain the observed value of the test statistic:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{23.00 - 20}{8.00/\sqrt{16}} = \frac{3}{2} = 1.50.$$

Using the website introduced in Chapter 17,

http://stattrek.com/online-calculator/t-distribution.aspx

and the rules above:

For the alternative $>$: enter $n - 1 = 16 - 1 = 15$ for the degrees of freedom; enter $-t = -1.50$ in the t score box and click on Calculate. The approximate P-value, 0.0772, appears in the Cumulative probability box.

For the alternative $<$: leave 15 for the degrees of freedom; enter $t = 1.50$ in the t score box and click on Calculate. The approximate P-value, 0.9228, appears in the Cumulative probability box.

For the alternative $\neq$: the approximate P-value equals twice the area to the left of $-|t| = -1.50$. From the above, we know that this area equals 0.0772. Thus, the approximate P-value equals $2(0.0772) = 0.1544$.

If you believe that the population is symmetric or approximately symmetric, then the approximate P-values given above should
be reasonably accurate, even for relatively small values of $n$. If you suspect that the population is strongly skewed and your alternative is two-sided ($\neq$), my advice is to use the above rules if your $n$ is very large. Of course, "very large" is vague; the guidelines we had in Chapter 17 (i.e., how large depends on how skewed) are fine here too.

If, however, you suspect that the population is strongly skewed and your alternative is one-sided ($>$ or $<$), then my advice is to never use the rules above. I don't have the time to explain why, but it's related to the fact that for a population that is strongly skewed to the right [left], the incorrect confidence intervals are too small [large] much more often than they are too large [small]. This translates into the approximate P-value being either much too large or much too small.

18.2 Estimating the Median of a pdf

Recall that a fundamental feature of a pdf is that the total area under it is equal to 1. It thus follows that there exists a number $\nu$ (pronounced "new") with the following property:

The area under the pdf to the left of $\nu$ is equal to 0.5, and so is the area to the right of $\nu$.

The number $\nu$ is called the median of the pdf, for rather obvious reasons. Note my use of the definite article: the median. I am being a bit dishonest here. Let me explain.

For every pdf we have seen, including all the families of pdfs mentioned in Chapter 17, there is a unique number $\nu$ with the property that the area under the pdf to the left of $\nu$ and the area to the right of $\nu$ are both equal to 0.5. It is possible mathematically, however, for there to be an interval of numbers with this property. Figure 18.1 presents a pdf, a combination of two rectangles, for which all numbers in the closed interval $[10, 15]$ are medians. Notice how this happens: it requires a gap between two collections of possible measurement values, and exactly one-half of the area is on each side of the gap.

[Figure 18.1: A bizarre pdf with an interval of medians. In particular, every real number between 10 and 15, inclusive, is a median of this pdf.]

Such a picture is
perfectly reasonable to a mathematician …
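
To see how an interval of medians can arise, here is a minimal Python sketch. The figure itself is not reproduced here, so the two rectangles used below (height 0.05 on [0, 10] and height 0.10 on [15, 20]) are an assumption, chosen only so that each rectangle has area 0.5 and the gap runs from 10 to 15. The names `RECTANGLES`, `area_left` and `is_median` are hypothetical helpers, not part of these notes.

```python
# Hypothetical two-rectangle pdf (an assumption, not read off Figure 18.1):
# height 0.05 on [0, 10] and height 0.10 on [15, 20], zero elsewhere.
RECTANGLES = [(0.0, 10.0, 0.05), (15.0, 20.0, 0.10)]


def area_left(x):
    """Area under the pdf to the left of x."""
    total = 0.0
    for lo, hi, height in RECTANGLES:
        # Width of the part of this rectangle that lies to the left of x.
        width = min(max(x, lo), hi) - lo
        total += width * height
    return total


def is_median(x, tol=1e-12):
    """True if the areas to the left and to the right of x are both 0.5."""
    left = area_left(x)
    right = 1.0 - left  # the total area under a pdf is 1
    return abs(left - 0.5) <= tol and abs(right - 0.5) <= tol


if __name__ == "__main__":
    for x in (9.0, 10.0, 12.5, 15.0, 16.0):
        print(x, area_left(x), is_median(x))
```

Under this assumed pdf, no area accumulates anywhere inside the gap, so the area to the left stays at 0.5 for every point from 10 through 15. That is why the median fails to be unique.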
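
Returning to the worked example of Section 18.1, the three rules of Result 18.1 can be checked without the website. The sketch below is one possible way to do it, using only the Python standard library. It integrates the t density numerically with Simpson's rule, which is a technique I have swapped in for illustration rather than the method the website uses. The function names (`t_pdf`, `t_cdf`, `t_test_p_values`) and the number of integration steps are assumptions of this sketch.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Area under the t curve to the left of x.

    Uses Simpson's rule on [0, |x|] plus the symmetry of the t curve.
    The value of steps must be even.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3  # area between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def t_test_p_values(mu0, n, xbar, s):
    """Observed t from Equation 18.2 and the three P-values of Result 18.1."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    return {
        "t": t,
        "greater": 1.0 - t_cdf(t, df),          # area to the right of t
        "less": t_cdf(t, df),                   # area to the left of t
        "two_sided": 2.0 * t_cdf(-abs(t), df),  # twice the area left of -|t|
    }


if __name__ == "__main__":
    result = t_test_p_values(mu0=20, n=16, xbar=23.00, s=8.00)
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

With the numbers from the example (20, 16, 23.00 and 8.00), the three P-values should agree with the website values 0.0772, 0.9228 and 0.1544 to about three decimal places. The cautions above about strongly skewed populations apply to these computed values just as they do to the website's.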