Chapter 17

Inference for One Numerical Population

In Chapter 10 you learned about finite populations. You learned about smart and dumb random samples from a finite population. You learned that i.i.d. trials can be viewed as the outcomes of a dumb random sample from a finite population. Chapter 11 developed these ideas in the special case of a dichotomous response. This was a very fruitful development, leading to all the results not named Poisson in Chapters 12-16. And, of course, our results for the Poisson are related to our results for the binomial.

In Chapters 17-20 we mimic the work of Chapters 11-16, but for a numerical response rather than a dichotomy. First, you will see the familiar distinction between a finite population and a mathematical model for the process that generates the outcomes of trials. Second, you will see that responses that are counts must be studied differently from responses that are measurements. We begin by studying responses that are obtained by counting.

Before we get to count responses, let me lay out some notation for this chapter. Recall that either dumb random sampling from a finite population or the assumption that trials are i.i.d. results in our observing n i.i.d. random variables: X1, X2, X3, ..., Xn. The probability (sampling) distribution for each of these random variables is determined by the population. Recall that for a dichotomous response the population is quite simple: it is determined by the single number p. For a numerical response, as you will soon see, the population is more complex: it is a picture, not a single number. Finally, when I want to talk about a generic random variable (one observation of a trial, or one population member selected at random), I will use the symbol X, without a subscript. You may need, on occasion, to refer back to the preceding paragraph as you work through this chapter.

17.1 Responses Obtained by Counting

I will begin with finite populations.

17.1.1 Finite Populations for Counts

Please remember that the two examples in this subsection are both hypothetical. In particular, I claim no knowledge of cat ownership or household size in our society.

Example 17.1 (The cat population.) A city consists of exactly 100,000 households. Nature knows that 10,000 of these households have no cats; 50,000 of these households have exactly one cat; 30,000 of these households have exactly two cats; and the remaining 10,000 households have exactly three cats.

We can visualize the cat population as a population box that contains 100,000 cards, one for each household. On a household's card is its number of cats: 0, 1, 2 or 3. Consider the chance mechanism of selecting one card at random from the population box (equivalently, selecting one household at random from the city). Let X be the number on the card that will be selected. It is easy to determine the sampling distribution of X; it is given in Table 17.1. For example, 50,000 of the 100,000 households have exactly one cat; thus, P(X = 1) = 50,000/100,000 = 0.50.

Table 17.1: The population distribution for the cat population.

    x           0      1      2      3     Total
    P(X = x)   0.10   0.50   0.30   0.10   1.00

It will be useful to draw the probability histogram of the random variable X; it is presented in Figure 17.1. To this end, note that consecutive possible values of X differ by 1; thus, δ = 1, and the height of each rectangle in Figure 17.1 equals the probability of its center value. For example, the rectangle centered at 1 has a height of 0.50 because P(X = 1) = 0.50.

[Figure 17.1: The probability histogram for the cat population. Rectangles centered at x = 0, 1, 2 and 3 have heights 0.10, 0.50, 0.30 and 0.10.]

Either the distribution in Table 17.1 or its probability histogram in Figure 17.1 can play the role of the population. In the next section we will see that for a measurement response the population is a picture, called the probability density function. Indeed, the population must be a picture for mathematical reasons (trust me on this). Because we have no choice with a measurement (the population is a picture), for consistency I will refer to the probability histogram of a count response as the population. Except when I don't: occasionally it will be convenient for me to view the probability distribution, such as the one in Table 17.1, as being the population. As Oscar Wilde reportedly said, "Consistency is the last refuge of the unimaginative."

It can be shown that the mean of the cat population equals 1.40 cats per household and its standard deviation equals 0.80 cats per household. I suggest you trust me on the accuracy of these values. Certainly, if one imagines a fulcrum placed at 1.40 in Figure 17.1, it appears that the picture will balance. If you really enjoy hand computations, you can use Equations 7.1 and 7.3 on pages 147 and 148 to obtain μ = 1.40 and σ² = 0.64.

Finally, if you refer to my original description of the cat population in Example 17.1, you can easily verify that the median of the 100,000 population values is 1. (In the sorted list, positions 10,001 through 60,000 are all home to the response value 1. Thus, the two center positions, 50,000 and 50,001, both house 1's; hence, the median is 1.) For future use, it is convenient to have a Greek letter to represent the median of a population; we will use ν, pronounced as "new."

You have now seen the veracity of my comment earlier in this chapter: the population for a count response (a probability histogram) is much more complicated than the population for a dichotomy (the number p).

Thus far with the cat population, I have focused exclusively on Nature's perspective. We now turn to the view of a researcher.

Imagine that you are a researcher who is interested in the cat population. All you would know is that the response is a count; thus, the population is a probability histogram. But which probability histogram? It is natural to begin with the idea of using data to estimate the population's probability histogram. How should you do that?

Mathematically, the answer is simple: select a random sample from the population of 100,000 households. Provided that the sample size n is 5% or less of the population size N = 100,000, whether the sample is smart or dumb matters little and can be ignored. For the cat population, this means a sample of 5,000 or fewer households. It is beyond my imagination (see the Wilde quote above) that a cat population researcher would have the energy and resources to sample more than 5,000 households. In practice, a researcher would attempt to obtain a sample for which the WTP assumption (Definition 10.3 on page 240) is reasonable. Because the cat population is hypothetical, …
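For readers who prefer a computer to hand computation, the sketch below checks the cat population values stated above (μ = 1.40, σ² = 0.64, σ = 0.80 and ν = 1) directly from the household counts in Example 17.1. It assumes that Equations 7.1 and 7.3 are the usual definitions μ = Σ x P(X = x) and σ² = Σ (x - μ)² P(X = x); the function names are hypothetical helpers for illustration only. The median is found exactly as in the text, by locating the value that sits in the center position(s) of the sorted list of all 100,000 population values.

```python
import math
from fractions import Fraction

# Example 17.1: number of cats -> number of households with that many cats.
CAT_COUNTS = {0: 10_000, 1: 50_000, 2: 30_000, 3: 10_000}


def population_distribution(counts):
    """Return P(X = x) for each value x, as exact fractions (Table 17.1)."""
    total = sum(counts.values())
    return {x: Fraction(k, total) for x, k in sorted(counts.items())}


def mean_and_variance(dist):
    """Return (mu, sigma^2), assuming the usual discrete-distribution definitions."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var


def value_at(counts, position):
    """Value in 1-based `position` of the sorted list of all population values."""
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]
        if position <= cumulative:
            return x
    raise ValueError("position exceeds the population size")


def population_median(counts):
    """Median (nu): average of the one or two center positions of the sorted list."""
    n = sum(counts.values())
    lower, upper = (n + 1) // 2, (n + 2) // 2  # the same position when n is odd
    return Fraction(value_at(counts, lower) + value_at(counts, upper), 2)


dist = population_distribution(CAT_COUNTS)
mu, var = mean_and_variance(dist)
sigma = math.sqrt(var)
nu = population_median(CAT_COUNTS)

print(f"mu = {float(mu):.2f}, sigma^2 = {float(var):.2f}, sigma = {sigma:.2f}, nu = {nu}")
# prints: mu = 1.40, sigma^2 = 0.64, sigma = 0.80, nu = 1
```

Exact fractions are used so that the checks against 1.40 and 0.64 are not muddied by floating-point rounding.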
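The researcher's plan described above (select a random sample and use it to estimate the population's probability histogram) can be simulated, because in this hypothetical example we know Nature's population box. The sketch below is a minimal illustration under stated assumptions: it takes a smart random sample to mean drawing without replacement and a dumb random sample to mean drawing with replacement, estimates each P(X = x) by the proportion of sampled households with x cats, and encodes the 5% guideline as n ≤ 0.05N. The helper names and the sample size n = 1,000 are hypothetical choices for illustration, not part of the text.

```python
import random
from collections import Counter

# Example 17.1: number of cats -> number of households with that many cats.
CAT_COUNTS = {0: 10_000, 1: 50_000, 2: 30_000, 3: 10_000}


def population_box(counts):
    """One card per household; each card holds that household's number of cats."""
    return [x for x, k in sorted(counts.items()) for _ in range(k)]


def draw_sample(box, n, smart, rng):
    """Assumed convention: smart = without replacement, dumb = with replacement."""
    if smart:
        return rng.sample(box, n)
    return rng.choices(box, k=n)


def relative_frequencies(sample):
    """Estimate P(X = x) by the proportion of sampled cards showing x."""
    tally = Counter(sample)
    return {x: tally[x] / len(sample) for x in sorted(tally)}


def smart_vs_dumb_negligible(n, big_n):
    """The 5% guideline n <= 0.05 * N, checked in integer arithmetic."""
    return 20 * n <= big_n


rng = random.Random(17)  # fixed seed so the run is reproducible
box = population_box(CAT_COUNTS)
n = 1_000
smart_est = relative_frequencies(draw_sample(box, n, smart=True, rng=rng))
dumb_est = relative_frequencies(draw_sample(box, n, smart=False, rng=rng))

print("5% guideline satisfied:", smart_vs_dumb_negligible(n, len(box)))
print("smart sample estimate:", smart_est)
print("dumb sample estimate: ", dumb_est)
```

With n = 1,000 out of N = 100,000 the guideline holds, and the two estimated histograms should differ from each other (and from Table 17.1) only by sampling variation.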