Chapter 12
Inference for a Binomial $p$

In Part 1 of these Course Notes you learned a great deal about statistical tests of hypotheses. These tests explore the unknowable; in particular, whether or not the Skeptic's Argument is true. In Section 11.4, I briefly introduced you to three tests that explore whether or not a sequence of dichotomous trials are Bernoulli trials.

In this chapter, we will assume that we have Bernoulli trials and turn our attention to the value of the parameter $p$. Later in this chapter we will explore a statistical test of hypotheses concerning the value of $p$. First, however, I will introduce you to the inference procedure called estimation. I will point out that for Bernoulli trials, estimation is inherently much more interesting than testing.

The estimation methods in this chapter are relatively straightforward. This does not mean, however, that the material will be easy; you will be exposed to several new ways of thinking about things and this will prove challenging.

After you complete this chapter, however, you will have a solid understanding of the two types of inference that are used by scientists and statisticians: testing and estimation. Most of the remainder of the material in these Course Notes will focus on introducing you to new scientific scenarios, and then learning how to test and estimate in these scenarios. (In some scenarios you also will learn about the closely related topic of prediction.) Thus, for the most part, after this chapter you will have been exposed to the major ideas of this course, and your remaining work, being familiar, should be easier to master.

12.1 Point and Interval Estimates of $p$

Suppose that we plan to observe $n$ Bernoulli trials. More accurately, we plan to observe $n$ dichotomous trials and we are willing to assume, for the moment at least, that the assumptions of Bernoulli trials are met. Throughout these Course Notes, unless I state otherwise, we always will assume that the researcher knows the value of $n$.

Before we observe the $n$ Bernoulli
trials, if we know the numerical value of $p$, then we can compute probabilities for $X$, the total number of successes that will be observed. If we do not know that numerical value of $p$, then we cannot compute probabilities for $X$. I would argue (not everyone agrees with me) that there is a gray area between these extremes (refer to my example concerning basketball player Kobe Bryant on page 264 of Chapter 11); i.e., if I have a massive amount of previous data from the process that generates my future Bernoulli trials, then I might be willing to use the proportion of successes in the massive data set as an approximate value of $p$.

Still assuming that the numerical value of $p$ is unknown to the researcher, after $n$ Bernoulli trials are observed, if one is willing to condition on the total number of successes, then one can critically examine the assumption of Bernoulli trials using the methods presented in Section 11.4. Alternatively, we can use the data we collect (the observed value $x$ of $X$) to make an inference about the unknown numerical value of $p$. Such inferences will always involve some uncertainty. To summarize, if the value of $p$ is unknown, a researcher will attempt to infer its value by looking at the data. It is convenient to create Nature (introduced in Chapter 8 in the discussion of Table 8.8), who knows the value of $p$.

The simplest inference possible involves the idea of a point estimate/estimator, as defined below.

Definition 12.1 (Point estimate/estimator.) A researcher observes $n$ Bernoulli trials, counts the number of successes, $x$, and calculates $\hat{p} = x/n$. This proportion, $\hat{p}$, is called the point estimate of $p$. It is the observed value of the random variable $\hat{P} = X/n$, which is called the point estimator of $p$. For convenience, we write $\hat{q} = 1 - \hat{p}$ for the proportion of failures in the data; $\hat{q}$ is the observed value of the random variable $\hat{Q} = 1 - \hat{P}$.

Before we collect data, we focus on the random variable, the point estimator. After we collect data, we compute the value of the point estimate, which is, of course, the observed
value of the point estimator.

I don't like the technical term point estimate/estimator. More precisely, I don't like half of it. I like the word "point" because we are talking about a single number. (I recall the lesson I learned in math years ago: every number is a point on the number line and every point on the number line is a number.) I don't particularly like the use of the word estimate/estimator. If I become tsar of the Statistics world, I might change the terminology. I say "might" instead of "will" because, frankly, I can't actually suggest an improvement on estimate/estimator. I recommend that you simply remember that estimate/estimator is a word statisticians use whenever they take observed data and try to infer a feature of a population.

It is trivially easy to calculate $\hat{p} = x/n$; thus, based on experiences in previous math courses, you might expect that we will move along to the next topic. But we won't. In a Statistics course we evaluate the behavior of a procedure. What does this mean? Statisticians evaluate procedures by seeing how they perform in the long run.

We say that the point estimate $\hat{p}$ is correct if, and only if, $\hat{p} = p$. Obviously, any honest researcher wants the point estimate to be correct. As we will see now, whereas having a correct point estimate is desirable, the concept has some serious difficulties.

Let's suppose that a researcher observes $n = 100$ Bernoulli trials and counts a total of $x = 55$ successes. Thus, $\hat{p} = 55/100 = 0.55$, and this point estimate is correct if, and only if, $p = 0.55$. This leads us to the first difficulty with the concept of being correct: Nature knows whether $\hat{p}$ is correct; the researcher never knows.

The above example takes place after the data have been collected. We can see this because we are told that a total of $x = 55$ successes were counted. Now let's go back in time to before the data are collected, and let's take on the role of Nature. I will change the scenario a bit to avoid confusing this current example with what I just did. As Nature, I am aware that a researcher
plans to observe $n = 200$ Bernoulli trials. I also know that $p = 0.600$, but the researcher does not know this. In addition, after collecting the data the researcher will calculate the point estimate of $p$. What will happen?

I don't know what will happen; I don't make Nature omniscient, it just knows the value of $p$. When I don't know what will happen, I resort to calculating probabilities. In particular, as Nature, I know that $\hat{p}$ will be correct if, and only if, the total number of successes turns out to be 120, making $\hat{p} = 120/200$.
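As an illustration of the kind of probability Nature might calculate here, the following is a minimal Python sketch that evaluates the chance that the point estimate lands exactly on $p$ when $n = 200$ and $p = 0.600$. It assumes the standard binomial probability formula for the number of successes in Bernoulli trials; the function names `binomial_pmf` and `prob_point_estimate_correct` are hypothetical helpers introduced only for this example and do not come from the Course Notes.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p), using the exact binomial formula."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_point_estimate_correct(n: int, p: float) -> float:
    """Probability that the point estimate X/n equals p exactly.

    The estimate is correct only when X = n*p, so the probability is 0
    whenever n*p is not (up to rounding error) a whole number of successes.
    """
    target = round(n * p)
    if abs(target - n * p) > 1e-9:
        return 0.0
    return binomial_pmf(target, n, p)


if __name__ == "__main__":
    # Nature's scenario from the text: n = 200 trials with p = 0.600,
    # so the estimate is correct only if exactly 120 successes occur.
    print(prob_point_estimate_correct(200, 0.600))
```

Under these assumptions the printed probability is a little under 0.06. The same helper returns 0 for any $n$ where $np$ is not a whole number (for example, $n = 201$ with $p = 0.600$), since no possible count of successes would then make the estimate correct.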