
18.466, Dudley
March 11, 2003

CHAPTER 1. DECISION THEORY AND TESTING SIMPLE HYPOTHESES

1.1 Deciding between two simple hypotheses: the Neyman-Pearson Lemma.

Probability theory is reviewed in Appendix D. Suppose an experiment has a set X of possible outcomes. The outcome has some probability distribution µ defined on X. In statistics, we typically don’t know what µ is, but we have hypotheses about what it may be. After making observations we’ll try to make a decision between or among the hypotheses.

In general there could be infinitely many possibilities for µ, but to begin with we’re going to look at the case where there are just two possibilities, µ = P or µ = Q, and we need to decide which it is. For example, a point x in X could give the outcome of a test for a certain disease, where P is the distribution of x for those who don’t have the disease and Q is the distribution for those who do.

Often, we have n independent observations, each with distribution µ. Then X can be replaced by the set X^n of all ordered n-tuples (x_1, ..., x_n) of points of X, and µ by the Cartesian product measure µ × ··· × µ of n copies of µ. In this way, the case of n observations x_1, ..., x_n reduces to that of one “observation” (x_1, ..., x_n).

The probability measures P and Q are each defined on some σ-algebra B of subsets of X, such as the Borel sets in case X is the real line R or a Euclidean space. A test of the hypothesis that µ = P will be given by a measurable set A, in other words a set A in B. If we observe x in A, then we will reject the hypothesis that µ = P in favor of the alternative hypothesis that µ = Q. Then P(A) is called the size of the test A (at P). The size is the probability that we’ll make the error of rejecting P when it’s true, i.e. when µ = P, sometimes called a Type I error. On the other hand, Q(A) is called the power of the test A against the alternative Q. The power is the probability that when Q is true, the test correctly rejects P and prefers Q.
The complementary probability 1 − Q(A) is sometimes called the probability of a Type II error. Given P and Q, for the test A to be as effective as possible, we’d like the size to be small and the power to be large. In the rest of this section, it will be shown how the choice of A can be made optimally.

Example 1.1.1. Let X = R and let P and Q be normal measures, both with variance 0.04, P = N(0, 0.04) and Q = N(1, 0.04). Larger values of x tend to favor Q, so it seems reasonable to take A as a half-line [c, ∞) for some c. At x = 1/2, the densities of P and Q are equal. For x < 1/2, P has larger density. For x > 1/2, Q does. So if we have no reason in advance to prefer one of P and Q, we might take c = 1/2. Then the probabilities of the two types of errors are each about 0.0062 (from tables of the normal distribution). In other words the size is 0.0062 and the power is 0.9938. If the variances had been larger, so would the error probabilities.

It’s not always best to prefer the distribution (P or Q) with larger density at the observation (or vector of observations). In testing for a disease, an error indicating a disease when the subject is actually healthy can lead to further, possibly expensive tests or inappropriate treatments. On the other hand the error of overlooking a disease when the patient has it could be much more serious, depending on the severity of the disease.

Numerical values called losses will be assigned to the consequences of mistaken decisions. Let L_{µν} be the loss incurred when µ is true and we decide in favor of ν. A correct decision will be assumed to cause zero loss, so L_{PP} = L_{QQ} = 0. The losses L_{PQ} and L_{QP} will be positive and in general will be different.

Also, the statistician may have assigned some probabilities to P or Q in advance, called prior probabilities, say π(P) = 1 − π(Q) with 0 < π(P) < 1. For example it could be known from other data (approximately) what fraction of people in a population being tested have a disease.
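As a numerical check on Example 1.1.1, the following Python sketch computes the size and power of the test A = [c, ∞) with c = 1/2 from the normal tail probability. The helper name normal_sf, the use of math.erfc, and the larger variance 0.25 tried at the end are choices made for this illustration and do not come from the text.

```python
import math

def normal_sf(x, mean, sd):
    """Upper-tail probability of [x, infinity) under N(mean, sd**2)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

sd = math.sqrt(0.04)   # both P and Q have variance 0.04, so sd = 0.2
c = 0.5                # reject P when x >= c, i.e. A = [c, infinity)

size = normal_sf(c, 0.0, sd)    # P(A): probability of a Type I error
power = normal_sf(c, 1.0, sd)   # Q(A): probability of correctly rejecting P
type2 = 1.0 - power             # 1 - Q(A): probability of a Type II error

print(f"size  = {size:.4f}")
print(f"power = {power:.4f}")
print(f"type2 = {type2:.4f}")

# Hypothetical larger variance (0.25 instead of 0.04), same threshold c.
size_wide = normal_sf(c, 0.0, math.sqrt(0.25))
print(f"size with variance 0.25 = {size_wide:.4f}")
```

Rounded to four places, this gives size 0.0062 and power 0.9938, matching the values quoted from tables. With the larger variance the Type I error probability at the same threshold increases, as the example states.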
The part of statistics in which prior probabilities are assumed to exist is known as Bayesian statistics, as contrasted with frequentist statistics where priors are not assumed. In this book, both are treated. Later on, some pros and cons of the Bayesian and frequentist approaches will be mentioned.

It will turn out that the best tests between P and Q will be based on the ratio of densities of P and Q, called the likelihood ratio, defined as follows. In general, P or Q could have continuous or discrete parts, but P and Q are always absolutely continuous with respect to P + Q, so that there is a Radon-Nikodym derivative (RAP, 5.5.4) h(x) = (dP/d(P + Q))(x). Then dQ/d(P + Q) = 1 − h. The likelihood ratio R_{Q/P}(x) of Q to P at x is defined as (1 − h(x))/h(x), or +∞ if h(x) = 0. The likelihood ratio, like h, is defined up to equality (P + Q)-almost everywhere. If P and Q have densities f and g respectively with respect to some measure, for example Lebesgue measure on R, then we can take R_{Q/P}(x) = g(x)/f(x) if f(x) > 0, or +∞ when g(x) > 0 = f(x), or 0 when g(x) = 0 = f(x). For a proof, see Appendix A.

In Example 1.1.1, R_{Q/P}(x) ≡ e^{25(x−0.5)}. Or, let P and Q both be Poisson distributions on the set N of nonnegative integers with P(k) = P_λ(k) = e^{−λ} λ^k / k! and Q = P_ρ for some ρ. Then R_{Q/P}(k) = e^{λ−ρ} (ρ/λ)^k for all k ∈ N.

The sizes α = 0.05, 0.01 and 0.001 were chosen rather arbitrarily in the first half of the 20th century and used in selecting tests. So, if a test A has size α = 0.05 or less at P, and the observation x is in A, the outcome is called “statistically significant” and the hypothesis P is rejected. If α ≤ 0.001 the outcome is called “highly significant.” The levels 0.05 etc. are still in wide use in some applied fields, such as medicine and psychology, although they are no longer very popular among statisticians themselves.
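The closed forms for the likelihood ratio in Example 1.1.1 and in the Poisson case can be compared with the ratio of densities computed directly. This is a minimal Python sketch: the function names are made up for the illustration, and the Poisson parameters λ = 2 and ρ = 3 are hypothetical values not taken from the text.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def poisson_pmf(k, lam):
    """Poisson probability e^(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def likelihood_ratio(f_x, g_x):
    """R_{Q/P} from density values f_x (for P) and g_x (for Q).

    Follows the conventions in the text: g/f if f > 0,
    +infinity if g > 0 = f, and 0 if g = 0 = f.
    """
    if f_x > 0:
        return g_x / f_x
    return math.inf if g_x > 0 else 0.0

# Example 1.1.1: P = N(0, 0.04), Q = N(1, 0.04); closed form e^(25(x - 0.5)).
normal_pairs = []
for x in (0.2, 0.5, 0.8):
    direct = likelihood_ratio(normal_pdf(x, 0.0, 0.04), normal_pdf(x, 1.0, 0.04))
    closed = math.exp(25.0 * (x - 0.5))
    normal_pairs.append((direct, closed))
    print(f"x = {x}: direct = {direct:.6g}, closed form = {closed:.6g}")

# Poisson case with hypothetical lam = 2, rho = 3; closed form e^(lam - rho) (rho/lam)^k.
lam, rho = 2.0, 3.0
poisson_pairs = []
for k in range(6):
    direct = likelihood_ratio(poisson_pmf(k, lam), poisson_pmf(k, rho))
    closed = math.exp(lam - rho) * (rho / lam) ** k
    poisson_pairs.append((direct, closed))
    print(f"k = {k}: direct = {direct:.6g}, closed form = {closed:.6g}")
```

In each printed row the two columns should agree up to floating-point rounding. At x = 0.5 the ratio is 1, reflecting that the two normal densities are equal there.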
For discrete distributions, not many sizes of tests may be available, as in the following:

Example 1.1.2. Let X = {0, 1, 2}, P(0) = 0.8, P(1) = 0.05, P(2) = 0.15, Q(0) = 0.008, Q(1) = 0.002, Q(2) = 0.99. Then R_{Q/P}(x) is 0.01, 0.04, and 6.6 for x = 0, 1, 2, respectively.

Example 1.1.2 suggests that, at least for discrete distributions, one should not insist on conventional, specific sizes for tests.
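To see how few sizes are available in Example 1.1.2, one can list every test A (every subset of X) with its size P(A) and power Q(A). The sketch below does this by brute force in Python; the variable names are chosen for the illustration only.

```python
from itertools import combinations

# Distributions from Example 1.1.2 on X = {0, 1, 2}.
P = {0: 0.8, 1: 0.05, 2: 0.15}
Q = {0: 0.008, 1: 0.002, 2: 0.99}

# Likelihood ratio Q(x)/P(x) at each point (every P(x) is positive here).
ratios = {x: Q[x] / P[x] for x in P}
print("likelihood ratios:", {x: round(r, 4) for x, r in ratios.items()})

# Every subset A of X is a test; record (size, power, A) for each one.
tests = []
for r in range(len(P) + 1):
    for A in combinations(sorted(P), r):
        size = sum(P[x] for x in A)    # P(A)
        power = sum(Q[x] for x in A)   # Q(A)
        tests.append((size, power, A))

for size, power, A in sorted(tests):
    print(f"A = {A!r:<10} size = {size:.3f}  power = {power:.3f}")
```

The listing has only eight attainable sizes: 0, 0.05, 0.15, 0.2, 0.8, 0.85, 0.95 and 1. The only test of size exactly 0.05 is A = {1}, whose power is just 0.002, while A = {2} has size 0.15 and power 0.99.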
