March 18, 2003

1.3 Bayes decision theory

The distinguishing feature of Bayesian statistics is that a probability distribution π, called a prior, is given on the parameter space (Θ, T). Sometimes, priors are also considered which may be infinite, such as Lebesgue measure on the whole real line, but such priors will not be treated here, at least for the time being. A Bayesian statistician chooses a prior π based on whatever information on the unknown θ is available in advance of making any observations in the current experiment. In general, no definite rules are prescribed for choosing π. Priors are often useful as technical tools in reaching non-Bayesian conclusions, such as admissibility in Theorems 1.2.5 and 1.2.6.

Bayes decision rules were defined near the end of the last section as rules which minimize the Bayes risk and for which the risk is finite. Bayes tests of P vs. Q, treated in Theorem 1.1.8, are a special case of Bayes decision rules. We saw in that case that Bayes rules need not be randomized (Remark 1.1.9). The same is true quite generally in Bayes decision theory: if, in a given situation, it is Bayes to choose at random among two or more possible decisions, then the decisions must have equal risks (conditional on the observations) and we may as well just take one of them. Theorem 1.3.1 will give a more precise statement.

In game theory, randomization is needed to have a strategy that is optimal even if the opponent knows it and can choose a strategy accordingly. If one knows the opponent's strategy, then it is not necessary to randomize. Sometimes, statistical decision theory is viewed as a game against an opponent called "Nature." Unlike an opponent in game theory, "Nature" is viewed as neutral, not trying to win the game. Assuming a prior, as in Bayes decision theory, is to assume in effect that "Nature" follows a certain strategy.
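As an aside (not part of the notes' development), the way a Bayes rule minimizes Bayes risk can be sketched numerically in a finite setting. All numbers below, and the choice of two states, two observations, and three actions, are purely illustrative assumptions:

```python
import numpy as np

# Illustrative finite decision problem (numbers are assumptions, not from the notes):
# two states theta_0, theta_1; sample space {0, 1}; three actions {0, 1, 2}.
loss = np.array([[0.0, 1.0, 4.0],    # L(theta_0, a) for a = 0, 1, 2
                 [3.0, 1.0, 0.0]])   # L(theta_1, a)
P = np.array([[0.7, 0.3],            # P_theta_0(x) for x = 0, 1
              [0.2, 0.8]])           # P_theta_1(x)
prior = np.array([0.5, 0.5])         # prior pi on Theta

def bayes_rule_and_risk(loss, P, prior):
    """Non-randomized Bayes rule: minimize posterior expected loss at each x."""
    joint = prior[:, None] * P                 # pi(theta) * P_theta(x), shape (theta, x)
    # post_loss[x, a] = sum_theta joint[theta, x] * loss[theta, a]
    post_loss = np.einsum('tx,ta->xa', joint, loss)
    rule = post_loss.argmin(axis=1)            # best action for each x
    bayes_risk = post_loss.min(axis=1).sum()   # r(pi, d) = sum_x min_a post_loss[x, a]
    return rule, bayes_risk

rule, risk = bayes_rule_and_risk(loss, P, prior)
print(rule, risk)   # the rule picks one fixed action per x; risk = 0.85 here
```

Minimizing the integrand pointwise in x is exactly why no randomization is needed: any mixture of actions at a given x can only average losses that the best single action already beats or ties.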
In showing that randomization isn't needed, it will be helpful to formulate randomization in a fuller way, where we not only choose a probability distribution over the possible actions, but then also choose an action according to that distribution, in a measurable way, as follows:

Definition. A randomized decision rule d : X → D_E is realizable if there is a probability space (Ω, F, µ) and a jointly measurable function δ : X × Ω → A such that for each x in X, δ(x, ·) has distribution d(x); in other words, d(x) is the image measure of µ by δ(x, ·), d(x) = µ ∘ δ(x, ·)⁻¹.

For example, a randomized test as in Sec. 1.1 is always a realizable rule, where we can take Ω as the interval [0, 1] with Lebesgue measure and let δ(x, t) = d_Q if t ≤ f(x) and d_P otherwise. It is shown in the next section that decision rules are realizable under conditions wide enough to cover a great many cases, for example whenever the action space is a subset of a space R^k with Borel σ-algebra.

It will be shown next that randomization is unnecessary for realizable Bayes rules. The idea is that the Bayes risk of a realizable randomized Bayes rule d(·) is an average of Bayes risks of non-randomized rules δ(·, ω). Since a Bayes rule has minimum Bayes risk, the risks of δ(·, ω) are no smaller, so they must almost all be equal to that of d(·). Then such non-randomized δ(·, ω) for fixed ω are Bayes rules.

1.3.1 Theorem. For any decision problem for a measurable family {P_θ, θ ∈ Θ} and prior π, if there is a realizable Bayes randomized decision rule d, then there is a non-randomized Bayes decision rule.

Proof. First, here is a helpful technical fact:

1.3.2 Lemma. For any measurable family {P_θ, θ ∈ Θ} and nonnegative, jointly measurable function f : (θ, x, ω) ↦ f(θ, x, ω), the function g defined by g(θ, ω) := ∫ f(θ, x, ω) dP_θ(x) is jointly measurable.

Proof.
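The randomized-test example above can be made concrete with a short simulation (not from the notes; the particular test function f below is an arbitrary illustrative choice with values in [0, 1]):

```python
import numpy as np

rng = np.random.default_rng(0)

# Realizing a randomized test on Omega = [0, 1] with Lebesgue measure,
# as in the example: delta(x, t) = d_Q if t <= f(x), d_P otherwise.
# f is an arbitrary measurable test function into [0, 1] (our assumption).
def f(x):
    return 1.0 / (1.0 + np.exp(-x))

def delta(x, t):
    """Realization delta : X x Omega -> A, with actions labelled 'dP', 'dQ'."""
    return 'dQ' if t <= f(x) else 'dP'

# For fixed x, delta(x, .) pushes Lebesgue measure on [0, 1] forward to the
# two-point law d(x): mass f(x) on d_Q and 1 - f(x) on d_P.  A Monte Carlo
# check: the empirical frequency of d_Q over uniform t approximates f(x).
x = 0.3
t = rng.uniform(size=100_000)
freq_dQ = np.mean([delta(x, ti) == 'dQ' for ti in t])
print(freq_dQ, f(x))
```

This is precisely the image-measure statement d(x) = µ ∘ δ(x, ·)⁻¹ checked empirically at one point x.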
If f(θ, x, ω) = 1_T(θ) 1_B(x) 1_F(ω) for some T ∈ T, B ∈ B, and F ∈ F, then g(θ, ω) = P_θ(B) 1_T(θ) 1_F(ω) is measurable in (θ, ω), since θ ↦ P_θ(B) is measurable by assumption. The rest of the proof of the Lemma is like that of Prop. 1.2.4. □

Now to prove Theorem 1.3.1, take (Ω, F, µ) and δ(·, ·) as in the definition that d is realizable. For each fixed ω ∈ Ω, δ(·, ω) is a non-randomized decision rule. So r(π, δ(·, ω)) ≥ r(π, d), since d is Bayes for π. Also, writing ν(da) := dν(a) for a measure ν,

r(π, d) = ∫ r(θ, d) dπ(θ) = ∫∫ r(θ, d(x)) dP_θ(x) dπ(θ)   (by the definitions)
        = ∫∫∫ L(θ, a) d(x)(da) dP_θ(x) dπ(θ) = ∫∫∫ L(θ, δ(x, ω)) dµ(ω) dP_θ(x) dπ(θ)

by the image measure theorem, e.g. RAP, 4.1.11. So by the Tonelli-Fubini theorem for nonnegative measurable functions, twice, and the measurability shown in Lemma 1.3.2, we get

r(π, d) = ∫∫∫ L(θ, δ(x, ω)) dP_θ(x) dπ(θ) dµ(ω) = ∫ r(π, δ(·, ω)) dµ(ω).

Thus r(π, δ(·, ω)) = r(π, d) for µ-almost all ω, and so for some ω, providing a Bayes non-randomized decision rule δ(·, ω). □

If every randomized rule is realizable, as is shown in the next section under conditions given there, then Theorem 1.3.1 shows that the non-randomized rules form an essentially complete class, as defined in Sec. 1.2. It will also be shown in Sec. 2.2 below that non-randomized rules are (essentially) complete under some other conditions.

Definition. A family {P_θ, θ ∈ Θ} of laws on a measurable space (X, B) will be called dominated if for some σ-finite measure ν, each law P_θ is absolutely continuous with respect to ν; in other words, for any A ∈ B, ν(A) = 0 implies P_θ(A) = 0 for all θ. Often, ν would be Lebesgue measure on R^k; or, if the measures were all concentrated on a countable set such as the integers, ν would be counting measure (the measure giving mass 1 to each point) on the set.

If P_θ is absolutely continuous with respect to ν, then by the Radon-Nikodym theorem (RAP, 5.5.4), it has a density or Radon-Nikodym derivative f(θ, x) := (dP_θ/dν)(x).
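The de-randomization step in the proof of Theorem 1.3.1 — the Bayes risk of a randomized rule is an average over ω of risks of non-randomized rules, so some fixed-ω rule does at least as well — can be checked on a small finite example. Everything numeric below is an illustrative assumption, not taken from the notes:

```python
import numpy as np

# Illustrative finite problem (our numbers): two states, two observations,
# three actions, with loss L(theta, a), laws P_theta(x), and prior pi.
loss = np.array([[0.0, 1.0, 4.0],
                 [3.0, 1.0, 0.0]])
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])

def risk_nonrandomized(rule):
    """Bayes risk r(pi, delta) of a non-randomized rule x -> a."""
    return sum(prior[t] * P[t, x] * loss[t, rule[x]]
               for t in range(2) for x in range(2))

# A randomized rule realized on Omega = [0, 1] with Lebesgue measure:
# delta(., omega) = rule1 if omega <= lam, else rule2.
rule1, rule2, lam = [0, 1], [0, 2], 0.5
r1, r2 = risk_nonrandomized(rule1), risk_nonrandomized(rule2)

# r(pi, d) = integral of r(pi, delta(., omega)) d mu(omega), here a convex
# combination; so min(r1, r2) <= r_mixed, the de-randomization inequality.
r_mixed = lam * r1 + (1 - lam) * r2
print(r1, r2, r_mixed, min(r1, r2) <= r_mixed)
```

When the randomized rule is itself Bayes, the average can only equal the minimum if µ-almost every fixed-ω rule attains it, which is exactly the conclusion of the theorem.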
A σ-algebra B is called countably generated if there is a countable subcollection C ⊂ B such that B is the smallest σ-algebra including C. In any separable metric space, the Borel σ-algebra is countably generated (taking C as the set of balls with rational radii and centers in a countable dense set). In the great majority of applications of statistics, sample spaces are separable metric spaces, in fact
