New version page

# MIT 18 466 - Sequential decision theory

Pages: 2
Documents in this Course
Unformatted text preview:

February 14, 2003 1.6 Sequential decision theory. As previously, let the sample space be a measurable space (X,B)and P = {Pθ ,θ ∈ Θ} a family of laws on it. Let X1,X2,..., be independent and identically distributed with law Pθ .Let A be the space of possible (speciﬁc) actions, with a σ-algebra E.We have a σ-algebra T on Θ and a loss function L which is a measurable function L from Θ × A to [0, ∞]. A prior π may or may not be given on Θ. A sequential decision rule will consist of two functions N and δ as follows. Let X∞ be the set of all sequences {xn}n≥1 with xn ∈ X for all n.For n =1, 2,...,let Bn be the smallest σ-algebra of subsets of X∞ for which X1,... ,Xn are measurable. Let B0 be the trivial σ-algebra {φ, X∞}.Then N is a function from X∞ into N := {0, 1, 2,...}such that for each k =0, 1, 2,... , {N ≤ k}∈Bk . Such a function is called a stopping time or stopping rule.The terminal decision rule, δ, is a sequence of functions {δ0≤n<∞n}where δ0 ∈ A and for each n ≥ 1,δn is a measurable function from Xn into A. The action actually taken will be δN := δN (X1,... ,XN ). Let φ := {N (·),δ}. If c ≥ 0 is the cost of each observation, the total loss (including costs of observations) in a given case is L(θ, δN )+ Nc.The risk is then r(θ, φ):= c EN + EL(θ, δN )·where the expectations are with respect to P∞ on X∞.Note that if c =0, and N isθ required to take ﬁnite values as in the above deﬁnition, then in general, optimal rules do not exist. Example. This is actually not a statistical decision problem as just formulated but it will illustrate some possible diﬃculties. Suppose that a gambler can play a sequence of games as follows. In the nth game, the gambler wagers \$1 and wins \$100·2n+1 with probability 0.01/2n . Thus the expected gain in each game is \$2 - \$1 = \$1. So the “Bayes” or optimal strategy would seem to be to continue playing indeﬁnitely. But the probability that the gambler ever wins is ≤�n≥1 0.01/2n =0.01. If the gambler never wins, which occurs with probability ≥ 0.99, then the gambler wagers and loses inﬁnitely many dollars. The expected or average gain from games won is also inﬁnite, if the gambler continues to play, so that the overall average gain is ∞−∞ (undeﬁned). There is actually no Bayes (optimal) strategy. Let fn be the net winnings after n plays. Then Efn = n → +∞ while fn →−∞ a.s. by the Borel-Cantelli lemma. We can also consider sequential randomized decision rules deﬁned as follows. For n =1, 2,...,let (An, En) be a measurable space, where An is the space of speciﬁc actions whichcan be takenafter n observations. Often, all (An , En) will be equal to one space (A,E). Assume that for each n,0 ∈ An, where the action “0” will mean taking another observation Xn+1, while all other actions in An will imply taking no more observations. Each φn is a measurable function from (Xn , Bn)intothe space DE of all probability laws on (A,E). So, given X1,... ,Xn, if no decision to stop has been made earlier, we then take another observation with probability φn(X1,... ,Xn)({0}) and otherwise stop and take an action chosen from An \{0} with distribution φn(X1,... ,Xn)/(1 − φn(X1,... ,Xn)({0})). 1So for a sequential randomized test of P vs. Q,we can take An = A = {−1, 0, 1} for all n, where (as in Sec. 1.5) −1 means choosing P and +1 means choosing Q. PROBLEMS 1. In the example at the end of Sec. 1.5 and where RQ/P has only values t or 1/t (not 1) with t =2, let φ be a randomized test that does SPRT(1/8, 2) or SPRT(1/2, 8) with probability 1/2 each. Compare the performance of this test to SPRT(1/4, 4) in terms of error probabilities and average sample numbers. 2. For a sequential test of P vs. Q as in Problem 1, suppose that the nth observation costs 1/3n, while LPQ = LQP =3 and π(P )= π(Q)=1/2. A decision rule must reach a decision after a ﬁnite number N of observations. Is there an optimal (Bayes) sequential test in this case? Why, or why not?

View Full Document Unlocking...