ECE 461 Spring 2002
Handout # 2, January 16, 2002

Statistical Decision Theory (Basics)

© V. V. Veeravalli, 2002

References: Poor, Chapter 1; Weber, Ferguson, Chapter 1.

The basic problem in statistical decision theory is to make a right (optimal) choice from a set of alternatives in a noisy environment. As we discussed in class, there are five basic ingredients in a typical decision theory problem.

• S: The set of states (of nature). This is the set of unknowns which we would like to determine or estimate. We denote a typical state by s, i.e., s ∈ S.

• D: The set of decisions or actions. This is the set of decisions about the state. Elements in D would typically correspond to elements in S. In some applications, such as the communications example with erasure, the set D could have larger cardinality than the set S.

• C(d, s): The cost function between decisions and states, C : D × S → ℝ. In order to be able to talk about optimizing the decision, we need to quantify the cost incurred by each decision. The cost function C serves this purpose.

  Example of a cost function (binary communications): S = {0, 1}, D = {0, 1}, and

      C(d, s) = 0 if d = s,  1 if d ≠ s.

• Γ: The set of observations. In most applications, the choice of d is not made blindly but depends on some random observation Y taking values in Γ. As will be the convention in the rest of the course, we denote random variables by uppercase letters and their corresponding realizations by lowercase letters. In particular, a realization of Y is denoted by y. Note that y ∈ Γ.

• ∆: The set of decision rules. Since the decisions are based on the observations, we need mappings from the observation set to the decision set. These are the decision rules, i.e., δ ∈ ∆, δ : Γ → D.

Probabilistic Structure for Γ

We associate with Γ a set of subsets of Γ to which we assign probabilities. Typically such a set is a sigma-algebra. We denote this set by G. The pair (Γ, G) is called an observation space.

In our applications, we will almost exclusively have Γ = ℝⁿ, or Γ = {γ_1, γ_2, ...}, a countable set. In the case that Γ = ℝⁿ, we take G to be the smallest sigma-algebra containing all the n-dimensional rectangles in ℝⁿ, i.e., the Borel sigma-algebra Bⁿ. In the case when Γ = {γ_1, γ_2, ...}, we take G to be the power set of Γ, i.e., 2^Γ.

For Γ = ℝⁿ, probabilities can be assigned by the use of an n-dimensional probability density function (pdf). For Γ = {γ_1, γ_2, ...}, probabilities can be assigned in terms of a probability mass function (pmf). As we discussed in class, we will use the term density for both pdfs and pmfs. We denote this density function by p : Γ → [0, ∞), and use a joint notation for the probability measure. For A ∈ G,

    P(A) = ∫_A p(y) µ(dy) = { ∫_A p(y) dy            for Γ = ℝⁿ
                            { Σ_{γ_i ∈ A} p(γ_i)     for Γ = {γ_1, γ_2, ...}

Let g be a function on Γ. Then the expected value of the random variable g(Y) is given by

    E{g(Y)} = ∫_Γ p(y) g(y) µ(dy) = { ∫_Γ p(y) g(y) dy       for Γ = ℝⁿ
                                    { Σ_i p(γ_i) g(γ_i)      for Γ = {γ_1, γ_2, ...}

Conditional distributions

In order to make a decision about the state based on the observation Y, we need to know how Y depends on s statistically. Typically, we assume that the conditional pdf (pmf) of Y conditioned on the state being s (which we denote by p_{Y|s}(y|s)) is available for each s ∈ S.

Optimality

The cost associated with a decision rule δ ∈ ∆ is a random quantity (because Y is random) given by C(δ(Y), s). Therefore, to order decision rules according to their "merit," we use the quantity

    R_s(δ) = E_s[C(δ(Y), s)] = ∫_Γ C(δ(y), s) p_{Y|s}(y|s) µ(dy),

which we call the conditional risk associated with δ when the state is s.

The conditional risk function can be used to obtain a (partial) ordering of the δ's in ∆, in the following sense.

Definition: A decision rule δ is better than a decision rule δ′ if

    R_s(δ) ≤ R_s(δ′) for all s ∈ S, and
    R_s(δ) < R_s(δ′) for at least one s ∈ S.

Sometimes it may be possible to find a decision rule δ* ∈ ∆ which is better than any other δ ∈ ∆. In this case, the statistical decision problem is solved.
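The conditional risk can be made concrete with a small numeric sketch of the binary communications example. The channel model (a binary symmetric channel) and its crossover probability 0.1 are assumptions added for illustration, not part of the handout:

```python
# Sketch: conditional risk R_s(delta) for the binary communications example,
# under the 0-1 cost C(d, s). The channel model below is an assumption:
# a binary symmetric channel with crossover probability eps = 0.1.

def cost(d, s):
    """0-1 cost: C(d, s) = 0 if d == s, else 1."""
    return 0 if d == s else 1

eps = 0.1  # assumed crossover probability P(Y != s | s)

def p_y_given_s(y, s):
    """pmf of Y given the state s for the assumed channel."""
    return 1 - eps if y == s else eps

def delta(y):
    """A simple decision rule: decide whatever was observed."""
    return y

def conditional_risk(rule, s):
    """R_s(delta) = sum over y of C(delta(y), s) * p(y|s)."""
    return sum(cost(rule(y), s) * p_y_given_s(y, s) for y in (0, 1))

print(conditional_risk(delta, 0))  # 0.1
print(conditional_risk(delta, 1))  # 0.1
```

For this rule the conditional risk equals the channel's error probability for each state, as expected under the 0-1 cost.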
Unfortunately, this usually happens only for trivial cases, as we saw in class. There are two main approaches to finding optimal decision rules.

Bayesian Approach

Here we assume that we are given an a priori probability distribution on the set of states S. Here again, we will assume that S is either ℝⁿ or a countable set. The state is then denoted by a random variable S with pdf (pmf) p_S(s). We will use the "µ" notation here to cover both cases for S.

Now we introduce the average risk or Bayes risk associated with a decision rule δ. We denote this average risk by r(δ), which is given by

    r(δ) = E[R_S(δ)] = ∫_S R_s(δ) p_S(s) µ(ds).

We can then obtain an ordering on the δ's by using the Bayes risk. We choose the decision rule δ_B which has minimum Bayes risk, i.e.,

    δ_B = arg min_{δ ∈ ∆} r(δ).

The decision rule δ_B is called the Bayes rule.

Minimax Approach

What if we are not given a prior distribution on the set S? We could assume a prior distribution (for example, a uniform distribution) and use the Bayesian approach. This might be possible if the set of states is finite, as in the binary communications example. On the other hand, one may want to guarantee a certain level of performance for all choices of the state. In this case, we use a minimax approach. The goal of the minimax approach is to find the decision rule δ_m which has the minimum value of max_s R_s(δ); that is, δ_m has the best worst-case cost:

    max_s R_s(δ_m) ≤ max_s R_s(δ) for any δ ∈ ∆.

The decision rule δ_m is called the minimax rule.

In addition to the Bayes and minimax approaches, there are other criteria and techniques that are specific to special classes of decision-making problems.
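The two approaches can be contrasted with a brute-force sketch that enumerates all four deterministic rules δ : {0, 1} → {0, 1}. The asymmetric channel probabilities and the skewed prior are invented for illustration; they are chosen so that the Bayes and minimax rules differ:

```python
# Sketch: Bayes vs. minimax rules over all deterministic rules delta: {0,1} -> {0,1},
# under the 0-1 cost. The channel probabilities and prior are assumed numbers,
# not from the handout.
from itertools import product

eps = {0: 0.1, 1: 0.4}    # assumed (asymmetric) crossover probabilities P(Y != s | s)
prior = {0: 0.9, 1: 0.1}  # assumed prior p_S(s)

def p_y_given_s(y, s):
    return eps[s] if y != s else 1 - eps[s]

def cond_risk(rule, s):
    """R_s(delta) under 0-1 cost: sum over y of 1{rule[y] != s} * p(y|s)."""
    return sum((rule[y] != s) * p_y_given_s(y, s) for y in (0, 1))

# All four deterministic rules, represented as dicts y -> d.
rules = [{0: d0, 1: d1} for d0, d1 in product((0, 1), repeat=2)]

# Bayes rule: minimize the average risk r(delta) = sum_s prior(s) R_s(delta).
bayes_rule = min(rules, key=lambda r: sum(prior[s] * cond_risk(r, s) for s in (0, 1)))

# Minimax rule: minimize the worst-case risk max_s R_s(delta).
minimax_rule = min(rules, key=lambda r: max(cond_risk(r, s) for s in (0, 1)))

print(bayes_rule)    # {0: 0, 1: 0} -- the skewed prior makes "always decide 0" best
print(minimax_rule)  # {0: 0, 1: 1} -- the identity rule has the best worst case
```

With this prior, the Bayes rule ignores the observation entirely, while the minimax rule does not; this illustrates how the two criteria can disagree.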
For example, in binary hypothesis testing, a third approach called the Neyman-Pearson approach is often used in practice.

Randomized Decision Rules

Even though this might seem counter-intuitive, it is sometimes possible to get a better decision rule by randomly choosing between a set of deterministic decision rules.

Definition: A randomized decision rule δ̃ is described by

    δ̃(y) = δ_i(y) with probability p_i,  i = 1, ..., k,

for some k and some p_i's such that Σ_i p_i = 1.

The set ∆̃ of randomized decision rules obviously contains the set ∆, and thus optimizing over ∆̃ will necessarily result in at least as good a decision rule as that obtained by optimizing over ∆. However, as we showed in class, randomization can never yield a Bayes risk smaller than that of the best deterministic rule, since r(δ̃) = Σ_i p_i r(δ_i) ≥ min_i r(δ_i).
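The two sides of the randomization claim can be checked numerically. The sketch below uses an assumed asymmetric binary channel and prior (invented numbers, not from the handout): the Bayes risk of a mixture is a convex combination of the component Bayes risks and so never beats the best component, while the worst-case risk of a suitably weighted mixture can beat every deterministic rule:

```python
# Sketch: randomization never lowers the Bayes risk, but it CAN lower the
# worst-case (minimax) risk. Channel probabilities and prior are assumed
# illustrative numbers under the 0-1 cost.

eps = {0: 0.1, 1: 0.4}    # assumed P(Y != s | s)
prior = {0: 0.9, 1: 0.1}  # assumed prior p_S(s)

def p_y_given_s(y, s):
    return eps[s] if y != s else 1 - eps[s]

def cond_risk(rule, s):
    """R_s(delta) under 0-1 cost."""
    return sum((rule[y] != s) * p_y_given_s(y, s) for y in (0, 1))

def bayes_risk(rule):
    return sum(prior[s] * cond_risk(rule, s) for s in (0, 1))

d_id = {0: 0, 1: 1}   # identity rule
d_one = {0: 1, 1: 1}  # always decide 1

# Randomized rule: choose d_id with probability q, d_one with probability 1 - q.
q = 10 / 13  # weight chosen to equalize the two conditional risks of the mixture

# Bayes risk of the mixture is the convex combination of component Bayes risks:
r_mixed = q * bayes_risk(d_id) + (1 - q) * bayes_risk(d_one)
assert r_mixed >= min(bayes_risk(d_id), bayes_risk(d_one))  # never better

# Worst-case risk of the mixture beats the best deterministic worst case
# (which is 0.4, achieved by the identity rule in this example):
worst_mixed = max(q * cond_risk(d_id, s) + (1 - q) * cond_risk(d_one, s)
                  for s in (0, 1))
print(worst_mixed)  # 4/13, about 0.308, which is below 0.4
```

This mirrors the standard picture: randomization buys nothing in the Bayes setting, but can strictly improve minimax performance.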

