Ch. 11 General Bayesian Estimators

Introduction

In Chapter 10 we:
• introduced the idea of "a priori" information on θ ⇒ use "prior" pdf: p(θ)
• defined a new optimality criterion ⇒ Bayesian MSE
• showed the Bmse is minimized by E{θ|x}, called:
  • "mean of posterior pdf"
  • "conditional mean"

In Chapter 11 we will:
• define a more general optimality criterion
  ⇒ leads to several different Bayesian approaches
  ⇒ includes Bmse as a special case

Why? Provides flexibility in balancing:
• model,
• performance, and
• computations

11.3 Risk Functions

Previously we used the Bmse as the Bayesian measure to minimize:

  Bmse(θ̂) = E{(θ − θ̂)²}   (expectation w.r.t. p(x, θ))

So the Bmse is… the expected value of the square of the error. Let's write this in a way that will allow us to generalize it. Define the error ε ≜ θ − θ̂ and a quadratic cost function:

  C(ε) = ε²

Then we have that

  Bmse = E{C(ε)}

Why limit the cost function to just quadratic?

General Bayesian Criteria

1. Define a cost function: C(ε)
2. Define the Bayes Risk: R = E{C(ε)} w.r.t. p(x, θ)

  R = E{C(θ − θ̂)}   ← depends on the choice of estimator

3. Minimize the Bayes Risk w.r.t. the estimate θ̂

The choice of the cost function can be tailored to:
• express the importance of avoiding certain kinds of errors
• yield desirable forms for the estimates (e.g., easily computed)
• etc.

Three Common Cost Functions

1. Quadratic: C(ε) = ε²

2. Absolute: C(ε) = |ε|

3. Hit-or-Miss:

  C(ε) = { 0, |ε| < δ
         { 1, |ε| ≥ δ      (δ > 0 and small)

General Bayesian Estimators

Derive how to choose the estimator to minimize the chosen risk:

  R = E{C(θ − θ̂)} = ∫∫ C(θ − θ̂) p(x, θ) dθ dx
    = ∫ [ ∫ C(θ − θ̂) p(θ|x) dθ ] p(x) dx

  g(θ̂) ≜ ∫ C(θ − θ̂) p(θ|x) dθ   ← must minimize this for each x value

So… for a given desired cost function… you have to find the form of the optimal estimator.

The Optimal Estimates for the Typical Costs

1. Quadratic: R = E{(θ − θ̂)²} = Bmse(θ̂)

  θ̂ = E{θ|x} = mean of p(θ|x)   (as we saw in Ch. 10)

2. Absolute: R = E{|θ − θ̂|}

  θ̂ = median of p(θ|x)

3.
Hit-or-Miss:

  θ̂ = mode of p(θ|x)   → "Maximum A Posteriori" or MAP

[Figure: a skewed p(θ|x) with its mode, median, and mean marked.]
If p(θ|x) is unimodal & symmetric: mean = median = mode.

Derivation for Absolute Cost Function

Writing out the function to be minimized gives:

  g(θ̂) = ∫_{−∞}^{∞} |θ − θ̂| p(θ|x) dθ
        = ∫_{−∞}^{θ̂} (θ̂ − θ) p(θ|x) dθ   (region where |θ − θ̂| = θ̂ − θ)
        + ∫_{θ̂}^{∞} (θ − θ̂) p(θ|x) dθ   (region where |θ − θ̂| = θ − θ̂)

Now set ∂g(θ̂)/∂θ̂ = 0 and use Leibniz's rule for ∂/∂u ∫_{φ₁(u)}^{φ₂(u)} h(u, v) dv:

  ∫_{−∞}^{θ̂} p(θ|x) dθ − ∫_{θ̂}^{∞} p(θ|x) dθ = 0

which is satisfied if… (area to the left) = (area to the right)
⇒ Median of the conditional PDF

Derivation for Hit-or-Miss Cost Function

Writing out the function to be minimized gives:

  g(θ̂) = ∫ C(θ − θ̂) p(θ|x) dθ
        = ∫_{−∞}^{θ̂−δ} 1·p(θ|x) dθ + ∫_{θ̂+δ}^{∞} 1·p(θ|x) dθ
        = 1 − ∫_{θ̂−δ}^{θ̂+δ} p(θ|x) dθ

(i.e., almost all the probability: 1 minus what is left out). Minimizing g(θ̂) means maximizing the integral ∫_{θ̂−δ}^{θ̂+δ} p(θ|x) dθ. So… center the integral around the peak of the integrand
⇒ Mode of the conditional PDF

11.4 MMSE Estimators

We've already seen the solution for the scalar parameter case:

  θ̂ = E{θ|x} = mean of p(θ|x)

Here we'll look at:
• extension to the vector parameter case
• analysis of useful properties

Vector MMSE Estimator

Vector parameter: θ = [θ₁ θ₂ … θ_p]ᵀ
Vector estimate: θ̂ = [θ̂₁ θ̂₂ … θ̂_p]ᵀ

The criterion is… minimize the MSE for each component. θ̂ is chosen to minimize each of the MSE elements:

  E{(θ̂ᵢ − θᵢ)²} = ∫∫ (θ̂ᵢ − θᵢ)² p(x, θᵢ) dx dθᵢ

where p(x, θᵢ) is p(x, θ) integrated over all the other θⱼ's:

  p(x, θᵢ) = ∫⋯∫ p(x, θ₁, …, θ_p) dθ₁ ⋯ dθ_{i−1} dθ_{i+1} ⋯ dθ_p

From the scalar case we know the solution is:

  θ̂ᵢ = E{θᵢ|x}

So… putting all of these into a vector gives:

  θ̂ = [θ̂₁ θ̂₂ … θ̂_p]ᵀ = [E{θ₁|x} E{θ₂|x} … E{θ_p|x}]ᵀ = E{θ|x}

Vector MMSE Estimate = Vector Conditional Mean

Similarly…

  Bmse(θ̂ᵢ) = ∫ [C_{θ|x}]ᵢᵢ p(x) dx,   i = 1, …, p

where

  C_{θ|x} = E{ [θ − E{θ|x}] [θ − E{θ|x}]ᵀ | x }

Ex. 11.1 Bayesian Fourier Analysis

Signal model: x[n] = a cos(2πf₀n) + b sin(2πf₀n) + w[n], where w[n] is AWGN with zero mean and variance σ².

  θ = [a b]ᵀ ~ N(0, σ_θ² I);  θ and w[n] are independent for each n

This is a common propagation model called Rayleigh fading.

Write in matrix form: x = Hθ + w ⇒ Bayesian Linear Model, where the columns of H are the cosine and sine vectors.

Results from Ch. 10 show that

  θ̂ = E{θ|x} = σ_θ² Hᵀ (σ_θ² H Hᵀ + σ² I)⁻¹ x = (Hᵀ H + (σ²/σ_θ²) I)⁻¹ Hᵀ x
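As a numerical sanity check of this example, here is a minimal sketch (not from the text; N, f₀, and the two variances are assumed values chosen so that the columns of H are exactly orthogonal) comparing the general Bayesian-linear-model MMSE formula against the shrunken-Fourier-coefficient form derived next:

```python
import numpy as np

rng = np.random.default_rng(1)
N, f0 = 50, 0.1                      # f0 = 5/N, so the columns of H are orthogonal
sigma2, sigma_theta2 = 1.0, 4.0      # assumed noise and prior variances

n = np.arange(N)
H = np.column_stack([np.cos(2 * np.pi * f0 * n), np.sin(2 * np.pi * f0 * n)])

theta = rng.normal(0.0, np.sqrt(sigma_theta2), size=2)    # true [a, b]
x = H @ theta + rng.normal(0.0, np.sqrt(sigma2), size=N)  # observed data

# General Bayesian-linear-model MMSE estimate (the Ch. 10 form):
theta_mmse = sigma_theta2 * H.T @ np.linalg.solve(
    sigma_theta2 * H @ H.T + sigma2 * np.eye(N), x)

# Shrunken Fourier coefficients (the orthogonal-column form, beta < 1):
beta = 1.0 / (1.0 + 2.0 * sigma2 / (N * sigma_theta2))
theta_fourier = beta * (2.0 / N) * (H.T @ x)

print(np.allclose(theta_mmse, theta_fourier))   # True: the two forms agree
```

Because f₀ is an integer multiple of 1/N, Hᵀ H = (N/2) I holds exactly and the two expressions coincide; β < 1 shrinks the classical Fourier coefficients toward the zero prior mean.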
For f₀ chosen such that H has orthogonal columns (Hᵀ H = (N/2) I), the MMSE estimate becomes

  â = β · (2/N) Σ_{n=0}^{N−1} x[n] cos(2πf₀n)
  b̂ = β · (2/N) Σ_{n=0}^{N−1} x[n] sin(2πf₀n)

with

  β = 1 / (1 + 2σ² / (N σ_θ²))

The terms multiplying β are the Fourier coefficients. Recall: this has the same form as the classical result, except that there β = 1.

Note: β ≈ 1 if σ_θ² ≫ 2σ²/N ⇒ if the prior knowledge is poor, this reduces to the classical result.

Impact of Poor Prior Knowledge

Conclusion: for poor prior knowledge in the Bayesian Linear Model,

  MMSE Est. → MVU Est.

Can see this holds in general. Recall that

  θ̂ = E{θ|x} = μ_θ + (C_θ⁻¹ + Hᵀ C_w⁻¹ H)⁻¹ Hᵀ C_w⁻¹ (x − H μ_θ)

For no prior information, C_θ⁻¹ → 0 and μ_θ → 0, so

  θ̂ → (Hᵀ C_w⁻¹ H)⁻¹ Hᵀ C_w⁻¹ x

which is the MVUE for the General Linear Model.

Useful Properties of MMSE Est.

1. Commutes over affine mappings: if α = Aθ + b, then α̂ = A θ̂ + b.

2. Additive property for independent data sets. Assume θ, x₁, x₂ are jointly Gaussian with x₁ and x₂ independent. Then

  θ̂ = E{θ}                                  (a priori estimate)
     + C_{θx₁} C_{x₁x₁}⁻¹ (x₁ − E{x₁})      (update due to x₁)
     + C_{θx₂} C_{x₂x₂}⁻¹ (x₂ − E{x₂})      (update due to x₂)

Proof: let x = [x₁ᵀ x₂ᵀ]ᵀ. The jointly Gaussian assumption gives

  θ̂ = E{θ} + C_{θx} C_{xx}⁻¹ (x − E{x})
     = E{θ} + [C_{θx₁} C_{θx₂}] [C_{x₁x₁} 0; 0 C_{x₂x₂}]⁻¹ [x₁ − E{x₁}; x₂ − E{x₂}]

Independence ⇒ C_{xx} is block diagonal. Simplify to get the result. (Will be used for the Kalman filter.)

3. The jointly Gaussian case leads to a linear (affine) estimator: θ̂ = Px + m.
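The additive-update property (2) can also be checked numerically. A minimal sketch, assuming a hypothetical 3×3 joint covariance for scalar (θ, x₁, x₂) in which the x₁–x₂ cross-covariance is zero (all numbers are made up for illustration):

```python
import numpy as np

# Assumed joint covariance of (theta, x1, x2); the zeros make x1 and x2 independent
C = np.array([[2.0, 0.8, 0.5],
              [0.8, 1.5, 0.0],
              [0.5, 0.0, 1.2]])
mu = np.array([1.0, 0.5, -0.3])    # prior means E{theta}, E{x1}, E{x2}
x1, x2 = 2.0, 0.4                  # assumed observed values

# Full conditional mean: E{theta|x} = E{theta} + C_tx C_xx^{-1} (x - E{x})
C_tx = C[0, 1:]
C_xx = C[1:, 1:]
full = mu[0] + C_tx @ np.linalg.solve(C_xx, np.array([x1, x2]) - mu[1:])

# Additive form: a priori estimate + a separate update from each data set
additive = (mu[0]
            + C[0, 1] / C[1, 1] * (x1 - mu[1])
            + C[0, 2] / C[2, 2] * (x2 - mu[2]))

print(np.isclose(full, additive))  # True: updates decouple when C_xx is diagonal
```

The agreement is exact here because the block-diagonal (here fully diagonal) C_xx makes the joint update split into the two per-data-set corrections, which is exactly the simplification used in the proof.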
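As a closing illustration of Sec. 11.3, the link between the three cost functions and their optimal estimators (mean, median, mode of the posterior) can be seen by brute force: sample a posterior, evaluate each empirical Bayes risk over a grid of candidate estimates, and locate the minimizers. The skewed Gamma posterior below is an assumed example, not from the text, chosen so that mean, median, and mode all differ:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical skewed posterior: Gamma(shape=2, scale=1)
# mean = 2.0, median ~ 1.68, mode = 1.0 (all different, since it is skewed)
theta = rng.gamma(shape=2.0, scale=1.0, size=100_000)

candidates = np.linspace(0.0, 5.0, 501)   # grid of candidate estimates
delta = 0.05                              # hit-or-miss tolerance

sq_risk = [np.mean((theta - c) ** 2) for c in candidates]            # quadratic
abs_risk = [np.mean(np.abs(theta - c)) for c in candidates]          # absolute
hit_risk = [np.mean(np.abs(theta - c) >= delta) for c in candidates] # hit-or-miss

print("quadratic risk minimized near  ", candidates[np.argmin(sq_risk)])   # ~ mean
print("absolute risk minimized near   ", candidates[np.argmin(abs_risk)])  # ~ median
print("hit-or-miss risk minimized near", candidates[np.argmin(hit_risk)])  # ~ mode
```

The quadratic-risk minimizer lands near the posterior mean (2.0), the absolute-risk minimizer near the median (≈1.68), and the hit-or-miss minimizer near the mode (1.0), matching the table of optimal estimates above (the hit-or-miss result is the noisiest, since only a width-2δ window of samples contributes at each candidate).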