11.5 MAP Estimator

Recall that the "hit-or-miss" cost function gave the MAP estimator: it maximizes the a posteriori PDF.

Q: Given that the MMSE estimator is "the most natural" one, why would we consider the MAP estimator?

A: If x and θ are not jointly Gaussian, the form of the MMSE estimate requires integration to find the conditional mean. MAP avoids this computational problem!

Note: MAP does not require this integration.
- Trade "natural criterion" for "computational ease."
- What else do you gain? More flexibility to choose the prior PDF.

Notation and Form for MAP

Notation: $\hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta|\mathbf{x})$ maximizes the posterior PDF; "arg max" extracts the value of θ that achieves the maximum.

Equivalent form (via Bayes' rule):
$$\hat{\theta}_{MAP} = \arg\max_{\theta}\, \big[ p(\mathbf{x}|\theta)\, p(\theta) \big]$$

Proof: Use
$$p(\theta|\mathbf{x}) = \frac{p(\mathbf{x}|\theta)\, p(\theta)}{p(\mathbf{x})}$$
so that
$$\hat{\theta}_{MAP} = \arg\max_{\theta} \frac{p(\mathbf{x}|\theta)\, p(\theta)}{p(\mathbf{x})} = \arg\max_{\theta}\, \big[ p(\mathbf{x}|\theta)\, p(\theta) \big]$$
since the denominator $p(\mathbf{x})$ does not depend on θ.

Vector MAP

<Not as straightforward as the vector extension for MMSE.>

The obvious extension leads to problems. Choose $\hat{\theta}_i$ to minimize
$$\mathcal{R}(\hat{\theta}_i) = E\{ C(\theta_i - \hat{\theta}_i) \}$$
with the expectation taken over $p(\mathbf{x}, \theta_i)$. This gives
$$\hat{\theta}_i = \arg\max_{\theta_i} p(\theta_i|\mathbf{x})$$
a 1-D marginal conditioned on x — and you need to integrate to get it:
$$p(\theta_i|\mathbf{x}) = \int \cdots \int p(\boldsymbol{\theta}|\mathbf{x})\; d\theta_1 \cdots d\theta_p \quad (\text{over all } \theta_j,\ j \neq i)$$

Problem: the whole point of MAP was to avoid the integration needed for MMSE! Is there a way around this? Can we find an integration-free vector MAP?

Circular Hit-or-Miss Cost Function   (Not in Book)

First look at the p-dimensional cost function for this "troubling" version of a vector MAP: it consists of p individual applications of the 1-D hit-or-miss cost. For p = 2:
$$C(\varepsilon_1, \varepsilon_2) = \begin{cases} 0, & (\varepsilon_1, \varepsilon_2) \text{ in the } 2\delta \times 2\delta \text{ square} \\ 1, & (\varepsilon_1, \varepsilon_2) \text{ not in the square} \end{cases}$$

The corners of the square "let too much in" ⇒ use a circle!
$$C(\boldsymbol{\varepsilon}) = \begin{cases} 0, & \|\boldsymbol{\varepsilon}\| < \delta \\ 1, & \|\boldsymbol{\varepsilon}\| \geq \delta \end{cases}$$

This actually seems more natural than the "square" cost function!

MAP Estimate using Circular Hit-or-Miss   (Back to Book)

So… what vector Bayesian estimator comes from using this circular hit-or-miss cost function? One can show that it is the following "vector MAP":
$$\hat{\boldsymbol{\theta}}_{MAP} = \arg\max_{\boldsymbol{\theta}} p(\boldsymbol{\theta}|\mathbf{x})$$

Does not require integration! That is, find the maximum of the joint conditional PDF, jointly over all $\theta_i$, conditioned on x.
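The Bayes'-rule form above can be exercised numerically: maximize $p(\mathbf{x}|\theta)\,p(\theta)$ directly over a grid, with no integration. Below is a minimal sketch for a DC level in WGN with a Gaussian prior; all parameter values (`mu_A`, `var_A`, `var_w`, `N`) are assumed for illustration, and the grid-based search is compared against the known Gaussian-case closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: DC level A in WGN, with Gaussian prior A ~ N(mu_A, var_A).
mu_A, var_A = 1.0, 0.5      # prior mean and variance (assumed values)
var_w, N = 2.0, 20          # noise variance and number of samples (assumed)
A_true = rng.normal(mu_A, np.sqrt(var_A))
x = A_true + rng.normal(0.0, np.sqrt(var_w), N)

# MAP by direct grid maximization of p(x|A) p(A) -- no integration needed.
# Equivalently, maximize the log: log p(x|A) + log p(A) (constants dropped).
grid = np.linspace(-5, 5, 20001)
log_like = -0.5 * np.sum((x[:, None] - grid[None, :])**2, axis=0) / var_w
log_prior = -0.5 * (grid - mu_A)**2 / var_A
A_map = grid[np.argmax(log_like + log_prior)]

# For this jointly Gaussian case the MAP coincides with the MMSE estimate,
# which has a closed form -- a useful sanity check on the grid search.
A_closed = (N * x.mean() / var_w + mu_A / var_A) / (N / var_w + 1 / var_A)
print(A_map, A_closed)   # should agree to within the grid spacing
```

For a non-Gaussian prior, the same grid maximization works unchanged (only `log_prior` changes), which is exactly the "computational ease" being traded for here.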
How Do These Vector MAP Versions Compare?

In general: they are NOT the same!

Example, p = 2: [Figure: piecewise-constant joint posterior p(θ1, θ2 | x) taking values 1/6 and 1/3 over rectangles in the (θ1, θ2) plane, with θ1 on 0–5; the 1-D marginals p(θ1|x) (levels 1/6 and 1/3) and p(θ2|x) (levels 1/3 and 2/3) are shown alongside.]

The vector MAP using the circular hit-or-miss cost is
$$\hat{\boldsymbol{\theta}} = [2.5 \;\; 0.5]^T$$
To find the vector MAP using the element-wise maximization, maximize each marginal p(θ1|x) and p(θ2|x) separately, giving
$$\hat{\boldsymbol{\theta}} = [2.5 \;\; 1.5]^T$$

"Bayesian MLE"

Recall: as we keep getting good data, p(θ|x) becomes more concentrated as a function of θ. But since
$$\hat{\boldsymbol{\theta}}_{MAP} = \arg\max_{\boldsymbol{\theta}} p(\boldsymbol{\theta}|\mathbf{x}) = \arg\max_{\boldsymbol{\theta}}\, \big[ p(\mathbf{x}|\boldsymbol{\theta})\, p(\boldsymbol{\theta}) \big]$$
p(x|θ) must also become more concentrated as a function of θ.

[Figure: p(x|θ) narrow and peaked versus a broad, slowly varying prior p(θ).]

- Note that the prior PDF is nearly constant where p(x|θ) is non-zero.
- This becomes truer as N → ∞, as p(x|θ) gets more concentrated.

$$\arg\max_{\boldsymbol{\theta}}\, \big[ p(\mathbf{x}|\boldsymbol{\theta})\, p(\boldsymbol{\theta}) \big] \approx \arg\max_{\boldsymbol{\theta}}\, p(\mathbf{x}|\boldsymbol{\theta})$$

MAP → the "Bayesian MLE": it uses the conditional PDF p(x|θ) rather than the parameterized PDF p(x; θ).

11.6 Performance Characterization

The performance of Bayesian estimators is characterized by looking at the estimation error
$$\varepsilon = \theta - \hat{\theta}$$
where θ is random (due to the a priori PDF) and $\hat{\theta}$ is random (due to x).

Performance is characterized by the error's PDF p(ε). We'll focus on the mean and variance; if ε is Gaussian, then these tell the whole story. This will be the case for the Bayesian linear model (see Thm. 10.3).
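The "not the same" claim is easy to reproduce numerically. The sketch below uses a small hypothetical discrete posterior (not the figure's exact PDF) chosen so that the joint maximizer and the element-wise marginal maximizers disagree:

```python
import numpy as np

# Hypothetical discrete posterior p(theta1, theta2 | x) on a small grid,
# constructed (assumed values, not the book's example) so that the two
# vector MAP versions disagree.
p = np.array([[0.4, 0.0, 0.0],
              [0.0, 0.3, 0.3]])   # rows: theta1 in {0,1}; cols: theta2 in {0,1,2}

# Vector MAP from the circular hit-or-miss cost: maximize the JOINT posterior.
joint = np.unravel_index(np.argmax(p), p.shape)   # -> (0, 0)

# Element-wise MAP: maximize each 1-D marginal (note: this needed marginalizing).
t1 = np.argmax(p.sum(axis=1))                     # marginal of theta1 -> 1
t2 = np.argmax(p.sum(axis=0))                     # marginal of theta2 -> 0
print(joint, (t1, t2))  # (0, 0) vs (1, 0): not the same!
```

Worse, here the element-wise answer (1, 0) sits at a point of zero joint posterior probability, which is exactly the kind of pathology the joint (circular hit-or-miss) vector MAP avoids.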
We'll also concentrate on the MMSE estimator.

Performance of the Scalar MMSE Estimator

The estimator is
$$\hat{\theta} = E\{\theta|\mathbf{x}\} = \int \theta\, p(\theta|\mathbf{x})\, d\theta$$
a function of x. So the estimation error
$$\varepsilon = \theta - E\{\theta|\mathbf{x}\} = f(\mathbf{x}, \theta)$$
is a function of two random variables.

General result for a function of two RVs, Z = f(X, Y):
$$E\{Z\} = \iint f(x, y)\, p_{XY}(x, y)\, dx\, dy$$
$$\operatorname{var}\{Z\} = E\{(Z - E\{Z\})^2\} = \iint \big(f(x, y) - E\{Z\}\big)^2\, p_{XY}(x, y)\, dx\, dy$$

Applying the mean result gives
$$E\{\varepsilon\} = E_{\mathbf{x},\theta}\big\{\theta - E\{\theta|\mathbf{x}\}\big\} = E_{\mathbf{x}}\Big[ E_{\theta|\mathbf{x}}\big\{\theta - E\{\theta|\mathbf{x}\}\big\} \Big] = E_{\mathbf{x}}\Big[ E\{\theta|\mathbf{x}\} - E\{\theta|\mathbf{x}\} \Big] = 0$$
(see the chart on "Decomposing Joint Expectations" in "Notes on 2 RVs"; pass $E_{\theta|\mathbf{x}}$ through the terms — since $E\{\theta|\mathbf{x}\}$ does not depend on θ, $E_{\theta|\mathbf{x}}\{E\{\theta|\mathbf{x}\}\} = E\{\theta|\mathbf{x}\}$, two notations for the same thing).

$$E\{\varepsilon\} = 0$$
i.e., the mean of the estimation error (over data and parameter) is zero!

And applying the variance result gives (using E{ε} = 0):
$$\operatorname{var}\{\varepsilon\} = E\{\varepsilon^2\} = \iint (\theta - \hat{\theta})^2\, p(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta = \operatorname{Bmse}(\hat{\theta})$$

So the MMSE estimation error has:
- mean = 0
- var = Bmse

So when we minimize the Bmse we are minimizing the variance of the estimate. If ε is Gaussian, then
$$\varepsilon \sim \mathcal{N}\big(0, \operatorname{Bmse}(\hat{\theta})\big)$$

Ex. 11.6: DC Level in WGN with Gaussian Prior

We saw that
$$\operatorname{Bmse}(\hat{A}) = \frac{1}{N/\sigma^2 + 1/\sigma_A^2}$$
with
$$\hat{A} = \frac{(N/\sigma^2)\,\bar{x} + (1/\sigma_A^2)\,\mu_A}{N/\sigma^2 + 1/\sigma_A^2}$$

So $\hat{A}$ is Gaussian (it is a linear combination of the jointly Gaussian data samples; if X is Gaussian, then Y = aX + b is also Gaussian), and
$$\varepsilon \sim \mathcal{N}\left(0, \frac{1}{N/\sigma^2 + 1/\sigma_A^2}\right)$$

Note: as N gets large, this PDF collapses around 0. This estimate is "consistent in the Bayesian sense."

Bayesian consistency: for large N, $\hat{A} \approx A$ (regardless of the realization of A!).

Performance of the Vector MMSE Estimator

Vector estimation error:
$$\boldsymbol{\varepsilon} = \boldsymbol{\theta} - \hat{\boldsymbol{\theta}}$$
The mean result is obvious (each component of the error has zero mean, exactly as in the scalar case).
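The two scalar results (zero-mean error, variance equal to the Bmse) can be checked by Monte Carlo for Example 11.6. The parameter values below are assumed for illustration; note that the sample mean of N i.i.d. noise samples is itself Gaussian with variance $\sigma^2/N$, which lets us simulate $\bar{x}$ directly.

```python
import numpy as np

# Monte Carlo sketch of Ex. 11.6 (assumed parameter values): draw A from its
# Gaussian prior, generate the data's sample mean, form the MMSE estimate,
# and check that the error has mean ~ 0 and variance ~ Bmse.
rng = np.random.default_rng(1)
mu_A, var_A, var_w, N, trials = 0.0, 1.0, 1.0, 10, 200_000

A = rng.normal(mu_A, np.sqrt(var_A), trials)
# Sample mean of N noisy observations: xbar | A ~ N(A, var_w / N).
xbar = A + rng.normal(0.0, np.sqrt(var_w / N), trials)

A_hat = (N * xbar / var_w + mu_A / var_A) / (N / var_w + 1 / var_A)
eps = A - A_hat

bmse = 1.0 / (N / var_w + 1.0 / var_A)   # theoretical Bmse = 1/11 here
print(eps.mean(), eps.var(), bmse)       # mean ~ 0, variance ~ Bmse
```

Increasing `N` shrinks both the theoretical Bmse and the empirical error spread, which is the "Bayesian consistency" behavior described above.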
We must extend the variance result:
$$\mathbf{C}_{\varepsilon} = \operatorname{cov}\{\boldsymbol{\varepsilon}\} = E_{\mathbf{x},\boldsymbol{\theta}}\{\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\} \triangleq \mathbf{M}_{\hat{\boldsymbol{\theta}}}$$
Some new notation: $\mathbf{M}_{\hat{\boldsymbol{\theta}}}$ is the "Bayesian mean square error matrix."

Look some more at this (see the chart on "Decomposing Joint Expectations"):
$$\mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x},\boldsymbol{\theta}}\Big\{\big[\boldsymbol{\theta} - E\{\boldsymbol{\theta}|\mathbf{x}\}\big]\big[\boldsymbol{\theta} - E\{\boldsymbol{\theta}|\mathbf{x}\}\big]^T\Big\} = E_{\mathbf{x}}\Big[ E_{\boldsymbol{\theta}|\mathbf{x}}\Big\{\big[\boldsymbol{\theta} - E\{\boldsymbol{\theta}|\mathbf{x}\}\big]\big[\boldsymbol{\theta} - E\{\boldsymbol{\theta}|\mathbf{x}\}\big]^T\Big\} \Big] = E_{\mathbf{x}}\{\mathbf{C}_{\boldsymbol{\theta}|\mathbf{x}}\}$$

General vector results:
$$E\{\boldsymbol{\varepsilon}\} = \mathbf{0}, \qquad \mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{\mathbf{C}_{\boldsymbol{\theta}|\mathbf{x}}\}$$
In general, $\mathbf{C}_{\boldsymbol{\theta}|\mathbf{x}}$ is a function of x.

The Diagonal Elements of $\mathbf{M}_{\hat{\boldsymbol{\theta}}}$ are the Bmse's of the Estimates

Why do we call the error covariance the "Bayesian MSE matrix"? To see this:
$$\big[\mathbf{M}_{\hat{\boldsymbol{\theta}}}\big]_{ii} = E\{\varepsilon_i^2\} = \int\!\cdots\!\int \big(\theta_i - E\{\theta_i|\mathbf{x}\}\big)^2\, p(\mathbf{x}, \boldsymbol{\theta})\, d\theta_1 \cdots d\theta_p\, d\mathbf{x} = \iint \big(\theta_i - E\{\theta_i|\mathbf{x}\}\big)^2\, p(\mathbf{x}, \theta_i)\, d\theta_i\, d\mathbf{x} = \operatorname{Bmse}(\hat{\theta}_i)$$
Integrating over all the other parameters "marginalizes" the PDF.

Performance of the MMSE Estimator for the Jointly Gaussian Case

Let the data vector x and the parameter vector θ be jointly Gaussian. Nothing new to say about the mean result: $E\{\boldsymbol{\varepsilon}\} = \mathbf{0}$.

Now look at the error covariance (i.e., the Bayesian MSE matrix). Recall the general result:
$$\mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = E_{\mathbf{x}}\{\mathbf{C}_{\boldsymbol{\theta}|\mathbf{x}}\}$$
Thm. 10.2 says that for jointly Gaussian vectors, $\mathbf{C}_{\boldsymbol{\theta}|\mathbf{x}}$ does NOT depend on x, so
$$\mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\boldsymbol{\theta}|\mathbf{x}}$$
Thm. 10.2 also gives its form:
$$\mathbf{C}_{\varepsilon} = \mathbf{M}_{\hat{\boldsymbol{\theta}}} = \mathbf{C}_{\boldsymbol{\theta}|\mathbf{x}} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x}\, \mathbf{C}_{xx}^{-1}\, \mathbf{C}_{x\theta}$$

Performance of the MMSE Estimator for the Bayesian Linear Model

Recall the Bayesian linear model:
$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}, \qquad \boldsymbol{\theta} \sim \mathcal{N}(\boldsymbol{\mu}_{\theta}, \mathbf{C}_{\theta}), \quad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_w)$$
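The jointly Gaussian result above can be verified numerically. The sketch below builds a small Bayesian linear model (all dimensions and covariance values are assumed for illustration), computes $\mathbf{M}_{\hat{\boldsymbol{\theta}}}$ from the Thm. 10.2 form $\mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta}$, and checks it against the Monte Carlo error covariance $E\{\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T\}$:

```python
import numpy as np

# Assumed small Bayesian linear model x = H theta + w (zero prior mean for
# simplicity, so the MMSE estimate reduces to C_tx C_xx^{-1} x).
rng = np.random.default_rng(2)
H = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # 3 observations, 2 parameters
C_t = np.diag([1.0, 0.5])                           # prior covariance C_theta
C_w = 0.2 * np.eye(3)                               # noise covariance

# Joint-Gaussian covariances implied by the model, then Thm. 10.2's form.
C_xx = H @ C_t @ H.T + C_w
C_tx = C_t @ H.T
M = C_t - C_tx @ np.linalg.solve(C_xx, C_tx.T)      # = C_theta|x, x-independent

# Monte Carlo: generate (theta, x) pairs, form MMSE estimates, and average
# eps eps^T to get the empirical Bayesian MSE matrix.
trials = 100_000
theta = rng.multivariate_normal(np.zeros(2), C_t, trials)
x = theta @ H.T + rng.multivariate_normal(np.zeros(3), C_w, trials)
theta_hat = x @ np.linalg.solve(C_xx, C_tx.T)       # rows: C_tx C_xx^{-1} x_i
eps = theta - theta_hat
M_mc = (eps.T @ eps) / trials
print(np.round(M, 4), np.round(M_mc, 4))            # should agree closely
```

The diagonal of `M` gives the per-parameter Bmse's, matching the "diagonal elements" discussion above; using `np.linalg.solve` instead of an explicit matrix inverse is the standard numerically safer choice.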