UCSD ECE 271A - Bayesian Parameter Estimation

Nuno Vasconcelos, UCSD

Bayesian parameter estimation

The main difference with respect to ML is that, in the Bayesian case, $\theta$ is a random variable. Basic concepts:
- training set $D = \{x_1, \ldots, x_n\}$ of examples drawn independently
- probability density for the observations given the parameter, $P_{X|\Theta}(x|\theta)$
- prior distribution for parameter configurations, $P_\Theta(\theta)$, which encodes prior beliefs about them
- goal: to compute the posterior distribution $P_{\Theta|T}(\theta|D)$

Bayes vs ML

There are a number of significant differences between Bayesian and ML estimates.

(D1) ML produces a number, the best estimate. To measure its goodness we need to measure bias and variance, which can only be done with repeated experiments. Bayes produces a complete characterization of the parameter from the single dataset: in addition to the most probable estimate, we obtain a characterization of the uncertainty (lower or higher).

(D2) Optimal estimate. Under ML there is one best estimate. Under Bayes there is no best estimate, only a random variable that takes different values with different probabilities; technically speaking, it makes no sense to talk about "the best estimate".

(D3) Predictions. Remember that we do not really care about the parameters themselves; they are needed only in the sense that they allow us to build models that can be used to make predictions, e.g. the BDR. Unlike ML, Bayes uses ALL the information in the training set to make predictions.

Let us consider the BDR under the 0-1 loss and an independent sample $D = \{x_1, \ldots, x_n\}$.

ML-BDR: pick $i$ if
$$i^*(x) = \arg\max_i P_{X|Y}(x|i;\theta_i^*)\,P_Y(i), \qquad \theta_i^* = \arg\max_\theta P_{X|Y}(D_i|i;\theta).$$
Two steps: (i) find $\theta^*$; (ii) plug it into the BDR. All information not captured by $\theta^*$ is lost, i.e. not used at decision time.

Bayesian BDR

In summary: pick $i$ if
$$i^*(x) = \arg\max_i P_{X|Y,T}(x|i,D_i)\,P_Y(i), \qquad P_{X|Y,T}(x|i,D_i) = \int P_{X|Y,\Theta}(x|i,\theta)\,P_{\Theta|Y,T}(\theta|i,D_i)\,d\theta.$$
Note: as before, the second equation is repeated for each class. Hence we can drop the dependence on the class and consider the more general problem of estimating
$$P_{X|T}(x|D) = \int P_{X|\Theta}(x|\theta)\,P_{\Theta|T}(\theta|D)\,d\theta.$$

The predictive distribution

The distribution
$$P_{X|T}(x|D) = \int P_{X|\Theta}(x|\theta)\,P_{\Theta|T}(\theta|D)\,d\theta$$
is known as the predictive distribution. The name follows from the fact that it allows us to predict the value of $x$ given ALL the information available in the training set. Note that it can also be written as
$$P_{X|T}(x|D) = E_{\Theta|T}\left[P_{X|\Theta}(x|\Theta)\mid D\right].$$
Since each parameter value defines a model, this is an expectation over all possible models, where each model is weighted by its posterior probability given the training data.

For example, suppose that $P_{X|\Theta}(x|\theta) = \mathcal{N}(\theta, 1)$ and the posterior $P_{\Theta|T}(\theta|D)$ puts weight on a few candidate means. [Figure: several Gaussians, each scaled by the posterior weight of its mean, and their average.] The predictive distribution $P_{X|T}(x|D) = \int P_{X|\Theta}(x|\theta)\,P_{\Theta|T}(\theta|D)\,d\theta$ is an average of all these Gaussians.

Bayes vs ML, again: ML picks one model, Bayes averages all models. Are Bayesian predictions very different from those of ML? They can be, unless the posterior $P_{\Theta|T}(\theta|D)$ is narrow: when it is sharply peaked at its maximum the two are close; when it is broad, they can be very different.

MAP approximation

This sounds good, so why use ML at all? The main problem with Bayes is that the integral
$$P_{X|T}(x|D) = \int P_{X|\Theta}(x|\theta)\,P_{\Theta|T}(\theta|D)\,d\theta$$
can be quite nasty, and in practice one is frequently forced to use approximations. One possibility is to do something similar to ML, i.e. pick only one model. This can be made to account for the prior by picking the model with the largest posterior probability given the training data,
$$\theta_{MAP} = \arg\max_\theta P_{\Theta|T}(\theta|D).$$
This can usually be computed, since
$$\theta_{MAP} = \arg\max_\theta P_{\Theta|T}(\theta|D) = \arg\max_\theta P_{T|\Theta}(D|\theta)\,P_\Theta(\theta),$$
and corresponds to approximating the posterior by a delta function centered at its maximum, $P_{\Theta|T}(\theta|D) \approx \delta(\theta - \theta_{MAP})$. A numerical comparison of the full predictive distribution and the MAP plug-in is sketched below.
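The following is a minimal numerical sketch, not from the slides: it assumes a scalar Gaussian model with known unit variance and a $\mathcal{N}(0,1)$ prior, evaluates the posterior on a grid of candidate means, and compares the model-averaged predictive with the MAP plug-in. All numbers and names (gauss, D, theta) are illustrative.

    import numpy as np

    def gauss(x, mu, var):
        # Gaussian density N(x; mu, var)
        return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

    np.random.seed(0)
    D = np.random.normal(0.8, 1.0, size=5)   # small training set, true mean 0.8
    theta = np.linspace(-4.0, 4.0, 2001)     # grid of candidate means

    # posterior on the grid: P(theta|D) proportional to P(D|theta) P(theta)
    log_lik = sum(np.log(gauss(x, theta, 1.0)) for x in D)
    post = np.exp(log_lik) * gauss(theta, 0.0, 1.0)   # N(0, 1) prior
    post /= np.trapz(post, theta)            # normalize numerically

    # predictive distribution: average of all models, weighted by the posterior
    x = np.linspace(-5.0, 5.0, 1001)
    pred = np.trapz(gauss(x[:, None], theta[None, :], 1.0) * post, theta, axis=1)

    # MAP approximation: plug the single most probable theta into the model
    theta_map = theta[np.argmax(post)]
    pred_map = gauss(x, theta_map, 1.0)

    # the averaged predictive is wider: it keeps the uncertainty about theta
    print(theta_map, pred.max(), pred_map.max())

With only a handful of samples the averaged predictive comes out visibly wider and flatter than the MAP plug-in; as $n$ grows the posterior narrows and the two curves coincide, matching the "narrow posterior" remark above.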
MAP vs ML

ML-BDR: pick $i$ if
$$i^*(x) = \arg\max_i P_{X|Y}(x|i;\theta_i^*)\,P_Y(i), \qquad \theta_i^* = \arg\max_\theta P_{X|Y}(D_i|i;\theta).$$
Bayes MAP-BDR: pick $i$ if
$$i^*(x) = \arg\max_i P_{X|Y}(x|i;\theta_i^{MAP})\,P_Y(i), \qquad \theta_i^{MAP} = \arg\max_\theta P_{T|Y,\Theta}(D_i|i,\theta)\,P_{\Theta|Y}(\theta|i).$$
The difference is non-negligible only when the dataset is small, and there are better alternative approximations.

Example

Let us consider an example of why Bayes is useful. In communications, a bit is transmitted by a source, corrupted by noise, and received by a decoder ($Y \to$ channel $\to X$). Q: what should the optimal decoder do to recover $Y$?

The optimal solution is to threshold $X$: pick a threshold $T$ and use the decision rule "$Y = 0$ if $x < T$, $Y = 1$ if $x \geq T$". What is the threshold? The midpoint between the signal values, $x^* = (\mu_1 + \mu_0)/2$.

Today we consider a slight variation: the signal travels through the atmosphere to a receiver ($Y \to$ atmosphere $\to X$). There are still two states, $Y = 0$ (transmit signal $s = -\mu_0$) and $Y = 1$ (transmit signal $s = \mu_0$), and the same noise model, $X|Y \sim \mathcal{N}(\pm\mu_0, \sigma^2)$.

The BDR is still: pick 0 if $x < 0$. This is optimal and everything works wonderfully. One day we get a phone call: the receiver is generating a lot of errors. Something must have changed in the rover, and there is no way to go to Mars and check. The goal is to do as best as possible with the info that we have: $X$ and our knowledge of the system.

What we know: the received signal is Gaussian with the same variance $\sigma^2$, but the means have changed. There is a calibration mode in which the rover can send a test sequence, but it is expensive and can only send a few bits. If everything is normal, the received means should be $-\mu_0$ and $\mu_0$. Action: ask the system to transmit a few 1s, measure $X$, and compute the ML estimate of the mean of $X$,
$$\hat\mu = \frac{1}{n}\sum_i x_i.$$
Result: the estimate is different from $\mu_0$.

We need to combine two forms of information: our prior is that $X \sim \mathcal{N}(\mu_0, \sigma^2)$, while our data-driven estimate is that $X \sim \mathcal{N}(\hat\mu, \sigma^2)$. Q: what do we do? We want a combined estimate $\mu_n = f(\hat\mu, \mu_0)$ such that $\mu_n \approx \hat\mu$ for large $n$ and $\mu_n \approx \mu_0$ for small $n$. An intuitive combination is
$$\mu_n = \alpha_n \hat\mu + (1 - \alpha_n)\,\mu_0, \qquad 0 \leq \alpha_n \leq 1,$$
with $\alpha_n \to 1$ for large $n$ and $\alpha_n \to 0$ for small $n$.

Bayesian solution

Gaussian likelihood for the observations: $P_{T|\Theta}(D|\mu) = \prod_i G(x_i, \mu, \sigma^2)$, where $\sigma^2$ is known. Gaussian prior encoding what we know: $P_\Theta(\mu) = G(\mu, \mu_0, \sigma_0^2)$, where $\mu_0$ and $\sigma_0^2$ are known hyper-parameters. We need to compute the posterior distribution
$$P_{\Theta|T}(\mu|D) = \frac{P_{T|\Theta}(D|\mu)\,P_\Theta(\mu)}{P_T(D)}.$$
Note that this is a probability density: we can ignore constants (terms that do not depend on $\mu$) and normalize when we are done. We only need to work with
$$P_{\Theta|T}(\mu|D) \propto P_{T|\Theta}(D|\mu)\,P_\Theta(\mu).$$ …
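The preview stops here, but by standard Gaussian-Gaussian conjugacy (a known result, not taken from the preview) this posterior is itself Gaussian, $P_{\Theta|T}(\mu|D) = G(\mu, \mu_n, \sigma_n^2)$, with
$$\mu_n = \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\,\hat\mu + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0, \qquad \frac{1}{\sigma_n^2} = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}.$$
A short sketch of this update (the helper name gaussian_posterior and all numbers are made up for illustration):

    import numpy as np

    def gaussian_posterior(D, sigma2, mu0, sigma02):
        # posterior of a Gaussian mean with known variance sigma2 and
        # Gaussian prior N(mu0, sigma02): standard conjugate result
        n = len(D)
        mu_ml = np.mean(D)                            # ML estimate (sample mean)
        alpha = n * sigma02 / (n * sigma02 + sigma2)  # weight on the data
        mu_n = alpha * mu_ml + (1.0 - alpha) * mu0    # posterior mean
        var_n = 1.0 / (n / sigma2 + 1.0 / sigma02)    # posterior variance
        return mu_n, var_n

    # hypothetical calibration run: nominal mean 1.0, but the rover drifted
    np.random.seed(1)
    D = np.random.normal(1.3, 0.5, size=4)            # only a few test bits
    mu_n, var_n = gaussian_posterior(D, sigma2=0.25, mu0=1.0, sigma02=0.1)
    print(mu_n, var_n)  # mu_n lies between the sample mean and the prior mean

Note that $\mu_n$ has exactly the intuitive form above, with $\alpha_n = n\sigma_0^2/(n\sigma_0^2 + \sigma^2)$: it tends to 1 as $n$ grows (trust the data) and to 0 as $n$ shrinks (trust the prior).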

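Putting the pieces together, here is a hypothetical end-to-end run of the rover story: simulate a drifted '1' mean, estimate it from a few calibration bits, shrink the estimate toward the nominal $\mu_0$, and recompute the BDR threshold as the midpoint of the two signal means. Every number here is invented for illustration.

    import numpy as np

    mu0, sigma2 = 1.0, 0.25          # nominal design: s = -mu0 or +mu0
    np.random.seed(2)

    # the transmitter drifted: the '1' signal now arrives around 1.4, not 1.0
    true_mean_1 = 1.4

    # calibration: ask for a few (expensive) 1s and form the ML estimate
    n = 5
    X = np.random.normal(true_mean_1, np.sqrt(sigma2), size=n)
    mu_ml = X.mean()

    # Bayesian combination with the prior N(mu0, sigma0^2) on the '1' mean
    sigma02 = 0.1
    alpha = n * sigma02 / (n * sigma02 + sigma2)
    mu_n = alpha * mu_ml + (1.0 - alpha) * mu0

    # if only the '1' mean drifted, the BDR threshold (equal priors, equal
    # variances) is the midpoint between the '0' mean -mu0 and the new mu_n
    threshold = (-mu0 + mu_n) / 2.0
    print(f"ML estimate {mu_ml:.3f}, posterior mean {mu_n:.3f}, "
          f"new threshold {threshold:.3f}")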
