UCLA STAT 231 - Lecture 5


5. Maximum Likelihood - II
Prof. Yuille. Stat 231. Fall 2004.

Topics
• Exponential Distributions, Sufficient Statistics, and MLE.
• Maximum Entropy Principle.
• Model Selection.

Exponential Distributions
• Gaussians are members of the class of exponential distributions, which have the general form p(x|λ) = exp{λ·φ(x)} / Z[λ].
• Parameters: the vector λ, with normalization constant Z[λ] = Σ_x exp{λ·φ(x)}.
• Statistics: the functions φ(x) of the data.

Sufficient Statistics
• The φ(x) are the sufficient statistics of the distribution.
• Knowledge of the sufficient statistics is all we need to know about the data; the rest is irrelevant for estimating the parameters.
• Most standard distributions (Gaussian, Poisson, etc.) can be expressed in exponential form.

Sufficient Statistics of Gaussian
• One-dimensional Gaussian p(x|μ, σ) with samples x_1, ..., x_N.
• Sufficient statistics: (1/N) Σ_i x_i
• and (1/N) Σ_i x_i².
• These are sufficient to learn the parameters of the distribution from the data.

MLE for Gaussian
• To estimate the parameters, maximize the likelihood Π_i p(x_i|μ, σ).
• Or equivalently, maximize the log-likelihood Σ_i log p(x_i|μ, σ).
• The sufficient statistics are chosen so that the log-likelihood depends on the data only through them: the MLE is μ̂ = (1/N) Σ_i x_i and σ̂² = (1/N) Σ_i x_i² - μ̂².

Sufficient Statistics for Gaussian
• A distribution of the form p(x) = exp{λ_1 x + λ_2 x²} / Z (with λ_2 < 0)
• is the same as a Gaussian with mean μ = -λ_1 / (2 λ_2)
• and variance σ² = -1 / (2 λ_2).

Exponential Models and MLE
• For an exponential model, MLE corresponds to maximizing (1/N) Σ_i log p(x_i|λ).
• This is equivalent to minimizing F(λ) = log Z[λ] - λ·ψ,
• where ψ = (1/N) Σ_i φ(x_i) is the observed value of the statistics.

Exponential Models and MLE
• This minimization is a convex optimization problem (log Z[λ] is convex in λ) and hence has a unique solution.
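As a small numerical sketch of the Gaussian case above (not part of the lecture; the data and parameter values are invented for illustration), the MLE can be read off from the two sufficient statistics alone:

```python
import numpy as np

# Synthetic 1-D Gaussian data (mean 2.0 and std 1.5 chosen arbitrarily).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)

# Sufficient statistics: the sample averages of x and x^2.
s1 = x.mean()           # (1/N) sum_i x_i
s2 = (x ** 2).mean()    # (1/N) sum_i x_i^2

# The MLE depends on the data only through these two numbers.
mu_hat = s1
var_hat = s2 - s1 ** 2  # MLE (biased) variance estimate
print(mu_hat, var_hat)  # close to 2.0 and 1.5^2 = 2.25
```

Once s1 and s2 are computed, the raw samples can be discarded; this is exactly what sufficiency means.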
• But finding this solution may be difficult in practice.
• Algorithms such as Generalized Iterative Scaling are guaranteed to converge to it.

Maximum Entropy Principle
• An alternative way to think of exponential distributions and MLE.
• Start with the statistics, and then estimate the form and the parameters of the probability distribution,
• using the Maximum Entropy principle.

Entropy
• The entropy of a distribution is H[P] = - Σ_x P(x) log P(x).
• Defined by Shannon as a measure of the information obtained by observing a sample from P(x).

Maximum Entropy Principle
• Maximum Entropy Principle: select the distribution P(x) which maximizes the entropy subject to constraints.
• Introduce Lagrange multipliers λ_i, one per constraint (plus one for normalization).
• The constraints fix the observed values of the statistics: Σ_x P(x) φ_i(x) = ψ_i, where ψ_i = (1/N) Σ_a φ_i(x_a).

Maximum Entropy
• Extremizing the Lagrangian with respect to P(x) gives the (exponential) form of the distribution: P(x) = exp{Σ_i λ_i φ_i(x)} / Z[λ].
• Extremizing with respect to the Lagrange parameters ensures that the constraints are satisfied: Σ_x P(x) φ_i(x) = ψ_i.

Maximum Entropy
• This gives the same result as MLE for exponential distributions.
• Maximum Entropy + constraints = Exponential Distribution + MLE parameters.
• The Max-Ent distribution which has the observed sufficient statistics is the exponential distribution with those statistics.
• Example: a Gaussian is obtained by performing Max-Ent with the statistics x and x².

Minimax Principle
• Construct a distribution incrementally by increasing the number of statistics φ_1, ..., φ_M.
• The entropy of the Max-Ent distribution with M statistics is H = log Z[λ] - λ·ψ, evaluated at the fitted λ.
• Minimax Principle: select the statistics to minimize the entropy of the maximum-entropy distribution. This relates to model selection.

Model Selection
• Suppose we do not know which model generated the data.
• Two models M_1 and M_2, with parameters θ_1 and θ_2.
• Priors P(θ_1|M_1), P(θ_2|M_2) on the parameters, and P(M_1), P(M_2) on the models.
• Model selection enables us to estimate which model is most likely to have generated the data.

Model Selection
• Calculate the evidence P(x|M_1) = ∫ P(x|θ_1, M_1) P(θ_1|M_1) dθ_1,
• and compare it with P(x|M_2) = ∫ P(x|θ_2, M_2) P(θ_2|M_2) dθ_2.
• Observe that we must sum (integrate) over all possible values of the model parameters.

Model Selection & Minimax
• The entropy of the Max-Ent distribution, H = log Z[λ] - λ·ψ,
• is minus the log-probability of the data per sample: (1/N) Σ_a log P(x_a|λ) = λ·ψ - log Z[λ] = -H.
• So the Minimax Principle is a form of model selection.
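The Max-Ent fit and the entropy identity H = log Z - λ·ψ can be checked numerically. This is a hypothetical toy example (the support {0,...,4}, the single statistic φ(x) = x, and the target ψ = 2.8 are all assumed for illustration), fitted by plain gradient descent on the convex objective rather than by Generalized Iterative Scaling:

```python
import numpy as np

# Toy problem: finite support {0,1,2,3,4}, one statistic phi(x) = x,
# observed statistic psi = 2.8 (all values assumed for illustration).
xs = np.arange(5, dtype=float)
phi = xs
psi = 2.8

# Minimize the convex objective F(lam) = log Z[lam] - lam * psi
# by gradient descent; the gradient is dF/dlam = E_p[phi] - psi.
lam = 0.0
for _ in range(5000):
    w = np.exp(lam * phi)
    Z = w.sum()
    p = w / Z
    lam -= 0.1 * (p @ phi - psi)

mean = p @ phi                         # matches the constraint psi at convergence
H = -(p * np.log(p)).sum()             # entropy of the fitted Max-Ent distribution
print(mean, H, np.log(Z) - lam * psi)  # H equals log Z - lam * psi
```

Because F(λ) is convex, the descent converges to the unique solution, where the constraint E_p[φ] = ψ holds and the entropy equals log Z - λ·ψ.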
• But it estimates the parameters instead of summing them out.

Model Selection
• Important issue: suppose model M_2 has more parameters than M_1. Then M_2 is more flexible and can fit a larger number of data sets.
• But summing over the parameters, P(x|M_2) = ∫ P(x|θ_2, M_2) P(θ_2|M_2) dθ_2,
• penalizes this flexibility.
• This gives "Occam's Razor", favoring the simpler model unless the data strongly require the more complex one.

Model Selection
• More advanced modeling requires performing model selection where the models themselves are complex.
• Beyond the scope of this course.
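The Occam's Razor effect of summing over parameters can be shown with a hypothetical coin-flipping example (not from the lecture): model M_1 is a coin with fixed bias 0.5, and model M_2 has an unknown bias with a uniform prior, so its evidence integrates over the parameter.

```python
import numpy as np

def evidence_fixed(h, t, p=0.5):
    # M1: a coin with fixed bias p -- no parameters to sum over.
    return p ** h * (1 - p) ** t

def evidence_flexible(h, t, grid=100_000):
    # M2: a coin with unknown bias and a uniform prior on it.
    # Summing (integrating) over the parameter gives the evidence.
    ps = np.linspace(0.0, 1.0, grid)
    return np.mean(ps ** h * (1 - ps) ** t)

# Mildly unbalanced data (6 heads, 4 tails): the simpler M1 has higher evidence.
print(evidence_fixed(6, 4), evidence_flexible(6, 4))
# Strongly unbalanced data (9 heads, 1 tail): the flexible M2 wins despite the penalty.
print(evidence_fixed(9, 1), evidence_flexible(9, 1))
```

With 6 heads out of 10, M_2 could fit the data better at its best parameter value, yet its evidence is lower: averaging over the prior spreads probability mass over many data sets M_2 could have produced, which is exactly the Occam penalty.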

