UCLA STAT 231 - Maximum Likelihood - D2890155

School: University of California, Los Angeles

Course: Stat 231 - Pattern Recognition and Machine Learning


Lecture note for Stat 231: Pattern Recognition and Machine Learning
4. Maximum Likelihood
Prof. A.L. Yuille
Stat 231. Fall 2004.

• Learning Probability Distributions.
• Maximum Likelihood Estimation.
• Supervised versus Unsupervised Learning.
• Example of MLE.
• MLE
• MLE and Kullback-Leibler
• MLE example
• Learning with a Prior.
• Recursive Bayes Learning

Learning Probability Distributions.
• Learn the likelihood functions and priors from datasets.
• Two main strategies: parametric and non-parametric.
• This lecture and the next concentrate on parametric methods, which assume a parametric form for the distributions.

Maximum Likelihood Estimation.
• Assume the distribution is of the form $p(x;\theta)$.
• Independent Identically Distributed (I.I.D.) samples: $x_1, \ldots, x_N$.
• Choose $\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{N} p(x_i;\theta)$.

Supervised versus Unsupervised Learning.
• Supervised learning assumes that we know the class label for each datapoint.
• I.e., we are given pairs $(x_i, y_i)$, where $x_i$ is the datapoint and $y_i$ is the class label.
• Unsupervised learning does not assume that the class labels are specified. This is a harder task.
• But "unsupervised methods" can also be used for supervised data if the goal is to determine structure in the data (e.g. mixture of Gaussians).
• Stat 231 is almost entirely concerned with supervised learning.

Example of MLE.
• One-dimensional Gaussian distribution: $p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.
• The log-likelihood of the samples is $\sum_{i=1}^{N} \log p(x_i;\mu,\sigma) = -\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2 - N\log\left(\sqrt{2\pi}\,\sigma\right)$.
• Solve for $\hat{\mu}, \hat{\sigma}$ by differentiation: $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$, $\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\hat{\mu})^2$.

MLE
• The Gaussian is unusual because the maximum likelihood parameters of the distribution can be written as an analytic expression of the data.
• More usually, algorithms are required.
• Modeling problem: for complicated patterns (shape of fish, natural language, etc.) it requires considerable work to find a suitable parametric form for the probability distributions.

MLE and Kullback-Leibler
• What happens if the data is not generated by the model that we assume?
• Suppose the true distribution is $f(x)$ and our models are of the form $p(x;\theta)$.
• The Kullback-Leibler divergence is: $D(f \,\|\, p) = \int f(x) \log \frac{f(x)}{p(x;\theta)}\, dx$.
• This is non-negative, and it is zero only when $p(x;\theta) = f(x)$.
• K-L is a measure of the difference between $f(x)$ and $p(x;\theta)$.

MLE and Kullback-Leibler
• Samples $x_1, \ldots, x_N$ drawn from $f(x)$.
• Approximate $D(f \,\|\, p)$ by the empirical KL: $\frac{1}{N}\sum_{i=1}^{N} \log \frac{f(x_i)}{p(x_i;\theta)}$. Only the term $-\frac{1}{N}\sum_{i=1}^{N} \log p(x_i;\theta)$ depends on $\theta$.
• Minimizing the empirical KL is equivalent to MLE.
• We find the distribution of form $p(x;\theta)$ that is closest to $f(x)$ in the K-L sense.

MLE example
• We denote the log-likelihood as a function of $\theta$: $l(\theta) = \sum_{i=1}^{N} \log p(x_i;\theta)$.
• $\hat{\theta}$ is computed by solving the equations $\frac{dl(\theta)}{d\theta} = 0$.
• For example, the Gaussian family gives a closed-form solution.

Learning with a Prior.
• We can put a prior $p(\theta)$ on the parameter values, which gives the posterior $p(\theta \mid x_1, \ldots, x_N) \propto p(\theta) \prod_{i=1}^{N} p(x_i;\theta)$.
• We can estimate this recursively (if samples are i.i.d.): $p(\theta \mid x_1, \ldots, x_N) \propto p(x_N;\theta)\, p(\theta \mid x_1, \ldots, x_{N-1})$.
• Bayes Learning: estimate a probability distribution on $\theta$.

Recursive Bayes
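
As a rough illustration of three points in these notes (the closed-form Gaussian MLE, the link between MLE and the empirical KL, and recursive learning with a prior), here is a minimal Python sketch. It is not taken from the lecture. The function names, the synthetic data, and in particular the choice of a Gaussian prior on the mean with a known noise variance are assumptions made for this example only.

```python
import math
import random


def gaussian_mle(samples):
    """Closed-form MLE for a 1-D Gaussian: sample mean and variance (divides by N)."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, var


def log_likelihood(samples, mu, var):
    """Sum over samples of log p(x_i) for a Gaussian with mean mu and variance var."""
    n = len(samples)
    sq = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sq / (2.0 * var)


def recursive_posterior(samples, prior_mean, prior_var, noise_var):
    """Recursive Bayes for the mean of a Gaussian with known variance.

    Assumption for this sketch: the prior on the mean is Gaussian, so every
    posterior is Gaussian too. After each sample the posterior becomes the
    prior for the next sample.
    """
    mean, var = prior_mean, prior_var
    for x in samples:
        new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
        mean = new_var * (mean / var + x / noise_var)
        var = new_var
    return mean, var


def batch_posterior(samples, prior_mean, prior_var, noise_var):
    """Same posterior computed in one step from all samples, for comparison."""
    n = len(samples)
    var = 1.0 / (1.0 / prior_var + n / noise_var)
    mean = var * (prior_mean / prior_var + sum(samples) / noise_var)
    return mean, var


# Synthetic data (hypothetical): 500 draws from a Gaussian, fixed seed.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.5) for _ in range(500)]

mu_hat, var_hat = gaussian_mle(data)
best = log_likelihood(data, mu_hat, var_hat)

# The theta-dependent part of the empirical KL is the average negative
# log-likelihood, so maximizing `best` minimizes `avg_nll`.
avg_nll = -best / len(data)
print("MLE mean and variance:", mu_hat, var_hat)
print("average negative log-likelihood:", avg_nll)

rec = recursive_posterior(data, prior_mean=0.0, prior_var=10.0, noise_var=1.5 ** 2)
bat = batch_posterior(data, prior_mean=0.0, prior_var=10.0, noise_var=1.5 ** 2)
print("recursive posterior (mean, var):", rec)
print("batch posterior (mean, var):", bat)
```

Perturbing `mu_hat` or `var_hat` and calling `log_likelihood` again should give a lower value, which is a quick numerical check of the closed-form solution. The recursive and batch posteriors should agree up to floating-point error, since for i.i.d. samples the order of the updates does not change the result.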
