UT Dallas CS 6375 - mle (Maximum Likelihood Estimation)


Slide 1: Machine Learning, CS 6375, Spring 2015
Maximum Likelihood Estimation
Instructor: Yang Liu
Slides adapted from Vincent Ng and Andrew Moore.

Slide 2: Maximum Likelihood Estimation
• From a Bayesian perspective, we are interested in finding the MAP hypothesis:

    h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\, P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\, P(h)

• But in many cases we have to assume a uniform distribution over the hypotheses (e.g., because of a lack of prior knowledge of the domain), effectively seeking the maximum likelihood (ML) hypothesis:

    h_{ML} = \arg\max_{h \in H} P(D \mid h)

Slide 3: Maximum Likelihood Estimates
• If the hypotheses are parameterized (by, say, θ), then seeking an ML hypothesis is equivalent to seeking the values of θ that maximize the data likelihood.
• A maximum likelihood estimate (MLE) is a parameter estimate that maximizes the data likelihood. It is the estimate most consistent with the data:

    \theta^{*} = \arg\max_{\theta} P(D \mid \theta)

Slide 4: Example: Coin Tossing
• How likely am I to toss a head? Assume that a series of n = 10 trials/tosses yields (h, t, t, t, h, t, t, h, t, t), so n_H = 3 and n_T = 7.
• Probability of tossing a head = 3/10.
• That's an MLE! This estimate is absolutely consistent with the observed data.
• But ... is this an estimate that maximizes the data likelihood?

Slide 5: Maximizing Data Likelihood
• What's the data likelihood? With P(H) = θ,

    L(\theta) = P(D \mid \theta) = \theta^{3} (1 - \theta)^{7}

• How do we maximize the data likelihood function? Take the first derivative of the likelihood function with respect to the parameter θ and solve for 0. That value of θ maximizes the likelihood function and is the MLE.

Slide 6: Maximizing the Likelihood
• It's usually easier to maximize the log likelihood. So let's maximize

    \log L(\theta) = \log\bigl(\theta^{3}(1-\theta)^{7}\bigr) = 3\log\theta + 7\log(1-\theta)

• Take the derivative of the function, set it to zero, and solve for θ:

    \frac{d \log L(\theta)}{d\theta} = \frac{3}{\theta} - \frac{7}{1-\theta} = 0 \quad\Rightarrow\quad \theta = \frac{3}{10}

Slide 7: A General Scalar MLE Strategy
Task: Find the MLE θ that maximizes P(Data | θ).
1. Write LL = log P(Data | θ).
2. Work out the first derivative of LL using calculus.
3. Set the derivative to zero, creating an equation in terms of θ.
4. Solve it.
5. Check that you've found a maximum rather than a minimum or a saddle point.

Slide 8: MLE for a Univariate Gaussian
Learning Gaussians from data: suppose you have x_1, x_2, ..., x_R ~ N(μ, σ²), i.i.d. (independent and identically distributed), but you don't know μ (for now, assume you know σ).
MLE: for which μ are x_1, x_2, ..., x_R most likely?
MAP: which μ maximizes p(μ | x_1, x_2, ..., x_R, σ)?

Slide 9: Just Algebra

    \mu_{MLE} = \arg\max_{\mu} p(x_1, \ldots, x_R \mid \mu, \sigma^2)
              = \arg\max_{\mu} \prod_{i=1}^{R} p(x_i \mid \mu, \sigma^2)    (by i.i.d.)
              = \arg\max_{\mu} \sum_{i=1}^{R} \log p(x_i \mid \mu, \sigma^2)    (monotonicity of log)
              = \arg\max_{\mu} \sum_{i=1}^{R} \log\left( \frac{1}{\sigma\sqrt{2\pi}} e^{-(x_i - \mu)^2 / 2\sigma^2} \right)    (plug in the Gaussian formula)
              = \arg\min_{\mu} \sum_{i=1}^{R} (x_i - \mu)^2    (simplification)

Slide 10: Thus

    0 = \frac{\partial LL}{\partial \mu} = -\frac{\partial}{\partial \mu} \sum_{i=1}^{R} \frac{(x_i - \mu)^2}{2} = \sum_{i=1}^{R} (x_i - \mu) \quad\Rightarrow\quad \mu_{MLE} = \frac{1}{R} \sum_{i=1}^{R} x_i

The best estimate of the mean of a distribution is the mean of the sample!
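As a quick numerical sanity check of the coin-toss MLE (Slide 6) and the sample-mean result (Slide 10), here is a minimal Python sketch. It is not part of the original slides; the variable names, the grid resolution, and the simulation settings are my own choices.

```python
import numpy as np

# Coin data from the slides: n_H = 3 heads, n_T = 7 tails out of n = 10.
n_H, n_T = 3, 7

def log_likelihood(theta):
    # log L(theta) = n_H log(theta) + n_T log(1 - theta), from Slide 6.
    return n_H * np.log(theta) + n_T * np.log(1 - theta)

# Maximize over a fine grid of theta values in (0, 1).
thetas = np.linspace(0.001, 0.999, 999)
theta_hat = thetas[np.argmax(log_likelihood(thetas))]
print(theta_hat)  # ~0.3, matching the closed-form MLE n_H / n = 3/10

# Gaussian mean: the MLE is the sample mean (Slides 9-10).
# True mu = 2.0 is an arbitrary value chosen for this check.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)
print(x.sum() / len(x))  # mu_MLE = (1/R) sum_i x_i, close to 2.0
```

Both printed values agree with the closed-form answers derived above.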
Slide 11: A General MLE Strategy
Suppose θ = (θ_1, θ_2, ..., θ_n)^T is a vector of parameters.
Task: Find the MLE θ that maximizes P(Data | θ).
1. Write LL = log P(Data | θ).
2. Work out the partial derivative of LL with respect to each θ_i:

    \frac{\partial LL}{\partial \theta_1} = 0, \quad \frac{\partial LL}{\partial \theta_2} = 0, \quad \ldots, \quad \frac{\partial LL}{\partial \theta_n} = 0

3. Solve the resulting set of simultaneous equations.
4. Check that you are at a maximum.
What if you cannot solve the simultaneous equations?
• Use gradient ascent.

Slide 12: Gaussian Again
Suppose you have x_1, x_2, ..., x_R ~ N(μ, σ²), i.i.d., but now you know neither μ nor σ².
MLE: for which θ = (μ, σ²) are x_1, x_2, ..., x_R most likely?

Slide 13: MLE for Gaussian
Setting \partial LL / \partial \mu = 0 and \partial LL / \partial \sigma^2 = 0 and solving gives:

    \mu_{MLE} = \frac{1}{R} \sum_{i=1}^{R} x_i, \qquad \sigma^2_{MLE} = \frac{1}{R} \sum_{i=1}^{R} (x_i - \mu_{MLE})^2

Slide 14: Gaussian Naïve Bayes Classifier
Consider continuous attributes X_i, each modeled as N(μ, σ²).
Training: MLE parameter estimation.
Classification:

    Y_{NB} = \arg\max_{y} P(y) \prod_{i} P(X_i \mid y)

Slide 15:
We haven't discussed Bayesian estimation of parameters (e.g., for the Gaussian). You need to understand the recipe of MLE and be able to use it for various probability distributions.

Slide 16: Unbiased Estimators
• An estimator of a parameter is unbiased if the expected value of the estimate is the same as the true value of the parameter: μ_MLE is unbiased.
• An estimator is biased if the expected value of the estimate is different from the true value: σ²_MLE is biased.

Slide 17: Does This Strategy Always Work?
Sometimes we don't have complete data for MLE.

Slide 18: An Example: Animal Classification
• There are n animals classified into one of four possible categories. The category counts are the sufficient statistics for estimating the parameters.
• The technique for finding the MLEs is the same: take the derivative of the likelihood function and solve for zero. (See the homework for the multinomial distribution.)

Slide 19: An Example: Animal Classification
There are n = 197 animals classified into one of 4 categories:

    Y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)

The probability associated with each category is given as:

    \Theta = \left( \tfrac{1}{2} + \tfrac{1}{4}\pi,\ \tfrac{1}{4}(1-\pi),\ \tfrac{1}{4}(1-\pi),\ \tfrac{1}{4}\pi \right)

The resulting likelihood function for this data is:

    L(\pi) = \frac{n!}{y_1!\,y_2!\,y_3!\,y_4!} \left(\tfrac{1}{2} + \tfrac{1}{4}\pi\right)^{y_1} \left(\tfrac{1}{4}(1-\pi)\right)^{y_2} \left(\tfrac{1}{4}(1-\pi)\right)^{y_3} \left(\tfrac{1}{4}\pi\right)^{y_4}

Slide 20: Maximizing the Log Likelihood

    \log L(\pi) = y_1 \log\left(\tfrac{1}{2} + \tfrac{1}{4}\pi\right) + y_2 \log\left(\tfrac{1}{4}(1-\pi)\right) + y_3 \log\left(\tfrac{1}{4}(1-\pi)\right) + y_4 \log\left(\tfrac{1}{4}\pi\right) + \log\frac{n!}{y_1!\,y_2!\,y_3!\,y_4!}

    \frac{d \log L(\pi)}{d\pi} = \frac{y_1}{2+\pi} - \frac{y_2}{1-\pi} - \frac{y_3}{1-\pi} + \frac{y_4}{\pi} = 0

    \frac{125}{2+\pi} - \frac{38}{1-\pi} + \frac{34}{\pi} = 0 \quad\Rightarrow\quad \pi = 0.627

Slide 21: Adversity Strikes!
• What if the observed data is incomplete? What if there are really 5 categories?
• y_1 is the composite of two categories (x_1 + x_2):

    p(y_1) = \tfrac{1}{2} + \tfrac{1}{4}\pi, \qquad p(x_1) = \tfrac{1}{2}, \qquad p(x_2) = \tfrac{1}{4}\pi

• How can we make an MLE, since we can't observe the category counts x_1 and x_2? Unobserved sufficient statistics!

Slide 22: The EM Algorithm
• Expectation-Maximization (EM).
• E-STEP: Find the expected values of the sufficient statistics for the complete data X, given the incomplete data Y and the current parameter estimates.
• M-STEP: Use those sufficient statistics to make an MLE as usual!
• Repeat the above steps until convergence.

Slide 23: MLE for Complete Data

    X = (x_1, x_2, x_3, x_4, x_5) = (x_1, x_2, 18, 20, 34), \quad \text{where } x_1 + x_2 = 125

    \Theta = \left( \tfrac{1}{2},\ \tfrac{1}{4}\pi,\ \tfrac{1}{4}(1-\pi),\ \tfrac{1}{4}(1-\pi),\ \tfrac{1}{4}\pi \right)

    L(\pi) = \frac{n!}{x_1!\,x_2!\,x_3!\,x_4!\,x_5!} \left(\tfrac{1}{2}\right)^{x_1} \left(\tfrac{1}{4}\pi\right)^{x_2} \left(\tfrac{1}{4}(1-\pi)\right)^{x_3} \left(\tfrac{1}{4}(1-\pi)\right)^{x_4} \left(\tfrac{1}{4}\pi\right)^{x_5}

Slide 24: MLE for Complete Data

    \log L(\pi) = x_1 \log\tfrac{1}{2} + x_2 \log\left(\tfrac{1}{4}\pi\right) + x_3 \log\left(\tfrac{1}{4}(1-\pi)\right) + x_4 \log\left(\tfrac{1}{4}(1-\pi)\right) + x_5 \log\left(\tfrac{1}{4}\pi\right) + \log\frac{n!}{x_1!\,x_2!\,x_3!\,x_4!\,x_5!}

    \frac{d \log L(\pi)}{d\pi} = \frac{x_2}{\pi} - \frac{x_3}{1-\pi} - \frac{x_4}{1-\pi} + \frac{x_5}{\pi} = 0 \quad\Rightarrow\quad \frac{x_2 + 34}{\pi} - \frac{38}{1-\pi} = 0

Slide 25: E-Step
• What are the sufficient statistics? x_1 (x_2 can be inferred from x_1, since x_2 = 125 - x_1).
• The unobserved counts x_1 and x_2 are the two categories that share the composite sample of size 125:

    p(x_1) + p(x_2) = p(y_1) = \tfrac{1}{2} + \tfrac{1}{4}\pi

• How can their expected values be computed?

    E[x_1 \mid y_1] = y_1 \cdot p(x_1 \mid y_1) = 125 \cdot \frac{1/2}{1/2 + \pi/4}

    E[x_2 \mid y_1] = y_1 \cdot p(x_2 \mid y_1) = 125 \cdot \frac{\pi/4}{1/2 + \pi/4} = 125 - E[x_1 \mid y_1]

Slide 26: E-Step, Iteration 1
• Start with π = 0.5 (this is just a random initial guess).
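To make the E-step/M-step loop concrete, here is a minimal Python sketch of EM for this example. It is not from the slides; the variable names and the fixed iteration count are my own, and the two updates are the reconstructed formulas from Slides 24 and 25.

```python
# A minimal EM sketch for the 5-category animal example, starting from
# pi = 0.5 as on Slide 26. (Assumption: names and the iteration count
# are mine; the update formulas come from Slides 24-25 above.)
y1, y2, y3, y4 = 125, 18, 20, 34  # observed 4-category counts
pi = 0.5                          # initial guess

for _ in range(20):
    # E-step: expected split of the composite count y1 = x1 + x2,
    # using p(x2 | y1) = (pi/4) / (1/2 + pi/4) from Slide 25.
    x2 = y1 * (pi / 4) / (0.5 + pi / 4)
    # M-step: complete-data MLE from d log L / d pi = 0 on Slide 24,
    # i.e. (x2 + x5)/pi = (x3 + x4)/(1 - pi), solved for pi.
    pi = (x2 + y4) / (x2 + y2 + y3 + y4)

print(round(pi, 3))  # 0.627
```

EM converges to π ≈ 0.627, the same value the direct incomplete-data MLE gave on Slide 20, which is exactly what the algorithm is meant to recover when some sufficient statistics are unobserved.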

