
CORNELL CS 664 - Lecture #4: Maximum likelihood estimation, Markov chains


Some slides taken from: Yuri Boykov

Announcements
 First quiz will be on Thursday
– Coverage through lecture #3
– Graded via CMS, the Course Management System
– Example questions on mean shift
• Mean shift won’t be on the quiz
 Guest lecture a week from Tuesday by Bryan Kressler
– We’ll have a few of these over the semester

Recap: Efros & Leung
 Please be sure to ask questions!
– You will probably have to implement these
 Efros & Leung
– Example for 1-by-1, 1-by-2 windows
• Estimation, sampling
• Center-weighted L2 distance
– Parzen estimation
• Uniform kernel
• Gaussian kernel

Recap: mean shift
 Mean shift in histogram space

Last lecture we saw:
 Non-parametric density estimation
– Histograms + various fitting methods
– Nearest neighbor
– Parzen estimation
 Finding local modes
– Mean shift and applications
 Maximum likelihood estimation

Probability vs. Statistics
 Probability: mathematical models of uncertainty predict outcomes
– This is the heart of probability
– Models, and their consequences
• What is the probability of a model generating some particular data as an outcome?
 Statistics: given an outcome, analyze different models
– Did this model generate the data?
– From among different models (or parameters), which one generated the data?

Definition of likelihood
 Intuition: the true PDF should not make the sample (data) you saw a “fluke”
– It’s possible that the coin is fair even though you saw 10^6 heads in a row…
 The likelihood of a hypothesis is the probability that it would have resulted in the data you saw
– Think of the data as fixed, and try to choose among the possible PDFs
– Often, a parameterized family of PDFs
• ML parameter estimation

An example likelihood
 Consider a coin with probability of heads h. If we flip it n times, the probability of observing k heads is
P(k heads) = (n choose k) h^k (1 − h)^(n − k)
– Suppose we observe 51/100 heads. What is the likelihood, as a function of h?
[Figure: probability of each number of heads in 100 flips, for a fair coin (p = .5), a slightly biased coin (p = .67), and a biased coin (p = .9)]

Coin likelihood notes
 The maximum likelihood estimate is always that the coin’s bias was exactly what we saw
– But how likely this is depends on what we saw
– A biased coin is a better explanation for a skewed sample than a less-biased coin is for a less-biased sample
 Suppose we only have a few hypotheses
– When do their likelihoods “cross”?
• 58.5 heads (for h = 0.5 vs. h = 0.67)
• 73.5 heads (for h = 0.5 vs. h = 0.9)

Q: What is a statistic?
 A: anything computed from the data
– More or less the formal definition
– Example: sample mean
• Percentage of heads in coin flipping
 A given model will lead to some distribution of that statistic
– Which we just saw
 Some statistics do not allow you to select among certain models
– The sample mean can’t tell you the coin has “memory”

Failure modes of ML
 Likelihood isn’t the only criterion for selecting a model or parameter
– Though it’s obviously an important one
 Bizarre models may have high likelihood
– Consider a speedometer reading 55 MPH
– Likelihood of “true speed = 55”: 10%
– Likelihood of “speedometer stuck”: 100%
 ML likes “fairy tales”
– In practice, exclude such hypotheses
– There must be a principled solution…

Gaussian likelihood
 With a 1-D Gaussian, the probability of observing the sample {x1, x2} is
p(x1, x2) = [1/(σ√(2π))] exp(−(x1 − μ)²/(2σ²)) · [1/(σ√(2π))] exp(−(x2 − μ)²/(2σ²))
– We are assuming that both observations were drawn independently from the same distribution (i.e., same mean and variance)
• IID (independent, identically distributed)

Parametric ML example
 Suppose your sample {x1, x2, …, xn} is drawn IID from a Gaussian
– What is the ML estimate of its parameters?
– We can maximize the log likelihood, which is
log L(μ, σ) = −n log(σ√(2π)) − (1/(2σ²)) Σᵢ (xᵢ − μ)²
– To maximize this we compute the derivatives with respect to μ and σ, set them to zero, and obtain
μ̂ = (1/n) Σᵢ xᵢ  and  σ̂² = (1/n) Σᵢ (xᵢ − μ̂)²

ML Parzen estimation
 What kernel maximizes the likelihood?
– Hint: it’s not very useful
 How do we make Parzen estimation actually work?
– Can we get all possible densities as answers?
• Do we even want to?
 Smooth kernels lead to smooth estimates
– Choice of kernel width (often called bandwidth) is thus critical
– Embodies an idea of what estimate we
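The coin likelihood from the slides above can be sketched in a few lines of Python (not from the lecture). With 51/100 heads the ML estimate of h is exactly 0.51, and with p = 0.67 (rather than exactly 2/3) the likelihood crossover against the fair coin falls between 58 and 59 heads, consistent with the slide's quoted 58.5:

```python
from math import comb

def likelihood(h, k, n=100):
    """Probability of observing k heads in n flips of a coin with P(heads) = h."""
    return comb(n, k) * h**k * (1 - h)**(n - k)

# With 51/100 heads, the ML estimate of h on a 0.01 grid is exactly 51/100.
Ls = [likelihood(h / 100, 51) for h in range(1, 100)]
h_ml = (Ls.index(max(Ls)) + 1) / 100
print(h_ml)  # 0.51

# Where the hypotheses "cross": below the crossover the fair coin explains
# the data better, above it the biased coin does.
print(likelihood(0.5, 58) > likelihood(0.67, 58))  # True
print(likelihood(0.5, 59) > likelihood(0.67, 59))  # False
```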
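The remark that the sample mean can't detect a coin with "memory" can be illustrated with a two-state Markov chain coin (foreshadowing the Markov chains in the lecture title). This sketch, its parameter names, and the seed are illustrative, not from the lecture: both coins have a long-run heads fraction near 1/2, but a run-count statistic separates them easily.

```python
import random

random.seed(0)

def sticky_coin_flips(n, stay=0.9):
    """Coin with memory: repeats its previous outcome with probability `stay`.
    By symmetry, its long-run fraction of heads is still 1/2."""
    flips = [random.random() < 0.5]
    for _ in range(n - 1):
        prev = flips[-1]
        flips.append(prev if random.random() < stay else not prev)
    return flips

fair = [random.random() < 0.5 for _ in range(100_000)]
sticky = sticky_coin_flips(100_000)

frac = lambda fs: sum(fs) / len(fs)                       # sample mean
runs = lambda fs: 1 + sum(a != b for a, b in zip(fs, fs[1:]))  # number of runs

print(frac(fair), frac(sticky))  # both near 0.5: the mean can't tell them apart
print(runs(fair), runs(sticky))  # the sticky coin has far fewer runs
```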
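The closed-form estimates from the "Parametric ML example" slide (sample mean, and sample variance with a 1/n factor) can be checked numerically. The sample values here are made up; the check confirms that perturbing either ML parameter lowers the log likelihood:

```python
import math

def log_likelihood(xs, mu, sigma):
    """Log likelihood of an IID sample under a 1-D Gaussian N(mu, sigma^2)."""
    n = len(xs)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) \
           - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

xs = [2.1, 1.9, 3.0, 2.4, 2.6]                         # made-up sample
mu_ml = sum(xs) / len(xs)                              # ML mean = sample mean
var_ml = sum((x - mu_ml) ** 2 for x in xs) / len(xs)   # note 1/n, not 1/(n-1)
sigma_ml = math.sqrt(var_ml)

# Any perturbation of the ML parameters lowers the log likelihood:
best = log_likelihood(xs, mu_ml, sigma_ml)
for dmu, dsig in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert log_likelihood(xs, mu_ml + dmu, sigma_ml + dsig) < best
```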
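The "not very useful" likelihood-maximizing kernel from the "ML Parzen estimation" slide can be seen by shrinking a Gaussian kernel's bandwidth: the estimate at the sample points blows up toward a spike at each data point, while points between samples get vanishing density. A minimal sketch with made-up 1-D data:

```python
import math

def parzen(x, data, h):
    """Parzen density estimate at x with a Gaussian kernel of bandwidth h."""
    n = len(data)
    return sum(math.exp(-(x - xi) ** 2 / (2 * h ** 2))
               for xi in data) / (n * h * math.sqrt(2 * math.pi))

data = [0.0, 1.0, 2.0]  # made-up sample

# As the bandwidth shrinks, the density at a sample point (x = 0.0) explodes,
# while a point between samples (x = 0.5) gets vanishing density -- the
# likelihood-maximizing "kernel" degenerates into spikes and is useless as
# a density estimate.
for h in [1.0, 0.1, 0.01]:
    print(h, parzen(0.0, data, h), parzen(0.5, data, h))
```

This is why the bandwidth must be chosen by some criterion other than raw likelihood on the training sample.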
