
CORNELL CS 664 - Lecture #4: Maximum likelihood estimation, Markov chains


Some slides taken from: Yuri Boykov

Announcements
 First quiz will be on Thursday
– Coverage through lecture #3
– Graded via CMS, the Course Management System
– Example questions on mean shift
• Mean shift won’t be on the quiz
 Guest lecture a week from Tuesday by Bryan Kressler
– We’ll have a few of these over the semester

Recap: Efros & Leung
 Please be sure to ask questions!
– You will probably have to implement these
 Efros & Leung
– Example for 1-by-1, 1-by-2 windows
• Estimation, sampling
• Center-weighted L2 distance
– Parzen estimation
• Uniform kernel
• Gaussian kernel

Recap: mean shift
 Mean shift in histogram space

Last lecture we saw:
 Non-parametric density estimation
– Histograms + various fitting methods
– Nearest neighbor
– Parzen estimation
 Finding local modes
– Mean shift and applications
 Maximum likelihood estimation

Probability vs. Statistics
 Probability: mathematical models of uncertainty predict outcomes
– This is the heart of probability
– Models, and their consequences
• What is the probability of a model generating some particular data as an outcome?
 Statistics: given an outcome, analyze different models
– Did this model generate the data?
– From among different models (or parameters), which one generated the data?

Definition of likelihood
 Intuition: the true PDF should not make the sample (data) you saw a “fluke”
– It’s possible that the coin is fair even though you saw 10^6 heads in a row…
 The likelihood of a hypothesis is the probability that it would have resulted in the data you saw
– Think of the data as fixed, and try to choose among the possible PDFs
– Often, a parameterized family of PDFs
• ML parameter estimation

An example likelihood
 Consider a coin with probability of heads h. If we flip it n times, the probability of observing k heads is
P(k heads) = (n choose k) h^k (1 − h)^(n − k)
– Suppose we observe 51/100 heads. What is the likelihood, as a function of h?
[Figure: probability of each number of heads in 100 flips, for a fair coin (p = .5), a slightly biased coin (p = .67), and a biased coin (p = .9)]

Coin likelihood notes
 The maximum likelihood estimate is always that the coin’s bias was exactly what we saw
– But how likely this is depends on what we saw
– A biased coin is a better explanation for a skewed sample than a less-biased coin is for a less-biased sample
 Suppose we only have a few hypotheses
– When do their likelihoods “cross”?
• 58.5 heads (for h = 0.5 vs. h = 0.67)
• 73.5 heads (for h = 0.5 vs. h = 0.9)

Q: What is a statistic?
 A: anything computed from the data
– More or less the formal definition
– Example: sample mean
• Percentage of heads in coin flipping
 A given model will lead to some distribution of that statistic
– Which we just saw
 Some statistics do not allow you to select among certain models
– The sample mean can’t tell you the coin has “memory”

Failure modes of ML
 Likelihood isn’t the only criterion for selecting a model or parameter
– Though it’s obviously an important one
 Bizarre models may have high likelihood
– Consider a speedometer reading 55 MPH
– Likelihood of “true speed = 55”: 10%
– Likelihood of “speedometer stuck”: 100%
 ML likes “fairy tales”
– In practice, exclude such hypotheses
– There must be a principled solution…

Gaussian likelihood
 With a 1-D Gaussian, the probability of observing the sample {x1, x2} is
p(x1, x2) = [1/(σ√(2π))] exp(−(x1 − μ)²/(2σ²)) · [1/(σ√(2π))] exp(−(x2 − μ)²/(2σ²))
– We are assuming that both observations were drawn independently from the same distribution (i.e., same mean and variance)
• IID (independent, identically distributed)

Parametric ML example
 Suppose your sample {x1, x2, …, xn} is drawn IID from a Gaussian
– What is the ML estimate of its parameters?
– We can maximize the log likelihood, which is
log L(μ, σ) = −n log(σ√(2π)) − (1/(2σ²)) Σᵢ (xᵢ − μ)²
– To maximize this we compute the derivatives with respect to μ and σ, set them to zero, and obtain
μ̂ = (1/n) Σᵢ xᵢ  and  σ̂² = (1/n) Σᵢ (xᵢ − μ̂)²

ML Parzen estimation
 What kernel maximizes the likelihood?
– Hint: it’s not very useful
 How do we make Parzen estimation actually work?
– Can we get all possible densities as answers?
• Do we even want to?
 Smooth kernels lead to smooth estimates
– Choice of kernel width (often called bandwidth) is thus critical
– Embodies an idea of what estimate we
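The coin likelihood from the slides above can be sketched in a few lines of Python (not from the lecture). With 51/100 heads the ML estimate of h is exactly 0.51, and with p = 0.67 (rather than exactly 2/3) the likelihood crossover against the fair coin falls between 58 and 59 heads, consistent with the slide's quoted 58.5:

```python
from math import comb

def likelihood(h, k, n=100):
    """Probability of observing k heads in n flips of a coin with P(heads) = h."""
    return comb(n, k) * h**k * (1 - h)**(n - k)

# With 51/100 heads, the ML estimate of h on a 0.01 grid is exactly 51/100.
Ls = [likelihood(h / 100, 51) for h in range(1, 100)]
h_ml = (Ls.index(max(Ls)) + 1) / 100
print(h_ml)  # 0.51

# Where the hypotheses "cross": below the crossover the fair coin explains
# the data better, above it the biased coin does.
print(likelihood(0.5, 58) > likelihood(0.67, 58))  # True
print(likelihood(0.5, 59) > likelihood(0.67, 59))  # False
```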
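The remark that the sample mean can't detect a coin with "memory" can be illustrated with a two-state Markov chain coin (foreshadowing the Markov chains in the lecture title). This sketch, its parameter names, and the seed are illustrative, not from the lecture: both coins have a long-run heads fraction near 1/2, but a run-count statistic separates them easily.

```python
import random

random.seed(0)

def sticky_coin_flips(n, stay=0.9):
    """Coin with memory: repeats its previous outcome with probability `stay`.
    By symmetry, its long-run fraction of heads is still 1/2."""
    flips = [random.random() < 0.5]
    for _ in range(n - 1):
        prev = flips[-1]
        flips.append(prev if random.random() < stay else not prev)
    return flips

fair = [random.random() < 0.5 for _ in range(100_000)]
sticky = sticky_coin_flips(100_000)

frac = lambda fs: sum(fs) / len(fs)                       # sample mean
runs = lambda fs: 1 + sum(a != b for a, b in zip(fs, fs[1:]))  # number of runs

print(frac(fair), frac(sticky))  # both near 0.5: the mean can't tell them apart
print(runs(fair), runs(sticky))  # the sticky coin has far fewer runs
```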
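The closed-form estimates from the "Parametric ML example" slide (sample mean, and sample variance with a 1/n factor) can be checked numerically. The sample values here are made up; the check confirms that perturbing either ML parameter lowers the log likelihood:

```python
import math

def log_likelihood(xs, mu, sigma):
    """Log likelihood of an IID sample under a 1-D Gaussian N(mu, sigma^2)."""
    n = len(xs)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) \
           - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

xs = [2.1, 1.9, 3.0, 2.4, 2.6]                         # made-up sample
mu_ml = sum(xs) / len(xs)                              # ML mean = sample mean
var_ml = sum((x - mu_ml) ** 2 for x in xs) / len(xs)   # note 1/n, not 1/(n-1)
sigma_ml = math.sqrt(var_ml)

# Any perturbation of the ML parameters lowers the log likelihood:
best = log_likelihood(xs, mu_ml, sigma_ml)
for dmu, dsig in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert log_likelihood(xs, mu_ml + dmu, sigma_ml + dsig) < best
```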
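The "not very useful" likelihood-maximizing kernel from the "ML Parzen estimation" slide can be seen by shrinking a Gaussian kernel's bandwidth: the estimate at the sample points blows up toward a spike at each data point, while points between samples get vanishing density. A minimal sketch with made-up 1-D data:

```python
import math

def parzen(x, data, h):
    """Parzen density estimate at x with a Gaussian kernel of bandwidth h."""
    n = len(data)
    return sum(math.exp(-(x - xi) ** 2 / (2 * h ** 2))
               for xi in data) / (n * h * math.sqrt(2 * math.pi))

data = [0.0, 1.0, 2.0]  # made-up sample

# As the bandwidth shrinks, the density at a sample point (x = 0.0) explodes,
# while a point between samples (x = 0.5) gets vanishing density -- the
# likelihood-maximizing "kernel" degenerates into spikes and is useless as
# a density estimate.
for h in [1.0, 0.1, 0.01]:
    print(h, parzen(0.0, data, h), parzen(0.5, data, h))
```

This is why the bandwidth must be chosen by some criterion other than raw likelihood on the training sample.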
