Statistics 550 Notes 11

Reading: Section 2.2.

Take-home midterm: I will e-mail it to you by Saturday, October 14th. It will be due Wednesday, October 25th by 5 p.m.

I. Maximum Likelihood

The method of maximum likelihood is an approach for estimating parameters in a "parametric" model, i.e., a model in which the family of possible distributions $\{p(\mathbf{x} \mid \theta),\ \theta \in \Theta\}$ has a parameter space $\Theta$ that is a subset of $\mathbb{R}^d$ for some finite $d$.

Motivating example: A box of Dunkin Donuts munchkins contains 12 munchkins. Each munchkin is either glazed or not glazed. Let $\theta$ denote the number of glazed munchkins in the box. To gain some information on $\theta$, you are allowed to select five of the munchkins from the box randomly without replacement and view them. Let $X$ denote the number of glazed munchkins in the sample. Suppose $X = 3$ of the munchkins in the sample are glazed. How should we estimate $\theta$?

Probability model: Imagine that the munchkins are numbered 1-12. A sample of five munchkins thus consists of five distinct numbers. All $\binom{12}{5} = 792$ samples are equally likely. The distribution of $X$ is hypergeometric:

$$P_\theta(X = x) = \frac{\binom{\theta}{x} \binom{12-\theta}{5-x}}{\binom{12}{5}}$$

The following table shows the probability distribution for $X$ given $\theta$ for each possible value of $\theta$ (rows: $\theta$ = number of glazed munchkins originally in the box; columns: $X$ = number of glazed munchkins in the sample):

θ \ X |  0      1      2      3      4      5
------+-------------------------------------------
  0   | 1      0      0      0      0      0
  1   | .5833  .4167  0      0      0      0
  2   | .3182  .5303  .1515  0      0      0
  3   | .1591  .4773  .3182  .0454  0      0
  4   | .0707  .3535  .4243  .1414  .0101  0
  5   | .0265  .2210  .4419  .2652  .0442  .0012
  6   | .0076  .1136  .3788  .3788  .1136  .0076
  7   | .0012  .0442  .2652  .4419  .2210  .0265
  8   | 0      .0101  .1414  .4243  .3535  .0707
  9   | 0      0      .0454  .3182  .4773  .1591
 10   | 0      0      0      .1515  .5303  .3182
 11   | 0      0      0      0      .4167  .5833
 12   | 0      0      0      0      0      1

Once we obtain the sample $X = 3$, what should we estimate $\theta$ to be? It is not clear how to apply the method of moments. We have $E_\theta(X) = \frac{5\theta}{12}$, but solving $\frac{5\hat\theta}{12} - 3 = 0$ gives $\hat\theta = 7.2$, which is not in the parameter space.

Maximum likelihood approach: We know that it is impossible that $\theta = 0, 1, 2, 11$ or $12$. The set of possible values for $\theta$ once we observe $X = 3$ is $\{3, 4, 5, 6, 7, 8, 9, 10\}$. Although both $\theta = 3$ and $\theta = 7$ are possible, the occurrence of $X = 3$ would be more "likely" if $\theta = 7$ [$P_{\theta=7}(X = 3) = .4419$] than if $\theta = 3$ [$P_{\theta=3}(X = 3) = .0454$]. Among $\theta = 3, 4, 5, 6, 7, 8, 9, 10$, the $\theta$ that makes the observed data $X = 3$ most "likely" is $\theta = 7$.

General definitions for the maximum likelihood estimator

The likelihood function is defined by

$$L_{\mathbf{x}}(\theta) = p(\mathbf{x} \mid \theta).$$

The likelihood function is just the joint probability mass function or probability density of the data, except that we treat it as a function of the parameter $\theta$. Thus $L_{\mathbf{x}} : \Theta \to [0, \infty)$. The likelihood function is not a probability mass function or a probability density function: in general, it is not true that $L_{\mathbf{x}}(\theta)$ integrates (or sums) to 1 with respect to $\theta$. In the motivating example, for $X = 3$,

$$\sum_{\theta \in \Theta} L_3(\theta) = 2.167.$$

The maximum likelihood estimator (the MLE), denoted by $\hat\theta_{MLE}$, is the value of $\theta$ that maximizes the likelihood:

$$\hat\theta_{MLE} = \arg\max_{\theta \in \Theta} L_{\mathbf{x}}(\theta).$$

For the motivating example, $\hat\theta_{MLE} = 7$.

Intuitively, the MLE is a reasonable choice for an estimator: the MLE is the parameter point for which the observed sample is most likely.

Equivalently, the MLE maximizes the log likelihood function $l_{\mathbf{x}}(\theta) = \log p(\mathbf{x} \mid \theta)$:

$$\hat\theta_{MLE} = \arg\max_{\theta \in \Theta} l_{\mathbf{x}}(\theta).$$
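Since the parameter space in the motivating example is finite, the MLE can be found by simply tabulating the likelihood. The following minimal Python sketch (our illustration, not part of the original notes; the function name `likelihood` and the variable names are ours) reproduces the $X = 3$ column of the table above, the MLE of 7, and the sum 2.167:

```python
from math import comb

def likelihood(theta, x=3, box=12, draws=5):
    """Hypergeometric likelihood L_x(theta) = P_theta(X = x)."""
    return comb(theta, x) * comb(box - theta, draws - x) / comb(box, draws)

L = {theta: likelihood(theta) for theta in range(13)}  # parameter space {0, 1, ..., 12}
theta_mle = max(L, key=L.get)       # arg max of the likelihood over the finite space
print(theta_mle)                    # 7
print(round(sum(L.values()), 3))    # 2.167: the likelihood does not sum to 1 over theta
```

Because the parameter space is finite, no calculus is needed: maximizing the likelihood is an exhaustive search over thirteen values.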
Example 2: Poisson distribution. Suppose $X_1, \ldots, X_n$ are iid Poisson($\lambda$). The log likelihood is

$$l_{\mathbf{x}}(\lambda) = \log \prod_{i=1}^n \frac{e^{-\lambda} \lambda^{X_i}}{X_i!} = -n\lambda + \left( \sum_{i=1}^n X_i \right) \log \lambda - \sum_{i=1}^n \log X_i!$$

To maximize the log likelihood, we set the first derivative of the log likelihood equal to zero:

$$l'(\lambda) = \frac{1}{\lambda} \sum_{i=1}^n X_i - n = 0.$$

$\bar{X}$ is the unique solution to this equation. To confirm that $\bar{X}$ in fact maximizes $l(\lambda)$, we can use the second derivative test:

$$l''(\lambda) = -\frac{1}{\lambda^2} \sum_{i=1}^n X_i.$$

$l''(\bar{X}) < 0$ as long as $\sum_{i=1}^n X_i > 0$, so that $\bar{X}$ in fact maximizes $l(\lambda)$. When $\sum_{i=1}^n X_i = 0$, it can be seen by inspection that $\hat\lambda = 0$ maximizes $l_{\mathbf{x}}(\lambda)$.

Example 3: Suppose $X_1, \ldots, X_n$ are iid Uniform$(0, \theta]$.

$$L_{\mathbf{x}}(\theta) = \begin{cases} 0 & \text{if } \max_i X_i > \theta \\ \frac{1}{\theta^n} & \text{if } \max_i X_i \le \theta \end{cases}$$

Thus, $\hat\theta_{MLE} = \max_i X_i$.

Recall that the method of moments estimator is $2\bar{X}$. In Notes 4, we showed that $\max_i X_i$ dominates $2\bar{X}$ for the squared error loss function (although $\max_i X_i$ is itself dominated by $\frac{n+1}{n} \max_i X_i$).

Key valuable features of maximum likelihood estimators:

1. The MLE is consistent.
2. The MLE is asymptotically normal: $\frac{\hat\theta_{MLE} - \theta}{\widehat{SE}(\hat\theta_{MLE})}$ converges in distribution to a standard normal distribution for a one-dimensional parameter.
3. The MLE is asymptotically optimal: roughly, this means that among all well-behaved estimators, the MLE has the smallest variance for large samples.

Motivation for maximum likelihood as a minimum contrast estimate: The Kullback-Leibler distance (information divergence) between two density functions $g$ and $f$ for a random variable $X$ that have the same support is

$$K(g, f) = E_f[\log(f(X)/g(X))] = \int \log[f(x)/g(x)] \, f(x) \, dx.$$

Note that by Jensen's inequality,

$$E_f[\log(f(X)/g(X))] = E_f[-\log(g(X)/f(X))] \ge -\log\{E_f[g(X)/f(X)]\} = -\log \int g(x) \, dx = 0,$$

where the inequality is strict if $f \neq g$ since $-\log$ is a strictly convex function. Also note that $K(f, f) = 0$. Thus, the Kullback-Leibler distance between $g$ and a fixed $f$ is minimized at $g = f$.

Suppose the family of models has the same support for each $\theta \in \Theta$ and that $\theta$ is identifiable. Consider the function $\rho(\mathbf{x}, \theta) = -l_{\mathbf{x}}(\theta)$. The discrepancy for this function is

$$D(\theta_0, \theta) = E_{\theta_0}[-\log p(\mathbf{X} \mid \theta)] = K(p(\cdot \mid \theta), p(\cdot \mid \theta_0)) - E_{\theta_0}[\log p(\mathbf{X} \mid \theta_0)].$$

By the results of the above paragraph, $\theta_0 = \arg\min_{\theta \in \Theta} D(\theta_0, \theta)$, so that $\rho(\mathbf{x}, \theta) = -l_{\mathbf{x}}(\theta)$ is a valid contrast function. The minimum contrast estimator associated with the contrast function $\rho(\mathbf{x}, \theta) = -l_{\mathbf{x}}(\theta)$ is

$$\hat\theta_{MLE} = \arg\min_{\theta \in \Theta} \left[ -l_{\mathbf{x}}(\theta) \right] = \arg\max_{\theta \in \Theta} l_{\mathbf{x}}(\theta).$$

Thus, the maximum likelihood estimator is a minimum contrast estimator for a contrast that is based on the Kullback-Leibler distance.

Consistency of maximum likelihood estimates: A basic desirable property of estimators is that they are consistent, i.e., converge to the true parameter when there is a "large" amount of data. The maximum likelihood estimator is generally, although not always, consistent. We prove a special case of consistency here.

Theorem: Consider the model in which $X_1, \ldots, X_n$ are iid with pmf or pdf $\{p(X_i \mid \theta), \theta \in \Theta\}$. Suppose (a) the parameter space $\Theta$ is finite; (b) $\theta$ is identifiable; and (c) the $p(X_i \mid \theta)$ have common support for all $\theta \in \Theta$. Then the maximum likelihood estimator is consistent: $\hat\theta_{MLE}$ converges in probability to the true parameter value as $n \to \infty$.
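As an informal illustration of consistency, here is a minimal simulation sketch (our own, not from the notes; the true values $\lambda = 3$ and $\theta = 2$ and the random seed are arbitrary choices, and the continuous parameter spaces of Examples 2 and 3 fall outside the finite-$\Theta$ setting of the theorem above). The MLEs from Examples 2 and 3 settle down toward the true parameters as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
lam, theta = 3.0, 2.0            # "true" parameter values, chosen for illustration

for n in (10, 100, 1_000, 10_000):
    x_pois = rng.poisson(lam, size=n)         # Example 2: the MLE is the sample mean
    x_unif = rng.uniform(0.0, theta, size=n)  # Example 3: the MLE is the sample maximum
    print(f"n={n:>6}  Poisson MLE = {x_pois.mean():.4f}  Uniform MLE = {x_unif.max():.4f}")
```

Note that the Uniform MLE $\max_i X_i$ approaches $\theta$ from below, since $\max_i X_i \le \theta$ for every sample; this is the downward bias that the estimator $\frac{n+1}{n} \max_i X_i$ mentioned above corrects.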