
Statistics 550 Notes 14

Reading: Section 2.3-2.4

I. Review from last class (Conditions for Uniqueness and Existence of the MLE).

Lemma 2.3.1: Suppose we are given a function $l: \Theta \to \mathbb{R}$, where $\Theta \subset \mathbb{R}^p$ is open and $l$ is continuous. Suppose also that
$$\lim \{ l(\theta) : \theta \to \partial\Theta \} = -\infty.$$
Then there exists $\hat{\theta} \in \Theta$ such that
$$l(\hat{\theta}) = \max \{ l(\theta) : \theta \in \Theta \}.$$

Proposition 2.3.1: Suppose our model is that $X$ has pdf or pmf $p(x \mid \theta)$, $\theta \in \Theta$, and that
(i) $l_x(\theta)$ is strictly concave;
(ii) $l_x(\theta) \to -\infty$ as $\theta \to \partial\Theta$.
Then the maximum likelihood estimator exists and is unique.

Corollary: If the conditions of Proposition 2.3.1 are satisfied and $l_x(\theta)$ is differentiable in $\theta$, then $\hat{\theta}_{MLE}$ is the unique solution to the estimating equation
$$\nabla_\theta l_x(\theta) = 0.$$

Note: It is the strict concavity of $l_x(\theta)$ that guarantees that $\nabla_\theta l_x(\theta) = 0$ has a unique solution.

II. Application to Exponential Families.

1. Theorem 1.6.4, Corollary 1.6.5: For a full exponential family, the log likelihood is strictly concave.

Consider the exponential family
$$p(x \mid \eta) = h(x) \exp\Big\{ \sum_{i=1}^{k} \eta_i T_i(x) - A(\eta) \Big\}.$$
Note that if $A(\eta)$ is convex, then the log likelihood
$$\log p(x \mid \eta) = \log h(x) + \sum_{i=1}^{k} \eta_i T_i(x) - A(\eta)$$
is concave in $\eta$.

Proof that $A(\eta)$ is convex: Recall that
$$A(\eta) = \log \int \cdots \int h(x) \exp\Big[ \sum_{i=1}^{k} \eta_i T_i(x) \Big] dx.$$
To show that $A(\eta)$ is convex, we want to show that
$$A(\alpha \eta_1 + (1-\alpha) \eta_2) \le \alpha A(\eta_1) + (1-\alpha) A(\eta_2) \quad \text{for } 0 \le \alpha \le 1,$$
or equivalently
$$\exp\{ A(\alpha \eta_1 + (1-\alpha) \eta_2) \} \le \exp\{ \alpha A(\eta_1) \} \exp\{ (1-\alpha) A(\eta_2) \}.$$
Here $\eta_1$ and $\eta_2$ are two parameter vectors, with components $\eta_{1i}$ and $\eta_{2i}$. We use Hölder's Inequality to establish this.
Hölder's Inequality (B.9.4 on page 518 of Bickel and Doksum) states that for any two numbers $r$ and $s$ with $r > 1$, $s > 1$, $r^{-1} + s^{-1} = 1$,
$$E|XY| \le \{ E|X|^r \}^{1/r} \{ E|Y|^s \}^{1/s}.$$
More generally, Hölder's inequality states
$$\int |f(x) g(x)| h(x)\, dx \le \Big\{ \int |f(x)|^r h(x)\, dx \Big\}^{1/r} \Big\{ \int |g(x)|^s h(x)\, dx \Big\}^{1/s}.$$

We have, applying Hölder's inequality with $r = 1/\alpha$ and $s = 1/(1-\alpha)$ for $0 < \alpha < 1$,
$$\begin{aligned}
\exp\{ A(\alpha \eta_1 + (1-\alpha) \eta_2) \}
&= \int \exp\Big[ \sum_{i=1}^{k} (\alpha \eta_{1i} + (1-\alpha) \eta_{2i}) T_i(x) \Big] h(x)\, dx \\
&= \int \exp\Big[ \alpha \sum_{i=1}^{k} \eta_{1i} T_i(x) \Big] \exp\Big[ (1-\alpha) \sum_{i=1}^{k} \eta_{2i} T_i(x) \Big] h(x)\, dx \\
&\le \Big( \int \Big( \exp\Big[ \alpha \sum_{i=1}^{k} \eta_{1i} T_i(x) \Big] \Big)^{1/\alpha} h(x)\, dx \Big)^{\alpha} \Big( \int \Big( \exp\Big[ (1-\alpha) \sum_{i=1}^{k} \eta_{2i} T_i(x) \Big] \Big)^{1/(1-\alpha)} h(x)\, dx \Big)^{1-\alpha} \\
&= \exp\{ \alpha A(\eta_1) \} \exp\{ (1-\alpha) A(\eta_2) \}.
\end{aligned}$$
The cases $\alpha = 0$ and $\alpha = 1$ hold with equality.

For a full exponential family, the log likelihood is strictly concave. For a curved exponential family, the log likelihood is concave but not strictly concave.

2. Theorem 2.3.1, Corollary 2.3.2 spell out specific conditions under which $l_x(\theta) \to -\infty$ as $\theta \to \partial\Theta$ for exponential families.

Example 1: Gamma distribution
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha-1} e^{-x/\beta}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$
$$l(\alpha, \beta) = \sum_{i=1}^{n} \big[ -\log \Gamma(\alpha) - \alpha \log \beta + (\alpha - 1) \log X_i - X_i/\beta \big]$$
for the parameter space $\alpha > 0$, $\beta > 0$.

The gamma distribution is a full two-dimensional exponential family, so the log likelihood is strictly concave (as a function of the natural parameters).

The boundary of the parameter space is
$$\partial\Theta = \{ (\alpha, \beta) : \alpha = \infty, 0 \le \beta \le \infty \} \cup \{ (\alpha, \beta) : \alpha = 0, 0 \le \beta \le \infty \} \cup \{ (\alpha, \beta) : 0 \le \alpha \le \infty, \beta = \infty \} \cup \{ (\alpha, \beta) : 0 \le \alpha \le \infty, \beta = 0 \}.$$
One can check that $\lim \{ l(\theta) : \theta \to \partial\Theta \} = -\infty$. Thus, by Proposition 2.3.1, the MLE is the unique solution to the likelihood equation.

The partial derivatives of the log likelihood are
$$\frac{\partial l}{\partial \alpha} = \sum_{i=1}^{n} \Big[ -\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \log \beta + \log X_i \Big]$$
$$\frac{\partial l}{\partial \beta} = \sum_{i=1}^{n} \Big[ -\frac{\alpha}{\beta} + \frac{X_i}{\beta^2} \Big]$$
Setting the second partial derivative equal to zero, we find
$$\hat{\beta}_{MLE} = \frac{\sum_{i=1}^{n} X_i}{n \hat{\alpha}_{MLE}}.$$
When this solution is substituted into the first partial derivative, we obtain a nonlinear equation for the MLE of $\alpha$:
$$-n \frac{\Gamma'(\hat{\alpha}_{MLE})}{\Gamma(\hat{\alpha}_{MLE})} - n \log \Big( \frac{\sum_{i=1}^{n} X_i}{n} \Big) + n \log \hat{\alpha}_{MLE} + \sum_{i=1}^{n} \log X_i = 0.$$
This equation cannot be solved in closed form.

III. Numerical Methods for Finding MLEs

The Bisection Method

The bisection method finds the root of a one-dimensional function $f$ that is continuous on $(a, b)$, satisfies $f(a) < 0 < f(b)$, and is increasing (an analogous method can be used for $f$ decreasing).

Note: There is a root $f(x^*) = 0$ by the intermediate value theorem.

Bisection Algorithm:
Decide on a tolerance $\epsilon > 0$ for $|x_{final} - x^*|$. Stop the algorithm when we find $x_{final}$.
1. Find $x_0, x_1$ such that $f(x_0) < 0$, $f(x_1) > 0$. Initialize $x_{old}^{+} = x_1$, $x_{old}^{-} = x_0$.
2. If $|x_{old}^{+} - x_{old}^{-}| < 2\epsilon$, set $x_{final} = \frac{1}{2}(x_{old}^{+} + x_{old}^{-})$ and return $x_{final}$. Else set $x_{new} = \frac{1}{2}(x_{old}^{+} + x_{old}^{-})$.
3. If $f(x_{new}) = 0$, set $x_{final} = x_{new}$. If $f(x_{new}) < 0$, set $x_{old}^{-} = x_{new}$ and go to step 2. If $f(x_{new}) > 0$, set $x_{old}^{+} = x_{new}$ and go to step 2.

Lemma 2.4.1: The bisection algorithm stops at a solution $x_{final}$ such that $|x_{final} - x^*| \le \epsilon$.

Proof: If $x_m$ is the $m$th iterate of $x_{new}$,
$$|x_m - x_{m-1}| \le \frac{1}{2} |x_{m-1} - x_{m-2}| \le \cdots \le \frac{1}{2^{m-1}} |x_1 - x_0|.$$
Moreover, by the intermediate value theorem,
$$x_m \le x^* \le x_{m+1} \quad \text{for all } m.$$
Therefore,
$$|x_{m+1} - x^*| \le 2^{-m} |x_1 - x_0|.$$
For $m = \log_2(|x_1 - x_0| / \epsilon)$, we have $|x_{m+1} - x^*| \le \epsilon$.

Note: Bisection can be much more efficient than specifying a grid of points between $a$ and $b$ and evaluating $f$ at each grid point: to find the root to within $\epsilon$, a grid of size $|x_1 - x_0| / \epsilon$ is required, while bisection requires only $\log_2(|x_1 - x_0| / \epsilon)$ evaluations of $f$.

Coordinate Ascent Method

The coordinate ascent method is an approach to finding the maximum likelihood estimate in a multidimensional family. Suppose we have a $k$-dimensional parameter $(\theta_1, \ldots, \theta_k)$. The coordinate ascent method is:
Choose an initial estimate $(\hat{\theta}_1, \ldots, \hat{\theta}_k)$.
0. Set $(\hat{\theta}_1, \ldots, \hat{\theta}_k)_{old} = (\hat{\theta}_1, \ldots, \hat{\theta}_k)$.
1. Maximize $l_x(\theta_1, \hat{\theta}_2, \ldots, \hat{\theta}_k)$ over $\theta_1$ using the bisection method by solving …
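As an illustration of the bisection algorithm, here is a minimal Python sketch applied to the gamma likelihood equation for $\alpha$ from Example 1. It is not part of the original notes. The function names (`bisect_increasing`, `digamma`, `gamma_mle`), the bracketing strategy, and the sample data are all hypothetical choices made for this example. The digamma function $\Gamma'(\alpha)/\Gamma(\alpha)$ is approximated with a standard recurrence plus asymptotic series, since the Python standard library does not provide it. The likelihood equation is divided by $-n$ so that the function passed to bisection is increasing in $\alpha$.

```python
import math

def bisect_increasing(f, x0, x1, eps=1e-8):
    """Root of an increasing continuous f with f(x0) < 0 < f(x1), to within eps."""
    if not (f(x0) < 0 < f(x1)):
        raise ValueError("need f(x0) < 0 < f(x1)")
    lo, hi = x0, x1                      # play the roles of x_old^- and x_old^+
    while abs(hi - lo) >= 2 * eps:       # step 2: stop once the bracket is short
        mid = 0.5 * (lo + hi)            # x_new
        fm = f(mid)
        if fm == 0:                      # step 3: exact root found
            return mid
        if fm < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)               # x_final

def digamma(x):
    """Approximate Gamma'(x)/Gamma(x) for x > 0 (recurrence + asymptotic series)."""
    total = 0.0
    while x < 6.0:                       # psi(x) = psi(x + 1) - 1/x
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return total + math.log(x) - 0.5 / x - series

def gamma_mle(xs, eps=1e-10):
    """Return (alpha_hat, beta_hat) for an i.i.d. gamma sample xs (all positive)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(math.log(x) for x in xs) / n
    c = math.log(mean_x) - mean_log      # >= 0 by Jensen; 0 only if all xs are equal
    if c <= 0:
        raise ValueError("sample must contain at least two distinct values")
    # Likelihood equation divided by -n; increasing in alpha.
    h = lambda a: digamma(a) - math.log(a) + c
    x0 = 1e-3
    while h(x0) >= 0:                    # shrink until h(x0) < 0
        x0 /= 2
    x1 = 1.0
    while h(x1) <= 0:                    # grow until h(x1) > 0
        x1 *= 2
    alpha_hat = bisect_increasing(h, x0, x1, eps)
    return alpha_hat, mean_x / alpha_hat  # beta_hat = sum(X_i) / (n * alpha_hat)

# Hypothetical data, for illustration only.
sample = [2.1, 0.7, 1.5, 3.3, 0.9, 2.8, 1.2, 4.0]
alpha_hat, beta_hat = gamma_mle(sample)
print(alpha_hat, beta_hat)
```

The bracket search in `gamma_mle` is a convenience assumed for this sketch; the algorithm in the notes simply takes $x_0$ and $x_1$ as given.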



Penn STAT 550 - STAT 550 Notes
