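Statistics 550 Notes 14

Reading: Section 2.3-2.4

I. Review from last class (Conditions for Uniqueness and Existence of the MLE).

Lemma 2.3.1: Suppose we are given a function $l: \Theta \to \mathbb{R}$, where $\Theta \subset \mathbb{R}^p$ is open and $l$ is continuous. Suppose also that
$$\lim \{ l(\boldsymbol{\theta}) : \boldsymbol{\theta} \to \partial\Theta \} = -\infty.$$
Then there exists $\hat{\boldsymbol{\theta}} \in \Theta$ such that
$$l(\hat{\boldsymbol{\theta}}) = \max \{ l(\boldsymbol{\theta}) : \boldsymbol{\theta} \in \Theta \}.$$

Proposition 2.3.1: Suppose our model is that $\mathbf{X}$ has pdf or pmf $p(\mathbf{x} | \boldsymbol{\theta})$, $\boldsymbol{\theta} \in \Theta$, and that
(i) $l_{\mathbf{x}}(\boldsymbol{\theta})$ is strictly concave;
(ii) $l_{\mathbf{x}}(\boldsymbol{\theta}) \to -\infty$ as $\boldsymbol{\theta} \to \partial\Theta$.
Then the maximum likelihood estimator exists and is unique.

Corollary: If the conditions of Proposition 2.3.1 are satisfied and $l_{\mathbf{x}}(\boldsymbol{\theta})$ is differentiable in $\boldsymbol{\theta}$, then $\hat{\boldsymbol{\theta}}_{MLE}$ is the unique solution to the estimating equation
$$\nabla_{\boldsymbol{\theta}} l_{\mathbf{x}}(\boldsymbol{\theta}) = 0.$$

Note: It is the strict concavity of $l_{\mathbf{x}}(\boldsymbol{\theta})$ that guarantees that $\nabla_{\boldsymbol{\theta}} l_{\mathbf{x}}(\boldsymbol{\theta}) = 0$ has a unique solution.

II. Application to Exponential Families.

1. Theorem 1.6.4, Corollary 1.6.5: For a full exponential family, the log likelihood is strictly concave.

Consider the exponential family
$$p(\mathbf{x} | \boldsymbol{\eta}) = h(\mathbf{x}) \exp\Big\{ \sum_{i=1}^k \eta_i T_i(\mathbf{x}) - A(\boldsymbol{\eta}) \Big\}.$$
Note that if $A(\boldsymbol{\eta})$ is convex, then the log likelihood
$$\log p(\mathbf{x} | \boldsymbol{\eta}) = \log h(\mathbf{x}) + \sum_{i=1}^k \eta_i T_i(\mathbf{x}) - A(\boldsymbol{\eta})$$
is concave in $\boldsymbol{\eta}$.

Proof that $A(\boldsymbol{\eta})$ is convex: Recall that
$$A(\boldsymbol{\eta}) = \log \int \cdots \int h(\mathbf{x}) \exp\Big[ \sum_{i=1}^k \eta_i T_i(\mathbf{x}) \Big] d\mathbf{x}.$$
To show that $A(\boldsymbol{\eta})$ is convex, we want to show that
$$A(\alpha \boldsymbol{\eta}_1 + (1-\alpha) \boldsymbol{\eta}_2) \le \alpha A(\boldsymbol{\eta}_1) + (1-\alpha) A(\boldsymbol{\eta}_2) \quad \text{for } 0 \le \alpha \le 1,$$
or equivalently
$$\exp\{ A(\alpha \boldsymbol{\eta}_1 + (1-\alpha) \boldsymbol{\eta}_2) \} \le \exp\{ \alpha A(\boldsymbol{\eta}_1) \} \exp\{ (1-\alpha) A(\boldsymbol{\eta}_2) \}.$$
We use Hölder's Inequality to establish this.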
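Hölder's Inequality (B.9.4 on page 518 of Bickel and Doksum) states that for any two numbers $r$ and $s$ with $r > 1$, $s > 1$, $r^{-1} + s^{-1} = 1$,
$$E|XY| \le \{ E|X|^r \}^{1/r} \{ E|Y|^s \}^{1/s}.$$
More generally, for a nonnegative function $h$, Hölder's inequality states
$$\int |f(x) g(x)| h(x)\, dx \le \Big\{ \int |f(x)|^r h(x)\, dx \Big\}^{1/r} \Big\{ \int |g(x)|^s h(x)\, dx \Big\}^{1/s}.$$

Write $\eta_{ji}$ for the $i$th component of $\boldsymbol{\eta}_j$, $j = 1, 2$. Applying Hölder's inequality with $r = 1/\alpha$ and $s = 1/(1-\alpha)$ for $0 < \alpha < 1$ (the cases $\alpha = 0$ and $\alpha = 1$ are trivial), we have
$$\begin{aligned}
\exp\{ A(\alpha \boldsymbol{\eta}_1 + (1-\alpha) \boldsymbol{\eta}_2) \}
&= \int \exp\Big[ \sum_{i=1}^k (\alpha \eta_{1i} + (1-\alpha) \eta_{2i}) T_i(\mathbf{x}) \Big] h(\mathbf{x})\, d\mathbf{x} \\
&= \int \exp\Big[ \alpha \sum_{i=1}^k \eta_{1i} T_i(\mathbf{x}) \Big] \exp\Big[ (1-\alpha) \sum_{i=1}^k \eta_{2i} T_i(\mathbf{x}) \Big] h(\mathbf{x})\, d\mathbf{x} \\
&\le \Big\{ \int \Big( \exp\Big[ \alpha \sum_{i=1}^k \eta_{1i} T_i(\mathbf{x}) \Big] \Big)^{1/\alpha} h(\mathbf{x})\, d\mathbf{x} \Big\}^{\alpha} \Big\{ \int \Big( \exp\Big[ (1-\alpha) \sum_{i=1}^k \eta_{2i} T_i(\mathbf{x}) \Big] \Big)^{1/(1-\alpha)} h(\mathbf{x})\, d\mathbf{x} \Big\}^{1-\alpha} \\
&= \exp\{ \alpha A(\boldsymbol{\eta}_1) \} \exp\{ (1-\alpha) A(\boldsymbol{\eta}_2) \}.
\end{aligned}$$

For a full exponential family, the log likelihood is strictly concave. For a curved exponential family, the log likelihood is concave but not strictly concave.

2. Theorem 2.3.1, Corollary 2.3.2 spell out specific conditions under which $l_{\mathbf{x}}(\boldsymbol{\theta}) \to -\infty$ as $\boldsymbol{\theta} \to \partial\Theta$ for exponential families.

Example 1: Gamma distribution
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha - 1} e^{-x/\beta}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$
$$l(\alpha, \beta) = \sum_{i=1}^n \big[ -\log \Gamma(\alpha) - \alpha \log \beta + (\alpha - 1) \log X_i - X_i / \beta \big]$$
for the parameter space $\alpha > 0$, $\beta > 0$.

The gamma distribution is a full two-dimensional exponential family, so the log likelihood is strictly concave in the natural parameters.

The boundary of the parameter space is
$$\partial\Theta = \{ (\alpha, \beta) : \alpha = \infty, 0 \le \beta \le \infty \} \cup \{ (\alpha, \beta) : \alpha = 0, 0 \le \beta \le \infty \} \cup \{ (\alpha, \beta) : 0 \le \alpha \le \infty, \beta = \infty \} \cup \{ (\alpha, \beta) : 0 \le \alpha \le \infty, \beta = 0 \}.$$
Can check that $\lim \{ l(\boldsymbol{\theta}) : \boldsymbol{\theta} \to \partial\Theta \} = -\infty$. Thus, by Proposition 2.3.1, the MLE is the unique solution to the likelihood equation.

The partial derivatives of the log likelihood are
$$\frac{\partial l}{\partial \alpha} = \sum_{i=1}^n \Big[ -\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \log \beta + \log X_i \Big], \qquad
\frac{\partial l}{\partial \beta} = \sum_{i=1}^n \Big[ -\frac{\alpha}{\beta} + \frac{X_i}{\beta^2} \Big].$$
Setting the second partial derivative equal to zero, we find
$$\hat{\beta}_{MLE} = \frac{\sum_{i=1}^n X_i}{n \hat{\alpha}_{MLE}}.$$
When this solution is substituted into the first partial derivative, we obtain a nonlinear equation for the MLE of $\alpha$:
$$-n \frac{\Gamma'(\hat{\alpha})}{\Gamma(\hat{\alpha})} - n \log\Big( \frac{\sum_{i=1}^n X_i}{n} \Big) + n \log \hat{\alpha} + \sum_{i=1}^n \log X_i = 0.$$
This equation cannot be solved in closed form.

III. Numerical Methods for Finding MLEs

The Bisection Method

The bisection method is a method for finding the root of a one-dimensional function $f$ that is continuous on $(a, b)$ with $f(a) < 0 < f(b)$ and for which $f$ is increasing (an analogous method can be used for $f$ decreasing).

Note: There is a root $x^*$ with $f(x^*) = 0$ by the intermediate value theorem.

Bisection Algorithm: Decide on a tolerance $\epsilon > 0$ for $|x_{final} - x^*|$. Stop the algorithm when we find $x_{final}$.
1. Find $x_0, x_1$ such that $f(x_0) < 0$, $f(x_1) > 0$. Initialize $x^+_{old} = x_1$, $x^-_{old} = x_0$.
2. If $|x^+_{old} - x^-_{old}| < 2\epsilon$, set $x_{final} = \frac{1}{2}(x^+_{old} + x^-_{old})$ and return $x_{final}$. Else set $x_{new} = \frac{1}{2}(x^+_{old} + x^-_{old})$.
3. If $f(x_{new}) = 0$, set $x_{final} = x_{new}$. If $f(x_{new}) < 0$, set $x^-_{old} = x_{new}$ and go to step 2. If $f(x_{new}) > 0$, set $x^+_{old} = x_{new}$ and go to step 2.

Lemma 2.4.1: The bisection algorithm stops at a solution $x_{final}$ such that $|x_{final} - x^*| \le \epsilon$.

Proof: If $x_m$ is the $m$th iterate of $x_{new}$,
$$|x_m - x_{m-1}| \le \frac{1}{2} |x_{m-1} - x_{m-2}| \le \cdots \le \frac{1}{2^{m-1}} |x_1 - x_0|.$$
Moreover, by the intermediate value theorem, $x^*$ always lies between the current $x^-_{old}$ and $x^+_{old}$, and $x_{m+1}$ is the midpoint of that interval once $x_m$ has been computed. Therefore,
$$|x_{m+1} - x^*| \le 2^{-m} |x_1 - x_0|.$$
For $m \ge \log_2(|x_1 - x_0| / \epsilon)$, we have $|x_{m+1} - x^*| \le \epsilon$.

Note: Bisection can be much more efficient than the approach of specifying a grid of points between $a$ and $b$ and evaluating $f$ at each grid point. To find the root to within $\epsilon$, a grid of size $|x_1 - x_0| / \epsilon$ is required, while bisection requires only about $\log_2(|x_1 - x_0| / \epsilon)$ evaluations of $f$.

Coordinate Ascent Method

The coordinate ascent method is an approach to finding the maximum likelihood estimate in a multidimensional family. Suppose we have a $k$-dimensional parameter $(\theta_1, \ldots, \theta_k)$. The coordinate ascent method is:

Choose an initial estimate $(\hat{\theta}_1, \ldots, \hat{\theta}_k)$.
0. Set $(\hat{\theta}_1, \ldots, \hat{\theta}_k)_{old} = (\hat{\theta}_1, \ldots, \hat{\theta}_k)$.
1. Maximize $l_{\mathbf{x}}(\theta_1, \hat{\theta}_2, \ldots, \hat{\theta}_k)$ over $\theta_1$ using the bisection method by solving …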
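
As an illustration of the bisection method described earlier in these notes, the following Python sketch applies it to the gamma likelihood equation from Example 1. It is only a sketch under stated assumptions: the function names (`digamma`, `bisect_increasing`, `gamma_mle`), the bracket search by halving and doubling, and the sample `data` are hypothetical and not part of the notes. The digamma function $\Gamma'(\alpha)/\Gamma(\alpha)$ is approximated by a standard recurrence plus asymptotic series, and the likelihood equation is divided by $-n$ so that the function passed to bisection is increasing in $\alpha$.

```python
import math


def digamma(x):
    """Approximate Gamma'(x)/Gamma(x) for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    while x < 6.0:               # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def bisect_increasing(f, x0, x1, eps):
    """Bisection for an increasing f with f(x0) < 0 < f(x1), tolerance eps."""
    lo, hi = x0, x1              # play the roles of x_old^- and x_old^+
    if not (f(lo) < 0 < f(hi)):
        raise ValueError("need f(x0) < 0 < f(x1)")
    while abs(hi - lo) >= 2 * eps:
        mid = 0.5 * (lo + hi)    # x_new
        f_mid = f(mid)
        if f_mid == 0:
            return mid
        if f_mid < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)       # x_final


def gamma_mle(xs, eps=1e-8):
    """Return (alpha_hat, beta_hat); assumes positive, not all equal, xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(math.log(x) for x in xs) / n
    c = math.log(mean_x) - mean_log   # > 0 by Jensen's inequality

    def score(alpha):
        # Likelihood equation for alpha divided by -n; increasing in alpha.
        return digamma(alpha) - math.log(alpha) + c

    lo, hi = 1.0, 1.0
    while score(lo) >= 0:        # hypothetical bracket search: halve
        lo /= 2.0
    while score(hi) <= 0:        # hypothetical bracket search: double
        hi *= 2.0
    alpha_hat = bisect_increasing(score, lo, hi, eps)
    return alpha_hat, mean_x / alpha_hat


# Hypothetical sample, for illustration only.
data = [1.2, 0.7, 3.4, 2.1, 0.9, 1.8, 2.6, 1.1]
alpha_hat, beta_hat = gamma_mle(data)
print(alpha_hat, beta_hat)
```

The second component uses the closed-form relation $\hat{\beta}_{MLE} = \sum X_i / (n \hat{\alpha}_{MLE})$, so only a one-dimensional root search is needed.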