Statistics 550 Notes 19

Reading: Sections 3.5 and 4.1

I. Review of the Information Inequality for Unbiased Estimators

The Fisher information number $I(\theta)$ is defined as
$$ I(\theta) = E_\theta\left[ \left( \frac{\partial}{\partial\theta} \log p(\mathbf{X} \mid \theta) \right)^2 \right]. $$

Information Inequality Applied to Unbiased Estimators: Suppose the regularity conditions I and II from Notes 18 hold for the model $\mathbf{X} \sim p(\mathbf{x} \mid \theta)$, $\theta \in \Theta$. Let $T(\mathbf{X})$ be any unbiased estimate of $\theta$. Then
$$ \mathrm{Var}_\theta(T(\mathbf{X})) \geq \frac{1}{I(\theta)}. $$

Note on Attainment of the Lower Bound: Using the conditions for equality in the Cauchy-Schwarz inequality (on which the proof of the information inequality was based), we can show that the lower bound is attained only in exponential families (Theorem 3.4.2).

II. The Information Inequality and Asymptotic Optimality of the MLE

Theorem 2 of Notes 12 concerned the asymptotic variance of the MLE. We restate the result of that theorem in terms of the Fisher information number as defined in these notes. Consider $X_1, \ldots, X_n$ iid from a distribution $p(X_i \mid \theta)$, $\theta \in \Theta$, which satisfies Assumptions I and II. Let
$$ I_1(\theta) = \mathrm{Var}_\theta\left( \frac{d}{d\theta} \log p(X_1 \mid \theta) \right). $$
Theorem 2 of Notes 12 can be rewritten as

Theorem 2': Under "regularity conditions" (including Assumptions I and II),
$$ \sqrt{n}\,\bigl(\hat{\theta}_{MLE} - \theta_0\bigr) \xrightarrow{L} N\!\left(0, \frac{1}{I_1(\theta_0)}\right). $$

The relationship between $I(\theta)$ and $I_1(\theta)$ is
$$ I(\theta) = \mathrm{Var}_\theta\left( \frac{d}{d\theta} \log p(\mathbf{X} \mid \theta) \right) = \mathrm{Var}_\theta\left( \sum_{i=1}^n \frac{d}{d\theta} \log p(X_i \mid \theta) \right) = \sum_{i=1}^n \mathrm{Var}_\theta\left( \frac{d}{d\theta} \log p(X_i \mid \theta) \right) = n\,\mathrm{Var}_\theta\left( \frac{d}{d\theta} \log p(X_1 \mid \theta) \right) = n I_1(\theta). $$

Thus, from Theorem 2', we have that for large $n$, $\hat{\theta}_{MLE}$ is approximately unbiased and has variance $\frac{1}{n I_1(\theta)} = \frac{1}{I(\theta)}$. By the information inequality, the minimum variance of an unbiased estimator is $\frac{1}{I(\theta)}$. Thus the MLE approximately achieves the lower bound of the information inequality.
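The claim that the MLE's variance approaches $1/(n I_1(\theta))$ can be checked numerically. The following is an illustrative sketch (not part of the notes) for Bernoulli($p$) data, where the MLE of $p$ is the sample mean and $I_1(p) = 1/(p(1-p))$:

```python
# Illustrative sketch: compare the simulated variance of the MLE for
# Bernoulli(p) data with the information lower bound 1/(n * I_1(p)).
import random

random.seed(0)
p, n, reps = 0.3, 200, 5000

mles = []
for _ in range(reps):
    x = [1 if random.random() < p else 0 for _ in range(n)]
    mles.append(sum(x) / n)          # MLE of p for iid Bernoulli trials

mean_mle = sum(mles) / reps
var_mle = sum((m - mean_mle) ** 2 for m in mles) / reps

I1 = 1.0 / (p * (1 - p))             # Fisher information of one observation
bound = 1.0 / (n * I1)               # information (Cramer-Rao) lower bound

print(f"simulated Var(MLE) = {var_mle:.6f}")
print(f"1/(n I_1(p))       = {bound:.6f}")
```

For the Bernoulli model the MLE is exactly unbiased with variance $p(1-p)/n$, so the two printed numbers should agree up to simulation noise; for other models the agreement is only asymptotic.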
This suggests that for large $n$, among all consistent estimators (which are approximately unbiased for large $n$), the MLE achieves approximately the lowest variance and is hence asymptotically optimal.

Note: Making precise the sense in which the MLE is asymptotically optimal took many years of brilliant work by Lucien Le Cam and other mathematical statisticians. It will be covered in Stat 552.

III. Robustness (Chapter 3.5)

Thus far, we have evaluated the performance of estimators assuming the underlying model is the correct one. Under this assumption, we have derived estimators that are optimal in some sense. However, if the underlying model is not correct, then we cannot be guaranteed the optimality of our estimator.

We cannot guard against all possible situations, and moreover, if our model is arrived at through careful consideration, we shouldn't have to. But we may be concerned about small- or medium-sized deviations from our assumed model. This leads us to the consideration of robust estimators. Such estimators give up optimality at the assumed model in exchange for reasonable performance if the assumed model is not the true model.

Peter Huber argued that any statistical procedure should possess the following desirable features:
(1) It should have reasonably good (optimal or nearly optimal) efficiency at the assumed model.
(2) It should be robust in the sense that small deviations from the model assumptions impair the performance only slightly.
(3) Somewhat larger deviations from the model should not cause a catastrophe.

For item (3), we consider the following measure of what happens if there are unusually aberrant observations. Consider an estimator $T(X_1, \ldots, X_n)$ for a sample $\mathbf{X}_n = (X_1, \ldots, X_n)$. Suppose we corrupt $m$ data points of the sample, i.e., we replace $X_1, \ldots, X_m$ by $X_1^*, \ldots, X_m^*$, where these points are large outliers. Let $\mathbf{X}_m = (X_1^*, \ldots, X_m^*, X_{m+1}, \ldots, X_n)$.
Define the bias of the estimator upon corrupting $m$ data points to be
$$ \mathrm{bias}(m, T, \mathbf{X}_n) = \sup |T(\mathbf{X}_m) - T(\mathbf{X}_n)|, $$
where the sup is taken over all possible corrupted samples $\mathbf{X}_m$. If this bias is infinite, we say that the estimator has broken down. The smallest proportion of corruption an estimator can tolerate before it breaks down is called its finite sample breakdown point. More precisely, if
$$ \epsilon_n^* = \min\{ m/n : \mathrm{bias}(m, T, \mathbf{X}_n) = \infty \}, $$
then $\epsilon_n^*$ is called the finite sample breakdown point of $T(X_1, \ldots, X_n)$. If, as $n \rightarrow \infty$, the limit $\epsilon_n^* \rightarrow \epsilon^*$ exists, we call $\epsilon^*$ the breakdown point of the estimator $T$.

Example: Consider $X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$. The sample mean $\bar{X}$ is the MLE (as well as minimax and a limit of Bayes estimators). However, the breakdown point of $\bar{X}$ is 0. In contrast, the finite sample breakdown point of the sample median is 0.5 for an even-sized sample and $(n/2 + 0.5)/n$ for an odd-sized sample. Thus, the breakdown point of the sample median is 0.5. The tradeoff is that as $n \rightarrow \infty$, the ratio of the variance of the sample median to that of the sample mean converges to $\pi/2 \approx 1.57$ (see Notes 12).

A compromise between the sample median and the sample mean is the $\alpha$-trimmed mean,
$$ \bar{X}_\alpha = \frac{X_{([n\alpha]+1)} + \cdots + X_{(n-[n\alpha])}}{n - 2[n\alpha]}, $$
where $[n\alpha]$ is the largest integer $\leq n\alpha$ and $X_{(1)} < \cdots < X_{(n)}$ are the ordered observations. That is, we throw out the "outer" $[n\alpha]$ observations on each side and take the mean of the rest. The breakdown value of $\bar{X}_\alpha$ is $\alpha$.

Simulation study of size 80,000 from $X_1, \ldots, X_n \sim N(0,1)$:

Estimator          Variance of estimator / Variance of $\bar{X}$
$\bar{X}$          1
$\bar{X}_{0.1}$    1.06
$\bar{X}_{0.2}$    1.14
Sample median      1.55

IV. Hypothesis Testing (Chapter 4.1)

Motivating Example: A graphologist claims to be able to distinguish the writing of a schizophrenic person from that of a nonschizophrenic person. The graphologist is given a set of 10 folders, each containing handwriting samples of two persons, one nonschizophrenic and the other schizophrenic. Her task is to identify which of the writings are the work of the schizophrenics.
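A simulation in the spirit of the study above can be sketched as follows. This is an illustrative sketch (not the notes' original simulation, and with smaller settings than size 80,000), computing the $\alpha$-trimmed mean as defined above and comparing variance ratios against the sample mean on $N(0,1)$ samples:

```python
# Illustrative sketch: variance of trimmed means and the sample median
# relative to the sample mean, for iid N(0,1) samples.
import random
import statistics

random.seed(1)

def trimmed_mean(xs, alpha):
    """Drop the outer [n*alpha] observations on each side, average the rest."""
    xs = sorted(xs)
    k = int(len(xs) * alpha)          # [n*alpha], largest integer <= n*alpha
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return sum(kept) / len(kept)

n, reps = 20, 20000
estimates = {"mean": [], "trim_0.1": [], "trim_0.2": [], "median": []}
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    estimates["mean"].append(sum(x) / n)
    estimates["trim_0.1"].append(trimmed_mean(x, 0.1))
    estimates["trim_0.2"].append(trimmed_mean(x, 0.2))
    estimates["median"].append(statistics.median(x))

base = statistics.pvariance(estimates["mean"])
for name, vals in estimates.items():
    ratio = statistics.pvariance(vals) / base
    print(f"{name:9s} variance ratio vs sample mean: {ratio:.2f}")
```

With these finite settings the printed ratios will differ somewhat from the table's large-sample values, but the ordering (mean, then trimmed means, then median) should be visible.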
When this experiment was actually performed, the graphologist made 6 correct identifications (Pascal and Suttell, 1947, Journal of Personality). Is there strong evidence that the graphologist can identify the writing of schizophrenics better than a person who was randomly guessing would?

Probability model: Let $p$ be the probability that the graphologist successfully identifies the writing of a randomly chosen schizophrenic vs. nonschizophrenic pair. A reasonable model is that the 10 trials are iid Bernoulli with probability of success $p$.

Hypotheses: A randomly guessing person would
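The notes break off before stating the hypotheses, but under the Bernoulli model a natural numerical companion is the probability of doing at least as well as the graphologist by pure guessing. This is an illustrative sketch (not from the notes) computing the binomial tail $P(X \geq 6)$ for $X \sim \mathrm{Binomial}(10, 1/2)$:

```python
# Illustrative sketch: probability of 6 or more correct identifications out
# of 10 when each trial is an independent fair guess (p = 1/2).
from math import comb

n, observed = 10, 6
tail = sum(comb(n, k) for k in range(observed, n + 1)) / 2 ** n
print(f"P(X >= {observed} | p = 1/2) = {tail:.4f}")   # 386/1024, about 0.377
```

A tail probability this large suggests 6 out of 10 is quite consistent with random guessing, which is the kind of question the hypothesis-testing machinery of Chapter 4.1 formalizes.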


Penn STAT 550 - STAT 550 Lecture notes
