Economics 520, Fall 2010
Lecture Note 11: Point Estimation: Method of Moments and Maximum Likelihood Estimation (CB 7.1, 7.2.1, 7.2.2)

1. Point Estimation Problem

In statistical inference, we consider a set of possible probability models and try to use observations to infer which of these probability models is the one that generated the data. To formalize this, suppose we have a parametric family of PDFs or PMFs

$\{ f(x; \theta), \ \theta \in \Theta \}.$

We refer to $\theta$ as the parameter and $\Theta$ as the parameter space. You could think of $\theta$ as a particular theory for some phenomenon.

Suppose we have a random sample of size $n$ from a distribution with PDF or PMF $f(x; \theta^*)$. That is, we observe $X_1, X_2, \ldots, X_n$, where each $X_i$ is IID with PDF or PMF $f(x; \theta^*)$. Here, $\theta^* \in \Theta$ is the "true value" of the parameter, but we do not know what it is. Our goal is to use the observations to provide an estimate of $\theta^*$.

A statistic is any function of the observations, say $T(X_1, \ldots, X_n)$. A point estimator is a statistic used to provide a guess about $\theta$. Often, we will use notation like $\hat\theta = \hat\theta(X_1, \ldots, X_n)$ to denote a point estimator. For a given realization of the data, it is just a number, but if we were to take a new sample of data, it would take on a different value. In this sense a statistic is itself a random variable, and we will evaluate it according to the repeated sampling criterion: we look at the behavior (distribution) of the estimator when we repeatedly draw new random samples of the same size. We would like the distribution of the estimator to be concentrated around $\theta^*$.

Note: Be careful about the notation here. We use $\theta^*$ to denote the true value of the parameter that generated the data, and $\theta$ to denote any element of the parameter space. This distinction will be important, as we will evaluate the density function sometimes at the true value of the parameter and sometimes at arbitrary values.

2. Method of Moments

The first approach to systematically finding estimators is the method of moments. Consider a set of independent and identically distributed random variables with PDF/PMF $f(x; \theta)$. Suppose $\theta$ is a scalar. The mean of this distribution is

$E[X] = g(\theta^*),$

where the function $g(\theta)$ (for all values of $\theta$, not just the true value $\theta^*$) is defined as

$g(\theta) = \int x \, f(x; \theta) \, dx.$

The function $g(\cdot)$ is clearly a known function given knowledge of $f(\cdot)$. Now suppose that we calculate the average of our random sample, $\bar X = \sum_i X_i / n$. (For notational simplicity, I have dropped the "$n$" subscript.) With $n$ reasonably large, this average should be close to $g(\theta^*)$. Therefore the value of $\theta$ that solves

$g(\theta) = \bar X$

would appear to be a sensible estimator for $\theta^*$.

Let us look at some examples. Suppose that $X_i$ has a Bernoulli distribution with probability $p^*$. Then the expectation of $X$ is $p^*$, so the method of moments estimator solves

$\hat p = \bar X.$

This is an unbiased estimator: its expectation is equal to the unknown parameter.

Suppose that $X$ has an exponential distribution with arrival rate $\lambda$. The expectation of $X$ is $1/\lambda$, so the method of moments estimator solves

$g(\hat\lambda) = 1/\hat\lambda = \bar X, \quad \text{or} \quad \hat\lambda = 1/\bar X.$

This estimator is not unbiased: the expectation of $\bar X$ is $1/\lambda$, so the expectation of $1/\bar X$ is greater than $\lambda$ by Jensen's inequality.

Result 1 (JENSEN'S INEQUALITY, CB 4.7.7)
For any random variable $X$, if $g(x)$ is a convex function, then

$E g(X) \geq g(E X).$

Equality holds if and only if, for every line $a + bx$ that is tangent to $g(x)$ at $x = EX$, $P(g(X) = a + bX) = 1$.

Proof: Let $l(x)$ be a tangent line to $g(x)$ at the point $(EX, g(EX))$. Write $l(x) = a + bx$ for some $a$ and $b$. By convexity of $g$, we have $g(x) \geq a + bx$. Therefore

$E g(X) \geq E(a + bX) = a + b\,E(X) = l(EX) = g(EX).$

(Proof of the last part of the result is left to the reader.)
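Before moving to the two-parameter case, here is a small simulation sketch (not part of the original notes) that makes the repeated-sampling behavior of the exponential rate estimator $\hat\lambda = 1/\bar X$ concrete. It assumes NumPy is available; the values of the rate, the sample size, and the number of replications are illustrative choices. Across many samples, the average of $\hat\lambda$ comes out above the true rate, consistent with the Jensen's inequality argument above.

```python
import numpy as np

rng = np.random.default_rng(0)

lam_true = 2.0      # true arrival rate (illustrative choice)
n = 30              # sample size
n_reps = 10_000     # number of repeated samples

# Method-of-moments estimator for the exponential rate: lambda_hat = 1 / X_bar.
# Draw many independent samples of size n and look at the estimator's
# distribution under repeated sampling.
samples = rng.exponential(scale=1.0 / lam_true, size=(n_reps, n))
x_bar = samples.mean(axis=1)
lam_hat = 1.0 / x_bar

print("true lambda:        ", lam_true)
print("mean of X_bar:      ", x_bar.mean())    # close to 1 / lam_true
print("mean of lambda_hat: ", lam_hat.mean())  # exceeds lam_true (Jensen's inequality)
```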
Now suppose we have two parameters, that is, $\theta$ is a two-dimensional vector. We could calculate the first two moments,

$g_1(\theta) = E[X] = \int x \, f_X(x; \theta) \, dx,$
$g_2(\theta) = E[X^2] = \int x^2 \, f_X(x; \theta) \, dx,$

and equate them with the corresponding sample moments:

$g_1(\hat\theta) = \bar X,$
$g_2(\hat\theta) = \overline{X^2}.$

For example, suppose that $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$. The $g(\cdot)$ functions are

$g_1(\mu, \sigma^2) = \int x \, f_X(x; \mu, \sigma^2) \, dx = \mu,$
$g_2(\mu, \sigma^2) = \int x^2 \, f_X(x; \mu, \sigma^2) \, dx = \mu^2 + \sigma^2.$

Hence the method of moments estimators are

$\hat\mu = \bar X, \qquad \hat\sigma^2 = \overline{X^2} - (\bar X)^2.$

Again these appear to be reasonable and intuitive estimators. In other cases the estimators are somewhat less obvious and less attractive. Consider the case where the distribution of $X$ is binomial with parameters $k$ and $p$ both unknown (a relatively unusual problem in practice). The first two moments are

$g_1(k, p) = \sum_x x \, f_X(x; k, p) = k p,$
$g_2(k, p) = \sum_x x^2 \, f_X(x; k, p) = k p (1 - p) + k^2 p^2.$

Thus we solve

$\bar X = \hat k \, \hat p,$
$\overline{X^2} = \hat k \, \hat p \, (1 - \hat p) + \hat k^2 \hat p^2.$

The solutions are

$\hat p = \bar X / \hat k, \qquad \hat k = \frac{\bar X \cdot \bar X}{\bar X \cdot \bar X + \bar X - \overline{X^2}} = \frac{(\bar X)^2}{\bar X - \bigl(\overline{X^2} - (\bar X)^2\bigr)}.$

A problem is that these estimates can in fact be negative if the sample variance is larger than the sample mean, which is obviously not such a good estimate for a probability and a number of trials.

3. Maximum Likelihood

The second general approach to estimation we consider is maximum likelihood estimation. The likelihood function is the density function viewed as a function of the unknown parameters, rather than as a function of the random variable. Let $X$ be a random variable with PDF/PMF $f_X(x; \theta^*)$, where $\theta^*$ is the unknown true value of the parameter $\theta$. The likelihood function is then

$L(\theta) = f_X(X; \theta).$

If we have more than one random variable, say $X_1, X_2, \ldots, X_n$, the likelihood function is based on the joint probability density/mass function:

$L(\theta) = f_{X_1, X_2, \ldots, X_n}(X_1, X_2, \ldots, X_n; \theta).$

If the random variables are independent and identically distributed, with common density function $f_X(x; \theta)$, the likelihood function obviously simplifies:

$L(\theta) = f_{X_1, X_2, \ldots, X_n}(X_1, X_2, \ldots, X_n; \theta) = \prod_{i=1}^{n} f_X(X_i; \theta).$

Often we prefer to work with the logarithm of the likelihood function, the log likelihood function; e.g., in the case of $n$ independent and identically distributed random variables,

$\mathcal{L}(\theta) = \ln L(\theta) = \ln f_{X_1, X_2, \ldots, X_n}(X_1, X_2, \ldots, X_n; \theta) = \sum_{i=1}^{n} \ln f_X(X_i; \theta).$

The maximum likelihood estimator, or MLE, for $\theta^*$ is the value of $\theta$ that maximizes the likelihood function or, equivalently, the log likelihood function (because the logarithm is a one-to-one, strictly monotone transformation, the maximizer of one is the maximizer of the other).

Let us first look at a simple example. Suppose $X$ has a normal distribution with unknown mean $\mu$ and known variance 1. The probability density function is

$f_X(x; \mu) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2}(x - \mu)^2 \right).$
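As a quick numerical illustration of this definition (not from the original notes), the following sketch assumes NumPy, simulates a sample from a normal distribution with variance 1, evaluates the log likelihood $\mathcal{L}(\mu) = -\tfrac{n}{2}\ln(2\pi) - \tfrac{1}{2}\sum_i (X_i - \mu)^2$ on a grid of candidate means, and checks that the grid maximizer is essentially the sample mean, which is the analytic MLE in this model. The function name log_likelihood and the specific values of the true mean and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

mu_true = 1.5                                      # true mean (illustrative choice)
x = rng.normal(loc=mu_true, scale=1.0, size=50)    # random sample with variance 1

def log_likelihood(mu, x):
    """Log likelihood of an IID N(mu, 1) sample: sum of the log densities."""
    return -0.5 * len(x) * np.log(2 * np.pi) - 0.5 * np.sum((x - mu) ** 2)

# Maximize the log likelihood over a fine grid of candidate values of mu.
grid = np.linspace(x.min(), x.max(), 10_001)
loglik = np.array([log_likelihood(m, x) for m in grid])
mu_mle_grid = grid[np.argmax(loglik)]

print("grid-search MLE:", mu_mle_grid)
print("sample mean:    ", x.mean())   # the analytic maximizer for this model
```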