UA ECON 520 - Method of Moments and Maximum Likelihood Estimation

Economics 520, Fall 2006
Lecture Note 10: Point Estimation: Method of Moments and Maximum Likelihood Estimation (CB 7.1, 7.2.1, 7.2.2)

1. Point Estimation Problem

In statistical inference, we consider a set of possible probability models and try to use observations to infer which of the set of probability models is the one that generated the data. To formalize this, suppose we have a parametric family of PDFs or PMFs

{f(x; θ), θ ∈ Θ}.

We refer to θ as the parameter and Θ as the parameter space. You could think of θ as a particular theory for some phenomenon.

Suppose we have a random sample of size n from a distribution with PDF or PMF f(x; θ*). That is, we observe X₁, X₂, …, Xₙ, where each Xᵢ is IID with PDF or PMF f(x; θ*). Here θ* ∈ Θ is the "true value" of the parameter, but we do not know what it is. Our goal is to use the observations to provide an estimate of θ*.

A statistic is any function of the observations, say T(X₁, …, Xₙ). A point estimator is a statistic used to provide a guess about θ*. Often we will use notation like θ̂ = θ̂(X₁, …, Xₙ) to denote a point estimator. For a given realization of the data it is just a number, but if we were to take a new sample of data it would take on a different value. In this sense a statistic is itself a random variable, and we will evaluate it according to the repeated sampling criterion: we look at the behavior (distribution) of the estimator when we repeatedly draw new random samples of the same size. We would like the distribution of the estimator to be concentrated around θ*.

Note: Be careful about the notation here. We use θ* to denote the true value of the parameter that generated the data, and θ to denote any element of the parameter space. This distinction will be important, as we will evaluate the density function sometimes at the true value of the parameter and sometimes at arbitrary values.

2. Method of Moments

The first approach to systematically finding estimators is the method of moments. Consider a set of independent and identically distributed random variables with PDF/PMF f(x; θ), and suppose θ is a scalar. The mean of this distribution is

E[X] = g(θ*),

where the function g(θ) (for all values of θ, not just the true value θ*) is defined as

g(θ) = ∫ x · f(x; θ) dx.

The function g(·) is clearly a known function given knowledge of f(·). Now suppose that we calculate the average of our random sample, X̄ = Σᵢ Xᵢ / n. (For notational simplicity I have dropped the "n" subscript.) With n reasonably large, this average should be close to g(θ*). Therefore the value of θ that solves

g(θ) = X̄

would appear to be a sensible estimator of θ*.

Let us look at some examples. Suppose that Xᵢ has a Bernoulli distribution with probability p*. Then the expectation of X is p*, so the method of moments estimator solves

p̂ = X̄.

This is an unbiased estimator: its expectation is equal to the unknown parameter.

Suppose that X has an exponential distribution with arrival rate λ. The expectation of X is 1/λ, so the method of moments estimator solves

g(λ̂) = 1/λ̂ = X̄,  or  λ̂ = 1/X̄.

This estimator is not unbiased: the expectation of X̄ is 1/λ, so the expectation of 1/X̄ is greater than λ by Jensen's inequality.

Result 1 (Jensen's Inequality, CB 4.7.7)
For any random variable X, if g(x) is a convex function, then

E g(X) ≥ g(E X).

Equality holds if and only if, for every line a + bx that is tangent to g(x) at x = E X, P(g(X) = a + bX) = 1.

Proof: Let l(x) be a tangent line to g(x) at the point (E X, g(E X)). Write l(x) = a + bx for some a and b. By convexity of g, we have g(x) ≥ a + bx. Therefore

E g(X) ≥ E(a + bX) = a + b E(X) = l(E X) = g(E X).

(Proof of the last part of the result is left to the reader.) □
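As a concrete illustration of this bias (a minimal simulation sketch, not part of the original note; the true rate, sample size, and variable names are chosen only for the example), the average of λ̂ = 1/X̄ over repeated samples sits above λ, exactly as Jensen's inequality predicts:

```python
import numpy as np

# Monte Carlo check that the method-of-moments estimator lambda_hat = 1 / X_bar
# overestimates lambda on average: E[1/X_bar] > 1/E[X_bar] = lambda (Jensen).
rng = np.random.default_rng(0)

true_lambda = 2.0     # assumed true arrival rate for the simulation
n = 20                # sample size
reps = 100_000        # number of simulated samples

# numpy parameterizes the exponential by its mean, which is 1/lambda
samples = rng.exponential(scale=1.0 / true_lambda, size=(reps, n))
lambda_hat = 1.0 / samples.mean(axis=1)   # method-of-moments estimate in each sample

print("true lambda:          ", true_lambda)
print("average of lambda_hat:", lambda_hat.mean())   # noticeably above 2.0 for small n
```

For this particular model one can check the bias exactly: E[1/X̄] = nλ/(n − 1), so the overestimation disappears as n grows.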
Now suppose we have two parameters, that is, θ is a two-dimensional vector. We could calculate the first two moments,

g₁(θ) = E[X] = ∫ x · f_X(x; θ) dx,
g₂(θ) = E[X²] = ∫ x² · f_X(x; θ) dx,

and equate them with the corresponding sample moments:

g₁(θ̂) = X̄,
g₂(θ̂) = (1/n) Σᵢ Xᵢ².

For example, suppose that X has a normal distribution with mean µ and variance σ². The g(·) functions are

g₁(µ, σ²) = ∫ x f_X(x; µ, σ²) dx = µ,
g₂(µ, σ²) = ∫ x² f_X(x; µ, σ²) dx = µ² + σ².

Hence the method of moments estimators are

µ̂ = X̄,  σ̂² = (1/n) Σᵢ Xᵢ² − (X̄)².

Again these appear to be reasonable and intuitive estimators. In other cases the estimators are somewhat less obvious and less attractive. Consider the case where the distribution of X is binomial with both parameters k and p unknown (a relatively unusual problem in practice). The first two moments are

g₁(k, p) = Σₓ x f_X(x; k, p) = k · p,
g₂(k, p) = Σₓ x² f_X(x; k, p) = k · p · (1 − p) + k² · p².

Thus we solve

X̄ = k̂ · p̂,
(1/n) Σᵢ Xᵢ² = k̂ · p̂ · (1 − p̂) + k̂² · p̂².

The solutions are

p̂ = X̄ / k̂  and  k̂ = (X̄)² / ((X̄)² + X̄ − (1/n) Σᵢ Xᵢ²).

A problem is that these estimates can in fact be negative whenever the sample variance exceeds the sample mean, which is obviously not a good estimate for a probability or a number of trials.

3. Maximum Likelihood

The second general approach to estimation we consider is maximum likelihood estimation. The likelihood function is the density function viewed as a function of the unknown parameters, rather than as a function of the random variable. Let X be a random variable with PDF/PMF f_X(x; θ*), where θ* is the unknown true value of the parameter θ. The likelihood function is then

L(θ) = f_X(X; θ).

If we have more than one random variable, say X₁, X₂, …, Xₙ, the likelihood function is based on the joint probability density/mass function:

L(θ) = f_{X₁,…,Xₙ}(X₁, X₂, …, Xₙ; θ).

If the random variables are independent and identically distributed, with common density function f_X(x; θ), the likelihood function simplifies:

L(θ) = Πᵢ f_X(Xᵢ; θ).

Often we prefer to work with the logarithm of the likelihood function, the log likelihood function; in the case of n independent and identically distributed random variables,

ℒ(θ) = ln L(θ) = Σᵢ ln f_X(Xᵢ; θ).

The maximum likelihood estimator, or MLE, for θ* is the value of θ that maximizes the likelihood function, or equivalently the log likelihood function (because the logarithm is a one-to-one, strictly increasing transformation, the maximizer of one is the maximizer of the other).

Let us first look at a simple example. Suppose X has a normal distribution with unknown mean µ and known variance 1. The probability density function is

f_X(x; µ) = (1/√(2π)) exp(−(x − µ)²/2).
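For concreteness, here is a sketch (added for illustration, not from the note; the simulated data, seed, and function names are assumptions) of maximizing this log likelihood numerically. For the N(µ, 1) model the analytical MLE is the sample mean X̄, and the numerical maximizer agrees with it up to tolerance:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Numerically maximize the N(mu, 1) log likelihood for a simulated sample and
# compare the result with the sample mean (the analytical MLE for this model).
rng = np.random.default_rng(1)
x = rng.normal(loc=1.5, scale=1.0, size=50)   # illustrative data; true mu = 1.5 is assumed

def neg_log_likelihood(mu):
    # -log L(mu) = sum_i [0.5*log(2*pi) + 0.5*(x_i - mu)^2];
    # we minimize the negative because scipy provides minimizers, not maximizers.
    return 0.5 * np.sum((x - mu) ** 2) + 0.5 * len(x) * np.log(2 * np.pi)

result = minimize_scalar(neg_log_likelihood)
print("numerical MLE:", result.x)
print("sample mean:  ", x.mean())   # the two agree up to numerical tolerance
```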

