MIT 14 385 - Nonparametric and Semiparametric Estimation - D257586


School name Massachusetts Institute of Technology

Course 14 385- Nonlinear Econometric Analysis

Pages 24


Nonparametric and Semiparametric Estimation
Whitney K. Newey
Fall 2007

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].

Introduction

Functional form misspecification error is important in elementary econometrics. Flexible functional forms help, e.g. the translog
$$y = \beta_1 + \beta_2 \ln(x) + \beta_3 [\ln(x)]^2 .$$
Fine for simple nonlinearity, e.g. diminishing returns. Economic theory does not restrict the form. Nonparametric methods allow for complete flexibility. Good for graphs. Good for complete flexibility with a few dimensions.

An Empirical Example

An example illustrates: Deaton (1989), the effect of rice prices on the distribution of incomes in Thailand. Let $p$ be the price of rice, $q$ the amount purchased, and $y$ the amount sold. The change in benefits from $dp$ is
$$dB = (q - y)\,dp = p(q - y)\,d\ln(p).$$
Elasticity form:
$$\frac{dB/x}{d\ln(p)} = w - \frac{py}{x},$$
where $w = pq/x$ is the budget share of rice purchases and $x$ is total expenditure. The benefit/expenditure measure is the negative of the right-hand side.

Empirical Distribution Function

Simple nonparametric estimation problem. The CDF of $Z$ is $F_Z(z) = \Pr(Z \le z)$. Let $Z_1, \ldots, Z_n$ be i.i.d. data and $1(A)$ the indicator of the event $A$, so $F_Z(z) = E[1(Z_i \le z)]$. The empirical CDF is
$$\hat F_Z(z) = \frac{\#\{i \mid Z_i \le z\}}{n} = \frac{1}{n}\sum_{i=1}^{n} 1(Z_i \le z).$$
Probability weight $1/n$ on each observation. Consistent and asymptotically normal. Nonparametrically efficient. No good for density estimation, because it is a step function.

Kernel Density Estimator

Add a little continuous noise to smooth out the empirical CDF. Let $\bar Z_n$ have the empirical CDF, let $U$ be a continuous random variable with pdf $K(u)$, independent of $\bar Z_n$, and let $h$ be a positive scalar. Define
$$\tilde Z = \bar Z_n + hU,$$
the empirical CDF plus noise. The kernel density estimator is the density of $\tilde Z$.

Derivation: Let $F_U(u) = \int_{-\infty}^{u} K(t)\,dt$ be the CDF of $U$. By iterated expectations, $E[1(\tilde Z \le z)] = E[E[1(\tilde Z \le z) \mid \bar Z_n]]$, so by $1(\tilde Z \le z) = 1(U \le (z - \bar Z_n)/h)$,
$$F_{\tilde Z}(z) = \Pr(\tilde Z \le z) = E[1(\tilde Z \le z)] = E\!\left[E\!\left[1\!\left(U \le \frac{z - \bar Z_n}{h}\right) \Bigm| \bar Z_n\right]\right] = E\!\left[F_U\!\left(\frac{z - \bar Z_n}{h}\right)\right] = \frac{1}{n}\sum_{i=1}^{n} F_U\!\left(\frac{z - Z_i}{h}\right).$$
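The empirical CDF and its smoothed version $F_{\tilde Z}$ can be compared numerically. The following is a minimal Python sketch, assuming standard normal noise $U$ (so $F_U$ is the normal CDF) and a made-up four-point sample; the function names are illustrative and are not taken from the course materials.

```python
import math
import numpy as np

def empirical_cdf(data, z):
    """Fraction of observations with Z_i <= z (weight 1/n on each point)."""
    data = np.asarray(data, dtype=float)
    return float(np.mean(data <= z))

def std_normal_cdf(u):
    """CDF of N(0, 1), used here as F_U (an illustrative choice of noise)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_cdf(data, z, h):
    """CDF of Z-tilde = Z-bar_n + h*U: the average of F_U((z - Z_i) / h)."""
    return float(np.mean([std_normal_cdf((z - zi) / h) for zi in data]))

sample = [1.0, 2.0, 3.0, 4.0]              # made-up data
print(empirical_cdf(sample, 2.5))          # value of the step function
print(smoothed_cdf(sample, 2.5, h=0.5))    # smoothed with moderate noise
print(smoothed_cdf(sample, 3.2, h=0.5))    # varies continuously in z
```

Unlike the step function, the smoothed CDF is differentiable in $z$, which is what makes a density estimate possible.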
Differentiating gives the pdf
$$\hat f_h(z) = \frac{dF_{\tilde Z}(z)}{dz} = \frac{1}{n}\sum_{i=1}^{n} K_h(z - Z_i), \qquad K_h(u) = h^{-1} K(u/h).$$
This is a kernel density estimator. The function $K(u)$ is the kernel and the scalar $h$ is the bandwidth.

Bandwidth $h$ controls the amount of smoothing. As $h$ increases, the density is smoother, but there is more "noise" from $U$, i.e. more bias. As $h \to 0$ we get a rough density with spikes at the data points, but the bias shrinks. Choosing $h$ is important in practice; see below. $\hat f_h(z)$ will be consistent if $h \to 0$ and $nh \to \infty$.

Examples:
Gaussian kernel: $K(u) = (2\pi)^{-1/2} e^{-u^2/2}$.
Epanechnikov kernel: $K(u) = 1(|u| \le 1)(1 - u^2)(3/4)$.

Choice of $K$ does not matter as much as choice of $h$. The Epanechnikov kernel has slightly smaller mean square error, and so is optimal.

Bias and Variance of Kernel Estimators

$(Z_1, \ldots, Z_n)$ are i.i.d.

Bias: $f_0(z)$ is the pdf of $Z_i$. Expectation of the kernel estimator:
$$E[\hat f_h(z)] = \int K_h(z - t) f_0(t)\,dt = \frac{1}{h}\int K\!\left(\frac{z - t}{h}\right) f_0(t)\,dt = \int K(u) f_0(z - hu)\,du,$$
by the change of variables $u = (z - t)/h$. Taylor expand $f_0(z - hu)$ around $h = 0$:
$$f_0(z - hu) = f_0(z) - f_0'(z)hu + \Gamma(h, u, z)h^2, \qquad \Gamma(h, u, z) = f_0''(z + \bar h(z, u)u)\,u^2/2,$$
where $|\bar h(z, u)| \le |h|$. For $\int K(u)u^2\,du < \infty$, $\int K(u)u\,du = 0$, and assuming $f_0''(z)$ continuous and bounded,
$$\int K(u)\Gamma(h, u, z)\,du \to \left[\int K(u)u^2\,du\right] f_0''(z)/2 .$$
Then, with $o(h^2)$ denoting any $a(h)$ with $\lim_{h \to 0} a(h)/h^2 = 0$,
$$h^2 \int K(u)\Gamma(h, u, z)\,du = h^2 \left[\int K(u)u^2\,du\right] f_0''(z)/2 + o(h^2).$$
Multiplying the expansion $f_0(z - hu) = f_0(z) - f_0'(z)hu + \Gamma(h, u, z)h^2$ by $K(u)$ and integrating, using $\int K(u)\,du = 1$ and $\int K(u)u\,du = 0$, gives
$$E[\hat f_h(z)] = f_0(z) + h^2 f_0''(z) \int K(u)u^2\,du/2 + o(h^2).$$
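A minimal sketch of the estimator $\hat f_h(z) = n^{-1}\sum_{i} K_h(z - Z_i)$ with the Gaussian and Epanechnikov kernels listed earlier in this section. The function names, the simulated standard normal sample, the seed, and the three bandwidths are assumptions chosen for illustration, not part of the course materials.

```python
import numpy as np

def gaussian_kernel(u):
    """K(u) = (2*pi)^(-1/2) * exp(-u^2 / 2)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def epanechnikov_kernel(u):
    """K(u) = 1(|u| <= 1) * (1 - u^2) * 3/4."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def kde(z, data, h, kernel=gaussian_kernel):
    """f_hat_h(z) = (1/n) * sum_i K((z - Z_i) / h) / h, for each point in z."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (z[:, None] - data[None, :]) / h    # shape (len(z), n)
    return kernel(u).mean(axis=1) / h

rng = np.random.default_rng(0)              # seed chosen arbitrarily
data = rng.standard_normal(500)             # simulated sample, not course data
grid = np.linspace(-6.0, 6.0, 2401)
step = grid[1] - grid[0]
for h in (0.05, 0.4, 2.0):                  # rough, moderate, very smooth
    f_hat = kde(grid, data, h, kernel=epanechnikov_kernel)
    print(h, f_hat.sum() * step)            # Riemann sum of the estimate
```

Each estimate should integrate to roughly one whatever the bandwidth. Plotting `f_hat` against `grid` would show spikes at the data points for the smallest `h` and a flattened curve for the largest.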
We can summarize these calculations in the following result:

Proposition 1: If $f_0(z)$ is twice continuously differentiable with bounded second derivative, $\int K(u)\,du = 1$, $\int K(u)u\,du = 0$, and $\int u^2 K(u)\,du < \infty$, then
$$E[\hat f_h(z)] - f_0(z) = h^2 f_0''(z) \int K(u)u^2\,du/2 + o(h^2).$$

Variance: From Proposition 1, $E[K_h(z - Z_i)] = E[\hat f_h(z)]$ is bounded as $h \to 0$. Let $O(1/n)$ denote a sequence $(a_n)_{n=1}^{\infty}$ such that $n a_n$ is bounded. Then, because $\hat f_h(z)$ is a sample average of $K_h(z - Z_i)$, for $h \to 0$,
$$\mathrm{Var}(\hat f_h(z)) = \left\{E[K_h(z - Z_i)^2] - \{E[K_h(z - Z_i)]\}^2\right\}/n$$
$$= \frac{1}{h^2}\int K\!\left(\frac{z - t}{h}\right)^2 f_0(t)\,dt/n + O(1/n)$$
$$= \int K(u)^2 f_0(z - hu)\,du/(nh) + O(1/n).$$
For $f_0(z)$ continuous and bounded and $\int K(u)^2\,du < \infty$,
$$\int K(u)^2 f_0(z - hu)\,du \to f_0(z)\int K(u)^2\,du .$$
By $h \to 0$, it follows that $nh \cdot O(1/n) \to 0$, so that $O(1/n) = o(1/(nh))$. Plugging into the variance formula above, we find
$$\mathrm{Var}(\hat f_h(z)) = f_0(z)\int K(u)^2\,du/(nh) + o(1/(nh)).$$
We can summarize these calculations in the following result:

Proposition 2: If $f_0(z)$ is continuous and bounded, $\int K(u)^2\,du < \infty$, $h \to 0$, and $nh \to \infty$, then
$$\mathrm{Var}[\hat f_h(z)] = f_0(z)\int K(u)^2\,du/(nh) + o(1/(nh)).$$

Consistency and Convergence Rate of Kernel Estimators

Consistency is implied by two conditions:
$h \to 0$: bias goes to zero.
$nh \to \infty$: variance goes to zero. The bandwidth shrinks to zero slower than $1/n$.

Intuition for $h \to 0$: the smoothing "noise" must go away asymptotically to remove all bias.

Intuition for $nh \to \infty$: for the Epanechnikov kernel, $K((z - Z_i)/h) > 0$ if and only if $|z - Z_i| < h$. If $h$ shrinks as fast as or faster than $1/n$, the number of observations with $|z - Z_i| < h$ will not grow, so the estimator averages over a finite number of observations and hence the variance does not go to zero.
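Propositions 1 and 2 can be checked numerically in a case where the exact moments are available. The sketch below assumes $f_0$ is the standard normal density and $K$ is the Gaussian kernel, for which $E[\hat f_h(z)]$ is the $N(0, 1 + h^2)$ density at $z$ and $\int K(u)u^2\,du = 1$, $\int K(u)^2\,du = 1/(2\sqrt{\pi})$. The evaluation point, sample size, bandwidths, and function names are illustrative choices.

```python
import numpy as np

def phi(x, sd=1.0):
    """Normal(0, sd^2) density."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def exact_bias(z, h):
    # With Gaussian K and f0 = N(0, 1), E[f_hat_h(z)] is the N(0, 1 + h^2) density.
    return phi(z, np.sqrt(1.0 + h**2)) - phi(z)

def leading_bias(z, h):
    # Proposition 1 leading term: h^2 * f0''(z) * int K(u) u^2 du / 2.
    f0_second = (z**2 - 1.0) * phi(z)
    return 0.5 * h**2 * f0_second

def exact_variance(z, h, n):
    # {E[K_h(z - Z)^2] - (E[K_h(z - Z)])^2} / n; second moment by a Riemann sum.
    u = np.linspace(-10.0, 10.0, 20001)
    du = u[1] - u[0]
    second_moment = np.sum(phi(u) ** 2 * phi(z - h * u)) * du / h
    first_moment = phi(z, np.sqrt(1.0 + h**2))
    return (second_moment - first_moment**2) / n

def leading_variance(z, h, n):
    # Proposition 2 leading term: f0(z) * int K(u)^2 du / (n h).
    return phi(z) / (2.0 * np.sqrt(np.pi)) / (n * h)

z, n = 0.5, 10_000
for h in (0.2, 0.05, 0.01):
    bias_ratio = exact_bias(z, h) / leading_bias(z, h)
    var_ratio = exact_variance(z, h, n) / leading_variance(z, h, n)
    print(h, bias_ratio, var_ratio)   # both ratios move toward 1 as h shrinks
```

The variance ratio approaches one more slowly than the bias ratio because the neglected $O(1/n)$ term is of relative order $h$, while the neglected bias term is of relative order $h^2$.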
Explicit form for the mean squared error (MSE) under $h \to 0$, $nh \to \infty$:
$$\mathrm{MSE}(\hat f_h(z)) = \mathrm{Var}(\hat f_h(z)) + \mathrm{Bias}^2(\hat f_h(z))$$
$$= f_0(z)\int K(u)^2\,du/(nh) + h^4\left\{f_0''(z)\int K(u)u^2\,du/2\right\}^2 + o(h^4 + 1/(nh)).$$
Because $h \to 0$, the variance term of order $1/(nh)$ vanishes slower than $1/n$, so the MSE vanishes slower than $1/n$. Thus, the kernel estimator converges slower than $n^{-1/2}$. Avoidance of bias by $h \to 0$ means the fraction of the observations used goes to zero.

Bandwidth Choice for Density Estimation

Graphical: Choose one that looks
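Returning to the MSE expansion above: dropping the $o(\cdot)$ remainder leaves $A/(nh) + Bh^4$ with $A = f_0(z)\int K(u)^2\,du$ and $B = \{f_0''(z)\int K(u)u^2\,du/2\}^2$, which is minimized where $h^5 = A/(4Bn)$. The sketch below evaluates this trade-off under illustrative assumptions (standard normal $f_0$ at $z = 0$, Gaussian kernel); the function names and sample sizes are hypothetical and the calculation is not taken from the course materials.

```python
import numpy as np

def amse(h, n, f0, f0_second, roughness, mu2):
    """Leading MSE terms: f0 * R(K) / (n h) + h^4 * (f0'' * mu2 / 2)^2."""
    variance = f0 * roughness / (n * h)
    bias_sq = h**4 * (0.5 * f0_second * mu2) ** 2
    return variance + bias_sq

def amse_minimizer(n, f0, f0_second, roughness, mu2):
    """Solve d/dh [A / (n h) + B h^4] = 0, i.e. h^5 = A / (4 B n)."""
    a = f0 * roughness
    b = (0.5 * f0_second * mu2) ** 2
    return (a / (4.0 * b * n)) ** 0.2

# Illustrative constants: f0 = N(0, 1) evaluated at z = 0, Gaussian kernel.
f0 = 1.0 / np.sqrt(2.0 * np.pi)
f0_second = -f0                             # f0''(0) = (0^2 - 1) * f0(0)
roughness = 1.0 / (2.0 * np.sqrt(np.pi))    # int K(u)^2 du
mu2 = 1.0                                   # int K(u) u^2 du

for n in (100, 10_000, 1_000_000):
    h_star = amse_minimizer(n, f0, f0_second, roughness, mu2)
    # n * MSE keeps growing with n, so the MSE shrinks slower than 1/n.
    print(n, h_star, n * amse(h_star, n, f0, f0_second, roughness, mu2))
```

The minimizing bandwidth is proportional to $n^{-1/5}$, so it satisfies both $h \to 0$ and $nh \to \infty$, and the resulting MSE is of order $n^{-4/5}$.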
