© 2004 Royal Statistical Society 1369–7412/04/66031
J. R. Statist. Soc. B (2004) 66, Part 1, pp. 31–46

Low order approximations in deconvolution and regression with errors in variables

Raymond J. Carroll
Texas A&M University, College Station, USA

and Peter Hall
Australian National University, Canberra, Australia

[Received September 2002. Revised April 2003]

Address for correspondence: Raymond J. Carroll, Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA.
E-mail: [email protected]

Summary. We suggest two new methods, which are applicable to both deconvolution and regression with errors in explanatory variables, for nonparametric inference. The two approaches involve kernel or orthogonal series methods. They are based on defining a low order approximation to the problem at hand, and proceed by constructing relatively accurate estimators of that quantity rather than attempting to estimate the true target functions consistently. Of course, both techniques could be employed to construct consistent estimators, but in many contexts of importance (e.g. those where the errors are Gaussian) consistency is, from a practical viewpoint, an unattainable goal. We rephrase the problem in a form where an explicit, interpretable, low order approximation is available. The information that we require about the error distribution (the error-in-variables distribution, in the case of regression) is only in the form of low order moments and so is readily obtainable by a rudimentary analysis of indirect measurements of errors, e.g. through repeated measurements. In particular, we do not need to estimate a function, such as a characteristic function, which expresses detailed properties of the error distribution. This feature of our methods, coupled with the fact that all our estimators are explicitly defined in terms of readily computable averages, means that the methods are particularly economical in computing time.

Keywords: Density estimation; Measurement error; Nonparametric regression; Orthogonal series; Simulation–extrapolation

1. Introduction

Suppose that we observe the value of

$$W = X + U, \tag{1}$$

where the random variables $X$ and $U$ are independently distributed. We either know or have data on the distribution of $U$, and we wish to estimate the density or distribution of $X$. This is a classical deconvolution problem in statistics. Its contemporary applications date at least from work of Mendelsohn and Rice (1982), on the deconvolution of microfluorescence data, and have generated much methodological interest. Carroll and Hall (1988) addressed the problem of optimal deconvolution in the case where $U$ has a normal distribution. They showed that there the fastest possible convergence rate is only logarithmic in sample size, the latter denoted by $n$, say. This implies that the problem of consistent estimation is, unless the variance of $U$ is small, effectively insoluble in practical terms. Fan (1991) treated settings where the optimal convergence rate is polynomial in $n$, and Fan (1992) discussed the contrary case where the rate is particularly poor. Even when the rate is polynomial, it is often particularly slow unless the density of $U$ is so unsmooth as to contain a discontinuity. See also Efromovich (1997) and Wand (1998). Further references will be given later.

These results argue that the problem of inference about the density or distribution of $X$ should be treated differently from in more standard statistical contexts. Since consistent estimation is so difficult in many important cases, it can be argued that we should not attempt to estimate the actual density, $f_X$ say, of $X$. Instead we should estimate a function that, in a well-defined sense, approximates $f_X$ and is estimable relatively efficiently. In this paper we suggest two approaches to this problem, based on kernel or orthogonal series methods.
Both require some knowledge of the distribution of $U$. However, the necessary information is very rudimentary, being based only on low order moments, and is frequently available either from a sample drawn from the distribution of $U$ or from replications of observations of $W$, i.e. small numbers of repeated observations of $W$ for the same $X$ but different values of $U$.

It should be emphasized that we are not estimating $f_X$, but estimating an approximation to $f_X$. From this viewpoint our approach is similar to that used in many dimension reduction problems: solving the problem at hand is infeasible, in both theory and practice, so we change the problem to one which captures the main features of interest and solve that instead. Since the target of our attention is no longer $f_X$, traditional measures of performance, e.g. the distance of our empirical approximation from $f_X$ (see for example Carroll and Hall (1988) and Fan (1991)), are no longer relevant.

The first of our two methods is based on the observation that we can express the expected value of a kernel estimator of $f_X$ as a series expansion in expectations of kernel estimators of derivatives of the density $f_W$ of $W$, and that coefficients in the series depend only on moments of the distribution of $U$. By truncating the expansion we obtain a readily computable estimator of a low order approximation to $f_X$. Details of this technique will be given in Section 2. Of course, approximations to the distribution $F_X$ of $X$ can be found simply by integrating the density approximations.

The basis of the second method is a formula for expressing $f_X$ in an orthogonal expansion with estimable coefficients. The coefficient estimators depend on the distribution of $U$ only through its moments, but the functions in the orthogonal series can be virtually arbitrary. For example, they may be polynomials or trigonometric functions. Therefore, the type of orthogonal sequence can be chosen to reflect prior belief about the distribution of $X$.
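The first, kernel-based method described above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: $U$ has mean 0, the kernel is Gaussian, and the expansion is truncated after the second-moment term, which gives the heuristic $E\,K_h(x - X) \approx E\,K_h(x - W) - \tfrac{1}{2}\,\mathrm{var}(U)\,E\,K_h''(x - W)$. The variance of $U$ is estimated from two replicates of $W$ per subject. The function names, the bandwidth `h = 0.4` and the simulated data are all hypothetical, and the estimator actually proposed in Section 2 of the paper may differ in detail.

```python
import math
import random


def gauss_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def error_variance_from_replicates(w1, w2):
    """Estimate var(U) from pairs W_i1 = X_i + U_i1 and W_i2 = X_i + U_i2.

    The difference W_i1 - W_i2 = U_i1 - U_i2 has mean 0 and variance 2 var(U).
    """
    n = len(w1)
    return sum((a - b) ** 2 for a, b in zip(w1, w2)) / (2.0 * n)


def naive_kde(x, w, h):
    """Ordinary kernel density estimate of f_W at x (ignores the error U)."""
    return sum(gauss_kernel((x - wi) / h) for wi in w) / (len(w) * h)


def low_order_kde(x, w, h, sigma2):
    """Second-order corrected estimate: f_W-hat(x) - (sigma2 / 2) f_W''-hat(x)."""
    total = 0.0
    for wi in w:
        u = (x - wi) / h
        k = gauss_kernel(u)
        # Second derivative in x of K((x - w) / h) for the Gaussian kernel.
        k2 = (u * u - 1.0) * k / (h * h)
        total += k - 0.5 * sigma2 * k2
    return total / (len(w) * h)


# Hypothetical simulation: X ~ N(0, 1), U ~ N(0, 0.25), two replicates each.
random.seed(0)
n, sigma_u = 4000, 0.5
x_true = [random.gauss(0.0, 1.0) for _ in range(n)]
w1 = [xi + random.gauss(0.0, sigma_u) for xi in x_true]
w2 = [xi + random.gauss(0.0, sigma_u) for xi in x_true]

sigma2_hat = error_variance_from_replicates(w1, w2)
h = 0.4
naive_at_0 = naive_kde(0.0, w1, h)
corrected_at_0 = low_order_kde(0.0, w1, h, sigma2_hat)
true_at_0 = 1.0 / math.sqrt(2.0 * math.pi)  # N(0, 1) density at 0

print(sigma2_hat, naive_at_0, corrected_at_0, true_at_0)
```

In this simulated setting the corrected value at 0 should lie closer to the true density of $X$ than the naive estimate of $f_W$ does. Only the second moment of $U$ enters the calculation, and every quantity is a plain average over the observed $W_i$.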
For example, if the distribution of $X$ is believed to be supported on the whole real line then we might take the $j$th term in the series to be proportional to $H_j(\tau x)\exp(-\tfrac{1}{2}\tau^2 x^2)$, where $H_j$ denotes the $j$th Hermite polynomial. The factor $\exp(-\tfrac{1}{2}\tau^2 x^2)$ forces the density approximation to decrease to 0 in the tails; larger values of $\tau$ accommodate lighter tails. However, if the $X$-distribution is known to be supported on a compact interval $I = [a, b]$, and to descend to 0 at the ends of the interval, then we might take the orthogonal sequence to be of polynomials multiplied by $\{1 - (2x - a - b)(b - a)^{-1}\}^c$, for some $c > 0$, where the weight function is