NONPARAMETRIC REGRESSION IN THE PRESENCE OF MEASUREMENT ERROR

BY RAYMOND J. CARROLL
Department of Statistics, Texas A&M University, College Station TX 77843–3143

D. MACA
Department of Biostatistics, Novartis Pharmaceuticals Corporation, 59 Route 10, East Hanover NJ 07936–1080

DAVID RUPPERT
School of Operations Research and Industrial Engineering, Cornell University, Ithaca NY 14853

In many regression applications the independent variable is measured with error. When this happens, conventional parametric and nonparametric regression techniques are no longer valid. We consider two different approaches to nonparametric regression. The first uses the SIMEX method and makes no assumption about the distribution of the unobserved error-prone predictor. For this approach we derive an asymptotic theory for kernel regression which has some surprising implications. Penalised regression splines are also considered for a fixed number of known knots. The second approach assumes that the error-prone predictor has a distribution of a mixture of normals with an unknown number of mixtures, and uses regression splines. Simulations illustrate the results.

Some key words: Estimating equation; Local polynomial regression; Measurement error; Regression spline; Sandwich estimation; SIMEX.

Short title: Nonparametric Regression with Measurement Error

1 INTRODUCTION

We consider the problem of nonparametric regression function estimation in the presence of measurement error in the predictor. Suppose that the regression of a response $Y$ on a predictor $X$ is given by $E(Y \mid X) = m(X)$. Instead of observing $X$, we can only observe $W$, an error-prone measurement related to $X$ by an additive error model, $W = X + U$, where $U$ is a mean-zero normal random variable with variance $\sigma_u^2$. The question is how to estimate $m(\cdot)$ when observations on $Y$ and $W$ are all that are available.

This problem has been addressed previously, most notably by Fan & Truong (1993), who found the following discouraging result. Suppose that we allow $m(\cdot)$ to have up to $k$ derivatives. They showed that, if the measurement error is normally distributed, even with known error variance, then, based on a sample of size $n$, no consistent nonparametric estimator of $m(\cdot)$ converges faster than the rate $\{\log(n)\}^{-k}$. Since, for example, $\log(10{,}000{,}000) \approx 16$, effectively this result suggests that consistent nonparametric regression function estimation in the presence of measurement error is impractical.

The Fan & Truong result can be interpreted in another way. As reviewed by Carroll, Ruppert & Stefanski (1995), much of the enormous practical progress made in the field of measurement error for nonlinear models has been through the use of approximately consistent estimators, i.e. estimators which correct for most of the measurement error induced bias, but not all. Furthermore, when the measurement error variance is zero then the associated convergence rate is of order $n^{-1/2}$ rather than $\{\log(n)\}^{-k}$. We might expect, then, that estimation will be of practical use if the measurement error variance is not too large. Theoretically, for small errors, i.e., $\sigma_u^2 \to 0$, the bias of naive estimators is of the order $O(\sigma_u^2)$, while the approximate error correctors have a bias of order $O(\sigma_u^6)$ or less.

A second positive interpretation is to remember that the Fan & Truong result pertains to globally consistent estimation, i.e. estimators of $E(Y \mid X)$ which are consistent without anything but smoothness assumptions. Such results say nothing about estimators which are consistent for a flexible yet parametric subclass of the nonparametric family. For example, regression splines are a well-known parametric family with the capability of estimating wide classes of regression functions. If one is willing to estimate $E(Y \mid X)$ by a regression spline, then effective semiparametric estimation of $E(Y \mid X)$ should be possible even in the presence of measurement error.

This paper develops the two ideas of approximately consistent and regression spline estimation in the presence of measurement error. In § 2 we show how to implement the SIMEX method (Cook & Stefanski, 1994; Stefanski & Cook, 1995) in ordinary nonparametric kernel regression, cubic smoothing splines and penalised regression splines. The SIMEX method is a functional method, i.e. one that can be applied without estimation of the distribution of the unobservable $X$. In § 3, we take up the structural approach in the context of regression splines, showing that the observed data follow a type of regression spline depending on the conditional distribution of $X$ given $W$. If $W$ given $X$ is normally distributed, $X$ given $W$ depends on the marginal distribution of $X$, which we model flexibly by a mixture of normal distributions with an unknown number of components. This flexible distribution is estimated by modifying the Gibbs sampling algorithm of Wasserman & Roeder (1997). Section 5 gives a number of simulations. Section 6 has concluding remarks.

While the discussion to follow is easiest in the case that the measurement error variance $\sigma_u^2$ is known, in practice this is usually not the case. In some instances, $\sigma_u^2$ is estimated by an external dataset. Otherwise, internal replicates are used, so that we observe $W_{ij} = X_i + U_{ij}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, \kappa_i \ge 1$, where the measurement errors $U_{ij}$ are independent, mean zero, normally distributed random variables with variance $\sigma_u^2$; a components of variance estimate is given as equation (3.2) in Carroll et al. (1995). In theory, for either external or internal data, $\sigma_u^2$ is estimated at ordinary parametric rates $O_p(n^{-1/2})$, and so the asymptotic effect of such estimation on nonparametric regression functions is often nil.

2 THE SIMEX ESTIMATOR

The SIMEX estimator was developed by Cook & Stefanski (1994); see Carroll et al. (1996) and Stefanski & Cook (1995) for related theory. The idea behind the method is most clearly seen in simple linear regression when the independent variable is subject to measurement error. Suppose the regression model is $E(Y \mid X) = \alpha + \beta X$ and that $W = X + U$, rather than $X$, is observed, where $U$ has mean zero and variance $\sigma_u^2$, and $\sigma_u^2$ is known. It is well known that the ordinary least squares estimate of the slope from regressing $Y$ on $W$ converges to $\beta\sigma_x^2(\sigma_x^2 + \sigma_u^2)^{-1}$, where $\sigma_x^2$ denotes the variance of $X$.

For any fixed $\lambda > 0$, suppose one repeatedly 'adds on,' via simulation, additional error with mean zero and variance $\sigma_u^2\lambda$ to $W$, computes
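
The attenuation of the least squares slope described above for simple linear regression can be checked numerically. The following Python sketch is illustrative only and is not the authors' code. It simulates the model $W = X + U$, compares the naive slope with the limit $\beta\sigma_x^2(\sigma_x^2 + \sigma_u^2)^{-1}$, and then adds extra error of variance $\sigma_u^2\lambda$ to $W$ for several values of $\lambda$. The final extrapolation step is an assumption on our part: it uses the fact that, in this linear case, the reciprocal of the limiting slope is linear in $\lambda$, so a straight-line fit evaluated at $\lambda = -1$ should approximately recover $\beta$. All function names, parameter values and the choice of grid for $\lambda$ are hypothetical.

```python
import numpy as np


def naive_slope(w, y):
    """Ordinary least squares slope from regressing y on w."""
    w_centred = w - w.mean()
    return float(np.dot(w_centred, y - y.mean()) / np.dot(w_centred, w_centred))


def mean_slopes_with_added_error(w, y, sigma_u, lambdas, n_sim, rng):
    """For each lambda, average the naive slope over n_sim simulated
    data sets in which extra N(0, lambda * sigma_u^2) error is added to w."""
    averaged = []
    for lam in lambdas:
        slopes = [
            naive_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y)
            for _ in range(n_sim)
        ]
        averaged.append(np.mean(slopes))
    return np.array(averaged)


# Hypothetical simulation settings (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n, alpha, beta = 20000, 1.0, 2.0
sigma_x, sigma_u = 1.0, 0.5

x = sigma_x * rng.standard_normal(n)            # unobserved true predictor
w = x + sigma_u * rng.standard_normal(n)        # error-prone measurement W = X + U
y = alpha + beta * x + 0.3 * rng.standard_normal(n)

# Limit of the naive slope: beta * sigma_x^2 / (sigma_x^2 + sigma_u^2).
attenuated = beta * sigma_x**2 / (sigma_x**2 + sigma_u**2)
slope_naive = naive_slope(w, y)

# Simulation step: slopes after adding error of variance lambda * sigma_u^2.
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = mean_slopes_with_added_error(w, y, sigma_u, lambdas, n_sim=50, rng=rng)

# Extrapolation step (assumed form): 1/slope is linear in lambda in this
# linear model, so fit a line and evaluate it at lambda = -1.
line = np.polyfit(lambdas, 1.0 / slopes, 1)
slope_extrapolated = 1.0 / np.polyval(line, -1.0)

print("limit of naive slope :", attenuated)
print("naive slope          :", slope_naive)
print("extrapolated slope   :", slope_extrapolated)
```

In this sketch the averaged slopes shrink further towards zero as $\lambda$ grows, which is the pattern that an extrapolation back to the no-error case exploits. For nonlinear or nonparametric models no exact extrapolant of this kind is generally available, so the straight-line form used here should be read as specific to the linear example.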