UW-Madison STAT 411 - wayman_multimp_aera2003

This preview shows page 1-2-3-4-5 out of 16 pages.

Multiple Imputation For Missing Data: What Is It And How Can I Use It?

Jeffrey C. Wayman, Ph.D.
Center for Social Organization of Schools
Johns Hopkins University
[email protected]
www.csos.jhu.edu

Paper presented at the 2003 Annual Meeting of the American Educational Research Association, Chicago, IL.

Address correspondence and requests for reprints to Jeff Wayman, Center for Social Organization of Schools, Johns Hopkins University, 3003 N. Charles Street, Suite 200, Baltimore, MD 21218. Email [email protected]

Educational researchers have become increasingly aware of the problems and biases which can be caused by missing data. Significant advances have been made in the last 15 years regarding methodologies which handle responses to these problems and biases. Unfortunately, these methodologies are often not available to many researchers for a variety of reasons (e.g., lack of familiarity, computational challenges) and researchers often resort to ad-hoc approaches to handling missing data, ones which may ultimately do more harm than good (Little & Rubin, 1987; Graham, Hofer, Donaldson, MacKinnon, & Schafer, 1997; Schafer & Graham, 2002). There is a need to make available workable methodologies for handling missing data. Multiple imputation is one such method.

Multiple imputation can be used by researchers on many analytic levels. Many research studies have used multiple imputation (e.g., Graham et al., 1997; Wayman, 2002a) and good general reviews on multiple imputation have been published (Graham, Cumsille, & Elek-Fisk, 2003; Graham & Hofer, 2000; Schafer & Olsen, 1998; Sinharay, Stern, & Russell, 2001). However, multiple imputation is not implemented by many researchers who could benefit from it, very possibly because of lack of familiarity with the technique.
A paper which provides a more basic computational description than has previously been presented would be a helpful addition to this literature and might invite more researchers to explore and understand the technique. Therefore, the objective of this paper is to help familiarize researchers with the basic process of multiple imputation, including a data example which will guide the reader through the multiple imputation process.

This paper will first present a brief discussion of some missing data issues. Following this will be a description of the workings of the multiple imputation process, with a data example interspersed throughout the description to provide illustration and clarity. Finally, the paper will conclude with a brief discussion of issues surrounding this particular analysis.

Missing Data

Methods for Treatment of Missing Data

The intent of any analysis is to make valid inferences regarding a population of interest. Missing data threatens this goal if it is missing in a manner which makes the sample different than the population from which it was drawn, that is, if the missing data creates a biased sample. Therefore, it is important to respond to a missing data problem in a manner which reflects the population of inference.

It is important to understand that once data are missing, it is impossible not to treat them – once data are missing, any subsequent procedure with that data set represents a response in some form to the missing data problem. As a result, there are many different methods of managing missing data, of which multiple imputation is one. I will present only a brief discussion of missing data methods here before proceeding to the multiple imputation example. More thorough discussion of missing data methods can be found in Graham et al., 2003; Graham & Hofer, 2000; Little and Rubin, 1987; Schafer, 1997; and Schafer and Graham, 2002, to name a few.
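
The point that missing data can create a biased sample can be illustrated with a small simulation. The sketch below is a minimal illustration, not an analysis from this paper: the score distribution (mean 50, SD 10), the rule that scores below 50 go missing half of the time, and the function names are all assumptions chosen for the example. It compares the mean of all cases with the mean of the cases that remain observed.

```python
import random
import statistics


def simulate_scores(n=5000, seed=0):
    """Draw hypothetical test scores from a normal distribution (mean 50, SD 10)."""
    rng = random.Random(seed)
    return [rng.gauss(50, 10) for _ in range(n)], rng


def delete_low_scores(scores, rng):
    """Assumed missingness rule: a score below 50 goes missing half of the time.

    None marks a missing value.
    """
    return [None if (s < 50 and rng.random() < 0.5) else s for s in scores]


scores, rng = simulate_scores()
observed = [s for s in delete_low_scores(scores, rng) if s is not None]

full_mean = statistics.mean(scores)
observed_mean = statistics.mean(observed)

print(f"cases retained:         {len(observed)} of {len(scores)}")
print(f"mean of all cases:      {full_mean:.2f}")
print(f"mean of observed cases: {observed_mean:.2f}")
```

Because low scores are more likely to be missing under this assumed rule, the observed cases overstate the mean. If the missingness were unrelated to the scores, the two means would differ only by sampling error.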

Some of the most popular missing data methods involve ad-hoc deletion or replacement of missing data. These methods typically edit missing data to produce a complete data set and are attractive because they are easy to implement. However, researchers have been cautioned against using these methods because they have been shown to have serious drawbacks (e.g., Little & Schenker, 1995; Graham & Hofer, 2000; Graham et al., 1997; Schafer & Graham, 2002). For example, handling missing data by eliminating cases with missing data ("listwise deletion" or "complete case analysis") will bias results if the remaining cases are not representative of the entire sample. This method is the default in most statistical software. Another common method available in most statistical packages is mean substitution, which replaces missing data with the average of valid data for the variable in question. Because the same value is being substituted for each missing case, this method artificially reduces the variance of the variable in question, in addition to diminishing relationships with other variables. Graham et al. (2003) referred to these traditional methods as "unacceptable methods." Examples of other unacceptable methods include pairwise deletion and regression-based single imputation.

Additionally, there exist more statistically principled methods of handling missing data which have been shown to perform better than ad-hoc methods (e.g., Little & Rubin, 1987; Graham et al., 1997; Schafer & Graham, 2002). These methods do not concentrate solely on identifying a replacement for a missing value, but on using available information to preserve relationships in the entire data set. Maximum likelihood estimation is one such method. This method requires specification of a statistical model for each analysis and is a sound method for treating missing data, but is often difficult to implement for less-advanced analysts.
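
The drawbacks of mean substitution noted earlier in this section can be checked numerically. The following is a minimal sketch under stated assumptions, not a procedure from this paper: the two simulated variables, the 30% missingness rate, and the helper `pearson` are hypothetical and exist only for illustration. Values of y are removed purely at random, so any change in variance or correlation comes from the substitution itself.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation computed directly from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]  # y is related to x by construction

# Assumption: 30% of the y values go missing purely at random.
missing = [rng.random() < 0.3 for _ in range(n)]
y_observed = [yi for yi, m in zip(y, missing) if not m]

# Mean substitution: every missing y is replaced by the mean of the observed y.
fill = statistics.mean(y_observed)
y_filled = [fill if m else yi for yi, m in zip(y, missing)]

var_full = statistics.variance(y)
var_filled = statistics.variance(y_filled)
r_full = pearson(x, y)
r_filled = pearson(x, y_filled)

print(f"variance of y:    complete {var_full:.3f}, mean-substituted {var_filled:.3f}")
print(f"correlation x, y: complete {r_full:.3f}, mean-substituted {r_filled:.3f}")
```

Every substituted case sits exactly at the mean and carries no information about x, which is why both the variance of y and its correlation with x shrink after substitution.
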
The Expectation Maximization (EM) algorithm is another method which has been applied to missing data, but obtaining standard errors using EM involves auxiliary methods such as bootstrapping. The topic of this paper, multiple imputation, is a statistically principled method which is more commonly used because of ease of use and available software.

Mechanisms Responsible for Missing Data

Whether implementing multiple imputation or some other method of dealing with missing data, it is important to understand why the data are missing. Graham et al. (2003) described that missing data can informally be thought of as being caused in some combination of three ways: random processes, processes which are measured, and processes which are not measured. Modern

