UI STAT 4520 - Lecture Note

Pages: 22
Running head: POSTERIOR PREDICTIVE CHECKING

Posterior Predictive Checking of Unidimensional Item Response Theory Models

Kyong Hee Chon, Yuki Nozawa, and Su Zhang

December 4, 2006

Abstract

This study applies the posterior predictive model checking (PPMC) method (Rubin, 1984) to assess the fit of unidimensional item response theory (IRT) models for binary responses, and examines the performance of several discrepancy measures for assessing different aspects of model misfit. One dataset was generated from the three-parameter logistic (3PL) model and then fit with the one-parameter logistic (1PL), two-parameter logistic (2PL), and 3PL models. The performance of the discrepancy measures examined in this study suggests that different measures detect different aspects of model misfit and that the choice of measures depends on the context and the aspects of fit to be assessed.

Introduction

Item response theory (IRT) models are widely used for the analysis of items, tests, and examinees. The one-, two-, and three-parameter logistic models (1PL, 2PL, and 3PL) are most commonly used for unidimensional dichotomous item responses. Appropriate use of these IRT models requires that the data meet several strong assumptions, such as local independence (i.e., for examinees with the same ability or proficiency, the probability of getting one item correct is independent of the probability of getting any other item correct) and a specific form of the item response function (e.g., the 1PL model assumes that all items have equal discriminations and no guessing). If these assumptions are not adequately met, inferences regarding the nature of the items and examinees can be erroneous, and the potential advantages of IRT are not gained. It is therefore crucial to check the adequacy of the fit of the chosen IRT model to item responses.
Several fit statistics have been proposed within the frequentist framework (Orlando & Thissen, 2003; Yen, 1981), but none is universally accepted, and model checking remains an underdeveloped area in IRT. In the Bayesian framework, model fit can be checked by (1) examining the sensitivity of inferences to reasonable changes in the prior distribution and the likelihood, (2) checking the sensibility of posterior inferences against one's substantive knowledge, and (3) checking the plausibility of posterior predictive replicated data against observed data, often referred to as posterior predictive model checking (PPMC) (Gelman et al., 2003). In the IRT context, there have been several applications of the PPMC method (Albert & Ghosh, 2000; Glas & Meijer, 2003; Hoijtink, 2001; Hoijtink & Molenaar, 1997; Janssen et al., 2000; Rubin & Stern, 1994; Scheines, Hoijtink & Boomsma, 1999; Sinharay, 2005; Sinharay & Johnson, 2003; Sinharay, Johnson & Stern, 2006; van Onna, 2003).

The choice of discrepancy measures in PPMC is crucial, and different measures tend to capture different aspects of model misfit; this theme recurs throughout these studies. Sinharay and Johnson (2003), for example, examined the power of four discrepancy measures in detecting five types of model misfit. The biserial correlation coefficient and the item pair odds ratio were both found to be effective in detecting the inadequacy of the 1PL model for data from 2PL and 3PL models. When the 2PL model was fit to data from the 3PL model, however, the item pair odds ratio was not effective at all, whereas the biserial correlation coefficient remained powerful.

Given the appeal of Bayesian model checking tools in the IRT context, the purpose of this study is to (1) apply the PPMC method to assess the fit of unidimensional IRT models for dichotomous item responses, and (2) examine the performance of several discrepancy measures for assessing different aspects of model misfit.
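As an illustration of one of the discrepancy measures named above, the following is a minimal sketch of the sample odds ratio for a pair of items, computed from the 2x2 table of joint responses as n11 * n00 / (n10 * n01). The function name `item_pair_odds_ratio` and the toy response matrix are assumptions made for this example; they are not taken from the cited studies.

```python
import numpy as np

def item_pair_odds_ratio(responses, i, j):
    """Sample odds ratio for items i and j.

    responses: 0/1 array with rows = examinees and columns = items.
    Returns n11 * n00 / (n10 * n01), where n_ab counts examinees who
    scored a on item i and b on item j.
    """
    yi = responses[:, i]
    yj = responses[:, j]
    n11 = np.sum((yi == 1) & (yj == 1))
    n00 = np.sum((yi == 0) & (yj == 0))
    n10 = np.sum((yi == 1) & (yj == 0))
    n01 = np.sum((yi == 0) & (yj == 1))
    return (n11 * n00) / (n10 * n01)

# Toy data (hypothetical): 8 examinees, 2 items.
toy = np.array([[1, 1], [1, 1], [1, 0], [0, 1],
                [0, 0], [0, 0], [0, 0], [1, 1]])
odds_ratio = item_pair_odds_ratio(toy, 0, 1)  # 3 * 3 / (1 * 1)
```

The sketch does not guard against empty cells: if n10 or n01 is zero the ratio is infinite or undefined, so a practical implementation would need to decide how to handle that case.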
Method

Posterior Predictive Model-Checking (PPMC) Method

The idea of PPMC is to generate simulated values from the posterior predictive distribution of replicated data and to compare these samples to the observed data. If the replicated data and the observed data differ systematically, this indicates potential model misfit. Letting $y$ be the observed data and $\theta$ be the vector of all the parameters in the model, we define $p(y \mid \theta)$ as the likelihood and $p(\theta)$ as the prior distribution on the parameters. The PPMC method suggests checking a model by examining whether the observed data appear extreme with respect to the posterior predictive distribution of replicated data $y^{rep}$, which is obtained by

$$p(y^{rep} \mid y) = \int p(y^{rep} \mid \theta)\, p(\theta \mid y)\, d\theta. \tag{1}$$

A discrepancy measure or test quantity, $T(y)$, is then defined and computed from the replicated data, and compared with the observed value of $T(y)$ (i.e., computed from the observed data). The PPMC method allows a reasonable summary of such comparisons with the posterior predictive p-value (PPP-value):

$$\Pr\big(T(y^{rep}, \theta) \ge T(y, \theta) \mid y\big) = \int_{T(y^{rep}, \theta) \ge T(y, \theta)} p(y^{rep} \mid \theta)\, p(\theta \mid y)\, dy^{rep}\, d\theta. \tag{2}$$

PPP-values that are close to 0 or 1 indicate model misfit.

Data

One dataset was generated from the 3PL model, with the probability of getting item $j$ correct for examinees with ability or proficiency $\theta$ represented as

$$P_j(\theta) = c_j + \frac{1 - c_j}{1 + \exp[-D a_j (\theta - b_j)]}, \tag{3}$$

where $c_j$ is the pseudo-guessing (or lower asymptote) parameter, $b_j$ is the item difficulty parameter, $a_j$ is the item discrimination (or slope) parameter, and $D$ is the scaling factor of 1.7. For the 1PL model, $c_j$ is assumed to be zero and $a_j$ is fixed as a constant for all items. For the 2PL model, $c_j$ is assumed to be zero for all items and $a_j$ is allowed to vary. The 3PL model can be considered equivalent to the Generalized Linear Mixed Model (GLMM) for binary responses with random intercepts and slopes.
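The PPP-value and the 3PL response function described above can be approximated by simulation. The sketch below is a minimal illustration, not the authors' implementation: the function names (`p_3pl`, `simulate_responses`, `ppp_value`), the toy item parameters, and the choice of test quantity (variance of raw scores) are all assumptions. In particular, the "posterior draws" are a stand-in that simply repeats the generating parameters; in a real analysis they would come from an MCMC fit of the model being checked.

```python
import numpy as np

D = 1.7  # scaling factor in the 3PL response function

def p_3pl(theta, a, b, c):
    """3PL probabilities: theta has shape (N,), a/b/c have shape (J,).
    Returns an (N, J) matrix of P_j(theta_i)."""
    z = D * a * (theta[:, None] - b)
    return c + (1.0 - c) / (1.0 + np.exp(-z))

def simulate_responses(theta, a, b, c, rng):
    """Draw a 0/1 response matrix from the 3PL model."""
    p = p_3pl(theta, a, b, c)
    return (rng.random(p.shape) < p).astype(int)

def ppp_value(y, draws, stat, rng):
    """Monte Carlo estimate of Pr(T(y_rep) >= T(y) | y).

    draws: sequence of (theta, a, b, c) tuples, one per posterior draw.
    stat:  function mapping a response matrix to a scalar.
    """
    t_obs = stat(y)
    exceed = [stat(simulate_responses(*d, rng)) >= t_obs for d in draws]
    return float(np.mean(exceed))

# Hypothetical toy setup (not the parameters used in the study).
rng = np.random.default_rng(0)
theta = rng.standard_normal(200)
a = np.ones(5)
b = np.linspace(-1.0, 1.0, 5)
c = np.full(5, 0.2)
y_obs = simulate_responses(theta, a, b, c, rng)

# Stand-in for posterior draws: the generating values, repeated.
draws = [(theta, a, b, c)] * 200

def raw_score_variance(y):
    return y.sum(axis=1).var()

ppp = ppp_value(y_obs, draws, raw_score_variance, rng)
```

Setting `c` to zeros gives the 2PL model, and additionally holding `a` constant across items gives the 1PL model, so the same functions can generate replicated data under each of the three models. The test quantity here depends only on the data; a discrepancy that also depends on the parameters would need `stat` to accept the draw as a second argument.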
The simulated dataset contains responses of 1000 examinees to 15 dichotomous items. The ability parameters were generated from a standard normal distribution. The item parameters used to generate the responses were based on real item parameter estimates from the National Assessment of Educational Progress (NAEP), as provided in Sinharay, Johnson, and Stern (2006).

Analysis

The prior distributions on the item parameters are as follows: [prior specifications missing in the source], which can all be considered noninformative priors. The data generated from
