G89 2247 Lecture 10

  - Examples of Binary Data
  - Binary Data and Correlation
  - Measurement Models and Binary Data
  - Measurement Models and Ordinal Data
  - Analyzing binary data with different SEM software packages

Examples of Binary Data

  - Some binary outcomes have categorical meaning
      - Did Tasha get an academic job? (yes/no)
      - Has Jimmy ever injected heroin? (yes/no)
  - Other binary outcomes reflect passing some threshold
      - Did Jenna make the Dean's list this semester?
  - Other binary outcomes may reflect some complex position on an ordered dimension
      - True or False: I am an outgoing person.
      - True or False: I smoked marijuana last year.

Dichotomized Data: A Bad Habit of Psychologists

  - Sometimes perfectly good quantitative data is made binary because it seems easier to talk about "High" vs. "Low"
  - The worst habit is the median split
      - Usually the High and Low groups are mixtures of the continua
      - Rarely is the median interpreted rationally
  - See references
      - Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249-253.
      - MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19-40.

Correlations of Binary Data

  - Product moment correlations computed on binary data are called phi coefficients
  - Phi depends on the means of the two variables as well as their strength of relationship

              X1 = 0   X1 = 1   Total
    X2 = 0       1        0        1
    X2 = 1       0       99       99
    Total        1       99      100      phi = 1.00

              X1 = 0   X1 = 1   Total
    X2 = 0       1        1        2
    X2 = 1       0       98       98
    Total        1       99      100      phi = .70

              X1 = 0   X1 = 1   Total
    X2 = 0       1        2        3
    X2 = 1       0       97       97
    Total        1       99      100      phi = .57

Example: Phi is .13, Underlying r is .66

  [Figure: scatterplot of the original data, X2 against X1]

Factor Analysis of Phi Coefficients

  - Loadings tend to be low
  - In exploratory factor analysis, some factors emerge that cluster together variables that have the same proportion positive (mean values)
      - In educational psychology these are called "difficulty factors"
      - Considered to be an artifact of the cutpoint
  - Conventional psychometric wisdom says factor analysis of phi correlations is incorrect

Phi Factor Analysis as Incorrect

  - Mislevy (1986) summarized problems with the analysis of phi coefficients in an often-cited paper on factor analysis of categorical data
      - Phi coefficients depend on the means of the X variables as well as their strength of relationship
      - The linear factor model is inherently misspecified
      - More appropriate models exist

The linear phi factor model is inherently misspecified

  - Suppose that the binary X variables are coded as (0, 1)
  - Consider the linear factor model
      $X_j = \lambda_{1j} f_1 + \lambda_{2j} f_2 + e_j, \quad j = 1, 2, \dots, q$
  - Even if we assume that the model is meaningful for values between 0 and 1, there is no guarantee that the fitted values of $X_j$ will be in that interval

Modern appropriate methods

  - Suppose X is a dichotomized variable and X* is the original continuous variable
      $X_j = 1$ if $X_j^* > \tau_j$, and $X_j = 0$ otherwise
  - Tetrachoric correlations estimate the correlations among the X* variables rather than the dichotomized ones
  - When the sample size is large, SEM software will compute the tetrachoric correlations, assuming that the underlying distribution is bivariate normal

Example: Underlying r = .66, phi = .13, Tetrachoric = .875

  [Figure: scatterplot of the original data, X2 against X1, split by the cutpoints into the regions X1 = 0 / X1 = 1 and X2 = 0 / X2 = 1]

Example of Factor Analysis

  - Use EQS to simulate a simple one-factor model
  - Check solution with SPSS
  - Dichotomize variables at two thresholds
  - Compute biased factor analysis
  - Compute analysis based on tetrachoric correlations
      - Note the standard errors

Possible Overstatement of Conventional Wisdom

  - In many substantive fields, binary data are included in factor analyses and measurement models
  - Inferences are not necessarily wrong
      - Means of binary data may be similar
      - Binary outcomes may be conceived more as categorical events than as measures of some underlying continuum

Model Specification: Always a problem

      $X_1 = \lambda_{11} f + e_1$
      $X_2 = \lambda_{12} f + e_2$
      $\vdots$
      $X_q = \lambda_{1q} f + e_q$

  - Whether the term $\lambda_{1j} f$ exceeds the interval (0, 1) depends on the distribution of f
  - What do we know about the distribution of f? ONLY WHAT WE ASSUME
      - Normal (Gibbons et al.)
      - Continuous and unbounded (Mislevy)
      - Arbitrary (Bartholomew)
  - The distribution may be some other one that prevents out-of-range scores in the factor model

Generalization

  - Ordinal data, mixed data (binary, ordinal, quantitative)
  - When one variable is quantitative and the other is binary
      - Product moment correlation is called the point biserial correlation
      - Analogue of tetrachoric is simply the biserial correlation
  - When variables are ordinal
      - Product moment r is the Spearman rank correlation
      - Inferred process correlation is the polychoric correlation

Tetrachoric/Polychoric correlations require large n (1000s) to estimate

  - For small n's the estimates can be unstable
  - Unstable estimates lead to covariance structures that have problems
      - Not positive definite
      - Cannot be inverted
      - Cannot be fit with SEM
  - Muthén's software (Mplus) has better estimators of the polychoric and tetrachoric values

Interpretation of SEM models based on Categorical data

  - Latent variables represent processes inferred from RECONSTRUCTED quantitative variables
      - Think in terms of X* rather than X
      - Unit is the standard deviation of the implied continuum
      - Effects are often larger
  - Work on standard errors is still being done
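
Worked sketch: phi versus tetrachoric

The contrast between phi and tetrachoric correlations can be checked numerically. The Python sketch below is an illustration only, not the EQS/SPSS analysis used in the lecture, and all function names are hypothetical. It assumes an underlying standard bivariate normal, takes the thresholds from the margins of the 2x2 table, and solves for the correlation that reproduces the observed "both = 1" proportion, using the identity that the derivative of the upper-orthant probability with respect to the correlation equals the bivariate normal density. The simulated sample size (50,000), the cutpoints (0 and 1.5), and the seed are arbitrary choices; the two small tables are the N = 100 tables from the phi slide as read here.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def phi_coefficient(n00, n01, n10, n11):
    """Product-moment correlation of two 0/1 variables; n_ab = count with X1=a, X2=b."""
    row0, row1 = n00 + n01, n10 + n11
    col0, col1 = n00 + n10, n01 + n11
    return (n00 * n11 - n01 * n10) / math.sqrt(row0 * row1 * col0 * col1)

def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    one_minus_r2 = 1.0 - r * r
    quad = (h * h - 2.0 * r * h * k + k * k) / (2.0 * one_minus_r2)
    return math.exp(-quad) / (2.0 * math.pi * math.sqrt(one_minus_r2))

def upper_orthant(h, k, rho, steps=200):
    """P(X1* > h, X2* > k): independence value plus Simpson integral of the density over 0..rho."""
    p_indep = (1.0 - STD_NORMAL.cdf(h)) * (1.0 - STD_NORMAL.cdf(k))
    width = rho / steps
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(h, k, i * width)
    return p_indep + total * width / 3.0

def tetrachoric(n00, n01, n10, n11):
    """Tetrachoric estimate by bisection; requires all four margins to be nonzero."""
    n = n00 + n01 + n10 + n11
    h = STD_NORMAL.inv_cdf((n00 + n01) / n)  # threshold for X1: P(X1 = 0)
    k = STD_NORMAL.inv_cdf((n00 + n10) / n)  # threshold for X2: P(X2 = 0)
    target = n11 / n
    lo, hi = -0.999, 0.999                   # the orthant probability increases with rho
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if upper_orthant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def simulate_table(rho, cut, n, rng):
    """Draw n bivariate normal pairs, dichotomize both at `cut`, return the 2x2 counts."""
    counts = [[0, 0], [0, 0]]
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        counts[int(z1 > cut)][int(z2 > cut)] += 1
    return counts[0][0], counts[0][1], counts[1][0], counts[1][1]

# Phi for two of the N = 100 tables: same near-perfect pattern, different margins.
phi_table2 = phi_coefficient(1, 0, 1, 98)   # about .70
phi_table3 = phi_coefficient(1, 0, 2, 97)   # about .57

# Simulated data with underlying r = .66, cut at the median and at an extreme point.
rng = random.Random(2247)
results = {}
for label, cut in [("median split", 0.0), ("extreme split", 1.5)]:
    table = simulate_table(0.66, cut, 50_000, rng)
    results[label] = (phi_coefficient(*table), tetrachoric(*table))
    print(f"{label}: phi = {results[label][0]:.3f}, tetrachoric = {results[label][1]:.3f}")
```

Under these assumptions phi shrinks as the cutpoint moves away from the median, while the tetrachoric estimate stays near the underlying correlation. With small samples or near-empty cells the bisection is driven toward its bounds, which is one face of the instability noted in the slide on sample size.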