O-K-State PSYC 5314 - Working with unbalanced cell sizes in multiple regression with categorical predictors


Working with unbalanced cell sizes in multiple regression with categorical predictors

Ista Zahn
February 17, 2010

Contents
1 The issues
1.1 Formulating hypotheses
1.2 What does it mean to “control for” or “ignore”?
2 A worked-out example
2.1 Descriptive statistics
2.2 Inferential statistics
2.2.1 An unweighted means analysis
2.2.2 A weighted means analysis
3 Summary and conclusions
4 Acknowledgements
References

List of Tables
1 Hypothetical Salary Data (in Thousands) for Female and Male employees
2 Means and standard deviations for the salary data
3 Correlations among the contrast-coded variables
4 Regression coefficients for gender and education predicting salary using unweighted means
5 Type III ANOVA table for the gender and education data
6 Type II ANOVA table for the gender and education data
7 Regression coefficients for gender and education predicting salary controlling for main effects but not the interaction (Type II approach)
8 Type I ANOVA table for the gender and education data
9 Regression coefficient for gender predicting salary (Type I approach)

There are some fairly nuanced issues that arise when analyzing data with categorical predictors and unbalanced cell sizes. In my opinion, many textbooks fail to present these issues clearly.
What follows is an attempt to clarify the issues, using an example-based approach.

1 The issues

The problem is basically this: with equal sample sizes, you can easily construct uncorrelated contrast codes, and the interpretation of the coefficients is unambiguous and straightforward. With unequal cell sizes, contrast-coded variables will be correlated even when the contrast weights themselves are orthogonal. This means that in the unbalanced case, one has to decide how to treat the overlapping variance shared by the contrast-coded variables.

Textbooks often discuss this problem under headings like weighted vs. unweighted means and types of sums of squares (SS). This focus on the different techniques that can be used to analyze unbalanced designs can sometimes lead students to ask questions like “which type of SS should I use?” In fact, the real issue is that there are different hypotheses that can be tested when you have unbalanced data, and the different techniques (types of SS, etc.) simply correspond to some of these hypotheses. In my view, it is better to talk directly about the actual hypotheses than to focus on the terminology. On the other hand, it is important to know a number of terms so that you will be able to understand them when you encounter them in the literature or in your day-to-day research activities. In this article I have tried to explain the meaning of several key terms, while emphasizing the benefits of talking directly about hypotheses.

1.1 Formulating hypotheses

As noted above, the central issue revolves around the question “what hypothesis do you want to test?” If you can answer this question clearly, the battle is half won. In the examples that follow, I use example data from 2 × 2 between-participants designs.
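The point from Section 1 about correlated contrast codes can be checked with a small illustration. This is a minimal Python sketch added here, not part of the original analysis. It assumes a hypothetical 2 × 2 design in which factors A and B are coded -1/+1 and the interaction code A×B is their product. The cell counts in `BALANCED` and `UNBALANCED` are invented. The sketch expands the counts into one row of codes per participant and then computes the correlations among the coded variables.

```python
import math

# Hypothetical cell counts for a 2 x 2 between-participants design.
# Keys are (level of factor A, level of factor B), each coded -1/+1.
BALANCED = {(-1, -1): 5, (-1, 1): 5, (1, -1): 5, (1, 1): 5}
UNBALANCED = {(-1, -1): 8, (-1, 1): 2, (1, -1): 3, (1, 1): 7}


def expand_codes(counts):
    """Return one (A, B, A*B) row of contrast codes per participant."""
    rows = []
    for (a, b), n in counts.items():
        rows.extend([(a, b, a * b)] * n)
    return rows


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def code_correlations(counts):
    """Correlations among the A, B, and A*B codes for the given cell counts."""
    a, b, ab = zip(*expand_codes(counts))
    return {
        "r(A,B)": pearson_r(a, b),
        "r(A,AB)": pearson_r(a, ab),
        "r(B,AB)": pearson_r(b, ab),
    }


if __name__ == "__main__":
    for label, counts in [("balanced", BALANCED), ("unbalanced", UNBALANCED)]:
        print(label, {k: round(v, 3) for k, v in code_correlations(counts).items()})
```

With the balanced counts, all three correlations are zero. With the unbalanced counts, r(A, B) is about .50 and r(A, A×B) is about -.12, even though the same -1/+1 weights are used. The coded variables therefore share variance, and that shared variance has to be allocated somehow.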
Obviously your data will not always be this simple, but understanding the possible hypotheses in this simple case will hopefully help you generalize to other situations as well.

So what questions can we ask in the 2 × 2 between-participants case? Among other things, we can ask:

1. What is the effect of variable 1 on y, ignoring variable 2?
2. What is the effect of variable 2 on y, ignoring variable 1?
3. What is the effect of variable 1 on y, controlling for variable 2?
4. What is the effect of variable 2 on y, controlling for variable 1?
5. Does the effect of variable 1 on y depend on the level of variable 2?

When we have equal numbers of observations in each cell, question 1 is the same as question 3, and question 2 is the same as question 4. Because of this, it is less likely that one will accidentally test a hypothesis other than the one of interest. However, when there are unequal numbers of observations in each cell, question 1 is not the same as question 3, and question 2 is not the same as question 4. In this case, it is important to understand clearly which hypothesis you want to test, and to make sure you are testing what you think you are.

1.2 What does it mean to “control for” or “ignore”?

“Ignoring” means that you do not take the overlapping variance into account: you let your predictor take credit for the overlap it shares with other predictors. “Controlling for” means the same thing in this context that it usually does in multiple regression. That is, we are testing the effect of a variable after taking out the variance due to another variable. Another way to say it is that we are testing the effect of variable 1 after removing the overlap between variable 1 and variable 2.

It follows that one way to understand the unequal cell size issue is to understand clearly what the overlapping variance represents. The overlapping variance represents the extent to which variable 1 can be predicted from variable 2.
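The difference between ignoring and controlling can also be shown numerically. The following Python sketch is an illustration added here, and it rests on several assumptions. The design is a hypothetical 2 × 2 with factors A and B coded -1/+1, and the cell counts are invented. The outcome is also invented: every participant scores exactly y = 13 + A + 2B, which is additive, with no interaction and no error. In the sketch, "ignoring" B is the simple regression of y on A. The overlap is how well A can be predicted from B. "Controlling for" B regresses y on the part of A that B cannot predict, which gives the same coefficient for A as entering A and B together in a multiple regression (the Frisch-Waugh-Lovell result).

```python
# Hypothetical cell counts for a 2 x 2 design; factor levels coded -1/+1.
BALANCED = {(-1, -1): 5, (-1, 1): 5, (1, -1): 5, (1, 1): 5}
UNBALANCED = {(-1, -1): 8, (-1, 1): 2, (1, -1): 3, (1, 1): 7}


def outcome(a, b):
    """Hypothetical outcome: additive, no interaction, no error term."""
    return 13 + 1 * a + 2 * b


def expand(counts):
    """Return lists A, B, y with one entry per (hypothetical) participant."""
    a_vals, b_vals, y_vals = [], [], []
    for (a, b), n in counts.items():
        a_vals += [a] * n
        b_vals += [b] * n
        y_vals += [outcome(a, b)] * n
    return a_vals, b_vals, y_vals


def simple_fit(x, y):
    """Intercept and slope of the least-squares line predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def effect_of_a(counts):
    a, b, y = expand(counts)
    # Ignoring B: A takes credit for any variance it shares with B.
    _, b_ignoring = simple_fit(a, y)
    # Overlap: predict A from B and keep the part of A that B cannot predict.
    c0, c1 = simple_fit(b, a)
    resid_a = [ai - (c0 + c1 * bi) for ai, bi in zip(a, b)]
    mean_a = sum(a) / len(a)
    r2_a_from_b = 1 - sum(e ** 2 for e in resid_a) / sum((ai - mean_a) ** 2 for ai in a)
    # Controlling for B: regress y on residualized A (equals A's coefficient in y ~ A + B).
    _, b_controlling = simple_fit(resid_a, y)
    return {"ignoring_B": b_ignoring, "controlling_B": b_controlling, "R2_A_from_B": r2_a_from_b}


if __name__ == "__main__":
    for label, counts in [("balanced", BALANCED), ("unbalanced", UNBALANCED)]:
        print(label, {k: round(v, 3) for k, v in effect_of_a(counts).items()})
```

With the balanced counts, A cannot be predicted from B at all, and both approaches give A a coefficient of 1 (per unit of the -1/+1 code). With the unbalanced counts, about 25% of the variance in A is predictable from B. Ignoring B yields a coefficient of 2 for A, because A is credited with part of B's effect, whereas controlling for B recovers 1.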
Suppose, for instance, that you are studying depressed vs. not-depressed persons, and males vs. females. It may be the case that more females than males fall into the depressed category. This means that if you know that a person is depressed, the probability that the person is also female is greater than 50%; i.e., depression is correlated with gender. So do you want to control for gender when predicting something from the depressed vs. not-depressed variable? If you do not control for it, then you are giving the depressed variable credit for all the variance that it shares with gender. If you control for gender when predicting your outcome from the depressed/not-depressed variable, then you are testing whether depressed status predicts the outcome over and above the effect of gender.

Because we

