Macalester MATH 155 - Lab 9 – Interaction and Logistic Regression

Math 155, Spring 2005
Introduction to Statistical Modeling
Daniel Kaplan, Macalester College

Lab 9 – Interaction and Logistic Regression

This lab is about two topics in modeling. Interaction refers to a particular way of structuring models; it is a concept that applies to a wide variety of models. Logistic regression is a form of nonlinear modeling that is particularly useful when the response variable is binomial, e.g., yes or no, alive or dead, success or failure. The two concepts are brought together in this lab partly because we want to cover both of them in the course, and partly to show how interaction applies in both linear and nonlinear modeling.

Interaction

The data set noise.csv is from an experiment to study how noise affects the performance of children. In the experiment, second graders were given a test of math problems and their score was recorded. The children were either "normal" or "hyperactive," and the test conditions were either high noise or low noise. Each child took just one test.

We want to see how the noise level affects the test score and whether hyperactivity plays a role as well. Test score (score), a quantitative variable, will be the response variable. Group and noise level (group and noise) will be the explanatory variables. Although group could obviously not be manipulated experimentally, the noise level could be. (See Exercise 1.)

To start, we read in the data.

> a = read.csv('noise.csv')

It's often helpful to look at the data graphically. In this case, a boxplot of the test score as a function of each explanatory variable is appropriate:

> boxplot(score ~ group, data=a)
> boxplot(score ~ noise, data=a)

It looks like the hyperactive children score lower, and that high noise levels increase the spread of scores compared to low noise levels, but don't alter the median score by very much.

We can confirm these visual impressions by constructing linear models:

> lm(score ~ group, data=a)
Coefficients:
     (Intercept)  grouphyperactive
          187.80            -57.95

The negative coefficient on group indicates that being hyperactive is associated with lower scores. The effect is statistically significant.

The level of noise also appears to have an effect on the score, although it is only about 1/5 as strong as that of being hyperactive.

> lm(score ~ noise, data=a)
Coefficients:
(Intercept)     noiseLow
     164.68       -11.72

We can also look at the effects of both explanatory variables at the same time:

> lm(score ~ group + noise, data=a)
Coefficients:
     (Intercept)  grouphyperactive          noiseLow
          193.65            -57.95            -11.71

Perhaps surprisingly, the coefficient on each explanatory variable is the same in the two-variable model as in the one-variable models. It's easy to understand why this is: the assignment of noise levels was made to be orthogonal to group. This was done by blocking on group.

This orthogonality shows up in the ANOVA report; the order of the variables makes no difference:

> summary(aov(score ~ group + noise, data=a))
             Df Sum Sq Mean Sq F value    Pr(>F)
group         1 335762  335762 481.521 < 2.2e-16
noise         1  13724   13724  19.682 1.186e-05
Residuals   397 276826     697

> summary(aov(score ~ noise + group, data=a))
             Df Sum Sq Mean Sq F value    Pr(>F)
noise         1  13724   13724  19.682 1.186e-05
group         1 335762  335762 481.521 < 2.2e-16
Residuals   397 276826     697

The orthogonality was achieved by making the study balanced — evenly splitting the assignment to noise levels within each of the two groups. This can be seen from a simple table of the number of cases with each combination of noise level and group:

> table(a$group, a$noise)
              High Low
  control      100 100
  hyperactive  100 100

There are 400 cases altogether, evenly split among the four different possible combinations of noise level and group membership.

Look back at the ANOVA reports, paying particular attention to the degrees of freedom. Each of the variables, group and noise, contributes one degree of freedom. This makes sense because each of those variables has two levels, and the number of degrees of freedom is one less than the number of levels (since there is redundancy with the vector of 1s). But the table shows that there are four different explanatory conditions: normal with low noise, normal with high noise, hyperactive with low noise, and hyperactive with high noise. Seen this way, it seems that there should be three degrees of freedom in the model (one less than the four different explanatory conditions).

What's happened to the third degree of freedom in the model?

The answer is that there is nothing in the model score ~ group + noise to reflect the possibility that noise might affect the different groups differently. Here is a boxplot showing the score for each of the four different explanatory conditions.
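The figure is not shown here, but a minimal sketch of one way to draw it, using the data frame a from above, is:

> # one box for each of the four group-by-noise combinations
> boxplot(score ~ group + noise, data=a)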
Notice that for the hyperactive subjects, high noise is associated with lower scores, while for the control subjects the opposite is true.

This is an interaction between group and noise. An interaction term, in general, describes a situation where the association of the response variable with one explanatory variable depends on the level of another explanatory variable.

An interaction term, in the R-language syntax, is specified using a colon. Here is the model that includes the interaction between group and noise:

> lm(a$score ~ a$group + a$noise + a$group:a$noise)
Coefficients:
                  (Intercept)             a$grouphyperactive
                       210.65                         -91.94
                   a$noiseLow  a$grouphyperactive:a$noiseLow
                       -45.71                          67.99

The fourth coefficient gives the amount to be added for hyperactive kids who are in a low-noise environment. Adding up the coefficients appropriately for the four combinations of the crossed factors gives the same model values as the four group means we encountered in the t-tests.

A shorthand for including the main effects of both factors along with their interaction is to use the multiplication symbol: a$score ~ a$group*a$noise

In constructing the model matrix, each of the main effects in the model corresponds to one or more vectors. The interaction term is constructed by taking the products of the vectors that arise from the levels of the main effects.

Interaction effects apply not just to nominal variables, but to quantitative variables as well. For example, in the hotdog data, we might be interested in constructing a model of calories as a function of hotdog type and of the sodium content:

> lm(hotdogs$cals ~ hotdogs$type + hotdogs$sodium)
Coefficients:
(Intercept)
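The model above includes only the main effects of type and sodium. The * shorthand introduced earlier would add their interaction, letting the relationship between calories and sodium differ by hotdog type. A minimal sketch, assuming the hotdog data are in a file named hotdogs.csv (the file name is an assumption; the column names cals, type, and sodium follow the call above):

> # assumed file name; columns cals, type, sodium as used above
> hotdogs = read.csv('hotdogs.csv')
> # main effects of type and sodium plus their interaction:
> # each hotdog type gets its own intercept and its own sodium slope
> lm(cals ~ type * sodium, data=hotdogs)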

