Chapter 10
Is the Model Any Good? [1]

In the last chapter we built regression models that measured the effects of several explanatory variables on a dependent variable: for example, how educational background, prior experience, years with a company, job level, or gender affect salary. We determined how each explanatory variable, whether numerical or categorical, expressed its effect on salary through its coefficient in the regression equation. The process of building such a model is a statistical one; that is, it involves determining a best-fit equation by calculating how much of the total variation is accounted for by the model. This calculation, in turn, is based on certain probabilistic assumptions concerning how the data is distributed.

The first section of this chapter concerns how confident we can be that the coefficients of our explanatory variables are trustworthy. This is critically important if we are to make decisions based on our understanding of what a model seems to be telling us. We need criteria to determine which explanatory variables are truly significant in affecting the dependent variable, and which are not, if our model is to be at all useful. This section helps us to separate the wheat from the chaff.

The second section of this chapter furthers the process of building more complex and accurate models from several explanatory variables by considering how interactions between the variables themselves might have an effect on the dependent variable. That is, some of these variables might express their effects on the dependent variable in combination with other explanatory variables. In fact, there are even cases in which an explanatory variable appears to have a significant effect only when it is combined with one or more other explanatory variables. For example, it may be that employees’ gender by itself has no significant effect on salary, but gender together with job level might have a negative impact on salary.
That is, the negative effect of gender on salary only has a significant impact when the employee is a female in a higher-level position: the well-known “glass-ceiling” effect. This section, then, concerns not only the effects of several individual explanatory variables on a dependent variable, but also the effects of pairs of them on the dependent variable. You will learn in this chapter how to create multiple regression models with interaction variables built from both numerical and categorical explanatory variables and assess their significance. You will learn how to analyze and interpret these often complex models.

• As a result of this chapter, students will learn
    √ How to determine the trustworthiness of the coefficients of a regression equation
    √ How to determine which coefficients should be kept in a model and which should not
    √ How to interpret models with complex interaction terms involving both numerical and categorical variables
• As a result of this chapter, students will be able to
    √ Determine with 95% confidence the range of values within which regression coefficients fall
    √ Create interaction terms
    √ Identify the reference categories of interaction variables
    √ Use StatPro’s interaction routine to construct dummy variables for interaction variables
    √ Construct a model using interaction terms
    √ Use StatPro’s stepwise regression routine to build complex models

[1] © 2011 Kris H. Green and W. Allen Emerson

10.1 Which coefficients are trustworthy?

In the last chapter, several regression models of EnPact’s employee salary structure were developed in order to determine if female employees earn less than their male counterparts. These models indicate that females do earn less than their male counterparts, often many thousands of dollars a year less, depending on which variables are used in the models.
As EnPact’s Human Resources Director, you are aware that if females do indeed earn substantially less than males, say $5000 a year, then EnPact could be liable for a potentially ruinous multi-million dollar lawsuit. But to what degree can you be confident that these models are indeed producing accurate results?

We will answer this question and related questions in this chapter, but first we need some concepts.

Suppose we have a regression equation with two explanatory variables, X1 and X2, and their coefficients, B1 and B2, respectively:

    dependent variable = constant + B1 × X1 + B2 × X2

If one of the coefficients is zero, say B1, then X1 makes no contribution to the dependent variable no matter what value it takes on, because 0 × X1 = 0, and the equation reduces to

    dependent variable = constant + B2 × X2

In this case, X1 is said to be insignificant.

Just because a coefficient is nonzero, however, does not mean that the variable is necessarily significant. A statistician would warn us that regression coefficients are only estimates [2] and that some of them, in fact, should (or rather could) be zero. The question is, then, can we identify which variables could possibly have zero coefficients and thus be eliminated from our analysis because they are insignificant? The answer is: not with 100% certainty, but we can be 95% confident as to which variables are significant and which are not. When statisticians use the phrase “95% confident,” they mean that 95% of the time we will be able to correctly identify whether a particular variable is or is not significant.

We need to understand two formulations concerning what it means to say that a variable is significant:

1. A variable is significant if we are 95% confident that its coefficient is nonzero

is equivalent to saying

2. A variable is significant if there is less than a 5% chance that its coefficient is zero.

Both of these perspectives concerning the significance of a variable are given to us in regression output and provide slightly different information.

[2] Remember: the data we are working with is a sample rather than the entire population. If we sample the data again, we would get different values for the coefficients in the regression model.

10.1.1 Definitions and Formulas

p-value: The probability that a particular regression coefficient is zero. When p is small, say less than .05, there is only a 5% chance or less that the coefficient is zero.

Significant variable or coefficient: A variable or a coefficient of a variable is significant when its p-value is less than .05. That is, there is less than a 5% chance that
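
The remark that sampling the data again would give different coefficients can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the data are made up (they are not EnPact's), the true coefficient of X1 is set to zero on purpose, and the 95% range is obtained by a bootstrap (refitting on resampled rows), which is a stand-in for the interval that regression software such as StatPro would report, not a reproduction of it. The function names `fit_two_variable_ols`, `make_sample`, and `bootstrap_intervals` are hypothetical helpers written for this example.

```python
import random


def fit_two_variable_ols(x1, x2, y):
    """Least-squares fit of y = constant + b1*x1 + b2*x2.

    Returns (constant, b1, b2). Solves the 2x2 normal equations on
    mean-centered data, so it assumes x1 and x2 are not collinear.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def make_sample(n, seed):
    """Made-up data (assumption): true B1 is 0, true B2 is 2, plus noise."""
    rng = random.Random(seed)
    x1 = [rng.uniform(0, 10) for _ in range(n)]
    x2 = [rng.uniform(0, 10) for _ in range(n)]
    y = [30 + 0.0 * a + 2.0 * b + rng.gauss(0, 5) for a, b in zip(x1, x2)]
    return x1, x2, y


def bootstrap_intervals(x1, x2, y, resamples=2000, seed=0):
    """Refit on rows resampled with replacement; return the middle 95%
    of the resulting estimates as ((b1_low, b1_high), (b2_low, b2_high))."""
    rng = random.Random(seed)
    n = len(y)
    b1_draws, b2_draws = [], []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        _, b1, b2 = fit_two_variable_ols(
            [x1[i] for i in idx], [x2[i] for i in idx], [y[i] for i in idx]
        )
        b1_draws.append(b1)
        b2_draws.append(b2)

    def middle_95(draws):
        ordered = sorted(draws)
        low = ordered[int(0.025 * len(ordered))]
        high = ordered[int(0.975 * len(ordered)) - 1]
        return low, high

    return middle_95(b1_draws), middle_95(b2_draws)


if __name__ == "__main__":
    x1, x2, y = make_sample(200, seed=1)
    constant, b1, b2 = fit_two_variable_ols(x1, x2, y)
    print("estimates:", constant, b1, b2)
    b1_range, b2_range = bootstrap_intervals(x1, x2, y)
    for name, (low, high) in (("B1", b1_range), ("B2", b2_range)):
        verdict = "could be zero" if low <= 0 <= high else "appears nonzero"
        print(name, "95% range:", (low, high), "->", verdict)
```

The estimate of B1 will almost never come out as exactly zero even though X1 truly contributes nothing here, which is why a nonzero estimate alone does not establish significance. What matters is whether the 95% range excludes zero: with this setup the range for B2 should sit well away from zero, while the range for B1 will usually straddle it.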


SJFC MSTI 130 - Lecture notes
