Nov. 21, 2006   LEC 14   ECON 240A   L. Phillips

Goodman Log-Linear Model for Qualitative Data

I. Introduction

Goodman's log-linear model can be used to extend bivariate analysis for qualitative variables from 2x2 chi-square tables, or more generally mxn tables, to trivariate and multivariate analyses. This tool is the analog to multivariate linear regression: it explores the relationship between a dependent variable and an explanatory variable, conditional on other independent variables. Analysis of the relationship between two variables is called two-way analysis; between three variables, three-way analysis; and so on. However, this technique is seldom used beyond four-way or five-way analysis, in contrast to multivariate regression. Another difference is that if there are three or more categories for a qualitative variable, non-linear aspects of a relationship can be revealed.

We will use an example of brand preference for detergent, dependent on water hardness (soft, medium, hard) and prior use of brand M (yes or no). This example is from Yvonne Bishop, Stephen Fienberg, and Paul Holland, Discrete Multivariate Analysis: Theory and Practice. We begin by developing the log-linear model for two-way analysis and then extend it to three-way analysis for application to detergent preference. If one of the variables in the analysis is clearly the dependent variable, then the log-linear model can be simplified and expressed in terms of the logarithm of the odds. In the case of this example, that is the log odds of preferring brand X to brand M.

II. Survey of Detergent Preference

The survey of detergent preference included 1008 people (the counts tabulated below sum to 1007). They were asked a number of questions. One question was whether they preferred brand X or brand M. Another was the hardness of their water. A third was whether they had previously used brand M or not. The results of the survey are described in the tables below, depending on whether they had previously used brand M
or not.

Table 1A: Previous Use of Brand M: Yes

Water Hardness    Prefer Brand X    Prefer Brand M    Totals
Soft                    76                77            153
Medium                  70               102            172
Hard                    61                95            156
Totals                 207               274            481

Table 1B: Previous Use of Brand M: No

Water Hardness    Prefer Brand X    Prefer Brand M    Totals
Soft                    92                80            172
Medium                  99                73            172
Hard                   110                72            182
Totals                 301               225            526

As a preview of coming attractions, we will see how revealing looking at the odds can be. In the next table we list the odds of preferring brand X to brand M as they vary with water hardness and prior use of brand M. The odds are calculated by dividing column 2 by column 3 in Table 1A, and proceeding in a similar fashion for Table 1B.

Table 2: Odds of Preferring Brand X Over Brand M vs. Water Hardness and Prior Use

Water Hardness    Prior Use of Brand M    No Prior Use of Brand M
Soft                     0.987                    1.15
Medium                   0.686                    1.36
Hard                     0.642                    1.53

By comparing columns 2 and 3 of Table 2, it is apparent that the odds of preferring brand X are higher for those with no prior use of brand M. Note that for these consumers, the odds of preferring brand X increase with water hardness. In contrast, for those consumers who had used brand M previously, the odds for brand X decrease with water hardness. It would appear that preference for detergent brand X not only depends on water hardness, but that the nature of that dependence is conditional on whether or not there has been prior use of brand M. From this exploratory analysis using the odds approach, it would appear that three-way analysis is appropriate.

To begin at a simple starting point, we collapse Tables 1A and 1B into a 2x2 two-way analysis by summing over water hardness. The survey results are reported in Table 3. Using the row sums and column sums from Table 3, as well as the grand total of 1007, we calculate the marginal probabilities reported in Table 4. Using the marginal probabilities from Table 4 and the grand total of 1007, we calculate the expected cell
counts reported in Table 5.

Table 3: 1007 People Given Two Brands of Detergent, Observed Counts

                     Brand X    Brand M    Totals
Prior Use of M         207        274        481
No Prior Use of M      301        225        526
Totals                 508        499       1007

Table 4: 1007 People Given Two Brands of Detergent, Marginal Probabilities

                     Brand X    Brand M    Totals
Prior Use of M                              0.4777
No Prior Use of M                           0.5223
Totals                0.5045     0.4955     1

Table 5: 1007 People Given Two Brands of Detergent, Expected Cell Counts

                     Brand X    Brand M    Totals
Prior Use of M        242.7      238.4
No Prior Use of M     265.3      260.6
Totals                                      1007

Lastly, using the observed cell counts in Table 3 and the expected cell counts in Table 5, we calculate the contribution to chi-square for each cell, as reported in Table 6.

Table 6: 1007 People Given Two Brands of Detergent, Contribution to $\chi^2$

                     Brand X    Brand M
Prior Use of M         5.3        5.3
No Prior Use of M      4.8        4.9

Totals: $\chi^2_1 = 5.3 + 4.8 + 5.3 + 4.9 = 20.3$

The critical value at the 5% level is 3.84, so we reject the null hypothesis of no association between brand preference for these two detergents and prior use of brand M.

III. Two-Way Log-Linear Model: Preference Between Two Brands vs. Prior Use of One

A. The Model

The probability of each of the cells, for example in Table 3, is $P_{ij}$. For example, the observed probability for the first row and the first column is $P_{11} = 207/1007 = 0.206$. The probabilities for each cell are postulated to depend on the exponential of a linear function of a number of parameters, where the superscripts B and U refer to brand (X or M) and prior use of brand M (yes or no). The subscripts i and j refer to the row and column in the 2x2 table.

$$P_{ij}^{BU} = \exp\left(u + u_i^{B} + u_j^{U} + u_{ij}^{BU}\right) \qquad (1)$$

Taking natural logarithms,

$$\ln P_{ij}^{BU} = u + u_i^{B} + u_j^{U} + u_{ij}^{BU} \qquad (2)$$

hence the name log-linear model. The parameter $u$ is an overall effect. There are two parameters $u_i^{B}$, one for each row, or brand. Similarly, there are two parameters $u_j^{U}$, one for each column, or prior use
of brand M (yes or no). Lastly, there are four parameters $u_{ij}^{BU}$, one for each of the four cells in Table 3. Thus the log-linear model has a total of nine parameters, but we only need to fit four observed probabilities, one for each cell in this 2x2 table. Thus the model is …
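
The arithmetic behind the odds in Table 2 and the chi-square test in Tables 3 through 6 can be checked with a short script. The following is only a sketch in plain Python: the variable and function names are illustrative assumptions and do not come from the lecture. It uses unrounded marginal probabilities, so individual figures may differ in the last digit from the tables, which appear to be built from rounded intermediate values.

```python
# Observed counts from Tables 1A and 1B, keyed by water hardness.
# Each value is (prefer brand X, prefer brand M).
# All names here are illustrative, not taken from the lecture notes.
prior_use = {"soft": (76, 77), "medium": (70, 102), "hard": (61, 95)}
no_prior_use = {"soft": (92, 80), "medium": (99, 73), "hard": (110, 72)}


def odds_x_over_m(table):
    """Odds of preferring brand X to brand M at each hardness level."""
    return {level: x / m for level, (x, m) in table.items()}


def collapse(table):
    """Sum over water hardness, giving [brand X total, brand M total]."""
    x_total = sum(x for x, _ in table.values())
    m_total = sum(m for _, m in table.values())
    return [x_total, m_total]


# Table 2: odds by hardness, with and without prior use of brand M.
odds_prior = odds_x_over_m(prior_use)
odds_no_prior = odds_x_over_m(no_prior_use)

# Table 3: the collapsed 2x2 table (rows: prior use yes, prior use no).
observed = [collapse(prior_use), collapse(no_prior_use)]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Table 5: expected counts under independence,
# n * (row total / n) * (column total / n) = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Table 6: each cell's contribution, (observed - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in contributions)

CRITICAL_5_PERCENT = 3.84  # critical value quoted in the notes

for level in prior_use:
    print(f"{level:7s} {odds_prior[level]:.3f} {odds_no_prior[level]:.3f}")
print(f"chi-square = {chi_square:.1f}, reject = {chi_square > CRITICAL_5_PERCENT}")
```

Keeping the counts keyed by hardness level means the same data feeds both the odds comparison and the collapsed 2x2 test, mirroring the order in which the notes move from Tables 1A and 1B to Table 3.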