Econ 513, USC, Fall 2005
Lecture 17: Discrete Response Models
Random Coefficient or Mixed Multinomial Logit Models

Let us first recall some of the properties of the conditional logit. We consider a case with three choices: dinner at Spargo, Watergrill, or McDonalds, $y \in \{S, W, M\}$. There is only one characteristic of the choice that matters: price. To make the comparisons simpler, let us suppose that the prices for the first two are equal and much higher than for the other one: $P_S = P_W > P_M$. The coefficient on this characteristic in the utility function is $\beta < 0$. We leave out the intercepts in the utility function for simplicity (these would capture taste preferences for the three restaurants). So the utilities for the three choices are

$$U_{iS} = \beta P_S + \varepsilon_{iS}, \qquad U_{iW} = \beta P_W + \varepsilon_{iW}, \qquad U_{iM} = \beta P_M + \varepsilon_{iM}.$$

The probability of dinner at Spargo is

$$\Pr(y_i = S) = \Pr\big(U_{iS} = \max(U_{iS}, U_{iW}, U_{iM})\big) = \frac{\exp(\beta P_S)}{\exp(\beta P_S) + \exp(\beta P_W) + \exp(\beta P_M)}.$$

It follows from the IIA (independence of irrelevant alternatives) property of the conditional logit that

$$\Pr(y_i = W \mid y_i \neq S) = \Pr\big(U_{iW} > U_{iM} \mid U_{iS} \neq \max(U_{iS}, U_{iW}, U_{iM})\big) = \frac{\exp(\beta P_W)}{\exp(\beta P_W) + \exp(\beta P_M)}.$$

It is also clear that the probability that $U_{iW} > U_{iM}$ is

$$\Pr(U_{iW} > U_{iM}) = \frac{\exp(\beta P_W)}{\exp(\beta P_W) + \exp(\beta P_M)}.$$

Thus it follows that

$$\Pr(U_{iW} > U_{iM} \mid y_i = S) = \Pr\big(U_{iW} > U_{iM} \mid U_{iS} = \max(U_{iS}, U_{iW}, U_{iM})\big) = \frac{\exp(\beta P_W)}{\exp(\beta P_W) + \exp(\beta P_M)}.$$

So an implication of the IIA property is that the probability that the second choice is Watergrill, given that the first choice is Spargo, is the same as the probability that Watergrill is preferred to McDonalds to begin with. Again, this is very unappealing. Suppose that Spargo raises its prices. Fewer people will go there, and they will go instead to Watergrill and McDonalds. Under IIA, the division of this subpopulation of people who would have gone to Spargo over the two remaining choices is the same as in the population as a whole, even though Watergrill, with the same price as Spargo, is the more natural substitute.

Now suppose that there are two types of people in terms of their price sensitivity. We model this as a discrete mixture model to keep it tractable for the time being: $\beta_i \in \{\beta_L, \beta_H\}$, with $\beta_L < \beta_H < 0$ and $\Pr(\beta_i = \beta_L) = \Pr(\beta_i = \beta_H) = 1/2$. People with $\beta_i = \beta_L$ are more price sensitive and thus less likely to go to Spargo and Watergrill than people
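with $\beta_i = \beta_H$.

Now, in this mixture (random coefficients) model, let us look at the probability

$$\Pr(U_{iW} > U_{iM} \mid y_i = S) \qquad (1)$$

and compare it to the marginal probability $\Pr(U_{iW} > U_{iM})$. The latter is

$$\begin{aligned}
\Pr(U_{iW} > U_{iM}) &= \Pr(U_{iW} > U_{iM} \mid \beta_i = \beta_L)\Pr(\beta_i = \beta_L) + \Pr(U_{iW} > U_{iM} \mid \beta_i = \beta_H)\Pr(\beta_i = \beta_H) \\
&= \frac{1}{2}\cdot\frac{\exp(\beta_L P_W)}{\exp(\beta_L P_W) + \exp(\beta_L P_M)} + \frac{1}{2}\cdot\frac{\exp(\beta_H P_W)}{\exp(\beta_H P_W) + \exp(\beta_H P_M)}.
\end{aligned}$$

To study the probability in (1), it is useful to first consider the probability

$$\Pr(\beta_i = \beta_L \mid y_i = S) = \frac{\Pr(y_i = S \mid \beta_i = \beta_L)\Pr(\beta_i = \beta_L)}{\Pr(y_i = S)} = \frac{\dfrac{\exp(\beta_L P_S)}{\exp(\beta_L P_S) + \exp(\beta_L P_W) + \exp(\beta_L P_M)}\cdot\dfrac{1}{2}}{\dfrac{\exp(\beta_L P_S)}{\exp(\beta_L P_S) + \exp(\beta_L P_W) + \exp(\beta_L P_M)}\cdot\dfrac{1}{2} + \dfrac{\exp(\beta_H P_S)}{\exp(\beta_H P_S) + \exp(\beta_H P_W) + \exp(\beta_H P_M)}\cdot\dfrac{1}{2}}.$$

With $\beta_L < \beta_H < 0$ and $P_W = P_S > P_M$ it follows that

$$\frac{\exp(\beta_L P_S)}{\exp(\beta_L P_S) + \exp(\beta_L P_W) + \exp(\beta_L P_M)} < \frac{\exp(\beta_H P_S)}{\exp(\beta_H P_S) + \exp(\beta_H P_W) + \exp(\beta_H P_M)}$$

(with $P_S = P_W$, each side equals $1/\{2 + \exp(\beta(P_M - P_S))\}$, which is increasing in $\beta$ because $P_M < P_S$), and hence that $\Pr(\beta_i = \beta_L \mid y_i = S)$ is less than $1/2 = \Pr(\beta_i = \beta_L)$. Not surprisingly, someone who chooses to eat at Spargo (conditioning on $y_i = S$) is less likely to be a price-sensitive type (a type with $\beta_i = \beta_L$) than a typical person.

Now let us go back to the probability $\Pr(U_{iW} > U_{iM} \mid y_i = S)$. Conditional on $\beta_i$, IIA holds, so

$$\Pr(U_{iW} > U_{iM} \mid y_i = S, \beta_i) = \frac{\exp(\beta_i P_W)}{\exp(\beta_i P_W) + \exp(\beta_i P_M)}.$$

Thus

$$\begin{aligned}
\Pr(U_{iW} > U_{iM} \mid y_i = S) &= \Pr(U_{iW} > U_{iM} \mid y_i = S, \beta_i = \beta_L)\Pr(\beta_i = \beta_L \mid y_i = S) + \Pr(U_{iW} > U_{iM} \mid y_i = S, \beta_i = \beta_H)\Pr(\beta_i = \beta_H \mid y_i = S) \\
&= \frac{\exp(\beta_L P_W)}{\exp(\beta_L P_W) + \exp(\beta_L P_M)}\Pr(\beta_i = \beta_L \mid y_i = S) + \frac{\exp(\beta_H P_W)}{\exp(\beta_H P_W) + \exp(\beta_H P_M)}\big(1 - \Pr(\beta_i = \beta_L \mid y_i = S)\big) \\
&> \frac{1}{2}\cdot\frac{\exp(\beta_L P_W)}{\exp(\beta_L P_W) + \exp(\beta_L P_M)} + \frac{1}{2}\cdot\frac{\exp(\beta_H P_W)}{\exp(\beta_H P_W) + \exp(\beta_H P_M)} \\
&= \Pr(U_{iW} > U_{iM}).
\end{aligned}$$

The inequality holds because, with $P_W > P_M$, the binary probability $\exp(\beta P_W)/\{\exp(\beta P_W) + \exp(\beta P_M)\}$ is increasing in $\beta$, so the $\beta_H$ term is the larger of the two, and conditioning on $y_i = S$ shifts weight toward it since $\Pr(\beta_i = \beta_L \mid y_i = S) < 1/2$.

Thus $\Pr(U_{iW} > U_{iM} \mid y_i = S) > \Pr(U_{iW} > U_{iM})$: the probability that Watergrill is the second choice, given that Spargo is the first choice, is higher than the marginal probability that Watergrill is preferred to McDonalds. Another implication is that increasing the price of a dinner at Spargo reduces the demand for dinner at Spargo, with more of that reduction going to Watergrill than to McDonalds compared to their original shares. This is much more plausible than the IIA property. The implication of this argument is that allowing for heterogeneity in the coefficients can get us around the undesirable properties of the conditional logit model.

Another way of thinking about this approach is to write $\beta_i = \beta_0 + \nu_i$. Then we can write the utilities as

$$U_{iS} = \beta_0 P_S + \eta_{iS}, \qquad U_{iW} = \beta_0 P_W + \eta_{iW}, \qquad U_{iM} = \beta_0 P_M + \eta_{iM},$$

where the three unobserved components $\eta_{iS}$,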
$\eta_{iW}$, and $\eta_{iM}$ are no longer independent, but instead have the structure

$$\eta_{iS} = P_S \nu_i + \varepsilon_{iS}, \qquad \eta_{iW} = P_W \nu_i + \varepsilon_{iW}, \qquad \eta_{iM} = P_M \nu_i + \varepsilon_{iM}.$$

The unobserved components are now positively correlated, with the strength of the correlation depending on the closeness of the observed characteristics (the price in this case). For example, if $\nu_i$ has variance $\sigma_\nu^2$ and is independent of the $\varepsilon$'s, then $\operatorname{Cov}(\eta_{ij}, \eta_{ik}) = P_j P_k \sigma_\nu^2$ for $j \neq k$ and $\operatorname{Var}(\eta_{ij} - \eta_{ik}) = (P_j - P_k)^2 \sigma_\nu^2 + \pi^2/3$, so alternatives with similar prices have unobserved utilities that move together. At the same time, this correlation structure does not add many parameters: in this case we add just a single parameter, the variance of $\nu_i$, and allow for correlations between all unobserved components. This can be both an advantage and a disadvantage: such a structure cannot pick up an arbitrary correlation structure that may exist between the unobserved components in the utility.

More generally, we can model the utility for choice $j$ as

$$U_{ij} = \beta_i' x_{ij} + \varepsilon_{ij},$$

with $\beta_i$ a random coefficient. We can allow the mixture distribution to depend partly on individual characteristics,

$$\beta_i = \Gamma x_i + \nu_i,$$

where the $x_i$ are individual-specific covariates. Recall that the original conditional logit allowed these covariates to enter only additively into the utility function. Now they are allowed to affect both the slope and the intercept of the utility function. Typically researchers make parametric assumptions on the $\nu_i$, e.g. multivariate normal, or a discrete distribution with only a couple of points of support. Let us denote the parameters of the distribution of $\nu_i$ by $\theta$. Obviously we do not have to allow all parameters to vary across individuals. In practice we may only want to do this for the most important covariates, such as prices.

Estimation is difficult for these models. Simply writing down the likelihood function and maximizing it will only work if there are few random coefficients. For example, if the $\nu_i$ are multivariate normal with dimension $K$, evaluating the likelihood function involves computing a $K$-dimensional integral for each observation. That is computationally difficult: accurate approximations based on numerical integration are hard to obtain for $K > 2$. One can sometimes get around this by simulating the likelihood function or the choice probabilities themselves (simulated maximum likelihood).
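The simulation idea in the last paragraph can be sketched as follows, for the special case of a single normally distributed price coefficient, $\beta_i = \beta_0 + \sigma z_i$ with $z_i \sim N(0,1)$. This is a minimal sketch under that assumption; the function names and the data are hypothetical, made up for illustration. Each choice probability is replaced by an average of conditional logit probabilities over $R$ standard normal draws, and the simulated log likelihood sums the logs of these averages. The same draws are reused for every parameter value, which keeps the simulated objective smooth in $(\beta_0, \sigma)$ so that it can be handed to a standard optimizer.

```python
import math
import random

def logit_probs(beta, prices):
    """Conditional logit probabilities for utilities beta * price (numerically stabilized)."""
    v = [beta * p for p in prices]
    m = max(v)
    expv = [math.exp(x - m) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

def simulated_probs(beta0, sigma, prices, draws):
    """Average logit probabilities over draws of beta_i = beta0 + sigma * z."""
    total = [0.0] * len(prices)
    for z in draws:
        for j, p in enumerate(logit_probs(beta0 + sigma * z, prices)):
            total[j] += p
    return [t / len(draws) for t in total]

def simulated_loglik(beta0, sigma, data, draws):
    """Sum over individuals of the log simulated probability of the observed choice."""
    return sum(math.log(simulated_probs(beta0, sigma, prices, draws)[y])
               for prices, y in data)

# Fixed standard normal draws, reused for every parameter value (common random numbers).
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Hypothetical data: (prices of S, W, M; index of the chosen alternative).
data = [([50.0, 50.0, 8.0], 0), ([50.0, 50.0, 8.0], 2),
        ([45.0, 55.0, 10.0], 1), ([60.0, 50.0, 8.0], 2)]

for beta0, sigma in [(-0.05, 0.0), (-0.05, 0.03)]:
    print(beta0, sigma, simulated_loglik(beta0, sigma, data, draws))
```

With $\sigma = 0$ this reduces to the conditional logit likelihood. Because the log of an average is not the average of the logs, the simulated log likelihood is biased for a fixed number of draws, so in practice the number of draws is increased with the sample size.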
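Returning to the Spargo, Watergrill and McDonalds example, the comparison between $\Pr(U_{iW} > U_{iM} \mid y_i = S)$ and $\Pr(U_{iW} > U_{iM})$ can be checked numerically. The sketch below assumes two equally likely types with price coefficients $\beta_L < \beta_H < 0$; the prices and coefficient values are hypothetical, chosen only to satisfy $P_S = P_W > P_M$, and the function names are illustrative. Setting $\beta_L = \beta_H$ collapses the model to the conditional logit, where IIA makes the two probabilities equal.

```python
import math

def logit_probs(beta, prices):
    """Conditional logit probabilities when utility is beta * price + extreme value noise."""
    expv = {j: math.exp(beta * p) for j, p in prices.items()}
    total = sum(expv.values())
    return {j: v / total for j, v in expv.items()}

def pr_w_beats_m(beta, prices):
    """Pr(U_W > U_M) for a given coefficient: a binary logit in W and M."""
    return logit_probs(beta, {"W": prices["W"], "M": prices["M"]})["W"]

def compare(beta_l, beta_h, prices):
    """Two equally likely types with coefficients beta_l and beta_h.

    Returns (Pr(U_W > U_M), Pr(U_W > U_M | y = S), Pr(beta_i = beta_l | y = S)).
    """
    types = [(beta_l, 0.5), (beta_h, 0.5)]
    # Marginal probability: average the binary logit over the type distribution.
    marginal = sum(w * pr_w_beats_m(b, prices) for b, w in types)
    # Bayes' rule: type probabilities among people whose first choice is Spargo.
    joint = [w * logit_probs(b, prices)["S"] for b, w in types]
    post = [j / sum(joint) for j in joint]
    # Within a type IIA holds, so the W-vs-M ranking given y = S is a binary logit.
    conditional = sum(q * pr_w_beats_m(b, prices) for q, (b, _) in zip(post, types))
    return marginal, conditional, post[0]

# Hypothetical prices satisfying P_S = P_W > P_M.
prices = {"S": 50.0, "W": 50.0, "M": 8.0}

# beta_L = beta_H: the conditional logit, where IIA makes the two probabilities equal.
print(compare(-0.05, -0.05, prices))
# beta_L < beta_H < 0: the conditional probability exceeds the marginal one.
print(compare(-0.10, -0.02, prices))
```

In the second case the posterior weight on the price-sensitive type among Spargo diners falls well below one half, which is what drives the conditional probability above the marginal one.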