Stanford LING 289 - LING 289 LECTURE NOTES

The winner takes it all – almost. Cumulativity in grammatical variation
Gerhard Jäger & Anette Rosenbach
Robert Munro, for LING 289: Quantitative and Probabilistic Explanations in Linguistics. 03 Dec 2007

Synopsis

The paper argues that Maximum Entropy (MaxEnt) models are preferable to Stochastic Optimality Theory (StOT) models, because MaxEnt models allow low-ranked constraints to 'gang up' on high-ranked constraints. That is, they allow cumulativity. In addition to ganging-up cumulativity, the authors distinguish counting cumulativity: sensitivity to the number of violations of a single constraint. In a sense it is no different from ganging-up cumulativity; it simply allows a higher-ranked constraint to be ganged up on by multiple violations of the same lower-ranked constraint. Many of the arguments follow from existing comparisons of MaxEnt and StOT made by Goldwater and Johnson (2003). The authors give a worked example modeling English genitive variation, demonstrating that MaxEnt models give a better account of the observed data than StOT.

Background: Stochastic OT

Researchers have generally found that StOT is better than standard OT at predicting the relative frequency of outcomes in observed data.1 StOT is similar to standard OT, but instead of hard divisions between constraints, the ranking of constraints is defined by normal distributions on a continuum. Instead of one constraint outranking another in 100% of cases, as in standard OT, it outranks the other constraint p% of the time, where p is determined by the degree to which the two distributions overlap. In other words, StOT extends OT by defining a probability over outcomes for each ranking, not just a single dominant outcome.

1 But not always; cf. Paul Kiparsky (2005).

In standard OT, a1, b1 and d1 are the winners in (1), but it is possible that the dual violations of c2 and c3 are 'worse' than the one violation of c1, so perhaps d2 is a more desirable outcome.
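The intersecting-distributions mechanism described above can be sketched numerically. Assuming, as in Boersma's implementation of StOT, that a constraint's effective rank at evaluation time is its mean value plus independent Gaussian noise, the probability that one constraint outranks another is a normal CDF of the scaled difference of their means. The function name and the noise value sigma = 2.0 are illustrative choices, not taken from the paper:

```python
from math import erf, sqrt

def p_outranks(mu1, mu2, sigma=2.0):
    """Probability that a constraint with mean rank mu1 outranks one with
    mean rank mu2, when each rank is perturbed by independent N(0, sigma)
    evaluation noise (sigma = 2.0 is a conventional choice)."""
    # The difference of the two noisy ranks is N(mu1 - mu2, sigma * sqrt(2));
    # we want the probability that this difference is positive.
    z = (mu1 - mu2) / (sigma * sqrt(2))
    return 0.5 * (1 + erf(z / sqrt(2)))
```

With equal means the two constraints each win half the time (p = 0.5); as the means move apart, the probability approaches the categorical ranking of standard OT.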
Possible outcomes:
• Standard OT will not allow d2 to be the winner.
• StOT can allow d2 to be the winner, but never with more frequency than d1.

If we have data where we observe d2 more frequently than d1 in the context of violations of c2 and c3, we need a probabilistic model of constraints that can predict this.2 The authors demonstrate that MaxEnt models are a good way to achieve this.

Maximum Entropy models

MaxEnt models are also known as log-linear models. They differ from what we've seen in class mostly in the terminology:
• Bias. The amount by which a model differs from the observed data. The least biased of all possible models is therefore the best fit.
• Entropy. An information-theoretic notion that quantifies the bias. The entropy H of a probability distribution p is defined as:

H(p) = -Σx p(x) log p(x)

For all intents and purposes they are using logistic regression, but note that the higher the entropy, the lower the bias. They set up the models as follows:
• Each feature represents a constraint, with each value the number of observed violations.
• The 'rank' of a constraint is the weight given to that feature after the model has been fit.

2 The authors note, citing a personal communication from Paul Boersma, that standard OT could be extended to allow d2 to be the winner by simply stipulating that c2 and c3 combined outrank c1. The authors call this strong cumulativity, as opposed to the weak cumulativity implemented in the paper. Strong cumulativity entails the weak.

English genitive variation

The worked example in the paper is English genitive variation, looking at the various factors that contribute to it. They demonstrate that while animacy is the most important factor, the other factors can interact. Referring to earlier work, they looked at the weight of the NP, but needed to tease NP-weight apart from animacy, as the two can correlate.
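The MaxEnt setup described above (constraint weights as ranks, violation counts as feature values) can be sketched directly. The weights and candidates below are hypothetical, chosen so that the combined cost of violating c2 and c3 exceeds a single violation of the more heavily weighted c1 — the ganging-up configuration from (1):

```python
from math import exp

def maxent_probs(candidates, weights):
    """MaxEnt / log-linear candidate probabilities:
    P(c) is proportional to exp(-sum_i weights[i] * violations[i])."""
    scores = {c: exp(-sum(w * v for w, v in zip(weights, viols)))
              for c, viols in candidates.items()}
    z = sum(scores.values())          # normalizing constant
    return {c: s / z for c, s in scores.items()}

# Hypothetical weights: c1 is the single strongest constraint, but
# w(c2) + w(c3) > w(c1), so two violations of the weaker constraints
# together cost more than one violation of c1.
weights = [3.0, 2.0, 2.0]             # w(c1), w(c2), w(c3)
candidates = {
    "d1": [0, 1, 1],                  # violates c2 and c3
    "d2": [1, 0, 0],                  # violates only c1
}
probs = maxent_probs(candidates, weights)
# d2 comes out more probable than d1, which neither standard OT
# nor StOT can predict.
```

Counting cumulativity falls out of the same arithmetic: n violations of a single constraint simply contribute n times its weight to a candidate's cost.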
They find that the relative strength of animacy and weight is not absolute, but depends on the NP-weight of the possessor. They define NP-weight as the number of pre-modifiers, so it is an example of counting cumulativity: every pre-modifier is modeled as a violation of an NP-weight constraint against modifiers for prenominal genitives. They build models of the data using both MaxEnt and StOT, comparing each to the observed data using the Kullback-Leibler distance. The results show that MaxEnt is a (slightly) better fit. The main set of results is on the next page, where Figure 1 is the observed distribution, Figure 4 the predictions of StOT, and Figure 5 the predictions of MaxEnt.

Conclusions

The paper concludes with a reply to some criticisms of MaxEnt modeling that presumably originated in the StOT community:
1. No evidence for cumulativity has been brought forward so far – this is an isolated phenomenon.
2. MaxEnt models are basically a version of Harmonic Grammar. The factorial typology predicted by HG is much more liberal than the predictions of OT, and the available evidence suggests that OT is closer to the truth…
3. Counting cumulativity can always be avoided by binarizing constraints.
4. StOT is cognitively more realistic than MaxEnt, whatever the mathematical merits of the latter model may be.

To which the authors answer:
1. True.
2. Only for categorical data.
3. True, but the MaxEnt model is simpler and you don't need to make real/integer values categorical.
4. They're equally realistic.

The paper deserves to bring more 'converts' to logistic regression than it will probably generate (if that is its primary goal). By focusing on a fairly narrow set of syntactic features the authors aren't selling the potentially exciting scope of allowing cumulativity. There are many contextual influences that could be modeled.
For example, the existing feature of topicality could be extended to any number of similar pragmatic conditions that would contribute to the outcome non-deterministically.

Some notes for discussion

Feature interaction

The authors demonstrate that a MaxEnt model is a further relaxation of standard OT. But they've relaxed the advantages of a hierarchy of constraints right out the window. For all its shortcomings, standard OT does model some feature interaction.3 To adapt the terminology of the authors, standard OT allows strong cumulative dominance, but MaxEnt does not. Consider:
1. There is an outcome o1 that is always observed when a constraint a1 is inviolate.
2.
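The Kullback-Leibler distance the paper uses to score each model against the observed distribution is straightforward to compute. The distributions below are invented for illustration; they are not the paper's genitive data:

```python
from math import log

def kl_distance(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)): how far a model's
    predicted distribution q is from the observed distribution p
    (0 means a perfect fit; smaller is better)."""
    return sum(px * log(px / q[x]) for x, px in p.items() if px > 0)

# Invented observed frequencies and two competing model predictions:
observed = {"s-genitive": 0.70, "of-genitive": 0.30}
model_a  = {"s-genitive": 0.65, "of-genitive": 0.35}  # the closer fit
model_b  = {"s-genitive": 0.50, "of-genitive": 0.50}
```

Here model_a would be preferred, since kl_distance(observed, model_a) is smaller than kl_distance(observed, model_b).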

