CS224n – Final Project
Joseph Baker-Malone and David Christie

1. Theory and Approach

In this project, we attempted to build an anaphora resolver for parsed sentences, i.e. a system that takes a tree of parsed sentences and links each pronoun to its antecedent noun phrase. We based our approach on the work of Ge et al. [1], which was in turn based on the work of Hobbs [2]. We limit our testing to pronouns and do not attempt to resolve references of non-pronoun words. We also limit our search to pronouns referring to words occurring no more than one sentence earlier in the text. This limitation is lessened somewhat by our reference-assigning method, discussed in the next section.

1.1 Corpus

For our training and test data, we used the Brown Laboratory for Linguistic Information Processing (BLLIP) corpus, which contains parsed sentences as well as co-referential information. This corpus is the output of Ge et al.'s parsing and anaphora resolution algorithm, and thus may not be as accurate as a hand-parsed and tagged corpus. However, it was the only such parsed and tagged corpus available to us, and after evaluation we found it to be accurate enough for our purposes.

As is often the case in such corpora, all phrases referring to the same entity are labeled with the same tag. We only attempt to perform anaphora resolution on pronouns, and so for our purposes we consider each pronoun to refer to the previous non-pronoun phrase with the same tag. While this may be a somewhat nonstandard approach, we believe it is appropriate and useful in this situation. It allows us to limit the search space for each resolution, and it especially improves the performance of the Hobbs algorithm. Additionally, it raises the percentage of pronouns in the testing corpus on which we can perform resolution from only 58% to over 73%.

1.2 The Hobbs Algorithm

The first step in determining the antecedent of a pronoun is to collect a list of possible antecedent candidates. We followed in Ge's footsteps in this respect, using the Hobbs algorithm. The Hobbs algorithm starts by "intelligently" walking through the tree structure of the sentence containing the pronoun, looking for noun phrases that could be possible antecedents and adding them to a list of candidates. For details of the nine steps in the Hobbs algorithm, we refer the reader to reference [2], or to the commented Java code in the method Hobbs.RunHobbs. In a nutshell, the algorithm iteratively uses multiple breadth-first searches in the tree. It first handles reflexive pronouns by looking at the same depth in the tree where the pronoun resides. It then systematically works its way up the tree, proposing possible NP or S candidate nodes.

The original Hobbs formulation simply walked through the tree in this manner and proposed the first antecedent it encountered that matched gender/number qualifications. Ge, however, uses a more probabilistic approach, which we adopted. He defined a metric called the "Hobbs distance" d_H: an antecedent has d_H = n if it was the nth candidate found by the Hobbs algorithm (e.g. if the antecedent was the second node that the Hobbs algorithm found, it would have d_H = 2). He then trained on the data to find P(d_H), which he calculated as follows:

    P(d_H = n) = \frac{\#\ \text{antecedents with}\ d_H = n}{\#\ \text{antecedents}}

Note that this is clearly a probability distribution, as it equals unity when summed across all d_H. He then computed other probabilities (gender, mention count, etc.) separately, and multiplicatively computed a total probability.
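A minimal sketch of this relative-frequency estimate follows; the class and method names are hypothetical, not our actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: estimate P(dH) by relative frequency over the
// training antecedents, exactly as in the equation above.
public class HobbsDistanceModel {
    // countAtDistance.get(n) = number of training antecedents with dH = n
    private final Map<Integer, Integer> countAtDistance = new HashMap<>();
    private int totalAntecedents = 0;

    // Record one training example whose true antecedent had Hobbs distance dH.
    public void observe(int dH) {
        countAtDistance.merge(dH, 1, Integer::sum);
        totalAntecedents++;
    }

    // P(dH = n) = (# antecedents with dH = n) / (# antecedents).
    public double probability(int dH) {
        return countAtDistance.getOrDefault(dH, 0) / (double) totalAntecedents;
    }
}
```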
The following sections describe these various factors we used (following in the footsteps of Ge).

1.3 Gender/Animaticity/Number Probability

We incorporated gender/number information in the same way that Ge did. We first divided the pronouns into various "gender buckets" (male, female, inanimate, unknown plural, and unknown singular). We then collected frequency information on the training data between each pronoun and the individual words in the antecedent NP. The probability of a pronoun (bucketed by gender) given a word in its antecedent is thus:

    P(p \mid a) = (1 - \varepsilon)\,\frac{\#\ \text{occurrences of}\ a\ \text{in an antecedent for}\ p}{\#\ \text{occurrences of}\ a}

And the probability given to a pair not previously seen (an UNK) is:

    P(\mathrm{UNK} \mid a) = \varepsilon

We can show this to be a probability distribution as follows:

    \sum_{p\,\cup\,\mathrm{UNK}} P(p \mid a) = P(\mathrm{UNK} \mid a) + (1 - \varepsilon)\sum_{p}\frac{\#\ \text{occurrences of}\ a\ \text{in an antecedent for}\ p}{\#\ \text{occurrences of}\ a} = \varepsilon + (1 - \varepsilon)\,\frac{\#\ \text{occurrences of}\ a}{\#\ \text{occurrences of}\ a} = 1

Note the smoothing for the case where we have not yet seen the word a paired with pronoun gender bucket p (in our implementation, we used \varepsilon = 0.2, chosen experimentally). This equation leaves us with the question of which word a in the antecedent to choose when testing. We tried computing probabilities over all of the words in the noun phrase (a = the full string of words), but due to data sparseness this actually made our accuracy a bit worse. So we took the solution of Ge and incorporated the Dunning algorithm [3,4] to pick the word in the antecedent phrase that has the highest log likelihood when paired with the pronoun. A summary of the Dunning algorithm and our implementation of it is given in the following section.

1.3.1 Dunning Algorithm

The idea behind the Dunning algorithm is to treat comparisons of word pairs as a binomial process (like the flipping of a coin). A pair of words (say an antecedent word A and a pronoun B) are independent if P(A | B) = P(A | ~B) = P(A), i.e. the probability of A is the same whether B is present, B is absent, or no information about B is given. Thus we can test for independence of A and B by checking for this equality. Of course, we know that the words in general are not independent, but we are looking for pairs that have high correlation (i.e. that appear far from independent). To this end, we calculate the frequencies of antecedent-word/pronoun pairs. For each pair (a, p) of antecedent word a and pronoun p, we compute the following frequencies:

    A = \varepsilon + (1 - \varepsilon)\cdot\#(\text{pairs with}\ a\ \text{and}\ p)\,/\,\#(\text{pairs})
    B = \varepsilon + (1 - \varepsilon)\cdot\#(\text{pairs without}\ a\ \text{but with}\ p)\,/\,\#(\text{pairs})
    C = \varepsilon + (1 - \varepsilon)\cdot\#(\text{pairs with}\ a\ \text{but without}\ p)\,/\,\#(\text{pairs})
    D = \varepsilon + (1 - \varepsilon)\cdot\#(\text{pairs without}\ a\ \text{or}\ p)\,/\,\#(\text{pairs})

where we have \varepsilon = 0.1 for smoothing. We then compute the log likelihood as follows:

    \text{likelihood} = A \log A + B \log B + C \log C + D \log D
                      - (A + B)\log(A + B) - (A + C)\log(A + C)
                      - (B + D)\log(B + D) - (C + D)\log(C + D)
                      + (A + B + C + D)\log(A + B + C + D)

When testing an antecedent NP containing multiple nouns, we choose the noun that yields the highest log likelihood when paired with the pronoun.
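A minimal sketch of both pair-count computations, the smoothed P(p | a) and the Dunning log likelihood, follows; the class and method names are hypothetical, not our actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two pair-count models described above.
public class PairModel {
    private final Map<String, Integer> pairCount = new HashMap<>();   // count of (a, p)
    private final Map<String, Integer> wordCount = new HashMap<>();   // count of a over all pairs
    private final Map<String, Integer> bucketCount = new HashMap<>(); // count of p over all pairs
    private int totalPairs = 0;

    // Record one training pair: antecedent word a with pronoun gender bucket p.
    public void observe(String a, String p) {
        pairCount.merge(a + "\t" + p, 1, Integer::sum);
        wordCount.merge(a, 1, Integer::sum);
        bucketCount.merge(p, 1, Integer::sum);
        totalPairs++;
    }

    // P(p | a) = (1 - eps) * #(a in an antecedent for p) / #(a),
    // and P(UNK | a) = eps for an unseen pair (eps = 0.2 in the text).
    public double genderProbability(String a, String p, double eps) {
        int joint = pairCount.getOrDefault(a + "\t" + p, 0);
        int marginal = wordCount.getOrDefault(a, 0);
        if (joint == 0 || marginal == 0) return eps;  // unseen pair gets the UNK mass
        return (1 - eps) * joint / (double) marginal;
    }

    // Dunning log likelihood of (a, p), with eps-smoothed cell
    // frequencies A..D as defined in the text (eps = 0.1).
    public double dunningLikelihood(String a, String p, double eps) {
        double joint = pairCount.getOrDefault(a + "\t" + p, 0);
        double withA = wordCount.getOrDefault(a, 0);
        double withP = bucketCount.getOrDefault(p, 0);
        double n = totalPairs;
        double A = eps + (1 - eps) * joint / n;                       // a and p
        double B = eps + (1 - eps) * (withP - joint) / n;             // p without a
        double C = eps + (1 - eps) * (withA - joint) / n;             // a without p
        double D = eps + (1 - eps) * (n - withA - withP + joint) / n; // neither
        return A * Math.log(A) + B * Math.log(B) + C * Math.log(C) + D * Math.log(D)
             - (A + B) * Math.log(A + B) - (A + C) * Math.log(A + C)
             - (B + D) * Math.log(B + D) - (C + D) * Math.log(C + D)
             + (A + B + C + D) * Math.log(A + B + C + D);
    }
}
```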

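A hypothetical end-to-end use of the two sketches above, resolving one pronoun against one candidate NP; all data and names below are invented for illustration.

```java
// Hypothetical usage (names are assumptions, not the project's API):
// pick the NP word by Dunning likelihood, then score the candidate
// multiplicatively, as described in the text.
public class ResolverSketch {
    public static void main(String[] args) {
        HobbsDistanceModel distModel = new HobbsDistanceModel();
        PairModel pairModel = new PairModel();

        // Toy training data standing in for the BLLIP counts.
        distModel.observe(1); distModel.observe(1); distModel.observe(2);
        pairModel.observe("chairman", "male");
        pairModel.observe("board", "inanimate");

        // Resolve "he" against the candidate NP "the chairman of the board",
        // proposed by the Hobbs walk at distance dH = 1.
        String bucket = "male";
        String[] nounsInNP = {"chairman", "board"};
        String chosen = nounsInNP[0];
        for (String noun : nounsInNP) {
            if (pairModel.dunningLikelihood(noun, bucket, 0.1)
                    > pairModel.dunningLikelihood(chosen, bucket, 0.1)) {
                chosen = noun;
            }
        }
        double score = distModel.probability(1)
                     * pairModel.genderProbability(chosen, bucket, 0.2);
        System.out.println("chosen word: " + chosen + ", score: " + score);
    }
}
```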
