CORNELL CS 674 - Study Notes


Techniques for Anaphora Resolution: A Survey
Tejaswini Deoskar

1 Introduction
2 Relevance of this problem
2.2 Relevance from the Linguistics point of view
2 Background
2.1 Knowledge-rich Approaches
2.1.1 Syntax-based approaches

These approaches typically assume the existence of a fully parsed syntactic tree and traverse the tree looking for antecedents, applying appropriate syntactic and morphological constraints to them. Hobbs 1977 is a classical result using this approach.

Hobbs' Algorithm

(2) John saw a picture of him. (Hobbs 1977)

2.1.3 Hybrid Approaches
2.1.4 Corpus based Approaches
2.2 Knowledge-poor Approaches

In recent years, there has been a trend towards knowledge-poor approaches that use machine learning techniques. Soon, Ng, and Lim (2001) were the first to obtain results comparable to those of non-learning techniques. They resolved not just pronouns but all definite descriptions. They used a small annotated corpus to obtain training data from which to create feature vectors. These training examples were then given to a machine learning algorithm to build a classifier. The learning method they used is C5, a decision-tree-based algorithm that is a modification of C4.5 (Quinlan 1993). An important point to note about their system is that it is an end-to-end system, which includes sentence segmentation, part-of-speech tagging, morphological processing, noun phrase identification and semantic class determination.

Cardie and Ng (2002) tried to make up for the lack of linguistically motivated features in Soon et al.'s approach. They increased the feature set from 12 to 53.
They introduced additional lexical, semantic and knowledge-based features, along with a large number of additional grammatical features that encode a variety of linguistic constraints and preferences.

3 Comparison of Approaches
4 Evaluation of Performance

Following these guidelines can facilitate a realistic evaluation and comparison of results.

5 Aspect as an indicator of backgrounding or foregrounding
5 Concluding Remarks and Future Work
References

Techniques for Anaphora Resolution: A Survey
Tejaswini Deoskar
CS 674
5/17/2004

1 Introduction

Anaphora resolution is one of the more prolific areas of research in the NLP community. Anaphora is a ubiquitous phenomenon in natural language, and its resolution is thus required in almost every conceivable NLP application. There is a wide variety of work in the area, based on various theoretical approaches.

Simply stated, anaphora resolution is the problem of finding the reference of a noun phrase. This noun phrase can be a fully specified NP (definite or indefinite), a pronoun, a demonstrative or a reflexive. Typically this problem can be divided into two parts:
(i) Finding the co-reference of a full NP (commonly referred to as co-reference resolution)
(ii) Finding the reference of a pronoun or reflexive (commonly referred to as anaphora resolution).
The second part of the problem may be thought of as a subset of the first. Though there are similarities between the two problems, there are significant differences between the function of pronouns and that of full NPs in natural language discourse. Thus a significant difference is seen in their distribution too. For instance, a broad heuristic is that pronouns usually refer to entities that are no farther than 2-3 sentences away, while definite NPs can refer to entities that are quite far away.

In this paper, I examine in detail various approaches in this area, with more focus on anaphora resolution than on noun phrase co-reference resolution.
I look at these approaches from the point of view of understanding the state of the art of the field and also from the view of understanding the interaction between NLP research in the computational linguistics community and theoretical linguistics. Due to the second goal, I have looked at some classical results in the field (such as Hobbs 1977), even though they are dated, since they were motivated mainly by linguistic considerations.

I also note that most knowledge sources in anaphora resolution research have drawn on structural indications of information prominence, but have not considered other sources such as tense and aspect, which may prove to be important knowledge sources.

2 Relevance of this problem

2.1 Relevance to NLP

From the NLP point of view, anaphora resolution is required in most problems, such as question answering, information extraction, text summarization, dialogue interpretation systems, etc. Thus to a large extent, successful end-to-end systems require a successful anaphora resolution module. This implies that the various forms of preprocessing required in anaphora resolution systems, such as noun phrase identification, morphological processing, semantic class determination, etc., are equally relevant to the issue.

2.2 Relevance from the Linguistics point of view

Binding Theory is one of the major results of the principles and parameters approach developed in Chomsky (1981) and is one of the mainstays of generative linguistics. The Binding Theory deals with the relations between nominal expressions and possible antecedents. It attempts to provide a structural account of the complementarity of distribution between pronouns, reflexives and R-expressions [1].
Condition A: A reflexive must be bound in its governing category [2].
Condition B: A pronoun must be free in its governing category.
Condition C: An R-expression must be free.

However, this formulation of the Binding Theory runs into major empirical problems. Currently, various modifications to the standard Binding Theory exist, as do some completely different frameworks (such as Reinhart and Reuland's (1993) semantic predicate-based theory) to explain binding phenomena.

2.3 Dichotomy between Linguistic and NLP Research

The Binding Theory (and its various formulations) deals only with intrasentential anaphora, which is a very small subset of the anaphoric phenomena that practical NLP systems are interested in resolving. A much larger set of anaphoric phenomena involves the intersentential resolution of pronouns. This problem is dealt with by Discourse Representation Theory and more specifically by Centering Theory (Grosz et al., 1995). Centering Theory, being more computationally tractable than most linguistic theories, has

[1] An R-expression is a referring expression like John, Bill, The dog, etc. which identifies an entity in the real
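
As a rough illustration of how Conditions A to C from Section 2.2 could be checked mechanically, the following Python sketch tests coindexed nominals in a toy parse tree. Everything in it is an assumption made for illustration and is not taken from the survey or from any particular system: the Node class, the kind labels ("reflexive", "pronoun", "rexp"), the simplified definition of c-command, and above all the crude approximation of the governing category as the nearest dominating S node. A nominal counts as bound in a domain if a coindexed nominal inside that domain c-commands it.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass(eq=False)  # identity-based equality; nodes are compared with `is`
class Node:
    """Hypothetical parse-tree node; nominals carry a kind and a referential index."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None
    kind: Optional[str] = None   # "reflexive", "pronoun" or "rexp" (nominals only)
    index: Optional[int] = None  # equal indices mean "intended to corefer"
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if a is a proper ancestor of b."""
    n = b.parent
    while n is not None:
        if n is a:
            return True
        n = n.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """Simplified c-command: neither dominates the other, and a's parent dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    return a.parent is not None and dominates(a.parent, b)


def nominals(node: Node) -> Iterator[Node]:
    """Yield every nominal at or below node, in left-to-right order."""
    if node.kind is not None:
        yield node
    for child in node.children:
        yield from nominals(child)


def governing_category(n: Node) -> Optional[Node]:
    """Crude assumption: the governing category is the nearest dominating S."""
    p = n.parent
    while p is not None and p.label != "S":
        p = p.parent
    return p


def is_bound(n: Node, domain: Node) -> bool:
    """n is bound in domain if a coindexed nominal inside domain c-commands it."""
    return any(m.index == n.index and c_commands(m, n) for m in nominals(domain))


def violations(root: Node) -> List[Tuple[str, str]]:
    """Return (condition, word) for each nominal violating Condition A, B or C."""
    out = []
    for n in nominals(root):
        domain = governing_category(n) or root
        if n.kind == "reflexive" and not is_bound(n, domain):
            out.append(("A", n.word))
        elif n.kind == "pronoun" and is_bound(n, domain):
            out.append(("B", n.word))
        elif n.kind == "rexp" and is_bound(n, root):
            out.append(("C", n.word))
    return out


def np(word: str, kind: str, index: int) -> Node:
    return Node("NP", word=word, kind=kind, index=index)


def clause(subject: Node, verb: str, complement: Node) -> Node:
    return Node("S", [subject, Node("VP", [Node("V", word=verb), complement])])


# John_1 saw himself_1: the reflexive is bound in its clause, no violation.
print(violations(clause(np("John", "rexp", 1), "saw", np("himself", "reflexive", 1))))
# John_1 saw him_1: the pronoun is bound in its clause, Condition B is violated.
print(violations(clause(np("John", "rexp", 1), "saw", np("him", "pronoun", 1))))
```

Because the check only looks inside a single tree, it covers intrasentential cases only, which is exactly the limitation of the Binding Theory noted in Section 2.3; intersentential pronoun resolution needs a different kind of model.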

