Appeared in: Proceedings of the Class of 2003 Senior Conference, pages 16–22, Computer Science Department, Swarthmore College

Using Semantic Information from Neural Networks to Detect Context-Sensitive Spelling Errors

Julie Corder
Swarthmore College, CS97, Spring 2003

Abstract

This paper proposes a means of using the internal representations of an artificial neural network to represent the semantic contexts in which a word can appear. Once the network has been trained, its hidden layer activations are recorded as a representation of the average context in which a word can appear. This context can then be compared to the contexts in which a word appears in novel text to detect context-sensitive spelling errors. While no significant results are found in the trials described here, several modifications of the system are proposed that might prove promising in future work.

Introduction

Context-sensitive spelling correction is the process of identifying words in written text that are spelled correctly but are used in the wrong context. Kukich [1992] discusses various studies showing that between 25% and 40% of spelling errors in typed text result in legal words. This category of spelling errors includes word pairs that are easily mistyped (e.g. "form" and "from"), homophones (e.g. "they're", "their" and "there"), and words with similar usages (e.g. "affect" and "effect"). Because all of these errors produce valid words, an approach that relies on just a dictionary look-up will not detect them as spelling errors. Further, Atwell and Elliott [1987] found that 20% to 38% of errors in texts from a variety of sources produced valid words that did not create local syntactic errors. Since dictionary- and syntax-based approaches cannot detect most context-sensitive spelling errors, semantic clues must be taken into account to determine whether the correct word is being used in a given context.
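As a concrete illustration of this limitation, the following minimal sketch (the word list and example sentence are illustrative, not from the paper) shows a pure dictionary look-up accepting a sentence that misuses "form" for "from":

    # Minimal sketch of dictionary-based spell checking; the word list and
    # example sentence are illustrative only.
    dictionary = {"i", "got", "this", "form", "from", "the", "library"}

    def dictionary_errors(sentence):
        """Return the words that fail a plain dictionary look-up."""
        return [w for w in sentence.lower().split() if w not in dictionary]

    # "form" is a typo for "from", but it is a valid word, so nothing is flagged.
    print(dictionary_errors("I got this form the library"))  # -> []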
Previous Work

Instead of relying on a comparison to a dictionary of valid words, researchers interested in context-sensitive spelling correction must find ways to represent the semantic context in which a word occurs to determine if it is spelled correctly. This approach may be as simple as calculating statistical probabilities of words appearing in certain n-grams, or it may involve deeper syntactic and semantic analysis of a corpus. Jones and Martin [1997] report accuracy rates of 56% to 94% for various sets of confusable words using Latent Semantic Analysis. Granger [1983], Ramshaw [1989] and others have used expectation-based techniques. Their systems maintain a list of words that they expect to see next in a corpus based on semantic, syntactic, and pragmatic information in the text. If the next word that appears is not on the list of expected words, it is marked as a spelling error. In this way, the systems can both detect spelling errors and learn the meaning of new words (by comparing a novel word to the meanings of the words that were expected in its place).

In all of these cases, though, the researcher must specify the level of information that is relevant to the task. Jones and Martin [1997], for example, specifically tell their system to look at a window of seven words before or after the word in question to build the initial matrices for their analysis. They rely on the researcher to determine how big the window should be. Further, since they look at words both before and after the word in question, their method is only useful with complete texts.

These limitations can, perhaps, be avoided by a system that incorporates a neural network. Artificial neural networks (ANNs) are well suited to a variety of NLP tasks because they can develop their own characterization of which features of a problem are most significant. In addition, simple recurrent networks store a copy of their previous hidden layer activations, which lets them build up abstract representations of patterns that occur over time [Elman et al. 1996]. Thus, a simple recurrent network should be able to develop an abstract representation of the current context from its internal representation of any number of the words that come before the current word. Given this context, an expectation-based system should be able to predict which words can come next in the text; if the actual next word is not on this list, it should be marked as a spelling error. Further, this system can be combined with a shortest-path algorithm to select a word from the list as the correction, as Wang and Jean [1993] did to correct spelling errors resulting from character merging during OCR. Because this method does not look at future words, it would be useful in applications like word processing systems, where the most recently entered word can be examined for a potential context-sensitive spelling error before more text is entered.
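The preview does not give the network's architecture or training details; the following is only a minimal sketch of the simple recurrent (Elman) network idea described above, with illustrative layer sizes, untrained random weights, and one-hot inputs as assumptions:

    import numpy as np

    # Minimal sketch of a simple recurrent (Elman) network forward pass.
    # Layer sizes, weights, and word indices are illustrative; the paper's
    # actual architecture and training procedure are not given in this preview.
    VOCAB, HIDDEN = 25, 10
    rng = np.random.default_rng(0)
    W_xh = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))   # input   -> hidden
    W_hh = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))  # context -> hidden
    W_hy = rng.normal(scale=0.1, size=(VOCAB, HIDDEN))   # hidden  -> output

    def step(x, h_prev):
        """One time step: the previous hidden layer is fed back as context."""
        h = np.tanh(W_xh @ x + W_hh @ h_prev)
        y = np.exp(W_hy @ h)
        return h, y / y.sum()  # softmax: a distribution over next words

    h = np.zeros(HIDDEN)
    for word_index in [3, 7, 12]:  # an illustrative word sequence
        x = np.zeros(VOCAB)
        x[word_index] = 1.0
        h, expected = step(x, h)

After training, a next word whose probability in expected fell below some threshold would be flagged, in line with the expectation-based scheme described above.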
Methods

One of the most limiting aspects of neural networks is that the time needed to train them increases rapidly with the size of the network. To test my method, it was necessary to drastically limit the size of the input representation for the network. Consequently, a very small vocabulary represented as localist binary vectors was used to encode the corpus. Vocabulary words were represented by vectors whose length was equal to the number of words in the vocabulary. For a vocabulary of twenty-five words, then, only twenty-five bits were needed to represent any given word. Each vector consisted of exactly one bit that was "on"; the rest of the bits were set to zero.
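A minimal sketch of this localist (one-hot) encoding, with a small illustrative vocabulary standing in for the paper's twenty-five words:

    # Sketch of the localist one-hot encoding described above. The vocabulary
    # here is illustrative; a 25-word vocabulary would yield 25-bit vectors.
    vocabulary = ["the", "company", "said", "PUNCT", "CD"]

    def encode(word):
        """Return a binary vector with exactly one bit on."""
        vec = [0] * len(vocabulary)
        vec[vocabulary.index(word)] = 1
        return vec

    print(encode("said"))  # -> [0, 0, 1, 0, 0]

(The pseudowords PUNCT and CD are explained in the next paragraph.)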
Training and testing data came from a part-of-speech-tagged Wall Street Journal corpus. Several categories of words were collapsed into a single "pseudoword" based on part of speech as a means of decreasing the vocabulary size. In particular, words in the part-of-speech categories NN, NNP, NNS, JJ, VBD, VBN, VBZ, DT, and MD were recorded in the training data only by their part-of-speech class. Further, all punctuation marks were collapsed into the single pseudoword PUNCT. Finally, all numerals and number words were changed to the pseudoword CD, since each individual number is relatively uncommon in the training text but numbers can usually appear in the same positions in texts. The remaining