Appeared in: Proceedings of the Class of 2003 Senior Conference, pages 16–22, Computer Science Department, Swarthmore College

Using Semantic Information from Neural Networks to Detect Context-Sensitive Spelling Errors

Julie Corder
Swarthmore College, CS97, Spring 2003

Abstract

This paper proposes a means of using the internal representations of an artificial neural network to represent the semantic contexts in which a word can appear. Once the network has been trained, its hidden layer activations are recorded as a representation of the average context in which a word can appear. This context can then be compared to the contexts in which a word appears in novel text to detect context-sensitive spelling errors. While no significant results are found in the trials described here, several modifications of the system are proposed that might prove promising in future work.

Introduction

Context-sensitive spelling correction is the process of identifying words in written text that are spelled correctly but are used in the wrong context. Kukich [1992] discusses various studies showing that between 25% and 40% of spelling errors in typed text result in legal words. This category of spelling errors includes word pairs that are easily mistyped (e.g. "form" and "from"), homophones (e.g. "they're", "their" and "there"), and words with similar usages (e.g. "affect" and "effect"). Because all of these errors produce valid words, an approach that relies on just a dictionary look-up will not detect them as spelling errors. Further, Atwell and Elliott [1987] found that 20% to 38% of errors in texts from a variety of sources produced valid words that did not create local syntactic errors. Since dictionary- and syntax-based approaches cannot detect most context-sensitive spelling errors, semantic clues must be taken into account to determine whether the correct word is being used in a given context.
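As a concrete illustration of this limitation, the following minimal sketch (the word list and example sentence are illustrative, not from the paper) shows a pure dictionary look-up accepting a sentence that misuses "form" for "from":

    # Minimal sketch of dictionary-based spell checking; the word list and
    # example sentence are illustrative only.
    dictionary = {"i", "got", "this", "form", "from", "the", "library"}

    def dictionary_errors(sentence):
        """Return the words that fail a plain dictionary look-up."""
        return [w for w in sentence.lower().split() if w not in dictionary]

    # "form" is a typo for "from", but it is a valid word, so nothing is flagged.
    print(dictionary_errors("I got this form the library"))  # -> []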
Previous Work

Instead of relying on a comparison to a dictionary of valid words, researchers interested in context-sensitive spelling correction must find ways to represent the semantic context in which a word occurs to determine if it is spelled correctly. This approach may be as simple as calculating statistical probabilities of words appearing in certain n-grams, or it may involve deeper syntactic and semantic analysis of a corpus. Jones and Martin [1997] report accuracy rates of 56% to 94% for various sets of confusable words using Latent Semantic Analysis. Granger [1983], Ramshaw [1989] and others have used expectation-based techniques. Their systems maintain a list of words that they expect to see next in a corpus based on semantic, syntactic, and pragmatic information in the text. If the next word that appears is not on the list of expected words, it is marked as a spelling error. In this way, the systems can both detect spelling errors and learn the meaning of new words (by comparing a novel word to the meanings of the words that were expected in its place).

In all of these cases, though, the researcher must specify the level of information that is relevant to the task. Jones and Martin [1997], for example, specifically tell their system to look at a window of seven words before or after the word in question to build the initial matrices for their analysis. They rely on the researcher to determine how big the window should be. Further, since they look at words both before and after the word in question, their method is only useful with complete texts.

These limitations can, perhaps, be avoided by a system that incorporates a neural network. Artificial neural networks (ANNs) are well suited to a variety of NLP tasks because they can develop their own characterization of which features of a problem are most significant. In addition, simple recurrent networks store a copy of their previous hidden layer activations, which lets them build up abstract representations of patterns that occur over time [Elman et al. 1996]. Thus, a simple recurrent network should be able to develop an abstract representation of the current context from its internal representation of any number of the words that come before the current word. Given this context, an expectation-based system should be able to predict which words can come next in the text; if the actual next word is not on this list, it should be marked as a spelling error. Further, this system can be combined with a shortest-path algorithm to select a word from the list as the correction, as Wang and Jean [1993] did to correct spelling errors resulting from character merging during OCR. Because this method does not look at future words, it would be useful in applications like word processing systems, where the most recently entered word can be examined for a potential context-sensitive spelling error before more text is entered.
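The preview does not give the network's architecture or training details; the following is only a minimal sketch of the simple recurrent (Elman) network idea described above, with illustrative layer sizes, untrained random weights, and one-hot inputs as assumptions:

    import numpy as np

    # Minimal sketch of a simple recurrent (Elman) network forward pass.
    # Layer sizes, weights, and word indices are illustrative; the paper's
    # actual architecture and training procedure are not given in this preview.
    VOCAB, HIDDEN = 25, 10
    rng = np.random.default_rng(0)
    W_xh = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))   # input   -> hidden
    W_hh = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))  # context -> hidden
    W_hy = rng.normal(scale=0.1, size=(VOCAB, HIDDEN))   # hidden  -> output

    def step(x, h_prev):
        """One time step: the previous hidden layer is fed back as context."""
        h = np.tanh(W_xh @ x + W_hh @ h_prev)
        y = np.exp(W_hy @ h)
        return h, y / y.sum()  # softmax: a distribution over next words

    h = np.zeros(HIDDEN)
    for word_index in [3, 7, 12]:  # an illustrative word sequence
        x = np.zeros(VOCAB)
        x[word_index] = 1.0
        h, expected = step(x, h)

After training, a next word whose probability in expected fell below some threshold would be flagged, in line with the expectation-based scheme described above.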
Methods

One of the most limiting aspects of neural networks is that the time needed to train them increases rapidly with the size of the network. To test my method, it was necessary to drastically limit the size of the input representation for the network. Consequently, a very small vocabulary represented as localist binary vectors was used to encode the corpus. Vocabulary words were represented by vectors whose length was equal to the number of words in the vocabulary. For a vocabulary of twenty-five words, then, only twenty-five bits were needed to represent any given word. Each vector consisted of exactly one bit that was "on"; the rest of the bits were set to zero.
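A minimal sketch of this localist (one-hot) encoding, with a small illustrative vocabulary standing in for the paper's twenty-five words:

    # Sketch of the localist one-hot encoding described above. The vocabulary
    # here is illustrative; a 25-word vocabulary would yield 25-bit vectors.
    vocabulary = ["the", "company", "said", "PUNCT", "CD"]

    def encode(word):
        """Return a binary vector with exactly one bit on."""
        vec = [0] * len(vocabulary)
        vec[vocabulary.index(word)] = 1
        return vec

    print(encode("said"))  # -> [0, 0, 1, 0, 0]

(The pseudowords PUNCT and CD are explained in the next paragraph.)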
Training and testing data came from a part-of-speech-tagged Wall Street Journal corpus. Several categories of words were collapsed into a single "pseudoword" based on part of speech as a means of decreasing the vocabulary size. In particular, words in the part-of-speech categories NN, NNP, NNS, JJ, VBD, VBN, VBZ, DT, and MD were recorded in the training data only by their part-of-speech class. Further, all punctuation marks were collapsed into the single pseudoword PUNCT. Finally, all numerals and number words were changed to the pseudoword CD, since each individual number is relatively uncommon in the training text but numbers can usually appear in the same positions in texts. The remaining