Appeared in: Proceedings of the Class of 2003 Senior Conference, pages 1–7,
Computer Science Department, Swarthmore College

Hebrew Vowel Restoration With Neural Networks

M. Spiegel and J. Volk
Swarthmore College, Computer Science Dept.
spiegel,volk@cs.swarthmore.edu

Abstract

Modern Hebrew is written without vowels, presenting a problem for those wishing to carry out lexical analysis on Hebrew texts. Although fluent speakers can easily replace vowels when reading or speaking from a text, there are no simple rules that would allow this task to be easily automated. Previous work in this field has used statistical methods to try to solve this problem. Instead, we use neural networks, in which letter and morphology information are fed into a network as input and the output is the proposed vowel placement. Using a publicly available Hebrew corpus containing vowel and morphological tagging, we were able to restore 85% of the correct vowels to our test set, and we achieved an 87% success rate for restoring the correct phonetic value of each letter. While our results do not compare favorably to previous results, we believe that, with further experimentation, our connectionist approach could be made viable.

1 Introduction

As would be expected from the lack of vowels, Hebrew text contains a large number of ambiguous words. Levinger et al. (1995) calculated, using a Modern Hebrew newspaper corpus, that 55% of Hebrew words are ambiguous.[1] This ambiguity can take several forms: different vowel patterns can be used to distinguish between multiple verb or noun forms, as well as between multiple forms of other parts of speech, such as certain prepositions. Vowel patterns also distinguish between two unrelated words in certain cases. For example, the consonant string SPR[2] can be vowel-tagged in one way such that it means "book" and in another way such that it means "to count". This consonant string also has four other possible vowel patterns, each with a different translation.

[1] It is unclear whether this refers to types or tokens.
[2] Throughout this paper we have used an ASCII representation in place of actual Hebrew letters and vowels.

The problem is further complicated by the fact that Hebrew has twelve vowels, designated in our corpus (discussed below) as A, F, E, ", I, O, U, W., :, :A, :E, :F. However, the number of vowels can sometimes be simplified by using phonetic groupings. These are groups of vowels for which all the vowels in the group produce equivalent (or near-equivalent) sounds. As will be discussed later, in certain situations it is enough to produce a vowel from the phonetic group of the target vowel, rather than having to produce the exact vowel. We have identified the following phonetic groups: A, F, :A, :F, each of which produces, roughly, the sound "ah", and E, :E, each of which produces, roughly, the sound "eh".

We believe that there are two main areas to which this work could be applied, each of which demands somewhat different goals. The first is automated speech generation, which, of course, requires vowel restoration. For this task we would only need phonetic group accuracy, because the vowels within a phonetic group all make the same sound in spoken Hebrew. The second task is the restoration of vowels to Hebrew texts for the purpose of aiding people who are learning the language, either children learning it as their first language or adults. For this task, we would not be able to combine phonetic groups.
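The approach is described only at a high level at this point: letter context and word-level morphology information go into the network, and a per-letter vowel tag comes out. The sketch below shows one minimal way such a classifier could be set up; it is not the authors' implementation. The window size, hidden-layer width, consonant inventory, morphology tag set, and the extra "no vowel" label are illustrative assumptions; only the twelve ASCII vowel tags come from the paper.

```python
import torch
import torch.nn as nn

# The twelve ASCII vowel tags listed in the introduction, plus an assumed
# "no vowel" label ("-").
VOWELS = ["A", "F", "E", '"', "I", "O", "U", "W.", ":", ":A", ":E", ":F", "-"]
# Placeholder ASCII consonant inventory and coarse morphology tags; these are
# illustrative assumptions, not taken from the paper or its corpus.
CONSONANTS = list("ABGDHWZX+IKLMNS&PCQR$T")
MORPH_TAGS = ["noun", "verb", "prep", "other"]

WINDOW = 5  # letters of context centered on the target letter (assumed)
N_FEATURES = WINDOW * len(CONSONANTS) + len(MORPH_TAGS)

def encode(window_letters, morph_tag):
    """One-hot encode a window of consonants plus the word's morphology tag."""
    x = torch.zeros(N_FEATURES)
    for i, c in enumerate(window_letters):
        if c in CONSONANTS:  # padding positions outside the word stay all zero
            x[i * len(CONSONANTS) + CONSONANTS.index(c)] = 1.0
    x[WINDOW * len(CONSONANTS) + MORPH_TAGS.index(morph_tag)] = 1.0
    return x

# A small feed-forward network; in practice the weights would be trained on
# the vowel-tagged corpus with a cross-entropy loss.
model = nn.Sequential(
    nn.Linear(N_FEATURES, 64),
    nn.ReLU(),
    nn.Linear(64, len(VOWELS)),
)

# Predict a vowel tag for the middle letter of the padded window S-P-R.
features = encode(["_", "S", "P", "R", "_"], "noun")
predicted = VOWELS[model(features).argmax().item()]
print(predicted)
```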
2 Previous Work

Previous relevant work falls into two main categories: work done in the field of Hebrew vowel restoration, and work done using neural networks to solve lexical problems which could be similar to vowel restoration. Note that these categories are mutually exclusive; to the best of our knowledge, no previous work has attempted to combine these fields as we are doing.

This work makes use of a number of performance metrics, which are defined as follows. Word accuracy is the percentage of words which have their complete vowel pattern restored exactly. Letter accuracy is the percentage of letters which are tagged with the correct vowel. W-phonetic group accuracy is word accuracy, allowing for vowels to be substituted for others within a given phonetic group. L-phonetic group accuracy is letter accuracy, allowing for vowels to be substituted within phonetic groups. (A code sketch of these metrics appears at the end of this section.)

In the field of Hebrew vowel restoration, the most recent work can be found in Gal (2002). This paper attempts to perform vowel restoration on both Hebrew and Arabic using hidden Markov models. Using a bigram HMM, Gal achieved 81% word accuracy for Hebrew and 87% W-phonetic group accuracy. He did not calculate letter accuracy, but we have to assume that it would have been higher than 81%. Gal also used the Westminster Database as a corpus and calculated that approximately 30

Similar work was also done in Yarowsky (1994), where he addressed accent restoration in Spanish and French. He, like Gal, uses statistical methods, in this case decision lists. His decision lists rely on both local and document-wide collocational information. While this task is quite similar to ours, French and Spanish accent patterns are much less ambiguous than Hebrew vowel patterns; Yarowsky cites baseline values in the 97–98% range. Given this baseline, he is able to achieve 99% accuracy.

Gal also cites a Hebrew morphological analyzer called Nakdan Text (Choueka and Neeman 1995), which uses context-dependent syntactic rules and other probabilistic rules. Nakdan Text uses the morphological information that it creates as a factor in vowel placement, meaning that its results are most comparable to ours obtained using morphology tags. Nakdan Text achieved 95% accuracy, but the authors do not specify whether this is word accuracy or letter accuracy.

In the field of neural networks, networks have, in a number of past experiments, been applied to part-of-speech tagging problems, as well as used to solve other lexical classification problems. For example, Hasan and Lua (1996) applied neural nets to the problems of part-of-speech tagging and semantic category disambiguation in Chinese. In Schmid (1994), the author describes his Net-Tagger software, which uses a connectionist approach to solve part-of-speech tagging. Schmid's Net-Tagger software performs as well as a trigram-based tagger and outperforms a tagger based on
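As a concrete reference for the four metrics defined above, here is a minimal sketch (not from the paper) of how they could be computed over a vowel-tagged test set, using the corpus's ASCII vowel notation. The two phonetic groups are the ones identified in the introduction; representing each word as a list of per-letter vowel tags is our assumption.

```python
# Map each vowel tag to its phonetic group; vowels outside the two named
# groups from the introduction are treated as ungrouped.
PHONETIC_GROUP = {
    "A": "ah", "F": "ah", ":A": "ah", ":F": "ah",
    "E": "eh", ":E": "eh",
}

def same_phonetic_group(predicted, target):
    """True if the tags are identical or belong to the same phonetic group."""
    if predicted == target:
        return True
    gp, gt = PHONETIC_GROUP.get(predicted), PHONETIC_GROUP.get(target)
    return gp is not None and gp == gt

def letter_accuracy(pred_words, gold_words, match=lambda p, t: p == t):
    """Fraction of letters whose vowel tag satisfies `match` (exact by default)."""
    total = correct = 0
    for pred, gold in zip(pred_words, gold_words):
        for p, t in zip(pred, gold):
            total += 1
            correct += match(p, t)
    return correct / total

def word_accuracy(pred_words, gold_words, match=lambda p, t: p == t):
    """Fraction of words whose entire vowel pattern satisfies `match`."""
    correct = sum(
        all(match(p, t) for p, t in zip(pred, gold))
        for pred, gold in zip(pred_words, gold_words)
    )
    return correct / len(gold_words)

# Toy example: each word is a list of per-letter vowel tags.
gold = [["A", "E"], [":A", "I"]]
pred = [["F", "E"], [":F", "O"]]
print(word_accuracy(pred, gold))                         # word accuracy: 0.0
print(letter_accuracy(pred, gold))                       # letter accuracy: 0.25
print(word_accuracy(pred, gold, same_phonetic_group))    # W-phonetic group: 0.5
print(letter_accuracy(pred, gold, same_phonetic_group))  # L-phonetic group: 0.75
```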