UW-Madison ECE 539 - Speech sound production - Recognition using recurrent neural networks - D61660

School: University of Wisconsin, Madison

Course: ECE 539 - Introduction to Artificial Neural Network and Fuzzy Systems

Pages: 20

This preview shows page 1-2-19-20 out of 20 pages.

Unformatted text preview:

SPEECH SOUND PRODUCTION: RECOGNITION USING RECURRENT NEURAL NETWORKS

By: Eric Nutt
December 18, 2003

Contents

Discussion and Research
Network Training Results: Appendix A
Important Formulae: Appendix B
pVector – Feature Extraction Software: Appendix C
Network Testing Code: Appendix D
References: Appendix E

Abstract

In this paper I present a study of speech sound production and methods for speech recognition systems. One method for important speech sound feature extraction along with a possible full scale recognition system implementation using recurrent neural networks is presented. Neural network testing results are examined and suggestions for further research and testing are given at the end of this paper.

Speech Mechanisms

Human speech is produced by complex interactions between the diaphragm, lungs, throat, mouth and nasal cavity. The processes which control speech production are phonation, resonation and articulation. Phonation is the process of converting air pressure into sound via the vocal folds, or vocal cords as they are commonly called. Resonation is the process by which certain frequencies are emphasized by resonances in the vocal tract, and articulation is the process of changing the vocal tract resonances to produce distinguishable sounds.

Air is forced up from the lungs by the diaphragm, then passes through the vocal folds at the base of the larynx. If the vocal folds are used to produce sound then that sound is said to be voiced. The vocal tract acts as a cavity resonator forming regions where the sounds produced are filtered. Each resonant region in the spectrum of a speech sound usually contains one peak. These peaks are referred to as the formants.

[Figure: Helmholtz Resonator (left) and Electrical Analog (right)]

The Helmholtz Resonator is an example of a cavity resonator which acts as a lumped acoustic system. The figure on the left is the mechanical resonator with the volume V, the neck length L, and the neck area S.
The electrical analog of the Helmholtz Resonator is given as the figure on the right. The resonant frequency ($\omega_0$) is given by the condition that the reactance of the system goes to zero:

$$\omega_0 = c \left( \frac{S}{L' V} \right)^{1/2}$$

where $c$ is the sound velocity in the medium under consideration and $L'$ is the effective length of the neck, which depends on the shape of the opening. The equations for the electrical analog further depend on the radiation resistance ($R_r$) and the effective stiffness ($s$), which are given in Appendix B [2].

Phonology

Phonology is the study of the smallest distinguishable speech sounds that humans produce. Phonology can be used to break the speech sounds into groups based on how the sound is produced in the vocal tract. The simplest group of speech sounds, or phonemes, is vowels. Vowel sounds are a group of sounds produced using voicing (vibrations of the vocal folds) and unrestricted air flow. Vowel sounds can be distinguished by the first three formants of the vowel spectra, which are attributed respectively to the following articulators: lip opening, shape of the body of the tongue, and the location of the tip of the tongue [4].

Other phonemes are produced by more complex interactions in the vocal tract involving air flow restrictions (consonant phonemes). Stops, or plosives, are an example of more complex phonemes. These phonemes are produced by completely restricting air flow and then releasing the air to make some sound. Examples of stops are /b/ as in boy, /t/ as in tall, /m/ as in make and /n/ as in now.

One method used to distinguish consonant phonemes is to examine the manner of articulation. This method breaks consonant phonemes into groups based on the location and shape of vocal tract articulators. The main consonant groups by manner of articulation are: fricatives, stops or plosives, and affricates. Some of these consonants from American English are given in the table in square brackets, each with an example word containing the phoneme.
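As a numerical check on the Helmholtz resonance formula given earlier in this section, the following minimal Python sketch evaluates the resonant frequency for one example geometry. The function name, the parameter names, and the numeric values (a one-liter cavity with a short neck, in air at roughly room temperature) are illustrative assumptions and do not come from the paper. The effective neck length L' is passed in directly because, as noted above, it depends on the shape of the opening.

```python
import math


def helmholtz_resonance(c, neck_area, effective_neck_length, volume):
    """Return (omega0 in rad/s, f0 in Hz) for a lumped Helmholtz resonator.

    Implements omega0 = c * sqrt(S / (L' * V)) with all quantities in SI units.
    """
    if min(c, neck_area, effective_neck_length, volume) <= 0:
        raise ValueError("all parameters must be positive")
    omega0 = c * math.sqrt(neck_area / (effective_neck_length * volume))
    return omega0, omega0 / (2.0 * math.pi)


# Hypothetical example values (assumptions, not taken from the paper):
# c = 343 m/s, S = 3 cm^2, L' = 5 cm, V = 1 liter.
omega0, f0 = helmholtz_resonance(
    c=343.0,
    neck_area=3.0e-4,
    effective_neck_length=0.05,
    volume=1.0e-3,
)
print(f"omega0 = {omega0:.1f} rad/s, f0 = {f0:.1f} Hz")
```

Because the frequency scales with the square root of S / (L' V), enlarging the cavity or lengthening the neck lowers the resonance, while widening the opening raises it.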
Looking at the waveform and spectrum plots below for /v/ and /s/, one can readily see the difference between them. One of the main causes of difference between these two phonemes is that /v/ is voiced and /s/ is not. The voicing of /v/, which is a result of vocal fold vibrations, is what causes its periodicity. Since /s/ is produced without using the vocal folds, it does not have a periodic structure.

[Figures: Waveform and Spectrum of /s/; Waveform and Spectrum of /v/]

Fricatives    Stops/Plosives    Affricates
[f], fie      [p], pie          [č], chalk
[v], vie      [b], buy          [ǰ], gin
[θ], thigh    [m], my
[ð], thy      [t], tie
[s], sky      [d], dog
[š], shy      [n], now
[ž], zoo      [k], kite
              [g], girl
              [ŋ], king

Below are examples of the phonemes /a/ and /o/. By examining these waveforms and spectra one can easily see that both appear periodic. This is because all vowels are voiced, just like the consonant phoneme /v/ above. The formants are also apparent in the spectrum (which I have pointed out for clarity).

[Figures: Waveform and Spectrum of /a/; Waveform and Spectrum of /I/]

Speech Recognition Introduction

There are several viable methods currently used for speech recognition, including template matching, acoustic-phonetic recognition and stochastic processing. In order to examine the methods used to produce speech sounds I have chosen to try to implement an acoustic-phonetic recognition process. Acoustic-phonetic recognition is based on distinguishing the phonemes of a language. First, the speech is analyzed and a set of phoneme hypotheses is made. These hypotheses correspond to the closest recognized phonemes in the order that they are introduced to the system. Next, the phoneme hypotheses are compared against stored words and the word that best matches the hypothesis is picked [1].

One of the most important aspects of this type of speech recognition is the phoneme feature extraction. Feature extraction is the method of retrieving information that distinguishes each phoneme.
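One concrete example of distinguishing information is the periodicity contrast noted earlier between voiced /v/ and unvoiced /s/. The sketch below is not taken from the paper: it uses synthetic stand-in signals (a sum of harmonics for a voiced-like sound, Gaussian noise for an unvoiced-like sound) and a normalized autocorrelation peak as a hypothetical periodicity score. The sample rate, the pitch search range, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
import math
import random


def normalized_autocorr_peak(signal, min_lag, max_lag):
    """Largest normalized autocorrelation value over lags in [min_lag, max_lag]."""
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        best = max(best, r / energy)
    return best


SAMPLE_RATE = 8000  # Hz (assumed)
N = 800             # 100 ms of samples

# Voiced-like stand-in: 120 Hz fundamental plus two weaker harmonics.
voiced = [
    math.sin(2 * math.pi * 120 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 240 * n / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 360 * n / SAMPLE_RATE)
    for n in range(N)
]

# Unvoiced-like stand-in: white Gaussian noise with a fixed seed.
rng = random.Random(0)
unvoiced = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Search lags corresponding to fundamentals between 400 Hz and 60 Hz.
MIN_LAG = SAMPLE_RATE // 400
MAX_LAG = SAMPLE_RATE // 60

voiced_score = normalized_autocorr_peak(voiced, MIN_LAG, MAX_LAG)
unvoiced_score = normalized_autocorr_peak(unvoiced, MIN_LAG, MAX_LAG)

THRESHOLD = 0.5  # assumed decision threshold
print("voiced-like:", round(voiced_score, 2), voiced_score > THRESHOLD)
print("unvoiced-like:", round(unvoiced_score, 2), unvoiced_score > THRESHOLD)
```

A periodic signal correlates strongly with a copy of itself shifted by one pitch period, so its score is close to 1, while noise has no such lag and scores near 0. Real speech would need framing and windowing, which this sketch omits.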
For this project I have developed a software program called pVector, which can be used to extract phoneme vectors (the features used to distinguish phonemes) from a wave file and store them in a convenient manner. Documentation for pVector is included in Appendix C at the end of this report, and the methods employed by pVector are discussed below in the section labeled Feature Extraction. After the feature vectors are extracted, a method must be implemented which can take a feature vector as input and decide which phoneme this feature vector does, or does not, correspond to.
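Once a phoneme decision is available for each feature vector, the second stage of the acoustic-phonetic process described in the Speech Recognition Introduction compares the sequence of phoneme hypotheses against stored words. The sketch below shows one possible way to do that comparison, using Levenshtein edit distance over phoneme symbols. The choice of edit distance, the function names, the tiny lexicon and its transcriptions are all hypothetical illustrations; the paper does not specify how the best match is computed.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_word(hypothesis, lexicon):
    """Return (word, distance) for the stored word closest to the hypothesis.

    Ties are broken by the order in which words appear in the lexicon.
    """
    return min(
        ((word, edit_distance(hypothesis, phonemes))
         for word, phonemes in lexicon.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical stored words with simplified, made-up transcriptions.
LEXICON = {
    "pie": ["p", "a", "i"],
    "buy": ["b", "a", "i"],
    "tie": ["t", "a", "i"],
    "my": ["m", "a", "i"],
    "dog": ["d", "o", "g"],
}

# A phoneme hypothesis sequence with one misrecognized vowel.
match = best_word(["d", "a", "g"], LEXICON)
print(match)
```

Edit distance tolerates a single substituted, inserted, or dropped phoneme, which is useful when individual phoneme hypotheses are unreliable. A full system would likely weight confusions between acoustically similar phonemes less heavily than this uniform cost does.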
