James Hillenbrand

A significant body of research has examined the intelligibility of sinusoidal replicas of natural speech, and discussion has followed about what the sinewave speech phenomenon implies about the mechanisms underlying phonetic recognition. However, most of this work has used sentence material, making it unclear how much of the effect reflects listeners' use of linguistic constraints and how much reflects lower-level phonetic mechanisms. This study was designed to measure vowel intelligibility using sinusoidal replicas of naturally spoken vowels. The sinusoidal signals were modeled after 300 /hVd/ syllables spoken by men, women, and children. Students enrolled in an introductory phonetics course served as listeners. Recognition rates for the sinusoidal vowels averaged 55%, much lower than the ~95% intelligibility of the original signals. Attempts to improve performance using three different training methods met with modest success, with post-training recognition rates rising by ~5-11 percentage points. Follow-up work showed that more extensive training produced further improvements, with performance leveling off at ~73-74%.
Finally, modeling work showed that a fairly simple pattern-matching algorithm trained on naturally spoken vowels classified sinewave vowels with 78.3% accuracy, indicating that the sinewave speech phenomenon does not necessarily rule out template matching as a mechanism underlying phonetic recognition.

[Table column headings: Vowel Identified by Listener; Vowel Classified by the Narrow Band Model.]
[Figure: 32 ms Fourier spectrum with masking threshold (328 Hz running average), and the masked spectrum with its envelope. x-axis: Frequency (Hz), 0 to 4000.]
[Figure: percent correct by training condition (Control, Feedback, Sentences, Triad). Plotted values, in extraction order: 55.9%, 58.6%, 57.6%, 65.1%, 53.7%, 58.7%, 52.3%, 63.2%.]
[Figure: percent correct by block number. Blocks 1 to 6: 48.6%, 52.3%, 54.0%, 58.3%, 57.6%, 59.0%.]
[Figure: percent correct by session (Pre, T1 to T5, Post): 53.7%, 62.3%, 64.9%, 72.5%, 73.3%, 74.0%, 73.5%.]
[Figure: narrow band model processing. (a) Input spectrum and gain function (inverse of 1266 Hz running average); (b) amplitude after spectrum level normalization, with masking threshold (328 Hz running average); (c) amplitude after masking. Panel labels: Normalized NB Spectrum of Signal to be Classified; Harmonic Spectrum Is Subtracted from Each Spectral Shape Template.]
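The figure labels for the narrow band model name three processing steps: spectrum level normalization with a gain function that is the inverse of a 1266 Hz running average, a masking threshold formed from a 328 Hz running average, and subtraction of the resulting spectrum from each spectral shape template. The Python sketch below is one possible reading of those labels, not the author's implementation. The 16 kHz sampling rate, the Hamming window, the half-wave rectification after subtracting the threshold, the city-block distance, and all function names are assumptions made for illustration.

```python
import numpy as np

def running_average(x, width_bins):
    """Centered moving average with edge padding (width forced to be odd)."""
    width = max(1, int(width_bins)) | 1
    pad = width // 2
    padded = np.pad(x, pad, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def masked_spectrum(frame, fs, level_hz=1266.0, mask_hz=328.0):
    """Narrow band spectrum after level normalization and masking (assumed form)."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    hz_per_bin = fs / len(frame)
    # Spectrum level normalization: multiply by the inverse of a broad running average.
    broad = running_average(spec, level_hz / hz_per_bin)
    norm = spec / np.maximum(broad, 1e-12)
    # Masking: subtract a narrower running average and keep only the positive part.
    threshold = running_average(norm, mask_hz / hz_per_bin)
    return np.maximum(norm - threshold, 0.0)

def classify(spectrum, templates):
    """Return the label of the template with the smallest city-block distance."""
    distances = {label: np.sum(np.abs(spectrum - t)) for label, t in templates.items()}
    return min(distances, key=distances.get)

# Usage with made-up three-tone signals standing in for sinewave vowels.
fs = 16000                      # assumed sampling rate
t = np.arange(512) / fs         # 512 samples = 32 ms at 16 kHz

def three_tones(freqs):
    return sum(np.sin(2 * np.pi * f * t) for f in freqs)

templates = {
    "vowel_A": masked_spectrum(three_tones([700, 1200, 2500]), fs),
    "vowel_B": masked_spectrum(three_tones([300, 2300, 3000]), fs),
}
label = classify(masked_spectrum(three_tones([700, 1200, 2500]), fs), templates)
```

In the study the templates were trained on naturally spoken vowels; here they are built from the same synthetic tones only to keep the sketch self-contained.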