TAMU CSCE 689 - matsumoto1973acousticCorrelatesMDS


428 IEEE TRANSACTIONS ON AUDIO AND ELECTROACOUSTICS, VOL. AU-21, NO. 5, OCTOBER 1973

Multidimensional Representation of Personal Quality of Vowels and Its Acoustical Correlates

HIROSHI MATSUMOTO, SHIZUO HIKI, TOSHIO SONE, and TADAMOTO NIMURA

Abstract: The personal quality of sustained vowels uttered by eight male talkers was represented multidimensionally in a psychological auditory space (PAS) by means of Kruskal's multidimensional scaling procedure, based on the perceptual confusions in talker discrimination tests. The physical properties of the vowels were analyzed in terms of elementary acoustical parameters, such as formant frequencies, the slope of the glottal source spectrum, the mean fundamental pitch frequency, and the rapid fluctuation of the fundamental pitch period. The relationship between the configuration in the PAS and the acoustical parameters was then examined through multiple correlation and regression analysis. The contributions of these acoustical parameters to the personal quality of the five Japanese vowels, and the relative contributions of the vocal tract and glottal source characteristics, are demonstrated quantitatively. These results were obtained partly by using hybrid voices in which the source wave or the formant frequency pattern was interchanged among different talkers.

I. Introduction

As part of a general study of the auditory process that extracts personal information from speech, the relation between perceptual differences in personal quality and differences in physical properties was analyzed for sustained vowels. Most previous studies have used recognition rates or confusion matrices to observe perceptual differences in personal quality. In this study, however, we attempted to observe quantitatively the multidimensional structure that underlies personal quality in the psychological process, expressed as distances in the psychological auditory space (PAS).
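Kruskal's procedure looks for a configuration of points whose interpoint distances agree, in rank order, with the observed dissimilarities; the badness of fit is measured by his stress. The sketch below computes stress-1 for a given trial configuration, using the pool-adjacent-violators algorithm for the monotone regression step. It is illustrative only: the function names are hypothetical, tied dissimilarities are simply left in input order, and the iterative configuration update that the full procedure performs is omitted. This excerpt does not say how the discrimination confusions were converted into dissimilarities; one hypothetical choice would be one minus the proportion of "same" responses for a pair of talkers.

```python
import math
from itertools import combinations


def pav(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block holds [sum, count] of pooled values
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def kruskal_stress1(dissim, coords):
    """Kruskal's stress-1 of a trial configuration against observed dissimilarities.

    dissim: symmetric n x n list of lists (hypothetical input format).
    coords: list of n points (tuples) in the candidate space.
    """
    pairs = list(combinations(range(len(coords)), 2))
    # Only the rank order of the dissimilarities matters (nonmetric scaling).
    pairs.sort(key=lambda ij: dissim[ij[0]][ij[1]])
    dist = [math.dist(coords[i], coords[j]) for i, j in pairs]
    # Disparities: the best monotone (non-decreasing) fit to the distances.
    dhat = pav(dist)
    num = sum((d - h) ** 2 for d, h in zip(dist, dhat))
    den = sum(d ** 2 for d in dist)
    return math.sqrt(num / den)
```

When the dissimilarities increase with the distances of the trial configuration, the stress is zero; any rank-order disagreement between the two makes it positive, and the full procedure moves the points to reduce it.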
In the first stage of the auditory process for extracting personal information from speech, the voice input is mapped onto a sensory auditory space by an elementary auditory process that deals with sensory differences in the basic attributes of sound, such as intensity, pitch, and spectral pattern. We then hypothesize that, through higher auditory processing, the sensory auditory space is mapped onto a PAS in which interpoint distance relates monotonically to the perceptual dissimilarity of the personal quality of voice. (This space is independent of, or parallel with, the PAS of phonetic quality [1], which is also mapped from the sensory auditory space.) In the last stage, to output personal information, a judgment process is applied to the personal quality represented in the PAS. Ordinarily, this judgment is the identification of the talker, which may be ascribed to a discrimination between the representation of a given voice input in the PAS and the representations of the voice characteristics of familiar talkers stored in long-term memory.

In this experiment, the personal quality of sustained vowels was scaled multidimensionally using discrimination tests in which listeners were expected to hold the PAS representation of the personal quality of the preceding stimulus in short-term memory and compare it with that of the following stimulus. In this way, the nature of the PAS is observed separately from the ordinary identification process, which avoids the ambiguities caused by the listener's familiarity with the talker and by the uncertainty of memory.

Manuscript received April 12, 1972; revised May 4, 1973. The authors are with the Department of Electrical Engineering and the Research Institute of Electrical Communication, Tohoku University, Sendai, Japan.

II. Voice Samples of the Vowel /a/

The voice samples used in the first experiment were 24 sustained Japanese vowels /a/, uttered at three levels of fundamental pitch frequency (approximately 120, 140, and 160 Hz) by each of eight adult male talkers (voice set I). The vowel /a/ was used because it occurs most frequently of the five vowels in Japanese speech.

These eight talkers were chosen as representatives of 25 candidates aged 20 to 35 years. None of the candidates had a pathological voice. The talkers were instructed to utter the vowel for a few seconds at natural intensity, matching its pitch frequency to that of a pure tone (120, 140, or 160 Hz) presented to one ear through an earphone. A 0.5-s portion of the steady part was then extracted from each sustained vowel by reproducing the master recording through a gate circuit with a 10-ms rise and fall time. The intensity was adjusted when dubbing the submaster recordings so that the peak volume unit (VU) meter reading was the same for every voice sample.

III. Determination of Acoustical Parameters

The acoustical parameters examined here for their relation to personal quality were as follows: the lowest three formant frequencies (F1, F2, and F3); the slope of the glottal source spectrum (α); the mean logarithmic fundamental pitch frequency (log P0); and the rapid fluctuation of the fundamental pitch period, σ(ΔT/T̄) (the standard deviation of the differences between adjacent fundamental pitch periods, normalized by the mean fundamental pitch period).

The formant frequencies were estimated by an analysis-by-synthesis method [2] applied to the log-amplitude spectrum calculated by the fast Fourier transform (FFT) from the digitized waveform of a single pitch period of each voice sample.
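The two pitch parameters defined in Section III, the mean logarithmic fundamental pitch frequency and the rapid fluctuation σ(ΔT/T̄), can be computed from a sequence of measured pitch periods as in the following sketch. The function names are hypothetical, and several details are assumptions not given in this excerpt: the population standard deviation is used, the logarithm is base 10, and the mean is taken over the log of each period's instantaneous frequency 1/T rather than over the log of the mean frequency.

```python
import math
import statistics


def pitch_fluctuation(periods):
    """sigma(dT / T_mean): std. dev. of adjacent-period differences over the mean period.

    Assumption: population standard deviation (pstdev).
    """
    diffs = [b - a for a, b in zip(periods, periods[1:])]
    return statistics.pstdev(diffs) / statistics.fmean(periods)


def mean_log_f0(periods):
    """Mean of log10(1/T) over the pitch periods T (in seconds).

    Assumptions: base-10 logarithm, averaged over per-period frequencies.
    """
    return statistics.fmean([math.log10(1.0 / t) for t in periods])
```

A perfectly steady voice at 140 Hz gives a fluctuation of zero and a mean log frequency of log10(140); alternating periods of 7.0 and 7.2 ms give a fluctuation of roughly 0.027.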
In the analysis-by-synthesis algorithm, the six parameters (the first, second, third, and fourth formant frequencies, the higher-pole correction term, and the slope of the glottal source spectrum in dB/octave) are automatically controlled by means of the maximum neighborhood method
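The analysis-by-synthesis idea (synthesize a spectrum from trial parameters, compare it with the measured log-amplitude spectrum, and adjust the parameters to reduce the mismatch) can be sketched as follows. This is a simplified, hypothetical illustration, not the authors' algorithm: it fits only F1 to F3 and the source slope α, omits the fourth formant and the higher-pole correction term, assumes fixed formant bandwidths (80, 100, and 120 Hz) and a 100-Hz reference for the slope term, removes the overall level in closed form, and replaces the maximum neighborhood method (commonly identified with Marquardt's nonlinear least-squares algorithm) with a plain compass search that halves its step sizes when no single-parameter move lowers the error.

```python
import math

# Hypothetical simplified model: fixed bandwidths and slope reference are assumptions.
BANDWIDTHS = (80.0, 100.0, 120.0)
F_REF = 100.0


def formant_db(f, fc, bw):
    """dB gain at frequency f of one pole pair (center fc, bandwidth bw), 0 dB at 0 Hz."""
    b2 = (bw / 2.0) ** 2
    mag = (fc ** 2 + b2) / math.sqrt(((f - fc) ** 2 + b2) * ((f + fc) ** 2 + b2))
    return 20.0 * math.log10(mag)


def model_db(freqs, formants, slope):
    """Synthetic log spectrum: cascaded resonances plus a source slope in dB/octave."""
    return [sum(formant_db(f, fc, bw) for fc, bw in zip(formants, BANDWIDTHS))
            + slope * math.log2(f / F_REF) for f in freqs]


def spectral_error(measured, freqs, params):
    """Squared error after removing the best constant level offset (overall gain)."""
    synth = model_db(freqs, params[:3], params[3])
    resid = [m - s for m, s in zip(measured, synth)]
    offset = sum(resid) / len(resid)
    return sum((r - offset) ** 2 for r in resid)


def fit_by_synthesis(measured, freqs, init, steps=(50.0, 50.0, 50.0, 1.0), tol=1e-3):
    """Compass search over params = [F1, F2, F3, slope]; a stand-in optimizer."""
    params, steps = list(init), list(steps)
    best = spectral_error(measured, freqs, params)
    while max(steps) > tol:
        improved = False
        for k in range(len(params)):
            for sign in (1.0, -1.0):
                trial = params[:]
                trial[k] += sign * steps[k]
                err = spectral_error(measured, freqs, trial)
                if err < best:  # keep any move that lowers the mismatch
                    params, best, improved = trial, err, True
                    break
        if not improved:  # no single-parameter move helps: refine the steps
            steps = [s / 2.0 for s in steps]
    return params, best
```

The model is evaluated at harmonics of the fundamental because an FFT taken over exactly one pitch period yields spectral samples at multiples of the fundamental frequency. Like any local search, the fit assumes a starting point reasonably close to the true formants.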