ROBUST SPEECH RECOGNITION IN THE AUTOMOBILE

ABSTRACT
1. INTRODUCTION
2. DATABASES
2.1. The Motorola Automotive Database
2.2. The Census Database
3. NOISE CHARACTERISTICS
Figure 1. Typical noise spectra in the “windows down” condition at 0 and 55 m.p.h. The horizontal...
4. EXPERIMENTAL RESULTS
Figure 2. Digit error rates obtained using the Motorola automotive database for two car speeds: 0...
5. ADAPTIVE NOISE CANCELLATION OF SIGNALS FROM THE CAR RADIO
Figure 3. Digit error rates obtained with and without adaptive noise cancellation of AM-radio tal...
6. CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
1. Dal Degan, N., and Prati, C., “Acoustic Noise Analysis and Speech Enhancement Techniques for M...
2. Lockwood, P., Baillargeat, C., Gillot, J.M., Boudy, J., and Faucon, G., “Noise Reduction for Sp...
3. Mokbel, C., and Chollet, G., “Word Recognition in the Car: Speech enhancement / Spectral Trans...
4. Oh, S., Viswanathan, V., and Papamichalis, P., “Hands-Free Voice Communication in an Automobi...
5. Gales, M. J. F., and Young, S., “An Improved Approach to the Hidden Markov Model Decomposition...
6. Liu, F.H., Acero, A., and Stern, R.M., “Efficient Joint Compensation of Speech for the Effects...
7. Acero, A., and Stern, R. M., “Environmental Robustness in Automatic Speech Recognition”, ICASS...
8. Lee, K.-F., Hon, H.-W., and Reddy, D. R., “An Overview of the SPHINX Speech Recognition System...
9. Davis, S. B., and Mermelstein, P., “Comparison of Parametric Representations of Monosyllabic W...
10. Widrow, B., and Stearns, S. D., Adaptive Signal Processing. Prentice-Hall, Englewood Cliffs, ...

Nobutoshi Hanai* and Richard M. Stern
Department of Electrical and Computer Engineering and School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

* Currently at Mitsubishi Heavy Industries, Ltd.

ABSTRACT

In this paper we discuss a number of the ways in which the recognition accuracy of automatic speech recognition systems is affected by ambient noise in the automobile, along with the extent to which various techniques for robust speech recognition can reduce these effects. We consider separately the effects of engine noise, interference by turbulent air outside the car, interference by sounds from the car’s radio, and interference by the sounds of the car’s windshield wipers. Recognition accuracy was compared using baseline processing, cepstral mean normalization (CMN), and codeword-dependent cepstral normalization (CDCN). The greatest degradation in recognition accuracy was produced by interference from AM-radio talk shows. The use of CMN and especially CDCN was found to significantly improve recognition accuracy, except for the effects of interference from radio talk shows at low car speeds. This type of interference is effectively suppressed through the use of adaptive noise cancellation techniques.

1. INTRODUCTION

Robust recognition accuracy in real application environments such as long-distance telephone lines, automobiles, aircraft cockpits, offices, and factory floors is becoming increasingly important as speech recognition becomes more successful. This paper concerns speech recognition accuracy in the automobile, which is a critical factor in the development of hands-free cellular telephony. Major factors that impede recognition accuracy in the automobile include noise sources such as tire and wind noise while the vehicle is in motion, engine noise, and noise produced by the car radio, fan, windshield wipers, horn, turn signals, etc.

A number of researchers have previously considered the problem of robust recognition in the automobile. Their approaches include adaptive noise cancelling techniques (e.g. [1, 2]), spectral transformation [3], the use of microphone arrays (e.g. [4]), and multi-dimensional HMMs [5]. For the most part these studies dealt only with “running noise” sources such as tire, engine, and wind noise, and they did not consider “functional noise” caused by functional components such as the car radio, fan, and windshield wipers. In this paper we consider the effects of all of these sources of degradation, and we compare the extent to which these effects are ameliorated by the compensation techniques of cepstral mean normalization (CMN) [6] and codeword-dependent cepstral normalization (CDCN) [7].

2. DATABASES

The experimental results in this paper were obtained by training the CMU SPHINX-I system [8] on the previously described census database [7] and testing it on a database of speech recorded in automobiles, which was recorded and provided by the Motorola Corporation. In this section we describe the Motorola automotive database, which was used to evaluate the effects of noise in the automobile on the SPHINX system. We also briefly review the contents of the census database.

2.1. The Motorola Automotive Database

The Motorola automotive database consists of 12 speakers: 9 males and 3 females in their 20s and 30s. Each speaker uttered six 7-digit strings at three driving speeds, 0 (with engine idling), 30, and 55 m.p.h., under the following six conditions in the vehicle: (1) baseline (windows up, fan, radio, and windshield wipers off), (2) driver’s window down, (3) fan on, (4) FM radio playing music, (5) AM radio playing a talk show, (6) windshield wipers on (recorded at 0 m.p.h. only). The digit strings were read from a script with equal probabilities for all digits. The digit ‘0’ had two pronunciations, “zero” and “oh”.

Speech was recorded on a DAT recorder in various automobiles using 2 microphones located on the driver’s visor. The microphone used for our data was a high-fidelity Sony ECM-959DT, which uses an electret element and has a flat bandpass response over 50 - 18,000 Hz. The data were lowpass filtered to about 6,720 Hz before sampling at 16 kHz using the line inputs of an Ariel Digital Microphone.

Since the goal in collecting the database was to make it as realistic as possible, the recording conditions were somewhat variable and reflected what an untrained population of users might produce. Some of the files for various speakers were missing due to recording problems which were not noticed until the data were reviewed.

2.2. The Census Database

The census database was used to train the
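
The text above names cepstral mean normalization (CMN) as one of the compensation techniques being compared, but it does not show how CMN is computed. The following is a minimal sketch of the general idea only, assuming an utterance is represented as a list of cepstral frames. The function name `cepstral_mean_normalize` and the toy data are hypothetical and are not taken from the paper or from SPHINX. Because a fixed linear filter adds a constant vector to every frame in the cepstral domain, subtracting the per-utterance mean removes that constant.

```python
from typing import List, Sequence


def cepstral_mean_normalize(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean over the utterance from every frame.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if not frames:
        return []
    num_coeffs = len(frames[0])
    if any(len(frame) != num_coeffs for frame in frames):
        raise ValueError("all frames must have the same number of coefficients")
    num_frames = len(frames)
    # Mean of each cepstral coefficient across all frames of the utterance.
    means = [sum(frame[k] for frame in frames) / num_frames for k in range(num_coeffs)]
    return [[frame[k] - means[k] for k in range(num_coeffs)] for frame in frames]


# Toy data (made up): three frames with two cepstral coefficients each.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
# A fixed channel filter appears as a constant additive offset in the cepstral domain.
offset = [0.5, -1.5]
filtered = [[c + o for c, o in zip(frame, offset)] for frame in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_filtered = cepstral_mean_normalize(filtered)
```

In this toy case `norm_clean` and `norm_filtered` coincide, which illustrates why CMN compensates for stationary channel effects. Additive noise, such as the radio interference discussed in the abstract, does not reduce to a constant cepstral offset, so this sketch does not model it.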

