1842 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 6, AUGUST 2007

Robust Speech Feature Extraction by Growth Transformation in Reproducing Kernel Hilbert Space

Shantanu Chakrabartty, Member, IEEE, Yunbin Deng, Member, IEEE, and Gert Cauwenberghs, Senior Member, IEEE

Abstract—The performance of speech recognition systems depends on consistent quality of the speech features across variable environmental conditions encountered during training and evaluation. This paper presents a kernel-based nonlinear predictive coding procedure that yields speech features which are robust to nonstationary noise contaminating the speech signal. Features maximally insensitive to additive noise are obtained by growth transformation of regression functions that span a reproducing kernel Hilbert space (RKHS). The features are normalized by construction and extract information pertaining to higher-order statistical correlations in the speech signal. Experiments with the TI-DIGIT database demonstrate consistent robustness to noise of varying statistics, yielding significant improvements in digit recognition accuracy over identical models trained using Mel-scale cepstral features and evaluated at noise levels between 0 and 30-dB signal-to-noise ratio.

Index Terms—Feature extraction, growth transforms, noise robustness, nonlinear signal processing, reproducing kernel Hilbert space, speaker verification.

I. INTRODUCTION

WHILE most current speech recognizers give acceptable recognition accuracy for clean speech, their performance degrades when subjected to noise present in practical environments [1]. For instance, it has been observed that additive white noise severely degrades the performance of Mel-cepstra-based recognition systems [1], [2]. This performance degradation has been attributed to unavoidable mismatch between training and recognition conditions. Therefore, several approaches for alleviating the effects of mismatch have been presented in the literature.
These methods can be broadly categorized as follows:
• noise estimation and filtering methods that recondition the speech signal based on noise characteristics [2];
• online model adaptation methods for reducing the effect of mismatch in training and test environments [3];
• robust feature extraction methods [4], which include techniques based on human auditory modeling [5], [6].

Manuscript received May 3, 2006; revised March 8, 2007. This work was supported in part by a grant from the Catalyst Foundation, the National Science Foundation (NSF) under Grants IIS-0209289 and IIS-0434161, and the Office of Naval Research/Defense Advanced Research Projects Agency under Grant N00014-00-C-0315. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Alex Acero. S. Chakrabartty is with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA. Y. Deng is with LumenVox, LLC, San Diego, CA 92123 USA. G. Cauwenberghs is with the Section of Neurobiology, Division of Biological Sciences, University of California San Diego, La Jolla, CA 92093 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TASL.2007.899285

Fig. 1. Signal flow in KPCC feature extraction.

An excellent survey of techniques for improving performance of speech recognition systems under noisy environments can be found in [1]. This paper describes a novel feature extraction algorithm based on nonlinear processing of the speech signal. Termed kernel predictive coding cepstra (KPCC) [7], the procedure consists of two key steps, as summarized in Fig. 1: 1) estimation of a nonlinear function that captures robust higher-order statistics in a segment of speech signal and 2) mapping of the nonlinear function parameters onto a computationally tractable lower-dimensional manifold using growth transformations.
Growth transformation is an iterative procedure for optimizing homogeneous polynomial functions of probability mass functions [13]. The technique has been used in discriminative hidden Markov model (HMM) training using maximum mutual information (MMI) [14], where it has been extended to optimizing nonhomogeneous rational functions. In this paper, estimation of the nonlinear function is performed using regression techniques over a reproducing kernel Hilbert space (RKHS) [9]. RKHS regression has been extensively studied in the context of regularization theory [11], support vector machines [12], and detection/estimation of covariance functionals [10]. Combining RKHS regression with growth transformation endows the proposed KPCC feature extraction algorithm with the following robustness properties.
1) The algorithm uses a semiparametric function estimation procedure without making any prior assumption on noise statistics.
2) The algorithm uses kernel methods to extract features that are nonlinear, thus utilizing higher-order statistical correlations in speech which are robust to corruption by noise.
3) Robust parameter estimation is ensured by imposing smoothness constraints based on regularization principles.
4) The features extracted are self-calibrated and normalized, which reduces mismatch between training and testing conditions.

In this paper, a step-by-step derivation of the KPCC algorithm is described along with some of its mathematical properties (Sections II and III). In Section IV, robustness of the KPCC algorithm is demonstrated by training a simple HMM-based recognizer and comparing the results with an equivalent system trained on Mel frequency cepstral coefficient (MFCC)-based features. Section V provides concluding remarks along with possible extensions of the KPCC algorithm.

1558-7916/$25.00 © 2007 IEEE
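The growth-transform iteration referenced above has a well-known closed form for homogeneous polynomials with nonnegative coefficients over a probability simplex: each coordinate is rescaled by the partial derivative of the objective and the result is renormalized, which provably never decreases the objective. The sketch below illustrates that standard update on a hypothetical toy polynomial; it does not reproduce the paper's extension to rational functions or its specific KPCC objective.

```python
import numpy as np

def growth_transform_step(p, grad):
    """One growth-transform update on the probability simplex:
    p_i <- p_i * dP/dp_i / sum_j p_j * dP/dp_j.
    For a homogeneous polynomial P with nonnegative coefficients,
    this update is guaranteed not to decrease P."""
    num = p * grad(p)
    return num / num.sum()

# Toy homogeneous (degree-3) polynomial over a 3-simplex.
def P(p):
    return p[0] ** 2 * p[1] + p[1] ** 2 * p[2]

def gradP(p):
    return np.array([2 * p[0] * p[1],
                     p[0] ** 2 + 2 * p[1] * p[2],
                     p[1] ** 2])

p = np.ones(3) / 3.0          # start at the simplex center
for _ in range(50):
    p_new = growth_transform_step(p, gradP)
    assert P(p_new) >= P(p) - 1e-12   # monotone ascent at every step
    p = p_new
```

The renormalization is what keeps the iterate on the simplex without any explicit projection step, which is the property that makes growth transforms attractive for optimizing over probability mass functions.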
II. THEORY

The theory of the KPCC feature extraction algorithm uses concepts from inner-product spaces, and in particular RKHS, which, for the sake of completeness, are described briefly in this section. For a detailed treatment of RKHS and its properties, the readers are referred to [8], [9], and [20].

A. Kernel Regression

The first step in the KPCC feature extraction algorithm is a nonlinear functional estimation procedure that extracts higher-order statistics from speech signals. Given a stationary discrete-time speech signal represented
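The preview ends mid-sentence here. As background for the kernel regression step, one standard RKHS regression scheme is kernel ridge regression: a regularized least-squares fit whose solution, by the representer theorem, is a kernel expansion over the training points. The sketch below applies it in a nonlinear predictive-coding setup, regressing each sample of a signal on its preceding samples; the Gaussian kernel, the regularization weight, the embedding order, and the synthetic test signal are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=10.0):
    """Gaussian (RBF) kernel matrix between row-vector sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-3, gamma=10.0):
    """Kernel ridge regression: solve (K + lam*I) alpha = y.
    The fitted function is f(x) = sum_i alpha_i k(x, x_i)."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=10.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Predictive-coding setup: regress each sample on its 3 preceding
# samples (embedding order 3 is a toy choice).
s = np.sin(2 * np.pi * 0.05 * np.arange(200))   # stand-in "speech" segment
order = 3
X = np.stack([s[i:i + order] for i in range(len(s) - order)])
y = s[order:]
alpha = krr_fit(X, y)
mse = np.mean((krr_predict(X, alpha, X) - y) ** 2)
```

The regularization term `lam` plays the role of the smoothness constraint mentioned in the introduction's robustness properties: it trades exact interpolation of the (possibly noisy) samples for a smoother estimate in the RKHS.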