MAP Adaptation with SphinxTrain

David
[email protected]
Language Technologies Institute
Carnegie Mellon University

Contents

Theory of MAP adaptation
MAP adaptation in practice
MAP with SphinxTrain
Combining MAP and MLLR
Unsupervised MAP
MAP results on RM1 (CDHMM)
MAP results on RM1 (SCHMM)
MAP on SRI CALO scenario meetings
References

Theory of MAP adaptation

Standard Baum-Welch training produces a maximum-likelihood estimate of the model parameters \lambda:

    \lambda_{ML} = \arg\max_{\lambda} P(O \mid \lambda)

MAP training produces the maximum a posteriori estimate:

    \lambda_{MAP} = \arg\max_{\lambda} P(O \mid \lambda)\, P(\lambda)

This reduces to the ML estimate with a non-informative prior P(\lambda). For speaker adaptation, the prior P(\lambda) is derived from a baseline or speaker-independent model.

MAP adaptation in practice

The simplest method is Bayesian updating of each Gaussian mean, assuming the following (incorrect) prior:

    \mu \sim N(\mu_{SI}, \sigma^2_{SI})

This reduces to an interpolation between the SI parameters and the ML (forward-backward) estimates from the adaptation data:

    \hat{\mu}_{MAP} = \frac{\sum_{t=1}^{T} \gamma_t(i,k)\,\sigma^2_{SI}\,\mu_{ML} + \sigma^2_{ML}\,\mu_{SI}}
                           {\sum_{t=1}^{T} \gamma_t(i,k)\,\sigma^2_{SI} + \sigma^2_{ML}}

The posterior variance can also be computed, but it is not useful.

With a more detailed prior, all HMM/GMM parameters can be updated. This is important for semi-continuous models, since the mixture weights can be modified. The prior is a product of a Dirichlet distribution with hyperparameters {\eta, \nu} and a Gamma-Normal distribution with hyperparameters {\alpha, \beta, \mu, \tau}.

The \tau hyperparameter controls the "speed" of adaptation: a larger \tau means less adaptation. Estimating these hyperparameters is tricky. Generally, \tau is estimated first, and all other hyperparameters are then derived from it and the SI model.

\tau can be fixed to a global value (e.g. 2.0), or it can be estimated separately for each Gaussian:

    \tau_{ik} = \frac{\sqrt{\sum_{t=1}^{T} \gamma_t(i,k)}}
                     {\sum_{t=1}^{T} \gamma_t(i,k)\,(\hat{\mu}_{ik} - \mu_{ik})^{T} (w_{ik}\Sigma_{ik}) (\hat{\mu}_{ik} - \mu_{ik})}

\nu_{ik} is then estimated as w_{ik} \sum_{k=1}^{K} \tau_{ik}, and the mixture weights are re-estimated as:

    \hat{w}_{ik} = \frac{\nu_{ik} - 1 + \sum_{t=1}^{T} \gamma_t(i,k)}
                        {\sum_{k=1}^{K} \nu_{ik} - K + \sum_{k=1}^{K} \sum_{t=1}^{T} \gamma_t(i,k)}
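The two updates above are simple enough to check numerically. Here is a minimal NumPy sketch of the single-Gaussian mean interpolation and the Dirichlet mixture-weight re-estimation, assuming the occupancy counts \sum_t \gamma_t(i,k) have already been accumulated by a forward-backward pass; the function names and toy values are illustrative only and are not SphinxTrain code.

```python
import numpy as np

def map_mean(gamma_sum, mu_ml, mu_si, var_ml, var_si):
    """Bayesian update of a single Gaussian mean: interpolate the ML
    estimate from the adaptation data with the speaker-independent mean,
    weighted by the occupancy count gamma_sum = sum_t gamma_t(i,k)."""
    return (gamma_sum * var_si * mu_ml + var_ml * mu_si) / \
           (gamma_sum * var_si + var_ml)

def map_mixture_weights(gamma_sums, nu):
    """Dirichlet (MAP) re-estimation of the K mixture weights of one state:
    w_hat_k = (nu_k - 1 + sum_t gamma_t(i,k))
              / (sum_k nu_k - K + sum_k sum_t gamma_t(i,k))."""
    gamma_sums = np.asarray(gamma_sums, dtype=float)
    nu = np.asarray(nu, dtype=float)
    K = len(gamma_sums)
    return (nu - 1.0 + gamma_sums) / (nu.sum() - K + gamma_sums.sum())

# With little adaptation data the mean stays near the SI value; with more
# data it moves toward the ML estimate.
mu_si, var_si = np.array([0.0]), np.array([1.0])
mu_ml, var_ml = np.array([2.0]), np.array([1.0])
print(map_mean(1.0, mu_ml, mu_si, var_ml, var_si))    # ~[1.0]
print(map_mean(100.0, mu_ml, mu_si, var_ml, var_si))  # ~[1.98]

# Mixture weights of a 3-Gaussian state: the unseen component retains
# weight from the prior instead of collapsing to zero.
print(map_mixture_weights([5.0, 1.0, 0.0], nu=[10.0, 10.0, 10.0]))
```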
MAP with SphinxTrain

MAP interpolation and updating has been implemented in SphinxTrain as the map_adapt tool. It works similarly to the norm tool, except that it produces a MAP re-estimation rather than an ML one.

1. Collect forward-backward statistics on the adaptation data using the baseline models and bw.
2. Run map_adapt the same way you would run norm, specifying the baseline model files and the output MAP model files.

It works for continuous and semi-continuous models (SCHMM support is broken in the current version, but I'll fix it).

Combining MAP and MLLR

In theory the two are equivalent, if each Gaussian has its own regression class. In practice this never happens, and their effects are additive. To combine them (a conceptual sketch of this recipe follows the references):

1. Compute an MLLR transformation with bw and mllr_solve.
2. Apply it to the baseline means with mllr_transform.
3. Re-run bw with the transformed means.
4. Run map_adapt to produce a MAP re-estimation.

Unsupervised MAP

This doesn't work. Don't do it. The lack of parameter tying in the standard MAP algorithm means that the adaptation is not robust: incorrect transcriptions of the adaptation data result in the wrong models being updated. MLLR alone is a better choice for sparse or noisy adaptation data.

MAP results on RM1 (CDHMM)

1000 CD senones, 8 Gaussians, Sphinx 3.x fast decoder. Word error rates; the 100/200/400 columns are MAP adaptation with increasing amounts of adaptation data, MLLR+400 combines MLLR with the largest set, and Relative is the change of the MLLR+400 result against the baseline.

Speaker    Baseline   100      200      400      MLLR+400   Relative
bef0_3     10.40%     8.43%    7.75%    7.75%    7.84%      -24.62%
cmr0_2      8.78%     6.66%    6.40%    6.40%    4.66%      -46.92%
das1_2      9.73%     7.04%    5.92%    4.75%    4.21%      -56.73%
dms0_4      8.31%     5.66%    5.48%    5.01%    4.42%      -46.81%
dtb0_3      9.43%     7.49%    6.63%    5.84%    5.10%      -45.92%
ers0_7      8.19%     7.93%    7.96%    6.34%    5.92%      -27.72%
hxs0_6     16.36%    11.14%   11.08%    7.52%    7.22%      -55.87%
jws0_4      8.81%     6.90%    6.69%    6.40%    5.69%      -35.41%
pgh0_1      7.84%     6.37%    6.54%    5.13%    5.04%      -35.71%
rkm0_5     24.20%    17.83%   17.18%   13.82%   11.97%      -50.54%
tab0_7      6.51%     5.57%    5.33%    4.27%    4.30%      -33.95%

MAP results on RM1 (SCHMM)

4000 CD senones, 256 Gaussians, Sphinx 3.x slow decoder. Word error rates; Relative is the change of the MLLR+MAP result against the baseline.

Speaker    Baseline   MAP      MLLR+MAP   Relative
bef0_3      8.87%     8.37%    7.87%      -11.27%
cmr0_2      7.10%     6.63%    6.42%       -9.58%
das1_2      6.07%     5.31%    4.75%      -21.75%
dms0_4      5.87%     5.25%    4.69%      -20.10%
dtb0_3      7.87%     7.13%    6.93%      -11.94%
ers0_7      7.10%     6.93%    6.60%       -7.04%
hxs0_6     10.08%     8.81%    7.60%      -24.60%
jws0_4      6.63%     5.92%    5.63%      -15.08%
pgh0_1      7.93%     7.13%    6.54%      -17.53%
rkm0_5     15.97%    14.09%   11.94%      -25.23%
tab0_7      5.89%     5.04%    4.63%      -21.39%

MAP on SRI CALO scenario meetings

CALOBIG models, 5000 CD senones, 16 Gaussians, Sphinx 3.x fast decoder. Each meeting in a sequence of five was decoded with models adapted on the other four; results were averaged over all five meetings.

Speaker      Baseline   MLLR     MLLR+MAP   Best relative WER
bill_deans   50.89%     47.37%   39.39%     -22.60%
lpound       56.54%     51.34%   38.90%     -31.20%
jpark        29.54%     27.71%   25.88%     -12.40%

References

Chin-Hui Lee and Jean-Luc Gauvain, "Speaker Adaptation Based on MAP Estimation of HMM Parameters", Proceedings of ICASSP 1993, pp. 558-561.

Chin-Hui Lee and Jean-Luc Gauvain, "MAP Estimation of Continuous Density HMM: Theory and Applications", Proceedings of the DARPA Speech and Natural Language Workshop, 1992.

Qiang Huo and Chorkin Chan, "Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition", IEEE Transactions on Speech and Audio Processing, 3(5), pp. 334-345.
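As mentioned on the "Combining MAP and MLLR" slide, the recipe amounts to shifting the baseline means with an MLLR transform and then MAP-interpolating them against statistics collected from the adaptation data. Below is a conceptual NumPy sketch of that idea for a single Gaussian mean and a single regression class; it is not the SphinxTrain tools (bw, mllr_solve, mllr_transform, map_adapt), and the function names and numbers are assumptions for illustration only.

```python
import numpy as np

def apply_mllr(A, b, mu):
    """Step 2 of the recipe: shift a baseline Gaussian mean with an MLLR
    regression matrix A and bias b (a single regression class)."""
    return A @ mu + b

def map_update(gamma_sum, mu_ml, mu_prior, var_ml, var_prior):
    """Step 4 of the recipe: MAP interpolation of the (transformed) prior
    mean with the ML estimate from the adaptation data, as on the
    "MAP adaptation in practice" slide."""
    return (gamma_sum * var_prior * mu_ml + var_ml * mu_prior) / \
           (gamma_sum * var_prior + var_ml)

# Toy two-dimensional example with made-up numbers.
A = np.array([[1.1, 0.0],
              [0.0, 0.9]])
b = np.array([0.2, -0.1])
mu_si = np.array([1.0, 1.0])        # speaker-independent mean
var_si = np.array([1.0, 1.0])       # prior (SI) variances, diagonal

mu_mllr = apply_mllr(A, b, mu_si)   # MLLR-shifted prior mean
# In the real pipeline, step 3 (re-running bw with the transformed means)
# produces the ML statistics below; here they are invented.
mu_ml = np.array([1.6, 0.7])
var_ml = np.array([1.0, 1.0])
gamma_sum = 20.0                    # occupancy count from adaptation data

print(mu_mllr)
print(map_update(gamma_sum, mu_ml, mu_mllr, var_ml, var_si))
```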

