Columbia COMS W4706 - HMM Based Speech Synthesis

HMM-Based Speech Synthesis
Erica Cooper
CS4706, Spring 2011

Concatenative Synthesis

HMM Synthesis
- A parametric model
- Can train on mixed data from many speakers
- Model takes up a very small amount of space
- Speaker adaptation

HMMs
- Some hidden process has generated some visible observation.
- Hidden states have transition probabilities and emission probabilities.

HMM Synthesis
- Every phoneme+context is represented by an HMM.
  - "The cat is on the mat."
  - "The cat is near the door."
  - <phone=/th/, next_phone=/ax/, word='the', next_word='cat', num_syllables=6, ...>
- Acoustic features extracted: f0, spectrum, duration
- Train HMMs with these examples.
- Each state outputs acoustic features (a spectrum, an f0, and a duration).

HMM Synthesis
- Many contextual features = data sparsity
- Cluster similar-sounding phones, e.g. 'bog' and 'dog': the /aa/ in both has similar acoustic features, even though the context differs slightly.
- Make one HMM that produces both, trained on examples of both.

Experiments: Google, Summer 2010
- Can we train on lots of mixed data? (~1 utterance per speaker)
- More data vs. better data
- 15k utterances from Google Voice Search as training data, e.g. "ace hardware", "rural supply"

More Data vs. Better Data
- Voice Search utterances filtered by speech recognition confidence scores:
  - 50% confidence threshold: 6849 utterances
  - 75% confidence threshold: 4887 utterances
  - 90% confidence threshold: 3100 utterances
  - 95% confidence threshold: 2010 utterances
  - 99% confidence threshold: 200 utterances

Future Work
- Speaker adaptation
- Phonetically balanced training data
- Listening experiments
- Parallelization
- Other sources of data
- Voices for more languages
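The slides describe an HMM as a hidden process with transition and emission probabilities that generates visible observations. A minimal generative sketch of that idea follows; the two states, the symbolic "acoustic frame" outcomes, and all probabilities are illustrative assumptions, not values from the lecture (real synthesis HMMs emit continuous spectral/f0 vectors, not symbols).

```python
import random

# Toy two-state HMM. Each hidden state has transition probabilities
# (the self-loop crudely models duration) and emission probabilities
# over symbolic "acoustic frames". All numbers are made up.
transitions = {
    "s1": {"s1": 0.6, "s2": 0.4},
    "s2": {"s2": 0.7, "end": 0.3},
}
emissions = {
    "s1": {"low_f0": 0.8, "high_f0": 0.2},
    "s2": {"low_f0": 0.3, "high_f0": 0.7},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    r, total = random.random(), 0.0
    for outcome, p in dist.items():
        total += p
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def generate(max_frames=20):
    """Walk the hidden chain, emitting one observation per state visit."""
    state, observations = "s1", []
    while state != "end" and len(observations) < max_frames:
        observations.append(sample(emissions[state]))
        state = sample(transitions[state])
    return observations

print(generate())
```

The key point the slides make is exactly this split: the state sequence is hidden, and only the emitted frames are observed.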
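The full-context label on the slide (<phone=/th/, next_phone=/ax/, word='the', next_word='cat', ...>) can be built mechanically from a sentence and a pronunciation lexicon. A sketch, assuming a toy two-word lexicon and only four of the many context features a real system would use:

```python
# Toy pronunciation lexicon (ARPAbet-style phones). A real system
# would use a full lexicon and many more context features.
LEXICON = {"the": ["th", "ax"], "cat": ["k", "ae", "t"]}

def context_labels(words):
    """Build one context-feature dict per phone in the utterance."""
    # Flatten to (phone, word_index) pairs across the whole sentence.
    flat = [(phone, wi) for wi, w in enumerate(words)
            for phone in LEXICON[w]]
    labels = []
    for i, (phone, wi) in enumerate(flat):
        labels.append({
            "phone": phone,
            "next_phone": flat[i + 1][0] if i + 1 < len(flat) else None,
            "word": words[wi],
            "next_word": words[wi + 1] if wi + 1 < len(words) else None,
        })
    return labels

print(context_labels(["the", "cat"])[0])
# {'phone': 'th', 'next_phone': 'ax', 'word': 'the', 'next_word': 'cat'}
```

The first label reproduces the slide's example: the /th/ of 'the' with next phone /ax/ and next word 'cat'. Each distinct label then names one context-dependent HMM to be trained.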
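The data-sparsity slide proposes clustering: pool training examples for the same phone when their contexts are similar enough, so one HMM (e.g. for the /aa/ in 'bog' and 'dog') is trained on all of them. Real systems do this with decision trees of yes/no context questions; the single question and the tiny example set below are assumptions for illustration.

```python
# Toy training examples for the phone /aa/ in different contexts.
examples = [
    {"phone": "aa", "prev_phone": "b", "word": "bog"},
    {"phone": "aa", "prev_phone": "d", "word": "dog"},
    {"phone": "aa", "prev_phone": "k", "word": "cod"},
]

def cluster(examples, question):
    """Split examples by a yes/no question about their context.

    A decision-tree clusterer applies many such questions recursively,
    choosing the one that best separates the acoustics at each node.
    """
    yes = [e for e in examples if question(e)]
    no = [e for e in examples if not question(e)]
    return yes, no

# Hypothetical question: "is the previous phone a voiced stop?"
# It groups the /aa/ of 'bog' and 'dog' into one shared model.
voiced_stop = lambda e: e["prev_phone"] in {"b", "d", "g"}
yes, no = cluster(examples, voiced_stop)
print([e["word"] for e in yes])  # ['bog', 'dog']
```

Each leaf of the resulting tree becomes one HMM trained on the pooled examples, which is how the system copes with contexts that appear only rarely.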
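The "more data vs. better data" experiment keeps only Voice Search utterances whose speech-recognition confidence clears a threshold, trading quantity for quality. The filtering step itself is simple; the utterance records and confidence scores below are made-up placeholders, not the real Google data.

```python
# Placeholder utterance records; a real run would use the ~15k Voice
# Search utterances with their recognizer confidence scores.
utterances = [
    {"text": "ace hardware", "confidence": 0.97},
    {"text": "rural supply", "confidence": 0.62},
    {"text": "pizza near me", "confidence": 0.88},
]

def filter_by_confidence(utts, threshold):
    """Keep only utterances the recognizer was confident about."""
    return [u for u in utts if u["confidence"] >= threshold]

# Raising the threshold shrinks the training set, as on the slide
# (6849 utterances at 50% down to 200 at 99%).
for threshold in (0.50, 0.90, 0.95):
    kept = filter_by_confidence(utterances, threshold)
    print(threshold, len(kept))
```

The slide's counts show the trade-off directly: each stricter threshold yields cleaner transcripts but far fewer training utterances.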

