CMU CS 15492 - Speech Synthesis Signal Processing - D1972064

School: Carnegie Mellon University

Course: CS 15492 - Special Topic: Speech Processing

Pages: 24

This preview shows page 1-2-23-24 out of 24 pages.



Speech Processing 15-492/18-492
Speech Synthesis
Signal Processing

Signal Manipulation
• Signal Parameterization
• Joining
• LPC
• PSOLA: pitch and duration modification
• Statistical Parameterization
• MELCEP/MLSA
• LSF, STRAIGHT, HNM, HSM

TTS Signal Processing
• Join together pieces of speech
• Prosodic modification
  – Pitch (F0)
  – Duration
  – Power
• Change spectral properties
  – Stress/unstress
  – Spectral tilt
  – Speaking style

Joining
• Just put them together
  – Gets clicks at join points
• Join them at zero crossings
• Window them and overlap them
  – WSOLA
• Join them at pitch periods

Prosodic Modification
• Modify pitch and duration independently
• Changing sample rate changes both
  – "chipmunk" style speech
• Duration
  – Duplicate/delete parts of the signal
• Pitch
  – "resample" to change pitch

Speech and Short Term Signals

Duration Modification

Pitch Modification

Modify pitch and duration
• Find ideal pitch periods and duration
• Find closest actual periods from units
• End with
  – Pitch period (short term signals)
  – Distances between them

Signal Reconstruction

TD-PSOLA™
• Time domain pitch synchronous overlap and add
• Patented by France Telecom
  – Expired 2004
• Very efficient:
  – No FFT (or inverse FFT)
  – Can modify Hz * 2.0 (or 0.5)
• The reason no one publishes algorithms
• The (partial) reason unit selection typically doesn't do pitch/duration modification

LPC: Linear predictive coding
• Linear predictive coding
  – Predict next sample point from previous
  – Weighted sum of previous points
  – Filter of order p.
  – Residual excited LPC

LPC
• Works well but can be buzzy
• Can be very compact
• Can be pitch synchronous
• Excited
  – Pulse
  – Triangular pulse
  – Multi-pulse
  – Full residual
• Used in standard speech coding
  – LPC10: 2.4 kbps
  – CELP: codebook excited LPC

Other Parametric Representations
• Typically split spectral and residual
• MBROLA:
  – Multi-band overlap and add
• HNM/HSM:
  – Harmonic plus (noise/stochastic) modeling
• STRAIGHT
• MELCEP/MLSA
  – Often used in HMM synthesis
• Sinusoidal (HARMONIC)
• Wavelet
• LSF/LPC

Choosing the right unit type
• Diphones
  – Phone-phone
  – Joins at stable portions, not transitions
• Half phone (AT&T Natural Voices)
• Hybrid systems (Hadifix – Bonn systems)
• Other selection systems:
  – Syllable, phone, HMM state
  – Even frame level

Acoustically Derived Units
• E.g. Bacchiani 99 or Rita Singh CMU 99
• From some waveforms
  – Find N most diverse unit types
  – Varied in length
• Still need to map letters to units

Acoustic Phonetic Clustering
• Parameterize database
  – Melcep plus power
• K-means
  – Euclidean distance measure
  – 100 clusters
• Label DB with best cluster
• Build clunits synthesizer
• Can't predict APC cluster directly
• Use held out data for testing

Acoustic Phonetic Clustering

Grapheme Based Synthesis
• Synthesis without a phoneme set
• Use the letters as phonemes
  – ("alan" nil (a l a n))
  – ("black" nil (b l a c k))
• Spanish (easier ?)
  – 419 utterances
  – HMM training to label databases
  – Simple pronunciation rules
  – Polici'a -> p o l i c i' a
  – Cuatro -> c u a t r o

Spanish Grapheme Synthesis

English Grapheme Synthesis
• Use letters as phones
• 26 "phonemes"
• ("alan" n (a l a n))
• ("black" n (b l a c k))
• Build HMM acoustic models for labeling
• For English
• "This is a pen"
• "We went to the church at Christmas"
• Festival intro
• "do eight meat"
• Requires method to fix errors
• Letter to letter mapping

Signal Processing for TTS
• Pitch and duration modification
• LPC
• Finding the right unit type
• Grapheme-based Synthesis

HW1: TTS
• Due 3:30pm Friday October 2nd
• Install
