CMU CS 15492 - Speech Synthesis Talking heads Singing Synthesis - D2076910

School name Carnegie Mellon University

Course CS 15492 - Special Topic: Speech Processing

Pages 18

This preview shows page 1-2-3-4-5-6 out of 18 pages.


Speech Processing 15-492/18-492
Speech Synthesis
Talking heads
Singing Synthesis

More Information is Better
- Voice + text is easier to understand
- Voice + face is easier too

Talking Heads
- Adds novelty/character/personification
- Experiments show better understanding
- Lip synching
- Facial movements
- Listeners swear it's better synthesis

Talking heads

Talking Heads
- Synthesize text
  - Output phone position in audio stream
- Map phones to lip/tongue positions
- Build visual stream
  - Choose appropriate frames
  - Aligned with audio
- How many facial positions

Visemes
- Baphy
  - Three positions
  - Closed, open and rounded
- Rho
  - 10 lip positions
  - Eyelid 4
  - Eyes 2
- When should they align
  - Follow trajectories, not just at time instant
  - Shape for syllables not just phones

Synthesis Analogies
- Articulatory Synthesis
  - Modeling the vocal tract
  - Baldi: movement of muscles
- Formant:
  - Modeling of signal synthetically
  - Cartoon based faces (Baphy)
- Concatenative
  - Joining natural segments
  - JPL example
  - Interval's Video Rewrite
- Unit size
  - Baphy == uniphone
  - JPL == diphone
  - Video Rewrite == unit selection

Talking Heads
- Personalization:
  - Can look like a mask put on a dummy
- Uncanny valley
  - The more human like, the more critical we are
- 3-D movement (in real time)
  - Second-life type characters
  - Gesture generation too
- Off-line (Gollum, Jabba the Hutt)
  - Usually actors do the voices

Singing Synthesis
- Simple pitch and duration control
  - But singing is more than that
- Proper singing synthesis
  - Recording a singing database
  - Phonetic, prosodic, and singing style coverage
  - Sung rather than spoken voice

Flinger (Festival Singer) (Macon)
- Sinusoidal modeling
  - More pitch control than just PSOLA
- MIDI interface
  - Allow mixing with music
  - Standard MIDI authoring techniques

Festival Singing Mode
- Dominic Mazzoni (11-752 project 2001)
- XML based song description
    <DURATION BEATS="1.0"><PITCH NOTE="C4">Oh</PITCH></DURATION>
- But not just setting pitch at duration point
  - When do you move it (based on syllable and voicing)
  - How quickly do you move pitch

Singing Example
    <?xml version="1.0"?>
    <!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN" "Singing.v0_1.dtd" []>
    <SINGING BPM="30">
    <PITCH NOTE="G3"><DURATION BEATS="0.3">doe</DURATION></PITCH>
    <PITCH NOTE="A3"><DURATION BEATS="0.3">ray</DURATION></PITCH>
    <PITCH NOTE="B3"><DURATION BEATS="0.3">me</DURATION></PITCH>
    <PITCH NOTE="C4"><DURATION BEATS="0.3">fah</DURATION></PITCH>
    <PITCH NOTE="D4"><DURATION BEATS="0.3">sew</DURATION></PITCH>
    <PITCH NOTE="E4"><DURATION BEATS="0.3">lah</DURATION></PITCH>
    <PITCH NOTE="F#4"><DURATION BEATS="0.3">tee</DURATION></PITCH>
    <PITCH NOTE="G4"><DURATION BEATS="0.3">doe</DURATION></PITCH>
    </SINGING>

Future in TTS
- More natural voices
  - Sound human
  - Interact in a human way (not just words)
- More personalization
  - Sound like a particular person
  - Cross lingual synthesis
- More flexible
  - Say it with more feeling
  - Realtime voice transformation
  - Have an American accent while you speak

Text to speech process
- Text analysis
  - From characters to words
- Linguistic analysis
  - From words to pronunciations
- Waveform analysis
  - From pronunciations to noises

HW2: TTS
- Due 3:30pm Monday October 20th
- Install Festival and Festvox
- Find 10 errors in each of two different synthesizers
- Build a voice
  - A Talking Clock
  - A general voice
  - (or both)
