Columbia COMS W4706 - Speech Synthesis - Then and Now


Speech Synthesis: Then and Now
Julia Hirschberg
CS 4706
01/14/2019

Today
• Then: Early speech synthesizers
• Now: Overview of Modern TTS Systems
• Think about: how do we evaluate a synthesizer

The First ‘Speaking Machine’
• Wolfgang von Kempelen, Mechanismus der menschlichen Sprache nebst Beschreibung einer sprechenden Maschine, 1791 (in Deutsches Museum still and playable)
• First to produce whole words, phrases – in many languages

Joseph Faber’s Euphonia, 1846

Slide 5
• Constructed 1835 w/pedal and keyboard control
  – Whispered and ordinary speech
  – Model of tongue, pharyngeal cavity with manipulable shape
  – Singing too: “God Save the Queen”
• Riesz’s 1937 synthesizer with almost natural vocal tract shape
• Forerunners of Modern Articulatory Synthesis: George Rosen’s DAVO synthesizer (1958) at MIT

Slide 7
• World’s Fair in NY, 1939
• Requires much training to ‘play’
• Purpose: coding/compression
  – Reduce bandwidth needed to transmit speech, so many phone calls can be sent over single line

Slide 10
• Answers:
  – These days a chicken leg is a rare dish.
  – It’s easy to tell the depth of a well.
  – Four hours of steady work faced us.
• ‘Automatic’ synthesis from spectrogram – but can also use hand-painted spectrograms as input
• Purpose: understand perceptual effect of spectral details

Formant/Resonance/Acoustic Synthesis
• Parametric or resonance synthesis
  – Specify minimal parameters, e.g. f0 and first 3 formants
  – Pass electronic source signal thru filter
    • Harmonic tone for voiced sounds
    • Aperiodic noise for unvoiced
    • Filter simulates the different resonances of the vocal tract
• E.g.
  – Walter Lawrence’s Parametric Artificial Talker (1953) for vowels and consonants
  – Gunnar Fant’s Orator Verbis Electris (1953) for vowels
  – Formant synthesis download

Synthesis by Computer
• Beginnings ~1960; dominant from 1970 onward

Concatenative Synthesis
• Most common type today
• First practical application in 1936: British Phone company’s Talking Clock
  – Optical storage for words, part-words, phrases
  – Concatenated to tell time
• E.g.
• And a ‘similar’ example from Radio Free Vestibule (1994)
• Bell Labs TTS (1977) (1985)

Variants of Concatenative Synthesis
• Inventory units
  – Diphone synthesis (e.g. early Festival)
  – Microsegment synthesis
  – “Unit Selection” – large, variable units (e.g. current Festival)
• Issues
  – How well do units fit together?
  – What is the perceived acoustic quality of the concatenated units?
  – Is post-processing on the output possible, to improve quality?

Overview: Synthesizer I/O
• Front end: From input to control parameters
  – Acoustic/phonetic representations, naturally occurring text, constrained mark-up language, semantic/conceptual representations
• Back end: From control parameters to waveform
  – Articulatory, formant/acoustic, concatenative (diphone, unit-selection/corpus, HMM) synthesis

TTS Production Levels
Knowledge
• World Knowledge
• Syntax, semantics, lexicon
• Phonetics/phonology
• Acoustics/signal processing
Task
• Text Normalization
• Pronunciation, intonation assignment
• Duration, f0, durations
• Waveform production

Text Normalization Issues
• Reading is what W. hates most.
• Reading is what Wilde hated most.
• The NAACP just elected a new president.
• In 1996 she sold 2010 shares and deposited $42 in her 401(k).
• The duck dove supply.
• Homographs, numbers, abbreviations

Pronunciation Issues
• Rules for disambiguation in context: bass
• Lexicon: comb, tomb, Punxsutawney Phil
  – Letter-to-Sound Rules
    • Hand built
    • Learned from data (pronunciation dictionary)
    • Hard to get good accuracy and coverage – many exceptions
  – Dictionary of pronunciations
    • More accurate
    • New (Out-of-Vocabulary) words a problem

Intonation Assignment Issues: Phrasing
• Traditional: hand-built rules
  – Use punctuation: 234-5682, New York, NY
  – Context/function word: no breaks after function word: He went to dinner. He came to and went to dinner.
  – Syntax: She favors the nuts and bolts approach. She went home and Dave stayed.
• Current: machine learning on large labeled corpus

Intonation Assignment Issues: Accent
• Hand-built rules
  – Function/content distinction: He went out the back door / He threw out the trash
  – Complex nominals:
    • Main Street/Park Avenue
    • city hall parking lot (stress shift)
• Statistical procedures trained on large corpora
  – Need lots of data
  – Why learn what you already know?

Intonation Assignment Issues: Contours
• Simple rules
  – ‘.’ = declarative contour
  – ‘?’ = yes-no-question contour unless wh-word present at/near front of sentence
    • Well then, how did he do it? And what do you know?
• Pretty monotonous in long stretches of speech
• Problem: no one knows how to assign other contours from text

Phonological Specification Issues
• Task is to produce a phonological representation from phones and intonational assignment
• Align phones and f0 contour
• Specify durations and intensity
• Select/create appropriate acoustic realization from this specification:
  – Acoustic transformation
  – Concatenation: diphone, unit selection
  – HMM

Not Quite There
• Festival concatenative:
• Acuvoice concatenative:
• HMM synthesis (Rob Donovan):
• Rhetorical unit selection
  – (acquired by Nuance)
• AT&T Labs Naturally Speaking
• Other TTS systems

Next Class
• Project Phase I assigned: building a TTS System
• Introduction to Festival
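
The simple rules on the "Intonation Assignment Issues: Contours" slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular TTS system: the function name `assign_contour`, the `WH_WORDS` list, the three-word `window` used for "at/near front", and the label `"wh-question"` for the non-yes-no case are all hypothetical choices, since the slide names none of them. Text that does not end in `?` is treated as declarative here, which is also an assumption.

```python
import re

# Assumed wh-word list; the slide only says "wh-word".
WH_WORDS = {"who", "whom", "whose", "what", "which",
            "when", "where", "why", "how"}


def assign_contour(sentence: str, window: int = 3) -> str:
    """Return a contour label based on final punctuation and early wh-words.

    window is an assumed cutoff for "at/near front of sentence".
    """
    text = sentence.strip()
    # Anything not ending in '?' gets the declarative contour (assumption
    # for text with no final '.').
    if not text.endswith("?"):
        return "declarative"
    # Lowercase alphabetic tokens; punctuation and digits are dropped.
    words = re.findall(r"[a-z]+", text.lower())
    # A wh-word among the first `window` tokens blocks the yes-no contour.
    if any(word in WH_WORDS for word in words[:window]):
        return "wh-question"
    return "yes-no-question"


if __name__ == "__main__":
    for s in ["He went to dinner.",
              "Did he go to dinner?",
              "Well then, how did he do it?",
              "And what do you know?"]:
        print(s, "->", assign_contour(s))
```

The two slide examples, "Well then, how did he do it?" and "And what do you know?", show why the rule checks near the front rather than only the first word. A rule this coarse gives every sentence one of a handful of contours, which is the monotony the slide points out.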

