Columbia CSEE 4840 - Text 2 Speech Synthesizer


Text 2 Speech Synthesizer

Aisha Ahmad
Akshay Deoras
Girish Gupta

May 10, 2005

Contents

I. Project Proposal
   Overview
   Implementation Challenges
II. Research
   Speech Synthesis Technology
   Text to Phoneme Translation
III. Project Design & Implementation
   Vocal Tract Model
   SRAM
   Audio Codec
IV. Miscellaneous
   Who Did What?
   Lessons Learned
   Advice for the Future
   What Went Wrong?
V. References
VI. Source Code

I. Project Proposal

Overview

We propose to build a text-to-speech synthesizer using the FPGA on the XESS XSB-300E board. This project will require the audio codec for analog output as well as the extra memory space provided by the SRAM. The generation of speech and phonetic sounds can be implemented in a number of ways; we are exploring three basic variations:

1. Concatenative speech, consisting of fundamental phonetic sounds stored in a phonetic library in on-board memory and arranged according to the input
2. A hard-wired vocal tract model implemented in VHDL, creating voiced and unvoiced speech sounds through filtering and coefficient modification
3. A program-code implementation of the vocal tract model shown in Figure 1, with speech synthesis and input manipulation performed in compiled program code

Implementation Challenges

Each implementation method involves its own set of challenges.
For method 1:
• Memory: storing a library of fundamental phonetic sounds would take up far more memory than is available on the board
• Quality and robustness of synthesis: with only a predefined set of phonemes, the output range will be limited, possibly leading to poor-quality word synthesis

For method 2:
• Complexity: numerous algorithms for vocal tract modeling have been researched, but they involve complex filters, which makes porting them to VHDL difficult and leaves much room for error
• Compatibility: although much research has been done, full top-down synthesizable hardware models are rare, and piecing together different models may cause performance issues

For method 3:
• Code size: synthesis requires a variety of complex functions and large libraries, raising the question of where to store the code
• Compatibility: same as for the VHDL models

II. Research

Speech Synthesis Technology

Research in speech synthesis has been going on for decades. As we found in our research, numerous models and theories exist for the best way to implement a speech synthesis system. Although the models seemed intuitive from a high-level perspective (see Figure 1), they quickly grew in complexity as we got closer to implementation. Notice in Figure 2 the large number of filter coefficients needed.

Figure 1: High-level vocal tract model
Figure 2: Corresponding digital hardware implementation

The variety of research on the topic led us to observe that there are three levels of text-to-speech technology: high, middle and low.

High: the initial text processing, i.e. finding the sentences (or chunks of whatever size you wish) and dealing with punctuation, fonts, section titles, etc.
Mid: translating words to phonemes and assigning duration, intonation tunes and prosodic phrasing
Low: synthesizing the waveform itself

Our project focuses on the mid and low levels of the technology.
The middle level has a very active following: papers on letter-to-sound rules, intonation and phrasing appear in many of the major computational linguistics conferences and journals. For the low level, the most popular and simplest synthesis methods we found were:

Formant synthesis (as in the Klatt synthesizer): with the proper specification of parameters, it is possible to produce recognizable human speech using filters.
Concatenative synthesis: sections of natural speech are concatenated to form utterances. A major difficulty here is joining them seamlessly.

While research exists in both areas, it is for the most part fragmented, and a complete working model for full text-to-speech synthesis was rare. One reason that waveform synthesis models including the corresponding rules for coefficient generation were rare is that such models are available only commercially and are closely protected.

III. Project Design

Our implementation of the speech synthesizer on the XESS board is shown in Figure 3. The diagram is a simple depiction of the overall structure of our projected goals. The plan was for the FPGA to execute a program downloaded into the SRAM. The program then outputs a bit stream of audio data to the audio codec, which produces the appropriate sound.

After much deliberation over other, more complex ideas, we decided that this design was the best choice for the project. We were bogged down by many incomplete and proprietary research efforts, and things became confusing and complex as we tried to combine elements from various researchers. It was in our group's best interest to stick to this simple design, which best suited the project given the limitations of the board.

Figure 3: Block diagram of the XESS system

Vocal Tract Model

Our work on the speech synthesizer went through three attempts at a suitable vocal tract model.
In the first generation, a simple concatenative speech system would pull pre-recorded phonemes from memory and play them in sequence so that full words could be generated. Because of the limited memory on the board, we could not fit enough recorded data to make a useful model.

In the second generation, a Klatt synthesizer hard-wired in VHDL would represent the vocal tract. Figure 4 shows a block diagram of the filter system to be created. This model, though initially a viable option for hardware synthesis, also …

Figure 4

In the final project we decided to implement the synthesizer in C. Given the complex signal-processing nature of speech synthesizers, we realized it would be best to use existing code as the basis for our project. After consulting with professors here at Columbia …

