Voice Modulator

Adam Rosenfield
Lunduo Ye

6.111 Final Project
Spring 2007
TA: Amir Hirsch

Abstract

We designed, implemented, and tested a voice modulation system, which takes in audio data and modulates the pitch of the data. The modulator can change the pitch of vocal data while preserving vocal formants, maintaining intelligible speech over a wide range of frequencies. The system is implemented on a field-programmable gate array (FPGA) and operates in real time.

Table of Contents

1. Introduction
2. Module Descriptions and Implementations
  2.1. Audio
    2.1.1. AC’97 Controller
    2.1.2. Fourier Transform
    2.1.3. Spectrum Analyzer
    2.1.4. Voice Modulator
    2.1.5. Inverse Fourier Transform
  2.2. Video
  2.3. Input
3. Testing and Debugging
4. Results and Conclusions
5. References

List of Figures

Figure 1: High-level overview of system components
Figure 2: AC’97 controller
Figure 3: HPS Algorithm
Figure 4: Video component
Figure 5: Timing diagram of buffer swaps
Figure 6: Timing diagrams for VGA control signals
Figure 7: Device-to-host communication for PS/2

1. Introduction [Adam and Lunduo]

The voice modulator changes the pitch of voice inputs. Users speak or sing into a microphone while playing keys on a keyboard. Real-time visualizations of the waveforms are displayed on a VGA screen. The modulator outputs frequency-shifted copies of the voice data to match the notes selected from the keyboard while preserving vocal formants as much as possible. This device allows users of any musical ability to sing notes or chords perfectly.

The modulator is implemented on a field-programmable gate array (FPGA). Inputs are taken via a microphone and a PS/2 keyboard. A VGA monitor is used to display waveforms. MIDI keyboard support was originally planned; however, we could not get it to work in time. Visualizations for the real-time Fourier transforms of the voice input were also not debugged in time.

The system has two main components, audio and video.
Figure 1 shows a high-level overview of the inputs, outputs, and interactions between parts.

[Figure 1: High-level overview of system components. Blocks shown: PS/2 Controller & Decoder, AC’97 Controller, Audio Modules, Video Modules, VGA Controller.]

The audio component of the system consists of an AC’97 audio controller, a fast Fourier transform (FFT) module, a pitch detection module, a frequency modulator, and an inverse fast Fourier transform (IFFT) module. Audio data is continuously sent through the FFT module to compute its frequency spectrum. The Harmonic Product Spectrum (HPS) algorithm is used to determine the input pitch. The modulator shifts frequencies to match those specified by the keyboard and sends the output to the IFFT module. The resulting waves are buffered and sent back to the AC’97 at a sample rate of 48 kHz. All computations are done on 1024-sample windows.

The visual components include a VGA controller, a wave display module, and a (non-functional) FFT display module. By default, the wave display updates continuously as it receives data. The user can freeze the current screen or cause the display to trigger on a rising edge of the waveform. The screen displays both input and output waves. Ideally, the real-time FFT outputs would also be displayed. The VGA runs at a 1024x768 resolution with a 60 Hz refresh rate.

All modules are written in Verilog with Xilinx ISE 8. Unit testing was done with ModelSim, although most modules required incremental testing on the FPGA with a Tektronix TLA5202 Logic Analyzer.

2. Module Descriptions and Implementations

The voice modulator was developed in three parts: audio, visual, and keyboard input.

2.1. Audio [Adam]

The audio component is the major component of the project. Its job is to:

1. Sample the microphone data
2. Compute the Fourier transform of each audio frame
3. Analyze the frequency spectrum to determine the fundamental pitch of the input
4. Modulate the spectrum to change the fundamental pitch
5. Synthesize the spectrum back into a new audio frame with the inverse Fourier transform
6. Send the audio data to the headphones

2.1.1. AC’97 Controller

The AC’97 controller (Figure 2) provides a simple audio interface for the rest of the project. On system reset, it initializes the AC’97 by setting the various command registers to appropriate values (e.g. unmuting the headphone and microphone ports). It translates between the AC’97’s bit-serial protocol and a simpler 18-bit parallel protocol, and it also synchronizes data between the AC’97’s 12.288 MHz bit clock and the FPGA’s 27 MHz clock. It provides a 48 kHz sync pulse called frame_enable every time a new frame of audio data is ready to be sent to the headphones or received from the microphone.

[Figure 2: AC’97 Controller. The block sits between a side labeled "To lab kit" and a side labeled "To AC97". Signals shown: clock_27mhz, reset, frame_enable, audio_in_left, audio_in_right, audio_out_left, audio_out_right, audio_reset_b, ac97_bit_clock, ac97_sdata_in, ac97_sdata_out, ac97_synch. The four audio_in and audio_out buses are 18 bits wide.]

2.1.2. Fourier Transform

The Fourier transform module computes a 1024-point short-time fast Fourier transform of the audio input with a rectangular windowing function. It stores audio samples in block RAM until it acquires 1024 samples, at which point it begins the computation. The FFT is implemented by the Xilinx IP CoreGen FFT, which uses the Cooley-Tukey algorithm.

The entire system works with monaural data, so the stereo inputs are converted to mono by averaging the two channels before they are fed into the FFT. Likewise, the final output signal is copied onto both output channels.

When the FFT has finished computing, it stores the resulting transform in another block RAM and pulses a start signal to the analyzer module, which then reads from that RAM as necessary.

2.1.3. Spectrum Analyzer

The spectrum analyzer module computes the fundamental frequency of the current window of audio data. It does so using the Harmonic Product Spectrum (HPS) algorithm (Figure 3).
The basic idea behind HPS is that voice data will almost always have strong harmonics above the fundamental, at twice, three times, etc. the fundamental frequency.

[Figure 3: HPS Algorithm [1]]

To exploit this, consider the spectra you would get from downsampling the input: they would be contracted by a factor equal to the downsampling factor. Now multiply these spectra together for
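The contract-and-multiply idea described above can be sketched in software. The following is a minimal Python model, not the Verilog used in the project. The function names, the choice of three harmonics, and the synthetic test tone are all assumptions made for illustration; only the 48 kHz sample rate, the 1024-sample window, and the rectangular window come from the report.

```python
import cmath
import math

SAMPLE_RATE = 48000  # Hz, sample rate stated in the report
WINDOW = 1024        # samples per analysis window, as in the report


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def hps_pitch_bin(magnitudes, num_harmonics=3):
    """Return the bin whose harmonic product is largest.

    magnitudes: magnitude spectrum for bins 0 .. N/2 - 1.
    num_harmonics: how many contracted spectra to multiply (assumed value).
    """
    limit = len(magnitudes) // num_harmonics
    best_bin, best_product = 1, 0.0
    for k in range(1, limit):  # skip the DC bin
        product = 1.0
        for h in range(1, num_harmonics + 1):
            # Bin h*k of the original spectrum is bin k of the spectrum
            # contracted by a factor of h.
            product *= magnitudes[h * k]
        if product > best_product:
            best_bin, best_product = k, product
    return best_bin


# Synthetic test tone (hypothetical): fundamental at bin 10, with a second
# harmonic that is stronger than the fundamental.
FUNDAMENTAL_BIN = 10
AMPLITUDES = [0.5, 1.0, 0.8]  # fundamental, 2nd harmonic, 3rd harmonic
frame = [
    sum(a * math.sin(2 * math.pi * FUNDAMENTAL_BIN * (h + 1) * n / WINDOW)
        for h, a in enumerate(AMPLITUDES))
    for n in range(WINDOW)
]

spectrum = fft(frame)  # no taper applied, i.e. a rectangular window
magnitudes = [abs(c) for c in spectrum[:WINDOW // 2]]

peak_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
pitch_bin = hps_pitch_bin(magnitudes)
pitch_hz = pitch_bin * SAMPLE_RATE / WINDOW

print("strongest single bin:", peak_bin)
print("HPS pitch bin:", pitch_bin, "=", pitch_hz, "Hz")
```

In this test tone the strongest single bin is the second harmonic (bin 20), so a plain peak search would report a pitch one octave too high. The harmonic product peaks at bin 10, which corresponds to 468.75 Hz at a bin spacing of 48000 / 1024 = 46.875 Hz.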