CMU CS 15492 - Spectrogram, Cepstrum and Mel-Frequency Analysis


Speech Technology - Kishore Prahallad ([email protected])

Speech Technology: A Practical Introduction
Topic: Spectrogram, Cepstrum and Mel-Frequency Analysis
Kishore Prahallad
Email: [email protected]
Carnegie Mellon University & International Institute of Information Technology Hyderabad

Topics
• Spectrogram
• Cepstrum
• Mel-Frequency Analysis
• Mel-Frequency Cepstral Coefficients

Spectrogram

Speech signal represented as a sequence of spectral vectors
[Figure: a row of FFT blocks over the speech signal, each producing a spectrum; axes labelled Hz and Amplitude, later Hz and Time]
• Rotate each spectrum by 90 degrees.
• Map spectral amplitude to a grey level (0-255) value. 0 represents black and 255 represents white.
• The higher the amplitude, the darker the corresponding region.
• The time vs. frequency representation of a speech signal is referred to as a spectrogram.

Some Real Spectrograms
• Dark regions indicate peaks (formants) in the spectrum.

Why we are bothered about spectrograms
• Phones and their properties are better observed in a spectrogram.
• Sounds can be identified much better by the formants and by their transitions.
• Hidden Markov Models implicitly model these spectrograms to perform speech recognition.

Usefulness of Spectrogram
• Time-frequency representation of the speech signal
• The spectrogram is a tool to study speech sounds (phones)
• Phones and their properties are visually studied by phoneticians
• Hidden Markov Models implicitly model spectrograms for speech-to-text systems
• Useful for evaluation of text-to-speech systems
  – A high-quality text-to-speech system should produce synthesized speech whose spectrograms nearly match those of the natural sentences.

Cepstral Analysis

A Sample Speech Spectrum
[Figure: speech spectrum; axes Frequency (Hz) and dB]
• Peaks denote dominant frequency components in the speech signal
• Peaks are referred to as formants
• Formants carry the identity of the sound

What we want to Extract? – Spectral Envelope
[Figure: speech spectrum with a smooth curve through its peaks; axes Frequency (Hz) and dB]
• Formants and a smooth curve connecting them
• This smooth curve is referred to as the spectral envelope

Spectral Envelope
[Figure: three panels: Spectrum, log X[k]; Spectral Envelope, log H[k]; Spectral details, log E[k]]
log X[k] = log H[k] + log E[k]
1. Our goal: We want to separate the spectral envelope and the spectral details from the spectrum.
2. i.e., given log X[k], obtain log H[k] and log E[k] such that log X[k] = log H[k] + log E[k]

How to achieve this separation?

Play a Mathematical Trick
[Figure: Spectrum, Spectral Envelope and Spectral details, each passed through an IFFT onto a pseudo-frequency axis with a low-frequency region and a high-frequency region]
• Trick: Take the FFT of the spectrum!
• An FFT applied to a spectrum is referred to as an Inverse FFT (IFFT).
• Note: We are dealing with the spectrum in the log domain (part of the trick).
• The IFFT of the log spectrum represents the signal on a pseudo-frequency axis.
• Treat this as a sine wave with 4 cycles per sec.
• Gives a peak at 4 Hz on the frequency axis.
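
The two procedures described in these slides, building a spectrogram from frame-wise spectra and separating log X[k] into log H[k] + log E[k] through an inverse transform of the log spectrum, can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the lecture's own code: the function names, the naive DFT (used in place of an FFT so that no external library is needed), the use of log magnitude for the grey-level mapping, the 1e-10 floor, and the `cutoff` parameter are all hypothetical choices.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def log_magnitude_spectrum(frame):
    """log |X[k]| for one frame; the floor (an assumption) avoids log(0)."""
    return [math.log(max(abs(c), 1e-10)) for c in dft(frame)]


def spectrogram_grey(signal, frame_len, hop):
    """One row of grey levels (0-255) per frame, over bins 0..frame_len/2.

    Higher amplitude maps to a darker value (closer to 0), as in the slides.
    Log magnitude is used here, which is an assumption.
    """
    starts = range(0, len(signal) - frame_len + 1, hop)
    spectra = [log_magnitude_spectrum(signal[s:s + frame_len])[:frame_len // 2 + 1]
               for s in starts]
    lo = min(min(row) for row in spectra)
    hi = max(max(row) for row in spectra)
    span = (hi - lo) or 1.0  # guard against a completely flat spectrogram
    return [[round(255 * (1 - (v - lo) / span)) for v in row] for row in spectra]


def split_envelope_and_details(frame, cutoff):
    """Return (log_x, log_h, log_e) with log_x[k] = log_h[k] + log_e[k].

    The inverse transform of the log spectrum moves it onto the
    pseudo-frequency axis; keeping only the low region (below the
    hypothetical `cutoff`) and transforming back gives a smooth envelope.
    """
    log_x = log_magnitude_spectrum(frame)
    cep = [c.real for c in idft(log_x)]  # real, because log_x is real and symmetric
    n = len(cep)
    # Keep the low region and its mirror image so the result stays real.
    low = [cep[i] if (i < cutoff or i > n - cutoff) else 0.0 for i in range(n)]
    log_h = [c.real for c in dft(low)]               # spectral envelope
    log_e = [x - h for x, h in zip(log_x, log_h)]    # spectral details
    return log_x, log_h, log_e
```

Calling `spectrogram_grey(samples, 64, 32)` on a list of samples returns one row per frame, and `split_envelope_and_details(samples[:64], 8)` returns the three log spectra for a single frame. For real recordings an FFT routine would replace the naive `dft`, and frames would normally be windowed before the transform.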

