CMU CS 15492 - Speech Recognition Signal Processing

Speech Processing 15-492/18-492
Speech Recognition
Signal Processing

Analog to Digital
• Speech (sound) is analog
• Computers are digital
• We need to convert
• Sample from A-D converter
• N times a second
• How many times a second?

Goals of Signal Processing
• Distinguish between phonetic types
• Be invariant to channel/room conditions
• Be invariant to speaker characteristics
• Computational efficiency

Time vs Frequency Domain
• Human ear distinguishes frequencies
• Initial ASR used time domain features
• Power
• Zero crossings (sort of frequency)

Source Filter Model
[Diagram labels: Pulse, Noise, Filter, Vocal Tract Model, Pitch, Voiced, Unvoiced]

Time domain Signal
Waveform Representation

Speech Spectrogram

/iy/ vs /ae/
• “beat” /b iy t/ and “bat” /b ae t/

Frequency Domain
• “pencils” /p eh n s ih l z/

Frequency Domain
• “beats pits” /b iy t s p ih t s/

Speech Analysis
Standard Parameterization
• Split waveform into “frames”
• Advance every 10ms
• Size around 25ms (overlapping frames)
• Window them
• Perform FFT/Mel Cepstral analysis
• Find Deltas (difference from previous)
• Find Delta Deltas (difference in delta)

Summary
• Time domain vs Frequency domain
• Parameterization of speech
• Frequency domain
• Short term FFTs
• FFT vs MEL
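
The Standard Parameterization steps, together with the two time domain features named earlier (power and zero crossings), can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's implementation. The function names, the 16 kHz sample rate, the 440 Hz test tone, and the choice of a Hamming window are all assumptions, since the slides say only "window them". A naive DFT stands in for the FFT to keep the example dependency-free, and the Mel cepstral step is left out.

```python
import cmath
import math


def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split samples into overlapping frames: ~25 ms long, advanced every 10 ms."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def hamming(n):
    """Hamming window of length n (an assumed window choice)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power(frame):
    """Time domain feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame):
    """Time domain feature: number of sign changes (a rough frequency cue)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def dft_magnitudes(frame):
    """Magnitude spectrum via a naive O(n^2) DFT (stand-in for an FFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def deltas(values):
    """Difference from the previous frame; the first frame gets 0."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    rate = 16000  # assumed sample rate
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    frames = frame_signal(tone, rate)
    window = hamming(len(frames[0]))
    spectrum = dft_magnitudes([x * w for x, w in zip(frames[0], window)])
    powers = [power(f) for f in frames]
    delta = deltas(powers)          # difference from previous frame
    delta_delta = deltas(delta)     # difference in delta
    print(len(frames), len(frames[0]), spectrum.index(max(spectrum)))
```

At 16 kHz, a 25 ms frame is 400 samples and a 10 ms advance is 160 samples, so consecutive frames overlap by 240 samples. In practice the per-frame features would be Mel cepstral coefficients rather than raw power, with deltas and delta deltas computed per coefficient in the same way.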

