CMU CS 15492 - Speech Recognition Signal Processing - D1309192


School: Carnegie Mellon University

Course: CS 15492 - Special Topic: Speech Processing

Pages: 15


Speech Processing 15-492/18-492
Speech Recognition
Signal Processing

Analog to Digital
• Speech (sound) is analog
• Computers are digital
• We need to convert
• Sample from A-D converter N times a second
• How many times a second?

Goals of Signal Processing
• Distinguish between phonetic types
• Be invariant to channel/room conditions
• Be invariant to speaker characteristics
• Computational efficiency

Time vs Frequency Domain
• Human ear distinguishes frequencies
• Initial ASR used time domain features
  • Power
  • Zero crossings (sort of frequency)

Source Filter Model
[Figure labels: Pulse, Noise, Filter]

Vocal Tract Model
[Figure labels: Pitch, Voiced, Unvoiced]

Time domain Signal

Waveform Representation

Speech Spectrogram

/iy/ vs /ae/
• “beat” /b iy t/ and “bat” /b ae t/

Frequency Domain
• “pencils” /p eh n s ih l z/

Frequency Domain
• “beats pits” /b iy t s p ih t s/

Speech Analysis

Standard Parameterization
• Split waveform into “frames”
• Advance every 10ms
• Size around 25ms (overlapping frames)
• Window them
• Perform FFT/Mel Cepstral analysis
• Find Deltas (difference from previous)
• Find Delta Deltas (difference in delta)

Summary
• Time domain vs Frequency domain
• Parameterization of speech
• Frequency domain
• Short term FFTs
• FFT vs MEL
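
The time domain features (power, zero crossings) and the standard parameterization steps named in the slides can be illustrated with a minimal Python sketch. It is not the course's implementation. All function names are hypothetical. The slides only say "Window them", so the Hamming window is an assumption. A naive DFT stands in for the FFT to keep the example dependency-free, and the mel cepstral step is left out.

```python
import cmath
import math


def split_into_frames(samples, sample_rate, frame_ms=25.0, advance_ms=10.0):
    """Cut the waveform into overlapping frames (25 ms long, advancing 10 ms)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    advance = int(round(sample_rate * advance_ms / 1000.0))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += advance
    return frames


def frame_power(frame):
    """Time domain feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame):
    """Time domain feature: number of sign changes (a rough frequency cue)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def hamming(n):
    """Hamming window coefficients (assumed window choice, not from the slides)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Window a frame and return its power spectrum via a naive DFT.

    A real system would use an FFT; the O(n^2) DFT gives the same values.
    """
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def deltas(features):
    """Difference from the previous frame; the first frame gets zeros."""
    if not features:
        return []
    out = [[0.0] * len(features[0])]
    for prev, cur in zip(features, features[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out


# Usage: a synthetic 1000 Hz tone sampled at 8000 Hz for 0.1 s.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate + 0.3)
          for t in range(800)]
frames = split_into_frames(signal, sample_rate)
spectra = [power_spectrum(f) for f in frames]
delta = deltas(spectra)          # deltas
delta_delta = deltas(delta)      # delta deltas (difference in delta)
```

Applying `deltas` twice gives the delta deltas, so each frame ends up with static, delta, and delta-delta values of the same length.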
