UCSB ECE 160 - MPEG Audio Compression


School: University of California, Santa Barbara

Course: ECE 160 - Multimedia Systems

Pages: 44


ECE160 / CMPS182 Multimedia
Lecture 14: Spring 2009
MPEG Audio Compression

Slide titles: Vocoders; Phase Insensitivity; Channel Vocoder; Formant Vocoder; Linear Predictive Coding (LPC); LPC Coding Process; Code Excited Linear Prediction (CELP); Psychoacoustics; Equal-Loudness Relations; Threshold of Hearing; Frequency Masking; Frequency Masking Curves; Critical Bands; Critical Bands and Bandwidth; Bark Unit; Temporal Masking; Temporal and Frequency Masking; MPEG Audio; MPEG Layers; MPEG Audio Strategy; MPEG Audio Compression Algorithm; Bit Allocation Algorithm; MPEG Layers 1 and 2; Layer 2 of MPEG Audio; Layer 3 of MPEG Audio; MPEG Layer 3 Coding; MP3 Compression Performance; MPEG-2 AAC (Advanced Audio Coding); MPEG-4 Audio; Other Commercial Audio Codecs; MPEG-7 and MPEG-21

Vocoders
• Vocoders are voice coders, which cannot be usefully applied when other analog signals, such as modem signals, are in use. They:
– are concerned with modeling speech so that the salient features are captured in as few bits as possible.
– use either a model of the speech waveform in time (Linear Predictive Coding (LPC) vocoding), or
– break down the signal into frequency components and model these (channel vocoders and formant vocoders).
• Vocoder simulation of the voice is not very good yet. There is a compromise between very strong compression and speech quality.

Phase Insensitivity
• A complete reconstruction of the speech waveform is perceptually unnecessary: what is needed is for the amount of energy at any time and frequency to be right, and the signal will sound about right.
• Phase is a shift in the time argument inside a function of time.
– Suppose we strike a piano key and generate a roughly sinusoidal sound cos(ωt), with ω = 2πf.
– Now if we wait sufficient time to generate a phase shift π/2 and then strike another key, with sound cos(2ωt + π/2), we generate a waveform like the solid line.
– This waveform is the sum cos(ωt) + cos(2ωt + π/2).
– If we did not wait before striking the second note, our waveform would be cos(ωt) + cos(2ωt). Perceptually, the two notes would sound the same, even though they are actually shifted in phase.

Channel Vocoder
• Vocoders can operate at low bit-rates, 1-2 kbps.
• A channel vocoder first applies a filter bank to separate out the different frequency components.
• Due to Phase Insensitivity (only the energy is important):
– The waveform is "rectified" to its absolute value.
– The filter bank derives power levels for each frequency range.
– A subband coder would not rectify the signal, and would use wider frequency bands.
• A channel vocoder also analyzes the signal to determine the general pitch of the speech (low-bass, or high-tenor), and also the excitation of the speech.
• A channel vocoder applies a vocal tract transfer model to generate a vector of excitation parameters that describe a model of the sound, and also guesses whether the sound is voiced or unvoiced.

Formant Vocoder
• Formants: the salient frequency components that are present in a sample of speech.
• Rationale: encode only the most important frequencies.
• The solid line shows frequencies present in the first 40 msec of a speech sample. The dashed line shows that while similar frequencies are still present one second later, these frequencies have shifted.

Linear Predictive Coding (LPC)
• LPC vocoders extract salient features of speech directly from the waveform, rather than transforming the signal to the frequency domain.
• LPC features:
– uses a time-varying model of vocal tract sound generated from a given excitation
– transmits only a set of parameters modeling the shape and excitation of the vocal tract, not actual signals or differences, which gives a small bit-rate
• About "Linear": The speech signal generated by the output vocal tract model is calculated as a function of the current speech output plus a second term linear in previous model coefficients.

LPC Coding Process
LPC starts by deciding whether the current segment is voiced (vocal cords resonate) or unvoiced:
• For unvoiced: a wide-band noise generator creates a signal f(n) that acts as input to the vocal tract simulator.
• For voiced: a pulse train generator creates the signal f(n).
• Model parameters a_i: calculated by using a least-squares set of equations that minimize the difference between the actual speech and the speech generated by the vocal tract model, excited by the noise or pulse train generators that capture speech parameters.
• If the output values generate s(n), for input values f(n), the output
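
The least-squares fit of the model parameters a_i described under LPC Coding Process can be illustrated with a minimal sketch. It rests on assumptions that are not in the slides: the vocal tract model is taken to be the common all-pole form s(n) = a_1 s(n-1) + ... + a_p s(n-p) + f(n), the fit uses the autocorrelation method solved by the Levinson-Durbin recursion (one standard way to solve the least-squares equations, not necessarily the one the lecture has in mind), and the function names are hypothetical.

```python
def autocorrelation(s, max_lag):
    """r[k] = sum over n of s[n] * s[n + k], for k = 0..max_lag."""
    n = len(s)
    return [sum(s[i] * s[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc_coefficients(s, order):
    """Estimate predictor coefficients a_1..a_order (assumed all-pole model).

    Solves the least-squares normal equations of the autocorrelation method
    with the Levinson-Durbin recursion. Returns (coefficients, residual energy).
    """
    r = autocorrelation(s, order)
    if r[0] == 0.0:
        raise ValueError("segment has no energy; nothing to model")
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the coefficients
    err = r[0]               # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def synthesize(a, excitation):
    """Run the excitation f(n) through the assumed vocal tract model."""
    out = []
    for n, f in enumerate(excitation):
        pred = sum(a[j] * out[n - 1 - j] for j in range(len(a)) if n - 1 - j >= 0)
        out.append(pred + f)
    return out


if __name__ == "__main__":
    # Hypothetical 2nd-order "vocal tract" excited by a single pulse.
    true_a = [1.5, -0.8]
    speech = synthesize(true_a, [1.0] + [0.0] * 399)
    estimated, residual = lpc_coefficients(speech, order=2)
    print("estimated coefficients:", estimated)
```

In a coder along these lines, only the estimated coefficients, the voiced/unvoiced decision, and the excitation parameters would be transmitted; the decoder would then call something like `synthesize` with a pulse train (voiced) or noise (unvoiced) as f(n).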
