UW CSEP 590 - Perceptual Audio Coding


Perceptual Audio Coding
Henrique Malvar
Managing Director, Redmond Lab
UW Lecture – December 6, 2007

Contents
• Motivation
• “Source coding”: good for speech
• “Sink coding”: Auditory Masking
• Block & Lapped Transforms
• Audio compression
• Examples

Many applications need digital audio
• Communication
  – Digital TV, telephony (VoIP) & teleconferencing
  – Voice mail, voice annotations on e-mail, voice recording
• Business
  – Internet call centers
  – Multimedia presentations
• Entertainment
  – 150 songs on standard CD
  – thousands of songs on portable music players
  – Internet / satellite radio, HD Radio
  – Games, DVD movies

Linear Predictive Coding (LPC)
[Figure: LPC synthesis block diagram. Labels: periodic excitation (pitch period), noise excitation, gains, Combine, e(n), Synthesis Filter (LPC coefficients), x(n), synthesized speech.]
Synthesis filter: $x(n) = e(n) + \sum_{r=1}^{N} a_r\, x(n-r)$

LPC basics – analysis/synthesis
[Figure: analysis/synthesis block diagram. Labels: original speech, Analysis algorithm, synthesis parameters, residual waveform, Synthesis Filter, synthesized speech.]
Residual: $e(n) = x(n) - \sum_{r=1}^{N} a_r\, x(n-r)$

LPC variant - CELP
[Figure: CELP block diagram. Labels: original speech, Encoder, selection index, gain, LPC coefficients, Decoder, excitation codebook, Synthesis Filter, synthesized speech.]

LPC variant - multipulse
[Figure: multipulse block diagram. Labels: original speech, Compute pulse positions and amplitudes, excitation, LPC coefficients, Synthesis Filter, synthesized speech.]

G.723.1 architecture
[Figure: G.723.1 encoder block diagram. Labels: Audio Input, Framer, Highpass Filter, LPC Analysis, Perceptual Noise Shaping, Pitch Estimation, Simulated Decoder, Zero-Input Response (Memory), MP-MLQ/ACELP Excitation Computation, Synthesis Filter Coefficients, Excitation, Past Decoded Residual, Weighted Residual, Residual, Excitation Indices, Encoded Pitch Values.]
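The two LPC equations in the slides above (the analysis residual and the synthesis filter) are exact inverses of each other when the same coefficients are used on both sides. The following is a minimal sketch of that round trip, not a speech codec: the second-order coefficients `a` and the test signal `x` are made-up illustrative values, the function names are hypothetical, and samples before n = 0 are assumed to be zero.

```python
def lpc_residual(x, a):
    """Analysis: e(n) = x(n) - sum_{r=1..N} a_r * x(n-r), with x(n) = 0 for n < 0."""
    N = len(a)
    e = []
    for n in range(len(x)):
        pred = sum(a[r - 1] * x[n - r] for r in range(1, N + 1) if n - r >= 0)
        e.append(x[n] - pred)
    return e


def lpc_synthesize(e, a):
    """Synthesis: x(n) = e(n) + sum_{r=1..N} a_r * x(n-r), with x(n) = 0 for n < 0."""
    N = len(a)
    x = []
    for n in range(len(e)):
        pred = sum(a[r - 1] * x[n - r] for r in range(1, N + 1) if n - r >= 0)
        x.append(e[n] + pred)
    return x


# Hypothetical 2nd-order predictor and a short test signal (illustrative only).
a = [1.2, -0.5]
x = [1.0, 0.8, 0.3, -0.2, -0.5, -0.4, 0.0, 0.3]

e = lpc_residual(x, a)         # what an encoder would have to describe
x_rec = lpc_synthesize(e, a)   # what a decoder rebuilds from e(n) and a_r
max_err = max(abs(u - v) for u, v in zip(x, x_rec))
print("max reconstruction error:", max_err)
```

In the CELP and multipulse variants shown in the slides, the residual is not sent exactly; it is replaced by a codebook entry or a few pulses, so the reconstruction is only approximate.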
Physiology of the ear
• Automatic gain control – muscles around transmission bones
• Directivity – pinna
• Boost of middle frequencies – auditory canal
• Nonlinear processing – auditory nerve
• Filter bank separation – cochlea
• Thousands of “microphones” – hair cells in cochlea
[Figure: anatomy of the ear. Labels: pinna, auditory canal, eardrum, transmission bones, cochlea, auditory nerve, Eustachian tube.]

Filter bank model
[Figure: filter bank block diagram. Incoming sound x(n) feeds bandpass filters 1 to N (linear); each is followed by an amplitude measurement (nonlinear) giving the amplitude of band 1 to N, then "group into blocks", producing A1(m1), A2(m2), ..., AN(mN).]
• Explains frequency-domain masking

Frequency-domain masking
[Figure: two plots of amplitude, dB versus frequency, Hz (10^2 to 10^4). Curves: tone 1, tone 2, tone 3, tone 1+3, tone 1+2+3.]

Absolute threshold of hearing
• Fletcher-Munson curves
• Basis for loudness correction in audio amplifiers
[Figure: threshold curves versus frequency, Hz (10^2 to 10^4).]

Example of masking
• Typical spectrum & masking threshold
• Original sound:
• Sound after removing components below the threshold (1/3 to 1/2 of the data):
[Figure: spectrum and masking threshold, amplitude, dB.]

Block signal processing
[Figure: block diagram. Input Signal, Extract Block, Direct Orthogonal Transform, Processing, Inverse Orthogonal Transform, Append Block, Output Signal.]
Direct transform: $X = P^{T} x$
Inverse transform: $\tilde{x} = P \tilde{X}$
Signal is reconstructed as a linear combination of basis functions

Block processing: good and bad
• Pro: allows adaptability
• Con: blocking artifacts

Why transforms?
• More efficient signal representation
  – Frequency domain
  – Basis functions ~ “typical” signal components
• Faster processing
  – Filtering, compression
• Orthogonality
  – Energy preservation
  – Robustness to quantization

Compactness of representation
• Maximum energy concentration in as few coefficients as possible
• For stationary random signals, the optimal basis is the Karhunen-Loève transform: $R_{xx}\, p_i = \lambda_i\, p_i$, $P^{T} P = I$
• Basis functions are the columns of P
• Minimum geometric mean of transform coefficient variances

Sub-optimal transforms
• KLT problems:
  – Signal dependency
  – P not factorable into sparse components
• Sinusoidal transforms:
  – Asymptotically optimal for large blocks
  – Frequency component interpretation
  – Sparse factors - e.g. FFT

Lapped transforms
• Basis functions have tails beyond block boundaries
  – Linear combinations of overlapping functions such as [figure: overlapping basis functions] generate smooth signals, without blocking artifacts

Modulated lapped transforms
• Basis functions = cosines modulating the same low-pass (window) prototype h(n):
  $p_{nk} = h(n)\, \sqrt{\frac{2}{M}}\, \cos\left[\left(n + \frac{M+1}{2}\right)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right]$
• Can be computed from the DCT or FFT
• Projection $X = P^{T} x$ can be computed in $O(\log_2 M)$ operations per input point

Fast MLT computation
[Figure: fast MLT flowgraph. Direct transform: input signal x(n), M-sample block, one-block delay z^{-M}, windowing with h_a(n) and h_a(M-1-n) giving u(M/2+n) and u(M/2-1-n), DCT-IV transform U(k), MLT coefficients X(k). Inverse transform: processed MLT coefficients Y(k), DCT-IV transform, w(M/2+n) and w(M/2-1-n), windowing with h_s(n) and h_s(M-1-n), one-block delay z^{-M}, M-sample block, output signal y(n).]

Basis functions
[Figure: basis functions of the DCT compared with those of the MLT.]

Basic architecture
[Figure: encoder block diagram. Original audio signal, Time-to-frequency transform (MLT), Spectrum weighting, Uniform quantizer, Entropy encoder, MUX, Encoded bitstream; a Masking threshold block drives the spectrum weighting and supplies Side info (SI) to the MUX.]

Quantization of transform coefficients
• Quantization = rounding to nearest integer.
• Small range of integer values = fewer bits needed to represent data
• Step size T controls range of integer values: $y = T \cdot \mathrm{int}(x / T)$
[Figure: quantizer staircase, y versus x. Annotation: all values in this range … are mapped to this value.]

Encoding of quantized coefficients
• Typical plot of quantized transform
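The quantization rule on the slide above, y = T·int(x/T), can be tried on a few numbers to see how the step size T trades accuracy for a smaller range of integers. This is a minimal sketch under stated assumptions: int(·) is read as "round to nearest integer" (as the first bullet of that slide says), ties are rounded up, the function names are hypothetical, and the coefficient values are made up for illustration.

```python
import math


def quantize(x, T):
    """Integer index k = round(x / T); ties round up (an assumption of this sketch)."""
    return math.floor(x / T + 0.5)


def dequantize(k, T):
    """Reconstructed value y = T * k."""
    return k * T


# Made-up transform coefficients, for illustration only.
coeffs = [0.03, -1.27, 4.9, 12.4, -7.6]

results = {}
for T in (0.5, 2.0):
    indices = [quantize(c, T) for c in coeffs]
    recon = [dequantize(k, T) for k in indices]
    max_err = max(abs(c - r) for c, r in zip(coeffs, recon))
    results[T] = (indices, recon, max_err)
    print("T =", T, "indices =", indices, "max error =", max_err)
```

A larger T gives indices that span fewer integer values, which is what lets the entropy coder spend fewer bits, at the cost of an error of up to T/2 per coefficient.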

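Returning to the "Modulated lapped transforms" slide: the claim that overlapping basis functions can still form an orthogonal transform can be checked numerically. The sketch below builds the basis p_nk from the slide's formula for n = 0..2M-1 and k = 0..M-1. The slide does not say which window h(n) is used; the sine window h(n) = sin[(n + 1/2)π/(2M)] is an assumption made here, and the function names and the block size M = 8 are illustrative.

```python
import math


def mlt_basis(M):
    """Return P as 2M rows (time index n) by M columns (frequency index k)."""
    P = []
    for n in range(2 * M):
        h = math.sin((n + 0.5) * math.pi / (2 * M))  # assumed sine window
        row = [h * math.sqrt(2.0 / M)
               * math.cos((n + (M + 1) / 2.0) * (k + 0.5) * math.pi / M)
               for k in range(M)]
        P.append(row)
    return P


def gram_error(P, M, shift):
    """Largest deviation of sum_n P[n+shift][k] * P[n][l] from its ideal value.

    shift = 0 tests orthonormality of the basis functions of one block;
    shift = M tests the tail of one block against the head of the next.
    """
    worst = 0.0
    for k in range(M):
        for l in range(M):
            s = sum(P[n + shift][k] * P[n][l] for n in range(2 * M - shift))
            ideal = 1.0 if (shift == 0 and k == l) else 0.0
            worst = max(worst, abs(s - ideal))
    return worst


M = 8
P = mlt_basis(M)
within_block_error = gram_error(P, M, 0)
overlap_error = gram_error(P, M, M)
print("within-block error:", within_block_error)
print("overlap error:", overlap_error)
```

Both errors come out at floating-point rounding level, which is the property that lets the inverse transform be the plain transpose even though each basis function is 2M samples long. This direct O(M^2) construction is only for checking; the fast DCT-IV route in the "Fast MLT computation" slide is what a real codec would use.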
