UW CSEP 590 - Perceptual Audio Coding - D483419



School name University of Washington

Course Csep 590- Special Topics In Computer Science (PMP)

Pages 20

This preview shows page 1-2-19-20 out of 20 pages.



Perceptual Audio Coding
Henrique Malvar
Managing Director, Redmond Lab
UW Lecture – December 6, 2007

Contents
• Motivation
• “Source coding”: good for speech
• “Sink coding”: Auditory Masking
• Block & Lapped Transforms
• Audio compression
• Examples

Many applications need digital audio
• Communication
  – Digital TV, Telephony (VoIP) & teleconferencing
  – Voice mail, voice annotations on e-mail, voice recording
• Business
  – Internet call centers
  – Multimedia presentations
• Entertainment
  – 150 songs on standard CD
  – thousands of songs on portable music players
  – Internet / Satellite radio, HD Radio
  – Games, DVD Movies

Linear Predictive Coding (LPC)
[Diagram: periodic excitation (pitch period) and noise excitation → Combine (gains) → e(n) → Synthesis Filter (LPC coefficients) → x(n), synthesized speech]
$x(n) = e(n) + \sum_{r=1}^{N} a_r \, x(n-r)$

LPC basics – analysis/synthesis
[Diagram: original speech → Analysis algorithm → synthesis parameters and residual waveform → Synthesis Filter → synthesized speech]
$e(n) = x(n) - \sum_{r=1}^{N} a_r \, x(n-r)$

LPC variant - CELP
[Diagram: Encoder: original speech → selection index, gain, LPC coefficients; Decoder: excitation codebook → gain → Synthesis Filter → synthesized speech]

LPC variant - multipulse
[Diagram: original speech → Compute pulse positions and amplitudes → excitation → Synthesis Filter (LPC coefficients) → synthesized speech]

G.723.1 architecture
[Block diagram. Blocks: Framer, Highpass Filter, LPC Analysis, Perceptual Noise Shaping, Simulated Decoder, Zero-Input Response (Memory), Pitch Estimation, MP-MLQ/ACELP Excitation Computation. Signals: Audio Input, Synthesis Filter Coefficients, Excitation, Past Decoded Residual, Weighted Residual, Residual, Excitation Indices, Encoded Pitch Values]

Physiology of the ear
• Automatic gain control
  – muscles around transmission bones
• Directivity
  – pinna
• Boost of middle frequencies
  – auditory canal
• Nonlinear processing
  – auditory nerve
• Filter bank separation
  – cochlea
• Thousands of “microphones”
  – hair cells in cochlea
[Figure labels: pinna, auditory canal, eardrum, transmission bones, cochlea, auditory nerve, Eustachian tube]

Filter bank model
[Diagram: incoming sound x(n) → bandpass filters 1, 2, ..., N (linear) → amplitude measurement (nonlinear) → group into blocks → band amplitudes A1(m1), A2(m2), ..., AN(mN)]
• Explains frequency-domain masking

Frequency-domain masking
[Plots: amplitude (dB) versus frequency (Hz, 10^2 to 10^4); curves labeled tone 1, tone 2, tone 3, tone 1+3, and tone 1+2+3]

Absolute threshold of hearing
• Fletcher-Munson curves
• Basis for loudness correction in audio amplifiers
[Plot: threshold curves versus frequency (Hz, 10^2 to 10^4)]

Example of masking
• Typical spectrum & masking threshold
• Original sound
• Sound after removing components below the threshold (1/3 to 1/2 of the data)
[Plot: amplitude (dB) versus frequency]

Block signal processing
[Diagram: Input Signal → Extract Block → x → Direct Orthogonal Transform → X → Processing → $\tilde{X}$ → Inverse Orthogonal Transform → $\tilde{x}$ → Append Block → Output Signal]
$X = P^T x$
$\tilde{x} = P \tilde{X}$
Signal is reconstructed as a linear combination of basis functions

Block processing: good and bad
• Pro: allows adaptability
• Con: blocking artifacts

Why transforms?
• More efficient signal representation
  – Frequency domain
  – Basis functions ~ “typical” signal components
• Faster processing
  – Filtering, compression
• Orthogonality
  – Energy preservation
  – Robustness to quantization

Compactness of representation
• Maximum energy concentration in as few coefficients as possible
• For stationary random signals, the optimal basis is the Karhunen-Loève transform:
  $\lambda_i p_i = R_{xx} p_i$, $P^T P = I$
• Basis functions are the columns of P
• Minimum geometric mean of transform coefficient variances

Sub-optimal transforms
• KLT problems:
  – Signal dependency
  – P not factorable into sparse components
• Sinusoidal transforms:
  – Asymptotically optimal for large blocks
  – Frequency component interpretation
  – Sparse factors - e.g. FFT

Lapped transforms
• Basis functions have tails beyond block boundaries
  – Linear combinations of overlapping functions generate smooth signals, without blocking artifacts

Modulated lapped transforms
• Basis functions = cosines modulating the same low-pass (window) prototype h(n):
  $p_k(n) = h(n) \sqrt{\frac{2}{M}} \cos\left[\left(n + \frac{M+1}{2}\right)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right]$
• Can be computed from the DCT or FFT
• Projection $X = P^T x$ can be computed in $O(\log_2 M)$ operations per input point

Fast MLT computation
[Diagram: analysis path: input signal x(n) → M-sample blocks with a one-block delay ($z^{-M}$) → windowing with $h_a(n)$ → DCT-IV transform → MLT coefficients X(k); synthesis path: processed MLT coefficients Y(k) → DCT-IV transform → windowing with $h_s(n)$ and a one-block delay ($z^{-M}$) → output signal y(n)]

Basis functions
[Figure: basis functions of the DCT and of the MLT]

Basic architecture
[Diagram: Original Audio Signal → Time-to-Frequency Transform (MLT) → Spectrum Weighting (driven by the Masking Threshold) → Uniform Quantizer → Entropy Encoder → MUX (with Side Info) → Encoded Bitstream]

Quantization of transform coefficients
• Quantization = rounding to nearest integer.
• Small range of integer values = fewer bits needed to represent data
• Step size T controls range of integer values
  $y = T \, \mathrm{int}(x / T)$
[Plot: y versus x staircase; all values of x in a given range are mapped to the same value of y]

Encoding of quantized coefficients
• Typical plot of quantized transform
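
The quantization rule from the slides, y = T·int(x/T), can be illustrated with a minimal Python sketch. The function names (`quantize`, `dequantize`, `quantize_block`) and the sample coefficients are hypothetical, and reading "int" as round-to-nearest with ties going up is an assumption based on the slide's "rounding to nearest integer" bullet.

```python
import math


def quantize(x, step):
    """Map x to an integer index by rounding x/step to the nearest integer.

    Assumption: ties (exactly halfway) round upward via floor(v + 0.5).
    """
    return math.floor(x / step + 0.5)


def dequantize(index, step):
    """Reconstruct the quantized value y = step * index."""
    return index * step


def quantize_block(coeffs, step):
    """Quantize every coefficient in a block with the same step size."""
    return [quantize(c, step) for c in coeffs]


# Hypothetical transform coefficients, for illustration only.
coeffs = [0.12, -3.7, 8.05, 0.49, -0.51]

fine = quantize_block(coeffs, 1.0)    # small step: wider range of integers
coarse = quantize_block(coeffs, 4.0)  # large step: narrower range of integers

print(fine)
print(coarse)
```

A larger step size T shrinks the range of integer indices, so fewer bits are needed, at the cost of a reconstruction error of up to T/2 per coefficient.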
