12/6/2007

Perceptual Audio Coding
Henrique Malvar
Managing Director, Redmond Lab
UW Lecture – December 6, 2007

Contents
• Motivation
• “Source coding”: good for speech
• “Sink coding”: Auditory Masking
• Block & Lapped Transforms
• Audio compression
• Examples

Many applications need digital audio
• Communication
– Digital TV, Telephony (VoIP) & teleconferencing
– Voice mail, voice annotations on e-mail, voice recording
• Business
– Internet call centers
– Multimedia presentations
• Entertainment
– 150 songs on standard CD
– thousands of songs on portable music players
– Internet / Satellite radio, HD Radio
– Games, DVD Movies

Linear Predictive Coding (LPC)
[Figure: periodic excitation (set by the pitch period) and noise excitation are combined, with gains, into the excitation e(n), which drives the synthesis filter (LPC coefficients) to produce the synthesized speech x(n)]
$$x(n) = e(n) + \sum_{r=1}^{N} a_r\, x(n-r)$$

LPC basics – analysis/synthesis
[Figure: an analysis algorithm computes the synthesis parameters and the residual waveform from the original speech; the synthesis filter produces the synthesized speech]
$$e(n) = x(n) - \sum_{r=1}^{N} a_r\, x(n-r)$$

LPC variant - CELP
[Figure: encoder (original speech in; selection index and gain out) and decoder (excitation codebook, gain, synthesis filter with LPC coefficients; synthesized speech out)]

LPC variant - multipulse
[Figure: pulse positions and amplitudes are computed from the original speech to form the excitation, which drives the synthesis filter (LPC coefficients) to produce the synthesized speech]

G.723.1 architecture
[Figure: G.723.1 encoder block diagram with blocks for framer, highpass filter, LPC analysis, perceptual noise shaping, pitch estimation, simulated decoder, zero-input response (memory) and MP-MLQ/ACELP excitation computation; signals include the audio input, weighted residual, past decoded residual and excitation; outputs are the synthesis filter coefficients, encoded pitch values and excitation indices]
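The analysis and synthesis equations above are an exact inverse pair: the analysis equation turns the original speech into the residual e(n), and driving the synthesis filter with that residual reproduces x(n). Below is a minimal Python sketch of that pair. It assumes the coefficients a_r come from the autocorrelation method with the Levinson-Durbin recursion (one common analysis algorithm; the slides do not say which one is used), and it uses a hypothetical test frame of two sinusoids in place of real speech. All function names are illustrative.

```python
import math

def autocorrelation(x, max_lag):
    """r[k] = sum_n x(n) x(n - k), k = 0..max_lag (x taken as zero outside the frame)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Assumed analysis method: solve the normal equations for a_1..a_N
    (autocorrelation method) so that x(n) is predicted by sum_r a_r x(n - r)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy at order 0
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpc_residual(x, a):
    """Analysis: e(n) = x(n) - sum_{r=1}^{N} a_r x(n - r), zero initial state."""
    return [x[n] - sum(a[r - 1] * x[n - r] for r in range(1, len(a) + 1) if n >= r)
            for n in range(len(x))]

def lpc_synthesis(e, a):
    """Synthesis filter: x(n) = e(n) + sum_{r=1}^{N} a_r x(n - r), zero initial state."""
    x = []
    for n in range(len(e)):
        x.append(e[n] + sum(a[r - 1] * x[n - r] for r in range(1, len(a) + 1) if n >= r))
    return x

# Hypothetical test frame: two sinusoids, 200 samples, 10th-order predictor (N = 10).
x = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(200)]
a = levinson_durbin(autocorrelation(x, 10), 10)
e = lpc_residual(x, a)
x_hat = lpc_synthesis(e, a)
gain_db = 10 * math.log10(sum(v * v for v in x) / sum(v * v for v in e))
```

For a predictable frame like this one, the residual carries far less energy than the signal (gain_db is positive), which is what makes it cheaper to code. The CELP and multipulse variants keep the same synthesis filter but replace this exact residual with an excitation chosen from a codebook or built from a few pulses.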
Physiology of the ear
• Automatic gain control
– muscles around transmission bones
• Directivity
– pinna
• Boost of middle frequencies
– auditory canal
• Nonlinear processing
– auditory nerve
• Filter bank separation
– cochlea
• Thousands of “microphones”
– hair cells in cochlea
[Figure: anatomy of the ear, labeled pinna, auditory canal, eardrum, transmission bones, cochlea, auditory nerve, Eustachian tube]

Filter bank model
[Figure: the incoming sound x(n) is split by bandpass filters 1 … N (linear); each band goes through an amplitude measurement (nonlinear) and is grouped into blocks, giving the band amplitudes A1(m1), A2(m2), …, AN(mN)]
• Explains frequency-domain masking

Frequency-domain masking
[Figure: two plots of amplitude (dB) vs. frequency (Hz, 10^2 to 10^4); the first shows tone 1, the second shows tones 1, 2 and 3 and the combinations tone 1+3 and tone 1+2+3]

Absolute threshold of hearing
• Fletcher-Munson curves
• Basis for loudness correction in audio amplifiers
[Figure: threshold curve vs. frequency (Hz, 10^2 to 10^4)]

Example of masking
• Typical spectrum & masking threshold
• Original sound:
• Sound after removing components below the threshold (1/3 to 1/2 of the data):
[Figure: typical spectrum and masking threshold, amplitude (dB)]

Block signal processing
[Figure: input signal → extract block → direct orthogonal transform $\mathbf{X} = \mathbf{P}^T\mathbf{x}$ → processing → $\tilde{\mathbf{X}}$ → inverse orthogonal transform $\tilde{\mathbf{x}} = \mathbf{P}\tilde{\mathbf{X}}$ → append block → output signal]
• Signal is reconstructed as a linear combination of basis functions

Block processing: good and bad
• Pro: allows adaptability
• Con: blocking artifacts

Why transforms?
• More efficient signal representation
– Frequency domain
– Basis functions ~ “typical” signal components
• Faster processing
– Filtering, compression
• Orthogonality
– Energy preservation
– Robustness to quantization

Compactness of representation
• Maximum energy concentration in as few coefficients as possible
• For stationary random signals, the optimal basis is the Karhunen-Loève transform:
• Basis functions are the columns of P
$$\lambda_i \mathbf{p}_i = \mathbf{R}_{xx}\,\mathbf{p}_i, \qquad \mathbf{P}^T\mathbf{P} = \mathbf{I}$$
• Minimum geometric mean of transform coefficient variances

Sub-optimal transforms
• KLT problems:
– Signal dependency
– P not factorable into sparse components
• Sinusoidal transforms:
– Asymptotically optimal for large blocks
– Frequency component interpretation
– Sparse factors, e.g. FFT

Lapped transforms
• Basis functions have tails beyond block boundaries
– Linear combinations of overlapping functions [Figure: overlapping basis functions] generate smooth signals, without blocking artifacts

Modulated lapped transforms
• Basis functions = cosines modulating the same low-pass (window) prototype h(n):
$$p_k(n) = h(n)\sqrt{\frac{2}{M}}\cos\left[\left(n + \frac{M+1}{2}\right)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right]$$
• Can be computed from the DCT or FFT
• Projection $\mathbf{X} = \mathbf{P}^T\mathbf{x}$ can be computed in $O(\log_2 M)$ operations per input point

Fast MLT computation
[Figure: fast MLT block diagram. Analysis: the input signal x(n) is split into M-sample blocks with a one-block delay (z^{-M}), windowed by h_a(n) and transformed by a DCT-IV to give the MLT coefficients X(k). Synthesis: the processed MLT coefficients Y(k) pass through a DCT-IV, windowing by h_s(n) and a one-block delay to form the output signal y(n)]

Basis functions
[Figure: DCT basis functions vs. MLT basis functions]

Basic architecture
[Figure: encoder block diagram with blocks for time-to-frequency transform (MLT), masking threshold spectrum, weighting, uniform quantizer, entropy encoder and MUX; input is the original audio signal, output is the encoded bitstream; SI = side info]

Quantization of transform coefficients
• Quantization = rounding to nearest integer
• Small range of integer values = fewer bits needed to represent data
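• Step size T controls range of integer values
$$y = T\,\mathrm{int}(x/T)$$
where int(·) rounds to the nearest integer.
[Figure: quantizer staircase, output y vs. input x; all values of x in a given range are mapped to the same value of y]

Encoding of quantized coefficients
• Typical plot of quantized transform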
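The transform-and-quantize path of the basic architecture can be sketched with the MLT basis p_k(n) and the uniform quantizer y = T·int(x/T) from the slides. This is a minimal Python sketch under stated assumptions. The prototype is taken to be the sine window h(n) = sin[(n + 1/2)π/(2M)], which is symmetric and satisfies h²(n) + h²(n + M) = 1, so the lapped transform is orthogonal. The projection is computed directly, at O(M) operations per input point, rather than with the fast DCT-IV structure. The masking-threshold weighting, entropy encoder and MUX are omitted. The names mlt_basis, mlt_analysis, mlt_synthesis and quantize are hypothetical.

```python
import math

def mlt_basis(M):
    """P[n][k] = p_k(n) = h(n) sqrt(2/M) cos[(n + (M+1)/2)(k + 1/2) pi / M],
    n = 0..2M-1, k = 0..M-1, with the assumed sine window
    h(n) = sin[(n + 1/2) pi / (2M)]."""
    h = [math.sin((n + 0.5) * math.pi / (2 * M)) for n in range(2 * M)]
    return [[h[n] * math.sqrt(2.0 / M)
             * math.cos((n + (M + 1) / 2) * (k + 0.5) * math.pi / M)
             for k in range(M)] for n in range(2 * M)]

def mlt_analysis(x, M, P):
    """Direct projection X = P^T x on 2M-sample blocks with a hop of M samples.
    The signal is zero-padded with M samples at each end (plus enough to make
    its length a multiple of M), so every sample lies in two blocks."""
    extra = (-len(x)) % M
    x_pad = [0.0] * M + list(x) + [0.0] * (extra + M)
    frames = []
    for start in range(0, len(x_pad) - M, M):
        block = x_pad[start:start + 2 * M]
        frames.append([sum(P[n][k] * block[n] for n in range(2 * M))
                       for k in range(M)])
    return frames, x_pad

def mlt_synthesis(frames, M, P, length):
    """Inverse MLT: each block contributes P X, overlap-added with hop M."""
    y = [0.0] * length
    for m, X in enumerate(frames):
        for n in range(2 * M):
            y[m * M + n] += sum(P[n][k] * X[k] for k in range(M))
    return y

def quantize(X, T):
    """Uniform quantizer with step size T: y = T * round(x / T)."""
    return [T * round(v / T) for v in X]

# Hypothetical test signal: two sinusoids, 100 samples, M = 16, step size T = 0.1.
M, T = 16, 0.1
P = mlt_basis(M)
x = [math.sin(0.05 * n) + 0.3 * math.cos(0.7 * n) for n in range(100)]
frames, x_pad = mlt_analysis(x, M, P)

# Without quantization, overlap-add of the inverse MLT reconstructs the signal.
y = mlt_synthesis(frames, M, P, len(x_pad))
reconstruction_error = max(abs(a - b) for a, b in zip(x_pad, y))

# With quantization, time-domain error energy equals coefficient error energy.
q_frames = [quantize(X, T) for X in frames]
y_q = mlt_synthesis(q_frames, M, P, len(x_pad))
time_error = sum((a - b) ** 2 for a, b in zip(x_pad, y_q))
coef_error = sum((a - b) ** 2 for X, Xq in zip(frames, q_frames) for a, b in zip(X, Xq))
```

Orthogonality shows up twice here. Without quantization the output matches the padded input to rounding error. With quantization, the time-domain squared error equals the squared error of the quantized coefficients, each of which is at most T/2 in magnitude, so T directly sets the noise level. Python's round() sends exact halves to the even neighbor, which is still a nearest integer.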