BIEN 2011

Automatic Volume Leveler for Real Time Speech Applications
Justin Bare
Dr. Carol Espy-Wilson
Dr. Tarun Pruthi

The Volume Leveler
[Figure: signal chain with the labels Noisy, Noise Reduction, Noise Reduced, Volume Leveling, Volume Leveled]

Motivation
• Noise reduction is useful
• But it has problems:
– Attenuation of speech
– Changes in color or timbre and loudness
[Figure: "Noisy Speech and Noise-Reduced Speech". Amplitude vs. time (in samples). Traces: Noisy Speech, Noise-Reduced Speech. Annotated regions: Noise, Speech]

Overview
• My goals for the automatic volume leveler
– Fix speech attenuation
– Restore loudness
– Ensure no clipping occurs in output signal
– Do all of the above under real-time constraints
• Only present and past frames of signal are available
• For each frame (0.01 seconds) of signal:
– Input noisy speech frame
– Noise removal (e.g. spectral subtraction)
– Automatic volume leveler
– Output frame

Algorithm
• Input noise-reduced signal frame
• Voice Activity Detector (VAD)
– Frame has no speech: update noise tracking, do not amplify
– Frame has speech: estimate the Signal-to-Noise Ratio (SNR)
• High SNR: Amplify to original level
• Middle SNR: Amplify conservatively
• Low SNR: Do not amplify
• Prevent clipping

Ideal vs. Real VAD
• Ideal VAD (not obtained in real time)
• Real VAD (obtained in real time)
[Figure, ideal VAD: Clean Signal and Phonetic Transcription produce the VAD Decision]
Phonetic Transcription (excerpt)
Time    Sound
1.78    'f'
1.9     'aa'
2.02    'r'
[Figure, real VAD: Noisy Signal passes through the VAD Algorithm to produce the VAD Decision]

Ideal VAD vs. Real VAD
[Figure: "Volume-Leveler Using Ideal VAD". Amplitude vs. time (in samples). Traces: Noisy, Volume-Leveled, Noise-Reduced, Ideal VAD]
[Figure: "Volume-Leveler Using Real VAD". Amplitude vs. time (in samples). Traces: Noisy, Volume-Leveled, Noise-Reduced, Real VAD]

Ideal VAD vs. Real VAD: SNR and Noise Level Increase
[Figure: two bar charts, "Using Ideal VAD" and "Using Real VAD". Amount of Increase (dB) vs. Signal-to-Noise Ratio (dB) at -12, -3, 0, 3, 6, 12, and 18 dB. Series: SNR Increase, Total Noise Level Increase, Near-Speech Noise Level Increase]

Volume Restoration
[Figure: "Percent of Original Speech Amplitude Attained". Level of Restoration of Speech to Original Amplitude (%) vs. Signal-to-Noise Ratio (dB) at -12, -3, 0, 3, 6, 12, and 18 dB. Series: Volume-Leveled Speech (Ideal VAD), Volume-Leveled Speech (Real VAD), Noise-Reduced Speech]

Future Work
• Less reliance on VAD accuracy
• Incorporate coloring/timbre restoration
• Implement in a fast, low level language such as C

Acknowledgments
• National Science Foundation OCI award #1063035
• Dr. Carol Espy-Wilson
• Dr. Tarun Pruthi
• MERIT BIEN faculty and
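
Illustrative sketch of the Algorithm slide
The per-frame flow on the Algorithm slide (VAD, SNR estimate, three gain regimes, clipping prevention) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the slides give no thresholds or formulas, so the energy-based VAD, the SNR cut-offs, the smoothing constant, the "conservative" gain fraction, and the use of the noisy input frame's RMS as the "original level" are all hypothetical choices. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math

# Every constant below is an illustrative assumption, not a value from the slides.
VAD_FACTOR = 1.5             # frame counts as speech if its RMS exceeds 1.5x the noise floor
LOW_SNR_DB = 6.0             # at or below this SNR: do not amplify
HIGH_SNR_DB = 12.0           # at or above this SNR: amplify to the original level
CONSERVATIVE_FRACTION = 0.5  # share of the full restoring gain applied at middle SNR
NOISE_SMOOTHING = 0.9        # smoothing factor for the running noise-floor estimate


def rms(frame):
    """Root-mean-square level of one frame (a non-empty list of floats)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def level_frame(noisy, reduced, state):
    """Volume-level one noise-reduced frame using only present and past data.

    noisy   -- the frame before noise reduction (proxy for the original level)
    reduced -- the same frame after noise reduction
    state   -- dict with key "noise_rms", updated in place on non-speech frames
    """
    reduced_rms = rms(reduced)

    # Stand-in VAD: a simple energy threshold against the tracked noise floor.
    has_speech = reduced_rms > VAD_FACTOR * state["noise_rms"]

    if not has_speech:
        # No speech: update noise tracking and do not amplify.
        state["noise_rms"] = (NOISE_SMOOTHING * state["noise_rms"]
                              + (1.0 - NOISE_SMOOTHING) * reduced_rms)
        gain = 1.0
    else:
        # Speech: estimate the SNR against the tracked noise floor.
        snr_db = 20.0 * math.log10(reduced_rms / max(state["noise_rms"], 1e-12))
        # Gain that would bring the frame back to the noisy input's level.
        full_gain = max(rms(noisy) / reduced_rms, 1.0)
        if snr_db >= HIGH_SNR_DB:
            gain = full_gain
        elif snr_db > LOW_SNR_DB:
            gain = 1.0 + CONSERVATIVE_FRACTION * (full_gain - 1.0)
        else:
            gain = 1.0

    # Prevent clipping: cap the gain so the peak stays within full scale,
    # then clamp to guard against floating-point rounding.
    peak = max(abs(x) for x in reduced)
    if peak * gain > 1.0:
        gain = 1.0 / peak
    return [max(-1.0, min(1.0, gain * x)) for x in reduced]
```

In use, `state` would be created once (for example `{"noise_rms": 0.01}`) and passed to `level_frame` for every 0.01-second frame in order, so that the noise estimate carries over from past frames.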