Acoustics of Speech
Julia Hirschberg
CS 4706
01/14/19

Claim
  How things are said can be critical to understanding, i.e.:
    Varying phrasing, prominence, pitch range, speaking rate, pitch contour, or voice quality conveys meaning
  What is our evidence? How do we prove it?
    Observation
    Hypotheses
    Experimentation: perception, production
    Speech analysis: independent variables
    Correlation with dependent variable

What does our data look like? What tools do we have for analysis?

What is sound?
  Pressure fluctuations in the air caused by a musical instrument, a car horn, a voice
  Cause eardrum to move
  Auditory system translates into neural impulses
  Brain interprets as sound
  Can we tell one sound from another?
  Can we distinguish one particular sound in noise?

  From a speech-centric point of view, when sound is not produced by the human voice, we may term it noise
  Ratio of speech-generated sound to other simultaneous sound: signal-to-noise ratio

How Loud are Common Sounds?
  Event                           Pressure (µPa)   dB
  Absolute (hearing threshold)    20               0
  Whisper                         200              20
  Quiet office                    2K               40
  Conversation                    20K              60
  Bus                             200K             80
  Subway                          2M               100
  Thunder                         20M              120
  *DAMAGE*                        200M             140

Some Sounds are Periodic
  Simple periodic waves (sine waves), defined by:
    Frequency: how often does the pattern repeat per time unit
      Cycle: one repetition
      Period: duration of a cycle
      Frequency: cycles per time unit, e.g., frequency in Hz = 1 / (period in sec)
      Horizontal axis of the waveform (time)
    Amplitude: peak deviation of pressure from normal atmospheric pressure
    Phase: timing of waveform relative to a reference point
  Complex periodic waves
    Cyclic, but composed of two or more sine waves
    Fundamental frequency (F0): rate at which the largest pattern repeats (also the GCD of the component frequencies)
    Components not always easily identifiable: power spectrum graphs amplitude vs. frequency
  Any complex waveform can be analyzed into a set of sine waves with their own frequencies, amplitudes, and phases (Fourier's theorem)
  E.g., some speech sounds, mostly vowels (cat.wav)

Some Sounds are Aperiodic
  Waveforms with random or non-repeating patterns
    Random aperiodic waveforms: white noise
      Flat spectrum: equal amplitude for all frequency components
    Transients: sudden bursts of pressure (clicks, pops, door slams)
      Waveform shows a single impulse (click.wav)
      Fourier analysis shows a flat spectrum
  Some speech sounds, e.g., many consonants (e.g., cat.wav)

Speech Production
  Voiced and voiceless sounds
  Vocal fold vibration, filtered by the vocal tract, produces a complex periodic waveform
    Cycles per sec of the lowest frequency component of the signal = fundamental frequency (F0)
    Fourier analysis yields a power spectrum with component frequencies and amplitudes
      F0 is the first (lowest frequency) peak
      Harmonics are integer multiples of F0, shaped by the resonances of the vocal tract

Vocal fold vibration
  [UCLA Phonetics Lab demo]

Places of articulation
  dental, labial, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal
  http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html

How do we capture speech for analysis?
  Recording conditions
    A quiet office, a sound booth, an anechoic chamber
  Microphones
  Analog devices (e.g., tape recorders) store and analyze continuous air pressure variations (speech) as a continuous signal
  Digital devices (e.g., computers, DAT) first convert continuous signals into discrete signals (A-to-D conversion)
  File format: .wav, .aiff, .ds, .au, .sph, ...
    Conversion programs, e.g., sox
  Storage
    Function of how much information we store about speech in digitization
      Higher quality, closer to original
      More space (1000s of hours of speech take up a lot of space)

Sampling
  Sampling rate: how often do we need to sample?
    At least 2 samples per cycle to capture the periodicity of a waveform component at a given frequency
      A 100 Hz waveform needs 200 samples per sec
      Nyquist frequency: highest frequency component captured with a given sampling rate (half the sampling rate)

Sampling/storage tradeoff
  Human hearing: ~20K Hz top frequency
    Do we really need to store 40K samples per second of speech?
  Telephone speech: 300-4K Hz (8K sampling)
    But some speech sounds (e.g., fricatives /f/, /s/ and stops /p/, /t/, /d/) have energy above 4K
    Peter/teeter/Dieter
  44K (CD-quality audio) vs. 16-22K (usually good enough to study pitch, amplitude, duration)

Sampling Errors
  Aliasing: signal's frequency is higher than half the sampling rate
  Solutions:
    Increase the sampling rate
    Filter out frequencies above half the sampling rate (anti-aliasing filter)

Quantization
  Measuring the amplitude at sampling points: what resolution to choose?
    Integer representation
    8, 12, or 16 bits per sample
  Noise due to quantization steps is avoided by higher resolution, but that requires more storage
    How many different amplitude levels do we need to distinguish?
    Choice depends on data and application (44K 16-bit stereo requires roughly 10 MB of storage per minute)
  But clipping occurs when the input volume is greater than the range representable in the digitized waveform
    Increase the resolution
    Decrease the amplitude

What can we do if our data is noisy?
  Acoustic filters block out certain frequencies of sounds
    Low-pass filter blocks high frequency components of a waveform
    High-pass filter blocks low frequencies
    Reject band (what to block) vs. pass band (what to let through)
  But what if the frequencies of two sounds overlap? Source separation

How can we capture pitch contours, pitch range?
  What is the pitch contour of this utterance? Is the pitch range of X greater than that of Y?
  Pitch tracking: estimate F0 over time as a function of vocal fold vibration
  A periodic waveform is correlated with itself
    One period looks much like another (cat.wav)
    Find the period by finding the lag (offset) between two windows on the signal for which the correlation of the windows is highest
    Lag duration (T) is 1 period of the waveform
    Inverse is F0 (1/T)
  Errors to watch for:
    Halving: shortest lag calculated is too long (underestimates pitch)
    Doubling: shortest lag is too short (overestimates pitch)
    Microprosody errors (e.g., /v/)

Sample Analysis File: Pitch Track Header
  version:     1
  type code:   4
  frequency:   12000.000000
  samples:     160768
  start time:  0.000000
  end time:    13.397333
  bandwidth:   6000.000000
  dimensions:  1
  maximum:     9660.000000
  minimum:     -17384.000000
  time:        Sat Nov 2 15:55:50 1991
  operation:   record
  padding:     xxxxxxxxxxxx

Sample Analysis File: Pitch Track Data
  F0        Pvoicing   Energy     A/C Score
  147.896   1          2154.07    0.902643
  140.894   1          1544.93    0.967008
  138.05    1          1080.55    0.92588
  130.399   1          745.262    0.595265
  0         0          567.153    0.504029
  0         0          638.037    0.222939
  0         0          670.936    0.370024
  0         0          790.751    0.357141
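
The loudness table pairs each pressure with a decibel value. A minimal sketch of that relationship follows, assuming the pressures are in micropascals and that 20 µPa is the 0 dB reference (the standard sound-pressure-level convention). The function name spl_db is hypothetical and used only for illustration.

```python
import math

# Assumed reference pressure: 20 micropascals corresponds to 0 dB.
P_REF_UPA = 20.0


def spl_db(pressure_upa):
    """Sound pressure level in dB relative to P_REF_UPA (illustrative helper)."""
    return 20.0 * math.log10(pressure_upa / P_REF_UPA)


# Each tenfold increase in pressure adds 20 dB, as in the table rows.
for pressure in (20, 200, 2_000, 20_000, 200_000_000):
    print(pressure, round(spl_db(pressure)))
```

Because the scale is logarithmic, the pressure column grows by a factor of ten per row while the dB column grows by a constant 20.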
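
The Sampling and Sampling Errors slides can be illustrated with a little arithmetic. The sketch below assumes a pure tone and ideal sampling with no anti-aliasing filter. The function names nyquist and alias_frequency are hypothetical, chosen for this example.

```python
def nyquist(sampling_rate_hz):
    """Highest frequency that a given sampling rate can capture."""
    return sampling_rate_hz / 2.0


def alias_frequency(f_hz, sampling_rate_hz):
    """Apparent frequency of a pure tone after sampling.

    Tones above the Nyquist frequency fold back into the range
    0..sampling_rate/2 (assumes no anti-aliasing filter is applied).
    """
    f_mod = f_hz % sampling_rate_hz
    return min(f_mod, sampling_rate_hz - f_mod)


# Telephone-style 8K sampling captures components up to 4K Hz.
print(nyquist(8000))
# A 3K Hz tone is below the Nyquist frequency and is kept as-is.
print(alias_frequency(3000, 8000))
# A 5K Hz tone is above it and shows up as a spurious 3K Hz tone.
print(alias_frequency(5000, 8000))
```

This is why the slide's two solutions work: raising the sampling rate lifts the Nyquist frequency above the tone, and an anti-aliasing filter removes the tone before it can fold back.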
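
The Quantization slide's questions about resolution and clipping can be made concrete. This sketch assumes amplitudes normalized to the range -1.0 to 1.0 and signed integer codes. The helper names are hypothetical, and real A-to-D converters may scale or round differently.

```python
def quantization_levels(bits):
    """Number of distinct amplitude levels at a given resolution."""
    return 2 ** bits


def quantize(amplitude, bits):
    """Map an amplitude in [-1.0, 1.0] to a signed integer code.

    Input outside the representable range is clipped to the nearest
    extreme code, which is the clipping distortion the slide describes.
    """
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    code = round(amplitude * max_code)
    return max(min_code, min(max_code, code))


for bits in (8, 12, 16):
    print(bits, quantization_levels(bits))

print(quantize(1.0, 16))   # largest representable code
print(quantize(1.5, 16))   # too loud: clipped to the same code
```

Lowering the input amplitude so that it stays within range avoids the clipped values, at the cost of using fewer of the available levels.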
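
The storage figure on the Quantization slide follows from sampling rate, resolution, and channel count. Here is a rough sketch of that arithmetic, assuming uncompressed audio, a one-minute recording, and a nominal 44,000 samples per second. The function name storage_bytes is hypothetical.

```python
def storage_bytes(sampling_rate_hz, bits_per_sample, channels, seconds):
    """Uncompressed storage needed for a recording, in bytes."""
    return sampling_rate_hz * bits_per_sample * channels * seconds / 8


# 44K, 16-bit, stereo, one minute: on the order of 10 MB.
cd_quality = storage_bytes(44_000, 16, 2, 60)
print(cd_quality / 1_000_000, "MB per minute")

# Telephone-style 8K, 8-bit, mono, one minute.
telephone = storage_bytes(8_000, 8, 1, 60)
print(telephone / 1_000_000, "MB per minute")
```

The gap between the two settings shows the sampling/storage tradeoff: the higher-quality recording costs about 22 times the space.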
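
The lag search described on the pitch tracking slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tracker that produced the sample analysis file: it handles a single frame, uses a normalized autocorrelation score, and searches only lags between hypothetical bounds f0_min and f0_max. The name estimate_f0 is also an assumption.

```python
import math


def estimate_f0(samples, sampling_rate_hz, f0_min=80.0, f0_max=400.0):
    """Estimate F0 for one frame from the best-correlated lag.

    Returns (f0_hz, score); (0.0, 0.0) when no lag correlates positively.
    """
    min_lag = int(sampling_rate_hz / f0_max)
    max_lag = int(sampling_rate_hz / f0_min)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        first = samples[:-lag]    # window at the start of the frame
        second = samples[lag:]    # same-length window offset by the lag
        num = sum(x * y for x, y in zip(first, second))
        den = math.sqrt(sum(x * x for x in first) * sum(y * y for y in second))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag == 0:
        return 0.0, 0.0
    # The lag is one period T in samples, so F0 = 1/T = rate / lag.
    return sampling_rate_hz / best_lag, best_score


# A 150 Hz sine sampled at 12000 Hz has a period of exactly 80 samples.
rate = 12000
frame = [math.sin(2 * math.pi * 150 * n / rate) for n in range(480)]
f0, score = estimate_f0(frame, rate)
print(f0, score)
```

A lag of two periods correlates just as well as one, which is how the halving error arises. The sketch sidesteps it only because f0_min keeps the two-period lag out of the search range.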


Columbia CS 4706 - Acoustics of Speech
