Audio Features
CS498

Today's lecture
• Audio features
• How we hear sound
• How we represent sound
  – In the context of this class

Why features?
• Features are a very important area
  – Bad features make problems unsolvable
  – Good features make problems trivial
• Learning how to pick features is the key
  – So is understanding what they mean

A simple example
• Compare two numbers: x = 3, y = 3 and x = 3, z = 100
  – x and y are similar, but x and z not so much
• The best way to represent a number is the number itself!
  – |x − y| = 0, |x − z| = 97

Moving up a level
• Compare two vectors: x, y and x, z
  – Simply generalizing the concept we used for numbers
  – ∠(x, y) = 0.03 rad, ‖x − y‖ = 0.16
  – ∠(x, z) = 0.7 rad, ‖x − z‖ = 1.07

Moving up again
• Compare two longer vectors
[Figure: plots of the two longer vectors, about 100 samples each]

Look similar but are not!
• Oops! ∠(x, y) = 1.57 rad, ‖x − y‖ = 7.64

How about this?
• Are these two vectors the same?
  – Not if you look at their norm or angle …
[Figure: two audio waveforms, about 7×10⁴ samples each]

Data norms won't get you far!
• You need to articulate what matters
  – You need to know what matters
• Features are the means to do so
• Let's examine what matters to our ears
  – Our bodies sorta know best

Hearing
• Sounds and hearing
• Human hearing aspects
  – Physiology and psychology
• Lessons learned

The hardware (outer/middle ear)
[Figure: outer and middle ear, labeled pinna, ear canal, ear drum]
• The pinna (auricle)
  – Aids sound collection
  – Does directional filtering
  – Holds earrings, etc …
• The ear canal
  – About 25 mm long and 7 mm in diameter
  – Amplifies sound at ~3 kHz by ~10 dB
  – Helps clarify a lot of sounds!
• Ear drum
  – End of the outer ear, start of the middle ear
  – Transmits sound as a vibration toward the inner ear

More hardware (middle and inner ear)
[Figure: labeled ear drum, Eustachian tube, ossicles, oval window, cochlea, auditory nerve]
• Ear drum (tympanum)
  – Excites the ossicles (ear bones)
• Ossicles
  – Malleus (hammer), incus (anvil), stapes (stirrup)
  – Transfer vibrations from the ear drum to the oval window
  – Amplify sound by ~14 dB (peak at ~1 kHz)
  – Muscles connected to the ossicles control the acoustic reflex (damping in the presence of loud sounds)
• The oval window
  – Transfers vibrations to the cochlea
• Eustachian tube
  – Used for pressure equalization

The cochlea
• The "A/D converter"
  – Translates oval window vibrations into a neural signal
  – Fluid-filled, with the basilar membrane running through the middle
  – Each section of the basilar membrane resonates with a different sound frequency
  – Vibrations of the basilar membrane move the hair cells along each section, which send neural signals to the brain
• The cochlea acts like the equalizer display on your stereo
  – Frequency-domain decomposition
• Neural signals from the hair cells go to the auditory nerve
[Figure: microscope photograph of hair cells (yellow)]

Masking & critical bands
• When two different sounds excite the same section of the basilar membrane, one is masked
• This is observed at the micro level
  – E.g. two tones at 150 Hz and 170 Hz: if one tone is loud enough, the other will be inaudible
  – A tone can also hide a noise band when loud enough
• There are 24 distinct bands throughout the cochlea
  – a.k.a. critical bands
  – Simultaneous excitation of a band by multiple sources results in a single-source percept
• There is also some temporal masking
  – Preceding sounds mask what comes next
• Many audio compression schemes take advantage of masking
  – They throw away what you won't hear due to masking
[Figure: masking for close-frequency tones vs. distant tones]

The neural pathways
• A series of neural stops
• Cochlear nuclei
  – Preparation and distribution of neural data from the cochlea
• Superior olivary complex
  – Coincidence detection across the signals from both ears
  – Localization functions
• Inferior colliculus
  – Last stage where most of the original data is still present
  – Probably initiates the first auditory images in the brain
• Medial geniculate body
  – Relays various sound features (frequency, intensity, etc.) to the auditory cortex
• Auditory cortex
  – Reasoning, recognition, identification, etc.
  – High-level processing
[Figure: pathway from the ears and cochleas through the cochlear nuclei, superior olivary complex, inferior colliculus and medial geniculate body to the auditory cortex, then on to the "stream of consciousness"]

The limits of hearing
• Frequency
  – 20 Hz to 20 kHz (upper limit decreases with age/trauma)
  – Infrasound (< 20 Hz) can be felt through the skin, also as events
  – Ultrasound (> 20 kHz) can be "emotionally" perceived (discomfort, nausea, etc.)
• Loudness
  – Low limit is 2×10⁻¹⁰ atm
  – 0 dB SPL to 130 dB SPL (but also frequency dependent)
    • A dynamic range of 3×10⁶ to 1 in sound pressure!
  – 130 dB SPL is the threshold of pain
  – 194 dB SPL is the definition of a shock wave; sound stops!
[Figure: intensity (dB, −10 to 130) vs. frequency (16 Hz to 16 kHz), showing regions for speech, music, audible sounds, pain and inaudibility]
[Demo: tones at various frequencies; how high can you hear?]

Perception of loudness
• Loudness is subjective
  – Perceived loudness changes with frequency
  – Perception of "twice as loud" is not really that!
  – Ditto for equal loudness
• Fletcher-Munson curves
  – Equal-loudness perception curves across frequencies
• Just noticeable difference is about 1 dB
• 1 kHz to 5 kHz are the loudest-heard frequencies
  – What the ear canal and ossicles amplify!
• Low limit shifts up with age!

Perception of pitch
• Pitch is another subjective (and arbitrary) measure
• Perception of a pitch doubling doesn't imply a doubling in Hz
  – The Mel scale is the perceptual pitch scale
  – Twice as many mels correspond to a perceived pitch doubling
• The musically useful range spans roughly 30 Hz to 4 kHz
• Just noticeable difference is about 0.5% of frequency
  – Varies with training
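The pitch slide names the Mel scale but gives no formula. A minimal sketch, assuming the widely used convention m = 2595·log10(1 + f/700) (other variants exist, so the constants are an assumption, not the course's definition), converts between Hz and mels and finds the frequency that would be heard as "twice as high" by doubling the mel value. The names hz_to_mel, mel_to_hz and perceived_pitch_double are hypothetical.

```python
import math

# ASSUMPTION: the slides name the Mel scale but give no formula. This uses one
# widely used convention, m = 2595 * log10(1 + f / 700); other variants exist.
MEL_SCALE = 2595.0
MEL_BREAK_HZ = 700.0


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to mels (2595 * log10(1 + f / 700) convention)."""
    return MEL_SCALE * math.log10(1.0 + f_hz / MEL_BREAK_HZ)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: map mels back to Hz."""
    return MEL_BREAK_HZ * (10.0 ** (m / MEL_SCALE) - 1.0)


def perceived_pitch_double(f_hz: float) -> float:
    """Hypothetical helper: frequency heard as 'twice as high' as f_hz.

    Per the slide, a perceived pitch doubling corresponds to twice as many
    mels, so double the mel value and convert back to Hz.
    """
    return mel_to_hz(2.0 * hz_to_mel(f_hz))


if __name__ == "__main__":
    for f in (125, 250, 500, 1000, 2000):
        print(f"{f:5d} Hz = {hz_to_mel(f):7.1f} mel; "
              f"heard as twice as high at ~{perceived_pitch_double(f):7.1f} Hz")
```

Under this convention 1000 Hz maps to roughly 1000 mels, and doubling that to 2000 mels lands near 3.4 kHz rather than 2 kHz, which is the slide's point that a perceived pitch doubling is not a doubling in Hz.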
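The loudness numbers on the limits-of-hearing slide follow from the definition of sound pressure level, L = 20·log10(p / p_ref), with the standard reference p_ref = 20 µPa for 0 dB SPL. The sketch below is a minimal illustration (the helper names spl_db and pressure_pa are hypothetical) that checks three of those numbers: the ~2×10⁻¹⁰ atm threshold, the ~3×10⁶ to 1 pressure ratio between 0 and 130 dB SPL, and the fact that the formula gives a pressure of roughly one atmosphere at 194 dB SPL, which is presumably why the slide treats that level as the point where ordinary sound breaks down.

```python
import math

P_REF_PA = 20e-6      # standard 0 dB SPL reference pressure: 20 micropascals
ATM_PA = 101_325.0    # one standard atmosphere, in pascals


def spl_db(p_pa: float) -> float:
    """Hypothetical helper: sound pressure level in dB re 20 uPa."""
    return 20.0 * math.log10(p_pa / P_REF_PA)


def pressure_pa(level_db: float) -> float:
    """Hypothetical helper: pressure in pascals for a level in dB SPL."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    # Threshold of hearing expressed in atmospheres (slide: ~2e-10 atm).
    print(f"0 dB SPL   = {P_REF_PA / ATM_PA:.2e} atm")
    # Pressure ratio spanned by 0..130 dB SPL (slide: ~3e6 to 1).
    print(f"130 dB SPL = {pressure_pa(130.0) / pressure_pa(0.0):.2e} x the 0 dB pressure")
    # Level at which the formula's pressure reaches about one atmosphere.
    print(f"194 dB SPL = {pressure_pa(194.0) / ATM_PA:.2f} atm")
```

Running it should report about 1.97×10⁻¹⁰ atm for 0 dB SPL, a ratio of about 3.16×10⁶ for 130 dB SPL, and about 0.99 atm at 194 dB SPL. Because the scale is logarithmic in pressure, every 20 dB step multiplies the pressure by 10.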
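The opening slides argue that raw norms and angles can call two sounds different when, to the ear, they are not. As a minimal sketch of that idea, using hypothetical signals since the vectors plotted on the slides are not recoverable, take one 5-cycle tone sampled 100 times, once as a sine and once as a cosine. Their sample vectors are orthogonal (an angle of π/2 ≈ 1.57 rad, the same angle as on the "Look similar but are not!" slide), yet their magnitude spectra, a frequency-domain feature in the spirit of the cochlea's decomposition, are identical. The naive DFT and all helper names are assumptions for illustration; a real pipeline would use an FFT library.

```python
import cmath
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    """Euclidean norm ||a||."""
    return math.sqrt(dot(a, a))


def angle(a, b):
    """Angle in radians between two equal-length vectors."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))  # clip guards against rounding


def distance(a, b):
    """Euclidean distance ||a - b||."""
    return norm([p - q for p, q in zip(a, b)])


def magnitude_spectrum(x):
    """Magnitudes of a naive DFT for bins 0..N/2; phase is discarded."""
    n_samples = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples)))
        for k in range(n_samples // 2 + 1)
    ]


# Hypothetical signals: one 5-cycle tone, once as a sine and once as a cosine.
N = 100
x = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
y = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]

fx, fy = magnitude_spectrum(x), magnitude_spectrum(y)

print(f"raw samples:       angle = {angle(x, y):.2f} rad, distance = {distance(x, y):.2f}")
print(f"magnitude spectra: angle = {angle(fx, fy):.2e} rad, distance = {distance(fx, fy):.2e}")
```

The raw distance here is 10.00 rather than the slide's 7.64 because the signals differ; the point is only that the raw-sample angle is π/2 while the spectral feature sees essentially no difference. Discarding phase is a design choice: it is what makes the two tones match, and it would equally hide differences that live only in phase.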