U of I CS 498 - Audio Features - D2878013

School name: University of Illinois

Course: CS 498 - Special Topics

Pages: 48

This preview shows page 1-2-3-23-24-25-26-46-47-48 out of 48 pages.

Audio Features
CS498

Today’s lecture
• Audio Features
• How we hear sound
• How we represent sound
  – In the context of this class

Why features?
• Features are a very important area
  – Bad features make problems unsolvable
  – Good features make problems trivial
• Learning how to pick features is the key
  – So is understanding what they mean

A simple example
• Compare two numbers: x,y = {3,3} and x,z = {3,100}
  – x,y similar but x,z not so much
• Best way to represent a number is itself!
  |x − y| = 0, |x − z| = 97

Moving up a level
• Compare two vectors: x,y and x,z
  – Simply generalizing the numbers concept
  ∠(x, y) = 0.03 rad, ∠(x, z) = 0.7 rad
  ‖x − y‖ = 0.16, ‖x − z‖ = 1.07
[Figure: plots of the vectors being compared]

Moving up again
• Compare two longer vectors:
[Figure: plots of the longer vectors, 100 samples each]

Look similar but are not!
• Oops!
  ∠(x, y) = 1.57 rad, ‖x − y‖ = 7.64

How about this?
• Are these two vectors the same?
  – Not if you look at their norm or angle …
[Figure: two waveform plots, sample axis scaled ×10^4]

Data norms won’t get you far!
• You need to articulate what matters
  – You need to know what matters
• Features are the means to do so
• Let’s examine what matters to our ears
  – Our bodies sorta know best

Hearing
• Sounds and hearing
• Human hearing aspects
  – Physiology and psychology
• Lessons learned

The hardware (outer/middle ear)
[Figure labels: Outer ear, Middle ear, Pinna, Ear canal, Ear drum]
• The pinna (auricle)
  – Aids sound collection
  – Does directional filtering
  – Holds earrings, etc …
• The ear canal
  – About 25mm x 7mm
  – Amplifies sound at ~3kHz by ~10dB
  – Helps clarify a lot of sounds!
• Ear drum
  – End of outer ear, start of middle ear
  – Transmits sound as a vibration to the middle ear

More hardware (middle/inner ear)
[Figure labels: Ear drum, Eustachian tube, Ossicles, Oval window, Cochlea, Auditory nerve]
• Ear drum (tympanum)
  – Excites the ossicles (ear bones)
• Ossicles
  – Malleus (hammer), incus (anvil), stapes (stirrup)
  – Transfer vibrations from the ear drum to the oval window
  – Amplify sound by ~14dB (peak at ~1kHz)
  – Muscles connected to the ossicles control the acoustic reflex (damping in the presence of loud sounds)
• The oval window
  – Transfers vibrations to the cochlea
• Eustachian tube
  – Used for pressure equalization

The cochlea
• The “A/D converter”
  – Translates oval window vibrations to a neural signal
  – Fluid filled, with the basilar membrane in the middle
  – Each section of the basilar membrane resonates with a different sound frequency
  – Vibrations of the basilar membrane move sections of hair cells, which send off neural signals to the brain
• The cochlea acts like the equalizer display in your stereo
  – Frequency domain decomposition
• Neural signals from the hair cells go to the auditory nerve
[Figure: microscope photograph of hair cells (yellow)]

Masking & Critical bands
• When two different sounds excite the same section of the basilar membrane, one is masked
• This is observed at the micro-level
  – E.g. two tones at 150Hz and 170Hz: if one tone is loud enough, the other will be inaudible
  – A tone can also hide a noise band when loud enough
• There are 24 distinct bands throughout the cochlea
  – a.k.a. critical bands
  – Simultaneous excitation of a band by multiple sources results in a single-source percept
• There is also some temporal masking
  – Preceding sounds mask what’s next
• Many audio compression schemes take advantage of this
  – They throw away stuff you won’t hear due to masking
[Caption: masking for close-frequency tones vs. distant tones]

The neural pathways
• A series of neural stops
• Cochlear nuclei
  – Prepping/distribution of neural data from cochlea
• Superior Olivary Complex
  – Coincidence detection across ear signals
  – Localization functions
• Inferior Colliculus
  – Last place where we have most original data
  – Probably initiates first auditory images in brain
• Medial Geniculate Body
  – Relays various sound features (frequency, intensity, etc) to the auditory cortex
• Auditory Cortex
  – Reasoning, recognition, identification, etc
  – High-level processing
[Figure labels: Superior olivary complex, Cochlear nuclei, Inferior colliculus, Medial geniculate body, Auditory cortex, Stream of consciousness …, Cochleas, ?, Ears]

The limits of hearing
• Frequency
  – 20Hz to 20kHz (upper limit decreases with age/trauma)
  – Infrasound (< 20Hz) can be felt through skin, also as events
  – Ultrasound (> 20kHz) can be “emotionally” perceived (discomfort, nausea, etc)
• Loudness
  – Low limit is 2×10^-10 atm
  – 0dB SPL to 130dB SPL (but also frequency dependent)
    • A dynamic range of 3×10^6 to 1!
  – 130dB SPL is the threshold of pain
  – 194dB SPL is the definition of a shock wave; sound stops!
[Figure: intensity (dB) vs. frequency (Hz), with regions labeled Speech, Music, Audible sounds, Pain!, Inaudibility]
[Caption: tones at various frequencies, how high can you hear?]

Perception of loudness
• Loudness is subjective
  – Perceived loudness changes with frequency
  – Perception of “twice as loud” is not really that!
  – Ditto for equal loudness
• Fletcher-Munson curves
  – Equal loudness perception curves through frequencies
• Just noticeable difference is about 1dB SPL
• 1kHz to 5kHz are the loudest heard frequencies
  – What the ear canal and ossicles amplify!
• Low limit shifts up with age!

Perception of pitch
• Pitch is another subjective (and arbitrary) measure
• Perception of pitch doubling doesn’t imply doubling of Hz
  – Mel scale is the perceptual pitch scale
  – Twice as many Mels correspond to a perceived pitch doubling
• Musically useful range varies from 30Hz to 4kHz
• Just noticeable difference is about 0.5% of frequency
  – Varies with training
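
The number and vector comparisons from the "A simple example" through "Look similar but are not!" slides can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and the angle from the normalized dot product as the two measures. The helper names and the sine test signals are hypothetical choices made here; the lecture's own vectors are not recoverable from this preview.

```python
import math

def norm(v):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(a * a for a in v))

def distance(u, v):
    """Euclidean distance ||u - v|| between two equal-length sequences."""
    return norm([a - b for a, b in zip(u, v)])

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    cos_theta = dot / (norm(u) * norm(v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Single numbers treated as one-element vectors, as on the first slide.
d_xy = distance([3], [3])    # the pair {3, 3}
d_xz = distance([3], [100])  # the pair {3, 100}

# Hypothetical "longer vectors": the same 3-cycle sinusoid, 100 samples,
# with the second copy shifted by a quarter period.
n = 100
x = [math.sin(2 * math.pi * 3 * k / n) for k in range(n)]
y = [math.sin(2 * math.pi * 3 * k / n + math.pi / 2) for k in range(n)]

print("number distances:", d_xy, d_xz)
print("angle (rad):", round(angle(x, y), 2))
print("distance:", round(distance(x, y), 2))
```

The two sinusoids look alike to the eye, yet the quarter-period shift makes them orthogonal, so the angle comes out near π/2 rad. That is the same kind of result the slide reports as 1.57 rad, and it is why raw norms and angles are poor features for sound.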
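
The loudness figures on "The limits of hearing" slide can be cross-checked with a short sketch. It assumes the conventional dB SPL definition with a 20 µPa reference pressure and 101325 Pa per atmosphere; neither constant is stated on the slide, and the function names are hypothetical.

```python
import math

P_REF_PA = 20e-6        # assumed reference pressure for dB SPL in air (20 micropascals)
PA_PER_ATM = 101325.0   # assumed standard atmosphere in pascals

def spl_to_pressure(db_spl):
    """Convert a level in dB SPL to sound pressure in pascals."""
    return P_REF_PA * 10 ** (db_spl / 20)

def pressure_to_spl(pressure_pa):
    """Convert a sound pressure in pascals to a level in dB SPL."""
    return 20 * math.log10(pressure_pa / P_REF_PA)

# Threshold of hearing (0 dB SPL) expressed in atmospheres.
threshold_atm = spl_to_pressure(0) / PA_PER_ATM

# Pressure ratio between the threshold of pain (130 dB SPL) and 0 dB SPL.
dynamic_range = spl_to_pressure(130) / spl_to_pressure(0)

# Level at which the pressure amplitude equals one atmosphere.
one_atm_spl = pressure_to_spl(PA_PER_ATM)

print(f"0 dB SPL is about {threshold_atm:.2e} atm")
print(f"130 dB spans a pressure ratio of about {dynamic_range:.2e} to 1")
print(f"1 atm of pressure amplitude is about {one_atm_spl:.0f} dB SPL")
```

Under these assumptions the three printed values land close to the slide's 2×10^-10 atm, 3×10^6 to 1, and 194dB SPL, which suggests the slide's numbers are pressure (not power) ratios.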
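
The "Perception of pitch" slide names the Mel scale without giving a formula. The sketch below assumes one commonly used approximation, mel = 2595·log10(1 + f/700); other variants exist, and the slide does not say which one the course uses. The function names are hypothetical.

```python
import math

def hz_to_mel(freq_hz):
    """Map frequency in Hz to mels (assumed 2595*log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel for the same assumed variant."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

base_hz = 1000.0
base_mel = hz_to_mel(base_hz)

# Doubling the mel value models a perceived pitch doubling.
doubled_hz = mel_to_hz(2.0 * base_mel)

print(f"{base_hz:.0f} Hz is about {base_mel:.0f} mel")
print(f"twice as many mels is about {doubled_hz:.0f} Hz, "
      f"a factor of {doubled_hz / base_hz:.2f} in Hz")
```

With this variant, doubling the mel value starting from 1 kHz requires considerably more than doubling the frequency in Hz, which illustrates the slide's point that a perceived pitch doubling does not imply a doubling of Hz.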
