UW-Madison ECE 539 - Audio classification

Audio classification
Discriminating speech, music and environmental audio
Rajas A. Sambhare
ECE 539

Outline
• Objective
• Feature extraction
• Neural network development
• Results

Objective
Discrimination between speech, music and environmental audio (special effects) using short 3-second samples:
• To extract a relevant set of feature vectors from the audio samples
• To develop a pattern classifier that can successfully discriminate the three classes based on the extracted vectors

Feature extraction
Frequency centroid:  FC = \int_0^\infty \omega \, |F(\omega)|^2 \, d\omega \Big/ \int_0^\infty |F(\omega)|^2 \, d\omega
Bandwidth:  B^2 = \int_0^\infty (\omega - FC)^2 \, |F(\omega)|^2 \, d\omega \Big/ \int_0^\infty |F(\omega)|^2 \, d\omega

Feature extraction (processing chain)
• Split the 3-second audio sample (22050 Hz) into 512-sample frames (23.21 ms, 25% overlap, Hanning window)
• Compute a 512-point FFT of each frame
• Extract the frequency centroid, the bandwidth and the energy in 22 critical bands
• Calculate the log power ratio in each band
• Calculate the mean and SD of the centroid, log power ratios and bandwidth across all frames
• Calculate the silence ratio (SR)
• Concatenate the means and SDs of the centroid, log power ratios and bandwidth with the silence ratio
• Save the resulting 49-dimension feature vector

Neural network development
• Create a database of 135 training and 45 testing samples
• Develop the neural network using MATLAB
• Dynamically partition the training samples, using 25% for tuning
• Decide on the network architecture (number of hidden layers and neurons)
• Decide on network parameters such as the learning rate and momentum
• Attempt classification using various combinations of feature vectors
Designed network: a feedforward multi-layer perceptron (49-20-3) with back-propagation training

Results
• Classification rate of 82.37% using the critical sub-band ratios, frequency centroid, bandwidth and silence ratio
• Classification rate of 79.78% using only the critical sub-band ratios
• Classification rate of 84.44% using only the frequency centroid, bandwidth and silence ratio, but with extremely slow training and variable results (2.34% standard deviation in classification rate)
• Baseline: Zhang and Kuo [1] report a classification rate of about 90% using a rule-based heuristic. However, better results are expected as the database size is increased.

References
[1] T. Zhang and C.-C. J. Kuo, "Hierarchical System for Content-based Audio Classification and Retrieval," Proc. SPIE Vol. 3527, Multimedia Storage and Archiving Systems III, pp. 398-409.
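For concreteness, the centroid and bandwidth integrals on the feature-extraction slide can be evaluated as sums over the FFT bins of a windowed frame. Below is a minimal MATLAB sketch for one 512-sample frame; the frame contents and all variable names are illustrative, not taken from the project.

% Frequency centroid and bandwidth of a single 512-sample frame (illustrative sketch).
fs    = 22050;                                   % sampling rate from the slides
N     = 512;                                     % frame length from the slides
frame = randn(N, 1);                             % placeholder frame; replace with real audio
w     = 0.5 * (1 - cos(2*pi*(0:N-1)'/(N-1)));    % Hanning window, built without any toolbox

X = fft(frame .* w);                             % 512-point FFT
P = abs(X(1:N/2)).^2;                            % one-sided power spectrum, |F(w)|^2
f = (0:N/2-1)' * fs / N;                         % bin centre frequencies in Hz

centroid  = sum(f .* P) / sum(P);                        % discrete frequency centroid
bandwidth = sqrt(sum((f - centroid).^2 .* P) / sum(P));  % discrete bandwidth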
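The whole processing chain can be sketched end to end in the same spirit. The Bark-like band edges, the reading of "log power ratio" as the log of band energy over total frame energy, and the silence-ratio criterion (fraction of frames whose energy falls below a fraction of the average) are assumptions made for illustration; the slides do not spell these details out.

% Sketch of the 49-dimension feature vector: per-frame centroid, bandwidth and
% 22 critical-band log power ratios, then their means and SDs across frames,
% plus a silence ratio. Band edges, the silence threshold and the exact
% "log power ratio" definition are assumptions, not taken from the slides.
fs  = 22050;  N = 512;  hop = round(0.75 * N);   % 25% overlap between frames
x   = randn(3 * fs, 1);                          % placeholder 3-second clip
w   = 0.5 * (1 - cos(2*pi*(0:N-1)'/(N-1)));      % Hanning window
f   = (0:N/2-1)' * fs / N;                       % FFT bin frequencies in Hz

edges = [0 100 200 300 400 510 630 770 920 1080 1270 1480 1720 ...
         2000 2320 2700 3150 3700 4400 5300 6400 7700 9500];   % 22 Bark-like bands (assumed)

starts  = 1:hop:(length(x) - N + 1);
nFrames = numel(starts);
feat    = zeros(nFrames, 24);                    % [centroid, bandwidth, 22 band ratios]
energy  = zeros(nFrames, 1);

for k = 1:nFrames
    seg = x(starts(k):starts(k)+N-1) .* w;
    P   = abs(fft(seg)).^2;   P = P(1:N/2);      % one-sided power spectrum
    energy(k) = sum(P);
    c = sum(f .* P) / sum(P);                    % frequency centroid
    b = sqrt(sum((f - c).^2 .* P) / sum(P));     % bandwidth
    ratios = zeros(1, 22);
    for m = 1:22
        idx = f >= edges(m) & f < edges(m+1);
        ratios(m) = log(sum(P(idx)) / sum(P) + eps);   % log power ratio (assumed definition)
    end
    feat(k, :) = [c b ratios];
end

silenceRatio = mean(energy < 0.1 * mean(energy));        % assumed low-energy-frame criterion
featVec = [mean(feat, 1) std(feat, 0, 1) silenceRatio];  % 24 means + 24 SDs + SR = 49 dims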
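The slides name MATLAB but show no code, so the following is only a sketch of how a 49-20-3 feedforward MLP with back-propagation (gradient descent with momentum) could be set up using the Deep Learning Toolbox (formerly the Neural Network Toolbox). The training data are random placeholders, and the learning-rate and momentum values are assumed, not reported in the slides.

% Sketch of the 49-20-3 MLP with back-propagation training. Xtrain holds the 135
% training feature vectors as 49 x 135 columns; Ttrain holds 3 x 135 one-hot targets.
% Both are random placeholders here, and the lr/mc values are assumptions.
Xtrain = rand(49, 135);                      % placeholder feature vectors
Ttrain = full(ind2vec(randi(3, 1, 135)));    % placeholder one-hot class targets

net = patternnet(20);                        % one hidden layer of 20 neurons; input and
                                             % output sizes are set from the data -> 49-20-3
net.trainFcn = 'traingdm';                   % gradient descent with momentum
net.trainParam.lr = 0.05;                    % learning rate (value assumed)
net.trainParam.mc = 0.9;                     % momentum constant (value assumed)
net.divideParam.trainRatio = 0.75;           % 25% of the training set held out for tuning
net.divideParam.valRatio   = 0.25;
net.divideParam.testRatio  = 0;              % the 45-sample test set stays separate

[net, tr] = train(net, Xtrain, Ttrain);
Ypred = vec2ind(net(Xtrain));                % predicted class indices, 1..3

Setting valRatio to 0.25 mirrors the slide's dynamic 25%-for-tuning partition of the training samples; the 45 held-out test samples would be classified separately to obtain the reported rates.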

