CMU LTI 11751 - An Acoustic Model of the Water Noise in the Dolphin Project

Contents

Introduction
Models
  Setup
  Acoustic models
  Language models and Dictionary
Database
Evaluation plan
  SClite
  Eval.pl
Development Cycle
  Iteration 0
  Iteration 1 (more labels from Bahamas database)
Present work
Future work
Lessons learnt
Timeline and goals reached
Reference

Carnegie Mellon University
Language Technology Institute

An Acoustic Model of the Water Noise in the Dolphin Project
11-751 Speech Recognition and Understanding, Fall 2004
Instructor: Dr. Tanja Schultz

Kai-min Kevin Chang
Alex Kang

Introduction

Towards Communication with Dolphins is an ambitious long-term scientific study of the speech of Tursiops truncatus (the Atlantic bottlenose dolphin) at Carnegie Mellon University. Working with the Wild Dolphin Project, a non-profit, all-volunteer organization, the two groups set out to use speech recognition technology to better classify dolphin recordings. Since the establishment of the Wild Dolphin Project in 1989, gigabytes of dolphin sound have been recorded but left untranscribed, and manual transcription would consume hours of tedious routine work. Thus, in our term project for 11-751 Speech Recognition and Understanding, we propose to train acoustic models for water noise, with the goal of automatically classifying dolphin sound and water noise.

Literature survey

Water noise modeling falls into two schools. On one hand, a noise detection model uses standard HMMs to classify a series of sound units into noise and non-noise. On the other hand, a noise filter model uses signal processing techniques to separate noise and non-noise into different channels. Given that water noise is background noise that is often present throughout an entire recording, a noise filter model would be of ultimate interest. However, noise filtering is an open question in signal processing and is beyond the scope of a term project.
For the purpose of automatic classification of dolphin sound and water noise, a noise detection model suffices.

A typical recording in the Dolphin Project contains a range of dolphin sounds and noises. There are three types of dolphin sound: broadband clicks used in echolocation, broadband burst pulses, and whistles. The noises are more difficult to describe. Depending on the recording equipment and environment, they include human speech, various machine noises due to the microphones and the propeller of the boat, and various water noises due to splashes, inhabitants of the water, and perhaps different types and depths of waterbed.

Previous work in the Dolphin Project identified individual dolphins using the Janus speech recognition toolkit. Janus is an HMM-based speech recognizer developed in the Interactive Systems Lab at CMU. In an HMM speech recognizer, acoustic models are trained to capture the phonetic features of the classes of interest. Hence, the dolphin-ID project trained an acoustic model for each recognized dolphin based on its signature whistles, while aggregating all other noises into a single acoustic model. Initial success in distinguishing dolphins has been achieved. The present work extends the dolphin-ID project, focusing on differentiating water noise from dolphin sound and all other noises. That is, we will aggregate one acoustic model for all dolphin sound, one model for water noise, and one acoustic model for all other noises.

Approach

Models

Setup

Extending the dolphin-ID project, we established the current project in the Interactive Systems Laboratories. The Linux box meenie.is.cs.cmu.edu has been set up with audio support, which enables us to view the spectrum of an utterance via the Emulabel program and to hear the utterance via a headset or speaker. The Janus speech recognition toolkit, an HMM-based speech recognizer, will be used to train the acoustic models.
Acoustic models

The phoneme topology for non-silence phonemes (i.e., dolphin sound) will be three states with transitions to themselves and to the next state. The silence, water noise, and garbage phones will use only one state. All transitions have equal probability (0.5). We will start with a fully continuous density system with 39 Gaussian mixture models. The setup is adopted from the dolphin-ID project.

Language models and Dictionary

Initially, the dictionary contains only five entries: dolphin, water, pause, garbage, and silence. A uni-gram language model will be used; thus, every vocabulary entry has the same uniform probability.

Database

There are two sets of dolphin recordings: the CMU database and the Bahamas database.

CMU database - contains utterances recorded in the Wild Dolphin Project, selected for clear dolphin sound. The recordings contain either dolphin sound or water noise, and rarely silence or garbage. There are 166 utterances in the CMU database; each is roughly 15 seconds in duration and sampled at 10 kHz. 100 utterances will be randomly selected for the training set, while the remaining 66 utterances form the unseen testing set. The recordings have been labeled by Dr. Alan Black. The CMU database was the source of data in the dolphin-ID project.

Bahamas database - contains utterances recorded during Dr. Bob Frederking's trip to the Bahamas in 2004. There are five days' worth of untranscribed recordings. Unfortunately, a constant buzziness throughout the first four days renders those recordings a poor source of data. Thus, we decided to select utterances from the last day, when the recording was done by manually directing the microphone toward the dolphins. There are 56 utterances in the Bahamas database; each is roughly 15 seconds in duration and sampled at 10 kHz. Because only 56 recordings are available, we elect not to distinguish training and testing sets.
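The random 100/66 split of the CMU database described above can be sketched as follows (a minimal sketch; the utterance IDs are hypothetical, and Janus manages its own database format):

```python
import random

# Hypothetical IDs for the 166 CMU-database recordings.
utterances = [f"cmu_{i:03d}" for i in range(166)]

random.seed(0)            # fix the seed so the split is reproducible
random.shuffle(utterances)

train_set = utterances[:100]   # 100 randomly selected training utterances
test_set = utterances[100:]    # the remaining 66 unseen testing utterances

print(len(train_set), len(test_set))  # 100 66
```

Fixing the random seed keeps the training/testing partition stable across development iterations, so accuracy numbers remain comparable.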
During the labeling process, we found the recordings to be qualitatively different from the CMU database. Whereas the CMU database contains utterances with distinctive dolphin and water sound, the Bahamas database contains a great deal of silence. This difference is expected to affect recognition accuracy.

Evaluation plan

SClite

The conventional metric for speech recognition is recognition accuracy, calculated from word accuracy as follows:

WACC = ((Len - (Sub + Ins + Del)) / Len) * 100%

where Sub, Ins, and Del are the numbers of substitution, insertion, and deletion errors, and Len is the number of words in the reference transcription.
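As a sanity check on the word-accuracy formula, it can be written directly as a function of the edit counts (a minimal sketch; the function name is ours, and the counts would in practice come from an SClite alignment):

```python
def word_accuracy(ref_len: int, sub: int, ins: int, dele: int) -> float:
    """WACC = ((Len - (Sub + Ins + Del)) / Len) * 100%, per the formula above."""
    return (ref_len - (sub + ins + dele)) / ref_len * 100.0

# A 10-word reference hypothesis with 1 substitution, 1 insertion, 0 deletions:
print(word_accuracy(10, 1, 1, 0))  # 80.0
```

Note that because insertions are charged against the reference length, a hypothesis with many insertions can drive WACC below zero.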

