UNCC ECGR 6185 - Sound Source Localization for Robot Auditory Systems


Y. Cho et al.: Sound Source Localization for Robot Auditory Systems
Contributed Paper
Manuscript received July 15, 2009. 0098-3063/09/$20.00 © 2009 IEEE

Sound Source Localization for Robot Auditory Systems

Youngkyu Cho, Dongsuk Yook, Member, IEEE, Sukmoon Chang, Member, IEEE, and Hyunsoo Kim

Abstract — Sound source localization (SSL) is a major function of robot auditory systems for intelligent home robots. The steered response power-phase transform (SRP-PHAT) is a widely used method for robust SSL. However, it is too slow to run in real time, since SRP-PHAT searches a large number of candidate sound source locations. This paper proposes a search space clustering method designed to speed up the SRP-PHAT-based sound source localization algorithm for intelligent home robots equipped with small-scale microphone arrays. The proposed method reduces the number of candidate sound source locations by 30.6% and achieves a 46.7% error reduction compared to conventional methods.¹

Index Terms — Sound source localization, steered response power (SRP), search space clustering, small scale microphone array, robot auditory system, intelligent home robot.

I. INTRODUCTION

Following on from recent advances in humanoid robot technology, intelligent service robots are expected to work in the living environment in the near future. They will support human activities, such as housekeeping and assistance for elderly people. While much of the previous effort in the development of robot technology focused on robot locomotion and vision systems, establishing an effective communication method between humans and robots is imperative. Speech recognition is one of the most promising communication tools for human-robot interaction, for both expert and non-expert users, since it offers bidirectional interaction and diverse levels of control.
Thus, the development of the robot auditory system plays a potentially important role in developing intelligent home robots that work seamlessly with human users [1]. A core component of the robot auditory system for human-robot interaction in home environments is sound source localization (SSL) [2]. When a user interacts with a humanoid robot using spoken language, the robot must be able to automatically find the location of the user, i.e., the location of the voice source. Fig. 1 describes a sound source localization scenario for the small service robot developed at Samsung Electronics. For example, if the user says “Come here!” to the robot from a distance, the robot must be able to identify the user’s location in order to respond appropriately. Moreover, accurate estimation of the sound source location enhances speech quality through beamforming of the multichannel sound signals, which helps robots recognize distant speech. In addition, sound source localization is one way to find the location of the speaker even in the dark.

¹ This work was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2006-311-D00822). It was also supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute for Information Technology Advancement) (IITA-2008-C1090-0803-0006). Youngkyu Cho and Dongsuk Yook (corresponding author) are with the Speech Information Processing Laboratory, Department of Computer and Communication Engineering, Korea University, Seoul, 136-701, Republic of Korea. They thank Samsung Electronics for their support. Sukmoon Chang is with Pennsylvania State University, Middletown, PA 17057, USA. Hyunsoo Kim is with Samsung Electronics, Korea.

Many sound source localization methods have been proposed.
For example, methods based on the time difference of arrival (TDOA) use generalized cross-correlation (GCC) [3] to estimate the TDOAs and relate them to the location of the sound source. Methods based on high-resolution spectral analysis [4] use spatial spectra derived from the signals to locate sound sources. Steered response power (SRP) methods [5] electronically steer the microphone array to locate the sound source with the highest power. The steered response power with the phase transform filter (SRP-PHAT) is a robust method for sound source localization in the presence of room reverberation [6]. However, SRP-PHAT methods usually employ a grid search scheme that examines a large number of candidate sound source locations. Therefore, SRP-PHAT with a computationally intensive grid search cannot be used in real-time systems, such as small service robots with limited computational power.

Fig. 1. Sound source localization of a small-size home robot to find the user’s location.

IEEE Transactions on Consumer Electronics, Vol. 55, No. 3, AUGUST 2009

Several search methods have been proposed for real-time SRP-PHAT [7]-[9]. A hierarchical search method proposed in [7] gradually prunes the candidate sound source locations in a coarse-to-fine search. A drawback of this method is that it may prematurely prune the location with the highest power before it reaches the final decision. A hybrid method was proposed in [8] to speed up SRP-PHAT. This method first generates a small set of candidate sound source locations using a TDOA-based search. It then performs an SRP-PHAT-based grid search on this small set of candidate locations. If TDOA estimation is unsuccessful in the first step, then SRP-PHAT will fail in the final decision. As a result, this method decreases sound source localization performance in low signal-to-noise ratio (SNR) environments, because it depends on TDOA estimation to generate the small set of candidate locations. In [9], the cross-correlation functions are used to find the time delays that may correspond to the sound source location. An inverse mapping function relates a relative time delay to a set of candidate locations. Only the output powers of the locations that are inversely mapped by the time delays are considered to find the maximum power location. However, this method may fail to find the maximum power location, because it searches only a few
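The TDOA-based methods and the hybrid scheme of [8] start from a GCC estimate of the delay between microphone pairs. The following is a minimal pure-Python sketch of GCC with PHAT weighting for one microphone pair. It assumes equal-length frames and a circular (DFT-based) correlation. The function names `gcc_phat` and `estimate_tdoa` and the naive O(N²) DFT are illustrative choices, not the paper's implementation.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; adequate for the short frames in this illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT, returning only the real part."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def gcc_phat(x1, x2, eps=1e-12):
    """Circular GCC-PHAT of two equal-length frames.

    Entry r[tau % N] is the correlation at lag tau. A positive tau means
    x2 is a delayed copy of x1 (the sound reached microphone 1 first).
    """
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # PHAT weighting: normalize each bin to unit magnitude, keeping only phase.
    weighted = [c / (abs(c) + eps) for c in cross]
    return idft(weighted)


def estimate_tdoa(x1, x2, max_lag):
    """Integer lag in [-max_lag, max_lag] that maximizes GCC-PHAT."""
    r = gcc_phat(x1, x2)
    n = len(r)
    return max(range(-max_lag, max_lag + 1), key=lambda tau: r[tau % n])
```

PHAT weighting keeps only the phase of the cross-spectrum. An exact circular shift therefore yields a correlation that is close to a single spike at the true lag. A practical system would replace the naive DFT with an FFT and zero-pad the frames to avoid circular wrap-around.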

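The exhaustive grid search that makes SRP-PHAT expensive can be sketched as follows. The sketch rests on these assumptions:

- Microphone and candidate coordinates are 2-D, in metres.
- The speed of sound is 343 m/s.
- Delays are rounded to the nearest sample.
- The GCC-PHAT values of each microphone pair are precomputed and stored as a mapping from integer lag to value.

Each candidate costs one lookup per microphone pair, so the total cost grows with the number of candidates times the number of pairs. That cost is what the search methods in [7]-[9] try to reduce. The names `pair_delay` and `srp_phat_grid_search` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for room-temperature air


def pair_delay(q, mi, mj, fs, c=SPEED_OF_SOUND):
    """Integer-sample TDOA (arrival at mic j minus arrival at mic i) for a source at q."""
    return round((math.dist(q, mj) - math.dist(q, mi)) / c * fs)


def srp_phat_grid_search(mics, candidates, gcc, fs):
    """Exhaustive SRP-PHAT search over a list of candidate locations.

    gcc[(i, j)] maps an integer lag to the GCC-PHAT value of microphone pair
    (i, j); missing lags count as zero. Returns the candidate with the
    highest steered response power and the list of all scores.
    """
    scores = []
    for q in candidates:
        # Steered response power: sum each pair's GCC-PHAT at the lag q implies.
        power = sum(r.get(pair_delay(q, mics[i], mics[j], fs), 0.0)
                    for (i, j), r in gcc.items())
        scores.append(power)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores
```

Two candidates that imply the same rounded delay for every pair receive identical scores. The effective resolution of this search is therefore set by the sampling rate and the array size, not only by the grid spacing.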

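The premature-pruning drawback noted for the coarse-to-fine search of [7] is easy to reproduce on a toy problem. The sketch below is a generic two-level search over candidate indices, not the specific algorithm of [7]. The `step` and `keep` parameters and the toy score functions are hypothetical.

```python
def coarse_to_fine_search(score, n, step, keep):
    """Generic two-level coarse-to-fine search over candidate indices 0..n-1.

    Stage 1 scores every `step`-th candidate and keeps the `keep` best;
    stage 2 scores only the neighbours of those survivors.
    Returns (best_index, number_of_score_evaluations).
    """
    evaluated = {}

    def eval_at(k):
        # Cache scores so each candidate is evaluated at most once.
        if k not in evaluated:
            evaluated[k] = score(k)
        return evaluated[k]

    coarse = list(range(0, n, step))
    survivors = sorted(coarse, key=eval_at, reverse=True)[:keep]
    for s in survivors:
        for k in range(max(0, s - step + 1), min(n, s + step)):
            eval_at(k)
    best = max(evaluated, key=evaluated.get)
    return best, len(evaluated)


# Broad single peak at index 40: found after scoring 27 of 72 candidates.
print(coarse_to_fine_search(lambda k: -abs(k - 40), 72, 6, 2))  # -> (40, 27)
# Narrow, higher peak at 39 plus a broad, lower peak at 12: the narrow one is pruned.
print(coarse_to_fine_search(lambda k: 10.0 if k == 39 else 5 - abs(k - 12) / 10, 72, 6, 2))  # -> (12, 27)
```

In the second call, index 39 falls between coarse grid points, and its neighbours score poorly. It is therefore discarded before the fine stage, even though it has the highest power.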

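The inverse-mapping idea described for [9] can be illustrated with a precomputed lookup table from (microphone pair, integer lag) to the candidate locations that imply that lag. This minimal sketch makes the following assumptions:

- Coordinates are 2-D, in metres.
- The speed of sound is 343 m/s.
- Delays are rounded to the nearest sample.
- The candidates selected by each pair are combined by union.

The table layout, that union rule and the function names are hypothetical. This excerpt does not specify how [9] selects peak lags.

```python
import math
from collections import defaultdict


def build_inverse_map(mics, candidates, fs, c=343.0):
    """Precompute (i, j, lag) -> indices of candidates whose pair-(i, j) TDOA rounds to lag."""
    inverse = defaultdict(list)
    for idx, q in enumerate(candidates):
        for i in range(len(mics)):
            for j in range(i + 1, len(mics)):
                lag = round((math.dist(q, mics[j]) - math.dist(q, mics[i])) / c * fs)
                inverse[(i, j, lag)].append(idx)
    return inverse


def candidates_from_peaks(inverse, peak_lags):
    """Return the sorted candidate indices inversely mapped from the given peak lags.

    peak_lags[(i, j)] lists the lags at which pair (i, j)'s cross-correlation
    peaks; only these candidates would then have their steered power evaluated.
    """
    selected = set()
    for (i, j), lags in peak_lags.items():
        for lag in lags:
            # .get avoids inserting empty entries into the defaultdict.
            selected.update(inverse.get((i, j, lag), ()))
    return sorted(selected)
```

With this union rule, a candidate is scored only if at least one pair's selected peak lags include the lag it implies. Any location whose implied lags are all missing from the peaks is never examined.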