IEEE TRANSACTIONS ON ROBOTICS, VOL. 23, NO. 5, OCTOBER 2007

Enabling Multimodal Human–Robot Interaction for the Karlsruhe Humanoid Robot

Rainer Stiefelhagen, Hazım Kemal Ekenel, Christian Fügen, Petra Gieselmann, Hartwig Holzapfel, Florian Kraft, Kai Nickel, Michael Voit, and Alex Waibel

Manuscript received October 14, 2006; revised May 23, 2007. This paper was recommended for publication by Associate Editor C. Laschi and Editor H. Arai upon evaluation of the reviewers' comments. This work was supported in part by the German Research Foundation (DFG) under Sonderforschungsbereich SFB 588—Humanoid Robots. This paper was presented in part at the 13th European Signal Processing Conference, Antalya, Turkey, 2005; in part at the ICSLP, Jeju Island, Korea, 2004; in part at the International Conference on Multimodal Interfaces (ICMI), State College, PA, 2004; in part at KI 2006, Bremen, Germany; in part at INTERSPEECH, Pittsburgh, PA, 2006; in part at the 7th International Conference on Multimodal Interfaces, Trento, Italy, October 4–6, 2005; in part at the Sixth International Conference on Face and Gesture Recognition (FG 2004), Seoul, Korea, May 2004; and in part at the First International CLEAR Evaluation Workshop, Southampton, U.K., April 2006.

R. Stiefelhagen, H. K. Ekenel, C. Fügen, H. Holzapfel, F. Kraft, K. Nickel, and M. Voit are with the Interactive Systems Laboratories, Universität Karlsruhe (TH), 76131 Karlsruhe, Germany (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

P. Gieselmann is with Lucy Software and Services GmbH, 81643 Muenchen, Germany (e-mail: [email protected]).

A. Waibel is with Carnegie Mellon University, Pittsburgh, PA 15213 USA, and also with the Interactive Systems Laboratories, Universität Karlsruhe (TH), 76131 Karlsruhe, Germany (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TRO.2007.907484

Abstract—In this paper, we present our work in building technologies for natural multimodal human–robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as recognition of a person's head orientation. Each of the components is described in the paper, and experimental results are presented. We also present several experiments on multimodal human–robot interaction, such as interaction using speech and gestures, the automatic determination of the addressee during human–human–robot interaction, as well as interactive learning of dialogue strategies. The work and the components presented here constitute the core building blocks for audiovisual perception of humans and multimodal human–robot interaction used for the humanoid robot developed within the German research project (Sonderforschungsbereich) on humanoid cooperative robots.

Index Terms—Audiovisual perception, human-centered robotics, human–robot interaction, multimodal interaction.

I. INTRODUCTION

Over the last decade, much research effort has been focused on the development of humanoid robots, and great progress has been made in developing robots with human-like torsi, including legs, arms, head, etc., as well as some human-like motoric skills, such as walking, grasping, and dancing [3], [4], [29], [33], [41], [46]. Other researchers have focused on advancing robots' capabilities in perceiving, interacting, and cooperating with humans [2], [9], [12], [18].

In the framework of the German research center SFB 588 "Humanoid Robots—Learning and Cooperating Multimodal Robots" [1], an interdisciplinary team of researchers is working on the development of humanoid robots that can safely coexist, cooperate, and interact with humans in their daily environment. To this end, we focus on the development of an appropriate human-like robotic platform [6] as well as on the development of the components necessary to facilitate human-friendly, natural human–robot interaction.

Our own research has focused on the development of such components: to enable perception of the user(s), including many of their important communication cues, such as speech, gestures, and head orientation, among others; to develop mechanisms to fuse and understand these perceptual and communicative cues; and to build multimodal dialogue components that enable the robot to engage in task-oriented dialogue with its users.

In this paper, we present many of the core perceptual and interaction components that we have developed for the humanoid robots. These include speech recognition, multimodal dialogue processing, and visual detection, tracking, and identification of users, including head-pose estimation and pointing-gesture recognition. All components have been integrated on a mobile robot platform and can be used for real-time multimodal interaction with a robot. We also report on several human–robot interaction experiments that we conducted. These include experiments on interaction using speech and gestures, the automatic determination of the addressee in human–human–robot interaction, as well as interactive learning of efficient dialogue strategies.

The remainder of this work is organized as follows: In Section II, we give an overview of the developed components and of the information flow between them, and we provide some background on the robotic platform. In Section III, we describe the components for speech recognition, visual perception of the user, and dialogue processing. In Section IV, we then present some multimodal human–robot interaction experiments. We conclude the paper in Section V.

II. SYSTEM OVERVIEW

Fig. 1 shows the humanoid robot ARMAR III and its predecessor ARMAR II, which are being developed in Karlsruhe within the SFB 588 "Humanoid Robots." For a detailed description of the ARMAR platform, we refer to [6]. The sensors available on the robot head are a stereo camera system and six omnidirectional microphones. For speech recognition, we alternatively use a remote close-talking microphone. The current version of the robot has a 1.6-GHz industrial personal computer (IPC) that ...
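The introduction describes fusing speech, pointing gestures, and head orientation into a multimodal dialogue. As a purely illustrative sketch (not taken from the paper), the following Python snippet shows one way a deictic utterance and a pointing-gesture estimate could be combined to resolve an object reference; all class names, the confidence threshold, and the angle-based selection rule are assumptions made for this example, not the authors' implementation.

```python
"""Illustrative sketch only: the paper does not give code. The data
classes, names, and fusion rule below are assumptions for illustration."""
from dataclasses import dataclass
from typing import List, Optional
import math


@dataclass
class SpeechHypothesis:
    text: str            # recognized utterance
    confidence: float    # ASR confidence in [0, 1]


@dataclass
class PointingGesture:
    direction: tuple     # unit vector of the pointing ray (x, y, z)
    origin: tuple        # hand position in robot coordinates
    confidence: float


@dataclass
class SceneObject:
    name: str
    position: tuple      # object position in robot coordinates


def angle_to_object(gesture: PointingGesture, obj: SceneObject) -> float:
    """Angle (radians) between the pointing ray and the ray to the object."""
    vx, vy, vz = (obj.position[i] - gesture.origin[i] for i in range(3))
    norm = math.sqrt(vx * vx + vy * vy + vz * vz) or 1e-9
    dot = (vx * gesture.direction[0] + vy * gesture.direction[1]
           + vz * gesture.direction[2]) / norm
    return math.acos(max(-1.0, min(1.0, dot)))


def resolve_deictic_reference(speech: SpeechHypothesis,
                              gesture: Optional[PointingGesture],
                              scene: List[SceneObject]) -> Optional[SceneObject]:
    """Toy fusion rule: if the utterance contains a deictic word and a
    confident pointing gesture was observed, pick the scene object closest
    to the pointing ray; otherwise leave the reference unresolved."""
    deictic = any(w in speech.text.lower() for w in ("this", "that", "there"))
    if not (deictic and gesture and gesture.confidence > 0.5):
        return None
    return min(scene, key=lambda obj: angle_to_object(gesture, obj))


if __name__ == "__main__":
    scene = [SceneObject("cup", (1.0, 0.2, 0.8)),
             SceneObject("bottle", (0.5, -0.6, 0.9))]
    speech = SpeechHypothesis("please bring me that", confidence=0.82)
    gesture = PointingGesture(direction=(0.97, 0.17, 0.17),
                              origin=(0.0, 0.0, 0.6), confidence=0.9)
    target = resolve_deictic_reference(speech, gesture, scene)
    print("resolved referent:", target.name if target else "none")
```

In the system described in the paper, such cues are produced by the perception components of Section III and combined within the multimodal dialogue components; the fixed angle rule above merely stands in for that fusion step.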

