Learning a Model of Speaker Head Nods using Gesture Corpora

Jina Lee
Institute for Creative Technologies
University of Southern California
13274 Fiji Way, Marina del Rey, CA 90292

Stacy Marsella
Institute for Creative Technologies
University of Southern California
13274 Fiji Way, Marina del Rey, CA 90292

ABSTRACT
In face-to-face conversation, the speaker's head is continually in motion. These movements serve a variety of important communicative functions. Our goal is to develop a model of the speaker's head movements, learned from gesture annotation corpora, that can be used to generate head movements for virtual agents. In this paper, we focus on the first step of the head movement generation process: predicting when the speaker should use head nods. We describe our machine-learning approach, which creates a head nod model from annotated corpora of face-to-face human interaction, relying on the linguistic features of the surface text. We also describe the feature selection process, the training process, and the evaluation of the learned model on test data in detail. The results show that the model is able to predict head nods with high precision and recall.

Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning; I.2.11 [Distributed Artificial Intelligence]: Intelligent agents

General Terms
Design, Human Factors

Keywords
Virtual Agents, Embodied Conversational Agents, Nonverbal Behaviors, Head Nods, Machine Learning

1. INTRODUCTION
During face-to-face conversation, the head is constantly in motion, especially during speaking turns [12]. These movements are not random; research has identified a number of important functions served by head movements [24] [17] [13] [14]. Head movements convey a range of information in addition to the verbal channel. We may nod to show our agreement with what the other is saying, shake our heads to express disbelief, or tilt the head upwards along with gaze aversion when pondering something.
In addition to serving these explicit functions, head movements may also influence the observer in more subtle ways. For example, overt head movements have been found to be instrumental in the formation of an observer's affective response to the speaker [32]. Additionally, the various head movements we make during conversation make the interaction look more natural.

Cite as: Learning a Model of Speaker Head Nods using Gesture Corpora, Jina Lee, Stacy Marsella, Proc. of 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), Decker, Sichman, Sierra and Castelfranchi (eds.), May 10–15, 2009, Budapest, Hungary, pp. XXX-XXX.
Copyright © 2009, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Consistent with the important role that head movements play in human-human interaction, virtual agent systems have incorporated head movements to realize a variety of functions [1] [4] [5] [10] [20] [21] [30]. The incorporation of appropriate head movements in a virtual agent has been shown to have positive effects during human-agent interaction [27].

The goal of our work is to build a domain-independent model of the speaker's head movements that can be used to generate head movements for virtual agents. To make the model usable by interactive virtual agents, we design it to work in real time and to be flexible enough to be used in different virtual agent systems.

Often virtual humans use hand-crafted models to generate head movements. For instance, in our previous work we developed the Nonverbal Behavior Generator (NVBG) [21], a rule-based system that analyzes not only information about the agent's cognitive processing, such as its internal goals and emotional state, but also the syntactic and semantic structure of the surface text to generate a range of nonverbal behaviors.
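A hypothetical sketch may help make the contrast with a learned model concrete. The Python below shows the general shape of a hand-crafted, keyword-triggered head movement rule; the word lists, the function name `rule_based_head_movement`, and the labels are illustrative assumptions and are not taken from the actual NVBG rule set, which also draws on the agent's goals, emotional state, and syntactic and semantic analysis.

```python
# Hypothetical hand-written trigger lists; these are NOT the real NVBG rules.
AFFIRMATION_WORDS = {"yes", "yeah", "right", "okay"}
NEGATION_WORDS = {"no", "not", "never", "nothing"}


def rule_based_head_movement(words: list[str]) -> list[str]:
    """Assign one head-movement label per word using fixed keyword rules.

    Every condition is written by hand: supporting a new factor (say, a
    dialogue act or an emotional state) means adding another branch and
    deciding manually how it interacts with the existing ones.
    """
    labels = []
    for word in words:
        w = word.lower()
        if w in AFFIRMATION_WORDS:
            labels.append("nod")
        elif w in NEGATION_WORDS:
            labels.append("shake")
        else:
            labels.append("none")
    return labels


print(rule_based_head_movement(["Yeah", "I", "do", "not", "know"]))
# ['nod', 'none', 'none', 'shake', 'none']
```

This sketch only fires on words its author thought to list; any context outside those lists receives no head movement at all.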
To specify which nonverbal behaviors should be generated in each given context, knowledge from the psychological literature and analyses of human nonverbal behavior corpora are used to identify the salient factors most likely to be associated with certain nonverbal behaviors.

As with a number of systems [1] [4] [5] [20] that generate nonverbal behaviors for virtual humans, the NVBG starts with specific factors that would cause various gestures to be displayed. Although the knowledge encoded in the NVBG rules has been reused and demonstrated to be effective across a range of applications [31] [33] [18] [15], there are limitations with this approach. One major drawback is that the rules have to be hand-crafted. This means that the author of the rules is required to have broad knowledge of the phenomena he/she wishes to model. Moreover, as more and more factors that may influence the myriad of behaviors generated are added, it becomes harder to specify how all those factors contribute to the overall outcome. Unless the rule author has complete knowledge of the correlations among the various factors, manual rule construction may suffer from sparse coverage of the rich phenomena.

[Figure 1 diagram: Gesture Corpora (Transcript, Head Movement, Dialogue Acts) → Feature Encoding & Data Construction → Feature Selection → Training (NOD HMM, NOT_NOD HMM; outputs p1, p2) → Predicted Head Movement]
Figure 1: Overview of the head nod prediction framework. The information in the gesture corpus is encoded and aligned to construct the data set. The feature selection process chooses a subset of the features that are most correlated with head nods. Using these features, probabilistic sequential models are trained and utilized to predict whether or not a head nod should occur.

To address the limitations of our previous rule-based approach, we present, demonstrate, and evaluate a data-driven, automated approach to generating speaker nonverbal behaviors. Specifically, the approach uses a machine learning technique (i.e., learning a hidden Markov model [29]) to create a head nod model from annotated corpora of face-to-face human interaction. Because our goal is a flexible system that can be used in different virtual agent systems with various approaches to natural language generation, we restrict the features used in the machine learning to those available across
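Figure 1 shows two HMMs, NOD and NOT_NOD, each producing a probability (p1, p2) for an input sequence. The Python below is a minimal sketch of that comparison under stated assumptions: each token's selected features are collapsed into a single discrete symbol, the parameters are made up rather than learned, and the names `DiscreteHMM`, `classify`, and the symbol meanings are hypothetical. It scores a sequence under both models with the forward algorithm (in log space) and predicts the label of the more likely model; the paper's actual feature vectors, state counts, and training procedure are not reproduced here.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """Discrete-observation HMM scored with the forward algorithm in log space."""

    def __init__(self, start, trans, emit):
        # start[i] = P(first state is i)
        # trans[i][j] = P(next state is j | current state is i)
        # emit[i][k] = P(observing symbol k | state i)
        # All probabilities must be > 0 here, since math.log(0) raises ValueError.
        self.log_start = [math.log(p) for p in start]
        self.log_trans = [[math.log(p) for p in row] for row in trans]
        self.log_emit = [[math.log(p) for p in row] for row in emit]

    def log_likelihood(self, obs):
        """Return log P(obs | model) for a non-empty sequence of symbol indices."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.log_start)
        # Initialization: alpha_0(i) = log P(state i) + log P(obs[0] | state i)
        alpha = [self.log_start[i] + self.log_emit[i][obs[0]] for i in range(n)]
        # Recursion: sum over predecessor states, then emit the next symbol.
        for o in obs[1:]:
            alpha = [
                log_sum_exp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                + self.log_emit[j][o]
                for j in range(n)
            ]
        # Termination: sum over the possible final states.
        return log_sum_exp(alpha)


def classify(obs, nod_hmm, not_nod_hmm):
    """Label a sequence with whichever model gives it the higher likelihood."""
    log_p_nod = nod_hmm.log_likelihood(obs)
    log_p_not_nod = not_nod_hmm.log_likelihood(obs)
    return "nod" if log_p_nod > log_p_not_nod else "not_nod"


# Toy parameters (made up, not learned from any corpus). Hypothetical symbols:
# 0 = affirmative interjection, 1 = proper noun, 2 = any other token.
nod_hmm = DiscreteHMM(
    start=[0.6, 0.4],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit=[[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]],
)
not_nod_hmm = DiscreteHMM(
    start=[0.5, 0.5],
    trans=[[0.6, 0.4], [0.4, 0.6]],
    emit=[[0.1, 0.5, 0.4], [0.1, 0.3, 0.6]],
)

print(classify([0, 0], nod_hmm, not_nod_hmm))     # nod
print(classify([1, 2, 2], nod_hmm, not_nod_hmm))  # not_nod
```

Comparing per-class likelihoods is one standard way to turn generative sequence models into a classifier. In practice each model would first be trained on the sequences of its own class (for example with Baum-Welch), and richer per-token feature vectors would replace the single hypothetical symbol used here. Ties go to `not_nod` in this sketch.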

