
Emotional Speech
CS 4706
Julia Hirschberg
(thanks to Jackson Liscombe and Lauren Wilcox for some slides)

Outline
  - Why study emotional speech?
  - Why is modeling emotional speech so difficult?
  - Production and perception studies
  - Voice Quality features: the holy grail?

Why study emotional speech?
  - Recognition
      - Customer-care centers
      - Tutoring systems
      - Automated agents (Wildfire)
  - Generation
      - Characteristics of emotional speech are little understood, so it is hard to produce a voice that sounds friendly, sympathetic, authoritative
      - TTS systems
      - Games

Emotion in Spoken Dialogue Systems
  - Batliner, Huber, Fischer, Spilker, Nöth (2003): Verbmobil (Wizard of Oz scenarios)
  - Ang, Dhillon, Krupski, Shriberg, Stolcke (2002): DARPA Communicator
  - Liscombe, Guicciardi, Tur, Gokken-Tur (2005): "How May I Help You" call center
  - Lee, Narayanan (2004): Speechworks call center
  - Liscombe, Hirschberg, Venditti (2005): ITSpoke Tutoring System (physics)

Why is emotional speech so hard to model?
  - Colloquial definitions (those of speakers and listeners) differ from technical definitions
  - Utterances may convey multiple emotions simultaneously
  - Result:
      - Human consensus is low
      - Hard to get reliable training data

Spontaneous Corpora
  - Unconstrained: Campbell (2003); Roach (2000); Cowie et al. (2001)
  - Call centers: Vidrascu, Devillers (2005); Ang et al. (2002); Litman and Forbes-Riley (2004); Batliner et al. (2003); Lee, Narayanan (2005)
  - Meetings: Wrede and Shriberg (2003)

Acted Corpora
  - happy, sad, angry, confident, frustrated, friendly, interested, anxious, bored, encouraging

LDC Emotional Prosody and Transcripts corpus
  - Semantically neutral (dates and numbers)
  - 8 actors
  - 15 emotions

Are Emotions Mutually Exclusive?
  - User study to classify tokens from the LDC Emotional Prosody corpus
  - 10 emotions only:
      - Positive: confident, encouraging, friendly, happy, interested
      - Negative: angry, anxious, bored, frustrated, sad
  - Example

Emotion Intercorrelations
  [Table: pairwise correlations among the ten emotion ratings (sad, angry, bored, frustrated, anxious, friendly, confident, happy, interested, encouraging), p < 0.001; cell layout not recoverable]

Results
  - Emotions are heavily correlated
      - Positive with positive
      - Negative with negative
  - Emotions are non-exclusive
  - Can they be clustered empirically?
      - Activation
      - Valency

Different Valence / Activation
  [Figure: Global Pitch Statistics]

Different Valence / Same Activation

Identifying Emotions
  - Automatic acoustic-prosodic features (Davitz 1964; Huttar 1968)
      - Global characterization: pitch, loudness, speaking rate
  - Intonational contours (Mozziconacci, Hermes 1999)
  - Spectral tilt (Banse, Scherer 1996; Ang et al. 2002)

Machine Learning Experiment
  - RIPPER, 90/10 split
  - Binary classification for each emotion
  - Results:
      - 62% average baseline
      - 75% average accuracy
      - Acoustic-prosodic features for activation
      - H-L for negative; L-L for positive
      - Spectral tilt for valence

Accuracy Distinguishing One Emotion from the Rest

  Emotion       Baseline   Accuracy
  angry         69.32%     77.27%
  confident     75.00%     75.00%
  happy         57.39%     80.11%
  interested    69.89%     74.43%
  encouraging   52.27%     72.73%
  sad           61.93%     80.11%
  anxious       55.68%     71.59%
  bored         66.48%     78.98%
  friendly      59.09%     73.86%
  frustrated    59.09%     73.86%

A Call Center Application
  - AT&T's "How May I Help You" (HMIHY) system
  - Customers often angry and frustrated

HMIHY Example
  - Very Frustrated
  - Somewhat Frustrated

Pitch, Energy, and Rate

Features
  - Automatic acoustic-prosodic
  - Contextual (Cauldwell 2000)
  - Lexical (Schröder 2003; Brennan 1995)
  - Pragmatic (Ang et al. 2002; Lee, Narayanan 2005)

Results

  Feature Set      Accuracy   Rel. Improv. over Baseline
  Majority Class   73.1%
  pros+lex         76.1%
  pros+lex+da      77.0%      1.2%
  all              79.0%      3.8%

Tutoring Systems Should Respond to Uncertainty
  - SCoT (Pon-Barry et al. 2006)
  - Responding to uncertainty:
      - Active listening
      - Hinting vs. paraphrasing
  - Features examined:
      - Latency
      - Filled pauses
      - Hedges
  - Performance metric: learning gain
  - But no improvement by responding to uncertainty

What does uncertainty sound like?
  - pr01 sess00 prob58

Uncertainty in ITSpoke
  "um <sigh> I don't even think I have an idea here ... now ... mass isn't weight ... mass is ... the ... space that an object takes up ... is that mass?"
  71 67 1 92 113

ITSpoke Experiment
  - Human-Human Corpus
  - AdaBoost (C4.5), 90/10 split in WEKA
  - Classes: Uncertain vs. Certain vs. Neutral
  - Results:

  Features             Accuracy
  Baseline             66%
  Acoustic-prosodic    75%
  + contextual         76%
  + breath groups      77%

ITSpoke Results

  Emotion     Precision   Recall   F-measure
  certain     0.611       0.602    0.606
  uncertain   0.515       0.393    0.446
  neutral     0.846       0.891    0.868

                  Classified as
  Emotion label   certain   uncertain   neutral
  certain         80        11          42
  uncertain       26        35          28
  neutral         25        22          384

Voice Quality and Emotion
  - Perceptual coloring
  - Derived from a variety of laryngeal and supralaryngeal features
  - modal, creaky, whispered, harsh, breathy
  - Correlates with emotion: Laver '80, Scherer '86, Murray and Arnott '93, Laukkanen '96, Johnstone and Scherer '99, Gobl and Ní Chasaide '03, Fernandez '00

Phonation Gestures
  - Adductive tension: interarytenoid muscles adduct the arytenoid cartilages
  - Medial compression: adductive force on the vocal processes; adjustment of the ligamental glottis
  - Longitudinal tension: tension of the vocal folds

Modal Voice
  - "Neutral" mode
  - Muscular adjustments moderate
  - Vibration of vocal folds periodic, full closing of glottis, no audible friction
  - Frequency of vibration and loudness in low to mid range for conversational speech

Tense Voice
  - Very strong tension of vocal folds, very high tension in vocal tract

Whispery Voice
  - Very low adductive tension
  - Medial compression moderately high
  - Longitudinal tension moderately high
  - Little or no vocal fold vibration
  - Turbulence generated by friction of air in and above larynx

Creaky Voice
  - Vocal fold vibration at low frequency, irregular
  - Low tension (only ligamental part of glottis vibrates)
  - Vocal folds strongly adducted
  - Longitudinal tension weak
  - Moderately high medial compression

Breathy Voice
  - Tension low
      - Minimal adductive tension
      - Weak medial compression
  - Medium longitudinal vocal fold tension
  - Vocal folds do not come together completely, leading to frication

Estimating Voice Quality
  - Estimate with respect to a controlled "neutral" quality
  - But how do we know the control is truly neutral?
  - Must match the natural laryngeal behavior to the laboratory "neutral"
  - Our knowledge of models of vocal fold movements may be inadequate for describing real phonation
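
The per-class precision, recall, and F-measure reported on the ITSpoke Results slide can be recomputed from the confusion matrix on the same slide. The sketch below does that arithmetic in plain Python. The names LABELS, CONFUSION, SCORES, and per_class_scores are hypothetical and chosen only for illustration; this is not the WEKA pipeline used in the experiment, only a check of the reported figures.

```python
# Confusion matrix from the ITSpoke Results slide.
# Rows: annotated emotion label; columns: label assigned by the classifier.
# All names here are illustrative, not taken from the original experiment.
LABELS = ["certain", "uncertain", "neutral"]
CONFUSION = [
    [80, 11, 42],   # labeled certain
    [26, 35, 28],   # labeled uncertain
    [25, 22, 384],  # labeled neutral
]


def per_class_scores(matrix, labels):
    """Return {label: (precision, recall, f_measure)} for a square confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total: everything classified as this label
        actual = sum(matrix[i])                    # row total: everything annotated with this label
        precision = true_pos / predicted
        recall = true_pos / actual
        f_measure = 2 * precision * recall / (precision + recall)
        scores[label] = (precision, recall, f_measure)
    return scores


SCORES = per_class_scores(CONFUSION, LABELS)
for label, (p, r, f) in SCORES.items():
    print(f"{label:<10} P={p:.3f} R={r:.3f} F={f:.3f}")
```

Run as written, this gives 0.611 / 0.602 / 0.606 for certain, 0.515 / 0.393 / 0.446 for uncertain, and 0.846 / 0.891 / 0.868 for neutral (precision / recall / F-measure). The low recall for uncertain reflects that 54 of the 89 uncertain turns were classified as certain or neutral.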


Columbia CS 4706 - emotion
