USC CSCI 534 - Lecture2011-4-ToM


David V. Pynadath
Affective Behavior with Theory of Mind

Slide 2: Reason vs. Emotion
Distinction in philosophy
  – Reason should be slave of emotion (Hume)
  – Our emotions impede our reason (Stoics)
Distinction in computational modeling
  – Emotions are irrational
  – Emotion is needed for more accurate models
But what is reason?

Slide 3: Rational Decision Making
What is “rational”
  – e.g., for the row player in IPD?

                Cooperate   Defect
    Cooperate     3,3        0,5
    Defect        5,0        1,1

Slide 4: Is defecting rational?
It is a Nash equilibrium strategy
  – It is a best response to either action
  – But what if opponent is playing tit-for-tat?
(IPD payoff matrix, as on Slide 3)

Slide 5: Is cooperating rational?
It produces a social optimum
  – Best response to tit-for-tat
  – But what if other player always defects?
(IPD payoff matrix, as on Slide 3)

Slide 6: Ideal Rational Agent (Russell & Norvig)
“For each possible percept sequence, an ideal rational agent should do whatever action is expected to maximize its performance measure, on the basis of the evidence provided by the percept sequence and whatever built-in knowledge the agent has.”

Slide 7: Performance Measure
My total score
  – Plus % of my partner's?
  – Sooner better than later?
(IPD payoff matrix, as on Slide 3)

Slide 8: Knowledge
I believe:
  – People are cooperative
  – People are selfish
(IPD payoff matrix, as on Slide 3)

Slide 9: Evidence
If I defect, then other defects:
  – Tit-for-tat?
  – Some other strategy?
(IPD payoff matrix, as on Slide 3)

Slide 10: Rational if maximizing performance
“...whatever action is expected to maximize its performance measure, on the basis of the evidence...”
Decision Theory
  – Utility represents performance measure
  – Probability distribution captures evidence
  – Choose action to maximize expected utility

Slide 11: Rational if using knowledge of other
“...the evidence provided by the percept sequence and whatever built-in knowledge the agent has.”
Theory of Mind
  – Other people are also reasoning
  – Maximizing their performance measure
  – ...and also using Theory of Mind about me

Slide 12: PsychSim
Decision Theory + Theory of Mind
  – Maximize expected utility
  – Uncertainty about model of the other
Open Question:
  – Enough to express social phenomena?

Slide 13: Markov Decision Problems
[Diagram: Action, State, Reward, State]

Slide 14: Markov Decision Problems
State, s
  – Money won by me
  – Money won by other
Action, a
  – Cooperate
  – Defect
Reward, R(s,a)
  – My money + α·other's money

Slide 15: Markov Decision Problems
Transition Probability, P(s0,a,s1)
  – For every possible initial state and action
    ➔ Probability over resulting state
Compute Expected Reward
  – $V_t(s_0, a) = R(s_0, a) + \sum_{s_1} P(s_0, a, s_1)\, V_{t-1}(s_1)$,
    where $V_t(s) = \max_a V_t(s, a)$
Many off-the-shelf algorithms

Slide 16: What's missing?
R(s,a) and P(s0,a,s1)
  – Both depend on action of other player
(IPD payoff matrix, as on Slide 3)

Slide 17: Mental Models
What is in the other player's head?
  – I can't read minds
  – But I have prior knowledge about people
  – And as we iterate the game, I get evidence
State has possible models of others
  – e.g., beliefs, reward, strategy, etc.
Including mental models of me
  – e.g., my beliefs, my reward, my strategy, etc.
    » Including my mental models of other
    » ...

Slide 18: Mental Models in IPD
Mental models are hidden state
  – Reward: My money + α·other's money
  – α = 1 (altruistic), 0 (selfish), 0.5 (mixed)
  – Models of me: α = 1 (altruistic), 0 (selfish)
Other's action driven by model
  – Model affects transition of my action
  – Model affects reward of my action

Slide 19: Markov Decision Problems (MDPs)
[Diagram: Action, State+Model, Reward, State+Model]

Slide 20: Partially Observable MDPs
[Diagram: Action, State+Model, Reward, State+Model, Beliefs]

Slide 21: Partially Observable MDPs
[Diagram: Action, State+Model, Reward, State+Model, Beliefs, Observation]

Slide 22: Partially Observable MDPs
[Diagram: Action, State+Model, Reward, State+Model, Beliefs, Observation, Beliefs]

Slide 23: Hypothetical Reasoning (mixed case)
If other is altruistic:
  – Regardless of what other thinks of me
    » I expect other to cooperate (6 > 5 and 5 > 2)
    » So I will defect (5 > 4.5)
If other is selfish:
  – Regardless of what other thinks of me
    » I expect other to defect (5 > 3 and 1 > 0)
    » So I will cooperate (2.5 > 1.5)

Slide 24: Hypothetical Reasoning (mixed case)
If other is mixed:
  – And thinks I am altruistic
    » I expect other will defect (5 > 4.5)
    » So I will cooperate (2.5 > 1.5)
  – And thinks I am selfish
    » I expect other will cooperate (2.5 > 1.5)
    » So I will defect (5 > 4.5)

Slide 25: Hypothetical Reasoning (mixed case)
Unfortunately, uncertainty about other
  – ER(Cooperate) =
        P(other is altruistic) * 4.5
      + P(other is selfish) * 2.5
      + P(other is mixed, thinks I am altruistic) * 2.5
      + P(other is mixed, thinks I am selfish) * 4.5
  – ER(Defect) =
        P(other is altruistic) * 5.0
      + P(other is selfish) * 1.5
      + P(other is mixed, thinks I am altruistic) * 1.5
      + P(other is mixed, thinks I am selfish) * 5.0

Slide 26: Hypothetical Reasoning (mixed case)
But this is Iterated Prisoner's Dilemma
  – Immediate reward is only one part
  – My action affects other's beliefs about me
  – Which in turn affects other's future behavior
  – Which in turn affects my future rewards
I update my beliefs as well
  – Observe other's action
  – Modify my distribution over α = 0, 0.5, 1

Slide 27: Consistency
If I observe the other cooperate:
  – I should increase belief that other is altruistic
  – I should decrease belief that other is selfish
Computationally:
  – Agents more likely to pursue higher reward
  – Models that give observed action higher reward are more likely

Slide 28: Hypothetical Reasoning Revisited
If other is altruistic:
  – Regardless of what other thinks of me
    » I expect other to cooperate (6 > 5 and 5 > 2)
    » So I will defect (5 > 4.5)
If other is selfish:
  – Regardless of what other thinks of me
    » I expect other to defect (5 > 3 and 1 > 0)
    » So I will cooperate (2.5 > 1.5)

Slide 29: Belief Update
Model that better explains behavior
  – Reuse expectations already generated
  – Favor models where behavior has higher reward
Use same mechanism to model other
  – If I cooperate, other thinks I'm more altruistic
  – If I defect, other thinks I'm more selfish

Slide 30: Decision Cycle and Belief Update
[Diagram: Action, State+Model, Reward, State+Model]
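The hypothetical reasoning on Slides 23 to 25 can be reproduced with a few lines of Python. This is a minimal sketch, not PsychSim code: the names `PAYOFF`, `reward`, `best_response`, `predict_other`, `HYPOTHESES`, and `expected_reward` are made up for illustration, the payoff table is read with the row player's money first, and the other player is assumed to reason only one step about me.

```python
# One-shot reasoning for the "mixed" agent (alpha = 0.5) in the Prisoner's Dilemma.
# All names here are illustrative assumptions, not part of PsychSim.

C, D = "cooperate", "defect"

# PAYOFF[(own_action, partner_action)] = (own money, partner's money)
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def reward(alpha, own_action, partner_action):
    """Reward from Slide 14: own money + alpha * partner's money."""
    own, partner = PAYOFF[(own_action, partner_action)]
    return own + alpha * partner


def best_response(alpha, partner_action):
    """Action that maximizes the alpha-weighted reward against a fixed partner action."""
    return max((C, D), key=lambda a: reward(alpha, a, partner_action))


def predict_other(other_alpha, alpha_attributed_to_me):
    """Predict the other's action from its own alpha and its model of me.

    The models of me are alpha = 1 or alpha = 0 (Slide 18). Under both, I have a
    dominant action, so the partner action passed below does not matter.
    """
    what_other_expects_of_me = best_response(alpha_attributed_to_me, C)
    return best_response(other_alpha, what_other_expects_of_me)


# Hypotheses from Slide 25: (other's alpha, alpha the other attributes to me).
# For altruistic and selfish others the second entry has no effect.
HYPOTHESES = {
    "altruistic": (1.0, 1.0),
    "selfish": (0.0, 1.0),
    "mixed, thinks I am altruistic": (0.5, 1.0),
    "mixed, thinks I am selfish": (0.5, 0.0),
}


def expected_reward(my_alpha, my_action, belief):
    """Probability-weighted reward of my_action over the hypotheses about the other."""
    return sum(
        p * reward(my_alpha, my_action, predict_other(*HYPOTHESES[name]))
        for name, p in belief.items()
    )


uniform = {name: 0.25 for name in HYPOTHESES}
print(expected_reward(0.5, C, uniform))  # 3.5
print(expected_reward(0.5, D, uniform))  # 3.25
```

Under the stated reward, mutual defection is worth 1 + 1·1 = 2 to an altruistic player, so cooperation is its dominant action. With a uniform belief over the four hypotheses, the mixed agent's immediate expected reward slightly favors cooperating; a different belief can reverse that.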

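The belief update on Slides 27 and 29 says only that models giving the observed action a higher reward should become more likely. One way to realize that is a Bayesian update with a softmax likelihood. That choice, the rationality parameter `beta`, and the function names below are assumptions for this sketch rather than the lecture's or PsychSim's stated mechanism. As a further simplification, each candidate model is assumed to have expected the action I actually played.

```python
import math

C, D = "cooperate", "defect"

# PAYOFF[(own_action, partner_action)] = (own money, partner's money)
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def reward(alpha, own_action, partner_action):
    """Own money + alpha * partner's money."""
    own, partner = PAYOFF[(own_action, partner_action)]
    return own + alpha * partner


def action_likelihood(alpha, action, partner_action, beta=1.0):
    """Softmax probability that an agent with this alpha picks `action`.

    Higher-reward actions are more likely; beta (hypothetical) sets how sharply.
    """
    weights = {
        a: math.exp(beta * reward(alpha, a, partner_action)) for a in (C, D)
    }
    return weights[action] / sum(weights.values())


def update_belief(belief, observed_action, my_action, beta=1.0):
    """Bayes rule over candidate alphas after observing the other's action."""
    unnormalized = {
        alpha: p * action_likelihood(alpha, observed_action, my_action, beta)
        for alpha, p in belief.items()
    }
    total = sum(unnormalized.values())
    return {alpha: w / total for alpha, w in unnormalized.items()}


prior = {0.0: 1 / 3, 0.5: 1 / 3, 1.0: 1 / 3}
posterior = update_belief(prior, observed_action=C, my_action=C)
for alpha, p in sorted(posterior.items()):
    print(f"alpha={alpha}: {p:.3f}")
```

After observing cooperation, the posterior shifts toward α = 1 and away from α = 0, matching the direction described on Slide 27. The same function could be run from the other player's side to model how my action changes its belief about me.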
