MS&E246:Lecture10RepeatedgamesRamesh JohariWhatisarepeatedgame?A repeated game is:A dynamic game constructed by playingthe same game over and over.It is a dynamic game of imperfect information.Thislecture• Finitely repeated games• Infinitely repeated games• Trigger strategies• The folk theoremStagegameAt each stage, the same game is played:the stage game G.Assume:• G is a simultaneous move game•In G, player i has:•Action set Ai•Payoff Pi(ai, a-i)FinitelyrepeatedgamesG(K) : G is repeated K timesInformation sets:All players observe outcome of each stage.What are:strategies? payoffs? equilibria?HistoryandstrategiesPeriod t history ht: ht= (a(0), …, a(t-1)) wherea(τ) = action profile played at stage τStrategy si: Choice of stage t action si(ht) ∈ Aifor each history ht i.e. ai(t) = si(ht)PayoffsAssume payoff = sum of stage game payoffsExample:Prisoner’sdilemmaRecall the Prisoner’s dilemma:(2,2)(0,4)cooperate(4,0)(1,1)defectcooperatedefectPlayer 1Player 2Example:Prisoner’sdilemmaTwo volunteersFive roundsNo communication allowed!511111Player 2511111Player 1Total54321RoundSPNESuppose aNEis a stage game NE.Any such NE gives a SPNE:Player i plays aiNEat every stage, regardless of history.Question: Are there any other SPNE?SPNEHow do we find SPNE of G(K)?Observe:Subgame starting after history htis identical to G(K - t)SPNE:UniquestagegameNESuppose G has a unique NE aNEThen regardless of period K history hK,last stage has unique NE aNE⇒ At SPNE, si(hK) = aiNESPNE:BackwardinductionAt stage K -1, given s-i(·), player i chooses si(hK -1) to maximize:Pi(si(hK -1), s-i(hK -1)) + Pi(s(hK))payoff at stage K -1 payoff at stage KSPNE:BackwardinductionAt stage K -1, given s-i(·), player i chooses si(hK -1) to maximize:Pi(si(hK -1), s-i(hK -1)) + Pi(aNE)payoff at stage K -1 payoff at stage KWe know: at last stage, aNEis played.SPNE:BackwardinductionAt stage K -1, given s-i(·), player i chooses si(hK -1) to maximize:Pi(si(hK -1), s-i(hK -1))payoff at stage K -1⇒ Stage game NE again!SPNE:ConclusionTheorem:If stage game has unique NE aNE,then finitely repeated game hasunique SPNE:si(ht) = aiNEfor all htExample:Prisoner’sdilemmaMoral: “Cooperate” should never be played.Axelrod’s tournament (1980):Winning strategy was “tit for tat”:Cooperate if and only ifyour opponent did so at the last stageSPNE:MultiplestagegameNENote: If multiple NE exist for stage game NE,there may exist SPNE where actions are played that appear inno stage game NE(See Gibbons, 2.3.A)Infinitelyrepeatedgames• History, strategy definitions same as finitely repeated games•Payoffs:Sum might not be finite!DiscountingDefine payoff as:i.e., discounted sum of stage game payoffsThis game is denoted G(δ, ∞)(Note: (1 - δ) is a normalization)DiscountingTwo interpretations:1. Future payoffs worth lessthan today’s payoffs2. Total # of stages is ageometric random variableFolktheorems• Major problem with infinitely repeated games:If players are patient enough,SPNE can achieve “any” reasonable payoffs.Prisoner’sdilemmaConsider the following strategies, (s1, s2):1. Play C at first stage.2. If ht= ( (C,C), …, (C,C) ), then play C at stage t.Otherwise play D.i.e., punish the other player for defectingPrisoner’sdilemmaNote: G(δ, ∞) is stationaryCase 1: Consider any subgame where at least one player has defected in ht.Then (D,D) played forever.This is NE for subgame, since (D,D) is stage game NE.Prisoner’sdilemmaStep 2: Suppose ht= ( (C,C), …, (C,C) ).Player 1’s options:(a) Follow s1⇒ play C forever(b) Deviate at time t ⇒ play D foreverPrisoner’sdilemmaGiven s2:Playing C forever gives payoff:(1- δ) ( P1(C,C) + δ P1(C,C) + … ) = P1(C,C)Playing D forever gives payoff:(1- δ) ( P1(D,C) + δ P1(D,D) + … )= (1-δ) P1(D,C) + δ P1(D,D)Prisoner’sdilemmaSo cooperate if and only if:P1(C,C) ≥ (1 - δ) P1(D,C) + δ P1(D,D)Note: if P1(C,C) > P1(D,D),then this is always true for δ close to 1Conclude:If δ close to 1, then (s1, s2) is an SPNEPrisoner’sdilemmaIn our game: Need 2 ≥ (1 - δ) 4 + δ ⇒ δ ≥ 2/3So cooperation can be sustained if time horizon is finite but uncertain.TriggerstrategiesIn a (Nash) trigger strategy for player i :1. Play aiat first stage.2. If ht= ( a, …, a ), then play aiat stage t.Otherwise play aiNE.TriggerstrategiesIf a Pareto dominates aNE, trigger strategies will be an SPNEfor large enough δFormally: needPi(a) > (1 - δ) Pi(ai’, a-i) + δ Pi(aNE)for all players i and actions ai’.AchievablepayoffsAchievable payoffs:T = Convex hull of { ( P1(a), P2(a) ) : ai∈ Si}e.g., in Prisoner’s Dilemma:(0,4)(4,0)(2,2)(1,1)P1P2AchievablepayoffsandSPNEA key result in repeated games:Any “reasonable” achievable payoff can be realized in an SPNE of the repeated game,if players are patient enough.Simple proof: generalize prisoner’s dilemma.Randomization• To generalize, suppose before stage tall players observe i.i.d. uniform r.v. Ut• History:ht= (a(0), …, a(t-1), U0, …, Ut)•Players can use Utto coordinatestrategies at stage tRandomizationE.g., suppose players want to achieveP = α P(a) + (1 - α) P(a’)If Ut≤ α : Player i plays aiIf Ut> α : Player i plays ai’We’ll call this the P-achieving action for i.(Uniquely defined for all P ∈ T.)RandomizationE.g., Prisoner’s DilemmaLet P = (3,1).P‐achieving actions:Player 1 plays C if Ut≤ ½and D if Ut> ½Player 2 plays C if Ut≤ ½and C if Ut> ½(0,4)(4,0)(2,2)(1,1)P1P2(3,1)RandomizationandtriggeringSo now suppose P ∈ T and:Pi> Pi(aNE) for all iTrigger strategy: Punish forever (by playing aiNE) if opponent deviates from P-achieving actionRandomizationandtriggeringBoth players using this trigger strategyis again an SPNE for large enough δ.Formally: need(1 - δ) Pi(pi, p-i) + δ Pi> (1 - δ) Pi(ai’, p-i) + δ Pi(aNE)for all players i and actions ai’.(Here p is P-achieving action for player i, andp-iis P-achieving action vector for all other players.)RandomizationandtriggeringBoth players using this trigger strategyis again an SPNE for large enough δ.Formally: need(1 - δ) Pi(pi, p-i) + δ Pi> (1 - δ) Pi(ai’, p-i) + δ Pi(aNE)for all players i and actions ai’.(At time t:LHS is payoff if player i does not deviate after seeing Ut; RHS is payoff if player i deviates to
View Full Document