CPS 296.3: Learning in games
Vincent Conitzer ([email protected])

Outline: Learning in (normal-form) games • Iterated best response • Fictitious play • Does the historical distribution of play converge to equilibrium? • Historical distribution (non)convergence for fictitious play • Correlated equilibrium [Aumann 74] • A correlated equilibrium for "chicken" • A nonzero-sum variant of rock-paper-scissors (Shapley's game [Shapley 64]) • Regret • Targeted learning • "Teaching" • Evolutionary game theory • Stability • Evolutionarily stable strategies

Learning in (normal-form) games
• Approach we have taken so far when playing a game: just compute an optimal/equilibrium strategy
• Another approach: learn how to play a game by
  – playing it many times, and
  – updating your strategy based on experience
• Why?
  – Some of the game's utilities (especially the other players') may be unknown to you
  – The other players may not be playing an equilibrium strategy
  – Computing an optimal strategy can be hard
  – Learning is what humans typically do
  – …
• Learning strategies ~ strategies for the repeated game

Iterated best response
rock-paper-scissors:
       R       P       S
  R   0, 0   -1, 1    1, -1
  P   1, -1   0, 0   -1, 1
  S  -1, 1    1, -1   0, 0
a simple coordination game:
  -1, -1    0, 0
   0, 0    -1, -1
• In the first round, play something arbitrary
• In each following round, play a best response against what the other players played in the previous round
• If all players play this, it can converge (i.e., we reach an equilibrium) or cycle
• Alternating best response: players alternately change strategies: one player best-responds in each odd round, the other in each even round

Fictitious play
• In the first round, play something arbitrary
• In each following round, play a best response against the historical distribution of the other players' play
  – I.e., play as if the other players randomly select from their past actions
• Again, if this converges, we have a Nash equilibrium
• Can still fail to converge…
(Examples on this slide: rock-paper-scissors, and the 2x2 game above, here labeled "a simple congestion game")

Does the historical distribution of play converge to equilibrium?
• … for iterated best response?
• … for fictitious play?
   3, 0    1, 2
   1, 2    2, 1

Historical distribution (non)convergence for fictitious play
• The historical distribution under fictitious play does not converge for Shapley's game (starting with (U, M)):
       L       M       R
  U   0, 0    0, 1    1, 0
  M   1, 0    0, 0    0, 1
  D   0, 1    1, 0    0, 0
• The historical distribution under fictitious play converges for
  – generic 2x2 games [Miyasawa 61]
  – zero-sum games [Robinson 51]
  – games solvable by iterated strict dominance [Nachbar 90]

Correlated equilibrium [Aumann 74]
• Suppose there is a mediator who has offered to help out the players in the game
• The mediator chooses a profile of pure strategies, perhaps randomly, then tells each player what her strategy is in the profile (but not what the other players' strategies are)
• A correlated equilibrium is a distribution over pure-strategy profiles for the mediator such that every player wants to follow the mediator's recommendation (if she assumes that the others do so as well)
• Every Nash equilibrium is also a correlated equilibrium
  – Corresponds to the mediator choosing the players' recommendations independently
• … but not vice versa
• (Note: there are more general definitions of correlated equilibrium, but it can be shown that they do not allow you to do anything more than this definition.)

A correlated equilibrium for "chicken"
(payoffs, with the mediator's probability for each cell)
            D              S
  D    20%: 0, 0     40%: -1, 1
  S    40%: 1, -1     0%: -5, -5
• Why is this a correlated equilibrium?
• Suppose the mediator tells the row player to Dodge
• From Row's perspective, the conditional probability that Column was told to Dodge is 20% / (20% + 40%) = 1/3
• So the expected utility of Dodging is (1/3)*0 + (2/3)*(-1) = -2/3
• But the expected utility of Straight is (1/3)*1 + (2/3)*(-5) = -3
• So Row wants to follow the recommendation
• If Row is told to go Straight, he knows that Column was told to Dodge, so again Row wants to follow the recommendation
• Similar for Column

A nonzero-sum variant of rock-paper-scissors (Shapley's game [Shapley 64])
• If both choose the same pure strategy, both lose
(payoffs, with the mediator's probability for each cell)
            R              P              S
  R     0: 0, 0      1/6: 0, 1      1/6: 1, 0
  P   1/6: 1, 0        0: 0, 0      1/6: 0, 1
  S   1/6: 0, 1      1/6: 1, 0        0: 0, 0
• These probabilities give a correlated equilibrium:
• E.g., suppose Row is told to play Rock
• Row knows Column is playing either Paper or Scissors (50-50)
  – Playing Rock will give ½; playing Paper will give 0; playing Scissors will give ½
• So Rock is optimal (though not uniquely)

Regret
• For each player i, action ai, and time t, define the regret ri(ai, t) as
  ri(ai, t) = ( Σ1≤t'≤t-1 [ ui(ai, a-i,t') − ui(ai,t', a-i,t') ] ) / (t − 1)
  i.e., the average gain i would have had from playing ai in every previous round instead of her actual actions
• An algorithm has zero regret if, for each ai, the regret for ai becomes nonpositive as t goes to infinity (almost surely), against any opponents
• Regret matching [Hart & Mas-Colell 00]: at time t, play an action that has positive regret ri(ai, t) with probability proportional to ri(ai, t)
  – If none of the actions have positive regret, play uniformly at random
• Regret matching has zero regret
• If all players use regret matching, then play converges to the set of weak correlated equilibria
  – Weak correlated equilibrium: playing according to the joint distribution is at least as good as any strategy that does not depend on the signal
• Variants of this converge to the set of correlated equilibria
• Smooth fictitious play [Fudenberg & Levine 95] also gives no regret
  – Instead of just best-responding to history, assign some small value to having a more "mixed" distribution

Targeted learning
• Assume that there is a limited set of possible opponents
• Try to do well against these
• Example: is there a learning algorithm that
  – learns to best-respond against any stationary opponent (one that always plays the same mixed strategy), and
  – converges to a Nash equilibrium (in actual strategies, not just the historical distribution) when playing against a copy of itself (so-called self-play)?
• [Bowling and Veloso AIJ02]: yes, if it is a 2-player 2x2 game and mixed strategies are observable
• [Conitzer and Sandholm ICML03/ML06]: yes (without those assumptions)
  – AWESOME algorithm (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium), (very) rough sketch: start by playing according to an equilibrium strategy; when not all players appear to be playing equilibrium, switch to best-responding to recent history; when not all players appear to be playing stationary strategies, go back to the equilibrium strategy

"Teaching"
   4, 4    3, 5
   5, 3    0, 0
• Suppose you are playing against a player that uses one of
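The iterated best response dynamic described above is easy to simulate. Below is a minimal sketch in Python (the function name and game encoding are illustrative assumptions, not from the slides), run on the simple 2x2 coordination game; with simultaneous updates, play cycles rather than converging, as the slides note.

```python
# Iterated best response in the 2x2 game from the slides.
# Game encoding and names are illustrative assumptions.

def best_response(payoff, opponent_action):
    """Index of the action maximizing payoff[own][opp] for a fixed opponent action."""
    return max(range(len(payoff)), key=lambda a: payoff[a][opponent_action])

# The simple 2x2 coordination game: payoff[own action][opponent action].
# (The game is symmetric, so one matrix serves both players.)
payoff = [[-1, 0],
          [0, -1]]

a_row, a_col = 0, 0  # arbitrary first-round play
history = []
for t in range(6):
    history.append((a_row, a_col))
    # Both players simultaneously best-respond to last round's play...
    a_row, a_col = best_response(payoff, a_col), best_response(payoff, a_row)
# ...so play cycles between (0, 0) and (1, 1) instead of converging.
```

With alternating best response (only one player updating per round), the same game would instead settle into one of the pure equilibria.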
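Fictitious play differs only in that each player best-responds to the opponent's whole empirical history rather than the last round. A minimal sketch, again with assumed names, using the rock-paper-scissors payoffs from the slides and breaking ties toward the lowest-indexed action:

```python
# Fictitious play in rock-paper-scissors (payoffs from the slides).
# Function names and tie-breaking are illustrative assumptions.

# U[own action][opponent action], actions 0=Rock, 1=Paper, 2=Scissors.
U = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def fp_best_response(payoff, opp_counts):
    """Best response to the opponent's historical action counts
    (ties broken toward the lowest-indexed action)."""
    score = [sum(c * payoff[a][o] for o, c in enumerate(opp_counts))
             for a in range(len(payoff))]
    return max(range(len(score)), key=lambda a: score[a])

row_counts = [0, 0, 0]  # column's past actions, as seen by the row player
col_counts = [0, 0, 0]  # row's past actions, as seen by the column player
a_row = a_col = 0       # arbitrary first-round play
row_history = []
for t in range(200):
    row_history.append(a_row)
    row_counts[a_col] += 1
    col_counts[a_row] += 1
    a_row = fp_best_response(U, row_counts)
    a_col = fp_best_response(U, col_counts)  # the game is symmetric
```

Actual play keeps moving between the three actions, while the historical distribution approaches the uniform equilibrium, in line with Robinson's zero-sum convergence result cited above.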
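The incentive check behind both correlated-equilibrium examples above can be written down directly: for each recommendation, no deviation may improve the player's conditional expected utility. The helper below is a hypothetical illustration (not from the slides), using exact rational arithmetic, applied to the "chicken" distribution and the Shapley's-game distribution:

```python
# Checking the correlated-equilibrium incentive constraints directly.
# The helper function is a hypothetical illustration, not from the slides.
from fractions import Fraction as F

def is_correlated_equilibrium(u_row, u_col, dist):
    """True iff no player gains by deviating from any recommendation,
    conditioned on that recommendation (dist[r][c] = P(row=r, col=c))."""
    n = len(dist)
    for rec in range(n):        # row player's recommendation
        for dev in range(n):
            if sum(dist[rec][c] * (u_row[dev][c] - u_row[rec][c])
                   for c in range(n)) > 0:
                return False
    for rec in range(n):        # column player's recommendation
        for dev in range(n):
            if sum(dist[r][rec] * (u_col[r][dev] - u_col[r][rec])
                   for r in range(n)) > 0:
                return False
    return True

# "Chicken" from the slides: action 0 = Dodge, 1 = Straight.
chicken_row = [[0, -1], [1, -5]]
chicken_col = [[0, 1], [-1, -5]]
chicken_dist = [[F(1, 5), F(2, 5)],
                [F(2, 5), F(0)]]

# Shapley's game with probability 1/6 on each off-diagonal cell.
shapley_row = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
shapley_col = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
shapley_dist = [[F(0), F(1, 6), F(1, 6)],
                [F(1, 6), F(0), F(1, 6)],
                [F(1, 6), F(1, 6), F(0)]]
```

Both distributions pass the check, while, for example, a point mass on (Straight, Straight) in chicken does not.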
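Regret matching itself is only a few lines. The sketch below is an assumed implementation, not the authors' code: it keeps unnormalized cumulative regrets (which leaves the "proportional to positive regret" rule unchanged, since dividing by t − 1 does not affect the proportions) and runs rock-paper-scissors self-play:

```python
# Regret matching [Hart & Mas-Colell 00] in rock-paper-scissors self-play.
# An assumed implementation sketch; cumulative (unnormalized) regrets are
# kept, which does not change the "proportional to positive regret" rule.
import random

def regret_matching_action(cum_regret, rng):
    """Sample an action with probability proportional to its positive
    regret; play uniformly at random if no regret is positive."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total == 0:
        return rng.randrange(len(cum_regret))
    x = rng.uniform(0, total)
    for a, p in enumerate(positive):
        x -= p
        if x <= 0:
            return a
    return len(cum_regret) - 1  # guard against float round-off

def update_regrets(cum_regret, payoff, played, opp_action):
    """For every action, add what it would have earned minus what the
    played action actually earned this round."""
    realized = payoff[played][opp_action]
    for a in range(len(cum_regret)):
        cum_regret[a] += payoff[a][opp_action] - realized

U = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # RPS, U[own][opp]
rng = random.Random(0)                    # fixed seed for reproducibility
reg1, reg2 = [0.0] * 3, [0.0] * 3
freq = [0, 0, 0]                          # player 1's empirical play
for t in range(3000):
    a1 = regret_matching_action(reg1, rng)
    a2 = regret_matching_action(reg2, rng)
    freq[a1] += 1
    update_regrets(reg1, U, a1, a2)
    update_regrets(reg2, U, a2, a1)
# The empirical distribution approaches the uniform equilibrium (1/3 each).
```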


Duke CPS 296.3 - Learning in games
