CPS 296.1: Learning in games
Vincent Conitzer, [email protected]

Outline: "2/3 of the average" game · "2/3 of the average" game revisited · Learning in (normal-form) games · Iterated best response · Fictitious play [Brown 1951] · Fictitious play on rock-paper-scissors · Does the empirical distribution of play converge to equilibrium? · Fictitious play is guaranteed to converge in… · Shapley's game on which fictitious play does not converge · Regret · Targeted learning · "Teaching" · Evolutionary game theory · Stability · Evolutionarily stable strategies

"2/3 of the average" game
• Everyone writes down a number between 0 and 100
• The person closest to 2/3 of the average wins
• Example:
  – A says 50
  – B says 10
  – C says 90
  – Average(50, 10, 90) = 50
  – 2/3 of the average = 33.33
  – A is closest (|50 - 33.33| = 16.67), so A wins

"2/3 of the average" game revisited
• (Figure: the strategy interval from 0 to 100)
• Strategies above (2/3)*100 are dominated
• Strategies above (2/3)*(2/3)*100 are dominated after removal of the (originally) dominated strategies
• …

Learning in (normal-form) games
• Approach we have taken so far when playing a game: just compute an optimal/equilibrium strategy
• Another approach: learn how to play a game by
  – playing it many times, and
  – updating your strategy based on experience
• Why?
  – Some of the game's utilities (especially the other players') may be unknown to you
  – The other players may not be playing an equilibrium strategy
  – Computing an optimal strategy can be hard
  – Learning is what humans typically do
  – …
• Learning strategies ~ strategies for the repeated game
• Does learning converge to equilibrium?

Iterated best response
• In the first round, play something arbitrary
• In each following round, play a best response against what the other players played in the previous round
• If all players play this, it can converge (i.e., we reach an equilibrium) or cycle
• Alternating best response: players alternatingly change strategies; one player best-responds in each odd round, the other in each even round
• Example games:
  – rock-paper-scissors:
           R       P       S
      R    0, 0   -1, 1    1, -1
      P    1, -1   0, 0   -1, 1
      S   -1, 1    1, -1   0, 0
  – a simple congestion game:
      -1, -1    0, 0
       0, 0   -1, -1

Fictitious play [Brown 1951]
• In the first round, play something arbitrary
• In each following round, play a best response against the empirical distribution of the other players' play
  – I.e., as if the other player randomly selects from his past actions
• Again, if this converges, we have a Nash equilibrium
• Can still fail to converge…
• (Same example games as above: rock-paper-scissors and the simple congestion game)

Fictitious play on rock-paper-scissors
• Example run: empirical distributions of play after a few rounds
  – Row: 30% R, 50% P, 20% S
  – Column: 30% R, 20% P, 50% S

Does the empirical distribution of play converge to equilibrium?
• … for iterated best response?
• … for fictitious play?
• Consider this game:
      3, 0   1, 2
      1, 2   2, 1

Fictitious play is guaranteed to converge in…
• Two-player zero-sum games [Robinson 1951]
• Generic 2x2 games [Miyasawa 1961]
• Games solvable by iterated strict dominance [Nachbar 1990]
• Weighted potential games [Monderer & Shapley 1996]
• Not in general [Shapley 1964]
• But, fictitious play always converges to the set of ½-approximate equilibria [Conitzer 2009]

Shapley's game on which fictitious play does not converge
• Starting with (U, M):
           L      M      R
      U    0, 0   0, 1   1, 0
      M    1, 0   0, 0   0, 1
      D    0, 1   1, 0   0, 0
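To make the fictitious-play dynamics above concrete, here is a minimal simulation sketch of two-player fictitious play on rock-paper-scissors. It is not part of the original slides; the 10,000-round horizon, the random first actions, and argmax tie-breaking are illustrative choices.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum, so the column
# player's payoff is the negative). Action order: R, P, S.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def fictitious_play(rounds=10_000, seed=0):
    rng = np.random.default_rng(seed)
    # Empirical action counts for each player; the first action is arbitrary.
    row_counts = np.zeros(3); row_counts[rng.integers(3)] = 1
    col_counts = np.zeros(3); col_counts[rng.integers(3)] = 1
    for _ in range(rounds):
        # Each player best-responds to the opponent's empirical distribution of play.
        col_dist = col_counts / col_counts.sum()
        row_dist = row_counts / row_counts.sum()
        row_action = np.argmax(A @ col_dist)      # row's expected payoffs vs. column's history
        col_action = np.argmax(-(row_dist @ A))   # column's expected payoffs vs. row's history
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

if __name__ == "__main__":
    row_freq, col_freq = fictitious_play()
    print("Row's empirical distribution (R, P, S):   ", np.round(row_freq, 3))
    print("Column's empirical distribution (R, P, S):", np.round(col_freq, 3))
    # Since rock-paper-scissors is zero-sum, Robinson's result applies: the empirical
    # distributions approach the equilibrium (1/3, 1/3, 1/3), even though actual play cycles.
```

Running the same loop on Shapley's 3x3 game above instead would show the empirical distributions failing to settle down, as the slide states.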
Regret
• For each player i, action ai, and time t, define the regret ri(ai, t) as
      ri(ai, t) = [ Σt'=1..t-1 ( ui(ai, a-i,t') - ui(ai,t', a-i,t') ) ] / (t - 1),
  where ai,t' and a-i,t' are the actions actually played at time t'; i.e., the average amount player i would have gained by playing ai in every previous round instead of what was actually played
• An algorithm has zero regret if, for each ai, the regret for ai becomes nonpositive as t goes to infinity (almost surely), against any opponents
• Regret matching [Hart & Mas-Colell 00]: at time t, play an action that has positive regret ri(ai, t) with probability proportional to ri(ai, t)
  – If none of the actions have positive regret, play uniformly at random
• Regret matching has zero regret
• If all players use regret matching, then play converges to the set of weak correlated equilibria
  – Weak correlated equilibrium: playing according to the joint distribution is at least as good as any strategy that does not depend on the signal
• Variants of this converge to the set of correlated equilibria
• Smooth fictitious play [Fudenberg & Levine 95] also gives no regret
  – Instead of just best-responding to history, assign some small value to having a more "mixed" distribution

Targeted learning
• Assume that there is a limited set of possible opponents
• Try to do well against these
• Example: is there a learning algorithm that
  – learns to best-respond against any stationary opponent (one that always plays the same mixed strategy), and
  – converges to a Nash equilibrium (in actual strategies, not just the historical distribution) when playing against a copy of itself (so-called self-play)?
• [Bowling and Veloso AIJ02]: yes, if it is a 2-player 2x2 game and mixed strategies are observable
• [Conitzer and Sandholm ML06]: yes (without those assumptions)
  – AWESOME algorithm (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium), (very) rough sketch: start by playing according to an equilibrium strategy; if not all players appear to be playing equilibrium, switch to best-responding to recent history; if not all players appear to be playing stationary strategies, switch back to the equilibrium strategy

"Teaching"
• Suppose you are playing against a player that uses one of these learning strategies
  – Fictitious play, anything with no regret, AWESOME, …
• Also suppose you are very patient, i.e., you only care about what happens in the long run
• How will you (the row player) play in the following repeated games?
  – Hint: the other player will eventually best-respond to whatever you do
      4, 4   3, 5
      5, 3   0, 0

      1, 0   3, 1
      2, 1   4, 0
• Note the relationship to optimal strategies to commit to
• There is some work on learning strategies that are in equilibrium with each other [Brafman & Tennenholtz AIJ04]

Evolutionary game theory
• Given: a symmetric game, e.g. hawk-dove:
              dove    hawk
      dove    1, 1    0, 2
      hawk    2, 0   -1, -1
• A large population of players plays this game; players are randomly matched to play with each other
• Each player plays a pure strategy
  – Fraction of players playing strategy s = ps
  – p is the vector of all fractions ps (the state)
• Utility for playing s is u(s, p) = Σs' ps' u(s, s')
• Players reproduce at a rate that is proportional to their utility; their offspring play the same strategy
  – Replicator dynamic: dps(t)/dt = ps(t) ( u(s, p(t)) - Σs' ps'(t) u(s', p(t)) )
• What are the steady states of this?
  – Nash equilibria of the hawk-dove game: (dove, hawk), (hawk, dove), ((.5, .5), (.5, .5))

Stability
• A steady state is stable if, after slightly perturbing the state, the dynamic returns to that steady state
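As a concrete companion to the regret-matching rule from the Regret slide above, here is a minimal sketch, again not part of the original slides. The rock-paper-scissors payoffs, the stationary Rock-heavy opponent, and the 5,000-round horizon are placeholder choices; playing proportional to positive cumulative regret gives the same probabilities as the per-round average regret defined on the slide.

```python
import numpy as np

def regret_matching(payoffs, opponent_actions, seed=0):
    """Play one side of a repeated game with regret matching.

    payoffs[a, b]: our utility when we play a and the opponent plays b.
    opponent_actions: the observed sequence of the opponent's play.
    """
    rng = np.random.default_rng(seed)
    n_actions = payoffs.shape[0]
    cumulative_regret = np.zeros(n_actions)
    ours = []
    for b in opponent_actions:
        positive = np.maximum(cumulative_regret, 0.0)
        if positive.sum() > 0:
            probs = positive / positive.sum()              # proportional to positive regret
        else:
            probs = np.full(n_actions, 1.0 / n_actions)    # no positive regret: play uniformly
        a = rng.choice(n_actions, p=probs)
        ours.append(a)
        # Regret update: how much better each action would have done this round
        # than the action we actually chose.
        cumulative_regret += payoffs[:, b] - payoffs[a, b]
    return ours

if __name__ == "__main__":
    A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])     # our R, P, S payoffs
    rng = np.random.default_rng(1)
    opp = rng.choice(3, size=5_000, p=[0.6, 0.2, 0.2])     # stationary, Rock-heavy opponent
    ours = regret_matching(A, opp)
    # Because regret matching has zero regret, our play concentrates on Paper,
    # the best response to a Rock-heavy opponent.
    print("our action frequencies (R, P, S):", np.bincount(ours, minlength=3) / len(ours))
```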


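Finally, a small numerical sketch of the replicator dynamic for the hawk-dove game from the Evolutionary game theory slide, again not from the original slides; it uses simple Euler integration, and the step size, horizon, and starting states are arbitrary choices. It also illustrates the notion of a stable steady state: the half-dove/half-hawk state is recovered from nearby starting points.

```python
import numpy as np

# Symmetric hawk-dove game from the slides: U[s, s2] is the payoff to a player
# using strategy s against an opponent using s2. Strategy order: dove, hawk.
U = np.array([[1.0,  0.0],
              [2.0, -1.0]])

def replicator(p0, dt=0.01, steps=5_000):
    """Euler-integrate dp_s/dt = p_s * (u(s, p) - sum_s' p_s' u(s', p))."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        fitness = U @ p            # u(s, p) for each strategy s
        average = p @ fitness      # population-average utility
        p = p + dt * p * (fitness - average)
        p = np.clip(p, 0.0, None)
        p = p / p.sum()            # guard against numerical drift
    return p

if __name__ == "__main__":
    # From interior starting states the population converges to half doves, half hawks,
    # matching the symmetric Nash equilibrium ((.5, .5), (.5, .5)) noted on the slide.
    print("from 90% dove:", np.round(replicator([0.9, 0.1]), 3))
    print("from 10% dove:", np.round(replicator([0.1, 0.9]), 3))
```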
