UW-Madison COMPSCI 540 - Game Theory - Fingers, Prisoners, Goats - D2373431

Home> Schools> University of Wisconsin, Madison> Computer Sciences (COMPSCI) > COMPSCI 540> Game Theory - Fingers, Prisoners, Goats

UW-Madison COMPSCI 540 - Game Theory - Fingers, Prisoners, Goats

School name University of Wisconsin, Madison

Course Compsci 540- Introduction to Artificial Intelligence

Pages 48

Download Save

Unformatted text preview:

slide 1 Game Theory Fingers, prisoners, goats Xiaojin Zhu [email protected] Computer Sciences Department University of Wisconsin, Madison [based on slides from Andrew Moore http://www.cs.cmu.edu/~awm/tutorials]slide 2 Overview • Matrix normal form • Chance games • Games with hidden information • Non-zero sum gamesslide 3 Pure strategy • A pure strategy for a player is the mapping between all possible states the player can see, to the move the player would make. • Player A has 4 pure strategies: A’s strategy I: (1L, 4L) A’s strategy II: (1L, 4R) A’s strategy III: (1R, 4L) A’s strategy IV: (1R, 4R) • Player B has 3 pure strategies: B’s strategy I: (2L, 3R) B’s strategy II: (2M, 3R) B’s strategy III: (2R, 3R) • How many pure strategies if each player can see N states, and has b moves at each state? (1)-a (4)-a (3)-b (2)-b ( ) +5 ( ) +4 ( ) -1 ( ) +3 ( ) +7 L R R R M L R Lslide 4 Matrix Normal Form of games • The matrix normal form is the game value matrix indexed by each player’s strategies. A’s strategy I: (1L, 4L) A’s strategy II: (1L, 4R) A’s strategy III: (1R, 4L) A’s strategy IV: (1R, 4R) B’s strategy I: (2L, 3R) B’s strategy II: (2M, 3R) B’s strategy III: (2R, 3R) (1)-a (4)-a (3)-b (2)-b ( ) +5 ( ) +4 ( ) -1 ( ) +3 ( ) +7 L R R R M L R L B-III B-II B-I 5 5 5 A-IV 5 5 5 A-III 4 3 7 A-II -1 3 7 A-I The matrix encodes every outcome of the game! The rules etc. are no longer needed.slide 5 Matrix normal form example • How many pure strategies does A have? • How many does B have? • What is the matrix form of this game? (1)-a (3)-b (2)-b ( ) +2 ( ) +5 ( ) +2 L (4)-a ( ) +4 ( ) -1 R R R L L R Lslide 6 • How many pure strategies does A have? 4 A-I (1L, 4L) A-II (1L,4R) A-III (1R,4L) A-IV (1R, 4R) • How many does B have? 4 B-I (2L, 3L) B-II (2L,3R) B-III (2R,3L) B-IV (2R, 3R) • What is the matrix form of this game? Matrix normal form example (1)-a (3)-b (2)-b ( ) +2 ( ) +5 ( ) +2 L (4)-a ( ) +4 ( ) -1 R R R L L R L 2 2 2 2 B-IV B-III B-II B-I 5 2 5 A-IV 5 2 5 A-III 2 4 4 A-II 2 -1 -1 A-Islide 7 Minimax in Matrix Normal Form • Player A: for each strategy, consider all B’s counter strategies (a row in the matrix), find the minimum value in that row. Pick the row with the maximum minimum value. • Here maximin=5 B-III B-II B-I 5 5 5 A-IV 5 5 5 A-III 4 3 7 A-II -1 3 7 A-I (1)-a (4)-a (3)-b (2)-b ( ) +5 ( ) +4 ( ) -1 ( ) +3 ( ) +7 L R R R M L R Lslide 8 Minimax in Matrix Normal Form • Player B: find the maximum value in each column. Pick the column with the minimum maximum value. • Here minimax = 5 B-III B-II B-I 5 5 5 A-IV 5 5 5 A-III 4 3 7 A-II -1 3 7 A-I (1)-a (4)-a (3)-b (2)-b ( ) +5 ( ) +4 ( ) -1 ( ) +3 ( ) +7 L R R R M L R L Fundamental game theory result (proved by von Neumann): In a 2-player, zero-sum game of perfect information, Minimax==Maximin. And there always exists an optimal pure strategy for each player.slide 9 Minimax in Matrix Normal Form • Player B: find the maximum value in each column. Pick the column with the minimum maximum value. • Here minimax = 5 B-III B-II B-I 5 5 5 A-IV 5 5 5 A-III 4 3 7 A-II -1 3 7 A-I (1)-a (4)-a (3)-b (2)-b ( ) +5 ( ) +4 ( ) -1 ( ) +3 ( ) +7 L R R R M L R L Fundamental game theory result (proved by von Neumann): In a 2-player, zero-sum game of perfect information, Minimax==Maximin. And there always exists an optimal pure strategy for each player. Interestingly, A can tell B in advance what strategy A will use (the maximin), and this information will not help B! Similarly B can tell A what strategy B will use. In fact A knows what B’s strategy will be. And B knows A’s too. And A knows that B knows … The game is at an equilibriumslide 10 Matrix Normal Form for NONdeterministic games • Recall the chance nodes (coin flip, die roll etc.): neither player moves, but a random move is made according to the known probability • The game theoretic value is the expected value if both players are optimal • What’s the matrix form of this game? ( )-a ( )-chance ( )-b ( )-b -20 +4 ( )-b ( )-chance +3 ( )-a +10 ( )-a -5 ( )-a p=0.8 p=0.2 p=0.5 p=0.5slide 11 Matrix Normal Form for NONdeterministic games • A-I: L, A-II: R, B-I: L, B-II: R • The i,jth entry is the expected value with strategies A-i,B-j • von Neumann’s result still holds • Minimax == Maximin ( )-a ( )-chance ( )-b ( )-b -20 +4 ( )-b ( )-chance +3 ( )-a +10 ( )-a -5 ( )-a p=0.8 p=0.2 p=0.5 p=0.5 B-II B-I 3 -2 A-II -8 -8 A-Islide 12 Non-zero sum gamesslide 13 Non-zero sum games • One player’s gain is not the other’s loss • Matrix normal form: simply lists all players’ gain • Previous zero-sum games trivially represented as B-II B-I -1, -1 0, -10 A-II -10, 0 -5, -5 A-I Convention: A’s gain first, B’s next O-II O-I 4, -4 -3, 3 E-II -3, 3 2, -2 E-I Note B now wants to maximize the blue numbers.slide 14 Prisoner’s dilemma B-refuse B-testify -1, -1 -10, 0 A-refuse 0, -10 -5, -5 A-testifyslide 15 Strict domination • A’s strategy i dominates A’s strategy j, if for every B’s strategy, A is better off doing i than j. B-refuse B-testify -1, -1 -10, 0 A-refuse 0, -10 -5, -5 A-testify If B-testify: A-testify (-5) is better than A-refuse (-10) If B-refuse: A-testify (0) is better than A-refuse (-1) A: Testify is always better than refuse. A-testify strictly dominates (all outcomes strictly better than) A-refuse.slide 16 Strict domination • Fundamental assumption of game theory: get rid of strictly dominated strategies – they won’t happen. • In some cases like prisoner’s dilemma, we can use strict domination to predict the outcome, if both players are rational. B-refuse B-testify -1, -1 -10, 0 A-refuse 0, -10 -5, -5 A-testifyslide 17 Strict domination • Fundamental assumption of game theory: get rid of strictly dominated strategies – they won’t happen. • In some cases like prisoner’s dilemma, we can use strict domination to predict the outcome, if both players are rational. B-refuse B-testify -1, -1 -10, 0 A-refuse 0, -10 -5, -5 …

View Full Document


School:
Email:
New Password:
Confirm Password:

UW-Madison COMPSCI 540 - Game Theory - Fingers, Prisoners, Goats

Sign up for free to view:

Please select your school