CS 188: Artificial Intelligence Spring 2006Game TheoryStrategiesDominance and OptimalityEquilibriaCoordination GamesMixed Strategy Games(Zero-Sum) Minimax StrategiesContinuous MinimaxRepeated GamesPartially Observed GamesThe Ultimatum GameMechanism DesignAuctionsCS 188: Artificial IntelligenceSpring 2006Lecture 26: Game Theory4/25/2006Dan Klein – UC BerkeleyGame TheoryGame theory: study of strategic situations, usually simultaneous actionsA game has:PlayersActionsPayoff matrixExample: prisoner’s dilemmaATestify RefuseB Testify -5,-5 -10,0Refuse 0,-10 -1,-1Prisoner’s DilemmaStrategiesStrategy = policyPure strategyDeterministic policyIn a one-move game, just a moveMixed strategyRandomized policyEver good to use one?Strategy profile: a spec of one strategy per playerOutcome: each strategy profile results in an (expected) number for each playerOOne TwoE One -2,2 3,-3Two 3,-3 -4,4Prisoner’s DilemmaTwo-Finger MorraATestify RefuseB Testify -5,-5 -10,0Refuse 0,-10 -1,-1Dominance and OptimalityStrategy Dominance:A strategy s for A (strictly) dominates s’ if it produces a better outcome for A, for any B strategyOutcome Dominance:An outcome o Pareto dominates o’ if all players prefer o to o’An outcome is Pareto optimal if there is no outcome that all players would preferOOne TwoE One -2,2 3,-3Two 3,-3 -4,4Prisoner’s DilemmaTwo-Finger MorraATestify RefuseB Testify -5,-5 -10,0Refuse 0,-10 -1,-1EquilibriaIn the prisoner’s dilemma:What will A do?What will B do?What’s the dilemma?Both testifying is a (Nash) equilibriumNeither player can benefit from a unilateral change in strategyI.e., it’s a local optimum (not necessarily global)Nash showed that every game has such an equilibriumNote: not every game has a dominant strategy equilibriumWhat do we have to change for the prisoners to refuse?Change the payoffsConsider repeated gamesLimit the computational ability of the agentsHow would we model a “code of thieves”?ATestify RefuseB Testify -5,-5 -10,0Refuse 0,-10 -1,-1Coordination GamesNo dominant strategyBut, two (pure) Nash equilibriaWhat should agents do?Can sometimes choose Pareto optimal Nash equilibriumBut may be ties!Naturally gives rise to communicationAlso: correlated equilibriaADVD HD-DVDB DVD 5,5 -2,-1HD-DVD -2,-1 8,8ALeft RightB Left 1,1 -1,-1Right -1,-1 1,1Technology ChoiceDriving DirectionMixed Strategy GamesWhat’s the Nash equilibrium?No pure strategy equilibriumMust look at mixed strategiesMixed strategies:Distribution over actions per stateIn a one-move game, a single distributionFor Morra, a single number peven specifies the strategyHow to choose the optimal mixed strategy?OOne TwoE One -2,2 3,-3Two 3,-3 -4,4Two-Finger Morra(Zero-Sum) Minimax StrategiesIdea: force one player to chose and declare a strategy firstSay E reveals firstFor each E strategy, O has a minimax responseUtility of the root favors O (why?) and is -3 (from E’s perspective)If O goes first, root is 2 (for E)If these two utilities matched, we would know the utility of the maximum equilibriumMust look at mixed strategiesOOne TwoE One -2,2 3,-3Two 3,-3 -4,4Two-Finger Morra2 -3 -3 41 21 2 1 22 -3 -3 41 21 2 1 2Continuous MinimaxImagine a minimax tree:Instead of the two pure strategies, first player has infinitely many mixed onesNote that second player should always respond with a pure strategy (why?)Here, can calculate the minimax (and maximin) valuesBoth are ½ (from O’s perspective)Correspond to [7/12; 1, 5/12; 2] for both playersOOne TwoE One -2,2 3,-3Two 3,-3 -4,4Two-Finger Morra(2)(p)+(-3)(1-p)[p;1, (1-p);2]1 2(-3)(p)+(4)(1-p)Repeated GamesWhat about repeated games?E.g. repeated prisoner’s dilemmaFuture responses, retaliation becomes an issueStrategy can condition on past experienceRepeated prisoner’s dilemmaFixed numbers of games causes repeated betrayalIf agents unsure of number of future games, other optionsE.g. perpetual punishment: silent until you’re betrayed, then testify thereafterE.g. tit-for-tat: do what was done to you last roundIt’s enough for your opponent to believe you are incapable of remembering the number of games played (doesn’t actually matter whether the limitation really exists)Partially Observed GamesMuch harder to analyzeYou have to work with trees of belief statesProblem: you don’t know your opponent’s belief state!Newer techniques can solve some partially observable gamesMini-poker analysis shows, e.g., that bluffing can be a rational actionRandomization: not just for being unpredictable, also useful for minimizing what opponent can learn from your actionsThe Ultimatum GameGame theory can reveal interesting issues in social psychologyE.g. the ultimatum gameProposer: receives $x, offers split $k / $(x-k)Accepter: eitherAccepts: gets $k, proposer gets $(x-k)Rejects: neither gets anythingNash equilibrium?Any strategy profile where proposer offers $k and accepter will accept $k or greaterBut that’s not the interesting part…Issues:Why do people tend to reject offers which are very unfair (e.g. $20 from $100)?Irrationality?Utility of $20 exceeded by utility of punishing the unfair proposer?What about if x is very very large?Mechanism DesignOne use of game theory: mechanism designDesigning a game which induces desired behavior in rational agentsE.g. avoiding tragedies of the commonsClassic example: farmers share a common pastureEach chooses how many goats to grazeAdding a goat gains utility for that farmerAdding a goat slightly degrades the pastureInevitable that each farmer will keep adding goats until the commons is destroyed (tragedy!)Classic solution: charge for use of the commonsPrices need to be set to produce the right behaviorAuctionsExample: auctionsConsider auction for one itemEach bidder i has value vi and bids bi for itemEnglish auction: increasing bidsHow should bidder i bid?What will the winner pay?Why is this not an optimal result?Sealed single-bid auction, highest pays bidHow should bidder i bid?Why is bidding your value no longer dominant?Why is this auction not optimal?Sealed single-bid second-price auctionHow should bidder i bid?Bid vi –
View Full Document