Berkeley COMPSCI 188 - Lecture 23 Games

CS 188: Artificial Intelligence, Spring 2006
Lecture 23: Games
4/18/2006
Dan Klein, UC Berkeley

Game Playing in Practice
- Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. An exact solution is imminent.
- Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second and used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply.
- Othello: human champions refuse to compete against computers, who are too good.
- Go: human champions refuse to compete against computers, who are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Game Playing Axes
- Deterministic or not?
- Number of players?
- Perfect information or not?
- We want algorithms for calculating a strategy (policy) which recommends a move in each state.

Deterministic Single Player
- Deterministic, single player, perfect information: we know the rules, we know what moves will do, and we have some utility function over outcomes.
- E.g., Freecell, the 8-Puzzle, Rubik's Cube: it's basically just search.
- Slight reinterpretation: calculate the best utility from each node; each node is a max over its children. Note that goal values are on the goals, not path sums as before.
- (Example tree on the slide with leaf utilities 8, 2, 5, 6.)

Stochastic Single Player
- What if we don't know what the result of an action will be? E.g., solitaire, minesweeper, trying to drive home: it's just an MDP.
- We can also do expectimax search. Chance nodes are like actions, except that the environment controls which action is chosen. Calculate a utility for each node: max nodes as in search, while chance nodes take expectations of their children.
- (Example tree on the slide, again with leaf values 8, 2, 5, 6.)

Deterministic Two Player
- Turns, e.g., tic-tac-toe.
- Minimax search: basically a state-space search tree in which each layer, or ply, alternates players. Choose the move to the position with the highest minimax value, i.e., the best achievable utility against best play.
- Zero-sum games: one player maximizes the result, the other minimizes it.
- (Example tree on the slide with leaf values 8, 2, 5, 6.)

Minimax Example
- (Worked example tree on the slide.)

Minimax Search
- (See the minimax sketch below.)
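As a concrete stand-in for this slide, here is a minimal Python sketch of minimax over an explicit game tree. The nested-list encoding, the helper names, and the grouping of the slides' leaf utilities 8, 2, 5, 6 into a two-ply tree are illustrative assumptions, not the lecture's own notation.

    # Minimax over an explicit game tree. A tree is either a number (a terminal
    # utility for the maximizing player) or a list of child trees; plies
    # alternate between the maximizing and the minimizing player.

    def minimax_value(tree, maximizing=True):
        """Best achievable utility against best play from this node."""
        if not isinstance(tree, list):        # terminal node: the value is on the goal
            return tree
        child_values = [minimax_value(child, not maximizing) for child in tree]
        return max(child_values) if maximizing else min(child_values)

    def minimax_decision(tree):
        """Index of the root move with the highest minimax value."""
        values = [minimax_value(child, maximizing=False) for child in tree]
        return max(range(len(values)), key=lambda i: values[i])

    if __name__ == "__main__":
        # The slides' leaf utilities 8, 2, 5, 6, grouped into two root moves:
        # MAX picks a branch, then MIN picks a leaf inside it.
        example = [[8, 2], [5, 6]]
        print(minimax_value(example))     # 5: MIN would hold MAX to 2 on the left branch
        print(minimax_decision(example))  # 1: the right branch is the minimax move

The expectimax variant from the Stochastic Single Player slide has the same shape: the min over a child list is simply replaced by a probability-weighted average of the children's values.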
Minimax Properties
- Optimal against a perfect player. Otherwise?
- Time complexity: O(b^m). Space complexity: O(bm).
- For chess, b ≈ 35 and m ≈ 100, so an exact solution is completely infeasible. But do we need to explore the whole tree?

Multi-Player Games
- Similar to minimax, except that utilities are now tuples and each player maximizes their own entry at each node. Values are propagated (backed up) from the children.
- (Example tree on the slide with utility triples (1,2,6), (4,3,2), (6,1,2), (7,4,1), (5,1,1), (1,5,2), (7,7,1), (5,4,5).)

Games with Chance
- E.g., backgammon. Expectiminimax search: the environment is an extra player that moves after each agent. Chance nodes take expectations; otherwise it is like minimax.

Games with Chance (continued)
- Dice rolls increase b: there are 21 possible rolls with 2 dice. Backgammon has about 20 legal moves, so at depth 4 the tree already has 20 x (21 x 20)^3 ≈ 1.2 x 10^9 nodes.
- As depth increases, the probability of reaching a given node shrinks, so the value of lookahead is diminished and limiting depth is less damaging, but pruning is less possible.
- TD-Gammon uses depth-2 search plus a very good evaluation function plus reinforcement learning: world-champion-level play.

Games with Hidden Information
- Imperfect information, e.g., card games where the opponent's initial cards are unknown.
- Typically we can calculate a probability for each possible deal; it seems just like having one big dice roll at the beginning of the game.
- Idea: compute the minimax value of each action in each deal, then choose the action with the highest expected value over all deals. Special case: if an action is optimal for all deals, it's optimal.
- GIB, the current best bridge program, approximates this idea by (1) generating 100 deals consistent with the bidding information and (2) picking the action that wins the most tricks on average.
- Drawback to this approach: it's broken (though useful in practice).

Averaging over Deals is Broken
- Road A leads to a small heap of gold pieces; Road B leads to a fork: take the left fork and you'll find a mound of jewels, take the right fork and you'll be run over by a bus.
- Road A leads to a small heap of gold pieces; Road B leads to a fork: take the left fork and you'll be run over by a bus, take the right fork and you'll find a mound of jewels.
- Road A leads to a small heap of gold pieces; Road B leads to a fork: guess correctly and you'll find a mound of jewels, guess incorrectly and you'll be run over by a bus.
- Averaging over deals assumes you will get to act as if you knew which deal you are in; in the third situation you do not, so the averaged value of Road B is far too optimistic.

Efficient Search
- Several options:
  - Pruning: avoid regions of the search tree which will never enter into optimal play.
  - Limited depth: don't search very far into the future; approximate the utility with a value function (familiar!).

Next Class
- More game playing: pruning, limited-depth search, and the connection to reinforcement learning.

Pruning Example
- (Worked example on the slide.)

Q-Learning
- Model-free TD learning with Q-functions.

Function Approximation
- Problem: it is too slow to learn each state's utility one by one.
- Solution: what we learn about one state should generalize to similar states.
- Very much like supervised learning. If states are treated entirely independently, we can only learn on very small state spaces.

Discretization
- We can put states into buckets of various sizes, e.g., all angles between 0 and 5 degrees can share the same Q estimate.
- Buckets too fine: it takes a long time to learn. Buckets too coarse: we learn suboptimal, often jerky control.
- Real systems that use discretization usually require clever bucketing schemes: adaptive sizes, tile coding. (DEMOS)

Linear Value Functions
- Another option: values are linear functions of features of states (or state-action pairs).
- Good if you can describe states well using a few features, e.g., board evaluations for game playing.
- Now we only have to learn a few weights rather than a value for each state.
- (Grid of example state values on the slide: 0.80, 0.85, 0.70, 0.60, 0.65, 0.90, 0.95, 0.80, 0.85, 0.70, 0.75.)

TD Updates for Linear Values
- We can use TD learning with linear values; it's actually just like the perceptron.
- Take the old Q-learning update and simply update the weights of the features of Q(s,a).

Example: TD for Linear Qs
- (Worked example on the slide; a sketch follows.)
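The update described on the last two slides can be sketched as follows: a minimal Python version of Q-learning with a linear value function Q(s, a) = w · f(s, a). The dictionary-based feature representation, the learning rate of 0.1, the discount of 0.9, and the feature names in the demo are illustrative assumptions rather than values from the lecture.

    # Approximate Q-learning with a linear value function:
    #     Q(s, a) = sum_i w_i * f_i(s, a)
    # After observing (s, a, r, s'), compute the TD error and nudge each weight
    # in proportion to its feature value -- the same shape as a perceptron update.

    def q_value(weights, features):
        """Q(s, a) for features given as a dict {feature_name: value}."""
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    def td_update(weights, features, reward, next_action_features, alpha=0.1, gamma=0.9):
        """Apply one Q-learning update to the weight dict, in place."""
        # Value of the next state: max over its available actions' feature dicts
        # (an empty list means the next state is terminal, so its value is 0).
        next_value = max((q_value(weights, f) for f in next_action_features), default=0.0)
        td_error = (reward + gamma * next_value) - q_value(weights, features)
        for name, value in features.items():
            weights[name] = weights.get(name, 0.0) + alpha * td_error * value

    if __name__ == "__main__":
        weights = {}
        # Hypothetical board-evaluation features for one (state, action) pair.
        features = {"bias": 1.0, "material-advantage": 0.5}
        next_action_features = [{"bias": 1.0, "material-advantage": 0.7}]
        td_update(weights, features, reward=1.0, next_action_features=next_action_features)
        print(weights)  # {'bias': 0.1, 'material-advantage': 0.05}

Because only a handful of weights are learned rather than one value per state, the update automatically generalizes across states that share feature values, which is exactly the point of the Function Approximation slides above.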

