Berkeley COMPSCI 188 - Lecture 25: Games II

CS 188: Artificial Intelligence
Spring 2006
Lecture 25: Games II
4/20/2006
Dan Klein – UC Berkeley

Recap: Minimax Trees

Minimax Search
- DFS minimax

α-β Pruning Example
- [Code in book]; a depth-limited sketch also appears after the Evaluation Functions slide below

α-β Pruning
- General configuration
- α is the best value (to MAX) found so far off the current path
- If V is worse than α, MAX will avoid it, so prune V's branch
- Define β similarly for MIN

α-β Pruning Properties
- Pruning has no effect on the final result
- Good move ordering improves the effectiveness of pruning
- With "perfect ordering", time complexity drops to O(b^(m/2)), which doubles the solvable depth
- Full search of, e.g., chess is still hopeless!
- A simple example of metareasoning: reasoning about which computations are relevant

Resource Limits
- Cannot search to the leaves
- Limited search: instead, search only a bounded portion of the tree
- Replace terminal utilities with an eval function for non-terminal positions
- The guarantee of optimal play is gone
- Example: with 100 seconds and 10K nodes/sec, we can check 1M nodes per move
- α-β then reaches about depth 8 – a decent chess program

Evaluation Functions
- A function which scores non-terminal positions
- Ideal function: returns the true utility of the position
- In practice: typically a weighted linear sum of features,
  Eval(s) = w1·f1(s) + w2·f2(s) + ... + wn·fn(s),
  e.g. f1(s) = (num white queens – num black queens), etc.
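To make the pruning and evaluation-function ideas above concrete, here is a minimal sketch of depth-limited minimax with α-β pruning backed by a linear evaluation function. This is not the book's code: the GameState interface (is_terminal, utility, legal_moves, successor), the feature functions, and the weights are assumed placeholders.

```python
# Sketch (with an assumed GameState interface): depth-limited minimax with
# alpha-beta pruning, falling back to a linear evaluation function when the
# depth limit is reached.
import math

def evaluate(state, weights, features):
    """Weighted linear sum of features: Eval(s) = w1*f1(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

def alpha_beta(state, depth, alpha, beta, maximizing, weights, features):
    if state.is_terminal():
        return state.utility()
    if depth == 0:
        # Resource limit reached: score the non-terminal position instead.
        return evaluate(state, weights, features)
    if maximizing:
        value = -math.inf
        for move in state.legal_moves():
            child = state.successor(move)
            value = max(value, alpha_beta(child, depth - 1, alpha, beta,
                                          False, weights, features))
            if value >= beta:
                return value           # MIN will avoid this branch: prune
            alpha = max(alpha, value)  # best value found so far for MAX
        return value
    else:
        value = math.inf
        for move in state.legal_moves():
            child = state.successor(move)
            value = min(value, alpha_beta(child, depth - 1, alpha, beta,
                                          True, weights, features))
            if value <= alpha:
                return value           # MAX will avoid this branch: prune
            beta = min(beta, value)    # best value found so far for MIN
        return value
```

With good move ordering, this pruning is what gives the O(b^(m/2)) behavior noted above, which is how a search of roughly depth 8 fits in the ~1M-nodes-per-move budget from the Resource Limits slide.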
Function Approximation
- Problem: it is inefficient to learn each state's utility (or eval function) one state at a time
- Solution: what we learn about one state (or position) should generalize to similar states
- This is very much like supervised learning
- If states are treated entirely independently, we can only learn on very small state spaces

Linear Value Functions
- Another option: values are linear functions of features of states (or action-state pairs),
  U(s) = θ1·f1(s) + θ2·f2(s) + ... + θn·fn(s)
- Good if you can describe states well using a few features (e.g. board evaluations for game playing)
- Now we only have to learn a few weights rather than a value for each state

Recap: Model-Free Learning
- Recall MDP value updates for a given estimate of U
- If you know the model T, use the Bellman update
- Temporal difference (TD) learning:
  - Make an (epsilon-greedy) action choice (or follow a provided policy)
  - Update using the results of the action

Example: Tabular Value Updates
- Example: blackjack, with +1 for a win, -1 for a loss or bust, and 0 for a tie
- Our hand shows 14; the current policy says "hit"
- Current U(s) is 0.5
- We hit, draw an 8, and bust (ending up in s' = "lose")
- Update:
  - Old U(s) = 0.5, observed R(s) = 0, old U(s') = -1
  - New U(s) = U(s) + α [ R(s) + γ U(s') – U(s) ]
  - If α = 0.1 and γ = 1.0:
    New U(s) = 0.5 + 0.1 [ 0 + (-1) – 0.5 ] = 0.5 + 0.1 (-1.5) = 0.35
  (this arithmetic is checked in the first sketch after the slides)

TD Updates: Linear Values
- Assume a linear value function: U(s) = θ1·f1(s) + ... + θn·fn(s)
- We can almost do a TD update as above
- Problem: we can't "increment" U(s) explicitly, since it is computed from the weights
- Solution: update the weights of the features at that state,
  θi ← θi + α [ R(s) + γ U(s') – U(s) ] · fi(s)
  (a code sketch of this weight update appears after the slides)

Learning Eval Parameters with TD
- Ideally, we want eval(s) to be the utility of s
- Idea: use techniques from reinforcement learning
  - Samuel's 1959 checkers system
  - Tesauro's 1992 backgammon system (TD-Gammon)
- Basic approach: temporal difference updates
  - Begin in state s
  - Choose an action using limited minimax search
  - See what the opponent does
  - End up in state s'
  - Do a value update of U(s) using U(s')
- Not guaranteed to converge against an adversary, but it can work in practice

Q-Learning
- With TD updates on values:
  - You don't need the model to update the utility estimates
  - You still do need it to figure out what action to take!
- With Q-learning and TD updates:
  - No model is needed to learn or to choose actions

TD Updates for Linear Qs
- We can use TD learning with linear Qs (actually, it's just like the perceptron!)
- Old Q-learning update:
  Q(a,s) ← Q(a,s) + α [ R(s) + γ max_a' Q(a',s') – Q(a,s) ]
- Now simply update the weights of the features in Qθ(a,s)
  (a code sketch appears after the slides)

Coming Up
- Real-world applications
- Large-scale machine / reinforcement learning
- NLP: language understanding and translation
- Vision: object and face recognition
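As a quick check of the arithmetic in the "Example: Tabular Value Updates" slide, here is a minimal tabular TD update; the state labels are just illustrative names.

```python
# Tabular TD value update, as in the blackjack example:
#   U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s))

def td_update(U, s, s_next, reward, alpha=0.1, gamma=1.0):
    """Apply one temporal-difference update to the value table U in place."""
    U[s] = U[s] + alpha * (reward + gamma * U[s_next] - U[s])
    return U[s]

U = {"hand=14": 0.5, "lose": -1.0}               # current estimates
new_value = td_update(U, "hand=14", "lose", reward=0.0)
print(new_value)                                 # ~0.35, matching the slide
```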

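The "TD Updates: Linear Values" and "Learning Eval Parameters with TD" slides replace the table entry with feature weights. Below is a minimal sketch of that weight update under the linear form Uθ(s) = Σ θi·fi(s); the feature values, step size, and reward here are assumptions for illustration, not course data.

```python
# TD update for a linear value function U_theta(s) = sum_i theta[i] * f_i(s).
# We cannot "increment" U(s) directly, so we nudge each weight by the TD
# error scaled by that feature's value at s.

def linear_value(theta, feats):
    return sum(t * f for t, f in zip(theta, feats))

def td_linear_update(theta, feats_s, feats_s_next, reward,
                     alpha=0.01, gamma=1.0, terminal=False):
    """Return updated weights after observing the transition s -> s'."""
    v_s = linear_value(theta, feats_s)
    v_next = 0.0 if terminal else linear_value(theta, feats_s_next)
    error = reward + gamma * v_next - v_s          # TD error
    return [t + alpha * error * f for t, f in zip(theta, feats_s)]

# Hypothetical usage with two board features: a win (reward +1) pushes up the
# weights of the features that were active in s.
theta = [0.0, 0.0]
theta = td_linear_update(theta, feats_s=[1.0, 2.0], feats_s_next=[],
                         reward=1.0, terminal=True)
print(theta)                                       # [0.01, 0.02]
```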

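In the same spirit, here is a sketch of the perceptron-like update for a linear Qθ(a,s) from the "TD Updates for Linear Qs" slide; the feature vectors and learning rate are assumptions for illustration.

```python
# Q-learning with a linear Q-function: Q_theta(a,s) = sum_i theta[i] * f_i(a,s).
# On a transition (s, a, R(s), s'), compute the TD error against the best
# next action and adjust each weight by error * feature value.

def q_value(theta, feats):
    return sum(t * f for t, f in zip(theta, feats))

def q_learning_linear_update(theta, feats_sa, next_feats_by_action, reward,
                             alpha=0.01, gamma=1.0):
    """next_feats_by_action maps each legal next action a' to its features f(a',s')."""
    best_next = max((q_value(theta, f) for f in next_feats_by_action.values()),
                    default=0.0)                   # max_a' Q_theta(a',s')
    error = reward + gamma * best_next - q_value(theta, feats_sa)
    return [t + alpha * error * f for t, f in zip(theta, feats_sa)]
```

No model is needed here: both the update and the greedy action choice (an argmax over Qθ(a',s')) use only the learned weights and the observed transition.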