Berkeley COMPSCI 188 - Games II

CS 188: Artificial Intelligence
Spring 2006
Lecture 25: Games II
4/20/2006
Dan Klein – UC Berkeley

Recap: Minimax Trees

Minimax Search

DFS Minimax

α-β Pruning Example
[Code in book]

α-β Pruning
- General configuration: α is the best value (to MAX) found so far off the current path
- If V is worse than α, MAX will avoid it, so prune V's branch
- Define β similarly for MIN

α-β Pruning Properties
- Pruning has no effect on the final result
- Good move ordering improves the effectiveness of pruning
- With "perfect ordering", time complexity drops to O(b^(m/2)), which doubles the solvable depth
- Full search of, e.g., chess is still hopeless!
- A simple example of metareasoning: here, reasoning about which computations are relevant
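To make the pruning rule concrete, here is a minimal Python sketch of depth-first minimax with α-β pruning. It is not from the original slides; the `game` object (with `is_terminal`, `successors`, and `utility` methods) is a hypothetical stand-in for whatever state representation a given game uses.

```python
# A minimal sketch of DFS minimax with alpha-beta pruning.
# `game` is a hypothetical interface: is_terminal(s), successors(s), utility(s).

def alpha_beta(state, game, depth, alpha=float("-inf"), beta=float("inf"),
               maximizing=True):
    """Return the minimax value of `state`, pruning branches that
    cannot affect the final decision."""
    if depth == 0 or game.is_terminal(state):
        return game.utility(state)
    if maximizing:
        value = float("-inf")
        for successor in game.successors(state):
            value = max(value, alpha_beta(successor, game, depth - 1,
                                          alpha, beta, maximizing=False))
            if value >= beta:          # MIN will never let play reach here,
                return value           # so prune the remaining successors
            alpha = max(alpha, value)  # best value found for MAX so far
        return value
    else:
        value = float("inf")
        for successor in game.successors(state):
            value = min(value, alpha_beta(successor, game, depth - 1,
                                          alpha, beta, maximizing=True))
            if value <= alpha:         # MAX will never let play reach here
                return value
            beta = min(beta, value)    # best value found for MIN so far
        return value
```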
Resource Limits
- Cannot search to the leaves
- Limited search: instead, search a limited portion of the tree
- Replace terminal utilities with an eval function for non-terminal positions
- The guarantee of optimal play is gone
- Example: suppose we have 100 seconds and can explore 10K nodes/sec, so we can check 1M nodes per move
- α-β reaches about depth 8 – a decent chess program

Evaluation Functions
- A function which scores non-terminals
- Ideal function: returns the utility of the position
- In practice: typically a weighted linear sum of features:
  Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)
- e.g. f1(s) = (num white queens – num black queens), etc.

Function Approximation
- Problem: it is inefficient to learn each state's utility (or eval function) one by one
- Solution: what we learn about one state (or position) should generalize to similar states
- Very much like supervised learning
- If states are treated entirely independently, we can only learn on very small state spaces

Linear Value Functions
- Another option: values are linear functions of features of states (or state-action pairs)
- Good if you can describe states well using a few features (e.g. for game-playing board evaluations)
- Now we only have to learn a few weights rather than a value for each state
[figure: a grid of state values between 0.60 and 0.95]

Recap: Model-Free Learning
- Recall MDP value updates for a given estimate of U
- If you know the model T, use the Bellman update
- Temporal difference learning (TD):
  - Make an (epsilon-greedy) action choice (or follow a provided policy)
  - Update using the results of the action

Example: Tabular Value Updates
- Example: blackjack; +1 for a win, -1 for a loss or bust, 0 for a tie
- Our hand shows 14, and the current policy says "hit"
- Current U(s) is 0.5
- We hit, get an 8, and bust (ending up in s' = "lose")
- Update:
  - Old U(s) = 0.5
  - Observed R(s) = 0
  - Old U(s') = -1
  - New U(s) = U(s) + α [ R(s) + γ U(s') - U(s) ]
  - If α = 0.1 and γ = 1.0:
    New U(s) = 0.5 + 0.1 [ 0 + (-1) - 0.5 ] = 0.5 + 0.1 (-1.5) = 0.35
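As a sanity check on the blackjack arithmetic, here is a small sketch of the tabular TD update. The state names in the dictionary are illustrative, not from the slides.

```python
# Tabular temporal-difference value update:
#   U(s) <- U(s) + alpha * ( R(s) + gamma * U(s') - U(s) )

def td_update(U, s, s_prime, reward, alpha=0.1, gamma=1.0):
    """Move U(s) toward the one-step sample R(s) + gamma * U(s')."""
    U[s] += alpha * (reward + gamma * U[s_prime] - U[s])

# Reproduce the slide's numbers: our hand shows 14, we hit and bust.
U = {"hand-14": 0.5, "lose": -1.0}
td_update(U, "hand-14", "lose", reward=0.0)  # alpha = 0.1, gamma = 1.0
print(U["hand-14"])  # 0.35 (up to float rounding), matching the arithmetic above
```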
TD Updates: Linear Values
- Assume a linear value function: U(s) = Σ_i w_i f_i(s)
- We can almost do a TD update
- Problem: we can't "increment" U(s) explicitly
- Solution: update the weights of the features at that state:
  w_i ← w_i + α [ R(s) + γ U(s') - U(s) ] f_i(s)

Learning Eval Parameters with TD
- Ideally, we want eval(s) to be the utility of s
- Idea: use techniques from reinforcement learning
  - Samuel's 1959 checkers system
  - Tesauro's 1992 backgammon system (TD-Gammon)
- Basic approach: temporal difference updates
  - Begin in state s
  - Choose an action using limited minimax search
  - See what the opponent does
  - End up in state s'
  - Do a value update of U(s) using U(s')
- Not guaranteed to converge against an adversary, but can work in practice

Q-Learning
- With TD updates on values:
  - You don't need the model to update the utility estimates
  - You still do need it to figure out what action to take!
- Q-learning with TD updates:
  - No model is needed to learn or to choose actions

TD Updates for Linear Qs
- We can use TD learning with linear Qs (actually, it's just like the perceptron!)
- Old Q-learning update:
  Q(s,a) ← Q(s,a) + α [ R(s) + γ max_{a'} Q(s',a') - Q(s,a) ]
- With linear Qs, simply update the weights of the features in Q(s,a):
  w_i ← w_i + α [ R(s) + γ max_{a'} Q(s',a') - Q(s,a) ] f_i(s,a)
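The sketch below is one possible rendering of the linear Q update above: the TD error is shared by all features, and each weight moves in proportion to its feature's value, which is why the slides compare it to the perceptron. The feature values and weights are made-up numbers for illustration.

```python
# Q-learning with a linear function approximator:
#   Q(s, a) = sum_i w_i * f_i(s, a)

def q_value(weights, features):
    """Linear Q: dot product of weights and feature values."""
    return sum(w * f for w, f in zip(weights, features))

def q_learning_step(weights, feats_sa, reward, best_next_q,
                    alpha=0.1, gamma=1.0):
    """One update for a transition (s, a) -> s':
    w_i <- w_i + alpha * [R(s) + gamma * max_a' Q(s',a') - Q(s,a)] * f_i(s,a)
    """
    td_error = reward + gamma * best_next_q - q_value(weights, feats_sa)
    return [w + alpha * td_error * f for w, f in zip(weights, feats_sa)]

# Illustrative numbers only; the features and their values are made up.
weights = [0.5, -0.2]   # w_1, w_2
feats_sa = [1.0, 2.0]   # f_1(s, a), f_2(s, a)
weights = q_learning_step(weights, feats_sa, reward=-1.0, best_next_q=0.0)
```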
Coming Up
- Real-world applications
- Large-scale machine / reinforcement learning
- NLP: language understanding and translation
- Vision: object and face recognition