CS 188: Artificial Intelligence
Spring 2006
Lecture 25: Games II
4/20/2006
Dan Klein – UC Berkeley

Recap: Minimax Trees

Minimax Search

DFS Minimax

α-β Pruning Example
[Code in book; see the sketch after the Evaluation Functions slide below]

α-β Pruning
General configuration: α is the best value (to MAX) found so far off the current path
If V is worse than α, MAX will avoid it, so prune V's branch
Define β similarly for MIN

α-β Pruning Properties
Pruning has no effect on the final result
Good move ordering improves the effectiveness of pruning
With "perfect ordering":
  Time complexity drops to O(b^(m/2))
  Doubles the solvable depth
Full search of, e.g., chess is still hopeless!
A simple example of metareasoning: here, reasoning about which computations are relevant

Resource Limits
Cannot search to the leaves
Limited search: instead, search a limited portion of the tree
Replace terminal utilities with an eval function for non-terminal positions
The guarantee of optimal play is gone
Example: suppose we have 100 seconds and can explore 10K nodes/sec
  So we can check 1M nodes per move
  α-β reaches about depth 8 – a decent chess program

Evaluation Functions
Function which scores non-terminals
Ideal function: returns the utility of the position
In practice: typically a weighted linear sum of features:
  Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
e.g. f_1(s) = (num white queens – num black queens), etc.
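The pruning and cutoff ideas above combine naturally into a single procedure. Below is a minimal Python sketch of depth-limited α-β search with an evaluation function at the cutoff. It is not the book's code; the Game interface (successors, is_terminal, utility) and eval_fn are hypothetical stand-ins for whatever representation is used in practice.

```python
# Minimal sketch of depth-limited alpha-beta search (not the book's code).
# Assumes a hypothetical Game interface: successors(state) yields
# (action, next_state) pairs; is_terminal(state); utility(state).
# eval_fn scores non-terminal positions at the depth cutoff.

def alpha_beta(game, state, depth, eval_fn):
    def max_value(state, alpha, beta, depth):
        if game.is_terminal(state):
            return game.utility(state)
        if depth == 0:
            return eval_fn(state)          # cutoff: use the eval function
        v = float('-inf')
        for _, succ in game.successors(state):
            v = max(v, min_value(succ, alpha, beta, depth - 1))
            if v >= beta:                  # MIN will never let play reach here,
                return v                   # so prune the remaining successors
            alpha = max(alpha, v)
        return v

    def min_value(state, alpha, beta, depth):
        if game.is_terminal(state):
            return game.utility(state)
        if depth == 0:
            return eval_fn(state)
        v = float('inf')
        for _, succ in game.successors(state):
            v = min(v, max_value(succ, alpha, beta, depth - 1))
            if v <= alpha:                 # MAX will never let play reach here
                return v
            beta = min(beta, v)
        return v

    # Return the action whose subtree has the best backed-up value for MAX.
    return max(game.successors(state),
               key=lambda p: min_value(p[1], float('-inf'), float('inf'),
                                       depth - 1))[0]
```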
Function Approximation
Problem: it is inefficient to learn each state's utility (or eval function) one by one
Solution: what we learn about one state (or position) should generalize to similar states
Very much like supervised learning
If states are treated entirely independently, we can only learn on very small state spaces

Linear Value Functions
Another option: values are linear functions of features of states (or of state-action pairs)
Good if you can describe states well using a few features (e.g. for game-playing board evaluations)
Now we only have to learn a few weights rather than a value for each state
[Figure: grid world showing learned state values between 0.60 and 0.95]

Recap: Model-Free Learning
Recall MDP value updates for a given estimate of U
If you know the model T, use the Bellman update
Temporal difference learning (TD):
  Make an (epsilon-greedy) action choice (or follow a provided policy)
  Update using the results of the action:
  U(s) ← U(s) + α [ R(s) + γ U(s') − U(s) ]

Example: Tabular Value Updates
Example: blackjack
  +1 for a win, −1 for a loss or bust, 0 for a tie
Our hand shows 14; the current policy says "hit"
Current U(s) is 0.5
We hit, get an 8, and bust (end up in s' = "lose")
Update:
  Old U(s) = 0.5
  Observed R(s) = 0
  Old U(s') = −1
  New U(s) = U(s) + α [ R(s) + γ U(s') − U(s) ]
  If α = 0.1 and γ = 1.0:
  New U(s) = 0.5 + 0.1 [ 0 + (−1) − 0.5 ] = 0.5 + 0.1 (−1.5) = 0.35

TD Updates: Linear Values
Assume a linear value function: U(s) = Σ_i w_i f_i(s)
Can almost do a TD update
Problem: we can't "increment" U(s) explicitly
Solution: update the weights of the features at that state:
  w_i ← w_i + α [ R(s) + γ U(s') − U(s) ] f_i(s)

Learning Eval Parameters with TD
Ideally, we want eval(s) to be the utility of s
Idea: use techniques from reinforcement learning
  Samuel's 1959 checkers system
  Tesauro's 1992 backgammon system (TD-Gammon)
Basic approach: temporal difference updates
  Begin in state s
  Choose an action using limited minimax search
  See what the opponent does
  End up in state s'
  Do a value update of U(s) using U(s')
Not guaranteed to converge against an adversary, but can work in practice

Q-Learning
With TD updates on values:
  You don't need the model to update the utility estimates
  You still do need it to figure out what action to take!
Q-learning with TD updates:
  Q(s,a) ← Q(s,a) + α [ R(s) + γ max_a' Q(s',a') − Q(s,a) ]
  No model needed to learn or to choose actions

TD Updates for Linear Qs
Can use TD learning with linear Qs
(Actually, it's just like the perceptron!)
Old Q-learning update:
  Q(s,a) ← Q(s,a) + α [ R(s) + γ max_a' Q(s',a') − Q(s,a) ]
Simply update the weights of the features in Q(s,a):
  w_i ← w_i + α [ R(s) + γ max_a' Q(s',a') − Q(s,a) ] f_i(s,a)
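A minimal sketch of the linear-feature Q-update above, in Python. Only the weight-update rule comes from the slide; the function names (q_value, td_update), the featurize(s, a) feature function, and the actions list are hypothetical stand-ins.

```python
# Minimal sketch of Q-learning with a linear feature representation.
# featurize(s, a) -> list of feature values is a hypothetical placeholder.
# The update implements the rule from the slide:
#   w_i <- w_i + alpha * [ R(s) + gamma * max_a' Q(s',a') - Q(s,a) ] * f_i(s,a)

def q_value(weights, features):
    # Q(s,a) = sum_i w_i * f_i(s,a)
    return sum(w * f for w, f in zip(weights, features))

def td_update(weights, s, a, reward, s_prime, actions, featurize,
              alpha=0.1, gamma=1.0):
    feats = featurize(s, a)
    # max over a' of Q(s', a'); zero if s' is terminal (no available actions)
    future = max((q_value(weights, featurize(s_prime, ap)) for ap in actions),
                 default=0.0)
    error = reward + gamma * future - q_value(weights, feats)
    # every feature active at (s,a) is nudged in proportion to its value,
    # just like the perceptron update mentioned on the slide
    return [w + alpha * error * f for w, f in zip(weights, feats)]
```

Tabular Q-learning falls out as the special case where each (s,a) pair has its own indicator feature, so this one rule covers both the tabular and the linear slides.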
Coming Up
Real-world applications
Large-scale machine / reinforcement learning
NLP: language understanding and translation
Vision: object and face recognition