MDPs and Spider Solitaire
CS 188: Section Handout
December 1, 2005

1 Defining a Markov Decision Process (MDP)

1.1 Fully observable MDPs

• Initial State: S0
• Transition Model: T(s, a, s'), the probability of going from s to s' with action a.
• Reward Function: R(s), the reward for being in state s.[1]
• Discount Factor: γ, the discount for rewards: a reward r received in t steps is worth rγ^t now.

A solution to an MDP is called a policy, which is a function π(s) that maps from states to actions. Thus, for every state, there is one best action championed by a particular policy π.

Question 1: What is the transition model for spider solitaire?

1.2 Partially observable MDPs (POMDPs)

• Observation Model: O(s, o), the probability of observing o in state s.
• Belief State Model: b(s), the probability of being in state s in belief state b.

A solution to a POMDP is also a policy, but it is a function from belief states to actions (a belief-update sketch appears at the end of this handout).

Question 2: What is the observation model for spider solitaire?
Question 3: What would a policy look like for spider solitaire?
Question 4: Sometimes, humans guess when they play solitaire. How can we represent guessing?

[1] Sometimes rewards have different structures, such as R(s, a, s'): the reward for moving from s to s' via action a.

2 Solving an MDP

2.1 Value iteration: an exact solution to MDPs

The quick and dirty story of value iteration (a code sketch appears at the end of this handout):

• Solves the Bellman equation: U(s) = R(s) + γ max_a Σ_{s'} T(s, a, s') U(s')
• Iterates through each state many times, updating U(s).
• Iteration always converges to the correct answer given infinite time.

Question 5: Given this short description, why might value iteration not work for spider solitaire?

2.2 Expectimax search: an approximate solution to MDPs

The quick and dirty story of expectimax (a sampling sketch appears at the end of this handout):

• Expectimax yields approximate solutions to MDPs with unknown utility functions.
• Requires an approximate utility function, preferably one that can be calculated quickly.
• Similar to a two-player game tree search, but with the other player making random moves.
• When calculating an expectation is too difficult, we can use sampling to approximate it.

[Figure 1: An expectimax tree. A max node branches on moves such as "Move Q to K" and "Move 3 to 4"; an expectation node branches on chance outcomes such as "Flip a 4".]

Question 6: How many samples do we need to approximate a card draw
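The observation model and belief state from Section 1.2 combine in the standard POMDP belief update: after taking action a and observing o, the new belief b'(s') is proportional to O(s', o) Σ_s T(s, a, s') b(s). Below is a minimal Python sketch of that update, assuming a toy model; the state names ("low_card", "high_card"), the "flip" action, and all probabilities are invented for illustration and are not a real spider solitaire model.

# Standard POMDP belief update:
# b'(s') is proportional to O(s', o) * sum_s T(s, a, s') * b(s).
# All names below (states, action, observations) are made-up stand-ins.

def belief_update(b, action, obs, T, O):
    """b: dict state -> prob; T[(s, a)]: dict s' -> prob; O[(s, o)]: prob of o in s."""
    states = list(b.keys())
    unnormalized = {}
    for s2 in states:
        # Predict where we end up, then weight by how likely the observation is there.
        predicted = sum(T[(s, action)].get(s2, 0.0) * b[s] for s in states)
        unnormalized[s2] = O.get((s2, obs), 0.0) * predicted
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()} if total > 0 else unnormalized


if __name__ == "__main__":
    # Toy example: two hidden states for the value of a face-down card.
    b = {"low_card": 0.5, "high_card": 0.5}
    T = {
        ("low_card", "flip"):  {"low_card": 1.0},
        ("high_card", "flip"): {"high_card": 1.0},
    }
    O = {
        ("low_card", "saw_low"): 0.9,  ("low_card", "saw_high"): 0.1,
        ("high_card", "saw_low"): 0.2, ("high_card", "saw_high"): 0.8,
    }
    print(belief_update(b, "flip", "saw_low", T, O))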
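To make the value iteration bullets in Section 2.1 concrete, here is a minimal Python sketch on a tiny, fully specified toy MDP. The states, actions, transition probabilities, rewards, and discount below are invented for illustration only.

# Minimal value iteration on a toy MDP (illustrative only, not spider solitaire).
# States, actions, T(s, a, s'), R(s), and gamma are all made-up examples.

GAMMA = 0.9

STATES = ["a", "b", "done"]
ACTIONS = {"a": ["stay", "go"], "b": ["go"], "done": []}

# T[(s, a)] is a list of (s', probability) pairs.
T = {
    ("a", "stay"): [("a", 0.8), ("b", 0.2)],
    ("a", "go"):   [("b", 1.0)],
    ("b", "go"):   [("done", 0.7), ("a", 0.3)],
}

R = {"a": 0.0, "b": 1.0, "done": 10.0}


def value_iteration(iterations=100):
    """Repeatedly apply the Bellman update U(s) = R(s) + gamma * max_a sum_s' T(s,a,s') U(s')."""
    U = {s: 0.0 for s in STATES}
    for _ in range(iterations):
        new_U = {}
        for s in STATES:
            if not ACTIONS[s]:          # terminal state: no actions available
                new_U[s] = R[s]
                continue
            best = max(sum(p * U[s2] for s2, p in T[(s, a)]) for a in ACTIONS[s])
            new_U[s] = R[s] + GAMMA * best
        U = new_U
    return U


def extract_policy(U):
    """Greedy policy pi(s) = argmax_a sum_s' T(s, a, s') U(s')."""
    return {
        s: max(ACTIONS[s], key=lambda a: sum(p * U[s2] for s2, p in T[(s, a)]))
        for s in STATES if ACTIONS[s]
    }


if __name__ == "__main__":
    U = value_iteration()
    print(U)
    print(extract_policy(U))

Note that each pass of the update loop visits every state in STATES, which is why the toy example keeps the state space to three states.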
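Below is a minimal Python sketch of depth-limited expectimax with sampled expectation nodes, in the spirit of Figure 1. The `model` object, with its `legal_moves`, `chance_outcomes`, and `evaluate` methods, is a hypothetical placeholder for a real solitaire game model and a quick approximate utility function; it is not the handout's own implementation.

import random

# Depth-limited expectimax with sampled chance nodes (a sketch, not a full solver).
# legal_moves, chance_outcomes, and evaluate are hypothetical placeholders that a
# real spider solitaire model would have to provide.

NUM_SAMPLES = 20  # samples used to approximate each expectation node


def expectimax_value(state, depth, model):
    """Value of a max node: pick the best move, valuing chance nodes by sampling."""
    moves = model.legal_moves(state)
    if depth == 0 or not moves:
        return model.evaluate(state)          # approximate utility function
    return max(sampled_expectation(state, m, depth, model) for m in moves)


def sampled_expectation(state, move, depth, model):
    """Approximate an expectation node by averaging over sampled outcomes."""
    outcomes = model.chance_outcomes(state, move)   # list of (next_state, probability)
    next_states, probs = zip(*outcomes)
    total = 0.0
    for _ in range(NUM_SAMPLES):
        s2 = random.choices(next_states, weights=probs)[0]
        total += expectimax_value(s2, depth - 1, model)
    return total / NUM_SAMPLES


def best_move(state, depth, model):
    """The move chosen at the root max node."""
    return max(model.legal_moves(state),
               key=lambda m: sampled_expectation(state, m, depth, model))

When a chance node has only a few outcomes, the exact expectation Σ p · value can be computed directly; sampling is worthwhile only when enumerating the outcomes, such as every possible hidden-card arrangement, is too expensive.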

