CS188 – Introduction to Artificial Intelligence
Section Handout #5: FORMULATING AND SOLVING MDPS
Klein, Fall 2007

Question 1 (Class)

[Figure of the balance-beam MDP, states s1 through s8 plus a ground state: not included in this text preview.]

Consider the above MDP, representing a robot on a balance beam. Each grid square is a state and the available actions are right and left. The agent starts in state s2, and all states have reward 0 aside from the ends of the grid, s1 and s8, and the ground state, which have the rewards shown. Moving left or right results in a move left or right (respectively) with probability p. With probability 1 − p, the robot falls off the beam (transitions to ground and receives a reward of -1). Falling off, or reaching either endpoint, results in the end of the episode (i.e., they are terminal states). Note that terminal states receive no future reward.

a. For what values of p is the optimal action from s2 to move right if the discount γ is 1?

b. For what values of γ is the optimal action from s2 to move right if p = 1?

c. Given initial value estimates of zero, show the results of one, then two rounds of value iteration.

d. We can develop learning updates that involve two actions instead of one. Write down the utility U^π(s) of a state s under policy π in terms of the next two states s' and s'', given that

    U^π(s) = Σ_{s'} T(s, π(s), s') [ R(s, π(s), s') + γ U^π(s') ]

e. Write a two-step-look-ahead value iteration update that involves U(s) and U(s''), where s'' is the state two time steps later. Why would this update not be used in practice?

f. Write a two-step-look-ahead TD-learning update that involves U(s) and U(s'') for the observed state-action-state-action-state sequence s, a, s', a', s''.

g. Given initial q-value estimates of zero, show the result of Q-learning with learning rate α = 0.5 after two episodes: [s2, s3, ground] and [s2, s3, s4, s5, ground], where the agent always moves right. You need only write down the non-zero entries. For the purposes of Q-learning updates, terminal states should be treated as having a single action, "die", which leads to future rewards of zero. Hint: q-values of terminal states which have been visited should not be zero.

Question 2 (Class): Golf as an MDP

We formulate golf as an MDP as follows:

State Space: {Tee, Fairway, Sand, Green}
Actions: {Conservative shot, Power shot}
Initial State: Tee
Transition model: (note that actions not on this list have probability 0)

[Transition table not included in this text preview.]

Rewards (note: R(·,·,s) means that the reward is received for transitioning to state s, regardless of the action taken or the previous state):

    s         R(·,·,s)
    Fairway     -1
    Sand        -2
    Green        3

a. Consider the policy of always taking the "Conservative shot". What is the utility of the initial state under this policy?

b. Compute estimates of the utility of each state under the optimal policy using Value Iteration with 3 iterations. Show the utilities of each state at each iteration. Assume we start with all utilities set to 0.
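For Question 1(c), the value-iteration rounds can be checked mechanically. The sketch below assumes the dynamics exactly as stated above; the endpoint rewards at s1 and s8 appear only in the missing figure, so r_left and r_right are placeholder parameters rather than values from the handout.

```python
# Value iteration on the balance-beam MDP (a sketch, not the official
# solution). p: probability a move succeeds; falling to ground gives -1
# and ends the episode; terminal states contribute no future value.
def beam_value_iteration(p, gamma, r_left, r_right, n_states=8, rounds=2):
    U = {s: 0.0 for s in range(1, n_states + 1)}  # all estimates start at 0
    for _ in range(rounds):
        newU = dict(U)
        for s in range(2, n_states):              # non-terminal squares s2..s7
            vals = []
            for move in (-1, +1):                 # left, right
                s2 = s + move
                # Reward on arrival: endpoint rewards at the ends, else 0.
                r = r_left if s2 == 1 else (r_right if s2 == n_states else 0.0)
                u_next = 0.0 if s2 in (1, n_states) else U[s2]
                # Succeed with prob p; fall (reward -1, terminal) with prob 1-p.
                vals.append(p * (r + gamma * u_next) + (1 - p) * (-1.0))
            newU[s] = max(vals)
        U = newU
    return U
```

Running this with rounds=1 and then rounds=2, after filling in the figure's endpoint rewards, reproduces the two rounds asked for in part (c).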
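Parts (d)-(f) unroll the policy-evaluation identity above by one extra step. A minimal sketch of what the resulting sample-based updates look like in code, keeping U as a dict over states with alpha the learning rate (this illustrates the standard one- and two-step TD forms, not the handout's official answers):

```python
# One-step TD(0): U(s) <- U(s) + alpha * (r1 + gamma*U(s') - U(s)),
# where r1 is the observed reward on the transition s -> s'.
def td_update(U, s, r1, s1, alpha, gamma):
    U[s] += alpha * (r1 + gamma * U[s1] - U[s])

# Two-step TD for the observed sequence s, a, s', a', s'' with rewards
# r1 (s -> s') and r2 (s' -> s''): the sample target looks through s'
# to s'', discounting the second reward once and U(s'') twice.
def two_step_td_update(U, s, r1, r2, s2, alpha, gamma):
    U[s] += alpha * (r1 + gamma * r2 + gamma ** 2 * U[s2] - U[s])
```

The value-iteration analogue of the two-step update (part e) must take an expectation, and a max, over every intermediate state s' as well as every s'', which is one standard reason such a backup is avoided in practice: it is far more expensive than the one-step backup for no change in the fixed point.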
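For part (g), the bookkeeping can be scripted. One subtlety is when rewards are collected: the hint (visited terminal states should have non-zero q-values) suggests reading the state reward R(s) as collected when acting from s, so the -1 at ground is picked up through the terminal action "die". The sketch below follows that reading, with gamma assumed to be 1; treat it as one plausible reconstruction, not the handout's official convention.

```python
from collections import defaultdict

R = defaultdict(float, {"ground": -1.0})   # state rewards (others are 0)

def q_update(Q, s, a, s_next, next_actions, alpha=0.5, gamma=1.0):
    # Q(s,a) <- Q(s,a) + alpha * (R(s) + gamma * max_a' Q(s',a') - Q(s,a));
    # next_actions is empty when there is no successor to look ahead to.
    future = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] += alpha * (R[s] + gamma * future - Q[(s, a)])

Q = defaultdict(float)
beam_acts = ["left", "right"]

# Episode 1: [s2, s3, ground]; the agent then takes "die" at ground.
q_update(Q, "s2", "right", "s3", beam_acts)
q_update(Q, "s3", "right", "ground", ["die"])
q_update(Q, "ground", "die", None, [])

# Episode 2: [s2, s3, s4, s5, ground].
for s, s2 in [("s2", "s3"), ("s3", "s4"), ("s4", "s5")]:
    q_update(Q, s, "right", s2, beam_acts)
q_update(Q, "s5", "right", "ground", ["die"])
q_update(Q, "ground", "die", None, [])

print({sa: q for sa, q in Q.items() if q != 0.0})  # the non-zero entries
```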
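For Question 2, the same value-iteration loop applies. The handout's transition table is not part of this preview, so the table T below is a hypothetical stand-in with the right shape (T[(state, action)] is a list of (next_state, probability) pairs); only the reward column matches the handout, and Green is assumed terminal.

```python
# Arrival rewards R(.,.,s) from the handout's table; returning to Tee is
# not listed, so it is assumed 0 here.
R = {"Tee": 0.0, "Fairway": -1.0, "Sand": -2.0, "Green": 3.0}

# HYPOTHETICAL transitions: replace with the handout's actual table.
T = {
    ("Tee", "Conservative shot"):     [("Fairway", 1.0)],
    ("Tee", "Power shot"):            [("Green", 0.5), ("Sand", 0.5)],
    ("Fairway", "Conservative shot"): [("Green", 1.0)],
    ("Fairway", "Power shot"):        [("Green", 0.8), ("Sand", 0.2)],
    ("Sand", "Conservative shot"):    [("Fairway", 1.0)],
    ("Sand", "Power shot"):           [("Green", 1.0)],
}

def value_iteration(states, T, R, gamma=1.0, iters=3):
    U = {s: 0.0 for s in states}                  # all utilities start at 0
    for _ in range(iters):
        newU = {}
        for s in states:
            q_values = []
            for (s1, a), outcomes in T.items():
                if s1 != s:
                    continue
                q_values.append(sum(p * (R[s2] + gamma * (0.0 if s2 == "Green"
                                                          else U[s2]))
                                    for s2, p in outcomes))
            # Green has no actions here: it is terminal with utility 0.
            newU[s] = max(q_values) if q_values else 0.0
        U = newU
    return U

print(value_iteration(["Tee", "Fairway", "Sand", "Green"], T, R))
```

For part (a), restricting the max to the "Conservative shot" entries turns the same loop into policy evaluation for the always-conservative policy.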

