CS 6375 Machine Learning
Homework 6, Part II
Due: 05/03/2015

1. Reading: reinforcement learning application. (10 pts)

Read a paper about using reinforcement learning for an application. Briefly summarize the paper, and clearly explain the states, rewards, actions, and learning procedure for the task.

2. MDP. (25 pts)

The following figure shows an MDP with N states. All states have two actions (north and right) except SN, which can only self-loop. As you can see from the figure, all state transitions are deterministic. The discount factor is γ.

(a) What is J*(SN)?
(b) What is the optimal policy?
(c) What is J*(S1)?
(d) Use value iteration to solve this MDP. What are J1(S1) and J2(S1) in the first and second iterations, respectively?

Hint: if you don't remember the formula for summing a geometric series, you will need the following one, which holds for 0 ≤ α < 1:

    ∑_{i=0}^∞ α^i = 1 / (1 − α)

(A generic value iteration sketch appears after Problem 3.)

3. Policy iteration. (20 pts)

Consider the following MDP with three states, whose rewards are -1, -2, and 0, respectively. State 3 is the terminal state. States 1 and 2 each have two possible actions, a and b. The transition probabilities for the two actions are shown in the figure. Use a discount factor of 0.5.

(a) Assume the initial policy takes action b in both states 1 and 2. Apply policy iteration to determine the optimal policy and the values of states 1 and 2. Show your steps.
(b) What if the initial policy takes action a in both states?

Note: since there are only two non-terminal states in this problem, when you use policy iteration, please compute the state values in each iteration by solving the matrix equations (rather than using the simplified iterative algorithm we used in class). A sketch of this procedure is given below.
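For Problem 2(d), here is a minimal value iteration sketch in Python. The figure's transition diagram and rewards are not reproduced in this handout, so everything below the function definition (the toy 3-state chain, its action names, and its reward numbers) is a hypothetical placeholder, not the figure's MDP; substitute the actual states, transitions, and rewards from the figure.

# Generic value iteration for a finite MDP with deterministic transitions.
# Bellman update: J_{k+1}(s) = max_a [ R(s, a) + gamma * J_k(next(s, a)) ].

def value_iteration(states, actions, next_state, reward, gamma, tol=1e-9):
    J = {s: 0.0 for s in states}                  # J_0 = 0 for every state
    while True:
        J_new = {s: max(reward(s, a) + gamma * J[next_state(s, a)]
                        for a in actions(s))
                 for s in states}
        if max(abs(J_new[s] - J[s]) for s in states) < tol:
            return J_new
        J = J_new

# Toy usage (hypothetical numbers, NOT the figure's MDP): a 3-state chain where
# 'right' advances, 'north' self-loops with reward 1, and S3 only self-loops.
states = ['S1', 'S2', 'S3']
def actions(s): return ['self'] if s == 'S3' else ['north', 'right']
def next_state(s, a): return {'S1': 'S2', 'S2': 'S3'}[s] if a == 'right' else s
def reward(s, a): return 10.0 if s == 'S3' else (1.0 if a == 'north' else 0.0)

print(value_iteration(states, actions, next_state, reward, gamma=0.9))

Note how the self-loop at S3 makes its value the geometric sum 10 + 10γ + 10γ² + ... = 10/(1 − γ), which is exactly where the hint's formula comes in.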
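For Problem 3, here is a sketch of policy iteration with exact policy evaluation, i.e. solving the linear (matrix) equation (I − γ P_π) J = R for the two non-terminal states, as the note requests. The rewards (-1, -2), the discount factor 0.5, and the initial policy from part (a) are taken from the problem; the transition matrices for actions a and b are hypothetical placeholders, since the figure is not reproduced here.

import numpy as np

gamma = 0.5
R = np.array([-1.0, -2.0])        # rewards of states 1 and 2 (given)

# P[action][i] = transition probabilities from state i+1 into states 1 and 2.
# The terminal state 3 is implicit: J(3) = 0, so it drops out of the solve.
# These numbers are hypothetical; replace them with the figure's probabilities.
P = {'a': np.array([[0.5, 0.5],
                    [0.0, 0.5]]),
     'b': np.array([[0.0, 1.0],
                    [0.5, 0.0]])}

def evaluate(policy):
    # Exact policy evaluation: solve (I - gamma * P_pi) J = R.
    P_pi = np.vstack([P[policy[i]][i] for i in range(2)])
    return np.linalg.solve(np.eye(2) - gamma * P_pi, R)

def improve(J):
    # Greedy improvement; since R depends only on the state, maximizing
    # R(s) + gamma * P(s, a) . J is the same as maximizing P(s, a) . J.
    return [max(P, key=lambda a: P[a][i] @ J) for i in range(2)]

policy = ['b', 'b']               # initial policy from part (a)
while True:
    J = evaluate(policy)
    improved = improve(J)
    if improved == policy:
        break
    policy = improved
print('optimal policy:', policy, 'values:', J)

For part (b), change the initial policy to ['a', 'a'] and rerun; exact evaluation guarantees the loop terminates at the same optimal policy regardless of the starting point.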