CS 6375 Machine Learning
Homework 6, Part II
Due: 05/03/2015

1. Reading: reinforcement learning application. (10 pts)

Read a paper about using reinforcement learning for an application. Briefly summarize the paper, and clearly explain the states, rewards, actions, and learning procedure for the task.

2. MDP. (25 pts)

The following figure shows an MDP with N states. All states have two actions (north and right) except SN, which can only self-loop. As you can see from the figure, all state transitions are deterministic. The discount factor is γ.

(a) What is J*(SN)?
(b) What is the optimal policy?
(c) What is J*(S1)?
(d) Use value iteration to solve this MDP. What are J1(S1) and J2(S1) in the first and second iterations, respectively?

Hint: if you don't remember the formula for summing a geometric series, you will need the following one, which holds for 0 ≤ α < 1:

    ∑_{i=0}^∞ α^i = 1 / (1 − α)

(A generic value iteration sketch appears after Problem 3.)

3. Policy iteration. (20 pts)

Consider the following MDP with three states, whose rewards are -1, -2, and 0, respectively. State 3 is the terminal state. States 1 and 2 each have two possible actions, a and b. The transition probabilities for the two actions are shown in the figure. Use a discount factor of 0.5.

(a) Assume the initial policy takes action b in both states 1 and 2. Apply policy iteration to determine the optimal policy and the values of states 1 and 2. Show your steps.
(b) What if the initial policy takes action a in both states?

Note: since there are only two non-terminal states in this problem, when you use policy iteration, please compute the state values in each iteration by solving the matrix equations (rather than using the simplified iterative algorithm we used in class). A sketch of this procedure is given below.
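For Problem 2(d), here is a minimal value iteration sketch in Python. The figure's transition diagram and rewards are not reproduced in this handout, so everything below the function definition (the toy 3-state chain, its action names, and its reward numbers) is a hypothetical placeholder, not the figure's MDP; substitute the actual states, transitions, and rewards from the figure.

# Generic value iteration for a finite MDP with deterministic transitions.
# Bellman update: J_{k+1}(s) = max_a [ R(s, a) + gamma * J_k(next(s, a)) ].

def value_iteration(states, actions, next_state, reward, gamma, tol=1e-9):
    J = {s: 0.0 for s in states}                  # J_0 = 0 for every state
    while True:
        J_new = {s: max(reward(s, a) + gamma * J[next_state(s, a)]
                        for a in actions(s))
                 for s in states}
        if max(abs(J_new[s] - J[s]) for s in states) < tol:
            return J_new
        J = J_new

# Toy usage (hypothetical numbers, NOT the figure's MDP): a 3-state chain where
# 'right' advances, 'north' self-loops with reward 1, and S3 only self-loops.
states = ['S1', 'S2', 'S3']
def actions(s): return ['self'] if s == 'S3' else ['north', 'right']
def next_state(s, a): return {'S1': 'S2', 'S2': 'S3'}[s] if a == 'right' else s
def reward(s, a): return 10.0 if s == 'S3' else (1.0 if a == 'north' else 0.0)

print(value_iteration(states, actions, next_state, reward, gamma=0.9))

Note how the self-loop at S3 makes its value the geometric sum 10 + 10γ + 10γ² + ... = 10/(1 − γ), which is exactly where the hint's formula comes in.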
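For Problem 3, here is a sketch of policy iteration with exact policy evaluation, i.e. solving the linear (matrix) equation (I − γ P_π) J = R for the two non-terminal states, as the note requests. The rewards (-1, -2), the discount factor 0.5, and the initial policy from part (a) are taken from the problem; the transition matrices for actions a and b are hypothetical placeholders, since the figure is not reproduced here.

import numpy as np

gamma = 0.5
R = np.array([-1.0, -2.0])        # rewards of states 1 and 2 (given)

# P[action][i] = transition probabilities from state i+1 into states 1 and 2.
# The terminal state 3 is implicit: J(3) = 0, so it drops out of the solve.
# These numbers are hypothetical; replace them with the figure's probabilities.
P = {'a': np.array([[0.5, 0.5],
                    [0.0, 0.5]]),
     'b': np.array([[0.0, 1.0],
                    [0.5, 0.0]])}

def evaluate(policy):
    # Exact policy evaluation: solve (I - gamma * P_pi) J = R.
    P_pi = np.vstack([P[policy[i]][i] for i in range(2)])
    return np.linalg.solve(np.eye(2) - gamma * P_pi, R)

def improve(J):
    # Greedy improvement; since R depends only on the state, maximizing
    # R(s) + gamma * P(s, a) . J is the same as maximizing P(s, a) . J.
    return [max(P, key=lambda a: P[a][i] @ J) for i in range(2)]

policy = ['b', 'b']               # initial policy from part (a)
while True:
    J = evaluate(policy)
    improved = improve(J)
    if improved == policy:
        break
    policy = improved
print('optimal policy:', policy, 'values:', J)

For part (b), change the initial policy to ['a', 'a'] and rerun; exact evaluation guarantees the loop terminates at the same optimal policy regardless of the starting point.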