OSU CS 533 - RL for Large State Spaces: Policy Gradient (30 pages)


School: Oregon State University
Course: CS 533 - Intelligent Agents and Decision Making

Text preview:

RL for Large State Spaces: Policy Gradient
Alan Fern

RL via Policy Gradient Search

So far, all of our RL techniques have tried to learn an exact or approximate utility function or Q-function, i.e., the optimal value of being in a state or of taking an action from a state. But value functions can often be much more complex to represent than the corresponding policy. Do we really care about knowing $Q(s, \text{left}) = 0.3554$ and $Q(s, \text{right}) = 0.533$, or just that right is better than left in state $s$? This motivates searching directly in a parameterized policy space: bypass learning a value function and directly optimize the value of the policy.

Aside: Gradient Ascent

Given a function $f(\theta_1, \ldots, \theta_n)$ of $n$ real values $\theta = (\theta_1, \ldots, \theta_n)$, suppose we want to maximize $f$ with respect to $\theta$. A common approach is gradient ascent. The gradient of $f$ at a point $\theta$, denoted $\nabla f(\theta)$, is the $n$-dimensional vector that points in the direction in which $f$ increases most steeply at $\theta$. Vector calculus tells us that $\nabla f(\theta)$ is just the vector of partial derivatives:

$\nabla f(\theta) = \left[ \frac{\partial f(\theta)}{\partial \theta_1}, \ldots, \frac{\partial f(\theta)}{\partial \theta_n} \right]$,

where

$\frac{\partial f(\theta)}{\partial \theta_i} = \lim_{\epsilon \to 0} \frac{f(\theta_1, \ldots, \theta_i + \epsilon, \ldots, \theta_n) - f(\theta_1, \ldots, \theta_i, \ldots, \theta_n)}{\epsilon}$.

Gradient ascent iteratively follows the gradient direction, starting at some initial point:

1. Initialize $\theta$ to a random value.
2. Repeat until a stopping condition holds: $\theta \leftarrow \theta + \alpha \nabla f(\theta)$.

With proper decay of the learning rate $\alpha$, gradient ascent is guaranteed to converge to a local optimum of $f$. (A minimal numerical sketch of this procedure follows the preview.)

[Figure: surface plot over $(\theta_1, \theta_2)$ illustrating local optima of $f$.]

RL via Policy Gradient Ascent

The policy gradient approach has the following schema:

1. Select a space of parameterized policies.
2. Compute the gradient of the value of the current policy with respect to the parameters.
3. Move the parameters in the direction of the gradient.
4. Repeat these steps until we reach a local maximum.
5. Possibly also add in tricks for dealing with bad local maxima, e.g., random restarts.

So we must answer the following questions: How should we represent parameterized policies, and how can we compute the gradient? (Sketches of both also follow the preview.)

Parameterized Policies

One example of a space of parametric policies is

$\pi_\theta(s) = \arg\max_a Q_\theta(s, a)$,

where $Q_\theta(s, a)$ may be a linear function, e.g.,

$Q_\theta(s, a) = \theta_0 + \theta_1 f_1(s, a) + \theta_2 f_2(s, a) + \cdots + \theta_n f_n(s, a)$.

The goal is … [preview truncated here]
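
To make the gradient-ascent aside concrete, here is a minimal Python sketch (not from the slides): it approximates each partial derivative with the finite-difference quotient from the limit above and runs the update $\theta \leftarrow \theta + \alpha \nabla f(\theta)$ with a decaying learning rate. The quadratic test function and the $\alpha_0 / t$ schedule are illustrative assumptions.

import numpy as np

def numerical_gradient(f, theta, eps=1e-6):
    # Approximate each partial derivative by the finite-difference
    # quotient (f(theta + eps * e_i) - f(theta)) / eps.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        bumped = theta.copy()
        bumped[i] += eps
        grad[i] = (f(bumped) - f(theta)) / eps
    return grad

def gradient_ascent(f, theta0, alpha0=0.4, steps=200):
    # Repeat: theta <- theta + alpha * grad f(theta), with alpha
    # decaying over time, as the convergence claim assumes.
    theta = theta0.copy()
    for t in range(1, steps + 1):
        theta += (alpha0 / t) * numerical_gradient(f, theta)
    return theta

# Illustrative concave objective with its maximum at (1, -2).
f = lambda th: -np.sum((th - np.array([1.0, -2.0])) ** 2)
print(gradient_ascent(f, np.zeros(2)))  # converges toward [1, -2]

For this concave objective the single optimum is global; on a multi-peaked surface like the one pictured in the slides, the same loop only finds a local optimum, which is why tricks such as random restarts come up later.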
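
The policy-gradient schema leaves step 2, computing the gradient of the policy's value, abstract. One crude but faithful instantiation estimates the value $V(\theta)$ by averaging Monte Carlo returns from rollouts and differentiates that estimate by finite differences. This is only a sketch: the environment interface (reset/step in the Gymnasium style), the rollout budget, and the step sizes are all assumptions for illustration, not something the slides specify.

import numpy as np

def policy_value(env, policy, theta, episodes=20, horizon=200):
    # Monte Carlo estimate of V(theta): average total reward over
    # rollouts that follow pi_theta.
    total = 0.0
    for _ in range(episodes):
        s, _ = env.reset()
        for _ in range(horizon):
            s, r, terminated, truncated, _ = env.step(policy(theta, s))
            total += r
            if terminated or truncated:
                break
    return total / episodes

def policy_gradient_ascent(env, policy, theta, alpha=0.05, eps=0.5, iters=100):
    # The schema: estimate grad V(theta), move theta along it, repeat.
    for _ in range(iters):
        base = policy_value(env, policy, theta)
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            bumped = theta.copy()
            bumped[i] += eps
            grad[i] = (policy_value(env, policy, bumped) - base) / eps
        theta = theta + alpha * grad  # step 3: follow the gradient
    return theta

# Hypothetical usage, with any Gymnasium-style environment `env` and
# the argmax policy sketched below:
#   theta_star = policy_gradient_ascent(env, policy, np.zeros(4))

Because each rollout estimate is noisy, every finite difference compares two random quantities, so this instantiation needs many samples per step; that is exactly why the question "how can we compute the gradient?" deserves a better answer than brute-force differencing.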
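
Finally, a sketch of the parameterized-policy example: the greedy policy $\pi_\theta(s) = \arg\max_a Q_\theta(s, a)$ with a linear $Q_\theta$. The action set and the feature functions $f_i(s, a)$ below are placeholders, since the preview does not specify a domain.

import numpy as np

ACTIONS = ["left", "right"]  # placeholder action set

def features(s, a):
    # Placeholder features f_1(s, a), ..., f_n(s, a); these are
    # domain-specific in practice.
    right = 1.0 if a == "right" else 0.0
    return np.array([float(s), right, float(s) * right])

def q_linear(theta, s, a):
    # Q_theta(s, a) = theta_0 + theta_1 f_1(s, a) + ... + theta_n f_n(s, a)
    return theta[0] + theta[1:] @ features(s, a)

def policy(theta, s):
    # pi_theta(s) = argmax over actions of Q_theta(s, a)
    return max(ACTIONS, key=lambda a: q_linear(theta, s, a))

theta = np.array([0.0, 0.1, 0.5, 0.2])
print(policy(theta, s=3))  # "right", since Q(3, right) > Q(3, left) here

One caveat worth keeping in mind for step 2 of the schema: the argmax makes $\pi_\theta$ change discontinuously as $\theta$ varies, so the value of such a policy is not a smooth function of the parameters.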


