
Hierarchy, Behavior, and Off-policy Learning





Rich Sutton, University of Alberta

Is hierarchical behavior a user illusion? Is it something we use to explain our behavior to others and to ourselves, but not what our brains are really doing? Or is it a real phenomenon, involved in every muscle we twitch?

Outline

A micro-scale model of cognition in which abstractions play no role in producing behavior
Abstraction in state and time can be supported by options, but off-policy learning is required
A new actor-critic advantage algorithm for off-policy learning

Working hypothesis

Hierarchy and abstraction play no role in producing behavior. There is no current option, no goal stack, no hierarchical execution, no execution of high-level anything, ever. All execution is at a very low level, say 100 Hz.

Some definitions

action: lowest-level action (100 Hz)
observation: lowest-level sensation (100 Hz)
state: some representation (memory) of the state of the world, updated at 100 Hz
policy: the mapping from state to action used to produce behavior at 100 Hz

Abstractions are used only for changing the policy on every step (100 Hz), by learning and by planning.

[Figure: diagram with the labels POLICY, ACTION, OBS, STATE UPDATE, and STATE at two successive time steps.]

Abstractions are also used in the design of the state representation, but in the end, to produce behavior, there is just a low-level policy.

Definitions re: options

option: a way of behaving that terminates when one of a set of states is reached
defined entirely in low-level terms (100 Hz)
actions are a special case of options
option
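The definition of an option given above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not code from the talk: the names `Option`, `primitive`, `rollout`, and the six-state corridor are all hypothetical. An option is represented as a low-level policy plus a set of terminating states, and a single action becomes the special case that terminates in every state.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Iterable, List

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class Option:
    """A way of behaving that terminates when one of a set of states is reached."""
    policy: Callable[[State], Action]   # low-level mapping from state to action
    terminal_states: FrozenSet[State]   # the option ends on reaching any of these

    def terminates_in(self, state: State) -> bool:
        return state in self.terminal_states


def primitive(action: Action, all_states: Iterable[State]) -> Option:
    """A single action viewed as an option: always take it, terminate everywhere."""
    return Option(policy=lambda s: action, terminal_states=frozenset(all_states))


def rollout(option: Option, state: State,
            step: Callable[[State, Action], State],
            max_steps: int = 1000) -> List[State]:
    """Trace the low-level steps an option describes, up to its termination.

    Used here only to illustrate the termination condition; every step is
    still an ordinary low-level action.
    """
    trajectory = [state]
    for _ in range(max_steps):
        state = step(state, option.policy(state))
        trajectory.append(state)
        if option.terminates_in(state):
            break
    return trajectory


# Hypothetical toy world: a corridor of states 0..5, actions -1 and +1.
def step(state: int, action: int) -> int:
    return max(0, min(5, state + action))


go_right = Option(policy=lambda s: +1, terminal_states=frozenset({5}))
one_step_right = primitive(+1, range(6))

print(rollout(go_right, 2, step))        # [2, 3, 4, 5]
print(rollout(one_step_right, 2, step))  # [2, 3]
```

Both objects are described purely in terms of low-level states and actions, which is the sense in which the option is "defined entirely in low-level terms"; the only difference between them is the set of terminating states.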


