UW-Madison ECE 539 - Reinforcement Learning and the Temporal Difference Algorithm (18 pages)



Pages: 18
School: University of Wisconsin, Madison
Course: ECE 539 - Introduction to Artificial Neural Network and Fuzzy Systems


Reinforcement Learning and the Temporal Difference Algorithm

By John Lenz

1 Introduction to Reinforcement Learning

Reinforcement learning is learning to maximize a reward signal by exploring many possible actions. The agent is not told the correct actions; instead, it explores the possible actions and remembers the reward it receives. With supervised learning, an agent takes an action and is then told what the correct action was. For example, the agent will classify a picture as the number 3, and the teacher will explain that the picture is the number 8. In reinforcement learning, the agent takes an action and then receives a reward based on that action; there is no teacher to give the correct action. In some problems, such as the games of checkers or chess, the correct action isn't even known. Reinforcement learning can be applied to many control problems where there is no expert knowledge about the task.

Reinforcement learning attempts to mimic one of the major ways humans learn. Instead of being told what to do, we learn through experience: in our interaction with the environment we feel pain and pleasure, punishing or rewarding us for our actions. In a similar way, a reinforcement learning agent learns to interact with an unknown and unspecified environment. Reinforcement learning can be applied to any goal-directed, decision-making problem; specific knowledge about the environment and expert teaching are not required.

The reinforcement learning problem model is an agent continuously interacting with an environment. The agent and the environment interact in a sequence of time steps. At each time step t, the agent receives the state of the environment and a scalar numerical reward for the previous action, and then selects an action. A time sequence t = 1, 2, 3, ... can either form an episode, where the state is reset to the initial state and t is reset to 1 after a specific terminal state is reached, or time can continue marching towards infinity to form a continual task.
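The agent-environment interaction described above can be illustrated with a minimal Python sketch of one episode. Everything named here is an assumption made for illustration and does not come from the paper: CorridorEnv is a hypothetical toy environment (a short corridor whose right end is the terminal state), random_agent is a placeholder policy that ignores what it observes, and max_steps is a safety cap added so the loop always ends.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, terminal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start of an episode: the state returns to the initial state.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the left wall clips the position at 0.
        self.state = max(0, self.state + action)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # scalar reward for the action just taken
        return self.state, reward, done


def random_agent(state, reward, rng):
    # Placeholder policy: ignores the state and reward and picks a direction at random.
    return rng.choice([-1, 1])


def run_episode(env, agent, rng, max_steps=1000):
    """Run one episode; return (total reward, number of time steps taken)."""
    state = env.reset()
    reward = 0.0  # no previous action exists at the first time step
    total = 0.0
    t = 1  # t restarts at 1 for every episode
    done = False
    while not done and t <= max_steps:
        # The agent receives the state and the reward for its previous action,
        # then selects the next action.
        action = agent(state, reward, rng)
        state, reward, done = env.step(action)
        total += reward
        t += 1
    return total, t - 1
```

Calling run_episode(CorridorEnv(5), random_agent, random.Random(0)) runs a single episode with the random policy. A continual task would correspond to an environment that never reports a terminal state, so the loop would run without a reset.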


