Berkeley COMPSCI 188 - Apprenticeship Learning for Robotic Control

Apprenticeship Learning for Robotic Control, with Applications to Quadruped Locomotion and Autonomous Helicopter Flight
Pieter Abbeel, UC Berkeley EECS
In collaboration with: Andrew Y. Ng, Adam Coates, J. Zico Kolter, Morgan Quigley, Dmitri Dolgov, Sebastian Thrun.

Overview
- Key idea: learning from demonstrations.
- Concretely: inverse reinforcement learning.
- Has enabled advancing the state of the art in various robotic domains.

Reinforcement learning / Optimal control
[Diagram: the system dynamics P_sa carry state s_0 to s_1, ..., s_{T-1} to s_T under actions a_0, a_1, ..., a_{T-1}, accumulating reward R(s_0) + R(s_1) + ... + R(s_T).]
- Goal: pick actions over time so as to maximize the expected score: E[R(s_0) + R(s_1) + ... + R(s_T)].
- Solution: a controller π, which specifies an action for each possible state, for all times t = 0, 1, ..., T-1.
- Examples: car driving, helicopter flight, legged locomotion; load balancing, pricing, ad placement, ...
- Example task: driving.

Problem setup
- Input:
  - State space, action space.
  - Transition model P_sa(s_{t+1} | s_t, a_t).
  - No reward function.
  - Teacher's demonstration: s_0, a_0, s_1, a_1, s_2, a_2, ... (= trace of the teacher's policy π*).
- Inverse reinforcement learning: can we recover R from the teacher's demonstration?

Applications
- Alleviates the need to specify a reward function, which can be hard in practice (several example applications in this lecture).
- Modeling and understanding of behaviour:
  - Biological behaviour.
  - Multi-agent systems: understand (exploit?!) the other agents.

Inverse reinforcement learning
- Condition for the reward function R to be consistent with the teacher's policy π*:
      E[Σ_{t=0}^{T} R(s_t) | π*]  ≥  E[Σ_{t=0}^{T} R(s_t) | π]   for all π ≠ π*.
- Find the reward function that maximizes the margin by which the teacher outperforms a set of other policies (a minimal sketch of this optimization follows this slide).
- Two technical aspects unaddressed in this lecture:
  - How to generate a good set of alternative policies.
  - How to compute the expected sum of rewards for the teacher's policy (we only have a trace).
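The lecture leaves this optimization abstract. One common way to make it concrete (in the spirit of Abbeel & Ng, 2004, though not spelled out on these slides) is to assume the reward is linear in hand-designed features, R(s) = w·φ(s), so that expected sums of rewards reduce to "feature expectations" and the margin maximization becomes a small linear program. The sketch below illustrates only that assumption; the helper names and the use of SciPy's linprog are choices made here, not the lecture's actual implementation.

    import numpy as np
    from scipy.optimize import linprog

    def feature_expectations(states, phi):
        """Empirical sum of reward features phi(s) along one demonstrated trace."""
        return np.sum([phi(s) for s in states], axis=0)

    def max_margin_reward_weights(mu_expert, mu_alternatives):
        """Find weights w (with R(s) = w . phi(s) and |w_i| <= 1) that maximize the
        margin t such that w . mu_expert >= w . mu_pi + t for every alternative policy."""
        mu_expert = np.asarray(mu_expert, dtype=float)
        d = len(mu_expert)
        # Decision variables x = [w_1, ..., w_d, t]; maximizing t == minimizing -t.
        c = np.zeros(d + 1)
        c[-1] = -1.0
        # One inequality per alternative policy:  -(mu_expert - mu_pi) . w + t <= 0.
        A_ub = np.array([np.append(-(mu_expert - np.asarray(mu_pi, dtype=float)), 1.0)
                         for mu_pi in mu_alternatives])
        b_ub = np.zeros(len(mu_alternatives))
        # Box-bound the weights so the margin cannot be inflated by rescaling w.
        bounds = [(-1.0, 1.0)] * d + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        return res.x[:d], res.x[-1]  # learned weights and achieved margin

Here mu_expert would be estimated from the teacher's trace and each mu_pi from rollouts of an alternative policy in the simulator; in the highway-driving example below, φ(s) would collect the lane/shoulder and car-presence features.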
Related work to Abbeel and Ng, 2004
- Prior work: behavioral cloning; utility elicitation / inverse reinforcement learning (Ng & Russell, 2000).
- Closely related later work: Ratliff et al., 2006, 2007; Neu & Szepesvari, 2007; Ramachandran & Amir, 2007; Syed & Schapire, 2008; Ziebart et al., 2008; ...
- Work on a specialized reward function: trajectories. E.g., Atkeson & Schaal, 1997.

Highway driving
[Videos: the teacher in the training world; the learned policy in the testing world.]
- Input:
  - Dynamics model / simulator P_sa(s_{t+1} | s_t, a_t).
  - Teacher's demonstration: 1 minute in the "training world".
  - Note: R* is unknown.
- Reward features: 5 features corresponding to lanes/shoulders; 10 features corresponding to the presence of another car in the current lane at different distances.

More driving examples
[Videos: in each, the left sub-panel shows a demonstration of a different driving "style", and the right sub-panel shows the behavior learned from watching that demonstration.]

Parking lot navigation [Abbeel et al., IROS 08]
- Reward function trades off: staying "on-road", forward vs. reverse driving, amount of switching between forward and reverse, lane keeping, on-road vs. off-road, curvature of paths.

Experimental setup
- Demonstrate parking lot navigation on "train parking lots".
- Run our apprenticeship learning algorithm to find the reward function.
- Receive a "test parking lot" map plus a starting point and destination.
- Find the trajectory that maximizes the learned reward function for navigating the test parking lot.
[Videos: nice driving style; sloppy driving style; "don't mind reverse" driving style.]

Quadruped [Kolter, Abbeel & Ng, 2008]
- Reward function trades off 25 features.

Experimental setup
- Demonstrate a path across the "training terrain".
- Run our apprenticeship learning algorithm to find the reward function.
- Receive the "testing terrain" --- a height map.
- Find the optimal policy with respect to the learned reward function for crossing the testing terrain.
[Videos: without learning; with the learned reward function.]

Recap
- Key idea: learning from demonstrations.
- Concretely: inverse reinforcement learning.
- Has enabled advancing the state of the art in various robotic domains.

Remainder of lecture: application to extreme helicopter flight
- How helicopter dynamics work.
- The autonomous helicopter setup.
- Application of inverse RL to autonomous helicopter flight.

Helicopter dynamics
- 4 control inputs: main rotor collective pitch, main rotor cyclic pitch (roll and pitch), tail rotor collective pitch.

Autonomous helicopter setup
- On-board inertial measurement unit (IMU) data and position data feed (1) a Kalman filter and then (2) a feedback controller, which sends controls out to the helicopter.

Related work
- Bagnell & Schneider, 2001; LaCivita, Papageorgiou, Messner & Kanade, 2002; Ng, Kim, Jordan & Sastry, 2004a (2001); Roberts, Corke & Buskey, 2003; Saripalli, Montgomery & Sukhatme, 2003; Shim, Chung, Kim & Sastry, 2003; Doherty et al., 2004; Gavrilets, Martinos, Mettler & Feron, 2002; Ng et al., 2004b.
- The maneuvers presented here are significantly more challenging and more diverse than those performed by any other autonomous helicopter.

Experimental setup for helicopter
1. Our expert pilot demonstrates the airshow several times. [Video: demonstrations.]
2. Learn (A) a reward function --- a target trajectory --- and (B) a dynamics model. [Video: learned reward (trajectory).]
3. Find the optimal control policy for the learned reward and dynamics model.
4. Autonomously fly the airshow.
5. Learn an improved dynamics model; go back to step 3. (A pseudocode sketch of this loop follows below.)

Thank you.
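The five steps above amount to a simple alternation between model learning and control. Below is a minimal sketch of that loop, assuming nothing beyond the slide's outline; every callable passed in (learn_target_trajectory, learn_dynamics_model, optimal_controller, fly_autonomously) is a hypothetical placeholder, not the lecture's actual code.

    def apprenticeship_flight_loop(demonstrations,
                                   learn_target_trajectory,
                                   learn_dynamics_model,
                                   optimal_controller,
                                   fly_autonomously,
                                   n_rounds=5):
        """Iterate the helicopter setup: learn a target trajectory and a dynamics
        model from the pilot's demonstrations, then repeatedly optimize a controller,
        fly it, and refine the dynamics model with the newly collected flight data."""
        target_trajectory = learn_target_trajectory(demonstrations)   # step 2A: reward = trajectory
        dynamics_model = learn_dynamics_model(demonstrations)         # step 2B
        flight_data = list(demonstrations)
        controller = None
        for _ in range(n_rounds):
            controller = optimal_controller(target_trajectory, dynamics_model)  # step 3
            new_flight = fly_autonomously(controller)                           # step 4
            flight_data.append(new_flight)
            dynamics_model = learn_dynamics_model(flight_data)                  # step 5, then repeat
        return controller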

