Intelligent control through learning and optimization
Emo Todorov
Applied Mathematics and Computer Science & Engineering
University of Washington
AMATH/CSE 579, Winter 2014

Administrative information
No fixed office hours or TA.
Feel free to contact me with any questions or to set up a meeting: email todorov@cs.washington.edu, office CSE 422.
Grading will be based on 3 homeworks and one paper presentation.
Materials will be posted on the class website: http://www.cs.washington.edu/homes/todorov/courses/amath579

Background and readings
Expected background: linear algebra, vector calculus, basic probability theory, Matlab programming.
Helpful background: numerical optimization, stochastic processes, dynamical systems, ODEs and PDEs.

There is no suitable textbook, but here are some useful books:
Sutton and Barto (1998), Reinforcement Learning: An Introduction (available online)
Bertsekas and Tsitsiklis (1996), Neuro-Dynamic Programming
Bertsekas (2000), Dynamic Programming and Optimal Control
Stengel (1994), Optimal Control and Estimation

General-purpose readings available on the class website:
Lecture notes, Pieter Abbeel (Berkeley)
Lecture notes (book in preparation), Russ Tedrake (MIT)
Lecture slides, Dimitri Bertsekas (MIT)
Lecture notes, Ben Van Roy (Stanford)
Book chapter on optimal control, Emo Todorov (UW)

Why learning and optimization?
It is good to be optimal. The real question is how to get there.
This is how nature works, and it produces much more intelligent controllers than anything we have ever built.
We are not smart enough to manually design really complex control systems, and analytical solutions (optimal or not) are generally lacking. Indeed, the vast majority of control systems used in industry are PID controllers.
If we are going to use a computer to design a control system, casting the problem in terms of numerical optimization makes sense.
Learning and optimization have already produced very impressive control systems.
The advent of massively parallel processors makes it possible to re-optimize control systems in real time, which may turn out to be the only way to solve really hard control problems.

The big picture
[Block diagram: controller, control signal, dynamical system (plant), noise, sensor data, state estimator, efference copy, state estimate]
We will focus on problems where the controller has direct access to the state.

Example: linear-quadratic-Gaussian (LQG) system
dynamical system: $x_{t+1} = A x_t + B u_t + \xi_t$
sensor data: $y_t = H x_t + \omega_t$
optimal estimator: $\hat{x}_{t+1} = A \hat{x}_t + B u_t + K (y_t - H \hat{x}_t)$
optimal controller: $u_t = -L \hat{x}_t$

Direct and indirect methods
Direct methods:
1. Choose a parametric form of the control law.
2. Implement a function which can evaluate the performance of any control law, usually by extensive simulation.
3. Optimize this function with respect to the control-law parameters using generic optimization tools.
Robust, but often slow.

Indirect methods:
1. Choose a parametric form of the value function.
2. Solve the equation which this function is supposed to satisfy (Bellman equation, Pontryagin maximum principle).
3. Derive the corresponding control law by greedy optimization of the value function (often analytically).
Fast, but not always robust.

Local and global methods
Local methods:
1. Represent the solution along a trajectory.
2. Find a direction of improvement in trajectory space and do a line search.
No feedback (but see MPC).

Global methods:
1. Represent the solution as a globally defined function.
2. Improve the solution globally by modifying the function.
Curse of dimensionality.

Specific methods
Local, direct: space-time optimization, $L(x_0, x_1, \dots, x_N)$
Local, indirect: DDP, $v_t(x) \approx c_t + x^{T} S_t x$
Global, direct: policy gradient, $E\left[\sum_t \ell(x_t, u_t)\right]$ with $u_t = \pi(x_t, w)$
Global, indirect: dynamic programming, $v_t(x) = \min_u \{\ell(x, u) + v_{t+1}(x')\}$

Optimal control in the context of optimization
Generic optimization problem: $\min_w L(w)$
Learning problem: $\min_w L(w)$, where $L(w) = \frac{1}{N}\sum_{n=1}^{N} \ell(y_n, f(x_n, w))$
Optimal control problem: $\min_w L(w)$, where $L(w) = \frac{1}{N}\sum_{t=1}^{N} \ell(x_t, w)$ and $x_{t+1} = f(x_t, w)$

Part I: Stochastic optimal control theory
Stochastic optimal control in discrete space and time:
  Markov decision processes (MDPs)
  Bellman equations for different problem formulations
  Policy iteration and value iteration; contraction mappings
  Linear programming view of MDPs
Stochastic optimal control in continuous space and time:
  Controlled diffusions
  Numerical discretization
  Hamilton-Jacobi-Bellman equations
  Exact solutions for linear-quadratic-Gaussian (LQG) problems
Stochastic optimal control problems with linear Bellman equations:
  Discrete and continuous problem classes with linear Hamilton-Jacobi-Bellman equations
  Compositionality of optimal control laws
  Embedding of generic optimal control problems
  Path integrals

Part II: Numerical methods for optimal control
Robot dynamics and control:
  Multi-joint kinematics and dynamics
  Dynamics in the presence of frictional contacts
  Computed torque control, PID control, sliding mode control
  Hierarchical and operational space control
Approximate dynamic programming and reinforcement learning:
  Dynamic programming with function approximation
  Function approximation in the linear Bellman equation framework
  Monte Carlo methods, temporal difference methods, TD(λ)
  Off-policy and on-policy methods, Q-learning and Sarsa
  Policy gradient methods
Local (trajectory-based) methods for optimal control:
  Maximum principle for deterministic and stochastic systems
  ODE methods, pseudo-spectral methods, space-time optimization
  Differential dynamic programming and iterative LQG
  Model predictive and receding horizon control, rollout policies

Part III: Other topics
Data-driven methods:
  Using motion capture to design controllers
  Imitation learning
  Unsupervised learning from motion
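The LQG example on "The big picture" slide (plant, noisy sensor data, estimator with gain $K$, controller $u_t = -L\hat{x}_t$) can be made concrete in the scalar case. This is a minimal sketch under stated assumptions, not code from the course: $A$, $B$, $H$ become scalars `a`, `b`, `h`; the per-step cost is assumed to be $q x_t^2 + r u_t^2$; the plant and sensor noise are assumed Gaussian with variances `w_var` and `v_var`; and $L$ and $K$ are obtained by iterating the standard scalar Riccati recursions (control and one-step-predictor forms) to steady state. All function names are hypothetical.

```python
import math
import random


def riccati_gain(a, b, q, r, iters=1000):
    """Steady-state LQR gain l for x' = a x + b u with per-step cost q x^2 + r u^2."""
    s = q
    for _ in range(iters):
        s = q + a * a * s - (a * b * s) ** 2 / (r + b * b * s)
    return a * b * s / (r + b * b * s)


def kalman_gain(a, h, w_var, v_var, iters=1000):
    """Steady-state one-step-predictor gain k for x' = a x + xi, y = h x + omega."""
    p = w_var
    for _ in range(iters):
        p = a * a * p + w_var - (a * p * h) ** 2 / (h * h * p + v_var)
    return a * p * h / (h * h * p + v_var)


def simulate_lqg(a, b, h, q, r, w_var, v_var, steps=10000, seed=0):
    """Run the estimator/controller loop and return the average per-step cost."""
    rng = random.Random(seed)
    l = riccati_gain(a, b, q, r)
    k = kalman_gain(a, h, w_var, v_var)
    x, x_hat, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        y = h * x + rng.gauss(0.0, math.sqrt(v_var))          # sensor data y_t
        u = -l * x_hat                                          # control law u_t = -L x_hat_t
        total += q * x * x + r * u * u
        x_hat = a * x_hat + b * u + k * (y - h * x_hat)         # estimator update
        x = a * x + b * u + rng.gauss(0.0, math.sqrt(w_var))    # plant update
    return total / steps
```

For $a = b = h = q = r = 1$ with unit noise variances, both `riccati_gain(1, 1, 1, 1)` and `kalman_gain(1, 1, 1, 1)` converge to $(\sqrt{5} - 1)/2 \approx 0.618$, and `simulate_lqg` returns the average cost of the resulting closed loop over one seeded noise realization.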
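The three steps of a direct method on the "Direct and indirect methods" slide can be sketched on a toy problem. This is an illustrative sketch, not the course's method: it assumes a scalar deterministic plant $x_{t+1} = a x_t + b u_t$, a linear control law $u = -w x$ as the parametric form, a quadratic cost, and golden-section search as the "generic optimization tool". These choices and the function names are hypothetical.

```python
import math


def rollout_cost(w, a=1.0, b=1.0, q=1.0, r=1.0, x0=1.0, horizon=200):
    """Step 2: evaluate the control law u = -w x by simulating the plant."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = -w * x                     # step 1: parametric control law
        total += q * x * x + r * u * u
        x = a * x + b * u
    return total


def golden_section(f, lo, hi, tol=1e-6):
    """Step 3: generic derivative-free minimizer for a unimodal 1-D function."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:                    # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                          # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


w_star = golden_section(rollout_cost, 0.0, 2.0)
```

With the default $a = b = q = r = 1$, `w_star` should come out close to $(\sqrt{5} - 1)/2 \approx 0.618$, the gain an indirect method would obtain from the Riccati equation for the value function. The direct route never touches the value function; it only needs the ability to simulate, which is why it is robust but can be slow when each evaluation is expensive.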
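The two steps of a local method on the "Local and global methods" slide (represent the solution along a trajectory, then find a direction of improvement and do a line search) can be sketched as gradient descent on an open-loop control sequence. This sketch assumes a scalar linear plant and a quadratic cost so the answer can be checked. The gradient comes from a backward (adjoint) pass, and the line search is a standard Armijo backtracking rule. It is plain gradient descent, not DDP or any specific method from the course, and all names and parameters are hypothetical.

```python
def rollout(u, x0, a, b):
    """States x_0..x_N produced by the control sequence u_0..u_{N-1}."""
    xs = [x0]
    for ut in u:
        xs.append(a * xs[-1] + b * ut)
    return xs


def trajectory_cost(u, x0, a=1.0, b=1.0, q=1.0, r=1.0, qf=1.0):
    """L(u) = sum_t (q x_t^2 + r u_t^2) + qf x_N^2 along the rollout."""
    xs = rollout(u, x0, a, b)
    running = sum(q * x * x + r * ut * ut for x, ut in zip(xs, u))
    return running + qf * xs[-1] ** 2


def cost_gradient(u, x0, a=1.0, b=1.0, q=1.0, r=1.0, qf=1.0):
    """Gradient of L with respect to u, computed by a backward (adjoint) pass."""
    xs = rollout(u, x0, a, b)
    lam = 2.0 * qf * xs[-1]                   # dL/dx_N
    grad = [0.0] * len(u)
    for t in reversed(range(len(u))):
        grad[t] = 2.0 * r * u[t] + b * lam    # dL/du_t, with lam = dL/dx_{t+1}
        lam = 2.0 * q * xs[t] + a * lam       # dL/dx_t
    return grad


def optimize_trajectory(x0, horizon, iters=10000, tol=1e-10, **model):
    """Gradient descent on the control sequence with Armijo backtracking."""
    u = [0.0] * horizon
    for _ in range(iters):
        g = cost_gradient(u, x0, **model)
        g2 = sum(gi * gi for gi in g)
        if g2 < tol:                          # squared gradient norm is small: stop
            break
        f0, step = trajectory_cost(u, x0, **model), 1.0
        while True:                           # line search along -g
            cand = [ui - step * gi for ui, gi in zip(u, g)]
            if trajectory_cost(cand, x0, **model) <= f0 - 0.5 * step * g2:
                break
            step *= 0.5
        u = cand
    return u
```

`optimize_trajectory(1.0, 10)` returns ten controls whose cost agrees with the finite-horizon Riccati solution for the same problem. The output is a control sequence, not a feedback law, which is the "no feedback" limitation noted on the slide; re-solving from the current state at every step is the MPC remedy.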
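The dynamic programming entry (global, indirect) on the "Specific methods" slide, a recursion of the form $v_t(x) = \min_u \{\ell(x, u) + v_{t+1}(x')\}$, can be sketched as a backward sweep over a small discrete state space. This is a minimal sketch under assumptions not taken from the slides: a deterministic next-state function `step`, a finite action set, and a terminal cost `terminal` that initializes $v_T$. The toy problem (a 1-D grid with a goal state) and all names are hypothetical.

```python
def finite_horizon_dp(n_states, actions, step, cost, terminal, horizon):
    """Backward recursion v_t(x) = min_u { cost(x, u) + v_{t+1}(step(x, u)) }."""
    v = [terminal(x) for x in range(n_states)]   # v_T(x) = terminal cost
    policies = []
    for _ in range(horizon):
        new_v, policy = [], []
        for x in range(n_states):
            # Greedy minimization over actions, using the value at time t+1.
            best_u = min(actions, key=lambda u: cost(x, u) + v[step(x, u)])
            policy.append(best_u)
            new_v.append(cost(x, best_u) + v[step(x, best_u)])
        v = new_v
        policies.append(policy)
    policies.reverse()                           # policies[t] is the policy at time t
    return v, policies                           # v is now v_0


# Hypothetical toy problem: states 0..4 on a line; move left, stay, or move right.
GOAL = 4


def step(x, u):
    return min(max(x + u, 0), 4)


def cost(x, u):
    return 0 if x == GOAL else 1                 # one unit per step spent away from the goal


def terminal(x):
    return 0 if x == GOAL else 100               # large penalty for missing the goal


v0, policies = finite_horizon_dp(5, (-1, 0, 1), step, cost, terminal, horizon=10)
```

Here `v0 == [4, 3, 2, 1, 0]` (the number of steps to the goal) and the first-step policy is `[1, 1, 1, 1, 0]`. The value function is tabulated over every state, which is what makes the method global and what runs into the curse of dimensionality as the state space grows.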