CS 287: Advanced Robotics, Fall 2009
Lecture 6: Control 5: Optimal control --- [Function approximation in dynamic programming --- special case: quadratic]
Pieter Abbeel, UC Berkeley EECS

Announcements

- Will there be lecture this Thursday (Sept 24)? Yes.
- No office hours this Thursday (as I am examining students for prelims). Feel free to schedule an appointment by email instead.
- Final project contents: an original investigation into an area that relates sufficiently closely to the course.
  - Could be an algorithmic/theoretical idea.
  - Could be an application of existing algorithm(s) to a platform or domain in which these algorithms carry promise but have not yet been applied.
  - Alternatively: a significant improvement to an existing (or new) assignment for this course, or to an existing (or new) assignment for 188 that has close ties to this course.
- Ideally, we are able to identify a topic that relates both to your ongoing PhD research and to the course.
- You are very welcome to come up with your own project ideas, but make sure to pass them by me **before** you submit your abstract. Feel free to stop by office hours or set up an appointment (via email) to discuss potential projects.
- Final project logistics:
  - Final result: a 6-8 page paper, structured like a conference paper, i.e., focused on the problem setting, why it matters, what is interesting/unsolved about it, your approach, results, analysis, and so forth. Cite and briefly survey prior work as appropriate, but don't re-describe prior work that is not directly relevant to understanding your approach.
  - Milestones:
    - Oct 9, 23:59: **approved-by-me** abstracts due: a 1-page description of the project + goals for the milestone. Make sure to sync up with me before then!
    - Nov 9, 23:59: 1-page milestone report due.
    - Dec 3: in-class project presentations [tentative].
    - Dec 11, 23:59: final paper due.
  - 1 or 2 students per project. If two students work on one final project, I will expect twice as much research effort to have gone into it!

Bellman's curse of dimensionality

- In an n-dimensional state space, the number of discretized states grows exponentially in n.
- In practice, discretization is considered computationally feasible only up to 5- or 6-dimensional state spaces, even when using variable-resolution discretization and very fast implementations.

Today

- The Linear Quadratic (LQ) setting --- a special case in which the continuous optimal control problem can be solved exactly.
- Great reference: [optional] Anderson and Moore, Linear Quadratic Methods --- the standard reference for the LQ setting.

Linear Quadratic Regulator (LQR)

The LQR setting assumes a linear dynamical system

    x_{t+1} = A x_t + B u_t,

where x_t is the state at time t and u_t is the input at time t, and a quadratic cost function

    g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t

with Q ≻ 0 and R ≻ 0. (For a square matrix X we have X ≻ 0 if and only if z^T X z > 0 for all non-zero vectors z.) Hence there is a non-zero cost for any state different from the all-zeros state and for any input different from the all-zeros input.

While the LQ assumptions might seem very restrictive, we will see that the method can be made applicable to non-linear systems, e.g., a helicopter.

Value iteration

The back-up step for i+1 steps to go, specialized to the LQR setting:

    J_{i+1}(x) = min_u [ x^T Q x + u^T R u + γ J_i(A x + B u) ]

LQR value iteration: J_1

Initialize J_0(x) = x^T P_0 x. Then (taking γ = 1):

    J_1(x) = min_u [ x^T Q x + u^T R u + J_0(A x + B u) ]
           = min_u [ x^T Q x + u^T R u + (A x + B u)^T P_0 (A x + B u) ]    (1)

To find the minimum over u, we set the gradient w.r.t. u equal to zero:

    ∇_u [...] = 2 R u + 2 B^T P_0 (A x + B u) = 0,

hence:

    u = -(R + B^T P_0 B)^{-1} B^T P_0 A x.    (2)

Substituting (2) into (1) gives J_1(x) = x^T P_1 x, for:

    K_1 = -(R + B^T P_0 B)^{-1} B^T P_0 A
    P_1 = Q + K_1^T R K_1 + (A + B K_1)^T P_0 (A + B K_1)

LQR value iteration: J_1 (ctd)

In summary, for the system x_{t+1} = A x_t + B u_t with cost g(x, u) = x^T Q x + u^T R u and J_0(x) = x^T P_0 x:

- J_1(x) is quadratic, just like J_0(x).
- The value iteration update is the same for all times and can be done in closed form! For example, the next back-up gives J_2(x) = x^T P_2 x, for:

    K_2 = -(R + B^T P_1 B)^{-1} B^T P_1 A
    P_2 = Q + K_2^T R K_2 + (A + B K_2)^T P_1 (A + B K_2)

Value iteration solution to LQR

Set P_0 = 0. For i = 1, 2, 3, ...:

    K_i = -(R + B^T P_{i-1} B)^{-1} B^T P_{i-1} A
    P_i = Q + K_i^T R K_i + (A + B K_i)^T P_{i-1} (A + B K_i)

The optimal policy for an i-step horizon is π(x) = K_i x, and the cost-to-go function for an i-step horizon is J_i(x) = x^T P_i x.
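This recursion is straightforward to implement directly. Below is a minimal NumPy sketch of it; the function name, the default iteration count, and the double-integrator example system are illustrative assumptions, not from the lecture.

```python
import numpy as np

def lqr_value_iteration(A, B, Q, R, n_iters=200):
    """Finite-horizon LQR via the value-iteration (Riccati) recursion above.

    Starts from P_0 = 0 and returns the gains K_i and cost matrices P_i
    for i = 1, ..., n_iters.
    """
    P = np.zeros_like(A)  # P_0 = 0
    Ks, Ps = [], []
    for _ in range(n_iters):
        # K_i = -(R + B^T P_{i-1} B)^{-1} B^T P_{i-1} A
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_i = Q + K_i^T R K_i + (A + B K_i)^T P_{i-1} (A + B K_i)
        A_cl = A + B @ K
        P = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
        Ks.append(K)
        Ps.append(P)
    return Ks, Ps

# Hypothetical example: a discretized double integrator.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.0],
              [dt]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
Ks, Ps = lqr_value_iteration(A, B, Q, R)
u = Ks[-1] @ np.array([1.0, 0.0])  # i-step-horizon policy: u = K_i x
```

Under standard conditions (e.g., (A, B) stabilizable), K_i and P_i converge as i grows, so the last iterate is a reasonable stand-in for the infinite-horizon gain.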
LQR assumptions revisited

    x_{t+1} = A x_t + B u_t
    g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t

That is, LQR as formulated so far is for keeping a linear system at the all-zeros state.

Extensions which make it more generally applicable:
- Affine systems (Ext0)
- Systems with stochasticity (Ext1)
- Regulation around a non-zero fixed point for non-linear systems (Ext2)
- Penalization for change in control inputs (Ext3)
- Linear time-varying (LTV) systems
- Trajectory following for non-linear systems

LQR Ext0: affine systems

    x_{t+1} = A x_t + B u_t + c
    g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t

- The optimal control policy remains linear, and the optimal cost-to-go function remains quadratic.
- Two avenues to do the derivation:
  1. Work through the DP update as we did for the standard setting.
  2. Redefine the state as z_t = [x_t; 1]; then we have (a code sketch of this augmentation follows Ext1 below):

    z_{t+1} = [x_{t+1}; 1] = [[A, c]; [0, 1]] [x_t; 1] + [B; 0] u_t = A' z_t + B' u_t

LQR Ext1: stochastic system

    x_{t+1} = A x_t + B u_t + w_t
    g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t

where w_t, t = 0, 1, ..., are zero mean and independent.

- Exercise: work through a derivation similar to the one for the deterministic case.
- Result:
  - Same optimal control policy.
  - The cost-to-go function is almost identical: it has one additional term, which depends on the variance of the noise (and which cannot be influenced by the choice of control inputs).
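To make avenue 2 of Ext0 concrete, here is a minimal sketch of the state augmentation, reusing lqr_value_iteration from above; the system matrices are made-up numbers. Note that the augmented "1" coordinate is given zero state cost, so Q_aug is only positive semi-definite; the recursion still runs because R ≻ 0.

```python
# Hypothetical affine system x_{t+1} = A x_t + B u_t + c.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
c = np.array([0.0, -0.05])  # constant drift term

# Augmented state z_t = [x_t; 1]:  z_{t+1} = A_aug z_t + B_aug u_t.
n = A.shape[0]
A_aug = np.block([[A, c.reshape(-1, 1)],
                  [np.zeros((1, n)), np.ones((1, 1))]])
B_aug = np.vstack([B, np.zeros((1, B.shape[1]))])
Q_aug = np.block([[np.eye(n), np.zeros((n, 1))],
                  [np.zeros((1, n)), np.zeros((1, 1))]])

Ks, _ = lqr_value_iteration(A_aug, B_aug, Q_aug, 0.1 * np.eye(1))
x = np.array([1.0, 0.0])
u = Ks[-1] @ np.append(x, 1.0)  # linear policy in z = [x; 1], affine in x
```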
LQR Ext2: non-linear systems

Nonlinear system: x_{t+1} = f(x_t, u_t). We can keep the system at the state x* iff

    ∃ u* s.t. x* = f(x*, u*).

Linearizing the dynamics around x* gives:

    x_{t+1} ≈ f(x*, u*) + ∂f/∂x(x*, u*) (x_t - x*) + ∂f/∂u(x*, u*) (u_t - u*)

Equivalently, with A = ∂f/∂x(x*, u*) and B = ∂f/∂u(x*, u*):

    x_{t+1} - x* ≈ A (x_t - x*) + B (u_t - u*)

Let z_t = x_t - x* and v_t = u_t - u*; then:

    z_{t+1} = A z_t + B v_t,  cost = z_t^T Q z_t + v_t^T R v_t    [= standard LQR]

    v_t = K z_t  ⇒  u_t - u* = K (x_t - x*)  ⇒  u_t = u* + K (x_t - x*)
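A small end-to-end sketch of Ext2, reusing lqr_value_iteration from above. The pendulum dynamics, the finite-difference Jacobians, and all names here are illustrative assumptions, not from the lecture.

```python
def linearize(f, x_star, u_star, eps=1e-5):
    """Finite-difference Jacobians A = ∂f/∂x and B = ∂f/∂u at (x*, u*)."""
    n, m = len(x_star), len(u_star)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for j in range(n):
        dx = np.zeros(n); dx[j] = eps
        A[:, j] = (f(x_star + dx, u_star) - f(x_star - dx, u_star)) / (2 * eps)
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(x_star, u_star + du) - f(x_star, u_star - du)) / (2 * eps)
    return A, B

# Hypothetical example: keep an inverted pendulum upright.
# State x = [theta, omega]; fixed point x* = 0 with u* = 0.
def f(x, u, dt=0.05, g=9.8, l=1.0):
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * ((g / l) * np.sin(theta) + u[0])])

x_star, u_star = np.zeros(2), np.zeros(1)
A, B = linearize(f, x_star, u_star)
Ks, _ = lqr_value_iteration(A, B, np.eye(2), 0.1 * np.eye(1))
K = Ks[-1]

x = np.array([0.2, 0.0])  # start slightly off the fixed point
for t in range(100):
    u = u_star + K @ (x - x_star)  # u_t = u* + K (x_t - x*)
    x = f(x, u)
```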
LQR Ext3: penalize for change in control inputs

- Standard LQR, when run in this form on real systems, often generates high-frequency control inputs. This is typically highly undesirable and results in poor control performance. Why?
- Solution: frequency shaping of the cost function. (See, e.g., Anderson and Moore.)
- A simple special case which works well in practice: