CS 287: Advanced Robotics (Fall 2009)
Lecture 6: Control 5: Optimal control --- [Function approximation in dynamic programming --- special case: quadratic]
Pieter Abbeel, UC Berkeley EECS

Announcements

- Will there be lecture this Thursday (Sept 24)? Yes.
- No office hours this Thursday (as I am examining students for prelims). Feel free to schedule an appointment by email instead.

Announcements: final project contents

- An original investigation into an area that relates sufficiently closely to the course. This could be an algorithmic/theoretical idea, or an application of existing algorithm(s) to a platform or domain in which these algorithms carry promise but have not yet been applied.
- Alternatively: a significant improvement to an existing (or new) assignment for this course, or for an existing (or new) assignment for CS 188 which has close ties to this course.
- Ideally, we are able to identify a topic that relates both to your ongoing PhD research and to the course.
- You are very welcome to come up with your own project ideas, but make sure to pass them by me **before** you submit your abstract. Feel free to stop by office hours or set up an appointment (via email) to discuss potential projects.

Announcements: final project logistics

- Final result: a 6-8 page paper. It should be structured like a conference paper, i.e., focus on the problem setting, why it matters, what is interesting/unsolved about it, your approach, results, analysis, and so forth. Cite and briefly survey prior work as appropriate, but don't re-write prior work that is not directly relevant to understanding your approach.
- Milestones:
  - Oct 9, 23:59: **approved-by-me** abstracts due: 1-page description of the project plus goals for the milestone. Make sure to sync up with me before then!
  - Nov 9, 23:59: 1-page milestone report due.
  - Dec 3: in-class project presentations [tentatively].
  - Dec 11, 23:59: final paper due.
- 1 or 2 students per project. If there are two students on one final project, I will expect twice as much research effort to have gone into it!

Bellman's curse of dimensionality

- For an n-dimensional state space, the number of discretized states grows exponentially in n.
- In practice, discretization is considered computationally feasible only up to 5- or 6-dimensional state spaces, even when using variable-resolution discretization and very fast implementations.

Today

- The linear quadratic (LQ) setting --- a special case in which we can solve the continuous optimal control problem exactly.
- Great reference: [optional] Anderson and Moore, Linear Quadratic Methods --- the standard reference for the LQ setting.

Linear Quadratic Regulator (LQR)

The LQR setting assumes a linear dynamical system

  x_{t+1} = A x_t + B u_t,

where x_t is the state at time t and u_t is the input at time t, and a quadratic cost function

  g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t

with Q ≻ 0, R ≻ 0. (For a square matrix X we have X ≻ 0 if and only if z^T X z > 0 for all non-zero vectors z. Hence there is a non-zero cost for any state different from the all-zeros state, and for any input different from the all-zeros input.)

While the LQ assumptions might seem very restrictive, we will see that the method can be made applicable to non-linear systems, e.g., a helicopter.

Value iteration

The back-up step for i+1 steps to go in the LQR setting is:

  J_{i+1}(x) = min_u [ x^T Q x + u^T R u + γ J_i(A x + B u) ].

LQR value iteration: J_1

Initialize J_0(x) = x^T P_0 x. Then (taking γ = 1):

  J_1(x) = min_u [ x^T Q x + u^T R u + J_0(A x + B u) ]
         = min_u [ x^T Q x + u^T R u + (A x + B u)^T P_0 (A x + B u) ].   (1)

To find the minimum over u, we set the gradient w.r.t. u equal to zero:

  ∇_u [...] = 2 R u + 2 B^T P_0 (A x + B u) = 0,

hence:

  u = -(R + B^T P_0 B)^{-1} B^T P_0 A x.   (2)

Substituting (2) into (1) gives J_1(x) = x^T P_1 x, for:

  K_1 = -(R + B^T P_0 B)^{-1} B^T P_0 A,
  P_1 = Q + K_1^T R K_1 + (A + B K_1)^T P_0 (A + B K_1).

LQR value iteration: J_1 (ctd)

In summary: J_1(x) is quadratic, just like J_0(x), and the value iteration update is the same for all times and can be done in closed form! One more back-up gives J_2(x) = x^T P_2 x, for:

  K_2 = -(R + B^T P_1 B)^{-1} B^T P_1 A,
  P_2 = Q + K_2^T R K_2 + (A + B K_2)^T P_1 (A + B K_2).

Value iteration solution to LQR

Set P_0 = 0. For i = 1, 2, 3, ...:

  K_i = -(R + B^T P_{i-1} B)^{-1} B^T P_{i-1} A,
  P_i = Q + K_i^T R K_i + (A + B K_i)^T P_{i-1} (A + B K_i).

The optimal policy for an i-step horizon is given by π(x) = K_i x, and the cost-to-go function for an i-step horizon by J_i(x) = x^T P_i x.
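Since each back-up is closed form, the whole recursion is a few lines of matrix algebra. Below is a minimal numpy sketch of the recursion exactly as stated above; the function name lqr_value_iteration and the convention of returning lists of gains and cost matrices are illustrative choices, not from the slides.

```python
import numpy as np

def lqr_value_iteration(A, B, Q, R, horizon):
    """Value-iteration solution to LQR: set P_0 = 0, then iterate the
    closed-form back-up. Returns gains K_1..K_h and cost matrices
    P_1..P_h, where J_i(x) = x^T P_i x and pi_i(x) = K_i x."""
    n = A.shape[0]
    P = np.zeros((n, n))                  # P_0 = 0
    Ks, Ps = [], []
    for _ in range(horizon):
        # K_i = -(R + B^T P_{i-1} B)^{-1} B^T P_{i-1} A
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_i = Q + K_i^T R K_i + (A + B K_i)^T P_{i-1} (A + B K_i)
        P = Q + K.T @ R @ K + (A + B @ K).T @ P @ (A + B @ K)
        Ks.append(K)
        Ps.append(P)
    return Ks, Ps
```

For an i-step horizon, the policy is u = Ks[i-1] @ x; as the horizon grows, K_i and P_i typically converge to the stationary, infinite-horizon solution.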
LQR assumptions revisited

The LQR setting,

  x_{t+1} = A x_t + B u_t,
  g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t,

amounts to keeping a linear system at the all-zeros state. Extensions which make it more generally applicable:

- affine systems
- systems with stochasticity
- regulation around a non-zero fixed point for non-linear systems
- penalization for change in control inputs
- linear time-varying (LTV) systems
- trajectory following for non-linear systems

LQR Ext0: affine systems

  x_{t+1} = A x_t + B u_t + c,
  g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t.

The optimal control policy remains linear, and the optimal cost-to-go function remains quadratic. There are two avenues to do the derivation:

1. Work through the DP update as we did for the standard setting.
2. Redefine the state as z_t = [x_t; 1]; then we have

  z_{t+1} = [x_{t+1}; 1] = [A c; 0 1] [x_t; 1] + [B; 0] u_t = A' z_t + B' u_t.

LQR Ext1: stochastic systems

  x_{t+1} = A x_t + B u_t + w_t,
  g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t,

where w_t, t = 0, 1, ..., are zero mean and independent. Exercise: work through a derivation similar to the deterministic case. Result:

- The optimal control policy is the same.
- The cost-to-go function is almost identical: it has one additional term which depends on the variance in the noise (and which cannot be influenced by the choice of control inputs).

LQR Ext2: non-linear systems

Consider a non-linear system x_{t+1} = f(x_t, u_t). We can keep the system at the state x* iff

  ∃ u* s.t. x* = f(x*, u*).

Linearizing the dynamics around x* gives:

  x_{t+1} ≈ f(x*, u*) + (∂f/∂x)(x*, u*) (x_t - x*) + (∂f/∂u)(x*, u*) (u_t - u*).

Equivalently, with A = (∂f/∂x)(x*, u*) and B = (∂f/∂u)(x*, u*):

  x_{t+1} - x* ≈ A (x_t - x*) + B (u_t - u*).

Let z_t = x_t - x* and v_t = u_t - u*; then

  z_{t+1} = A z_t + B v_t,   cost = z_t^T Q z_t + v_t^T R v_t,

which is standard LQR. Its solution v_t = K z_t translates back as u_t - u* = K (x_t - x*), i.e., u_t = u* + K (x_t - x*). A code sketch of this recipe follows the next slide.

LQR Ext3: penalize for change in control inputs

When standard LQR is run in this format on real systems, it often generates high-frequency control inputs, which is typically highly undesirable and results in poor control performance. Why? The solution is frequency shaping of the cost function (see, e.g., Anderson and Moore). A simple special case which works well in practice: penalize for change in control inputs.
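To make Ext2 concrete, here is a hedged sketch of the recipe above. The slides only require that the Jacobians A and B exist; this sketch obtains them by finite differences, which is one common choice. The function name linearize and the placeholder dynamics f are illustrative, and lqr_value_iteration refers to the sketch after the value-iteration slide above.

```python
import numpy as np

def linearize(f, x_star, u_star, eps=1e-5):
    """Finite-difference Jacobians A = df/dx and B = df/du at (x*, u*).
    Assumes f(x, u) returns the next state as a 1-D numpy array."""
    n, m = x_star.size, u_star.size
    f0 = f(x_star, u_star)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        A[:, j] = (f(x_star + dx, u_star) - f0) / eps
    for j in range(m):
        du = np.zeros(m)
        du[j] = eps
        B[:, j] = (f(x_star, u_star + du) - f0) / eps
    return A, B

# Regulation around a fixed point x* = f(x*, u*):
#   A, B = linearize(f, x_star, u_star)
#   K = lqr_value_iteration(A, B, Q, R, horizon=500)[0][-1]   # last gain
#   u_t = u_star + K @ (x_t - x_star)    # v_t = K z_t in the slide's notation
```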

