CS 287: Advanced Robotics (Fall 2009)
Lecture 6: Control 5: Optimal control --- [Function approximation in dynamic programming --- special case: quadratic]
Pieter Abbeel, UC Berkeley EECS

Announcements

- Will there be lecture this Thursday (Sept 24)? Yes.
- No office hours this Thursday (as I am examining students for prelims). Feel free to schedule an appointment by email instead.

Announcements: final project contents

- An original investigation into an area that relates sufficiently closely to the course. This could be an algorithmic/theoretical idea, or an application of existing algorithm(s) to a platform or domain in which these algorithms carry promise but have not yet been applied.
- Alternatively: a significant improvement to an existing (or new) assignment for this course, or for an existing (or new) assignment for CS 188 which has close ties to this course.
- Ideally, we are able to identify a topic that relates both to your ongoing PhD research and to the course.
- You are very welcome to come up with your own project ideas, but make sure to pass them by me **before** you submit your abstract. Feel free to stop by office hours or set up an appointment (via email) to discuss potential projects.

Announcements: final project logistics

- Final result: a 6-8 page paper. It should be structured like a conference paper, i.e., focus on the problem setting, why it matters, what is interesting/unsolved about it, your approach, results, analysis, and so forth. Cite and briefly survey prior work as appropriate, but don't re-write prior work that is not directly relevant to understanding your approach.
- Milestones:
  - Oct 9, 23:59: **approved-by-me** abstracts due: 1-page description of the project plus goals for the milestone. Make sure to sync up with me before then!
  - Nov 9, 23:59: 1-page milestone report due.
  - Dec 3: in-class project presentations [tentatively].
  - Dec 11, 23:59: final paper due.
- 1 or 2 students per project. If there are two students on one final project, I will expect twice as much research effort to have gone into it!

Bellman's curse of dimensionality

- For an n-dimensional state space, the number of discretized states grows exponentially in n.
- In practice, discretization is considered computationally feasible only up to 5- or 6-dimensional state spaces, even when using variable-resolution discretization and very fast implementations.

Today

- The linear quadratic (LQ) setting --- a special case in which we can solve the continuous optimal control problem exactly.
- Great reference: [optional] Anderson and Moore, Linear Quadratic Methods --- the standard reference for the LQ setting.

Linear Quadratic Regulator (LQR)

The LQR setting assumes a linear dynamical system

  x_{t+1} = A x_t + B u_t,

where x_t is the state at time t and u_t is the input at time t, and a quadratic cost function

  g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t

with Q ≻ 0, R ≻ 0. (For a square matrix X we have X ≻ 0 if and only if z^T X z > 0 for all non-zero vectors z. Hence there is a non-zero cost for any state different from the all-zeros state, and for any input different from the all-zeros input.)

While the LQ assumptions might seem very restrictive, we will see that the method can be made applicable to non-linear systems, e.g., a helicopter.

Value iteration

The back-up step for i+1 steps to go in the LQR setting is:

  J_{i+1}(x) = min_u [ x^T Q x + u^T R u + γ J_i(A x + B u) ].

LQR value iteration: J_1

Initialize J_0(x) = x^T P_0 x. Then (taking γ = 1):

  J_1(x) = min_u [ x^T Q x + u^T R u + J_0(A x + B u) ]
         = min_u [ x^T Q x + u^T R u + (A x + B u)^T P_0 (A x + B u) ].   (1)

To find the minimum over u, we set the gradient w.r.t. u equal to zero:

  ∇_u [...] = 2 R u + 2 B^T P_0 (A x + B u) = 0,

hence:

  u = -(R + B^T P_0 B)^{-1} B^T P_0 A x.   (2)

Substituting (2) into (1) gives J_1(x) = x^T P_1 x, for:

  K_1 = -(R + B^T P_0 B)^{-1} B^T P_0 A,
  P_1 = Q + K_1^T R K_1 + (A + B K_1)^T P_0 (A + B K_1).

LQR value iteration: J_1 (ctd)

In summary: J_1(x) is quadratic, just like J_0(x), and the value iteration update is the same for all times and can be done in closed form! One more back-up gives J_2(x) = x^T P_2 x, for:

  K_2 = -(R + B^T P_1 B)^{-1} B^T P_1 A,
  P_2 = Q + K_2^T R K_2 + (A + B K_2)^T P_1 (A + B K_2).

Value iteration solution to LQR

Set P_0 = 0. For i = 1, 2, 3, ...:

  K_i = -(R + B^T P_{i-1} B)^{-1} B^T P_{i-1} A,
  P_i = Q + K_i^T R K_i + (A + B K_i)^T P_{i-1} (A + B K_i).

The optimal policy for an i-step horizon is given by π(x) = K_i x, and the cost-to-go function for an i-step horizon by J_i(x) = x^T P_i x.
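Since each back-up is closed form, the whole recursion is a few lines of matrix algebra. Below is a minimal numpy sketch of the recursion exactly as stated above; the function name lqr_value_iteration and the convention of returning lists of gains and cost matrices are illustrative choices, not from the slides.

```python
import numpy as np

def lqr_value_iteration(A, B, Q, R, horizon):
    """Value-iteration solution to LQR: set P_0 = 0, then iterate the
    closed-form back-up. Returns gains K_1..K_h and cost matrices
    P_1..P_h, where J_i(x) = x^T P_i x and pi_i(x) = K_i x."""
    n = A.shape[0]
    P = np.zeros((n, n))                  # P_0 = 0
    Ks, Ps = [], []
    for _ in range(horizon):
        # K_i = -(R + B^T P_{i-1} B)^{-1} B^T P_{i-1} A
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_i = Q + K_i^T R K_i + (A + B K_i)^T P_{i-1} (A + B K_i)
        P = Q + K.T @ R @ K + (A + B @ K).T @ P @ (A + B @ K)
        Ks.append(K)
        Ps.append(P)
    return Ks, Ps
```

For an i-step horizon, the policy is u = Ks[i-1] @ x; as the horizon grows, K_i and P_i typically converge to the stationary, infinite-horizon solution.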
LQR assumptions revisited

The LQR setting,

  x_{t+1} = A x_t + B u_t,
  g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t,

amounts to keeping a linear system at the all-zeros state. Extensions which make it more generally applicable:

- affine systems
- systems with stochasticity
- regulation around a non-zero fixed point for non-linear systems
- penalization for change in control inputs
- linear time-varying (LTV) systems
- trajectory following for non-linear systems

LQR Ext0: affine systems

  x_{t+1} = A x_t + B u_t + c,
  g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t.

The optimal control policy remains linear, and the optimal cost-to-go function remains quadratic. There are two avenues to do the derivation:

1. Work through the DP update as we did for the standard setting.
2. Redefine the state as z_t = [x_t; 1]; then we have

  z_{t+1} = [x_{t+1}; 1] = [A c; 0 1] [x_t; 1] + [B; 0] u_t = A' z_t + B' u_t.

LQR Ext1: stochastic systems

  x_{t+1} = A x_t + B u_t + w_t,
  g(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t,

where w_t, t = 0, 1, ..., are zero mean and independent. Exercise: work through a derivation similar to the deterministic case. Result:

- The optimal control policy is the same.
- The cost-to-go function is almost identical: it has one additional term which depends on the variance in the noise (and which cannot be influenced by the choice of control inputs).

LQR Ext2: non-linear systems

Consider a non-linear system x_{t+1} = f(x_t, u_t). We can keep the system at the state x* iff

  ∃ u* s.t. x* = f(x*, u*).

Linearizing the dynamics around x* gives:

  x_{t+1} ≈ f(x*, u*) + (∂f/∂x)(x*, u*) (x_t - x*) + (∂f/∂u)(x*, u*) (u_t - u*).

Equivalently, with A = (∂f/∂x)(x*, u*) and B = (∂f/∂u)(x*, u*):

  x_{t+1} - x* ≈ A (x_t - x*) + B (u_t - u*).

Let z_t = x_t - x* and v_t = u_t - u*; then

  z_{t+1} = A z_t + B v_t,   cost = z_t^T Q z_t + v_t^T R v_t,

which is standard LQR. Its solution v_t = K z_t translates back as u_t - u* = K (x_t - x*), i.e., u_t = u* + K (x_t - x*). A code sketch of this recipe follows the next slide.

LQR Ext3: penalize for change in control inputs

When standard LQR is run in this format on real systems, it often generates high-frequency control inputs, which is typically highly undesirable and results in poor control performance. Why? The solution is frequency shaping of the cost function (see, e.g., Anderson and Moore). A simple special case which works well in practice: penalize for change in control inputs.
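To make Ext2 concrete, here is a hedged sketch of the recipe above. The slides only require that the Jacobians A and B exist; this sketch obtains them by finite differences, which is one common choice. The function name linearize and the placeholder dynamics f are illustrative, and lqr_value_iteration refers to the sketch after the value-iteration slide above.

```python
import numpy as np

def linearize(f, x_star, u_star, eps=1e-5):
    """Finite-difference Jacobians A = df/dx and B = df/du at (x*, u*).
    Assumes f(x, u) returns the next state as a 1-D numpy array."""
    n, m = x_star.size, u_star.size
    f0 = f(x_star, u_star)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        A[:, j] = (f(x_star + dx, u_star) - f0) / eps
    for j in range(m):
        du = np.zeros(m)
        du[j] = eps
        B[:, j] = (f(x_star, u_star + du) - f0) / eps
    return A, B

# Regulation around a fixed point x* = f(x*, u*):
#   A, B = linearize(f, x_star, u_star)
#   K = lqr_value_iteration(A, B, Q, R, horizon=500)[0][-1]   # last gain
#   u_t = u_star + K @ (x_t - x_star)    # v_t = K z_t in the slide's notation
```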

