Berkeley COMPSCI 294 - Separation Principle, Dynamics Modeling


CS294-40 Learning for Robotics and Control
Lecture 12 - 10/07/2008
Separation Principle, Dynamics Modeling
Lecturer: Pieter Abbeel
Scribe: Pål From

1 Announcements

• Milestone report: due on Sunday; 1-2 pages with the results so far, 1/2-1 page of future plans.

2 Separation Principle

Assume we have a linear system

$$x_k = A x_{k-1} + B u_{k-1} + w_{k-1}, \qquad k = 1, 2, \ldots, H \tag{1}$$

with the quadratic cost

$$E\left[x_H^T P_H x_H + \sum_{k=0}^{H-1} \left(u_k^T R_k u_k + x_k^T Q_k x_k\right)\right] \tag{2}$$

The input disturbances $w_k$ are assumed to be independent and zero-mean, with finite variance.

We need to find a rule for the control $u_t$ given $I_t$. $I_t$ contains the information available to the controller at time $t$, i.e., any (noisy) observations of the states at times $k = 1, \ldots, t$ as well as the previous controls $u_k$ for $k = 0, \ldots, t-1$.

We start by solving for the optimal policy at time $H-1$:

$$\arg\min_{u_{H-1}} E\left[u_{H-1}^T R_{H-1} u_{H-1} + x_H^T P_H x_H \mid I_{H-1}\right] \tag{3}$$

To rewrite the second term, we will use the following identity (the cross term uses the symmetry of $P_H$ and the fact that $E[x_H \mid I_{H-1}]$ is fixed once we condition on $I_{H-1}$):

$$
\begin{aligned}
&E\left[(x_H - E[x_H \mid I_{H-1}])^T P_H (x_H - E[x_H \mid I_{H-1}]) \mid I_{H-1}\right] \\
&\quad = E[x_H^T P_H x_H \mid I_{H-1}] + E\left[E[x_H \mid I_{H-1}]^T P_H E[x_H \mid I_{H-1}] \mid I_{H-1}\right] - 2E\left[x_H^T P_H E[x_H \mid I_{H-1}] \mid I_{H-1}\right] \\
&\quad = E[x_H^T P_H x_H \mid I_{H-1}] + E[x_H \mid I_{H-1}]^T P_H E[x_H \mid I_{H-1}] - 2E[x_H \mid I_{H-1}]^T P_H E[x_H \mid I_{H-1}] \\
&\quad = E[x_H^T P_H x_H \mid I_{H-1}] - E[x_H \mid I_{H-1}]^T P_H E[x_H \mid I_{H-1}]
\end{aligned}
\tag{4}
$$

Using Eqn. (4) and linearity of expectation, we can rewrite Eqn. (3) as follows (since $u_{H-1}$ is a function of $I_{H-1}$, its cost term can be taken out of the expectation):

$$\arg\min_{u_{H-1}} \; u_{H-1}^T R_{H-1} u_{H-1} + E[x_H \mid I_{H-1}]^T P_H E[x_H \mid I_{H-1}] + E\left[(x_H - E[x_H \mid I_{H-1}])^T P_H (x_H - E[x_H \mid I_{H-1}]) \mid I_{H-1}\right] \tag{5}$$

Interestingly, Eqn. (5) shows that for a quadratic cost, the expected cost-to-go can be split into the cost-to-go for the expected state $E[x_H \mid I_{H-1}]$ and an additional term, $E\left[(x_H - E[x_H \mid I_{H-1}])^T P_H (x_H - E[x_H \mid I_{H-1}]) \mid I_{H-1}\right]$, which accounts for the cost incurred by the uncertainty about the state.

We will now use the second particularly interesting fact about the linear quadratic setting: the estimation error $x_H - E[x_H \mid I_{H-1}]$ does not depend on our choice of $u_{0:H-1}$, so we can exclude this term from the minimization.
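The claim that the estimation error does not depend on the controls can be checked numerically. The sketch below is a minimal illustration under stated assumptions: a scalar version of system (1), a hypothetical observation model $y_k = c\,x_k + v_k$ with Gaussian noise (so that a Kalman filter computes $E[x_k \mid I_{k-1}]$), and arbitrary parameter values. It runs the same noise realization under two very different control sequences and compares the prediction errors; the function name `kalman_prediction_errors` is illustrative only.

```python
import random


def kalman_prediction_errors(controls, a=1.1, b=0.5, c=1.0,
                             q_var=0.2, r_var=0.5, x0_var=1.0, seed=0):
    """Simulate x_k = a*x_{k-1} + b*u_{k-1} + w_{k-1} with observations
    y_k = c*x_k + v_k (hypothetical observation model, Gaussian noise).

    Returns the prediction errors x_k - E[x_k | I_{k-1}] produced by a
    scalar Kalman filter. The seeded RNG draws the same noise in the same
    order on every call, so two calls differ only in the control sequence.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, x0_var ** 0.5)  # true initial state, prior mean 0
    mean, var = 0.0, x0_var            # filter belief about x_0 (no y_0 here)
    errors = []
    for u in controls:
        # True system step, Eqn. (1) in scalar form.
        x = a * x + b * u + rng.gauss(0.0, q_var ** 0.5)
        # Kalman prediction: the filter knows the control it applied.
        mean, var = a * mean + b * u, a * a * var + q_var
        errors.append(x - mean)        # x_k - E[x_k | I_{k-1}]
        # Kalman measurement update with the new observation y_k.
        y = c * x + rng.gauss(0.0, r_var ** 0.5)
        gain = var * c / (c * c * var + r_var)
        mean, var = mean + gain * (y - c * mean), (1.0 - gain * c) * var
    return errors


passive = kalman_prediction_errors([0.0] * 6)
active = kalman_prediction_errors([5.0, -3.0, 10.0, 0.5, -7.0, 2.0])
# Same noise, very different controls: the errors agree up to round-off.
print(max(abs(p - q) for p, q in zip(passive, active)))
```

The control term $b\,u$ is added to both the true state and the filter's prediction, and it cancels in the innovation $y - c\cdot\text{mean}$; the filter gain depends only on the variance recursion, which never sees $u$.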
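This property relies on the linearity of the system. Intuitively, it means that for a linear system the estimation error is not influenced by the control inputs we apply. We show this by checking that the terms that are linear in $u$ appear identically in $x_H$ and in $E[x_H \mid I_{H-1}]$, and therefore cancel in their difference. Writing out both expressions:

$$
\begin{aligned}
x_H &= A x_{H-1} + B u_{H-1} + w_{H-1} \\
&= A^H x_0 + A^{H-1} w_0 + A^{H-2} w_1 + A^{H-3} w_2 + \cdots + w_{H-1} + A^{H-1} B u_0 + A^{H-2} B u_1 + \cdots + B u_{H-1}
\end{aligned}
\tag{6}
$$

and

$$E[x_H \mid I_{H-1}] = A^H E[x_0 \mid I_{H-1}] + A^{H-1} E[w_0 \mid I_{H-1}] + \cdots + E[w_{H-1} \mid I_{H-1}] + A^{H-1} B\, E[u_0 \mid I_{H-1}] + \cdots + B\, \underbrace{E[u_{H-1} \mid I_{H-1}]}_{u_{H-1}(I_{H-1})} \tag{7}$$

Since the past controls $u_0, \ldots, u_{H-2}$ are part of $I_{H-1}$ and $u_{H-1}$ is a function of $I_{H-1}$, we have $E[u_k \mid I_{H-1}] = u_k$ for every $k \le H-1$, so the control terms in (6) and (7) are identical. Because the controls enter both expressions as linear terms, the difference $x_H - E[x_H \mid I_{H-1}]$ is not affected by our choice of $u_{0:H-1}$. We can thus exclude the term

$$E\left[(x_H - E[x_H \mid I_{H-1}])^T P_H (x_H - E[x_H \mid I_{H-1}]) \mid I_{H-1}\right] \tag{8}$$

from (5) and obtain the following certainty-equivalent expression:

$$\arg\min_{u_{H-1}} E\left[u_{H-1}^T R_{H-1} u_{H-1} + x_H^T P_H x_H \mid I_{H-1}\right] = \arg\min_{u_{H-1}} u_{H-1}^T R_{H-1} u_{H-1} + E[x_H \mid I_{H-1}]^T P_H E[x_H \mid I_{H-1}] \tag{9}$$

We can now use $x_H = A x_{H-1} + B u_{H-1} + w_{H-1}$ to get

$$\arg\min_{u_{H-1}} u_{H-1}^T R_{H-1} u_{H-1} + E[A x_{H-1} + B u_{H-1} + w_{H-1} \mid I_{H-1}]^T P_H\, E[A x_{H-1} + B u_{H-1} + w_{H-1} \mid I_{H-1}] \tag{10}$$

Up to the noise $w_{H-1}$, we now have the same setting as in Lecture 6, which covered the linear quadratic regulator (LQR). Using a similar derivation, and the fact that $w_{H-1}$ is assumed to be zero-mean and independent of the other variables (so that $E[A x_{H-1} + B u_{H-1} + w_{H-1} \mid I_{H-1}] = A\,E[x_{H-1} \mid I_{H-1}] + B u_{H-1}$), we obtain

$$u_{H-1} = K_{H-1} E[x_{H-1} \mid I_{H-1}], \qquad K_{H-1} = -\left(R_{H-1} + B^T P_H B\right)^{-1} B^T P_H A.$$

Now we plug this into our original objective, as we still have to solve for $u_0, \ldots, u_{H-2}$ (dropping terms that do not depend on the controls):

$$\arg\min_{u_0, \ldots, u_{H-2}} E\left[E[x_{H-1} \mid I_{H-1}]^T P_{H-1} E[x_{H-1} \mid I_{H-1}] + \sum_{t=0}^{H-2} \left(u_t^T R_t u_t + x_t^T Q_t x_t\right)\right]$$

for

$$P_{H-1} = Q_{H-1} + K_{H-1}^T R_{H-1} K_{H-1} + (A + B K_{H-1})^T P_H (A + B K_{H-1}). \tag{11}$$

Now, we proceed by solving for $u_{H-2}$ in a similar fashion. First observe, by using the same derivation as in Eqn.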
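(4), together with the tower property $E\left[E[x_{H-1} \mid I_{H-1}] \mid I_{H-2}\right] = E[x_{H-1} \mid I_{H-2}]$ (which holds because $I_{H-2}$ is contained in $I_{H-1}$), that:

$$
\begin{aligned}
E\left[E[x_{H-1} \mid I_{H-1}]^T P_{H-1} E[x_{H-1} \mid I_{H-1}] \mid I_{H-2}\right] ={}& E[x_{H-1} \mid I_{H-2}]^T P_{H-1} E[x_{H-1} \mid I_{H-2}] \\
&+ E\left[\left(E[x_{H-1} \mid I_{H-1}] - E[x_{H-1} \mid I_{H-2}]\right)^T P_{H-1} \left(E[x_{H-1} \mid I_{H-1}] - E[x_{H-1} \mid I_{H-2}]\right) \mid I_{H-2}\right]
\end{aligned}
\tag{12}
$$

As when solving for $u_{H-1}$, one can show that the last term does not contribute to the minimization, although it does of course affect the total cost: we simply cannot affect the uncertainty of the state through our control inputs.

We repeat the same reasoning for every time step $t = H-1, H-2, \ldots, 0$. Hence, for linear systems with quadratic cost, the following procedure results in optimal control:

• Estimate the state of the system with a Kalman filter, i.e., compute $E[x_t \mid I_t]$ (the Kalman filter computes this conditional mean exactly when the noise is Gaussian).
• Apply the LQR controller to the output of the Kalman filter, i.e., use $E[x_t \mid I_t]$ in the controller as though it were the true state $x_t$.

This is known as the separation principle for linear systems with quadratic costs: we don't have to explicitly account for uncertainty when deciding on our control inputs. We can be optimal by solving the estimation and the control problems separately: the estimator gives the optimal state estimates regardless of how the controls are chosen, and the controller is optimal assuming perfect state estimation.

Challenge problem: Can you find other systems for which the separation principle applies?

3 Modeling

We will consider an example dynamics model for a helicopter.

3.1 Helicopter model

We use the following state space to represent the state of the helicopter:

$$\text{state: } (n, e, d, \dot n, \dot e, \dot d, \underbrace{q_x, q_y, q_z, q_w}_{\text{quaternion}}, p, q, r) \tag{13}$$

Here $(n, e, d)$ is the position in north, east, down coordinates, $(\dot n, \dot e, \dot d)$ the corresponding velocity, and $(p, q, r)$ the angular rates. The quaternion represents a rotation by $\theta$ about the axis $\vec n = [n_x, n_y, n_z]$, $\|\vec n\| = 1$, and can be written as

$$q_x = n_x \sin\frac{\theta}{2}, \quad q_y = n_y \sin\frac{\theta}{2}, \quad q_z = n_z \sin\frac{\theta}{2}, \quad q_w = \cos\frac{\theta}{2}.$$

Note: two quaternions with opposite signs represent the same physical rotation, i.e.

$$q(\vec n, \theta + 2\pi) = -q(\vec n, \theta). \tag{14}$$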
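The axis-angle formulas and Eqn. (14) can be checked with a short sketch. It assumes the Hamilton convention with the scalar part stored last, matching the $(q_x, q_y, q_z, q_w)$ ordering in (13); the helper names are hypothetical. Converting to a rotation matrix makes the sign ambiguity visible: $q$ and $-q$ give the same matrix.

```python
import math


def axis_angle_to_quaternion(axis, theta):
    """(q_x, q_y, q_z, q_w) for a rotation by theta about a unit-length axis."""
    nx, ny, nz = axis
    if abs(math.sqrt(nx * nx + ny * ny + nz * nz) - 1.0) > 1e-9:
        raise ValueError("axis must have unit length")
    s = math.sin(theta / 2.0)
    return (nx * s, ny * s, nz * s, math.cos(theta / 2.0))


def quaternion_to_matrix(quat):
    """3x3 rotation matrix of a unit quaternion (Hamilton convention, scalar last).

    Every entry is quadratic in the quaternion components, so quat and -quat
    give the same matrix, i.e. the same physical rotation.
    """
    x, y, z, w = quat
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


axis = (0.0, 0.0, 1.0)
q1 = axis_angle_to_quaternion(axis, math.pi / 2)
q2 = axis_angle_to_quaternion(axis, math.pi / 2 + 2 * math.pi)  # Eqn. (14): q2 = -q1
R1, R2 = quaternion_to_matrix(q1), quaternion_to_matrix(q2)
same_rotation = all(abs(a - b) < 1e-12 for r1, r2 in zip(R1, R2) for a, b in zip(r1, r2))
print(same_rotation)  # True
```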
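We have the following inputs:

$$\text{input: } \Big(\underbrace{\underbrace{u_{\text{aileron}}}_{\text{roll rate}}, \underbrace{u_{\text{elevator}}}_{\text{pitch rate}}}_{\text{cyclic control}}, \underbrace{u_{\text{rudder}}}_{\text{yaw rate}}, \underbrace{u_{\text{collective}}}_{\text{vertical thrust}}\Big) \tag{15}$$

Cyclic control means that the pitch angle of the rotor blades varies over the course of each rotation (cycle) of the rotor.

3.2 Dynamic model

The dynamics model is given by

$$n_{t+1} = n_t + \Delta t \cdot \dot n_t, \qquad e_{t+1} = e_t + \Delta t \cdot \dot e_t, \qquad d_{t+1} = d_t + \Delta t \cdot \dot d_t$$

and, for the quaternion,

$$(q_x, q_y, q_z, q_w)_{t+1} = (q_x, q_y, q_z, q_w)_t * \left(\sin\frac{\theta}{2}\,\vec n, \cos\frac{\theta}{2}\right) \tag{16}$$

where $*$ is the quaternion product and

$$\vec n = \frac{(p, q, r)\,\Delta t}{\|(p, q, r)\,\Delta t\|_2}, \qquad \theta = \|(p, q, r)\,\Delta t\|_2. \tag{17}$$

Further, the moments are given by

$$\begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix} = \begin{bmatrix} p \\ q \\ r \end{bmatrix} \times I \ldots$$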
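The kinematic part of the model, i.e. the Euler position update together with Eqns. (16)-(17), can be sketched as follows. This is a minimal sketch under stated assumptions: the quaternion product $*$ is taken to be the Hamilton product with the scalar stored last (the notes do not spell out the convention), the velocities and angular rates $(p, q, r)$ are held constant over the step, and the force and moment equations, which the preview cuts off, are not modeled. The function names `quat_mul` and `step` are hypothetical.

```python
import math


def quat_mul(a, b):
    """Hamilton product of quaternions stored as (x, y, z, w), scalar last."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by + ay * bw + az * bx - ax * bz,
        aw * bz + az * bw + ax * by - ay * bx,
        aw * bw - ax * bx - ay * by - az * bz,
    )


def step(state, dt):
    """One step of the kinematic update, Eqns. (16)-(17).

    state = (n, e, d, n_dot, e_dot, d_dot, qx, qy, qz, qw, p, q, r).
    Velocities and angular rates are held fixed; their own dynamics
    (forces, moments, inputs) are outside this sketch.
    """
    n, e, d, n_dot, e_dot, d_dot, qx, qy, qz, qw, p, q, r = state
    # Position: explicit Euler integration of the velocities.
    n, e, d = n + dt * n_dot, e + dt * e_dot, d + dt * d_dot
    # Rotation over dt: angle theta about the unit axis (p, q, r)dt / theta.
    rx, ry, rz = p * dt, q * dt, r * dt
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        dq = (0.0, 0.0, 0.0, 1.0)  # no rotation; the axis in (17) is undefined
    else:
        s = math.sin(theta / 2.0) / theta  # dividing by theta normalizes the axis
        dq = (rx * s, ry * s, rz * s, math.cos(theta / 2.0))
    qx, qy, qz, qw = quat_mul((qx, qy, qz, qw), dq)
    return (n, e, d, n_dot, e_dot, d_dot, qx, qy, qz, qw, p, q, r)


# Yaw at r = pi/2 rad/s while moving north at 1 m/s, for 1 s in steps of 0.01 s.
state = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, math.pi / 2)
for _ in range(100):
    state = step(state, 0.01)
print(state[0], state[6:10])  # n close to 1.0; quaternion close to a 90-degree turn about z
```

Right-multiplying by the increment applies the rotation in the body frame, which is why the body rates $(p, q, r)$ can be used directly; the zero-rate branch returns the identity quaternion where (17) would divide by zero.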

