Stochastic Model Predictive Control

• stochastic finite horizon control
• stochastic dynamic programming
• certainty equivalent model predictive control

Prof. S. Boyd, EE364b, Stanford University

Causal state-feedback control

• linear dynamical system, over a finite time horizon:
  $x_{t+1} = A x_t + B u_t + w_t, \quad t = 0, \ldots, T-1$
  – $x_t \in \mathbf{R}^n$ is the state, $u_t \in \mathbf{R}^m$ is the input at time $t$
  – $w_t$ is the process noise (or exogenous input) at time $t$
• $X_t = (x_0, \ldots, x_t)$ is the state history up to time $t$
• causal state-feedback control:
  $u_t = \phi_t(X_t) = \psi_t(x_0, w_0, \ldots, w_{t-1}), \quad t = 0, \ldots, T-1$
• $\phi_t : \mathbf{R}^{(t+1)n} \to \mathbf{R}^m$ is called the control policy at time $t$

Stochastic finite horizon control

• $(x_0, w_0, \ldots, w_{T-1})$ is a random variable
• objective: $J = \mathbf{E}\left[\sum_{t=0}^{T-1} \ell_t(x_t, u_t) + \ell_T(x_T)\right]$
  – convex stage cost functions $\ell_t : \mathbf{R}^n \times \mathbf{R}^m \to \mathbf{R}$, $t = 0, \ldots, T-1$
  – convex terminal cost function $\ell_T : \mathbf{R}^n \to \mathbf{R}$
• $J$ depends on the control policies $\phi_0, \ldots, \phi_{T-1}$
• constraints: $u_t \in \mathcal{U}_t$, $t = 0, \ldots, T-1$
  – convex input constraint sets $\mathcal{U}_0, \ldots, \mathcal{U}_{T-1}$
• stochastic control problem: choose the control policies $\phi_0, \ldots, \phi_{T-1}$ to minimize $J$, subject to the constraints

Stochastic finite horizon control

• an infinite dimensional problem: the variables are the functions $\phi_0, \ldots, \phi_{T-1}$
  – can restrict the policies to a finite dimensional subspace, e.g., all $\phi_t$ affine
• key idea: we have recourse (a.k.a. feedback, closed-loop control)
  – we can change $u_t$ based on the observed state history $x_0, \ldots, x_t$
  – cf. the standard ('open-loop') optimal control problem, where we commit to $u_0, \ldots, u_{T-1}$ ahead of time
• in the general case, we need to evaluate $J$ (for given control policies) via Monte Carlo simulation

'Solution' via dynamic programming

• let $V_t(X_t)$ be the optimal value of the objective, from $t$ on, starting from the state history $X_t$
• $V_T(X_T) = \ell_T(x_T)$; $J^\star = \mathbf{E}\, V_0(x_0)$
• $V_t$ can be found by backward recursion: for $t = T-1, \ldots, 0$,
  $V_t(X_t) = \inf_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\left( V_{t+1}\big((X_t, A x_t + B v + w_t)\big) \mid X_t \right) \right\}$
• $V_t$, $t = 0, \ldots, T$, are convex functions
• the optimal policy is causal state feedback:
  $\phi_t^\star(X_t) = \operatorname*{argmin}_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\left( V_{t+1}\big((X_t, A x_t + B v + w_t)\big) \mid X_t \right) \right\}$

Independent process noise

• assume $x_0, w_0, \ldots, w_{T-1}$ are independent
• then $V_t$ depends only on the current state $x_t$ (and not on the state history $X_t$)
• Bellman equations: $V_T(x_T) = \ell_T(x_T)$; for $t = T-1, \ldots, 0$,
  $V_t(x_t) = \inf_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\, V_{t+1}(A x_t + B v + w_t) \right\}$
• the optimal policy is a function of the current state $x_t$:
  $\phi_t^\star(x_t) = \operatorname*{argmin}_{v \in \mathcal{U}_t} \left\{ \ell_t(x_t, v) + \mathbf{E}\, V_{t+1}(A x_t + B v + w_t) \right\}$

Linear quadratic stochastic control

• special case of stochastic finite horizon control
• $\mathcal{U}_t = \mathbf{R}^m$ (no input constraints)
• $x_0, w_0, \ldots, w_{T-1}$ are independent, with
  $\mathbf{E}\, x_0 = 0$, $\quad \mathbf{E}\, w_t = 0$, $\quad \mathbf{E}\, x_0 x_0^T = \Sigma$, $\quad \mathbf{E}\, w_t w_t^T = W_t$
• $\ell_t(x_t, u_t) = x_t^T Q_t x_t + u_t^T R_t u_t$, with $Q_t \succeq 0$, $R_t \succ 0$
• $\ell_T(x_T) = x_T^T Q_T x_T$, with $Q_T \succeq 0$

• can show the value functions are quadratic, i.e.,
  $V_t(x_t) = x_t^T P_t x_t + q_t, \quad t = 0, \ldots, T$
• Bellman recursion: $P_T = Q_T$, $q_T = 0$; for $t = T-1, \ldots, 0$,
  $V_t(z) = \inf_v \left\{ z^T Q_t z + v^T R_t v + \mathbf{E}\left( (Az + Bv + w_t)^T P_{t+1} (Az + Bv + w_t) + q_{t+1} \right) \right\}$
• works out to
  $P_t = A^T P_{t+1} A - A^T P_{t+1} B (B^T P_{t+1} B + R_t)^{-1} B^T P_{t+1} A + Q_t$
  $q_t = q_{t+1} + \operatorname{Tr}(W_t P_{t+1})$

• the optimal policy is linear state feedback: $\phi_t^\star(x_t) = K_t x_t$, with
  $K_t = -(B^T P_{t+1} B + R_t)^{-1} B^T P_{t+1} A$
  (which, strangely, does not depend on $\Sigma, W_0, \ldots, W_{T-1}$)
• optimal cost:
  $J^\star = \mathbf{E}\, V_0(x_0) = \operatorname{Tr}(\Sigma P_0) + q_0 = \operatorname{Tr}(\Sigma P_0) + \sum_{t=0}^{T-1} \operatorname{Tr}(W_t P_{t+1})$
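As an illustration (not from the original slides), here is a minimal NumPy sketch of the backward Riccati recursion above; the function name and the choice of time-invariant $Q$, $R$, $W$ are ours, made for brevity.

```python
import numpy as np

def lq_stochastic_control(A, B, Q, R, QT, W, Sigma, T):
    """Backward Riccati recursion for LQ stochastic control.

    Returns the value-function matrices P_0..P_T, the feedback gains
    K_0..K_{T-1}, and the optimal cost J* = Tr(Sigma P_0) + sum_t Tr(W P_{t+1}).
    Q, R, W are taken time-invariant here; the slides allow Q_t, R_t, W_t.
    """
    P = [None] * (T + 1)
    K = [None] * T
    P[T] = QT                                   # P_T = Q_T, q_T = 0
    for t in range(T - 1, -1, -1):              # t = T-1, ..., 0
        S = B.T @ P[t + 1] @ B + R              # B^T P_{t+1} B + R_t
        K[t] = -np.linalg.solve(S, B.T @ P[t + 1] @ A)           # K_t
        P[t] = A.T @ P[t + 1] @ A + Q + A.T @ P[t + 1] @ B @ K[t]
    # q_0 = sum_t Tr(W P_{t+1}), so J* = Tr(Sigma P_0) + q_0
    Jstar = np.trace(Sigma @ P[0]) + sum(np.trace(W @ P[t + 1])
                                         for t in range(T))
    return P, K, Jstar
```

Note that the update for `P[t]` is the Riccati formula above, written with the gain substituted in: $A^T P_{t+1} B K_t = -A^T P_{t+1} B (B^T P_{t+1} B + R_t)^{-1} B^T P_{t+1} A$.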
Certainty equivalent model predictive control

• at every time $t$ we solve the certainty equivalent problem

  minimize    $\sum_{\tau=t}^{T-1} \ell_\tau(x_\tau, u_\tau) + \ell_T(x_T)$
  subject to  $u_\tau \in \mathcal{U}_\tau, \quad \tau = t, \ldots, T-1$
              $x_{\tau+1} = A x_\tau + B u_\tau + \hat{w}_{\tau|t}, \quad \tau = t, \ldots, T-1$

  with variables $x_{t+1}, \ldots, x_T$, $u_t, \ldots, u_{T-1}$ and data $x_t$, $\hat{w}_{t|t}, \ldots, \hat{w}_{T-1|t}$
• $\hat{w}_{t|t}, \ldots, \hat{w}_{T-1|t}$ are predicted values of $w_t, \ldots, w_{T-1}$ based on $X_t$ (e.g., conditional expectations)
• call the solution $\tilde{x}_{t+1}, \ldots, \tilde{x}_T$, $\tilde{u}_t, \ldots, \tilde{u}_{T-1}$
• we take $\phi_t^{\mathrm{mpc}}(X_t) = \tilde{u}_t$
  – $\phi_t^{\mathrm{mpc}}$ is a function of $X_t$ since $\hat{w}_{t|t}, \ldots, \hat{w}_{T-1|t}$ are functions of $X_t$

Certainty equivalent model predictive control

• widely used, e.g., in 'revenue management'
• based on (bad) approximations:
  – future values of the disturbance are exactly as predicted; there is no future uncertainty
  – in the future, no recourse is available
• yet, it often works very well

Example

• system with $n = 3$ states, $m = 2$ inputs; horizon $T = 50$
• $A$, $B$ chosen randomly
• quadratic stage cost: $\ell_t(x, u) = \|x\|_2^2 + \|u\|_2^2$
• quadratic final cost: $\ell_T(x) = \|x\|_2^2$
• constraint set: $\mathcal{U} = \{ u \mid \|u\|_\infty \leq 0.5 \}$
• $x_0, w_0, \ldots, w_{T-1}$ iid $\mathcal{N}(0, 0.25 I)$

Stochastic MPC: sample trajectory

[figure: sample traces of $x_1(t)$ (top) and $u_1(t)$ (bottom) over $t = 0, \ldots, 50$]

Cost histogram

[figure: histograms of the realized cost under MPC ($J_{\mathrm{mpc}}$, top) and under saturated linear control ($J_{\mathrm{sat}}$, bottom), each shown against $J_{\mathrm{relax}}$]

Simple lower bound for quadratic stochastic control

• $x_0, w_0, \ldots, w_{T-1}$ independent
• quadratic stage and final cost
• relaxation:
  – ignore the input constraints $\mathcal{U}_t$; this yields a linear quadratic stochastic control problem
  – solve the relaxed problem exactly; its optimal cost is $J_{\mathrm{relax}}$
• $J^\star \geq J_{\mathrm{relax}}$
• for our numerical example,
  – $J_{\mathrm{mpc}} = 224.7$ (via Monte Carlo)
  – $J_{\mathrm{sat}} = 271.5$ (linear quadratic stochastic control with saturation)
  – $J_{\mathrm{relax}} = 141.3$
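For concreteness, here is a hedged CVXPY sketch of one certainty equivalent MPC step, specialized to the example's costs and box constraint; the function name and argument conventions are ours, not from the slides.

```python
import cvxpy as cp

def ce_mpc_input(A, B, x_t, t, T, w_hat, u_max=0.5):
    """Solve the certainty equivalent problem at time t; return u~_t.

    Specialized to l(x, u) = ||x||_2^2 + ||u||_2^2 and ||u||_inf <= u_max.
    w_hat[k] is the prediction w^_{t+k|t}; for zero-mean noise independent
    of X_t, the conditional expectations are all zero.
    """
    n, m = A.shape[0], B.shape[1]
    H = T - t                                   # remaining horizon length
    x = cp.Variable((n, H + 1))                 # x_t, ..., x_T
    u = cp.Variable((m, H))                     # u_t, ..., u_{T-1}
    cost = cp.sum_squares(x) + cp.sum_squares(u)
    constraints = [x[:, 0] == x_t]              # data: current state
    for k in range(H):
        constraints += [
            x[:, k + 1] == A @ x[:, k] + B @ u[:, k] + w_hat[k],
            cp.norm(u[:, k], "inf") <= u_max,
        ]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u[:, 0].value                        # phi_t^mpc(X_t) = u~_t
```

Only the first planned input is applied; at time $t+1$ the problem is re-solved from the realized state, which is where the (otherwise ignored) recourse re-enters.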
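Finally, a sketch of the Monte Carlo evaluation of $J_{\mathrm{mpc}}$, reusing the two helpers above; the sample count, seed, and cost bookkeeping are ours, and with randomly chosen $A$, $B$ the estimate will of course differ from the slides' value of 224.7.

```python
import numpy as np

def estimate_J_mpc(A, B, T=50, n_mc=100, sigma2=0.25, seed=0):
    """Monte Carlo estimate of J_mpc: simulate the closed loop under the
    certainty equivalent MPC policy and average the realized cost."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    costs = []
    for _ in range(n_mc):
        x = np.sqrt(sigma2) * rng.standard_normal(n)    # x_0 ~ N(0, 0.25 I)
        J = 0.0
        for t in range(T):
            w_hat = np.zeros((T - t, n))                # E(w_tau | X_t) = 0
            u = ce_mpc_input(A, B, x, t, T, w_hat)
            J += x @ x + u @ u                          # stage cost
            w = np.sqrt(sigma2) * rng.standard_normal(n)
            x = A @ x + B @ u + w                       # closed-loop update
        J += x @ x                                      # terminal cost
        costs.append(J)
    return np.mean(costs)

# lower bound for comparison: drop the input constraint and solve exactly,
# e.g., with Q = R = QT = I and Sigma = W = 0.25 I:
#   _, _, J_relax = lq_stochastic_control(A, B, np.eye(3), np.eye(2),
#                                         np.eye(3), 0.25 * np.eye(3),
#                                         0.25 * np.eye(3), 50)
```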