CS 287: Advanced Robotics, Fall 2009
Lecture 4: Control 3: Optimal control --- discretization (function approximation)
Pieter Abbeel, UC Berkeley EECS

Announcement
- Tuesday Sept 15: **no** lecture

Today and forthcoming lectures
- Optimal control: provides a general computational approach to tackle control problems, both under- and fully actuated.
- Dynamic programming
  - Discretization
  - Dynamic programming for linear systems
- Extensions to nonlinear settings:
  - Local linearization
  - Differential dynamic programming
  - Feedback linearization
- Model predictive control (MPC)
- Examples

Today and Thursday
- Optimal control formalism [Tedrake, Ch. 6; Sutton and Barto, Ch. 1-4]
- Discrete Markov decision processes (MDPs); solution through value iteration [Tedrake, Ch. 6; Sutton and Barto, Ch. 1-4]
- Solution methods for continuous problems:
  - HJB equation [Tedrake, Ch. 7 (optional)]
  - Markov chain approximation method, i.e., continuous → discrete [Chow and Tsitsiklis, 1991; Munos and Moore, 2001; Kushner and Dupuis, 2001 (optional)]
- Error bounds:
  - Value function: Chow and Tsitsiklis; Kushner and Dupuis; function approximation: Gordon, 1995; Tsitsiklis and Van Roy, 1996
  - Value function close to optimal ⇒ resulting policy is good
- Speed-ups and accuracy/performance improvements

Optimal control formulation
- Given:
  - dynamics: ẋ(t) = f(x(t), u(t), t)
  - cost function: g(x, u, t)
- Task: find a policy u(t) = π(x, t) which optimizes
    J^π(x_0) = h(x(T)) + ∫_0^T g(x(t), u(t), t) dt
- Applicability: g and f are often easier to specify than π.

Finite horizon, discrete time
- Markov decision process (MDP): (S, A, P, H, g)
  - S: set of states
  - A: set of actions
  - P: dynamics model, P(x_{t+1} = x' | x_t = x, u_t = u)
  - H: horizon
  - g: S × A → R: cost function
- Policy: π = (µ_0, µ_1, ..., µ_H), with µ_k: S → A
- Cost-to-go of a policy π: J^π(x) = E[ Σ_{t=0}^{H} g(x_t, u_t) | x_0 = x, π ]
- Goal: find π* ∈ argmin_{π ∈ Π} J^π

Dynamic programming (aka value iteration)
Let J*_k = min_{µ_k, ..., µ_H} E[ Σ_{t=k}^{H} g(x_t, u_t) ]. Then we have:
  J*_H(x) = min_u g(x, u)
  J*_{H-1}(x) = min_u [ g(x, u) + Σ_{x'} P(x' | x, u) J*_H(x') ]
  ...
  J*_k(x) = min_u [ g(x, u) + Σ_{x'} P(x' | x, u) J*_{k+1}(x') ]
  ...
  J*_0(x) = min_u [ g(x, u) + Σ_{x'} P(x' | x, u) J*_1(x') ]
and
  µ*_k(x) = argmin_u [ g(x, u) + Σ_{x'} P(x' | x, u) J*_{k+1}(x') ]
- Running time: O(|S|² |A| H), whereas naïve search over all policies would require evaluating |A|^(|S| H) policies.

Discounted infinite horizon
- Markov decision process (MDP): (S, A, P, γ, g)
  - γ: discount factor
- Policy: π = (µ_0, µ_1, ...), with µ_k: S → A
- Value of a policy π: J^π(x) = E[ Σ_{t=0}^{∞} γ^t g(x_t, u_t) | x_0 = x, π ]
- Goal: find π* ∈ argmin_{π ∈ Π} J^π

Discounted infinite horizon: dynamic programming (DP), aka value iteration (VI)
- For i = 0, 1, ...
    For all s ∈ S:
      J^(i+1)(s) ← min_{u ∈ A} [ g(s, u) + γ Σ_{s'} P(s' | s, u) J^(i)(s') ]
- Facts:
  - J^(i) → J* as i → ∞
  - There is an optimal stationary policy π* = (µ*, µ*, ...), which satisfies
      µ*(x) = argmin_u [ g(x, u) + γ Σ_{x'} P(x' | x, u) J*(x') ]

Continuous time and state-action space
- Hamilton-Jacobi-Bellman (HJB) equation / approach:
  - Continuous equivalent of the discrete case we already discussed
  - We will see 2 slides.
- Variational / Markov chain approximation method:
  - Numerically solve a continuous problem by directly approximating the continuous MDP with a discrete MDP
  - We will study this approach in detail.

Hamilton-Jacobi-Bellman (HJB) [*]
- Can also derive the HJB equation for the stochastic setting. Keywords for finding out more: controlled diffusions / diffusion jump processes.
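For the deterministic, finite-horizon setting, a standard statement of the HJB equation corresponding to the cost functional defined earlier (textbook form, e.g. Tedrake Ch. 7 or Bertsekas, given here as a reference sketch rather than taken verbatim from the slides) is: the optimal cost-to-go J*(x, t) satisfies

\[
-\frac{\partial J^*}{\partial t}(x, t) \;=\; \min_{u}\Big[\, g(x, u, t) \;+\; \nabla_x J^*(x, t)^{\top} f(x, u, t) \,\Big],
\qquad J^*(x, T) = h(x),
\]

with the optimal policy given by the minimizing control: π*(x, t) ∈ argmin_u [ g(x, u, t) + ∇_x J*(x, t)ᵀ f(x, u, t) ].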
- For special cases, the HJB equation can assist in finding / verifying analytical solutions.
- However, for most cases, one needs to resort to numerical solution methods for the corresponding PDE --- or directly approximate the control problem with a Markov chain.
- References: Tedrake, Ch. 7; Bertsekas, "Dynamic Programming and Optimal Control"; Oksendal, "Stochastic Differential Equations: An Introduction with Applications"; Oksendal and Sulem, "Applied Stochastic Control of Jump Diffusions"; Michael Steele, "Stochastic Calculus and Financial Applications"; Markov chain approximations: Kushner and Dupuis, 1992/2001.

Markov chain approximation ("discretization")
- Original MDP: (S, A, P, g, γ)
- Discretized MDP:
  - Grid the state space: the vertices are the discrete states.
  - Reduce the action space to a finite set.
    - Sometimes not needed:
      - when the Bellman back-up can be computed exactly over the continuous action space, or
      - when we know only certain controls are part of the optimal policy (e.g., when we know the problem has a "bang-bang" optimal solution).
  - Transition function remains to be resolved!

Discretization: example 1
[Figure: triangulated grid of discrete states ξ_1, ..., ξ_N; from continuous state s, action a leads to s', which lies in the triangle with vertices ξ_2, ξ_3, ξ_6.]
- Discrete states: { ξ_1, ..., ξ_N }
- P(ξ_2 | s, a) = p_A,  P(ξ_3 | s, a) = p_B,  P(ξ_6 | s, a) = p_C,  such that s' = p_A ξ_2 + p_B ξ_3 + p_C ξ_6
  (i.e., the weights are the barycentric coordinates of s' in the triangle, so they are nonnegative and sum to 1).
- This results in a discrete MDP, which we know how to solve.
- Policy when in a "continuous state" s:
    π(s) = argmin_a [ g(s, a) + γ Σ_{s'} P(s' | s, a) Σ_i P(ξ_i ; s') J(ξ_i) ]
- Note: the interpolation need not be triangular. [See also: Munos and Moore, 2001.]

Discretization: example 1 (ctd)
- Discretization turns deterministic transitions into stochastic transitions.
- If the MDP is already stochastic: repeat the procedure to account for all possible transitions and weight accordingly.
- If a (state, action) pair can result in infinitely many different next states: sample next states from the next-state distribution.

Discretization: example 1 (ctd)
- Discretization results in a finite-state stochastic MDP, hence we know value iteration will converge.
- Alternative interpretation: the Bellman back-ups in the finite-state MDP (a) are back-ups on a subset of the full state space, and (b) use linear interpolation to compute the required "next-state cost-to-go" whenever the next state is not in the discrete set
  = value iteration with function approximation.

Discretization: example 2
[Figure: grid of discrete states ξ_1, ..., ξ_N; from continuous state s, action a leads to s', whose nearest grid vertex is ξ_2.]
- Discrete states: { ξ_1, ..., ξ_N }
- P(ξ_2 | s, a) = 1; similarly define the transition probabilities for all ξ_i.
- This results in a discrete MDP, which we know how to solve.
- Policy when in a "continuous state": this is nearest neighbor. (Both variants appear in the sketch below.)
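To make the discretization pipeline from the last few slides concrete, here is a minimal sketch (not from the lecture) for a 1-D state space; the grid, dynamics f, cost g, discount factor, and action set below are hypothetical placeholders. It builds example-1-style transitions by spreading each continuous next state over its two neighboring grid vertices with linear-interpolation weights, runs value iteration on the resulting finite MDP, and reuses the same interpolation when acting from a continuous state.

```python
import numpy as np

# Discretize a 1-D continuous-state control problem into a finite MDP
# (example-1 style: linear-interpolation transition weights), solve it with
# value iteration, then read out a greedy policy at continuous states.
# f(), g(), the grid, and gamma are placeholder choices for illustration.

xs = np.linspace(-2.0, 2.0, 81)      # grid vertices xi_1..xi_N (discrete states)
us = np.linspace(-1.0, 1.0, 11)      # discretized action set
gamma, dt = 0.95, 0.1

def f(x, u):
    # placeholder deterministic dynamics: x_{t+1} = f(x_t, u_t)
    return x + dt * u

def g(x, u):
    # placeholder cost g(x, u)
    return dt * (x**2 + 0.1 * u**2)

def interp_weights(x_next):
    """Indices (i, i+1) and weights (pA, pB) with pA + pB = 1 and
    pA*xs[i] + pB*xs[i+1] equal to x_next (clipped to the grid)."""
    x_next = np.clip(x_next, xs[0], xs[-1])
    i = int(np.clip(np.searchsorted(xs, x_next) - 1, 0, len(xs) - 2))
    pB = (x_next - xs[i]) / (xs[i + 1] - xs[i])
    return i, 1.0 - pB, pB

# Value iteration on the induced finite-state MDP.
J = np.zeros(len(xs))
for _ in range(500):
    J_new = np.full(len(xs), np.inf)
    for si, x in enumerate(xs):
        for u in us:
            i, pA, pB = interp_weights(f(x, u))
            q = g(x, u) + gamma * (pA * J[i] + pB * J[i + 1])
            J_new[si] = min(J_new[si], q)
    if np.max(np.abs(J_new - J)) < 1e-6:
        J = J_new
        break
    J = J_new

def policy(x):
    """Greedy action at a *continuous* state x, using the same interpolation
    to evaluate the next-state cost-to-go."""
    def q(u):
        i, pA, pB = interp_weights(f(x, u))
        return g(x, u) + gamma * (pA * J[i] + pB * J[i + 1])
    return min(us, key=q)

print(policy(0.37))   # greedy control at a state that is not a grid vertex
```

Replacing interp_weights with one that puts all weight on the single nearest vertex recovers example 2 (nearest neighbor); in higher dimensions the same construction uses multilinear or barycentric weights over the enclosing cell, as in Munos and Moore, 2001.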