Controlled Diffusions and Hamilton-Jacobi-Bellman Equations

Emo Todorov
Applied Mathematics and Computer Science & Engineering
University of Washington
AMATH/CSE 579, Winter 2014

Continuous-time formulation

Notation and terminology:

$x(t) \in \mathbb{R}^n$: state vector
$u(t) \in \mathbb{R}^m$: control vector
$\omega(t) \in \mathbb{R}^k$: Brownian motion (integral of white noise)
$dx = f(x,u)\,dt + G(x,u)\,d\omega$: continuous-time dynamics
$\Sigma(x,u) = G(x,u)\,G(x,u)^T$: noise covariance
$\ell(x,u) \ge 0$: cost for choosing control $u$ in state $x$
$q_T(x) \ge 0$: (optional) scalar cost at terminal states $x \in \mathcal{T}$
$\pi(x) \in \mathbb{R}^m$: control law
$v^\pi(x) \ge 0$: value (cost-to-go) function
$\pi^*(x)$, $v^*(x)$: optimal control law and its value function

Stochastic differential equations and integrals

Ito diffusion / stochastic differential equation (SDE):
$$dx = f(x)\,dt + g(x)\,d\omega$$
This cannot be written as $\dot{x} = f(x) + g(x)\,\dot{\omega}$ because $\dot{\omega}$ does not exist. The SDE means that the time integrals of the two sides are equal:
$$x(T) - x(0) = \int_0^T f(x(t))\,dt + \int_0^T g(x(t))\,d\omega(t)$$
The last term is an Ito integral. For an Ito process $y(t)$ adapted to $\omega(t)$, i.e. depending on the sample path only up to time $t$, this integral is defined as follows.

Definition (Ito integral):
$$\int_0^T y(t)\,d\omega(t) = \lim_{N \to \infty} \sum_{i=0}^{N-1} y(t_i)\,\big(\omega(t_{i+1}) - \omega(t_i)\big), \qquad 0 = t_0 < t_1 < \dots < t_N = T,$$
where the limit is taken over partitions whose largest step $\max_i (t_{i+1} - t_i)$ goes to 0.

Replacing $y(t_i)$ with $\big(y(t_{i+1}) + y(t_i)\big)/2$ yields the Stratonovich integral.

Stochastic chain rule and integration by parts

A twice-differentiable function $a(x)$ of an Ito diffusion $dx = f(x)\,dt + g(x)\,d\omega$ is an Ito process (not necessarily a diffusion) which satisfies:

Lemma (Ito):
$$da(x(t)) = a'(x(t))\,dx(t) + \tfrac{1}{2}\,a''(x(t))\,g(x(t))^2\,dt$$
This is the stochastic version of the chain rule.
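As a numerical illustration of the Ito/Stratonovich distinction and of Ito's lemma, the following Python sketch (an addition using NumPy; the function name `ito_and_stratonovich` is hypothetical) evaluates $\int_0^T \omega\,d\omega$ on one simulated Brownian path with both Riemann sums. Ito's lemma with $a(x) = x^2/2$ and $dx = d\omega$ (so $f = 0$, $g = 1$) gives $d(\omega^2/2) = \omega\,d\omega + \tfrac{1}{2}\,dt$, so the Ito sum should approach $(\omega(T)^2 - T)/2$, while the Stratonovich sum telescopes to $\omega(T)^2/2$ and therefore obeys the ordinary chain rule.

```python
import numpy as np


def ito_and_stratonovich(T=1.0, n_steps=100_000, seed=0):
    """Approximate int_0^T w dw on one simulated Brownian path.

    Returns (ito_sum, strat_sum, w_T): the left-endpoint (Ito) sum,
    the endpoint-average (Stratonovich) sum, and the terminal value w(T).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # Brownian increments, variance dt
    w = np.concatenate(([0.0], np.cumsum(dw)))       # w(t_0), ..., w(t_N) with w(0) = 0
    ito = np.sum(w[:-1] * dw)                         # y(t_i) * (w(t_{i+1}) - w(t_i))
    strat = np.sum(0.5 * (w[:-1] + w[1:]) * dw)       # average of the two endpoints
    return ito, strat, w[-1]


ito, strat, w_T = ito_and_stratonovich()
print(ito, (w_T**2 - 1.0) / 2)  # Ito sum: close to (w(T)^2 - T)/2
print(strat, w_T**2 / 2)        # Stratonovich sum: equals w(T)^2/2 up to rounding
```

The gap between the two sums is half the discrete quadratic variation $\sum_i (\Delta\omega_i)^2$, which concentrates around $T$ as the grid is refined.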
There is also a stochastic version of integration by parts:
$$x(T)\,y(T) - x(0)\,y(0) = \int_0^T x(t)\,dy(t) + \int_0^T y(t)\,dx(t) + [x,y](T)$$
The last term, which would be 0 if $x(t)$ or $y(t)$ were differentiable, is defined as follows.

Definition (quadratic covariation):
$$[x,y](T) = \lim_{N \to \infty} \sum_{i=0}^{N-1} \big(x(t_{i+1}) - x(t_i)\big)\big(y(t_{i+1}) - y(t_i)\big), \qquad 0 = t_0 < t_1 < \dots < t_N = T$$
For a diffusion with constant noise amplitude $g$, we have $[x,x](T) = g^2\,T$.

Forward and backward equations, generator

Let $p(y,s \mid x,t)$, $s \ge t$, denote the transition probability density under the Ito diffusion $dx = f(x)\,dt + g(x)\,d\omega$. Then $p$ satisfies the following PDEs.

Theorem (Kolmogorov equations):
forward (Fokker-Planck) equation:
$$\frac{\partial p}{\partial s} = \frac{1}{2}\,\frac{\partial^2}{\partial y^2}\big(g^2\,p\big) - \frac{\partial}{\partial y}\big(f\,p\big)$$
backward equation:
$$-\frac{\partial p}{\partial t} = f\,\frac{\partial p}{\partial x} + \frac{1}{2}\,g^2\,\frac{\partial^2 p}{\partial x^2} = \mathcal{L}\big[p(y,s \mid \cdot\,,t)\big](x)$$
In the forward equation $f$ and $g$ are evaluated at $y$; in the backward equation they are evaluated at $x$.

The operator $\mathcal{L}$, which computes expected directional derivatives, is called the generator of the stochastic process. In the vector case it satisfies:

Theorem (generator):
$$\mathcal{L}[v](x) = \lim_{\Delta \to 0} \frac{E_{x(0)=x}\big[v(x(\Delta))\big] - v(x)}{\Delta} = f(x)^T v_x(x) + \tfrac{1}{2}\,\mathrm{tr}\big(\Sigma(x)\,v_{xx}(x)\big)$$

Discretizing the time axis

Consider the explicit Euler discretization with time step $\Delta$:
$$x(t+\Delta) = x(t) + \Delta\,f(x(t),u(t)) + \sqrt{\Delta}\,G(x(t),u(t))\,\varepsilon(t)$$
where $\varepsilon(t) \sim \mathcal{N}(0, I)$. The $\sqrt{\Delta}$ term appears because the variance of Brownian motion grows linearly with time. Thus the transition probability $p(x' \mid x,u)$ is Gaussian with mean $x + \Delta\,f(x,u)$ and covariance matrix $\Delta\,\Sigma(x,u)$. The one-step cost is $\Delta\,\ell(x,u)$.

Now we can apply the Bellman equation in the finite-horizon setting:
$$v(x,t) = \min_u \Big\{ \Delta\,\ell(x,u) + E_{x' \sim p(\cdot \mid x,u)}\, v(x', t+\Delta) \Big\} = \min_u \Big\{ \Delta\,\ell(x,u) + E_{d \sim \mathcal{N}(\Delta f(x,u),\, \Delta \Sigma(x,u))}\, v(x+d, t+\Delta) \Big\}$$
Next we use the Taylor series expansion of $v$.
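The generator theorem can be checked numerically in the scalar case using the Euler step, whose one-step transition is Gaussian with mean $x + \Delta f(x)$ and variance $\Delta\,g(x)^2$. The sketch below is an illustration, not taken from the slides: the function names are hypothetical, $E[v(x(\Delta))]$ is computed with NumPy's Gauss-Hermite quadrature (`hermegauss`, probabilists' weight $e^{-\varepsilon^2/2}$), and the difference quotient is compared with $f v' + \tfrac{1}{2} g^2 v''$. The test problem (drift $f(x) = -x$, constant $g = 0.5$, $v = \sin$) is an arbitrary assumption.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def euler_step_expectation(v, x, f, g, delta, n_nodes=20):
    """E[v(x')] for one Euler step x' = x + delta*f(x) + sqrt(delta)*g(x)*eps, eps ~ N(0,1).

    The expectation over eps uses Gauss-Hermite quadrature, so there is no sampling noise.
    """
    eps, w = hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)  # weights now sum to 1: an expectation under N(0, 1)
    x_next = x + delta * f(x) + np.sqrt(delta) * g(x) * eps
    return np.sum(w * v(x_next))


def generator_fd(v, x, f, g, delta):
    """Finite-delta version of the generator limit: (E[v(x(delta))] - v(x)) / delta."""
    return (euler_step_expectation(v, x, f, g, delta) - v(x)) / delta


def generator_exact(dv, d2v, x, f, g):
    """Scalar case of L[v](x) = f(x) v'(x) + 0.5 * g(x)^2 v''(x)."""
    return f(x) * dv(x) + 0.5 * g(x) ** 2 * d2v(x)


# Hypothetical test problem: Ornstein-Uhlenbeck drift, constant noise, v = sin
f = lambda x: -x
g = lambda x: 0.5
x0 = 1.0
exact = generator_exact(np.cos, lambda x: -np.sin(x), x0, f, g)
for delta in (1e-1, 1e-2, 1e-3):
    print(delta, generator_fd(np.sin, x0, f, g, delta), exact)
```

The difference quotient approaches the exact value with an error of order $\Delta$. Quadrature is used instead of Monte Carlo sampling because sampling noise would be amplified by the division by $\Delta$ and swamp the comparison.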
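Hamilton-Jacobi-Bellman (HJB) equation

$$v(x+d, t+\Delta) = v(x,t) + \Delta\,v_t(x,t) + d^T v_x(x,t) + \tfrac{1}{2}\,d^T v_{xx}(x,t)\,d + \text{higher-order terms}$$
The neglected terms have expectation $o(\Delta)$, since $E\|d\|^3 = O(\Delta^{3/2})$.

Using the fact that $E[d^T M d] = \mathrm{tr}(\mathrm{cov}[d]\,M) + E[d]^T M\,E[d] = \Delta\,\mathrm{tr}(\Sigma(x,u)\,M) + O(\Delta^2)$, the expectation is
$$E_d\, v(x+d, t+\Delta) = v(x,t) + \Delta\,v_t(x,t) + \Delta\,f(x,u)^T v_x(x,t) + \frac{\Delta}{2}\,\mathrm{tr}\big(\Sigma(x,u)\,v_{xx}(x,t)\big) + o(\Delta)$$
Substituting in the Bellman equation:
$$v(x,t) = \min_u \Big\{ \Delta\,\ell(x,u) + v(x,t) + \Delta\,v_t(x,t) + \Delta\,f(x,u)^T v_x(x,t) + \frac{\Delta}{2}\,\mathrm{tr}\big(\Sigma(x,u)\,v_{xx}(x,t)\big) + o(\Delta) \Big\}$$
Simplifying (the terms $v(x,t) + \Delta\,v_t(x,t)$ do not depend on $u$ and can be moved out of the minimization), dividing by $\Delta$ and taking $\Delta \to 0$ yields the HJB equation:
$$-v_t(x,t) = \min_u \Big\{ \ell(x,u) + f(x,u)^T v_x(x,t) + \tfrac{1}{2}\,\mathrm{tr}\big(\Sigma(x,u)\,v_{xx}(x,t)\big) \Big\}$$

HJB equations for different problem formulations

Definition (Hamiltonian):
$$H[x,u,v] \triangleq \ell(x,u) + f(x,u)^T v_x(x) + \tfrac{1}{2}\,\mathrm{tr}\big(\Sigma(x,u)\,v_{xx}(x)\big) = \ell(x,u) + \mathcal{L}[v](x)$$
where $\mathcal{L}$ is the generator of the controlled diffusion for the fixed control $u$.

The HJB equations for the optimal cost-to-go $v$ are:

Theorem (HJB equations):
first exit: $0 = \min_u H[x,u,v]$, with $v(x \in \mathcal{T}) = q_T(x)$
finite horizon: $-v_t(x,t) = \min_u H[x,u,v(\cdot,t)]$, with $v(x,T) = q_T(x)$
discounted: $\frac{1}{\tau}\,v(x) = \min_u H[x,u,v]$
average: $c = \min_u H[x,u,\tilde{v}]$
…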
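To see the finite-horizon HJB equation and the discretized Bellman recursion agree on a concrete case, consider a hypothetical scalar linear-quadratic problem (an assumption, not part of the slides): $dx = (a x + b u)\,dt + \sigma\,d\omega$, $\ell(x,u) = \tfrac{1}{2}(q x^2 + r u^2)$, $q_T(x) = \tfrac{1}{2} q_f x^2$. Substituting the ansatz $v(x,t) = \tfrac{1}{2} s(t)\,x^2 + c(t)$ into the HJB equation, the minimizer of the Hamiltonian is $u^* = -(b/r)\,s\,x$, and matching powers of $x$ gives $-\dot{s} = q + 2 a s - (b^2/r)\,s^2$ with $s(T) = q_f$, and $-\dot{c} = \tfrac{1}{2}\sigma^2 s$ with $c(T) = 0$. For $a = 0$, $b = q = r = 1$, $q_f = 0$ this yields $s(t) = \tanh(T - t)$ and $c(t) = \tfrac{1}{2}\sigma^2 \ln\cosh(T - t)$. The Python sketch below (the name `lq_bellman` is hypothetical) runs the Bellman recursion over the Euler discretization, where the Gaussian expectation and the minimization over $u$ are exact for quadratic $v$, and compares $v(x,0)$ with this HJB solution.

```python
import numpy as np


def lq_bellman(T=1.0, n_steps=1000, a=0.0, b=1.0, q=1.0, r=1.0, sigma=0.5, q_final=0.0):
    """Backward Bellman recursion for the Euler-discretized scalar LQ problem
        dx = (a x + b u) dt + sigma dw,  l(x,u) = (q x^2 + r u^2)/2,  q_T(x) = q_final x^2/2.

    With v(x, t_k) = s_k x^2/2 + c_k, the expectation over the Gaussian step and the
    minimization over u are done in closed form, so only the time step causes error.
    Returns (s_0, c_0), the coefficients of v(x, 0).
    """
    delta = T / n_steps
    A, B = 1.0 + delta * a, delta * b  # x' = A x + B u + sqrt(delta) * sigma * eps
    s, c = q_final, 0.0                # terminal condition v(x, T) = q_T(x)
    for _ in range(n_steps):
        # E[v(x')] = s (A x + B u)^2 / 2 + s delta sigma^2 / 2 + c; the noise term uses the old s
        c = c + 0.5 * s * delta * sigma**2
        # minimizing delta r u^2/2 + s (A x + B u)^2/2 over u gives u = -s A B x / (delta r + s B^2)
        s = delta * q + s * A**2 - (s * A * B) ** 2 / (delta * r + s * B**2)
    return s, c


T, sigma = 1.0, 0.5
s_hjb, c_hjb = np.tanh(T), 0.5 * sigma**2 * np.log(np.cosh(T))  # HJB solution at t = 0
for n in (10, 100, 1000):
    s0, c0 = lq_bellman(T=T, n_steps=n, sigma=sigma)
    print(n, abs(s0 - s_hjb), abs(c0 - c_hjb))
```

The discrepancy shrinks as the time step $\Delta = T/n$ decreases (to first order in $\Delta$ for this scheme), so the discrete Bellman solution converges to the HJB solution.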