Homework 1 MDP and RBF approximations to inverted pendulum E Todorov AMATH CSE 579 due February 9 Problem statement In this homework we will solve a simple optimal control problem balancing an inverted pendulum This will be done in two ways In Part 1 the problem will be discretized as an MDP which will then be solved with value iteration In Part 2 the optimal value function will be approximated with a radial basis function RBF whose parameters will be t by minimizing Bellman error In both cases the optimal control problem is as follows We have a deterministic continuous time dynamical system x 1 x2 x 2 a sin x1 bx2 u x1 is the angle of the pendulum x2 is the angular velocity u is the control The cost rate function is r x u 1 exp k cos x1 k u2 2 The problem is de ned in the discounted cost setting with discount factor which is up to you Here a 0 models gravity mass and length b 0 is damping k 0 determines the shape of the cost function larger k makes the cost sharper r 0 is the weight of the control cost The four constants a b k r together with the discount factor can be adjusted to obtain di erent problem instances you should experiment with them and try to understand how they a ect the solution 1 Part 1 MDP approximation Construct the approximating MDP as follows First decide on a time step h maximum velocity vmax and control umax to be represented in the MDP and numbers of grid points n1 n2 nu along the state and control dimensions Dimension x1 is circular and should be discretized with care enforcing wrap around Dimension x2 and the control dimension u are discretized over the intervals vmax vmax and umax umax in a straightforward way The Matlab command linspace is useful here Now we have a grid of discrete states each corresponding to some continuous state x1 x2 For each discrete state and each discrete control compute the continuous next state y1 y2 under the pendulum dynamics y1 x1 hx2 y2 x2 h a sin x1 bx2 u y1 y2 will not fall on the grid in general so we have to create transition probabilities in the MDP such that the average of the neighboring grid states weighted by the transition probabilities equals y1 y2 Compared to our discussion of MDP discretizations in class the situation here is simpler because the underlying continuous dynamics are deterministic and so we want to use the smallest possible set of next states with non zero probability We have freedom in choosing which neighbors to use The simplest choice is to take the nearest neighbors in each direction i e the vertices of the grid square where y1 y2 falls Note that y1 y2 will sometimes fall outside the grid in the velocity dimension In that case we cannot achieve the above requirement for matching the average Instead we use the nearest grid states and accept some approximation error i e edge e ect The cost in the MDP is h where is evaluated at the discretized states and controls There are several unspeci ed constants h vmax umax n1 n2 nu in the MDP construction Similar to the above constants appearing in the problem formulation you should de ne them as symbolic constants at the beginning of your code experiment with them and try to understand their e ects Generally denser grids produce more accurate solutions It is also useful to think about compatibility of the grid in the following sense Suppose the time step h is small and the discretization step in velocity is small while the discretization step in position is large Then it will be very di cult for the MDP to make any state transitions i e the transition probabilities will be 2 close to delta functions centered at the current state This is to be avoided Instead we want 2vmax 2 h n2 n1 Similar reasoning applies to the velocity and control discretization Part 2 RBF approximation RBFs are among the more popular function approximators Here we will represent the to be computed optimal value function as v x w XN i 1 wi exp s1 cos x1 c1 i s2 x2 c2 i 2 These bases are Gaussians in velocity and circular Gaussians or von Mises distributions in position The constants c1 i and c2 i determine the center of the i th basis in position and velocity respectively You should place them on a rectangular grid and experiment with the gird density it should be smaller than the grid used for the MDP discretization The two sharpness parameters s1 s2 are common to all bases they should be adjusted to match the grid of RBF centers so that adjacent RBFs have some overlap but not too much The weights wi are the only adjustable parameters of our function approximator which is why we have written v as a function of both x and w We want to set w so that v x w satis es the Hamilton Jacobi Bellman HJB equation as closely as possible To do this de ne a set of states fxk g where the HBJ equation will be evaluated This can be the same as the dense grid of states used in Part 1 for example To approximate the solution to a functional equation of the form v T v which includes the HJB equation with suitable choice of operator T we de ne the following tting cost X L w k v xk w T v xk w k2 k and then minimize L w This yields the optimal weights for the function approximator Note that v is linear in w but the HJB equation is quadratic in v thus L w contains 4 th order terms in w and cannot be minimized analytically Instead use a numeric optimizer 3 What to submit Matlab code ideally written in a modular way so that we can replace the pendulum problem with any other 2D dynamics cost Several gures showing the optimal cost to go and the optimal control law both are scalar functions over the 2D state space for di erent parameter settings you found interesting Comparison the solutions obtained in Part 1 and Part 2 Choose several initial states and show the trajectories of the continuousstate discrete time system under the optimal control law The latter is only de ned on the grid but you can interpolate linearly so as to de ne it everywhere Some text summarizing your observations 4
View Full Document
Unlocking...