CS294-40 Learning for Robotics and Control          Lecture 1 - 8/28/2008

Introduction

Lecturer: Pieter Abbeel                              Scribe: Pieter Abbeel

1 Lecture outline

• Class logistics.
• Slideshow and movies on current autonomous robotics, on the algorithms they use, and on future directions.
• Markov decision processes.

2 Markov decision processes (MDPs)

2.1 Definition

A (discounted infinite horizon) Markov decision process (MDP) is a tuple (S, A, T, γ, D, R). Here

1. S is the set of possible states for the system;
2. A is the set of possible actions;
3. T represents the (typically stochastic) system dynamics;
4. γ ∈ [0, 1) is the discount factor;
5. D is the initial-state distribution, from which the start state s_0 is drawn;
6. R : S → ℝ is the reward function.

Acting in a Markov decision process results in a sequence of states and actions s_0, a_0, s_1, a_1, s_2, ....

A policy π is a sequence of mappings (μ_0, μ_1, μ_2, ...), where, at time t, the mapping μ_t(·) determines the action a_t = μ_t(s_t) to take when in state s_t.

The objective is to find policies that maximize the expected sum of rewards accumulated over time. In particular, a policy π is good if its utility

    U(π) = E[ Σ_{t=0}^{∞} γ^t R(s_t) | π ]

is high.

To represent the system dynamics, we can use the state-transition distribution notation

    s_{t+1} ~ P_{sa}(· | s_t, a_t).

We will also often use the following notation:

    s_{t+1} = F(s_t, a_t, w_t).

Here F is a deterministic function, and w_t is a random disturbance.

2.2 Examples

2.2.1 Car

One (approximate) way to model the state of a car is to use the following six state variables: northing (n), easting (e), north velocity (ṅ), east velocity (ė), heading (θ), and angular rate (θ̇). Hence the state space is S = ℝ^6. The actions (or control inputs) are (i) steering angle, (ii) throttle, and (iii) brake.

The disturbances w_t capture both environmental perturbations and unmodeled aspects of the car dynamics.

We could have the following dynamics model s_{t+1} = F(s_t, a_t, w_t):

    n_{t+1} = n_t + ṅ_t Δt,
    e_{t+1} = e_t + ė_t Δt,
    θ_{t+1} = θ_t + θ̇_t Δt,
    ṅ_{t+1} = f_n(ṅ_t, ė_t, θ̇_t, a_t, w_t),
    ė_{t+1} = f_e(ṅ_t, ė_t, θ̇_t, a_t, w_t),
    θ̇_{t+1} = f_θ(ṅ_t, ė_t, θ̇_t, a_t, w_t).

The reward function could be R(s_t) = 1{in goal region} − 100 · 1{in collision}. Here 1{·} is an indicator function, taking the value "1" when its argument is true and "0" otherwise. The functions f_n, f_e, f_θ are deterministic functions modeling the car's ...
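As a concrete illustration of the definitions above, here is a minimal Python sketch (not part of the original notes) of estimating the utility U(π) by Monte Carlo rollouts, assuming dynamics given in the form s_{t+1} = F(s_t, a_t, w_t). All names here (F, R, policy, D) are hypothetical stand-ins for the mathematical objects defined above, and the Gaussian disturbance is an assumption chosen only for illustration.

import random

def rollout_return(F, R, policy, s0, gamma, horizon):
    # Discounted return of a single rollout; the infinite horizon is
    # truncated at `horizon` steps (gamma < 1 makes the tail negligible).
    total, s = 0.0, s0
    for t in range(horizon):
        a = policy(t, s)              # a_t = mu_t(s_t)
        w = random.gauss(0.0, 1.0)    # disturbance w_t (Gaussian is an assumption)
        total += (gamma ** t) * R(s)  # accumulate gamma^t * R(s_t)
        s = F(s, a, w)                # s_{t+1} = F(s_t, a_t, w_t)
    return total

def estimate_utility(F, R, policy, D, gamma, horizon=200, n_rollouts=1000):
    # Monte Carlo estimate of U(pi) = E[ sum_t gamma^t R(s_t) | pi ]:
    # average the discounted return over rollouts, drawing s_0 from the
    # initial-state distribution D.
    returns = [rollout_return(F, R, policy, D(), gamma, horizon)
               for _ in range(n_rollouts)]
    return sum(returns) / n_rollouts

Truncating at a finite horizon biases the estimate by at most γ^horizon · max|R| / (1 − γ), so for γ well below 1 a few hundred steps suffice.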
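To make the car example concrete, the sketch below (again not from the notes) implements one Euler step of the dynamics model above. The position and heading updates follow the notes exactly; the velocity updates stand in for f_n, f_e, f_θ, whose actual form the preview cuts off, so the accelerations used here are purely hypothetical placeholders, as is the value of Δt.

import math

DT = 0.1  # time step Delta t (value assumed; the notes leave it unspecified)

def F_car(s, a, w):
    # One Euler step of the car dynamics s_{t+1} = F(s_t, a_t, w_t).
    # State s = (n, e, n_dot, e_dot, theta, theta_dot), matching the six
    # state variables in the notes; action a = (steer, throttle, brake).
    n, e, n_dot, e_dot, theta, theta_dot = s
    steer, throttle, brake = a
    # Kinematic integration, exactly as written in the notes.
    n_next = n + n_dot * DT
    e_next = e + e_dot * DT
    theta_next = theta + theta_dot * DT
    # Placeholder stand-ins for f_n, f_e, f_theta (the preview cuts off
    # before their definition): accelerate along the heading, perturbed by w.
    accel = throttle - brake
    n_dot_next = n_dot + (accel * math.cos(theta) + w) * DT
    e_dot_next = e_dot + (accel * math.sin(theta) + w) * DT
    theta_dot_next = theta_dot + (steer + w) * DT
    return (n_next, e_next, n_dot_next, e_dot_next, theta_next, theta_dot_next)

def R_car(s, in_goal_region, in_collision):
    # R(s) = 1{in goal region} - 100 * 1{in collision}, with the two
    # indicator predicates supplied by the caller (hypothetical here).
    return float(in_goal_region(s)) - 100.0 * float(in_collision(s))

With F_car and R_car plugged into estimate_utility above, one can score a candidate driving policy directly against this model.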

