CS294-40 Learning for Robotics and Control    Lecture 1 - 8/28/2008

Introduction

Lecturer: Pieter Abbeel    Scribe: Pieter Abbeel

1 Lecture outline

• Class logistics.
• Slideshow and movies on current autonomous robotics, on the algorithms they use, and on future directions.
• Markov decision processes.

2 Markov decision processes (MDPs)

2.1 Definition

A (discounted infinite-horizon) Markov decision process (MDP) is a tuple (S, A, T, γ, D, R). Here

1. S is the set of possible states for the system;
2. A is the set of possible actions;
3. T represents the (typically stochastic) system dynamics;
4. γ ∈ [0, 1) is the discount factor;
5. D is the initial-state distribution, from which the start state s_0 is drawn;
6. R : S → ℝ is the reward function.

Acting in a Markov decision process results in a sequence of states and actions s_0, a_0, s_1, a_1, s_2, .... A policy π is a sequence of mappings (μ_0, μ_1, μ_2, ...), where, at time t, the mapping μ_t(·) determines the action a_t = μ_t(s_t) to take when in state s_t.

The objective is to find policies that maximize the expected sum of rewards accumulated over time. In particular, a policy π is good if its utility

    U(π) = E[ Σ_{t=0}^{∞} γ^t R(s_t) | π ]

is high.

To represent the system dynamics, we can use the state-transition distribution notation

    s_{t+1} ∼ P_{sa}(· | s_t, a_t).

We will also often use the following notation:

    s_{t+1} = F(s_t, a_t, w_t).

Here F is a deterministic function, and w_t is a random disturbance.

2.2 Examples

2.2.1 Car

One (approximate) way to model the state of a car is to use the following six state variables: northing (n), easting (e), north velocity (ṅ), east velocity (ė), heading (θ), angular rate (θ̇).
Hence the state space is S = ℝ^6. The actions (or control inputs) are (i) steering angle, (ii) throttle, (iii) brake. The disturbances capture both environmental perturbations and unmodeled aspects of the car dynamics.

We could have the following dynamics model s_{t+1} = F(s_t, a_t, w_t):

    n_{t+1} = n_t + ṅ_t Δt,
    e_{t+1} = e_t + ė_t Δt,
    θ_{t+1} = θ_t + θ̇_t Δt,
    ṅ_{t+1} = f_n(ṅ_t, ė_t, θ̇_t, a_t, w_t),
    ė_{t+1} = f_e(ṅ_t, ė_t, θ̇_t, a_t, w_t),
    θ̇_{t+1} = f_θ(ṅ_t, ė_t, θ̇_t, a_t, w_t).

The reward function could be R(s_t) = 1{in goal region} − 100 · 1{in collision}. Here 1{·} is an indicator function, taking the value "1" when its argument is true, and "0" otherwise. The functions f_n, f_e, f_θ are deterministic functions modeling the car's velocity and heading-rate dynamics.
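The car model above can be sketched in code. In this sketch, the position and heading updates follow the Euler-integration equations from the notes, but the velocity updates standing in for f_n, f_e, f_θ (a crude point-mass model accelerating along the heading) and the time step `DT` are illustrative assumptions, not the lecture's actual functions:

```python
import math
import random

DT = 0.1  # time step Delta-t (assumed value)

def car_step(s, a, w=None):
    """One step s_{t+1} = F(s_t, a_t, w_t) of the six-state car model.

    State s = (n, e, ndot, edot, theta, thetadot); action
    a = (steer, throttle, brake). The velocity updates below are
    hypothetical stand-ins for f_n, f_e, f_theta.
    """
    n, e, ndot, edot, theta, thetadot = s
    steer, throttle, brake = a
    if w is None:  # disturbance w_t: small Gaussian noise (assumed)
        w = [random.gauss(0.0, 0.05) for _ in range(3)]

    # Position and heading integrate the current rates (as in the notes).
    n_new = n + ndot * DT
    e_new = e + edot * DT
    theta_new = theta + thetadot * DT

    # Hypothetical velocity dynamics: accelerate along the heading,
    # decelerate with brake, turn rate driven by the steering angle.
    speed = math.hypot(ndot, edot)
    speed_new = max(0.0, speed + (throttle - brake) * DT)
    ndot_new = speed_new * math.cos(theta) + w[0]
    edot_new = speed_new * math.sin(theta) + w[1]
    thetadot_new = speed_new * steer + w[2]

    return (n_new, e_new, ndot_new, edot_new, theta_new, thetadot_new)

def reward(s, in_goal, in_collision):
    # R(s) = 1{in goal region} - 100 * 1{in collision}
    return (1.0 if in_goal(s) else 0.0) - (100.0 if in_collision(s) else 0.0)
```

Note that the position update uses the *old* velocities, so a car starting at rest does not move on the first step even under full throttle; the throttle only changes the velocity components.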
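Given any simulator of the form s_{t+1} = F(s_t, a_t, w_t), the utility U(π) = E[Σ_t γ^t R(s_t) | π] from Section 2.1 can be estimated by Monte Carlo: sample rollouts, accumulate discounted rewards, and average. A minimal sketch, using a hypothetical two-state toy MDP (not from the lecture) for illustration; the infinite-horizon sum is truncated at a finite horizon, which is sound because the tail is bounded by γ^horizon · max|R|:

```python
import random

def estimate_utility(sample_start, step, reward, policy, gamma=0.95,
                     horizon=200, num_rollouts=500):
    """Monte Carlo estimate of U(pi) = E[sum_t gamma^t R(s_t) | pi]."""
    total = 0.0
    for _ in range(num_rollouts):
        s = sample_start()                 # s_0 ~ D
        ret, discount = 0.0, 1.0
        for t in range(horizon):
            ret += discount * reward(s)    # accumulate gamma^t R(s_t)
            a = policy(s, t)               # a_t = mu_t(s_t)
            s = step(s, a)                 # s_{t+1} = F(s_t, a_t, w_t)
            discount *= gamma
        total += ret
    return total / num_rollouts

# Hypothetical toy MDP: states {0, 1}, reward 1 in state 1. Action
# "stay" keeps the current state with probability 0.9; "switch" flips
# the state deterministically.
def sample_start():
    return 0

def step(s, a):
    if a == "stay":
        return s if random.random() < 0.9 else 1 - s
    return 1 - s

toy_reward = lambda s: float(s == 1)
toy_policy = lambda s, t: "stay" if s == 1 else "switch"
```

For this toy chain the estimate should land near 17: the probability of being in state 1 settles around 10/11, so U(π) ≈ (10/11) · γ/(1−γ) ≈ 17.3 for γ = 0.95.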