OSU CS 533 - Course Logistics

Course Logistics

CS533: Intelligent Agents and Decision Making
- Meets M, W, F, 1:00-1:50
- Instructor: Alan Fern (KEC 2071)
- Office hours: by appointment (see me after class or send email)
- Emailing me: include "CS533 Student" in the subject line, and resend if you don't hear back within a day
- The course website (linked from the instructor's home page) has lecture notes, assignments, and homework solutions

Written homework
- Assigned and collected regularly (submission via email is accepted)
- Graded for "completion with good effort" rather than for correctness (no guarantee on when it will be returned; make a copy if you want one)
- You must "complete with good effort" at least 90% of the homework, or a letter grade will be deducted from your final grade

Grading
- 25% Midterm exam (in class)
- 25% Final exam (take home)
- 25% Final project (during the last month)
- 25% Three mini-projects (require some implementation)

Some AI Planning Problems
- Fire & Rescue Response Planning
- Solitaire
- Real-Time Strategy Games
- Helicopter Control
- Legged Robot Control
- Network Security/Control

Some AI Planning Problems
- Health Care: personalized treatment planning; hospital logistics/scheduling
- Transportation: autonomous vehicles; supply chain logistics; air traffic control
- Assistive Technologies: automated assistants for the elderly/disabled; household robots
- Sustainability: smart grid; forest fire management
- ...

Common Elements
- We have a controllable system that can change state over time (in some predictable way); the state describes the essential information about the system (e.g., the visible card information in Solitaire).
- We have an objective that specifies which states, or state sequences, are more or less preferred.
- We can (partially) control the system's state transitions by taking certain actions.
- The problem: at each moment we must select an action so as to optimize the overall objective, i.e., produce the most preferred state sequences.

Some Dimensions of AI Planning
[Diagram: an agent interacts with the World through Actions and Observations in pursuit of a Goal.]
- Fully observable vs. partially observable
- Deterministic vs. stochastic
- Instantaneous vs. durative actions
- Agent is the sole source of change vs. other sources of change

Classical Planning Assumptions (the primary focus of AI planning until the early 90s)
- Fully observable
- Instantaneous actions
- Deterministic
- Agent is the sole source of change
- Goal: achieve a goal condition
These assumptions greatly limit applicability.

Stochastic/Probabilistic Planning: the Markov Decision Process (MDP) Model
- Fully observable
- Instantaneous actions
- Stochastic
- Agent is the sole source of change
- Goal: maximize expected reward over lifetime
- We will primarily focus on MDPs in this course.
[Diagram: the agent observes the world state and chooses an action from a finite set; the resulting state transition is probabilistic and depends on the action.]
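To make the model above concrete, here is a minimal sketch of a single MDP interaction step in Python. It is not taken from the course materials; the state names ("healthy", "broken"), the actions, and all transition probabilities and rewards are invented purely for illustration.

```python
import random

# A toy MDP with two states. All state names, transition probabilities, and
# rewards below are invented for illustration; they are not from the course.
STATES = ["healthy", "broken"]
ACTIONS = ["use", "repair"]

# T[(s, a)] maps each possible next state s' to Pr(s' | s, a).
T = {
    ("healthy", "use"):    {"healthy": 0.9, "broken": 0.1},
    ("healthy", "repair"): {"healthy": 1.0},
    ("broken",  "use"):    {"broken": 1.0},
    ("broken",  "repair"): {"healthy": 0.6, "broken": 0.4},
}

# State-dependent reward R(s).
R = {"healthy": 1.0, "broken": 0.0}

def sample_next_state(s, a):
    """Sample s' according to Pr(. | s, a) from the transition table."""
    dist = T[(s, a)]
    return random.choices(list(dist), weights=list(dist.values()))[0]

# One step of the agent/world loop: collect the reward for the current state,
# take an action, and transition stochastically to the next state.
s, a = "healthy", "use"
print("reward:", R[s], "next state:", sample_next_state(s, a))
```

Repeating this step many times and summing the rewards is, in expectation over the stochastic transitions, exactly the "maximize expected reward over lifetime" objective stated above.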
Example MDP: Solitaire
- State: all visible information about the cards
- Actions: the different legal card movements
- Goal: win the game, or play the maximum number of cards

Course Outline
The course is structured around algorithms for solving MDPs, under different assumptions about knowledge of the MDP model and about how the MDP is represented.
1. Markov Decision Process (MDP) Basics
   - Basic definitions and solution techniques
   - Assumes an exact MDP model is known
   - Exact solutions for small/moderate-size problems
2. Monte-Carlo Planning
   - Assumes an MDP simulator is available
   - Approximate solutions for large problems
3. Reinforcement Learning
   - The MDP model is not known to the agent
   - Exact solutions for small/moderate problems
   - Approximate solutions for large problems
4. Planning with Factored Representations of Huge MDPs
   - Symbolic dynamic programming
   - Classical planning for deterministic problems

Markov Decision Processes
(Alan Fern; based in part on slides by Craig Boutilier and Daniel Weld)

An MDP has four components: S, A, R, T.
- A finite state set S, with |S| = n
- A finite action set A, with |A| = m
- A transition function T(s, a, s') = Pr(s' | s, a), the probability of going to state s' after taking action a in state s. How many parameters does it take to represent? m*n*(n-1): each of the m*n state-action pairs needs a distribution over the n next states, and each such distribution has n-1 free parameters.
- A bounded, real-valued reward function R(s), the immediate reward we get for being in state s.
Roughly speaking, the objective is to select actions in order to maximize total reward. For example, in a goal-based domain, R(s) may equal 1 for goal states and 0 for all others (or -1 for non-goal states).

Graphical View of an MDP
[Diagram: a temporal chain S_t, S_{t+1}, S_{t+2}, ..., in which the action A_t taken in state S_t determines the distribution over S_{t+1}, and each state S_t yields reward R_t.]

Assumptions
- First-order Markovian dynamics (history independence): Pr(S_{t+1} | A_t, S_t, A_{t-1}, S_{t-1}, ..., S_0) = Pr(S_{t+1} | A_t, S_t). The next state depends only on the current state and the current action.
- State-dependent reward: R_t = R(S_t). The reward is a deterministic function of the current state.
- Stationary dynamics: Pr(S_{t+1} | A_t, S_t) = Pr(S_{k+1} | A_k, S_k) for all t and k. The world dynamics and the reward function do not depend on absolute time.
- Full observability: although we can't predict exactly which state we will reach when we execute an action, once the action is executed we see what the state is.

What is a solution to an MDP?
MDP planning problem. Input: an MDP (S, A, R, T). Output: ????
- Should the solution to an MDP be just a sequence of actions such as (a1, a2, a3, ...)? Consider a single-player card game like Blackjack or Solitaire.
- No! In general an action sequence is not sufficient. Actions have stochastic effects, so the state we end up in is uncertain, which means we might end up in states where the remainder of the action sequence does not apply or is a bad choice.
- A solution should tell us what the best action is for any possible situation/state that might arise.
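To illustrate that last point, here is a minimal sketch, reusing the same invented toy MDP as in the earlier example, of a policy represented as a mapping from every state to an action, together with a rollout that follows it. Unlike a fixed action sequence, the policy can react to whichever state the stochastic transitions actually produce.

```python
import random

# The same style of toy MDP as before; all numbers are invented for illustration.
T = {
    ("healthy", "use"):    {"healthy": 0.9, "broken": 0.1},
    ("healthy", "repair"): {"healthy": 1.0},
    ("broken",  "use"):    {"broken": 1.0},
    ("broken",  "repair"): {"healthy": 0.6, "broken": 0.4},
}
R = {"healthy": 1.0, "broken": 0.0}

# A solution is a policy: it prescribes an action for EVERY state that might
# arise, not just the states along one hoped-for trajectory.
policy = {"healthy": "use", "broken": "repair"}

def rollout(policy, s0="healthy", horizon=20):
    """Follow the policy for `horizon` steps and return the total reward collected."""
    s, total = s0, 0.0
    for _ in range(horizon):
        total += R[s]
        a = policy[s]  # the policy reacts to whichever state we actually reached
        dist = T[(s, a)]
        s = random.choices(list(dist), weights=list(dist.values()))[0]
    return total

print("total reward over one rollout:", rollout(policy))
```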

