TAMU CSCE 625 - Markov Decision Processes MDPs

Slide outline: Markov Decision Processes (MDPs) · Slide 2 · Slide 3 · Slide 4 · Calculating V*(s) · Slide 6

Markov Decision Processes (MDPs)
• read Ch 17.1-17.2
• utility-based agents
  – goals encoded in a utility function U(s), or U: S → ℝ
• effects of actions encoded in a state transition function T: S × A → S
  – or T: S × A → pdf(S) for the non-deterministic case
• rewards/costs encoded in a reward function R: S × A → ℝ
• Markov property: the effects of actions depend only on the current state, not on the previous history
• the goal: maximize reward over time
  – long-term discounted reward
  – handles the infinite horizon; encourages quicker achievement of rewards

• "plans" are encoded in policies
  – mappings from states to actions: π: S → A
• how do we compute the optimal policy π* that maximizes long-term discounted reward?

• value function Vπ(s): expected long-term reward from starting in state s and following policy π
• derive a policy from V(s):
  – π(s) = argmax_{a∈A} E[R(s,a) + γ·V(T(s,a))]
  –      = argmax_{a∈A} Σ_{s'} p(s'|s,a)·(R(s,a) + γ·V(s'))
• the optimal policy comes from the optimal value function:
  – π*(s) = argmax_{a∈A} Σ_{s'} p(s'|s,a)·(R(s,a) + γ·V*(s'))

• Bellman's equations (eqn 17.5):
  – V*(s) = max_{a∈A} Σ_{s'} p(s'|s,a)·(R(s,a) + γ·V*(s'))
• method 1: linear programming (see the LP sketch at the end of these notes)
  – n coupled equations, one per state; the max over actions makes them nonlinear as written, but they can be solved as a linear program
  – schematically: v1 = max(v2, v3, v4, ...)
  –                v2 = max(v1, v3, v4, ...)
  –                v3 = max(v1, v2, v4, ...)
  – solve for {v1, v2, v3, ...} using the GNU LP kit (GLPK), etc.

Calculating V*(s)
• method 2: Value Iteration (see the sketch just below)
  – initialize V(s) = 0 for all states
  – iteratively update the value of each state based on its neighbors
  – ...until the values converge
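Below is a minimal Python sketch of the "method 2: Value Iteration" slide. The tiny 3-state, 2-action MDP, the constants GAMMA and THETA, and the helper names q_value and value_iteration are illustrative assumptions rather than anything from the lecture; only the Bellman backup and the greedy policy extraction follow the formulas above.

```python
# Minimal value-iteration sketch (the tiny MDP below is made up for illustration).
# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the reward R(s, a).

GAMMA = 0.9          # discount factor for long-term discounted reward
THETA = 1e-6         # convergence threshold

P = {
    "s1": {"stay": [(1.0, "s1")], "go": [(0.8, "s2"), (0.2, "s1")]},
    "s2": {"stay": [(1.0, "s2")], "go": [(0.8, "s3"), (0.2, "s2")]},
    "s3": {"stay": [(1.0, "s3")], "go": [(1.0, "s3")]},
}
R = {
    "s1": {"stay": 0.0, "go": -1.0},
    "s2": {"stay": 0.0, "go": -1.0},
    "s3": {"stay": 10.0, "go": 10.0},
}

def q_value(V, s, a):
    """One-step lookahead: sum_s' p(s'|s,a) * (R(s,a) + gamma * V(s'))."""
    return sum(p * (R[s][a] + GAMMA * V[s2]) for p, s2 in P[s][a])

def value_iteration():
    V = {s: 0.0 for s in P}                       # initialize V(s) = 0 for all states
    while True:
        delta = 0.0
        for s in P:                               # update each state from its neighbors
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < THETA:                         # ...until the values converge
            break
    # derive the greedy policy pi(s) = argmax_a sum_s' p(s'|s,a) * (R(s,a) + gamma * V(s'))
    pi = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, pi

if __name__ == "__main__":
    V, pi = value_iteration()
    print(V)   # approximate V*(s)
    print(pi)  # greedy policy derived from V*
```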


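For the "method 1: linear programming" slide, one common way to pose the n coupled equations as a linear program is: minimize Σ_s V(s) subject to V(s) ≥ R(s,a) + γ·Σ_{s'} p(s'|s,a)·V(s') for every state-action pair. The slide mentions the GNU LP kit; the hedged sketch below uses SciPy's linprog instead, and the small MDP it solves is made up for illustration.

```python
# Sketch of "method 1": solve for V* with a linear program.
# LP: minimize sum_s V(s)  subject to  V(s) >= R(s,a) + gamma * sum_s' p(s'|s,a) * V(s')
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# Hypothetical 3-state, 2-action MDP (an assumption, not from the slides).
# P[a, s, s'] = p(s' | s, a);  R[s, a] = reward for taking action a in state s.
P = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # action 0: "stay"
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.0, 0.0, 1.0]],   # action 1: "go"
])
R = np.array([[0.0, -1.0], [0.0, -1.0], [10.0, 10.0]])
n_actions, n_states, _ = P.shape

# One inequality per (s, a):  -(V(s) - gamma * sum_s' P[a,s,s'] * V(s')) <= -R(s, a)
A_ub, b_ub = [], []
for a in range(n_actions):
    for s in range(n_states):
        A_ub.append(-np.eye(n_states)[s] + gamma * P[a, s])
        b_ub.append(-R[s, a])

res = linprog(c=np.ones(n_states),                 # minimize sum_s V(s)
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states,    # V(s) is unbounded in sign
              method="highs")
print(res.x)  # V*(s); should match the fixed point of the Bellman equations
```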