Markov Decision Processes (MDPs)
• read Ch 17.1–17.2
• utility-based agents
  – goals encoded in a utility function U(s), or U: S → ℝ
• effects of actions encoded in a state transition function T: S × A → S
  – or T: S × A → pdf(S) for non-deterministic actions
• rewards/costs encoded in a reward function R: S × A → ℝ
• Markov property: the effects of an action depend only on the current state, not on the previous history
• the goal: maximize reward over time
  – long-term discounted reward
  – handles infinite horizons; encourages quicker achievement of reward
• "plans" are encoded in policies
  – mappings from states to actions: π: S → A
• how do we compute the optimal policy π* that maximizes long-term discounted reward?

• value function Vπ(s): expected long-term reward from starting in state s and following policy π
• derive a policy from V(s):
  – π(s) = argmax_{a∈A} E[R(s,a) + γ·V(T(s,a))]
  –       = argmax_{a∈A} Σ_{s'} p(s'|s,a)·(R(s,a) + γ·V(s'))
• the optimal policy comes from the optimal value function:
  – V*(s) = max_{a∈A} Σ_{s'} p(s'|s,a)·(R(s,a) + γ·V*(s'))
  – π*(s) = argmax_{a∈A} Σ_{s'} p(s'|s,a)·(R(s,a) + γ·V*(s'))
• these are Bellman's equations
  – (eqn 17.5)

• method 1: linear programming
  – n coupled equations, one per state
  – v1 = max(v2, v3, v4, ...)
  – v2 = max(v1, v3, v4, ...)
  – v3 = max(v1, v2, v4, ...)
  – solve for {v1, v2, v3, ...} using the GNU Linear Programming Kit (GLPK), etc. (see the LP sketch below)

Calculating V*(s)
• method 2: Value Iteration
  – initialize V(s) = 0 for all states
  – iteratively update the value of each state based on its neighbors
  – ...until the values converge (see the value-iteration sketch below)
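Method 1 above suggests solving the coupled equations with an LP solver such as GLPK. Below is one possible way to set that up in Python, using scipy's linprog in place of GLPK purely for illustration; the dictionary-based MDP representation (P[s][a] as (probability, next-state) pairs, R[s][a] as the reward) and all names are assumptions, not something given in the slides.

import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(states, actions, P, R, gamma=0.9):
    """Minimize sum_s V(s) subject to V(s) >= R(s,a) + gamma * sum_{s'} p(s'|s,a) V(s')
    for every (s, a); the minimizer satisfies the Bellman equations."""
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    A_ub, b_ub = [], []
    for s in states:
        for a in actions:
            row = np.zeros(n)
            row[idx[s]] -= 1.0                # -V(s)
            for p, s2 in P[s][a]:
                row[idx[s2]] += gamma * p     # + gamma * p(s'|s,a) * V(s')
            A_ub.append(row)                  # -V(s) + gamma * sum p V(s') <= -R(s,a)
            b_ub.append(-R[s][a])
    res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n)  # V(s) may be negative, so no sign bounds
    return {s: res.x[idx[s]] for s in states}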
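Method 2 (value iteration) maps directly onto a short loop. This is a minimal sketch under the same assumed representation; gamma and the convergence threshold theta are illustrative choices, and the last step extracts the greedy policy π* from V* as in the equations above.

def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-6):
    """P[s][a] is a list of (prob, next_state) pairs; R[s][a] is the reward
    for taking action a in state s."""
    V = {s: 0.0 for s in states}              # initialize V(s) = 0 for all states
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: V(s) <- max_a sum_{s'} p(s'|s,a) (R(s,a) + gamma V(s'))
            best = max(
                sum(p * (R[s][a] + gamma * V[s2]) for p, s2 in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:                     # ...until the values converge
            break
    # greedy policy: pi*(s) = argmax_a sum_{s'} p(s'|s,a) (R(s,a) + gamma V*(s'))
    pi = {
        s: max(actions,
               key=lambda a: sum(p * (R[s][a] + gamma * V[s2]) for p, s2 in P[s][a]))
        for s in states
    }
    return V, pi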