# A Tutorial on the Partially Observable Markov


A Tutorial on the Partially Observable Markov Decision Process and Its Applications

Lawrence Carin, June 7, 2006

## Outline

- Overview of Markov decision processes (MDPs)
- Introduction to partially observable Markov decision processes (POMDPs)
- Some applications of POMDPs
- Conclusions

## Markov decision processes

The MDP is defined by the tuple $\langle S, A, T, R \rangle$:

- $S$ is a finite set of states of the world.
- $A$ is a finite set of actions.
- $T: S \times A \to \Pi(S)$ is the state-transition function: the probability that an action changes the world state from one state to another, $T(s, a, s')$.
- $R: S \times A \to \mathbb{R}$ is the reward the agent receives in a given world state after performing an action, $R(s, a)$.

Two properties of the MDP:

- The action-dependent state transition is Markovian.
- The state is fully observable after taking action $a$.

[Figure: Illustration of MDPs. The agent observes state $s$ and sends action $a$ to the world; the world evolves according to $T(s, a, s')$.]

Objective of MDPs: find the optimal policy, mapping state $s$ to action $a$, that maximizes the value function $V(s)$:

$$V(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s'} T(s, a, s')\, V(s') \right]$$

where $\gamma \in [0, 1)$ is the discount factor.

## Introduction to POMDPs

The POMDP is defined by the tuple $\langle S, A, T, R, \Omega, O \rangle$:

- $S$, $A$, $T$, and $R$ are defined the same as in MDPs.
- $\Omega$ is a finite set of observations the agent can experience of its world.
- $O: S \times A \to \Pi(\Omega)$ is the observation function: the probability of making a certain observation $o$ after performing a particular action $a$ and landing in state $s'$, $O(s', a, o)$.

Differences between MDPs and POMDPs:

- The state is hidden after taking action $a$.
- The hidden state information is inferred from the action- and state-dependent observation function $O(s', a, o)$.
- As a result, the state $s$ is uncertain in POMDPs.

A new concept in POMDPs is the belief state $b(s)$:

$$b_t(s) = \Pr(s_t = s \mid o_1, a_1, o_2, a_2, \ldots, o_{t-1}, a_{t-1}, o_t)$$

The belief state $b(s)$ evolves according to Bayes' rule:

$$b'(s') = \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{\Pr(o \mid a, b)}$$

with $\sum_{s \in S} b(s) = 1$.

[Figure: belief components $b(s_1)$, $b(s_2)$ over states $s_1$, $s_2$, $s_3$, with observations $o_1$, $o_2$; label: "n control intervals remaining".]
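The value-function recursion for MDPs and the Bayes-rule belief update for POMDPs described above can be sketched in a few lines of Python. This is only an illustrative sketch: the two-state model, its transition, reward, and observation numbers, the discount factor `GAMMA = 0.9`, and the helper names `value_iteration` and `belief_update` are all assumptions made for the example and do not come from the tutorial.

```python
from typing import List

# Hypothetical toy model (illustrative numbers only, not from the tutorial).
S = [0, 1]    # states
A = [0, 1]    # actions
OBS = [0, 1]  # observations

# T[s][a][s2]: probability of moving from state s to state s2 under action a.
T = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.1, 0.9], [0.7, 0.3]],
]
# R[s][a]: reward for taking action a in state s.
R = [[1.0, 0.0], [0.0, 2.0]]
# O[s2][a][o]: probability of observing o after action a lands in state s2.
O = [
    [[0.8, 0.2], [0.6, 0.4]],
    [[0.3, 0.7], [0.1, 0.9]],
]
GAMMA = 0.9  # assumed discount factor


def bellman_backup(V: List[float], s: int) -> float:
    """One application of V(s) = max_a [R(s,a) + gamma * sum_s' T(s,a,s') V(s')]."""
    return max(
        R[s][a] + GAMMA * sum(T[s][a][s2] * V[s2] for s2 in S)
        for a in A
    )


def value_iteration(tol: float = 1e-9, max_iter: int = 10000) -> List[float]:
    """Repeat the Bellman backup until the values change by less than tol."""
    V = [0.0 for _ in S]
    for _ in range(max_iter):
        V_new = [bellman_backup(V, s) for s in S]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


def belief_update(b: List[float], a: int, o: int) -> List[float]:
    """Bayes-rule update: b'(s') = O(s',a,o) * sum_s T(s,a,s') b(s) / Pr(o | a, b)."""
    unnormalized = [
        O[s2][a][o] * sum(T[s][a][s2] * b[s] for s in S)
        for s2 in S
    ]
    pr_o = sum(unnormalized)  # Pr(o | a, b), the normalizing constant
    if pr_o == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / pr_o for u in unnormalized]


if __name__ == "__main__":
    print("V:", value_iteration())
    print("b':", belief_update([0.5, 0.5], a=0, o=0))
```

The belief update divides by `Pr(o | a, b)` so that the new belief sums to one; the value iteration here solves only the fully observable MDP, since solving the POMDP itself requires planning over belief states rather than world states.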
