Representations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology1Sven Koenig and Yaxin LiuCollege of ComputingRepresentations ofGeorgia Institute of TechnologyDecision-Theoretic Planning TasksRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology2Artificial Intelligence Planning- Representations- AlgorithmsRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology3Artificial Intelligence Planning- Representations- Algorithms- Planning ObjectiveRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology4Probabilistic Planning with Goal-Directed MDPs (1)startgoallow medium highgoal0.60.10.10.2robot is tipped over0.10.70.10.1southeastnorth, eastsouth, westall actions take the same time to execute(same with staircases for indoor navigation)elevation elevation elevation......west eastRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology5Probabilistic Planning with Goal-Directed MDPs (2)E( Σγtreward of action executed at time t + γn reward of goal in which execution stops )t=0n-1n = number of action executions before execution stopsγ = discount factor (0 < g ≤ 1)find a planthat maximizes the expected total (discounted) rewardRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology6action-penalty representationgoal-reward representationaction rewards = -1goal rewards = 0action rewards = 0goal rewards = 1Assumption:resourceconsumptionin planningreinforcementin reinforcementlearningRepresentation ChoicesE( Σγtreward of action executed at time t + γn reward of goal in which execution stops )γ0 1 + γ1 1 + γ2 1 + γ3 1 γ4 0γ0 0 + γ1 0 + γ2 0 + γ3 0 γ4 1++action-penalty representationgoal-reward representationt=0n-1all actions take the same time to executeRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology7discounting no discounting0 < γ < 1 γ = 1guarantees the total reward to be finiteresults in mathematical conveniencebut is an unrealistic mathematical trickRepresentation ChoicesRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology8discounting no discountingaction-penalty representationgoal-reward representationRepresentation Choices0 < γ < 1 γ = 1action rewards = -1goal rewards = 0action rewards = 0goal rewards = 1Representations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology9discounting no discountingaction-penalty representationgoal-reward representationRepresentation Choices0 < γ < 1 γ = 1action rewards = -1goal rewards = 0action rewards = 0goal rewards = 1best plan=best planRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology10Example: Terrain for Outdoor NavigationRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology11ExamplestartgoalRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology12Example: Action-Penalty Representation with no Discountingaction-penalty representationgoal-reward representationdiscounting no discountingherestartgoalPlan(= Policy)Representations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology13Example: Goal-Reward Representation with Discount Factor 0.9action-penalty representationgoal-reward representationdiscounting no discountingγ = 0.9startgoalRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology14Trapping PhenomenonFor robot-navigation tasks, we often want the robot to find a plan thatamong all plans that achieve the goal for sure.maximizes the expected total rewardRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology15Trapping Phenomenon - TheoremsTheorem: Every plan that maximizes the expected total undiscounted reward forthe action-penalty representation achieves the goal for sureprovided that the goal can be achieved for sure from the start state.Theorem: A plan that maximizes the expected total discounted reward forthe goal-reward representation does not necessarily achieve the goal for sureeven if the goal can be achieved for sure from the start state.discounting no discountingaction-penalty representationgoal-reward representation0 < γ < 1 γ = 1action rewards = -1goal rewards = 0action rewards = 0goal rewards = 1optimal planachieves the goaloptimal planis not guaranteed()to achieve the goaloptimal planis not guaranteedto achieve the goalRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology16start goal11 action executionsPlan 1 Plan 2startgoal0.90.11 action execution1 action execution1.01.0Trapping Phenomenon - ExamplesAction-Penalty Representation with no Discounting1.0 x (-11) = -11.00000.9 x (-1) + 0.1 x (-∞) = -∞Goal-Reward Representation with Discount Factor 0.91.0 x 0.911 = 0.31380.9 x 0.91 + 0.1 x 0.9−∞ = 0.8100achieves the goal for sure is not guaranteed to achieve the goal for surethe robot is tipped overRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology17Trapping Phenomenon - ExplanationTheorem: Every plan that maximizes the expected total discounted reward forthe goal-reward representation and discount factor γ (0 < γ < 1)also maximizes the expected total undiscounted utility forthe action-penalty representationand the convex exponential utility function u(r) = γ-r, and vice versa.discounting no discountingaction-penalty representationgoal-reward representation0 < γ < 1 γ = 1action rewards = -1goal rewards = 0action rewards = 0goal rewards = 1best plan=best planif this robot is risk-seekingand thus de-emphasizes bad outcomesRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology18Trapping Phenomenon - ExplanationTheorem: Every plan that maximizes the expected total discounted reward forthe goal-reward representation and discount factor γ (0 < γ < 1)also maximizes the expected total undiscounted
View Full Document