Chapter 16: Planning Based on Markov Decision Processes
Dana S. Nau, University of Maryland
Lecture slides for Automated Planning: Theory and Practice
Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License: http://creativecommons.org/licenses/by-nc-sa/2.0/
January 13, 2019

Motivation
- Until now, we've assumed that each action has only one possible outcome
- But often that's unrealistic; in many situations, an action may have more than one possible outcome
  - Action failures, e.g., the gripper drops its load (grasp(c) has an intended outcome and an unintended outcome)
  - Exogenous events, e.g., a road is closed
- We would like to be able to plan in such situations
- One approach: Markov Decision Processes

Stochastic Systems
- Stochastic system: a triple Σ = (S, A, P), where
  - S is a finite set of states
  - A is a finite set of actions
  - Pa(s′ | s) is the probability of going to state s′ if
we execute action a in state s; for every s and a, ∑s′∈S Pa(s′ | s) = 1
- Several different action representations are possible, e.g., Bayes networks or probabilistic operators
- The book does not commit to any particular representation; it deals only with the underlying semantics, an explicit enumeration of each Pa(s′ | s)

Example
- Robot r1 starts at location l1 (state s1 in the diagram)
- The objective is to get r1 to location l4 (state s4 in the diagram)
[Diagram: states s1–s5, with Start = s1 and Goal = s4; arcs for the move actions, including move(r1,l2,l1), and wait self-loops]
- No classical plan (sequence of actions) can be a solution, because we can't guarantee we'll be in a state where the next action is applicable:
  π = ⟨move(r1,l1,l2), move(r1,l2,l3), move(r1,l3,l4)⟩

Policies
- Policy: a function that maps states into actions; we write it as a set of state-action pairs
  π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}

Initial States
- For every state s, there will be a probability P(s) that the system starts in s
- The book assumes there is a unique state s0 such that the system always starts in s0
- In the example, s0 = s1: P(s1) = 1 and P(s) = 0 for all s ≠ s1
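The definition Σ = (S, A, P) can be made concrete by encoding the running example as a transition table. This is a sketch, not from the slides: the dictionary layout and helper name are mine, and all transitions other than the 0.8/0.2 branch for move(r1,l2,l3) at s2 (the split used in the history computations below) are assumed deterministic based on the diagram.

```python
# Sketch of the running example as a stochastic system Sigma = (S, A, P).
# P[a][s] maps each successor state s2 to Pa(s2 | s).
# Assumption: every transition is deterministic except move(r1,l2,l3),
# which succeeds with probability 0.8 and slips to s5 with probability 0.2.
P = {
    "move(r1,l1,l2)": {"s1": {"s2": 1.0}},
    "move(r1,l2,l3)": {"s2": {"s3": 0.8, "s5": 0.2}},  # the stochastic action
    "move(r1,l3,l4)": {"s3": {"s4": 1.0}},
    "move(r1,l5,l4)": {"s5": {"s4": 1.0}},
    "move(r1,l2,l1)": {"s2": {"s1": 1.0}},
    "wait": {s: {s: 1.0} for s in ("s1", "s2", "s3", "s4", "s5")},
}

def check_normalized(P, tol=1e-9):
    """Verify that Pa(. | s) is a probability distribution for every (a, s)."""
    for a, by_state in P.items():
        for s, dist in by_state.items():
            assert abs(sum(dist.values()) - 1.0) < tol, (a, s)

check_normalized(P)
```

A policy then becomes a simple state-to-action dictionary over the same state names, which is how the π1 and π2 of the following slides can be represented.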
Histories
- History: a sequence of system states h = ⟨s0, s1, s2, s3, s4, …⟩
- Examples in the diagram:
  h0 = ⟨s1, s3, s1, s3, s1, …⟩
  h1 = ⟨s1, s2, s3, s4, s4, …⟩
  h2 = ⟨s1, s2, s5, s5, s5, …⟩
  h3 = ⟨s1, s2, s5, s4, s4, …⟩
  h4 = ⟨s1, s4, s4, s4, s4, …⟩
  h5 = ⟨s1, s1, s4, s4, s4, …⟩
  h6 = ⟨s1, s1, s1, s4, s4, …⟩
  h7 = ⟨s1, s1, s1, s1, s1, …⟩
- Each policy induces a probability distribution over histories:
  if h = ⟨s0, s1, …⟩, then P(h | π) = P(s0) ∏i≥0 Pπ(si)(si+1 | si)
  (The book omits the P(s0) factor because it assumes a unique starting state.)

Example
π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}
- h1 = ⟨s1, s2, s3, s4, s4, …⟩: P(h1 | π1) = 1 × 0.8 × 1 × 1 × … = 0.8
- h2 = ⟨s1, s2, s5, s5, …⟩: P(h2 | π1) = 1 × 0.2 × 1 × … = 0.2
- P(h | π1) = 0 for all other h, so π1 reaches the goal with probability 0.8

Example
π2 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, move(r1,l5,l4))}
- h1 = ⟨s1, s2, s3, s4, s4, …⟩: P(h1 | π2) = 1 × 0.8 × 1 × 1 × … = 0.8
- h3 = ⟨s1, s2, s5, s4, s4, …⟩: P(h3 | π2) = 1 × 0.2 × 1 × 1 × … = 0.2
- P(h | π2) = 0 for all other h, so π2 reaches the goal with probability 1

Example
π3 = {(s1, move(r1,l1,l4)), (s2, move(r1,l2,l1)), (s3, move(r1,l3,l4)), (s4, wait), (s5, move(r1,l5,l4))}
π3
reaches the goal with probability 1.0
- h4 = ⟨s1, s4, s4, s4, …⟩: P(h4 |
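The history probabilities for π1 and π2 above can be checked numerically with the formula P(h | π) = P(s0) ∏i≥0 Pπ(si)(si+1 | si). A minimal sketch, assuming the same transition model as before (only the 0.8/0.2 branch at s2 and the policies themselves come from the slides; the function and variable names are mine):

```python
# Transition model for the example: assumed deterministic everywhere
# except move(r1,l2,l3) from s2, which goes to s3 (0.8) or s5 (0.2).
P = {
    ("s1", "move(r1,l1,l2)", "s2"): 1.0,
    ("s2", "move(r1,l2,l3)", "s3"): 0.8,
    ("s2", "move(r1,l2,l3)", "s5"): 0.2,
    ("s3", "move(r1,l3,l4)", "s4"): 1.0,
    ("s5", "move(r1,l5,l4)", "s4"): 1.0,
    ("s4", "wait", "s4"): 1.0,
    ("s5", "wait", "s5"): 1.0,
}

def history_prob(h, pi):
    """P(h | pi) = product over i of P_{pi(s_i)}(s_{i+1} | s_i), with P(s0) = 1."""
    p = 1.0
    for s, s_next in zip(h, h[1:]):
        p *= P.get((s, pi[s], s_next), 0.0)
    return p

pi1 = {"s1": "move(r1,l1,l2)", "s2": "move(r1,l2,l3)",
       "s3": "move(r1,l3,l4)", "s4": "wait", "s5": "wait"}
pi2 = dict(pi1, s5="move(r1,l5,l4)")

h1 = ["s1", "s2", "s3", "s4", "s4"]   # reaches the goal s4
h2 = ["s1", "s2", "s5", "s5", "s5"]   # stuck at s5 under pi1
h3 = ["s1", "s2", "s5", "s4", "s4"]   # recovers to s4 under pi2

print(history_prob(h1, pi1))  # 0.8
print(history_prob(h2, pi1))  # 0.2 -> pi1 reaches the goal with probability 0.8
print(history_prob(h1, pi2) + history_prob(h3, pi2))  # 1.0 -> pi2 reaches it with probability 1
```

Since every history here eventually loops in an absorbing state (wait at s4 or s5), a finite prefix of each history suffices for the product.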