UMD CMSC 722 - Chapter 16: Planning Based on Markov Decision Processes


Contents: Motivation · Stochastic Systems · Example · Policies · Initial States · Histories · Utility · Discounted Utility · Planning as Optimization · Bellman's Theorem · Policy Iteration · Value Iteration · Discussion · Probabilistic Set-Theoretic Planning · Real-Time Value Iteration · Real-Time Dynamic Programming · POMDPs · More about Sensing Actions · Belief States · Policies on Belief States · Planning Algorithms · Reachability and Extended Goals

Chapter 16: Planning Based on Markov Decision Processes
Dana S. Nau, University of Maryland
Lecture slides for Automated Planning: Theory and Practice
Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License: http://creativecommons.org/licenses/by-nc-sa/2.0/

Motivation

Until now, we have assumed that each action has only one possible outcome, but that is often unrealistic. In many situations, actions may have more than one possible outcome:
- action failures, e.g., a gripper drops its load
- exogenous events, e.g., a road is closed
We would like to be able to plan in such situations. One approach: Markov Decision Processes.

[Figure: grasp(c) applied to a stack of blocks a, b, c, showing the intended outcome and an unintended outcome.]

Stochastic Systems

A stochastic system is a triple Σ = (S, A, P), where
- S is a finite set of states,
- A is a finite set of actions,
- P_a(s′ | s) is the probability of going to state s′ if we execute action a in state s, with ∑_{s′ ∈ S} P_a(s′ | s) = 1.

There are several possible action representations, e.g., Bayes networks or probabilistic operators. The book does not commit to any particular representation; it deals only with the underlying semantics, an explicit enumeration of each P_a(s′ | s).

Example

Robot r1 starts at location l1 (state s1 in the diagram), and the objective is to get r1 to location l4 (state s4 in the diagram).

[Figure: a state-transition diagram with Start = s1 and Goal = s4; the actions include move(r1,l1,l2), move(r1,l2,l1), move(r1,l2,l3), move(r1,l3,l4), move(r1,l5,l4), and wait. From s2, move(r1,l2,l3) reaches s3 with probability 0.8 and s5 with probability 0.2.]

No classical plan (a fixed sequence of actions) can be a solution, because we can't guarantee we'll be in a state where the next action is applicable. For example:

π = ⟨move(r1,l1,l2), move(r1,l2,l3), move(r1,l3,l4)⟩

Policies

A policy is a function that maps states into actions. We write it as a set of state-action pairs:

π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}
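To make the definitions concrete, here is a minimal Python sketch of this stochastic system and of the policy π1. It is an illustrative reconstruction, not code from the book: the 0.8/0.2 outcome of move(r1,l2,l3) is taken from the slides, while the remaining transitions are assumed to be deterministic.

```python
# Sketch of the stochastic system Sigma = (S, A, P) from the example.
# P_a(s' | s) is stored as P[(s, a)] = {s': probability}. The 0.8/0.2
# split for move(r1,l2,l3) comes from the slides; all other transitions
# are assumed deterministic for illustration.

S = {"s1", "s2", "s3", "s4", "s5"}

P = {
    ("s1", "move(r1,l1,l2)"): {"s2": 1.0},
    ("s2", "move(r1,l2,l3)"): {"s3": 0.8, "s5": 0.2},  # stochastic outcome
    ("s2", "move(r1,l2,l1)"): {"s1": 1.0},
    ("s3", "move(r1,l3,l4)"): {"s4": 1.0},
    ("s5", "move(r1,l5,l4)"): {"s4": 1.0},
    ("s1", "wait"): {"s1": 1.0},
    ("s4", "wait"): {"s4": 1.0},
    ("s5", "wait"): {"s5": 1.0},
}

# Sanity check: for each (s, a), the outcome probabilities must sum to 1,
# mirroring the condition sum over s' of P_a(s' | s) = 1.
for (s, a), outcomes in P.items():
    assert abs(sum(outcomes.values()) - 1.0) < 1e-9

# Policy pi1 from the slides, as a state -> action mapping.
pi1 = {
    "s1": "move(r1,l1,l2)",
    "s2": "move(r1,l2,l3)",
    "s3": "move(r1,l3,l4)",
    "s4": "wait",
    "s5": "wait",
}
```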
Initial States

For every state s, there will be a probability P(s) that the system starts in s. The book assumes there is a unique state s0 such that the system always starts in s0. In the example, s0 = s1; that is, P(s1) = 1 and P(s) = 0 for all s ≠ s1.

Histories

A history is a sequence of system states, h = ⟨s0, s1, s2, s3, s4, …⟩. In the example, the possible histories include:

h0 = ⟨s1, s3, s1, s3, s1, …⟩
h1 = ⟨s1, s2, s3, s4, s4, …⟩
h2 = ⟨s1, s2, s5, s5, s5, …⟩
h3 = ⟨s1, s2, s5, s4, s4, …⟩
h4 = ⟨s1, s4, s4, s4, s4, …⟩
h5 = ⟨s1, s1, s4, s4, s4, …⟩
h6 = ⟨s1, s1, s1, s4, s4, …⟩
h7 = ⟨s1, s1, s1, s1, s1, …⟩

Each policy π induces a probability distribution over histories: if h = ⟨s0, s1, …⟩, then

P(h | π) = P(s0) ∏_{i ≥ 0} P_{π(si)}(si+1 | si)

(The book omits the P(s0) factor, because it assumes a unique starting state.)

Example

Under π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}:

h1 = ⟨s1, s2, s3, s4, s4, …⟩   P(h1 | π1) = 1 × 1 × 0.8 × 1 × … = 0.8
h2 = ⟨s1, s2, s5, s5, …⟩       P(h2 | π1) = 1 × 1 × 0.2 × 1 × … = 0.2
P(h | π1) = 0 for all other h

So π1 reaches the goal (s4) with probability 0.8.

Example (continued)

Under π2 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, move(r1,l5,l4))}:

h1 = ⟨s1, s2, s3, s4, s4, …⟩   P(h1 | π2) = 1 × 1 × 0.8 × 1 × … = 0.8
h3 = ⟨s1, s2, s5, s4, s4, …⟩   P(h3 | π2) = 1 × 1 × 0.2 × 1 × … = 0.2
P(h | π2) = 0 for all other h

So π2 reaches the goal with probability 1.

The policy π3 = {(s1, move(r1,l1,l4)), (s2, move(r1,l2,l1)), (s3, move(r1,l3,l4)), (s4, wait), (s5, move(r1,l5,l4))} also reaches the goal with probability 1.0. For example, h4 = ⟨s1, s4, s4, s4, …⟩ with P(h4 | π3) = …
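The history probabilities above follow directly from the product formula. Here is a minimal sketch of that computation, reusing the P and pi1 structures from the earlier block; the function name, the initial distribution P0, and the truncation of each history at its absorbing state are illustrative assumptions.

```python
def history_probability(h, pi, P, P0):
    """P(h | pi) = P(h[0]) * product over i of P_{pi(h[i])}(h[i+1] | h[i]).

    h is a finite prefix of a history (truncated once it becomes constant);
    P0 maps states to initial probabilities.
    """
    prob = P0.get(h[0], 0.0)
    for s, s_next in zip(h, h[1:]):
        a = pi[s]                                   # action the policy picks in s
        prob *= P.get((s, a), {}).get(s_next, 0.0)  # P_a(s_next | s)
    return prob

# Unique starting state, per the slides: P(s1) = 1, P(s) = 0 otherwise.
P0 = {"s1": 1.0}

# h1 = <s1, s2, s3, s4, ...> and h2 = <s1, s2, s5, s5, ...>:
print(history_probability(["s1", "s2", "s3", "s4"], pi1, P, P0))  # 0.8
print(history_probability(["s1", "s2", "s5", "s5"], pi1, P, P0))  # 0.2
```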

