USC CSCI 571 - aips_00 - D2035937

Home> Schools> University of Southern California> Computer Science (CSCI) > CSCI 571> aips_00

DOC PREVIEW

USC CSCI 571 - aips_00

School name University of Southern California

Course Csci 571- Web Technologies

Pages 42

This preview shows page 1-2-3-20-21-40-41-42 out of 42 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 42 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 42 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 42 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 42 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 42 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 42 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 42 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 42 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 42 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Representations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology1Sven Koenig and Yaxin LiuCollege of ComputingRepresentations ofGeorgia Institute of TechnologyDecision-Theoretic Planning TasksRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology2Artificial Intelligence Planning- Representations- AlgorithmsRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology3Artificial Intelligence Planning- Representations- Algorithms- Planning ObjectiveRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology4Probabilistic Planning with Goal-Directed MDPs (1)startgoallow medium highgoal0.60.10.10.2robot is tipped over0.10.70.10.1southeastnorth, eastsouth, westall actions take the same time to execute(same with staircases for indoor navigation)elevation elevation elevation......west eastRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology5Probabilistic Planning with Goal-Directed MDPs (2)E( Σγtreward of action executed at time t + γn reward of goal in which execution stops )t=0n-1n = number of action executions before execution stopsγ = discount factor (0 < g ≤ 1)find a planthat maximizes the expected total (discounted) rewardRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology6action-penalty representationgoal-reward representationaction rewards = -1goal rewards = 0action rewards = 0goal rewards = 1Assumption:resourceconsumptionin planningreinforcementin reinforcementlearningRepresentation ChoicesE( Σγtreward of action executed at time t + γn reward of goal in which execution stops )γ0 1 + γ1 1 + γ2 1 + γ3 1 γ4 0γ0 0 + γ1 0 + γ2 0 + γ3 0 γ4 1++action-penalty representationgoal-reward representationt=0n-1all actions take the same time to executeRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology7discounting no discounting0 < γ < 1 γ = 1guarantees the total reward to be finiteresults in mathematical conveniencebut is an unrealistic mathematical trickRepresentation ChoicesRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology8discounting no discountingaction-penalty representationgoal-reward representationRepresentation Choices0 < γ < 1 γ = 1action rewards = -1goal rewards = 0action rewards = 0goal rewards = 1Representations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology9discounting no discountingaction-penalty representationgoal-reward representationRepresentation Choices0 < γ < 1 γ = 1action rewards = -1goal rewards = 0action rewards = 0goal rewards = 1best plan=best planRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology10Example: Terrain for Outdoor NavigationRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology11ExamplestartgoalRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology12Example: Action-Penalty Representation with no Discountingaction-penalty representationgoal-reward representationdiscounting no discountingherestartgoalPlan(= Policy)Representations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology13Example: Goal-Reward Representation with Discount Factor 0.9action-penalty representationgoal-reward representationdiscounting no discountingγ = 0.9startgoalRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology14Trapping PhenomenonFor robot-navigation tasks, we often want the robot to find a plan thatamong all plans that achieve the goal for sure.maximizes the expected total rewardRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology15Trapping Phenomenon - TheoremsTheorem: Every plan that maximizes the expected total undiscounted reward forthe action-penalty representation achieves the goal for sureprovided that the goal can be achieved for sure from the start state.Theorem: A plan that maximizes the expected total discounted reward forthe goal-reward representation does not necessarily achieve the goal for sureeven if the goal can be achieved for sure from the start state.discounting no discountingaction-penalty representationgoal-reward representation0 < γ < 1 γ = 1action rewards = -1goal rewards = 0action rewards = 0goal rewards = 1optimal planachieves the goaloptimal planis not guaranteed()to achieve the goaloptimal planis not guaranteedto achieve the goalRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology16start goal11 action executionsPlan 1 Plan 2startgoal0.90.11 action execution1 action execution1.01.0Trapping Phenomenon - ExamplesAction-Penalty Representation with no Discounting1.0 x (-11) = -11.00000.9 x (-1) + 0.1 x (-∞) = -∞Goal-Reward Representation with Discount Factor 0.91.0 x 0.911 = 0.31380.9 x 0.91 + 0.1 x 0.9−∞ = 0.8100achieves the goal for sure is not guaranteed to achieve the goal for surethe robot is tipped overRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology17Trapping Phenomenon - ExplanationTheorem: Every plan that maximizes the expected total discounted reward forthe goal-reward representation and discount factor γ (0 < γ < 1)also maximizes the expected total undiscounted utility forthe action-penalty representationand the convex exponential utility function u(r) = γ-r, and vice versa.discounting no discountingaction-penalty representationgoal-reward representation0 < γ < 1 γ = 1action rewards = -1goal rewards = 0action rewards = 0goal rewards = 1best plan=best planif this robot is risk-seekingand thus de-emphasizes bad outcomesRepresentations of Decision-Theoretic Planning Tasks; Sven Koenig and Yaxin Liu; Georgia Institute of Technology18Trapping Phenomenon - ExplanationTheorem: Every plan that maximizes the expected total discounted reward forthe goal-reward representation and discount factor γ (0 < γ < 1)also maximizes the expected total undiscounted

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-20-21-40-41-42 out of 42 pages.

USC CSCI 571 - aips_00

Sign up for free to view:

Please select your school