UMD CMSC 421 - Decision Making Under Uncertainty

Decision Making Under Uncertainty
Russell and Norvig: ch. 16, 17
CMSC 421 – Fall 2003
Material from Jean-Claude Latombe and Daphne Koller

Utility-Based Agent
[Diagram: the agent observes its environment through sensors and acts on it through actuators.]

Non-deterministic vs. Probabilistic Uncertainty
• Non-deterministic model: an action has a set of possible outcomes {a, b, c} → choose the decision that is best for the worst case (~ adversarial search).
• Probabilistic model: the outcomes have probabilities {a(pa), b(pb), c(pc)} → choose the decision that maximizes the expected utility value.

Expected Utility
• Random variable X with n values x1, …, xn and distribution (p1, …, pn). E.g., X is the state reached after doing an action A under uncertainty.
• Function U of X. E.g., U is the utility of a state.
• The expected utility of A is EU[A] = Σi=1,…,n P(xi | A) U(xi).

One State/One Action Example
• From s0, action A1 leads to s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70.
• U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62

One State/Two Actions Example
• A second action A2 from s0 leads to s2 (utility 50) with probability 0.2 and to s4 (utility 80) with probability 0.8.
• U1(S0) = 62
• U2(S0) = 50 x 0.2 + 80 x 0.8 = 74
• U(S0) = max{U1(S0), U2(S0)} = 74

MEU Principle
• A rational agent should choose the action that maximizes the agent's expected utility.
• This is the basis of the field of decision theory.
• It is the normative criterion for rational choice of action.

Not quite…
• We must have a complete model of actions, utilities, and states.
• Even with a complete model, the computation will be intractable.
• In fact, a truly rational agent also takes into account the utility of reasoning itself (bounded rationality).
• Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before.

We'll Look At
Decision theoretic planning:
• Simple decision making (ch. 16)
• Sequential decision making (ch. 17)

Decision Networks
• Extend BNs to handle actions and utilities.
• Also called influence diagrams.
• Make use of BN inference.
• Can do Value of Information calculations.

Decision Networks cont.
• Chance nodes: random variables, as in BNs.
• Decision nodes: actions that the decision maker can take.
• Utility/value nodes: the utility of the outcome state.
• R&N example follows.

Prenatal Testing Example
[Diagram only; no recoverable text in the preview.]

Umbrella Network
• Chance nodes: weather, forecast, lug umbrella. Decision node: take/don't take. Utility node: happiness.
• P(rain) = 0.4
• Forecast model P(f | w):
    f       w         P(f | w)
    sunny   rain      0.3
    rainy   rain      0.7
    sunny   no rain   0.8
    rainy   no rain   0.2
• P(lug | take) = 1.0, P(~lug | ~take) = 1.0
• U(lug, rain) = -25, U(lug, ~rain) = 0, U(~lug, rain) = -100, U(~lug, ~rain) = 100

Evaluating Decision Networks
• Set the evidence variables for the current state.
• For each possible value of the decision node:
  - Set the decision node to that value.
  - Calculate the posterior probability of the parent nodes of the utility node, using BN inference.
  - Calculate the resulting utility for the action.
• Return the action with the highest utility.
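As a concrete illustration of this procedure, here is a minimal Python sketch that evaluates the umbrella network above by enumeration: it tries each value of the decision node, performs the BN-inference step with Bayes' rule, and returns the action with the highest expected utility. The function and variable names (posterior_rain, best_action) are illustrative, not part of the lecture.

    # Umbrella network from the slides; names are illustrative.
    P_RAIN = 0.4
    P_FORECAST_GIVEN_WEATHER = {                 # P(f | w) from the CPT above
        ("sunny", "rain"): 0.3, ("rainy", "rain"): 0.7,
        ("sunny", "no rain"): 0.8, ("rainy", "no rain"): 0.2,
    }
    UTILITY = {                                  # U(lug umbrella?, weather)
        (True, "rain"): -25, (True, "no rain"): 0,
        (False, "rain"): -100, (False, "no rain"): 100,
    }

    def posterior_rain(forecast=None):
        """P(weather = rain | forecast); just the prior if there is no forecast evidence."""
        if forecast is None:
            return P_RAIN
        joint_rain = P_FORECAST_GIVEN_WEATHER[(forecast, "rain")] * P_RAIN
        joint_dry = P_FORECAST_GIVEN_WEATHER[(forecast, "no rain")] * (1 - P_RAIN)
        return joint_rain / (joint_rain + joint_dry)

    def best_action(forecast=None):
        """Try each value of the decision node; return (action, expected utility)."""
        p_rain = posterior_rain(forecast)
        expected = {}
        for take in (True, False):               # P(lug | take) = P(~lug | ~take) = 1
            expected[take] = (p_rain * UTILITY[(take, "rain")]
                              + (1 - p_rain) * UTILITY[(take, "no rain")])
        action = max(expected, key=expected.get)
        return action, expected[action]

    print(best_action())          # (False, 20.0): with no forecast, leave the umbrella
    print(best_action("rainy"))   # (True, about -17.5): a rainy forecast flips the decision

Conditioning on new evidence (here, the forecast) can change both the best action and its expected utility; that difference is exactly what the Value of Information below measures.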
Value of Information (VOI)
• Suppose the agent's current knowledge is E. The value of the current best action α is
  EU(α | E) = maxA Σi U(Resulti(A)) P(Resulti(A) | E, Do(A))
• The value of the new best action α′ (after new evidence E′ is obtained) is
  EU(α′ | E, E′) = maxA Σi U(Resulti(A)) P(Resulti(A) | E, E′, Do(A))
• The value of information for E′ is
  VOI(E′) = Σk P(ek | E) EU(αek | ek, E) – EU(α | E)
  where αek is the best action once the evidence value ek is known.

Sequential Decision Making
• Finite horizon
• Infinite horizon

Simple Robot Navigation Problem
• In each state, the possible actions are U, D, R, and L.

Probabilistic Transition Model
• The effect of U is as follows (transition model):
  - With probability 0.8 the robot moves up one square (if the robot is already in the top row, it does not move).
  - With probability 0.1 the robot moves right one square (if the robot is already in the rightmost column, it does not move).
  - With probability 0.1 the robot moves left one square (if the robot is already in the leftmost column, it does not move).

Markov Property
• The transition properties depend only on the current state, not on the previous history (how that state was reached).

Sequence of Actions
• Planned sequence of actions: (U, R), starting in state [3,2].
• After U is executed, the robot can be in [3,2], [3,3], or [4,2].

Histories
• After U and then R have been executed, there are 9 possible sequences of states – called histories – and 6 possible final states for the robot: [3,1], [3,2], [3,3], [4,1], [4,2], [4,3].

Probability of Reaching the Goal
• P([4,3] | (U,R).[3,2]) = P([4,3] | R.[3,3]) x P([3,3] | U.[3,2]) + P([4,3] | R.[4,2]) x P([4,2] | U.[3,2])
• P([3,3] | U.[3,2]) = 0.8 and P([4,2] | U.[3,2]) = 0.1
• P([4,3] | R.[3,3]) = 0.8 and P([4,3] | R.[4,2]) = 0.1
• Therefore P([4,3] | (U,R).[3,2]) = 0.8 x 0.8 + 0.1 x 0.1 = 0.65
• Note the importance of the Markov property in this derivation.

Utility Function
• [4,3] provides a power supply (+1).
• [4,2] is a sand area from which the robot cannot escape (-1).
• The robot needs to recharge its batteries.
• [4,3] and [4,2] are terminal states.

Utility of a History
• The utility of a history is defined by the utility of the last state (+1 or -1) minus n/25, where n is the number of moves.
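To close the loop on the robot example, here is a small Python check of the goal-probability derivation and the history-utility definition above. It uses only the numbers quoted on the slides; the variable names are illustrative.

    # Transition probabilities quoted on the slides.
    p_33_after_U = 0.8   # P([3,3] | U.[3,2])
    p_42_after_U = 0.1   # P([4,2] | U.[3,2])
    p_43_from_33 = 0.8   # P([4,3] | R.[3,3])
    p_43_from_42 = 0.1   # P([4,3] | R.[4,2])

    # Markov property: sum over the intermediate state; how it was reached is irrelevant.
    p_goal = p_43_from_33 * p_33_after_U + p_43_from_42 * p_42_after_U
    print(p_goal)        # 0.65 (up to floating-point rounding)

    # Utility of a history that ends at the power supply [4,3] after n = 2 moves.
    n_moves = 2
    print(1 - n_moves / 25)   # 0.92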

