UNCC ITCS 3153 - Making Complex Decisions

Contents:
• Robot Example
• Similar to 15-puzzle problem
• How about other search techniques
• Markov decision processes (MDP)
• Building a policy
• Slide 7
• Using a policy
• Example solutions
• Striking a balance
• Attributes of optimality
• Time horizon
• Evaluating state sequences
• Evaluating infinite horizons
• Slide 15
• Evaluating a policy
• Building an optimal policy
• Utility of states
• Example
• Restating the policy
• Putting pieces together
• What a deal
• Example of Bellman Equation
• Using Bellman Equations to solve MDPs
• Iterative solution of Bellman equations
• Bellman Update
• Convergence of value iteration
• Slide 28
• Policy Iteration
• Policy iteration
• Slide 31
• Slide 32
• Slide 33

ITCS 3153 Artificial Intelligence
Lecture 19: Making Complex Decisions (Chapter 17)

Robot Example
A sequential decision problem: imagine a robot with only local sensing.
• Traveling from A to B
• Actions have uncertain results – the robot might move at a right angle to the desired direction
• We want the robot to "learn" how to navigate in this room

Similar to 15-puzzle problem
How is this similar to, and different from, the 15-puzzle?
• Let the robot's position be the blank tile
• Keep issuing movement commands
• Eventually a sequence of commands will cause the robot to reach the goal
The difference: our model of the world is incomplete.

How about other search techniques
Genetic algorithms:
• Let each "gene" be a sequence of L, R, U, D moves
  – Length unknown
  – Poor feedback
Simulated annealing?

Markov decision processes (MDP)
Initial state
• S0
Transition model
• T(s, a, s')
  – How does Markov apply here? The probability of reaching s' depends only on the current state s and the action a, not on the history of earlier states.
  – Uncertainty is possible: an action can have several possible outcomes.
Reward function
• R(s)
  – Defined for each state
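To make the three pieces concrete, here is a minimal Python sketch of the robot's grid world as an MDP: an initial state S0, a transition model T(s, a, s'), and a per-state reward R(s). The class name, grid layout, reward values, and the 0.8/0.1/0.1 split between the intended move and its two right-angle slips are illustrative assumptions; the preview gives only the structure, not the numbers.

```python
# Hypothetical grid-world MDP for the robot example.
# Rewards and slip probabilities below are assumed, not from the slides.

ACTIONS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

# Each action's two right-angle neighbors (the directions a slip can take).
PERPENDICULAR = {"U": ("L", "R"), "D": ("L", "R"),
                 "L": ("U", "D"), "R": ("U", "D")}

class GridMDP:
    def __init__(self, width, height, goal, s0=(0, 0), slip=0.1):
        self.states = {(x, y) for x in range(width) for y in range(height)}
        self.s0 = s0            # initial state S0
        self.goal = goal
        self.slip = slip        # probability of each right-angle slip

    def R(self, s):
        """Reward function R(s), defined for each state (values assumed)."""
        return 1.0 if s == self.goal else -0.04

    def _move(self, s, a):
        """Deterministic effect of action a; bumping a wall leaves s unchanged."""
        dx, dy = ACTIONS[a]
        nxt = (s[0] + dx, s[1] + dy)
        return nxt if nxt in self.states else s

    def T(self, s, a):
        """Transition model T(s, a, s') as (probability, successor) pairs.
        The distribution depends only on s and a -- the Markov property."""
        left, right = PERPENDICULAR[a]
        return [(1 - 2 * self.slip, self._move(s, a)),
                (self.slip, self._move(s, left)),
                (self.slip, self._move(s, right))]
```

Representing T(s, a) as a short probability-weighted list of successors is what lets the Bellman machinery later in the deck compute expected utilities by plain summation.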
Building a policy
How might we acquire and store a solution?
• Is this a search problem?
  – Isn't everything?
• Avoid local minima
• Avoid dead ends
• Avoid needless repetition
Key observation: if the number of states is small, consider evaluating states rather than evaluating action sequences.

Building a policy
Specify a solution for any initial state:
• Construct a policy that outputs the best action for any state
  – policy = π
  – policy in state s = π(s)
• A complete policy covers all potential input states
• The optimal policy, π*, yields the highest expected utility
  – Why expected? Transitions are stochastic.

Using a policy
An agent in state s:
• s is the percept available to the agent
• π*(s) outputs an action that maximizes expected utility
The policy is a description of a simple reflex agent.
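Building on the GridMDP sketch above, a policy can be stored as a plain table from states to actions, and "using" it is exactly the reflex loop just described. The helper names (expected_utility, greedy_policy, run) are hypothetical, and U, a table of state utilities, is assumed to be given; the Bellman-equation slides later in the deck show how to compute it.

```python
import random

def expected_utility(mdp, U, s, a):
    """Expected utility of taking action a in state s, given state utilities U.
    The expectation is what 'highest expected utility' refers to: each
    stochastic outcome is weighted by its probability."""
    return sum(p * U[s2] for p, s2 in mdp.T(s, a))

def greedy_policy(mdp, U):
    """A complete policy: the utility-maximizing action for every state."""
    return {s: max(ACTIONS, key=lambda a: expected_utility(mdp, U, s, a))
            for s in mdp.states}

def run(mdp, policy, max_steps=100):
    """Simple reflex loop: the percept is the state s, the response is policy[s]."""
    s = mdp.s0
    while s != mdp.goal and max_steps > 0:
        outcomes = mdp.T(s, policy[s])                  # stochastic transition
        s = random.choices([s2 for _, s2 in outcomes],
                           weights=[p for p, _ in outcomes])[0]
        max_steps -= 1
    return s
```

With utilities produced by value iteration plugged in for U, greedy_policy returns π*; note that the agent never searches at run time, it just looks up π*(s) for whatever state it perceives.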

