Berkeley COMPSCI 188 - Decision Diagrams (6PP)

CS 188: Artificial Intelligence, Fall 2011
Lecture 17: Decision Diagrams
10/27/2011
Dan Klein – UC Berkeley

Decision Networks
- MEU: choose the action which maximizes the expected utility given the evidence.
- Can directly operationalize this with decision networks:
  - Bayes nets with nodes for utility and actions
  - Lets us calculate the expected utility for each action
- New node types:
  - Chance nodes (just like BNs)
  - Action nodes (rectangles, cannot have parents, act as observed evidence)
  - Utility node (diamond, depends on action and chance nodes)
- Network: Weather and Forecast are chance nodes, Umbrella is the action, U is the utility. [DEMO: Ghostbusters]

Decision Networks: Action Selection
- Instantiate all evidence.
- Set the action node(s) each possible way.
- Calculate the posterior over all parents of the utility node, given the evidence.
- Calculate the expected utility for each action.
- Choose the maximizing action.

Example: Decision Networks
- Prior P(W): sun 0.7, rain 0.3
- Utilities U(A, W): leave/sun 100, leave/rain 0, take/sun 20, take/rain 70
- Umbrella = leave: EU = 0.7·100 + 0.3·0 = 70
- Umbrella = take: EU = 0.7·20 + 0.3·70 = 35
- Optimal decision = leave

Decisions as Outcome Trees
- Almost exactly like expectimax / MDPs. What's changed?
- (Figure: outcome tree rooted at {}, branching on the action and then on Weather, with leaves U(take, sun), U(take, rain), U(leave, sun), U(leave, rain).)

Example: Decision Networks (Forecast = bad)
- Posterior P(W | F=bad): sun 0.34, rain 0.66
- Utilities U(A, W): leave/sun 100, leave/rain 0, take/sun 20, take/rain 70
- Umbrella = leave: EU = 0.34·100 + 0.66·0 = 34
- Umbrella = take: EU = 0.34·20 + 0.66·70 = 53
- Optimal decision = take

Decisions as Outcome Trees (with evidence)
- (Figure: the same outcome tree, now rooted at {b}, with the Weather distributions conditioned on the bad forecast.)

Value of Information
- Idea: compute the value of acquiring evidence; can be done directly from the decision network.
- Example: buying oil drilling rights
  - Two blocks A and B, exactly one has oil, worth k.
  - You can drill in one location.
  - Prior probabilities 0.5 each, mutually exclusive.
  - Drilling in either A or B has EU = k/2, so MEU = k/2.
- Question: what's the value of information of OilLoc?
  - The value of knowing which of A or B has oil.
  - The value is the expected gain in MEU from the new information.
  - The survey may say "oil in A" or "oil in B", with probability 0.5 each.
  - If we know OilLoc, MEU is k (either way).
  - Gain in MEU from knowing OilLoc: VPI(OilLoc) = k − k/2 = k/2.
  - Fair price of the information: k/2.
- Network: OilLoc (chance) and DrillLoc (action) both feed U.
  - U(D, O): drill a / oil a → k; drill a / oil b → 0; drill b / oil a → 0; drill b / oil b → k
  - P(O): a 1/2, b 1/2

VPI Example: Weather
- Network: Weather, Forecast, Umbrella, U.
- Utilities U(A, W): leave/sun 100, leave/rain 0, take/sun 20, take/rain 70
- Forecast distribution P(F): good 0.59, bad 0.41
- Compare: MEU with no evidence, MEU if the forecast is bad, MEU if the forecast is good (worked numerically in the sketch after the VPI questions below).

Value of Information
- Assume we have evidence E = e. Value if we act now: MEU(e) = max_a Σ_s P(s | e) U(s, a)
- Assume we see that E' = e'. Value if we act then: MEU(e, e') = max_a Σ_s P(s | e, e') U(s, a)
- BUT E' is a random variable whose value is unknown, so we don't know what e' will be.
- Expected value if E' is revealed and then we act: MEU(e, E') = Σ_{e'} P(e' | e) MEU(e, e')
- Value of information: how much MEU goes up by revealing E' first and then acting, over acting now: VPI(E' | e) = MEU(e, E') − MEU(e)

VPI Properties
- Nonnegative
- Nonadditive (consider, e.g., obtaining E_j twice)
- Order-independent

Quick VPI Questions
- The soup of the day is either clam chowder or split pea, but you wouldn't order either one. What's the value of knowing which it is?
- There are two kinds of plastic forks at a picnic. One kind is slightly sturdier. What's the value of knowing which?
- You're playing the lottery. The prize will be $0 or $100. You can play any number between 1 and 100 (chance of winning is 1%). What is the value of knowing the winning number?
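To make the weather VPI example concrete, here is a minimal Python sketch (not part of the original slides) that performs decision-network action selection and computes VPI(Forecast). The prior P(W), the forecast marginal P(F), the posterior P(W | F=bad), and the utility table are the numbers from the slides above; P(W | F=good) is not shown in this preview, so the sketch recovers it from the law of total probability, and all function and variable names are illustrative.

```python
# Minimal sketch of decision-network action selection and VPI for the
# weather/umbrella example (names are illustrative, not course code).

# Numbers from the slides:
P_W = {'sun': 0.7, 'rain': 0.3}                    # prior P(W)
P_F = {'good': 0.59, 'bad': 0.41}                  # forecast marginal P(F)
P_W_given_bad = {'sun': 0.34, 'rain': 0.66}        # posterior P(W | F=bad)
U = {('leave', 'sun'): 100, ('leave', 'rain'): 0,  # utility U(A, W)
     ('take', 'sun'): 20, ('take', 'rain'): 70}
ACTIONS = ('leave', 'take')

def expected_utility(action, p_w):
    """EU(a | e) = sum_w P(w | e) * U(a, w)."""
    return sum(p_w[w] * U[(action, w)] for w in p_w)

def meu(p_w):
    """MEU(e) = max_a EU(a | e); returns (best action, value)."""
    return max(((a, expected_utility(a, p_w)) for a in ACTIONS),
               key=lambda pair: pair[1])

# P(W | F=good) is not in the preview; recover it from total probability
# (assumption: it is consistent with the numbers above):
#   P(w) = P(w | good) P(good) + P(w | bad) P(bad)
P_W_given_good = {w: (P_W[w] - P_W_given_bad[w] * P_F['bad']) / P_F['good']
                  for w in P_W}

meu_now = meu(P_W)              # ('leave', 70.0)
meu_bad = meu(P_W_given_bad)    # ('take', 53.0)
meu_good = meu(P_W_given_good)  # ('leave', ~95.0)

# Expected MEU if the forecast is revealed before acting, then VPI(Forecast)
meu_with_forecast = P_F['good'] * meu_good[1] + P_F['bad'] * meu_bad[1]
vpi_forecast = meu_with_forecast - meu_now[1]

print(meu_now, meu_bad, meu_good)
print('VPI(Forecast) =', round(vpi_forecast, 2))   # about 7.8
```

Running this gives MEU with no evidence = 70 (leave), MEU given a bad forecast = 53 (take), MEU given a good forecast ≈ 95 (leave), and VPI(Forecast) ≈ 7.8: the expected gain in utility from seeing the forecast before deciding.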
POMDPs
- MDPs have: states S, actions A, a transition function P(s' | s, a) (or T(s, a, s')), and rewards R(s, a, s').
- POMDPs add: observations O and an observation function P(o | s) (or O(s, o)).
- POMDPs are MDPs over belief states b (distributions over S).
- We'll be able to say more in a few lectures.
- (Figure: an MDP search tree over states s, actions a, and successors s', next to a POMDP search tree over beliefs b, actions a, observations o, and updated beliefs b'.)

Example: Ghostbusters
- In (static) Ghostbusters:
  - The belief state is determined by the evidence to date {e}.
  - The tree is really over evidence sets.
  - Probabilistic reasoning is needed to predict new evidence given past evidence.

Solving POMDPs
- One way: use truncated expectimax to compute approximate values of actions.
- What if you only considered busting, or one sensing action followed by a bust?
- You get a VPI-based agent!
- (Figure: truncated expectimax tree from {e}: either bust now, with value U(a_bust, {e}), or sense e' and then bust, with value U(a_bust, {e, e'}).)

More Generally
- General solutions map belief functions to actions.
  - Can divide regions of belief space (the set of belief functions) into policy regions (gets complex quickly).
  - Can build approximate policies using discretization methods.
  - Can factor belief functions in various ways.
- Overall, POMDPs are very hard (in fact, PSPACE-hard).
- Most real problems are POMDPs, but we can rarely solve them in general!

Reasoning over Time
- Often, we want to reason about a sequence of observations:
  - Speech recognition
  - Robot localization
  - User attention
  - Medical monitoring
- Need to introduce time into our models.
- Basic approach: hidden Markov models (HMMs).
- More general: dynamic Bayes' nets.

Markov Models
- A Markov model is a chain-structured BN: X1 → X2 → X3 → X4 → …
  - Each node is identically distributed (stationarity).
  - The value of X at a given time is called the state.
  - Parameters: the transition probabilities (or dynamics) specify how the state evolves over time (also the initial probabilities).
[DEMO: Ghostbusters]

Conditional Independence
- Basic conditional independence:
  - Past and future are independent given the present.
  - Each time step only depends on the previous one.
  - This is called the (first-order) Markov property.
- Note that the chain is just a (growing) BN.
  - We can always use generic BN reasoning on it if we truncate the chain at a fixed length.

Example: Markov Chain
- Weather:
  - States: X = {rain, sun}
  - Transitions: P(sun | sun) = 0.9, P(rain | sun) = 0.1, P(rain | rain) = 0.9, P(sun | rain) = 0.1 (this is a CPT, not a BN!)
  - Initial distribution: 1.0 sun
- What's the probability distribution after one step?

Mini-Forward Algorithm
- Question: what is the probability of being in state x at time t?
- Slow answer:
  - Enumerate all sequences of length t which end in x.
  - Add up their probabilities.
- Better way: cached incremental belief updates, an instance of variable elimination (see the sketch below):
  P(x_t) = Σ_{x_{t-1}} P(x_t | x_{t-1}) P(x_{t-1})
- (Figure: forward simulation over the sun/rain chain.)

Example
- From an initial observation of sun: P(X1), P(X2), P(X3), …, P(X∞)
- From an initial observation of rain: P(X1), P(X2), P(X3), …, P(X∞)
- (Figure: the belief distributions at successive time steps for each starting observation.)

Stationary Distributions
- If we simulate the chain long enough, what happens?
- Uncertainty accumulates; eventually, we have no idea what the state is.
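To illustrate the mini-forward update numerically, here is a minimal Python sketch (not from the slides) that runs the cached incremental update on the rain/sun chain above. The transition values (stay with probability 0.9, switch with 0.1) and the initial distribution (1.0 sun) are the example's numbers; the function and variable names are illustrative.

```python
# Minimal sketch of the mini-forward algorithm on the rain/sun weather chain
# (names are illustrative, not course code).

STATES = ('sun', 'rain')

# Transition model P(X_t | X_{t-1}) from the example: stay 0.9, switch 0.1
P_TRANS = {
    'sun':  {'sun': 0.9, 'rain': 0.1},
    'rain': {'sun': 0.1, 'rain': 0.9},
}

def forward_step(belief):
    """One cached update: P(x_t) = sum over x_{t-1} of P(x_t | x_{t-1}) P(x_{t-1})."""
    return {x: sum(P_TRANS[prev][x] * belief[prev] for prev in STATES)
            for x in STATES}

# Initial distribution: 1.0 sun
belief = {'sun': 1.0, 'rain': 0.0}

for t in range(2, 102):  # compute the belief over X_2, X_3, ..., X_101
    belief = forward_step(belief)
    if t in (2, 3, 4, 101):
        print(f"P(X_{t}) = sun {belief['sun']:.3f}, rain {belief['rain']:.3f}")
```

After one step the belief is (sun 0.9, rain 0.1), then (0.82, 0.18), (0.756, 0.244), and so on; as the simulation continues it drifts toward (0.5, 0.5) from either starting observation, which is the stationary-distribution behavior the last slide begins to describe.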

