CS 188: Artificial Intelligence
Fall 2007
Lecture 18: Bayes Nets III
10/30/2007
Dan Klein – UC Berkeley

Announcements
- Project shift: Project 4 moved back a little
- Instead, a mega-mini-homework, worth 3x, graded
- The contest is live

Inference
- Inference: calculating some statistic from a joint probability distribution
- Examples:
  - Posterior probability: P(Q | E1 = e1, …, Ek = ek)
  - Most likely explanation: argmax_q P(Q = q | e1, …, ek)
- [Figure: example network over variables R, T, B, D, L, T']

Reminder: Alarm Network
- [Figure: the burglary/earthquake/alarm network with its CPTs]

Normalization Trick
- Select the joint entries consistent with the evidence, then normalize (divide by their sum) to obtain the conditional distribution

Inference by Enumeration?
- [Worked example: summing joint entries directly]

Nesting Sums
- Atomic inference is extremely slow!
- Slightly clever way to save work: move the sums as far right as possible
- Example:
  P(b | j, m) ∝ Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)
             = P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)

Evaluation Tree
- View the nested sums as a computation tree
- Still repeated work: we calculate P(m | a) P(j | a) twice, etc.

Variable Elimination: Idea
- Lots of redundant work in the computation tree
- We can save time if we cache all partial results
- Join on one hidden variable at a time
- Project out that variable immediately
- This is the basic idea behind variable elimination

Basic Objects
- Track objects called factors
- The initial factors are the local CPTs
- During elimination, we create new factors
- Anatomy of a factor:
  - Variables introduced
  - Variables summed out
  - Argument variables, always non-evidence variables
  - e.g., a factor over D and E holds 4 numbers, one for each combination of values of D and E

Basic Operations
- First basic operation: joining factors
- Combining two factors:
  - Just like a database join
  - Build a factor over the union of the domains
- Example: [figure]

Basic Operations
- Second basic operation: marginalization
- Take a factor and sum out a variable
- This shrinks a factor to a smaller one
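The two basic operations just described, join and marginalize (sum out), can be sketched on explicit factor tables. This is a minimal illustration, not the course's implementation: a factor is assumed to be a (variable list, table) pair, and all names here are my own.

```python
from itertools import product

# A factor is (variables, table), where table maps a tuple of values
# (one per variable, in the listed order) to a numeric entry.

def join(f1, f2):
    """Join two factors: build a factor over the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    joined_vars = vars1 + [v for v in vars2 if v not in vars1]
    # Recover each variable's domain from the values seen in the tables.
    domains = {}
    for vs, t in ((vars1, t1), (vars2, t2)):
        for row in t:
            for v, val in zip(vs, row):
                domains.setdefault(v, set()).add(val)
    table = {}
    for assignment in product(*(sorted(domains[v]) for v in joined_vars)):
        a = dict(zip(joined_vars, assignment))
        key1 = tuple(a[v] for v in vars1)
        key2 = tuple(a[v] for v in vars2)
        if key1 in t1 and key2 in t2:
            table[assignment] = t1[key1] * t2[key2]  # database-style join, multiplied
    return joined_vars, table

def marginalize(f, var):
    """Sum out one variable, shrinking the factor to a smaller one."""
    vars_, t = f
    i = vars_.index(var)
    new_vars = vars_[:i] + vars_[i + 1:]
    table = {}
    for row, p in t.items():
        key = row[:i] + row[i + 1:]
        table[key] = table.get(key, 0.0) + p  # projection: add entries that agree
    return new_vars, table
```

For instance, joining a factor P(D) with a factor P(E | D) gives a factor over D and E (4 numbers, as in the anatomy slide), and summing out D then leaves a factor over E alone.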
- Marginalization is a projection operation
- Example: [figure]

Example
- [Worked elimination example]

Example
- [Worked elimination example, continued]

General Variable Elimination
- Query: P(Q | e1, …, ek)
- Start with the initial factors:
  - The local CPTs (but instantiated by the evidence)
- While there are still hidden variables (not Q or evidence):
  - Pick a hidden variable H
  - Join all factors mentioning H
  - Project out H
- Join all remaining factors and normalize

Example
- Choose A

Example
- Choose E
- Finish
- Normalize

Variable Elimination
- What you need to know:
  - VE caches intermediate computations
  - Polynomial time for tree-structured graphs!
  - Saves time by marginalizing variables as soon as possible rather than at the end
- We will see special cases of VE later
  - You'll have to implement the special cases
- Approximations
  - Exact inference is slow, especially when you have a lot of hidden nodes
  - Approximate methods give you a (close) answer, faster

Sampling
- Basic idea:
  - Draw N samples from a sampling distribution S
  - Compute an approximate posterior probability
  - Show this converges to the true probability P
- Outline:
  - Sampling from an empty network
  - Rejection sampling: reject samples disagreeing with the evidence
  - Likelihood weighting: use the evidence to weight samples

Prior Sampling
- [Figure: a sampling pass through the Cloudy/Sprinkler/Rain/WetGrass network]

Prior Sampling
- This process generates samples with probability
  S_PS(x1, …, xn) = Π_i P(xi | Parents(Xi)) = P(x1, …, xn)
  i.e., the BN's joint probability
- Let the number of samples of an event be N_PS(x1, …, xn); then
  lim_{N→∞} N_PS(x1, …, xn) / N = P(x1, …, xn)
- I.e., the sampling procedure is consistent

Example
- We'll get a bunch of samples from the BN, e.g. five samples over (C, S, R, W)
- If we want to know P(W):
  - We have counts ⟨w: 4, ¬w: 1⟩
  - Normalize to get P(W) ≈ ⟨w: 0.8, ¬w: 0.2⟩
- This will get closer to the true distribution with more samples
- Can estimate anything else, too
- What about P(C | r)?
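The prior-sampling procedure can be sketched for the Cloudy/Sprinkler/Rain/WetGrass network. The CPT numbers below are illustrative assumptions, not taken from the slides, and the function names are my own.

```python
import random

# Illustrative CPTs (assumed numbers) for the C/S/R/W network:
# C has no parents; S and R depend on C; W depends on S and R.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}            # P(S=true | C)
P_R = {True: 0.8, False: 0.2}            # P(R=true | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}  # P(W=true | S, R)

def prior_sample(rng=random):
    """Sample each variable in topological order, given its sampled parents."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w

def estimate(event, n=100_000):
    """Estimate the probability of an event by counting over n prior samples."""
    hits = sum(event(prior_sample()) for _ in range(n))
    return hits / n
```

For example, `estimate(lambda s: s[3])` approximates P(w) by counting, and the estimate tightens as n grows, which is the consistency property stated above.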
- What about P(C | r, w)? (Same approach.)
- [Figure: the C/S/R/W network and a batch of samples]

Rejection Sampling
- Let's say we want P(C)
  - No point keeping all the samples around
  - Just tally counts of the C outcomes
- Let's say we want P(C | s)
  - Same thing: tally C outcomes, but ignore (reject) samples which don't have S = s
  - This is rejection sampling
  - It is also consistent (correct in the limit)

Likelihood Weighting
- Problem with rejection sampling:
  - If the evidence is unlikely, you reject a lot of samples
  - You don't exploit your evidence as you sample
  - Consider P(B | a)
- Idea: fix the evidence variables and sample the rest
- Problem: the sample distribution is not consistent!
- Solution: weight each sample by the probability of the evidence given its parents
- [Figure: Burglary → Alarm]

Likelihood Sampling
- [Figure: a weighted sampling pass through the C/S/R/W network]

Likelihood Weighting
- Sampling distribution if z is sampled and e is fixed evidence:
  S_WS(z, e) = Π_i P(zi | Parents(Zi))
- Now, samples have weights:
  w(z, e) = Π_i P(ei | Parents(Ei))
- Together, the weighted sampling distribution is consistent:
  S_WS(z, e) · w(z, e) = P(z, e)

Likelihood Weighting
- Note that likelihood weighting doesn't solve all our problems
  - Rare evidence is taken into account for downstream variables, but not upstream ones
- A better solution is Markov-chain Monte Carlo (MCMC), more advanced
- We'll return to sampling for robot localization and tracking in dynamic BNs

Decision Networks
- MEU: choose the action which maximizes the expected utility given the evidence
- We can directly operationalize this with decision diagrams:
  - Bayes nets with nodes for utility and actions
  - Lets us calculate the expected utility for each action
- New node types:
  - Chance nodes (just like BNs)
  - Actions (rectangles, must be parents, act as observed evidence)
  - Utilities (depend on action and chance nodes)
- [Figure: Weather → Report; Umbrella and Weather → U]

Decision Networks
- Action selection:
  - Instantiate all evidence
  - Calculate the posterior over the parents of the utility node
  - Set the action node each possible way
  - Calculate the expected utility for each action
  - Choose the maximizing action
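The action-selection loop above can be sketched concretely with the umbrella example's prior and utilities (the numbers come from this lecture's tables; the function names are my own):

```python
# Umbrella example: weather prior and utility table from the lecture.
P_W = {'sun': 0.7, 'rain': 0.3}
U = {('leave', 'sun'): 100, ('leave', 'rain'): 0,
     ('take', 'sun'): 20, ('take', 'rain'): 70}

def expected_utility(action, p_w=P_W):
    """EU(a) = sum over weather outcomes of P(w) * U(a, w)."""
    return sum(p * U[(action, w)] for w, p in p_w.items())

def best_action(p_w=P_W):
    """MEU action selection: try each action, keep the maximizer."""
    return max(('leave', 'take'), key=lambda a: expected_utility(a, p_w))
```

With evidence, `p_w` would be replaced by the posterior over Weather before taking the max, exactly as in the action-selection recipe above.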
Example: Decision Networks
- Network: Weather and Umbrella are parents of U

  W     P(W)
  sun   0.7
  rain  0.3

  A      W     U(A, W)
  leave  sun   100
  leave  rain  0
  take   sun   20
  take   rain  70

Example: Decision Networks
- Network: Weather → Report; Umbrella and Weather are parents of U
- Same U(A, W) and P(W) tables as above, plus the report model:

  R       P(R | sun)
  clear   0.5
  cloudy  0.5

  R       P(R | rain)
  clear   0.2
  cloudy  0.8

Value of Information
- Idea: compute the value of acquiring each possible piece of evidence
- Can be done directly from the decision network
- Example: buying oil drilling rights
  - Two blocks A and B, exactly one has oil, worth k
  - Prior …
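The value-of-information idea can be made concrete with the umbrella example's report model from the tables above, using the standard VPI formula VPI(E) = Σ_e P(e)·MEU(e) − MEU(∅). This is a sketch with my own function names; the numbers are the ones given in this lecture.

```python
# Weather prior, report model, and utilities from the lecture's tables.
P_W = {'sun': 0.7, 'rain': 0.3}
P_R_given_W = {'sun': {'clear': 0.5, 'cloudy': 0.5},
               'rain': {'clear': 0.2, 'cloudy': 0.8}}
U = {('leave', 'sun'): 100, ('leave', 'rain'): 0,
     ('take', 'sun'): 20, ('take', 'rain'): 70}

def meu(p_w):
    """Maximum expected utility under a given weather distribution."""
    return max(sum(p * U[(a, w)] for w, p in p_w.items())
               for a in ('leave', 'take'))

def vpi_report():
    """VPI(Report) = sum_r P(r) * MEU(r) - MEU(no evidence)."""
    vpi = -meu(P_W)
    for r in ('clear', 'cloudy'):
        p_r = sum(P_R_given_W[w][r] * P_W[w] for w in P_W)          # P(R = r)
        post = {w: P_R_given_W[w][r] * P_W[w] / p_r for w in P_W}   # P(W | r)
        vpi += p_r * meu(post)
    return vpi
```

With these particular numbers, "leave" maximizes expected utility under either report outcome, so the report can never change the decision and its VPI works out to exactly 0: information only has value when it could alter the optimal action.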