CS 188: Artificial Intelligence, Spring 2007
Lecture 14: Bayes Nets III
3/1/2007
Srini Narayanan – ICSI and UC Berkeley

Announcements
- Office hours this week will be on Friday (11-1).
- Assignment 2 grading
- Midterm 3/13; review 3/8 (next Thursday)
  - Midterm review materials up over the weekend
  - Extended office hours next week (Thursday 11-1, Friday 2:30-4:30)

Representing Knowledge
(figure slide)

Inference
- Inference: calculating some statistic from a joint probability distribution
- Examples:
  - Posterior probability: P(Q | E1 = e1, ..., Ek = ek)
  - Most likely explanation: argmax_q P(Q = q | E1 = e1, ...)

Inference in Graphical Models
- Queries
- Value of information: what evidence should I seek next?
- Sensitivity analysis: what probability values are most critical?
- Explanation: e.g., why do I need a new starter motor?
- Prediction: e.g., what would happen if my fuel pump stops working?

Inference Techniques
- Exact inference
  - Inference by enumeration
  - Variable elimination
- Approximate inference / Monte Carlo
  - Prior sampling
  - Rejection sampling
  - Likelihood weighting
  - Markov chain Monte Carlo (MCMC)

Reminder: Alarm Network
(figure: the Burglary/Earthquake -> Alarm -> JohnCalls/MaryCalls network)

Inference by Enumeration
- Given unlimited time, inference in BNs is easy
- Recipe:
  - State the marginal probabilities you need
  - Figure out ALL the atomic probabilities you need
  - Calculate and combine them
- Example: P(B | j, m) ∝ Σ_e Σ_a P(B, e, a, j, m)
- Where did we use the BN structure? We didn't!
- In this simple method, we only need the BN to synthesize the joint entries

Nesting Sums
- Atomic inference is extremely slow!
- Slightly clever way to save work: move the sums as far right as possible
- Example: P(B | j, m) ∝ Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
                       = P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

Evaluation Tree
- View the nested sums as a computation tree
- Still repeated work: calculate P(m | a) P(j | a) twice, etc.

Variable Elimination: Idea
- Lots of redundant work in the computation tree
- We can save time if we carry out the summation right to left and cache all intermediate results in objects called factors
- This is the basic idea behind variable elimination

Basic Objects
- Track objects called factors
- Initial factors are local CPTs
- During elimination, create new factors
- Anatomy of a factor: variables introduced, variables summed out, factor argument variables

Basic Operations
- First basic operation: joining factors
  - Combining two factors is just like a database join
  - Build a factor over the union of the domains
  - Example: joining f1(A, B) and f2(B, C) gives f(A, B, C) = f1(A, B) · f2(B, C)
- Second basic operation: marginalization
  - Take a factor and sum out a variable
  - Shrinks a factor to a smaller one
  - A projection operation
  - Example: summing A out of f(A, B) gives g(B) = Σ_a f(a, B)

General Variable Elimination
- Query: P(Q | E1 = e1, ..., Ek = ek)
- Start with initial factors: local CPTs (but instantiated by evidence)
- While there are still hidden variables (not Q or evidence):
  - Pick a hidden variable H
  - Join all factors mentioning H
  - Project out H
- Join all remaining factors and normalize

Variable Elimination
- What you need to know (a small code sketch of the factor operations follows below):
  - VE caches intermediate computations
  - Polynomial time for tree-structured graphs!
  - Saves time by marginalizing variables as soon as possible rather than at the end
- Approximations:
  - Exact inference is slow, especially when you have a lot of hidden nodes
  - Approximate methods give you a (close) answer, faster
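To make the join and sum-out operations concrete, here is a minimal Python sketch. The Factor class, the tiny A -> B chain, and its CPT numbers are made-up illustrations, not the course's actual code.

```python
# A minimal sketch of the join / sum-out operations behind variable elimination.
from itertools import product

class Factor:
    def __init__(self, variables, table):
        self.variables = list(variables)   # e.g. ['A', 'B']
        self.table = dict(table)           # {(a_value, b_value): number}

def join(f1, f2):
    """Combine two factors over the union of their variables (a database-style join)."""
    variables = f1.variables + [v for v in f2.variables if v not in f1.variables]
    table = {}
    for assignment in product([True, False], repeat=len(variables)):
        row = dict(zip(variables, assignment))
        v1 = f1.table[tuple(row[v] for v in f1.variables)]
        v2 = f2.table[tuple(row[v] for v in f2.variables)]
        table[assignment] = v1 * v2
    return Factor(variables, table)

def sum_out(var, f):
    """Marginalize (project) a variable out of a factor, shrinking it."""
    remaining = [v for v in f.variables if v != var]
    table = {}
    for assignment, p in f.table.items():
        row = dict(zip(f.variables, assignment))
        key = tuple(row[v] for v in remaining)
        table[key] = table.get(key, 0.0) + p
    return Factor(remaining, table)

# Tiny chain A -> B with made-up numbers: P(A) and P(B | A).
P_A = Factor(['A'], {(True,): 0.3, (False,): 0.7})
P_B_given_A = Factor(['B', 'A'], {(True, True): 0.9, (False, True): 0.1,
                                  (True, False): 0.2, (False, False): 0.8})

# Eliminate the hidden variable A: join all factors mentioning A, then project A out.
P_B = sum_out('A', join(P_A, P_B_given_A))
print(P_B.table)   # roughly {(True,): 0.41, (False,): 0.59}
```

In a real implementation the elimination order matters a great deal; this sketch only shows the two primitive operations the slides describe (join all factors mentioning a hidden variable, then project it out).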
Sampling
- Basic idea:
  - Draw N samples from a sampling distribution S
  - Compute an approximate posterior probability
  - Show this converges to the true probability P
- Outline:
  - Sampling from an empty network
  - Rejection sampling: reject samples disagreeing with evidence
  - Likelihood weighting: use evidence to weight samples

Prior Sampling
(figure: sampling the Cloudy -> Sprinkler/Rain -> WetGrass network, one variable at a time in topological order)
- This process generates samples with probability S_PS(x1, ..., xn) = Π_i P(xi | Parents(Xi)), i.e. the BN's joint probability
- Let N_PS(x1, ..., xn) be the number of samples of an event; then N_PS(x1, ..., xn) / N converges to P(x1, ..., xn) as N grows
- I.e., the sampling procedure is consistent

Example
- We'll get a bunch of samples from the BN, e.g. five assignments to (C, S, R, W)
- If we want to know P(W):
  - We have counts <w: 4, ¬w: 1>
  - Normalize to get P(W) = <w: 0.8, ¬w: 0.2>
  - This will get closer to the true distribution with more samples
- Can estimate anything else, too
- What about P(C | r)? P(C | r, w)?

Rejection Sampling
- Let's say we want P(C):
  - No point keeping all samples around
  - Just tally counts of C outcomes
- Let's say we want P(C | s):
  - Same thing: tally C outcomes, but ignore (reject) samples which don't have S = s
  - This is rejection sampling
  - It is also consistent (correct in the limit)

Likelihood Weighting
- Problem with rejection sampling:
  - If evidence is unlikely, you reject a lot of samples
  - You don't exploit your evidence as you sample
  - Consider P(B | a) in the Burglary/Alarm network
- Idea: fix evidence variables and sample the rest
- Problem: the sample distribution is not consistent!
- Solution: weight each sample by the probability of the evidence given its parents

Likelihood Sampling
(figure: sampling the Cloudy -> Sprinkler/Rain -> WetGrass network with the evidence variables clamped)

Likelihood Weighting
- Sampling distribution if z is sampled and e is fixed evidence: S_WS(z, e) = Π_i P(zi | Parents(Zi))
- Now, samples have weights: w(z, e) = Π_i P(ei | Parents(Ei))
- Together, the weighted sampling distribution is consistent: S_WS(z, e) · w(z, e) = P(z, e)

Likelihood Weighting
- Note that likelihood weighting doesn't solve all our problems
- Rare evidence is taken into account for downstream variables, but not upstream ones
- A better solution is Markov chain Monte Carlo (MCMC), which is more advanced
- We'll return to sampling for robot localization and tracking in dynamic BNs
- (A code sketch of these sampling procedures appears at the end of these notes.)

Summary
- Exact inference in graphical models is tractable only for polytrees.
- Variable elimination caches intermediate results to make exact inference more efficient for a given query.
- Approximate inference techniques use a variety of sampling methods and can scale to large models.
- NEXT: Adding dynamics and change to graphical models

Bayes Net for Insurance
(figure slide)
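To make the sampling slides above concrete, here is a minimal Python sketch of prior sampling, rejection sampling, and likelihood weighting on the Cloudy/Sprinkler/Rain/WetGrass network. The CPT numbers, function names, and the specific queries chosen are illustrative placeholders, not taken from the lecture.

```python
# Sketch of prior sampling, rejection sampling, and likelihood weighting on the
# Cloudy -> {Sprinkler, Rain} -> WetGrass network (illustrative CPT numbers).
import random

# P(WetGrass = true | Sprinkler, Rain), indexed by (s, r).
P_W_GIVEN_SR = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.01}

def flip(p):
    """Return True with probability p."""
    return random.random() < p

def prior_sample():
    """Sample every variable in topological order from P(X | Parents(X))."""
    c = flip(0.5)
    s = flip(0.1 if c else 0.5)
    r = flip(0.8 if c else 0.2)
    w = flip(P_W_GIVEN_SR[(s, r)])
    return c, s, r, w

def rejection_sample_P_c_given_r_w(n=100_000):
    """P(C = true | R = true, W = true): discard samples that disagree with the evidence."""
    kept = [c for c, s, r, w in (prior_sample() for _ in range(n)) if r and w]
    return sum(kept) / len(kept)

def likelihood_weighting_P_c_given_s_w(n=100_000):
    """P(C = true | S = true, W = true): clamp evidence, sample the rest,
    and weight each sample by P(evidence value | parents)."""
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        c = flip(0.5)                    # C is hidden: sample it
        s = True                         # S is evidence: clamp it ...
        weight = 0.1 if c else 0.5       # ... and weight by P(S = true | c)
        r = flip(0.8 if c else 0.2)      # R is hidden: sample it
        w = True                         # W is evidence: clamp it ...
        weight *= P_W_GIVEN_SR[(s, r)]   # ... and weight by P(W = true | s, r)
        totals[c] += weight
    return totals[True] / (totals[True] + totals[False])

if __name__ == "__main__":
    print(rejection_sample_P_c_given_r_w())
    print(likelihood_weighting_P_c_given_s_w())
```

Note the difference: rejection sampling throws away every sample that conflicts with the evidence, while likelihood weighting keeps every sample and simply down-weights those whose evidence is unlikely given the sampled parents.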