
15-780: Grad AI
Lecture 19: Graphical models, Monte Carlo methods
Geoff Gordon (this lecture), Tuomas Sandholm
TAs: Erik Zawadzki, Abe Othman

Admin
Reminder: midterm March 29
Reminder: project milestone reports due March 31

Review: scenarios
Converting QBF+ to PBI/MILP by scenarios
‣ Replicate decision variables for each scenario
‣ Replicate clauses: share first-stage vars; set scenario vars by scenario index; replace decision vars by replicates
‣ Sample random scenarios
Example: PSTRIPS

Review: dynamic programming
Solving #SAT by dynamic programming (variable elimination)
‣ repeatedly move sums inward, combine tables, sum out
‣ treewidth and runtime/space

Review: graphical models
Bayes net = DAG + CPTs
‣ For each RV (say X), there is one CPT specifying P(X | pa(X))
‣ Can simulate with propositional logic + random causes
Inference: similar to the #SAT DP (move sums inward)
‣ Can do it partly analytically
‣ Allows us to prove independences and conditional independences from the DAG alone

Review: graphical models
Blocking, explaining away
Markov blanket
Learning: counting, Laplace smoothing
‣ if hidden variables: take 10-708 or use a toolbox

Markov blanket
Markov blanket of C = minimal set of observations that makes C independent of the rest of the graph

Factor graphs
Another common type of graphical model
Uses an undirected, bipartite graph instead of a DAG

Rusty robot: factor graph
Factors: P(M) P(Ra) P(O) P(W|Ra,O) P(Ru|M,W)

Convention
Don't need to show unary factors
Why? They don't affect the algorithms below.

Non-CPT factors
Just saw: it is easy to convert a Bayes net to a factor graph
In general, factors need not be CPTs: any nonnegative numbers are allowed
In general, $P(A, B, \ldots) = \frac{1}{Z} \prod_j \phi_j$ (product over all factors)
$Z = \sum_{A, B, \ldots} \prod_j \phi_j$ (the normalizing constant)

Hard v. soft factors
A hard factor has only 0/1 entries (it rules assignments in or out); a soft factor weights assignments.
Hard factor over X, Y:
       Y=0  Y=1  Y=2
  X=0   0    0    0
  X=1   0    0    1
  X=2   0    1    1
Soft factor over X, Y:
       Y=0  Y=1  Y=2
  X=0   1    1    1
  X=1   1    1    3
  X=2   1    3    3

Factor graph → Bayes net
Conversion possible, but more involved
‣ Each representation can handle any distribution
‣ But the size/complexity of the graph may differ
Two cases for conversion:
‣ without adding nodes
‣ adding nodes

Independence
Just like Bayes nets, there are graphical tests for independence and conditional independence
Simpler, though:
‣ Cover up all observed nodes
‣ Look for a path

Independence example

Modeling independence
Take a Bayes net and list the (conditional) independences
Convert it to a factor graph and list the (conditional) independences
Are they the same list? What happened?

Inference
Inference: prior + evidence → posterior
We gave examples of inference in a Bayes net, but not a general algorithm
Reason: the general algorithm uses the factor-graph representation
Steps: instantiate evidence, eliminate nuisance nodes, normalize, answer the query

Inference
Typical query: given Ra=F, Ru=T, what is P(W)?

Incorporate evidence
Condition on Ra=F, Ru=T

Eliminate nuisance nodes
Remaining nodes: M, O, W
Query: P(W)
So O and M are nuisance variables; marginalize them away
Marginal: $P(W, Ra{=}F, Ru{=}T) = \sum_O \sum_M P(M)\, P(Ra{=}F)\, P(O)\, P(W \mid Ra{=}F, O)\, P(Ru{=}T \mid M, W)$

Elimination order
Sum out the nuisance variables in turn
Can do it in any order, but some orders may be easier than others
Let's do O, then M

One last elimination

Checking our work
http://www.aispace.org/bayes/version5.1.6/bayes.jnlp

Discussion
Steps: instantiate evidence, eliminate nuisance nodes, normalize, answer the query
‣ each elimination introduces a new table and makes some old tables irrelevant
(a code sketch of this procedure follows below)

Normalization
Each elimination order introduces different tables
‣ some tables are bigger than others
FLOP count; treewidth

Treewidth examples
Chain, tree

Treewidth examples
Parallel chains, cycle

Discussion
Several relationships between GMs and logic: similar DP algorithm, independent choices + logical consequences to represent a GM, factor graph with 0-1 potentials = CSP, MAP assignment = ILP
Directed v. undirected: advantages to both
Lifted reasoning
‣ Propositional logic + objects = FOL
‣ First-order GMs are a current hot topic of research (plate models, MLNs, ICL); not solved yet!
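The inference walk-through above (instantiate evidence, eliminate O then M, renormalize) can be spelled out in code. The following is a minimal Python sketch, not taken from the lecture: the factor representation and helper names (make_factor, restrict, multiply, sum_out, eliminate) are mine, and the CPT numbers are made-up placeholders, since the rusty-robot tables are not included in this preview.

```python
# Minimal variable-elimination sketch for the rusty-robot query
# P(W | Ra=F, Ru=T).  A factor is a dict mapping assignment tuples
# (booleans, in a fixed variable order) to nonnegative numbers.
# The CPT numbers below are placeholders, not the lecture's.
import itertools

def make_factor(variables, table):
    """variables: list of names; table: dict assignment-tuple -> value."""
    return {"vars": list(variables), "table": dict(table)}

def restrict(factor, var, value):
    """Condition a factor on var = value (instantiate evidence)."""
    if var not in factor["vars"]:
        return factor
    i = factor["vars"].index(var)
    new_vars = factor["vars"][:i] + factor["vars"][i+1:]
    new_table = {k[:i] + k[i+1:]: v for k, v in factor["table"].items()
                 if k[i] == value}
    return make_factor(new_vars, new_table)

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    all_vars = f["vars"] + [v for v in g["vars"] if v not in f["vars"]]
    table = {}
    for assign in itertools.product([False, True], repeat=len(all_vars)):
        env = dict(zip(all_vars, assign))
        fv = f["table"][tuple(env[v] for v in f["vars"])]
        gv = g["table"][tuple(env[v] for v in g["vars"])]
        table[assign] = fv * gv
    return make_factor(all_vars, table)

def sum_out(factor, var):
    """Marginalize var out of a factor."""
    i = factor["vars"].index(var)
    new_vars = factor["vars"][:i] + factor["vars"][i+1:]
    new_table = {}
    for k, v in factor["table"].items():
        key = k[:i] + k[i+1:]
        new_table[key] = new_table.get(key, 0.0) + v
    return make_factor(new_vars, new_table)

def eliminate(factors, var):
    """Multiply all factors mentioning var, then sum var out."""
    touching = [f for f in factors if var in f["vars"]]
    rest = [f for f in factors if var not in f["vars"]]
    prod = touching[0]
    for f in touching[1:]:
        prod = multiply(prod, f)
    return rest + [sum_out(prod, var)]

# Placeholder CPTs over True/False, *not* the numbers from the slides.
P_M  = make_factor(["M"],  {(True,): 0.5, (False,): 0.5})
P_Ra = make_factor(["Ra"], {(True,): 0.3, (False,): 0.7})
P_O  = make_factor(["O"],  {(True,): 0.4, (False,): 0.6})
P_W  = make_factor(["W", "Ra", "O"], {
    (w, ra, o): (0.9 if (ra and o) else 0.1) if w else (0.1 if (ra and o) else 0.9)
    for w in (True, False) for ra in (True, False) for o in (True, False)})
P_Ru = make_factor(["Ru", "M", "W"], {
    (ru, m, w): (0.8 if (m and w) else 0.05) if ru else (0.2 if (m and w) else 0.95)
    for ru in (True, False) for m in (True, False) for w in (True, False)})

# Instantiate evidence Ra = False, Ru = True.
factors = [restrict(restrict(f, "Ra", False), "Ru", True)
           for f in (P_M, P_Ra, P_O, P_W, P_Ru)]

# Eliminate the nuisance variables: O, then M.
for nuisance in ("O", "M"):
    factors = eliminate(factors, nuisance)

# Multiply what is left (all over W) and renormalize.
answer = factors[0]
for f in factors[1:]:
    answer = multiply(answer, f)
Z = sum(answer["table"].values())
posterior = {k: v / Z for k, v in answer["table"].items()}
print(posterior)   # P(W | Ra=F, Ru=T) under the placeholder CPTs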
Discussion: belief propagation
Suppose we want all one-variable marginals
Could do N runs of variable elimination
Or: the BP algorithm simulates N runs for the price of 2
For details: Kschischang et al. reading

HMMs and DBNs

Inference over time
Consider a robot:
‣ true state (x, y, θ)
‣ controls (v, w)
‣ N range sensors (here N=2: r, s)

Model
$x_{t+1} = x_t + v_t \cos\theta_t + \text{noise}$
$y_{t+1} = y_t + v_t \sin\theta_t + \text{noise}$
$\theta_{t+1} = \theta_t + w_t + \text{noise}$
$r_t = \sqrt{(x_t - x_R)^2 + (y_t - y_R)^2} + \text{noise}$
$s_t = \sqrt{(x_t - x_S)^2 + (y_t - y_S)^2} + \text{noise}$

Model of x, y, θ (r, s unobserved)

Goal: inference over time
N=1 sensor, repeatedly observe range = 1 m + noise

Factor graph

Dynamic Bayes Network
DBN: a factor graph composed of a single structural unit repeated over time
‣ conceptually infinite to the right, but in practice cut off at some maximum T
Factors must be conditional distributions:
$\forall x_t, y_t, \theta_t, u_t, v_t: \sum_{x_{t+1}, y_{t+1}, \theta_{t+1}} \phi(x_t, y_t, \theta_t, u_t, v_t, x_{t+1}, y_{t+1}, \theta_{t+1}) = 1$
$\forall x_t, y_t, \theta_t: \sum_{r_t, s_t} \phi(x_t, y_t, \theta_t, r_t, s_t) = 1$

Three kinds of variable
Control (connects only forward)
State
Observation

Condition on obs, do(control)
Control, state, observation

Simplified version
State: $x_t \in \{1, 2, 3\}$
Observation: $y_t \in \{L, H\}$
Control: just one (i.e., no choice): "keep going"

Hidden Markov Models
This is an HMM, a DBN with:
‣ one state variable
‣ one observation variable

Potentials
Transition $\phi(x_t, x_{t+1})$ (rows $x_t$, columns $x_{t+1}$):
        x'=1  x'=2  x'=3
  x=1    0.7   0.3   0
  x=2    0.3   0.3   0.3
  x=3    0     0.3   0.7
Observation $\phi(x_t, y_t)$ (rows $x_t$, columns $y_t$):
        y=L   y=H
  x=1   0.67  0.33
  x=2   0.5   0.5
  x=3   0.33  0.67

HMM inference
Condition on $y_1 = H$, $y_2 = H$, $y_3 = L$
What is $P(X_2 \mid HHL)$?
(a code sketch of this computation appears at the end of these notes)

HMM factors after conditioning

Eliminate $x_1$ and $x_3$

Multiply remaining potentials and renormalize

Forward-backward
You may recognize the above as the forward-backward algorithm
A special case of dynamic programming / variable elimination / belief propagation

Approximate inference

Most of the time…
Treewidth is big
Variables are high-arity or continuous
Can't afford exact inference
Need numerical integration (and/or summation)
We'll look at randomized algorithms

Numerical integration
[figure: surface plot of f(X, Y) for X, Y ∈ [-1, 1]]

Integration in 1000s of dims
Eliazar and Parr, IJCAI-03

Simple 1D problem
[figure: a 1-D integrand f(x) on x ∈ [-1, 1]]

Uniform sampling
[figure: the same integrand with uniformly drawn samples]
Estimate: $\frac{2}{N} \sum_i f(x_i)$

Uniform sampling
$E(f(X)) = \int P(x) f(x)\,dx = \frac{1}{V} \int f(x)\,dx$
So $V \cdot E(f(X))$ is the desired integral
But the standard deviation can be big
Can reduce it by averaging many samples, but only at rate $1/\sqrt{N}$

Importance sampling
Instead of $x \sim$ uniform, use $x \sim Q(x)$
Q = importance distribution
Should have Q(x) large where f(x) is large
Problem: $E_Q(f(X)) = \int Q(x) f(x)\,dx$, which is no longer the integral we want

Importance sampling
Define $h(x) \equiv f(x)/Q(x)$
$E_Q(h(X)) = \int Q(x) h(x)\,dx = \int Q(x) f(x)/Q(x)\,dx = \int f(x)\,dx$

Importance sampling
So, take samples of h(X) instead of f(X)
$w_i = 1/Q(x_i)$ is the importance weight
$Q = 1/V$ yields uniform sampling

Importance sampling
[figure: the integrand with samples drawn from Q]

Variance
How does this help us control
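To make the two estimators above concrete, here is a small Python sketch, not from the lecture, that compares the uniform-sampling estimate $\frac{V}{N}\sum_i f(x_i)$ with the importance-sampling estimate $\frac{1}{N}\sum_i f(x_i)/Q(x_i)$ on a 1-D example; the integrand, the proposal Q, and all names are illustrative choices, not the ones plotted on the slides.

```python
# Monte Carlo integration of a 1-D function on [-1, 1]: uniform sampling
# vs. importance sampling.  The integrand and the proposal Q are
# illustrative choices, not the lecture's.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # A sharply peaked integrand: most of its mass is near x = 0.
    return np.exp(-50.0 * x**2)

# Integral of f over (-inf, inf); essentially equal to the integral over [-1, 1].
true_value = np.sqrt(np.pi / 50.0)

N = 10_000
V = 2.0                                 # volume of the region [-1, 1]

# Uniform sampling: estimate = (V / N) * sum_i f(x_i)
x_unif = rng.uniform(-1.0, 1.0, size=N)
est_uniform = V * np.mean(f(x_unif))

# Importance sampling with Q = a normal centered where f is large,
# truncated to [-1, 1] by rejection (the truncated mass is negligible
# for this sigma, so it is ignored in this sketch).
sigma = 0.15
x_imp = rng.normal(0.0, sigma, size=3 * N)
x_imp = x_imp[np.abs(x_imp) <= 1.0][:N]
q = np.exp(-x_imp**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
est_importance = np.mean(f(x_imp) / q)  # average of h(x) = f(x) / Q(x)

print(f"true value   ~ {true_value:.5f}")
print(f"uniform        {est_uniform:.5f}")
print(f"importance     {est_importance:.5f}")
```

With a proposal concentrated where f is large, the importance-sampling estimate typically has lower variance than the uniform one for the same number of samples N.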

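Returning to the HMM example, this is the sketch promised above: Python code, not from the lecture, that computes $P(X_2 \mid y_{1:3} = H, H, L)$ by eliminating $x_1$ and $x_3$ and renormalizing, i.e., forward-backward. The transition and observation numbers are taken from the potential tables as they appear in this preview (note the printed $x_t{=}2$ transition row sums to 0.9; the tables are treated as potentials, so the final renormalization absorbs this), and the uniform prior over $X_1$ is my assumption, since the slides' initial distribution is not shown here.

```python
# Forward-backward (variable elimination) for the 3-state HMM above:
# compute P(X2 | y1=H, y2=H, y3=L).  The transition/observation numbers
# come from the slide tables; the uniform prior over X1 is an assumption.
import numpy as np

# phi(x_t, x_{t+1}): rows x_t = 1..3, columns x_{t+1} = 1..3
T = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.3, 0.3],   # middle row as printed (sums to 0.9)
              [0.0, 0.3, 0.7]])

# phi(x_t, y_t): rows x_t = 1..3, columns y_t in {L, H}
O = np.array([[0.67, 0.33],
              [0.50, 0.50],
              [0.33, 0.67]])
L, H = 0, 1

prior = np.full(3, 1.0 / 3.0)          # assumed uniform over X1

# Condition on the observations: pick out one column of O per time step.
obs = [H, H, L]
e1, e2, e3 = (O[:, y] for y in obs)

# Eliminate x1: forward message into x2.
alpha2 = (prior * e1) @ T              # sum_x1 prior(x1) e1(x1) T(x1, x2)

# Eliminate x3: backward message into x2.
beta2 = T @ e3                         # sum_x3 T(x2, x3) e3(x3)

# Multiply the remaining potentials at x2 and renormalize.
unnorm = alpha2 * e2 * beta2
posterior = unnorm / unnorm.sum()
print(dict(zip([1, 2, 3], posterior.round(3))))   # P(X2 = 1..3 | H, H, L)
```

The two matrix products are exactly the forward message into $X_2$ (eliminating $x_1$) and the backward message into $X_2$ (eliminating $x_3$).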
