15-780: Grad AI
Lecture 21: Bayesian learning, (PO)MDPs
Geoff Gordon (this lecture), Tuomas Sandholm
TAs: Erik Zawadzki, Abe Othman

Admin
Reminder: project milestone reports due today
Reminder: HW5 out

Review: numerical integration
Parallel importance sampling
‣allows Z R(x) instead of R(x)
‣biased, but asymptotically unbiased
Sequential sampling (for chains, trees)
Parallel IS + resampling for sequential problems = particle filter

Review: MCMC
Metropolis-Hastings: a randomized search procedure for x with high R(x)
Leads to a stationary distribution ∝ R(x)
Repeatedly tweak the current x to get x':
‣if R(x') ≥ R(x), move to x'
‣if R(x') << R(x), probably stay at x
Requires a good one-step proposal Q(x' | x) to get an acceptable acceptance rate and mixing rate

Review: Gibbs
Special case of MH for X divided into blocks
Proposal Q:
‣pick a block i uniformly (or round robin, or any other schedule)
‣sample X_B(i) ~ P(X_B(i) | X_¬B(i))
Acceptance rate = 100%

Review: learning
P(M | X) = P(X | M) P(M) / P(X)
P(M | X, Y) = P(Y | X, M) P(M | X) / P(Y | X)
Example: framlings
Version space algorithm: the special case where the prior is uniform and the likelihood is 0 or 1

Bayesian learning
Recall the iris example
H = factor graphs of a given structure
Need to specify the entries of the factors ϕ0–ϕ4
[figure: the factor graph over X1–X4 with factors ϕ0–ϕ4, repeated once per training example, all copies sharing the Φ1 parameters]

Factors
ϕ0 (species prior): setosa p, versicolor q, virginica 1 − p − q
ϕ1–ϕ4 (discretized feature i given species):

        set.         vers.        vir.
  lo    p_i          r_i          u_i
  mid   q_i          s_i          v_i
  hi    1−p_i−q_i    1−r_i−s_i    1−u_i−v_i

Continuous factors
Replace the discretized petal length with a continuous one:
Φ1(ℓ, s) = exp(−(ℓ − μ_s)² / 2σ²)
parameters μ_set, μ_vers, μ_vir; constant σ²

Simpler example
Coin toss: P(H) = p, P(T) = 1 − p

Parametric model class
H is a parametric model class: each hypothesis H in H corresponds to a vector of parameters θ = (p) or θ = (p, q, p1, q1, r1, s1, …)
Hθ: X ~ P(X | θ) (or, Y ~ P(Y | X, θ))
Contrast to a discrete H, as in version space
Could also have a mixed H: a discrete choice among parametric (sub)classes

Continuous prior
E.g., for the coin toss, p ~ Beta(a, b):
P(p | a, b) = p^(a−1) (1 − p)^(b−1) / B(a, b)
Specifying, e.g., a = 2, b = 2:
P(p) = 6p(1 − p)
[plot: the Beta(2, 2) prior for p]
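The Metropolis-Hastings procedure reviewed above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: it targets the unnormalized density R(p) = p(1 − p) (the Beta(2, 2) prior without its constant 6) using a symmetric Gaussian proposal, so the acceptance ratio reduces to R(x')/R(x).

```python
import math
import random

def metropolis_hastings(log_r, x0, step=0.1, n_samples=20000, seed=0):
    """Minimal Metropolis-Hastings with a symmetric Gaussian proposal Q(x' | x).
    log_r: log of the unnormalized target R(x)."""
    rng = random.Random(seed)
    x, lx = x0, log_r(x0)
    samples = []
    for _ in range(n_samples):
        x_new = x + rng.gauss(0.0, step)        # tweak the current x
        lx_new = log_r(x_new)
        # accept with probability min(1, R(x')/R(x)); symmetric Q cancels
        if math.log(rng.random() + 1e-300) < lx_new - lx:
            x, lx = x_new, lx_new
        samples.append(x)
    return samples

# unnormalized Beta(2, 2) density: R(p) = p(1 - p) on (0, 1)
def log_r(p):
    return math.log(p * (1 - p)) if 0.0 < p < 1.0 else float('-inf')

samples = metropolis_hastings(log_r, x0=0.5)
mean = sum(samples) / len(samples)   # should be close to E[p] = 0.5
```

Proposals that land outside (0, 1) get log R = −∞ and are always rejected, which is why every sample stays in the support; a smaller `step` raises the acceptance rate but slows mixing, as noted above.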
Coin toss, cont'd
Joint dist'n of the parameter p and the data x_i:
P(p, x) = P(p) ∏_i P(x_i | p) = 6p(1 − p) ∏_i p^(x_i) (1 − p)^(1 − x_i)

Coin flip posterior
P(p | x) = P(p) ∏_i P(x_i | p) / P(x)
         = (1/Z) p(1 − p) ∏_i p^(x_i) (1 − p)^(1 − x_i)
         = (1/Z) p^(1 + Σ_i x_i) (1 − p)^(1 + Σ_i (1 − x_i))
         = Beta(2 + Σ_i x_i, 2 + Σ_i (1 − x_i))
[plots: the Beta(2, 2) prior for p; the posterior after 4 H, 7 T; the posterior after 10 H, 19 T]

Predictive distribution
The posterior is nice, but doesn't tell us directly what we need to know
We care more about P(x_{N+1} | x_1, …, x_N)
By the law of total probability and conditional independence:
P(x_{N+1} | D) = ∫ P(x_{N+1}, θ | D) dθ = ∫ P(x_{N+1} | θ) P(θ | D) dθ

Coin flip example
After 10 H, 19 T: p ~ Beta(12, 21)
E(x_{N+1} | p) = p
E(x_{N+1} | D) = E(p | D) = a/(a + b) = 12/33
So, predict a 36.4% chance of H on the next flip

Approximate Bayes
The coin flip example was easy
In general, computing the posterior (or the predictive distribution) may be hard
Solution: use the approximate integration techniques we've studied!

Bayes as numerical integration
Parameters θ, data D
P(θ | D) = P(D | θ) P(θ) / P(D)
Usually P(θ) is simple, and so is P(D | θ) up to a constant
So P(θ | D) ∝ P(D | θ) P(θ): perfect for MH

Iris example, continued
P(y | x) = σ(ax + b), where σ(z) = 1/(1 + exp(−z))
[plot: P(I. virginica) vs. petal length]

Posterior
P(a, b | x, y) = (1/Z) P(a, b) ∏_i σ(ax_i + b)^(y_i) σ(−ax_i − b)^(1 − y_i)
P(a, b) = N(0, I)
[plot: samples from the posterior over (a, b)]

Expanded factor graph
[figure: the original factor graph and its expanded version]

Cheaper approximations
Getting cheaper, in order:
‣maximum a posteriori (MAP)
‣maximum likelihood (MLE)
‣conditional MLE / MAP
Instead of the true posterior, just use the single most probable hypothesis

MAP
Summarize the entire posterior density by its maximum:
arg max_θ P(D | θ) P(θ)

MLE
Like MAP, but ignore the prior term:
arg max_θ P(D | θ)

Conditional MLE, MAP
Split D = (x, y); condition on x and try to explain only y:
arg max_θ P(y | x, θ)   (conditional MLE)
arg max_θ P(y | x, θ) P(θ)   (conditional MAP)

[plots: iris example, MAP vs. full posterior]
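The conjugate Beta update and the predictive calculation above are easy to check numerically. A minimal sketch reproducing the slides' numbers (the function names are mine, not from the lecture):

```python
def beta_posterior(a, b, flips):
    """Conjugate update: a Beta(a, b) prior plus Bernoulli flips (1 = H, 0 = T)
    gives a Beta(a + #heads, b + #tails) posterior."""
    heads = sum(flips)
    return a + heads, b + len(flips) - heads

def predictive_heads(a, b):
    """P(x_{N+1} = H | D) = E[p | D] = a / (a + b) under a Beta(a, b) posterior."""
    return a / (a + b)

# the slides' example: Beta(2, 2) prior, then 10 heads and 19 tails
a, b = beta_posterior(2, 2, [1] * 10 + [0] * 19)
p_next = predictive_heads(a, b)   # 12 / 33, about 0.364
```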
Too certain
This behavior of MAP (or MLE) is typical: we are too sure of ourselves
But it often gets better with more data
Theorem: MAP and MLE are consistent estimates of the true θ, if "data per parameter" → ∞

Sequential decisions

Markov decision process: influence diagram
States, actions, costs C(s, a) ∈ [Cmin, Cmax], transitions T(s' | s, a), initial state s1

Influence diagrams
Like a Bayes net, except:
‣diamond nodes are costs/rewards; they must have no children
‣square nodes are decisions; we pick their CPTs (before seeing anything)
‣the objective is to minimize expected cost
Circles are ordinary r.v.s, as before

Markov decision process: state space diagram
States, actions, costs C(s, a) ∈ [Cmin, Cmax], transitions T(s' | s, a), initial state s1
Goal states: all costs = 0, self-transition 100%

Choosing actions
Execution trace: τ = (s1, a1, c1, s2, a2, c2, …)
‣c1 = C(s1, a1), c2 = C(s2, a2), etc.
‣s2 ~ T(s | s1, a1), s3 ~ T(s | s2, a2), etc.
Policy π: S → A
‣or randomized, π(a | s)
Trace from π: a1 ~ π(a | s1), etc.
‣τ is then an r.v.
with known distribution
‣we'll write τ ~ π (the rest of the MDP implicit)

Choosing good actions
Value of a policy:
J^π = ((1 − γ)/γ) E[ Σ_t γ^t c_t | τ ∼ π ]
Objective: J* = min_π J^π, with π* ∈ arg min_π J^π
Here γ is a discount factor in (0, 1)

Why a discount factor?
A1: to make the sums finite
A2: interest rate 1/γ − 1 per period
A3: model mismatch
‣probability (1 − γ) that something unexpected happens on each step and my plan goes out the window

Tree search
Root node = current state
Alternating levels: action and outcome
‣min and expectation
Build out the tree until a goal, or until γ^t is small enough
[figure: example search tree with γ = 0.5 and all transition probabilities = 0.5]

Interpreting the result
Number at each ○ node: the optimal cost if starting from state s instead of s1
‣call this J*(s); so, J* = J*(s1)
‣the state-value function
Number at each ⋅ node: the optimal cost if starting from the parent's s and choosing the incoming a
‣call this Q*(s, a)
‣the action-value function
Similarly, J^π(s) and Q^π(s, a) for a fixed policy π

The update equations
For a ⋅ node: Q*(s, a) = (1 − γ) C(s, a) + γ E[J*(s') | s' ∼ T(· | s, a)]
For a ○ node: J*(s) = min_a Q*(s, a)
That is: (1 − γ) × immediate cost + γ × future cost

Updates for a fixed policy
For a ⋅ node: Q^π(s, a) = (1 − γ) C(s, a) + γ E[J^π(s') | s' ∼ T(· | s, a)]
For a ○ node: J^π(s) = E[Q^π(s, a) | a ∼ π(· | s)]
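Applying the update equations above repeatedly until J* stops changing is value iteration. A minimal sketch on a hypothetical two-state MDP (not from the lecture; state 1 plays the role of a zero-cost, self-absorbing goal state):

```python
gamma = 0.5

# hypothetical two-state MDP: state 1 is a zero-cost absorbing goal
states = [0, 1]
actions = ['stay', 'go']
C = {(0, 'stay'): 1.0, (0, 'go'): 2.0, (1, 'stay'): 0.0, (1, 'go'): 0.0}
# T[(s, a)][s2] = probability of landing in state s2
T = {(0, 'stay'): [0.9, 0.1], (0, 'go'): [0.2, 0.8],
     (1, 'stay'): [0.0, 1.0], (1, 'go'): [0.0, 1.0]}

def value_iteration(n_iters=100):
    """Repeatedly apply the slides' updates:
    Q(s, a) = (1 - gamma) * C(s, a) + gamma * E[J(s')]
    J(s)    = min_a Q(s, a)"""
    J = {s: 0.0 for s in states}
    Q = {}
    for _ in range(n_iters):
        Q = {(s, a): (1 - gamma) * C[(s, a)]
                     + gamma * sum(p * J[s2] for s2, p in zip(states, T[(s, a)]))
             for s in states for a in actions}
        J = {s: min(Q[(s, a)] for a in actions) for s in states}
    return J, Q

J_star, Q_star = value_iteration()
# greedy policy: pick the action minimizing Q*(s, a) in each state
policy = {s: min(actions, key=lambda a: Q_star[(s, a)]) for s in states}
```

Because the update is a γ-contraction, the iterates converge to the unique fixed point J*; note the costs stay in [Cmin, Cmax] thanks to the (1 − γ) normalization.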
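The fixed-policy updates can be used the same way to evaluate a given π (policy evaluation): the min over actions is replaced by following π. Another sketch on a hypothetical two-state MDP with a deterministic policy (the example MDP is mine, not the lecture's):

```python
gamma = 0.5

# hypothetical two-state MDP, same shape as the abstract definition above
states = [0, 1]
C = {(0, 'stay'): 1.0, (0, 'go'): 2.0, (1, 'stay'): 0.0, (1, 'go'): 0.0}
T = {(0, 'stay'): [0.9, 0.1], (0, 'go'): [0.2, 0.8],
     (1, 'stay'): [0.0, 1.0], (1, 'go'): [0.0, 1.0]}
pi = {0: 'go', 1: 'stay'}   # a fixed deterministic policy pi: S -> A

def evaluate_policy(pi, n_iters=100):
    """Fixed-policy updates: Q(s, a) = (1 - gamma)*C(s, a) + gamma*E[J(s')],
    and J(s) = Q(s, pi(s)) since pi is deterministic."""
    J = {s: 0.0 for s in states}
    for _ in range(n_iters):
        Q = {(s, a): (1 - gamma) * C[(s, a)]
                     + gamma * sum(p * J[s2] for s2, p in zip(states, T[(s, a)]))
             for (s, a) in C}
        J = {s: Q[(s, pi[s])] for s in states}
    return J

J_pi = evaluate_policy(pi)   # J_pi[0] converges to 10/9
```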