15-780: Grad AI
Lecture 21: Bayesian learning, (PO)MDPs

Geoff Gordon (this lecture), Tuomas Sandholm
TAs: Erik Zawadzki, Abe Othman

Admin
Reminder: project milestone reports due today
Reminder: HW5 out

Review: numerical integration
Parallel importance sampling
‣ allows Z R(x) instead of R(x) (the normalizing constant need not be known)
‣ biased, but asymptotically unbiased
Sequential sampling (for chains, trees)
Parallel IS + resampling for sequential problems = particle filter

Review: MCMC
Metropolis-Hastings: randomized search procedure that favors x with high R(x)
Leads to a stationary distribution ∝ R(x)
Repeatedly tweak the current x to get a proposal x′
‣ if R(x′) ≥ R(x), move to x′
‣ if R(x′) < R(x), move to x′ only with probability R(x′)/R(x); so if R(x′) ≪ R(x), we almost always stay at x
Requires a good one-step proposal Q(x′ | x) to get an acceptable acceptance rate and mixing rate

Review: Gibbs
Special case of MH for X divided into blocks
Proposal Q:
‣ pick a block i uniformly (or round robin, or any other schedule)
‣ sample X_{B(i)} ~ P(X_{B(i)} | X_{¬B(i)})
Acceptance rate = 100%

Review: Learning
P(M | X) = P(X | M) P(M) / P(X)
P(M | X, Y) = P(Y | X, M) P(M | X) / P(Y | X)
Example: framlings
Version space algorithm: the special case where the prior is uniform and the likelihood is 0 or 1

Bayesian Learning
Recall the iris example
H = factor graphs of a given structure
Need to specify the entries of the factors ϕ0, …, ϕ4
[Figure: one copy of the factor graph (ϕ0–ϕ4 over X1–X4) per training example, all copies sharing the Φ1 parameters]

Factors
ϕ1–ϕ4 (feature i, discretized to lo/med/hi), one row per species:

              lo     med    hi
  setosa      p_i    q_i    1 − p_i − q_i
  versicolor  r_i    s_i    1 − r_i − s_i
  virginica   u_i    v_i    1 − u_i − v_i

ϕ0 (class factor): setosa p, versicolor q, virginica 1 − p − q

Continuous factors
Instead of the discretized petal-length table for ϕ1, use a continuous factor of the petal length ℓ:
  Φ1(ℓ, s) = exp(−(ℓ − μ_s)² / 2σ²)
parameters μ_set, μ_vers, μ_vir; constant σ²

Simpler example
Coin toss: P(H) = p, P(T) = 1 − p

Parametric model class
H is a parametric model class: each H in H corresponds to a vector of parameters θ = (p) or θ = (p, q, p1, q1, r1, s1, …)
Hθ: X ~ P(X | θ) (or, Y ~ P(Y | X, θ))
Contrast to a discrete H, as in version space
Could also have a mixed H: a discrete choice among parametric (sub)classes

Continuous prior
E.g., for the coin toss, p ~ Beta(a, b):
  P(p | a, b) = p^(a−1) (1 − p)^(b−1) / B(a, b)
Specifying, e.g., a = 2, b = 2:
  P(p) = 6 p(1 − p)
[Figure: Beta(2, 2) prior density for p]

Coin toss, cont'd
Joint distribution of the parameter p and the data x_i:
  P(p, x) = P(p) ∏_i P(x_i | p) = 6 p(1 − p) ∏_i p^(x_i) (1 − p)^(1−x_i)

Coin flip posterior
  P(p | x) = P(p) ∏_i P(x_i | p) / P(x)
           = (1/Z) p(1 − p) ∏_i p^(x_i) (1 − p)^(1−x_i)
           = (1/Z) p^(1 + Σ_i x_i) (1 − p)^(1 + Σ_i (1−x_i))
           = Beta(2 + Σ_i x_i, 2 + Σ_i (1 − x_i))
[Figures: prior for p; posterior after 4 H, 7 T; posterior after 10 H, 19 T]

Predictive distribution
The posterior is nice, but it doesn't directly tell us what we need to know
We care more about P(x_{N+1} | x_1, …, x_N)
By the law of total probability and conditional independence:
  P(x_{N+1} | D) = ∫ P(x_{N+1}, θ | D) dθ = ∫ P(x_{N+1} | θ) P(θ | D) dθ

Coin flip example
After 10 H, 19 T: p ~ Beta(12, 21)
E(x_{N+1} | p) = p
E(x_{N+1} | D) = E(p | D) = a/(a + b) = 12/33
So, predict a 36.4% chance of H on the next flip
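To make this arithmetic concrete, here is a minimal Python sketch (not from the lecture; it assumes NumPy and SciPy are available, and the function name coin_posterior is my own) that reproduces the Beta posterior and the predictive probability above:

```python
import numpy as np
from scipy import stats

def coin_posterior(flips, a=2, b=2):
    """Beta posterior over p after observing flips (1 = heads, 0 = tails).

    Conjugacy: a Beta(a, b) prior times a Bernoulli likelihood gives a
    Beta(a + #heads, b + #tails) posterior.
    """
    heads = int(np.sum(flips))
    tails = len(flips) - heads
    return stats.beta(a + heads, b + tails)

# The lecture's example: 10 heads, 19 tails, Beta(2, 2) prior
flips = [1] * 10 + [0] * 19
post = coin_posterior(flips)                 # Beta(12, 21)

# Predictive: P(x_{N+1} = H | D) = E[p | D] = 12/33
print(f"P(next flip = H) = {post.mean():.3f}")   # 0.364
```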
Approximate Bayes
The coin flip example was easy
In general, computing the posterior (or the predictive distribution) may be hard
Solution: use the approximate integration techniques we've studied!

Bayes as numerical integration
Parameters θ, data D
  P(θ | D) = P(D | θ) P(θ) / P(D)
Usually P(θ) is simple; so is Z P(D | θ)
So, P(θ | D) ∝ Z P(D | θ) P(θ)
Perfect for MH

Iris example: a logistic model of P(I. virginica) as a function of petal length:
  P(y | x) = σ(ax + b), where σ(z) = 1/(1 + exp(−z))
[Figure: fitted logistic curve, petal length vs. P(I. virginica)]

Posterior
  P(a, b | x, y) = (1/Z) P(a, b) ∏_i σ(a x_i + b)^(y_i) σ(−a x_i − b)^(1−y_i)
  P(a, b) = N(0, I)

Sample from posterior
[Figure: scatter of posterior samples over (a, b)]

Expanded factor graph
[Figure: the original factor graph, expanded with shared parameter nodes for (a, b)]

Cheaper approximations
Getting cheaper:
‣ maximum a posteriori (MAP)
‣ maximum likelihood (MLE)
‣ conditional MLE / MAP
Instead of the true posterior, just use the single most probable hypothesis

MAP
Summarize the entire posterior density by its maximum:
  arg max_θ P(D | θ) P(θ)

MLE
Like MAP, but ignore the prior term:
  arg max_θ P(D | θ)

Conditional MLE, MAP
Split D = (x, y); condition on x, and try to explain only y:
  arg max_θ P(y | x, θ)          (conditional MLE)
  arg max_θ P(y | x, θ) P(θ)     (conditional MAP)

Iris example: MAP vs. posterior
[Figures: MAP estimate of (a, b) plotted against posterior samples; MAP predictive curve vs. posterior-averaged predictive curve]

Too certain
This behavior of MAP (or MLE) is typical: we are too sure of ourselves
But it often gets better with more data
Theorem: MAP and MLE are consistent estimates of the true θ, if "data per parameter" → ∞

Sequential Decisions
Markov decision process (influence diagram view):
States, actions, costs C(s, a) ∈ [C_min, C_max], transitions T(s′ | s, a), initial state s_1

Influence diagrams
Like a Bayes net, except:
‣ diamond nodes are costs/rewards; they must have no children
‣ square nodes are decisions; we pick their CPTs (before seeing anything) to minimize expected cost
‣ circles are ordinary random variables, as before

Markov decision process (state space diagram view):
States, actions, costs C(s, a) ∈ [C_min, C_max], transitions T(s′ | s, a), initial state s_1
Goal states: all costs = 0, self-transition with probability 100%

Choosing actions
Execution trace: τ = (s_1, a_1, c_1, s_2, a_2, c_2, …)
‣ c_1 = C(s_1, a_1), c_2 = C(s_2, a_2), etc.
‣ s_2 ~ T(s | s_1, a_1), s_3 ~ T(s | s_2, a_2), etc.
Policy π: S → A
‣ or randomized, π(a | s)
Trace from π: a_1 ~ π(a | s_1), etc.
‣ τ is then a random variable with a known distribution
‣ we'll write τ ~ π (rest of the MDP implicit)

Choosing good actions
Value of a policy, with discount factor γ in (0, 1):
  J^π = (1 − γ)/γ · E[Σ_t γ^t c_t | τ ~ π]
(the weights (1 − γ) γ^(t−1) sum to 1, so J^π is a weighted average of costs in [C_min, C_max])
Objective:
  J* = min_π J^π,   π* ∈ arg min_π J^π

Why a discount factor?
A1: to make the sums finite
A2: an interest rate of 1/γ − 1 per period
A3: model mismatch
‣ probability (1 − γ) that something unexpected happens on each step and my plan goes out the window

Tree search
Root node = current state
Alternating levels: action and outcome
‣ min and expectation
Build out the tree until a goal, or until γ^t is small enough
[Figure: example search tree with γ = 0.5 and transition probabilities 0.5]

Interpreting the result
The number at each ○ node: optimal cost if starting from state s instead of s_1
‣ call this J*(s); so J* = J*(s_1)
‣ the state-value function
The number at each ⋅ node: optimal cost if starting from the parent's s and choosing the incoming a
‣ call this Q*(s, a)
‣ the action-value function
Similarly, J^π(s) and Q^π(s, a) for a fixed policy π

The update equations
For a ⋅ node:
  Q*(s, a) = (1 − γ) C(s, a) + γ E[J*(s′) | s′ ~ T(· | s, a)]
For a ○ node:
  J*(s) = min_a Q*(s, a)
That is: (1 − γ) × immediate cost + γ × future cost

Updates for a fixed policy
For a ⋅ node:
  Q^π(s, a) = (1 − γ) C(s, a) + γ E[J^π(s′) | s′ ~ T(· | s, a)]
For a ○ node:
  J^π(s) = E[Q^π(s, a) | a ~ π(· | s)]
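The update equations above are exactly the backups used by value iteration. The following Python sketch (my own illustration, not the lecture's code; the toy C and T arrays are made up) applies them to a small tabular MDP, keeping the lecture's (1 − γ) normalization of the immediate cost:

```python
import numpy as np

def value_iteration(C, T, gamma=0.9, tol=1e-8):
    """Tabular value iteration for a cost-minimizing MDP.

    C[s, a]     : immediate cost of taking action a in state s
    T[s, a, s2] : probability of moving to s2 from s under a
    Bellman updates, as in the lecture:
        Q*(s, a) = (1 - gamma) * C(s, a) + gamma * E[J*(s')]
        J*(s)    = min_a Q*(s, a)
    """
    n_states, n_actions = C.shape
    J = np.zeros(n_states)
    while True:
        # Q[s, a] = (1-gamma) C(s, a) + gamma * sum_{s'} T(s'|s,a) J(s')
        Q = (1 - gamma) * C + gamma * (T @ J)
        J_new = Q.min(axis=1)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=1)   # values and a greedy policy
        J = J_new

# Tiny 2-state, 2-action example: state 1 is a zero-cost goal
# (all costs 0, 100% self-transition), matching the lecture's convention.
C = np.array([[1.0, 2.0],
              [0.0, 0.0]])
T = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.0, 1.0], [0.0, 1.0]]])
J, pi = value_iteration(C, T, gamma=0.5)
print("J* =", J, " greedy policy =", pi)
```

Because the update is a γ-contraction, the loop converges to the unique fixed point J* from any initialization.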
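Finally, circling back to the "Bayes as numerical integration" slides: the posterior over (a, b) there was described as perfect for MH. Below is a minimal random-walk Metropolis sketch for that logistic model (my own illustration; the step size and the synthetic petal-length data are assumptions, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta, x, y):
    """Unnormalized log posterior for the lecture's logistic model:
    P(a, b | x, y) ∝ N((a,b); 0, I) * prod_i σ(a x_i + b)^y_i σ(-a x_i - b)^(1-y_i)
    """
    a, b = theta
    z = a * x + b
    # log σ(z) = -log(1 + exp(-z)), computed stably via logaddexp
    loglik = -np.logaddexp(0, -z) * y - np.logaddexp(0, z) * (1 - y)
    return -0.5 * (a**2 + b**2) + loglik.sum()

def metropolis(x, y, n_steps=5000, step=0.3):
    """Random-walk Metropolis over theta = (a, b) with a symmetric proposal."""
    theta = np.zeros(2)
    lp = log_post(theta, x, y)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(2)
        lp_prop = log_post(prop, x, y)
        # accept with probability min(1, R(x') / R(x))
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta)
    return np.array(samples)

# Synthetic stand-in for the petal-length data (y = 1 for I. virginica);
# this is an assumption for illustration, not the lecture's dataset.
x = np.array([1.4, 1.5, 4.7, 4.5, 6.0, 5.1])
y = np.array([0, 0, 0, 1, 1, 1])
samples = metropolis(x, y)
print("posterior mean of (a, b):", samples[1000:].mean(axis=0))
```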

