Review
• Multiclass logistic regression
• Priors; conditional MAP logistic regression
• Bayesian logistic regression
  – MAP is not always typical of the posterior
  – the posterior predictive can avoid overfitting

Review
• Finding the posterior predictive distribution often requires numerical integration
  – uniform sampling
  – importance sampling
  – parallel importance sampling
• These are all Monte-Carlo algorithms
  – another well-known MC algorithm coming up

Application: SLAM
Eliazar and Parr, IJCAI-03

Parallel IS
• Pick N samples Xi from a proposal Q(X)
• If we knew Wi = P(Xi)/Q(Xi), we could do ordinary IS
• Instead, set w̃i = f(Xi)/Q(Xi), where f = Z·P is the unnormalized target, and let W̄ = (1/N) Σi w̃i
• Then: E(W̄) = Z
• Final estimate: EP(g(X)) ≈ Σi w̃i g(Xi) / Σi w̃i

Parallel IS is biased
[Figure: mean(weights), 1/mean(weights), and E(mean(weights))]
E(W̄) = Z, but E(1/W̄) ≠ 1/Z in general

[Figure: samples from the proposal Q: (X, Y) ~ N(1, 1), θ ~ U(−π, π); target f(x, y, θ) = Q(x, y, θ) P(o = 0.8 | x, y, θ)/Z]

Posterior
[Figure: posterior samples]
E(X, Y, θ) = (0.496, 0.350, 0.084)

SLAM revisited
• Uses a recursive version of parallel importance sampling: the particle filter
  – each sample (particle) = a trajectory over time
  – sampling extends each trajectory by one step
  – recursively update the importance weights and renormalize
  – resampling trick to avoid keeping lots of particles with low weights

Particle filter example
[Figure]

Monte-Carlo revisited
• Recall: we wanted EP(g(X)) = ∫ g(x) P(x) dx = ∫ f(x) dx
• Would like to search for areas of high P(x)
• But searching could bias our estimates

Markov-Chain Monte Carlo
• Randomized search procedure
• Produces a sequence of RVs X1, X2, …
  – Markov chain: satisfies the Markov property
• If P(Xt) is small, P(Xt+1) tends to be larger
• As t → ∞, Xt ~ P(X)
• As τ → ∞, Xt+τ becomes independent of Xt

Markov chain
[Figure]

Stationary distribution
[Figure]

Markov-Chain Monte Carlo
• As t → ∞, Xt ~ P(X); as τ → ∞, Xt+τ becomes independent of Xt
• For big enough t and τ, an approximately i.i.d. sample from P(X) is
  – { Xt, Xt+τ, Xt+2τ, Xt+3τ, … }
• Can use this i.i.d.
sample to estimate EP(g(X))
• Actually, we don't even need independence

Metropolis-Hastings
• A way to design a chain with stationary distribution P(X)
• Basic strategy: start from an arbitrary X
• Repeatedly tweak X to get X′
  – If P(X′) ≥ P(X), move to X′
  – If P(X′) ≪ P(X), stay at X
  – In intermediate cases, randomize

Proposal distribution
• Left open: what does "tweak" mean?
• The parameter of MH: the proposal Q(X′ | X)
• Good proposals explore quickly, but remain in regions of high P(X)
• Optimal proposal?

Simplest proposal
• Random-walk MH:
  – Q(X′ | X) = N(X′; X, σ²I), a Gaussian step centered at the current X
  – big σ: long proposed moves, but many rejections
  – small σ: high acceptance, but slow exploration
• Not usually a great proposal, but sometimes the best we have

MH algorithm
• Initialize X1 arbitrarily
• For t = 1, 2, …:
  – Sample X′ ~ Q(X′ | Xt)
  – Compute p = [P(X′) Q(Xt | X′)] / [P(Xt) Q(X′ | Xt)]
  – With probability min(1, p), set Xt+1 := X′
  – else set Xt+1 := Xt
• Note: the sequence X1, X2, … will usually contain duplicates

Acceptance rate
• Want the acceptance rate (the average of min(1, p)) to be large, so we don't get long runs of the same X
• Want Q(X′ | X) to move long distances (to explore quickly)
• Tension between long moves and acceptance rate

Random walk MH revisited
• Suppose we always accepted. Then Xt is a pure random walk: Xt+τ − Xt ~ N(0, τσ²I)
• The variance can only be smaller if we reject

Mixing rate, mixing time
• If we pick a good proposal, we will move rapidly around the domain of P(X)
• After a short time, we won't be able to tell where we started
• This is a short mixing time = # of steps until we can't tell which starting point we used
• Mixing rate = 1 / (mixing time)

MH example
[Figure]

MH example
[Figure]
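The parallel (self-normalized) importance sampling estimator can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the (X, Y, θ) model from the slides: the unnormalized target f here is a 1-D Gaussian bump and the proposal Q is a wider Gaussian, chosen only so the ratio f/Q is well behaved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target f(x) = Z * P(x): an (unnormalized) N(2, 1) density,
# so under P the mean of X is 2.  (Illustrative choice, not from the slides.)
def f(x):
    return np.exp(-0.5 * (x - 2.0) ** 2)

# Proposal Q = N(0, 3^2): easy to sample from, and it covers the target.
def q_pdf(x):
    return np.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * np.sqrt(2.0 * np.pi))

def parallel_is(g, n=100_000):
    x = rng.normal(0.0, 3.0, size=n)       # N samples Xi ~ Q
    w = f(x) / q_pdf(x)                    # w~i = f(Xi)/Q(Xi); E[w~i] = Z
    return np.sum(w * g(x)) / np.sum(w)    # sum_i w~i g(Xi) / sum_i w~i

print(parallel_is(lambda x: x))  # estimate of E_P[X], close to 2
```

Note that only the ratio of sums is computed; the normalizer Z is never needed, which is exactly the point of the self-normalized estimator (and also the source of its bias, since E(1/W̄) ≠ 1/Z).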
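The MH algorithm above, with the random-walk proposal, is short enough to write out directly. A minimal sketch, assuming an illustrative 1-D unnormalized target f(x) ∝ exp(−x⁴/4) (not from the slides); since the random-walk proposal is symmetric, Q(X′|X) = Q(X|X′) and the Q terms cancel in the acceptance ratio.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unnormalized target: P(x) proportional to exp(-x^4 / 4).
# Any nonnegative f works; the normalizer is never needed.
def f(x):
    return np.exp(-x ** 4 / 4.0)

def metropolis_hastings(n_steps=50_000, sigma=1.0, x0=0.0):
    """Random-walk MH: Q(x'|x) = N(x, sigma^2) is symmetric,
    so the acceptance ratio reduces to p = f(x') / f(x)."""
    xs = np.empty(n_steps)
    x = x0
    for t in range(n_steps):
        x_prop = x + rng.normal(0.0, sigma)   # sample X' ~ Q(X' | Xt)
        p = f(x_prop) / f(x)                  # acceptance ratio
        if rng.random() < min(1.0, p):        # accept with prob. min(1, p)
            x = x_prop
        xs[t] = x                             # else Xt+1 = Xt (a duplicate)
    return xs

xs = metropolis_hastings()
# Discard burn-in (small t) and thin by tau for roughly i.i.d. draws.
sample = xs[1000::10]
print(sample.mean())  # target is symmetric about 0, so mean is near 0
```

As the slides note, the raw chain contains duplicates (rejected proposals repeat the current state), and the burn-in/thinning step is exactly the { Xt, Xt+τ, Xt+2τ, … } trick for extracting an approximately i.i.d. sample.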