My Plan

Remaining Lectures
- 12 April: Learning Time Series (DBNs)
- 12-15 April: California (Phone: 650-325-7143)
- 16 April: Review: Graphical Models and Inference
- 19 April: Review: Learning in Graphical Models

Student Presentations (20 minutes each)
- 21 April: Maria, Noam
- 23 April: Mark, Liang

Last Time I

Learning Local Structure
- Default tables
- Decision trees

Take-home lesson:
- When data is complete:
  - The estimation problem decomposes.
  - Optimize the individual probability distributions for MDL/BDe.
  - Not limited to any specific local structure.
  - The MDL framework is straightforward to extend to GAM and other models.
  - BDe is less straightforward: it needs a "nice" (conjugate) prior.
- When data is incomplete:
  - The estimation problem doesn't decompose.
  - Use EM to complete the sample, then optimize the distributions independently.

Last Time II

Gaussian Networks
- Nifty Shachter/Kenley technique for computing precision matrices:

  W(i+1) = \begin{pmatrix} W(i) + \vec{b}_{i+1}\vec{b}_{i+1}^T/\nu_{i+1} & -\vec{b}_{i+1}/\nu_{i+1} \\ -\vec{b}_{i+1}^T/\nu_{i+1} & 1/\nu_{i+1} \end{pmatrix}, \qquad W(1) = 1/\nu_1

BDe for Gaussian Belief Networks
- Conjugate distribution
- Prior:
  - Precision W: Wishart
  - Mean m: F(W) (normal, given W)
  - X: N(m, W)
- Posterior:
  - X: t-distributed. Approximate as Gaussian.

Last Time II (cont'd)

Prior
- Prior network.
- The marginal for X is t-distributed, but use a Gaussian for the prior network and assess an equivalent number of counts for both the variance and the mean. (These are not separable.)

Model Selection (P{D})
- Closed-form solution.

Bayesian Estimate
- The predictive distribution over the n variables is a multivariate t:

  P\{X_{m+1} \mid X_1, \dots, X_m\} = T(X;\ \mu', W'', \alpha_W' + 1 - n)

  T(x; \mu, W, \nu) = \frac{\Gamma((\nu+n)/2)}{\Gamma(\nu/2)\,(\nu\pi)^{n/2}}\ |W|^{1/2} \left(1 + \tfrac{1}{\nu}(x-\mu)^T W (x-\mu)\right)^{-(\nu+n)/2}

  W'' = \frac{\alpha_\mu'(\alpha_W' + 1 - n)}{\alpha_\mu' + 1}\,(T')^{-1}

Last Time II (cont'd)

BDe Prior
- The marginal for X is t-distributed. Approximate as Gaussian.
- Equivalent samples: separate counts for the joint variance (\alpha_W) and the unconditional mean (\alpha_\mu). (These are not separable.)
- "Gaussian" approximation: mean \mu and precision

  W = \frac{\alpha_\mu(\alpha_W + 1 - n)}{\alpha_\mu + 1}\,T^{-1}

- Posterior, after M samples x[1], \dots, x[M]:

  \bar{x} = \frac{1}{M}\sum_m x[m], \qquad S_M = \sum_m (x[m] - \bar{x})(x[m] - \bar{x})^T

  \mu' = \frac{\alpha_\mu \mu + M\bar{x}}{\alpha_\mu + M}, \qquad \alpha_\mu' = \alpha_\mu + M, \qquad \alpha_W' = \alpha_W + M

  T' = T + S_M + \frac{\alpha_\mu M}{\alpha_\mu + M}(\mu - \bar{x})(\mu - \bar{x})^T

- Model selection (P{D}):

  P\{D \mid G\} = (2\pi)^{-Mn/2} \left(\frac{\alpha_\mu}{\alpha_\mu'}\right)^{n/2} \frac{c(n, \alpha_W)}{c(n, \alpha_W')}\ |T|^{\alpha_W/2}\ |T'|^{-\alpha_W'/2}

Last Time II (cont'd)

Gaussian Learning Take-Home Lesson
- The prior and marginal are complicated.
- BUT the formulas for updating the parameters of the network, P{D}, P{X|X_1...X_N}, and the MAP estimate are all closed form.
- Each is a function of sufficient statistics (surprise...).

Extensions
- Probably works for all distribution families (a conjugate prior is needed for BDe).
- Define a model-selection criterion: P{D}, plus the MDL penalty if applicable.
- Define the ML or MAP estimate for the distribution.

Dynamic Belief Networks

Two topics:
- General DBN learning
- Linear dynamic systems (an important special case)

DBN Review
[Figure: a prior network over the slice-0 variables X[0] and a transition network from slice X[t-1] to slice X[t]]

DBN Review, cont'd
- Unrolled network
  [Figure: the prior network over X[0] unrolled through slices X[1], X[2], X[3] by repeating the transition network]
- Special networks:
  - HMM: discrete Markov-model hidden state; continuous or discrete observations
  - Linear dynamic system: linear-Gaussian state and observation distributions

Learning in Discrete DBNs
[Friedman, Murphy, and Russell; UAI-98]
- Input: sequences; the l-th sequence has length N_l
- Output: a transition network and a prior network

Notation
- Variable i, time slice t, sequence l: X_i^l[t]
- Parameter of the prior network (with a Dirichlet prior): \theta^0_{i,j,k} = P\{X_i[0] = k \mid Pa_i = j\}
- Parameter of the transition network (with a Dirichlet prior): \theta^\to_{i,j,k} = P\{X_i[t] = k \mid Pa_i[t] = j\}
- Sufficient statistics:

  N^0_{i,j,k} = \sum_l I(X_i^l[0] = k;\ Pa_i = j)

  N^\to_{i,j,k} = \sum_l \sum_t I(X_i^l[t] = k;\ Pa_i[t] = j)

Complete Data
- The expected answer: identical to normal BN learning, and the likelihood decomposes.
- Probability:

  P(D \mid G, \Theta) = \prod_{i,j,k} (\theta^0_{i,j,k})^{N^0_{i,j,k}} \times \prod_{i,j,k} (\theta^\to_{i,j,k})^{N^\to_{i,j,k}}

- Log likelihood:

  L(G : D) = \sum_{i,j,k} N^0_{i,j,k} \log \theta^0_{i,j,k} + \sum_{i,j,k} N^\to_{i,j,k} \log \theta^\to_{i,j,k}

- ML estimates:

  \hat\theta^0_{i,j,k} = \frac{N^0_{i,j,k}}{\sum_k N^0_{i,j,k}}, \qquad \hat\theta^\to_{i,j,k} = \frac{N^\to_{i,j,k}}{\sum_k N^\to_{i,j,k}}

BIC

  BIC(G : D) = BIC^0 + BIC^\to

  BIC^0 = \sum_{i,j,k} N^0_{i,j,k} \log \hat\theta^0_{i,j,k} - \frac{\log N_{seq}}{2}\,\#G^0

  BIC^\to = \sum_{i,j,k} N^\to_{i,j,k} \log \hat\theta^\to_{i,j,k} - \frac{\log N}{2}\,\#G^\to

where \#G^0 and \#G^\to count the free parameters of the prior and transition networks, N_seq is the number of sequences, and N is the total number of observed transitions.
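To make the ML estimates and BIC score above concrete, here is a minimal numpy sketch (my own illustration, not code from the lecture). It assumes the sufficient statistics have already been tallied into one count array per family i, shaped (parent configurations j, child values k); the names ml_estimates, bic_component, prior_counts, and transition_counts are all hypothetical.

```python
import numpy as np

def ml_estimates(counts):
    """theta-hat_{i,j,k} = N_{i,j,k} / sum_k N_{i,j,k}.

    `counts` is a list with one array per family i, of shape
    (parent configs j, child values k). Assumes every parent
    configuration is observed at least once.
    """
    return [N / N.sum(axis=1, keepdims=True) for N in counts]

def bic_component(counts, num_samples):
    """One BIC term: sum_{i,j,k} N log(theta-hat) - (log M / 2) * #params."""
    score, num_params = 0.0, 0
    for N, theta in zip(counts, ml_estimates(counts)):
        nz = N > 0                                   # 0 * log 0 = 0 by convention
        score += float((N[nz] * np.log(theta[nz])).sum())
        num_params += N.shape[0] * (N.shape[1] - 1)  # free parameters per family
    return score - 0.5 * np.log(num_samples) * num_params

# BIC(G:D) = BIC0 + BIC->: score the two count sets separately, e.g.
#   bic = bic_component(prior_counts, n_sequences) \
#       + bic_component(transition_counts, n_transitions)
```

Scoring the two count sets separately mirrors the decomposition on the slide: the prior-network and transition-network structures can be optimized independently.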
BDe
- Same as normal BDe: a network and an equivalent sample count for the prior network, and a network and count for the transition network:

  \alpha^0_{i,j,k} = N'^0 \times \hat{P}\{X_i[0] = k,\ Pa_i[0] = j \mid B^0\}

  \alpha^\to_{i,j,k} = N'^\to \times \hat{P}\{X_i[t] = k,\ Pa_i[t] = j \mid B^\to\}

- For both MDL and BDe, the best net for B^0 is independent of the best net for B^\to.

Learning with Incomplete Data
- This is just the same old algorithm... SEM:
  - Use the EM algorithm to complete the data.
  - Determine the expected sufficient statistics.
  - Select the ML or MAP score for the expected sufficient statistics.
  - Search over structures for (B^0, B^\to).
- If the model is an HMM, this is the Baum-Welch algorithm.

BATmobile
- Generated the data that would be seen by a camera overlooking a highway.
- 3500 vehicles simulated for 40-70 time steps.
- Train on 250-1500 vehicles and test on 2000.
- Learned decision-tree CPTs using both BIC and BDe.

Learned Models
- reldist: distance to the car in front
- relspeed: speed relative to the car in front
- leftblocked (rightblocked): there is a car to the left (right)
- xdot: left-right velocity
- ydot: forward velocity or forward acceleration (not relative)

Hidden State
- The hidden node has the interpretation "avoid the car in front".

Two Hidden Nodes
- Possible interpretations:
  - a distance and relative-speed policy as a function of lane
  - a lane and speed policy as a function of the car on the right

Linear Dynamic Systems
[Roweis and Ghahramani, 1997]

  X[t] \sim N(A^\to X[t-1],\ V^\to), \qquad Y[t] \sim N(A_{obs} X[t],\ V_{obs})

Learning
[Shumway + Stoffer, 82; Ghahramani + Hinton, 96]

E-Step:
- Run a belief-network inference algorithm (the Kalman smoother) to estimate the mean and variance of each X: \hat{X}[t], \hat{V}[t].

M-Step:
- Sufficient statistics, with \gamma_t = \hat{X}[t]\hat{X}[t]^T + \hat{V}[t] over N = n + 1 time slices t = 0, \dots, n (\gamma_0 and \gamma_n are the first and last terms, \gamma_1 the sum over the interior):

  \alpha = \sum_{t=0}^{n} Y[t]Y[t]^T, \qquad \delta = \sum_{t=0}^{n} Y[t]\hat{X}[t]^T

  \gamma_1 = \sum_{t=1}^{n-1} \gamma_t, \qquad \beta = \sum_{t=1}^{n} \left(\hat{X}[t]\hat{X}[t-1]^T + \hat{V}[t, t-1]\right)

- Updates:

  A_{obs} = \delta\,(\gamma_0 + \gamma_1 + \gamma_n)^{-1}, \qquad V_{obs} = \frac{\alpha - A_{obs}\,\delta^T}{N}

  A^\to = \beta\,(\gamma_0 + \gamma_1)^{-1}, \qquad V^\to = \frac{\gamma_1 + \gamma_n - A^\to \beta^T}{N - 1}

  \bar{X}_0 = \hat{X}[0], \qquad V_0 = \hat{V}[0] \quad \text{(prior-network update)}

- The lag-one covariances \hat{V}[t, t-1] come from the backward recursion of the smoother, where J[t] is the smoother gain:

  \hat{V}[t, t-1] = \hat{V}[t]\,J[t-1]^T + J[t]\left(\hat{V}[t+1, t] - A^\to \hat{V}[t]\right)J[t-1]^T
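The E-step and the transition-model M-step above fit in a few dozen lines of numpy. This is a minimal sketch of the Shumway-Stoffer / Ghahramani-Hinton updates for a single sequence, my own illustration rather than the lecture's code: the function names e_step and m_step_transition are mine, the lag-one covariance uses the identity Cov(x_{t+1}, x_t | Y) = V̂[t+1] J[t]^T rather than the slide's equivalent recursion, and the observation-model update and outer EM loop are omitted.

```python
import numpy as np

def e_step(y, A, C, Q, R, mu0, V0):
    """Kalman filter + RTS smoother for one sequence y[0..n-1].

    A: state transition (the slides' A->), Q: state noise V->,
    C: observation matrix A_obs, R: observation noise V_obs.
    Returns smoothed means xs, covariances Vs, lag-one covariances Vlag.
    """
    n, d = len(y), len(mu0)
    xf, Vf = np.zeros((n, d)), np.zeros((n, d, d))   # filtered
    xp, Vp = np.zeros((n, d)), np.zeros((n, d, d))   # one-step predicted
    for t in range(n):                               # forward (filter) pass
        xp[t] = A @ xf[t - 1] if t > 0 else mu0
        Vp[t] = A @ Vf[t - 1] @ A.T + Q if t > 0 else V0
        S = C @ Vp[t] @ C.T + R                      # innovation covariance
        K = Vp[t] @ C.T @ np.linalg.inv(S)           # Kalman gain
        xf[t] = xp[t] + K @ (y[t] - C @ xp[t])
        Vf[t] = Vp[t] - K @ C @ Vp[t]
    xs, Vs = xf.copy(), Vf.copy()
    Vlag = np.zeros((n, d, d))                       # Vlag[t] = Cov(x_t, x_{t-1} | Y)
    for t in range(n - 2, -1, -1):                   # backward (smoother) pass
        J = Vf[t] @ A.T @ np.linalg.inv(Vp[t + 1])   # smoother gain J[t]
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
        Vs[t] = Vf[t] + J @ (Vs[t + 1] - Vp[t + 1]) @ J.T
        Vlag[t + 1] = Vs[t + 1] @ J.T                # exact smoothed lag-one cov
    return xs, Vs, Vlag

def m_step_transition(xs, Vs, Vlag):
    """Update A-> and V-> from the slides' sufficient statistics."""
    n = len(xs)
    gamma = np.einsum('ti,tj->tij', xs, xs) + Vs             # gamma_t = XX^T + V
    beta = np.einsum('ti,tj->ij', xs[1:], xs[:-1]) + Vlag[1:].sum(axis=0)
    A_new = beta @ np.linalg.inv(gamma[:-1].sum(axis=0))     # beta (g0 + g1)^-1
    Q_new = (gamma[1:].sum(axis=0) - A_new @ beta.T) / (n - 1)
    return A_new, 0.5 * (Q_new + Q_new.T)                    # symmetrize numerically
```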
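A quick synthetic check of the sketch above, again purely illustrative: it assumes the e_step and m_step_transition functions from the previous block are in scope, and every parameter value (constant-velocity dynamics, position-only observations, the noise levels) is made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])            # constant-velocity dynamics
C = np.array([[1.0, 0.0]])                        # observe position only
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
mu0, V0 = np.zeros(2), np.eye(2)

x, y = np.zeros((100, 2)), np.zeros((100, 1))
for t in range(100):                              # simulate the LDS
    x[t] = (A @ x[t - 1] if t else mu0) + rng.multivariate_normal(np.zeros(2), Q)
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(1), R)

xs, Vs, Vlag = e_step(y, A, C, Q, R, mu0, V0)     # E-step: smoothed posteriors
A_new, Q_new = m_step_transition(xs, Vs, Vlag)    # M-step: refit A->, V->
```

Iterating these two calls (together with the analogous observation-model update) is exactly the EM loop the slides describe; with the true parameters as the starting point, A_new should stay close to A.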