My Plan

Remaining Lectures:
- 12 April: Learning Time Series (DBNs)
- 12-15 April: California (Phone: 650-325-7143)
- 16 April: Review: Graphical Models and Inference
- 19 April: Review: Learning in Graphical Models

Student Presentations (20 minutes):
- 21 April: Maria, Noam
- 23 April: Mark, Liang

Last Time I

Learning Local Structure:
- Default tables
- Decision trees

Take-home lesson:
- When data is complete:
  - The estimation problem decomposes.
  - Optimize the individual probability distributions for MDL/BDe.
  - Not limited to any specific local structure.
  - The MDL framework is straightforward to extend to GAM and other models.
  - BDe is less straightforward: it needs a "nice" (conjugate) prior.
- When data is incomplete:
  - The estimation problem doesn't decompose.
  - Use EM to complete the sample, then optimize the distributions independently.

Last Time II

Gaussian Networks:
- Nifty Shachter/Kenley technique for computing precision matrices, where $b_{i+1}$ holds the regression coefficients of $X_{i+1}$ on $X_1, \ldots, X_i$ and $\nu_{i+1}$ is its conditional variance:

$$W(i+1) = \begin{pmatrix} W(i) + \frac{b_{i+1} b_{i+1}^T}{\nu_{i+1}} & -\frac{b_{i+1}}{\nu_{i+1}} \\ -\frac{b_{i+1}^T}{\nu_{i+1}} & \frac{1}{\nu_{i+1}} \end{pmatrix}, \qquad W(1) = \frac{1}{\nu_1}$$

BDe for Gaussian Belief Networks:
- Conjugate distribution.
- Prior:
  - Precision W: Wishart
  - Mean m: f(W)
  - X: N(m, W)
- Posterior:
  - X: t-distribution. Approximate as Gaussian.

Last Time II, cont'd

Prior:
- Prior network.
- The marginal for X is t-distributed, but use a Gaussian for the prior network and assess an equivalent number of counts for both the variance and the mean. (These are not separable.)

Model Selection (P{D}):
- Closed-form solution.

Bayesian estimate: the posterior predictive is a t-distribution,

$$P\{X[M+1] \mid X[1], \ldots, X[M]\} = t\left(X[M+1];\, m'',\, W'',\, \alpha''\right),$$

where, in one dimension,

$$t(x; m, W, \alpha) = \frac{\Gamma\left(\frac{\alpha+1}{2}\right)}{\Gamma\left(\frac{\alpha}{2}\right)(\alpha\pi)^{1/2}}\, W^{1/2}\left(1 + \frac{W(x-m)^2}{\alpha}\right)^{-\frac{\alpha+1}{2}},$$

and the scale is assembled from the posterior normal-Wishart parameters (double primes denote posterior quantities):

$$W'' = \frac{\alpha''_\mu\left(\alpha''_W + 1 - n\right)}{\alpha''_\mu + 1}\left(T''\right)^{-1}$$

Last Time II, cont'd

BDe Prior:
- The marginal for X is t-distributed. Approximate as Gaussian.
- Equivalent samples: separate counts $\alpha_\mu$ and $\alpha_W$ for the unconditional mean and the joint variance. (These are not separable.)

Model Selection (P{D}): closed form,

$$P\{D \mid G\} = (2\pi)^{-nM/2}\left(\frac{\alpha_\mu}{\alpha'_\mu}\right)^{n/2}\frac{c(n, \alpha_W)}{c(n, \alpha'_W)}\, |T|^{\alpha_W/2}\, |T'|^{-\alpha'_W/2}$$

where c(n, α) is the Wishart normalization constant.

Bayesian estimate, "Gaussian" posterior: $X \sim N(\mu', W')$ with

$$W' = \frac{\alpha'_\mu\left(\alpha'_W + 1 - n\right)}{\alpha'_\mu + 1}\left(T'\right)^{-1}$$

Parameter updates from M samples x[1], ..., x[M]:

$$\bar{x} = \frac{1}{M}\sum_m x[m], \qquad S = \sum_m \left(x[m] - \bar{x}\right)\left(x[m] - \bar{x}\right)^T,$$

$$\mu' = \frac{\alpha_\mu \mu + M \bar{x}}{\alpha_\mu + M}, \qquad \alpha'_\mu = \alpha_\mu + M, \qquad \alpha'_W = \alpha_W + M,$$

$$T' = T + S + \frac{\alpha_\mu M}{\alpha_\mu + M}\left(\mu - \bar{x}\right)\left(\mu - \bar{x}\right)^T.$$

Last Time II, cont'd

Gaussian Learning Take-Home Lesson:
- The prior and the marginal are complicated.
- BUT the formulas for updating the normal-Wishart parameters, P{D}, P{X|X1...XN}, and the MAP estimate are all closed form.
- All are functions of the sufficient statistics (surprise...).

Extensions:
- Probably works for all distribution families (a conjugate prior is needed for BDe).
- Define a model selection criterion: P{D}, and an MDL penalty if applicable.
- Define the ML or MAP estimate for the distribution.

Dynamic Belief Networks

Two topics:
- General DBN learning.
- Linear dynamic systems: an important special case.

DBN Review:
[Figure: a prior network over X[0] and a transition network from X[t-1] to X[t].]

DBN Review, cont'd

- Unrolled network. [Figure: the network unrolled over slices X[0], X[1], X[2], X[3].]
- Special networks:
  - HMM: discrete Markov hidden state; continuous or discrete observations.
  - Linear dynamic system: linear Gaussian state and observation distributions.

Learning in Discrete DBNs [Friedman, Murphy, and Russell; UAI-98]

Input: $N_{seq}$ sequences; the l-th sequence is of length $N_l$.
Output: a transition network and a prior network.

Notation:
- Variable i, time slice t, sequence l: $X_i^l[t]$.
- Parameters of the prior network: $\theta^{(0)}_{i,j,k} = P\{X_i[0] = k \mid \mathrm{Pa}_i = j\}$.
- Parameters of the transition network: $\theta^{\rightarrow}_{i,j,k} = P\{X_i[t] = k \mid \mathrm{Pa}_i[t] = j\}$.
- Sufficient statistics:

$$N^{(0)}_{i,j,k} = \sum_l I\{X_i^l[0] = k,\ \mathrm{Pa}_i = j\}, \qquad N^{\rightarrow}_{i,j,k} = \sum_l \sum_t I\{X_i^l[t] = k,\ \mathrm{Pa}_i[t] = j\}$$

Complete data: the answer is identical to normal BN learning, and the likelihood decomposes.

Probability:
$$P(D \mid G, \Theta) = \prod_{i,j,k}\left(\theta^{(0)}_{i,j,k}\right)^{N^{(0)}_{i,j,k}} \times \prod_{i,j,k}\left(\theta^{\rightarrow}_{i,j,k}\right)^{N^{\rightarrow}_{i,j,k}}$$

Log likelihood:
$$L(G : D) = \sum_{i,j,k} N^{(0)}_{i,j,k}\log\theta^{(0)}_{i,j,k} + \sum_{i,j,k} N^{\rightarrow}_{i,j,k}\log\theta^{\rightarrow}_{i,j,k}$$

ML estimates:
$$\hat{\theta}^{(0)}_{i,j,k} = \frac{N^{(0)}_{i,j,k}}{\sum_k N^{(0)}_{i,j,k}}, \qquad \hat{\theta}^{\rightarrow}_{i,j,k} = \frac{N^{\rightarrow}_{i,j,k}}{\sum_k N^{\rightarrow}_{i,j,k}}$$

BIC: the score also decomposes,

$$BIC(G : D) = BIC^{(0)} + BIC^{\rightarrow},$$
$$BIC^{(0)} = \sum_{i,j,k} N^{(0)}_{i,j,k}\log\hat{\theta}^{(0)}_{i,j,k} - \frac{\log N_{seq}}{2}\,\#G^{(0)},$$
$$BIC^{\rightarrow} = \sum_{i,j,k} N^{\rightarrow}_{i,j,k}\log\hat{\theta}^{\rightarrow}_{i,j,k} - \frac{\log N}{2}\,\#G^{\rightarrow},$$

where #G is the number of parameters of a network and N is the total number of time-slice transitions in the data.

BDe: same as normal BDe; specify a network and an equivalent count for each part:
- Prior network: $\alpha^{(0)}_{i,j,k} = \hat{N}^{(0)} \times P_{B^{(0)}}\{X_i[0] = k \mid \mathrm{Pa}_i = j\}$
- Transition network: $\alpha^{\rightarrow}_{i,j,k} = \hat{N}^{\rightarrow} \times P_{B^{\rightarrow}}\{X_i[t] = k \mid \mathrm{Pa}_i[t] = j\}$

For both MDL and BDe, the best net for $B^{(0)}$ is independent of the best net for $B^{\rightarrow}$, so the two structure searches decouple (see the sketch below).
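As a concrete companion to the scoring formulas above, here is a minimal Python sketch of the complete-data pipeline for the transition network: sufficient statistics, ML estimates, and the BIC term. It is not from the lecture; the helper names (transition_counts, ml_estimates, bic_transition) are hypothetical, and it assumes each variable's parents live only in the previous time slice.

```python
import numpy as np
from collections import defaultdict
from functools import partial

def transition_counts(sequences, parents, n_values):
    """Sufficient statistics N[i][j][k]: how often X_i[t] = k while the
    previous-slice parents Pa_i of variable i take configuration j.

    sequences: list of integer arrays, each of shape (T_l, n_vars)
    parents:   dict mapping variable i -> list of parent indices (slice t-1)
    n_values:  dict mapping variable i -> number of discrete values of X_i
    """
    counts = {i: defaultdict(partial(np.zeros, n_values[i])) for i in parents}
    for seq in sequences:
        for t in range(1, len(seq)):
            for i, pa in parents.items():
                j = tuple(seq[t - 1][p] for p in pa)   # parent configuration
                counts[i][j][seq[t][i]] += 1
    return counts

def ml_estimates(counts):
    """theta_hat[i][j][k] = N[i][j][k] / sum_k N[i][j][k]."""
    return {i: {j: n / n.sum() for j, n in cj.items()}
            for i, cj in counts.items()}

def bic_transition(counts, n_transitions):
    """BIC-> term: sum_ijk N log(theta_hat) - (log N / 2) * #params."""
    loglik, n_params = 0.0, 0
    for cj in counts.values():
        for n in cj.values():
            theta = n / n.sum()
            nz = n > 0                      # 0 * log 0 = 0 by convention
            loglik += float(np.sum(n[nz] * np.log(theta[nz])))
            n_params += len(n) - 1          # free parameters per distribution
    return loglik - 0.5 * np.log(n_transitions) * n_params
```

The prior network is scored the same way from the slice-0 counts, with N_seq in the penalty; because the total score is the sum of the two terms, the two structure searches can proceed independently.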
Learning with Incomplete Data

This is just the same old algorithm... SEM:
- Use the EM algorithm to complete the data.
- Determine the expected sufficient statistics.
- Compute the ML or MAP score from the expected sufficient statistics.
- Search over structures for $(B^{(0)}, B^{\rightarrow})$.

If the model is an HMM, this is the Baum-Welch algorithm.

BATmobile

- Generated data that would be seen by a camera overlooking a highway.
- 3500 vehicles simulated for 40-70 time steps.
- Train on 250-1500 vehicles and test on 2000.
- Learned decision-tree CPTs using both BIC and BDe.

Learned Models

- reldist: distance to the car in front.
- relspeed: relative speed to the car in front.
- leftblocked (rightblocked): a car is to the left (right).
- xdot: left-right velocity.
- ydot: forward velocity or forward acceleration (not relative).

Hidden State

- The hidden node has the interpretation "avoid the car in front."

Two Hidden Nodes

Possible interpretation:
- A distance and relative-speed policy as a function of lane.
- A lane and speed policy as a function of the car on the right.

Linear Dynamic Systems [Roweis and Ghahramani, 1997]

[Figure: hidden state X[t] with observation Y[t].]

$$X[t] \sim N\left(A^{\rightarrow} X[t-1],\; V^{\rightarrow}\right), \qquad Y[t] \sim N\left(A_{obs}\, X[t],\; V_{obs}\right)$$

Learning [Shumway + Stoffer, 82; Ghahramani + Hinton, 96]

E-step: run a b-net inference algorithm to estimate the mean $\hat{X}[t]$ and variance $\hat{V}[t]$ for each X.

M-step: define the smoothed second moments

$$\gamma_t = \hat{X}[t]\hat{X}[t]^T + \hat{V}[t], \qquad \gamma' = \sum_{t=1}^{n-1}\gamma_t,$$

$$\alpha = \sum_t Y[t]\,Y[t]^T, \qquad \delta = \sum_{t=0}^{n} Y[t]\,\hat{X}[t]^T, \qquad \beta = \sum_{t=1}^{n}\left(\hat{X}[t]\hat{X}[t-1]^T + \hat{V}[t, t-1]\right).$$

Then the updates are

$$A_{obs} = \delta\left(\gamma_0 + \gamma' + \gamma_n\right)^{-1}, \qquad V_{obs} = \frac{\alpha - A_{obs}\,\delta^T}{N},$$

$$A^{\rightarrow} = \beta\left(\gamma_0 + \gamma'\right)^{-1}, \qquad V^{\rightarrow} = \frac{\gamma' + \gamma_n - A^{\rightarrow}\beta^T}{N - 1},$$

$$X_0 = \hat{X}[0], \qquad V_0 = \hat{V}[0],$$

where N = n + 1 is the number of time slices. The lag-one covariances come from a backward recursion with the smoother gains J[t]:

$$\hat{V}[t-1, t] = \hat{V}[t]\,J[t-1]^T + J[t]\left(\hat{V}[t, t+1] - A^{\rightarrow}\hat{V}[t]\right)J[t-1]^T.$$

A numpy rendering of this M-step follows.
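Below is a minimal numpy sketch of these M-step updates, under stated assumptions: it is not from the lecture, the function name lds_m_step is hypothetical, and it presumes the E-step (a Kalman smoother) has already produced the smoothed means, variances, and lag-one covariances.

```python
import numpy as np

def lds_m_step(y, x_hat, V_hat, V_lag):
    """One M-step of EM for a linear dynamic system (Shumway-Stoffer style).

    y:     (N, p)    observations Y[0..n]
    x_hat: (N, d)    smoothed state means  X-hat[t]
    V_hat: (N, d, d) smoothed state covariances V-hat[t]
    V_lag: (N, d, d) smoothed lag-one covariances V-hat[t, t-1] (entry 0 unused)
    """
    N = len(y)
    # gamma_t = x_hat[t] x_hat[t]^T + V_hat[t]
    gamma = np.einsum('ti,tj->tij', x_hat, x_hat) + V_hat
    alpha = y.T @ y                                   # sum_t Y[t] Y[t]^T
    delta = y.T @ x_hat                               # sum_t Y[t] X-hat[t]^T
    # beta = sum_{t>=1} x_hat[t] x_hat[t-1]^T + V-hat[t, t-1]
    beta = np.einsum('ti,tj->ij', x_hat[1:], x_hat[:-1]) + V_lag[1:].sum(axis=0)

    A_obs = delta @ np.linalg.inv(gamma.sum(axis=0))  # observation matrix
    V_obs = (alpha - A_obs @ delta.T) / N             # observation noise
    A_trans = beta @ np.linalg.inv(gamma[:-1].sum(axis=0))          # dynamics
    V_trans = (gamma[1:].sum(axis=0) - A_trans @ beta.T) / (N - 1)  # state noise
    x0, V0 = x_hat[0], V_hat[0]                       # initial state
    return A_trans, V_trans, A_obs, V_obs, x0, V0
```

Alternating this with the smoother E-step increases the likelihood monotonically; with a discrete hidden state in place of the Gaussian one, the same loop is exactly Baum-Welch, as noted above.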