
Probabilistic Graphical Models
Lecture 12 – Dynamical Models
CS/CNS/EE 155, Andreas Krause

Slide 2: Announcements
- Homework 3 is out tonight. Start early!!
- Project milestones are due today. Please email them to the TAs.

Slide 3: Parameter learning for log-linear models
- Log-linear model over an undirected graph G:
  - Feature functions φ1(C1), …, φk(Ck) defined over cliques; the domains Ci can overlap.
  - Joint distribution: P(x1, …, xn) = (1/Z(w)) exp( Σi wi φi(ci) )
- How do we get the weights wi?

Slide 4: Log-linear conditional random field
- Define a log-linear model over the outputs Y; no assumptions about the inputs X.
- Feature functions φi(Ci, x) defined over cliques and inputs.
- Conditional distribution: P(y | x) = (1/Z(x, w)) exp( Σi wi φi(ci, x) )

Slide 5: Example: CRFs in NLP
- Classify each word as Person, Location, or Other.
- "Mrs. Greene spoke today in New York. Green chairs the finance committee."
- (Figure: chain CRF with words X1, …, X12 and labels Y1, …, Y12.)

Slide 6: Example: CRFs in vision
- (Figure only.)

Slide 7: Gradient of conditional log-likelihood
- Partial derivative: ∂ℓ(w)/∂wi = Σj [ φi(ci(j), x(j)) − E[ φi(Ci, x(j)) ] ], i.e. observed minus expected feature counts.
- Requires one inference run per training example (to compute the expectation).
- Can optimize using conjugate gradient.

Slide 8: Exponential family distributions
- Distributions for log-linear models; more generally, exponential family distributions:
  P(x | w) = h(x) exp( wᵀ φ(x) − A(w) )
  - h(x): base measure
  - w: natural parameters
  - φ(x): sufficient statistics
  - A(w): log-partition function
- Here x can be continuous (defined over any set).

Slide 9: Examples
- The Gaussian distribution is an exponential family distribution.
- Other examples: Multinomial, Poisson, Exponential, Gamma, Weibull, chi-square, Dirichlet, Geometric, …

Slide 10: Moments and gradients
- Correspondence between moments and the log-partition function (just as in log-linear models): ∇w A(w) = E[ φ(X) ].
- Can compute moments from derivatives, and derivatives from moments!
- MLE ⇔ moment matching.

Slide 11: Conjugate priors in the exponential family
- Any exponential family likelihood has a conjugate prior.

Slide 12: Exponential family graphical models
- So far we have only defined graphical models over discrete variables.
- We can also define graphical models over continuous distributions, in particular exponential family distributions.
- Much of what we discussed (VE, junction tree inference, parameter learning, etc.) carries over to such exponential family models.
- Important example: Gaussian networks.

Slide 13: Multivariate Gaussian distribution
- Joint distribution over n random variables P(X1, …, Xn):
  P(x) = (2π)^(−n/2) |Σ|^(−1/2) exp( −½ (x − µ)ᵀ Σ⁻¹ (x − µ) )
- σjk = E[ (Xj − µj)(Xk − µk) ]
- Xj and Xk independent ⇒ σjk = 0.

Slide 14: Marginalization
- Suppose (X1, …, Xn) ~ N(µ, Σ). What is P(X1)?
- More generally: let A = {i1, …, ik} ⊆ {1, …, n} and write XA = (Xi1, …, Xik). Then XA ~ N(µA, ΣAA).

Slide 15: Conditioning
- Suppose (X1, …, Xn) ~ N(µ, Σ), decomposed as (XA, XB). What is P(XA | XB)?
- P(XA = xA | XB = xB) = N(xA; µA|B, ΣA|B), where
  µA|B = µA + ΣAB ΣBB⁻¹ (xB − µB)
  ΣA|B = ΣAA − ΣAB ΣBB⁻¹ ΣBA
- Computable using linear algebra!

Slide 16: Conditional linear Gaussians
- A conditional linear Gaussian (CLG) models P(Y | X = x) = N(y; A x + b, Σ): the mean depends linearly on the parent, the covariance is fixed.

Slide 17: Canonical representation
- Multivariate Gaussians are in the exponential family!
- Standard form N(µ, Σ) vs. canonical (information) form with parameters (η, Λ), related by
  µ = Λ⁻¹ η
  Σ = Λ⁻¹

Slide 18: Gaussian networks
- Zeros in the precision matrix Λ indicate missing edges in the log-linear model!

Slide 19: Inference in Gaussian networks
- Marginal distributions can be computed in O(n³).
- For large numbers n of variables this is still intractable.
- If the Gaussian network has low treewidth, we can use variable elimination / junction tree inference!
- For that we need to be able to multiply and marginalize factors.

Slide 20: Multiplying factors in Gaussians
- In canonical form, multiplying two Gaussian factors simply adds their (η, Λ) parameters on the shared scope.

Slide 21: Conditioning in canonical form
- Joint distribution over (XA, XB) with canonical parameters (ηAB, ΛAB).
- Conditioning: P(XA | XB = xB) = N(xA; ηA|B, ΛA|B), where
  ηA|B = ηA − ΛAB xB
  ΛA|B = ΛAA

Slide 22: Marginalizing in canonical form
- Recall the conversion formulas µ = Λ⁻¹ η, Σ = Λ⁻¹.
- Marginal distribution of XA has canonical parameters
  η′A = ηA − ΛAB ΛBB⁻¹ ηB
  Λ′A = ΛAA − ΛAB ΛBB⁻¹ ΛBA
  (a small numerical sketch of these operations follows below).
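The following is a small numerical sketch, not from the original slides, of the standard/canonical conversion and of the conditioning and marginalization formulas above, written in Python with NumPy. The 2-variable Gaussian and the observed value x_B are hypothetical choices for illustration only.

    import numpy as np

    # Hypothetical 2-variable Gaussian over (X_A, X_B) in standard form.
    mu = np.array([1.0, 2.0])            # mean (mu_A, mu_B)
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])       # covariance

    # Conversion to canonical (information) form: Lambda = Sigma^-1, eta = Lambda mu.
    Lam = np.linalg.inv(Sigma)
    eta = Lam @ mu

    # Marginalization is easy in standard form: X_A ~ N(mu_A, Sigma_AA).
    mu_A, Sigma_AA = mu[0], Sigma[0, 0]

    # Conditioning on X_B = x_B is easy in canonical form:
    #   eta_{A|B} = eta_A - Lambda_AB x_B,   Lambda_{A|B} = Lambda_AA.
    x_B = 1.5
    eta_A_given_B = eta[0] - Lam[0, 1] * x_B
    Lam_A_given_B = Lam[0, 0]

    # Converting back (mu = Lambda^-1 eta) must match the standard-form formula
    #   mu_{A|B} = mu_A + Sigma_AB Sigma_BB^-1 (x_B - mu_B).
    mu_cond_canonical = eta_A_given_B / Lam_A_given_B
    mu_cond_standard = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x_B - mu[1])
    assert np.isclose(mu_cond_canonical, mu_cond_standard)
    print(mu_A, Sigma_AA, mu_cond_canonical, 1.0 / Lam_A_given_B)

The conditional variance 1/Λ_AA printed at the end equals the Schur complement ΣAA − ΣAB ΣBB⁻¹ ΣBA from the conditioning slide, which is why the two routes agree.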
Slide 23: Standard vs. canonical form
- In standard form, marginalization is easy (just drop variables), while conditioning needs linear algebra.
- In canonical form, conditioning is easy, while marginalization needs linear algebra!

Slide 24: Variable elimination
- In Gaussian Markov networks, variable elimination = Gaussian elimination (fast for low-bandwidth, i.e. low-treewidth, matrices).

Slide 25: Dynamical models

Slide 26: HMMs / Kalman filters
- The most famous graphical models:
  - Naïve Bayes model
  - Hidden Markov model
  - Kalman filter
- Hidden Markov models: speech recognition, sequence analysis in computational biology.
- Kalman filters: control (cruise control in cars), GPS navigation devices, tracking missiles, …
- Very simple models, but very powerful!!

Slide 27: HMMs / Kalman filters
- X1, …, XT: unobserved (hidden) variables.
- Y1, …, YT: observations.
- HMMs: Xi multinomial, Yi arbitrary.
- Kalman filters: Xi, Yi Gaussian distributions.
- Non-linear KF: Xi Gaussian, Yi arbitrary.
- (Figure: Markov chain X1 → X2 → … → X6 with observations Y1, …, Y6.)

Slide 28: HMMs for speech recognition
- Infer spoken words from audio signals.
- (Figure: observations Y1, …, Y6 are audio features, hidden states X1, …, X6 are phonemes; example sentence: "He ate the cookies on the couch".)

Slide 29: Hidden Markov models
- Inference: in principle we can use VE, junction trees, etc.
- But new variables Xt, Yt arrive at each time step, so we would need to rerun inference.
- Bayesian filtering:
  - Suppose we have already computed P(Xt | y1,…,t).
  - We want to efficiently compute P(Xt+1 | y1,…,t+1).

Slide 30: Bayesian filtering
- Start with P(X1).
- At time t, assume we have P(Xt | y1…t−1). Then:
  - Conditioning: P(Xt | y1…t)
  - Prediction: P(Xt+1, Xt | y1…t)
  - Marginalization: P(Xt+1 | y1…t)

Slide 31: Parameter learning in HMMs
- Assume we have labels for the hidden variables.
- Assume stationarity:
  - P(Xt+1 | Xt) is the same over all time steps.
  - P(Yt | Xt) is the same over all time steps.
- This violates parameter independence (⇒ parameter "sharing").
- Example: compute the parameters for P(Xt+1 = x | Xt = x′).
- What if we don't have labels for the hidden variables? Use EM (later in this course).

Slide 32: Kalman filters (Gaussian HMMs)
- X1, …, XT: location of the object being tracked.
- Y1, …, YT: observations.
- P(X1): prior belief about the location at time 1.
- P(Xt+1 | Xt): "motion model"
  - How do I expect my target to move in the environment?
  - Represented as a CLG: Xt+1 = A Xt + N(0, ΣM)
- P(Yt | Xt): "sensor model"
  - What do I observe if the target is at location Xt?
  - Represented as a CLG: Yt = H Xt + N(0, ΣO)

Slide 33: Understanding the motion model
- (Figure only.)

Slide 34: Understanding the sensor model
- (Figure only.)

Slide 35: Bayesian filtering for KFs
- Could use Gaussian elimination to perform inference in the "unrolled" model.
- Instead, filter: start with the prior belief P(X1); at every time step, maintain the belief P(Xt | y1:t−1).
  - Condition on the observation: P(Xt | y1:t)
  - Predict (multiply in the motion model): P(Xt+1, Xt | y1:t)
  - "Roll up" (marginalize out the previous time step): P(Xt+1 | y1:t)

Slide 36: Implementation
- Current belief in canonical form: P(xt | y1:t−1) = N(xt; ηXt, ΛXt).
- Multiply in the sensor and motion models, then marginalize (a minimal numerical sketch in standard form appears at the end of these notes).

Slide 37: What if the observations are not "linear"?
- Linear observations: Yt = H Xt + noise.
- Nonlinear observations: Yt = f(Xt) + noise for some nonlinear function f.

Slide 38: Incorporating non-Gaussian observations
- Nonlinear observation model: P(Yt | Xt) is not Gaussian.
- First approach: approximate P(Yt | Xt) by a CLG.
  - Linearize P(Yt | Xt) around the current estimate E[Xt | y1..t−1].
  - Known as the Extended Kalman Filter (EKF).
  - Can perform poorly if P(Yt | Xt) is highly nonlinear.
- Second approach: approximate P(Yt, Xt) as jointly Gaussian.
  - Takes the correlation in Xt into account.
  - After obtaining the approximation, condition on Yt = yt (now a "linear" observation).
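As a purely illustrative companion to the condition / predict / roll-up cycle above, here is a minimal Kalman-filter step in standard mean/covariance form, written in Python with NumPy. The matrices A, H, ΣM, ΣO play the roles of the motion and sensor models from the slides; the specific numbers and the 1D tracking scenario are hypothetical.

    import numpy as np

    def kf_step(mu, Sigma, y, A, H, Sigma_M, Sigma_O):
        # Belief before seeing y_t:  P(X_t | y_{1:t-1}) = N(mu, Sigma).
        # Condition on the observation (sensor model Y_t = H X_t + N(0, Sigma_O)).
        S = H @ Sigma @ H.T + Sigma_O          # innovation covariance
        K = Sigma @ H.T @ np.linalg.inv(S)     # Kalman gain
        mu_cond = mu + K @ (y - H @ mu)        # mean of P(X_t | y_{1:t})
        Sigma_cond = Sigma - K @ H @ Sigma     # covariance of P(X_t | y_{1:t})
        # Predict and roll up (motion model X_{t+1} = A X_t + N(0, Sigma_M)):
        # multiply in the motion model and marginalize out X_t.
        mu_pred = A @ mu_cond
        Sigma_pred = A @ Sigma_cond @ A.T + Sigma_M
        return mu_pred, Sigma_pred             # P(X_{t+1} | y_{1:t})

    # Hypothetical 1D tracking example: state = (position, velocity), position observed.
    A = np.array([[1.0, 1.0], [0.0, 1.0]])     # constant-velocity motion model
    H = np.array([[1.0, 0.0]])                 # sensor observes position only
    Sigma_M = 0.01 * np.eye(2)                 # motion noise
    Sigma_O = np.array([[0.25]])               # observation noise
    mu, Sigma = np.zeros(2), np.eye(2)         # prior belief P(X_1)

    for y in [np.array([0.9]), np.array([2.1]), np.array([2.9])]:
        mu, Sigma = kf_step(mu, Sigma, y, A, H, Sigma_M, Sigma_O)
    print(mu, Sigma)

An implementation in canonical form, as on the "Implementation" slide, would instead add (η, Λ) terms when multiplying in the sensor and motion factors and use the information-form marginalization formula for the roll-up.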

