CMU CS 10708 - hw4

Probabilistic Graphical Models, Spring 2007
Homework 4
Due at the beginning of class on 11/26/07

Instructions

There are four questions in this homework. The last question involves some programming, which should be done in MATLAB. Do not attach your code to the writeup. Make a tarball of your code and output, name it <userid>.tgz, where your userid is your CS or Andrew id, and copy it to /afs/cs/academic/class/10708-f07/hw4/<userid>. If you are not submitting this from an SCS machine, you might need to authenticate yourself first; see http://www.cs.cmu.edu/~help/afs/cross_realm.html for instructions. If you are not a CMU student and do not have a CS or Andrew id, email your submission to [email protected].

You are allowed to use any of (and only) the material distributed in class for the homeworks. This includes the slides and the handouts given in class.[1] Refer to the web page for policies regarding collaboration, due dates, and extensions.

1 [20 pts] Importance Sampling

To do this question, you will need to read Koller & Friedman, Section 10.2.2. The likelihood weight of an importance sampler is defined as w(x) = P(x)/Q(x), where P is the distribution we want to sample from and Q is a proposal distribution.

1. Why is computing the probability of a complete instantiation of the variables in a Markov random field computationally intractable?
2. Given a chordal graph, describe how to compute the likelihood weight for an importance sampler. (Hint: what is the relationship between chordal graphs and junction trees?)
3. Given a non-chordal graph, describe how to compute the likelihood weight for an importance sampler.
4. Briefly comment on why it is not useful to use importance sampling for approximate inference on MRFs.

2 [20 pts] BP in sigmoid networks

Consider a three-layer sigmoid network (X_1, ..., X_n, Y_1, ..., Y_n, Z_1, ..., Z_n).
All the variables are binary; the variables X_1, ..., X_n in the top layer are all independent of each other, and the variables in the second and third layers depend on the variables in their previous layer through sigmoid CPTs:

P(y_j | x_1, ..., x_n) = \mathrm{sigmoid}\left( \sum_i w^1_{i,j} x_i + w^1_{0,j} \right)
P(z_j | y_1, ..., y_n) = \mathrm{sigmoid}\left( \sum_i w^2_{i,j} y_i + w^2_{0,j} \right)

where \mathrm{sigmoid}(t) = 1 / (1 + \exp(-t)).

1. Write down the belief propagation updates for the network.
2. Is the above graph loopy? Are the updates in the previous section guaranteed to converge?

[1] Please contact Monica Hopes (meh@cs) if you need a copy of a handout.

3 [30 pts] Mean Field Variational Inference for Admixture Models

Let us consider a specific case of the admixture model for text that uses a Dirichlet prior. This model is commonly known as the Latent Dirichlet Allocation (LDA) model. The graphical representation of the model is shown in Figure 1(a). The complete-data likelihood is given as follows:

P(\{w_d\}_{d=1}^M, \{\{z_{dn(\cdot)}\}_{n=1}^{N_d}\}_{d=1}^M, \{\theta_d\}_{d=1}^M \mid \{\beta_k\}_{k=1}^K, \alpha)
  = \prod_{d=1}^M \left( P(\theta_d \mid \alpha) \prod_{n=1}^{N_d} \prod_{k=1}^K \left( P(z_{dnk} = 1 \mid \theta_d) P(w_{dn} \mid \beta_k) \right)^{z_{dnk}} \right)
  = \prod_{d=1}^M \left( \frac{\Gamma(K\alpha)}{(\Gamma(\alpha))^K} \prod_{k=1}^K (\theta_{dk})^{\alpha - 1} \prod_{n=1}^{N_d} \prod_{k=1}^K (\theta_{dk} \beta_{k,w_{dn}})^{z_{dnk}} \right)    (1)

where M is the number of documents, N_d is the length of document d, K is the number of topics, n is the position in the document, and k is the topic index. w_d = (w_{d1}, ..., w_{dN_d}) is the text of the document, where w_{dn} \in \{1, ..., V\} is the word at the n-th position in document d and V is the vocabulary size.

- The vector z_{dn(\cdot)} = (z_{dn1}, ..., z_{dnK}) is a topic indicator vector for position n in document d, such that z_{dnk} \in \{0, 1\} and \sum_k z_{dnk} = 1.
- \theta_d = (\theta_{d1}, ..., \theta_{dK}), with \sum_{k=1}^K \theta_{dk} = 1 and \theta_{dk} \ge 0, is the multinomial topic mixing proportion for document d.
- \alpha is the parameter of the symmetric Dirichlet distribution that generates \theta_d for any document d.
- \beta_k = (\beta_{k1}, ..., \beta_{kV}), with \sum_{w=1}^V \beta_{kw} = 1 and \beta_{kw} \ge 0, is a multinomial distribution over the vocabulary for topic k.
- \Gamma(\cdot) is the Gamma function.

1. This model is intractable for exact inference of P(\theta_d, z_{dn(\cdot)} \mid w, \beta, \alpha). Let us actually understand and verify this fact: construct a junction tree for a single document d with N_d = 3; in other words, assume the document is (w_1, w_2, w_3). What is the maximum tree-width? Does the tree-width grow with increasing N_d? Write an expression for the marginals P(z_1 = k \mid \beta, \alpha) and P(\theta_d \mid \beta, \alpha) in terms of the messages on the junction tree. Is it possible to compute the marginals? What is the main reason for the intractability of exact inference?

2. Let us use the mean-field variational technique for approximate inference. In particular, let us approximate the posterior by the following distribution:

Q(\theta_d, \{z_{dn}\}_{n=1}^{N_d} \mid \gamma_d, \{\phi_{dn}\}_{n=1}^{N_d}) = P(\theta_d \mid \gamma_d) \prod_{n=1}^{N_d} \prod_{k=1}^K (\phi_{dnk})^{z_{dnk}}    (2)

where \phi_{dn} = (\phi_{dn1}, ..., \phi_{dnK}) is a variational multinomial distribution over topics for position n in document d, and \gamma_d = (\gamma_{d1}, ..., \gamma_{dK}) are the parameters of a variational Dirichlet distribution given by:

P(\theta_d \mid \gamma_d) = \frac{\Gamma(\sum_k \gamma_{dk})}{\prod_k \Gamma(\gamma_{dk})} \prod_{k=1}^K (\theta_{dk})^{\gamma_{dk} - 1}    (3)

The graphical representation of the variational distribution is given in Figure 1(b).

[Figure 1: Admixture model, graphical representation. (a) LDA graphical representation; (b) mean-field variational distribution for LDA.]

Starting with the result of mean-field variational inference given in Eq. (4) below, derive closed-form expressions for the variational parameters \gamma_{dk} and \phi_{dnk}:

Q(x) = \frac{1}{Z} \exp\left( \sum_{\Phi : X \in \mathrm{scope}(\Phi)} E_{Q(U_\Phi)}[\ln \Phi(U_\Phi, x)] \right)    (4)

where \Phi is a factor in the graphical model and U_\Phi = \mathrm{scope}(\Phi) - X. In your derivation, you will use the following property of the Dirichlet distribution:

E_{Q(\theta_d \mid \gamma_d)}[\log \theta_{dk}] = \Psi(\gamma_{dk}) - \Psi\left( \sum_{k'=1}^K \gamma_{dk'} \right)    (5)

where \Psi(\gamma_{dk}) = \frac{d}{d\gamma_{dk}} \log \Gamma(\gamma_{dk}) is the digamma function. Prove the result above using the properties of the exponential family we discussed in class. (Hint: convert the Dirichlet to its natural parametrization and use the property that the expected value of a sufficient statistic is equal to the first derivative of the log-partition function with respect to the corresponding natural parameter.)
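For intuition about what the derivation in part 2 should produce, the standard LDA coordinate-ascent updates (\phi_{dnk} \propto \beta_{k,w_{dn}} \exp(\Psi(\gamma_{dk})) and \gamma_{dk} = \alpha + \sum_n \phi_{dnk}) can be sketched in a few lines. This is an illustrative sketch in Python, not part of the required MATLAB submission; the toy \beta, the toy document, and the hand-rolled digamma approximation are all made up for the example.

```python
import math

def digamma(x):
    # Approximate Psi(x) via the recurrence Psi(x) = Psi(x+1) - 1/x
    # followed by the standard asymptotic series for large x.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f/252))

def mean_field_lda(doc, beta, alpha, iters=50):
    """Coordinate ascent on (gamma, phi) for a single document.

    doc  : list of word ids w_dn in {0, ..., V-1}
    beta : K x V topic-word probabilities
    alpha: symmetric Dirichlet parameter
    """
    K, N = len(beta), len(doc)
    gamma = [alpha + N / K] * K               # common uniform initialization
    phi = [[1.0 / K] * K for _ in range(N)]
    for _ in range(iters):
        for n, w in enumerate(doc):
            # phi_dnk ∝ beta_{k,w} * exp(Psi(gamma_k)); the Psi(sum_k gamma_k)
            # term is constant in k and cancels in the normalization.
            unnorm = [beta[k][w] * math.exp(digamma(gamma[k])) for k in range(K)]
            s = sum(unnorm)
            phi[n] = [u / s for u in unnorm]
        # gamma_dk = alpha + sum_n phi_dnk
        gamma = [alpha + sum(phi[n][k] for n in range(N)) for k in range(K)]
    return gamma, phi

# Toy example: K = 2 topics, V = 3 words (all numbers invented).
beta = [[0.7, 0.2, 0.1],
        [0.1, 0.2, 0.7]]
gamma, phi = mean_field_lda(doc=[0, 0, 2], beta=beta, alpha=1.0)
```

One sanity check worth noticing: since each \phi_{dn} is normalized, \sum_k \gamma_{dk} = K\alpha + N_d at every iteration, and positions whose word is likely under a topic pull their \phi_{dn} toward that topic.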
3. Now let us consider the Logistic Normal admixture model, known in common parlance as the Correlated Topic Model. It differs from LDA only in that the symmetric Dirichlet prior with parameter \alpha is replaced by a logistic normal distribution, which is given as follows:

P(\eta_d \mid \mu, \Sigma) = \mathcal{N}(\mu, \Sigma)    (6)

where \eta_d = (\eta_{d1}, ..., \eta_{dK}) with each \eta_{dk} \in \mathbb{R}. Each \theta_{dk} is a simple logistic transformation of \eta_{dk}, given by:

\theta_{dk} = \frac{\exp(\eta_{dk})}{\sum_{k'=1}^K \exp(\eta_{dk'})}    (7)

Using a logistic normal distribution as described above allows us to capture correlations between topics, given by the matrix \Sigma. Note that the description of the logistic normal distribution above is actually an overcomplete
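The truncated closing remark about overcompleteness can be made concrete: Eq. (7) is a softmax map, and adding the same constant c to every \eta_{dk} leaves \theta_d unchanged, so the K real parameters describe only a (K-1)-dimensional point on the simplex. A minimal Python sketch (the \eta values below are made up, standing in for a draw from N(\mu, \Sigma)):

```python
import math

def logistic_transform(eta):
    # theta_k = exp(eta_k) / sum_k' exp(eta_k')  (Eq. 7), computed stably
    # by subtracting max(eta) before exponentiating.
    m = max(eta)
    e = [math.exp(v - m) for v in eta]
    s = sum(e)
    return [v / s for v in e]

eta = [0.5, -1.0, 2.0]                                   # hypothetical eta_d
theta = logistic_transform(eta)
shifted = logistic_transform([v + 10.0 for v in eta])    # eta + c*1: same theta
```

The max-subtraction trick changes nothing mathematically (it is exactly the shift invariance being demonstrated) but avoids overflow for large \eta.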

