CMU CS 10708 - hw4

Probabilistic Graphical Models, Spring 2007
Homework 4
Due at the beginning of class on 11/26/07

Instructions

There are four questions in this homework. The last question involves some programming, which should be done in MATLAB. Do not attach your code to the writeup. Make a tarball of your code and output, name it <userid>.tgz, where your userid is your CS or Andrew id, and copy it to /afs/cs/academic/class/10708-f07/hw4/<userid>. If you are not submitting this from an SCS machine, you might need to authenticate yourself first; see http://www.cs.cmu.edu/~help/afs/cross_realm.html for instructions. If you are not a CMU student and do not have a CS or Andrew id, email your submission to [email protected].

You are allowed to use any of (and only) the material distributed in class for the homeworks. This includes the slides and the handouts given in class.[1] Refer to the web page for policies regarding collaboration, due dates, and extensions.

1 [20 pts] Importance Sampling

To do this question, you will need to read Koller & Friedman, Section 10.2.2. The likelihood weight of an importance sampler is defined as w(x) = P(x)/Q(x), where P is the distribution we want to sample from and Q is a proposal distribution.

1. Why is computing the probability of a complete instantiation of the variables in a Markov random field computationally intractable?
2. Given a chordal graph, describe how to compute the likelihood weight for an importance sampler. (Hint: what is the relationship between chordal graphs and junction trees?)
3. Given a non-chordal graph, describe how to compute the likelihood weight for an importance sampler.
4. Briefly comment on why it is not useful to use importance sampling for approximate inference on MRFs.

2 [20 pts] BP in sigmoid networks

Consider a three-layer sigmoid network (X_1, ..., X_n, Y_1, ..., Y_n, Z_1, ..., Z_n).
All the variables are binary; the variables X_1, ..., X_n in the top layer are all independent of each other, and the variables in the second and third layers depend on the variables in their previous layer through sigmoid CPTs:

P(y_j | x_1, ..., x_n) = \mathrm{sigmoid}\left( \sum_i w^1_{i,j} x_i + w^1_{0,j} \right)
P(z_j | y_1, ..., y_n) = \mathrm{sigmoid}\left( \sum_i w^2_{i,j} y_i + w^2_{0,j} \right)

where \mathrm{sigmoid}(t) = 1 / (1 + \exp(-t)).

1. Write down the belief propagation updates for the network.
2. Is the above graph loopy? Are the updates in the previous section guaranteed to converge?

[1] Please contact Monica Hopes (meh@cs) if you need a copy of a handout.

3 [30 pts] Mean Field Variational Inference for Admixture Models

Let us consider a specific case of the admixture model for text that uses a Dirichlet prior. This model is commonly known as the Latent Dirichlet Allocation (LDA) model. The graphical representation of the model is shown in Figure 1(a). The complete-data likelihood is given as follows:

P(\{w_d\}_{d=1}^M, \{\{z_{dn(\cdot)}\}_{n=1}^{N_d}\}_{d=1}^M, \{\theta_d\}_{d=1}^M \mid \{\beta_k\}_{k=1}^K, \alpha)
  = \prod_{d=1}^M \left( P(\theta_d \mid \alpha) \prod_{n=1}^{N_d} \prod_{k=1}^K \left( P(z_{dnk} = 1 \mid \theta_d) P(w_{dn} \mid \beta_k) \right)^{z_{dnk}} \right)
  = \prod_{d=1}^M \left( \frac{\Gamma(K\alpha)}{(\Gamma(\alpha))^K} \prod_{k=1}^K (\theta_{dk})^{\alpha - 1} \prod_{n=1}^{N_d} \prod_{k=1}^K (\theta_{dk} \beta_{k,w_{dn}})^{z_{dnk}} \right)    (1)

where M is the number of documents, N_d is the length of document d, K is the number of topics, n is the position in the document, and k is the topic index. w_d = (w_{d1}, ..., w_{dN_d}) is the text of the document, where w_{dn} \in \{1, ..., V\} is the word at the n-th position in document d and V is the vocabulary size.

- The vector z_{dn(\cdot)} = (z_{dn1}, ..., z_{dnK}) is a topic indicator vector for position n in document d, such that z_{dnk} \in \{0, 1\} and \sum_k z_{dnk} = 1.
- \theta_d = (\theta_{d1}, ..., \theta_{dK}), with \sum_{k=1}^K \theta_{dk} = 1 and \theta_{dk} \ge 0, is the multinomial topic mixing proportion for document d.
- \alpha is the parameter of the symmetric Dirichlet distribution that generates \theta_d for any document d.
- \beta_k = (\beta_{k1}, ..., \beta_{kV}), with \sum_{w=1}^V \beta_{kw} = 1 and \beta_{kw} \ge 0, is a multinomial distribution over the vocabulary for topic k.
- \Gamma(\cdot) is the Gamma function.

1. This model is intractable for exact inference of P(\theta_d, z_{dn(\cdot)} \mid w, \beta, \alpha). Let us actually understand and verify this fact: construct a junction tree for a single document d with N_d = 3; in other words, assume the document is (w_1, w_2, w_3). What is the maximum tree-width? Does the tree-width grow with increasing N_d? Write an expression for the marginals P(z_1 = k \mid \beta, \alpha) and P(\theta_d \mid \beta, \alpha) in terms of the messages on the junction tree. Is it possible to compute the marginals? What is the main reason for the intractability of exact inference?

2. Let us use the mean-field variational technique for approximate inference. In particular, let us approximate the posterior by the following distribution:

Q(\theta_d, \{z_{dn}\}_{n=1}^{N_d} \mid \gamma_d, \{\phi_{dn}\}_{n=1}^{N_d}) = P(\theta_d \mid \gamma_d) \prod_{n=1}^{N_d} \prod_{k=1}^K (\phi_{dnk})^{z_{dnk}}    (2)

where \phi_{dn} = (\phi_{dn1}, ..., \phi_{dnK}) is a variational multinomial distribution over topics for position n in document d, and \gamma_d = (\gamma_{d1}, ..., \gamma_{dK}) are the parameters of a variational Dirichlet distribution given by:

P(\theta_d \mid \gamma_d) = \frac{\Gamma(\sum_k \gamma_{dk})}{\prod_k \Gamma(\gamma_{dk})} \prod_{k=1}^K (\theta_{dk})^{\gamma_{dk} - 1}    (3)

The graphical representation of the variational distribution is given in Figure 1(b).

[Figure 1: Admixture model, graphical representation. (a) LDA graphical representation; (b) mean-field variational distribution for LDA.]

Starting with the result of mean-field variational inference given in Eq. (4) below, derive closed-form expressions for the variational parameters \gamma_{dk} and \phi_{dnk}:

Q(x) = \frac{1}{Z} \exp\left( \sum_{\Phi : X \in \mathrm{scope}(\Phi)} E_{Q(U_\Phi)}[\ln \Phi(U_\Phi, x)] \right)    (4)

where \Phi is a factor in the graphical model and U_\Phi = \mathrm{scope}(\Phi) - X. In your derivation, you will use the following property of the Dirichlet distribution:

E_{Q(\theta_d \mid \gamma_d)}[\log \theta_{dk}] = \Psi(\gamma_{dk}) - \Psi\left( \sum_{k'=1}^K \gamma_{dk'} \right)    (5)

where \Psi(\gamma_{dk}) = \frac{d}{d\gamma_{dk}} \log \Gamma(\gamma_{dk}) is the digamma function. Prove the result above using the properties of the exponential family we discussed in class. (Hint: convert the Dirichlet to its natural parametrization and use the property that the expected value of a sufficient statistic is equal to the first derivative of the log-partition function with respect to the corresponding natural parameter.)
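For intuition about what the derivation in part 2 should produce, the standard LDA coordinate-ascent updates (\phi_{dnk} \propto \beta_{k,w_{dn}} \exp(\Psi(\gamma_{dk})) and \gamma_{dk} = \alpha + \sum_n \phi_{dnk}) can be sketched in a few lines. This is an illustrative sketch in Python, not part of the required MATLAB submission; the toy \beta, the toy document, and the hand-rolled digamma approximation are all made up for the example.

```python
import math

def digamma(x):
    # Approximate Psi(x) via the recurrence Psi(x) = Psi(x+1) - 1/x
    # followed by the standard asymptotic series for large x.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f/252))

def mean_field_lda(doc, beta, alpha, iters=50):
    """Coordinate ascent on (gamma, phi) for a single document.

    doc  : list of word ids w_dn in {0, ..., V-1}
    beta : K x V topic-word probabilities
    alpha: symmetric Dirichlet parameter
    """
    K, N = len(beta), len(doc)
    gamma = [alpha + N / K] * K               # common uniform initialization
    phi = [[1.0 / K] * K for _ in range(N)]
    for _ in range(iters):
        for n, w in enumerate(doc):
            # phi_dnk ∝ beta_{k,w} * exp(Psi(gamma_k)); the Psi(sum_k gamma_k)
            # term is constant in k and cancels in the normalization.
            unnorm = [beta[k][w] * math.exp(digamma(gamma[k])) for k in range(K)]
            s = sum(unnorm)
            phi[n] = [u / s for u in unnorm]
        # gamma_dk = alpha + sum_n phi_dnk
        gamma = [alpha + sum(phi[n][k] for n in range(N)) for k in range(K)]
    return gamma, phi

# Toy example: K = 2 topics, V = 3 words (all numbers invented).
beta = [[0.7, 0.2, 0.1],
        [0.1, 0.2, 0.7]]
gamma, phi = mean_field_lda(doc=[0, 0, 2], beta=beta, alpha=1.0)
```

One sanity check worth noticing: since each \phi_{dn} is normalized, \sum_k \gamma_{dk} = K\alpha + N_d at every iteration, and positions whose word is likely under a topic pull their \phi_{dn} toward that topic.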
3. Now let us consider the Logistic Normal admixture model, known in common parlance as the Correlated Topic Model. It differs from LDA only in that the symmetric Dirichlet prior with parameter \alpha is replaced by a logistic normal distribution, which is given as follows:

P(\eta_d \mid \mu, \Sigma) = \mathcal{N}(\mu, \Sigma)    (6)

where \eta_d = (\eta_{d1}, ..., \eta_{dK}) with each \eta_{dk} \in \mathbb{R}. Each \theta_{dk} is a simple logistic transformation of \eta_{dk}, given by:

\theta_{dk} = \frac{\exp(\eta_{dk})}{\sum_{k'=1}^K \exp(\eta_{dk'})}    (7)

Using a logistic normal distribution as described above allows us to capture correlations between topics, given by the matrix \Sigma. Note that the description of the logistic normal distribution above is actually an overcomplete
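The truncated closing remark about overcompleteness can be made concrete: Eq. (7) is a softmax map, and adding the same constant c to every \eta_{dk} leaves \theta_d unchanged, so the K real parameters describe only a (K-1)-dimensional point on the simplex. A minimal Python sketch (the \eta values below are made up, standing in for a draw from N(\mu, \Sigma)):

```python
import math

def logistic_transform(eta):
    # theta_k = exp(eta_k) / sum_k' exp(eta_k')  (Eq. 7), computed stably
    # by subtracting max(eta) before exponentiating.
    m = max(eta)
    e = [math.exp(v - m) for v in eta]
    s = sum(e)
    return [v / s for v in e]

eta = [0.5, -1.0, 2.0]                                   # hypothetical eta_d
theta = logistic_transform(eta)
shifted = logistic_transform([v + 10.0 for v in eta])    # eta + c*1: same theta
```

The max-subtraction trick changes nothing mathematically (it is exactly the shift invariance being demonstrated) but avoids overflow for large \eta.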

