CMU CS 10708 - lecture9-learningBN-annotated

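School of Computer Science
Learning generalized linear models and tabular CPT of structured full BN
Probabilistic Graphical Models (10-708)
Lecture 9, Oct 15, 2007
Eric Xing

[Figure: example network over variables X1 to X8, with nodes labeled Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H]

Reading: J-Chap. 7,8.

- Grade for hw 1
- Project proposal
- Questions

Linear Regression

- Let us assume that the target variable and the inputs are related by the equation
  $$y_i = \theta^T x_i + \varepsilon_i$$
  where $\varepsilon$ is an error term of unmodeled effects or random noise.
- Now assume that $\varepsilon$ follows a Gaussian $N(0, \sigma^2)$; then we have
  $$p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right)$$

Logistic Regression (sigmoid classifier)

- The conditional distribution: a Bernoulli
  $$p(y \mid x) = \mu(x)^{y} \, (1 - \mu(x))^{1-y}$$
  where $\mu$ is a logistic function
  $$\mu(x) = \frac{1}{1 + e^{-\theta^T x}}$$
- We can use the brute-force gradient method as in LR.
- But we can also apply generic laws by observing that p(y|x) is an exponential family function, more specifically, a generalized linear model!

Exponential family

- For a numeric random variable X,
  $$p(x \mid \eta) = h(x) \exp\{ \eta^T T(x) - A(\eta) \} = \frac{1}{Z(\eta)} h(x) \exp\{ \eta^T T(x) \}$$
  is an exponential family distribution with natural (canonical) parameter $\eta$.
- Function T(x) is a sufficient statistic.
- Function $A(\eta) = \log Z(\eta)$ is the log normalizer.
- Examples: Bernoulli, multinomial, Gaussian, Poisson, gamma, ...

[Figure: plate diagram with node X_n replicated N times]

Multivariate Gaussian Distribution

- For a continuous vector random variable $X \in R^k$ (moment parameters $\mu$, $\Sigma$):
  $$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}$$
  $$= \frac{1}{(2\pi)^{k/2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}(\Sigma^{-1} x x^T) + \mu^T \Sigma^{-1} x - \frac{1}{2} \mu^T \Sigma^{-1} \mu - \frac{1}{2} \log |\Sigma| \right\}$$
- Exponential family representation (natural parameters $\eta_1$, $\eta_2$):
  $$\eta = \left[ \Sigma^{-1} \mu ;\; -\tfrac{1}{2} \mathrm{vec}(\Sigma^{-1}) \right] = \left[ \eta_1 ;\; \mathrm{vec}(\eta_2) \right], \quad \eta_1 = \Sigma^{-1} \mu \text{ and } \eta_2 = -\tfrac{1}{2} \Sigma^{-1}$$
  $$T(x) = \left[ x ;\; \mathrm{vec}(x x^T) \right]$$
  $$A(\eta) = \frac{1}{2} \mu^T \Sigma^{-1} \mu + \frac{1}{2} \log |\Sigma| = -\frac{1}{4} \eta_1^T \eta_2^{-1} \eta_1 - \frac{1}{2} \log |-2\eta_2|$$
  $$h(x) = (2\pi)^{-k/2}$$
- Note: a k-dimensional Gaussian is a $(k + k^2)$-parameter distribution with a $(k + k^2)$-element vector of sufficient statistics (but because of symmetry and positivity, the parameters are constrained and have fewer degrees of freedom).

Multinomial distribution

- For a binary vector random variable $x \sim \mathrm{multi}(x \mid \pi)$:
  $$p(x \mid \pi) = \pi_1^{x_1} \pi_2^{x_2} \cdots \pi_K^{x_K} = \exp\left\{ \sum_k x_k \ln \pi_k \right\}$$
  $$= \exp\left\{ \sum_{k=1}^{K-1} x_k \ln \pi_k + \left( 1 - \sum_{k=1}^{K-1} x_k \right) \ln\left( 1 - \sum_{k=1}^{K-1} \pi_k \right) \right\}$$
  $$= \exp\left\{ \sum_{k=1}^{K-1} x_k \ln\left( \frac{\pi_k}{1 - \sum_{k=1}^{K-1} \pi_k} \right) + \ln\left( 1 - \sum_{k=1}^{K-1} \pi_k \right) \right\}$$
- Exponential family representation:
  $$\eta = \left[ \ln\left( \frac{\pi_k}{\pi_K} \right) ;\; 0 \right], \quad T(x) = [x]$$
  $$A(\eta) = -\ln\left( 1 - \sum_{k=1}^{K-1} \pi_k \right) = \ln\left( \sum_{k=1}^{K} e^{\eta_k} \right), \quad h(x) = 1$$

Why exponential family?

- Moment generating property:
  $$\frac{dA}{d\eta} = \frac{d}{d\eta} \log Z(\eta) = \frac{1}{Z(\eta)} \frac{d}{d\eta} Z(\eta) = \frac{1}{Z(\eta)} \frac{d}{d\eta} \int h(x) \exp\{ \eta^T T(x) \} \, dx$$
  $$= \int T(x) \frac{h(x) \exp\{ \eta^T T(x) \}}{Z(\eta)} \, dx = E[T(x)]$$
  $$\frac{d^2 A}{d\eta^2} = \int T^2(x) \frac{h(x) \exp\{ \eta^T T(x) \}}{Z(\eta)} \, dx - \int T(x) \frac{h(x) \exp\{ \eta^T T(x) \}}{Z(\eta)} \, dx \cdot \frac{1}{Z(\eta)} \frac{d}{d\eta} Z(\eta)$$
  $$= E[T^2(x)] - E^2[T(x)] = Var[T(x)]$$

Moment estimation

- We can easily compute moments of any exponential family distribution by taking the derivatives of the log normalizer $A(\eta)$.
- The q-th derivative gives the q-th cumulant, which coincides with the q-th centered moment for q = 2 and q = 3:
  $$\frac{dA(\eta)}{d\eta} = \text{mean}, \quad \frac{d^2 A(\eta)}{d\eta^2} = \text{variance}, \quad \ldots$$
- When the sufficient statistic is a stacked vector, partial derivatives need to be considered.

Moment vs canonical parameters

- The moment parameter $\mu$ can be derived from the natural (canonical) parameter:
  $$\frac{dA(\eta)}{d\eta} = E[T(x)] \overset{\mathrm{def}}{=} \mu$$
- $A(\eta)$ is convex since
  $$\frac{d^2 A(\eta)}{d\eta^2} = Var[T(x)] > 0$$
- Hence we can invert the relationship and infer the canonical parameter from the moment parameter (1-to-1):
  $$\eta \overset{\mathrm{def}}{=} \psi(\mu)$$
- A distribution in the exponential family can be parameterized not only by $\eta$ (the canonical parameterization) but also by $\mu$ (the moment parameterization).

[Figure: plot of the convex function A against η, with a point η* marked]

MLE for Exponential Family

- For iid data, the log-likelihood is
  $$\ell(\eta; D) = \log \prod_n h(x_n) \exp\{ \eta^T T(x_n) - A(\eta) \} = \sum_n \log h(x_n) + \eta^T \left( \sum_n T(x_n) \right) - N A(\eta)$$
- Take derivatives and set to zero:
  $$\frac{\partial \ell}{\partial \eta} = \sum_n T(x_n) - N \frac{\partial A(\eta)}{\partial \eta} = 0 \;\Rightarrow\; \frac{\partial A(\eta)}{\partial \eta} = \frac{1}{N} \sum_n T(x_n) \;\Rightarrow\; \hat{\mu}_{MLE} = \frac{1}{N} \sum_n T(x_n)$$
- This amounts to moment matching.
- We can infer the canonical parameters using
  $$\hat{\eta}_{MLE} = \psi(\hat{\mu}_{MLE})$$

Sufficiency

- For $p(x \mid \theta)$, T(x) is sufficient for $\theta$ if there is no information in X regarding $\theta$ beyond that in T(x).
- We can throw away X for the purpose of inference w.r.t. $\theta$.
- Bayesian view:
  $$p(\theta \mid T(x), x) = p(\theta \mid T(x))$$
- Frequentist view:
  $$p(x \mid T(x), \theta) = p(x \mid T(x))$$
- The Neyman factorization theorem: T(x) is sufficient for $\theta$ if
  $$p(x, T(x), \theta) = \psi_1(T(x), \theta) \, \psi_2(x, T(x)) \;\Rightarrow\; p(x \mid \theta) = g(T(x), \theta) \, h(x, T(x))$$

[Figure: three small graphs over the nodes X, T(x), and θ, one per view]

Examples

- Gaussian:
  $$\eta = \left[ \Sigma^{-1} \mu ;\; -\tfrac{1}{2} \mathrm{vec}(\Sigma^{-1}) \right], \quad T(x) = \left[ x ;\; \mathrm{vec}(x x^T) \right]$$
  $$A(\eta) = \frac{1}{2} \mu^T \Sigma^{-1} \mu + \frac{1}{2} \log |\Sigma|, \quad h(x) = (2\pi)^{-k/2}$$
  $$\Rightarrow\; \hat{\mu}_{MLE} = \frac{1}{N} \sum_n T_1(x_n) = \frac{1}{N} \sum_n x_n$$
- Multinomial:
  $$\eta = \left[ \ln\left( \frac{\pi_k}{\pi_K} \right) ;\; 0 \right], \quad T(x) = [x], \quad A(\eta) = -\ln\left( 1 - \sum_{k=1}^{K-1} \pi_k \right) = \ln\left( \sum_{k=1}^{K} e^{\eta_k} \right), \quad h(x) = 1$$
  $$\Rightarrow\; \hat{\mu}_{MLE} = \frac{1}{N} \sum_n x_n$$
- Poisson:
  $$\eta = \log \lambda, \quad T(x) = x, \quad A(\eta) = \lambda = e^{\eta}, \quad h(x) = \frac{1}{x!}$$
  $$\Rightarrow\; \hat{\mu}_{MLE} = \frac{1}{N} \sum_n x_n$$

Generalized Linear Models (GLIMs)

- The graphical model

[Figure: plate diagram with X_n pointing to Y_n, replicated N times]

- Linear regression
- Discriminative linear classification
- Commonality: model $E_p(Y) = \mu = f(\theta^T X)$
  - What is p()? The cond. dist. of Y.
  - What is f()? The response function.
- GLIM
  - The observed input x is assumed to enter into the model via a linear combination of its elements, $\xi = \theta^T x$.
  - The conditional mean $\mu$ is represented as a function $f(\xi)$ of $\xi$, where f is known as the response function.
  - The observed output y is assumed to be characterized by an exponential family distribution with conditional mean $\mu$.

GLIM, cont.

[Figure: the GLIM chain x → ξ (through θ) → µ (through f) → η (through ψ) → y]

  $$p(y \mid \eta) = h(y) \exp\{ \eta^T(x) \, y - A(\eta) \} \;\Rightarrow\; p(y \mid \eta) = h(y) \exp\left\{ \frac{1}{\phi} \left( \eta^T(x) \, y - A(\eta) \right) \right\}$$

- The choice of exp family is constrained by the nature of the data Y.
  - Example: y is a continuous vector → multivariate Gaussian; y is a class label → Bernoulli or multinomial.
- The choice of the response function
  - Following some mild constraints, e.g., [0,1]. Positivity ...
  - Canonical response function: $f = \psi^{-1}(\cdot)$
  - In this case $\theta^T x$ directly corresponds to the canonical parameter $\eta$.

MLE for GLIMs with natural response

- Log-likelihood:
  $$\ell = \sum_n \log h(y_n) + \sum_n \left( \theta^T x_n y_n - A(\eta_n) \right)$$
- Derivative of log-likelihood:
  $$\frac{d\ell}{d\theta} = \sum_n \left( x_n y_n - \frac{dA(\eta_n)}{d\eta_n} \frac{d\eta_n}{d\theta} \right) = \sum_n (y_n - \mu_n) x_n = X^T (y - \mu)$$
  This is a fixed point function
- Online learning for canonical GLIMs
  - Stochastic gradient ascent = least mean squares (LMS) algorithm.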
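
The online-learning bullet can be illustrated with a minimal sketch. It assumes the per-example update theta <- theta + step * (y_n - mu_n) * x_n, which is taken here from the per-example term of the gradient sum_n (y_n - mu_n) x_n rather than from the slide itself, since the preview cuts off before the update rule. The names `sgd_canonical_glim`, `step`, `epochs`, and the toy data are hypothetical, introduced only for illustration. The same loop covers linear regression (identity response, the classical LMS rule) and logistic regression (sigmoid response).

```python
import math
import random


def identity(xi):
    """Canonical response for a Gaussian output: mu = xi."""
    return xi


def sigmoid(xi):
    """Canonical response for a Bernoulli output: mu = 1 / (1 + exp(-xi))."""
    if xi >= 0:
        return 1.0 / (1.0 + math.exp(-xi))
    z = math.exp(xi)  # avoids overflow for very negative xi
    return z / (1.0 + z)


def sgd_canonical_glim(xs, ys, response, step=0.1, epochs=200, seed=0):
    """Stochastic gradient ascent on a canonical GLIM log-likelihood.

    Assumed update (one example at a time):
        theta <- theta + step * (y_n - mu_n) * x_n,
    with mu_n = response(theta^T x_n).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order
        for n in order:
            xi = sum(t * x for t, x in zip(theta, xs[n]))  # xi = theta^T x_n
            err = ys[n] - response(xi)                     # y_n - mu_n
            theta = [t + step * err * x for t, x in zip(theta, xs[n])]
    return theta


# Toy inputs: a constant feature plus one scalar feature in [-1, 1].
xs = [[1.0, k / 10.0] for k in range(-10, 11)]

# Linear regression: noise-free targets y = 1 + 2 x, identity response (LMS).
ys_lin = [1.0 + 2.0 * x[1] for x in xs]
theta_lin = sgd_canonical_glim(xs, ys_lin, identity)

# Logistic regression: label 1 when the scalar feature is positive.
ys_log = [1.0 if x[1] > 0 else 0.0 for x in xs]
theta_log = sgd_canonical_glim(xs, ys_log, sigmoid)

print("linear theta:", theta_lin)
print("logistic theta:", theta_log)
```

Only the response function changes between the two runs, which is the point of the canonical-response view: with f equal to the inverse of psi, the gradient always has the form (y_n - mu_n) x_n. The fixed step size is a simplification; a decaying step is a common alternative when the data are noisy.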

