CALTECH CS 155 - Probabilistic Graphical Models

Probabilistic Graphical Models
Lecture 2 – Bayesian Networks: Representation
CS/CNS/EE 155, Andreas Krause

Slide 2: Announcements
- Will meet in Steele 102 for now.
- Still looking for another 1-2 TAs.
- Homework 1 will be out soon. Start early!

Slide 3: Multivariate distributions
- Instead of a single random variable, we have a random vector X(ω) = [X1(ω), …, Xn(ω)].
- Specify P(X1 = x1, …, Xn = xn).
- Suppose all Xi are Bernoulli variables. How many parameters do we need to specify? (2^n − 1.)

Slide 4: Marginal distributions
- Suppose we have the joint distribution P(X1, …, Xn). Then the marginal is obtained by summing out the remaining variables:
  P(X1 = x1) = Σ_{x2, …, xn} P(X1 = x1, X2 = x2, …, Xn = xn)
- If all Xi are binary: how many terms? (2^(n−1).)

Slide 5: Rules for random variables
- Chain rule: P(X, Y) = P(X) P(Y | X)
- Bayes' rule: P(X | Y) = P(Y | X) P(X) / P(Y)

Slide 6: Key concept: conditional independence
- Events α, β are conditionally independent given γ if P(α ∩ β | γ) = P(α | γ) P(β | γ).
- Random variables X and Y are conditionally independent given Z if for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z):
  P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
- If P(Y = y | Z = z) > 0, this is equivalent to
  P(X = x | Z = z, Y = y) = P(X = x | Z = z)
- Similarly for sets of random variables X, Y, Z.
- We write: P ⊨ (X ⊥ Y | Z).

Slide 7: Why is conditional independence useful?
- Chain rule: P(X1, …, Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1, …, Xn−1). How many parameters? (Still 2^n − 1 for binary variables; the chain rule alone saves nothing.)
- Now suppose X1, …, Xi−1 ⊥ Xi+1, …, Xn | Xi for all i. Then
  P(X1, …, Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | Xn−1),
  i.e., a Markov chain. How many parameters? (For binary variables: 2n − 1.)
- Can we compute P(Xn) more efficiently? (Yes; see the code sketch after slide 18.)

Slide 8: Properties of conditional independence
- Symmetry: X ⊥ Y | Z ⇒ Y ⊥ X | Z
- Decomposition: X ⊥ Y,W | Z ⇒ X ⊥ Y | Z
- Contraction: (X ⊥ Y | Z) ∧ (X ⊥ W | Y,Z) ⇒ X ⊥ Y,W | Z
- Weak union: X ⊥ Y,W | Z ⇒ X ⊥ Y | Z,W
- Intersection: (X ⊥ Y | Z,W) ∧ (X ⊥ W | Y,Z) ⇒ X ⊥ Y,W | Z
  (holds only if the distribution is positive, i.e., P > 0)

Slide 9: Key questions
- How do we specify distributions that satisfy particular independence properties? → Representation
- How can we exploit independence properties for efficient computation? → Inference
- How can we identify independence properties present in data? → Learning
- We will now see an example: Bayesian networks.

Slide 10: Key idea
- Conditional parameterization (instead of joint parameterization).
- For each RV Xi, specify P(Xi | XA) for a set XA of RVs.
- Then use the chain rule to get a joint parameterization.
- Have to be careful to guarantee a legal distribution.

Slide 11: Example: 2 variables
(worked example; not preserved in the text preview)

Slide 12: Example: 3 variables
(worked example; not preserved in the text preview)

Slide 13: Example: naïve Bayes models
- Class variable Y; evidence variables X1, …, Xn.
- Assume that XA ⊥ XB | Y for all subsets XA, XB of {X1, …, Xn}.
- Conditional parameterization: specify P(Y), and specify P(Xi | Y) for each i.
- Joint distribution: P(Y, X1, …, Xn) = P(Y) ∏i P(Xi | Y).

Slide 14: Today: Bayesian networks
- Compact representation of distributions over large numbers of variables.
- (Often) allows efficient exact inference (computing marginals, etc.).
- Example: HailFinder has 56 variables with ~3 states each, so the joint has ~10^26 terms; evaluating it naively would take more than 10,000 years on top supercomputers. (Demo: JavaBayes applet.)

Slide 15: Causal parameterization
- Graph with directed edges from (immediate) causes to (immediate) effects.
- Example: Earthquake → Alarm ← Burglary, with Alarm → JohnCalls and Alarm → MaryCalls.

Slide 16: Bayesian networks
- A Bayesian network structure is a directed acyclic graph G, where each vertex s of G is interpreted as a random variable Xs (with unspecified distribution).
- A Bayesian network (G, P) consists of a BN structure G and a set of conditional probability distributions (CPDs) P(Xs | Pa(Xs)), where Pa(Xs) are the parents of node Xs, such that (G, P) defines the joint distribution
  P(X1, …, Xn) = ∏s P(Xs | Pa(Xs)).

Slide 17: Bayesian networks
- Can every probability distribution be described by a BN? (Yes: pick any variable ordering and apply the chain rule; the fully connected DAG always works.)

Slide 18: Representing the world using BNs
- We model a true distribution P′ with conditional independences I(P′) by a Bayes net (G, P) with independences I(P).
- Want to make sure that I(P) ⊆ I(P′): every independence the model assumes must actually hold in the true distribution.
- Need to understand the CI properties of a BN (G, P).
  (Figure: an example DAG on nodes s1–s12; not preserved in the text preview.)
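The parameter counts and the efficiency question from slide 7 are easy to check numerically. Below is a minimal sketch (not from the lecture): it builds a binary Markov chain with randomly generated placeholder CPDs, and compares the naive marginal P(Xn = 1), which sums 2^(n−1) joint terms, against the O(n) forward recursion that pushes the sums inward. The choice n = 10 and all CPD values are illustrative assumptions.

```python
# Sketch: Markov-chain factorization from slide 7. All CPD numbers are
# randomly generated placeholders; n = 10 is an arbitrary choice.
import itertools
import random

n = 10  # number of binary variables

# Full joint: 2^n - 1 free parameters.
# Markov chain P(X1) * prod_i P(X_{i+1} | X_i): 1 + 2*(n-1) parameters.
print("full joint parameters:  ", 2**n - 1)          # 1023
print("Markov chain parameters:", 1 + 2 * (n - 1))   # 19

# Hypothetical CPDs: p1 = P(X1=1); trans[x] = P(X_{i+1}=1 | X_i=x).
random.seed(0)
p1 = random.random()
trans = [random.random(), random.random()]

def joint(xs):
    """P(X1=x1, ..., Xn=xn) under the chain factorization."""
    p = p1 if xs[0] == 1 else 1 - p1
    for prev, cur in zip(xs, xs[1:]):
        q = trans[prev]
        p *= q if cur == 1 else 1 - q
    return p

# Naive marginal P(Xn=1): sum over all 2^(n-1) settings of X1..X_{n-1}.
naive = sum(joint(xs + (1,)) for xs in itertools.product((0, 1), repeat=n - 1))

# Efficient marginal: forward recursion, O(n) instead of O(2^n).
# b[x] = P(X_i = x); one sum-product step per variable.
b = [1 - p1, p1]
for _ in range(n - 1):
    b = [b[0] * (1 - trans[0]) + b[1] * (1 - trans[1]),
         b[0] * trans[0] + b[1] * trans[1]]

print("naive   P(Xn=1):", naive)
print("forward P(Xn=1):", b[1])  # agrees up to floating-point error
```

The forward recursion is exactly the "compute P(Xn) more efficiently" idea: because of the Markov assumption, each sum over Xi can be performed before the later variables are introduced.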
Slides 19–20: Which kinds of CI does a BN imply?
(Worked on the earthquake/burglary network from slide 15: E → A ← B, A → J, A → M.)

Slide 21: Local Markov assumption
- Each BN structure G is associated with the following conditional independence assumptions:
  X ⊥ NonDescendants(X) | Pa(X)
- We write Iloc(G) for these conditional independences.
- Suppose (G, P) is a Bayesian network representing P. Does it hold that Iloc(G) ⊆ I(P)?
- If this holds, we say G is an I-map (independence map) for P.

Slides 22–27: Factorization theorem
- Theorem: Iloc(G) ⊆ I(P), i.e., G is an I-map of P, if and only if the true distribution P can be represented exactly as a Bayes net (G, P):
  P(X1, …, Xn) = ∏s P(Xs | Pa(Xs))
- Slide 24 proves the direction from I-map to factorization; slide 26 treats the general case. (Board work; not preserved in the text preview. A small code sketch of the factorization appears at the end of these notes.)

Slide 28: Defining a Bayes net
- Given random variables and known conditional independences:
  1. Pick an ordering X1, …, Xn of the variables.
  2. For each Xi, find a minimal subset A ⊆ {X1, …, Xi−1} such that Xi ⊥ X¬A | A, where ¬A = {X1, …, Xi−1} \ A.
  3. Specify / learn the CPD P(Xi | A).
- The ordering matters a lot for the compactness of the representation! More on this later in the course.

Slide 29: Adding edges doesn't hurt
- Theorem: Let G be an I-map for P, and let G′ be derived from G by adding an edge. Then G′ is also an I-map of P. (G′ is strictly more expressive than G.)
- Proof: board work; each local independence of G′ follows from the corresponding one in G via the weak union and decomposition properties from slide 8.

Slide 30: Additional conditional independencies
- A BN specifies the joint distribution through a conditional parameterization that satisfies the local Markov property.
- But we also saw additional properties of CI: weak union, intersection, contraction, …
- Which additional CI statements does a particular BN imply? All of those that can be derived from the local ones through algebraic operations.

Slide 31: What you need to know
- Bayesian networks
- Local Markov property
- I-maps
- Factorization theorem

Slide 32: Tasks
- Subscribe to the mailing list: https://utils.its.caltech.edu/mailman/listinfo/cs155
- Read Koller & Friedman, Sections 3.1–3.3.
- Form groups and think about class projects. If you have difficulty finding a group, email Pete.
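To make the factorization theorem concrete, here is a minimal sketch (not from the lecture) that encodes the earthquake/burglary structure from slide 15 as parent sets plus CPDs and evaluates the joint as the product ∏s P(Xs | Pa(Xs)). The variable names E, B, A, J, M follow the slides, but all CPT numbers are hypothetical placeholders.

```python
# Sketch of the factorization theorem: a DAG G plus CPDs P(Xs | Pa(Xs))
# defines the joint P(x1,...,xn) = prod_s P(xs | pa_xs). Structure follows
# the earthquake/burglary example; all CPT numbers are hypothetical.
import itertools

# parents[v] lists Pa(Xv); all variables are binary (0/1).
parents = {
    "E": [], "B": [],
    "A": ["E", "B"],
    "J": ["A"], "M": ["A"],
}

# cpt[v][pa_values] = P(Xv = 1 | Pa(Xv) = pa_values); placeholder numbers.
cpt = {
    "E": {(): 0.02},
    "B": {(): 0.01},
    "A": {(0, 0): 0.001, (0, 1): 0.94, (1, 0): 0.29, (1, 1): 0.95},
    "J": {(0,): 0.05, (1,): 0.90},
    "M": {(0,): 0.01, (1,): 0.70},
}

def joint(assign):
    """P(assignment) as the product of local CPDs (factorization theorem)."""
    p = 1.0
    for v in parents:
        pa = tuple(assign[u] for u in parents[v])
        q = cpt[v][pa]
        p *= q if assign[v] == 1 else 1 - q
    return p

# Sanity check: the factorization defines a legal distribution (sums to 1),
# which is exactly the "careful to guarantee legal distribution" point.
names = list(parents)
total = sum(joint(dict(zip(names, vals)))
            for vals in itertools.product((0, 1), repeat=len(names)))
print("sum over all assignments:", total)  # ~1.0

# Brute-force marginal, e.g. P(JohnCalls = 1):
pj = sum(joint(dict(zip(names, vals)))
         for vals in itertools.product((0, 1), repeat=len(names))
         if dict(zip(names, vals))["J"] == 1)
print("P(J = 1):", pj)
```

Note how few numbers are needed: 1 + 1 + 4 + 2 + 2 = 10 CPT entries instead of the 2^5 − 1 = 31 parameters of an unstructured joint over five binary variables.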

