Slide titles: Introduction to Bayesian Networks; Overview; Probabilistic Reconstruction; Bayesian Networks; Example 1 – Icy Roads; Conditional Independence; Conditional Independency; Example 2 – Rain/Sprinkler; Explaining Away; Conditional Dependence; Graph Semantics; Chain/Linear; Diverging; Converging; D-Separation; Equivalence of Networks; Bayesian Networks; Example Bayesian Network; BN Probability Distribution; BNs are Compact Descriptions of P(X); Recall from HMM/CRF Lecture; CPDs; Bayesian Networks for Inference; Walking Through an Example; Message Passing in Junction Trees; HUGIN Algorithm; BN to Junction Tree; Applications; Latent Variables; Observation vs Intervention; Example – Sprinkler; Causal vs Probabilistic; Learning Bayesian Networks; Learning Q; Learning S; Learning S – Heuristic Search; Model Averaging; Sampling Models – MCMC

MIT OpenCourseWare, http://ocw.mit.edu
6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution, Fall 2008
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms

Introduction to Bayesian Networks
6.047/6.878 Computational Biology: Genomes, Networks, Evolution

Overview
• We have looked at a number of graphical representations of probability distributions
  – DAG example: HMM
  – Undirected graph example: CRF
• Today we will look at a very general graphical model representation – Bayesian networks
• One application – modeling gene expression
• Aviv Regev guest lecture – an extension of this basic idea

Probabilistic Reconstruction
• Expression data gives us information about which genes tend to be expressed together with which others
• In probability terms, this is information about the joint distribution over gene states X:
  P(X) = P(X1, X2, X3, X4, …, Xm)
• Can we model this joint distribution?

Bayesian Networks
• Directed graph encoding the joint distribution over variables X:
  P(X) = P(X1, X2, X3, …, XN)
• Captures information about the dependency structure of P(X)
• Learning approaches
• Inference algorithms

Example 1 – Icy Roads
Network: I (Icy Roads) → W (Watson Crashes), I → H (Holmes Crashes); the icy roads have a causal impact on both crashes.
Assume we learn that Watson has crashed. Given this causal network, one might fear that Holmes has crashed too. Why?

Example 1 – Icy Roads (continued)
Now imagine we have learned that the roads are not icy. We would no longer have an increased fear that Holmes has crashed.

Conditional Independence
Same network: I (Icy Roads) → W (Watson Crashes), I → H (Holmes Crashes).
• If we know nothing about I, W and H are dependent.
• If we know I, W and H are conditionally independent.

Conditional Independency
• Independence of two random variables:
  X ⊥ Y  ⇔  P(X, Y) = P(X) P(Y)
• Conditional independence given a third:
  X ⊥ Y | Z  ⇔  P(X, Y | Z) = P(X | Z) P(Y | Z), but P(X, Y) ≠ P(X) P(Y) necessarily
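To make the Icy Roads statements concrete, here is a minimal Python sketch. The network structure (I → W, I → H) comes from the slide, but the probability values, variable names, and helper functions are invented for illustration; the sketch simply builds the joint from the factorization P(I) P(W|I) P(H|I) and checks the two claims numerically.

from itertools import product

# Hypothetical CPDs for the Icy Roads network (I -> W, I -> H);
# the lecture gives only the structure, not these numbers.
P_I = {1: 0.3, 0: 0.7}     # P(icy roads)
P_W = {1: 0.8, 0: 0.1}     # P(Watson crashes = 1 | I)
P_H = {1: 0.8, 0: 0.1}     # P(Holmes crashes = 1 | I)

def joint(i, w, h):
    """P(I=i, W=w, H=h) = P(I) P(W|I) P(H|I)."""
    pw = P_W[i] if w == 1 else 1 - P_W[i]
    ph = P_H[i] if h == 1 else 1 - P_H[i]
    return P_I[i] * pw * ph

def marginal(fix):
    """Sum the joint over all states consistent with the fixed assignments."""
    total = 0.0
    for i, w, h in product([0, 1], repeat=3):
        state = {"I": i, "W": w, "H": h}
        if all(state[k] == v for k, v in fix.items()):
            total += joint(i, w, h)
    return total

# Without evidence on I, W and H are dependent: learning W=1 raises P(H=1).
print(marginal({"H": 1}))                                      # P(H=1)       ~ 0.31
print(marginal({"W": 1, "H": 1}) / marginal({"W": 1}))         # P(H=1|W=1)   ~ 0.64

# Given I, they are conditionally independent: observing W=1 changes nothing.
print(marginal({"H": 1, "I": 0}) / marginal({"I": 0}))                   # P(H=1|I=0)      = 0.1
print(marginal({"W": 1, "H": 1, "I": 0}) / marginal({"W": 1, "I": 0}))   # P(H=1|W=1,I=0)  = 0.1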
Example 2 – Rain/Sprinkler
Network: R (Rain) → W (Watson Wet), R → H (Holmes Wet), S (Sprinkler) → H.
Holmes discovers that his grass is wet. It could be rain or his sprinkler.

Example 2 – Rain/Sprinkler (continued)
Now imagine Holmes sees that Watson's grass is wet. Now we conclude that it was probably rain, and probably not his sprinkler.

Explaining Away
Initially we had two explanations for Holmes' wet grass. But once we had more evidence for R, that explained away H's wet grass, so there is no longer any reason for an increase in S.

Conditional Dependence
• If we don't know H, R and S are independent.
• But if we know H, R and S are conditionally dependent.

Graph Semantics
Three basic building blocks, each implying a particular independence relationship:
• Serial (chain): X → Y → Z
• Diverging: X ← Y → Z
• Converging: X → Y ← Z

Chain/Linear
X → Y → Z
• X ⊥ Z | ∅ does not hold: without evidence on Y, X and Z are dependent
• X ⊥ Z | Y: conditional independence

Diverging
X ← Y → Z
• X ⊥ Z | ∅ does not hold: without evidence on Y, X and Z are dependent
• X ⊥ Z | Y: conditional independence

Converging
X → Y ← Z
• X ⊥ Z | ∅: X and Z are marginally independent
• X ⊥ Z | Y does not hold: conditional dependence – explaining away

Graph Semantics (converging, continued)
In the converging block X → Y ← Z, the node Y may have descendants A, B, …; evidence on Y or on any of its descendants creates the dependence between X and Z.

D-Separation
The three semantics combine in the concept of d-separation.
Definition: Two nodes A and B are d-separated (or blocked) if for every path p between A and B there is a node V on the path such that either
1. the connection at V is serial or diverging and V is known, or
2. the connection at V is converging and V and all of its descendants are unknown.
If A and B are not d-separated, they are d-connected.
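The d-separation definition above can be checked mechanically. Below is a short Python sketch, not taken from the lecture, that tests d-separation via the equivalent ancestral moral graph criterion: keep only the ancestors of the query and evidence nodes, marry co-parents and drop edge directions, delete the evidence nodes, and see whether the two query nodes are still connected. The function names and the dictionary encoding of the network are my own.

from itertools import combinations

def ancestral_set(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents.get(n, []))
    return result

def d_separated(parents, a, b, given):
    """True iff a and b are d-separated given the evidence set `given`."""
    keep = ancestral_set(parents, {a, b} | set(given))
    # Build the moral graph over the ancestral set.
    adj = {n: set() for n in keep}
    for n in keep:
        ps = [p for p in parents.get(n, []) if p in keep]
        for p in ps:                        # undirected child-parent edges
            adj[n].add(p); adj[p].add(n)
        for p, q in combinations(ps, 2):    # "marry" co-parents
            adj[p].add(q); adj[q].add(p)
    # Search for a path from a to b that avoids the evidence nodes.
    stack, seen = [a], {a}
    while stack:
        n = stack.pop()
        if n == b:
            return False                    # a surviving path: d-connected
        for m in adj[n] - seen:
            if m not in given:
                seen.add(m); stack.append(m)
    return True                             # no surviving path: d-separated

# Rain/Sprinkler network from the lecture: R -> W, R -> H, S -> H.
parents = {"R": [], "S": [], "W": ["R"], "H": ["R", "S"]}
print(d_separated(parents, "R", "S", set()))   # True: the converging node H is unknown
print(d_separated(parents, "R", "S", {"H"}))   # False: knowing H d-connects R and S
print(d_separated(parents, "W", "H", {"R"}))   # True: the diverging node R is known

The three queries at the end mirror the lecture's examples: R and S start out independent, become dependent once H is observed (explaining away), and W and H are blocked once the diverging node R is known.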
Equivalence of Networks
Two structures are equivalent if they represent the same independence relationships – they encode the same space of probability distributions. For example, the serial structures X → Y → Z and X ← Y ← Z and the diverging structure X ← Y → Z all encode X ⊥ Z | Y, so they are equivalent.
We will return to this when we consider causal vs probabilistic networks.

Bayesian Networks
A Bayesian network (BN) for X = {X1, X2, X3, …, Xn} consists of:
• A network structure S
  – Directed acyclic graph (DAG)
  – Nodes => random variables X
  – Encodes graph independence semantics
• A set of probability distributions P
  – Conditional Probability Distributions (CPDs)
  – Local distributions for X

Example Bayesian Network
Network: R (Rain) → W (Watson Wet), R → H (Holmes Wet), S (Sprinkler) → H.
P(X) = P(R) P(S|R) P(W|R,S) P(H|R,S,W)     (chain rule)
     = P(R) P(S) P(W|R) P(H|S,R)           (simplified using the graph's independencies)

BN Probability Distribution
Only need distributions over nodes and their parents:
P(X) = P(X1, …, Xn) = ∏_{i=1..n} P(Xi | X1, …, Xi−1) = ∏_{i=1..n} P(Xi | pa(Xi))

BNs are Compact Descriptions of P(X)
Independencies allow us to factorize the distribution.
Example – assume 2 states per node, with network X1 → X2, X1 → X3, X2 → X4, X3 → X5:
• Full chain rule: P(X) = P(X1) P(X2|X1) P(X3|X1,X2) P(X4|X1,X2,X3) P(X5|X1,X2,X3,X4)
  ⇒ 2 + 4 + 8 + 16 + 32 = 62 entries
• BN factorization: P(X) = ∏_i P(Xi | pa(Xi)) = P(X1) P(X2|X1) P(X3|X1) P(X4|X2) P(X5|X3)
  ⇒ 2 + 4 + 4 + 4 + 4 = 18 entries

Recall from HMM/CRF Lecture
• HMM (directed graph semantics; factorization over the chain Y1 → Y2 → … with observations Xi):
  P(X, Y) = ∏_{all nodes v} P(v | parents(v)) = ∏_i P(Yi | Yi−1) P(Xi | Yi)
• CRF (Markov random field; potential functions over cliques, conditioned on X):
  P(Y | X) = ∏_{all nodes v} P(v | clique(v), X) = ∏_i P(Yi | Yi−1, X)

CPDs
CPDs may be discrete (tables) or continuous. Example discrete CPD for P(H | R, S):

R  S    H    P(H | R, S)
0  0    1    0.1
0  0    0    0.9
0  1    1    0.8
0  1    0    0.2
1  0    1    0.9
1  0    0    0.1
1  1    1    0.1
1  1    0    0.9

Bayesian Networks for Inference
• Observe values of (evidence on) a set of nodes; we want to predict the state of the other nodes.
• Exact inference
  – Junction Tree Algorithm
• Approximate inference
  – Variational approaches, Monte Carlo sampling
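Before turning to the junction tree algorithm, note that for a network this small, inference can also be done by brute-force enumeration over the joint distribution. The Python sketch below is not the lecture's method; the CPD numbers are hypothetical (they do not reuse the table above) and the function names are my own. It reproduces the explaining-away behaviour of Example 2: observing Holmes' wet grass raises the probability that the sprinkler was on, and additionally observing Watson's wet grass lowers it again.

from itertools import product

# Hypothetical CPDs for the Rain/Sprinkler network (R -> W, R -> H, S -> H).
# Illustrative values only, not the lecture's.
P_R = {1: 0.2, 0: 0.8}              # P(Rain)
P_S = {1: 0.1, 0: 0.9}              # P(Sprinkler)
P_W1 = {1: 0.9, 0: 0.2}             # P(Watson wet = 1 | R)
P_H1 = {(1, 1): 0.95, (1, 0): 0.9,
        (0, 1): 0.8,  (0, 0): 0.1}  # P(Holmes wet = 1 | R, S)

def joint(r, s, w, h):
    """P(R,S,W,H) from the BN factorization P(R) P(S) P(W|R) P(H|R,S)."""
    pw = P_W1[r] if w == 1 else 1 - P_W1[r]
    ph = P_H1[(r, s)] if h == 1 else 1 - P_H1[(r, s)]
    return P_R[r] * P_S[s] * pw * ph

def prob_sprinkler(evidence):
    """P(S=1 | evidence) by enumerating all 16 joint states."""
    num = den = 0.0
    for r, s, w, h in product([0, 1], repeat=4):
        state = {"R": r, "S": s, "W": w, "H": h}
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = joint(r, s, w, h)
        den += p
        if s == 1:
            num += p
    return num / den

print(round(prob_sprinkler({}), 3))                # 0.1   (prior)
print(round(prob_sprinkler({"H": 1}), 3))          # ~0.26 (wet grass raises belief in S)
print(round(prob_sprinkler({"H": 1, "W": 1}), 3))  # ~0.16 (Watson's wet grass explains it away)

Enumeration takes time exponential in the number of nodes, which is why the exact and approximate inference methods listed above (junction trees, variational approaches, sampling) matter for realistic networks.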

