
Bayesian networks
Chapter 14.1–3

Outline
♦ Syntax
♦ Semantics
♦ Parameterized distributions

Bayesian networks
A simple, graphical notation for conditional independence assertions,
and hence for compact specification of full joint distributions.

Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ "directly influences")
– a conditional distribution for each node given its parents:
      P(Xi | Parents(Xi))

In the simplest case, the conditional distribution is represented as a
conditional probability table (CPT) giving the distribution over Xi for
each combination of parent values.

Example
Topology of the network encodes conditional independence assertions:

    [Weather]    [Cavity] -> Toothache,  [Cavity] -> Catch

Weather is independent of the other variables.
Toothache and Catch are conditionally independent given Cavity.

Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor
Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a
burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call

Example contd.
Network: Burglary -> Alarm <- Earthquake;  Alarm -> JohnCalls;  Alarm -> MaryCalls

P(B) = .001        P(E) = .002

P(A | B, E):
    B  E    P(A)
    T  T    .95
    T  F    .94
    F  T    .29
    F  F    .001

P(J | A):           P(M | A):
    A   P(J)            A   P(M)
    T   .90             T   .70
    F   .05             F   .01

Compactness
A CPT for Boolean Xi with k Boolean parents has 2^k rows for the
combinations of parent values.

Each row requires one number p for Xi = true
(the number for Xi = false is just 1 − p).

If each variable has no more than k parents, the complete network
requires O(n · 2^k) numbers, i.e., grows linearly with n,
vs. O(2^n) for the full joint distribution.

For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)

Global semantics
The "global" semantics defines the full joint distribution as the
product of the local conditional distributions:

    P(x1, …, xn) = ∏_{i=1}^{n} P(xi | parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
    = P(j|a) P(m|a) P(a|¬b, ¬e) P(¬b) P(¬e)
    = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
    ≈ 0.00063

Local semantics
Local semantics: each node is conditionally independent of its
nondescendants given its parents.

Theorem: Local semantics ⇔ global semantics

Markov blanket
Each node is conditionally independent of all others given its
Markov blanket: parents + children + children's parents.

Constructing Bayesian networks
Need a method such that a series of locally testable assertions of
conditional independence guarantees the required global semantics.

1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
       add Xi to the network
       select parents from X1, …, Xi−1 such that
       P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)

This choice of parents guarantees the global semantics:

    P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi−1)    (chain rule)
                 = ∏_{i=1}^{n} P(Xi | Parents(Xi))     (by construction)

Example
Suppose we choose the ordering M, J, A, B, E:

    P(J | M) = P(J)?                                No
    P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)?    No
    P(B | A, J, M) = P(B | A)?                      Yes
    P(B | A, J, M) = P(B)?                          No
    P(E | B, A, J, M) = P(E | A)?                   No
    P(E | B, A, J, M) = P(E | A, B)?                Yes

Example contd.
Resulting network: MaryCalls -> JohnCalls; MaryCalls, JohnCalls -> Alarm;
Alarm -> Burglary; Alarm, Burglary -> Earthquake

Deciding conditional independence is hard in noncausal directions.
(Causal models and conditional independence seem hardwired for humans!)
Assessing conditional probabilities is hard in noncausal directions.
Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

Example: Car diagnosis
Initial evidence: car won't start.
Testable variables (green), "broken, so fix it" variables (orange);
hidden variables (gray) ensure sparse structure, reduce parameters.
[Figure: diagnosis network over battery age, alternator broken, fanbelt
broken, battery dead, no charging, battery flat, no oil, no gas, fuel line
blocked, starter broken, lights, oil light, battery meter, gas gauge,
dipstick, car won't start]

Example: Car insurance
[Figure: insurance network over Age, SocioEcon, GoodStudent, ExtraCar,
Mileage, VehicleYear, RiskAversion, SeniorTrain, DrivingSkill, MakeModel,
DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft,
Theft, OwnDamage, PropertyCost, LiabilityCost, MedicalCost, Cushioning,
Ruggedness, Accident, OtherCost, OwnCost]

Compact conditional distributions
A CPT grows exponentially with the number of parents;
a CPT becomes infinite with a continuous-valued parent or child.

Solution: canonical distributions that are defined compactly.

Deterministic nodes are the simplest case:
    X = f(Parents(X)) for some function f
E.g., Boolean functions:
    NorthAmerican ⇔ Canadian ∨ US ∨ Mexican
E.g., numerical relationships among continuous variables:
    ∂Level/∂t = inflow + precipitation − outflow − evaporation

Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes:
1) Parents U1, …, Uk include all causes (can add a leak node)
2) Independent failure probability qi for each cause alone
⇒ P(X | U1, …, Uj, ¬Uj+1, …, ¬Uk) = 1 − ∏_{i=1}^{j} qi

    Cold  Flu  Malaria    P(Fever)   P(¬Fever)
     F     F     F          0.0        1.0
     F     F     T          0.9        0.1
     F     T     F          0.8        0.2
     F     T     T          0.98       0.02  = 0.2 × 0.1
     T     F     F          0.4        0.6
     T     F     T          0.94       0.06  = 0.6 × 0.1
     T     T     F          0.88       0.12  = 0.6 × 0.2
     T     T     T          0.988      0.012 = 0.6 × 0.2 × 0.1

Number of parameters is linear in the number of parents.

Hybrid (discrete + continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost).

Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized
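The global-semantics product for the burglary network can be checked with a short script. This is a minimal sketch, not code from the slides: the CPT dictionaries and the `joint` helper are my own names; the numbers are the slides' CPT entries.

```python
# Burglary-network CPTs as plain dicts (numbers from the slides).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {  # P(A = true | B, E); P(A = false | ...) is 1 - p
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}   # P(J = true | A)
P_M = {True: 0.70, False: 0.01}   # P(M = true | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of local CPT entries."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) from the global-semantics slide:
p = joint(False, False, True, True, True)
print(round(p, 5))   # ≈ 0.00063
```

Representing each CPT by only its "true" column is exactly the compactness point: the "false" entries are recovered as 1 − p.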

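The "Yes" answer to P(B | A, J, M) = P(B | A) in the ordering example (once the alarm state is known, the calls tell us nothing more about burglary) can be verified by brute-force enumeration over the full joint. A sketch under the slides' CPT numbers; `prob_b_given` is a hypothetical helper of my own, not a standard API.

```python
from itertools import product

# Burglary-network CPTs (numbers from the slides).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """Full joint as the product of local CPT entries (global semantics)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

def prob_b_given(**evidence):
    """P(B = true | evidence), summing the joint over unobserved variables."""
    names = ("b", "e", "a", "j", "m")
    num = den = 0.0
    for values in product([False, True], repeat=5):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world["b"]:
            num += p
    return num / den

# Given A, the calls J and M add no further information about B:
p1 = prob_b_given(a=True, j=True, m=True)
p2 = prob_b_given(a=True)
print(abs(p1 - p2) < 1e-9)   # True
```

The equality is exact (up to float noise) because P(j|a) and P(m|a) factor out of both numerator and denominator once A is fixed, which is the d-separation argument in numerical form.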

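The noisy-OR fever table can be regenerated from just three numbers, the per-cause failure probabilities q_Cold = 0.6, q_Flu = 0.2, q_Malaria = 0.1, which is the "parameters linear in the number of parents" point. A minimal sketch:

```python
from itertools import product

# Noisy-OR: independent failure probability q_i for each cause alone
# (numbers from the slides' fever example).
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_no_fever(active_causes):
    """P(¬Fever | causes) = product of q_i over the causes that are true."""
    p = 1.0
    for cause in active_causes:
        p *= q[cause]
    return p

# Rebuild the 2^3-row CPT from the 3 parameters.
for cold, flu, malaria in product([False, True], repeat=3):
    active = [c for c, on in zip(q, (cold, flu, malaria)) if on]
    no_fever = p_no_fever(active)
    print(cold, flu, malaria, round(1 - no_fever, 3), round(no_fever, 3))
```

Running this reproduces the slide's rows, e.g. Cold ∧ Flu ∧ Malaria gives P(¬Fever) = 0.6 × 0.2 × 0.1 = 0.012, hence P(Fever) = 0.988.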
U of M CS 5541 - Bayesian networks