Pitt CS 2750 - Bayesian belief networks

CS 2750 Machine Learning
Lecture 14: Bayesian belief networks
Milos Hauskrecht
[email protected]
Sennott Square

Density estimation

Data: D = {D_1, D_2, ..., D_n}, where D_i = x_i is a vector of attribute values.
Attributes:
• modeled by random variables X = {X_1, X_2, ..., X_d} with:
  – continuous values
  – discrete values
  E.g. blood pressure with numerical values, or chest pain with discrete values [no-pain, mild, moderate, strong].
Underlying true probability distribution: p(X).

Density estimation

Data: D = {D_1, D_2, ..., D_n}, where D_i = x_i is a vector of attribute values.
Objective: estimate the underlying true probability distribution p(X) over the variables X using the examples in D:

  n samples D = {D_1, D_2, ..., D_n} drawn from the true distribution p(X)  →  estimate p̂(X)

Standard (iid) assumptions: the samples
• are independent of each other
• come from the same (identical) distribution (a fixed p(X))

Learning via parameter estimation

In this lecture we consider parametric density estimation.
Basic settings:
• a set of random variables X = {X_1, X_2, ..., X_d}
• a model of the distribution over the variables in X with parameters Θ:  p̂(X | Θ)
• data D = {D_1, D_2, ..., D_n}
Objective: find the parameters Θ that best explain the observed data.

Parameter estimation

• Maximum likelihood (ML)
  – maximize p(D | Θ, ξ)
  – yields: one set of parameters Θ_ML
  – the target distribution is approximated as:  p̂(X) = p(X | Θ_ML)
• Bayesian parameter estimation
  – uses the posterior distribution over possible parameters:
      p(Θ | D, ξ) = p(D | Θ, ξ) p(Θ | ξ) / p(D | ξ)
  – yields: all possible settings of Θ (and their "weights")
  – the target distribution is approximated as:
      p̂(X) = p(X | D) = ∫ p(X | Θ) p(Θ | D, ξ) dΘ

Parameter estimation (cont.)

Other possible criteria:
• Maximum a posteriori probability (MAP)
  – maximize p(Θ | D, ξ)  (mode of the posterior)
  – yields: one set of parameters Θ_MAP
  – approximation:  p̂(X) = p(X | Θ_MAP)
• Expected value of the parameter
  – Θ̂ = E(Θ), with the expectation taken with respect to the posterior p(Θ | D, ξ)  (mean of the posterior)
  – yields: one set of parameters
  – approximation:  p̂(X) = p(X | Θ̂)

Density estimation

• So far we have covered density estimation for "simple" distribution models:
  – Bernoulli
  – Binomial
  – Multinomial
  – Gaussian
  – Poisson
But what if:
• the dimension of X = {X_1, X_2, ..., X_d} is large
  – example: patient data
• compact parametric distributions do not seem to fit the data
  – e.g., a multivariate Gaussian may not fit
• we have only a "small" number of examples, too few for accurate parameter estimates

How to learn complex distributions

How do we learn a complex multivariate distribution p̂(X) over a large number of variables?
One solution:
• decompose the distribution along conditional independence relations
• decompose the parameter estimation problem into a set of smaller parameter estimation tasks
Decomposition of distributions under conditional independence assumptions is the main idea behind Bayesian belief networks.

Example

Problem description:
• Disease: pneumonia
• Patient symptoms (findings, lab tests):
  – fever, cough, paleness, WBC (white blood cell) count, chest pain, etc.
Representation of a patient case:
• the symptoms and the disease are represented as random variables
Our objectives:
• describe a multivariate distribution representing the relations between the symptoms and the disease
• design inference and learning procedures for this multivariate model
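The slides turn to joint distributions next, but first a brief sketch of the estimation criteria above. This is an illustrative addition of ours, not lecture material: a Bernoulli (coin-flip) model with an assumed Beta(2, 2) prior, a conjugate pair for which the ML estimate, the MAP estimate, and the posterior mean all have closed forms.

```python
# A minimal sketch (not from the lecture): the estimation criteria above on a
# Bernoulli model with a conjugate Beta(a, b) prior. The prior and the toy
# data are assumptions made for illustration.
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])    # 10 coin flips, 7 ones
n1 = int(data.sum())                                # count of ones
n0 = len(data) - n1                                 # count of zeros
a, b = 2.0, 2.0                                     # assumed prior hyperparameters

# ML: maximize p(D | theta); for Bernoulli this is the empirical frequency.
theta_ml = n1 / (n1 + n0)                           # 0.70

# By conjugacy the posterior is Beta(n1 + a, n0 + b).
# MAP: mode of the posterior.
theta_map = (n1 + a - 1) / (n1 + n0 + a + b - 2)    # 8/12 ~ 0.67

# Expected value of the parameter: mean of the posterior.
theta_mean = (n1 + a) / (n1 + n0 + a + b)           # 9/14 ~ 0.64

# Full Bayesian prediction integrates theta out against the posterior;
# for the Beta-Bernoulli pair that integral collapses to the posterior mean.
p_next_one = theta_mean

print(theta_ml, theta_map, theta_mean)
```

The prior pulls the MAP estimate and the posterior mean toward 1/2; as the sample grows, all three estimates converge to the same value.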
Joint probability distribution

Joint probability distribution (for a set of variables):
• defines probabilities for all possible assignments of values to the variables in the set

Example: P(pneumonia, WBCcount) is a 2 × 3 table:

                         WBCcount
  Pneumonia      high      normal     low
  True           0.0008    0.0001     0.0001
  False          0.0042    0.9929     0.0019

Marginalization (summing rows, or columns) = summing out variables:

  P(Pneumonia):  True 0.001,  False 0.999
  P(WBCcount):   high 0.005,  normal 0.993,  low 0.002

Variable independence

• The joint distribution over a subset of variables can always be computed from the full joint distribution through marginalization (as in the P(Pneumonia) and P(WBCcount) tables above).
• Not the other way around!!!
  – Only exception: when the variables are independent:  P(A, B) = P(A) P(B)

Conditional probability

Conditional probability P(A | B):
• the probability of A given B
• defined in terms of joint probabilities:
    P(A | B) = P(A, B) / P(B)
• Joint probabilities can be expressed in terms of conditional probabilities:
    P(A, B) = P(A | B) P(B)                                        (product rule)
    P(X_1, X_2, ..., X_n) = ∏_i P(X_i | X_1, ..., X_{i-1})          (chain rule)
• Conditional probability is useful for various probabilistic inferences, e.g.
    P(Pneumonia = True | Fever = True, WBCcount = high, Cough = True)

Modeling uncertainty with probabilities

• Full joint distribution:
  – the joint distribution over all random variables that define the domain
  – it is sufficient to do any type of probabilistic inference

Inference

Any query can be computed from the full joint distribution!!!
• The joint over a subset of variables is obtained through marginalization:
    P(A = a, C = c) = Σ_i Σ_j P(A = a, B = b_i, C = c, D = d_j)
• A conditional probability over a set of variables, given other variables' values, is obtained through marginalization and the definition of conditionals:
    P(D = d | A = a, C = c) = P(A = a, C = c, D = d) / P(A = a, C = c)
  where
    P(A = a, C = c, D = d) = Σ_i P(A = a, B = b_i, C = c, D = d)

Inference (cont.)

Any query can be computed from the full joint distribution!!!
• Any joint probability can be expressed as a product of conditionals via the chain rule:
    P(X_1, X_2, ..., X_n) = P(X_n | X_1, ..., X_{n-1}) P(X_1, ..., X_{n-1})
                          = P(X_n | X_1, ..., X_{n-1}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_1, ..., X_{n-2})
                          = ∏_i P(X_i | X_1, ..., X_{i-1})
• It is often easier to define the distribution in terms of conditional probabilities, e.g.
    P(Fever | Pneumonia = T),  P(Fever | Pneumonia = F)

Modeling uncertainty with probabilities

• Full joint distribution: the joint distribution over all random variables defining the domain
  – it is sufficient to represent the complete domain and to do any type of probabilistic inference
Problems:
• Space complexity. Storing the full joint distribution requires remembering on the order of d^n numbers, where n is the number of random variables and d is the number of values per variable.
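To put rough numbers on this storage problem, here is a small sketch of ours, not from the slides. The d^n count for the full joint follows the statement above; the bound of n(d-1)d^k conditional-probability entries for a factored model in which each variable has at most k parents is the standard motivation for Bayesian belief networks and is assumed here.

```python
# Illustrative sketch: parameters needed by a full joint distribution versus
# an upper bound for a factored (belief-network-style) model. The k-parent
# bound is a standard result assumed here, not a figure from this lecture.

def full_joint_params(n: int, d: int = 2) -> int:
    """Free parameters of a full joint over n variables with d values each."""
    return d ** n - 1

def factored_params_bound(n: int, k: int, d: int = 2) -> int:
    """At most n tables, each with d**k parent configurations and
    d - 1 free probabilities per configuration."""
    return n * (d - 1) * d ** k

for n in (10, 20, 30):
    print(n, full_joint_params(n), factored_params_bound(n, k=3))
    # e.g. n = 30 binary variables: 1_073_741_823 numbers vs. at most 240
```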

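The marginalization and conditional-probability inferences from the earlier slides can also be checked directly on the P(pneumonia, WBCcount) table. The sketch below uses the table's own numbers; the array layout and variable names are our choices.

```python
# Inference from a full joint distribution, using the P(pneumonia, WBCcount)
# table from the slides. Rows: Pneumonia = True, False;
# columns: WBCcount = high, normal, low.
import numpy as np

joint = np.array([[0.0008, 0.0001, 0.0001],
                  [0.0042, 0.9929, 0.0019]])

# Marginalization: summing out one variable (summing rows or columns).
p_pneumonia = joint.sum(axis=1)    # [0.001, 0.999], matching P(Pneumonia)
p_wbc = joint.sum(axis=0)          # [0.005, 0.993, 0.002], matching P(WBCcount)

# Conditional probability via the definition P(A | B) = P(A, B) / P(B):
# P(Pneumonia = True | WBCcount = high) = 0.0008 / 0.005 = 0.16
p_true_given_high = joint[0, 0] / p_wbc[0]

print(p_pneumonia, p_wbc, p_true_given_high)
```

Note how the marginal P(Pneumonia = True) = 0.001 rises to 0.16 once a high WBC count is observed, exactly the kind of inference the full joint supports.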
