Unformatted text preview:

Axioms of Probability Theory Foundations of Artificial Intelligence All probabilities between 0 and 1 0 P A 1 CS101 FALL 2007 True proposition has probability 1 false has probability 0 P true 1 P false 0 The probability of disjunction is P A B P A P B P A B Bayesian Reasoning A Conditional Probability P A B is the probability of A given B Assumes that B is all and only information known Defined by A B B Independence A and B are independent iff P A B P A These two constraints are logically equivalent P B A P B Therefore if A and B are independent P A B A P A B P B A B P A B B P A B P A P B P A B P A P B Joint Distribution Probabilistic Classification The joint probability distribution for a set of random variables X1 Xn gives the probability of every combination of values an ndimensional array with vn values if all variables are discrete with v values all vn values must sum to 1 P X1 Xn negative positive circle square circle square red 0 20 0 02 red 0 05 0 30 blue 0 02 0 01 blue 0 20 0 20 The probability of all possible conjunctions assignments of values to some subset of variables can be calculated by summing the appropriate subset of values from the joint distribution P red circle 0 20 0 05 0 25 P red 0 20 0 02 0 05 0 3 0 57 Therefore all conditional probabilities can also be calculated P positive red circle P E H P H P E P H E P E Assuming Y and all Xi are binary we need 2n entries to specify P Y pos X xk for each of the 2n possible xk s since P Y neg X xk 1 P Y pos X xk Compared to 2n 1 1 entries for the joint distribution P Y X1 X2 Xn Bayesian Categorization Determine category of xk by determining for each yi Simple proof from definition of conditional probability P H E Let Y be the random variable for the class which takes values y1 y2 ym Let X be the random variable describing an instance consisting of a vector of values for n features X1 X2 Xn let xk be a possible value for X and xij a possible value for Xi For classification we need to compute P Y yi X xk for i 1 m However given no other assumptions this requires a table giving the probability of each category for each possible instance in the instance space which is impossible to accurately estimate from a reasonably sized training set P positive red circle 0 20 0 80 P red circle 0 25 Bayes Theorem P H E Def cond prob P H E Def cond prob P H P H E P E H P H P E H QED P H E P E H P H P E P Y yi X xk P Y yi P X xk Y yi P X xk P X xk can be determined since categories are complete and disjoint m P Y y i 1 m i X xk i 1 m P Y yi P X xk Y yi 1 P X xk P X xk P Y yi P X xk Y yi i 1 Bayesian Categorization cont Generative Probabilistic Models Need to know Priors P Y yi Conditionals P X xk Y yi P Y yi are easily estimated from data If ni of the examples in D are in yi then P Y yi ni D Too many possible instances e g 2n for binary features to estimate all P X xk Y yi Still need to make some sort of independence assumptions about the features to make learning tractable Na ve Bayes Generative Model Assume a simple usually unrealistic probabilistic method by which the data was generated For categorization each category has a different parameterized generative model that characterizes that category Training Use the data for each category to estimate the parameters of the generative model for that category Maximum Likelihood Estimation MLE Set parameters to maximize the probability that the model produced the given training data If M denotes a model with parameter values and Dk is the training data for the kth class find model parameters for class k k that maximize the likelihood of Dk k argmax Pto Ddetermine Testing Use Bayesian analysis k M the category model that most likely generated a specific test instance Na ve Bayes Inference Problem lg red circ neg pos pos pos neg pos neg Category neg pos pos pos neg pos neg Category med sm med lg lg lg sm sm med Size red blue red grn red red blue red circ tri tricirc circ circ circ sqr lg sm med med sm lglg sm Color Shape Size Positive red blue grn grn red blue blue grn circ sqr tri circ circ tri sqr sqr tri med sm med lg lg lg sm sm med Color Shape Size Negative red blue red grn red red blue red circ tri tricirc circ circ circ sqr lg sm med med sm lglg sm Color Shape Size Positive red blue grn grn red blue blue grn circ sqr tri circ circ tri sqr sqr tri Color Shape Negative Na ve Bayesian Categorization If we assume features of an instance are independent given the category conditionally independent n P X Y P X 1 X 2 L X n Y P X i Y i 1 Therefore we then only need to know P Xi Y for each possible pair of a feature value and a category If Y and all Xi and binary this requires specifying only 2n parameters P Xi true Y true and P Xi true Y false for each Xi P Xi false Y 1 P Xi true Y Compared to specifying 2n parameters without any independence assumptions Na ve Bayes Example Probability positive negative P Y 0 5 0 5 P small Y 0 4 0 4 P medium Y 0 1 0 2 P large Y 0 5 0 4 P red Y 0 9 0 3 P blue Y 0 05 0 3 P green Y 0 05 0 4 P square Y 0 05 0 4 P triangle Y 0 05 0 3 P circle Y 0 9 0 3 Na ve Bayes Example Probability positive negative P Y 0 5 0 5 P medium Y 0 1 0 2 P red Y 0 9 0 3 P circle Y 0 9 0 3 Test Instance medium red circle P negative X P negative P medium negative P red negative P circle negative P X 0 5 0 2 0 3 0 3 0 009 P X 0 009 0 0495 0 1818 P positive X P negative X 0 0405 P X 0 009 P X 1 P X 0 0405 0 009 0 0495 Estimating Probabilities P positive X P positive P medium positive P red positive P …


View Full Document

Brandeis CS 101A - Bayesian Reasoning

Loading Unlocking...
Login

Join to view Bayesian Reasoning and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Bayesian Reasoning and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?