UT CS 343 - Need for Probabilistic Reasoning - D1711375


School name University of Texas at Austin

Course CS 343 - Artificial Intelligence

Pages 6

This preview shows page 1-2 out of 6 pages.


CS 343: Artificial Intelligence
Probabilistic Reasoning and Naïve Bayes
Raymond J. Mooney
University of Texas at Austin

Need for Probabilistic Reasoning
• Most everyday reasoning is based on uncertain evidence and inferences.
• Classical logic, which only allows conclusions to be strictly true or strictly false, does not account for this uncertainty or the need to weigh and combine conflicting evidence.
• Straightforward application of probability theory is impractical since the large number of probability parameters required are rarely, if ever, available.
• Therefore, early expert systems employed fairly ad hoc methods for reasoning under uncertainty and for combining evidence.
• Recently, methods more rigorously founded in probability theory that attempt to decrease the amount of conditional probabilities required have flourished.

Axioms of Probability Theory
• All probabilities between 0 and 1:
$$0 \le P(A) \le 1$$
• True proposition has probability 1, false has probability 0: $P(\text{true}) = 1$, $P(\text{false}) = 0$.
• The probability of disjunction is:
$$P(A \lor B) = P(A) + P(B) - P(A \land B)$$
[Figure: regions labeled A, B, and A ∧ B]

Conditional Probability
• $P(A \mid B)$ is the probability of A given B
• Assumes that B is all and only information known.
• Defined by:
$$P(A \mid B) = \frac{P(A \land B)}{P(B)}$$
[Figure: regions labeled A, B, and A ∧ B]

Independence
• A and B are independent iff:
$$P(A \mid B) = P(A)$$
$$P(B \mid A) = P(B)$$
These two constraints are logically equivalent.
• Therefore, if A and B are independent:
$$P(A \mid B) = \frac{P(A \land B)}{P(B)} = P(A)$$
$$P(A \land B) = P(A)\,P(B)$$

Classification (Categorization)
• Given:
– A description of an instance, $x \in X$, where $X$ is the instance language or instance space.
– A fixed set of categories: $C = \{c_1, c_2, \ldots, c_n\}$
• Determine:
– The category of $x$: $c(x) \in C$, where $c(x)$ is a categorization function whose domain is $X$ and whose range is $C$.
– If $c(x)$ is a binary function $C = \{0, 1\}$ ({true, false}, {positive, negative}) then it is called a concept.

Learning for Categorization
• A training example is an instance $x \in X$, paired with its correct category $c(x)$: $\langle x, c(x) \rangle$ for an unknown categorization function, $c$.
• Given a set of training examples, $D$.
• Find a hypothesized categorization function, $h(x)$, such that (consistency):
$$\forall \langle x, c(x) \rangle \in D : h(x) = c(x)$$

Sample Category Learning Problem
• Instance language: <size, color, shape>
– size ∈ {small, medium, large}
– color ∈ {red, blue, green}
– shape ∈ {square, circle, triangle}
• C = {positive, negative}
• D:

Example   Size    Color   Shape      Category
1         small   red     circle     positive
2         large   red     circle     positive
3         small   red     triangle   negative
4         large   blue    circle     negative

Joint Distribution
• The joint probability distribution for a set of random variables, $X_1, \ldots, X_n$, gives the probability of every combination of values (an n-dimensional array with $v^n$ values if all variables are discrete with $v$ values, all $v^n$ values must sum to 1): $P(X_1, \ldots, X_n)$
• The probability of all possible conjunctions (assignments of values to some subset of variables) can be calculated by summing the appropriate subset of values from the joint distribution.
• Therefore, all conditional probabilities can also be calculated.

positive:
          circle   square
red       0.20     0.02
blue      0.02     0.01

negative:
          circle   square
red       0.05     0.30
blue      0.20     0.20

$$P(\text{red} \land \text{circle}) = 0.20 + 0.05 = 0.25$$
$$P(\text{positive} \mid \text{red} \land \text{circle}) = \frac{P(\text{positive} \land \text{red} \land \text{circle})}{P(\text{red} \land \text{circle})} = \frac{0.20}{0.25} = 0.80$$
$$P(\text{red}) = 0.20 + 0.02 + 0.05 + 0.30 = 0.57$$

Probabilistic Classification
• Let $Y$ be the random variable for the class which takes values $\{y_1, y_2, \ldots, y_m\}$.
• Let $X$ be the random variable describing an instance consisting of a vector of values for n features $\langle X_1, X_2, \ldots, X_n \rangle$, let $x_k$ be a possible value for $X$ and $x_{ij}$ a possible value for $X_i$.
• For classification, we need to compute $P(Y = y_i \mid X = x_k)$ for $i = 1, \ldots, m$
• However, given no other assumptions, this requires a table giving the probability of each category for each possible instance in the instance space, which is impossible to accurately estimate from a reasonably-sized training set.
– Assuming $Y$ and all $X_i$ are binary, we need $2^n$ entries to specify $P(Y = \text{pos} \mid X = x_k)$ for each of the $2^n$ possible $x_k$'s since $P(Y = \text{neg} \mid X = x_k) = 1 - P(Y = \text{pos} \mid X = x_k)$
– Compared to $2^{n+1} - 1$ entries for the joint distribution $P(Y, X_1, X_2, \ldots, X_n)$

Bayes Theorem
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$
Simple proof from definition of conditional probability:
$$P(H \mid E) = \frac{P(H \land E)}{P(E)} \quad \text{(Def. cond. prob.)}$$
$$P(E \mid H) = \frac{P(H \land E)}{P(H)} \quad \text{(Def. cond. prob.)}$$
$$P(H \land E) = P(E \mid H)\,P(H)$$
QED:
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

Bayesian Categorization
• Determine category of $x_k$ by determining for each $y_i$:
$$P(Y = y_i \mid X = x_k) = \frac{P(Y = y_i)\,P(X = x_k \mid Y = y_i)}{P(X = x_k)}$$
• $P(X = x_k)$ can be determined since categories are complete and disjoint.
$$\sum_{i=1}^{m} P(Y = y_i \mid X = x_k) = \sum_{i=1}^{m} \frac{P(Y = y_i)\,P(X = x_k \mid Y = y_i)}{P(X = x_k)} = 1$$
$$P(X = x_k) = \sum_{i=1}^{m} P(Y = y_i)\,P(X = x_k \mid Y = y_i)$$

Bayesian Categorization (cont.)
• Need to know:
– Priors: $P(Y = y_i)$
– Conditionals: $P(X = x_k \mid Y = y_i)$
• $P(Y = y_i)$ are easily estimated from data.
– If $n_i$ of the examples in $D$ are in $y_i$ then $P(Y = y_i) = n_i / |D|$
• Too many possible instances (e.g. $2^n$ for binary features) to estimate all $P(X = x_k \mid Y = y_i)$.
• Still need to make some sort of independence assumptions about the features to make learning tractable.

Generative Probabilistic Models
• Assume a simple (usually unrealistic) probabilistic method by which the data was generated.
• For categorization, each category has a different parameterized generative model that characterizes that category.
• Training: Use the data for each category to estimate the parameters of the generative model for that category.
– Maximum Likelihood Estimation (MLE): Set parameters to maximize the probability that the model produced the given training data.
– If $M_\lambda$ denotes a model with parameter values $\lambda$ and $D_k$ is the training data for the kth class, find model parameters for class k ($\lambda_k$) that maximize the likelihood of $D_k$:
$$\lambda_k = \operatorname*{argmax}_{\lambda} P(D_k \mid M_\lambda)$$
• Testing: Use Bayesian analysis to determine the category model that most likely generated a specific test instance.

Naïve Bayes Generative Model
[Figure: Category (pos, neg) with separate Size (sm, med, lg), Color (red, blue, grn), and Shape (circ, sqr, tri) value collections for the Positive and Negative categories]

Naïve Bayes Inference Problem
[Figure: the same diagram, with test instance <lg, red, circ> and its category marked ??]

Naïve Bayesian
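The three worked values on the Joint Distribution slide can be checked by summing entries of the two tables. The following is a minimal sketch: the dictionary layout and the helper names `prob` and `cond_prob` are illustrative assumptions, not part of the slides.

```python
# Joint distribution P(category, color, shape), copied from the slide's two tables.
joint = {
    ("positive", "red", "circle"): 0.20,
    ("positive", "red", "square"): 0.02,
    ("positive", "blue", "circle"): 0.02,
    ("positive", "blue", "square"): 0.01,
    ("negative", "red", "circle"): 0.05,
    ("negative", "red", "square"): 0.30,
    ("negative", "blue", "circle"): 0.20,
    ("negative", "blue", "square"): 0.20,
}


def prob(category=None, color=None, shape=None):
    """Probability of a conjunction: sum the matching joint entries (None matches any value)."""
    query = (category, color, shape)
    return sum(
        p
        for key, p in joint.items()
        if all(q is None or q == k for q, k in zip(query, key))
    )


def cond_prob(target, given):
    """P(target | given) = P(target and given) / P(given); both arguments are keyword dicts."""
    return prob(**{**given, **target}) / prob(**given)


p_red_circle = prob(color="red", shape="circle")
p_pos_given = cond_prob({"category": "positive"}, {"color": "red", "shape": "circle"})
p_red = prob(color="red")
print(round(p_red_circle, 2), round(p_pos_given, 2), round(p_red, 2))
```

Any other conjunction or conditional over category, color, and shape can be queried the same way, which is the slide's point that the full joint table determines every such probability.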
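The Bayesian Categorization slides can be traced on the four-example training set D from the Sample Category Learning Problem. This sketch estimates priors as n_i / |D| and, as an assumption made only for illustration, estimates the whole-instance conditionals P(X = x_k | Y = y_i) by direct counting with no independence assumption. The function names are hypothetical.

```python
from collections import Counter

# Training set D from the sample problem: ((size, color, shape), category).
D = [
    (("small", "red", "circle"), "positive"),
    (("large", "red", "circle"), "positive"),
    (("small", "red", "triangle"), "negative"),
    (("large", "blue", "circle"), "negative"),
]

class_counts = Counter(y for _, y in D)  # n_i for each category
pair_counts = Counter(D)                 # counts of (instance, category) pairs


def prior(y):
    """P(Y = y) = n_y / |D|."""
    return class_counts[y] / len(D)


def likelihood(x, y):
    """P(X = x | Y = y), estimated as the fraction of class-y examples equal to x."""
    return pair_counts[(x, y)] / class_counts[y]


def posterior(x):
    """P(Y = y | X = x) for every category, or None when the estimated P(X = x) is 0."""
    scores = {y: prior(y) * likelihood(x, y) for y in class_counts}
    evidence = sum(scores.values())  # P(X = x) = sum_i P(Y = y_i) P(X = x | Y = y_i)
    if evidence == 0:
        return None
    return {y: s / evidence for y, s in scores.items()}


print(posterior(("small", "red", "circle")))
print(posterior(("medium", "blue", "square")))
```

The second query returns `None` because that instance never occurs in D, which illustrates why the slides conclude that some independence assumption about the features is needed to make learning tractable.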
