
Kansas State University
Department of Computing and Information Sciences
CIS 830: Advanced Topics in Artificial Intelligence

Lecture 21
Uncertain Reasoning Discussion (1 of 4): The Case for Probability
Monday, March 6, 2000
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: "In Defense of Probability", Cheeseman; (Reference) Sections 6.1-6.5, Mitchell

Lecture Outline
• Suggested Reading: Sections 6.1-6.5, Mitchell
• Overview of Bayesian Learning
  – Framework: using probabilistic criteria to generate hypotheses of all kinds
  – Probability: foundations
• Bayes's Theorem
  – Definition of conditional (posterior) probability
  – Ramifications of Bayes's Theorem
    • Answering probabilistic queries
    • MAP hypotheses
• Generating Maximum A Posteriori (MAP) Hypotheses
• Generating Maximum Likelihood (ML) Hypotheses
• Later
  – Next class: learning Bayesian networks
  – Probabilistic methods for KDD
  – Learning over text, web documents

Bayesian Learning
• Framework: Interpretations of Probability [Cheeseman, 1985]
  – Bayesian subjectivist view
    • A measure of an agent's belief in a proposition
    • Proposition denoted by a random variable (sample space: range)
    • e.g., Pr(Outlook = Sunny) = 0.8
  – Frequentist view: probability is the frequency of observations of an event
  – Logicist view: probability is inferential evidence in favor of a proposition
• Typical Applications
  – HCI: learning natural language; intelligent displays; decision support
  – Approaches: prediction; sensor and data fusion (e.g., bioinformatics)
• Prediction: Examples
  – Measure relevant parameters: temperature, barometric pressure, wind speed
  – Make a statement of the form Pr(Tomorrow's-Weather = Rain) = 0.5
  – College admissions:
Pr(Acceptance) ≡ p
    • Plain beliefs: unconditional acceptance (p = 1) or categorical rejection (p = 0)
    • Conditional beliefs: depends on reviewer (use a probabilistic model)

Two Roles for Bayesian Methods
• Practical Learning Algorithms
  – Naïve Bayes (aka simple Bayes)
  – Bayesian belief network (BBN) structure learning and parameter estimation
  – Combining prior knowledge (prior probabilities) with observed data
    • A way to incorporate background knowledge (BK), aka domain knowledge
    • Requires prior probabilities (e.g., annotated rules)
• Useful Conceptual Framework
  – Provides a "gold standard" for evaluating other learning algorithms
    • Bayes Optimal Classifier (BOC)
    • Stochastic Bayesian learning: Markov chain Monte Carlo (MCMC)
  – Additional insight into Occam's Razor (MDL)

Probabilistic Concepts versus Probabilistic Learning
• Two Distinct Notions: Probabilistic Concepts, Probabilistic Learning
• Probabilistic Concepts
  – Learned concept is a function, c: X → [0, 1]
  – c(x), the target value, denotes the probability that the label 1 (i.e., True) is assigned to x
  – Previous learning theory is applicable (with some extensions)
• Probabilistic (i.e., Bayesian) Learning
  – Use of a probabilistic criterion in selecting a hypothesis h
    • e.g., "most likely" h given observed data D: MAP hypothesis
    • e.g., h for which D is "most likely": maximum likelihood (ML) hypothesis
    • May or may not be stochastic (i.e., the search process might still be deterministic)
  – NB: h can be deterministic (e.g., a Boolean function) or probabilistic
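The distinction above can be sketched in code. This is a minimal illustration, not from the lecture: the Outlook domain and its probability values are assumed for the example.

```python
# A probabilistic concept is a function c: X -> [0, 1].
# The Outlook values and probabilities below are illustrative assumptions.

def c(outlook: str) -> float:
    """Return the probability that the label 1 (True) is assigned to x."""
    table = {"Sunny": 0.8, "Overcast": 0.9, "Rain": 0.3}  # assumed values
    return table[outlook]

# A deterministic (Boolean) concept is the special case where c(x) is
# always exactly 0 or 1; every other value expresses a revisable belief.
print(c("Sunny"))  # 0.8
```

Bayesian learning, by contrast, is about how h is *chosen* (by a probabilistic criterion), regardless of whether h itself is deterministic or probabilistic.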
Probability: Basic Definitions and Axioms
• Sample Space (Ω): Range of a Random Variable X
• Probability Measure Pr(•)
  – Ω denotes a range of "events"; X: Ω
  – Probability Pr, or P, is a measure over Ω
  – In a general sense, Pr(X = x ∈ Ω) is a measure of belief in X = x
    • P(X = x) = 0 or P(X = x) = 1: plain (aka categorical) beliefs (can't be revised)
    • All other beliefs are subject to revision
• Kolmogorov Axioms
  – 1. ∀ x ∈ Ω . 0 ≤ P(X = x) ≤ 1
  – 2. P(Ω) ≡ ∑x ∈ Ω P(X = x) = 1
  – 3. For pairwise disjoint events X1, X2, … (∀ i ≠ j . Xi ∧ Xj = ∅):
       P(X1 ∨ X2 ∨ …) = ∑i=1..∞ P(Xi)
• Joint Probability: P(X1 ∧ X2) ≡ Probability of the Joint Event X1 ∧ X2
• Independence: P(X1 ∧ X2) = P(X1) • P(X2)

Bayes's Theorem
• Theorem: P(h | D) = P(D | h) P(h) / P(D) = P(h ∧ D) / P(D)
• P(h) ≡ Prior Probability of Hypothesis h
  – Measures initial beliefs (BK) before any information is obtained (hence prior)
• P(D) ≡ Prior Probability of Training Data D
  – Measures probability of obtaining sample D
• P(h | D) ≡ Probability of h Given D
  – | denotes conditioning; hence P(h | D) is a conditional (aka posterior) probability
• P(D | h) ≡ Probability of D Given h
  – Measures probability of observing D given that h is correct ("generative" model)
• P(h ∧ D) ≡ Joint Probability of h and D
  – Measures probability of observing D and of h being correct

Choosing Hypotheses
• Bayes's Theorem: P(h | D) = P(D | h) P(h) / P(D)
• MAP Hypothesis
  – Generally want the most probable hypothesis given the training data
  – Define: argmax x ∈ Ω f(x) ≡ the value of x in the sample space Ω with the highest f(x)
  – Maximum a posteriori hypothesis, hMAP:
       hMAP = argmax h ∈ H P(h | D)
            = argmax h ∈ H P(D | h) P(h) / P(D)
            = argmax h ∈ H P(D | h) P(h)
• ML Hypothesis
  – Assume that P(hi) = P(hj) for all pairs i, j (uniform priors, i.e., P(H) ~ Uniform)
  – Can further simplify and choose the maximum likelihood hypothesis, hML:
       hML = argmax hi ∈ H P(D | hi)
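The hMAP and hML definitions above can be computed directly over a small hypothesis space. In this sketch the hypothesis names, priors P(h), and likelihoods P(D | h) are all assumed numbers chosen only to make the two criteria disagree.

```python
# Choosing h_MAP and h_ML over a tiny hypothesis space H = {h1, h2, h3}.
# All prior and likelihood values are illustrative assumptions.

priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}       # P(h)
likelihoods = {"h1": 0.2, "h2": 0.5, "h3": 0.9}  # P(D | h)

# h_MAP = argmax_h P(D | h) P(h); P(D) is the same for every h, so it cancels.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# h_ML = argmax_h P(D | h), i.e., h_MAP under uniform priors.
h_ml = max(likelihoods, key=lambda h: likelihoods[h])

# Full posterior via Bayes's theorem, with P(D) = sum_h P(D | h) P(h).
p_d = sum(likelihoods[h] * priors[h] for h in priors)
posterior = {h: likelihoods[h] * priors[h] / p_d for h in priors}

print(h_map, h_ml)  # they differ here: the prior pulls MAP toward h2
```

Note how the two criteria select different hypotheses: h3 fits the data best, but its low prior lets h2 win under the MAP criterion, which is exactly the role prior knowledge plays in Bayesian learning.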

