Lecture 21
Uncertain Reasoning Discussion (2 of 4): Learning Bayesian Network Structure

Kansas State University
Department of Computing and Information Sciences
CIS 830: Advanced Topics in Artificial Intelligence
Friday, March 10, 2000
William H. Hsu
http://www.cis.ksu.edu/~bhsu

Readings:
"Learning Bayesian Network Structure from Massive Datasets", Friedman, Nachman, and Pe'er
(Reference) Section 6.11, Mitchell


Lecture Outline

• Suggested Reading: Section 6.11, Mitchell
• Overview of Bayesian Learning (Continued)
• Bayes's Theorem (Continued)
  – Definition of conditional (posterior) probability
  – Ramifications of Bayes's Theorem
    • Answering probabilistic queries
    • MAP hypotheses
• Generating Maximum A Posteriori (MAP) Hypotheses
• Generating Maximum Likelihood (ML) Hypotheses
• Later
  – Applications of probability in KDD
    • Learning over text
    • Learning over hypermedia documents
    • General HCII (Yuhui Liu: March 13, 2000)
  – Causality (Yue Jiao: March 17, 2000)


Probability: Basic Definitions and Axioms

• Sample Space (Ω): range of a random variable X
• Probability Measure Pr(·)
  – Ω denotes the range of outcomes of X
  – Probability P: measure over 2^Ω (the power set of the sample space, aka the event space)
  – In a general sense, Pr(X = x) is a measure of belief in X = x
    • P(X = x) = 0 or P(X = x) = 1: plain (aka categorical) beliefs (can't be revised)
    • All other beliefs are subject to revision
• Kolmogorov Axioms
  – 1. ∀ x ∈ Ω . 0 ≤ P(X = x) ≤ 1
  – 2. P(Ω) ≡ Σ_{x ∈ Ω} P(X = x) = 1
  – 3. Countable additivity: for pairwise disjoint events X₁, X₂, … (∀ i ≠ j . Xᵢ ∩ Xⱼ = ∅),
       P(⋃_{i=1}^∞ Xᵢ) = Σ_{i=1}^∞ P(Xᵢ)
• Joint Probability: P(X₁ ∧ X₂) ≡ probability of the joint event X₁ ∧ X₂
• Independence: P(X₁ ∧ X₂) = P(X₁) · P(X₂)
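The axioms and the independence definition above can be checked mechanically on any finite distribution. Below is a minimal Python sketch (the joint-table values are illustrative, not from the lecture): it verifies axioms 1 and 2, computes marginals by summing over a disjoint partition (an instance of axiom 3), and tests whether P(X₁ ∧ X₂) = P(X₁) · P(X₂) holds for every assignment.

```python
from itertools import product

# Toy joint distribution over two binary random variables X1 and X2
# (values chosen for illustration only): P(X1 = x1, X2 = x2).
joint = {
    (0, 0): 0.42, (0, 1): 0.28,
    (1, 0): 0.18, (1, 1): 0.12,
}

# Axiom 1: every probability lies in [0, 1].
assert all(0.0 <= p <= 1.0 for p in joint.values())

# Axiom 2: probabilities over the whole sample space sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Marginals, obtained by summing the joint over the other variable;
# the summands are disjoint events, so this uses the third axiom.
p_x1 = {v: sum(p for (x1, _), p in joint.items() if x1 == v) for v in (0, 1)}
p_x2 = {v: sum(p for (_, x2), p in joint.items() if x2 == v) for v in (0, 1)}

# Independence: P(X1 ∧ X2) = P(X1) · P(X2) for every joint assignment.
independent = all(
    abs(joint[(a, b)] - p_x1[a] * p_x2[b]) < 1e-9
    for a, b in product((0, 1), repeat=2)
)
print("X1, X2 independent?", independent)  # True for the table above
```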
Choosing Hypotheses

• Bayes's Theorem
  P(h | D) = P(D | h) P(h) / P(D) = P(h ∧ D) / P(D)
• MAP Hypothesis
  – Generally want the most probable hypothesis given the training data
  – Define argmax_{x ∈ Ω} f(x): the value of x in the sample space with the highest f(x)
  – Maximum a posteriori hypothesis, h_MAP:
    h_MAP ≡ argmax_{h ∈ H} P(h | D)
          = argmax_{h ∈ H} P(D | h) P(h) / P(D)
          = argmax_{h ∈ H} P(D | h) P(h)
• ML Hypothesis
  – Assume P(hᵢ) = P(hⱼ) for all pairs i, j (uniform priors, i.e., P_H ~ Uniform)
  – Can then further simplify and choose the maximum likelihood hypothesis, h_ML:
    h_ML ≡ argmax_{hᵢ ∈ H} P(D | hᵢ)


Bayes's Theorem: Query Answering (QA)

• Answering User Queries
  – Suppose we want to perform intelligent inferences over a database DB
    • Scenario 1: DB contains records (instances), some "labeled" with answers
    • Scenario 2: DB contains probabilities (annotations) over propositions
  – QA: an application of probabilistic inference
• QA Using Prior and Conditional Probabilities: Example
  – Query: Does the patient have cancer or not?
  – Suppose the patient takes a lab test and the result comes back positive
    • Correct + result in only 98% of the cases in which the disease is actually present: P(+ | Cancer) = 0.98, P(− | Cancer) = 0.02
    • Correct − result in only 97% of the cases in which the disease is not present: P(− | ¬Cancer) = 0.97, P(+ | ¬Cancer) = 0.03
    • Only 0.008 of the entire population has this cancer: P(Cancer) = 0.008, P(¬Cancer) = 0.992
  – P(false negative for H₀ ≡ Cancer) = 0.02 (NB: for 1-point sample)
  – P(false positive for H₀ ≡ Cancer) = 0.03 (NB: for 1-point sample)
  – P(+ | H₀) P(H₀) = 0.98 × 0.008 ≈ 0.0078; P(+ | H_A) P(H_A) = 0.03 × 0.992 ≈ 0.0298
  – h_MAP = H_A ≡ ¬Cancer


Basic Formulas for Probabilities

• Product Rule (Alternative Statement of Bayes's Theorem)
  P(A | B) = P(A ∧ B) / P(B)
  – Proof: requires axiomatic set theory, as does Bayes's Theorem
• Sum Rule
  P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
  – Sketch of proof (immediate from axiomatic set theory)
    • Draw a Venn diagram of two sets denoting events A and B
    • Let A ∨ B denote the event corresponding to A ∪ B …
• Theorem of Total Probability
  – Suppose events A₁, A₂, …, Aₙ are mutually exclusive and exhaustive
    • Mutually exclusive: i ≠ j ⟹ Aᵢ ∩ Aⱼ = ∅
    • Exhaustive: Σᵢ P(Aᵢ) = 1
  – Then P(B) = Σ_{i=1}^{n} P(B | Aᵢ) P(Aᵢ)
  – Proof: follows from the product rule and the 3rd Kolmogorov axiom


MAP and ML Hypotheses: A Pattern Recognition Framework

• Pattern Recognition Framework
  – Automated speech recognition (ASR), automated image recognition
  – Diagnosis
• Forward Problem: One Step in ML Estimation
  – Given: model h, observations (data) D
  – Estimate: P(D | h), the "probability that the model generated the data"
• Backward Problem: Pattern Recognition / Prediction Step
  – Given: model h, observations D
  – Maximize: P(h(X) = x | h, D) for a new X (i.e., find the best x)
• Forward-Backward (Learning) Problem
  – Given: model space H, data D
  – Find: h ∈ H such that P(h | D) is maximized (i.e., the MAP hypothesis)
• More Info
  – http://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html
  – Emphasis on a particular H (the space of hidden Markov models)
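To make the lab-test example above concrete, here is a minimal Python sketch of the MAP computation using exactly the probabilities from the QA slide; the normalization step is an instance of the theorem of total probability with the partition A₁ = Cancer, A₂ = ¬Cancer.

```python
# MAP inference for the cancer lab-test query, given a positive result (+).
# Hypotheses: H0 = Cancer, HA = NoCancer (i.e., ¬Cancer).
priors = {"Cancer": 0.008, "NoCancer": 0.992}   # P(h)
likelihood_pos = {"Cancer": 0.98, "NoCancer": 0.03}  # P(+ | h)

# Unnormalized posteriors P(+ | h) P(h); P(+) cancels in the argmax.
scores = {h: likelihood_pos[h] * priors[h] for h in priors}
h_map = max(scores, key=scores.get)
print(scores)            # {'Cancer': 0.00784, 'NoCancer': 0.02976}
print("h_MAP =", h_map)  # NoCancer

# Normalizing by P(+) = Σ_i P(+ | A_i) P(A_i) (total probability)
# gives the true posterior.
p_pos = sum(scores.values())
posterior_cancer = scores["Cancer"] / p_pos
print(f"P(Cancer | +) = {posterior_cancer:.3f}")  # ≈ 0.209
```

Note that although h_MAP = ¬Cancer, the posterior P(Cancer | +) ≈ 0.21 is far above the 0.008 prior: the positive test is informative even though it does not by itself settle the diagnosis.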
Bayesian Learning Example: Unbiased Coin [1]

• Coin Flip
  – Sample space: Ω = {Head, Tail}
  – Scenario: the given coin is either fair or has a 60% bias in favor of Head
    • h₁ ≡ fair coin: P(Head) = 0.5
    • h₂ ≡ biased coin: P(Head) = 0.6
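The slide is cut off in this extract. As a sketch of where the example leads, the following Python snippet picks the maximum likelihood hypothesis between the two coins from observed flips; the data (58 heads in 100 flips) is invented for illustration and is not from the lecture.

```python
import math

# The two candidate hypotheses from the slide: fair vs. 60%-biased coin.
hypotheses = {"h1_fair": 0.5, "h2_biased": 0.6}

# Hypothetical observations: 100 i.i.d. flips, 58 heads.
n_flips, n_heads = 100, 58

def log_likelihood(p_head: float) -> float:
    # log P(D | h) for i.i.d. Bernoulli flips; the binomial coefficient
    # is the same for every h, so it cancels in the argmax.
    return (n_heads * math.log(p_head)
            + (n_flips - n_heads) * math.log(1.0 - p_head))

# Under uniform priors (as on the Choosing Hypotheses slide),
# h_MAP reduces to h_ML = argmax_h P(D | h).
h_ml = max(hypotheses, key=lambda h: log_likelihood(hypotheses[h]))
print("h_ML =", h_ml)  # h2_biased: 58/100 heads favors P(Head) = 0.6
```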