Kansas State University
Department of Computing and Information Sciences
CIS 830: Advanced Topics in Artificial Intelligence

Lecture 21
Uncertain Reasoning Discussion (2 of 4): Learning Bayesian Network Structure

Friday, March 10, 2000
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu

Readings:
“Learning Bayesian Network Structure from Massive Datasets”, Friedman, Nachman, and Pe’er
(Reference) Section 6.11, Mitchell

Lecture Outline
• Suggested Reading: Section 6.11, Mitchell
• Overview of Bayesian Learning (Continued)
• Bayes’s Theorem (Continued)
  – Definition of conditional (posterior) probability
  – Ramifications of Bayes’s Theorem
    • Answering probabilistic queries
    • MAP hypotheses
• Generating Maximum A Posteriori (MAP) Hypotheses
• Generating Maximum Likelihood (ML) Hypotheses
• Later
  – Applications of probability in KDD
    • Learning over text
    • Learning over hypermedia documents
    • General HCII (Yuhui Liu: March 13, 2000)
  – Causality (Yue Jiao: March 17, 2000)

Probability: Basic Definitions and Axioms
• Sample Space (Ω): range of a random variable X
• Probability Measure Pr(•)
  – Ω denotes a range of outcomes; X: Ω
  – Probability P: measure over 2^Ω (the power set of the sample space, aka event space)
  – In a general sense, Pr(X = x ∈ Ω) is a measure of belief in X = x
    • P(X = x) = 0 or P(X = x) = 1: plain (aka categorical) beliefs (can’t be revised)
    • All other beliefs are subject to revision
• Kolmogorov Axioms
  – 1. ∀x ∈ Ω . 0 ≤ P(X = x) ≤ 1
  – 2.
P(Ω) ≡ ∑x ∈ Ω P(X = x) = 1
  – 3. For any countable collection of pairwise disjoint events X1, X2, … (i.e., ∀i, j . i ≠ j ⇒ Xi ∧ Xj = ∅):
       P(∪i=1..∞ Xi) = ∑i=1..∞ P(Xi)
• Joint Probability: P(X1 ∧ X2) ≡ probability of the joint event X1 ∧ X2
• Independence: P(X1 ∧ X2) = P(X1) • P(X2)

Choosing Hypotheses
• Bayes’s Theorem
    P(h | D) = P(D | h) P(h) / P(D) = P(h ∧ D) / P(D)
• MAP Hypothesis
  – Generally want the most probable hypothesis given the training data
  – Define: argmax x ∈ Ω f(x) ≡ the value of x in the sample space Ω with the highest f(x)
  – Maximum a posteriori hypothesis, hMAP:
    hMAP ≡ argmax h ∈ H P(h | D)
         = argmax h ∈ H [P(D | h) P(h) / P(D)]
         = argmax h ∈ H P(D | h) P(h)
• ML Hypothesis
  – Assume P(hi) = P(hj) for all pairs i, j (uniform priors, i.e., P(H) ~ Uniform)
  – Can then simplify further and choose the maximum likelihood hypothesis, hML:
    hML = argmax hi ∈ H P(D | hi)

Bayes’s Theorem: Query Answering (QA)
• Answering User Queries
  – Suppose we want to perform intelligent inferences over a database DB
    • Scenario 1: DB contains records (instances), some “labeled” with answers
    • Scenario 2: DB contains probabilities (annotations) over propositions
  – QA: an application of probabilistic inference
• QA Using Prior and Conditional Probabilities: Example
  – Query: does the patient have cancer or not?
  – Suppose the patient takes a lab test and the result comes back positive
    • Correct + result in only 98% of the cases in which the disease is actually present
    • Correct − result in only 97% of the cases in which the disease is not present
    • Only 0.008 of the entire population has this cancer
  – α ≡ P(false negative for H0 ≡ Cancer) = 0.02 (NB: for 1-point sample)
  – β ≡ P(false positive for H0 ≡ Cancer) = 0.03 (NB: for 1-point sample)
  – P(+ | H0) P(H0) = 0.98 × 0.008 ≈ 0.0078; P(+ | HA) P(HA) = 0.03 × 0.992 ≈ 0.0298 ⇒ hMAP = HA ≡
¬Cancer
  – Summary of the given probabilities:
    P(+ | Cancer) = 0.98      P(− | Cancer) = 0.02
    P(+ | ¬Cancer) = 0.03     P(− | ¬Cancer) = 0.97
    P(Cancer) = 0.008         P(¬Cancer) = 0.992

Basic Formulas for Probabilities
• Product Rule (Alternative Statement of Bayes’s Theorem)
    P(A | B) = P(A ∧ B) / P(B)
  – Proof: requires axiomatic set theory, as does Bayes’s Theorem
• Sum Rule
    P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
  – Sketch of proof (immediate from axiomatic set theory)
    • Draw a Venn diagram of two sets denoting events A and B
    • Let A ∪ B denote the event corresponding to A ∨ B …
• Theorem of Total Probability
  – Suppose events A1, A2, …, An are mutually exclusive and exhaustive
    • Mutually exclusive: i ≠ j ⇒ Ai ∧ Aj = ∅
    • Exhaustive: ∑ P(Ai) = 1
  – Then P(B) = ∑i=1..n P(B | Ai) • P(Ai)
  – Proof: follows from the product rule and the 3rd Kolmogorov axiom

MAP and ML Hypotheses: A Pattern Recognition Framework
• Pattern Recognition Framework
  – Automated speech recognition (ASR), automated image recognition
  – Diagnosis
• Forward Problem: One Step in ML Estimation
  – Given: model h, observations (data) D
  – Estimate: P(D | h), the “probability that the model generated the data”
• Backward Problem: Pattern Recognition / Prediction Step
  – Given: model h, observations D
  – Maximize: P(h(X) = x | h, D) for a new X (i.e., find the best x)
• Forward-Backward (Learning) Problem
  – Given: model space H, data D
  – Find: h ∈ H such that P(h | D) is maximized (i.e., the MAP hypothesis)
• More Info
  – http://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html
  – Emphasis on a particular H (the space of hidden Markov models)
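The cancer QA example above can be sketched in code. This is a minimal illustration, assuming only the numbers given on the slide: the posterior P(Cancer | +) is computed via Bayes’s theorem, with P(+) expanded by the theorem of total probability, and the MAP decision is made by comparing the unnormalized scores P(+ | h) P(h). Function and variable names are illustrative, not from the lecture.

```python
# Posterior computation for the cancer QA example, using the product rule,
# the theorem of total probability, and the MAP decision rule from the slides.

def posterior(prior_h: float, p_pos_given_h: float, p_pos_given_not_h: float) -> float:
    """P(h | +) via Bayes's theorem; denominator P(+) by total probability."""
    p_pos = p_pos_given_h * prior_h + p_pos_given_not_h * (1.0 - prior_h)
    return p_pos_given_h * prior_h / p_pos

p_cancer = 0.008            # prior P(Cancer)
p_pos_given_cancer = 0.98   # P(+ | Cancer): correct + result when disease present
p_pos_given_not = 0.03      # P(+ | ¬Cancer): false-positive rate (β)

p_cancer_given_pos = posterior(p_cancer, p_pos_given_cancer, p_pos_given_not)

# Unnormalized MAP scores, exactly as on the slide:
score_h0 = p_pos_given_cancer * p_cancer       # P(+ | H0) P(H0) ≈ 0.0078
score_ha = p_pos_given_not * (1 - p_cancer)    # P(+ | HA) P(HA) ≈ 0.0298
h_map = "Cancer" if score_h0 > score_ha else "¬Cancer"
```

Note that even though the test is positive, the posterior P(Cancer | +) is only about 0.21, and the MAP hypothesis is ¬Cancer — the point the slide makes with the unnormalized scores.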
Example: Unbiased Coin [1]
• Coin Flip
  – Sample space: Ω = {Head, Tail}
  – Scenario: the given coin is either fair or has a 60% bias in favor of Head
    • h1 ≡ fair coin: P(Head) = 0.5
    • h2 ≡ 60% bias towards Head: P(Head) = 0.6
  – Objective: to decide between the default (null) and alternative hypotheses
• A Priori (aka Prior) Distribution on H
  – P(h1) = 0.75, P(h2) = 0.25
  – Reflects the learning agent’s prior beliefs regarding H
  – Learning is
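The coin-flip setup above can be sketched as a Bayesian update over H = {h1, h2}. This is an illustrative sketch only: the slide gives the priors P(h1) = 0.75, P(h2) = 0.25 and the two head probabilities, but the observation counts used here (8 heads in 10 flips) are assumed for the example, not taken from the lecture.

```python
from math import comb

def update(prior_h1: float, p_head_h1: float, p_head_h2: float,
           heads: int, flips: int) -> tuple:
    """Return posterior (P(h1 | D), P(h2 | D)) for binomial data D,
    via Bayes's theorem with P(D) expanded by total probability."""
    like_h1 = comb(flips, heads) * p_head_h1**heads * (1 - p_head_h1)**(flips - heads)
    like_h2 = comb(flips, heads) * p_head_h2**heads * (1 - p_head_h2)**(flips - heads)
    p_data = like_h1 * prior_h1 + like_h2 * (1 - prior_h1)   # P(D), total probability
    return like_h1 * prior_h1 / p_data, like_h2 * (1 - prior_h1) / p_data

# Priors from the slide: P(h1) = 0.75 (fair), P(h2) = 0.25 (60% heads bias).
# Observation counts below are hypothetical.
post_h1, post_h2 = update(0.75, 0.5, 0.6, heads=8, flips=10)
h_map = "h1 (fair)" if post_h1 > post_h2 else "h2 (biased)"
```

With these assumed counts the prior in favor of the fair coin still narrowly outweighs the evidence for the biased coin, so hMAP remains h1 — a small example of how the prior distribution on H interacts with the likelihood.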