K-State CIS 830 - Uncertain Reasoning Discussion (2 of 4): Learning Bayesian Network Structure


Kansas State University, Department of Computing and Information Sciences
CIS 830: Advanced Topics in Artificial Intelligence
Lecture 21
Uncertain Reasoning Discussion (2 of 4): Learning Bayesian Network Structure
Friday, March 10, 2000
William H. Hsu, Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu

Readings:
• “Learning Bayesian Network Structure from Massive Datasets”, Friedman, Nachman, and Pe’er
• (Reference) Section 6.11, Mitchell

Lecture Outline
• Suggested Reading: Section 6.11, Mitchell
• Overview of Bayesian Learning (Continued)
• Bayes’s Theorem (Continued)
  – Definition of conditional (posterior) probability
  – Ramifications of Bayes’s Theorem
    • Answering probabilistic queries
    • MAP hypotheses
• Generating Maximum A Posteriori (MAP) Hypotheses
• Generating Maximum Likelihood Hypotheses
• Later
  – Applications of probability in KDD
    • Learning over text
    • Learning over hypermedia documents
    • General HCII (Yuhui Liu: March 13, 2000)
  – Causality (Yue Jiao: March 17, 2000)

Probability: Basic Definitions and Axioms
• Sample Space (Ω): Range of a Random Variable X
• Probability Measure Pr(•)
  – Ω denotes a range of outcomes; X: Ω
  – Probability P: measure over 2^Ω (power set of sample space, aka event space)
  – In a general sense, Pr(X = x ∈ Ω) is a measure of belief in X = x
    • P(X = x) = 0 or P(X = x) = 1: plain (aka categorical) beliefs (can’t be revised)
    • All other beliefs are subject to revision
• Kolmogorov Axioms
  – 1. ∀ x ∈ Ω . 0 ≤ P(X = x) ≤ 1
  – 2. P(Ω) ≡ ∑_{x ∈ Ω} P(X = x) = 1
  – 3. ∀ X1, X2, … such that i ≠ j ⇒ Xi ∧ Xj = ∅: P(⋃_{i=1}^{∞} Xi) = ∑_{i=1}^{∞} P(Xi)
• Joint Probability: P(X1 ∧ X2) ≡ Probability of the Joint Event X1 ∧ X2
• Independence: P(X1 ∧ X2) = P(X1) · P(X2)

Choosing Hypotheses
• Bayes’s Theorem
  – P(h | D) = P(D | h) P(h) / P(D) = P(h ∧ D) / P(D)
• MAP Hypothesis
  – Generally want most probable hypothesis given the training data
  – Define: argmax_{x ∈ Ω} f(x) ≡ the value of x in the sample space Ω with the highest f(x)
  – Maximum a posteriori hypothesis, h_MAP:
    h_MAP = argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h) P(h) / P(D) = argmax_{h ∈ H} P(D | h) P(h)
• ML Hypothesis
  – Assume that P(hi) = P(hj) for all pairs i, j (uniform priors, i.e., P_H ~ Uniform)
  – Can further simplify and choose the maximum likelihood hypothesis, h_ML:
    h_ML = argmax_{hi ∈ H} P(D | hi)

Bayes’s Theorem: Query Answering (QA)
• Answering User Queries
  – Suppose we want to perform intelligent inferences over a database DB
    • Scenario 1: DB contains records (instances), some “labeled” with answers
    • Scenario 2: DB contains probabilities (annotations) over propositions
  – QA: an application of probabilistic inference
• QA Using Prior and Conditional Probabilities: Example
  – Query: Does patient have cancer or not?
  – Suppose: patient takes a lab test and result comes back positive
    • Correct + result in only 98% of the cases in which disease is actually present
    • Correct - result in only 97% of the cases in which disease is not present
    • Only 0.008 of the entire population has this cancer
  – α ≡ P(false negative for H0 ≡ Cancer) = 0.02 (NB: for 1-point sample)
  – β ≡ P(false positive for H0 ≡ Cancer) = 0.03 (NB: for 1-point sample)
  – P(+ | Cancer) = 0.98, P(- | Cancer) = 0.02
  – P(+ | ¬Cancer) = 0.03, P(- | ¬Cancer) = 0.97
  – P(Cancer) = 0.008, P(¬Cancer) = 0.992
  – P(+ | H0) P(H0) = 0.0078, P(+ | HA) P(HA) = 0.0298 ⇒ h_MAP = HA ≡ ¬Cancer
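A minimal Python sketch (not part of the original slides) that reproduces the MAP calculation for the cancer query above. The priors and likelihoods are the values quoted on the slide; the dictionary keys are hypothetical names chosen only for illustration:

    # MAP vs. ML decision for the lab-test example above.
    # Slide values: P(Cancer) = 0.008, P(+ | Cancer) = 0.98, P(+ | ¬Cancer) = 0.03.
    priors = {"Cancer": 0.008, "NoCancer": 0.992}        # P(h)
    likelihood_pos = {"Cancer": 0.98, "NoCancer": 0.03}  # P(+ | h)

    # Unnormalized posteriors P(+ | h) * P(h); P(+) cancels inside the argmax.
    scores = {h: likelihood_pos[h] * priors[h] for h in priors}

    h_map = max(scores, key=scores.get)                  # argmax_h P(+ | h) P(h)
    h_ml = max(likelihood_pos, key=likelihood_pos.get)   # argmax_h P(+ | h)

    print(scores)            # approx {'Cancer': 0.00784, 'NoCancer': 0.02976}
    print("h_MAP =", h_map)  # NoCancer, matching the slide (0.0078 vs. 0.0298)
    print("h_ML  =", h_ml)   # Cancer: uniform priors would flip the decision

Normalizing the two scores by their sum gives P(Cancer | +) of roughly 0.21, which is why the MAP answer is ¬Cancer even though the test is 98% sensitive.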
Basic Formulas for Probabilities
• Product Rule (Alternative Statement of Bayes’s Theorem)
  – P(A | B) = P(A ∧ B) / P(B)
  – Proof: requires axiomatic set theory, as does Bayes’s Theorem
• Sum Rule
  – P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
  – Sketch of proof (immediate from axiomatic set theory)
    • Draw a Venn diagram of two sets denoting events A and B
    • Let A ∪ B denote the event corresponding to A ∨ B …
• Theorem of Total Probability
  – Suppose events A1, A2, …, An are mutually exclusive and exhaustive
    • Mutually exclusive: i ≠ j ⇒ Ai ∧ Aj = ∅
    • Exhaustive: ∑i P(Ai) = 1
  – Then P(B) = ∑_{i=1}^{n} P(B | Ai) · P(Ai)
  – Proof: follows from product rule and 3rd Kolmogorov axiom

MAP and ML Hypotheses: A Pattern Recognition Framework
• Pattern Recognition Framework
  – Automated speech recognition (ASR), automated image recognition
  – Diagnosis
• Forward Problem: One Step in ML Estimation
  – Given: model h, observations (data) D
  – Estimate: P(D | h), the “probability that the model generated the data”
• Backward Problem: Pattern Recognition / Prediction Step
  – Given: model h, observations D
  – Maximize: P(h(X) = x | h, D) for a new X (i.e., find best x)
• Forward-Backward (Learning) Problem
  – Given: model space H, data D
  – Find: h ∈ H such that P(h | D) is maximized (i.e., MAP hypothesis)
• More Info
  – http://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html
  – Emphasis on a particular H (the space of hidden Markov models)

Bayesian Learning Example: Unbiased Coin [1]
• Coin Flip
  – Sample space: Ω = {Head, Tail}
  – Scenario: given coin is either fair or has a 60% bias in favor of Head
    • h1 ≡ fair coin: P(Head) = 0.5
    • h2 ≡ 60% bias towards Head: P(Head) = 0.6
  – Objective: to decide between default (null) and alternative hypotheses
• A Priori (aka Prior) Distribution on H
  – P(h1) = 0.75, P(h2) = 0.25
  – Reflects learning agent’s prior beliefs regarding H
  – Learning is …
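Picking up where the preview cuts off, the following Python sketch (an illustration, not from the slides) shows how the posterior over h1 and h2 would be updated from the prior P(h1) = 0.75, P(h2) = 0.25 after observing flips. The posterior helper and the example flip sequence are made up for demonstration:

    # Posterior P(h | D) over the two coin hypotheses on the slide,
    # assuming i.i.d. Bernoulli flips:
    # h1 = fair coin (P(Head) = 0.5), h2 = biased coin (P(Head) = 0.6).
    priors = {"h1": 0.75, "h2": 0.25}
    p_head = {"h1": 0.5, "h2": 0.6}

    def posterior(flips: str) -> dict:
        """Return P(h | D) for a flip string such as 'HHTH'."""
        # Likelihood P(D | h) of the observed sequence under each hypothesis.
        likelihood = {}
        for h in priors:
            p = 1.0
            for flip in flips:
                p *= p_head[h] if flip == "H" else 1.0 - p_head[h]
            likelihood[h] = p
        # Evidence P(D) via the theorem of total probability
        # (the two hypotheses are mutually exclusive and exhaustive here).
        evidence = sum(likelihood[h] * priors[h] for h in priors)
        return {h: likelihood[h] * priors[h] / evidence for h in priors}

    print(posterior(""))          # no data yet: posterior equals the prior
    print(posterior("HHHHHHHT"))  # a mostly-Heads sample shifts belief toward h2

With no observations the posterior equals the prior, and each additional Head multiplies the relative weight of h2 by 0.6 / 0.5, gradually revising the agent's beliefs about H.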

