CMU CS 15780 - Probabilistic Reasoning and Inference

15-780: Graduate Artificial Intelligence
Probabilistic Reasoning and Inference

Advantages of probabilistic reasoning
• Appropriate for complex, uncertain environments
  - Will it rain tomorrow?
• Applies naturally to many domains
  - A robot predicting the direction of the road, biology, the Microsoft Word paper-clip assistant
• Lets us generalize acquired knowledge and incorporate prior beliefs
  - Medical diagnosis
• Makes it easy to integrate different information sources
  - A robot's sensors

Examples
• Unmanned vehicles
• Speech processing
• Biological data, e.g. a DNA sequence:
  ATGAAGCTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAAAAAGCTCAAG
  TGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGAACAACTGGGAGTGTCGCTAC
  TCTCCCAAAACCAAAAGGTCTCCGCTGACTAGGGCACATCTGACAGAAGTGGAATCAAGG
  CTAGAAAGACTGGAACAGCTATTTCTACTGATTTTTCCTCGAGAAGACCTTGACATGATT

Basic notations
• Random variable - refers to an element / event whose status is unknown:
  A = "it will rain tomorrow"
• Domain - the set of values a random variable can take:
  - "A = The stock market will go up this year": binary
  - "A = Number of Steelers wins in 2006": discrete
  - "A = % change in Google stock in 2006": continuous

Priors
• Degree of belief in an event in the absence of any other information, e.g.:
  P(rain tomorrow) = 0.2
  P(no rain tomorrow) = 0.8

Conditional probability
• P(A = 1 | B = 1): the fraction of cases where A is true given that B is true
• Example: a prior belief P(A) = 0.2 can become P(A | B) = 0.5 once B is observed
• In some cases, given knowledge of one or more random variables, we can improve upon our prior belief about another random variable. For example:
  P(slept in movie) = 0.5
  P(slept in movie | liked movie) = 1/3
  P(didn't sleep in movie | liked movie) = 2/3

Joint distributions
• The probability that a set of random variables takes a specific set of values is their joint distribution
• Notation: P(A ∧ B) or P(A, B)
• Example: P(liked movie, slept)

  Liked movie   Slept   P
       1          1     0.2
       1          0     0.4
       0          0     0.1
       0          1     0.3

Joint distribution (cont.)
• Evaluation of classes (Time: regular = 2, summer = 1; Evaluation: 1-3):

  Time   Class size   Evaluation
   1         10           2
   2         34           3
   1         12           2
   2         65           1
   2         15           3
   2         43           1
   1         13           3
   2         51           2

• P(class size > 20) = 0.5
• P(summer) = 1/3
• P(class size > 20, summer) = 0
• P(eval = 1) = 2/9
• P(class size > 20, eval = 1) = 2/9

Chain rule
• The joint distribution can be specified in terms of conditional probability:
  P(A, B) = P(A | B) P(B)
• Together with Bayes rule (which is actually derived from it), this is one of the most powerful rules in probabilistic reasoning

Axioms of probability (Kolmogorov's axioms)
• A variety of useful facts can be derived from just three axioms:
  1. 0 ≤ P(A) ≤ 1
  2. P(true) = 1, P(false) = 0
  3. P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
• Example of a certain event: P(Steelers win the 05-06 season) = 1
• There have been several other attempts to provide a foundation for probability theory; Kolmogorov's axioms are the most widely used
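As a quick sanity check on the definitions so far, here is a minimal Python sketch (my own illustration, not from the slides; all names are mine) that recovers the conditional probabilities quoted for the movie example and verifies the chain rule on the joint table above:

    # Joint distribution P(liked movie, slept) from the table above.
    P = {(1, 1): 0.2, (1, 0): 0.4, (0, 0): 0.1, (0, 1): 0.3}

    # The joint entries sum to 1, as the axioms require.
    assert abs(sum(P.values()) - 1) < 1e-12

    p_liked = sum(p for (liked, _), p in P.items() if liked == 1)   # 0.6
    p_slept = sum(p for (_, slept), p in P.items() if slept == 1)   # 0.5

    # Conditioning renormalizes the entries consistent with the evidence.
    p_slept_given_liked = P[(1, 1)] / p_liked
    print(p_slept_given_liked, 1 - p_slept_given_liked)             # 1/3 and 2/3

    # Chain rule: P(A, B) = P(A | B) * P(B)
    assert abs(P[(1, 1)] - p_slept_given_liked * p_liked) < 1e-12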
Using the axioms
• How can we use the axioms to prove that P(¬A) = 1 - P(A)?
• Since A ∨ ¬A = true and A ∧ ¬A = false, axioms 2 and 3 give:
  1 = P(A ∨ ¬A) = P(A) + P(¬A) - P(false) = P(A) + P(¬A)

Bayes rule
• One of the most important rules for AI usage
• Derived from the chain rule:
  P(A, B) = P(A | B) P(B) = P(B | A) P(A)
• Thus:
  P(A | B) = P(B | A) P(A) / P(B)
• Thomas Bayes was an English clergyman who set out his theory of probability in 1764

Bayes rule (cont.)
• Often it is useful to expand the denominator a bit further:
  P(A | B) = P(B | A) P(A) / P(B) = P(B | A) P(A) / Σ_A P(B | A) P(A)
• This results from marginalizing the joint: P(B) = Σ_A P(B, A)

Using Bayes rule
• A card game: three cards A, B, C lie face down and exactly one hides the king (k). Place your bet on the location of the king!
• Say you bet on C, and card B is then turned over and shown not to be the king. Do you want to change your bet?
• Compute the posterior probability P(C = k | selB), where selB is the event that B was the card revealed:
  P(C = k | selB) = P(selB | C = k) P(C = k) / P(selB)
                  = P(selB | C = k) P(C = k) / [P(selB | C = k) P(C = k) + P(selB | C ≠ k) P(C ≠ k)]
                  = (1/2 · 1/3) / (1/2 · 1/3 + 1/2 · 2/3)
                  = 1/3
• Your original card is still a 1/3 bet, so the remaining card now carries probability 2/3: change your bet

Joint distributions (cont.)
• A joint distribution requires a probability table that lists every possible assignment of its variables, like the liked/slept table above
• The table can grow very rapidly as variables are added... How can we decrease its size?

Independence
• In some cases additional information does not help. Extend the movie table with a "raining" variable:

  Liked movie   Slept   Raining   P
       1          1        1      0.10
       1          0        1      0.20
       0          0        1      0.05
       0          1        1      0.15
       1          1        0      0.10
       1          0        0      0.20
       0          0        0      0.05
       0          1        0      0.15

• The extra knowledge about rain does not change our prediction:
  P(slept) = 0.5
  P(slept | rain = 1) = 0.5
• Slept and rain are independent!

Independence (cont.)
• Notation: P(S | R) = P(S)
• Using this we can derive the following:
  - P(¬S | R) = P(¬S)
  - P(S, R) = P(S) P(R)
  - P(R | S) = P(R)

Independence (cont.)
• Independence allows for easier models, learning, and inference
• For our example:
  - P(raining, slept movie) = P(raining) P(slept movie)
  - Instead of a 2-by-2 joint table (4 parameters), only 2 parameters are required
  - The saving is even greater when there are many more variables...
• In many cases it is useful to assume independence, even if it is not exactly the case

Conditional independence
• Two dependent random variables may become independent when conditioned on a third variable:
  P(A, B | C) = P(A | C) P(B | C)
• Example:
  P(liked movie) = 0.5
  P(slept) = 0.4
  P(liked movie, slept) = 0.1          (≠ 0.5 · 0.4, so the two are dependent)
  P(liked movie | long) = 0.4
  P(slept | long) = 0.6
  P(slept, liked movie | long) = 0.24  (= 0.4 · 0.6)
• Given knowledge of the movie's length, the two other variables become independent

Bayesian networks
• Bayesian networks are directed graphs with nodes representing random variables and edges representing dependency assumptions
• Example: a node Lo ("Long?") with edges to S ("Slept?") and Li ("Liked?")

Bayesian networks: notations
• Each node carries a conditional probability table (CPT) quantifying its dependency on its parents, e.g.:
  P(Lo) = 0.5
  P(S | Lo) = 0.6
  P(S | ¬Lo) =
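To make the card-game posterior concrete, here is a minimal Python simulation (my own sketch, not from the slides) of the game's protocol: bet on a card, reveal a non-bet card that is not the king, then stay or switch. It estimates the win rate for each strategy:

    import random

    def play(switch, trials=100_000):
        """Simulate the three-card king game and return the empirical win rate."""
        wins = 0
        for _ in range(trials):
            king = random.randrange(3)    # card hiding the king
            bet = random.randrange(3)     # player's initial bet
            # Reveal a card that is neither the bet nor the king
            # (chosen at random when there are two options).
            revealed = random.choice([c for c in range(3) if c != bet and c != king])
            if switch:
                # Move the bet to the one remaining unrevealed card.
                bet = next(c for c in range(3) if c != bet and c != revealed)
            wins += (bet == king)
        return wins / trials

    print("stay:  ", play(switch=False))   # ~1/3, matching the posterior above
    print("switch:", play(switch=True))    # ~2/3

Staying wins about 1/3 of the time and switching about 2/3, in line with P(C = k | selB) = 1/3.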
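The independence claim can also be checked mechanically. A short sketch over the liked/slept/raining table above (joint values as reconstructed there; treat them as illustrative):

    # Joint distribution P(liked, slept, raining) from the table above.
    P3 = {
        (1, 1, 1): 0.10, (1, 0, 1): 0.20, (0, 0, 1): 0.05, (0, 1, 1): 0.15,
        (1, 1, 0): 0.10, (1, 0, 0): 0.20, (0, 0, 0): 0.05, (0, 1, 0): 0.15,
    }

    def marginal(slept=None, raining=None):
        """Sum the joint over all entries consistent with the given evidence."""
        return sum(p for (_, s, r), p in P3.items()
                   if (slept is None or s == slept) and (raining is None or r == raining))

    p_s = marginal(slept=1)                                    # P(slept) = 0.5
    p_s_given_r = marginal(slept=1, raining=1) / marginal(raining=1)
    print(p_s, p_s_given_r)                                    # 0.5, 0.5: rain tells us nothing

    # Equivalent factored form: P(S, R) = P(S) * P(R)
    assert abs(marginal(slept=1, raining=1) - p_s * marginal(raining=1)) < 1e-12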
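Finally, a minimal sketch of the Lo → S, Lo → Li network from the closing slides. The preview cuts off in the middle of the CPTs, so only P(Lo) = 0.5 and P(S | Lo) = 0.6 come from the document; every other number below is an assumed placeholder:

    from itertools import product

    # CPTs for the network Lo -> S, Lo -> Li.
    p_lo = 0.5                          # P(Lo), from the slides
    p_s_given_lo = {1: 0.6, 0: 0.3}     # P(S=1 | Lo=1) from the slides; the Lo=0 entry is assumed
    p_li_given_lo = {1: 0.7, 0: 0.4}    # P(Li=1 | Lo): both entries are assumed placeholders

    def joint(lo, s, li):
        """P(Lo, S, Li) = P(Lo) * P(S | Lo) * P(Li | Lo), the network's factorization."""
        p = p_lo if lo else 1 - p_lo
        p *= p_s_given_lo[lo] if s else 1 - p_s_given_lo[lo]
        p *= p_li_given_lo[lo] if li else 1 - p_li_given_lo[lo]
        return p

    # The eight joint entries sum to 1, as the axioms require.
    assert abs(sum(joint(*a) for a in product((0, 1), repeat=3)) - 1) < 1e-12
    print(joint(1, 1, 1))               # 0.5 * 0.6 * 0.7 = 0.21

With this factorization, 5 CPT entries replace the 7 free parameters of the full three-variable joint table, which is exactly the kind of saving the independence slides point at.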

