
15-780: Graduate Artificial Intelligence
Probabilistic Reasoning and Inference

Advantages of probabilistic reasoning
• Appropriate for complex, uncertain environments
  - Will it rain tomorrow?
• Applies naturally to many domains
  - A robot predicting the direction of a road, biology, the Word paper clip
• Allows us to generalize acquired knowledge and incorporate prior beliefs
  - Medical diagnosis
• Easy to integrate different information sources
  - A robot's sensors

Examples
• Unmanned vehicles
• Speech processing
• Biological data, e.g. a DNA sequence:
  ATGAAGCTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAAAAAGCTCAAG
  TGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGAACAACTGGGAGTGTCGCTAC
  TCTCCCAAAACCAAAAGGTCTCCGCTGACTAGGGCACATCTGACAGAAGTGGAATCAAGG
  CTAGAAAGACTGGAACAGCTATTTCTACTGATTTTTCCTCGAGAAGACCTTGACATGATT

Basic notations
• Random variable - an element / event whose status is unknown: A = "it will rain tomorrow"
• Domain - the set of values a random variable can take:
  - "A = The stock market will go up this year": binary
  - "A = Number of Steelers wins in 2007": discrete
  - "A = % change in Google stock in 2007": continuous

Priors
• Degree of belief in an event in the absence of any other information:
  P(rain tomorrow) = 0.2
  P(no rain tomorrow) = 0.8

Conditional probability
• P(A = 1 | B = 1): the fraction of cases where A is true if B is true
• The conditional probability can differ from the prior: e.g. P(A) = 0.2, but P(A | B) = 0.5
• In some cases, given knowledge of one or more random variables, we can improve upon our prior belief of another random variable
• For example, using the joint table below:
  P(slept in movie) = 0.5
  P(slept in movie | liked movie) = 1/3
  P(didn't sleep in movie | liked movie) = 2/3

  Liked movie   Slept    P
       0          1     0.3
       0          0     0.1
       1          0     0.4
       1          1     0.2

Joint distributions
• The probability that a set of random variables will take a specific value is their joint distribution
• Notation: P(A ∧ B) or P(A, B)
• Example: P(liked movie, slept), as given by the table above

Joint distribution (cont)
• Evaluation of classes:

  Time (regular = 2, summer = 1)   Class size   Evaluation (1-3)
                2                      43              1
                1                      13              3
                2                      51              2
                2                      15              3
                2                      65              1
                1                      12              2
                2                      34              3
                1                      10              2

• P(class size > 20) = 0.5
• P(summer) = 1/3
• P(class size > 20, summer) = 0
• P(eval = 1) = 2/9
• P(class size > 20, eval = 1) = 2/9

Chain rule
• The joint distribution can be specified in terms of conditional probability:
  P(A, B) = P(A | B) P(B)
• Together with Bayes rule (which is actually derived from it), this is one of the most powerful rules in probabilistic reasoning

Axioms of probability (Kolmogorov's axioms)
• A variety of useful facts can be derived from just three axioms:
  1. 0 ≤ P(A) ≤ 1
  2. P(true) = 1, P(false) = 0
  3. P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
• P(Steelers win the 05-06 season) = 1
• There have been several other attempts to provide a foundation for probability theory. Kolmogorov's axioms are the most widely used.

Using the axioms
• How can we use the axioms to prove that P(¬A) = 1 − P(A)?
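The slide leaves the proof as a question; a standard derivation from the three axioms (not part of the original slides) runs as follows, written here in LaTeX:

```latex
% Deriving P(not A) = 1 - P(A) from Kolmogorov's axioms.
\begin{align*}
1 &= P(\text{true})                          && \text{axiom 2}\\
  &= P(A \lor \lnot A)                       && A \lor \lnot A \text{ is always true}\\
  &= P(A) + P(\lnot A) - P(A \land \lnot A)  && \text{axiom 3}\\
  &= P(A) + P(\lnot A) - P(\text{false})     && A \land \lnot A \text{ is always false}\\
  &= P(A) + P(\lnot A)                       && \text{axiom 2: } P(\text{false}) = 0
\end{align*}
% Rearranging gives P(not A) = 1 - P(A).
```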
Bayes rule
• One of the most important rules for AI usage
• Derived from the chain rule:
  P(A, B) = P(A | B) P(B) = P(B | A) P(A)
• Thus:
  P(A | B) = P(B | A) P(A) / P(B)
• Thomas Bayes was an English clergyman who set out his theory of probability in 1764.

Bayes rule (cont)
• Often it is useful to derive the rule a bit further:
  P(A | B) = P(B | A) P(A) / P(B) = P(B | A) P(A) / Σ_A P(B | A) P(A)
• This results from marginalization: P(B) = Σ_A P(B, A)
  (The slide illustrates this with a diagram of B split into the regions where A = 1 and A = 0, so that P(B) = P(B, A = 1) + P(B, A = 0).)

Using Bayes rule
• Cards game: three face-down cards A, B and C, one of which is the king. Place your bet on the location of the king!
• The bet is placed on B. Do you want to change your bet?
• Computing the (posterior) probability P(C = k | selB) with Bayes rule:
  P(C = k | selB) = P(selB | C = k) P(C = k) / P(selB)
                  = P(selB | C = k) P(C = k) / [P(selB | C = k) P(C = k) + P(selB | C ≠ k) P(C ≠ k)]
                  = (1/2 · 1/3) / (1/2 · 1/3 + 1/2 · 2/3)
                  = 1/3

Joint distributions (cont)
• Specifying a joint distribution requires a joint probability table with an entry for every possible assignment
• The table can grow very rapidly: n binary variables already give 2^n entries
• How can we decrease the number of entries in the table?

Independence
• In some cases the additional information does not help
• Here, the extra knowledge about rain does not change our prediction:
  P(slept) = 0.5
  P(slept | rain = 1) = 0.5
• Slept and raining are independent!

  Liked movie   Slept   raining    P
       1          1        1     0.05
       1          1        0     0.15
       1          0        1     0.1
       1          0        0     0.3
       0          1        1     0.075
       0          1        0     0.225
       0          0        1     0.025
       0          0        0     0.075

Independence (cont.)
• Notation: P(S | R) = P(S)
• Using this we can derive the following:
  - P(¬S | R) = P(¬S)
  - P(S, R) = P(S) P(R)
  - P(R | S) = P(R)
• Independence allows for easier models, learning and inference
• For our example:
  - P(raining, slept in movie) = P(raining) P(slept in movie)
  - Instead of a joint table with 4 entries (4 parameters), only 2 parameters are required
  - The saving is even greater if we have many more variables
• In many cases it is useful to assume independence, even if it is not actually the case

Conditional independence
• Two dependent random variables may become independent when conditioned on a third variable:
  P(A, B | C) = P(A | C) P(B | C)
• Example:
  P(liked movie) = 0.5
  P(slept) = 0.4
  P(liked movie, slept) = 0.1   (dependent, since 0.5 · 0.4 = 0.2 ≠ 0.1)
  P(liked movie | long) = 0.4
  P(slept | long) = 0.6
  P(slept, liked movie | long) = 0.24 = 0.4 · 0.6
• Given knowledge of the movie's length, the two other variables become independent

Bayesian networks
• Bayesian
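As a worked companion to the "Bayes rule (cont)" formula, here is a minimal Python sketch (not from the original slides; the function name posterior and the dictionary encoding are illustrative choices) that normalizes likelihood times prior over all values of A, reproducing the card-game answer of 1/3:

```python
# Bayes rule with the denominator expanded by marginalization:
#   P(A | B) = P(B | A) P(A) / sum_a P(B | A = a) P(A = a)

def posterior(priors, likelihoods):
    """Posterior P(A = a | B) for every value a of A.

    priors[a]      -- P(A = a)
    likelihoods[a] -- P(B | A = a), for the observed event B
    """
    evidence = sum(likelihoods[a] * priors[a] for a in priors)  # P(B)
    return {a: likelihoods[a] * priors[a] / evidence for a in priors}

# Card game, with the values recovered from the slide:
priors = {"C=k": 1 / 3, "C!=k": 2 / 3}       # king location before any evidence
likelihoods = {"C=k": 1 / 2, "C!=k": 1 / 2}  # P(selB | ...) is the same either way

print(posterior(priors, likelihoods))  # {'C=k': 0.333..., 'C!=k': 0.666...}
```

Because the two likelihoods are equal, the evidence selB leaves the posterior equal to the prior, which is exactly the 1/3 on the slide.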


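To make the joint-distribution and independence slides concrete, the sketch below (also not from the slides; prob and cond are hypothetical helper names, and the table values are the ones reconstructed above) answers queries by summing entries of the joint table:

```python
VARS = ("liked", "slept", "raining")

# Joint distribution P(liked, slept, raining) from the independence slide.
joint = {
    (1, 1, 1): 0.050, (1, 1, 0): 0.150,
    (1, 0, 1): 0.100, (1, 0, 0): 0.300,
    (0, 1, 1): 0.075, (0, 1, 0): 0.225,
    (0, 0, 1): 0.025, (0, 0, 0): 0.075,
}

def prob(**fixed):
    """Marginal probability of a partial assignment, obtained by summing
    the joint table over all variables that are not fixed."""
    total = 0.0
    for assignment, p in joint.items():
        values = dict(zip(VARS, assignment))
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total

def cond(target, given):
    """Conditional probability P(target | given) = P(target, given) / P(given)."""
    return prob(**target, **given) / prob(**given)

print(prob(slept=1))                       # 0.5
print(cond({"slept": 1}, {"raining": 1}))  # 0.5 -> rain tells us nothing
print(cond({"slept": 1}, {"liked": 1}))    # 0.333... as on the earlier slide
# Independence check: P(slept, raining) == P(slept) * P(raining)
print(abs(prob(slept=1, raining=1) - prob(slept=1) * prob(raining=1)) < 1e-12)
```

This also shows why independence saves parameters: once P(slept, raining) factors as P(slept) P(raining), the joint over those two variables can be stored as two numbers instead of four table entries.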