The Basics
Graphical Models – 10708
Carlos Guestrin
Carnegie Mellon University
September 12th, 2005

Where do we start?
- From Bayesian networks
- "Complete" BN presentation first: representation, exact inference, learning
- Only discrete variables for now
- Later in the semester: undirected models, approximate inference, continuous variables, temporal models, and more
- Class focuses on fundamentals – understand the foundation and basic concepts

Today
- Probabilities
- Independence
- Two nodes make a BN
- Naïve Bayes
- Should be a review for everyone – setting up notation for the class

Event spaces
- Outcome space Ω
- Measurable events S: each α ∈ S is a subset of Ω
- S must contain the empty event ∅ and the trivial event Ω
- S is closed under union (α, β ∈ S ⇒ α∪β ∈ S) and complement (α ∈ S ⇒ Ω−α ∈ S)

Probability distribution P over (Ω, S)
- P(α) ≥ 0
- P(Ω) = 1
- If α∩β = ∅, then P(α∪β) = P(α) + P(β)
- From here you can prove a lot, e.g.: P(∅) = 0, and P(α∪β) = P(α) + P(β) − P(α∩β)

Interpretations of probability – a can of worms!
- Frequentist: P(α) is the frequency of α in the limit. Many arguments against this interpretation – what is the frequency of the event "it will rain tomorrow"?
- Subjective: P(α) is my degree of belief that α will happen. What does "degree of belief" mean? If I say P(α) = 0.8, then I am willing to bet!
- For this class, we (mostly) don't care which camp you are in

Conditional probabilities
- After learning that α is true, how do we feel about β? P(β|α)
- Defined, for P(α) > 0, as P(β|α) = P(α∩β) / P(α)

Two of the most important rules of the semester: 1. The chain rule
- P(α∩β) = P(α) P(β|α)
- More generally: P(α1∩…∩αk) = P(α1) P(α2|α1) ··· P(αk|α1∩…∩αk−1)

Two of the most important rules of the semester: 2. Bayes rule
- P(α|β) = P(β|α) P(α) / P(β)
- More generally: P(α|β∩γ) = P(β|α∩γ) P(α|γ) / P(β|γ)

Most important concept: a) Independence
- α and β are independent, written P ⊨ (α⊥β), if P(β|α) = P(β)
- Proposition: α and β are independent if and only if P(α∩β) = P(α) P(β)

Most important concept: b) Conditional independence
- Independence is rarely true, but conditionally…
- α and β are conditionally independent given γ, written P ⊨ (α⊥β|γ), if P(β|α∩γ) = P(β|γ)
- Proposition: P ⊨ (α⊥β|γ) if and only if P(α∩β|γ) = P(α|γ) P(β|γ)

Random variables
- Events are complicated – we think about attributes: Age, Grade, HairColor
- Random variables formalize attributes: "Grade = A" is shorthand for the event {ω∈Ω : fGrade(ω) = A}
- Properties of random variables: Val(X) = set of possible values of random variable X
- For discrete (categorical) X: ∑ from i=1 to |Val(X)| of P(X = xi) = 1
- For continuous X: ∫ p(X = x) dx = 1

Marginal distribution
- Probability P(X) over the possible outcomes of X

Joint distribution, marginalization
- Two random variables – Grade and Intelligence
- Marginalization – compute the marginal over a single variable

Marginalization – the general case
- Compute the marginal distribution P(Xi) by summing the joint over all values of the other variables: P(Xi) = ∑ over x1,…,xi−1,xi+1,…,xn of P(x1,…,xn)

Basic concepts for random variables
- Atomic outcome: an assignment x1,…,xn to X1,…,Xn
- Conditional probability: P(X,Y) = P(X) P(Y|X)
- Bayes rule: P(X|Y) = P(Y|X) P(X) / P(Y)
- Chain rule: P(X1,…,Xn) = P(X1) P(X2|X1) ··· P(Xn|X1,…,Xn−1)

Conditionally independent random variables
- Sets of variables X, Y, Z
- X is independent of Y given Z if P ⊨ (X=x ⊥ Y=y | Z=z) for all x∈Val(X), y∈Val(Y), z∈Val(Z)
- Shorthand: conditional independence P ⊨ (X ⊥ Y | Z); for P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y)
- Proposition: P satisfies (X ⊥ Y | Z) if and only if P(X,Y|Z) = P(X|Z) P(Y|Z)

Properties of independence
- Symmetry: (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z)
- Decomposition: (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z)
- Weak union: (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z,W)
- Contraction: (X ⊥ W | Y,Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y,W | Z)
- Intersection: (X ⊥ Y | W,Z) & (X ⊥ W | Y,Z) ⇒ (X ⊥ Y,W | Z) – holds only for positive distributions! P(α) > 0 for all α ≠ ∅
- Notation: I(P) – the independence properties entailed by P

Bayesian networks
- One of the most exciting advances in statistical AI in the last 10–15 years
- Compact representation of exponentially large probability distributions
- Fast marginalization, too
- Exploit conditional independencies

Handwriting recognition / Webpage classification
- Example applications: classifying a company home page vs. a personal home page vs. a university home page vs. …
- [Figure slides: handwriting recognition and webpage classification examples]

Let's start on BNs…
- Consider P(Xi): assign a probability to each xi ∈ Val(Xi). How many independent parameters?
- Consider P(X1,…,Xn): how many independent parameters if |Val(Xi)| = k?

What if variables are independent?
- Pairwise independence (Xi ⊥ Xj) for all i, j? Not enough!!! (See Homework 1 ☺)
- Must assume (X ⊥ Y) for all disjoint subsets X, Y of {X1,…,Xn}
- Then we can write P(X1,…,Xn) = ∏ from i=1 to n of P(Xi)
- How many independent parameters now?

Conditional parameterization – two nodes
- Grade is determined by Intelligence

Conditional parameterization – three nodes
- Grade and SAT score are determined by Intelligence
- (G ⊥ S | I)

The naïve Bayes model – your first real Bayes net
- Class variable: C
- Evidence variables: X1,…,Xn
- Assume (X ⊥ Y | C) for all disjoint subsets X, Y of {X1,…,Xn}

What you need to know
- Basic definitions of probabilities
- Independence
- Conditional independence
- The chain rule
- Bayes rule
- Naïve Bayes

Next class
- We've heard of Bayes nets, we've played with Bayes nets, we've even used them in our research
- Next class, we'll learn the …
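The chain rule, Bayes rule, and marginalization above can all be checked numerically on a tiny joint distribution. The following sketch is not part of the lecture; the distribution over two binary events is invented for illustration:

```python
# Toy joint distribution over two binary events A and B,
# given as P over the four atomic outcomes (values are made up).
P = {
    (True, True): 0.3,
    (True, False): 0.2,
    (False, True): 0.1,
    (False, False): 0.4,
}

def p_a(a):
    # Marginal P(A=a): sum the joint over all values of B
    return sum(p for (av, bv), p in P.items() if av == a)

def p_b(b):
    # Marginal P(B=b): sum the joint over all values of A
    return sum(p for (av, bv), p in P.items() if bv == b)

def p_b_given_a(b, a):
    # Conditional probability P(B=b | A=a) = P(a ∩ b) / P(a)
    return P[(a, b)] / p_a(a)

def p_a_given_b(a, b):
    return P[(a, b)] / p_b(b)

# Chain rule: P(a ∩ b) = P(a) P(b|a)
assert abs(P[(True, True)] - p_a(True) * p_b_given_a(True, True)) < 1e-12

# Bayes rule: P(a|b) = P(b|a) P(a) / P(b)
lhs = p_a_given_b(True, True)
rhs = p_b_given_a(True, True) * p_a(True) / p_b(True)
assert abs(lhs - rhs) < 1e-12
```

Both identities hold for any joint table built this way, since the conditionals are defined directly from the joint.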
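The three-node conditional parameterization (Intelligence → Grade, Intelligence → SAT) can likewise be sketched in code. The CPD numbers below are invented for illustration; the point is that building the joint as P(I) P(G|I) P(S|I) makes the proposition P(G,S|I) = P(G|I) P(S|I) hold by construction, while G and S remain dependent marginally:

```python
# Hypothetical CPDs (numbers invented, not from the slides)
P_I = {"hi": 0.3, "lo": 0.7}
P_G_given_I = {"hi": {"A": 0.8, "B": 0.2}, "lo": {"A": 0.3, "B": 0.7}}
P_S_given_I = {"hi": {"high": 0.9, "low": 0.1}, "lo": {"high": 0.2, "low": 0.8}}

def joint(i, g, s):
    # Conditional parameterization: P(I, G, S) = P(I) P(G|I) P(S|I)
    return P_I[i] * P_G_given_I[i][g] * P_S_given_I[i][s]

# Check (G ⊥ S | I): P(G, S | I) factorizes as P(G|I) P(S|I)
for i in P_I:
    p_i = sum(joint(i, g, s) for g in "AB" for s in ("high", "low"))
    for g in "AB":
        for s in ("high", "low"):
            assert abs(joint(i, g, s) / p_i
                       - P_G_given_I[i][g] * P_S_given_I[i][s]) < 1e-12

# But G and S are NOT marginally independent: once I is summed out,
# P(G, S) no longer equals P(G) P(S)
p_g_a = sum(joint(i, "A", s) for i in P_I for s in ("high", "low"))
p_s_high = sum(joint(i, g, "high") for i in P_I for g in "AB")
p_g_a_s_high = sum(joint(i, "A", "high") for i in P_I)
assert abs(p_g_a_s_high - p_g_a * p_s_high) > 1e-6
```

This is exactly the "independence is rarely true, but conditionally…" point: conditioning on the common cause I is what makes Grade and SAT independent.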
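The parameter-counting questions posed in the slides have closed-form answers, summarized in this small sketch (the helper names are mine, not from the lecture): a full joint table needs exponentially many parameters, full independence needs linearly many, and naïve Bayes sits in between:

```python
def full_joint_params(n, k):
    # Full joint over n variables with k values each:
    # k**n table entries, minus 1 for the sum-to-one constraint
    return k**n - 1

def independent_params(n, k):
    # Fully independent variables: n marginals, each with k-1
    # free parameters (the last value is determined)
    return n * (k - 1)

def naive_bayes_params(n, k, c):
    # Naïve Bayes: P(C) has c-1 free parameters, and each of the
    # n evidence CPDs P(X_i | C) has (k-1) parameters per class
    return (c - 1) + c * n * (k - 1)

# For 10 binary evidence variables and a binary class:
assert full_joint_params(10, 2) == 1023   # exponential in n
assert independent_params(10, 2) == 10    # linear in n
assert naive_bayes_params(10, 2, 2) == 21 # linear in n as well
```

This is the quantitative payoff of exploiting conditional independence: the naïve Bayes assumption (X ⊥ Y | C) shrinks the representation from exponential to linear in the number of evidence variables.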