15-781 Machine Learning (Recitation 1)
By Jimeng Sun
9/15/05

• Entropy, Information Gain
• Decision Tree
• Probability


Entropy

Suppose X can have one of m values V1, V2, ..., Vm, with
P(X = V1) = p1, P(X = V2) = p2, ..., P(X = Vm) = pm.

What's the smallest possible number of bits, on average, per symbol, needed to transmit a stream of symbols drawn from X's distribution? It's

H(X) = -p1 log2(p1) - p2 log2(p2) - ... - pm log2(pm) = -Σj=1..m pj log2(pj)

H(X) = the entropy of X.

• "High entropy" means X is from a uniform (boring) distribution
• "Low entropy" means X is from a varied (peaks and valleys) distribution


Entropy H(*)

X = College Major, Y = Likes "XBOX"

X        Y
-------  ---
Math     Yes
History  No
CS       Yes
Math     No
Math     No
CS       Yes
History  No
Math     Yes

H(X) = 1.5
H(Y) = 1


Specific Conditional Entropy H(Y|X=v)

Definition: H(Y|X=v) = the entropy of Y among only those records in which X has value v.

Example (same table):
• H(Y|X=Math) = 1
• H(Y|X=History) = 0
• H(Y|X=CS) = 0


Conditional Entropy H(Y|X)

Definition: H(Y|X) = the average specific conditional entropy of Y
= if you choose a record at random, the expected conditional entropy of Y given that row's value of X
= expected number of bits to transmit Y if both sides of the line know the value of X
= Σj Prob(X=vj) H(Y|X=vj)

Example (same table):

vj       Prob(X=vj)  H(Y|X=vj)
Math     0.5         1
History  0.25        0
CS       0.25        0

H(Y|X) = 0.5 * 1 + 0.25 * 0 + 0.25 * 0 = 0.5


Information Gain

Definition: IG(Y|X) = H(Y) - H(Y|X)
= I must transmit Y; how many bits on average would it save me if both ends of the line knew X?

Example (same table):
• H(Y) = 1
• H(Y|X) = 0.5
• Thus IG(Y|X) = 1 - 0.5 = 0.5


Decision Tree

[slide content missing from the extraction]


Tree pruning

[slide content missing from the extraction]


Probability

[slide content missing from the extraction]


Test your understanding

Question: When (if ever) does Var[X + Y] = Var[X] + Var[Y]?
• All the time?
• Only when X and Y are independent?
• It can fail even if X and Y are independent?
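
The entropy and information-gain numbers above are easy to verify mechanically. The Python sketch below (our addition, not part of the original slides; the function names are our own) recomputes H(X), H(Y), H(Y|X), and IG(Y|X) for the College Major / XBOX table.

    from collections import Counter
    from math import log2

    def entropy(values):
        # H = -sum_j p_j * log2(p_j) over the empirical distribution of `values`
        n = len(values)
        return -sum((c / n) * log2(c / n) for c in Counter(values).values())

    def conditional_entropy(xs, ys):
        # H(Y|X) = sum_j P(X = v_j) * H(Y | X = v_j)
        n = len(xs)
        return sum(
            (c / n) * entropy([y for x, y in zip(xs, ys) if x == v])
            for v, c in Counter(xs).items()
        )

    def information_gain(xs, ys):
        # IG(Y|X) = H(Y) - H(Y|X)
        return entropy(ys) - conditional_entropy(xs, ys)

    # The College Major / Likes "XBOX" table from the slides
    X = ["Math", "History", "CS", "Math", "Math", "CS", "History", "Math"]
    Y = ["Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

    print(entropy(X))                  # H(X)    = 1.5
    print(entropy(Y))                  # H(Y)    = 1.0
    print(conditional_entropy(X, Y))   # H(Y|X)  = 0.5
    print(information_gain(X, Y))      # IG(Y|X) = 0.5

This is also the connection to the Decision Tree slides: IG(Y|X) is the score used to pick the split attribute at each node, with the highest-gain attribute split first.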
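As a hint for the quiz question, recall the identity Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y]: the sum rule holds exactly when the covariance is zero, for which independence is sufficient but not necessary. The quick numerical check below (our addition, not from the slides) illustrates both cases.

    import numpy as np

    rng = np.random.default_rng(0)

    # Identity: Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
    # The extra term vanishes exactly when Cov[X, Y] = 0;
    # independence is sufficient for that, but not necessary.

    # Independent X and Y: the sum rule holds (both prints ~2.0).
    x = rng.normal(size=100_000)
    y = rng.normal(size=100_000)
    print(np.var(x + y), np.var(x) + np.var(y))

    # Perfectly correlated X and Y: the sum rule fails (~4.0 vs ~2.0).
    y = x
    print(np.var(x + y), np.var(x) + np.var(y))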