15-781 Machine Learning, Recitation 1
Jimeng Sun, 9/15/05

Outline: Entropy (entropy, information gain), Decision Tree, Probability

Entropy

Suppose X can take one of m values V1, V2, ..., Vm, with P(X = V1) = p1, P(X = V2) = p2, ..., P(X = Vm) = pm. What is the smallest possible number of bits, on average per symbol, needed to transmit a stream of symbols drawn from X's distribution? It is

  H(X) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 - \cdots - p_m \log_2 p_m = -\sum_{j=1}^{m} p_j \log_2 p_j

H(X) is the entropy of X. High entropy means X comes from a near-uniform (boring) distribution; low entropy means X comes from a varied distribution, with peaks and valleys.

Example: X = College Major, Y = Likes "XBOX", with these eight records:

  X        Y
  Math     Yes
  History  No
  CS       Yes
  Math     No
  Math     No
  CS       Yes
  History  No
  Math     Yes

Here H(X) = 1.5 and H(Y) = 1.

Specific Conditional Entropy H(Y|X=v)

Definition: H(Y|X=v) is the entropy of Y among only those records in which X has value v.

Example (same records as above):
  H(Y|X=Math) = 1
  H(Y|X=History) = 0
  H(Y|X=CS) = 0

Conditional Entropy H(Y|X)

Definition: H(Y|X) is the average specific conditional entropy of Y: if you choose a record at random, what will be the conditional entropy of Y, conditioned on that row's value of X? Equivalently, it is the expected number of bits needed to transmit Y if both sides of the line know the value of X:

  H(Y|X) = \sum_j P(X = v_j) \, H(Y|X = v_j)

Example:

  v_j      P(X = v_j)   H(Y|X = v_j)
  Math     0.5          1
  History  0.25         0
  CS       0.25         0

  H(Y|X) = 0.5 * 1 + 0.25 * 0 + 0.25 * 0 = 0.5

Information Gain

Definition: I must transmit Y. How many bits, on average, would it save me if both ends of the line knew X?

  IG(Y|X) = H(Y) - H(Y|X)

Example: H(Y) = 1 and H(Y|X) = 0.5, thus IG(Y|X) = 1 - 0.5 = 0.5. (A short code sketch reproducing these numbers appears at the end of these notes.)

Decision Tree

Tree pruning.

Probability

Test your understanding. Question: when, if ever, does Var[X + Y] = Var[X] + Var[Y]?
  - All the time
  - Only when X and Y are independent
  - It can fail even if X and Y are independent