Information and Interaction Among Features
36-350, Data Mining
September 13, 2006

Reading: Aleks Jakulin and Ivan Bratko, "Quantifying and Visualizing Attribute Interactions," http://arxiv.org/abs/cs.AI/0308002 (also on Blackboard).

Last time we set up a game in which we went to a random position in the document and tested for a particular word. This tends to give small expected information, since the answer is usually "no." A different question we could ask is: "Is this word present anywhere in the document?" The expected information we get from this question can be computed from a table of word-presence counts. For example, this is the table for testing whether the word "car" is present, in the larger collection of 200 documents:

           word
    label  FALSE  TRUE
    auto      47    53
    moto      95     5

This table tells you that "car" is present in 58 documents, 53 of which are in the auto group and 5 of which are in the moto group. The other column describes the cases where "car" is not present. The expected information is 0.227 bits, compared to 0.0025 bits for the random-position test. We can do this computation for all words and sort them, as shown in the figure. Interestingly, the most informative word is not "car" but "DoD", the abbreviation for a motorcycle club called the Denizens of Doom.

So far we've been computing the information content of a single word. Suppose we are allowed to ask another question about the document after getting the answer to the first question. The best second question is not necessarily the second-best first question. For example, if you know "car" is present, it doesn't help as much to know that "cars" is also present. This effect is called interaction. Whereas correlation and information are properties of two variables, interaction is a property of three variables: interaction is when measuring one variable changes the importance of another variable.

Conditional information is the information X gives about C when a third variable Y is already known:

$$I(C;X \mid Y=y) = H(C \mid Y=y) - H(C \mid X, Y=y) \qquad \text{(actual conditional information)} \tag{1}$$

$$I(C;X \mid Y) = \sum_y \Pr(Y=y)\, I(C;X \mid Y=y) \qquad \text{(expected conditional information)}$$

Interaction measures how Y changes the information in X:

$$I(C;X;Y=y) = I(C;X \mid Y=y) - I(C;X) \qquad \text{(actual interaction)}$$

$$I(C;X;Y) = I(C;X \mid Y) - I(C;X) \qquad \text{(expected interaction)}$$

Expected interaction is symmetric in all three variables: $I(C;X;Y) = I(C;Y;X) = I(X;Y;C)$, and so on. A positive interaction means that knowing Y makes X more informative about C. A negative interaction means that Y makes X less informative about C. For example, suppose C is tomorrow's weather and X and Y are weather reports from two different stations. Once you hear one weather report, the other is going to be much less informative, so the interaction is negative. Relations of information and interaction can be conveniently visualized with information graphs.

Example: the interaction between the presence of "car" and "cars".

    cars = FALSE
           car
    label  FALSE  TRUE
    auto      35    34
    moto      93     3

    cars = TRUE
           car
    label  FALSE  TRUE
    auto      12    19
    moto       2     2

    I(label; car)           =  0.227
    I(label; car | cars=F)  =  0.233
    I(label; car; cars=F)   =  0.006   positive interaction
    I(label; car | cars=T)  =  0.004
    I(label; car; cars=T)   = -0.223   negative interaction
    Pr(cars=T)              =  0.175
    I(label; car; cars)     = -0.034   negative on average

Note that the expected interaction can be zero even when the actual interactions are nonzero, if they have opposite signs and cancel.

[Figure: words plotted by their actual information about the label, on an axis labeled "Actual information" running from 0.0 to 0.8.]
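The expected-information calculation for a word-presence table, and the sorting of words by that score, can be sketched in Python as below. This is a minimal sketch, not the code behind the notes: the helper names `entropy` and `presence_information` are hypothetical, and each table is assumed to be a dict mapping a label to its (absent, present) document counts. Only two words are ranked here. The "car" counts are the ones given in the notes, and the "cars" counts are obtained by summing the two car/cars example tables over "car" (auto 69 absent / 31 present, moto 96 / 4).

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a list of nonnegative counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def presence_information(table):
    """Expected information I(label; word present) in bits.

    `table` maps each label to an (absent, present) pair of document counts.
    """
    rows = list(table.values())
    n = sum(absent + present for absent, present in rows)
    h_label = entropy([absent + present for absent, present in rows])
    # H(label | word): label entropy in each column, weighted by column size.
    h_label_given_word = 0.0
    for column in (0, 1):  # 0 = word absent, 1 = word present
        counts = [row[column] for row in rows]
        h_label_given_word += sum(counts) / n * entropy(counts)
    return h_label - h_label_given_word


# Counts from the notes (200 documents). The "cars" rows come from
# summing the two car/cars example tables over "car".
tables = {
    "car": {"auto": (47, 53), "moto": (95, 5)},
    "cars": {"auto": (69, 31), "moto": (96, 4)},
}

scores = {word: presence_information(t) for word, t in tables.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
for word in ranked:
    print(f"{word}: {scores[word]:.3f} bits")
```

With these counts the sketch should rank car first at about 0.227 bits and cars second at about 0.101 bits; scoring a whole vocabulary would only require more entries in `tables`.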
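The actual conditional information (equation (1)), the expected conditional information, and the actual interactions in the car/cars example can be checked with a second sketch. It assumes the eight joint counts are stored in a dict keyed by hypothetical (label, car present, cars present) tuples; the function names are illustrative, not from the notes. Conditioning on cars = value just means computing ordinary information inside the slice of documents with that value, which is what `actual_conditional_information` does.

```python
from collections import Counter
from math import log2

# Joint document counts keyed by (label, car present, cars present),
# copied from the two car/cars tables in the notes.
COUNTS = {
    ("auto", False, False): 35, ("auto", True, False): 34,
    ("moto", False, False): 93, ("moto", True, False): 3,
    ("auto", False, True): 12, ("auto", True, True): 19,
    ("moto", False, True): 2, ("moto", True, True): 2,
}
LABEL, CAR, CARS = 0, 1, 2  # positions of the variables inside each key


def entropy(counts, positions):
    """Entropy in bits of the variables at `positions`, marginalizing the rest."""
    marginal = Counter()
    for key, count in counts.items():
        marginal[tuple(key[i] for i in positions)] += count
    n = sum(marginal.values())
    return -sum(c / n * log2(c / n) for c in marginal.values() if c > 0)


def information(counts, c, x):
    """I(C;X) = H(C) - H(C|X), written as H(C) + H(X) - H(C,X)."""
    return entropy(counts, [c]) + entropy(counts, [x]) - entropy(counts, [c, x])


def actual_conditional_information(counts, c, x, y, value):
    """I(C;X | Y=value): ordinary information within the slice Y == value."""
    slice_ = {key: n for key, n in counts.items() if key[y] == value}
    return information(slice_, c, x)


def expected_conditional_information(counts, c, x, y):
    """I(C;X | Y) = sum over y of Pr(Y=y) * I(C;X | Y=y)."""
    total = sum(counts.values())
    result = 0.0
    for value in {key[y] for key in counts}:
        weight = sum(n for key, n in counts.items() if key[y] == value) / total
        result += weight * actual_conditional_information(counts, c, x, y, value)
    return result


base = information(COUNTS, LABEL, CAR)  # I(label; car)
for value in (False, True):
    cond = actual_conditional_information(COUNTS, LABEL, CAR, CARS, value)
    # Actual interaction: I(label; car | cars=value) - I(label; car)
    print(f"cars={value}: conditional {cond:.3f}, actual interaction {cond - base:.3f}")
print(f"expected conditional information: "
      f"{expected_conditional_information(COUNTS, LABEL, CAR, CARS):.3f}")
```

Rounded to three decimals this should reproduce the listed 0.233 and 0.006 for cars = FALSE and 0.004 and -0.223 for cars = TRUE. The expected conditional information it prints (about 0.193) is the Pr(cars)-weighted average of the two slice values.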
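The claim that expected interaction is symmetric in all three variables can be checked numerically on the same car/cars counts. This sketch (again with hypothetical helper names) computes the expected conditional information through the identity $I(A;B \mid G) = H(A,G) + H(B,G) - H(A,B,G) - H(G)$, which equals the Pr-weighted sum in the definition, and then evaluates the interaction with the three variables in different roles.

```python
from collections import Counter
from math import log2

# Joint document counts keyed by (label, car present, cars present),
# copied from the two car/cars tables in the notes.
COUNTS = {
    ("auto", False, False): 35, ("auto", True, False): 34,
    ("moto", False, False): 93, ("moto", True, False): 3,
    ("auto", False, True): 12, ("auto", True, True): 19,
    ("moto", False, True): 2, ("moto", True, True): 2,
}
LABEL, CAR, CARS = 0, 1, 2  # positions of the variables inside each key


def entropy(positions):
    """Entropy in bits of the variables at `positions` under COUNTS."""
    marginal = Counter()
    for key, count in COUNTS.items():
        marginal[tuple(key[i] for i in positions)] += count
    n = sum(marginal.values())
    return -sum(c / n * log2(c / n) for c in marginal.values() if c > 0)


def information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy([a]) + entropy([b]) - entropy([a, b])


def conditional_information(a, b, given):
    """Expected conditional information I(A;B|G) via joint entropies."""
    return (entropy([a, given]) + entropy([b, given])
            - entropy([a, b, given]) - entropy([given]))


def interaction(a, b, given):
    """Expected interaction I(A;B;G) = I(A;B|G) - I(A;B)."""
    return conditional_information(a, b, given) - information(a, b)


# The three orderings should agree, because expected interaction is symmetric.
orderings = {
    "I(label; car; cars)": interaction(LABEL, CAR, CARS),
    "I(label; cars; car)": interaction(LABEL, CARS, CAR),
    "I(car; cars; label)": interaction(CAR, CARS, LABEL),
}
for name, value in orderings.items():
    print(f"{name} = {value:.3f}")

# Value of asking about "cars" on its own versus after "car" is known.
print(f"I(label; cars) = {information(LABEL, CARS):.3f}")
print(f"I(label; cars | car) = {conditional_information(LABEL, CARS, CAR):.3f}")
```

All three orderings should give about -0.034 bits. The last two lines illustrate the "best second question" effect: "cars" is worth about 0.101 bits on its own but only about 0.067 bits once "car" is already known.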
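Since $\sum_y \Pr(Y=y) = 1$, the definitions of expected conditional information and interaction imply that the expected interaction is the $\Pr(Y=y)$-weighted average of the actual interactions, which is why actual interactions of opposite sign can cancel. A short derivation, followed by the car/cars numbers (using the rounded values listed in the example, so the last digit is approximate):

$$
\begin{aligned}
I(C;X;Y) &= I(C;X \mid Y) - I(C;X) \\
&= \sum_y \Pr(Y=y)\, I(C;X \mid Y=y) - \sum_y \Pr(Y=y)\, I(C;X) \\
&= \sum_y \Pr(Y=y)\, \bigl[ I(C;X \mid Y=y) - I(C;X) \bigr] \\
&= \sum_y \Pr(Y=y)\, I(C;X;Y=y)
\end{aligned}
$$

$$
I(\text{label};\text{car};\text{cars}) \approx 0.825 \times 0.006 + 0.175 \times (-0.223) \approx -0.034
$$

Here $0.825 = 1 - \Pr(\text{cars}=T)$; the small positive interaction in the large cars = FALSE slice is outweighed by the large negative interaction in the small cars = TRUE slice.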
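To see both signs of interaction in their purest form, the sketch below evaluates the expected interaction on two hypothetical toy distributions that are not from the notes. The "redundant" case is an idealized version of the weather example, with both stations always reporting C exactly; the "xor" case, where C is the exclusive-or of two independent fair bits, is a standard example of positive interaction. The distribution format (a dict from (c, x, y) outcomes to probabilities) and the function names are assumptions of this sketch.

```python
from itertools import product
from math import log2


def entropy(dist, positions):
    """Entropy in bits of the variables at `positions` under probability table `dist`."""
    marginal = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in positions)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def interaction(dist):
    """Expected interaction I(C;X;Y) = I(C;X|Y) - I(C;X) for outcomes (c, x, y)."""
    C, X, Y = 0, 1, 2
    i_cx = entropy(dist, [C]) + entropy(dist, [X]) - entropy(dist, [C, X])
    i_cx_given_y = (entropy(dist, [C, Y]) + entropy(dist, [X, Y])
                    - entropy(dist, [C, X, Y]) - entropy(dist, [Y]))
    return i_cx_given_y - i_cx


# Hypothetical toy distributions over binary (c, x, y).
# Redundant: reports X and Y always equal C, like two perfect weather stations.
redundant = {(c, c, c): 0.5 for c in (0, 1)}
# Complementary: C is the exclusive-or of two independent fair bits X and Y.
xor = {(x ^ y, x, y): 0.25 for x, y in product((0, 1), repeat=2)}

print(interaction(redundant))  # -1.0
print(interaction(xor))        # 1.0
```

In the redundant case X tells you everything about C until Y is known and nothing afterwards (interaction of -1 bit); in the xor case X tells you nothing on its own but everything once Y is known (interaction of +1 bit).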