Programming Project #2
Along Party Lines

This problem set is due 11/20.

Since it is an important year for U.S. elections, we’ll do a political programming project. In a separate file, we’ve included the party membership and voting records of various members of Congress. We’d like you to use this data to classify congressmen as Republicans or Democrats, using Bayesian classifiers and decision trees. We’ve included their votes on a number of issues. The data set is from the 80s, so the issues aren’t current (you can envision the congressmen with funny hair if you’d like). The issues are:

1. Handicapped Infants
2. Water Project Cost Sharing
3. Adopt the Budget Resolution
4. Physician Fee Freeze
5. Aid to El Salvador
6. Religious Groups in Schools
7. Anti Satellite Test Ban
8. Aid to Nicaraguan Contras
9. MX Missile
10. Immigration
11. Synfuels Corporation Cutback
12. Education Spending
13. Superfund Right to Sue
14. Crime
15. Duty Free Exports
16. South Africa Export Administration Act

An individual congressman is represented by a line in the training file:

no,yes,yes,no,no,maybe,yes,yes,yes,yes,yes,no,maybe,yes,yes,yes,democrat

This line shows their votes on the sixteen issues, followed by their political party. A congressman who didn’t vote on a particular issue is given a “maybe” for it. You may notice that some congressmen have not voted at all.

You have three files:

1. training-large.csv: The main training set
2. training-small.csv: A smaller training set
3. testing.csv: The testing set

1 Part 1: Classification

For each part, create a classifier using the suggested Python modules. Train the classifier on the small training set and then test it using the test data. Collect precision and recall information. Do the same thing for the larger training set.

Please document your code and make the structure clean, readable, and intuitive.

1.1 Part 1a: Bayesian Classifier

Use a naïve Bayes classifier to classify the politicians.

You should use the Reverend classifier: http://divmod.org/trac/wiki/DivmodReverend.
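A record in this format is sixteen positional votes plus a trailing party label, so a bare "yes" says nothing until it is tied to its issue. Here is a minimal sketch of one way to split a line into issue-tagged tokens. The names `parse_record` and `tokenize` and the token form `issue<N>=<vote>` are hypothetical; they do not come from Reverend or from this handout. The sketch also assumes the votes appear in the same order as the issue list.

```python
NUM_ISSUES = 16  # the handout lists sixteen issues


def parse_record(line):
    """Split one comma-separated line into (votes, party).

    Assumes the last field is the party label and the sixteen fields
    before it are votes ("yes", "no", or "maybe").
    """
    fields = [field.strip().lower() for field in line.strip().split(",")]
    if len(fields) != NUM_ISSUES + 1:
        raise ValueError("expected %d fields, got %d" % (NUM_ISSUES + 1, len(fields)))
    return fields[:-1], fields[-1]


def tokenize(votes):
    """Tag each vote with its 1-based issue number.

    A "yes" on issue 4 and a "yes" on issue 9 become the distinct
    tokens "issue4=yes" and "issue9=yes".
    """
    return ["issue%d=%s" % (number, vote) for number, vote in enumerate(votes, start=1)]


# The sample record from the handout.
example = "no,yes,yes,no,no,maybe,yes,yes,yes,yes,yes,no,maybe,yes,yes,yes,democrat"
votes, party = parse_record(example)
tokens = tokenize(votes)
```

Tagging by position is only one possible design. Whether "maybe" should be kept as its own token or dropped is a choice worth testing against both training sets.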
You cannot simply plug in the classifier; you will have to write a tokenizer to split up the training input appropriately.

1.2 Part 1b: Decision Trees

You can find a Python implementation of the ID3 algorithm here (along with a long tutorial): http://www.onlamp.com/pub/a/python/2006/02/09/ai_decision_trees.html

2 Part 2: Discussion

Now that you’ve classified things, please discuss the different classification techniques. Did different classifiers do better with this data set? How did the size of the training set affect the results? Did different methods work better with different size training sets?

This should definitely be longer than a paragraph, and should be as long as you need to get your ideas
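For the comparison asked for in the discussion, precision and recall are usually reported per party, treating one party at a time as the positive class. Here is a minimal sketch using a hypothetical helper, `precision_recall`. The label lists are toy values made up for illustration, not results from the data set.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Return (precision, recall) for one class treated as positive.

    precision = TP / (TP + FP): of the records predicted as `positive`,
                the fraction that really are.
    recall    = TP / (TP + FN): of the records that really are `positive`,
                the fraction the classifier found.
    """
    true_pos = false_pos = false_neg = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            true_pos += 1
        elif guess == positive:
            false_pos += 1
        elif truth == positive:
            false_neg += 1
    # Report 0.0 when a denominator is empty (assumed convention).
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Toy labels, made up for illustration only.
truth = ["democrat", "democrat", "republican", "republican", "democrat"]
guess = ["democrat", "republican", "republican", "democrat", "democrat"]

dem_precision, dem_recall = precision_recall(truth, guess, "democrat")
rep_precision, rep_recall = precision_recall(truth, guess, "republican")
```

Running the same function on each classifier's output, once per training set, gives comparable numbers for every classifier and training-set size.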