UW-Madison CS 525 - Computing Project - The Disputed Federalist Papers

Unformatted text preview:

525 Computing Project, Spring 2014∗The Disputed Federalist PapersLinear and quadratic programming can be used to solve problems in manyapplication areas. In this project, we will apply linear programming to amachine learning formulation to determine the authorship of the disputedFederalist Papers.The Federalist Papers were written in 1787-1788 by Alexander Hamilton,John Jay, and James Madison to persuade the citizens of the State of NewYork to ratify the U.S. Constitution. As was common in those days, these77 essays, about 900 to 3500 words in length, appeared in newspapers signedwith a pseudonym, in this instance, “Publius”. In 1778, these papers togetherwith eight additional articles on the same subject — a total of 85 articles —were published in book form. Since then, the consensus has been that JohnJay was the sole author of five of a total 85 papers, that Hamilton was the soleauthor of 51, that Madison was the sole author of 14, and that Madison andHamilton collaborated on another three. The authorship of the remaining 12papers, known as the “disputed papers,” has been a matter of long-standingcontroversy. It has been generally agreed that the disputed papers werewritten by either Madison or Hamilton, but there was no consensus aboutwhich were written by Hamilton and which by Madison.In this project, we look at the frequencies with which Madison and Hamil-ton used certain words, and use this information to try to determine whichof them wrote each of the 12 disputed papers.The data, obtained from [1], is available in the file federalData.matin the public directory ∼cs525-1/public. The file contains a data matrixwith 118 lines of data, one line per paper. (A number of other papers withknown authorship of either Hamilton or Madison were added to the FederalistPapers mentioned above, to provide extra data on how the two authors made∗Due at 5:00pm on May 9, 20141use of vocabulary.) The first entry in each line contains the code number ofthe author: 1 for Hamilton (56 papers in total), 2 for Madison (50 papersin total), and 3 for the disputed papers (12 in total). The remaining entriescontain 70 floating point numbers that correspond to the relative frequencies(number of occurrences per 1000 words of the text) of the 70 function words,which are also available in the data file as an array of strings.The idea of the project is to come up with a linear discriminant function— a plane that approximately separates the papers of Hamilton from those ofMadison in “word space” — to determine if a disputed paper was authoredby Hamilton or Madison. To do this, we divide the papers with knownauthors into a “training set” and a “tuning set” for separating plane. Thedisputed papers form a “testing set”. Once we have calculated the separatingplane, we test to see which side of the plane the disputed papers lie on. Anappropriate plane can often be determined by solving a linear or quadraticprogram, as we now describe.Given word frequency data for n words, a linear function f will be con-structed to “usually” have the following property:f(x) > 0 =⇒ x ∈ M, f(x) ≤ 0 =⇒ x ∈ H,where M denotes Madison’s papers, H denotes Hamilton’s papers, andf(x) = w0x − γ, with w ∈ Rnand γ ∈ R to be determined by solvingan optimization problem constructed from the training data. In other words,the plane w0x = γ separates (most of) Madison’s papers from Hamilton’s inthe space Rnof word frequencies.If we represent the sets of m points in M by a matrix M ∈ Rm×nandthe set of h points in H by a matrix H ∈ Rh×n, then the problem becomesone of choosing w and γ to solve the following minimization problem:minw,γ1m(−Mw + emγ + em)+22+1h(Hw − ehγ + eh)+22Here emand ehare vectors of lengths m and h, respectively, whose entriesare all 1, while ((z)+)i= max{zi, 0}, i = 1, 2, . . . , m and kzk1=Pmi=1|zi| forz ∈ Rm. This problem approximately minimizes the number of misclassifiedpoints by choosing w and γ to minimize the sum of the squared distances tothe separating plane for those points on the incorrect side of the plane. The(·)+and k·k1functions can be eliminated by introducing additional variablesy and z into the formulation, and writing the problem above as a quadratic2programminw,γ,y,z1mkyk22+1hkzk22subject to Mw − emγ + y ≥ em,−Hw + ehγ + z ≥ eh,y ≥ 0, z ≥ 0.Suppose we wish to find a classifier that depends on just a few of thewords in the vocabulary, and is able to classify effectively on the basis ofthese few words. In other words, we seek a sparse solution w to the problemabove — a solution in which just a few of the components of w are nonzero.One way to achieve this goal is to regularize the formulation above by addinga term µkwk1to the objective, where µ is some user-specified parameter. Wethus obtain the following modified formulation:minw,γ,y,z1mkyk22+1hkzk22+ µkwk1subject to Mw − eγ + y ≥ em,−Hw + eγ + z ≥ eh,y ≥ 0, z ≥ 0.(As we know, this can be converted into a quadratic program by splittingthe variable w into its positive and negative parts.) As the value of µ valueis increased, the vector w tends to have fewer and fewer nonzero elements.The MATLAB routine getFederalistData.m (available in ∼cs525-1/public)parses the data in federalData.mat to produce a training matrix train(consisting of 86 entries with known authorship), a tuning matrix tune (con-sisting of the other 20 papers of known authorship), and a testing matrixtest (consisting of the 12 entries with unknown authorship).Answer the following questions.1. Write a code in MATLAB and CVX to solve the problem above to findthe classifying hyperplane, where the matrices M and H are extractedfrom the training matrix train described above. Write a loop to solvethe problem for these values of µ: 0, .001, .01, .1, 1, 10, and 100. Aftersolving for w, “postprocess” your solution by setting small componentsof w to zero, according to the following formula:Set wi:= 0 if |wi| ≤ 10−6max(kwk1, 1).3For each value of µ, calculate and print the following information:– the optimal objective;– the values of γ and kwk1;– the number of nonzeros in w (after the postprocessing above);– the number of misclassified points (those that lie on the wrongside of the classifying hyperplane) from the training set, which isdescribed in the matrix train;– the number of misclassified points from the tuning set, which


View Full Document

UW-Madison CS 525 - Computing Project - The Disputed Federalist Papers

Download Computing Project - The Disputed Federalist Papers
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Computing Project - The Disputed Federalist Papers and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Computing Project - The Disputed Federalist Papers 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?