
A Method of Automated Nonparametric Content Analysis for Social Science




Daniel J. Hopkins, Georgetown University
Gary King, Harvard University

The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.

Efforts to systematically categorize text documents date to the late 1600s, when the Church tracked the proportion of printed texts which were nonreligious (Krippendorff 2004). Similar techniques were used by earlier generations of social scientists, including Waples, Berelson, and Bradshaw (1940), which apparently includes the first use of the term "content analysis," and Berelson and de Grazia (1947). Content analyses like these have spread to a vast array of fields, with automated methods now joining projects based on hand coding, and have increased at least sixfold from 1980 to 2002 (Neuendorf 2002). The recent explosive increase in web pages, blogs, emails, digitized books and articles, transcripts, and electronic versions of government documents (Lyman and Varian 2003) suggests the potential for many new applications. Given


