Berkeley COMPSCI 260A - Visualizing Statistical Analysis of News

Nicholas Kong
February 23, 2009

Visualizing Statistical Analysis of News
CS260, Spring ’09, Project Proposal

Background

Text is the most common form of visualization, and one of the most idiosyncratic. Since the advent of the Internet, our access to text, and thus the difficulty of assimilating it all, has grown exponentially. This is especially true of news; we now have access to an unprecedented number of news sources, from the mainstream media to polemic bloggers.

While much work has been done in summarizing and condensing the news (e.g., NewsMap, Google News), to my knowledge no statistical analysis of large corpora of text has been used as the basis of a sensemaking visualization. In addition, the statistical analysis is itself novel. My research involves a collaboration with Professor Laurent El Ghaoui in EE and his StatNews group. The StatNews project is investigating statistical algorithms to analyze large text corpora, specifically news data. Their current approach involves Naive Bayesian analysis, in which the corpus is split into two sets (e.g., headlines that contain “Iraq” and headlines that do not), and each word is subsequently assigned a weight by the classifier. A positive weight indicates that a word is a significant predictor of “Iraq” in the headlines, whereas a negative weight indicates that a word is a significant predictor of the absence of “Iraq” in the headlines.

Approach

The main questions in this project are twofold:

1. What does this analysis give us that co-occurrences do not, and
2. how do we display the results in such a fashion as to reveal their significance?

In order to address either question, it is necessary to identify queries that our target users are interested in. As a starting point, we have started a conversation with a social scientist about what she would be interested in using the tool for.
She gave us queries such as

• Is Iran being portrayed as isolated?
• Is the word “spending” always used in the context of “social spending”, or does it also encompass Department of Defense spending?

By discovering interesting queries from the target users, I will attempt to refine the existing visualization prototype (shown in Figure 1) and add features (such as the ability to compare two seed words, or the ability to dynamically alter the news source) in order to help answer those queries.

To address the first question, I would also like to explore a similar visualization using just raw co-occurrences to discover whether similar insights can be gleaned from both analyses.

Figure 1: A view of the current visualization of an analysis of the New York Times headlines.

Themes from the course

This project principally relates to two course themes:

1. Frames
2. Social identity

Frames

Instead of merely searching for co-occurrences of, for example, “Iraq” and “evil”, it will also be interesting to use this tool to compare different framings of the same or similar events. For example, we could compare the occurrences of “bailout” and “economic stimulus” to determine whether they are being used in mutually exclusive timeframes, or whether “bailout” is exclusively associated with “automobile” or “banking” whereas “stimulus” may be associated with “jobs” or “infrastructure”.

Social identity

More tenuously, this project could potentially relate to social identity, specifically political social identity. By comparing news sources and left- and right-leaning blogs, the tool and analyses could be used to identify a prototypical “voice” for a certain social group. That is, we could find what words predict a right-wing blog versus what words predict a left-wing blog.

Project goals and assessment

Assessment will be primarily performed through case studies with social scientists or others who are interested in seriously analyzing this data.
Since the design of the visualization will have been informed by one set of queries, we can determine the general efficacy of the visualization by presenting it to another group of users with another set of queries. Success will be mostly qualitatively determined: if users are able to find convincing evidence for their queries through the tool, we will have achieved our
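
The Naive Bayes word weighting described in the Background section, and the raw co-occurrence baseline from the Approach section, can be sketched roughly as follows. This is a minimal illustration, not the StatNews implementation: the proposal does not say how the classifier's weights are computed, so the sketch assumes a smoothed log-likelihood ratio between the two headline sets. The function names, the smoothing parameter `alpha`, the simple tokenizer, and the four toy headlines are all hypothetical.

```python
import math
from collections import Counter


def tokenize(headline):
    """Lowercase a headline, split on whitespace, strip edge punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in headline.lower().split())
    return [w for w in words if w]


def word_weights(headlines, seed, alpha=1.0):
    """Return {word: weight}; a positive weight predicts the seed's presence.

    Assumption: weight = log P(word | seed set) - log P(word | other set),
    with add-alpha (Laplace) smoothing over the shared vocabulary.
    """
    seed = seed.lower()
    with_seed, without_seed = Counter(), Counter()
    for headline in headlines:
        tokens = tokenize(headline)
        target = with_seed if seed in tokens else without_seed
        # The seed defines the split, so it is left out of the features.
        target.update(t for t in tokens if t != seed)

    vocab = set(with_seed) | set(without_seed)
    total_with = sum(with_seed.values()) + alpha * len(vocab)
    total_without = sum(without_seed.values()) + alpha * len(vocab)
    return {
        w: math.log((with_seed[w] + alpha) / total_with)
        - math.log((without_seed[w] + alpha) / total_without)
        for w in vocab
    }


def cooccurrence_counts(headlines, seed):
    """Count, for each word, the headlines in which it appears with the seed."""
    seed = seed.lower()
    counts = Counter()
    for headline in headlines:
        tokens = set(tokenize(headline))
        if seed in tokens:
            counts.update(tokens - {seed})
    return counts


# Invented toy headlines, for illustration only.
headlines = [
    "Troops leave Iraq after the surge",
    "Iraq war costs rise",
    "Congress debates the stimulus",
    "Stimulus vote delayed in the Senate",
]
weights = word_weights(headlines, "Iraq")
counts = cooccurrence_counts(headlines, "Iraq")

# List words from the strongest predictor of the seed to the weakest.
for word in sorted(weights, key=weights.get, reverse=True):
    print(f"{word:10s} weight={weights[word]:+.2f} co-occurrences={counts[word]}")
```

In this toy corpus, “the” and “war” each co-occur with the seed in one headline, yet only “war” receives a positive weight, because “the” is more frequent in the headlines without the seed. That contrast is one small example of what the weights might reveal that raw co-occurrence counts do not.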

