Penn CIS 400 - Forecasting Prediction Markets By News Content Analysis - D542047


School name University of Pennsylvania

Course CIS 400 - Senior Project

Pages 45


Reading the Markets: Forecasting Prediction Markets By News Content Analysis

Ari Gilder
Kevin Lerman
Faculty Advisor: Fernando Pereira
Project Advisor: Mark Dredze

Abstract: We present a system for predicting price fluctuations in Prediction Markets, such as TradeSports and the Iowa Electronic Markets. Our approach utilizes both market history and public news articles, published before the beginning of trading each day, to produce a set of recommended investment actions. Since there is evidence that prediction markets are very good indicators of future events, we hypothesize that the converse is true: past/present events can potentially assist in predicting future prices in these markets. We empirically show that these markets are surprisingly predictable, even by purely market-historical techniques. Furthermore, analyzing relevant news articles captures information independent of the market’s history, and combining the two methods significantly improves results. Capturing this signal from news articles requires some linguistic sophistication – the standard naïve bag-of-words approach does not yield predictive features. Instead, we use part-of-speech tagging, dependency parsing and semantic role labeling to generate features that improve system accuracy. We evaluate our system on eight political markets from 2004 and show that we can make effective investment decisions based on our system’s predictions, whose profits greatly exceed those generated by a baseline system. Additionally, our market prediction system can be applied to any Prediction Market with a known end date and for which a set of relevant entities (people, places, or things) can be defined.
1 Introduction

Prediction markets, such as Tradesports [www.tradesports.com], NewsFutures [www.newsfutures.com], and the Iowa Electronic Markets [www.biz.uiowa.edu/iem/], allow users to predict some future event and stake money on their prediction. For example, in 2004, users could purchase a share of “George W. Bush to win the 2004 Presidential election”. This share’s value becomes $1 if Bush wins in November, and $0 if he loses. In the interim, the share can be bought and sold at varying prices according to market demand. Thus, if something positive for Bush happens (e.g. Bin Laden is captured), Bush will appear more likely to win, more people will want to buy “Bush to win” shares, and the price of these shares will go up. Likewise, if something negative for Bush occurs (e.g. casualties in Iraq increase dramatically), people will think he is less likely to win and will want to unload their shares before the election happens, so the share price will drop, much as in the stock market.

We attempt to predict future price fluctuations in these markets by analyzing a market’s history (previous price movements, trading volume, etc.) as well as news written about the people involved. Since in principle the news is the information upon which people are basing their investment decisions, a sufficiently sophisticated analysis of the news should yield a reasonable prediction of what people are likely to do.

We generate a syntactic dependency parse of each day’s news and extract features related to a user-defined set of market entities (e.g. for the Bush election market: Bush, Kerry, and Iraq). This approach is empirically more effective than a naïve bag-of-words approach, in which no linguistic information is used. We compare these feature values to those generated from previous days to determine the novel component of the day’s news (i.e. what is being discussed that wasn’t discussed yesterday). These comparisons are then used as features for a machine learning system.
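
The comparison against previous days can be illustrated with a minimal sketch. It is not the system's actual feature extractor: it substitutes simple sentence-level co-occurrence counts for the dependency-parse features described above, and the function names `entity_features` and `novelty`, the sample sentences, and the choice of "today's count minus the mean count over prior days" as the comparison are all assumptions made for illustration.

```python
from collections import Counter


def entity_features(sentences, entities):
    """Count (entity, word) co-occurrences within each sentence.

    Hypothetical stand-in for parse-based features: a real system would
    pair an entity only with words linked to it in the dependency tree.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for entity in entities:
            if entity in tokens:
                for tok in tokens:
                    if tok != entity:
                        counts[(entity, tok)] += 1
    return counts


def novelty(today, history):
    """Return today's count minus the mean count over the previous days.

    Positive values mark features discussed more today than before
    (assumed comparison; the source does not specify the exact formula).
    """
    if not history:
        return dict(today)
    keys = set(today)
    for past in history:
        keys.update(past)
    n = len(history)
    return {
        k: today.get(k, 0) - sum(p.get(k, 0) for p in history) / n
        for k in keys
    }


entities = ["bush", "kerry"]
yesterday = entity_features(["bush visits ohio"], entities)
today = entity_features(["bush visits ohio", "kerry attacks bush"], entities)
delta = novelty(today, [yesterday])
```

In this toy input, features repeated from the previous day (such as Bush visiting Ohio) receive a novelty of zero, while features that appear only today (such as the attack) receive a positive value, which is the kind of signal a downstream learner could consume.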
Finally, we generate one set of predictions based on market history, another based on news content, and automatically merge the two to produce significantly better returns than either system independently.

2 Related Work

Many researchers have studied the stock market and prediction markets. Koppel & Shtrimberg [2004] looked for correlations between news and the stock market, with only low to moderate success. Our hypothesis is that their lack of success resulted from the extreme efficiency of the stock market [Antweiler & Frank, 2005], along with various other fluctuations. While there is evidence that prediction markets are somewhat efficient [Pennock et al., 2000], we suggest that prediction markets may be more susceptible to published news because of their lower efficiency and less scrutiny by market analysts, coupled with the nature of these markets as “information markets”. This suggestion seems to be supported by current research [Debnath et al., 2003; Pennock et al., 2001; Servan-Schreiber et al., 2004]. Furthermore, there is considerable evidence that prediction markets tend to be very accurate in predicting future events [Wolfers & Zitzewitz, 2004; Servan-Schreiber et al., 2004; Pennock et al., 2000]. We suspect there also exists a converse effect: current events can potentially predict the price of prediction markets, since past and present news should be what informs investors in these markets.

Our system makes use of various textual information extraction techniques, including a bag-of-words method, commonly used in sentiment classification [Dave et al., 2003], as well as more linguistic techniques, such as dependency-tree parsing to extract useful syntactic information [McDonald et al., 2006]. The use of syntactic information, combined with additional semantic information, for text polarity extraction is supported by the work of Hurst & Nigam [2004].
We believe we are the first to explore a correlation between news articles and prediction markets and the first to apply advanced NLP techniques to the task of market news correlation.

3 System Overview

We begin with a brief overview of our system (see “Technical Approach” for more details). We start by extracting features from each day’s news articles. Each sentence of each article is part-of-speech tagged with the Stanford part-of-speech tagger [Toutanova et al., 2003], parsed into a dependency tree representing the syntactic relationships between different words of the sentence, the edges of which in turn are assigned semantic role labels denoting the nature of the relationships
