UMD CMSC 838S - Stock Seeker: Finding Correspondences in Stock Data

Unformatted text preview:

Stock Seeker: Finding Correspondences in Stock Data∗Abhinav Gupta and Vlad MorariuDepartment of Computer ScienceUniversity of Maryland-College ParkAbstractThe problem of stock data analysis has been widelyresearched by stock analysts interested in detectingpatterns in stocks. Resulting algorithms have dealtwith many representations of stock data dependingon the problem, many dealing with complex caseswhere patterns in stock data need to be matcheddespite differences in scale, translation, or noisiness.However, specific instances arise where such gener-ality is not needed. For example, analysts may beinterested in compiling historical price data from dif-ferent sources. In this cas e the correspondence be-tween subsequences in two datasets is not known, butis desired to allow analysts to have accurate stockprices. This problem may be subject to noise, but isnot subject to scaling or translation, allowing for amuch simpler matching algorithm that can b e madeto run rapidly to allow for user interaction. This pa-per presents a visual interface that allows the user tosolve such matching problems by iteratively makingqueries, viewing results, and changing parameters ormatching algorithms.1 IntroductionStock analysts often compile databases of financialdata from multiple providers. To ensure that thedata collected is accurate, the multiple sources pro-vide duplicate data. However, different data sourcesare not guaranteed to have the sam e identifiers fortime-series, and in some cases one time-series may∗The work has been done for partial fulfillment of the re-quirements of the course CMSC838Smatch to a subsequence of any of a number of time-series. For example one source might provide closing-day prices of a company and another source mightprovide closing-day prices of a company’s security.Since a company may have multiple securities, thecompany’s closing day price might be the price ofany of its securities or even an average of its secu-rities. The method for choosing prices might alsochange at any time. Thus, a list of possible matchesbetween prices in the two databases can aid in theanalysts’ task of resolving any possible discrepanciesbetween the sources.Thus, the problem is as follows. We are given twodatasets that contain M and N time-series, over amaximum duration of t time points. All data pointsare aligned by time, and matches are only determinedat equal time points (thus removing the possibility oftranslation). A time-series may correspond to differ-ent sources at different times, and the goal of thisproblem is to find subsequences in the two datasetsgenerated by the same source. Thus, a time-series inthe first dataset may match one time-series in the sec-ond dataset for a certain period of time, but anotherfor another period of time. It is possible that noiseis present, so prices are not necessarily equal duringa match. Given a se t of matching parameters and amatching algorithm, the matches for each time-seriesfrom one set to another can be computed.However, the results depend on the parameters,and a visualization method can greatly aid users infinding the best parameters to complete this match-ing task. Thus, we present an interface that allowsusers to view the two datasets and interact with themby making queries, viewing matches, changing pa-rameters, and repeating the process. Once the de-1sired parameters are found, users can compute allmatches and save the results for future use. The vi-sualization dynamically updates as users make newqueries and change parameters, allowing users toquickly arrive at the best parameters. Detail viewsare also provided, where time-series are filtered basedon individual queries and their matches. Finally, re-sults of all matches can be viewed both textually andgraphically. The textual results are provided in theform of a link table, which contains entries {ID1, ID2,BeginDate, EndDate}, describing a match betweenID1 in one dataset and ID2 in another starting onBeginDate and ending on EndDate.The paper is organized as follows. In section 2,we discuss past work in time-series and subsequencematching as well as some related visualizations. Insection 3, we discuss the matching algorithms thatwe use in our interface. In section 4, we describe howthe user can interact with the interface. Finally, insection 5, we provide our concluding remarks.2 Related WorkTime-series and subsequence matching have beenwidely studied by researchers. Much of the researchin the field has been done on data representation forsimilarity and indexing for database queries. Recentwork has also focused on the information visualiza-tion aspect of the problem, where fast algorithms areimportant for increased interactivity.Faloustsos et. al [3] in their pioneering work in thefield use Euclidean distance of the coefficients of themoving-window Discrete Fourier Transform (DFT).Rafiei et. al [11] discuss a method for similarityqueries of time-series data that uses linear transfor-mations on the Fourier series representation of a se-quence. Both the approaches above have the niceprop e rty that few false hits and no false negativesare introduced. Kim et. al [7] propose an efficientgraph based approach to reduce these false alarms.The approach is based on graph traversal of DirectedAcyclic Graph(DAG) graph to yield partial matches.The sub-trails of query sequence is matched to yieldpossible set of candidate sub-trails from the other se-quence. These candidates sets have to be traversedto give the best match subsequence.Keogh [5] proposes an approach to approximate atime-series by piecewise linear representation. Thedistance measure considered is the variance of lengthof projected lines. The paper also presents a novel’bottom-up’ segmentation algorithm. Park et. al [9]introduce aligned subsequence matching scheme thatreplaces Euclidean distance as the distance mea-sure. Their technique relies on splitting sequencesinto piece-wise segments and examining only subse-quences that start and end at segment boundaries.This reduces the search complexity while still usingmodified time warping distance internally. The au-thors also introduce a method for indexing to facili-tate time sequence queries using the measure of m od-ified time warping distance.Peng et. al [10] propose to use landmarks and itsproperties instead of the whole time-series. Theirapproach is motivated by researches in psychologywhich suggest that humans have episodic memory.They also present a novel


View Full Document

UMD CMSC 838S - Stock Seeker: Finding Correspondences in Stock Data

Download Stock Seeker: Finding Correspondences in Stock Data
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Stock Seeker: Finding Correspondences in Stock Data and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Stock Seeker: Finding Correspondences in Stock Data 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?