
Non-Parametric Time Series Classification

Scott Lenser and Manuela Veloso
Carnegie Mellon University
Pittsburgh, PA
{slenser,mmv}@cs.cmu.edu

Abstract

We present a new state-based prediction algorithm for time series. Given time series produced by a process composed of different underlying states, the algorithm predicts future time series values based on past time series values for each state. Unlike many algorithms, this algorithm predicts a multi-modal distribution over future values. This prediction forms the basis for labelling part of a time series with the underlying state that created it, given some labelled examples. The algorithm is robust to a wide variety of possible types of changes in signals, including changes in mean, amplitude, amount of noise, and period. We show results demonstrating that the algorithm successfully segments signals for a wide variety of possible signal changes.

1 Introduction

Segmentation of time series into discrete classes is an important problem in many fields. We approach the problem from the field of robotics, where time series generated by sensors are readily available. We are interested in using these signals to identify sudden changes in the robot's environment, allowing the robot to respond intelligently. For this application, the signal segmentation must be performed in real time and on line, which requires algorithms that are amenable to on-line use. Usually a mathematical model of the process that generates the sensor signal is unavailable, as is the number of possible states in this process. Therefore, we focus on techniques that require little a priori knowledge and few assumptions.

In previous work [1, 2], we developed a technique for segmenting a time series into different classes given labelled example time series. In [1], we showed that our algorithm can successfully segment signals from robotic sensors.
In this work, we improve on our previous technique by replacing a windowed approach to signal classification with a recursive solution based on a simple HMM.

We have named our new algorithm for classifying time series the Probable Series Classifier (PSC). It is based on a time series prediction component which we will refer to as the Probable Series Predictor (PSP). Unlike many other methods, PSP predicts a multi-modal probability density over next values. PSC uses several PSP modules to classify a time series into one of a set number of pre-trained states, one PSP module per state. Each PSP module is pre-trained from an example time series generated by one state, using an internal non-parametric model to make its predictions. PSC runs each PSP module on a time series to be classified and uses the module which best predicts the time series as the classification of the unknown time series.

There has been much interest in time series analysis in the literature due to the broad applicability of time series techniques. There have also been many approaches to time series prediction, most of which are focused on producing a single predicted value. For example, time series prediction has been done using AR, ARMA, IMA, and ARIMA models (e.g. [3]) and neural networks (NNs). All of these techniques produce a single estimated next value in the time series. These techniques can be converted into predicting a distribution over values by assuming a Gaussian deviation around the predicted value. This approach is used by Petridis and Kehagias for NNs [4, 5] and by Penny and Roberts for HMM-AR models [6]. A related approach is that of using Gaussian Processes [7] for prediction. Unfortunately, this technique requires the inversion of an $n \times n$ matrix, which takes $O(n^3)$ time for $n$ observations.
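The per-state PSP/PSC architecture described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class and function names are hypothetical, the predictor here is a simple one-dimensional kernel density estimate over observed values, and it ignores the conditioning on the previous $m$ values that the real PSP performs.

```python
import math

class ToyPredictor:
    """Stand-in for a PSP module: a kernel density estimate trained on
    one state's example series (hypothetical API, not the paper's)."""

    def __init__(self, training_series, bandwidth=0.5):
        self.samples = list(training_series)
        self.h = bandwidth

    def log_prob(self, x):
        # Multi-modal density: a mixture of Gaussian kernels, one
        # centered on each training sample.
        n = len(self.samples)
        dens = sum(
            math.exp(-0.5 * ((x - s) / self.h) ** 2)
            / (self.h * math.sqrt(2 * math.pi))
            for s in self.samples
        ) / n
        return math.log(dens + 1e-300)

def classify(series, predictors):
    """PSC-style classification: score the unknown series under each
    state's predictor; the best-predicting state is the label."""
    scores = {
        label: sum(p.log_prob(x) for x in series)
        for label, p in predictors.items()
    }
    return max(scores, key=scores.get)

# One predictor per state, each trained on that state's example series.
predictors = {
    "low":  ToyPredictor([0.0, 0.1, -0.1, 0.05]),
    "high": ToyPredictor([5.0, 5.1, 4.9, 5.05]),
}
print(classify([4.8, 5.2, 5.0], predictors))  # → high
```

Because each predictor is trained independently from one state's example series, adding a new state is just adding another predictor, which matches the negligible-training-time property claimed for PSC.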
In contrast, our approach takes $O(n \log n)$ time in our current implementation. The chief advantages of our approach over these previous approaches are that we are capable of predicting a multi-modal distribution over values and that our method is amenable to on-line training as more data from a particular state becomes available.

There are a wide variety of algorithms based on change detection, particularly in the domain of fault detection and identification (FDI). These FDI algorithms (e.g. [8, 9]) are usually specialized for the case of two states: one for normal system operation and one for failure cases. Because data can only be collected about the normal state of the system, these algorithms are attempting to solve a different problem and are generally more specialized for this task. We take a very general approach in which we detect a wide variety of types of changes to the signal, which sets PSC apart from these other techniques.

There has also been much interest in HMMs and switching state-space models, e.g. [6, 10]. These techniques require a priori knowledge of the underlying structure of the system or extensive off-line training. PSC requires no knowledge about the system structure, as we only require labelled time series examples. PSC also requires negligible training time since almost all of the work is done at query time.

2 Probable Series Classifier Algorithm

Consider a time series of values $\vec{x}_0, \vec{x}_1, \ldots, \vec{x}_t$ created by a generator with $k$ distinct states. At each point in time, one of the states is active and generates the next data value in the time series based upon the previous time series values. Also assume that the frequency of switching between states is relatively low, such that sequential values are likely to be from the same state.
We are interested in using the time series of values to recover which state was active at each point in time, using only example time series created by each state.

The belief state at time $j$ for state $i$ is the probability of state $i$ being active at time $j$:

$$B(s_j = i) = P(s_j = i \mid \vec{x}_j, \vec{x}_{j-1}, \ldots, \vec{x}_0) = \frac{P(\vec{x}_j \mid \vec{x}_{j-1}, \ldots, \vec{x}_0, s_j = i) \, P(s_j = i \mid \vec{x}_{j-1}, \ldots, \vec{x}_0)}{P(\vec{x}_j \mid \vec{x}_{j-1}, \ldots, \vec{x}_0)}$$

We are interested in finding the state $i$ that maximizes this probability. Note that $P(\vec{x}_j \mid \vec{x}_{j-1}, \ldots, \vec{x}_0)$ is just a normalizing constant and thus does not affect which $s_j = i$ has the maximum likelihood. Furthermore, we make the $m$th-order Markov assumption that values more than $m$ time steps old are negligible given more current readings. This assumption simplifies $P(\vec{x}_j \mid \vec{x}_{j-1}, \ldots, \vec{x}_0, s_j = i)$ to $P(\vec{x}_j \mid \vec{x}_{j-1}, \ldots, \vec{x}_{j-m}, s_j = i)$.

$$\begin{aligned}
P(s_j = i \mid \vec{x}_{j-1}, \ldots, \vec{x}_0)
&= \sum_l P(s_j = i, s_{j-1} = l \mid \vec{x}_{j-1}, \ldots, \vec{x}_0) \\
&= \sum_l P(s_j = i \mid s_{j-1} = l, \vec{x}_{j-1}, \ldots, \vec{x}_0) \, P(s_{j-1} = l \mid \vec{x}_{j-1}, \ldots, \vec{x}_0) \\
&= \sum_l P(s_j = i \mid s_{j-1} = l) \, P(s_{j-1} = l \mid \vec{x}_{j-1}, \ldots, \vec{x}_0)
\end{aligned}$$
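The derivation above is the forward step of an HMM: propagate the belief through the transition probabilities, weight by the observation likelihood, and normalize. The following is a minimal sketch under illustrative inputs; the transition matrix and observation log-likelihoods here are placeholders, not quantities specified by the paper.

```python
import math

def update_belief(belief, trans, obs_loglik):
    """One recursive belief update, following the derivation above:
       prior[i]  = sum_l P(s_j=i | s_{j-1}=l) * B(s_{j-1}=l)
       B(s_j=i) ∝ P(x_j | x_{j-1},...,x_{j-m}, s_j=i) * prior[i]
    `trans[l][i]` is P(s_j=i | s_{j-1}=l); `obs_loglik[i]` is the log
    of the observation likelihood (e.g. from a PSP module)."""
    k = len(belief)
    # Propagate yesterday's belief through the state transitions.
    prior = [sum(trans[l][i] * belief[l] for l in range(k)) for i in range(k)]
    # Weight by how well each state predicts the new observation.
    unnorm = [math.exp(obs_loglik[i]) * prior[i] for i in range(k)]
    # z is the normalizing constant P(x_j | x_{j-1}, ..., x_0).
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two states with sticky transitions (switching between states is
# rare), and an observation that state 1 explains much better:
b = update_belief(
    [0.5, 0.5],
    [[0.95, 0.05], [0.05, 0.95]],
    [-4.0, -0.5],
)
```

The sticky transition matrix encodes the assumption stated earlier that sequential values are likely to come from the same state, so the belief changes only when the evidence for a switch accumulates.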