New version page

Iterative Incremental Clustering of Time Series

Upgrade to remove ads

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save
View Full Document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience

Upgrade to remove ads
Unformatted text preview:

Iterative Incremental Clustering of Time Series Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos Computer Science & Engineering Department University of California, Riverside Riverside, CA 92521 {jessica, mvlachos, eamonn, dg}@cs.ucr.edu Abstract. We present a novel anytime version of partitional clustering algorithm, such as k-Means and EM, for time series. The algorithm works by leveraging off the multi-resolution property of wavelets. The dilemma of choosing the initial centers is mitigated by initializing the centers at each approximation level, using the final centers returned by the coarser representations. In addition to casting the clustering algorithms as anytime algorithms, this approach has two other very desirable properties. By working at lower dimensionalities we can efficiently avoid local minima. Therefore, the quality of the clustering is usually better than the batch algorithm. In addition, even if the algorithm is run to completion, our approach is much faster than its batch counterpart. We explain, and empirically demonstrate these surprising and desirable properties with comprehensive experiments on several publicly available real data sets. We further demonstrate that our approach can be generalized to a framework of much broader range of algorithms or data mining problems. 1 Introduction Clustering is a vital process for condensing and summarizing information, since it can provide a synopsis of the stored data. Although there has been much research on clustering in general, most classic machine learning and data mining algorithms do not work well for time series due to their unique structure. In particular, the high dimensionality, very high feature correlation, and the (typically) large amount of noise that characterize time series data present a difficult challenge. Although numerous clustering algorithms have been proposed, the majority of them work in a batch fashion, thus hindering interaction with the end users. Here we address the clustering problem by introducing a novel anytime version of partitional clustering algorithm based on wavelets. Anytime algorithms are valuable for large databases, since results are produced progressively and are refined over time [11]. Their utility for data mining has been documented at length elsewhere [2, 21]. While partitional clustering algorithms and wavelet decomposition have both been studied extensively in the past, the major novelty of our approach is that it mitigates the problem associated with the choice of initial centers, in addition to providing the functionality of user-interaction. The algorithm works by leveraging off the multi-resolution property of wavelet decomposition [1, 6, 22]. In particular, an initial clustering is performed with a verycoarse representation of the data. The results obtained from this “quick and dirty” clustering are used to initialize a clustering at a finer level of approximation. This process is repeated until the “approximation” is the original “raw” data. Our approach allows the user to interrupt and terminate the process at any level. In addition to casting the clustering algorithm as an anytime algorithm, our approach has two other very unintuitive properties. The quality of the clustering is often better than the batch algorithm, and even if the algorithm is run to completion, the time taken is typically much less than the time taken by the batch algorithm. We initially focus our approach on the popular k-Means clustering algorithm [10, 18, 24] for time series. For simplicity we demonstrate how the algorithm works by utilizing the Haar wavelet decomposition. Then we extend the idea to another widely used clustering algorithm, EM, and another well-known decomposition method, DFT, towards the end of the paper. We demonstrate that our algorithm can be generalized as a framework for a much broader range of algorithms or data mining problems. The rest of this paper is organized as follows. In Section 2 we review related work, and introduce the necessary background on the wavelet transform and k-Means clustering. In Section 3, we introduce our algorithm. Section 4 contains a comprehensive comparison of our algorithm to classic k-Means on real datasets. In Section 5 we study how our approach can be extended to other iterative refinement method (such as EM), and we also investigate the use of other multi-resolution decomposition such as DFT. In Section 6 we summarize our findings and offer suggestions for future work. 2 Background and Related Work Since our work draws on the confluence of clustering, wavelets and anytime algorithms, we provide the necessary background on these areas in this section. 2.1 Background on Clustering One of the most widely used clustering approaches is hierarchical clustering, due to the great visualization power it offers [12]. Hierarchical clustering produces a nested hierarchy of similar groups of objects, according to a pairwise distance matrix of the objects. One of the advantages of this method is its generality, since the user does not need to provide any parameters such as the number of clusters. However, its application is limited to only small datasets, due to its quadratic (or higher order) computational complexity. A faster method to perform clustering is k-Means [2, 18]. The basic intuition behind k-Means (and in general, iterative refinement algorithms) is the continuous reassignment of objects into different clusters, so that the within-cluster distance is minimized. Therefore, if x are the objects and c are the cluster centers, k-Means attempts to minimize the following objective function:  kmNimicxF1 1 (1) The k-Means algorithm for N objects has a complexity of O(kNrD) [18], where k is the number of clusters specified by the user, r is the number of iterations until convergence, and D is the dimensionality of the points. The shortcomings of the algorithm are its tendency to favor spherical clusters, and its requirement for prior knowledge on


Download Iterative Incremental Clustering of Time Series
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Iterative Incremental Clustering of Time Series and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Iterative Incremental Clustering of Time Series 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?