Slide titles:
• CS490D: Introduction to Data Mining, Prof. Chris Clifton
• Mining Time-Series and Sequence Data
• Mining Time-Series and Sequence Data: Trend analysis
• Estimation of Trend Curve
• Discovery of Trend in Time-Series (1)
• Discovery of Trend in Time-Series (2)
• Similarity Search in Time-Series Analysis
• Data transformation
• Multidimensional Indexing
• Subsequence Matching
• Enhanced similarity search methods
• Similar time series analysis
• Steps for Performing a Similarity Search
• Query Languages for Time Sequences
• Sequential Pattern Mining
• Mining Sequences (cont.)
• Sequential pattern mining: Cases and Parameters
• Sequential pattern mining: Cases and Parameters (2)
• Episodes and Sequential Pattern Mining Methods
• Periodicity Analysis

CS490D: Introduction to Data Mining
Prof. Chris Clifton
April 5, 2004
Mining of Time Series Data
CS490D Spring 2004

Mining Time-Series and Sequence Data
• Time-series database
  – Consists of sequences of values or events changing with time
  – Data is recorded at regular intervals
  – Characteristic time-series components
    • Trend, cycle, seasonal, irregular
• Applications
  – Financial: stock price, inflation
  – Biomedical: blood pressure
  – Meteorological: precipitation

Mining Time-Series and Sequence Data
[Figure: time-series plot]

Mining Time-Series and Sequence Data: Trend analysis
• A time series can be illustrated as a time-series graph, which describes a point moving with the passage of time
• Categories of time-series movements
  – Long-term or trend movements (trend curve)
  – Cyclic movements or cycle variations, e.g., business cycles
  – Seasonal movements or seasonal variations
    • i.e., almost identical patterns that a time series appears to follow during corresponding months of successive years
  – Irregular or random movements

Estimation of Trend Curve
• The freehand method
  – Fit the curve by looking at the graph
  – Costly and barely reliable for large-scale data mining
• The least-squares method
  – Find the curve minimizing the sum of the squares of the deviations of points on the curve from the corresponding data points
• The moving-average method
  – Eliminates cyclic, seasonal, and irregular patterns
  – Loss of end data
  – Sensitive to outliers

Discovery of Trend in Time-Series (1)
• Estimation of seasonal variations
  – Seasonal index
    • Set of numbers showing the relative values of a variable during the months of the year
    • E.g., if the sales during October, November, and December are 80%, 120%, and 140% of the average monthly sales for the whole year, respectively, then 80, 120, and 140 are the seasonal index numbers for these months
  – Deseasonalized data
    • Data adjusted for seasonal variations
    • E.g., divide the original monthly data by the seasonal index numbers for the corresponding months

Discovery of Trend in Time-Series (2)
• Estimation of cyclic variations
  – If (approximate) periodicity of cycles occurs, a cyclic index can be constructed in much the same manner as seasonal indexes
• Estimation of irregular variations
  – By adjusting the data for trend, seasonal, and cyclic variations
• With systematic analysis of the trend, cyclic, seasonal, and irregular components, it is possible to make long- or short-term predictions of reasonable quality

Similarity Search in Time-Series Analysis
• A normal database query finds exact matches
• A similarity search finds data sequences that differ only slightly from the given query sequence
• Two categories of similarity queries
  – Whole matching: find a sequence that is similar to the query sequence
  – Subsequence matching: find all pairs of similar sequences
• Typical applications
  – Financial market
  – Market basket data analysis
  – Scientific databases
  – Medical diagnosis

Data transformation
• Many techniques for signal analysis require the data to be in the frequency domain
• Usually data-independent transformations are used
  – The transformation matrix is determined a priori
    • E.g., discrete Fourier transform (DFT), discrete wavelet transform (DWT)
  – The Euclidean distance between two signals in the time domain is the same as their Euclidean distance in the frequency domain
  – DFT does a good job of concentrating energy in the first few coefficients
  – If we keep only the first few coefficients of the DFT, we can compute a lower bound on the actual distance

Multidimensional Indexing
• Multidimensional index
  – Constructed for efficient access using the first few Fourier coefficients
• Use the index to retrieve the sequences that are at most a certain small distance away from the query sequence
• Perform post-processing by computing the actual distance between sequences in the time domain and discarding any false matches

Subsequence Matching
• Break each sequence into a set of windows of length w
• Extract the features of the subsequence inside each window
• Map each sequence to a “trail” in the feature space
• Divide the trail of each sequence into “subtrails” and represent each of them with a minimum bounding rectangle
• Use a multipiece assembly algorithm to search for longer sequence matches

Enhanced similarity search methods
• Allow for gaps within a sequence or differences in offsets or amplitudes
• Normalize sequences with amplitude scaling and offset translation
• Two subsequences are considered similar if one lies within an envelope of width ε around the other, ignoring outliers
• Two sequences are said to be similar if they have enough non-overlapping time-ordered pairs of similar subsequences
• Parameters specified by a user or expert: sliding window size, width of an envelope for similarity, maximum gap, and matching fraction

Similar time series analysis
[Figure]

Steps for Performing a Similarity Search
• Atomic matching
  – Find all pairs of gap-free windows of a small length that are similar
• Window stitching
  – Stitch similar windows to form pairs of large similar subsequences, allowing gaps between atomic matches
• Subsequence ordering
  – Linearly order the subsequence matches to determine whether enough similar pieces exist

Similar time series analysis
[Figure: VanEck International Fund; Fidelity Selective Precious Metal and Mineral Fund]
Two similar mutual funds in different fund groups

Query Languages for Time Sequences
• Time-sequence query language
  – Should be able to specify sophisticated
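
As an illustration of the Estimation of Trend Curve slide, the following is a minimal sketch of two of the listed methods: a least-squares fit (restricted here to a straight line, which is an assumption; the slide speaks of a curve in general) and a simple moving average. The function names `fit_line` and `moving_average` are hypothetical and chosen only for this example.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept.

    Minimizes the sum of squared vertical deviations between the
    line and the data points (closed-form solution).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def moving_average(values, order):
    """Simple moving average of the given order.

    Returns len(values) - order + 1 averages, so points at the
    ends of the series are lost.
    """
    if order < 1 or order > len(values):
        raise ValueError("order must be between 1 and len(values)")
    window_sum = sum(values[:order])
    averages = [window_sum / order]
    for i in range(order, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        window_sum += values[i] - values[i - order]
        averages.append(window_sum / order)
    return averages


if __name__ == "__main__":
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
    print(moving_average([3, 7, 2, 0, 4, 5, 9, 7, 2], 3))
```

A nine-point series smoothed with order 3 yields only seven averages, which is the "loss of end data" the slide mentions. Because every value in a window carries equal weight, a single extreme value shifts all the averages whose window contains it.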
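
The seasonal-index example on the Discovery of Trend in Time-Series (1) slide can be sketched as follows. This assumes one year of twelve monthly values and computes each index as a percentage of the average month. The sales figures and the function names `seasonal_index` and `deseasonalize` are made up for illustration.

```python
def seasonal_index(monthly_values):
    """Index numbers: each month as a percentage of the average month."""
    mean = sum(monthly_values) / len(monthly_values)
    if mean == 0:
        raise ValueError("average monthly value is zero")
    return [100.0 * v / mean for v in monthly_values]


def deseasonalize(monthly_values, index):
    """Divide each monthly value by its seasonal index (as a fraction)."""
    return [v * 100.0 / idx for v, idx in zip(monthly_values, index)]


if __name__ == "__main__":
    # Hypothetical sales, January..December; the yearly average is 100.
    sales = [95] * 8 + [100, 80, 120, 140]
    index = seasonal_index(sales)
    print(index[9:])                    # October, November, December
    print(deseasonalize(sales, index))  # seasonal variation removed
```

With these made-up numbers the last three index values are 80, 120, and 140, matching the slide's example. In practice an index would normally be estimated from several years of data; using a single year keeps the sketch short.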
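
The distance claims on the Data transformation slide can be checked numerically. The sketch below assumes an orthonormal DFT (scaled by 1/sqrt(n)), written directly from the definition rather than taken from a library. The helper names `dft`, `euclidean`, and `feature_distance` are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Orthonormal discrete Fourier transform, O(n^2), from the definition."""
    n = len(signal)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        for k in range(n)
    ]


def euclidean(a, b):
    """Euclidean distance; works for real or complex sequences."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def feature_distance(a, b, num_coeffs):
    """Distance computed from only the first num_coeffs DFT coefficients."""
    return euclidean(dft(a)[:num_coeffs], dft(b)[:num_coeffs])


if __name__ == "__main__":
    a = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 5.0, 7.0]
    b = [1.5, 2.5, 3.0, 3.5, 4.0, 6.5, 6.0, 6.0]
    print(euclidean(a, b))            # time-domain distance
    print(euclidean(dft(a), dft(b)))  # frequency-domain distance
    print(feature_distance(a, b, 2))  # never exceeds the two above
```

Dropping coefficients only removes non-negative terms from the sum of squares, so the truncated distance can never exceed the true one. This is why an index built on the first few coefficients may return false matches, which post-processing discards, but does not miss true matches.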
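
The normalization and envelope test from the Enhanced similarity search methods slide might look like the sketch below. Several choices here are assumptions, not taken from the slides: normalization subtracts the mean and divides by the standard deviation, the envelope is read as plus or minus `epsilon` around each point, and outliers are handled with a simple `max_outliers` count. It compares two equal-length windows only and does not implement window stitching or gap handling.

```python
import math


def normalize(seq):
    """Offset translation (subtract mean), then amplitude scaling (divide by std)."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)  # constant sequence: nothing left after translation
    return [(v - mean) / std for v in seq]


def within_envelope(a, b, epsilon, max_outliers=0):
    """True if b stays within +/- epsilon of a, ignoring up to max_outliers points."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    outliers = sum(1 for x, y in zip(a, b) if abs(x - y) > epsilon)
    return outliers <= max_outliers


if __name__ == "__main__":
    low = [1.0, 2.0, 3.0, 4.0, 5.0]
    high = [110.0, 120.0, 130.0, 140.0, 150.0]  # same shape, other offset and scale
    print(within_envelope(low, high, 0.01))                        # raw values differ
    print(within_envelope(normalize(low), normalize(high), 0.01))  # shapes agree
```

Normalizing first lets two series with the same shape but different offsets and amplitudes pass the envelope test, which corresponds to the slide's second and third bullets taken together.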


Purdue CS 490D - Mining Time-Series