Spatial and Temporal DatabasesTable of ContentsIntroductionIntroduction (cont)Introduction (cont.)Related WorksRelated Works (cont.)The proposed Approach : Similarity ModelProposed Approach : Haar WaveletProposed Approach : Haar Wavelet (cont)Slide 11Proposed Approach : DFT versus Haar (cont)Proposed Approach: Guarantee of no False DismissalThe Overall StrategyExperimental ResultsExperimental Results (cont.)ConclusionSpatial and Temporal DatabasesEfficiently Time Series Matching by Wavelets (ICDE 98)Kin-pong Chan and Ada Wai-chee Fu2Table of ContentsIntroductionRelated WorksThe Proposed ApproachOverall StrategyPerformance EvaluationConclusion3IntroductionTime-series: a sequence of real numbers, each number representing a value at a time point (financial data, scientific observation data, …)Time-series databases supporting fast retrieval of data and similarity query are desired4Introduction (cont)Similarity SearchFinds data sequences that differ only slightly from the given query sequenceExample) One may want to find all companies whose stock price fluctuations behave similarly with IBM during a year.Similarity matching processGiven compute10212)||(),(niiiyxyxD),.. .,,(121 nxxxx),... ,,(121 nyyyy5Introduction (cont.)Indexing Dimensionality reductionTransformation is applied to reduce dimensionCompletenessNature of dataEffectiveness of power concentration of a particulartransformation depends on the nature of the time series),())(),(( yxDyfxfD 6Related WorksDiscrete Fourier Transform (Agrawal et al)Parseval’s theoremF-index may raise false alarm, but guarantee no false dismissalDisadvantage: misses the important feature of time localization22|||||||| YXyx 7Related Works (cont.)Singular Value Decomposition: decompose a matrix X of size N*M into RestrictionX is not updatedX can be updated daily or monthly. In that case, SVD has to be recomputed the whole matrix again to updateVUX8The proposed Approach : Similarity ModelDefine new similarity model used in sequence matching 10212)||(),(niiiyxyxD10212)))()(((),(nixyiimmxyyxD9Proposed Approach : Haar WaveletHaar waveletAllows a good approximation with a subset of coefficientsFast to compute and requires little storageIt preserves Euclidean distance10Proposed Approach : Haar Wavelet (cont)Example of Wavelet ComputationAssume Original time sequence is f(x) = (9 7 3 5) 4 (9 7 3 5) 1 (6) (2) 2 (8 4) (1 –1)Resolution Average Coefficients=6+2=6-2=8+1=8-1=4+(-1)=4-(-1)11Proposed Approach : Haar Wavelet (cont)Instead of storing 6,2,1 and -1, assume we store first two coefficient, 6 and 2Reconstruction Process 4 (8 8 4 4) 1 (6) (2) 2 (8 4)Resolution Average Coefficients (0 0)Original: (9 7 3 5), Reconstructed: (8 8 4 4) We can reduce dimension of the data with sacrificing the accuracy12Proposed Approach : DFT versus Haar (cont)Motivation of replacing DFT with DWTPruning power: less false alarm appear in DWT than DFT Complexity considerationComplexity of Haar is O(n) while O(nlogn) for Fast Fourier TransformNote: DWT does not require massive index reorganization in case of update, which is a major drawback of SVD13Proposed Approach:Guarantee of no False DismissalNo qualified time sequence will be rejected, thus no false dismissalThey show that this property holds for the Haar wavelet where ),(2),( rsDyxD ryHsxH )(,)(14The Overall StrategyPre-processingSimilarity Model Selection:User can select Euclidean distance or v-shift similarityHaar wavelet transform is applied to time-seriesIndex ConstructionIndex structure such as R-tree is built using first few coefficientsRange QueryNearest Neighbor Query15Experimental ResultstransformtimeSSecision Pr16Experimental Results (cont.)Scalability Test17ConclusionEfficient time series matching through dimension reduction by Haar wavelet transformOutperforms DFT in terms of pruning power, scalability and
View Full Document