DOC PREVIEW
BU CS 565 - Time-series data analysis

This preview shows page 1-2-3-26-27-28 out of 28 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1Why deal with sequential data?Time-series dataQuestionsSequence segmentationDynamic-programming algorithmExampleBasic definitionsThe k-segmentation problemOptimal solution for the k-segmentation problemHeuristicsApproximation algorithmDivide ’n Segment (DnS) algorithmDnS algorithm - DetailsThe DnS algorithmThe DnS algorithm – Step 1The DnS algorithm – Step 2The DnS algorithm – Step 2The DnS algorithm – Step 3The DnS algorithm – Step 4Running timeThe segmentation errorProof for E1ProofTrading speed for accuracyReal datasets – DnS algorithmReal datasets – DnS algorithmSpeed vs. accuracy in practiceTime-series data analysisWhy deal with sequential data?•Because all data is sequential •All data items arrive in the data store in some order•Examples–transaction data–documents and words•In some (or many) cases the order does not matter•In many cases the order is of interestTime-series dataFinancial time series, process monitoring…Questions•What is the structure of sequential data?•Can we represent this structure compactly and accurately?Sequence segmentation•Gives an accurate representation of the structure of sequential data•How?–By trying to find homogeneous segments •Segmentation question:•Can a sequence T={t1,t2,…,tn} be described as a concatenation of subsequences S1,S2,…,Sk such that each Si is in some sense homogeneous?•The corresponding notion of segmentation in unordered data is clusteringDynamic-programming algorithm•Sequence T, length n, k segments, cost function E(), table M•For i=1 to n–Set M[1,i]=E(T[1…i]) //Everything in one cluster•For j=1 to k–Set M[j,j] = 0 //each point in its own cluster•For j=2 to k–For i=j+1 to n•Set M[j,i] = mini’<i{M[j-1,i]+E(T[i’+1…i])}•To recover the actual segmentation (not just the optimal cost) store also the minimizing values i’•Takes time O(n2k), space O(kn)ExampletRtRBasic definitions•Sequence T = {t1,t2,…,tn}: an ordered set of n d-dimensional real points tiЄRd•A k-segmentation S: a partition of T into k contiguous segments {s1,s2,…,sk}–Each segment sЄS is represented by a single value μsЄRd (the representative of the segment)•Error Ep(S): The error of replacing individual points with representativespSs stpsptSE1||)(  The k-segmentation problem•Common cases for the error functionEp: p = 1 and p = 2.–When p = 1, the best μs corresponds the median of the points in segment s.–When p = 2, the best μs corresponds to the mean of the points in segment s.Given a sequence T of length n and a value k, find a k-segmentation S = {s1, s2, …,sk} of T such that the Ep error is minimized.Optimal solution for the k-segmentation problem•Running time O(n2k)–Too expensive for large datasets![Bellman’61] The k-segmentation problem can be solved optimally using a standard dynamic-programming algorithmHeuristics•Bottom-up greedy (BU): O(nlogn)–[Keogh and Smyth’97, Keogh and Pazzani’98]•Top-down greedy (TD): O(nlogn)–[Douglas and Peucker’73, Shatkay and Zdonik’96, Lavrenko et. al’00]•Global Iterative Replacement (GiR): O(nI)–[Himberg et. al ’01]•Local Iterative Replacement (LiR): O(nI)–[Himberg et. al ’01]Approximation algorithm[Theorem] The segmentation problem can be approximated within a constant factor of 3 for both E1 and E2 error measures. That is,2,1 )(3)(  pSESEOPTpDnSpThe running time of the approximation algorithm is:)(3/53/4knODivide ’n Segment (DnS) algorithm•Main idea–Split the sequence arbitrarily into subsequences–Solve the k-segmentation problem in each subsequence–Combine the results•Advantages–Extremely simple–High quality results–Can be applied to other segmentation problems[Gionis’03, Haiminen’04,Bingham’06]DnS algorithm - DetailsInput: Sequence T, integer kOutput: a k-segmentation of T1. Partition sequence T arbitrarily into m disjoint intervals T1,T2,…,Tm2. For each interval Ti solve optimally the k- segmentation problem using DP algorithm3. Let T’ be the concatenation of mk representatives produced in Step 2. Each representative is weighted with the length of the segment it represents4. Solve optimally the k-segmentation problem for T’ using the DP algorithm and output this segmentation as the final segmentationThe DnS algorithmInput sequence T consisting of n=20 points (k=2)102030405060708090100The DnS algorithm – Step 1Partition the sequence into m=3 disjoint intervals102030405060708090100T1T2T3The DnS algorithm – Step 2102030405060708090100T1T2T3Solve optimally the k-segmentation problem into each partition (k=2)The DnS algorithm – Step 2102030405060708090100T1T2T3Solve optimally the k-segmentation problem into each partition (k=2)The DnS algorithm – Step 3102030405060708090100Sequence T’ consisting of mk=6 representantivesw=8w=4w=4w=1w=1w=2The DnS algorithm – Step 4102030405060708090100Solve k-segmentation on T’ (k=2)w=8w=4w=4w=1w=1w=2Running time•In the case of equipartition in Step 1, the running time of the algorithm as a function of m is:Running time3/53/402)( knmR kmkkmnmmR22)()( The function R(m) is minimized for320knmThe segmentation error[Theorem] The segmentation error of the DnS algorithms is at most three times the error of the optimal (DP) algorithm for both E1 and E2 error measures.2,1 )(3)(  pSESEOPTpDnSpProof for E1–λt: the representative of point t in the optimal segmentation –τ: the representative of point t in the segmentation of Step 2102030405060708090100 Tt Ttttdtd ),(),(11Lemma :ProofTttDnStdSE ),()(11 Tttdtd ),(),(11 Tttdtd ),(),(11 Ttttdtdtd ),(),(),(111TttTtttdtd ),(),(211)(3OPTSE(triangle inequality)(optimality of DP)(triangle inequality)(Lemma)(triangle inequality)(optimality of DP)(triangle inequality)(Lemma) Tt Ttttdtd ),(),(11Lemma :λt: the representative of point t in the optimal segmentationτ: the representative of point t in the segmentation of Step 2μt: the representative of point t in the final segmentation in Step 4Trading speed for accuracy•Recursively divide (into m pieces) and segment•If χ=(ni)1/2, where ni the length of the sequence in the i-th recursive level (n1=n) then–running time


View Full Document

BU CS 565 - Time-series data analysis

Documents in this Course
Load more
Download Time-series data analysis
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Time-series data analysis and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Time-series data analysis 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?