C. FaloutsosCMU 1CMU SCS15-826: Multimedia Databasesand Data MiningStreams, Sensors and forecastingChristos Faloutsos15-826 (c) C. Faloutsos, 2005 2CMU SCSThanksDeepay Chakrabarti (CMU)Spiros Papadimitriou (CMU)Prof. Byoung-Kee Yi (Pohang U.)Prof. Dimitris Gunopulos (UCR)Mengzhi Wang (CMU)15-826 (c) C. Faloutsos, 2005 3CMU SCSOutline• Motivation• Similarity search – distance functions• Linear Forecasting• Bursty traffic - fractals and multifractals• Non-linear forecasting• Conclusions15-826 (c) C. Faloutsos, 2005 4CMU SCSProblem definition• Given: one or more sequences x1 , x2 , … , xt , …(y1, y2, … , yt, …… )• Find– similar sequences; forecasts– patterns; clusters; outliers15-826 (c) C. Faloutsos, 2005 5CMU SCSMotivation - Applications• Financial, sales, economic series• Medical– ECGs +; blood pressure etc monitoring– reactions to new drugs– elderly care15-826 (c) C. Faloutsos, 2005 6CMU SCSMotivation - Applications (cont’d)• ‘Smart house’– sensors monitor temperature, humidity, air quality• video surveillanceC. FaloutsosCMU 215-826 (c) C. Faloutsos, 2005 7CMU SCSMotivation - Applications (cont’d)• civil/automobile infrastructure– bridge vibrations [Oppenheim+02]– road conditions / traffic monitoring15-826 (c) C. Faloutsos, 2005 8CMU SCSMotivation - Applications (cont’d)• Weather, environment/anti-pollution– volcano monitoring– air/water pollutant monitoring15-826 (c) C. Faloutsos, 2005 9CMU SCSMotivation - Applications (cont’d)• Computer systems– ‘Active Disks’ (buffering, prefetching)– web servers (ditto)– network traffic monitoring– ...15-826 (c) C. Faloutsos, 2005 10CMU SCSStream Data: Disk accessestime#bytes15-826 (c) C. Faloutsos, 2005 11CMU SCSSettings & Applications• One or more sensors, collecting time-series data15-826 (c) C. Faloutsos, 2005 12CMU SCSSettings & ApplicationsEach sensor collects data (x1, x2, …, xt, …)C. FaloutsosCMU 315-826 (c) C. Faloutsos, 2005 13CMU SCSSettings & ApplicationsSome sensors ‘report’ to others or to the central site15-826 (c) C. Faloutsos, 2005 14CMU SCSSettings & ApplicationsGoal #1:Finding patternsin a single time sequence15-826 (c) C. Faloutsos, 2005 15CMU SCSSettings & ApplicationsGoal #2:Finding patternsin many time sequences15-826 (c) C. Faloutsos, 2005 16CMU SCSProblem #1:Goal: given a signal (e.g.., #packets over time)Find: patterns, periodicities, and/or compressyearcountlynx caught per year(packets per day;temperature per day)15-826 (c) C. Faloutsos, 2005 17CMU SCSProblem#2: ForecastGiven xt, xt-1, …, forecast xt+101020304050607080901 3 5 7 9 11Time TickNumber of packets sent??15-826 (c) C. Faloutsos, 2005 18CMU SCSProblem#2’: Similarity searchE.g.., Find a 3-tick pattern, similar to the last one01020304050607080901 3 5 7 9 11Time TickNumber of packets sent??C. FaloutsosCMU 415-826 (c) C. Faloutsos, 2005 19CMU SCSProblem #3:• Given: A set of correlated time sequences• Forecast ‘Sent(t)’01020304050607080901 3 5 7 9 11Time TickNumber of packetssentlostrepeated15-826 (c) C. Faloutsos, 2005 20CMU SCSDifferences from DSP/Stat• Semi-infinite streams – we need on-line, ‘any-time’ algorithms• Can not afford human intervention– need automatic methods• sensors have limited memory / processing / transmitting power– need for (lossy) compression15-826 (c) C. Faloutsos, 2005 21CMU SCSImportant observationsPatterns, rules, forecasting and similarity indexing are closely related:• To do forecasting, we need– to find patterns/rules– to find similar settings in the past• to find outliers, we need to have forecasts– (outlier = too far away from our forecast)15-826 (c) C. Faloutsos, 2005 22CMU SCSImportant topics NOT in this tutorial:• Continuous queries– [Babu+Widom ] [Gehrke+] [Madden+]• Categorical data streams– [Hatonen+96]• Outlier detection (discontinuities)– [Breunig+00]• Related (see D. Shasha’s tutorial)15-826 (c) C. Faloutsos, 2005 23CMU SCSOutline• Motivation• Similarity Search and Indexing• DSP• Linear Forecasting• Bursty traffic - fractals and multifractals• Non-linear forecasting• Conclusions15-826 (c) C. Faloutsos, 2005 24CMU SCSOutline• Motivation• Similarity search and distance functions– Euclidean– Time-warping• DSP• ...C. FaloutsosCMU 515-826 (c) C. Faloutsos, 2005 25CMU SCSImportance of distance functionsSubtle, but absolutely necessary:• A ‘must’ for similarity indexing (-> forecasting)• A ‘must’ for clusteringTwo major families– Euclidean and Lp norms– Time warping and variations15-826 (c) C. Faloutsos, 2005 26CMU SCSEuclidean and Lp∑=−=niiiyxyxD12)(),(rrx(t) y(t)...∑=−=nipiipyxyxL1||),(rr•L1: city-block = Manhattan•L2= Euclidean•L∞15-826 (c) C. Faloutsos, 2005 27CMU SCSObservation #1• Time sequence -> n-d vector...Day-1Day-2Day-n15-826 (c) C. Faloutsos, 2005 28CMU SCSObservation #2Euclidean distance is closely related to – cosine similarity– dot product– ‘cross-correlation’ function...Day-1Day-2Day-n15-826 (c) C. Faloutsos, 2005 29CMU SCSTime Warping• allow accelerations - decelerations– (with or w/o penalty)• THEN compute the (Euclidean) distance (+ penalty)• related to the string-editing distance15-826 (c) C. Faloutsos, 2005 30CMU SCSTime Warping‘stutters’:C. FaloutsosCMU 615-826 (c) C. Faloutsos, 2005 31CMU SCSTime warpingQ: how to compute it?A: dynamic programmingD( i, j ) = cost to match prefix of length i of first sequence x with prefix of length j of second sequence y15-826 (c) C. Faloutsos, 2005 32CMU SCSThus, with no penalty for stutter, for sequencesx1, x2, …, xi,;y1, y2, …, yj−−−−+−=),1()1,()1,1(min][][),(jiDjiDjiDjyixjiDx-stuttery-stutterno stutterTime warping15-826 (c) C. Faloutsos, 2005 33CMU SCSVERY SIMILAR to the string-editing distance−−−−+−=),1()1,()1,1(min][][),(jiDjiDjiDjyixjiDx-stuttery-stutterno stutterTime warping15-826 (c) C. Faloutsos, 2005 34CMU SCSTime warping• Complexity: O(M*N) - quadratic on the length of the strings• Many variations (penalty for stutters; limit on the number/percentage of stutters; …)• popular in voice processing [Rabiner+Juang]15-826 (c) C. Faloutsos, 2005 35CMU SCSOther Distance functions• piece-wise linear/flat approx.; compare pieces [Keogh+01] [Faloutsos+97]• ‘cepstrum’ (for voice [Rabiner+Juang])– do DFT; take log of amplitude; do DFT again!• Allow for small gaps
View Full Document