DOC PREVIEW
CMU CS 15826 - Streams, Sensors and Forecasting

This preview shows page 1-2-14-15-29-30 out of 30 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

C. FaloutsosCMU 1CMU SCS15-826: Multimedia Databasesand Data MiningStreams, Sensors and forecastingChristos Faloutsos15-826 (c) C. Faloutsos, 2005 2CMU SCSThanksDeepay Chakrabarti (CMU)Spiros Papadimitriou (CMU)Prof. Byoung-Kee Yi (Pohang U.)Prof. Dimitris Gunopulos (UCR)Mengzhi Wang (CMU)15-826 (c) C. Faloutsos, 2005 3CMU SCSOutline• Motivation• Similarity search – distance functions• Linear Forecasting• Bursty traffic - fractals and multifractals• Non-linear forecasting• Conclusions15-826 (c) C. Faloutsos, 2005 4CMU SCSProblem definition• Given: one or more sequences x1 , x2 , … , xt , …(y1, y2, … , yt, …… )• Find– similar sequences; forecasts– patterns; clusters; outliers15-826 (c) C. Faloutsos, 2005 5CMU SCSMotivation - Applications• Financial, sales, economic series• Medical– ECGs +; blood pressure etc monitoring– reactions to new drugs– elderly care15-826 (c) C. Faloutsos, 2005 6CMU SCSMotivation - Applications (cont’d)• ‘Smart house’– sensors monitor temperature, humidity, air quality• video surveillanceC. FaloutsosCMU 215-826 (c) C. Faloutsos, 2005 7CMU SCSMotivation - Applications (cont’d)• civil/automobile infrastructure– bridge vibrations [Oppenheim+02]– road conditions / traffic monitoring15-826 (c) C. Faloutsos, 2005 8CMU SCSMotivation - Applications (cont’d)• Weather, environment/anti-pollution– volcano monitoring– air/water pollutant monitoring15-826 (c) C. Faloutsos, 2005 9CMU SCSMotivation - Applications (cont’d)• Computer systems– ‘Active Disks’ (buffering, prefetching)– web servers (ditto)– network traffic monitoring– ...15-826 (c) C. Faloutsos, 2005 10CMU SCSStream Data: Disk accessestime#bytes15-826 (c) C. Faloutsos, 2005 11CMU SCSSettings & Applications• One or more sensors, collecting time-series data15-826 (c) C. Faloutsos, 2005 12CMU SCSSettings & ApplicationsEach sensor collects data (x1, x2, …, xt, …)C. FaloutsosCMU 315-826 (c) C. Faloutsos, 2005 13CMU SCSSettings & ApplicationsSome sensors ‘report’ to others or to the central site15-826 (c) C. Faloutsos, 2005 14CMU SCSSettings & ApplicationsGoal #1:Finding patternsin a single time sequence15-826 (c) C. Faloutsos, 2005 15CMU SCSSettings & ApplicationsGoal #2:Finding patternsin many time sequences15-826 (c) C. Faloutsos, 2005 16CMU SCSProblem #1:Goal: given a signal (e.g.., #packets over time)Find: patterns, periodicities, and/or compressyearcountlynx caught per year(packets per day;temperature per day)15-826 (c) C. Faloutsos, 2005 17CMU SCSProblem#2: ForecastGiven xt, xt-1, …, forecast xt+101020304050607080901 3 5 7 9 11Time TickNumber of packets sent??15-826 (c) C. Faloutsos, 2005 18CMU SCSProblem#2’: Similarity searchE.g.., Find a 3-tick pattern, similar to the last one01020304050607080901 3 5 7 9 11Time TickNumber of packets sent??C. FaloutsosCMU 415-826 (c) C. Faloutsos, 2005 19CMU SCSProblem #3:• Given: A set of correlated time sequences• Forecast ‘Sent(t)’01020304050607080901 3 5 7 9 11Time TickNumber of packetssentlostrepeated15-826 (c) C. Faloutsos, 2005 20CMU SCSDifferences from DSP/Stat• Semi-infinite streams – we need on-line, ‘any-time’ algorithms• Can not afford human intervention– need automatic methods• sensors have limited memory / processing / transmitting power– need for (lossy) compression15-826 (c) C. Faloutsos, 2005 21CMU SCSImportant observationsPatterns, rules, forecasting and similarity indexing are closely related:• To do forecasting, we need– to find patterns/rules– to find similar settings in the past• to find outliers, we need to have forecasts– (outlier = too far away from our forecast)15-826 (c) C. Faloutsos, 2005 22CMU SCSImportant topics NOT in this tutorial:• Continuous queries– [Babu+Widom ] [Gehrke+] [Madden+]• Categorical data streams– [Hatonen+96]• Outlier detection (discontinuities)– [Breunig+00]• Related (see D. Shasha’s tutorial)15-826 (c) C. Faloutsos, 2005 23CMU SCSOutline• Motivation• Similarity Search and Indexing• DSP• Linear Forecasting• Bursty traffic - fractals and multifractals• Non-linear forecasting• Conclusions15-826 (c) C. Faloutsos, 2005 24CMU SCSOutline• Motivation• Similarity search and distance functions– Euclidean– Time-warping• DSP• ...C. FaloutsosCMU 515-826 (c) C. Faloutsos, 2005 25CMU SCSImportance of distance functionsSubtle, but absolutely necessary:• A ‘must’ for similarity indexing (-> forecasting)• A ‘must’ for clusteringTwo major families– Euclidean and Lp norms– Time warping and variations15-826 (c) C. Faloutsos, 2005 26CMU SCSEuclidean and Lp∑=−=niiiyxyxD12)(),(rrx(t) y(t)...∑=−=nipiipyxyxL1||),(rr•L1: city-block = Manhattan•L2= Euclidean•L∞15-826 (c) C. Faloutsos, 2005 27CMU SCSObservation #1• Time sequence -> n-d vector...Day-1Day-2Day-n15-826 (c) C. Faloutsos, 2005 28CMU SCSObservation #2Euclidean distance is closely related to – cosine similarity– dot product– ‘cross-correlation’ function...Day-1Day-2Day-n15-826 (c) C. Faloutsos, 2005 29CMU SCSTime Warping• allow accelerations - decelerations– (with or w/o penalty)• THEN compute the (Euclidean) distance (+ penalty)• related to the string-editing distance15-826 (c) C. Faloutsos, 2005 30CMU SCSTime Warping‘stutters’:C. FaloutsosCMU 615-826 (c) C. Faloutsos, 2005 31CMU SCSTime warpingQ: how to compute it?A: dynamic programmingD( i, j ) = cost to match prefix of length i of first sequence x with prefix of length j of second sequence y15-826 (c) C. Faloutsos, 2005 32CMU SCSThus, with no penalty for stutter, for sequencesx1, x2, …, xi,;y1, y2, …, yj−−−−+−=),1()1,()1,1(min][][),(jiDjiDjiDjyixjiDx-stuttery-stutterno stutterTime warping15-826 (c) C. Faloutsos, 2005 33CMU SCSVERY SIMILAR to the string-editing distance−−−−+−=),1()1,()1,1(min][][),(jiDjiDjiDjyixjiDx-stuttery-stutterno stutterTime warping15-826 (c) C. Faloutsos, 2005 34CMU SCSTime warping• Complexity: O(M*N) - quadratic on the length of the strings• Many variations (penalty for stutters; limit on the number/percentage of stutters; …)• popular in voice processing [Rabiner+Juang]15-826 (c) C. Faloutsos, 2005 35CMU SCSOther Distance functions• piece-wise linear/flat approx.; compare pieces [Keogh+01] [Faloutsos+97]• ‘cepstrum’ (for voice [Rabiner+Juang])– do DFT; take log of amplitude; do DFT again!• Allow for small gaps


View Full Document

CMU CS 15826 - Streams, Sensors and Forecasting

Download Streams, Sensors and Forecasting
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Streams, Sensors and Forecasting and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Streams, Sensors and Forecasting 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?