15-744 C. Faloutsos (for Dave Andersen)1SCS-CMUData Mining ToolsA crash courseC. Faloutsos15-744, S07 C. Faloutsos 2SCS-CMUSubset of:www.cs.cmu.edu/~christos/TALKS/SIGMETRICS03-tut/15-744, S07 C. Faloutsos 3SCS-CMUHigh-level Outline• [I - Traditional Data Mining tools– classification, CART trees; clustering• II - Time series: analysis and forecasting– ARIMA; Fourier, Wavelets]• III - New Tools: SVD• IV - New Tools: Fractals & power laws15-744 C. Faloutsos (for Dave Andersen)215-744, S07 C. Faloutsos 4SCS-CMUHigh-level Outline• [I - Traditional Data Mining tools• II - Time series: analysis and forecasting]• III - New Tools: SVD• IV - New Tools: Fractals & power laws15-744, S07 C. Faloutsos 5SCS-CMUIII - SVD - outline• Introduction - motivating problems• Definition - properties• Interpretation / Intuition• Solutions to posed problems• Conclusions15-744, S07 C. Faloutsos 6SCS-CMUSVD - Motivation• problem #1: find patterns in a matrix– (e.g., traffic patterns from several IP-sources)– compression; dim. reduction15-744 C. Faloutsos (for Dave Andersen)315-744, S07 C. Faloutsos 7SCS-CMUProblem#1• ~10**6 rows; ~10**3 columns; no updates;• Compress / find patterns15-744, S07 C. Faloutsos 8SCS-CMUSVD - in short:It givesthe best hyperplaneto project on15-744, S07 C. Faloutsos 9SCS-CMUSVD - in short:It givesthe best hyperplaneto project on15-744 C. Faloutsos (for Dave Andersen)415-744, S07 C. Faloutsos 10SCS-CMUIII - SVD - outline• Introduction - motivating problems• Definition - properties• Interpretation / Intuition• Solutions to posed problems• Conclusions15-744, S07 C. Faloutsos 11SCS-CMUSVD - Definition• A = U ΛΛΛΛ VT - example:15-744, S07 C. Faloutsos 12SCS-CMUSVD - notationConventions:• bold capitals -> matrix (eg. A, U, ΛΛΛΛ, V)• bold lower-case -> column vector (eg., x, v1, u3)• regular lower-case -> scalars (eg., λ1, λr)15-744 C. Faloutsos (for Dave Andersen)515-744, S07 C. Faloutsos 13SCS-CMUSVD - DefinitionA[n x m]= U[n x r]Λ Λ Λ Λ [ [ [ [ r x r](V[m x r])T• A: n x m matrix (eg., n customers, m days)• U: n x r matrix (n customers, r concepts)• ΛΛΛΛ: r x r diagonal matrix (strength of each ‘concept’) (r : rank of the matrix)• V: m x r matrix (m days, r concepts)15-744, S07 C. Faloutsos 14SCS-CMUSVD - PropertiesTHEOREM [Press+92]: always possible to decompose matrix A into A = U ΛΛΛΛ VT , where• U, Λ,Λ,Λ,Λ, V: unique (*)• U, V: column orthonormal (ie., columns are unit vectors, orthogonal to each other)– UTU = I; VTV = I (I: identity matrix)• ΛΛΛΛ: eigenvalues are positive, and sorted in decreasing order15-744, S07 C. Faloutsos 15SCS-CMUSVD - example• Customers; days; #packetsComm.Res.15-744 C. Faloutsos (for Dave Andersen)615-744, S07 C. Faloutsos 16SCS-CMUSVD - Example• A = U ΛΛΛΛ VT - example:Com.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71x15-744, S07 C. Faloutsos 17SCS-CMUIII - SVD - outline• Introduction - motivating problems• Definition - properties• Interpretation / Intuition– #1: customers, days, concepts– #2: best projection - dimensionality reduction• Solutions to posed problems• Conclusions15-744, S07 C. Faloutsos 18SCS-CMUSVD - Interpretation #1‘customers’, ‘days’ and ‘concepts’• U: customer-to-concept similarity matrix• V: day-to-concept sim. matrix• ΛΛΛΛ: its diagonal elements: ‘strength’ of each concept15-744 C. Faloutsos (for Dave Andersen)715-744, S07 C. Faloutsos 19SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:Com.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71x2x2Rank=215-744, S07 C. Faloutsos 20SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:Com.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xRank=2=2 ‘concepts’15-744, S07 C. Faloutsos 21SCS-CMU(reminder)• Customers; days; #packetsComm.Res.15-744 C. Faloutsos (for Dave Andersen)815-744, S07 C. Faloutsos 22SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:weekday-conceptW/end-conceptU: customer-to-concept similarity matrixCom.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71x15-744, S07 C. Faloutsos 23SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:weekday-conceptW/end-conceptCom.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xU: Customer to conceptsimilarity matrix15-744, S07 C. Faloutsos 24SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:Com.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xunit15-744 C. Faloutsos (for Dave Andersen)915-744, S07 C. Faloutsos 25SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:weekday-conceptCom.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xStrength of ‘weekday’concept15-744, S07 C. Faloutsos 26SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:weekday-conceptCom.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xV: day to conceptsimilarity matrix15-744, S07 C. Faloutsos 27SCS-CMUIII - SVD - outline• Introduction - motivating problems• Definition - properties• Interpretation / Intuition– #1: customers, days, concepts– #2: best projection - dimensionality reduction• Solutions to posed problems• Conclusions15-744 C. Faloutsos (for Dave Andersen)1015-744, S07 C. Faloutsos 28SCS-CMUSVD - Interpretation #2• best axis to project on: (‘best’ = min sum of squares of projection errors)15-744, S07 C. Faloutsos 29SCS-CMUSVD - Interpretation #215-744, S07 C. Faloutsos 30SCS-CMUSVD - Interpretation#215-744 C. Faloutsos (for Dave Andersen)1115-744, S07 C. Faloutsos 31SCS-CMUSVD - interpretation #2• minimum RMS errorSVD: givesbest axis to projectv115-744, S07 C.
View Full Document