Unformatted text preview:

15-744 C. Faloutsos (for Dave Andersen)1SCS-CMUData Mining ToolsA crash courseC. Faloutsos15-744, S07 C. Faloutsos 2SCS-CMUSubset of:www.cs.cmu.edu/~christos/TALKS/SIGMETRICS03-tut/15-744, S07 C. Faloutsos 3SCS-CMUHigh-level Outline• [I - Traditional Data Mining tools– classification, CART trees; clustering• II - Time series: analysis and forecasting– ARIMA; Fourier, Wavelets]• III - New Tools: SVD• IV - New Tools: Fractals & power laws15-744 C. Faloutsos (for Dave Andersen)215-744, S07 C. Faloutsos 4SCS-CMUHigh-level Outline• [I - Traditional Data Mining tools• II - Time series: analysis and forecasting]• III - New Tools: SVD• IV - New Tools: Fractals & power laws15-744, S07 C. Faloutsos 5SCS-CMUIII - SVD - outline• Introduction - motivating problems• Definition - properties• Interpretation / Intuition• Solutions to posed problems• Conclusions15-744, S07 C. Faloutsos 6SCS-CMUSVD - Motivation• problem #1: find patterns in a matrix– (e.g., traffic patterns from several IP-sources)– compression; dim. reduction15-744 C. Faloutsos (for Dave Andersen)315-744, S07 C. Faloutsos 7SCS-CMUProblem#1• ~10**6 rows; ~10**3 columns; no updates;• Compress / find patterns15-744, S07 C. Faloutsos 8SCS-CMUSVD - in short:It givesthe best hyperplaneto project on15-744, S07 C. Faloutsos 9SCS-CMUSVD - in short:It givesthe best hyperplaneto project on15-744 C. Faloutsos (for Dave Andersen)415-744, S07 C. Faloutsos 10SCS-CMUIII - SVD - outline• Introduction - motivating problems• Definition - properties• Interpretation / Intuition• Solutions to posed problems• Conclusions15-744, S07 C. Faloutsos 11SCS-CMUSVD - Definition• A = U ΛΛΛΛ VT - example:15-744, S07 C. Faloutsos 12SCS-CMUSVD - notationConventions:• bold capitals -> matrix (eg. A, U, ΛΛΛΛ, V)• bold lower-case -> column vector (eg., x, v1, u3)• regular lower-case -> scalars (eg., λ1, λr)15-744 C. Faloutsos (for Dave Andersen)515-744, S07 C. Faloutsos 13SCS-CMUSVD - DefinitionA[n x m]= U[n x r]Λ Λ Λ Λ [ [ [ [ r x r](V[m x r])T• A: n x m matrix (eg., n customers, m days)• U: n x r matrix (n customers, r concepts)• ΛΛΛΛ: r x r diagonal matrix (strength of each ‘concept’) (r : rank of the matrix)• V: m x r matrix (m days, r concepts)15-744, S07 C. Faloutsos 14SCS-CMUSVD - PropertiesTHEOREM [Press+92]: always possible to decompose matrix A into A = U ΛΛΛΛ VT , where• U, Λ,Λ,Λ,Λ, V: unique (*)• U, V: column orthonormal (ie., columns are unit vectors, orthogonal to each other)– UTU = I; VTV = I (I: identity matrix)• ΛΛΛΛ: eigenvalues are positive, and sorted in decreasing order15-744, S07 C. Faloutsos 15SCS-CMUSVD - example• Customers; days; #packetsComm.Res.15-744 C. Faloutsos (for Dave Andersen)615-744, S07 C. Faloutsos 16SCS-CMUSVD - Example• A = U ΛΛΛΛ VT - example:Com.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71x15-744, S07 C. Faloutsos 17SCS-CMUIII - SVD - outline• Introduction - motivating problems• Definition - properties• Interpretation / Intuition– #1: customers, days, concepts– #2: best projection - dimensionality reduction• Solutions to posed problems• Conclusions15-744, S07 C. Faloutsos 18SCS-CMUSVD - Interpretation #1‘customers’, ‘days’ and ‘concepts’• U: customer-to-concept similarity matrix• V: day-to-concept sim. matrix• ΛΛΛΛ: its diagonal elements: ‘strength’ of each concept15-744 C. Faloutsos (for Dave Andersen)715-744, S07 C. Faloutsos 19SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:Com.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71x2x2Rank=215-744, S07 C. Faloutsos 20SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:Com.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xRank=2=2 ‘concepts’15-744, S07 C. Faloutsos 21SCS-CMU(reminder)• Customers; days; #packetsComm.Res.15-744 C. Faloutsos (for Dave Andersen)815-744, S07 C. Faloutsos 22SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:weekday-conceptW/end-conceptU: customer-to-concept similarity matrixCom.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71x15-744, S07 C. Faloutsos 23SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:weekday-conceptW/end-conceptCom.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xU: Customer to conceptsimilarity matrix15-744, S07 C. Faloutsos 24SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:Com.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xunit15-744 C. Faloutsos (for Dave Andersen)915-744, S07 C. Faloutsos 25SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:weekday-conceptCom.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xStrength of ‘weekday’concept15-744, S07 C. Faloutsos 26SCS-CMUSVD - Interpretation #1• A = U ΛΛΛΛ VT - example:weekday-conceptCom.Res.1 1 1 0 02 2 2 0 01 1 1 0 05 5 5 0 00 0 0 2 20 0 0 3 30 0 0 1 1WeTh.FrSaSu0.1800.3600.1800.90000.5300.8000.27=9.64 00 5.29x0.580.580.580 00 0 00.710.71xV: day to conceptsimilarity matrix15-744, S07 C. Faloutsos 27SCS-CMUIII - SVD - outline• Introduction - motivating problems• Definition - properties• Interpretation / Intuition– #1: customers, days, concepts– #2: best projection - dimensionality reduction• Solutions to posed problems• Conclusions15-744 C. Faloutsos (for Dave Andersen)1015-744, S07 C. Faloutsos 28SCS-CMUSVD - Interpretation #2• best axis to project on: (‘best’ = min sum of squares of projection errors)15-744, S07 C. Faloutsos 29SCS-CMUSVD - Interpretation #215-744, S07 C. Faloutsos 30SCS-CMUSVD - Interpretation#215-744 C. Faloutsos (for Dave Andersen)1115-744, S07 C. Faloutsos 31SCS-CMUSVD - interpretation #2• minimum RMS errorSVD: givesbest axis to projectv115-744, S07 C.


View Full Document

Duke CPS 214 - Data Mining Tools

Download Data Mining Tools
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Data Mining Tools and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Data Mining Tools 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?