UMD AMSC 663 - Information Retrieval through Various Approximate Matrix Decompositions

Unformatted text preview:

Information Retrieval through Various Approximate Matrix DecompositionsInformation RetrievalQuerying a Document DatabaseTerm-Document MatrixQuery VectorDocument ScoringCan we do better if we replace the matrix by an approximation?Nonnegative Matrix Factorization (NMF)NMFNMF ValidationSlide 11CUR DecompositionCUR ImplementationsSamplingComputation of UDeterministic CURCompact Matrix Decomposition (CMD) ImprovementCUR: Sampling with Replacement ValidationSampling without Replacement: Scaling vs. No ScalingCUR: Sampling without Replacement ValidationCUR ComparisonJudging Success: Precision and RecallLSI ResultsSlide 24Matrix Approximation ResultsConclusionsCompleted Project GoalsReferences1Information Retrieval through Various Approximate Matrix DecompositionsKathryn LinehanAdvisor: Dr. Dianne O’Leary2Information RetrievalExtracting information from databasesWe need an efficient way of searching large amounts of data Example: web search engine3Querying a Document DatabaseWe want to return documents that are relevant to entered search termsGiven data: •Term-Document Matrix, A •Entry ( i , j ): importance of term i in document j•Query Vector, q•Entry ( i ): importance of term i in the query4Term-Document MatrixEntry ( i, j) : weight of term i in document j 15000200000102000510002001500015 Document 1 2 3 4TermMarkTwainSamuelClemensPurpleFairyExample:Example taken from [5]5Query VectorEntry ( i ) : weight of term i in the query 000011 Example: search for “Mark Twain”TermMarkTwainSamuelClemensPurpleFairyExample taken from [5]6Document ScoringDoc 1 and Doc 3 will be returned as relevant, but Doc 2 will not TT02003015000200000102000510002001500015*000011 Document 1 2 3 4TermMarkTwainSamuelClemensPurpleFairyScoresDoc 1Doc 2Doc 3Doc 4Example taken from [5]7Can we do better if we replace the matrix by an approximation?Singular Value Decomposition (SVD) • Nonnegative Matrix Factorization (NMF)• CUR Decomposition• WHA CURA TVUA 8Nonnegative Matrix Factorization (NMF)HWA*m x n m x kk x n• W and H are nonnegative• kWHrank )(Storage: k(m + n) entries9NMFMultiplicative update algorithm of Lee and Seung found in [1]•Find W, H to minimize •Random initialization for W,H•Gradient descent method•Slow due to matrix multiplications in iteration 221FWHA 10NMF ValidationA: 5 x 3 random dense matrix. Average over 5 runs.1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 300.050.10.150.20.250.30.350.4NMF Validation: Relative Errork: rank(WH), rank(SVD) <= krelative error: Frobenius norm NMFSVD40 60 80 100 120 140 160 180 20000.10.20.30.40.50.60.70.8NMF Validation: Relative Errork: rank(WH), rank(SVD) <= krelative error: Frobenius norm NMFSVDB: 500 x 200 random sparse matrix. Average over 5 runs.11NMF Validation10 20 30 40 50 60 70 80 90 1000.50.550.60.650.70.750.80.850.90.95Iteration NumberRelative error: Frobenius NormRelative Error vs. Iteration Number NMFSVDB: 500 x 200 random sparse matrix. Rank(NMF) = 80.12CUR Decomposition**A C U Rm x n m x c c x r r x n• C (R) holds c (r) sampled and rescaled columns (rows) of A• U is computed using C and R• kCURrank )(where k is a rank parameter,Storage: (nz(C) + cr + nz(R)) entries13CUR ImplementationsCUR algorithm in [3] by Drineas, Kannan, and Mahoney•Linear time algorithm•Improvement: Compact Matrix Decomposition (CMD) in [6] by Sun, Xie, Zhang, and Faloutsos•Modification: use ideas in [4] by Drineas, Mahoney, Muthukrishnan (no longer linear time)•Other Modifications: our ideasDeterministic CUR code by G. W. Stewart [2]14SamplingColumn (Row) norm sampling [3]•Prob(col j) = (similar for row i)Subspace Sampling [4]•Uses rank-k SVD of A for column probabilities•Prob(col j) = •Uses “economy size” SVD of C for row probabilities•Prob(row i) =Sampling without replacement 22)(:,FAjAkjVkA2,:),(ciUC2:),(15Computation of ULinear U [3]: approximately solves• Optimal U: solves• FUUCAˆminˆRCCACCCUTkTTT)()(ˆ2minFUCURA  )()(TTTTRRARCCCU16Deterministic CURCode by G. W. Stewart [2]Uses a RRQR algorithm that does not store Q•We only need the permutation vector•Gives us the columns (rows) for C (R)Uses an optimal U17Compact Matrix Decomposition (CMD) ImprovementRemove repeated columns (rows) in C (R)Decreases storage while still achieving the same relative error [6]Algorithm [3] [3] with CMDRuntime 0.008060 0.007153Storage 880.5 550.5Relative Error 0.820035 0.820035A: 50 x 30 random sparse matrix, k = 15. Average over 10 runs.18CUR: Sampling with Replacement ValidationA: 5 x 3 random dense matrix. Average over 5 runs.Legend: Sampling, U 5 6 7 8 9 10 11 12 13 14 1500.050.10.150.20.250.30.350.40.450.5CUR Validation: Relative Errorc/r: number of columns/rows sampledrelative error: Frobenius norm CN,LCN,OS,LS,OSVD500 600 700 800 900 1000 1100 1200 1300 1400 150000.050.10.150.20.250.30.350.4CUR Validation: Relative Errorc/r: number of columns/rows sampledrelative error: Frobenius norm CN,LCN,OS,LS,OSVD1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 3x 10500.050.10.150.20.250.30.350.4CUR Validation: Relative Errorc/r: number of columns/rows sampledrelative error: Frobenius norm CN,LCN,OS,LS,OSVD19Sampling without Replacement: Scaling vs. No Scaling Invert scaling factor applied to• TkTCCU )(20CUR: Sampling without Replacement ValidationA: 5 x 3 random dense matrix. Average over 5 runs. 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 300.10.20.30.40.50.60.70.8CUR Validation: Relative Error, r = 2k, c = kk: rank(CUR) <= k, rank(SVD) <= krelative error: Frobenius norm w/o R,L,w/o Scw/o R,L,ScSVD40 60 80 100 120 140 160 180 20000.10.20.30.40.50.60.70.80.91CUR Validation: Relative Error, c = 3k, r = kk: rank(CUR) <= k,


View Full Document
Download Information Retrieval through Various Approximate Matrix Decompositions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Information Retrieval through Various Approximate Matrix Decompositions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Information Retrieval through Various Approximate Matrix Decompositions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?