Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Christopher Manning and Pandu Nayak
Lecture 13: Latent Semantic Indexing

Today's topic: Latent Semantic Indexing (Ch. 18)
- Term-document matrices are very large.
- But the number of topics that people talk about is small (in some sense): clothes, movies, politics, …
- Can we represent the term-document space by a lower-dimensional latent space?

Linear Algebra Background

Eigenvalues & Eigenvectors (Sec. 18.1)
- For a square m×m matrix S, an eigenvalue λ and a (right) eigenvector v satisfy Sv = λv.
- How many eigenvalues are there at most? Sv = λv, i.e. (S − λI)v = 0, only has a non-zero solution v if det(S − λI) = 0. This is an mth-order equation in λ, which can have at most m distinct solutions (the roots of the characteristic polynomial) – these can be complex even though S is real.

Matrix-vector multiplication
- Example: S = [30 0 0; 0 20 0; 0 0 1] has eigenvalues 30, 20, 1 with corresponding eigenvectors v1 = (1,0,0)ᵀ, v2 = (0,1,0)ᵀ, v3 = (0,0,1)ᵀ.
- On each eigenvector, S acts as a multiple of the identity matrix: but as a different multiple on each.
- Any vector (say x = (2,4,6)ᵀ) can be viewed as a combination of the eigenvectors: x = 2v1 + 4v2 + 6v3.
- Thus a matrix-vector multiplication such as Sx can be rewritten in terms of the eigenvalues/eigenvectors: Sx = S(2v1 + 4v2 + 6v3) = 2Sv1 + 4Sv2 + 6Sv3 = 2λ1v1 + 4λ2v2 + 6λ3v3 = 60v1 + 80v2 + 6v3.
- Even though x is an arbitrary vector, the action of S on x is determined by the eigenvalues/eigenvectors.
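As a sanity check on the eigen-expansion of Sx, here is a minimal NumPy sketch. It assumes, as the numbers suggest, that S is the diagonal matrix with entries 30, 20, 1, so its eigenvectors are the standard basis vectors:

```python
import numpy as np

# Assumed form of the slide's example: a diagonal S with eigenvalues
# 30, 20, 1, whose eigenvectors are the standard basis vectors.
S = np.diag([30.0, 20.0, 1.0])
lam, V = np.linalg.eig(S)          # columns of V are the eigenvectors

x = np.array([2.0, 4.0, 6.0])      # x = 2*v1 + 4*v2 + 6*v3

# The direct product Sx versus the expansion sum_i lambda_i (v_i . x) v_i;
# both give 60*v1 + 80*v2 + 6*v3.
direct = S @ x
expansion = sum(l * (v @ x) * v for l, v in zip(lam, V.T))
```

For this orthonormal eigenbasis the coefficients (v_i . x) are exactly 2, 4, 6, so the expansion reproduces Sx term by term.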
Matrix-vector multiplication, continued
- Suggestion: the effect of "small" eigenvalues is small.
- If we ignored the smallest eigenvalue (1), then instead of Sx = 60v1 + 80v2 + 6v3 = (60, 80, 6)ᵀ we would get (60, 80, 0)ᵀ.
- These vectors are similar (in cosine similarity, etc.).

Eigenvalues & Eigenvectors
- For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal.
- All eigenvalues of a real symmetric matrix are real.
- All eigenvalues of a positive semidefinite matrix are non-negative.

Example
- Let S = [2 1; 1 2] (real, symmetric). Then det(S − λI) = (2 − λ)² − 1 = 0, so λ = 1 or λ = 3.
- The eigenvalues are 1 and 3 (non-negative, real). Plug in these values and solve for the eigenvectors.
- The eigenvectors, (1, −1)ᵀ for λ = 1 and (1, 1)ᵀ for λ = 3, are orthogonal (and real).

Eigen/diagonal decomposition
- Let S be a square matrix with m linearly independent eigenvectors (a "non-defective" matrix).
- Theorem: there exists an eigen decomposition S = UΛU⁻¹ with Λ diagonal (cf. the matrix diagonalization theorem).
- The columns of U are the eigenvectors of S; the diagonal elements of Λ are the eigenvalues of S.
- The decomposition is unique for distinct eigenvalues.

Diagonal decomposition: why/how
- Let U have the eigenvectors as columns: U = [v1 … vm].
- Then SU = [Sv1 … Svm] = [λ1v1 … λmvm] = UΛ.
- Thus SU = UΛ, or U⁻¹SU = Λ, and S = UΛU⁻¹.

Diagonal decomposition – example
- Recall S = [2 1; 1 2]. The eigenvectors (1, −1)ᵀ and (1, 1)ᵀ form U = [1 1; −1 1].
- Inverting, we have U⁻¹ = ½ [1 −1; 1 1]. (Recall UU⁻¹ = I.)
- Then S = UΛU⁻¹ = [1 1; −1 1] · [1 0; 0 3] · ½ [1 −1; 1 1].

Example continued
- Let's divide U (and multiply U⁻¹) by √2. Then S = QΛQᵀ with Q = (1/√2) [1 1; −1 1] and Q⁻¹ = Qᵀ.
- Why? Stay tuned …

Symmetric eigen decomposition
- If S is a symmetric matrix, then there exists a (unique) eigen decomposition S = QΛQᵀ, where Q is orthogonal: Q⁻¹ = Qᵀ.
- The columns of Q are normalized eigenvectors, and the columns are orthogonal. (Everything is real.)

Exercise
- Examine the symmetric eigen decomposition, if any, for each of the following matrices:
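The symmetric eigen decomposition S = QΛQᵀ can be verified numerically. This sketch assumes the 2×2 example matrix [2 1; 1 2] with eigenvalues 1 and 3:

```python
import numpy as np

# The example's symmetric matrix (assumed [[2,1],[1,2]], consistent with
# eigenvalues 1 and 3 and orthogonal eigenvectors (1,-1), (1,1)).
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is NumPy's routine for symmetric/Hermitian matrices; it returns
# real eigenvalues in ascending order and orthonormal eigenvectors.
lam, Q = np.linalg.eigh(S)

orthogonal = np.allclose(Q @ Q.T, np.eye(2))   # Q^-1 = Q^T
reconstructed = Q @ np.diag(lam) @ Q.T         # should equal S
```

Dividing the raw eigenvectors (1, −1)ᵀ and (1, 1)ᵀ by √2, as on the slide, is exactly the normalization that makes Q orthogonal.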
Time out!
- "I came to this class to learn about text retrieval and mining, not to have my linear algebra past dredged up again …"
- But if you want to dredge, Strang's Applied Mathematics is a good place to start.
- What do these matrices have to do with text? Recall our M × N term-document matrices … But everything so far needs square matrices – so …

Similarity Clustering
- We can compute the similarity between two document vector representations xi and xj by xi xjᵀ.
- Let X = [x1 … xN]. Then XXᵀ is a matrix of similarities.
- XXᵀ is symmetric, so XXᵀ = QΛQᵀ.
- So we can decompose this similarity space into a set of orthonormal basis vectors (given in Q) scaled by the eigenvalues in Λ.
- This leads to PCA (Principal Components Analysis).

Singular Value Decomposition (Sec. 18.2)
- For an M × N matrix A of rank r there exists a factorization (Singular Value Decomposition = SVD) A = UΣVᵀ, where U is M×M, Σ is M×N, and V is N×N. (Not proven here.)
- The columns of U are orthogonal eigenvectors of AAᵀ: AAᵀ = QΛQᵀ, and AAᵀ = (UΣVᵀ)(UΣVᵀ)ᵀ = (UΣVᵀ)(VΣᵀUᵀ) = UΣΣᵀUᵀ.
- The columns of V are orthogonal eigenvectors of AᵀA.
- The eigenvalues λ1 … λr of AAᵀ are also the eigenvalues of AᵀA; the singular values are σi = √λi, placed on the diagonal of Σ.

Illustration of SVD dimensions and sparseness.

SVD example
- Let A = [1 −1; 0 1; 1 0]; thus M = 3, N = 2. Its SVD (in reduced form) is
  A = [2/√6 0; −1/√6 1/√2; 1/√6 1/√2] · [√3 0; 0 1] · [1/√2 −1/√2; 1/√2 1/√2]
- Typically, the singular values are arranged in decreasing order.

Low-rank approximation (Sec. 18.3)
- SVD can be used to compute optimal low-rank approximations.
- Approximation problem: find Ak of rank k such that Ak minimizes ‖A − X‖_F (the Frobenius norm) over all matrices X of rank k. Ak and X are both M × N matrices. Typically, we want k << r.
- Solution via SVD: Ak = U diag(σ1, …, σk, 0, …, 0) Vᵀ, i.e. set the smallest r − k singular values to zero.
- In column notation: Ak = σ1 u1 v1ᵀ + … + σk uk vkᵀ, a sum of k rank-1 matrices.
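The low-rank approximation recipe can be sketched in NumPy, assuming the 3×2 example matrix above (`np.linalg.svd` returns the singular values already in decreasing order):

```python
import numpy as np

# The example's 3x2 matrix (assumed [[1,-1],[0,1],[1,0]]); its singular
# values are sqrt(3) and 1.
A = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])

U, s, Vt = np.linalg.svd(A)        # s is in decreasing order

# Best rank-1 approximation: keep the largest singular value, zero the rest,
# i.e. the sum of the first k rank-1 matrices sigma_i * u_i * v_i^T.
k = 1
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(A - A_k, 'fro')
```

By the optimality property, the Frobenius error of the best rank-k approximation is the square root of the sum of the discarded squared singular values; here that is just the discarded σ2 = 1.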
Reduced SVD
- If we retain only k singular values, and set the rest to 0, then we don't need the corresponding parts of the three matrices.
- Then Σ is k×k, U is M×k, Vᵀ is k×N, and Ak is still M×N.
- This is referred to
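The reduced shapes can be seen directly in NumPy on a hypothetical 5×4 matrix with k = 2; `full_matrices=False` returns the thin factors, which we then slice down to k:

```python
import numpy as np

# Hypothetical matrix purely to illustrate the shapes of the reduced SVD.
M, N, k = 5, 4, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((M, N))

# Thin SVD: U is M x min(M,N), Vt is min(M,N) x N.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncate to k singular values: U_k is M x k, Sigma_k is k x k, Vt_k is k x N.
U_k, Sigma_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

A_k = U_k @ Sigma_k @ Vt_k   # still M x N, but only rank k
```

The product of the three small factors has the same M×N shape as A, which is why nothing downstream needs to change when we swap in the approximation.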