UMD AMSC 663 - Information Retrieval through Various Approximate Matrix Decompositions

Unformatted text preview:

11Information Retrieval through Information Retrieval through Various Approximate Matrix Various Approximate Matrix DecompositionsDecompositionsKathryn LinehanKathryn LinehanAdvisor: Dr. Dianne OAdvisor: Dr. Dianne O’’LearyLeary22Information RetrievalInformation RetrievalzzExtracting information from databasesExtracting information from databaseszzWe need an efficient way of searching We need an efficient way of searching large amounts of data large amounts of data zzExample: web search engineExample: web search engine33Querying a Document DatabaseQuerying a Document DatabasezzThe problem: return documents that are The problem: return documents that are relevant to entered search termsrelevant to entered search termszzIn real systems such as Google, the In real systems such as Google, the problem is formulated in terms of matrices:problem is formulated in terms of matrices:zzTermTerm--Document MatrixDocument MatrixzzQuery vector Query vector44TermTerm--Document MatrixDocument MatrixzzEntry ( i, j) : weight of term i in document jEntry ( i, j) : weight of term i in document j⎥⎥⎥⎥⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎢⎢⎢⎢⎣⎡15000200000102000510002001500015Document1 2 3 4TermMarkTwainSamuelClemensPurpleFairyExample:Example taken from [3]55Query VectorQuery VectorzzEntry ( i ) : weight of term i in the queryEntry ( i ) : weight of term i in the query⎥⎥⎥⎥⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎢⎢⎢⎢⎣⎡000011Example: search for “Mark Twain”TermMarkTwainSamuelClemensPurpleFairyExample taken from [3]66Literal Term MatchingLiteral Term MatchingzzGiven: Given: zzTermTerm--Document matrix, ADocument matrix, AzzQuery vector, qQuery vector, qzzUsing A and q, we need to determine Using A and q, we need to determine which documents are relevant to the querywhich documents are relevant to the queryzzFormulate score vector: s = qFormulate score vector: s = qTTAAzzReturn the highest scoring documentsReturn the highest scoring documents77Document ScoringDocument ScoringzzDoc 1 and Doc 3 will be returned as relevant, Doc 1 and Doc 3 will be returned as relevant, but Doc 2 will not but Doc 2 will not TT⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎣⎡=⎥⎥⎥⎥⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎢⎢⎢⎢⎣⎡⎥⎥⎥⎥⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎢⎢⎢⎢⎣⎡02003015000200000102000510002001500015*000011Document1 2 3 4TermMarkTwainSamuelClemensPurpleFairyScoresDoc 1Doc 2Doc 3Doc 4Example taken from [3]88Latent Semantic IndexingLatent Semantic IndexingzzProblem: We need a way to return Problem: We need a way to return relevant documents that may not contain relevant documents that may not contain the exact query termsthe exact query termszzSolution: Latent Semantic Indexing (LSI)Solution: Latent Semantic Indexing (LSI)zzUse an approximation to the termUse an approximation to the term--document document matrixmatrix99RankRank--k SVDk SVDzzStandard approximation used in LSI: rankStandard approximation used in LSI: rank--k SVDk SVDzzSVD Review: A = USVD Review: A = UΣΣVVTTzzU: orthogonal, left singular vectors of AU: orthogonal, left singular vectors of AzzΣΣ: diagonal, singular values in decreasing order: diagonal, singular values in decreasing orderzzV: orthogonal, right singular vectors of AV: orthogonal, right singular vectors of AzzRankRank--k SVD: Ak SVD: Akk= = UUkkΣΣkkVVkkTTzzUUkk: first k columns of U: first k columns of UzzΣΣkk: diagonal, top k singular values in decreasing order: diagonal, top k singular values in decreasing orderzzVVkk: first k columns of V : first k columns of V1010RankRank--k SVDk SVDzzExample: rankExample: rank--2 approximation to original term2 approximation to original term--document matrixdocument matrix⎥⎥⎥⎥⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎢⎢⎢⎢⎣⎡150002000002.128.73.801.69.31.401.163.100.1105.55.37.3Document1 2 3 4TermMarkTwainSamuelClemensPurpleFairyOriginal matrixExample taken from [3]1111Document ScoringDocument ScoringzzDoc 1, Doc 2, and Doc 3 will all be returned Doc 1, Doc 2, and Doc 3 will all be returned as relevant as relevant TT⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎣⎡=⎥⎥⎥⎥⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎢⎢⎢⎢⎣⎡⎥⎥⎥⎥⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎢⎢⎢⎢⎣⎡06.218.137.14150002000002.128.73.801.69.31.401.163.100.1105.55.37.3*000011Document1 2 3 4TermMarkTwainSamuelClemensPurpleFairyScoresDoc 1Doc 2Doc 3Doc 4Example taken from [3]Original Scores1212Literal Term Matching vs. LSILiteral Term Matching vs. LSIzzLiteral term matching: use sparse (m x n) termLiteral term matching: use sparse (m x n) term--document document matrix A matrix A zzLSI: use approximation to A. For example, ALSI: use approximation to A. For example, Akk= = UUkkΣΣkkVVkkTT, , where Uwhere Ukk: m x k, : m x k, ΣΣkk: k x k, V: k x k, Vkk: n x k. : n x k. zzThese factors can be denseThese factors can be densezzComputing AComputing Akkis a oneis a one--time expensetime expenseO((m+n)k+k)O((m+n)k+k)O((m+n)k+k)O((m+n)k+k)LSILSIO(nz(A))O(nz(A))O(nz(A))O(nz(A))Literal term Literal term matchingmatchingQuery TimeQuery TimeStorageStorage1313Precision and RecallPrecision and RecallzzWe need a measurement of performance for document retrievalWe need a measurement of performance for document retrievalzzLet Retrieved = number of documents retrieved, Let Retrieved = number of documents retrieved, Relevant = total number of relevant documents to the query,Relevant = total number of relevant documents to the query,RetRel = number of documents retrieved that are relevant.RetRel = number of documents retrieved that are relevant.zzPrecision:Precision:zzRecall: Recall: RetrievedRetRel)Retrieved( =PRelevantRetRel)Retrieved( =R1414Precision and RecallPrecision and RecallzzExample: Example: Given: set of 25 documents, query, list of Given: set of 25 documents, query, list of documents relevant to the query = {22, 1, 11, 9, 5} documents relevant to the query = {22, 1, 11, 9, 5} Task: retrieve four documents in order of Task: retrieve four documents in order of relevance scorerelevance scoreResults: 22, 3, 9, 7Results: 22, 3, 9, 742)4(,32)3(,21)2(,1)1( ==== PPPP52)4(,52)3(,51)2(,51)1( ==== RRRR1515ValidationValidationzzThree common information retrieval data Three common information retrieval data sets found at sets found at www.cs.utk.edu/~lsi/www.cs.utk.edu/~lsi/zzListed under CISI, CRAN, MEDListed under CISI, CRAN,


View Full Document
Download Information Retrieval through Various Approximate Matrix Decompositions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Information Retrieval through Various Approximate Matrix Decompositions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Information Retrieval through Various Approximate Matrix Decompositions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?