
Slide 1: Probabilistic Latent Semantic Indexing
Thomas Hofmann, Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval (1999)
Presentation by Jonathan Ultis
Computer Science and Engineering Department, University of California, San Diego
[email protected]

Slide 2: Overview
I. Latent Semantic Analysis
II. Singular Value Decomposition
III. Probabilistic Latent Semantic Analysis
IV. Experiments
V. The PLSA Advantage

Slide 3: Issues in Information Retrieval
• Synonyms are separate words that have the same meaning. They tend to reduce recall.
• Polysemy refers to words that have multiple meanings. This problem tends to reduce precision.
• Both issues point to a more general problem: there is a disconnect between topics and keywords.

Slide 4: Latent Semantic Analysis (LSA)
• LSA attempts to discover information about the meaning behind words.
• Highly correlated words usually indicate the existence of a topic.
• LSA is proposed as an automated solution to the problems of synonymy and polysemy.

Slide 5: Indexing by Latent Semantic Analysis [1]
• Once LSA has been performed, documents can be written as vectors in latent semantic space rather than as word vectors. This is known as Latent Semantic Indexing (LSI).
• This paper proposed Singular Value Decomposition (SVD) as an appropriate technique for LSA.
• SVD is still the most commonly used technique for LSA.
[1] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman (1990)

Slide 6: SVD Overview
• Documents are commonly represented as vectors of term frequencies.
• Multiple documents can be represented as a matrix X of terms by documents.

Slide 7 (continued)
• SVD is a standard technique for breaking a matrix into orthogonal components.
• m is the rank of the original matrix; m is usually much smaller than either t (the number of terms) or d (the number of documents).
• The matrix S0 is a diagonal matrix of singular values.
• T0 and D0 are matrices of orthonormal columns and rows, respectively.

Slide 8 (continued)
• The largest entries in S0 represent the dimensions of greatest variance. These represent strong divisions in term usage.
• LSA is performed by removing small entries in S0.
• X will be as close to X0 as possible for a rank-k matrix (least-squares fit).

Slide 9: Least Squares Fit
• The least-squares fit has the minimum sum-squared error for a matrix of its rank.
• Example: given the vector {1, 5, 1},
  the vector {2, 5, 2} has a sum-squared error of 2,
  the vector {1, 8, 1} has a sum-squared error of 9.
  Therefore {1, 5, 1} is closer to {2, 5, 2} in a least-squares sense.

Slide 10: Probabilistic Latent Semantic Analysis
• PLSA is based on a generative probabilistic model.
• Documents generate a particular distribution of aspects (topics).
• Aspects generate a particular distribution of word usage.

Slide 11 (continued)
• The probability of each document and the probability of each word are known.
• The probability of an aspect given a document is unknown.
• The probability of a word given an aspect is unknown.
[Slide figure: visualizations of equation 2]
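The equation visualized on slide 11 is not reproduced in this text extract. For reference, the aspect model at the heart of PLSA expresses the joint probability of a document d and a word w through the latent aspects z; in LaTeX notation:

  P(d, w) = P(d) \, P(w \mid d), \qquad P(w \mid d) = \sum_{z \in Z} P(w \mid z) \, P(z \mid d)

Here P(d) is known from the corpus, while P(w|z) and P(z|d) are the unknown distributions listed on the slide.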
Slide 12: Maximum Likelihood
• The EM algorithm is used to estimate the unknowns by maximizing the log likelihood of the training data.
• Example: given the vector {1, 5, 1} and a perfectly trained one-document model P(w) = {0.14, 0.71, 0.14},
  the vector {2, 5, 2} has a base-10 log likelihood of –4.16,
  the vector {1, 8, 1} has a base-10 log likelihood of –2.90.
  Therefore {1, 5, 1} is closer to {1, 8, 1} in a log-likelihood sense.

Slide 13 (continued)
• The EM algorithm performs an implicit clustering of words into aspects.
• Each column below holds the 10 words that a particular aspect is most likely to generate. Column headings were assigned by a human.

  "Hollywood"      "family"    "space shuttle"   "plane"
  film             home        space             plane
  movie            family      shuttle           airport
  music            like        mission           crash
  new              love        astronauts        flight
  best             kids        launch            safety
  hollywood        mother      station           aircraft
  love             life        crew              air
  actor            happy       nasa              passenger
  entertainment    friends     satellite         board
  star             cnn         earth             airline

Slide 14: Effectiveness of PLSA
• The effectiveness of PLSA is tested by using the aspects to perform queries (PLSI).
• Two basic query techniques for PLSI are tested against the normal query technique for LSI.
• All queries use the cosine similarity metric to measure the similarity between vectors.

Slide 15: [no text extracted from this slide]

Slide 16 (continued)
• PLSI-U incorporates TF-IDF weighting by directly modifying the word vectors of the document and query before performing the comparison.
• PLSI-Q uses a weighted average of the TF-IDF scores of the words that affect each aspect.
• The aspect vector for a query is generated by treating the query as a new document: the query is added to the model, and the weights for the query are trained with the EM algorithm.

Slide 17: PLSI-Q* and PLSI-U*
• The two basic query techniques are also tested using combinations of models.
• PLSI-U* combines the augmented word vectors predicted by models with different numbers of aspects.
• PLSI-Q* combines the cosine similarities of the aspect vectors predicted by models with different numbers of aspects.

Slide 18: Experiments
• Four document collections provide a test bed:
  MED – 1033 abstracts from the National Library of Medicine
  CRAN – 1400 documents on aeronautics
  CACM – 3204 abstracts from the CACM journal
  CISI – 1460 abstracts in library science
• Average precision across 9 recall levels (10%, 20%, ..., 90%) is reported.
• Average relative improvement over the baseline at each recall level is reported.

Slide 19 (continued)
• PLSI-U is better than LSI in every case.
• PLSI-U* is the champion in almost every case.
• The author speculates that PLSI-Q* could do much better if TF-IDF weighting were incorporated more effectively.

Average precision (prec.) and relative improvement over the cos+tfidf baseline (impr., %):

              cos+tfidf   LSI      PLSI-U   PLSI-Q   PLSI-U*   PLSI-Q*
MED   prec.     49.0      64.6      69.5     63.2      72.1      66.3
      impr.       -      +31.8     +41.8    +29.0     +47.1     +35.3
CRAN  prec.     35.2      38.7      38.9     38.6      40.4      40.1
      impr.       -       +9.9     +10.5     +9.7     +14.8     +13.9
CACM  prec.     21.9      23.8      25.3     26.6      27.6      28.3
      impr.       -       +8.7     +15.5    +21.5     +26.0     +29.2
CISI  prec.     20.2      21.9      23.3     23.1      24.6      24.4
      impr.       -       +8.4     +15.3    +14.4     +21.8     +20.8

Slides 20 and 21: [no text extracted from these slides]

Slide 22: The PLSA Advantage
• PLSA has several advantages over traditional SVD-based LSA.
• PLSA attempts to maximize the likelihood of the data rather than minimizing the sum-squared error.
• PLSA can use the usual methods to prevent overfitting, which can lead to more general models.

Slide 23 (continued)
• Models can be combined productively with PLSA.
• PLSA provides a "more intuitive" definition for aspects.
• Empirical results bear out these advantages.
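To make the EM estimation described on slides 12, 13, and 16 concrete, here is a minimal sketch of the basic EM updates for the aspect model, written in Python with NumPy. The function name, the toy term-frequency matrix, and the iteration count are illustrative choices of this write-up rather than details from the paper or the slides, and the paper's refinements for controlling overfitting (e.g., a tempered variant of EM) are omitted.

import numpy as np

def plsa_em(counts, n_aspects, n_iters=50, seed=0):
    """Estimate P(z|d) and P(w|z) for the aspect model with basic EM.

    counts: (n_docs, n_words) array of term frequencies n(d, w).
    Returns P(z|d) with shape (n_docs, n_aspects) and
            P(w|z) with shape (n_aspects, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random starting points for the two unknown distributions; rows sum to 1.
    p_z_d = rng.random((n_docs, n_aspects))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_aspects, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: responsibilities P(z | d, w), proportional to P(z|d) * P(w|z).
        resp = p_z_d[:, :, None] * p_w_z[None, :, :]        # shape (d, z, w)
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12

        # M-step: re-estimate the distributions from expected counts n(d,w) * P(z|d,w).
        expected = counts[:, None, :] * resp                 # shape (d, z, w)
        p_w_z = expected.sum(axis=0)                          # (z, w)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = expected.sum(axis=2)                          # (d, z)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    return p_z_d, p_w_z

# Toy usage: 4 documents over 6 terms, 2 aspects.
X = np.array([[4, 3, 2, 0, 0, 1],
              [3, 4, 1, 0, 1, 0],
              [0, 1, 0, 5, 3, 2],
              [1, 0, 0, 4, 4, 3]])
p_z_d, p_w_z = plsa_em(X, n_aspects=2)
# Sorting each row of P(w|z) surfaces the most probable terms per aspect,
# analogous to the word columns shown on slide 13.
print(p_w_z.argsort(axis=1)[:, ::-1])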

