MIT 6.867 - Lecture Notes (28 pages)

Pages: 28
School: Massachusetts Institute of Technology
Course: 6.867 - Machine Learning


Machine learning: lecture 17
Tommi S. Jaakkola, MIT CSAIL
tommi@csail.mit.edu

Topics:
- Clustering (cont'd): semi-supervised clustering, clustering by dynamics
- Structured probability models: hidden Markov models

Overview of clustering methods:
- Flat clustering methods, e.g., mixture models, k-means clustering
- Hierarchical clustering methods: top-down splitting (e.g., hierarchical mixture models) and bottom-up merging (e.g., hierarchical agglomerative clustering)
- Spectral clustering
- Semi-supervised clustering
- Clustering by dynamics
- Etc.

Semi-supervised clustering:
Let's assume we have some additional relevance information about the examples, and we'd like the clusters to preserve this information as much as possible. The slide shows an example document: "This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled". For example, by merging together documents we do not wish to lose information about the words they contain (their word distributions).
- x_i: training example (e.g., a text document)
- y: relevance variable (e.g., a word)
- P(y|x_i): relevance information (e.g., word distribution)
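The idea of merging documents while losing as little information as possible about their word distributions P(y|x_i) can be illustrated with a small bottom-up merging sketch. This is a minimal, assumed implementation (not code from the lecture): each document is represented by its normalized word distribution, and at each step we greedily merge the pair of clusters whose weighted Jensen-Shannon divergence, a standard measure of the information lost when two distributions are pooled, is smallest.

```python
import numpy as np

def js_divergence(p, q, wp=0.5, wq=0.5):
    """Weighted Jensen-Shannon divergence between distributions p and q."""
    m = wp * p + wq * q
    def kl(a, b):
        mask = a > 0  # 0 * log(0) terms contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return wp * kl(p, m) + wq * kl(q, m)

def merge_cost(ci, cj):
    """Information lost by merging two clusters.

    Each cluster is a pair (weight, P(y|cluster)); the cost is the
    cluster-mass-weighted JS divergence between the two distributions.
    """
    (wi, pi), (wj, pj) = ci, cj
    w = wi + wj
    return w * js_divergence(pi, pj, wi / w, wj / w)

def agglomerative_cluster(word_counts, n_clusters):
    """Greedy bottom-up merging of documents represented by P(y|x_i).

    word_counts: array of shape (n_docs, n_words) of raw word counts.
    Returns a list of (weight, distribution) pairs, one per cluster.
    """
    total = word_counts.sum()
    clusters = [(row.sum() / total, row / row.sum()) for row in word_counts]
    while len(clusters) > n_clusters:
        # Find the cheapest pair to merge (O(n^2) scan for clarity).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                c = merge_cost(clusters[i], clusters[j])
                if best is None or c < best[0]:
                    best = (c, i, j)
        _, i, j = best
        (wi, pi), (wj, pj) = clusters[i], clusters[j]
        w = wi + wj
        merged = (w, (wi * pi + wj * pj) / w)  # pooled word distribution
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

On a toy corpus where two documents share one word distribution and two share another, the greedy merges recover the two groups, since merging across groups would discard much more word information than merging within them.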


