
Exponential family sparse coding with application to self-taught learning with text documents




Honglak Lee (hllee@cs.stanford.edu)
Rajat Raina (rajatr@cs.stanford.edu)
Alex Teichman (teichman@cs.stanford.edu)
Andrew Y. Ng (ang@cs.stanford.edu)
Stanford University, Stanford, CA 94305, USA

Abstract

Sparse coding is an unsupervised learning algorithm for finding concise, slightly higher-level representations of an input, and has been successfully applied to self-taught learning (Raina et al., 2007), where the goal is to use unlabeled data to help on a supervised learning task, even if the unlabeled data cannot be associated with the labels of the supervised task. However, sparse coding uses a Gaussian noise model and a quadratic loss function, and thus performs poorly if applied to binary-valued, integer-valued, or other non-Gaussian data, such as text. Drawing on ideas from generalized linear models (GLMs), we present a generalization of sparse coding to learning with data drawn from any exponential family distribution (such as Bernoulli, Poisson, etc.). This gives a method that we argue is much better suited to modeling data types other than Gaussian. We present an efficient algorithm for solving the optimization problem defined by this model. We also show that the new model results in significantly improved self-taught learning performance when applied to text data.

1 Introduction

We consider the "self-taught learning" problem, in which we are given just a few labeled examples for a classification task, and also large amounts of unlabeled data that is only mildly related to the task (Raina et al., 2007; Weston et al., 2006). Specifically, the unlabeled data may not share the class labels or arise from the same distribution. For example, given only a few labeled examples of webpages about "Baseball" or "Football", along with access to a large corpus of unrelated webpages, we might want to accurately classify new baseball/football webpages. The ability to use such easily available unlabeled data has the potential to greatly reduce


