CMU CS 10601 - Semi supervised learning

Semi-Supervised Learning
10601 Machine Learning

Can Unlabeled Data Improve Supervised Learning?

An important question! In many cases unlabeled data is plentiful while labeled data is expensive:
• Medical outcomes (x = <patient, treatment>, y = outcome)
• Text classification (x = document, y = relevance)
• Customer modeling (x = user actions, y = user intent)
• ...

When Can Unlabeled Data Help Supervised Learning?

Consider this setting:
• a set X of instances drawn from an unknown distribution P(X)
• we wish to learn a target function f: X → Y (or, equivalently, P(Y|X))
• we are given a set H of possible hypotheses for f

Given:
• iid labeled examples
• iid unlabeled examples
Determine:

Four Ways to Use Unlabeled Data for Supervised Learning
1. Use it to re-weight labeled examples
2. Use it to help EM learn class-specific generative models
3. If the problem has redundantly sufficient features, use CoTraining
4. Use it to detect/preempt overfitting

1. Use Unlabeled Data to Reweight Labeled Examples
• Most machine learning algorithms (neural nets, decision trees, SVMs) attempt to minimize error over the labeled examples
• But our ultimate goal is to minimize error over future examples drawn from the same underlying distribution
• If we knew the underlying distribution, we would weight each training example by its probability under that distribution
• Unlabeled data lets us estimate this distribution more accurately, and reweight our labeled examples accordingly

The quantity we care about is the true error of a hypothesis h:
    error(h) = Σ_x P(x) · δ(h(x) ≠ f(x)),
where δ(h(x) ≠ f(x)) = 1 if hypothesis h disagrees with the true function f on x, else 0. We can estimate P(x) from both data sets: if ℓ_x is the # of times we have x in the labeled set and u_x is the # of times we have x in the unlabeled set, then each distinct labeled x gets weight (ℓ_x + u_x) / (|L| + |U|).

Example (two figure-based example slides; not captured in this preview)
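The counting scheme just described can be sketched in code. This is a minimal illustration for a discrete instance space; the function and variable names are my own, not the lecture's notation:

```python
from collections import Counter

def reweighted_error(h, labeled, unlabeled):
    """Estimate the true error of hypothesis h by weighting each
    distinct labeled instance x by P_hat(x), the fraction of times x
    occurs across the labeled and unlabeled sets combined.
    labeled: list of (x, y) pairs; unlabeled: list of x values.
    Assumes duplicate x values in `labeled` carry consistent labels."""
    l_counts = Counter(x for x, _ in labeled)
    u_counts = Counter(unlabeled)
    n = len(labeled) + len(unlabeled)
    err = 0.0
    for x, y in dict(labeled).items():  # one term per distinct x
        p_hat = (l_counts[x] + u_counts[x]) / n
        if h(x) != y:                   # delta: h disagrees with truth
            err += p_hat
    return err
```

For instance, with h = lambda x: 0, labeled = [(0, 0), (1, 1)], and unlabeled = [1, 1, 1, 0], h errs only on x = 1, which carries estimated mass (1 + 3)/6, so the reweighted error is 2/3.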
2. Improve EM Clustering Algorithms
• Consider completely unsupervised clustering, where we assume the data X is generated by a mixture of probability distributions, one for each cluster
  – for example, Gaussian mixtures
• Some classifier-learning algorithms, such as Gaussian Bayes classifiers, also assume the data X is generated by a mixture of distributions, one for each class Y
• Supervised learning: estimate P(X|Y) from labeled data
• Opportunity: estimate P(X|Y) from labeled and unlabeled data, using EM as in clustering

Bag-of-Words Text Classification
A document becomes a vector of word counts over the vocabulary:
aardvark 0
about 2
all 2
Africa 1
apple 0
anxious 0
...
gas 1
...
oil 1
...
Zaire 0

Baseline: Naïve Bayes Learner
Train: for each class c_j of documents,
1. estimate P(c_j)
2. for each word w_i, estimate P(w_i | c_j)
Classify(doc): assign doc to the most probable class:
    c = argmax_{c_j} P(c_j) · Π_{w_i ∈ doc} P(w_i | c_j)
Naïve Bayes assumption: words are conditionally independent, given the class.

Expectation Maximization (EM) Algorithm
• Use the labeled data L to learn an initial classifier h
• Loop:
  – E step: assign probabilistic labels to U, based on h
  – M step: retrain classifier h using both L (with fixed membership) and U (with the assigned soft membership)
• Under certain conditions, this is guaranteed to converge to a locally maximum-likelihood h

2. Generative Bayes Model
The model is a naïve Bayes network Y → X1, X2, X3, X4; learn P(Y|X) from partially labeled data:

Y  X1 X2 X3 X4
1  0  0  1  1
0  0  1  0  0
0  0  0  1  0
?  0  1  1  0
?  0  1  0  1

E step and M step update equations (not captured in this preview; w_t denotes the t-th word in the vocabulary)

Experimental Evaluation (figure slides): newsgroup postings from 20 newsgroups, 1,000 documents per group, using one labeled example per class.

3. If the Problem Setting Provides Redundantly Sufficient Features, Use CoTraining
• In some settings, the available data features are so redundant that we can train two classifiers using different features
• In that case, the two classifiers should agree on the classification of each unlabeled example
• Therefore, we can use the unlabeled data to constrain the training of both classifiers, forcing them to agree
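The E-step/M-step loop for the bag-of-words model in section 2 above can be sketched in code. This is a minimal illustration rather than the lecture's exact estimator: it assumes a multinomial Naïve Bayes model with add-one smoothing, and all function and variable names are my own:

```python
import math
from collections import defaultdict

def train_nb(docs, weights, classes, vocab):
    """M step: estimate P(c) and P(w|c) from soft class memberships,
    with Laplace (add-one) smoothing."""
    prior = {c: (sum(w[c] for w in weights) + 1.0)
                / (len(weights) + len(classes)) for c in classes}
    counts = {c: defaultdict(float) for c in classes}
    totals = {c: 0.0 for c in classes}
    for doc, w in zip(docs, weights):
        for word in doc:
            for c in classes:
                counts[c][word] += w[c]
                totals[c] += w[c]
    cond = {c: {v: (counts[c][v] + 1.0) / (totals[c] + len(vocab))
                for v in vocab} for c in classes}
    return prior, cond

def posterior(doc, prior, cond, classes):
    """E step for one document: P(c | doc) under the Naive Bayes
    independence assumption (computed in log space for stability)."""
    logp = {c: math.log(prior[c]) + sum(math.log(cond[c][w]) for w in doc)
            for c in classes}
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(logp[c] - m) / z for c in classes}

def semisupervised_em(labeled, unlabeled, classes, vocab, iters=10):
    """Fit an initial classifier on the labeled docs L, then loop:
    E step soft-labels the unlabeled docs U, M step refits on L + U."""
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    ldocs = [d for d, _ in labeled]
    prior, cond = train_nb(ldocs, hard, classes, vocab)
    for _ in range(iters):
        soft = [posterior(d, prior, cond, classes) for d in unlabeled]
        prior, cond = train_nb(ldocs + list(unlabeled), hard + soft,
                               classes, vocab)
    return prior, cond
```

Starting from as little as one labeled document per class, the soft-labeled unlabeled documents sharpen the word-probability estimates on each M step, in the spirit of the one-labeled-example-per-class experiment described above.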
CoTraining
Learn f : X → Y, where X = X1 × X2, where x is drawn from an unknown distribution, and where there exist g1, g2 such that g1(x1) = g2(x2) = f(x).

Redundantly Sufficient Features (figure slide): a faculty home page can be classified either from the words on the page itself ("Professor Faloutsos ...") or from the words on hyperlinks pointing to it ("my advisor").

CoTraining Algorithm [Blum & Mitchell, 1998]
Given: labeled data L, unlabeled data U
Loop:
  train g1 (hyperlink classifier) using L
  train g2 (page classifier) using L
  allow g1 to label p positive and n negative examples from U
  allow g2 to label p positive and n negative examples from U
  add the intersection of the self-labeled examples to L

CoTraining: Experimental Results
• begin with 12 labeled web pages (academic course pages)
• provide 1,000 additional unlabeled web pages
• average error when learning from the labeled data alone: 11.1%
• average error with cotraining: 5.0% (when both classifiers agree)
Typical run: (figure not captured in this preview)

Co-Training Rote Learner (figure slide): the "pages" view and the "hyperlinks" view each memorize labels (e.g., for "my advisor"), and +/- labels propagate between the two views.

Classifying Jobs for FlipDog:
X1: job title
X2: job description

4. Use U to Detect/Preempt Overfitting
• Overfitting is a problem for many learning algorithms (e.g., decision trees, neural networks)
• The symptom of overfitting: a complex hypothesis h2 performs better on the training data than a simpler hypothesis h1, but worse on test data
• Unlabeled data can help detect overfitting, by comparing the predictions of h1 and h2 over the unlabeled examples
  – the rate at which h1 and h2 disagree on U should be the same as the rate on L, unless overfitting is occurring

Defining a Distance Metric
• Definition of a distance metric:
  – non-negative: d(f, g) ≥ 0
  – symmetric: d(f, g) = d(g, f)
  – triangle inequality: d(f, g) ≤ d(f, h) + d(h, g)
• Classification with zero-one loss: (equation not captured in this preview)
• Regression with squared loss: (equation not captured in this preview)

Using the Distance Metric
The generated y values contain zero-mean Gaussian noise e:  y = f(x) + e

Experimental Evaluation of TRI [Schuurmans & Southey, MLJ 2002]
• Use it to select the degree of a polynomial for regression
• Compare to alternatives such as ten-fold cross validation, structural risk minimization, ...
• Approximation ratio: (true error of the selected hypothesis) / (true error of the best hypothesis considered)
Results using 200 unlabeled and t labeled examples (figure: performance in the top .50 of trials)

Summary
Several ways to use unlabeled data in supervised learning; this remains an ongoing research area:
1. Use it to reweight labeled examples
2. Use it to help EM learn class-specific generative models
3. If the problem has redundantly sufficient features, use CoTraining
4. Use it to detect/preempt overfitting

Further Reading
• EM approach: K. Nigam, et al., 2000. "Text Classification from Labeled and Unlabeled Documents using EM", Machine Learning, 39, pp. 103-134.
• CoTraining: A. Blum and T. Mitchell, 1998. "Combining Labeled and Unlabeled Data with Co-Training", Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT-98).
• S. Dasgupta, et al., "PAC
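The disagreement check from section 4 can be sketched in code. This is an illustrative reading of the idea for the zero-one-loss case only, not the full TRI model-selection procedure; the function names and the threshold are my own:

```python
def disagreement(h1, h2, xs):
    """Fraction of inputs on which hypotheses h1 and h2 disagree
    (the zero-one-loss distance d(h1, h2), estimated on xs)."""
    return sum(h1(x) != h2(x) for x in xs) / len(xs)

def overfit_suspected(h1, h2, train_xs, unlabeled_xs, tol=0.1):
    """Flag possible overfitting: if h1 and h2 disagree much more often
    on the unlabeled pool U than on the training inputs L, at least one
    of them has fit noise in the training set. tol is an illustrative
    threshold, not a value from the lecture."""
    gap = disagreement(h1, h2, unlabeled_xs) - disagreement(h1, h2, train_xs)
    return gap > tol
```

For instance, if a complex h2 matches a simple h1 on every training input but differs from it on half of the unlabeled pool, the gap of 0.5 flags likely overfitting.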

