CMU CS 10701 - Co-Training for Semi-supervised learning (cont.)


©2005-2007 Carlos Guestrin
Co-Training for Semi-supervised learning (cont.)
Machine Learning – 10701/15781
Carlos Guestrin, Carnegie Mellon University, April 23rd, 2007

Exploiting redundant information in semi-supervised learning
- Want to predict Y from features X: f(X) → Y
- Have some labeled data L and lots of unlabeled data U
- Co-training assumption: X is very expressive, X = (X1, X2), and we can learn both g1(X1) → Y and g2(X2) → Y

Co-Training Algorithm [Blum & Mitchell '99]
[Figure: the algorithm as given in the paper; a code sketch appears at the end of this co-training section]

Understanding Co-Training: a simple setting
- Suppose X1 and X2 are discrete, with |X1| = |X2| = N, and there is no label noise
- Without unlabeled data, how hard is it to learn g1 (or g2)?

Co-Training in the simple setting
[Figures: iteration 0, iteration 1, and after convergence]

Co-Training in the simple setting – connected components
- Suppose we have infinite unlabeled data
- Co-training must have at least one labeled example in each connected component of the L+U graph
- What's the probability of making an error?
- For k connected components, how much labeled data do we need?

How much unlabeled data?
[Figure: derivation shown on the slide]

Co-Training theory
- Want to predict Y from features X: f(X) → Y
- Co-training assumption: X is very expressive, X = (X1, X2); want to learn g1(X1) → Y and g2(X2) → Y
- Assumption: ∃ g1, g2 such that ∀x: g1(x1) = f(x) and g2(x2) = f(x)
- One co-training result [Blum & Mitchell '99]: if (X1 ⊥ X2 | Y) and g1 & g2 are PAC learnable from noisy data (and thus so is f), then f is PAC learnable from a weak initial classifier plus unlabeled data

What you need to know about co-training
- Unlabeled data can help supervised learning (a lot) when there are (mostly) independent, redundant features
- One theoretical result: if (X1 ⊥ X2 | Y) and g1 & g2 are PAC learnable from noisy data (and thus f), then f is PAC learnable from a weak initial classifier plus unlabeled data
- Disagreement between g1 and g2 provides a bound on the error of the final classifier
- Applied in many real-world settings:
  - semantic lexicon generation [Riloff, Jones 99], [Collins, Singer 99], [Jones 05]
  - web page classification [Blum, Mitchell 99]
  - word sense disambiguation [Yarowsky 95]
  - speech recognition [de Sa, Ballard 98]
  - visual classification of cars [Levin, Viola, Freund 03]
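The algorithm itself appears only as a figure on the slide above. As a rough illustration, here is a minimal sketch of the loop, simplified from the paper: Naive Bayes base learners (as in the paper's web-page experiments), with each view promoting its top-k most confident predictions per round instead of the paper's p positive / n negative picks drawn from a smaller pool U'. The function name and parameters are illustrative, not from the lecture.

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB  # base learner in the paper's experiments

    def co_train(X1, X2, y, labeled, rounds=30, k=2):
        """Sketch of the co-training loop. X1, X2: the two views of the
        same examples (count features). y: labels, trusted only where
        `labeled` is True. Each round, each view's classifier
        pseudo-labels its k most confident unlabeled examples, growing
        the shared labeled set used by the other view."""
        y, labeled = y.copy(), labeled.copy()
        g1, g2 = MultinomialNB(), MultinomialNB()
        for _ in range(rounds):
            g1.fit(X1[labeled], y[labeled])
            g2.fit(X2[labeled], y[labeled])
            for g, X in ((g1, X1), (g2, X2)):
                unlabeled = np.flatnonzero(~labeled)
                if unlabeled.size == 0:
                    return g1, g2
                conf = g.predict_proba(X[unlabeled]).max(axis=1)
                pick = unlabeled[np.argsort(conf)[-k:]]  # k most confident examples
                y[pick] = g.predict(X[pick])             # pseudo-label them
                labeled[pick] = True
        return g1, g2

For web page classification, the two views in the paper were the words on the page itself and the words on hyperlinks pointing to the page.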
Transductive SVMs
Machine Learning – 10701/15781
Carlos Guestrin, Carnegie Mellon University, April 23rd, 2007

Semi-supervised learning and discriminative models
- We have seen semi-supervised learning for generative models: EM
- What can we do for discriminative models? Not regular EM, since we can't compute P(x)
- But there are discriminative versions of EM: Co-Training! And many other tricks… let's see an example

Linear classifiers – which line is better?
- Notation: for example i, w.x = Σ_j w(j) x(j)
[Figure: one labeled dataset with several candidate separating lines]

Support vector machines (SVMs)
[Figure: separating hyperplane w.x + b = 0 with margin boundaries w.x + b = ±1 and margin γ]
- Solve efficiently by quadratic programming (QP); well-studied solution algorithms
- The hyperplane is defined by the support vectors

What if we have unlabeled data?
- n_L labeled examples and n_U unlabeled examples; w.x = Σ_j w(j) x(j) as before
[Figure: the labeled dataset plus unlabeled points]

Transductive support vector machines (TSVMs)
[Figures: the same margin picture, now placing the hyperplane w.x + b = 0 so that the margin γ also avoids the unlabeled points]

What's the difference between transductive learning and semi-supervised learning? Not much, and a lot!
- Semi-supervised learning: use labeled and unlabeled data to learn w, then apply w to test data
- Transductive learning: the same algorithms for labeled and unlabeled data, but the unlabeled data is the test data! You are learning on the test data
- This is OK because you never look at the labels of the test data, and it can give better classification
- But be very, very careful: never use test-data prediction accuracy to tune parameters, select kernels, etc.

Adding slack variables
[Figure: the margin picture with slack variables for examples that violate the margin]

Transductive SVMs – now with slack variables! [Vapnik 98]
[Figure: as above, with slack variables on the unlabeled points too]

Learning transductive SVMs is hard!
- It is an integer program: NP-hard!
- There are well-studied solution algorithms, but they will not scale up to very large problems

A (heuristic) learning algorithm for transductive SVMs [Joachims 99]
- If you set the penalty on the unlabeled slack variables to zero, the unlabeled data is ignored
- Intuition of the algorithm: start with a small unlabeled penalty; add labels to some unlabeled data based on the classifier's predictions; slowly increase the penalty; keep labeling unlabeled data and re-running the classifier (a code sketch appears at the end of these notes)

Some results classifying news articles – from [Joachims 99]
[Figure: experimental results from the paper]

What you need to know about transductive SVMs
- What transductive vs. semi-supervised learning means
- The formulation of the transductive SVM, which can also be used for semi-supervised learning
- Optimization is hard: it is an integer program
- There are simple heuristic solution methods that work well here

Dimensionality reduction
Machine Learning – 10701/15781
Carlos Guestrin, Carnegie Mellon University, April 23rd, 2007

Dimensionality reduction
- Input data may have thousands or millions of dimensions! e.g., text data has …
- Dimensionality reduction: represent the data with fewer dimensions
  - easier learning: fewer parameters
  - visualization: hard to visualize more than 3D or 4D
  - discover the "intrinsic dimensionality" of the data: high-dimensional data that is truly lower dimensional

Feature selection
- Want to learn f: X → Y, where X = <X1, …, Xn>, but some features are more important than others
- Approach: select a subset of features to be used by the learning algorithm
  - score each feature (or set of features)
  - select the set of features with the best score

Simple greedy forward feature selection
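As an illustration of the greedy forward procedure named in the last slide: a minimal sketch, assuming cross-validated accuracy of a logistic-regression learner as the feature score (one reasonable choice; the slides do not commit to a particular score, and all names below are illustrative).

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def greedy_forward_selection(X, y, max_features=10):
        """Greedy forward feature selection: start with no features and,
        each round, add the single feature whose addition gives the best
        cross-validated score; stop when nothing improves."""
        selected, remaining = [], list(range(X.shape[1]))
        best_score = -np.inf
        while remaining and len(selected) < max_features:
            score, j = max((cross_val_score(LogisticRegression(max_iter=1000),
                                            X[:, selected + [j]], y, cv=5).mean(), j)
                           for j in remaining)
            if score <= best_score:   # no single feature helps anymore
                break
            best_score = score
            selected.append(j)
            remaining.remove(j)
        return selected

Stopping when no single feature improves the score is one common rule; a fixed feature budget is another.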
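Returning to the transductive-SVM heuristic of [Joachims 99] described earlier: a minimal self-training-style stand-in, not the paper's exact procedure (the original, implemented in SVMlight, also swaps pairs of pseudo-labels to escape bad assignments and constrains the predicted class ratio). Here sample_weight on scikit-learn's LinearSVC approximates the separate slack penalty on unlabeled examples; the schedule values and names are illustrative.

    import numpy as np
    from sklearn.svm import LinearSVC

    def tsvm_heuristic(X_l, y_l, X_u, C=1.0, c_u_schedule=(1e-3, 1e-2, 1e-1, 1.0)):
        """Self-training-flavoured stand-in for the TSVM heuristic:
        pseudo-label the unlabeled data with the current classifier,
        refit with a small weight on those pseudo-labels, and slowly
        raise the weight."""
        clf = LinearSVC(C=C).fit(X_l, y_l)        # supervised SVM on labeled data only
        for c_u in c_u_schedule:                  # slowly increase the unlabeled penalty
            y_u = clf.predict(X_u)                # label unlabeled data by current prediction
            X = np.vstack([X_l, X_u])
            y = np.concatenate([y_l, y_u])
            w = np.concatenate([np.ones(len(y_l)), np.full(len(y_u), c_u)])
            clf = LinearSVC(C=C).fit(X, y, sample_weight=w)  # re-run the classifier
        return clf

Each pass relabels the unlabeled points with the current classifier and refits with a larger unlabeled weight, matching the slide's intuition: trust the unlabeled data more only as the classifier improves.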

