CMU CS 10701 - Learning from Labeled and Unlabeled Data

Learning from Labeled and Unlabeled Data
Machine Learning 10-701, November 7, 2006
Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University

When can Unlabeled Data improve supervised learning?
Important question! In many cases unlabeled data is plentiful while labeled data is expensive:
• Medical outcomes (x = <symptoms, treatment>, y = outcome)
• Text classification (x = document, y = relevance)
• Customer modeling (x = user actions, y = user intent)
• Sensor interpretation (x = <video, audio>, y = who's there)

When can Unlabeled Data help supervised learning?
Problem setting:
• Set X of instances drawn from unknown distribution P(X)
• Wish to learn target function f: X → Y (or, P(Y|X))
• Given a set H of possible hypotheses for f
Given:
• iid labeled examples L
• iid unlabeled examples U
Wish to determine: a hypothesis in H that best approximates f.

Idea 1: Use Labeled and Unlabeled Data to Train a Bayes Net for P(X,Y), then infer P(Y|X)
[Figure: a Naïve-Bayes-style network with class Y and features X1...X4, alongside a training table in which some rows have Y observed and others have Y missing.]
• What CPDs are needed?
• How do we estimate them from fully observed data?
• How do we estimate them from partly observed data?

Supervised: Naïve Bayes Learner
Train: for each class yj of documents,
1. Estimate P(Y = yj)
2. For each word wi, estimate P(X = wi | Y = yj)
Classify (doc): assign doc to the most probable class, assuming the words wi are conditionally independent given the class.

What if we have labels for only some documents?
[Figure: the same network and training table; the rows with missing Y are the unlabeled documents. We still wish to learn P(Y|X).]
EM: repeat until convergence
1. Use the probabilistic labels to train classifier h
2. Apply h to assign probabilistic labels to the unlabeled data
[Nigam et al., 2000]
E Step: re-estimate the probabilistic labels P(Y | doc) of the unlabeled documents using the current parameters.
M Step: re-estimate P(Y = yj) and P(wt | Y = yj) from the probabilistic labels, where wt is the t-th word in the vocabulary.
[Nigam et al., 2000]

Using one labeled example per class
[Figure: words sorted by P(w | course) / P(w | ¬course).]
[Figure: results on the 20 Newsgroups data.]

Why/When will this work?
• What's the best case? The worst case? How can we test which we have?
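To make the EM loop above concrete, here is a minimal runnable sketch of semi-supervised Naïve Bayes in the spirit of Nigam et al. (2000). It is not the authors' code: it assumes documents arrive as bag-of-words count matrices, and all function and variable names are illustrative.

import numpy as np

def train_semisupervised_nb(X_l, y_l, X_u, n_classes, n_iters=20, alpha=1.0):
    """Naive Bayes trained with EM on labeled word-count rows (X_l, y_l)
    and unlabeled rows X_u. Returns (class priors, per-class word probs)."""
    X = np.vstack([X_l, X_u]).astype(float)
    n_docs, n_words = X.shape
    m = len(y_l)
    # Responsibilities P(Y | doc): labeled rows are fixed hard labels,
    # unlabeled rows start uniform.
    R = np.full((n_docs, n_classes), 1.0 / n_classes)
    R[:m] = 0.0
    R[np.arange(m), y_l] = 1.0
    for _ in range(n_iters):
        # Step 1 (train classifier h from probabilistic labels):
        # smoothed estimates of P(Y = yj) and P(wt | Y = yj).
        prior = (R.sum(axis=0) + alpha) / (n_docs + n_classes * alpha)
        counts = R.T @ X                          # expected word counts per class
        word_probs = (counts + alpha) / (
            counts.sum(axis=1, keepdims=True) + n_words * alpha)
        # Step 2 (apply h to relabel the unlabeled data): posterior P(Y | doc).
        log_post = np.log(prior) + X @ np.log(word_probs).T
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        R[m:] = post[m:]                          # labeled rows stay fixed
    return prior, word_probs

Note that with U empty the loop reduces to ordinary supervised Naïve Bayes training, and with L empty it corresponds to unsupervised mixture-of-multinomials clustering, which is the correspondence restated in the summary that follows.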
Summary: Semi-supervised Learning with EM and the Naïve Bayes Model
• If all data is labeled, this corresponds to supervised training of a Naïve Bayes classifier
• If all data is unlabeled, it corresponds to unsupervised, mixture-of-multinomials clustering
• If both labeled and unlabeled data are available, the unlabeled data helps if the Bayes net model is correct (e.g., P(X) is a mixture of class-conditional multinomials with conditionally independent Xi)
• Of course we could use Bayes net models other than Naïve Bayes

Idea 2: Use U to reweight labeled examples
• Most learning algorithms minimize errors over labeled examples
• But we really want to minimize error over future examples drawn from the same underlying distribution (i.e., the true error of the hypothesis)
• If we knew the underlying distribution P(X), we could weight each labeled training example <x, y> by its probability P(X = x)
• Unlabeled data allows us to estimate P(X)

Reweighting Labeled Examples
Let δ(h(x) ≠ f(x)) be 1 if hypothesis h disagrees with the true function f on x, else 0.
• Wish to minimize the true error: Σx P(X = x) δ(h(x) ≠ f(x))
• Usually we approximate this by the empirical error over L, which equals Σx (n(x, L) / |L|) δ(h(x) ≠ f(x)), where n(x, L) = number of times x occurs in L
• We can produce a better approximation by incorporating U: use L + U to estimate P̂(X), and use it in place of n(x, L) / |L| to alter the loss function
• We already have algorithms (e.g., a decision tree learner) that minimize the unweighted empirical error, so we can simply reweight the examples in L and have the algorithm minimize the weighted sum (a code sketch of this reweighting appears after the co-training discussion below)
• Or, if X is continuous, use L + U to estimate the density p(X) and minimize the corresponding weighted loss

Idea 3: CoTraining
• In some settings the available features are redundant, and we can train two classifiers based on disjoint sets of features
• In this case the two classifiers should agree on the classification of each unlabeled example
• Therefore, we can use the unlabeled data to constrain joint training of both classifiers

Redundantly Sufficient Features
[Figure: a faculty web page ("Professor Faloutsos") that can be classified either from the words on the page itself or from the words on hyperlinks pointing to it, such as "my advisor".]

CoTraining Algorithm #1 [Blum & Mitchell, 1998]
Given: labeled data L, unlabeled data U
Loop:
  Train g1 (hyperlink classifier) using L
  Train g2 (page classifier) using L
  Allow g1 to label p positive and n negative examples from U
  Allow g2 to label p positive and n negative examples from U
  Add these self-labeled examples to L
(A code sketch of this loop appears below, after the sample-complexity discussion.)

CoTraining: Experimental Results
• begin with 12 labeled web pages (academic course)
• provide 1,000 additional unlabeled web pages
• average error learning from the labeled data alone: 11.1%
• average error with co-training: 5.0%
[Figure: a typical run.]

One result [Blum & Mitchell, 1998]:
• If
  – X1 and X2 are conditionally independent given Y, and
  – f is PAC learnable from noisy labeled data,
• Then
  – f is PAC learnable from a weak initial classifier plus unlabeled data

CoTraining setting:
• wish to learn f: X → Y, given L and U drawn from P(X)
• features describing X can be partitioned (X = X1 × X2) such that f can be computed from either X1 or X2; formally, ∃ g1, g2 such that ∀x = (x1, x2), g1(x1) = g2(x2) = f(x)

Co-Training Rote Learner
[Figure: a bipartite graph whose left nodes are hyperlink phrases (e.g., "my advisor") and whose right nodes are pages; labels (+/−) propagate within connected components.]

Expected rote co-training error given m examples:
E[error] = Σj P(x ∈ gj) (1 − P(x ∈ gj))^m
where gj is the j-th connected component of the graph of L + U, and m is the number of labeled examples.

How many unlabeled examples suffice?
We want to ensure that connected components in the underlying distribution, GD, are also connected components in the observed sample, GS. O(log(N)/α) examples assure that, with high probability, GS has the same connected components as GD [Karger, 1994], where N is the size of GD and α is the minimum cut over all connected components of GD.
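Here is the reweighting sketch promised under Idea 2 above. It assumes discrete instances that can serve as dictionary keys; the function name is hypothetical.

from collections import Counter

def reweight_labeled_examples(L, U):
    """L: list of (x, y) labeled pairs; U: list of unlabeled instances x.
    Returns (x, y, weight) triples in which each distinct x carries total
    weight P_hat(x), estimated from L + U, instead of the empirical weight
    n(x, L) / |L|."""
    xs = [x for x, _ in L] + list(U)
    counts = Counter(xs)                   # estimate P(X) from L + U
    total = len(xs)
    n_L = Counter(x for x, _ in L)         # n(x, L)
    # Split each x's total weight evenly across its occurrences in L.
    return [(x, y, (counts[x] / total) / n_L[x]) for x, y in L]

The resulting triples can be fed to any learner that accepts per-example weights, leaving the learning algorithm itself unchanged.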
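And here is the promised sketch of CoTraining Algorithm #1. The train_fn / predict_proba interface is an assumption made for illustration, not something specified in the slides; each unlabeled instance is a pair (x1, x2) holding the hyperlink view and the page view.

def cotrain(L, U, train_fn, p=1, n=3, n_rounds=30):
    """L: list of ((x1, x2), y) pairs; U: list of unlabeled (x1, x2) pairs.
    train_fn(examples) must return a classifier with a
    predict_proba(view) -> P(y = 1) method (an assumed interface)."""
    L, U = list(L), list(U)
    for _ in range(n_rounds):
        if len(U) < p + n:
            break
        g1 = train_fn([(x1, y) for (x1, _x2), y in L])  # hyperlink view
        g2 = train_fn([(x2, y) for (_x1, x2), y in L])  # page view
        for g, view in ((g1, 0), (g2, 1)):
            # Sort U by confidence that the example is positive.
            scored = sorted(U, key=lambda x: g.predict_proba(x[view]))
            picks = [(x, 0) for x in scored[:n]] + [(x, 1) for x in scored[-p:]]
            for x, y in picks:
                if x in U:              # guard against double-selection
                    L.append((x, y))
                    U.remove(x)
    return L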
PAC Generalization Bounds on CoTraining [Dasgupta et al., NIPS 2001]
[Figure: theorem statement; the theorem assumes X1 and X2 are conditionally independent given Y.]

What if the CoTraining Assumption Is Not Perfectly Satisfied?
• Idea: want classifiers that produce a maximally consistent labeling of the data
• If learning is an optimization problem, what function should we optimize?
[Figure: a graph over + and − labeled points.]

Example 2: Learning to extract named entities
"I arrived in Beijing on Saturday." → location?
If: "I arrived in <X> on Saturday."
Then: Location(X)

Co-Training for Named Entity Extraction (i.e., classifying which strings refer to people, places, dates, etc.)
[Figure: Classifier1 uses the spelling of the phrase itself ("Beijing") to produce Answer1, while Classifier2 uses its context ("I arrived in __ saturday") to produce Answer2.]
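As a toy illustration of this two-view named-entity idea, in the spirit of the rote co-training discussed earlier rather than an algorithm given in the slides, spellings and contexts can bootstrap each other. The function, seed set, and example data below are all hypothetical.

def rote_ne_cotrain(pairs, seed_spellings, n_rounds=5):
    """pairs: (spelling, context) occurrences from unlabeled text.
    Contexts seen with known location spellings become location contexts;
    spellings seen in location contexts become location spellings."""
    spellings, contexts = set(seed_spellings), set()
    for _ in range(n_rounds):
        contexts |= {c for s, c in pairs if s in spellings}
        spellings |= {s for s, c in pairs if c in contexts}
    return spellings, contexts

# Hypothetical data: one seed spelling bootstraps a context, which in turn
# labels a new spelling, which then labels a second context.
pairs = [("Beijing", "I arrived in __ on Saturday."),
         ("Pittsburgh", "I arrived in __ on Saturday."),
         ("Pittsburgh", "__ has cold winters.")]
print(rote_ne_cotrain(pairs, {"Beijing"}))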

