CMU CS 10701 - Expectation Maximization, and Learning from Partly Unobserved Data

This preview shows pages 1-3, 21-23, and 43-45 of the 45-page slide deck.

Expectation Maximization, and Learning from Partly Unobserved Data (part 2)
Machine Learning 10-701, April 2005
Tom M. Mitchell, Carnegie Mellon University

Slide titles: Outline; Unsupervised Clustering: K-means and Mixtures of Gaussians; Clustering; K-means Clustering; K Means applet; Mixtures of Gaussians; EM for Mixture of Gaussians; EM; EM – E Step; EM – M Step; Mixture of Gaussians applet; K-Means vs Mixture of Gaussians; Elaboration 1: Downweight the influence of unlabeled examples by factor λ; Experimental Evaluation; 20 Newsgroups; Combining Labeled and Unlabeled Data; 3. If Problem Setting Provides Redundantly Sufficient Features, use CoTraining; Redundantly Sufficient Features; CoTraining Algorithm #1 [Blum & Mitchell, 1998]; CoTraining: Experimental Results; CoTraining Setting; Co-Training Rote Learner; Expected Rote CoTraining error given m examples; How many unlabeled examples suffice?; PAC Generalization Bounds on CoTraining; What if CoTraining Assumption Not Perfectly Satisfied?; What Objective Function?; What Function Approximators?; Classifying Jobs for FlipDog; Gradient CoTraining: Classifying FlipDog job descriptions (SysAdmin vs. WebProgrammer); Gradient CoTraining: Classifying Upper Case sequences as Person Names; CoTraining Summary; What you should know

Outline
• Clustering
  – K-means
  – EM: Mixture of Gaussians
• Training classifiers with partially unlabeled data
  – Naïve Bayes and EM
  – Reweighting the labeled examples using P(X)
  – Co-training
  – Regularization based on …

1. Unsupervised Clustering: K-means and Mixtures of Gaussians

Clustering
• Given a set of data points, group them
• Unsupervised learning
• Which patients are similar? (or which earthquakes, customers, faces, …)

K-means Clustering
Given data <x1 … xn> and K, assign each xi to one of K clusters, C1 … CK, minimizing

  J = Σj Σi∈Cj || xi − μj ||²

where μj is the mean over all points in cluster Cj.

K-Means Algorithm:
Initialize the means μ1 … μK randomly.
Repeat until convergence:
  1. Assign each point xi to the cluster with the closest mean μj
  2. Calculate the new mean for each cluster

K Means applet
• Run applet
• Try 3 clusters, 15 points

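The following is a minimal NumPy sketch of the K-means loop above. The toy data, the choice K = 3, and the random initialization scheme are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means: hard-assign each point to its nearest mean,
    then recompute each mean; repeat until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # K distinct starting means
    assign = np.full(len(X), -1)
    for _ in range(n_iters):
        # Step 1: assign each x_i to the cluster with the closest mean mu_j.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # converged: no assignment changed
        assign = new_assign
        # Step 2: recompute each cluster mean over the points assigned to it.
        for j in range(K):
            if np.any(assign == j):
                mu[j] = X[assign == j].mean(axis=0)
    return mu, assign

# Toy usage echoing the applet suggestion: 15 points, 3 clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(5, 2)) for c in ([0, 0], [5, 5], [0, 5])])
means, labels = kmeans(X, K=3)
```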
Mixtures of Gaussians
K-means is EM-ish, but it makes 'hard' assignments of xi to clusters. Let's derive a real EM algorithm for clustering.
What objective function shall we optimize?
• Maximize the data likelihood!
What form of P(X) should we assume?
• Mixture of Gaussians

Mixture distribution:
• Assume P(x) is a mixture of K different Gaussians
• Assume each data point x is generated by a 2-step process:
  1. z ← choose one of the K Gaussians, according to π1 … πK
  2. Generate x according to the Gaussian N(μz, Σz)

EM for Mixture of Gaussians
Simplify to make this easier:
  1. Assume the Xi are conditionally independent given Z.
  2. Assume only 2 classes, and assume …
  3. Assume σ known; π1 … πK and μ1i … μKi unknown.
Observed: X. Unobserved: Z.
(Slide figure: graphical model with Z as the parent of X1, X2, X3, X4.)

EM
Given observed variables X and unobserved Z, define Q(θ'|θ) = E_{Z|X,θ}[ log P(X, Z | θ') ].
Iterate until convergence:
• E Step: Calculate P(Z(n)|X(n),θ) for each example X(n). Use this to construct Q(θ'|θ).
• M Step: Replace the current θ by the θ' that maximizes Q(θ'|θ).

EM – E Step
Calculate P(Z(n)|X(n),θ) for each observed example X(n) = <x1(n), x2(n), … xT(n)>.

EM – M Step
First consider the update for π: the previous π has no influence; it amounts to counting (weighted by responsibility) the examples with z(n) = 1.
Now consider the update for μji: the previous μji has no influence. Compare this to the MLE you would use if Z were observable. (A code sketch of the resulting E/M loop follows this section.)

Mixture of Gaussians applet
• Run applet
• Try 2 clusters
• See different local minima with different random starts

K-Means vs Mixture of Gaussians
• Both are iterative algorithms that assign points to clusters
• Objective function
  – K-means: minimize the sum of squared distances to the cluster means
  – Mixture of Gaussians: maximize P(X|θ)
• Mixture of Gaussians is the more general formulation
  – Equivalent to K-means when Σk = σI and σ → 0
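Below is a sketch of EM for a mixture of spherical Gaussians with a shared, known σ, estimating only the mixing weights π and the means μ, in line with the simplifications on the slides. The helper name em_gmm and the toy data are assumptions for illustration; note that as σ → 0 the soft responsibilities harden and the updates reduce to K-means, matching the comparison slide above.

```python
import numpy as np

def em_gmm(X, K, sigma=1.0, n_iters=50, seed=0):
    """EM for a mixture of K spherical Gaussians N(mu_k, sigma^2 I), with sigma
    known and the mixing weights pi_k and means mu_k to be estimated."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=K, replace=False)]   # initial means
    pi = np.full(K, 1.0 / K)                       # initial mixing weights
    for _ in range(n_iters):
        # E step: responsibilities P(z = k | x_n, theta) for every example.
        # The shared Gaussian normalizer cancels in the posterior, so only the
        # squared distances and the mixing weights matter.
        sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        log_post = np.log(pi)[None, :] - 0.5 * sq_dist / sigma**2
        log_post -= log_post.max(axis=1, keepdims=True)      # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: replace theta by the weighted MLE, with each example counted
        # fractionally according to its responsibility.
        Nk = resp.sum(axis=0)
        pi = Nk / n
        mu = (resp.T @ X) / Nk[:, None]
    return pi, mu, resp

# Soft clustering of toy 2-D data with 2 components; as sigma -> 0 the
# responsibilities harden and this reduces to K-means.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, size=(20, 2)) for c in ([0, 0], [6, 6])])
pi, mu, resp = em_gmm(X, K=2, sigma=1.0)
```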
Using Unlabeled Data to Help Train Naïve Bayes Classifier
(Slide figure: graphical model with Y as the parent of X1, X2, X3, X4, next to a table of examples <Y, X1, X2, X3, X4>, some with Y observed and some with Y marked "?".)
Learn P(Y|X). From [Nigam et al., 2000], iterate:
• E Step: for each unlabeled document di, estimate P(yk | di) ∝ P(yk) Πt P(wt | yk)^count(wt, di)
• M Step: re-estimate P(yk) and P(wt | yk) from the labeled counts plus the expected counts supplied by the E step (wt is the t-th word in the vocabulary)

Elaboration 1: Downweight the influence of unlabeled examples by factor λ
New M step: weight each unlabeled example's expected counts by λ, with λ chosen by cross-validation. (Slide plot: results using one labeled example per class. A code sketch of this E/M loop, including the λ downweighting, appears at the end of this preview.)

Experimental Evaluation
• Newsgroup postings – 20 newsgroups, 1,000 documents per group
• Web page classification – student, faculty, course, project – 4,199 web pages
• Reuters newswire articles – 12,902 articles – 90 topic categories

20 Newsgroups
(Two slides of result plots for the 20 Newsgroups experiments.)

Combining Labeled and Unlabeled Data
How else can unlabeled data be useful for supervised learning / function approximation?
1. Use U to reweight the labeled examples, using the error indicator that is 1 if hypothesis h disagrees with the true function f, else 0.

3. If Problem Setting Provides Redundantly Sufficient Features, use CoTraining
• In some settings the available data features are redundant, and we can train two classifiers using different features
• In this case, the two classifiers should at least agree on the classification for each unlabeled example
• Therefore, we can use the unlabeled data to constrain the training of both classifiers, forcing them to agree

Redundantly Sufficient Features
(Slide figures: a faculty home page containing the words "Professor Faloutsos", with an incoming hyperlink whose anchor text reads "my advisor" — either the page words or the hyperlink words alone suffice to classify the page.)

CoTraining Algorithm #1 [Blum & Mitchell, 1998]
Given: labeled data L, unlabeled data U.
Loop:
  Train g1 (hyperlink classifier) using L
  Train g2 (page classifier) using L
  Allow g1 to label p positive and n negative examples from U
  Allow g2 to label p positive and n negative examples from U
  Add these self-labeled examples to L
(This loop is sketched in code near the end of this preview.)

CoTraining: Experimental Results
• Begin with 12 labeled web pages (academic course)
• Provide 1,000 additional unlabeled web pages
• Average error learning from the labeled data alone: 11.1%
• Average error with co-training: 5.0%
(Slide plot: a typical run.)

CoTraining Setting
Learn f : X → Y where X = X1 × X2, x is drawn from an unknown distribution, and there exist g1, g2 such that for all x, g1(x1) = g2(x2) = f(x).
• If
  – x1 and x2 are conditionally independent given y
  – f is PAC learnable from noisy labeled data
• Then
  – f is PAC learnable from a weak initial classifier plus unlabeled data

Co-Training Rote Learner
(Slide figures: a bipartite graph with pages on one side and hyperlinks such as "my advisor" on the other, carrying + and – labels; the rote learner gives every example in a connected component the label of any labeled example in that component.)

Expected Rote CoTraining error given m examples
E[error] = Σj P(x ∈ gj) (1 − P(x ∈ gj))^m
where gj is the j-th connected component of …
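As a quick illustration of the expected-error formula above, the snippet below evaluates Σj P(x ∈ gj)(1 − P(x ∈ gj))^m for a hypothetical set of connected-component probabilities (the numbers are made up for illustration).

```python
# Expected rote co-training error after m unlabeled examples, given the
# probability mass of each connected component g_j of the bipartite
# page/hyperlink graph: a component still contributes its mass to the
# error only if none of the m examples fell into it.
def expected_rote_cotraining_error(component_probs, m):
    return sum(p * (1.0 - p) ** m for p in component_probs)

print(expected_rote_cotraining_error([0.4, 0.3, 0.2, 0.1], m=10))  # made-up component masses
```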

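The next sketch spells out CoTraining Algorithm #1 from the slides above. The two views X1 and X2, the factory make_clf (any classifier exposing a scikit-learn-style fit/predict_proba interface), and the values of p, n, and the number of rounds are illustrative assumptions; it also assumes binary labels with both classes present in the initial labeled set.

```python
import numpy as np

def cotrain(X1, X2, y, labeled_idx, unlabeled_idx, make_clf, p=1, n=3, n_rounds=30):
    """Co-training in the style of Blum & Mitchell (1998): two classifiers
    trained on different feature views each label a few unlabeled examples
    per round, and those self-labeled examples are added to L.
    X1, X2   : the two feature views of every example (e.g., hyperlink words
               vs. page words); y holds labels for the labeled indices only
               (entries at unlabeled indices are ignored until set here).
    make_clf : factory returning a fresh classifier with a scikit-learn-style
               fit / predict_proba interface; labels are assumed to be {0, 1}."""
    L, U, y = list(labeled_idx), list(unlabeled_idx), np.array(y)
    g1, g2 = make_clf(), make_clf()
    for _ in range(n_rounds):
        g1.fit(X1[L], y[L])          # view-1 classifier (e.g., hyperlink classifier)
        g2.fit(X2[L], y[L])          # view-2 classifier (e.g., page classifier)
        if not U:
            break
        for g, X in ((g1, X1), (g2, X2)):
            if not U:
                break
            proba = g.predict_proba(X[U])
            # Let this view label its p most confident positives and its
            # n most confident negatives from U, then move them into L.
            for cls, count in ((1, p), (0, n)):
                picks = np.argsort(-proba[:, cls])[:count]
                for k in sorted(picks, reverse=True):
                    idx = U.pop(k)
                    y[idx] = cls
                    L.append(idx)
                proba = np.delete(proba, picks, axis=0)
    return g1, g2, L
```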

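Finally, returning to the semi-supervised Naïve Bayes section earlier in the preview, here is a sketch of a Nigam et al. style E/M loop with the unlabeled counts downweighted by λ. The function name nb_em, the Laplace smoothing constant, and the exact weighting used for the class priors are my own assumptions; the slides' update equations themselves are not preserved in this preview.

```python
import numpy as np

def nb_em(X_l, y_l, X_u, n_classes=2, lam=1.0, n_iters=20, alpha=1.0):
    """Semi-supervised multinomial Naive Bayes in the spirit of Nigam et al.
    (2000): EM over the unlabeled documents, with their expected counts
    downweighted by a factor lam (the slides' lambda).
    X_l, X_u : word-count matrices (documents x vocabulary words w_t);
    y_l      : integer class labels for the labeled documents;
    alpha    : Laplace smoothing constant (an added assumption)."""
    y_l = np.asarray(y_l)
    V = X_l.shape[1]
    R_l = np.eye(n_classes)[y_l]                               # hard labels, one-hot
    R_u = np.full((X_u.shape[0], n_classes), 1.0 / n_classes)  # soft labels
    for _ in range(n_iters):
        # M step: class priors and per-class word probabilities P(w_t | y_k)
        # from labeled counts plus lam-weighted expected unlabeled counts.
        R = np.vstack([R_l, lam * R_u])
        X = np.vstack([X_l, X_u])
        prior = R.sum(axis=0) / R.sum()
        word_counts = R.T @ X                                  # (classes x vocabulary)
        theta = (word_counts + alpha) / (word_counts.sum(axis=1, keepdims=True) + alpha * V)
        # E step: P(y_k | d_i, theta) for each unlabeled document, proportional
        # to P(y_k) * prod_t P(w_t | y_k)^count(w_t, d_i).
        log_post = np.log(prior)[None, :] + X_u @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)        # numerical stability
        R_u = np.exp(log_post)
        R_u /= R_u.sum(axis=1, keepdims=True)
    return prior, theta, R_u
```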