CMU CS 10601 - Expectation Maximization and Learning from Partly Unobserved Data

Expectation Maximization, and Learning from Partly Unobserved Data
Machine Learning 10-601
March 3, 2008
Tom M. Mitchell
Machine Learning Department
Carnegie Mellon University

1. Learning Bayes net parameters from partly unobserved data, and reviewing Bayes nets concepts for the midterm!

Bayes Net with 5 Boolean Variables
[Network: Flu and Allergy are the parents of Sinus; Sinus is the parent of Headache and Nose.]
• How many parameters does this have in its CPTs?
• How would you learn them from fully observed data?
• What if you could only observe F, A, H, and N?

Learning CPTs
• The MLE (Maximum Likelihood Estimate) is a ratio of counts, e.g. for the Sinus CPT:
  $\hat{\theta}_{s|ij} = \frac{\sum_k \delta(s^k = 1,\, f^k = i,\, a^k = j)}{\sum_k \delta(f^k = i,\, a^k = j)}$
  where $k$ ranges over the training examples.
• Partly unobserved data → use EM!

Estimate from partly observed data
• What if F, A, H, N are observed, but not S?
• Let X be all observed variable values (over all examples).
• Let Z be all unobserved variable values.
• We can't calculate the MLE $\theta = \arg\max_\theta \log P(X, Z \mid \theta)$, because Z is unobserved.
• EM seeks* the estimate $\theta = \arg\max_\theta E_{Z|X,\theta}[\log P(X, Z \mid \theta)]$.
  *EM is guaranteed to find a local maximum.

EM Algorithm
EM is a general procedure for solving such problems. Given observed variables X and unobserved Z (here X = {F, A, H, N}, Z = {S}), define
  $Q(\theta' \mid \theta) = E_{Z|X,\theta}[\log P(X, Z \mid \theta')]$
Iterate until convergence:
• E Step: Use X and the current θ to estimate $P(Z \mid X, \theta)$.
• M Step: Replace the current θ by $\theta \leftarrow \arg\max_{\theta'} Q(\theta' \mid \theta)$.
Guaranteed to find a local maximum: each iteration increases $E_{Z|X,\theta}[\log P(X, Z \mid \theta)]$.

E Step: Use X, θ to Calculate P(Z|X,θ)
• How? This is a Bayes net inference problem.

EM and estimating the CPTs
More generally, given an observed set X and an unobserved set Z of boolean values:
• E step: For each training example k, calculate the expected value of each unobserved variable.
• M step: Calculate estimates similar to the MLE, but replacing each count by its expected count.
(A runnable sketch of this procedure for the Flu/Allergy network appears at the end of these notes.)

2. Unsupervised clustering: K-means and Mixtures of Gaussians

Clustering
• Given a set of data points, group them.
• Unsupervised learning.
• Which patients are similar? (or which earthquakes, customers, faces, web pages, ...)

K-means Clustering
Given data $\langle x_1 \ldots x_n \rangle$ and K, assign each $x_i$ to one of K clusters $C_1 \ldots C_K$, minimizing
  $J = \sum_{j=1}^{K} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2$
where $\mu_j$ is the mean over all points in cluster $C_j$.

K-Means Algorithm:
Initialize the means $\mu_1 \ldots \mu_K$ randomly.
Repeat until convergence (a runnable sketch follows):
1. Assign each point $x_i$ to the cluster with the closest mean $\mu_j$.
2. Calculate the new mean for each cluster.
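A minimal NumPy sketch of the loop above, assuming Euclidean distance and initialization from randomly chosen data points; the function and variable names are illustrative, not from the lecture:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Hard-assignment K-means: alternate assignment and mean updates."""
    rng = np.random.default_rng(seed)
    # Initialize means by picking K distinct data points at random.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each point to the cluster with the closest mean.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n, K)
        z = dists.argmin(axis=1)
        # Step 2: recompute each cluster mean (keep the old mean if a cluster is empty).
        new_mu = np.array([X[z == j].mean(axis=0) if np.any(z == j) else mu[j]
                           for j in range(K)])
        if np.allclose(new_mu, mu):
            break  # assignments have stabilized
        mu = new_mu
    return z, mu

# Example: two well-separated blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, means = kmeans(X, K=2)
```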
Mixtures of Gaussians
K-means is EM'ish, but it makes 'hard' assignments of $x_i$ to clusters. Let's derive a real EM algorithm for clustering.
What objective function shall we optimize?
• Maximize the data likelihood!
What form of P(X) should we assume?
• A mixture of Gaussians.

Mixture of Gaussians:
• Assume P(x) is a mixture of K different Gaussians.
• Then each data point x is generated by a 2-step process:
1. $z \leftarrow$ choose one of the K Gaussians, according to $\pi_1 \ldots \pi_{K-1}$ (with $\pi_K = 1 - \sum_{j<K} \pi_j$)
2. Generate x according to the Gaussian $N(\mu_z, \Sigma_z)$

EM for Mixture of Gaussians
[Model: hidden class Z with observed children X1, X2, X3, X4.]
Simplify to make this easier:
1. Assume $X = \langle X_1 \ldots X_n \rangle$, and the $X_i$ are conditionally independent given Z.
2. Assume only 2 mixture components, and
3. Assume σ is known; $\pi_1 \ldots \pi_K$ and $\mu_{1i} \ldots \mu_{Ki}$ are unknown.
Observed: $X = \langle X_1 \ldots X_n \rangle$. Unobserved: Z.

EM
Given observed variables X and unobserved Z, define
  $Q(\theta' \mid \theta) = E_{Z|X,\theta}[\log P(X, Z \mid \theta')]$
where θ collects the unknowns π and μ.
Iterate until convergence:
• E Step: Calculate $P(Z^{(n)} \mid X^{(n)}, \theta)$ for each example $X^{(n)}$. Use this to construct Q.
• M Step: Replace the current θ by $\arg\max_{\theta'} Q(\theta' \mid \theta)$.

EM – E Step
Calculate $P(Z^{(n)} \mid X^{(n)}, \theta)$ for each observed example $X^{(n)} = \langle x_1^{(n)}, x_2^{(n)}, \ldots, x_T^{(n)} \rangle$. For the two-component case this is Bayes rule:
  $P(z^{(n)} = 1 \mid x^{(n)}, \theta) = \frac{\pi \, P(x^{(n)} \mid z = 1)}{\pi \, P(x^{(n)} \mid z = 1) + (1 - \pi) \, P(x^{(n)} \mid z = 0)}$

EM – M Step
When maximizing Q with respect to one parameter, the terms of Q that do not involve that parameter have no influence. First consider the update for π, which counts the expected number of examples with $z^{(n)} = 1$:
  $\pi' = \frac{1}{N} \sum_n E[z^{(n)}]$
Now consider the update for $\mu_{ji}$:
  $\mu_{ji}' = \frac{\sum_n P(z^{(n)} = j \mid x^{(n)}) \, x_i^{(n)}}{\sum_n P(z^{(n)} = j \mid x^{(n)})}$
Compare the above to the MLE if Z were observable:
  $\mu_{ji} = \frac{\sum_n \delta(z^{(n)} = j) \, x_i^{(n)}}{\sum_n \delta(z^{(n)} = j)}$

EM – putting it together
Given observed variables X and unobserved Z, define $Q(\theta' \mid \theta)$ as above. Iterate until convergence:
• E Step: For each observed example $X^{(n)}$, calculate $P(Z^{(n)} \mid X^{(n)}, \theta)$.
• M Step: Update θ using the expected-count formulas above.
(A runnable sketch of this two-component EM appears at the end of these notes.)

Mixture of Gaussians applet
• Run the applet: http://www.neurosci.aist.go.jp/%7Eakaho/MixtureEM.html

K-Means vs Mixture of Gaussians
• Both are iterative algorithms that assign points to clusters.
• Objective function:
– K-Means: minimize $\sum_{j=1}^{K} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2$
– Mixture of Gaussians: maximize $P(X \mid \theta)$
• Mixture of Gaussians is the more general formulation:
– Equivalent to K-Means when $\Sigma_k = \sigma I$ and $\sigma \to 0$.

Using Unlabeled Data to Help Train a Naïve Bayes Classifier
Learn P(Y|X), where Y is the class and $X_1 \ldots X_4$ are the features, from data like:

Y  X1 X2 X3 X4
1  0  0  1  1
0  0  1  0  0
0  0  0  1  0
?  0  1  1  0
?  0  1  0  1

From [Nigam et al., 2000]:
• E Step: Estimate P(Y | X) for each unlabeled document using the current parameters.
• M Step: Re-estimate P(Y) and $P(w_t \mid Y)$ from the expected counts, where $w_t$ is the t-th word in the vocabulary.

Elaboration 1: Downweight the influence of unlabeled examples by a factor λ.
• New M step: multiply the expected counts from unlabeled examples by λ before adding them to the labeled counts.
• λ is chosen by cross validation.
(A sketch of this E/M loop appears at the end of these notes.)

Using one labeled example per class
[Figure: results using only one labeled example per class.]

Experimental Evaluation
• Newsgroup postings: 20 newsgroups, 1000/group.
• Web page classification: student, faculty, course, project; 4,199 web pages.
• Reuters newswire articles: 12,902 articles, 90 topic categories.

20 Newsgroups
[Figures: 20 Newsgroups results.]

What you should know about EM
• For learning from partly unobserved data.
• The MLE estimate: $\theta = \arg\max_\theta \log P(X, Z \mid \theta)$.
• The EM estimate: $\theta = \arg\max_\theta E_{Z|X,\theta}[\log P(X, Z \mid \theta)]$, where X is the observed part of the data and Z is the unobserved part.
• EM for training Bayes networks.
• Can also develop a MAP version of EM.
• Given some time, be able to derive your own EM algorithm for your own problem.
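The three sketches below are the runnable illustrations referenced above; none is the lecture's own code. First, EM for the Section 1 Flu/Allergy network when Sinus is never observed: the E step computes $P(S = 1 \mid F, A, H, N)$ for each example by inference in the net, and the M step is the MLE with each count replaced by its expected count. The NumPy data layout and all names are assumptions.

```python
import numpy as np

def em_flu_net(F, A, H, N, n_iters=50, seed=0):
    """EM for the CPTs of the net F,A -> S -> H,N when S is hidden.
    F, A, H, N are 0/1 integer arrays, one entry per training example."""
    rng = np.random.default_rng(seed)
    # CPTs of the fully observed roots come straight from counts (ordinary MLE).
    theta_F, theta_A = F.mean(), A.mean()
    # CPTs touching the hidden S start at random values in (0, 1).
    theta_S = rng.uniform(0.2, 0.8, size=(2, 2))  # P(S=1 | F=f, A=a)
    theta_H = rng.uniform(0.2, 0.8, size=2)       # P(H=1 | S=s)
    theta_N = rng.uniform(0.2, 0.8, size=2)       # P(N=1 | S=s)
    for _ in range(n_iters):
        # E step: q_k = P(S=1 | f,a,h,n) for every example (Bayes net inference).
        def joint(s):  # P(S=s, h, n | f, a), vectorized over examples
            ps = theta_S[F, A] if s == 1 else 1.0 - theta_S[F, A]
            ph = theta_H[s] ** H * (1 - theta_H[s]) ** (1 - H)
            pn = theta_N[s] ** N * (1 - theta_N[s]) ** (1 - N)
            return ps * ph * pn
        j1, j0 = joint(1), joint(0)
        q = j1 / (j1 + j0)  # expected value of S for each example
        # M step: the MLE formulas with counts replaced by expected counts.
        for f in (0, 1):
            for a in (0, 1):
                mask = (F == f) & (A == a)
                if mask.any():
                    theta_S[f, a] = q[mask].sum() / mask.sum()
        theta_H = np.array([((1 - q) * H).sum() / (1 - q).sum(),
                            (q * H).sum() / q.sum()])
        theta_N = np.array([((1 - q) * N).sum() / (1 - q).sum(),
                            (q * N).sum() / q.sum()])
    return theta_F, theta_A, theta_S, theta_H, theta_N
```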

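Next, the two-component mixture-of-Gaussians EM from Section 2, under the slides' simplifications: spherical components with a shared, known σ, and unknown π and means. The shared Gaussian normalizing constants cancel in the E step, and gamma plays the role of $E[z^{(n)}]$; names are illustrative.

```python
import numpy as np

def em_gauss_mixture(X, sigma=1.0, n_iters=100, seed=0):
    """EM for a 2-component spherical Gaussian mixture with known sigma.
    Unknowns: the mixing weight pi = P(z=1) and the two means."""
    rng = np.random.default_rng(seed)
    pi = 0.5
    mu0, mu1 = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iters):
        # E step: responsibility gamma_n = P(z(n)=1 | x(n), theta) by Bayes rule.
        # The (2*pi*sigma^2)^(-d/2) factors are identical for both components
        # and cancel, so only the exponents and the priors matter.
        log_p1 = -((X - mu1) ** 2).sum(axis=1) / (2 * sigma ** 2) + np.log(pi)
        log_p0 = -((X - mu0) ** 2).sum(axis=1) / (2 * sigma ** 2) + np.log(1 - pi)
        gamma = 1.0 / (1.0 + np.exp(log_p0 - log_p1))
        # M step: MLE formulas with counts replaced by expected counts.
        pi = np.clip(gamma.mean(), 1e-6, 1 - 1e-6)  # expected fraction with z=1
        mu1 = (gamma[:, None] * X).sum(axis=0) / gamma.sum()
        mu0 = ((1 - gamma)[:, None] * X).sum(axis=0) / (1 - gamma).sum()
    return pi, mu0, mu1
```

Unlike K-means, every point contributes fractionally to both means; hardening gamma to 0/1 recovers the K-means updates.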

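Finally, a sketch of the semi-supervised Naïve Bayes loop in the style of [Nigam et al., 2000], including the λ downweighting of Elaboration 1. The multinomial bag-of-words layout, the Laplace smoothing constant, and all names are assumptions, not the paper's code.

```python
import numpy as np

def semi_supervised_nb(Xl, yl, Xu, n_classes, lam=0.5, n_iters=20):
    """EM for Naive Bayes from labeled word-count rows (Xl, yl) and unlabeled
    rows Xu. Unlabeled influence is downweighted by lam (chosen by CV)."""
    def m_step(resp_l, resp_u):
        # resp_* hold per-document class responsibilities; labeled ones are one-hot.
        w = np.vstack([resp_l, lam * resp_u])  # downweight the unlabeled rows
        X = np.vstack([Xl, Xu])
        prior = w.sum(axis=0) / w.sum()        # P(Y=c)
        counts = w.T @ X + 1.0                 # expected word counts + Laplace smoothing
        return prior, counts / counts.sum(axis=1, keepdims=True)  # P(w_t | Y=c)
    def e_step(X, prior, word_probs):
        # log P(Y=c | doc) up to a constant, then normalize per document.
        log_post = np.log(prior) + X @ np.log(word_probs).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)
    resp_l = np.eye(n_classes)[yl]             # labeled docs keep their true class
    resp_u = np.full((len(Xu), n_classes), 1.0 / n_classes)
    for _ in range(n_iters):
        prior, word_probs = m_step(resp_l, resp_u)  # M step: expected-count MLE
        resp_u = e_step(Xu, prior, word_probs)      # E step: classify unlabeled docs
    return prior, word_probs
```

Setting lam=0 ignores the unlabeled documents (plain supervised Naïve Bayes); lam=1 gives them full weight.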