
CS 2750 Machine Learning
Lecture 16
Learning with hidden variables and missing values

Milos Hauskrecht
[email protected]
5329 Sennott Square

Density estimation with hidden variables

Goal: find the set of parameters $\hat{\Theta}$.
• Estimation criteria:
  – ML: $\hat{\Theta} = \arg\max_{\Theta} p(D \mid \Theta, \xi)$
  – Bayesian
• Optimization methods for ML: gradient ascent, conjugate gradient, Newton-Raphson, etc.
• Problem: these methods gain no or only a very small advantage from the structure of the corresponding belief network.
• Expectation-maximization (EM) method:
  – an alternative optimization method
  – suitable when there are missing or hidden values
  – takes advantage of the structure of the belief network

General EM

The key idea of the method: compute the parameter estimates iteratively by performing the following two steps.
1. Expectation step: complete all hidden and missing variables with expectations for the current set of parameters $\Theta'$.
2. Maximization step: compute the new estimates of $\Theta$ for the completed data.
Stop when no improvement is possible.

EM algorithm (general formulation)

Initialize parameters $\Theta$.
Repeat:
  Set $\Theta' = \Theta$.
  1. Expectation step: compute $Q(\Theta \mid \Theta') = E_{H \mid D, \Theta'}[\log P(H, D \mid \Theta, \xi)]$
  2. Maximization step: $\Theta = \arg\max_{\Theta} Q(\Theta \mid \Theta')$
until no or only a small improvement in $Q(\Theta \mid \Theta')$.
We proved that the EM algorithm improves the log-likelihood of the data.

EM advantages

Key advantages:
• In many problems (e.g. Bayesian belief networks), $Q(\Theta \mid \Theta') = E_{H \mid D, \Theta'}[\log P(H, D \mid \Theta, \xi)]$ has a nice form, and the maximization of Q can be carried out in closed form.
• There is no need to compute Q before maximizing it: we directly optimize using quantities that correspond to expected counts.

Naïve Bayes with a hidden class and missing values

Assume:
• $P(\mathbf{X})$ is modeled using a Naïve Bayes model with a hidden class variable C: the class C is the parent of the attributes $X_1, X_2, \ldots, X_n$, and the attributes are independent given the class.
• There are missing entries (values) for attributes in the dataset D.

EM for the Naïve Bayes model

• We can use EM to learn the parameters.
• Parameters:
  – $\pi_j$, the prior on class j
  – $\theta_{ijk}$, the probability of attribute i having value k given class j
• Indicator variables:
  – $\delta_{lj}$: for example l the class is j; 1 if true, 0 otherwise
  – $\delta_{lijk}$: for example l the class is j and the value of attribute i is k
• Because the class is hidden and some attributes are missing, the (0, 1) values of the indicator variables are not known; they are hidden. H denotes the collection of all indicator variables.

The complete-data log-likelihood is
$$\log P(H, D \mid \Theta, \xi) = \log \prod_{l=1}^{N} \prod_{j} \pi_j^{\delta_{lj}} \prod_{i} \prod_{k} \theta_{ijk}^{\delta_{lijk}} = \sum_{l=1}^{N} \Big( \sum_{j} \delta_{lj} \log \pi_j + \sum_{i} \sum_{k} \delta_{lijk} \log \theta_{ijk} \Big)$$
and its expectation is
$$E_{H \mid D, \Theta'}[\log P(H, D \mid \Theta, \xi)] = \sum_{l=1}^{N} \Big( \sum_{j} E_{H \mid D, \Theta'}(\delta_{lj}) \log \pi_j + \sum_{i} \sum_{k} E_{H \mid D, \Theta'}(\delta_{lijk}) \log \theta_{ijk} \Big)$$
The expectation substitutes the 0/1 indicator values with their expected values:
$$E_{H \mid D, \Theta'}(\delta_{lj}) = p(C = j \mid D_l, \Theta') \qquad E_{H \mid D, \Theta'}(\delta_{lijk}) = p(C = j, X_i = k \mid D_l, \Theta')$$

• Computing the derivatives of Q with respect to the parameters and setting them to 0, we get
$$\tilde{\pi}_j = \frac{\tilde{N}_j}{N} \qquad \tilde{\theta}_{ijk} = \frac{\tilde{N}_{ijk}}{\sum_{k=1}^{r_i} \tilde{N}_{ijk}}$$
where the expected counts are
$$\tilde{N}_j = \sum_{l=1}^{N} E_{H \mid D, \Theta'}(\delta_{lj}) = \sum_{l=1}^{N} p(C = j \mid D_l, \Theta') \qquad \tilde{N}_{ijk} = \sum_{l=1}^{N} E_{H \mid D, \Theta'}(\delta_{lijk}) = \sum_{l=1}^{N} p(C = j, X_i = k \mid D_l, \Theta')$$
• Important:
  – Use expected counts instead of counts!
  – Re-estimate the parameters using the expected counts.

EM for BBNs

• The same result applies to learning the parameters of any Bayesian belief network with discrete-valued variables. The parameter value maximizing Q is
$$\tilde{\theta}_{ijk} = \frac{\tilde{N}_{ijk}}{\sum_{k=1}^{r_i} \tilde{N}_{ijk}} \qquad \text{with} \qquad \tilde{N}_{ijk} = \sum_{l=1}^{N} p(x_i = k, pa_i = j \mid D_l, \Theta')$$
  Computing the posteriors in $\tilde{N}_{ijk}$ requires inference.
• Again: use expected counts instead of counts (a code sketch of these updates follows below).
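To make the expected-count updates concrete, here is a minimal sketch of one EM iteration for the Naïve Bayes model with a hidden class. It assumes (unlike the general setting of the lecture) that the attribute values are fully observed, so only the class indicators are hidden; the function and array names are illustrative, not from the lecture.

```python
# A minimal sketch of one EM iteration for Naive Bayes with a hidden class.
# Assumption (not from the lecture): attributes are fully observed, so only
# the class indicators delta_lj are hidden. Names/layout are illustrative.
import numpy as np

def em_step_naive_bayes(X, pi, theta):
    """X: (N, n_attrs) integer attribute values in {0..n_vals-1};
    pi: (n_classes,) class priors pi_j;
    theta: (n_attrs, n_classes, n_vals), theta[i, j, k] = p(X_i = k | C = j)."""
    N, n_attrs = X.shape

    # E-step: p(C = j | D_l, Theta'), the expected value of delta_lj,
    # computed in log space for numerical stability.
    log_post = np.tile(np.log(pi), (N, 1))              # (N, n_classes)
    for i in range(n_attrs):
        log_post += np.log(theta[i][:, X[:, i]]).T      # add log p(x_li | C)
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)

    # M-step: expected counts N~_j and N~_ijk, then re-estimation.
    N_j = post.sum(axis=0)                              # N~_j = sum_l p(C=j | D_l)
    pi_new = N_j / N
    theta_new = np.zeros_like(theta)
    for i in range(n_attrs):
        for k in range(theta.shape[2]):
            # N~_ijk: posterior mass of each class on examples with X_i = k
            theta_new[i, :, k] = post[X[:, i] == k].sum(axis=0)
        theta_new[i] /= N_j[:, None]                    # normalize over k
    return pi_new, theta_new

# Illustrative usage: 2 hidden classes, 3 binary attributes, random init.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 3))
pi = np.full(2, 0.5)
theta = rng.dirichlet(np.ones(2), size=(3, 2))          # each theta[i, j, :] sums to 1
for _ in range(20):
    pi, theta = em_step_naive_bayes(X, pi, theta)
```

With missing attribute values, the E-step would compute the joint posteriors $p(C = j, X_i = k \mid D_l, \Theta')$ over the missing entries as well, exactly as the $\tilde{N}_{ijk}$ formula above prescribes.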
Gaussian mixture model

Assume we want to represent the probability model of a population in a two-dimensional space, $\mathbf{X} = \{X_1, X_2\}$.
Model: 3 Gaussians with a hidden class variable C, given by $P(C)$ and $P(\mathbf{X} \mid C = i)$.
[Figure: example data points in the two-dimensional space.]

The probability of occurrence of a data point x is modeled as
$$p(\mathbf{x}) = \sum_{i=1}^{k} p(C = i)\, p(\mathbf{x} \mid C = i)$$
where
• $p(C = i)$ is the probability of a data point coming from class i
• $p(\mathbf{x} \mid C = i) \sim N(\boldsymbol{\mu}_i, \Sigma_i)$ is the class-conditional density (modeled as a Gaussian) for class i
Remember: C is hidden!

Hidden variable model
• Mixture of Gaussians
[Figure: the mixture-of-Gaussians model.]

Gaussian mixture model: ML estimates

The ML estimates of the parameters for labeled examples (as in classification) are
$$N_i = \sum_{l:\, C_l = i} 1 \qquad \tilde{\pi}_i = \frac{N_i}{N} \qquad \tilde{\boldsymbol{\mu}}_i = \frac{1}{N_i} \sum_{l:\, C_l = i} \mathbf{x}_l \qquad \tilde{\Sigma}_i = \frac{1}{N_i} \sum_{l:\, C_l = i} (\mathbf{x}_l - \tilde{\boldsymbol{\mu}}_i)(\mathbf{x}_l - \tilde{\boldsymbol{\mu}}_i)^T$$
[Figure: two labeled classes, C = 1 with $\boldsymbol{\mu}_1, \Sigma_1$ and C = 2 with $\boldsymbol{\mu}_2, \Sigma_2$.]

Gaussian mixture model: EM

• The Gaussians are not labeled.
• We can apply the EM algorithm: re-estimation is based on the class posterior
$$h_{il} = p(C = i \mid \mathbf{x}_l, \Theta') = \frac{p(C = i \mid \Theta')\, p(\mathbf{x}_l \mid C = i, \Theta')}{\sum_{u=1}^{m} p(C = u \mid \Theta')\, p(\mathbf{x}_l \mid C = u, \Theta')}$$
The count is replaced with the expected count,
$$\tilde{N}_i = \sum_{l} h_{il} \qquad \tilde{\pi}_i = \frac{\tilde{N}_i}{N}$$
and the mean and variance expressions are weighted by the class posterior:
$$\tilde{\boldsymbol{\mu}}_i = \frac{1}{\tilde{N}_i} \sum_{l} h_{il}\, \mathbf{x}_l \qquad \tilde{\Sigma}_i = \frac{1}{\tilde{N}_i} \sum_{l} h_{il} (\mathbf{x}_l - \tilde{\boldsymbol{\mu}}_i)(\mathbf{x}_l - \tilde{\boldsymbol{\mu}}_i)^T$$

Gaussian mixture algorithm

• A special case: the same fixed covariance matrix for all hidden groups and a uniform prior on classes.
• Algorithm (see the code sketches at the end of these notes):
  Initialize the means $\boldsymbol{\mu}_i$ for all classes i.
  Repeat the two steps until there is no change in the means:
  1. Compute the class posterior for each Gaussian and each point (a kind of responsibility of a Gaussian for a point):
  $$h_{il} = \frac{p(C = i \mid \Theta')\, p(\mathbf{x}_l \mid C = i, \Theta')}{\sum_{u=1}^{m} p(C = u \mid \Theta')\, p(\mathbf{x}_l \mid C = u, \Theta')}$$
  2. Move the means of the Gaussians to the center of the data, weighted by the responsibilities:
  $$\boldsymbol{\mu}_i = \frac{\sum_{l=1}^{N} h_{il}\, \mathbf{x}_l}{\sum_{l=1}^{N} h_{il}}$$

Gaussian mixture model - example
[Figure: example data points in two dimensions.]

Gaussian mixture example
[Figure: four panels, labeled 1-4, showing successive iterations of the algorithm on the example data.]

Gaussian mixture model. Gradient ascent.

• A set of parameters $\Theta = \{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_m, \pi_1, \pi_2, \ldots, \pi_m\}$ for the hidden-class model with prior $p(C)$ and class-conditional densities $p(\mathbf{x} \mid C)$.
• Assume unit variance terms and fixed priors:
$$P(\mathbf{x} \mid C = i) = (2\pi)^{-1/2} \exp\Big(-\frac{1}{2} \|\mathbf{x} - \boldsymbol{\mu}_i\|^2\Big)$$
$$P(D \mid \Theta) = \prod_{l=1}^{N} \sum_{i=1}^{m} \pi_i\, (2\pi)^{-1/2} \exp\Big(-\frac{1}{2} \|\mathbf{x}_l - \boldsymbol{\mu}_i\|^2\Big)$$
$$l(\Theta) = \sum_{l=1}^{N} \log \sum_{i=1}^{m} \pi_i\, (2\pi)^{-1/2} \exp\Big(-\frac{1}{2} \|\mathbf{x}_l - \boldsymbol{\mu}_i\|^2\Big)$$
$$\frac{\partial l(\Theta)}{\partial \boldsymbol{\mu}_i} = \sum_{l=1}^{N} h_{il}\, (\mathbf{x}_l - \boldsymbol{\mu}_i)$$
  – a very easy on-line update (sketched below).

EM versus gradient ascent
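To make the comparison concrete, here is a minimal sketch of one EM iteration for the mixture, assuming the spherical unit-variance Gaussians of the gradient-ascent slide (so the covariance update is omitted); the function and array names are illustrative, not from the lecture.

```python
# A minimal sketch of one EM iteration for a Gaussian mixture with
# spherical unit-variance components (an assumption carried over from
# the gradient-ascent slide; the covariance update is omitted).
import numpy as np

def em_step_gmm(X, pi, mu):
    """X: (N, d) data; pi: (m,) class priors; mu: (m, d) means."""
    # E-step: responsibilities h_il = p(C = i | x_l, Theta').
    # The Gaussian constant (2*pi)^(-d/2) cancels in the normalization.
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, m)
    joint = pi[None, :] * np.exp(-0.5 * sq_dist)                    # pi_i * p(x_l | C=i)
    h = joint / joint.sum(axis=1, keepdims=True)

    # M-step: expected counts N~_i, then posterior-weighted re-estimation.
    N_i = h.sum(axis=0)                     # N~_i = sum_l h_il
    pi_new = N_i / X.shape[0]               # pi~_i = N~_i / N
    mu_new = (h.T @ X) / N_i[:, None]       # mu~_i = (1/N~_i) sum_l h_il x_l
    return pi_new, mu_new
```

Iterating this until the means stop changing is exactly the two-step algorithm above; in the special case with a uniform fixed prior, one would simply keep pi unchanged.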


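For the other side of the comparison, here is a sketch of the on-line gradient-ascent update for the means, using $\partial l(\Theta)/\partial \boldsymbol{\mu}_i = \sum_l h_{il}(\mathbf{x}_l - \boldsymbol{\mu}_i)$ one example at a time; the step size eta is an illustrative choice, not from the lecture.

```python
# A sketch of the on-line gradient-ascent update for the mixture means,
# under the same unit-variance assumption; eta is an illustrative step size.
import numpy as np

def online_gradient_step(x, pi, mu, eta=0.05):
    """Update all means in place on a single example x of shape (d,)."""
    sq_dist = ((x[None, :] - mu) ** 2).sum(axis=1)     # (m,)
    h = pi * np.exp(-0.5 * sq_dist)
    h /= h.sum()                                       # h_i = p(C = i | x, Theta)
    mu += eta * h[:, None] * (x[None, :] - mu)         # per-example gradient step
    return mu
```

Unlike the EM step, which sweeps the whole dataset before moving the means, this update adjusts the means after every example, which is what makes it the "very easy on-line update" of the gradient-ascent slide.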
