Pitt CS 2750 - Density estimation with hidden variables and missing values

CS 2750 Machine Learning
Lecture 18
Milos Hauskrecht
[email protected]
Sennott Square

Density estimation with hidden variables and missing values

Project proposals
Due: Wednesday, March 24, 2004
• 1-2 pages long
• Written proposal:
1. Outline of a learning problem and the type of data you have available. Why is the problem important?
2. The learning methods you plan to try and implement for the problem, with references to previous work.
3. How you plan to test and compare the learning approaches.
4. Schedule of work (an approximate timeline).
• A PPT (3-slide) summary of points 1-4

Learning probability distributions
Basic learning settings:
• A set of random variables $X = \{X_1, X_2, \ldots, X_n\}$
• A model of the distribution over the variables in $X$ with parameters $\Theta$
• Data $D = \{D_1, D_2, \ldots, D_N\}$ such that $D_i = (x_{i1}, x_{i2}, \ldots, x_{in})$
Objective: find parameters $\hat{\Theta}$ that describe the data.
Assumptions considered so far:
– Known parameterizations
– No hidden variables
– No missing values

Hidden variables
Modeling assumption: the variables $X = \{X_1, X_2, \ldots, X_n\}$ are related through hidden variables.
Why add hidden variables?
• More flexibility in describing the distribution $P(X)$
• Smaller parameterization of $P(X)$
– New independences can be introduced via the hidden variables
Example: latent variable models with hidden classes (categories): a hidden class variable $C$ and class-conditional distributions $P(X \mid C = i)$.

Hidden variable model. Example.
• We want to represent the probability model of a population in a two-dimensional space, $X = \{X_1, X_2\}$.
[Figure, repeated over several slides: scatter plot of the observed data in the $(X_1, X_2)$ plane, shown first unlabeled and then grouped into three clusters.]
Model: 3 Gaussians with a hidden class variable, i.e., a class prior $P(C)$ together with class-conditional densities $P(X \mid C = i)$.

Mixture of Gaussians
The probability of the occurrence of a data point $\mathbf{x}$ is modeled as
$$p(\mathbf{x}) = \sum_{i=1}^{k} p(C = i)\, p(\mathbf{x} \mid C = i)$$
where
• $p(C = i)$ = the probability of a data point coming from class $i$
• $p(\mathbf{x} \mid C = i) \approx N(\boldsymbol{\mu}_i, \Sigma_i)$ = the class-conditional density for class $i$, modeled as a Gaussian
[Figure: density function for the Mixture of Gaussians model.]

Naïve Bayes with a hidden class variable
Introducing a hidden variable can reduce the number of parameters defining $P(X)$.
Example: a Naïve Bayes model with a hidden class variable $C$ and attributes $X_1, X_2, \ldots, X_n$; the attributes are independent given the class.
• Useful in customer profiling
– Class value = type of customer

Missing values
• A set of random variables $X = \{X_1, X_2, \ldots, X_n\}$
• Data $D = \{D_1, D_2, \ldots, D_N\}$, but some values are missing. Example: medical records.
– $D_i = (x_{i1}, x_{i3}, \ldots, x_{in})$: the value of $x_{i2}$ is missing
– $D_{i+1} = (x_{i3}, \ldots, x_{in})$: the values of $x_{i1}, x_{i2}$ are missing, etc.
• We still want to estimate the parameters of $P(X)$.

Density estimation
Goal: find the set of parameters $\hat{\Theta}$.
Estimation criteria:
– ML: $\max_{\Theta} p(D \mid \Theta, \xi)$
– Bayesian: $p(\Theta \mid D, \xi)$
Possible optimization methods for the ML criterion: gradient ascent, conjugate gradient, Newton-Raphson, etc.
• Problem: these gain no or very little advantage from the structure of the corresponding belief network.
Expectation-maximization (EM) method:
– An alternative optimization method
– Suitable when there are missing or hidden values
– Takes advantage of the structure of the belief network

General EM
The key idea of the method: compute the parameter estimates iteratively by performing the following two steps.
1. Expectation step: complete the values of all hidden and missing variables with their expectations under the current set of parameters $\Theta'$.
2. Maximization step: compute new estimates of $\Theta$ for the 'completed' data.
Stop when no improvement is possible.

EM: derivation
Let $H$ be the set of all variables with hidden or missing values.
$$P(H, D \mid \Theta, \xi) = P(H \mid D, \Theta, \xi)\, P(D \mid \Theta, \xi)$$
$$\log P(H, D \mid \Theta, \xi) = \log P(H \mid D, \Theta, \xi) + \log P(D \mid \Theta, \xi)$$
$$\log P(D \mid \Theta, \xi) = \log P(H, D \mid \Theta, \xi) - \log P(H \mid D, \Theta, \xi)$$
Average both sides with $P(H \mid D, \Theta', \xi)$ for some parameter setting $\Theta'$:
$$E_{H \mid D, \Theta'}[\log P(D \mid \Theta, \xi)] = E_{H \mid D, \Theta'}[\log P(H, D \mid \Theta, \xi)] - E_{H \mid D, \Theta'}[\log P(H \mid D, \Theta, \xi)]$$
Since the log-likelihood of the data, $\log P(D \mid \Theta, \xi)$, does not depend on $H$, this gives
$$\log P(D \mid \Theta, \xi) = Q(\Theta \mid \Theta') + H(\Theta \mid \Theta')$$
with $Q(\Theta \mid \Theta') = E_{H \mid D, \Theta'}[\log P(H, D \mid \Theta, \xi)]$ and $H(\Theta \mid \Theta') = -E_{H \mid D, \Theta'}[\log P(H \mid D, \Theta, \xi)]$.

EM algorithm
Algorithm (general formulation):
Initialize the parameters $\Theta$.
Repeat:
– Set $\Theta' = \Theta$
– 1. Expectation step: compute $Q(\Theta \mid \Theta') = E_{H \mid D, \Theta'}[\log P(H, D \mid \Theta, \xi)]$
– 2. Maximization step: $\Theta = \arg\max_{\Theta} Q(\Theta \mid \Theta')$
until there is no or only a small improvement in $l(\Theta')$.
Questions: Why does this lead to the ML estimate? What is the advantage of the algorithm?

EM algorithm: correctness
• Why is the EM algorithm correct?
• Claim: maximizing $Q$ improves the log-likelihood.
$$l(\Theta) = Q(\Theta \mid \Theta') + H(\Theta \mid \Theta')$$
The difference in log-likelihoods between the next and the current step is
$$l(\Theta) - l(\Theta') = Q(\Theta \mid \Theta') - Q(\Theta' \mid \Theta') + H(\Theta \mid \Theta') - H(\Theta' \mid \Theta')$$
The subexpression $H(\Theta \mid \Theta') - H(\Theta' \mid \Theta')$ is always nonnegative. Writing out
$$H(\Theta \mid \Theta') = -E_{H \mid D, \Theta'}[\log P(H \mid D, \Theta, \xi)] = -\sum_i P(H_i \mid D, \Theta', \xi) \log P(H_i \mid D, \Theta, \xi),$$
the difference is a Kullback-Leibler (KL) divergence (a distance between two distributions), and a KL divergence $KL(P \| R) = \sum_i P_i \log (P_i / R_i) \ge 0$ is always nonnegative:
$$H(\Theta \mid \Theta') - H(\Theta' \mid \Theta') = \sum_i P(H_i \mid D, \Theta', \xi) \log \frac{P(H_i \mid D, \Theta', \xi)}{P(H_i \mid D, \Theta, \xi)} \ge 0$$

EM algorithm (continued)
$$l(\Theta) - l(\Theta') = Q(\Theta \mid \Theta') - Q(\Theta' \mid \Theta') + H(\Theta \mid \Theta') - H(\Theta' \mid \Theta')$$
Thus
$$l(\Theta) - l(\Theta') \ge Q(\Theta \mid \Theta') - Q(\Theta' \mid \Theta'),$$
so by maximizing $Q$ we maximize the log-likelihood.
EM is a first-order optimization procedure:
• It climbs the gradient
• The learning rate is automatic: there is no need to adjust it!

EM advantages
Key advantages:
• In many problems (e.g., Bayesian belief networks), $Q(\Theta \mid \Theta') = E_{H \mid D, \Theta'}[\log P(H, D \mid \Theta, \xi)]$ has a nice form, and the maximization of $Q$ can be carried out in closed form.
• There is no need to compute $Q$ before maximizing it; we optimize directly, using quantities corresponding to expected counts.

Naïve Bayes with a hidden class and missing values
Assume: • is
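The mixture-of-Gaussians density $p(\mathbf{x}) = \sum_i p(C=i)\, p(\mathbf{x} \mid C=i)$ from the lecture can be evaluated directly. A minimal NumPy sketch, with hypothetical mixture weights and Gaussian parameters chosen purely for illustration (a 3-component mixture in two dimensions):

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate Gaussian density N(mu, cov) evaluated at point x."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def mixture_pdf(x, priors, mus, covs):
    """p(x) = sum_i p(C=i) * N(x; mu_i, Sigma_i)."""
    return sum(w * gaussian_pdf(x, mu, cov)
               for w, mu, cov in zip(priors, mus, covs))

# Hypothetical 3-component mixture in 2D (illustrative values only).
priors = [0.5, 0.3, 0.2]
mus = [np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([-1.5, 1.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.eye(2)]

p = mixture_pdf(np.array([0.0, 0.0]), priors, mus, covs)
print(round(p, 4))
```

Because the priors sum to one and each component integrates to one, the mixture is itself a valid density over the whole space.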
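The parameter-reduction point for the hidden-class Naïve Bayes model can be made concrete. Assuming $n$ binary attributes, a full joint distribution needs $2^n - 1$ free parameters, while a Naïve Bayes model with a hidden class taking $k$ values needs only $(k-1) + kn$. A quick check with hypothetical sizes:

```python
def full_joint_params(n):
    # Full joint over n binary attributes: one probability per configuration,
    # minus one for the sum-to-one constraint.
    return 2 ** n - 1

def naive_bayes_hidden_params(n, k):
    # (k-1) class-prior parameters plus k*n Bernoulli parameters P(X_j=1 | C=i).
    return (k - 1) + k * n

print(full_joint_params(10))             # 1023
print(naive_bayes_hidden_params(10, 3))  # 32
```

The gap widens exponentially in $n$, which is exactly the "smaller parameterization" benefit of introducing the hidden class variable.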
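The two EM steps take a particularly simple form for the Gaussian mixture model. The sketch below follows the standard textbook specialization rather than code from the lecture, and it fixes unit (spherical) covariances for brevity; a full implementation would also re-estimate the covariance matrices in the M-step:

```python
import numpy as np

def em_gmm(data, k, n_iters=50, seed=0):
    """EM for a mixture of unit-covariance Gaussians (a simplifying assumption).
    E-step: responsibilities p(C=i | x) under the current parameters.
    M-step: re-estimate priors and means from the 'completed' data."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    mus = data[rng.choice(n, k, replace=False)]  # initialize means from data
    priors = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: unnormalized posteriors, then normalize per data point.
        sq = ((data[:, None, :] - mus[None, :, :]) ** 2).sum(-1)
        resp = priors * np.exp(-0.5 * sq)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: expected counts give the new priors and means.
        nk = resp.sum(axis=0)
        priors = nk / n
        mus = (resp.T @ data) / nk[:, None]
    return priors, mus

# Demo on two well-separated synthetic clusters (hypothetical data).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
                  rng.normal(4.0, 0.3, (50, 2))])
priors, mus = em_gmm(data, 2)
print(priors.round(2), mus.round(1))
```

The "expected counts" mentioned in the EM-advantages slide appear here as `nk`: each data point contributes fractionally to every component according to its responsibility, and the M-step updates are the closed-form ML estimates on those fractional counts.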
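The KL-divergence inequality that drives the correctness argument ($KL(P \| R) \ge 0$, with equality when the distributions coincide) is easy to check numerically; a small sketch with arbitrary example distributions:

```python
import numpy as np

def kl(p, r):
    """KL(P || R) = sum_i P_i log(P_i / R_i); nonnegative by Gibbs' inequality."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    return float(np.sum(p * np.log(p / r)))

# Arbitrary example distributions over 3 outcomes.
p = [0.7, 0.2, 0.1]
r = [0.3, 0.4, 0.3]
print(kl(p, r) >= 0, abs(kl(p, p)) < 1e-12)  # True True
```

In the EM derivation, $P$ and $R$ are the posteriors over the hidden variables under $\Theta'$ and $\Theta$ respectively, so the $H(\Theta \mid \Theta') - H(\Theta' \mid \Theta')$ term can never decrease the log-likelihood bound.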

