Pitt CS 2750 - Ensemble methods. Mixtures of experts


CS 2750 Machine Learning
Lecture 22
Milos Hauskrecht
[email protected], Sennott Square

Ensemble methods. Mixtures of experts

Mixture of experts model
• Ensemble methods:
– Use a combination of simpler learners to improve predictions
• Mixture of experts model:
– Covers different input regions with different learners
– A "soft" switching between learners
• Mixture of experts: Expert = learner
[Figure: input space partitioned into regions, each covered by a different expert]

Mixture of experts model
• Gating network: decides which expert to use
[Figure: input x feeds Expert 1, Expert 2, ..., Expert k and the gating network; the gating functions g_1, g_2, ..., g_k weight the expert outputs to produce y]

Learning mixture of experts
• Learning consists of two tasks:
– Learn the parameters of the individual expert networks
– Learn the parameters of the gating network (decides where to make a split)
• Assume: gating functions give probabilities

$$0 \le g_i(\mathbf{x}) \le 1, \qquad \sum_{u=1}^{k} g_u(\mathbf{x}) = 1$$

• Based on the probability we partition the space
– partitions belong to different experts
• How to model the gating network?
– A multiway classifier model:
• softmax model
• a generative classifier model

Learning mixture of experts
• Assume we have a set of linear experts

$$\mu_i = \boldsymbol{\theta}_i^T \mathbf{x}$$

(Note: bias terms are hidden in x)
• Assume a softmax gating network

$$g_i(\mathbf{x}) = \frac{\exp(\boldsymbol{\eta}_i^T \mathbf{x})}{\sum_{u=1}^{k} \exp(\boldsymbol{\eta}_u^T \mathbf{x})} \approx P(\omega_i \mid \mathbf{x}, \boldsymbol{\eta})$$

• Likelihood of y (assuming the errors for different experts are normally distributed with the same variance):

$$P(y \mid \mathbf{x}, \Theta, \boldsymbol{\eta}) = \sum_{i=1}^{k} P(\omega_i \mid \mathbf{x}, \boldsymbol{\eta})\, P(y \mid \omega_i, \mathbf{x}, \Theta) = \sum_{i=1}^{k} \frac{\exp(\boldsymbol{\eta}_i^T \mathbf{x})}{\sum_{j=1}^{k} \exp(\boldsymbol{\eta}_j^T \mathbf{x})}\, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(y - \mu_i)^2}{2\sigma^2}\right]$$

Learning mixture of experts
Gradient learning. On-line update rule for the parameters of expert i:
– If we know the expert that is responsible for x:

$$\theta_{ij} \leftarrow \theta_{ij} + \alpha\, (y - \mu_i)\, x_j$$

– If we do not know the expert:

$$\theta_{ij} \leftarrow \theta_{ij} + \alpha\, h_i\, (y - \mu_i)\, x_j$$

where $h_i$ is the responsibility of the i-th expert, a kind of posterior that combines the prior $g_i(\mathbf{x})$ with the likelihood term $\exp[-\tfrac{1}{2}(y - \mu_i)^2]$:

$$h_i(\mathbf{x}, y) = \frac{g_i(\mathbf{x})\, p(y \mid \mathbf{x}, \boldsymbol{\theta}_i)}{\sum_{u=1}^{k} g_u(\mathbf{x})\, p(y \mid \mathbf{x}, \boldsymbol{\theta}_u)} = \frac{g_i(\mathbf{x}) \exp\!\left[-\tfrac{1}{2}(y - \mu_i)^2\right]}{\sum_{u=1}^{k} g_u(\mathbf{x}) \exp\!\left[-\tfrac{1}{2}(y - \mu_u)^2\right]}$$
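As a concrete illustration, the gating function and the responsibilities can be computed in a few lines of NumPy. This is a minimal sketch assuming linear experts with unit noise variance ($\sigma^2 = 1$); the function names (`gating`, `responsibilities`) and the toy parameters are invented for the example:

```python
import numpy as np

def gating(x, eta):
    """Softmax gating: g_i(x) = exp(eta_i^T x) / sum_u exp(eta_u^T x)."""
    scores = eta @ x
    scores = scores - scores.max()       # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

def responsibilities(x, y, theta, eta):
    """Posterior responsibilities h_i(x, y) ~ g_i(x) * exp(-(y - mu_i)^2 / 2)."""
    g = gating(x, eta)                   # gating outputs act as priors over experts
    mu = theta @ x                       # linear expert predictions mu_i = theta_i^T x
    lik = np.exp(-0.5 * (y - mu) ** 2)   # Gaussian likelihood terms (sigma^2 = 1)
    h = g * lik
    return h / h.sum()

# toy check with k = 3 experts in d = 4 dimensions
rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 4))
eta = rng.normal(size=(3, 4))
x = rng.normal(size=4)
h = responsibilities(x, 1.0, theta, eta)
```

By construction the responsibilities form a distribution over the k experts, which is what makes the partition of the input space "soft".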
Learning mixtures of experts
Gradient methods
• On-line learning of the gating network parameters $\boldsymbol{\eta}_i$:

$$\eta_{ij} \leftarrow \eta_{ij} + \beta\, \left(h_i(\mathbf{x}, y) - g_i(\mathbf{x})\right)\, x_j$$

• The learning with conditioned mixtures can be extended to learning the parameters of an arbitrary expert network
– e.g. logistic regression, a multilayer neural network

$$\frac{\partial l}{\partial \theta_{ij}} = \frac{\partial l}{\partial \mu_i} \frac{\partial \mu_i}{\partial \theta_{ij}} = h_i\, (y - \mu_i)\, \frac{\partial \mu_i}{\partial \theta_{ij}}, \qquad \theta_{ij} \leftarrow \theta_{ij} + \beta\, \frac{\partial l}{\partial \theta_{ij}}$$

Learning mixture of experts
The EM algorithm offers an alternative way to learn the mixture.
Algorithm:
Initialize parameters $\Theta$
Repeat
Set $\Theta' = \Theta$
1. Expectation step: $Q(\Theta \mid \Theta') = E_{H \mid Y, X, \Theta'}\left[\log P(Y, H \mid X, \Theta)\right]$
2. Maximization step: $\Theta = \arg\max_{\Theta} Q(\Theta \mid \Theta')$
until no or small improvement in $Q(\Theta \mid \Theta')$
– The hidden variables H are the identities of the expert networks responsible for the (x, y) data points

Learning mixture of experts with EM
• Assume we have a set of linear experts: $\mu_i = \boldsymbol{\theta}_i^T \mathbf{x}$
• Assume a softmax gating network: $g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}, \boldsymbol{\eta})$
• Q function to optimize:

$$Q(\Theta \mid \Theta') = \sum_{l} \sum_{i} E\left[\delta_{il} \mid y_l, \mathbf{x}_l, \Theta', \boldsymbol{\eta}'\right] \log P(y_l, \omega_i \mid \mathbf{x}_l, \Theta, \boldsymbol{\eta})$$

• Assume:
– $l$ indexes the different data points
– $\delta_{il}$ is an indicator variable for data point $l$ being covered by expert $i$

Learning mixture of experts with EM
• The expectation is the responsibility of expert i for $(\mathbf{x}_l, y_l)$:

$$E\left[\delta_{il} \mid y_l, \mathbf{x}_l, \Theta', \boldsymbol{\eta}'\right] = h_i(\mathbf{x}_l, y_l) = \frac{g_i(\mathbf{x}_l)\, p(y_l \mid \mathbf{x}_l, \boldsymbol{\theta}_i')}{\sum_{u=1}^{k} g_u(\mathbf{x}_l)\, p(y_l \mid \mathbf{x}_l, \boldsymbol{\theta}_u')}$$

• so that

$$Q(\Theta \mid \Theta') = \sum_{l} \sum_{i} h_i(\mathbf{x}_l, y_l) \log P(y_l, \omega_i \mid \mathbf{x}_l, \Theta, \boldsymbol{\eta})$$

Learning mixture of experts with EM
• The maximization step boils down to a problem equivalent to finding the ML estimates of the parameters of the expert and gating networks:

$$\log P(y_l, \omega_i \mid \mathbf{x}_l, \Theta, \boldsymbol{\eta}) = \underbrace{\log P(y_l \mid \omega_i, \mathbf{x}_l, \Theta)}_{\text{expert network } i \text{ (linear regression)}} + \underbrace{\log P(\omega_i \mid \mathbf{x}_l, \boldsymbol{\eta})}_{\text{gating network (softmax)}}$$

• Note that any optimization technique can be applied in this step
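One EM iteration for the linear-expert / softmax-gating case can be sketched as follows. This is a hedged sketch, not the lecture's code: the expert M-step solves responsibility-weighted least squares exactly, while the gating M-step is approximated by a single gradient step on the softmax log-likelihood; the function names and the toy two-regime data are invented for the example:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def em_step(X, y, theta, eta, beta=0.1):
    """One EM iteration for a mixture of linear experts with softmax gating."""
    # E-step: responsibilities H[l, i] of expert i for data point l
    G = softmax(X @ eta.T)                        # gating priors, shape (n, k)
    Mu = X @ theta.T                              # expert means, shape (n, k)
    Lik = np.exp(-0.5 * (y[:, None] - Mu) ** 2)   # Gaussian likelihoods (sigma^2 = 1)
    H = G * Lik
    H /= H.sum(axis=1, keepdims=True)

    # M-step, experts: responsibility-weighted least squares for each theta_i
    new_theta = np.empty_like(theta)
    for i in range(theta.shape[0]):
        W = H[:, i]
        A = (X.T * W) @ X                         # weighted normal equations
        b = X.T @ (W * y)
        new_theta[i] = np.linalg.solve(A, b)

    # M-step, gating: one gradient step on sum_l sum_i h_li * log g_i(x_l)
    new_eta = eta + beta * (H - G).T @ X
    return new_theta, new_eta

# toy usage: data drawn from two linear regimes, features [x, 1] (bias absorbed)
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([rng.normal(size=n), np.ones(n)])
y = np.where(X[:, 0] > 0, 2.0, -1.0) * X[:, 0] + 0.1 * rng.normal(size=n)
theta = rng.normal(size=(2, 2))
eta = rng.normal(size=(2, 2))
for _ in range(50):
    theta, eta = em_step(X, y, theta, eta)
```

The alternation mirrors the slides: the E-step computes the responsibilities, and the M-step decouples into one fitting problem per expert plus a softmax fit for the gate.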
Learning mixture of experts
• Note that we can use different expert and gating models. For example:
– Experts: logistic regression models

$$\mu_i = \frac{1}{1 + \exp(-\boldsymbol{\theta}_i^T \mathbf{x})}$$

– Gating network: a generative latent variable model, $g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}, \boldsymbol{\eta})$
• Likelihood of y:

$$P(y \mid \mathbf{x}, \Theta, \boldsymbol{\eta}) = \sum_{u=1}^{k} P(\omega_u \mid \mathbf{x}, \boldsymbol{\eta})\, P(y \mid \omega_u, \mathbf{x}, \Theta)$$

[Figure: x → ω → y, where ω is a hidden class variable]

Hierarchical mixture of experts
• A mixture of experts defines a probabilistic split
• The idea can be extended to a hierarchy of experts (a kind of probabilistic decision tree)
[Figure: a two-level tree; switching (gating) indicators ω_u and ω_{uv} route the input x to experts E1, E2, E3, E4, each producing an output y]

Hierarchical mixture model
An output is conditioned (gated) on multiple mixture levels.
• Define the set of gating indicators along a path: $\Omega_{uv \ldots s} = \{\omega_u, \omega_{uv}, \ldots, \omega_{uv \ldots s}\}$, with

$$P(\Omega_{uv \ldots s} \mid \mathbf{x}, \Theta) = P(\omega_u \mid \mathbf{x})\, P(\omega_{uv} \mid \omega_u, \mathbf{x}) \cdots P(\omega_{uv \ldots s} \mid \omega_u, \ldots, \mathbf{x})$$

• Then

$$P(y \mid \mathbf{x}, \Theta) = \sum_u \sum_v \cdots \sum_s P(\Omega_{uv \ldots s} \mid \mathbf{x}, \Theta)\, P(y \mid \mathbf{x}, \Omega_{uv \ldots s}, \Theta)$$

where the terms $P(y \mid \mathbf{x}, \Omega_{uv \ldots s}, \Theta)$ are the individual experts.
– The mixture model is a kind of soft decision tree model
– with a fixed tree structure!

Hierarchical mixture of experts
• Multiple levels of probabilistic gating functions:

$$g_u(\mathbf{x}) = P(\omega_u \mid \mathbf{x}, \Theta), \qquad g_{v|u}(\mathbf{x}) = P(\omega_{uv} \mid \omega_u, \mathbf{x}, \Theta)$$

• Multiple levels of responsibilities:

$$h_u(\mathbf{x}, y) = P(\omega_u \mid \mathbf{x}, y, \Theta), \qquad h_{v|u}(\mathbf{x}, y) = P(\omega_{uv} \mid \omega_u, \mathbf{x}, y, \Theta)$$

• How are they related?

$$h_{v|u}(\mathbf{x}, y) = \frac{g_{v|u}(\mathbf{x})\, P(y \mid \mathbf{x}, \omega_u, \omega_{uv}, \Theta)}{\sum_v g_{v|u}(\mathbf{x})\, P(y \mid \mathbf{x}, \omega_u, \omega_{uv}, \Theta)}, \qquad \sum_v P(y, \omega_{uv} \mid \mathbf{x}, \omega_u, \Theta) = P(y \mid \mathbf{x}, \omega_u, \Theta)$$

Hierarchical mixture of experts
• Responsibility for the top layer:

$$h_u(\mathbf{x}, y) = P(\omega_u \mid \mathbf{x}, y, \Theta) = \frac{g_u(\mathbf{x})\, P(y \mid \mathbf{x}, \omega_u, \Theta)}{\sum_u g_u(\mathbf{x})\, P(y \mid \mathbf{x}, \omega_u, \Theta)}$$

• But $P(y \mid \mathbf{x}, \omega_u, \Theta)$ is already computed while computing $h_{v|u}(\mathbf{x}, y)$
• General algorithm:
– Downward sweep: calculate the gating probabilities $g_{v|u}(\mathbf{x})$
– Upward sweep: calculate the responsibilities $h_u(\mathbf{x}, y)$ and $h_{v|u}(\mathbf{x}, y)$

On-line learning
• Assume linear experts: $\mu_{uv} = \boldsymbol{\theta}_{uv}^T \mathbf{x}$
• Gradients (vector form):

$$\frac{\partial l}{\partial \boldsymbol{\theta}_{uv}} = h_u\, h_{v|u}\, (y - \mu_{uv})\, \mathbf{x}$$

$$\frac{\partial l}{\partial \boldsymbol{\eta}} = (h_u - g_u)\, \mathbf{x} \qquad \text{(top-level (root) node)}$$

$$\frac{\partial l}{\partial \boldsymbol{\xi}} = h_u\, (h_{v|u} - g_{v|u})\, \mathbf{x} \qquad \text{(second level)}$$

• Again, this can be extended to different expert networks
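Under the same assumptions as before (linear experts, unit-variance Gaussian noise), one on-line step for a two-level hierarchy with a downward and an upward sweep might look as follows; the function name and the tensor layout (experts theta[u, v], root gating parameters eta, second-level gating parameters xi[u]) are choices made for this sketch:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hme_online_step(x, y, theta, eta, xi, alpha=0.05):
    """One on-line gradient step for a two-level hierarchical mixture
    of linear experts, following the vector-form gradients above."""
    # downward sweep: gating probabilities at both levels
    g_u = softmax(eta @ x)                                          # g_u(x), shape (U,)
    g_vu = np.stack([softmax(xi[u] @ x) for u in range(xi.shape[0])])  # g_{v|u}(x), (U, V)

    # expert predictions mu_uv = theta_uv^T x and Gaussian likelihoods (sigma^2 = 1)
    mu = np.einsum('uvd,d->uv', theta, x)
    lik = np.exp(-0.5 * (y - mu) ** 2)                              # P(y | x, w_u, w_uv)

    # upward sweep: responsibilities
    p_y_u = (g_vu * lik).sum(axis=1)                                # P(y | x, w_u)
    h_vu = g_vu * lik / p_y_u[:, None]                              # h_{v|u}(x, y)
    h_u = g_u * p_y_u / (g_u * p_y_u).sum()                         # h_u(x, y)

    # gradient updates: dl/dtheta_uv = h_u h_{v|u} (y - mu_uv) x, etc.
    theta = theta + alpha * (h_u[:, None] * h_vu * (y - mu))[..., None] * x
    eta = eta + alpha * (h_u - g_u)[:, None] * x
    xi = xi + alpha * (h_u[:, None] * (h_vu - g_vu))[..., None] * x
    return theta, eta, xi

# toy usage: stream of noisy linear data through a 2 x 2 tree in d = 3 dimensions
rng = np.random.default_rng(2)
U, V, d = 2, 2, 3
theta = rng.normal(size=(U, V, d))
eta = rng.normal(size=(U, d))
xi = rng.normal(size=(U, V, d))
for _ in range(100):
    x = rng.normal(size=d)
    y = float(x[0] + 0.1 * rng.normal())
    theta, eta, xi = hme_online_step(x, y, theta, eta, xi)
```

The downward sweep only needs x (the gating probabilities), while the upward sweep needs y as well, which is why the responsibilities are computed bottom-up after the target is observed.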

