MIT 9 520 - Bayesian Interpretations of Regularization - D2768993

Home> Schools> Massachusetts Institute of Technology> Brain and Cognitive Sciences (9) > 9 520> Bayesian Interpretations of Regularization

DOC PREVIEW

MIT 9 520 - Bayesian Interpretations of Regularization

School name Massachusetts Institute of Technology

Course 9 520- Statistical Learning Theory and Applications

Pages 48

This preview shows page 1-2-3-23-24-25-26-46-47-48 out of 48 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 48 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Bayesian Interpretations of RegularizationCharlie Frogner9.520 Class 20April 21, 2010C. Frogner Bayesian Interpretations of RegularizationThe PlanRegularized least squares maps {(xi, yi)}ni=1to a function thatminimizes the regularized loss:fS= arg minf ∈H12nXi=1(yi− f (xi))2+λ2kf k2HCan we interpret RLS from a probabilistic point of view?C. Frogner Bayesian Interpretations of RegularizationSome notationS = {(xi, yi)}ni=1is the set of observed input/output pairs inRd× R (the training set).X and Y denote the matrices [x1, . . . , xn]T∈ Rn×dand[y1, . . . , yn]T∈ Rn, respectively.θ is a vector of parameters in Rp.p(Y |X , θ) is the joint distribution over outputs Y giveninputs X and the parameters.C. Frogner Bayesian Interpretations of RegularizationWhere do probabilities show up?12nXi=1V (yi, f (xi)) +λ2kf k2Hbecomesp(Y |f , X ) · p(f )Likelihood, a.k.a. noise model: p(Y |f , X ).Gaussian: yi∼ Nf∗(xi), σ2iPoisson: yi∼ Pois (f∗(xi))Prior: p(f ).C. Frogner Bayesian Interpretations of RegularizationWhere do probabilities show up?12nXi=1V (yi, f (xi)) +λ2kf k2Hbecomesp(Y |f , X ) · p(f )Likelihood, a.k.a. noise model: p(Y |f , X ).Gaussian: yi∼ Nf∗(xi), σ2iPoisson: yi∼ Pois (f∗(xi))Prior: p(f ).C. Frogner Bayesian Interpretations of RegularizationEstimationThe estimation problem:Given data {(xi, yi)}Ni=1and model p(Y |f , X ), p(f ).Find a good f to explain data.C. Frogner Bayesian Interpretations of RegularizationThe PlanMaximum likelihood estimation for ERMMAP estimation for linear RLSMAP estimation for kernel RLSTransductive modelInfinite dimensions get more complicatedC. Frogner Bayesian Interpretations of RegularizationMaximum likelihood estimationGiven data {(xi, yi)}Ni=1and model p(Y |f , X ), p(f ).A good f is one that maximizes p(Y |f , X ).C. Frogner Bayesian Interpretations of RegularizationMaximum likelihood and least squaresFor least squares, noise model is:yi|f , xi∼ Nf (xi), σ2a.k.a.Y |f , X ∼ Nf (X ), σ2ISop(Y |f , X ) =1(2πσ2)N/2exp(−NXi=11σ2(yi− f (xi))2)C. Frogner Bayesian Interpretations of RegularizationMaximum likelihood and least squaresFor least squares, noise model is:yi|f , xi∼ Nf (xi), σ2a.k.a.Y |f , X ∼ Nf (X ), σ2ISop(Y |f , X ) =1(2πσ2)N/2exp(−NXi=11σ2(yi− f (xi))2)C. Frogner Bayesian Interpretations of RegularizationMaximum likelihood and least squaresMaximum likelihood: maximizep(Y |f , X ) =1(2πσ2)N/2exp(−NXi=11σ2(yi− f (xi)))2)Empirical risk minimization: minimizeNXi=1(yi− f (xi))2C. Frogner Bayesian Interpretations of Regularization...NXi=1(yi− f (xi))2C. Frogner Bayesian Interpretations of Regularization...e−NPi=11σ2(yi−f (xi))2C. Frogner Bayesian Interpretations of RegularizationWhat about regularization?RLS:arg minf12nXi=1(yi− f (xi))2+λ2kf k2HIs there a model of Y and f that yields RLS?Yes.e−12σ2εnPi=1(yi−f (xi))2−λ2kf k2Hp(Y |f , X ) · p(f )C. Frogner Bayesian Interpretations of RegularizationWhat about regularization?RLS:arg minf12nXi=1(yi− f (xi))2+λ2kf k2HIs there a model of Y and f that yields RLS?Yes.e−12σ2εnPi=1(yi−f (xi))2−λ2kf k2Hp(Y |f , X ) · p(f )C. Frogner Bayesian Interpretations of RegularizationWhat about regularization?RLS:arg minf12nXi=1(yi− f (xi))2+λ2kf k2HIs there a model of Y and f that yields RLS?Yes.e−12σ2εnPi=1(yi−f (xi))2· e−λ2kf k2Hp(Y |f , X ) · p(f )C. Frogner Bayesian Interpretations of RegularizationWhat about regularization?RLS:arg minf12nXi=1(yi− f (xi))2+λ2kf k2HIs there a model of Y and f that yields RLS?Yes.e−12σ2εnPi=1(yi−f (xi))2· e−λ2kf k2Hp(Y |f , X ) · p(f )C. Frogner Bayesian Interpretations of RegularizationPosterior function estimatesGiven data {(xi, yi)}Ni=1and model p(Y |f , X ), p(f ).Find a good f to explain data.(If we can get p(f |Y , X ))Bayes least squares estimate:ˆfBLS= E(f |X ,Y )[f ]i.e. the mean of the posterior.MAP estimate:ˆfMAP(Y |X ) = arg maxfp(f |X , Y )i.e. a mode of the posterior.C. Frogner Bayesian Interpretations of RegularizationPosterior function estimatesGiven data {(xi, yi)}Ni=1and model p(Y |f , X ), p(f ).Find a good f to explain data.(If we can get p(f |Y , X ))Bayes least squares estimate:ˆfBLS= E(f |X ,Y )[f ]i.e. the mean of the posterior.MAP estimate:ˆfMAP(Y |X ) = arg maxfp(f |X , Y )i.e. a mode of the posterior.C. Frogner Bayesian Interpretations of RegularizationPosterior function estimatesGiven data {(xi, yi)}Ni=1and model p(Y |f , X ), p(f ).Find a good f to explain data.(If we can get p(f |Y , X ))Bayes least squares estimate:ˆfBLS= E(f |X ,Y )[f ]i.e. the mean of the posterior.MAP estimate:ˆfMAP(Y |X ) = arg maxfp(f |X , Y )i.e. a mode of the posterior.C. Frogner Bayesian Interpretations of RegularizationA posterior on functions?How to find p(f |Y , X )?Bayes’ rule:p(f |X , Y ) =p(Y |X , f ) · p(f )p(Y |X )=p(Y |X , f ) · p(f )Rp(Y |X , f )dfWhen is this well-defined?C. Frogner Bayesian Interpretations of RegularizationA posterior on functions?How to find p(f |Y , X )?Bayes’ rule:p(f |X , Y ) =p(Y |X , f ) · p(f )p(Y |X )=p(Y |X , f ) · p(f )Rp(Y |X , f )dfWhen is this well-defined?C. Frogner Bayesian Interpretations of RegularizationA posterior on functions?Functions vs. parameters:H∼=RpRepresent functions in H by their coordinates w.r.t. a basis:f ∈ H ↔ θ ∈ RpAssume (for the moment): p < ∞C. Frogner Bayesian Interpretations of RegularizationA posterior on functions?Functions vs. parameters:H∼=RpRepresent functions in H by their coordinates w.r.t. a basis:f ∈ H ↔ θ ∈ RpAssume (for the moment): p < ∞C. Frogner Bayesian Interpretations of RegularizationPosterior for linear RLSLinear function:f (x) = hx, θiNoise model:Y |X , θ ∼ NX θ, σ2εIAdd a prior:θ ∼ N (0, Λ)C. Frogner Bayesian Interpretations of RegularizationPosterior for linear RLSModel:Y |X , θ ∼ NX θ, σ2εI, θ ∼ N (0, Λ)Joint over Y and θ:Yθ∼ N00,X ΛXT+ σ2εI X ΛΛXTΛCondition on Y .C. Frogner Bayesian Interpretations of RegularizationPosterior for linear RLSPosterior:θ|X , Y ∼ Nµθ|X ,Y, Σθ|X ,Ywhereµθ|X ,Y= ΛXT(X ΛXT+ σ2εI)−1YΣθ|X ,Y= Λ −ΛXT(X ΛXT+ σ2εI)−1X ΛThis is Gaussian, soˆθMAP(Y |X ) =ˆθBLS(Y |X ) = ΛXT(X ΛXT+ σ2εI)−1YC. Frogner Bayesian Interpretations of RegularizationPosterior for linear RLSPosterior:θ|X , Y ∼ Nµθ|X ,Y, Σθ|X ,Ywhereµθ|X ,Y= ΛXT(X ΛXT+

View Full Document