MIT 9.520 - Bayesian Interpretations of Regularization



Bayesian Interpretations of Regularization
Charlie Frogner
9.520 Class 20, April 21, 2010

The Plan

Regularized least squares (RLS) maps a training set {(x_i, y_i)}_{i=1}^n to a function that minimizes the regularized loss:

    f_S = \arg\min_{f \in \mathcal{H}} \frac{1}{2} \sum_{i=1}^n (y_i - f(x_i))^2 + \frac{\lambda}{2} \|f\|_{\mathcal{H}}^2

Can we interpret RLS from a probabilistic point of view?

Some notation

- S = {(x_i, y_i)}_{i=1}^n is the set of observed input/output pairs in R^d × R (the training set).
- X and Y denote the matrices [x_1, ..., x_n]^T ∈ R^{n×d} and [y_1, ..., y_n]^T ∈ R^n, respectively.
- θ is a vector of parameters in R^p.
- p(Y | X, θ) is the joint distribution over the outputs Y, given the inputs X and the parameters.

Where do probabilities show up?

The regularized loss

    \frac{1}{2} \sum_{i=1}^n V(y_i, f(x_i)) + \frac{\lambda}{2} \|f\|_{\mathcal{H}}^2

becomes p(Y | f, X) · p(f):

- Likelihood, a.k.a. noise model: p(Y | f, X). For example,
  - Gaussian: y_i ~ N(f*(x_i), σ²)
  - Poisson: y_i ~ Pois(f*(x_i))
- Prior: p(f).

Estimation

The estimation problem: given data {(x_i, y_i)}_{i=1}^n and a model p(Y | f, X), p(f), find a good f to explain the data.

The Plan

- Maximum likelihood estimation for ERM
- MAP estimation for linear RLS
- MAP estimation for kernel RLS
- Transductive model
- Infinite dimensions get more complicated

Maximum likelihood estimation

Given data {(x_i, y_i)}_{i=1}^n and a model p(Y | f, X), p(f), a good f is one that maximizes p(Y | f, X).

Maximum likelihood and least squares

For least squares, the noise model is

    y_i | f, x_i ~ N(f(x_i), σ²),   a.k.a.   Y | f, X ~ N(f(X), σ² I),

so

    p(Y | f, X) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - f(x_i))^2\right)

Maximum likelihood maximizes this quantity; empirical risk minimization minimizes

    \sum_{i=1}^n (y_i - f(x_i))^2

Since the exponential is monotone and the normalizing constant does not depend on f, maximizing the likelihood and minimizing the empirical risk pick out the same f.

What about regularization?

RLS:

    \arg\min_{f} \frac{1}{2} \sum_{i=1}^n (y_i - f(x_i))^2 + \frac{\lambda}{2} \|f\|_{\mathcal{H}}^2

Is there a model of Y and f that yields RLS? Yes:

    e^{-\frac{1}{2\sigma_\varepsilon^2} \sum_{i=1}^n (y_i - f(x_i))^2} \cdot e^{-\frac{\lambda}{2} \|f\|_{\mathcal{H}}^2}
    \quad \longleftrightarrow \quad p(Y | f, X) \cdot p(f)
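The ML ↔ ERM equivalence can be checked numerically. Below is a minimal sketch, assuming a hypothetical one-dimensional linear model f(x) = w·x with an arbitrarily chosen noise level; all names and values are illustrative, not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 0.3
x = rng.normal(size=n)
y = 2.0 * x + sigma * rng.normal(size=n)   # data from a true slope of 2.0

# ERM: the w minimizing sum_i (y_i - w x_i)^2 has a closed form.
w_erm = (x @ y) / (x @ x)

# ML: evaluate the Gaussian log-likelihood on a grid of slopes
# and take the argmax (a stand-in for the analytic maximizer).
ws = np.linspace(0.0, 4.0, 4001)
loglik = np.array([
    -n / 2 * np.log(2 * np.pi * sigma**2)
    - np.sum((y - w * x) ** 2) / (2 * sigma**2)
    for w in ws
])
w_ml = ws[np.argmax(loglik)]

# The two estimates agree up to the grid resolution (0.001).
print(abs(w_erm - w_ml) < 1e-3)
```

The log-likelihood is a monotone transform of the negated empirical risk, so both criteria recover the same slope.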
Posterior function estimates

Given data {(x_i, y_i)}_{i=1}^n and a model p(Y | f, X), p(f), find a good f to explain the data, assuming we can get the posterior p(f | Y, X).

Bayes least squares estimate:

    \hat{f}_{BLS} = \mathbb{E}_{f | X, Y}[f],

i.e. the mean of the posterior.

MAP estimate:

    \hat{f}_{MAP}(Y | X) = \arg\max_f p(f | X, Y),

i.e. a mode of the posterior.

A posterior on functions?

How do we find p(f | Y, X)? Bayes' rule:

    p(f | X, Y) = \frac{p(Y | X, f) \cdot p(f)}{p(Y | X)} = \frac{p(Y | X, f) \cdot p(f)}{\int p(Y | X, f) \, p(f) \, df}

When is this well-defined?

Functions vs. parameters: \mathcal{H} \cong \mathbb{R}^p. Represent functions in \mathcal{H} by their coordinates with respect to a basis:

    f ∈ H ↔ θ ∈ R^p

Assume (for the moment) that p < ∞.

Posterior for linear RLS

- Linear function: f(x) = ⟨x, θ⟩
- Noise model: Y | X, θ ~ N(Xθ, σ_ε² I)
- Add a prior: θ ~ N(0, Λ)

The joint distribution over Y and θ is

    \begin{pmatrix} Y \\ \theta \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} X \Lambda X^T + \sigma_\varepsilon^2 I & X \Lambda \\ \Lambda X^T & \Lambda \end{pmatrix}\right)

Conditioning on Y gives the posterior

    θ | X, Y ~ N(μ_{θ|X,Y}, Σ_{θ|X,Y}),

where

    \mu_{\theta|X,Y} = \Lambda X^T (X \Lambda X^T + \sigma_\varepsilon^2 I)^{-1} Y
    \Sigma_{\theta|X,Y} = \Lambda - \Lambda X^T (X \Lambda X^T + \sigma_\varepsilon^2 I)^{-1} X \Lambda

The posterior is Gaussian, so its mean and mode coincide:

    \hat{\theta}_{MAP}(Y | X) = \hat{\theta}_{BLS}(Y | X) = \Lambda X^T (X \Lambda X^T + \sigma_\varepsilon^2 I)^{-1} Y
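The closed-form posterior above can be sanity-checked numerically. The sketch below, with arbitrary data and an assumed prior covariance Λ, compares the data-space formulas from the slides against the equivalent parameter-space form Σ = (XᵀX/σ²_ε + Λ⁻¹)⁻¹, μ = Σ XᵀY/σ²_ε. The parameter-space form is a standard consequence of the Woodbury identity, not something stated in the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sig2 = 8, 3, 0.25                      # illustrative sizes and noise variance
X = rng.normal(size=(n, d))
Lam = np.diag([1.0, 2.0, 0.5])               # assumed prior covariance Λ
Y = rng.normal(size=n)                       # arbitrary observed outputs

# Data-space form, as in the slides.
K = X @ Lam @ X.T + sig2 * np.eye(n)         # XΛXᵀ + σ²_ε I
mu = Lam @ X.T @ np.linalg.solve(K, Y)
Sigma = Lam - Lam @ X.T @ np.linalg.solve(K, X @ Lam)

# Parameter-space (Woodbury) form.
Sigma2 = np.linalg.inv(X.T @ X / sig2 + np.linalg.inv(Lam))
mu2 = Sigma2 @ X.T @ Y / sig2

print(np.allclose(mu, mu2) and np.allclose(Sigma, Sigma2))
```

The data-space form inverts an n×n matrix and the parameter-space form a p×p one; which is cheaper depends on whether n or p is larger.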

