MIT 9.520 - Spectral Regularization

Spectral Regularization
Lorenzo Rosasco
9.520 Class 08
March 1, 2010

About this class
Goal: to discuss how a class of regularization methods, originally designed for solving ill-posed inverse problems, gives rise to regularized learning algorithms. These algorithms are kernel methods that can be easily implemented and have a common derivation, but different computational and theoretical properties.

Plan
- From ERM to Tikhonov regularization.
- Linear ill-posed problems and stability.
- Spectral regularization and filtering.
- Examples of algorithms.

Basic Notation
- Training set $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$.
- $X$ is the $n \times d$ input matrix.
- $Y = (y_1, \dots, y_n)$ is the output vector.
- $k$ denotes the kernel function, $K$ the $n \times n$ kernel matrix with entries $K_{ij} = k(x_i, x_j)$, and $\mathcal{H}$ the RKHS with kernel $k$.

The RLS estimator solves
$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}}^2.$$

Representer Theorem
We have seen that RKHSs allow us to write the RLS estimator in the form
$$f_S^{\lambda}(x) = \sum_{i=1}^{n} c_i k(x, x_i)$$
with
$$(K + n\lambda I)c = Y,$$
where $c = (c_1, \dots, c_n)$.

Empirical Risk Minimization
Similarly, we can prove that the solution of empirical risk minimization
$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2$$
can be written as
$$f_S(x) = \sum_{i=1}^{n} c_i k(x, x_i),$$
where the coefficients satisfy $Kc = Y$.

The Role of Regularization
We observed that adding a penalization term can be interpreted as a way to control smoothness and avoid overfitting:
$$\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 \quad\Rightarrow\quad \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}}^2.$$
Now we can observe that adding a penalty also has an effect from a numerical point of view:
$$Kc = Y \quad\Rightarrow\quad (K + n\lambda I)c = Y:$$
it stabilizes a possibly ill-conditioned matrix inversion problem. This is the point of view of regularization for (ill-posed) inverse problems.
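The two linear systems above are one line of numpy each. A minimal sketch (the Gaussian kernel, the toy data, and the function names are illustrative choices, not from the slides) that fits the RLS estimator by solving $(K + n\lambda I)c = Y$ and predicts via the representer expansion:

```python
import numpy as np

def gaussian_kernel(A, B, width=1.0):
    # K_ij = exp(-||a_i - b_j||^2 / (2 width^2)); kernel choice is illustrative.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

def rls_fit(X, Y, lam, width=1.0):
    # Tikhonov / RLS: solve (K + n*lam*I) c = Y.
    n = len(X)
    K = gaussian_kernel(X, X, width)
    return np.linalg.solve(K + n * lam * np.eye(n), Y)

def rls_predict(X_train, c, X_new, width=1.0):
    # Representer theorem: f(x) = sum_i c_i k(x, x_i).
    return gaussian_kernel(X_new, X_train, width) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 1))
Y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)
c = rls_fit(X, Y, lam=1e-3)
print(rls_predict(X, c, X[:5]))
```

Setting `lam=0` would reduce this to the ERM system $Kc = Y$, which is exactly the inversion that the next slides show to be unstable.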
Ill-posed Inverse Problems
Hadamard introduced the definition of ill-posedness. Ill-posed problems are typically inverse problems.
Given $g \in \mathcal{G}$ and $f \in \mathcal{F}$, with $\mathcal{G}, \mathcal{F}$ Hilbert spaces, and a linear, continuous operator $L$, consider the equation
$$g = Lf.$$
The direct problem is to compute $g$ given $f$; the inverse problem is to compute $f$ given the data $g$.
The inverse problem of finding $f$ is well-posed when the solution
- exists,
- is unique, and
- is stable, that is, depends continuously on the initial data $g$.
Otherwise the problem is ill-posed.

Linear System for ERM
In the finite-dimensional case the main problem is numerical stability. For example, in the learning setting the kernel matrix can be decomposed as $K = Q \Sigma Q^{T}$, with $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_n)$, $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0$, and $q_1, \dots, q_n$ the corresponding eigenvectors. Then
$$c = K^{-1}Y = Q\Sigma^{-1}Q^{T}Y = \sum_{i=1}^{n} \frac{1}{\sigma_i} \langle q_i, Y \rangle q_i.$$
In correspondence of small eigenvalues, small perturbations of the data can cause large changes in the solution: the problem is ill-conditioned.

Regularization as a Filter
For Tikhonov regularization
$$c = (K + n\lambda I)^{-1}Y = Q(\Sigma + n\lambda I)^{-1}Q^{T}Y = \sum_{i=1}^{n} \frac{1}{\sigma_i + n\lambda} \langle q_i, Y \rangle q_i.$$
Regularization filters out the undesired components:
- for $\sigma_i \gg n\lambda$, $\frac{1}{\sigma_i + n\lambda} \sim \frac{1}{\sigma_i}$;
- for $\sigma_i \ll n\lambda$, $\frac{1}{\sigma_i + n\lambda} \sim \frac{1}{n\lambda}$.

Matrix Function
Note that we can look at a scalar function $G_\lambda(\sigma)$ as a function on the kernel matrix. Using the eigendecomposition of $K$ we can define
$$G_\lambda(K) = Q G_\lambda(\Sigma) Q^{T},$$
meaning
$$G_\lambda(K)Y = \sum_{i=1}^{n} G_\lambda(\sigma_i) \langle q_i, Y \rangle q_i.$$
For Tikhonov,
$$G_\lambda(\sigma) = \frac{1}{\sigma + n\lambda}.$$

Regularization in Inverse Problems
- In the inverse problems literature many algorithms are known besides Tikhonov regularization.
- Each algorithm is defined by a suitable filter function $G_\lambda$.
- This class of algorithms is known collectively as spectral regularization.
- Algorithms are not necessarily based on penalized empirical risk minimization.
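The effect of the two filters is easy to check numerically. The sketch below (the ill-conditioned matrix is a made-up example, and the helper names are hypothetical) builds $K = Q\Sigma Q^{T}$ with a rapidly decaying spectrum and compares how the ERM filter $1/\sigma_i$ and the Tikhonov filter $1/(\sigma_i + n\lambda)$ react to a tiny perturbation of $Y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
# Symmetric PSD matrix with rapidly decaying spectrum: severely ill-conditioned.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
sig = 10.0 ** -np.arange(n)           # eigenvalues 1, 0.1, 0.01, ...
K = Q @ np.diag(sig) @ Q.T
Y = rng.standard_normal(n)
dY = 1e-6 * rng.standard_normal(n)    # tiny perturbation of the data

def filter_solve(G):
    # c = Q G(Sigma) Q^T Y  -- spectral filtering with a scalar filter G.
    def solve(Y):
        return Q @ (G(sig) * (Q.T @ Y))
    return solve

lam = 1e-3
erm = filter_solve(lambda s: 1.0 / s)               # ERM: c = K^{-1} Y
tik = filter_solve(lambda s: 1.0 / (s + n * lam))   # Tikhonov

# Sensitivity of the solution to the perturbation:
print(np.linalg.norm(erm(Y + dY) - erm(Y)))  # huge: amplified by 1/sigma_min
print(np.linalg.norm(tik(Y + dY) - tik(Y)))  # small: bounded by 1/(n*lam)
```

The Tikhonov solution changes by at most $\|\delta Y\|/(n\lambda)$, while the unregularized solution amplifies the perturbation by $1/\sigma_n$, exactly as the filter formulas predict.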
Filtering, Regularization and Learning
The idea of using regularization from inverse problems in statistics (see Wahba) and machine learning (see Poggio and Girosi) is now well known.
- Ideas coming from inverse problems regarded mostly the use of Tikhonov regularization.
- The notion of filter function was studied in machine learning and gave a connection between function approximation in signal processing and approximation theory. The work of Poggio and Girosi highlighted the relation between neural networks, radial basis functions and regularization.
- Filtering was typically used to define a penalty for Tikhonov regularization; in the following it is used to define algorithms that are different from, though similar to, Tikhonov regularization.

Algorithms
Besides Tikhonov regularization we consider the following algorithms:
- gradient descent, also known as Landweber iteration or L2 boosting;
- ν-method, accelerated
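The first entry of the list already illustrates a non-Tikhonov filter: iterating $c_{t+1} = c_t + \eta(Y - Kc_t)$ from $c_0 = 0$ gives, after $t$ steps, the filter $G_t(\sigma) = (1 - (1 - \eta\sigma)^t)/\sigma$, with the iteration count playing the role of $1/\lambda$ (early stopping regularizes). A sketch under these assumptions; the matrix, step size, and iteration counts are illustrative, not from the slides:

```python
import numpy as np

def landweber(K, Y, eta, t):
    # Landweber / gradient descent on ||Y - Kc||^2, starting from c_0 = 0:
    #   c_{j+1} = c_j + eta * (Y - K c_j)
    # The number of iterations t acts as the regularization parameter.
    c = np.zeros_like(Y)
    for _ in range(t):
        c = c + eta * (Y - K @ c)
    return c

rng = np.random.default_rng(0)
n = 20
A = rng.standard_normal((n, n))
K = A @ A.T / n                            # a generic PSD stand-in for a kernel matrix
Y = rng.standard_normal(n)

eta = 1.0 / np.linalg.eigvalsh(K).max()    # step size eta <= 1/sigma_max for stability
c_few = landweber(K, Y, eta, 10)           # few iterations: strong regularization
c_many = landweber(K, Y, eta, 5000)        # many iterations: approaches K^{-1} Y
print(np.linalg.norm(K @ c_few - Y), np.linalg.norm(K @ c_many - Y))
```

The residual $\|Kc_t - Y\|$ decreases monotonically with $t$, so stopping early keeps the solution away from the unstable inverse while later iterates fit the data ever more closely.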

