Regularization via Spectral Filtering
Lorenzo Rosasco
9.520 Class 09
March 2, 2011

About this class

Goal: to discuss how a class of regularization methods, originally designed for solving ill-posed inverse problems, gives rise to regularized learning algorithms. These algorithms are kernel methods that can be easily implemented and have a common derivation, but different computational and theoretical properties.

Plan

- From ERM to Tikhonov regularization.
- Linear ill-posed problems and stability.
- Spectral regularization and filtering.
- Examples of algorithms.

Basic Notation

- Training set S = {(x_1, y_1), ..., (x_n, y_n)}.
- X is the n-by-d input matrix.
- Y = (y_1, ..., y_n) is the output vector.
- k denotes the kernel function, K the n-by-n kernel matrix with entries K_{ij} = k(x_i, x_j), and H the RKHS with kernel k.

The RLS estimator solves

  \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}}^2.

Representer Theorem

We have seen that the RKHS framework allows us to write the RLS estimator in the form

  f_S^\lambda(x) = \sum_{i=1}^{n} c_i k(x, x_i)

with

  (K + n\lambda I)c = Y,

where c = (c_1, \ldots, c_n).

Empirical Risk Minimization

Similarly, we can prove that the solution of empirical risk minimization

  \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2

can be written as

  f_S(x) = \sum_{i=1}^{n} c_i k(x, x_i),

where the coefficients satisfy

  Kc = Y.

The Role of Regularization

We observed that adding a penalization term can be interpreted as a way to control smoothness and avoid overfitting:

  \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 \;\Rightarrow\; \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}}^2.
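The slides state the two linear systems abstractly; as a concrete sketch (my addition, not from the lecture), the following solves (K + nλI)c = Y for the representer coefficients and evaluates the resulting function. The Gaussian kernel, its width, and all parameter values are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X1, X2, width=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

def rls_fit(X, Y, lam, width=1.0):
    """Solve (K + n*lam*I) c = Y for the representer coefficients c."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, width)
    return np.linalg.solve(K + n * lam * np.eye(n), Y)

def rls_predict(X_train, c, X_test, width=1.0):
    # f(x) = sum_i c_i k(x, x_i)
    return gaussian_kernel(X_test, X_train, width) @ c
```

Setting lam = 0 would recover the ERM system Kc = Y, which may be numerically unstable.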
The Role of Regularization

Now we can observe that adding a penalty also has an effect from a numerical point of view:

  Kc = Y \;\Rightarrow\; (K + n\lambda I)c = Y;

it stabilizes a possibly ill-conditioned matrix inversion problem. This is the point of view of regularization for (ill-posed) inverse problems.

Ill-posed Inverse Problems

Hadamard introduced the definition of ill-posedness. Ill-posed problems are typically inverse problems.

Given g ∈ G and f ∈ F, with G, F Hilbert spaces, and a linear, continuous operator L, consider the equation

  g = Lf.

The direct problem is to compute g given f; the inverse problem is to compute f given the data g.

The inverse problem of finding f is well-posed when
- the solution exists,
- is unique, and
- is stable, that is, depends continuously on the data g.

Otherwise the problem is ill-posed.
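A small numerical illustration (my own, not from the slides) of why the penalty stabilizes the inversion: with an ill-conditioned symmetric matrix, a tiny perturbation of Y is amplified by roughly 1/σ_min in the unregularized solution, while the Tikhonov-regularized system keeps the amplification bounded by 1/(nλ). The matrix construction and constants are arbitrary.

```python
import numpy as np

n = 10
# Build a symmetric PSD matrix with rapidly decaying eigenvalues
# (1, 1e-1, ..., 1e-9): an ill-conditioned stand-in for a kernel matrix.
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(n, n)))
K = Q @ np.diag(10.0 ** -np.arange(n)) @ Q.T

Y = np.ones(n)
dY = 1e-6 * np.random.default_rng(2).normal(size=n)  # tiny data perturbation

# Unregularized solve: the perturbation is amplified by up to 1/sigma_min = 1e9.
diff_erm = np.linalg.norm(np.linalg.solve(K, Y + dY) - np.linalg.solve(K, Y))

# Tikhonov-regularized solve: amplification is at most 1/(n*lam).
lam = 1e-3
R = K + n * lam * np.eye(n)
diff_tik = np.linalg.norm(np.linalg.solve(R, Y + dY) - np.linalg.solve(R, Y))

print(diff_erm, diff_tik)  # diff_erm is orders of magnitude larger
```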
Linear System for ERM

In the finite-dimensional case the main problem is numerical stability.

For example, in the learning setting the kernel matrix can be decomposed as K = Q \Sigma Q^T, with \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n), \sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_n \geq 0, and q_1, \ldots, q_n the corresponding eigenvectors. Then

  c = K^{-1} Y = Q \Sigma^{-1} Q^T Y = \sum_{i=1}^{n} \frac{1}{\sigma_i} \langle q_i, Y \rangle q_i.

In correspondence with small eigenvalues, small perturbations of the data can cause large changes in the solution: the problem is ill-conditioned.

Regularization as a Filter

For Tikhonov regularization,

  c = (K + n\lambda I)^{-1} Y = Q (\Sigma + n\lambda I)^{-1} Q^T Y = \sum_{i=1}^{n} \frac{1}{\sigma_i + n\lambda} \langle q_i, Y \rangle q_i.

Regularization filters out the undesired components:
- for \sigma_i \gg n\lambda, \frac{1}{\sigma_i + n\lambda} \sim \frac{1}{\sigma_i};
- for \sigma_i \ll n\lambda, \frac{1}{\sigma_i + n\lambda} \sim \frac{1}{n\lambda}.

Matrix Function

Note that we can look at a scalar function G_\lambda(\sigma) as a function on the kernel matrix. Using the eigendecomposition of K we can define

  G_\lambda(K) = Q G_\lambda(\Sigma) Q^T,

meaning

  G_\lambda(K) Y = \sum_{i=1}^{n} G_\lambda(\sigma_i) \langle q_i, Y \rangle q_i.

For Tikhonov,

  G_\lambda(\sigma) = \frac{1}{\sigma + n\lambda}.
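The matrix-function definition translates directly into code. This sketch (my addition, assuming NumPy and a symmetric PSD K) applies an arbitrary scalar filter to K through its eigendecomposition; it does not check whether the filter is admissible.

```python
import numpy as np

def apply_filter(K, Y, G):
    """Compute G(K) Y = sum_i G(sigma_i) <q_i, Y> q_i for symmetric K."""
    sigmas, Q = np.linalg.eigh(K)      # K = Q diag(sigmas) Q^T
    return Q @ (G(sigmas) * (Q.T @ Y))

def tikhonov(lam, n):
    # Tikhonov filter: G_lambda(sigma) = 1 / (sigma + n*lambda).
    return lambda s: 1.0 / (s + n * lam)
```

For the Tikhonov filter the result coincides with solving (K + nλI)c = Y directly; other filters plug into the same `apply_filter`.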
Regularization in Inverse Problems

- In the inverse problems literature many algorithms are known besides Tikhonov regularization.
- Each algorithm is defined by a suitable filter function G_\lambda.
- This class of algorithms is known collectively as spectral regularization.
- These algorithms are not necessarily based on penalized empirical risk minimization.

Algorithms

- Gradient descent, also known as Landweber iteration or L2 boosting.
- The \nu-method (accelerated Landweber).
- Iterated Tikhonov.
- Truncated singular value decomposition (TSVD), also known as principal component regression (PCR).

The spectral filtering perspective leads to a unified framework.

Properties of Spectral Filters

Not every scalar function defines a regularization scheme. Roughly speaking, a good filter function must have the following properties:
- as \lambda goes to 0, G_\lambda(\sigma) \to 1/\sigma, so that G_\lambda(K) \to K^{-1};
- \lambda controls the magnitude of the (smaller) eigenvalues of G_\lambda(K).

Spectral Regularization for Learning

We can
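The simplest non-Tikhonov entry in the algorithm list can be sketched as follows (my own illustration, not from the slides): Landweber iteration is gradient descent on the least-squares objective, and the number of iterations t plays the role of the regularization parameter, with early stopping acting as regularization. The step-size choice and the test matrix are assumptions.

```python
import numpy as np

def landweber(K, Y, t, eta=None):
    """Landweber iteration for K c = Y: c_{s+1} = c_s + eta * (Y - K c_s).

    Small t means strong regularization (early stopping); as t grows the
    iterates approach K^{-1} Y when K is invertible. This corresponds to
    the filter G_t(sigma) = eta * sum_{i<t} (1 - eta*sigma)^i.
    """
    if eta is None:
        # Any 0 < eta < 2 / sigma_max guarantees convergence.
        eta = 1.0 / np.linalg.eigvalsh(K).max()
    c = np.zeros(len(Y))
    for _ in range(t):
        c = c + eta * (Y - K @ c)
    return c
```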