Reproducing Kernel Hilbert SpacesLorenzo Rosasco9.520 Class 03February 13, 2008L. Rosasco RKHSAbout this classGoal To introduce a particularly useful family ofhypothesis spaces called Reproducing KernelHilbert Spaces (RKHS) and to derive the generalsolution of Tikhonov regularization in RKHS.L. Rosasco RKHSFunction Approximation from SamplesHere is a graphical example for generalization: given a certainnumber of samples...L. Rosasco RKHSFunction Approximation from Samples (cont.)Suppose this is the “true” solution...L. Rosasco RKHSThe Problem is Ill-Posed... but suppose ERM gives this solution!L. Rosasco RKHSRegularizationThe basic idea of regularization (originally introducedindependently of the learning problem) is to restorewell-posedness of ERM by constraining the hypothesis spaceH.Penalized MinimizationA possible way to do this is considering penalized empirical riskminimization, that is we look for solutions minimizing a two termfunctionalERR(f )| {z }empirical error+λ pen(f )| {z }penalization termthe regularization parameter λ trade-offs the two terms.L. Rosasco RKHSTikhonov RegularizationTikhonov regularization amounts to minimize1nnXi=1V (f(xi), yi) + λkfk2H, λ > 0 (1)V (f (x), y) is the loss function, that is the price we paywhen we predict f (x) in place of yk · kHis the nor m in the function space HSuch a penalization term should encode some notion ofsmoothness of f .L. Rosasco RKHSThe "Ingredients" of Tikhonov RegularizationThe scheme we just described is very general and bychoosing different loss functions V (f (x), y) we can recoverdifferent algorithmsThe main point we want to discuss is how to choose anorm encoding some notion of smoothness/complexity ofthe solutionReproducing Kernel Hilbert Spaces allow us to do this in avery powerful wayL. Rosasco RKHSSome Functional AnalysisA function space F is a space whose elements are functionsf , for example f : Rd→ R.A norm is a nonnegative function k · k such that ∀f , g ∈ F andα ∈ R1kf k ≥ 0 and kf k = 0 iff f = 0;2kf + gk ≤ kf k + kgk;3kαf k = |α | kf k.A norm can be defined via a dot product kf k =phf , f i.A Hilbert space (besides other technical conditions) is a(possibly) infinite dimensional linear space endowed with a dotproduct.L. Rosasco RKHSSome Functional AnalysisA function space F is a space whose elements are functionsf , for example f : Rd→ R.A norm is a nonnegative function k · k such that ∀f , g ∈ F andα ∈ R1kf k ≥ 0 and kf k = 0 iff f = 0;2kf + gk ≤ kf k + kgk;3kαf k = |α | kf k.A norm can be defined via a dot product kf k =phf , f i.A Hilbert space (besides other technical conditions) is a(possibly) infinite dimensional linear space endowed with a dotproduct.L. Rosasco RKHSSome Functional AnalysisA function space F is a space whose elements are functionsf , for example f : Rd→ R.A norm is a nonnegative function k · k such that ∀f , g ∈ F andα ∈ R1kf k ≥ 0 and kf k = 0 iff f = 0;2kf + gk ≤ kf k + kgk;3kαf k = |α | kf k.A norm can be defined via a dot product kf k =phf , f i.A Hilbert space (besides other technical conditions) is a(possibly) infinite dimensional linear space endowed with a dotproduct.L. Rosasco RKHSSome Functional AnalysisA function space F is a space whose elements are functionsf , for example f : Rd→ R.A norm is a nonnegative function k · k such that ∀f , g ∈ F andα ∈ R1kf k ≥ 0 and kf k = 0 iff f = 0;2kf + gk ≤ kf k + kgk;3kαf k = |α | kf k.A norm can be defined via a dot product kf k =phf , f i.A Hilbert space (besides other technical conditions) is a(possibly) infinite dimensional linear space endowed with a dotproduct.L. Rosasco RKHSExamplesContinuous functions C[a, b] :a norm can be established by definingkf k = maxa≤x≤b|f (x)|(not a Hilbert space!)Square integrable functions L2[a, b]:it is a Hilbert space where the norm is induced by the dotproducthf , gi =Zbaf (x)g(x)dxL. Rosasco RKHSExamplesContinuous functions C[a, b] :a norm can be established by definingkf k = maxa≤x≤b|f (x)|(not a Hilbert space!)Square integrable functions L2[a, b]:it is a Hilbert space where the norm is induced by the dotproducthf , gi =Zbaf (x)g(x)dxL. Rosasco RKHSRKHSA linear evaluation functional over the Hilbert space of functionsH is a linear functional Ft: H → R that evaluates each functionin the space at the point t, orFt[f ] = f (t).DefinitionA Hilbert space H is a reproducing kernel Hilbert space(RKHS) if the evaluation functionals are bounded, i.e. if thereexists a M s.t.|Ft[f ]| = |f (t)| ≤ Mkf kH∀f ∈ HL. Rosasco RKHSRKHSA linear evaluation functional over the Hilbert space of functionsH is a linear functional Ft: H → R that evaluates each functionin the space at the point t, orFt[f ] = f (t).DefinitionA Hilbert space H is a reproducing kernel Hilbert space(RKHS) if the evaluation functionals are bounded, i.e. if thereexists a M s.t.|Ft[f ]| = |f (t)| ≤ Mkf kH∀f ∈ HL. Rosasco RKHSEvaluation functionalsEvaluation functionals are not always bounded.Consider L2[a, b]:Each element of the space is an equivalence class offunctions with the same integralR|f (x)|2dx.An integral remains the same if we change the function ina countable set of points.L. Rosasco RKHSReproducing kernel (rk)If H is a RKHS, then for each t ∈ X there exists, by theRiesz representation theorem a function Ktof H (calledrepresenter) with the reproducing propertyFt[f ] = hKt, f iH= f (t).Since Ktis a function in H, by the reproducing property, foreach x ∈ XKt(x) = hKt, KxiHThe reproducing kernel (rk) of H isK (t, x) := Kt(x)L. Rosasco RKHSReproducing kernel (rk)If H is a RKHS, then for each t ∈ X there exists, by theRiesz representation theorem a function Ktof H (calledrepresenter) with the reproducing propertyFt[f ] = hKt, f iH= f (t).Since Ktis a function in H, by the reproducing property, foreach x ∈ XKt(x) = hKt, KxiHThe reproducing kernel (rk) of H isK (t, x) := Kt(x)L. Rosasco RKHSPositive definite kernelsLet X be some set, for example a subset of Rdor Rditself. Akernel is a symmetric function K : X × X → R.DefinitionA kernel K (t, s) is positive definite (pd) ifnXi,j=1cicjK (ti, tj) ≥ 0for any n ∈ N and choice of t1, ..., tn∈ X and c1, ..., cn∈ R.L. Rosasco RKHSRKHS and kernelsThe following theorem relates pd ker nels and RKHSTheorema) For every RKHS the reproducing kernel is a positive definitekernelb) Conversely for every positive definite kernel K onX × X there is a unique RKHS on X with K as its
View Full Document