Reproducing Kernel Hilbert Spaces9.520 Class 03, 15 February 2006Andrea CaponnettoAbout this classGoal To introduce a particularly useful family of hypoth-esis spaces called Reproducing Kernel Hilbert Spaces(RKHS) and to derive the general solution of Tikhonovregularization in RKHS.Here is a graphical example forgeneralization: given a certain number ofsamples...xf(x)suppose this is the “true” solution...xf(x)... but suppose ERM gives this solution!xf(x)RegularizationThe basic idea of regularization (originally introduced in-dependently of the learning problem) is to restore well-posedness of ERM by constraining the hypothesis spaceH. The direct way – minimize the empirical error subjectto f in a ball in an appropriate normed functional spaceH – is called Ivanov regularization. The indirect way isTikhonov regularization (which is not ERM).Ivanov regularization over normed spacesERM finds the function in H which minimizes1nnXi=1V (f(xi), yi)which in general – for arbitrary hypothesis space H – isill-posed. Ivanov regularizes by finding the function thatminimizes1nnXi=1V (f(xi), yi)while satisfyingkfk2H≤ A,with k · k, the norm in the normed function space HFunction spaceA function space is a space made of functions. Eachfunction in the space can be thought of as a point. Ex-amples:1. C[a, b], the set of all real-valued continuous functionsin the interval [a, b];2. L1[a, b], the set of all real-valued functions whose ab-solute value is integrable in the interval [a, b];3. L2[a, b], the set of all real-valued functions square inte-grable in the interval [a, b]Normed spaceA normed space is a linear (vector) space N in which anorm is defined. A nonnegative function k · k is a norm iff∀f, g ∈ N and α ∈ IR1. kfk ≥ 0 and kfk = 0 iff f = 0;2. kf + gk ≤ kfk + kgk;3. kαfk = |α| kfk.Note, if all conditions are satisfied except kfk = 0 iff f = 0then the space has a seminorm instead of a norm.Examples1. A norm in C[a, b] can be established by definingkfk = maxa≤t≤b|f(t)|.2. A norm in L1[a, b] can be established by definingkfk =Zba|f(t)|dt.3. A norm in L2[a, b] can be established by definingkfk = Zbaf2(t)dt!1/2.From Ivanov to Tikhonov regularizationAlternatively, by the Lagrange multipler’s technique, Tikhonovregularization minimizes over the whole normed functionspace H, for a fixed positive parameter λ, the regularizedfunctional1nnXi=1V (f(xi), yi) + λkfk2H. (1)In practice, the normed function space H that we will con-sider, is a Reproducing Kernel Hilbert Space (RKHS).Lagrange multiplier’s techniqueLagrange multiplier’s technique allows the reduction of theconstrained minimization problemMinimize I(x)subject to Φ(x) ≤ A (for some A)to the unconstrained minimization problemMinimize J(x) = I(x) + λΦ(x) (for some λ ≥ 0)Hilbert spaceA Hilbert space is a normed space whose norm is inducedby a dot product hf, gi by the relationkfk =qhf, fi.A Hilbert space must also be complete and separable.• Hilbert spaces generalize the finite Euclidean spaces IRd,and are generally infinite dimensional.• Separability implies that Hilbert spaces have countableorthonormal bases.Examples• Euclidean d-space. The set of d-tuples x = (x1, ..., xd)endowed with the dot producthx, yi =dXi=1xiyi.The corresponding norm iskxk =vuuutdXi=1x2i.The vectorse1= (1, 0, . . . , 0), e2= (0, 1, . . . , 0), . . . , ed= (0, 0, . . . , 1)form an orthonormal basis, that is hei, eji = δij.Examples (cont.)• The function space L2[a, b] consisting of square integrablefunctions. The norm is induced by the dot producthf, gi =Zbaf(t)g(t)dt.It can be proved that this space is complete and separable.An important example of orthogonal basis in this space isthe following set of functions1, cos2πntb − a, sin2πntb − a(n = 1, 2, ...).• C[ a, b] and L1[a, b] are not Hilbert spaces.Evaluation functionalsA linear evaluation functional over the Hilbert space offunctions H is a linear functional Ft: H → IR that evaluateseach function in the space at the point t, orFt[f] = f(t)The functional is bounded if there exists a M s.t.|Ft[f]| = |f(t)| ≤ MkfkH∀f ∈ Hwhere k · kHis the norm in the Hilbert space of functions.• we don’t like the space L2[a, b] because the its evaluationfunctionals are unbounded.Evaluation functionals in Hilbert spaceThe evaluation functional is not bounded in the familiarHilbert space L2([0, 1]), no such M exists and in fact ele-ments of L2([0, 1]) are not even defined pointwise.0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20123456xf(x)RKHSDefinition A (real) RKHS is a Hilbert space of real-valuedfunctions on a domain X (closed bounded subset of IRd)with the property that for each t ∈ X the evaluation func-tional Ftis a bounded linear functional.Reproducing kernel (rk)If H is a RKHS, t hen for each t ∈ X there exists, by theRiesz representation theorem a function Ktof H (calledrepresenter of evaluation) with t he property – called byAronszajn – the reproducing propertyFt[f] = hKt, fiK= f(t).Since Ktis a function in H, by the reproducing property,for each x ∈ XKt(x) = hKt, KxiKThe reproducing kernel (rk) of H isK(t, x) := Kt(x).Positive definite kernelsLet X be some set, for example a subset of IRdor IRditself.A kernel is a symmetric function K : X × X → IR.DefinitionA kernel K(t, s) is positive definite (pd) ifnXi,j=1cicjK(ti, tj) ≥ 0for any n ∈ IN and choice of t1, ..., tn∈ X and c1, ..., cn∈ IR.RKHS and kernelsThe following theorem relates pd kernels and RKHS.Theorema) For every RKHS the rk is a positive definite kernel on.b) Conversely for every positive definite kernel K onX × X there is a unique RKHS on X with K as its rkSketch of proofa) We must prove that the rk K(t, x) = hKt, KxiKis sym-metric and pd.• Symmetry follows from the symmetry property of dotproductshKt, KxiK= hKx, KtiK• K is pd becausenXi,j=1cicjK(ti, tj) =nXi,j=1cicjhKti, KtjiK= ||XcjKtj||2K≥ 0.Sketch of proof (cont.)b) Conversely, given K one can construct the RKHS H asthe completion of the space of functions spanned by theset {Kx|x ∈ X} with a inner product defined as follows.The dot product of two functions f and g in span{Kx|x ∈X}f(x) =sXi=1αiKxi(x)g(x) =s0Xi=1βiKx0i(x)is by definitionhf, giK=sXi=1s0Xj=1αiβjK(xi, x0j).Examples of pd kernelsVery common examples of symmetric pd kernels are• Linear kernelK(x, x0) = x · x0• Gaussian kernelK(x, x0) = ekx−x0k2σ2, σ > 0• Polynomial kernelK(x, x0) = (x · x0+ 1)d, d ∈ INFor specific applications,
View Full Document