CS281B/Stat241B (Spring 2008) Statistical Learning Theory Lecture: 7Reproducing Kernel Hilbert SpacesLecturer: Peter Bartlett Scribe: Chunhui Gu1 Reproducing Kernel Hilbert Spaces1.1 Hilbert Space and KernelAn inner product hu, vi can be1. a usual dot product: hu, vi = v0w =Piviwi2. a kernel product: hu, vi = k(v, w) = ψ(v)0ψ(w) (where ψ(u) may have infinite dimensions)However, an inner product h·, ·i must satisfy the following conditions:1. Symmetryhu, vi = hv, ui ∀u, v ∈ X2. Bilinearityhαu + βv, wi = αhu, wi + βhv, wi ∀u, v, w ∈ X , ∀α, β ∈ R3. Positive definitenesshu, ui ≥ 0, ∀u ∈ Xhu, ui = 0 ⇐⇒ u = 0Now we can define the notion of a Hilbert space.Definition. A Hilbert Space is an inner product space that is complete and separable with respect to thenorm defined by the inner product.Examples of Hilbert spaces include:1. The vector space Rnwith ha, bi = a0b, the ve ctor dot product of a and b.2. The space l2of square summable sequences, with inner product hx, yi =P∞i=1xiyi3. The space L2of square integrable functions (i.e.,Rsf(x)2dx < ∞), with inner product hf, gi =Rsf(x)g(x)dxDefinition. k(·, ·) is a reproducing kernel of a Hilbe rt space H if ∀f ∈ H, f (x) = hk(x, ·), f(·)i.12 Reproducing Kernel Hilbert SpacesA Reproducing Kernel Hilbert Space (RKHS) is a Hilbert space H with a reproducing kernel whose spanis dense in H. We could equivalently define an RKHS as a Hilbert space of functions with all evaluationfunctionals bounded and linear.For instance, the L2space is a Hilbert space, but not an RKHS because the delta function which has thereproducing propertyf(x) =Zsδ(x − u)f(u)dudoes not satisfy the square integrable condition, that is,Zsδ(u)2du 6< ∞,thus the delta function is not in L2.Now let us define a kernel.Definition. k : X × X → R is a kernel if1. k is symmetric: k(x, y) = k(y, x).2. k is positive semi-definite, i.e., ∀x1, x2, ..., xn∈ X , the ”Gram Matrix” K defined by Kij= k(xi, xj) ispositive semi-definite. (A matrix M ∈ Rn×nis positive semi-definite if ∀a ∈ Rn, a0Ma ≥ 0.)Here are some properties of a kernel that are worth noting:1. k(x, x) ≥ 0. (Think about the Gram matrix of n = 1)2. k(u, v) ≤pk(u, u)k(v, v). (This is the Cauchy-Schwarz inequality.)To see why the second property holds, we consider the case when n = 2:Let a =k(v, v)−k(u, v). The Gram matrix K =k(u, u) k(u, v)k(v, u) k(v, v) 0 ⇐⇒ a0Ka ≥ 0⇐⇒ [k(v, v)k(u, u) − k(u, v)2]k(v, v) ≥ 0.By the first property we know k(v, v) ≥ 0, so k(v, v)k(u, u) ≥ k(u, v)2.1.2 Build an Reproducing Kernel Hilbert Space (RKHS)Given a kernel k, define the ”reproducing kernel feature map” Φ : X → RXas:Φ(x) = k(·, x)Consider the vector space:span({Φ(x) : x ∈ X }) = {f (·) =nXi=1αik(·, xi) : n ∈ N, xi∈ X , αi∈ R}For f =Piαik(·, ui) and g =Piβik(·, vi), define hf, gi =Pi,jαiβjk(ui, vj).Note that:hf, k(·, x)i =Xiαik(x, ui) = f(x), i.e., k has the reproducing prop erty.We show that hf, gi is an inner product by checking the following conditions:Reproducing Kernel Hilbert Spaces 31. Symmetry: hf, gi =Pi,jαiβjk(ui, vj) =Pi,jβjαik(vj, ui) = hg, f i2. Bilinearity: hf, gi =Piαig(ui) =Pjβjf(vj)3. Positive definiteness: hf, f i = α0Kα ≥ 0 with equality iff f = 0.From 3 we can also derive:1. hf, gi2≤ hf, f ihg, giProof. ∀a ∈ R, haf + g, af + gi = a2hf, f i + 2ahf, gi + hg, gi ≥ 0. This implies that the quadraticexpression has a non-positive discriminant. Therefore, hf, gi2− hf, fihg, gi ≤ 02. |f(x)|2= hk(·, x), f i2≤ k(x, x)hf, f i, which implies that if hf, f i = 0 then f is identically zero.Now we have defined an inner product space h·, ·i. Complete it to give the Hilbert space.Definition. For a (compact) X ⊆ Rd, and a Hilbert space H of functions f : X → R, we say H is aReproducing Kernel Hilbert Space if ∃k : X → R, s.t.1. k has the reproducing property, i.e., f(x) = hf(·), k(·, x)i2. k spans H = span{k(·, x) : x ∈ X }1.3 Mercer’s TheoremAnother way to characterize a symmetric positive semi-definite kernel k is via the Mercer’s Theorem.Theorem 1.1 (Mercer’s). Suppose k is a continuous positive semi-definite kernel on a compact set X , andthe integral operator Tk: L2(X ) → L2(X ) defined by(Tkf)(·) =ZXk(·, x)f (x)dxis positive semi-definite, that is, ∀f ∈ L2(X ),ZXk(u, v)f (u)f (v)dudv ≥ 0Then there is an orthonormal basis {ψi} of L2(X ) consisting of e igenfunctions of Tksuch that the corresp ond-ing sequence of eigenvalues {λi} are non-negative. The e igenfunctions corresponding to non-zero eigenvaluesare continuous on X and k(u, v) has the representationk(u, v) =∞Xi=1λiψi(u)ψi(v)where the convergence is absolute and uniform, that is,limn→∞supu,v|k(u, v) −nXi=1λiψi(u)ψi(v)| = 04 Reproducing Kernel Hilbert SpacesTo take an analogue in the finite case, that is, X = {x1, . . . , xn}. Let Kij= k(xi, xj), and f : X → Rnwithfi= f(xi). Then,Tkf =nXi=1k(·, xi)fi∀f, f0Kf ≥ 0 ⇒ K 0 ⇒ K =Xλiviv0iHence,k(xi, xj) = Kij= (V ΛV0)ij=nXk=1λkvkivkj=nXk=1λkψk(xi)ψk(xj) ⇒ ψk(xi) = (vk)iWe summarize several equivalent conditions on continuous, symmetric k defined on compact X :1. Every Gram matrix is positive semi-definite.2. Tkis positive semi-definite.3. k can be expressed as k(u, v) =Piλiψi(u)ψi(v).4. k is the reproducing kernel of an RKHS of functions on X
View Full Document