CS281B/Stat241B: Advanced Topics in Learning & Decision Making
Multiple Kernels and Reproducing Kernel Hilbert Spaces
Lecturer: Michael I. Jordan    Scribes: Blaine Nelson and Sierra Boyd

1 Convex Programming

Figure 1: Diagram of Convex Programming

2 Multiple Kernels

We continue our discussion of multiple kernels from the previous lecture. Recall
\[
K = \sum_i \mu_i K_i, \qquad K \succeq 0, \qquad \operatorname{tr}(K) \le c.
\]
Given data pairs $\{(x_i, y_i)\}_{i=1}^{n}$ (with $x_i \in \chi$ and $y_i \in \{-1, 1\}$), let $G(K) = \operatorname{diag}(Y) \cdot K \cdot \operatorname{diag}(Y)$, where
\[
\operatorname{diag}(Y) =
\begin{pmatrix}
y_1 & & & \\
 & y_2 & & \\
 & & \ddots & \\
 & & & y_n
\end{pmatrix}.
\]
Thus the $(i,j)$-th element of $G(K)$ is given by $G(K)_{ij} = y_i y_j K(x_i, x_j)$.

The hard-margin SVM associated with $G(K)$ is represented by the following program:
\[
\begin{aligned}
\max_\alpha \quad & 2\alpha^T e - \alpha^T G(K)\,\alpha \\
\text{s.t.} \quad & \alpha^T y = 0, \quad \alpha_i \ge 0,
\end{aligned}
\]
where $e = (1, 1, \ldots, 1)^T$. Moreover, the soft-margin SVM can be represented by a similar program in which the last constraint is replaced by the box constraint $0 \le \alpha_i \le C$.

How can we optimize over both $\alpha$ and $\mu$?

Theorem 1 (Multiple Kernel Optimization Theorem): The problem of optimizing
\[
\begin{aligned}
\max_\alpha \quad & 2\alpha^T e - \alpha^T \bigl(G(K) + \tau I\bigr)\alpha \\
\text{s.t.} \quad & \alpha^T y = 0, \quad 0 \le \alpha_i \le C,
\end{aligned}
\]
with respect to $\{\mu_i\}$, where $\tau$ is a regularization parameter, reduces to
\[
\begin{aligned}
\min_{\{\mu_i\}, t, \lambda, \nu, \delta} \quad & t \\
\text{s.t.} \quad & \operatorname{tr}(K) = c, \quad K = \sum_i \mu_i K_i, \\
& \begin{pmatrix}
G(K) + \tau I & e + \nu - \delta + \lambda y \\
(e + \nu - \delta + \lambda y)^T & t - 2C\delta^T e
\end{pmatrix} \succeq 0, \\
& \nu \ge 0, \quad \delta \ge 0,
\end{aligned}
\]
which is a semidefinite program (SDP).

Proof. Optimizing with respect to $\{\mu_i\}$ is equivalent to the following program:
\[
\min_{\{\mu_i\}} \max_\alpha \; 2\alpha^T e - \alpha^T \bigl(G(K) + \tau I\bigr)\alpha
\;\equiv\;
\min_{\{\mu_i\}, t} \; t
\quad \text{s.t.} \quad
t \ge \max_\alpha \; 2\alpha^T e - \alpha^T \bigl(G(K) + \tau I\bigr)\alpha.
\]
Taking the dual of the inner maximization gives the Lagrangian
\[
L(\alpha, \nu, \lambda, \delta) = 2\alpha^T e - \alpha^T \bigl(G(K) + \tau I\bigr)\alpha + 2\nu^T \alpha + 2\lambda y^T \alpha + 2\delta^T (Ce - \alpha).
\]
Now we maximize over $\alpha$ to find the dual problem. This is done by setting $\partial L / \partial \alpha$ to $0$, which yields $\alpha = \bigl(G(K) + \tau I\bigr)^{-1}(e + \nu - \delta + \lambda y)$. This gives the following dual problem:
\[
\min_{\nu \ge 0,\, \delta \ge 0,\, \lambda}
\underbrace{(e + \nu - \delta + \lambda y)^T \bigl(G(K) + \tau I\bigr)^{-1} (e + \nu - \delta + \lambda y) + 2C\delta^T e}_{\beta}.
\]
Hence the value of the primal statement is at most $t$ if and only if there exist $\nu, \delta, \lambda$ such that $\beta \le t$.

But, from the Schur complement theorem, we know that
\[
\begin{pmatrix} A & B \\ C & D \end{pmatrix} \succeq 0
\;\Longleftrightarrow\;
A \succeq 0 \ \text{and} \ D - C A^{-1} B \succeq 0,
\]
and here $D - C A^{-1} B$ has exactly the form of $t - \beta$. Moreover, from the problem statement, $G(K) + \tau I \succ 0$. Thus
\[
(e + \nu - \delta + \lambda y)^T \bigl(G(K) + \tau I\bigr)^{-1} (e + \nu - \delta + \lambda y) + 2C\delta^T e \le t
\]
if and only if
\[
\begin{pmatrix}
G(K) + \tau I & e + \nu - \delta + \lambda y \\
(e + \nu - \delta + \lambda y)^T & t - 2C\delta^T e
\end{pmatrix} \succeq 0.
\]
Thus the matrix inequality is equivalent to $\beta \le t$, and the resulting SDP is equivalent to the original min-max problem.

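To make the quantities in the theorem concrete, here is a minimal numerical sketch (not part of the original notes). It forms a combined kernel $K = \sum_i \mu_i K_i$ for fixed, hand-picked weights $\mu$, builds $G(K) = \operatorname{diag}(Y)\,K\,\operatorname{diag}(Y)$, and solves the soft-margin dual in $\alpha$ as a QP. The toy data, the two base kernels, the fixed weights, and the use of the cvxpy library are all assumptions made for illustration; the full SDP over $\{\mu_i\}$ from Theorem 1 is not solved here.

```python
import numpy as np
import cvxpy as cp

# Toy data (made up for illustration): x_i in R^2, y_i in {-1, +1}.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

# Two base Gram matrices (linear and Gaussian), each rescaled so tr(K_i) = n.
K1 = X @ X.T
sq = (X ** 2).sum(axis=1)
K2 = np.exp(-0.5 * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
K1 *= n / np.trace(K1)
K2 *= n / np.trace(K2)

# Fixed mixture weights mu (the SDP of Theorem 1 would optimize these);
# K = sum_i mu_i K_i, which here satisfies tr(K) = n.
mu = np.array([0.5, 0.5])
K = mu[0] * K1 + mu[1] * K2
G = np.diag(y) @ K @ np.diag(y)   # G(K)_ij = y_i y_j K(x_i, x_j)

# Soft-margin dual: max 2 a^T e - a^T (G + tau I) a  s.t.  a^T y = 0, 0 <= a_i <= C.
tau, C = 1e-6, 1.0
a = cp.Variable(n)
prob = cp.Problem(
    cp.Maximize(2 * cp.sum(a) - cp.quad_form(a, G + tau * np.eye(n))),
    [a @ y == 0, a >= 0, a <= C],
)
prob.solve()
print("optimal dual objective:", prob.value)
```

Rescaling each base kernel to a common trace plays the role of the constraint $\operatorname{tr}(K) \le c$ above, so that the fixed mixture stays on a comparable scale.
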
3 Hilbert Space

A Hilbert space is essentially a Euclidean space, but one that may be infinite-dimensional. It is a vector space (i.e., it is closed under addition and scalar multiplication, obeys the distributive and associative laws, etc.). It is also endowed with an inner product $\langle \cdot, \cdot \rangle$, a bilinear form obeying the following conditions:
\[
\begin{aligned}
\langle x + y, z \rangle &= \langle x, z \rangle + \langle y, z \rangle \\
\langle \alpha x, y \rangle &= \alpha \langle x, y \rangle \\
\langle x, y \rangle &= \langle y, x \rangle \\
\langle x, x \rangle &\ge 0 \\
\langle x, x \rangle = 0 &\implies x = 0.
\end{aligned}
\]
From $\langle \cdot, \cdot \rangle$ we get a norm $\|\cdot\|$ via $\|x\| \triangleq \langle x, x \rangle^{1/2}$. This norm allows us to define notions of convergence. Adding all limit points of Cauchy sequences (sequences whose elements become arbitrarily close to one another as $i \to \infty$) to our space yields a Hilbert space: a complete inner product space.

Theorem 2 (Cauchy–Schwarz): $\langle x, y \rangle \le \|x\|\,\|y\|$; this is easily proved for any Hilbert space.

3.1 Examples

• $\mathbb{R}^n$: $\langle x, y \rangle = x^T y$
• $L_2$: $\langle x, y \rangle = \int x(t)\, y(t)\, dt$
• $\ell_2$: $\langle x, y \rangle = \sum_{i=1}^{\infty} x_i y_i$

4 Reproducing Kernel Hilbert Spaces

The Hilbert space $L_2$ is too "big" for our purposes, containing too many non-smooth functions. One approach to obtaining restricted, smooth spaces is the Reproducing Kernel Hilbert Space (RKHS) approach. An RKHS is "smaller" than a general Hilbert space.

Given a kernel $k(x, x')$, we will construct a Hilbert space such that $k$ defines an inner product in that space. First define the Gram matrix: given points $x_1, x_2, \ldots, x_n$, define
\[
K_{ij} = k(x_i, x_j).
\]
We say that the kernel $k$ is positive definite (p.d.) if its Gram matrix is positive definite for all choices of $x_1, x_2, \ldots, x_n$. Similarly, we define positive semidefinite (p.s.d.), negative definite, and negative semidefinite kernels by the corresponding properties of the Gram matrix.

The Cauchy–Schwarz inequality holds for p.s.d. kernels:
\[
k(x_1, x_2)^2 \le k(x_1, x_1)\, k(x_2, x_2).
\]
Proof: Form the Gram matrix of the two points $x_1$ and $x_2$:
\[
K = \begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) \\ k(x_2, x_1) & k(x_2, x_2) \end{pmatrix}.
\]
For $K$ to be positive semidefinite as a matrix, its determinant must be nonnegative:
\[
k(x_1, x_1)\, k(x_2, x_2) - k(x_2, x_1)\, k(x_1, x_2) \ge 0,
\]
which implies Cauchy–Schwarz.

Define the following reproducing kernel map for a kernel $k(\cdot, \cdot)$:
\[
\Phi : x \longmapsto k(\cdot, x).
\]
That is, to each point $x$ in the original space we associate the function $k(\cdot, x)$.

Example: Gaussian kernel. Each point $x$ maps to a Gaussian centered at that point, as depicted in Fig. 2. Intuitively, this captures the similarity of $x$ to all other points.

Figure 2: The mapping of an input space to a Gaussian feature space.

We now construct a vector space containing all linear combinations of the functions $k(\cdot, x)$:
\[
f(\cdot) = \sum_{i=1}^{m} \alpha_i k(\cdot, x_i),
\]
for arbitrary $\{x_i \in \chi\}_{i=1}^{m}$. This will be used to construct our RKHS $\mathcal{H}$.

We now define an inner product. Let $f(\cdot) = \sum_i \alpha_i k(\cdot, x_i)$ and $g(\cdot) = \sum_j \beta_j k(\cdot, x'_j)$ with $f, g \in \mathcal{H}$, and define
\[
\langle f, g \rangle \triangleq \sum_{i=1}^{m} \sum_{j=1}^{m'} \alpha_i \beta_j\, k(x_i, x'_j).
\]
We need to verify that this in fact defines an inner product. Symmetry is obvious: $\langle f, g \rangle = \langle g, f \rangle$. Linearity is easy to show. In the next lecture we will establish the key property: $\langle f, f \rangle = 0 \implies f = 0$.

Kernels are analogs of Dirac delta functions. Consider $L_2$ (which is not an RKHS). We have
\[
f(x) = \int f(t)\, \delta(t, x)\, dt,
\]
where $\delta(t, x)$ is the Dirac delta function. The Dirac delta function is the representer of evaluation for $L_2$, but of course it is not itself in $L_2$ (which is consistent with the fact that $L_2$ is not an RKHS).

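As a quick numerical companion to the Gram-matrix argument in this section (my own sketch, assuming a Gaussian kernel and arbitrary made-up points), the snippet below builds $K_{ij} = k(x_i, x_j)$, checks positive semidefiniteness via its eigenvalues, and verifies the $2 \times 2$ determinant form of the kernel Cauchy–Schwarz inequality.

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), a p.s.d. kernel."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

# Arbitrary points x_1, ..., x_n (made up for illustration).
rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 3))

# Gram matrix K_ij = k(x_i, x_j).
K = np.array([[gaussian_kernel(xi, xj) for xj in xs] for xi in xs])

# p.s.d. check: all eigenvalues of the Gram matrix are nonnegative (up to round-off).
assert np.all(np.linalg.eigvalsh(K) >= -1e-10)

# Cauchy-Schwarz for p.s.d. kernels: k(x1, x2)^2 <= k(x1, x1) k(x2, x2),
# i.e. the 2x2 Gram determinant is nonnegative.
det_2x2 = K[0, 0] * K[1, 1] - K[0, 1] * K[1, 0]
assert det_2x2 >= -1e-12
print("2x2 Gram determinant:", det_2x2)
```
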

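Finally, the inner product $\langle f, g \rangle = \sum_i \sum_j \alpha_i \beta_j k(x_i, x'_j)$ defined above can be exercised directly. The sketch below (again an added illustration, with an assumed one-dimensional Gaussian kernel and arbitrary coefficients and centers) represents $f$ and $g$ by their coefficients and centers, checks symmetry, and checks the reproducing property $\langle f, k(\cdot, x) \rangle = f(x)$, which follows immediately from the definition by taking $g = k(\cdot, x)$; it is the RKHS analogue of the Dirac-delta identity at the end of the section.

```python
import numpy as np

def k(x, xp, sigma=1.0):
    """Assumed kernel: one-dimensional Gaussian, k(x, x') = exp(-(x - x')^2 / (2 sigma^2))."""
    return np.exp(-((x - xp) ** 2) / (2 * sigma ** 2))

# f(.) = sum_i alpha_i k(., x_i) and g(.) = sum_j beta_j k(., x'_j), with arbitrary choices.
alpha, f_centers = np.array([0.7, -0.2, 1.1]), np.array([-1.0, 0.3, 2.0])
beta, g_centers = np.array([0.5, 0.4]), np.array([0.0, 1.5])

def evaluate(coefs, centers, x):
    """Pointwise evaluation of a function in the span of the k(., x_i)."""
    return np.sum(coefs * k(centers, x))

def inner(coefs1, centers1, coefs2, centers2):
    """<f, g> = sum_i sum_j alpha_i beta_j k(x_i, x'_j)."""
    return coefs1 @ k(centers1[:, None], centers2[None, :]) @ coefs2

# Symmetry: <f, g> = <g, f>.
assert np.isclose(inner(alpha, f_centers, beta, g_centers),
                  inner(beta, g_centers, alpha, f_centers))

# Reproducing property: <f, k(., x)> = f(x), since k(., x) is the element with a
# single coefficient 1 placed at the point x.
x = 0.8
assert np.isclose(inner(alpha, f_centers, np.array([1.0]), np.array([x])),
                  evaluate(alpha, f_centers, x))
print("f(0.8) =", evaluate(alpha, f_centers, x))
```
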
