UC Berkeley
Department of Electrical Engineering and Computer Science
Department of Statistics
EECS 281B / STAT 241B: Advanced Topics in Statistical Learning Theory
Problem Set 2, Spring 2009
Issued: Monday, February 9, 2009        Due: Monday, February 23, 2009

Problem 2.1
True or false: either provide a proof (when true) or an explicit counterexample (when false).

(a) If $K_1$ and $K_2$ are both positive semidefinite (PSD) kernel functions on $\mathcal{X} \times \mathcal{X}$, then $\lambda_1 K_1 + \lambda_2 K_2$ is a PSD kernel function for all $\lambda_i \geq 0$.

(b) Any symmetric function $K$ that is elementwise non-negative (i.e., $K(x, y) \geq 0$ for all $x, y$) is a PSD kernel function.

(c) If $K_1$ and $K_2$ are both PSD kernel functions on $\mathcal{X} \times \mathcal{X}$, then $K(x, y) := K_1(x, y) \, K_2(x, y)$ is also a PSD kernel function.

(d) Given a probability space with events $\mathcal{E}$ and probability law $\mathbb{P}$, the function $K : \mathcal{E} \times \mathcal{E} \to \mathbb{R}$ defined by $K(A, B) := \mathbb{P}(A, B) - \mathbb{P}(A)\,\mathbb{P}(B)$ is a PSD kernel function.

(e) Given a finite set $E$, let $\mathcal{P}(E)$ denote the set of all subsets of $E$. If $K : E \times E \to \mathbb{R}$ is a PSD kernel function, then
$$\bar{K}(A, B) := \sum_{x \in A,\; y \in B} K(x, y)$$
is a PSD kernel function on $\mathcal{P}(E) \times \mathcal{P}(E)$.

Problem 2.2
On the course website, you will find the data set regression.dat in ASCII format, which defines a regression problem in $\mathbb{R}^{10}$. (The first 10 columns correspond to $(x_1, \ldots, x_{10})$, and the final column corresponds to $y \in \mathbb{R}$.)

(a) Fit a linear regression to these data and report the sum of squared errors on the test set regression.test.

(b) Use ordinary PCA to reduce the dimensionality of the covariate space to two dimensions. Fit a linear regression and report the sum of squared errors on the test set regression.test.

(c) Use kernel PCA with a Gaussian kernel $K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$, and reduce the dimensionality of the covariate space to two dimensions. (Propose and implement a method for choosing the bandwidth parameter $\sigma$.)
Fit a linear regression and report the sum of squared errors on the test set regression.test.

Problem 2.3
For each of the following kernels, compute the eigenfunctions and eigenvalues of the operator $T_K : L^2(E) \to L^2(E)$ defined by
$$T_K(f)(x) = \int_E K(x, y) \, f(y) \, dy.$$

(a) For $E = [0, 2\pi]$, the kernel $K(x, y) = \sum_{\ell=0}^{\infty} w_\ell \cos(\ell(x - y))$ for some sequence of weights $w_\ell \geq 0$ such that $\sum_{\ell=0}^{\infty} w_\ell < \infty$.

(b) For $E = [0, 1]$, the polynomial kernel $K(x, y) = (1 + xy)^2$.

Problem 2.4
Consider an RKHS with feature map $\Phi$ and kernel $K$, such that $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}}$ for all $x, y \in E$. Given a data set $\{x^{(1)}, \ldots, x^{(n)}\}$, consider some element $f$ in the linear span of $\{\Phi(x^{(i)}), \; i = 1, 2, \ldots, n\}$; that is, $f = \sum_{i=1}^{n} \alpha_i \Phi(x^{(i)})$ for some fixed coefficients $\alpha \in \mathbb{R}^n$. The projection of a new element $\Phi(x)$ onto $f$ is given by
$$\frac{\langle f, \Phi(x) \rangle_{\mathcal{H}}}{\|f\|_{\mathcal{H}}^2} \, f.$$
Show how to compute the sample variance of this projection, for a fixed $\Phi(x)$, using only the kernel $K$.

Problem 2.5
Given a data set $\{x^{(1)}, \ldots, x^{(n)}\} \subseteq \mathbb{R}^d$, a novelty detection algorithm can be constructed by finding the smallest sphere that contains the data points. (When a new $x$ is observed, it is flagged as "novel" if it lies outside this sphere.) Of course, this idea can also be implemented in a feature space, using some feature map $\Phi$ associated with an RKHS (i.e., $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}}$ for all $x, y \in \mathbb{R}^d$).

(a) Give a precise formulation of the optimization problem to be solved in order to learn a novelty detector. Using Lagrangian methods, compute the dual, and show how its solution requires only computing the kernel matrix $K$ with entries $K_{ij} = K(x^{(i)}, x^{(j)})$.

(b) Extend your algorithm to allow some fraction $\nu > 0$ of the data to lie outside the sphere in feature space.
(Hint: Use slack variables, as in the extension of a hard-margin SVM to a soft-margin SVM.)

Problem 2.6
Concentration bounds play an important role in the analysis of statistical estimators; in this problem, we explore some elementary aspects of concentration.

(a) Prove that if $Z$ is a non-negative random variable with expectation $\mathbb{E}[Z]$, then for all $t > 0$, we have $\mathbb{P}[Z \geq t] \leq \mathbb{E}[Z]/t$.

(b) A zero-mean random variable $X$ is said to be sub-Gaussian with parameter $\sigma > 0$ if $\mathbb{E}[\exp(sX)] \leq \exp\left(\frac{\sigma^2 s^2}{2}\right)$ for all $s \in \mathbb{R}$. Show that $X \sim N(0, \sigma^2)$ is sub-Gaussian.

(c) Suppose that $X$ is Bernoulli with $\mathbb{P}[X = +1] = \mathbb{P}[X = -1] = 1/2$. Show that $X$ is sub-Gaussian. (Can you generalize your argument to any bounded random variable?)

(d) Show that any sub-Gaussian random variable $X$ satisfies the two-sided tail bound
$$\mathbb{P}[|X| > t] \leq 2 \exp\left(-\frac{t^2}{2\sigma^2}\right) \quad \text{for all } t \geq 0.$$

(e) Let $X_1, \ldots, X_n$ be $n$ i.i.d. samples of a sub-Gaussian variable with parameter $\sigma$. Show that for any $\delta > 0$, we have
$$\mathbb{P}\left[\max_{i=1,\ldots,n} X_i > \sqrt{(2 + \delta)\,\sigma^2 \log n}\right] \to 0 \quad \text{as } n \to \infty.$$
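The decay in part (e) can be observed numerically for the Gaussian case of part (b). The following sketch (assuming NumPy; the choices $\delta = 0.5$, $\sigma = 1$, and the trial count are illustrative and not part of the problem statement) estimates the exceedance probability by Monte Carlo for increasing $n$:

```python
import numpy as np

# Monte Carlo illustration of Problem 2.6(e) for i.i.d. N(0, sigma^2)
# samples, which are sub-Gaussian with parameter sigma. The constants
# below (delta, sigma, trials) are arbitrary illustrative choices.
rng = np.random.default_rng(0)
sigma, delta, trials = 1.0, 0.5, 500

def exceed_prob(n):
    """Estimate P[max_{i<=n} X_i > sqrt((2 + delta) * sigma^2 * log n)]."""
    threshold = np.sqrt((2 + delta) * sigma**2 * np.log(n))
    samples = rng.normal(0.0, sigma, size=(trials, n))
    # Fraction of trials in which the maximum exceeds the threshold.
    return float(np.mean(samples.max(axis=1) > threshold))

probs = [exceed_prob(n) for n in (10, 100, 1000, 10000)]
print(probs)  # the estimated probabilities shrink as n grows
```

A union bound with the tail estimate from part (d) gives $\mathbb{P}[\max_i X_i > t] \leq n \exp(-t^2 / 2\sigma^2)$, which at $t = \sqrt{(2+\delta)\sigma^2 \log n}$ equals $n^{-\delta/2} \to 0$; the simulation is simply a sanity check of that rate.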