UC Berkeley
Department of Electrical Engineering and Computer Science
Department of Statistics

EECS 281B / STAT 241B: Advanced Topics in Statistical Learning Theory
Problem Set 2, Spring 2009
Issued: Monday, February 9, 2009. Due: Monday, February 23, 2009.

Problem 2.1
True or false: either provide a proof (when true) or an explicit counterexample (when false).

(a) If $K_1$ and $K_2$ are both positive semidefinite (PSD) kernel functions on $\mathcal{X} \times \mathcal{X}$, then $\lambda_1 K_1 + \lambda_2 K_2$ is a PSD kernel function for all $\lambda_i \geq 0$.

(b) Any symmetric function $K$ that is elementwise non-negative (i.e., $K(x, y) \geq 0$ for all $x, y$) is a PSD kernel function.

(c) If $K_1$ and $K_2$ are both PSD kernel functions on $\mathcal{X} \times \mathcal{X}$, then $K(x, y) := K_1(x, y) K_2(x, y)$ is also a PSD kernel function.

(d) Given a probability space with events $\mathcal{E}$ and probability law $\mathbb{P}$, the function $K : \mathcal{E} \times \mathcal{E} \to \mathbb{R}$ defined by $K(A, B) := \mathbb{P}(A, B) - \mathbb{P}(A)\,\mathbb{P}(B)$ is a PSD kernel function.

(e) Given a finite set $E$, let $\mathcal{P}(E)$ denote the set of all subsets of $E$. If $K : E \times E \to \mathbb{R}$ is a PSD kernel function, then
$$\bar{K}(A, B) := \sum_{x \in A,\, y \in B} K(x, y)$$
is a PSD kernel function on $\mathcal{P}(E) \times \mathcal{P}(E)$.

Problem 2.2
On the course website, you will find the data set regression.dat in ASCII format, which defines a regression problem in $\mathbb{R}^{10}$. (The first 10 columns correspond to $(x_1, \ldots, x_{10})$ and the final column corresponds to $y \in \mathbb{R}$.)

(a) Fit a linear regression to these data and report the sum of squared errors on the test set regression.test.

(b) Use ordinary PCA to reduce the dimensionality of the covariate space to two dimensions. Fit a linear regression and report the sum of squared errors on the test set regression.test.

(c) Use kernel PCA with a Gaussian kernel $K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$, and reduce the dimensionality of the covariate space to two dimensions. (Propose and implement a method for choosing the bandwidth parameter $\sigma$.) Fit a linear regression and report the sum of squared errors on the test set regression.test.

Problem 2.3
For each of the following kernels, compute the eigenfunctions and eigenvalues of the operator $T_K : L^2(E) \to L^2(E)$ defined by
$$T_K(f)(x) = \int_E K(x, y)\, f(y)\, dy.$$

(a) For $E = [0, 2\pi]$, the kernel $K(x, y) = \sum_{\ell=0}^{\infty} w_\ell \cos(\ell(x - y))$ for some sequence of weights $w_\ell \geq 0$ such that $\sum_{\ell=0}^{\infty} w_\ell < \infty$.

(b) For $E = [0, 1]$, the polynomial kernel $K(x, y) = (1 + xy)^2$.

Problem 2.4
Consider a RKHS with feature map $\Phi$ and kernel $K$, such that $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}}$ for all $x, y \in E$. Given a data set $\{x^{(1)}, \ldots, x^{(n)}\}$, consider some element $f$ in the linear span of $\{\Phi(x^{(i)}),\ i = 1, 2, \ldots, n\}$; that is, $f = \sum_{i=1}^{n} \alpha_i \Phi(x^{(i)})$ for some fixed coefficients $\alpha \in \mathbb{R}^n$. The projection of a new element $\Phi(x)$ onto $f$ is given by
$$\frac{\langle f, \Phi(x) \rangle_{\mathcal{H}}}{\|f\|_{\mathcal{H}}^2}\, f.$$
Show how to compute the sample variance of this projection, for a fixed $\Phi(x)$, using only the kernel $K$.

Problem 2.5
Given a data set $\{x^{(1)}, \ldots, x^{(n)}\} \subseteq \mathbb{R}^d$, a novelty detection algorithm can be constructed by finding the smallest sphere that contains the data points. (When a new $x$ is observed, it is flagged as "novel" if it lies outside this sphere.) Of course, this idea can also be implemented in a feature space, using some feature map $\Phi$ associated with a RKHS (i.e., $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}}$ for all $x, y \in \mathbb{R}^d$).

(a) Give a precise formulation of the optimization problem to be solved in order to learn a novelty detector. Using Lagrangian methods, compute the dual, and show how the solution requires only computing the kernel matrix $K$ with entries $K_{ij} = K(x^{(i)}, x^{(j)})$.

(b) Extend your algorithm to allow some fraction $\nu > 0$ of the data to lie outside the sphere in feature space. (Hint: Use slack variables, as in the extension of a hard-margin SVM to a soft-margin SVM.)
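Computational aside (editor's note, not part of the assignment): Problems 2.2(c), 2.4, and 2.5 all reduce their numerical work to the Gram matrix $K_{ij} = K(x^{(i)}, x^{(j)})$. The sketch below, assuming NumPy and the Gaussian kernel from Problem 2.2(c), shows one way to form this matrix; the median pairwise distance used for $\sigma$ is a common heuristic, offered only as one possible choice for Problem 2.2(c), not the required method.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y=None, sigma=1.0):
    """Gram matrix with entries exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    Y = X if Y is None else Y
    # Expand ||x - y||^2 = ||x||^2 - 2<x, y> + ||y||^2; clip tiny negatives
    # caused by floating-point cancellation before exponentiating.
    sq = (X**2).sum(axis=1)[:, None] - 2.0 * X @ Y.T + (Y**2).sum(axis=1)[None, :]
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def median_bandwidth(X):
    """Median pairwise Euclidean distance: a common heuristic for sigma."""
    sq = (X**2).sum(axis=1)[:, None] - 2.0 * X @ X.T + (X**2).sum(axis=1)[None, :]
    dists = np.sqrt(np.maximum(sq, 0.0))
    return np.median(dists[np.triu_indices_from(dists, k=1)])

# Example usage, assuming regression.dat is whitespace-delimited as described
# in Problem 2.2 (first 10 columns are covariates, last column is y):
# data = np.loadtxt("regression.dat")
# X = data[:, :10]
# K = gaussian_kernel_matrix(X, sigma=median_bandwidth(X))
```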
Problem 2.6
Concentration bounds play an important role in the analysis of statistical estimators; in this problem, we explore some elementary aspects of concentration.

(a) Prove that if $Z$ is a non-negative random variable with expectation $\mathbb{E}[Z]$, then for all $t > 0$, we have $\mathbb{P}[Z \geq t] \leq \mathbb{E}[Z]/t$.

(b) A zero-mean random variable $X$ is said to be sub-Gaussian with parameter $\sigma > 0$ if $\mathbb{E}[\exp(sX)] \leq \exp\left(\frac{\sigma^2 s^2}{2}\right)$ for all $s \in \mathbb{R}$. Show that $X \sim N(0, \sigma^2)$ is sub-Gaussian.

(c) Suppose that $X$ is Bernoulli with $\mathbb{P}[X = +1] = \mathbb{P}[X = -1] = 1/2$. Show that $X$ is sub-Gaussian. (Can you generalize your argument to any bounded random variable?)

(d) Show that any sub-Gaussian random variable $X$ satisfies the two-sided tail bound
$$\mathbb{P}[|X| > t] \leq 2 \exp\left(-\frac{t^2}{2\sigma^2}\right) \quad \text{for all } t \geq 0.$$

(e) Let $X_1, \ldots, X_n$ be $n$ i.i.d. samples of a sub-Gaussian variable with parameter $\sigma$. Show that for any $\delta > 0$, we have
$$\mathbb{P}\left[\max_{i=1,\ldots,n} X_i > \sqrt{(2 + \delta)\,\sigma^2 \log n}\right] \to 0 \quad \text{as } n \to \infty.$$
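Numerical aside (editor's note, again not a solution): the two-sided bound in Problem 2.6(d) can be sanity-checked by Monte Carlo for $X \sim N(0, \sigma^2)$, which part (b) shows is sub-Gaussian with parameter $\sigma$. The sample size and the grid of $t$ values below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 2.0, 1_000_000
x = rng.normal(0.0, sigma, size=n)

for t in [1.0, 2.0, 4.0, 6.0]:
    empirical = np.mean(np.abs(x) > t)              # Monte Carlo estimate of P[|X| > t]
    bound = 2.0 * np.exp(-t**2 / (2.0 * sigma**2))  # tail bound from part (d)
    print(f"t = {t:4.1f}   empirical = {empirical:.3e}   bound = {bound:.3e}")
```

The empirical frequency should sit below the bound at every $t$; for the Gaussian case the bound is loose by roughly an order of magnitude in this range, as expected for a bound that holds for every sub-Gaussian distribution.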

