UC Berkeley
Department of Electrical Engineering and Computer Science
Department of Statistics

EECS 281B / STAT 241B: Advanced Topics in Statistical Learning Theory
Problem Set 2, Spring 2009
Issued: Monday, February 9, 2009. Due: Monday, February 23, 2009.

Problem 2.1
True or false: either provide a proof (when true) or an explicit counterexample (when false).

(a) If $K_1$ and $K_2$ are both positive semidefinite (PSD) kernel functions on $\mathcal{X} \times \mathcal{X}$, then $\lambda_1 K_1 + \lambda_2 K_2$ is a PSD kernel function for all $\lambda_i \geq 0$.

(b) Any symmetric function $K$ that is elementwise non-negative (i.e., $K(x, y) \geq 0$ for all $x, y$) is a PSD kernel function.

(c) If $K_1$ and $K_2$ are both PSD kernel functions on $\mathcal{X} \times \mathcal{X}$, then $K(x, y) := K_1(x, y) K_2(x, y)$ is also a PSD kernel function.

(d) Given a probability space with events $\mathcal{E}$ and probability law $\mathbb{P}$, the function $K : \mathcal{E} \times \mathcal{E} \to \mathbb{R}$ defined by $K(A, B) := \mathbb{P}(A, B) - \mathbb{P}(A)\,\mathbb{P}(B)$ is a PSD kernel function.

(e) Given a finite set $E$, let $\mathcal{P}(E)$ denote the set of all subsets of $E$. If $K : E \times E \to \mathbb{R}$ is a PSD kernel function, then
$$\bar{K}(A, B) := \sum_{x \in A,\, y \in B} K(x, y)$$
is a PSD kernel function on $\mathcal{P}(E) \times \mathcal{P}(E)$.

Problem 2.2
On the course website, you will find the data set regression.dat in ASCII format, which defines a regression problem in $\mathbb{R}^{10}$. (The first 10 columns correspond to $(x_1, \ldots, x_{10})$ and the final column corresponds to $y \in \mathbb{R}$.)

(a) Fit a linear regression to these data and report the sum of squared errors on the test set regression.test.

(b) Use ordinary PCA to reduce the dimensionality of the covariate space to two dimensions. Fit a linear regression and report the sum of squared errors on the test set regression.test.

(c) Use kernel PCA with a Gaussian kernel $K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$, and reduce the dimensionality of the covariate space to two dimensions. (Propose and implement a method for choosing the bandwidth parameter $\sigma$.) Fit a linear regression and report the sum of squared errors on the test set regression.test.

Problem 2.3
For each of the following kernels, compute the eigenfunctions and eigenvalues of the operator $T_K : L^2(E) \to L^2(E)$ defined by
$$T_K(f)(x) = \int_E K(x, y)\, f(y)\, dy.$$

(a) For $E = [0, 2\pi]$, the kernel $K(x, y) = \sum_{\ell=0}^{\infty} w_\ell \cos(\ell(x - y))$ for some sequence of weights $w_\ell \geq 0$ such that $\sum_{\ell=0}^{\infty} w_\ell < \infty$.

(b) For $E = [0, 1]$, the polynomial kernel $K(x, y) = (1 + xy)^2$.

Problem 2.4
Consider a RKHS with feature map $\Phi$ and kernel $K$, such that $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}}$ for all $x, y \in E$. Given a data set $\{x^{(1)}, \ldots, x^{(n)}\}$, consider some element $f$ in the linear span of $\{\Phi(x^{(i)}),\ i = 1, 2, \ldots, n\}$; that is, $f = \sum_{i=1}^{n} \alpha_i \Phi(x^{(i)})$ for some fixed coefficients $\alpha \in \mathbb{R}^n$. The projection of a new element $\Phi(x)$ onto $f$ is given by
$$\frac{\langle f, \Phi(x) \rangle_{\mathcal{H}}}{\|f\|_{\mathcal{H}}^2}\, f.$$
Show how to compute the sample variance of this projection, for a fixed $\Phi(x)$, using only the kernel $K$.

Problem 2.5
Given a data set $\{x^{(1)}, \ldots, x^{(n)}\} \subseteq \mathbb{R}^d$, a novelty detection algorithm can be constructed by finding the smallest sphere that contains the data points. (When a new $x$ is observed, it is flagged as "novel" if it lies outside this sphere.) Of course, this idea can also be implemented in a feature space, using some feature map $\Phi$ associated with a RKHS (i.e., $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}}$ for all $x, y \in \mathbb{R}^d$).

(a) Give a precise formulation of the optimization problem to be solved in order to learn a novelty detector. Using Lagrangian methods, compute the dual, and show how the solution requires only computing the kernel matrix $K$ with entries $K_{ij} = K(x^{(i)}, x^{(j)})$.

(b) Extend your algorithm to allow some fraction $\nu > 0$ of the data to lie outside the sphere in feature space. (Hint: Use slack variables, as in the extension of a hard-margin SVM to a soft-margin SVM.)
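Computational aside (editor's note, not part of the assignment): Problems 2.2(c), 2.4, and 2.5 all reduce their numerical work to the Gram matrix $K_{ij} = K(x^{(i)}, x^{(j)})$. The sketch below, assuming NumPy and the Gaussian kernel from Problem 2.2(c), shows one way to form this matrix; the median pairwise distance used for $\sigma$ is a common heuristic, offered only as one possible choice for Problem 2.2(c), not the required method.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y=None, sigma=1.0):
    """Gram matrix with entries exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    Y = X if Y is None else Y
    # Expand ||x - y||^2 = ||x||^2 - 2<x, y> + ||y||^2; clip tiny negatives
    # caused by floating-point cancellation before exponentiating.
    sq = (X**2).sum(axis=1)[:, None] - 2.0 * X @ Y.T + (Y**2).sum(axis=1)[None, :]
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def median_bandwidth(X):
    """Median pairwise Euclidean distance: a common heuristic for sigma."""
    sq = (X**2).sum(axis=1)[:, None] - 2.0 * X @ X.T + (X**2).sum(axis=1)[None, :]
    dists = np.sqrt(np.maximum(sq, 0.0))
    return np.median(dists[np.triu_indices_from(dists, k=1)])

# Example usage, assuming regression.dat is whitespace-delimited as described
# in Problem 2.2 (first 10 columns are covariates, last column is y):
# data = np.loadtxt("regression.dat")
# X = data[:, :10]
# K = gaussian_kernel_matrix(X, sigma=median_bandwidth(X))
```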
Problem 2.6
Concentration bounds play an important role in the analysis of statistical estimators; in this problem, we explore some elementary aspects of concentration.

(a) Prove that if $Z$ is a non-negative random variable with expectation $\mathbb{E}[Z]$, then for all $t > 0$, we have $\mathbb{P}[Z \geq t] \leq \mathbb{E}[Z]/t$.

(b) A zero-mean random variable $X$ is said to be sub-Gaussian with parameter $\sigma > 0$ if $\mathbb{E}[\exp(sX)] \leq \exp\left(\frac{\sigma^2 s^2}{2}\right)$ for all $s \in \mathbb{R}$. Show that $X \sim N(0, \sigma^2)$ is sub-Gaussian.

(c) Suppose that $X$ is Bernoulli with $\mathbb{P}[X = +1] = \mathbb{P}[X = -1] = 1/2$. Show that $X$ is sub-Gaussian. (Can you generalize your argument to any bounded random variable?)

(d) Show that any sub-Gaussian random variable $X$ satisfies the two-sided tail bound
$$\mathbb{P}[|X| > t] \leq 2 \exp\left(-\frac{t^2}{2\sigma^2}\right) \quad \text{for all } t \geq 0.$$

(e) Let $X_1, \ldots, X_n$ be $n$ i.i.d. samples of a sub-Gaussian variable with parameter $\sigma$. Show that for any $\delta > 0$, we have
$$\mathbb{P}\left[\max_{i=1,\ldots,n} X_i > \sqrt{(2 + \delta)\,\sigma^2 \log n}\right] \to 0 \quad \text{as } n \to \infty.$$
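Numerical aside (editor's note, again not a solution): the two-sided bound in Problem 2.6(d) can be sanity-checked by Monte Carlo for $X \sim N(0, \sigma^2)$, which part (b) shows is sub-Gaussian with parameter $\sigma$. The sample size and the grid of $t$ values below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 2.0, 1_000_000
x = rng.normal(0.0, sigma, size=n)

for t in [1.0, 2.0, 4.0, 6.0]:
    empirical = np.mean(np.abs(x) > t)              # Monte Carlo estimate of P[|X| > t]
    bound = 2.0 * np.exp(-t**2 / (2.0 * sigma**2))  # tail bound from part (d)
    print(f"t = {t:4.1f}   empirical = {empirical:.3e}   bound = {bound:.3e}")
```

The empirical frequency should sit below the bound at every $t$; for the Gaussian case the bound is loose by roughly an order of magnitude in this range, as expected for a bound that holds for every sub-Gaussian distribution.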

