UCLA COMSCI 260 - Problem Set 2

CS260: Machine Learning Theory
Problem Set 2
Due Monday, October 24, 2011

Ground Rules:

• This problem set is due at the beginning of class on October 24. Please bring a hard copy of your solutions with you to class. Slightly late assignments should be submitted directly to the grader, and will be penalized 25% (i.e., 25 points). No assignments will be accepted more than 24 hours late.

• You are strongly encouraged to discuss the problem set with other students in the class, as long as you follow the rules outlined in the course academic honesty policy. Don't forget to list all of your collaborators and properly credit any sources that you consult.

• All solutions must be typed; LaTeX is strongly recommended. Hand-written solutions will be penalized 25%, and unreadable answers will not be graded.

• You will be graded on both correctness and clarity. Be concise and clear, especially when writing proofs! If you cannot solve a problem completely, you will get more partial credit if you identify the gaps in your argument rather than trying to cover them up.

Problems:

1. Uniform convergence (25 points)

Consider a finite concept class C and a finite hypothesis class H with C ⊆ H. Let P be a probability distribution over H, with P(h) > 0 for every h ∈ H. Let x_1, x_2, …, x_m be a set of points drawn i.i.d. from a fixed but unknown distribution D. Define err(h) = Pr_{x∼D}(h(x) ≠ c(x)) and êrr(h) = (1/m) |{i : h(x_i) ≠ c(x_i)}|.

Let δ be a fixed parameter in (0, 1/2). Prove that with probability at least 1 − δ, for every h ∈ H,

    |err(h) − êrr(h)| ≤ √( −k ln(P(h) δ) / (2m) )

for some constant k, where the probability is taken over the random choice of the sample x_1, x_2, …, x_m.

2. Learning n-dimensional axis-aligned boxes (continued) (40 points total)

Recall the class of n-dimensional axis-aligned boxes that you examined in Problem Set 1. Each function c in this class is specified by a set of 2n values ℓ_{c,1}, ℓ_{c,2}, …, ℓ_{c,n} and u_{c,1}, u_{c,2}, …, u_{c,n}.
Given an n-dimensional input vector x, c(x) is defined to be 1 if for every i ∈ {1, …, n} the i-th coordinate of x lies in [ℓ_{c,i}, u_{c,i}]. Otherwise, c(x) is defined to be 0.

(a) What is the VC dimension of the class of n-dimensional axis-aligned boxes? Provide a proof of your answer. This proof requires two parts: a proof that at least d points can be shattered, and a proof that no more than d points can be shattered, for some d. (30 points)

(b) Let A be an algorithm that learns n-dimensional axis-aligned boxes in the consistency model, such as the algorithm that you defined in Problem Set 1. Let D be a fixed, unknown distribution over points in R^n, let c be a fixed, unknown n-dimensional axis-aligned box, and let δ be a fixed parameter in (0, 1/2). Suppose A is given as input m points x_1, x_2, …, x_m drawn i.i.d. from D and their corresponding labels c(x_1), c(x_2), …, c(x_m), and it outputs a function h. Using the VC dimension that you derived in part (a), state a rough upper bound (big-O notation is ok) on Pr_{x∼D}(h(x) ≠ c(x)) that holds with probability at least 1 − δ. How does this bound compare to the bound you derived in Problem Set 1? (Don't worry about constants in your comparison.) (10 points)

3. VC dimension of linear threshold functions (35 points total)

Consider the class of linear threshold functions that pass through the origin in n dimensions. Each function c in this class can be represented as an n-dimensional weight vector w_c such that c(x) = 1 if w_c · x ≥ 0, and c(x) = 0 otherwise. In this problem, you will prove that the VC dimension of this class is n.

Hint: Once again, it might help you to first think about small values of n before trying to find a general solution.

(a) First prove that the VC dimension of the set of linear thresholds passing through the origin in n dimensions is at least n. That is, describe a set of n points that can be shattered by this class, and give a brief argument showing that they can be shattered.
Your solution should hold for arbitrary values of n, not just one specific value. (15 points)

(b) Show that no set of n + 1 points can be shattered by this class. Hint: Recall that a set of k points x_1, …, x_k is linearly dependent if Σ_{i=1}^{k} λ_i x_i = 0 for some values λ_1, …, λ_k such that at least one λ_i is non-zero. That is, there exists one point that can be written as a weighted sum of the others. Start by showing that no set of linearly dependent points can be shattered by this class. (20 points)
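For concreteness, the empirical error êrr(h) from Problem 1 and the consistency-model learner referenced in Problem 2(b) can be sketched in code. This is only a sketch under the assumption that the Problem Set 1 algorithm is the usual tightest-fitting box around the positive examples; the function names (`tightest_box`, `box_label`, `empirical_error`) and the 2-D target box are illustrative, not from the course materials.

```python
import random

def tightest_box(points, labels):
    """Consistency-model learner (assumed PS1-style): return the smallest
    axis-aligned box containing every positively labeled sample point.
    Assumes the sample contains at least one positive example."""
    pos = [x for x, y in zip(points, labels) if y == 1]
    n = len(pos[0])
    lower = [min(p[i] for p in pos) for i in range(n)]
    upper = [max(p[i] for p in pos) for i in range(n)]
    return lower, upper

def box_label(box, x):
    """Evaluate an axis-aligned box hypothesis on a point x."""
    lower, upper = box
    return 1 if all(lower[i] <= x[i] <= upper[i] for i in range(len(x))) else 0

def empirical_error(h, points, labels):
    """êrr(h) from Problem 1: fraction of sample points that h mislabels."""
    return sum(h(x) != y for x, y in zip(points, labels)) / len(points)

# Illustrative target box c = [0.2, 0.8] x [0.2, 0.8] and an i.i.d. sample.
random.seed(0)
target = ([0.2, 0.2], [0.8, 0.8])
sample = [(random.random(), random.random()) for _ in range(200)]
labels = [box_label(target, x) for x in sample]

h = tightest_box(sample, labels)
# The learned box contains every positive example and, being the smallest
# such box, lies inside the target box, so it also excludes every negative
# example: its empirical error on the training sample is exactly 0.
assert empirical_error(lambda x: box_label(h, x), sample, labels) == 0.0
```

Because the tightest box always lies inside the target box, a consistent hypothesis exists for every sample, which is exactly what the consistency-model assumption in part (b) requires.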
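In the spirit of the small-n hint in Problem 3, a finite grid search can serve as a numerical sanity check (it is not a proof, and cannot replace the arguments parts (a) and (b) ask for): a labeling realized on the grid is certainly achievable, while a labeling that no weight vector at all can realize will of course also be missed by the grid. The names `classify` and `shatters` are illustrative.

```python
from itertools import product

def classify(w, x):
    """Linear threshold through the origin: 1 if w . x >= 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

def shatters(points, weight_grid):
    """Check whether every labeling of `points` is realized by some weight
    vector in the finite grid (a sufficient, not necessary, certificate)."""
    achieved = {tuple(classify(w, x) for x in points) for w in weight_grid}
    return all(lab in achieved for lab in product([0, 1], repeat=len(points)))

grid = list(product([-1, 0, 1], repeat=2))

# Two points in R^2 can be shattered by origin-through thresholds...
assert shatters([(1, 0), (1, 1)], grid)

# ...but these three cannot: the labeling (1, 1, 0) forces w1 >= 0 and
# w2 >= 0, which makes w . (1, 1) >= 0 as well -- no w at all realizes it.
assert not shatters([(1, 0), (0, 1), (1, 1)], grid)
```

The negative check illustrates the linear-dependence hint in part (b): (1, 1) is the sum of the other two points, and that dependence is what rules out the labeling.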

