UCLA COMSCI 260 - Problem Set 2

CS260: Machine Learning Theory
Problem Set 2
Due Monday, October 24, 2011

Ground Rules:

• This problem set is due at the beginning of class on October 24. Please bring a hard copy of your solutions with you to class. Slightly late assignments should be submitted directly to the grader, and will be penalized 25% (i.e., 25 points). No assignments will be accepted more than 24 hours late.

• You are strongly encouraged to discuss the problem set with other students in the class, as long as you follow the rules outlined in the course academic honesty policy. Don't forget to list all of your collaborators and properly credit any sources that you consult.

• All solutions must be typed; LaTeX is strongly recommended. Hand-written solutions will be penalized 25%, and unreadable answers will not be graded.

• You will be graded on both correctness and clarity. Be concise and clear, especially when writing proofs! If you cannot solve a problem completely, you will get more partial credit if you identify the gaps in your argument rather than trying to cover them up.

Problems:

1. Uniform convergence (25 points)

Consider a finite concept class C and a finite hypothesis class H with C ⊆ H. Let P be a probability distribution over H, with P(h) > 0 for every h ∈ H. Let x_1, x_2, …, x_m be a set of points drawn i.i.d. from a fixed but unknown distribution D. Define err(h) = Pr_{x∼D}(h(x) ≠ c(x)) and êrr(h) = (1/m) |{i : h(x_i) ≠ c(x_i)}|.

Let δ be a fixed parameter in (0, 1/2). Prove that with probability at least 1 − δ, for every h ∈ H,

    |err(h) − êrr(h)| ≤ √( −k ln(P(h) δ) / (2m) )

for some constant k, where the probability is taken over the random choice of the sample x_1, x_2, …, x_m.

2. Learning n-dimensional axis-aligned boxes (continued) (40 points total)

Recall the class of n-dimensional axis-aligned boxes that you examined in Problem Set 1. Each function c in this class is specified by a set of 2n values ℓ_{c,1}, ℓ_{c,2}, …, ℓ_{c,n} and u_{c,1}, u_{c,2}, …, u_{c,n}.
Given an n-dimensional input vector x, c(x) is defined to be 1 if for every i ∈ {1, …, n} the i-th coordinate of x lies in [ℓ_{c,i}, u_{c,i}]. Otherwise, c(x) is defined to be 0.

(a) What is the VC dimension of the class of n-dimensional axis-aligned boxes? Provide a proof of your answer. This proof requires two parts: a proof that at least d points can be shattered, and a proof that no more than d points can be shattered, for some d. (30 points)

(b) Let A be an algorithm that learns n-dimensional axis-aligned boxes in the consistency model, such as the algorithm that you defined in Problem Set 1. Let D be a fixed, unknown distribution over points in R^n, let c be a fixed, unknown n-dimensional axis-aligned box, and let δ be a fixed parameter in (0, 1/2). Suppose A is given as input m points x_1, x_2, …, x_m drawn i.i.d. from D and their corresponding labels c(x_1), c(x_2), …, c(x_m), and it outputs a function h. Using the VC dimension that you derived in part (a), state a rough upper bound (big-O notation is ok) on Pr_{x∼D}(h(x) ≠ c(x)) that holds with probability at least 1 − δ. How does this bound compare to the bound you derived in Problem Set 1? (Don't worry about constants in your comparison.) (10 points)

3. VC dimension of linear threshold functions (35 points total)

Consider the class of linear threshold functions that pass through the origin in n dimensions. Each function c in this class can be represented as an n-dimensional weight vector w_c such that c(x) = 1 if w_c · x ≥ 0, and c(x) = 0 otherwise. In this problem, you will prove that the VC dimension of this class is n.

Hint: Once again, it might help you to first think about small values of n before trying to find a general solution.

(a) First prove that the VC dimension of the set of linear thresholds passing through the origin in n dimensions is at least n. That is, describe a set of n points that can be shattered by this class, and give a brief argument showing that they can be shattered.
Your solution should hold for arbitrary values of n, not just one specific value. (15 points)

(b) Show that no set of n + 1 points can be shattered by this class. Hint: Recall that a set of k points x_1, …, x_k is linearly dependent if Σ_{i=1}^{k} λ_i x_i = 0 for some values λ_1, …, λ_k such that at least one λ_i is non-zero. That is, there exists one point that can be written as a weighted sum of the others. Start by showing that no set of linearly dependent points can be shattered by this class. (20 points)
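For concreteness, the empirical error êrr(h) from Problem 1 and the consistency-model learner referenced in Problem 2(b) can be sketched in code. This is only a sketch under the assumption that the Problem Set 1 algorithm is the usual tightest-fitting box around the positive examples; the function names (`tightest_box`, `box_label`, `empirical_error`) and the 2-D target box are illustrative, not from the course materials.

```python
import random

def tightest_box(points, labels):
    """Consistency-model learner (assumed PS1-style): return the smallest
    axis-aligned box containing every positively labeled sample point.
    Assumes the sample contains at least one positive example."""
    pos = [x for x, y in zip(points, labels) if y == 1]
    n = len(pos[0])
    lower = [min(p[i] for p in pos) for i in range(n)]
    upper = [max(p[i] for p in pos) for i in range(n)]
    return lower, upper

def box_label(box, x):
    """Evaluate an axis-aligned box hypothesis on a point x."""
    lower, upper = box
    return 1 if all(lower[i] <= x[i] <= upper[i] for i in range(len(x))) else 0

def empirical_error(h, points, labels):
    """êrr(h) from Problem 1: fraction of sample points that h mislabels."""
    return sum(h(x) != y for x, y in zip(points, labels)) / len(points)

# Illustrative target box c = [0.2, 0.8] x [0.2, 0.8] and an i.i.d. sample.
random.seed(0)
target = ([0.2, 0.2], [0.8, 0.8])
sample = [(random.random(), random.random()) for _ in range(200)]
labels = [box_label(target, x) for x in sample]

h = tightest_box(sample, labels)
# The learned box contains every positive example and, being the smallest
# such box, lies inside the target box, so it also excludes every negative
# example: its empirical error on the training sample is exactly 0.
assert empirical_error(lambda x: box_label(h, x), sample, labels) == 0.0
```

Because the tightest box always lies inside the target box, a consistent hypothesis exists for every sample, which is exactly what the consistency-model assumption in part (b) requires.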
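In the spirit of the small-n hint in Problem 3, a finite grid search can serve as a numerical sanity check (it is not a proof, and cannot replace the arguments parts (a) and (b) ask for): a labeling realized on the grid is certainly achievable, while a labeling that no weight vector at all can realize will of course also be missed by the grid. The names `classify` and `shatters` are illustrative.

```python
from itertools import product

def classify(w, x):
    """Linear threshold through the origin: 1 if w . x >= 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

def shatters(points, weight_grid):
    """Check whether every labeling of `points` is realized by some weight
    vector in the finite grid (a sufficient, not necessary, certificate)."""
    achieved = {tuple(classify(w, x) for x in points) for w in weight_grid}
    return all(lab in achieved for lab in product([0, 1], repeat=len(points)))

grid = list(product([-1, 0, 1], repeat=2))

# Two points in R^2 can be shattered by origin-through thresholds...
assert shatters([(1, 0), (1, 1)], grid)

# ...but these three cannot: the labeling (1, 1, 0) forces w1 >= 0 and
# w2 >= 0, which makes w . (1, 1) >= 0 as well -- no w at all realizes it.
assert not shatters([(1, 0), (0, 1), (1, 1)], grid)
```

The negative check illustrates the linear-dependence hint in part (b): (1, 1) is the sum of the other two points, and that dependence is what rules out the labeling.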

