6.867 Machine Learning
Final Exam, December 3, 2004

Your name and MIT ID:

(Optional) The grade you would give to yourself + a brief justification.

Cite as: Tommi Jaakkola, course materials for 6.867 Machine Learning, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].

Problem 1

[Figure 1: Labeled training points for problem 1.]

Consider the labeled training points in Figure 1, where 'x' and 'o' denote positive and negative labels, respectively. We wish to apply AdaBoost with decision stumps to solve the classification problem. In each boosting iteration, we select the stump that minimizes the weighted training error, breaking ties arbitrarily.

1. (3 points) In Figure 1, draw the decision boundary corresponding to the first decision stump that the boosting algorithm would choose. Label this boundary (1), and also indicate the +/− side of the decision boundary.

2. (2 points) In the same Figure 1, also circle the point(s) that have the highest weight after the first boosting iteration.

3. (2 points) What is the weighted error of the first decision stump after the first boosting iteration, i.e., after the points have been reweighted?

4. (3 points) Draw the decision boundary corresponding to the second decision stump, again in Figure 1, and label it with (2), also indicating the +/− side of the boundary.

5. (3 points) Would some of the points be misclassified by the combined classifier after the two boosting iterations? Provide a brief justification. (The points will be awarded for the justification, not for whether your y/n answer is correct.)

Problem 2

1. (2 points) Consider a linear SVM trained with n labeled points in R^2 without slack penalties, resulting in k = 2 support vectors (k < n). By adding one additional labeled training point and retraining the SVM classifier, what is the maximum number of support vectors in the resulting solution?

( ) k    ( ) k + 1    ( ) k + 2    ( ) n + 1

2. We train two SVM classifiers to separate points in R^2. The classifiers differ only in terms of the kernel function. Classifier 1 uses the linear kernel K1(x, x') = x^T x', and classifier 2 uses K2(x, x') = p(x) p(x'), where p(x) is a 3-component Gaussian mixture density, estimated on the basis of other, related problems.

(a) (3 points) What is the VC-dimension of the second SVM classifier, which uses kernel K2(x, x')?

(b) (T/F – 2 points) The second SVM classifier can only separate points that are likely according to p(x) from those that have low probability under p(x).

(c) (4 points) If both SVM classifiers achieve zero training error on n labeled points, which classifier would have a better generalization guarantee? Provide a brief justification.
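For reference, the boosting procedure used in Problem 1 (repeatedly pick the decision stump with the lowest weighted training error, then reweight the points) can be written compactly. The sketch below is a minimal NumPy illustration, not part of the exam; the helper names and the exhaustive threshold search are my own choices.

```python
import numpy as np

def best_stump(X, y, w):
    """Pick the axis-aligned decision stump (feature, threshold, sign) with the
    lowest weighted training error under the current weights w."""
    best = None
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        # candidate thresholds: below, between, and above the observed values
        thresholds = np.concatenate(([vals[0] - 1], (vals[:-1] + vals[1:]) / 2, [vals[-1] + 1]))
        for t in thresholds:
            for s in (+1, -1):
                pred = s * np.where(X[:, j] > t, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best                                   # (weighted error, feature, threshold, sign)

def adaboost_stumps(X, y, n_rounds=2):
    """Plain AdaBoost with decision stumps; labels y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))             # start from uniform weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, s = best_stump(X, y, w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))   # the stump's vote
        pred = s * np.where(X[:, j] > t, 1, -1)
        w *= np.exp(-alpha * y * pred)            # upweight the mistakes
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    votes = sum(a * s * np.where(X[:, j] > t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(votes)
```

Running adaboost_stumps for two rounds on a small 2-D data set and inspecting the chosen stumps and the updated weights mirrors what parts 1-4 ask you to do by hand on Figure 1.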
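Similarly, the kernel contrast in Problem 2.2 can be reproduced numerically. The sketch below builds the two Gram matrices, K1(x, x') = x^T x' and K2(x, x') = p(x) p(x'); fitting p with scikit-learn's GaussianMixture and the synthetic data are assumptions made for illustration only, since the exam does not prescribe an implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_related = rng.normal(size=(300, 2))      # stand-in for data from the "related problems"
X_train = rng.normal(size=(20, 2))         # stand-in for the n labeled training points

# p(x): a 3-component Gaussian mixture density, as in the problem statement
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_related)
p = np.exp(gmm.score_samples(X_train))     # score_samples returns the log-density

K1 = X_train @ X_train.T                   # linear kernel: K1(x, x') = x^T x'
K2 = np.outer(p, p)                        # K2(x, x') = p(x) p(x'); an outer product, hence rank 1
```

Either Gram matrix can be passed to an SVM with a precomputed kernel (for example sklearn.svm.SVC(kernel='precomputed')) to compare the two classifiers on the same training points.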
Problem 3

[Figure 2: Data sets for clustering, panels (a) and (b); axes x1 and x2. Points are located at integer coordinates.]

1. (4 points) First consider the data plotted in Figure 2a, which consist of two rows of equally spaced points. If k-means clustering (k = 2) is initialised with the two points whose coordinates are (9, 3) and (11, 3), indicate the final clusters obtained (after the algorithm converges) on Figure 2a.

2. (4 points) Now consider the data in Figure 2b. We will use spectral clustering to divide these points into two clusters. Our version of spectral clustering uses a neighbourhood graph obtained by connecting each point to its two nearest neighbors (breaking ties randomly), and by weighting the resulting edges between points xi and xj by Wij = exp(−||xi − xj||). Indicate on Figure 2b the clusters that we will obtain from spectral clustering. Provide a brief justification.

3. (4 points) Can the solution obtained in the previous part for the data in Figure 2b also be obtained by k-means clustering (k = 2)? Justify your answer.

[Figure 3: Training sample from a mixture of two linear models; axes x and y.]

Problem 4

The data in Figure 3 come from a mixture of two linear regression models with Gaussian noise:

P(y | x; θ) = p1 N(y; w10 + w11 x, σ1^2) + p2 N(y; w20 + w21 x, σ2^2),

where p1 + p2 = 1 and θ = (p1, p2, w10, w11, w20, w21, σ1, σ2). We hope to estimate θ from such data via the EM algorithm. To this end, let z ∈ {1, 2} be the mixture index variable indicating which of the regression models is used to generate y given x.

1. (6 points) Connect the random variables X, Y, and Z with directed edges so that the graphical model on the left represents the mixture of linear regression models described above, and the one on the right represents a mixture-of-experts model. For both models, Y denotes the output variable, X the input, and Z the choice of linear regression model or expert.

[Two node diagrams, each with nodes X, Z, and Y: "mixture of linear regressions" (left) and "mixture of experts" (right).]

We use a single plot to represent the model parameters (see the figure below). Each linear regression model appears as a solid line (y = wi0 + wi1 x) between two parallel dotted lines at vertical distance 2σi from the solid line. Thus each regression model "covers" the data that falls between the dotted lines. When w10 = w20 and w11 = w21, you would only see a single solid line in the figure; you may still see two different sets of dotted lines corresponding to different values of σ1 and σ2. The solid bar to the right represents p1 (and p2 = 1 − p1). For
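For Problem 3.2, the spectral clustering variant described above (a symmetric 2-nearest-neighbour graph with edge weights Wij = exp(−||xi − xj||), cut into two clusters) can be sketched as follows. The exam does not specify which graph Laplacian or embedding to use, so the unnormalized Laplacian and a k-means step on the two smallest eigenvectors are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering_2nn(X, k=2):
    """Two-way spectral clustering on a symmetric 2-nearest-neighbour graph with
    edge weights W_ij = exp(-||x_i - x_j||), as described in Problem 3.2."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances

    # connect each point to its two nearest neighbours (symmetrized)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(D[i])[1:3]:          # indices 1 and 2: skip the point itself
            W[i, j] = W[j, i] = np.exp(-D[i, j])

    # unnormalized graph Laplacian; its smallest eigenvectors embed the points
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)               # eigenvalues returned in ascending order
    embedding = eigvecs[:, :k]

    # cluster the spectral embedding into k groups
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```

Clustering the embedding rather than thresholding a single eigenvector is a robustness choice: if the 2-NN graph is disconnected, the embedding collapses each connected piece to a single point in the embedding space.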
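Problem 4's model, a two-component mixture of linear regressions with Gaussian noise, also translates directly into EM updates. The sketch below is one standard implementation written in the problem's notation (mixing weights p_k, lines w_k0 + w_k1 x, noise levels σ_k); the initialization and iteration count are arbitrary choices, not something the exam specifies.

```python
import numpy as np

def em_mixture_of_lines(x, y, n_iter=100, seed=0):
    """EM for P(y|x) = p1 N(y; w10 + w11 x, s1^2) + p2 N(y; w20 + w21 x, s2^2)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    Phi = np.column_stack([np.ones(n), x])              # design matrix [1, x]
    # crude random initialization of theta = (p, W, sigma)
    p = np.array([0.5, 0.5])
    W = rng.normal(size=(2, 2))                         # W[k] = (w_k0, w_k1)
    sigma = np.array([np.std(y), np.std(y)])

    for _ in range(n_iter):
        # E-step: responsibility r[i, k] proportional to p_k N(y_i; Phi_i W_k, sigma_k^2)
        resid = y[:, None] - Phi @ W.T                  # n x 2 residuals
        log_joint = (np.log(p) - np.log(sigma)
                     - 0.5 * np.log(2 * np.pi) - 0.5 * (resid / sigma) ** 2)
        log_joint -= log_joint.max(axis=1, keepdims=True)   # stabilize before exponentiating
        r = np.exp(log_joint)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weighted least squares per component, then update sigma and p
        for k in range(2):
            Rk = r[:, k]
            A = Phi.T @ (Rk[:, None] * Phi)
            b = Phi.T @ (Rk * y)
            W[k] = np.linalg.solve(A, b)
            err = y - Phi @ W[k]
            sigma[k] = np.sqrt((Rk * err ** 2).sum() / Rk.sum())
        p = r.mean(axis=0)

    return p, W, sigma
```

A quick sanity check is to sample x, generate y from two lines with different slopes and noise levels, and confirm that the recovered p, W, and sigma are close to the generating values.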

