6 867 Machine learning Final exam December 3 2004 Your name and MIT ID Optional The grade you would give to yourself a brief justi cation 1 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY Problem 1 5 4 5 4 3 5 3 2 5 2 1 5 1 0 5 0 0 1 2 3 4 5 6 Figure 1 Labeled training points for problem 1 Consider the labeled training points in Figure 1 where x and o denote positive and negative labels respectively We wish to apply AdaBoost with decision stumps to solve the classi cation problem In each boosting iteration we select the stump that minimizes the weighted training error breaking ties arbitrarily 1 3 points In gure 1 draw the decision boundary corresponding to the rst decision stump that the boosting algorithm would choose Label this boundary 1 and also indicate side of the decision boundary 2 2 points In the same gure 1 also circle the point s that have the highest weight after the rst boosting iteration 3 2 points What is the weighted error of the rst decision stump after the rst boosting iteration i e after the points have been reweighted 4 3 points Draw the decision boundary corresponding to the second decision stump again in Figure 1 and label it with 2 also indicating the side of the boundary 2 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY 5 3 points Would some of the points be misclassi ed by the combined classi er after the two boosting iterations Provide a brief justi cation the points will be awarded for the justi cation not whether your y n answer is correct Problem 2 1 2 points Consider a linear SVM trained with n labeled points in R2 without slack penalties and resulting in k 2 support vectors k n By adding one additional labeled training point and retraining the SVM classi er what is the maximum number of support vectors in the resulting solution k k 1 k 2 n 1 2 We train two SVM classi ers to separate points in R2 The classi ers di er only in terms of the kernel function Classi er 1 uses the linear kernel K1 x x xT x and classi er 2 uses K2 x x p x p x where p x is a 3 component Gaussian mixture density estimated on the basis of related other problems a 3 points What is the VC dimension of the second SVM classi er that uses kernel K2 x x b T F 2 points The second SVM classi er can only separate points that are likely according to p x from those that have low probability under p x c 4 points If both SVM classi ers achieve zero training error on n labeled points which classi er would have a better generalization guarantee Provide a brief justi cation 3 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY Problem 3 6 8 4 6 2 4 x2 x2 2 0 0 2 2 4 4 0 2 4 6 8 10 12 14 16 18 x1 a 6 2 0 2 4 x1 6 8 10 b Figure 2 Data sets for clustering Points are located at integer coordinates 1 4 points First consider the data plotted in Figure 2a which consist of two rows of equally spaced points If k means clustering k 2 is initialised with the two points whose coordinates are 9 3 and 11 3 indicate the nal clusters obtained after the algorithm converges on Figure 2a 2 4 points Now consider the data in Figure 2b We will use spectral clustering to divide these points into two clusters Our version of spectral clustering uses a neighbourhood graph obtained by connecting each point to its two nearest neighbors breaking ties randomly and by weighting the resulting edges between points xi and xj by Wij exp xi xj Indicate on Figure 2b the clusters that we will obtain from spectral clustering Provide a brief justi cation 3 4 points Can the solution obtained in the previous part for the data in Figure 2b also be obtained by k means clustering k 2 Justify your answer 4 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY 1 2 1 0 8 y 0 6 0 4 0 2 0 0 2 0 0 2 0 4 0 6 0 8 1 x Figure 3 Training sample from a mixture of two linear models Problem 4 The data in Figure 3 comes from a mixture of two linear regression models with Gaussian noise P y x p1 N y w10 w11 x 12 p2 N y w20 w21 x 22 where p1 p2 1 and p1 p2 w10 w11 w20 w21 1 2 We hope to estimate from such data via the EM algorithm To this end let z 1 2 be the mixture index variable indicating which of the regression models is used to generate y given x 1 6 points Connect the random variables X Y and Z with directed edges so that the graphical model on the left represents the mixture of linear regression models described above and the one on the right represents a mixture of experts model For both models Y denotes the output variable X the input and Z is the choice of the linear regression model or expert mixture of linear regressions mixture of experts X Z X Z Y Y 5 Cite as Tommi Jaakkola course materials for 6 867 Machine Learning Fall 2006 MIT OpenCourseWare http ocw mit edu Massachusetts Institute of Technology Downloaded on DD Month YYYY We use a single plot to represent the model parameters see the gure below Each linear regression model appears as a solid line y wi0 wi1 x in between two parallel dotted lines at vertical distance 2 i to the solid line Thus each regression model covers the data that falls between the dotted lines When w10 w20 and w11 w21 you would only see a single solid line in the gure you may still see two di erent sets of dotted lines corresponding to di erent values of 1 and 2 The solid bar to the right represents p1 and p2 1 p1 For example if p1 p2 w10 w11 0 35 0 65 0 5 0 w20 w21 1 0 85 0 7 0 05 2 0 15 the plot is 1 2 1 0 9 1 0 8 0 8 0 7 0 6 0 6 y 0 5 0 4 0 4 0 3 0 2 0 2 0 0 1 0 2 0 0 2 0 4 0 6 x …
View Full Document
Unlocking...