6 867 Machine Learning Problem set 4 Due Thursday October 31 in class What and how to turn in Turn in short written answers to the questions explicitly stated and when requested to explain or prove Do not turn in answers when requested to think consider try or experiment except when specifically instructed You may turn in answers to questions marked optional they will be read and corrected but a grade will not be recorded for them Turn in all MATLAB code explicitly requested or that you used to calculate requested values It should be clear exactly what command was used to get the answer to each question To help the graders including yourself please be neat answer the questions briefly and in the order they are stated Staple each Problem separately and be sure to write your name on the top of every page Problem 1 Boosting In this problem we slightly modify the AdaBoost algorithm seen in class to better explore some properties of the algorithm Specifically we no longer normalize the weights on the training examples after each iteration The modified algorithm which is set to run for T iterations is shown in Algorithm 1 Note that in the modified version the weights associated with the training examples are no longer guaranteed to sum to one after each iteration and therefore can not be viewed as a distribution but the algorithm is still valid Let us denote the sum of weights at P t the start of iteration t by Zt ni 1 wi At the start of the first iteration of boosting Z1 n Let us now investigate the behavior of Zt as a function of t 1 10pt At the tth iteration we found a weak classifier that achieves a weighted training t error t Show that the choice of votes given by t 21 log 1 is optimal in the t 1 Algorithm 1 AdaBoost slightly modified version from the lecture the weights are not normalized to sum to one Input Set of n labeled examples x1 y1 xn yn yi 1 1 1 1 Set of associated weights w1 wn initially all wi 1 Required number of iterations T for t 1 T do Find a weak classifier ht which outputs binary predictions ht x 1 such that its weighted training error t n 1 X t w 1 ht xi yi Zt i 1 i satisfies t 1 2 t Set the vote t 12 log 1 t Update the weights for each i 1 n set t 1 wi t wi e yi t ht xi end for Output the combined classifier defined by h T x T X t 1 2 t ht x sense that it minimizes Zt 1 Advice Look at Zt 1 as a function of and find the value for which the function achieves its minimum You may also find the following notational shorthand useful WI n X t wi 1 n X t yi ht xi WC wi yi ht xi i 1 i 1 where WC is the total weight of points classified by ht correctly and WI the total weight of misclassified points y ht x 1 whenever the label predicted by ht is correct and zero otherwise The weights here are those available at the start of iteration t 2 5pt Show that the sum of weights Zt is monotonically decreasing as a function of t We know now that Zt decreases and that our choice of t is optimal for minimizing Zt 1 but is this quantity useful in any way It is useful in bounding how successive boosting iterations reduce the training error 3 10pt Show that the training error the average number of misclassified training examples of the combined classifier after m iterations of boosting h m x m X t ht x t 1 is bounded from above by Zm 1 n Advice In the definition of the algorithm the weights and therefore Zt are defined recursively You will find it helpful to unroll the definition and write out the value of Zt 1 in terms of the initial weights which are all 1 and the subsequent updates using 1 m You may also find the following simple fact helpful for any x 0 ex 1 We have shown that AdaBoost tries in each iteration to minimize an upper bound on the training error of the combined classifier and guarantees that this bound will monotonically decrease as the algorithm proceeds How much the bound decreases at each iteration depends on the weighted training error of each weak classifier In the AdaBoost algorithm presented above the vote t assigned to a weak classifier ht does not differentiate between the directions of mistakes it makes misclassifying a positive example as negative or vice versa We already know that the magnitude of the vote is related to the performance of the classifier that is a classifier which achieves lower weighted training error will get a higher But what if ht makes almost all of its mistakes on say negative examples That is for a training example x if ht x 1 with very high probability x is indeed negative but if ht x 1 it is quite likely that in fact x is negative Recall that the vote t measures the influence of ht in the final combined classifier Thus it would make sense to assign a high vote to its decisions in the direction in which ht makes fewer mistakes and a lower vote to the opposite decisions Algorithm 2 describes the modification in AdaBoost that allows for such a refinement 3 Algorithm 2 Modified boosting algorithm allowing different votes for positive and negative decisions Input Set of n labeled examples x1 y1 xn yn yi 1 1 1 1 Set of associated weights w1 wn initially all wi 1 Required number of iterations T for t 1 T do Find a weak classifier ht which outputs binary predictions ht x 1 such that its weighted training error n 1 X t t w 1 ht xi yi Zt i 1 i satisfies t 1 2 Set the votes t t see Question 5 Update the weights for each i 1 n set t 1 wi where vt x t wi e yi vt xi t t if ht x 1 if ht x 1 end for Output the combined classifier defined by h T x T X vt x t 1 4 3pt Express Zt 1 as a function of t t and the weights at the start of t th iteration 5 12pt Fill in the missing part in the algorithm above That is find the rules for computing t and t that would minimize Zt 1 You may use the following notational t shorthands W is the total weight under the weights active at the beginning of t the t th iteration of the positive examples that are classified as positive by ht W is the total weight of the positive examples misclassified by ht and similarly defined t t W and W 4 Problem 2 Complexity Here we try to understand a bit better some implications …
View Full Document
Unlocking...