6.867 Machine Learning
Problem set 4
Due Thursday, October 31, in class

What and how to turn in?

Turn in short written answers to the questions explicitly stated, and when requested to explain or prove. Do not turn in answers when requested to "think", "consider", "try" or "experiment" (except when specifically instructed). You may turn in answers to questions marked "optional"; they will be read and corrected, but a grade will not be recorded for them.

Turn in all MATLAB code explicitly requested, or that you used to calculate requested values. It should be clear exactly what command was used to get the answer to each question.

To help the graders (including yourself...), please be neat, answer the questions briefly, and in the order they are stated. Staple each "Problem" separately, and be sure to write your name on the top of every page.

Problem 1: Boosting

In this problem, we slightly modify the AdaBoost algorithm seen in class to better explore some properties of the algorithm. Specifically, we no longer normalize the weights on the training examples after each iteration. The modified algorithm, which is set to run for $T$ iterations, is shown in Algorithm 1.

Algorithm 1  AdaBoost: slightly modified version from the lecture (the weights are not normalized to sum to one).
  Input: Set of $n$ labeled examples $(x_1, y_1), \ldots, (x_n, y_n)$, $y_i = \pm 1$.
         Set of associated weights $w_1^{(1)}, \ldots, w_n^{(1)}$, initially all $w_i^{(1)} = 1$.
         Required number of iterations, $T$.
  for $t = 1, \ldots, T$ do
      Find a weak classifier $h_t$, which outputs binary predictions $h_t(x) = \pm 1$, such that its weighted training error
          $\epsilon_t = \frac{1}{Z_t} \sum_{i=1}^{n} w_i^{(t)} \bigl(1 - \delta(h_t(x_i), y_i)\bigr)$
      satisfies $\epsilon_t < 1/2$.
      Set the vote $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}$.
      Update the weights: for each $i = 1, \ldots, n$ set
          $w_i^{(t+1)} = w_i^{(t)} e^{-y_i \alpha_t h_t(x_i)}$
  end for
  Output: the combined classifier defined by
      $\hat{h}_T(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$

Note that in the modified version, the weights associated with the training examples are no longer guaranteed to sum to one after each iteration (and therefore cannot be viewed as a "distribution"), but the algorithm is still valid. Let us denote the sum of weights at the start of iteration $t$ by $Z_t = \sum_{i=1}^{n} w_i^{(t)}$. At the start of the first iteration of boosting, $Z_1 = n$.

Let us now investigate the behavior of $Z_t$ as a function of $t$.

1. [10pt] At the $t$-th iteration we found a weak classifier that achieves a weighted training error $\epsilon_t$. Show that the choice of "votes" given by $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}$ is optimal in the sense that it minimizes $Z_{t+1}$.

   Advice: Look at $Z_{t+1}$ as a function of $\alpha$, and find the value for which the function achieves its minimum. You may also find the following notational shorthand useful:
       $W_I = \sum_{i=1}^{n} w_i^{(t)} \bigl(1 - \delta(y_i, h_t(x_i))\bigr)$,   $W_C = \sum_{i=1}^{n} w_i^{(t)} \delta(y_i, h_t(x_i))$,
   where $W_C$ is the total weight of points classified by $h_t$ correctly, and $W_I$ the total weight of misclassified points. Here $\delta(y, h_t(x)) = 1$ whenever the label predicted by $h_t$ is correct and zero otherwise. The weights are those available at the start of iteration $t$.

2. [5pt] Show that the sum of weights $Z_t$ is monotonically decreasing as a function of $t$.

We know now that $Z_t$ decreases, and that our choice of $\alpha_t$ is optimal for minimizing $Z_{t+1}$; but is this quantity useful in any way? It is useful in bounding how successive boosting iterations reduce the training error.
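As an aside (not part of the assignment), the loop of Algorithm 1 can be sketched in MATLAB roughly as follows. The function name adaboost_unnormalized, the weak_learner handle, and the variables X and y are illustrative assumptions, not anything given in the problem set; the sketch only mirrors the update rules of Algorithm 1 under the assumption that each weak classifier indeed achieves $\epsilon_t < 1/2$.

% Illustrative MATLAB sketch of Algorithm 1 (weights left unnormalized).
% Assumes X is an n-by-d data matrix, y an n-by-1 vector of labels +/-1,
% and weak_learner a hypothetical handle that, given weighted data,
% returns a classifier handle h with h(X) producing values in {-1, +1}.
function [alphas, hs, Z] = adaboost_unnormalized(X, y, weak_learner, T)
    n = size(X, 1);
    w = ones(n, 1);                       % w_i^{(1)} = 1, so Z_1 = n
    alphas = zeros(T, 1);
    hs = cell(T, 1);
    Z = zeros(T + 1, 1);
    for t = 1:T
        Z(t) = sum(w);                    % Z_t, sum of weights at the start of iteration t
        h = weak_learner(X, y, w);        % weak classifier h_t
        pred = h(X);                      % predictions h_t(x_i)
        eps_t = sum(w .* (pred ~= y)) / Z(t);         % weighted training error (assumed < 1/2)
        alphas(t) = 0.5 * log((1 - eps_t) / eps_t);   % the vote alpha_t
        w = w .* exp(-y .* alphas(t) .* pred);        % unnormalized weight update
        hs{t} = h;
    end
    Z(T + 1) = sum(w);                    % Z_{T+1}, the quantity studied in Questions 2 and 3
end

The combined classifier $\hat{h}_T(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$ is then assembled from alphas and hs, and Z(T+1)/n is the quantity that Question 3 relates to its training error.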
3. [10pt] Show that the training error (the average number of misclassified training examples) of the combined classifier after $m$ iterations of boosting,
       $\hat{h}_m(x) = \sum_{t=1}^{m} \alpha_t h_t(x)$,
   is bounded from above by $Z_{m+1}/n$.

   Advice: In the definition of the algorithm, the weights (and therefore $Z_t$) are defined recursively. You will find it helpful to "unroll" the definition, and write out the value of $Z_{t+1}$ in terms of the initial weights (which are all 1) and the subsequent updates, using $\alpha_1, \ldots, \alpha_m$. You may also find the following (simple) fact helpful: for any $x \geq 0$, $e^x \geq 1$.

We have shown that AdaBoost tries in each iteration to minimize an upper bound on the training error of the combined classifier, and guarantees that this bound will monotonically decrease as the algorithm proceeds. How much the bound decreases at each iteration depends on the weighted training error of each weak classifier.

In the AdaBoost algorithm presented above, the vote $\alpha_t$ assigned to a weak classifier $h_t$ does not differentiate between the "directions" of the mistakes it makes (misclassifying a positive example as negative or vice versa). We already know that the magnitude of the vote is related to the performance of the classifier; that is, a classifier which achieves lower weighted training error will get a higher $\alpha$. But what if $h_t$ makes almost all of its mistakes on, say, negative examples? That is, for a training example $x$, if $h_t(x) = -1$, then with very high probability $x$ is indeed negative, but if $h_t(x) = +1$, it is quite likely that in fact $x$ is negative.

Recall that the vote $\alpha_t$ measures the influence of $h_t$ in the final (combined) classifier. Thus, it would make sense to assign a high vote to its decisions in the "direction" in which $h_t$ makes fewer mistakes, and a lower vote to the opposite decisions. Algorithm 2 describes the modification in AdaBoost that allows for such a refinement.

Algorithm 2  Modified boosting algorithm, allowing different votes for positive and negative decisions.
  Input: Set of $n$ labeled examples $(x_1, y_1), \ldots, (x_n, y_n)$, $y_i = \pm 1$.
         Set of associated weights $w_1^{(1)}, \ldots, w_n^{(1)}$, initially all $w_i^{(1)} = 1$.
         Required number of iterations, $T$.
  for $t = 1, \ldots, T$ do
      Find a weak classifier $h_t$, which outputs binary predictions $h_t(x) = \pm 1$, such that its weighted training error
          $\epsilon_t = \frac{1}{Z_t} \sum_{i=1}^{n} w_i^{(t)} \bigl(1 - \delta(h_t(x_i), y_i)\bigr)$
      satisfies $\epsilon_t < 1/2$.
      Set the votes $\alpha_t, \beta_t$ (see Question 5).
      Update the weights: for each $i = 1, \ldots, n$ set
          $w_i^{(t+1)} = w_i^{(t)} e^{-y_i v_t(x_i)}$
      where
          $v_t(x) = \alpha_t$ if $h_t(x) = 1$, and $v_t(x) = -\beta_t$ if $h_t(x) = -1$.
  end for
  Output: the combined classifier defined by
      $\hat{h}_T(x) = \sum_{t=1}^{T} v_t(x)$

4. [3pt] Express $Z_{t+1}$ as a function of $\alpha_t$, $\beta_t$ and the weights at the start of the $t$-th iteration.

5. [12pt] Fill in the missing part in the algorithm above. That is, find the rules for computing $\alpha_t$ and $\beta_t$ that would minimize $Z_{t+1}$. You may use the following notational shorthands: $W_{++}^{(t)}$ is the total weight (under the weights active at the beginning of the $t$-th iteration) of the positive examples that are classified as positive by $h_t$; $W_{+-}^{(t)}$ is the total weight of the positive examples misclassified by $h_t$; and similarly defined $W_{--}^{(t)}$ and $W_{-+}^{(t)}$.
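Again as an aside, one weight update of Algorithm 2 might be sketched in MATLAB as below. The rules for $\alpha_t$ and $\beta_t$ are deliberately left as inputs, since deriving them is exactly what Question 5 asks for; the four weight totals are computed only to make the notation $W_{++}^{(t)}, W_{+-}^{(t)}, W_{--}^{(t)}, W_{-+}^{(t)}$ concrete. The function name asymmetric_update and the variable names are illustrative assumptions.

% Illustrative MATLAB sketch of one weight update of Algorithm 2.
% w holds the weights w^{(t)}, y the labels +/-1, pred the predictions h_t(x_i);
% alpha_t and beta_t must come from the rule you derive in Question 5.
function [w_next, W] = asymmetric_update(w, y, pred, alpha_t, beta_t)
    % The four weight totals of Question 5, under the weights at the start of iteration t.
    W.pp = sum(w .* (y ==  1 & pred ==  1));   % W_{++}: positives classified as positive
    W.pm = sum(w .* (y ==  1 & pred == -1));   % W_{+-}: positives misclassified as negative
    W.mm = sum(w .* (y == -1 & pred == -1));   % W_{--}: negatives classified as negative
    W.mp = sum(w .* (y == -1 & pred ==  1));   % W_{-+}: negatives misclassified as positive
    % v_t(x) = alpha_t when h_t(x) = +1, and -beta_t when h_t(x) = -1.
    v = alpha_t * (pred == 1) - beta_t * (pred == -1);
    w_next = w .* exp(-y .* v);                % w_i^{(t+1)} = w_i^{(t)} exp(-y_i v_t(x_i))
end

Summing w_next gives $Z_{t+1}$, which is the starting point for both Question 4 and Question 5.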


Problem 2: Complexity

Here we try to understand a bit better some implications of the fact that