Slide 1: Bankruptcy Example
[Figure: scatter plot of the training data, with feature R on the horizontal axis (0 to 2) and feature L on the vertical axis (0 to 8); points are labeled No or Yes.]

Slide 2: 1-Nearest-Neighbor Hypothesis
[Figure: the same scatter plot with the decision boundary induced by the 1-nearest-neighbor rule.]

Slide 3: Decision Tree Hypothesis
[Figure: a decision tree over the same data, with tests R > 0.9, L > 1.5, and L > 5.0 at the internal nodes, and the resulting axis-parallel decision boundary drawn on the scatter plot.]

Slide 4: Linear Hypothesis
[Figure: the same scatter plot with a single linear decision boundary.]

Slides 5–8: Linearly Separable / Not Linearly Separable
[Figures: one dataset that can be separated by a hyperplane, and three that cannot.]

Slides 9–11: Linear Hypothesis Class
• Equation of a hyperplane in the feature space:
      w·x + b = Σ_{j=1..n} w_j x_j + b = 0
• w and b are to be learned.
• A useful trick: let x_0 = 1 and w_0 = b, so the equation becomes
      w·x = Σ_{j=0..n} w_j x_j = 0
[Figure: a line in the (x_1, x_2) plane with normal vector w = [w_1 w_2], at perpendicular distance b/‖w‖ from the origin.]

Slides 12–14: Hyperplane: Geometry
• ŵ is the unit normal of the hyperplane; b is its offset.
• ŵ·x + b is the signed perpendicular distance of a point x to the hyperplane.
   – Recall: a·b = ‖a‖‖b‖ cos θ.
• The perpendicular distance is positive on the side the normal points toward, zero on the hyperplane itself, and negative on the other side.

Slide 15: Linear Classifier
• h(x) = sign(w·x + b) = sign(w·x)   (using the x_0 = 1 trick)
• Outputs +1 or -1.

Slide 16: Margin
• Margin of point i:  γ_i ∝ y_i (w·x_i + b)
• Proportional to the perpendicular distance of point x_i to the hyperplane.
• γ_i > 0: the point is correctly classified (sign of the distance = y_i).
• γ_i < 0: the point is incorrectly classified (sign of the distance ≠ y_i).

Slides 17–19: Perceptron Algorithm (Rosenblatt, 1956)
• Pick an initial weight vector (including b), e.g. [0 … 0].
• Repeat until all points are correctly classified:
   Repeat for each point i:
   – Calculate the margin y_i (w·x_i) for point i.
   – If the margin is > 0, the point is correctly classified.
   – Else change the weights to increase the margin; the change in the weights is proportional to y_i x_i.
• Note that, if y_i = 1:
   – if x_ji > 0 then w_j is increased (which increases the margin);
   – if x_ji < 0 then w_j is decreased (which increases the margin).
• And similarly for y_i = -1.
• Guaranteed to find a separating hyperplane if one exists.
• Otherwise the data are not linearly separable, and the algorithm loops forever.

Slide 20: Perceptron Algorithm on the Bankruptcy Data
• Initial guess: w = [0.0 0.0 0.0], rate η = 0.16.
• Final answer: w = [-2.2 0.94 0.4].

Slide 21: Gradient Ascent
• Why pick y_i x_i as the increment to the weights?
• To maximize a scalar function f(w) of one variable:
   – Pick an initial w.
   – Change w to w + η df/dw (η > 0, small)
   – until f stops changing (df/dw ≈ 0).
[Figure: a curve f(w); where the slope df/dw > 0, the step η df/dw moves w uphill; df/dw = 0 at a local extremum.]
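The training loop on slides 17–19 can be sketched in code. This is a minimal illustration under stated assumptions, not the lecture's implementation: the dataset below is hypothetical (it is not the bankruptcy data), and the names `perceptron`, `eta`, and `max_epochs` are chosen here. It uses the x_0 = 1 trick from slides 9–11 to fold the offset b into the weight vector.

```python
import numpy as np

# Hypothetical toy data: each row of X is a point (R, L); labels y are +/-1.
X = np.array([[0.2, 6.0], [1.1, 5.0], [2.0, 7.0],
              [0.2, 2.0], [0.5, 3.0], [1.7, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

def perceptron(X, y, eta=0.1, max_epochs=1000):
    # Augment with x0 = 1 so that w0 plays the role of the offset b.
    Xa = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xa.shape[1])              # initial guess [0 ... 0]
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xa, y):
            if yi * np.dot(w, xi) <= 0:    # margin not positive: misclassified
                w = w + eta * yi * xi      # change proportional to y_i x_i
                mistakes += 1
        if mistakes == 0:                  # a full pass with no mistakes: done
            return w
    return w  # reached only if no separator was found within max_epochs

w = perceptron(X, y)
```

Because the sketch caps the number of passes with `max_epochs`, it also terminates (without a separator) on data that are not linearly separable, where the algorithm as stated on the slides would loop forever.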
" 22Gradient Ascent/Descent• To maximize f(w)• Pick initial w• Change w to w + η ∇wf (η > 0, small)• until f stops changing (∇wf ≈ 0)• Finds local maximum; global maximum if function isglobally convex.!"#$%&''''=(nwfwff ,,1Kw 23Gradient Ascent/Descent• To maximize f(w)• Pick initial w• Change w to w + η ∇wf (η > 0, small)• until f stops changing (∇wf ≈ 0)• Finds local maximum; global maximum if function isglobally convex• Rate (η) has to be chosen carefully.• Too small –slow convergence• Too big –oscillation!"#$%&''''=(nwfwff ,,1Kwfw 24Perceptron Trainingvia Gradient Descent• Maximize sum of margins of misclassified points!=iedmisclassif )(iiiyf xww!="iedmisclassif iiiyf xw7 25Perceptron Trainingvia Gradient Descent• Maximize sum of margins of misclassified points!=iedmisclassif )(iiiyf xww• Off-line training: Compute gradient as sum over alltraining points.• On-line training: Approximate gradient by one ofthe terms in the sum:!="iedmisclassif iiiyf xwiiy x 26Perceptron AlgorithmBankruptcy Data w0 w1 w2-1.0 1.00 1.0-1.3 0.71 0.6-1.4 0.66 0.5-1.4 0.66 0.8-1.5 0.61 0.7-1.6 0.56 0.6-1.6 0.65 0.6-1.6 0.74 0.6-1.6 0.83 0.6-1.7 0.81 0.3Initial Guess: w=[-1.0 1.0 1.0]Final Answer: w=[-1.7 0.81 0.3]rate η = 0.1 27Perceptron AlgorithmBankruptcy Datarate η = 0.1-9x(0.2 3)-4x(0.5 4)-1x(0.7 2)-1x(1.7 1)5x(0.2 6) 3x(1.1 3)Initial Guess: w=[-1.0 1.0 1.0]Final Answer: w=[-1.7 0.81 0.3] 5 x (1.0 0.2 6.0) 3 x (1.0 1.1 3.0)-9 x (1.0 0.2 3.0)-1 x (1.0 0.7 2.0)-4 x (1.0 0.5 4.0)-1 x (1.0 1.7 1.0) (-7.0 -1.9 -7.0) x 0.1 = (-0.7 -0.19 -0.7) 28Dual Form 5 x (1.0 0.2 6.0) 3 x (1.0 1.1 3.0)-9 x (1.0 0.2 3.0)-1 x (1.0 0.7 2.0)-4 x (1.0 0.5 4.0)-1 x (1.0 1.7 1.0) (-7.0 -1.9 -7.0) x 0.1= (-0.7 -0.19 -0.7)11111188877744433311xxxxxx1yyyyyy!!!!!!!==miiiiy1xw"#Assume initial weights are 0; rate=η>0αi is count ofmistakes on pointi during training8 29Dual Form 5 x (1.0 0.2 6.0) 3 x (1.0 1.1 3.0)-9 x (1.0 0.2 3.0)-1 x (1.0 0.7 2.0)-4 x (1.0 0.5 4.0)-1 x (1.0 1.7 1.0) (-7.0 -1.9 -7.0) x 0.1= (-0.7 
-0.19 -0.7) ! "1y1x 1"3y3x 3"4y4x 4"7y7x 7"8y8x 8"11y11x 11!==miiiiy1xw"#Assume initial weights are 0; rate=η>0αi is count ofmistakes on pointi during training ! h(x) = sign(w " x ) = sign(#iyix i" x i=1m$)η just scalesanswer, set to 1 30Perceptron TrainingDual Form• α = 0• Repeat until all points correctly classified• Repeat for each point i– Calculate margin– If margin > 0, point is correctly classified– Else increment αi• Return• If data is not linearly separable, the αi grow withoutbound ! "jyjx j# x ij=1m$ ! w ="jj=1m#yjx j 31Which Separator? 32Which Separator?Maximize the margin to closest points9 33Which Separator?Maximize the margin to closest points 34Margin of a pointx1x2wix ! "i# yi(w $ xi+ b)• proportional to perpendiculardistance of point xi to hyperplanekxi!k! 35Margin of a pointx1x2wix ! "i# yi(w $ xi+ b)• proportional to perpendiculardistance of point xi to hyperplane• geometric margin iskxi!k! ! "iw 36Margin• Scaling w changes value of margin but not actualdistances to separator
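The dual-form training loop on slide 30 can be sketched the same way. Again the dataset and the names (`dual_perceptron`, `max_epochs`) are hypothetical choices for this sketch; the point of the dual form is that the data enter only through inner products x_j·x_i, so the Gram matrix can be computed once up front.

```python
import numpy as np

# Hypothetical toy data with x0 = 1 already folded in, so w0 acts as the
# offset b; labels y are +/-1.
X = np.array([[1.0, 0.2, 6.0], [1.0, 1.1, 5.0], [1.0, 2.0, 7.0],
              [1.0, 0.2, 2.0], [1.0, 0.5, 3.0], [1.0, 1.7, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

def dual_perceptron(X, y, max_epochs=1000):
    m = len(X)
    alpha = np.zeros(m)      # alpha_i = count of mistakes on point i
    G = X @ X.T              # Gram matrix: only inner products of points are used
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(m):
            # margin of point i under w = sum_j alpha_j y_j x_j (eta = 1)
            if y[i] * np.dot(alpha * y, G[:, i]) <= 0:
                alpha[i] += 1      # one more mistake on point i
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

alpha = dual_perceptron(X, y)
w = (alpha * y) @ X          # recover the primal weights w = sum_i alpha_i y_i x_i
```

On linearly separable data this returns a finite mistake count per point; on non-separable data the α_i would grow until the `max_epochs` cap stops the loop, mirroring the divergence noted on slide 30.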



MIT 6.825 - Lecture Notes
