Bankruptcy Example

[Slide 1: scatter plot of the bankruptcy data; horizontal axis R (0 to 2), vertical axis L (0 to 8); classes "No" and "Yes".]

1-Nearest Neighbor Hypothesis

[Slide 2: the same data with the decision boundary produced by a 1-nearest-neighbor classifier.]

Decision Tree Hypothesis

[Slide 3: a decision tree testing R > 0.9, then L > 1.5 and L > 5.0, with the corresponding axis-parallel decision regions.]

Linear Hypothesis

[Slide 4: the same data separated by a single line.]

Linearly Separable / Not Linearly Separable

[Slides 5–8: four example data sets — one linearly separable, three not.]

Linear Hypothesis Class
• Equation of a hyperplane in the feature space:
      w · x + b = 0,  i.e.  Σ_{j=1..n} w_j x_j + b = 0
• w and b are to be learned.
  [Slides 9–10: a line in the (x1, x2) plane with normal vector w = [w1 w2] and offset b/‖w‖.]
• A useful trick: let x_0 = 1 and w_0 = b, so the equation becomes
      w · x = 0,  i.e.  Σ_{j=0..n} w_j x_j = 0

Hyperplane: Geometry
[Slides 12–14: a line in the (x1, x2) plane with unit normal ŵ and offset b.]
• With unit normal ŵ and offset b, the quantity ŵ · x + b is the signed perpendicular distance of point x to the hyperplane.
  Recall: a · b = ‖a‖ ‖b‖ cos θ, so ŵ · x is the length of the projection of x onto ŵ.
• The perpendicular distance is positive on one side of the hyperplane, zero on it, and negative on the other.

Linear Classifier
• h(x) = sign(w · x + b) = sign(w · x);  outputs +1 or -1.

Margin
• The margin of point x_i is γ_i ∝ y_i (w · x_i + b), proportional to the perpendicular distance of x_i to the hyperplane.
• γ_i > 0 : the point is correctly classified (sign of distance = y_i).
• γ_i < 0 : the point is incorrectly classified (sign of distance ≠ y_i).

Perceptron Algorithm (Rosenblatt, 1956)
• Pick an initial weight vector (including b), e.g. [0 … 0].
• Repeat until all points are correctly classified:
  – For each point i, calculate its margin γ_i = y_i (w · x_i).
  – If the margin is > 0, the point is correctly classified.
  – Else change the weights to increase the margin; the change in the weights is proportional to y_i x_i.
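The loop just described can be sketched in a few lines of Python. This is a minimal sketch, not the slides' own code: NumPy, the toy data set, and the `max_epochs` safety cap are my additions.

```python
import numpy as np

def perceptron(X, y, eta=0.1, max_epochs=1000):
    """Primal perceptron: each row of X starts with a constant 1, so w[0] plays the role of b."""
    w = np.zeros(X.shape[1])                  # initial weight vector [0 ... 0]
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:     # margin not positive: point misclassified
                w = w + eta * y_i * x_i       # change proportional to y_i x_i
                mistakes += 1
        if mistakes == 0:                     # all points correctly classified
            return w
    raise ValueError("no separating hyperplane found; data may not be linearly separable")

# Hypothetical, linearly separable toy data (not the bankruptcy set from the slides).
X = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0])
w = perceptron(X, y)
```

One design choice worth noting: the update fires on margin ≤ 0 rather than < 0, so the zero-initialized weight vector (whose margins are all exactly 0) still moves.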
• Note that, if y_i = 1:
  – if x_ji > 0 then w_j is increased (which increases the margin);
  – if x_ji < 0 then w_j is decreased (which also increases the margin).
• And similarly for y_i = -1.
• Guaranteed to find a separating hyperplane if one exists.
• Otherwise the data are not linearly separable, and the algorithm loops forever.

Perceptron Algorithm: Bankruptcy Data
• Initial guess: w = [0.0 0.0 0.0], rate η = 0.1.
• Final answer: w = [-2.2 0.94 0.4].

Gradient Ascent
• Why pick y_i x_i as the increment to the weights?
• To maximize a scalar function f(w) of one variable:
  – Pick an initial w.
  – Change w to w + η df/dw (η > 0, small)
  – until f stops changing (df/dw ≈ 0).
[Slide 21: a curve f(w); where df/dw > 0 the step η df/dw moves w uphill, and df/dw = 0 at a local extremum.]
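The one-dimensional update above can be checked on a tiny example. This is a sketch under my own assumptions: the quadratic test function, tolerance, and iteration cap are not from the slides.

```python
def gradient_ascent(df, w0, eta=0.1, tol=1e-8, max_iters=10000):
    """Maximize f by repeatedly stepping w <- w + eta * df/dw until the slope vanishes."""
    w = w0
    for _ in range(max_iters):
        slope = df(w)
        if abs(slope) < tol:      # df/dw ~ 0: a local extremum has been reached
            break
        w = w + eta * slope       # move uphill, proportionally to the slope
    return w

# f(w) = -(w - 3)^2 has its maximum at w = 3; its derivative is df/dw = -2(w - 3).
w_star = gradient_ascent(lambda w: -2.0 * (w - 3.0), w0=0.0)
```

With η = 0.1 the iterate contracts toward 3; with η = 1.0 this same recurrence becomes w ← -w + 6 and hops between 0 and 6 forever, the oscillation failure mode of an overly large rate.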
Gradient Ascent/Descent
• To maximize f(w):
  – Pick an initial w.
  – Change w to w + η ∇_w f (η > 0, small)
  – until f stops changing (∇_w f ≈ 0),
    where ∇_w f = [∂f/∂w_1, …, ∂f/∂w_n].
• Finds a local maximum; the global maximum if the function is globally convex.
• The rate η has to be chosen carefully:
  – too small: slow convergence;
  – too big: oscillation.

Perceptron Training via Gradient Descent
• Maximize the sum of the margins of the misclassified points:
      f(w) = Σ_{misclassified i} y_i (w · x_i)
      ∇_w f = Σ_{misclassified i} y_i x_i
• Off-line training: compute the gradient as a sum over all training points.
• On-line training: approximate the gradient by one of the terms in the sum: y_i x_i.

Perceptron Algorithm: Bankruptcy Data
• Initial guess: w = [-1.0 1.0 1.0], rate η = 0.1.
• Final answer: w = [-1.7 0.81 0.3].

      w0    w1    w2
     -1.0  1.00  1.0
     -1.3  0.71  0.6
     -1.4  0.66  0.5
     -1.4  0.66  0.8
     -1.5  0.61  0.7
     -1.6  0.56  0.6
     -1.6  0.65  0.6
     -1.6  0.74  0.6
     -1.6  0.83  0.6
     -1.7  0.81  0.3

• The total weight change is a sum of (signed) mistake counts times training points:

       5 × (1.0 0.2 6.0)
       3 × (1.0 1.1 3.0)
      -9 × (1.0 0.2 3.0)
      -1 × (1.0 0.7 2.0)
      -4 × (1.0 0.5 4.0)
      -1 × (1.0 1.7 1.0)
      --------------------
      (-7.0 -1.9 -7.0) × 0.1 = (-0.7 -0.19 -0.7)

Dual Form
• Assume the initial weights are 0 and the rate is η > 0. Then
      w = η Σ_{i=1..m} α_i y_i x_i
  where α_i is the count of mistakes made on point i during training.
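The mistake-count bookkeeping can be verified numerically against the numbers on the slide: weighting each point by its signed mistake count (α_i y_i), summing, and scaling by η = 0.1 reproduces the total weight change of the bankruptcy run.

```python
import numpy as np

# Points and signed mistake counts (alpha_i * y_i) read off the slide.
points = np.array([
    [1.0, 0.2, 6.0],
    [1.0, 1.1, 3.0],
    [1.0, 0.2, 3.0],
    [1.0, 0.7, 2.0],
    [1.0, 0.5, 4.0],
    [1.0, 1.7, 1.0],
])
counts = np.array([5.0, 3.0, -9.0, -1.0, -4.0, -1.0])   # alpha_i * y_i
eta = 0.1

delta_w = eta * counts @ points           # eta * sum_i alpha_i y_i x_i  ~  (-0.7, -0.19, -0.7)
w = np.array([-1.0, 1.0, 1.0]) + delta_w  # initial guess plus total change  ~  (-1.7, 0.81, 0.3)
```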
• Written out for the bankruptcy run, only the points on which mistakes were made contribute:
      w = α_1 y_1 x_1 + α_3 y_3 x_3 + α_4 y_4 x_4 + α_7 y_7 x_7 + α_8 y_8 x_8 + α_11 y_11 x_11
• The classifier then depends on the data only through inner products:
      h(x) = sign(w · x) = sign(Σ_{i=1..m} α_i y_i (x_i · x))
• η just scales the answer, so set it to 1.

Perceptron Training: Dual Form
• α = 0.
• Repeat until all points are correctly classified:
  – For each point i, calculate the margin  y_i Σ_{j=1..m} α_j y_j (x_j · x_i).
  – If the margin is > 0, the point is correctly classified.
  – Else increment α_i.
• Return  w = Σ_{j=1..m} α_j y_j x_j.
• If the data are not linearly separable, the α_i grow without bound.

Which Separator?
[Slides 31–33: several candidate separating lines for the same data.]
• Maximize the margin to the closest points.

Margin of a Point
• γ_i ∝ y_i (w · x_i + b), proportional to the perpendicular distance of point x_i to the hyperplane.
• The geometric margin is γ_i / ‖w‖.

Margin
• Scaling w changes the value of the margin but not the actual distances to the separator.
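The dual training loop can be sketched as follows. This is a minimal sketch with η fixed at 1; NumPy, the Gram-matrix precomputation, the toy data, and the epoch cap are my assumptions, not the slides'.

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=1000):
    """Dual perceptron with eta = 1: learn the mistake counts alpha_i instead of w."""
    m = X.shape[0]
    alpha = np.zeros(m)
    G = X @ X.T                                         # Gram matrix: G[j, i] = x_j . x_i
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(m):
            margin = y[i] * np.dot(alpha * y, G[:, i])  # y_i * sum_j alpha_j y_j (x_j . x_i)
            if margin <= 0:                             # not correctly classified
                alpha[i] += 1                           # one more mistake on point i
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, (alpha * y) @ X                       # recover w = sum_j alpha_j y_j x_j

# Hypothetical separable toy data (x_0 = 1 absorbs the offset b).
X = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0])
alpha, w = dual_perceptron(X, y)
```

Note that both training and prediction touch the data only through inner products x_j · x_i, which is the point of the dual form.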