UTD CS 6375 - Binary Classification / Perceptron

Binary Classification / Perceptron
Nicholas Ruozzi, University of Texas at Dallas
Slides adapted from David Sontag and Vibhav Gogate

Reminders
- Homework 1 is available on eLearning and is due in 2 weeks. Late homework will not be accepted.
- Instructions for getting started with the course (e.g., joining Piazza) are on eLearning.
- Office hours are happening this week. Prof.: T 1:30pm-2:30pm, W 11:00am-12:00pm.

Supervised Learning
- Input: $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})$, where $x^{(i)}$ is the data item and $y^{(i)}$ is the label.
- Goal: find a function $f$ such that $f(x)$ is a good approximation to $y$. We can use it to predict $y$ values for previously unseen $x$ values.

Supervised Learning
- Hypothesis space: the set of allowable functions.
- Goal: find the best element of the hypothesis space.
- How do we measure the quality of $f$?

Regression
- Hypothesis class: linear functions, $f(x) = ax + b$.
- How do we measure the quality of the approximation?

Linear Regression
- In typical regression applications, we measure the fit using a squared loss function: $\ell(f, x^{(i)}, y^{(i)}) = (f(x^{(i)}) - y^{(i)})^2$.
- We want to minimize the average loss on the training data: $\frac{1}{n}\sum_{i=1}^{n} (f(x^{(i)}) - y^{(i)})^2$.
- For 2-D linear regression, the learning problem is then
  $\min_{a,b} \frac{1}{n}\sum_{i=1}^{n} (a x^{(i)} + b - y^{(i)})^2$.
- For an unseen data point $x$, the learning algorithm predicts $f(x) = ax + b$.

Supervised Learning
- Select a hypothesis space; elements of the space are represented by a collection of parameters.
- Choose a loss function that evaluates the quality of the hypothesis as a function of its parameters.
- Minimize the loss function, e.g., using gradient descent (a minimization over the parameters).
- Evaluate the quality of the learned model using test data, i.e., data on which the model was not trained.

Binary Classification
- Regression operates over a continuous set of outcomes. Suppose instead that we want to learn a function $f : \{0,1\}^3 \to \{0,1\}$.
- As an example, suppose we observe the following four input/output pairs:

    x1  x2  x3 | y
     0   0   1 | 0
     0   1   0 | 1
     1   1   0 | 1
     1   1   1 | 0

- How do we pick the hypothesis space? How do we find the best $f$ in this space?
- How many functions with three binary inputs and one binary output are there? $2^{2^3} = 2^8 = 256$.

Binary Classification
- $2^4 = 16$ of the $2^8$ possible functions are consistent with the observations: the full truth table has eight rows, and the four observations pin down only four of them.

    x1  x2  x3 | y
     0   0   0 | ?
     0   0   1 | 0
     0   1   0 | 1
     0   1   1 | ?
     1   0   0 | ?
     1   0   1 | ?
     1   1   0 | 1
     1   1   1 | 0

- How do we choose the best one? What if the observations are noisy?

Challenges in ML
- How do we choose the right hypothesis space? A number of factors influence this decision, including the difficulty of learning over the chosen space and how expressive the space is.
- How do we evaluate the quality of our learned hypothesis? We prefer simpler hypotheses to prevent overfitting, and we want the outcome of learning to generalize to unseen data.

Binary Classification
- Input: $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})$ with $x^{(i)} \in \mathbb{R}^n$ and $y^{(i)} \in \{-1, +1\}$.
- We can think of the observations as points in $\mathbb{R}^n$ with an associated sign, either $+$ or $-$, corresponding to the labels 0/1.
- [Figure: a scatter plot of an example with $n = 2$, showing positively and negatively labeled points separated by a line.]
- What is a good hypothesis space for this problem? In this example a line separates the two classes: we say that the observations are linearly separable.

Linear Separators
- In $n$ dimensions, a hyperplane is the solution set of the equation $w^T x + b = 0$, with $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$.
- Hyperplanes divide $\mathbb{R}^n$ into two distinct sets of points called open halfspaces: $\{x : w^T x + b > 0\}$ and $\{x : w^T x + b < 0\}$, as in the sketch below.
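To make the halfspace picture concrete, here is a minimal sketch, not from the slides, that classifies points in $\mathbb{R}^2$ according to which open halfspace of the hyperplane $w^T x + b = 0$ they fall in; the particular weights, bias, and test points are made up for illustration.

```python
import numpy as np

# A hyperplane in R^2 is the set of x with w.x + b = 0.
# These values of w and b are made up purely for illustration.
w = np.array([1.0, -2.0])
b = 0.5

def halfspace_sign(x):
    """Return +1 or -1 depending on which open halfspace x lies in.

    Points exactly on the hyperplane (w.x + b == 0) are sent to -1 here;
    any fixed tie-breaking convention would do.
    """
    return 1 if np.dot(w, x) + b > 0 else -1

for x in [np.array([2.0, 0.0]), np.array([0.0, 1.0])]:
    print(x, "->", halfspace_sign(x))   # [2. 0.] -> 1, [0. 1.] -> -1
```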
The Linearly Separable Case
- Given $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})$ with $x^{(i)} \in \mathbb{R}^n$ and $y^{(i)} \in \{-1, +1\}$.
- Hypothesis space: separating hyperplanes, $f(x) = \operatorname{sign}(w^T x + b)$.
- How should we choose the loss function?
- Count the number of misclassifications? This is tough to optimize: the gradient contains no information.
- Penalize misclassification linearly by the size of the violation: $\ell(w, b, x, y) = \max(0, -y(w^T x + b))$. This modified hinge loss is convex but not differentiable.

The Perceptron Algorithm
- Try to minimize the perceptron loss using gradient descent.
- The perceptron loss isn't differentiable; how can we apply gradient descent?
- We need a generalization of what it means to be the gradient of a convex function.

Gradients of Convex Functions
- For a differentiable convex function $f$, its gradients are linear underestimators: $f(x) \geq f(x_0) + \nabla f(x_0)^T (x - x_0)$ for all $x$.
- A zero gradient corresponds to a global optimum.
- [Figure: a convex curve with tangent lines lying entirely below it.]

Subgradients
- For a convex function $g$, a subgradient at a point $x_0$ is given by any line through $(x_0, g(x_0))$ that is a linear underestimator, i.e., any $c$ such that $g(x) \geq g(x_0) + c^T (x - x_0)$ for all $x$.
- If $0$ is a subgradient at $x_0$, then $x_0$ is a global minimum.

Subgradients
- If a convex function is differentiable at a point, then it has a unique subgradient at that point, given by the gradient.
- If a convex function is not differentiable at a point, it can have many subgradients. E.g., the set of subgradients of the convex function $|x|$ at the point $x = 0$ is given by the set of slopes $[-1, 1]$.
- Subgradients are only guaranteed to exist for convex functions.

The Perceptron Algorithm
- Perceptron loss: $\max(0, -y^{(i)}(w^T x^{(i)} + b))$.
- Try to minimize the perceptron loss using (sub)gradient descent. A subgradient of the loss with respect to $w$ is $-y^{(i)} x^{(i)} \, \mathbb{1}[y^{(i)}(w^T x^{(i)} + b) \leq 0]$, and with respect to $b$ it is $-y^{(i)} \, \mathbb{1}[y^{(i)}(w^T x^{(i)} + b) \leq 0]$.
- The indicator $\mathbb{1}[\cdot]$ is equal to zero if the data point is correctly classified and one otherwise.

The Perceptron Algorithm: Try to minimize …
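The preview cuts off mid-slide, but the subgradient expressions above are enough to sketch the batch version of (sub)gradient descent on the average perceptron loss $\frac{1}{n}\sum_i \max(0, -y^{(i)}(w^T x^{(i)} + b))$. The following is a minimal sketch under assumptions of mine (a fixed step size, batch updates, zero initialization); the function names are not from the slides.

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    """Average perceptron loss: (1/n) * sum_i max(0, -y_i (w.x_i + b))."""
    margins = y * (X @ w + b)
    return np.mean(np.maximum(0.0, -margins))

def perceptron_subgradient_descent(X, y, step=1.0, iters=100):
    """Minimize the perceptron loss by batch subgradient descent.

    X is an (n, d) array of data points and y an (n,) array of labels
    in {-1, +1}. The indicator 1[y_i (w.x_i + b) <= 0] below is zero for
    correctly classified points and one otherwise, as on the slide.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        mistakes = (y * (X @ w + b) <= 0).astype(float)  # the indicator
        # Subgradient of the average loss; it vanishes once every point
        # is classified correctly with positive margin.
        grad_w = -(mistakes * y) @ X / n
        grad_b = -np.mean(mistakes * y)
        w -= step * grad_w
        b -= step * grad_b
    return w, b
```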


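A quick usage check for the sketch above, on a tiny made-up dataset that is linearly separable (the labels are the sign of $x_1 - x_2$):

```python
X = np.array([[2.0, 0.0], [1.5, 0.5], [0.0, 1.0], [0.5, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b = perceptron_subgradient_descent(X, y, step=0.5, iters=200)
print("loss:", perceptron_loss(w, b, X, y))   # 0.0 on this toy data
print("predictions:", np.sign(X @ w + b))     # matches y: [ 1.  1. -1. -1.]
```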