
Binary Classification / Perceptron
Nicholas Ruozzi
University of Texas at Dallas
Slides adapted from David Sontag and Vibhav Gogate

Reminders
• Homework 1 is available on eLearning and due in 2 weeks
• Late homework will not be accepted
• Instructions for getting started with the course, e.g., joining Piazza, are on eLearning
• Office hours are happening this week
• Prof. 
(blackboard) T 1:30pm-2:30pm, W 11:00am-12:00pm

Supervised Learning
• Input: $(x^{(1)}, y^{(1)}), \ldots, (x^{(M)}, y^{(M)})$
• $x^{(m)}$ is the $m^{th}$ data item and $y^{(m)}$ is the $m^{th}$ label
• Goal: find a function $f$ such that $f(x^{(m)})$ is a “good approximation” to $y^{(m)}$
• Can use it to predict $y$ values for previously unseen $x$ values

Supervised Learning
• Hypothesis space: set of allowable functions $f: X \to Y$
• Goal: find the “best” element of the hypothesis space
• How do we measure the quality of $f$?

Regression
(Figure: plot with axes $x$ and $y$)
Hypothesis class: linear functions $f(x) = ax + b$
How do we measure the quality of the approximation?

Linear Regression
• In typical regression applications, measure the fit using a squared loss function
$$L(f) = \frac{1}{M} \sum_m \left( f(x^{(m)}) - y^{(m)} \right)^2$$
• Want to minimize the average loss on the training data
• For 2-D linear regression, the learning problem is then
$$\min_{a,b} \frac{1}{M} \sum_m \left( a x^{(m)} + b - y^{(m)} \right)^2$$
• For an unseen data point, $x$, the learning algorithm predicts $f(x)$

Supervised Learning
• Select a hypothesis space (elements of the space are represented by a collection of parameters)
• Choose a loss function (evaluates quality of the hypothesis as a function of its parameters)
• Minimize loss function, e.g., using gradient descent (minimization over the parameters)
• Evaluate quality of the learned model using test data, that is, data on which the model was not trained

Binary Classification
• Regression operates over a continuous set of outcomes
• Suppose that we want to learn a function $f: X \to \{0,1\}$
• As an example:

| # | $x_1$ | $x_2$ | $x_3$ | $y$ |
|---|-------|-------|-------|-----|
| 1 | 0 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 | 1 |
| 3 | 1 | 1 | 0 | 1 |
| 4 | 1 | 1 | 1 | 0 |

How do we pick the hypothesis space?
How do we find the best $f$ in this space?
How many functions with three binary inputs and one binary output are there?

Binary Classification

| # | $x_1$ | $x_2$ | $x_3$ | $y$ |
|---|-------|-------|-------|-----|
|   | 0 | 0 | 0 | ? |
| 1 | 0 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 | 1 |
|   | 0 | 1 | 1 | ? |
|   | 1 | 0 | 0 | ? |
|   | 1 | 0 | 1 | ? |
| 3 | 1 | 1 | 0 | 1 |
| 4 | 1 | 1 | 1 | 0 |

$2^8$ possible functions
$2^4$ are consistent with the observations
How do we choose the best one?
What if the observations are noisy?

Challenges in ML
• How to choose the right hypothesis space?
• A number of factors influence this decision: difficulty of learning over the chosen space, how expressive the space is, …
• How to evaluate the quality of our learned hypothesis?
• Prefer “simpler” hypotheses (to prevent overfitting)
• Want the outcome of learning to generalize to unseen data

Binary Classification
• Input $(x^{(1)}, y^{(1)}), \ldots, (x^{(M)}, y^{(M)})$ with $x^{(m)} \in \mathbb{R}^n$ and $y^{(m)} \in \{-1, +1\}$
• We can think of the observations as points in $\mathbb{R}^n$ with an associated sign (either +/- corresponding to 0/1)
• An example with $n = 2$
(Figure: points in the plane, each labeled + or −)
What is a good hypothesis space for this problem?
In this case, we say that the observations are linearly separable

Linear Separators
• In $n$ dimensions, a hyperplane is the set of solutions to the equation
$$w^T x + b = 0$$
with $w \in \mathbb{R}^n$, $b \in \mathbb{R}$
• Hyperplanes divide $\mathbb{R}^n$ into two distinct sets of points (called open halfspaces)
$$w^T x + b > 0$$
$$w^T x + b < 0$$
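
The open halfspaces on the Linear Separators slide can be turned into a classifier by reading off the sign of $w^T x + b$. The sketch below is a minimal illustration in plain Python, not code from the slides: the function names `side_of_hyperplane` and `separates`, the toy points, and the choice of `w` and `b` are all assumptions made for the example.

```python
from typing import Sequence


def side_of_hyperplane(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 if w^T x + b > 0, -1 if w^T x + b < 0, and 0 on the hyperplane."""
    value = sum(wi * xi for wi, xi in zip(w, x)) + b
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


def separates(
    w: Sequence[float],
    b: float,
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> bool:
    """True if every point lies strictly in the open halfspace matching its label."""
    return all(side_of_hyperplane(w, b, x) == y for x, y in zip(points, labels))


# Hypothetical toy data in R^2 with labels in {-1, +1} (not taken from the slides).
points = [(2.0, 3.0), (3.0, 3.5), (-1.0, -2.0), (-2.0, -0.5)]
labels = [1, 1, -1, -1]

# An assumed candidate separator: the line x_1 + x_2 = 0.
w, b = (1.0, 1.0), 0.0
print(separates(w, b, points, labels))  # True
```

A data set is linearly separable when at least one pair `(w, b)` makes `separates` return `True`; this sketch only checks a given candidate and does not search for one.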

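The 2-D linear regression problem from the Linear Regression slide, minimizing the average squared loss over $a$ and $b$, can also be worked through numerically. The sketch below is an illustration under stated assumptions: it uses the standard closed-form least-squares solution rather than the gradient descent mentioned in the slides, and the names `fit_line` and `average_squared_loss` and the data are hypothetical.

```python
from typing import Sequence, Tuple


def average_squared_loss(
    a: float, b: float, xs: Sequence[float], ys: Sequence[float]
) -> float:
    """Average of (a * x + b - y)^2 over the training data."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form minimizer of the average squared loss for f(x) = a * x + b.

    Assumes the x values are not all identical (otherwise the slope is undefined).
    """
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


# Hypothetical training data lying exactly on y = 2x + 1 (not from the slides).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

a, b = fit_line(xs, ys)
print(a, b)                                # 2.0 1.0
print(average_squared_loss(a, b, xs, ys))  # 0.0

# Prediction for an unseen data point x is f(x) = a * x + b.
print(a * 4.0 + b)                         # 9.0
```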