Purdue CS 59000 - Lecture notes


CS 59000 Statistical Machine Learning
Lecture 19
Yuan (Alan) Qi, Purdue CS

Midterm statistics
• Median: 32 out of 45 points
• Standard deviation: 8.5

Outline
• Review of support vector machines
• SVM classification of overlapping classes
• SVM regression
• Graphical models: Bayesian networks

Support Vector Machines
• Support vector machines are motivated by statistical learning theory.

Maximum Margin Classifiers
• Margin: the smallest distance between the decision boundary and any of the samples.

Maximizing the Margin
• Since rescaling w and b together does not change the margin, we can set t_n (w^T φ(x_n) + b) = 1 for the points closest to the boundary.
• For the data points where this equality holds, the constraints are said to be active; for the remainder they are inactive.

Optimization Problem
• Quadratic programming: minimize (1/2)||w||^2 subject to t_n (w^T φ(x_n) + b) ≥ 1 for n = 1, ..., N.

Lagrange Multipliers with Inequality Constraints
• Inactive constraint: the constraint plays no role in the optimization, so the stationarity condition is that of the unconstrained problem; equivalently, the corresponding Lagrange multiplier is zero.
• Active constraint: the Lagrange multiplier is positive.
• Combined, these give the Karush-Kuhn-Tucker (KKT) conditions: a_n ≥ 0, t_n y(x_n) − 1 ≥ 0, and a_n (t_n y(x_n) − 1) = 0.

Lagrange Function for SVM
• L(w, b, a) = (1/2)||w||^2 − Σ_n a_n { t_n (w^T φ(x_n) + b) − 1 }, with multipliers a_n ≥ 0.

Dual Variables
• Setting the derivatives of L with respect to w and b to zero gives w = Σ_n a_n t_n φ(x_n) and Σ_n a_n t_n = 0.

Dual Problem
• Maximize L~(a) = Σ_n a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m), subject to a_n ≥ 0 and Σ_n a_n t_n = 0, where k(x, x') = φ(x)^T φ(x').

Prediction
• y(x) = Σ_n a_n t_n k(x, x_n) + b.

KKT Conditions, Support Vectors, and Bias
• The KKT conditions imply that for every data point either a_n = 0 or t_n y(x_n) = 1. The data points in the latter case are known as support vectors.
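The hard-margin dual above can be sketched numerically. This is not part of the lecture: the toy data, the linear-kernel choice, and the use of scipy's general-purpose SLSQP solver (rather than a dedicated QP or SMO solver) are all illustrative assumptions.

```python
# Sketch: hard-margin SVM trained by maximizing the dual objective
# L~(a) = sum_n a_n - (1/2) sum_nm a_n a_m t_n t_m k(x_n, x_m)
# with scipy's SLSQP solver. Toy linearly separable data; linear kernel.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])          # labels in {-1, +1}

K = X @ X.T                                    # linear-kernel Gram matrix
P = (t[:, None] * t[None, :]) * K

def neg_dual(a):
    # Minimize the negative dual objective (equivalent to maximizing L~).
    return -(a.sum() - 0.5 * a @ P @ a)

cons = ({'type': 'eq', 'fun': lambda a: a @ t},)   # sum_n a_n t_n = 0
bnds = [(0.0, None)] * len(t)                      # a_n >= 0
res = minimize(neg_dual, np.ones(len(t)), bounds=bnds, constraints=cons)

a = res.x
w = (a * t) @ X                                # w = sum_n a_n t_n x_n
sv = a > 1e-6                                  # support vectors: active constraints
b = np.mean(t[sv] - X[sv] @ w)                 # bias from the KKT conditions
print(np.sign(X @ w + b))                      # separates the training data
```

Only the points with a_n > 0 (the support vectors) enter w and b, which is the sparsity property discussed above.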
• Then we can solve for the bias term by averaging over the support vectors: b = (1/N_S) Σ_{n∈S} ( t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) ), where S is the set of support-vector indices.

Computational Complexity
• Quadratic programming: when the feature dimension is smaller than the number of data points, solving the dual problem is more costly than solving the primal.
• However, the dual representation allows the use of kernels.

Classification for Overlapping Classes
• Soft margin: allow some data points to violate the margin constraints.

New Cost Function
• To maximize the margin while softly penalizing points that lie on the wrong side of the margin (not decision) boundary, we minimize C Σ_n ξ_n + (1/2)||w||^2, subject to t_n y(x_n) ≥ 1 − ξ_n and ξ_n ≥ 0.

Lagrange Function
• L = (1/2)||w||^2 + C Σ_n ξ_n − Σ_n a_n { t_n y(x_n) − 1 + ξ_n } − Σ_n μ_n ξ_n, where we have Lagrange multipliers a_n ≥ 0 and μ_n ≥ 0. Why two sets? There is one multiplier per inequality constraint.

KKT Condition and Gradients
• Setting the derivatives of L with respect to w, b, and ξ_n to zero gives w = Σ_n a_n t_n φ(x_n), Σ_n a_n t_n = 0, and a_n = C − μ_n.

Dual Lagrangian
• Since a_n ≥ 0 and μ_n ≥ 0, we have 0 ≤ a_n ≤ C.
• Dual Lagrangian with constraints: minimize the negative dual subject to 0 ≤ a_n ≤ C and Σ_n a_n t_n = 0. Exercise: explain these conditions.

Support Vectors
• Support vectors: data points corresponding to active constraints; they contribute to the predictive model.
• Other data points correspond to inactive constraints and do not contribute to the prediction.
• The bias term is again solved by averaging over the support vectors. Discussion: how to solve SVMs in practice.

Interpretation from the Regularization Framework
• The soft-margin SVM minimizes a hinge error plus a quadratic regularizer, Σ_n E_hinge(t_n y(x_n)) + λ||w||^2, where E_hinge(z) = max(0, 1 − z).
• Regularized logistic regression: for logistic regression we have the log-loss in place of the hinge error.

SVM for Regression
• Using a sum of squared errors, we obtain ridge regression; however, the solution for ridge regression is not sparse.
• ε-insensitive error function: errors smaller than ε cost nothing; beyond ε the penalty grows linearly.

Slack Variables
• How many slack variables do we need? Two per data point: ξ_n ≥ 0 for targets above the ε-tube and ξ̂_n ≥ 0 for targets below it.
• Minimize C Σ_n (ξ_n + ξ̂_n) + (1/2)||w||^2.

Support Vectors for Regression
• Which points will be support vectors for regression? Points that lie on or outside the ε-tube. Why? Only for these points are the constraints active.

Sparsity Revisited
• Discussion: sparsity can come from the error function (as in the SVM) or from the regularizer (as in the Lasso).

Bayesian Networks
• A Bayesian network is represented by a directed acyclic graph (DAG).
• General factorization: p(x) = Π_k p(x_k | pa_k), where pa_k denotes the parents of node x_k.

Bayesian Curve Fitting
• (1) Polynomial model.
• (2) Plate notation for the repeated observation nodes.
• (3) Input variables and explicit hyperparameters.
• Learning: condition on the observed data.
• Prediction: the predictive distribution is obtained by integrating out the weights.
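The general factorization p(x) = Π_k p(x_k | pa_k) can be illustrated with a small example. The three-node DAG A → C ← B and all probability-table values below are made up for illustration; they are not from the lecture.

```python
# Sketch: the Bayesian-network factorization p(x) = prod_k p(x_k | pa_k)
# on a hypothetical 3-node DAG  A -> C <- B  with binary variables.
# A and B have no parents; C's parents are {A, B}.
import itertools

p_A = {0: 0.6, 1: 0.4}                 # illustrative values
p_B = {0: 0.7, 1: 0.3}
p_C_given_AB = {                       # p(C=1 | A, B); complement gives C=0
    (0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9,
}

def joint(a, b, c):
    """Joint probability via the DAG factorization p(a) p(b) p(c | a, b)."""
    pc1 = p_C_given_AB[(a, b)]
    return p_A[a] * p_B[b] * (pc1 if c == 1 else 1.0 - pc1)

# The factorized joint is a valid distribution: it sums to 1 over all states.
total = sum(joint(a, b, c) for a, b, c in itertools.product([0, 1], repeat=3))
print(round(total, 10))                # 1.0
```

The factorization stores 2 + 2 + 4 parameters instead of the 2^3 entries of the full joint table, which is exactly the saving the DAG structure buys.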

