Summary: Supervised Learning
Greg Grudic

Summary Topics
- Supervised learning
- Model selection (i.e., learning parameters): frequentist and Bayesian
- Learning algorithm evaluation
- Assumptions on data
- Generative and discriminative classifiers
- Supervised learning algorithms

Supervised Learning
- Given: training examples (x_1, f(x_1)), (x_2, f(x_2)), ..., (x_P, f(x_P)) of some unknown function (system) y = f(x)
- Find \hat{f}(x), i.e., an approximation of f(x)
- Predict \hat{y} = \hat{f}(x), where x is not in the training set

Two Types of Supervised Learning
- Classification: y \in \{c_1, c_2, ..., c_N\}. The model output is a prediction that the input belongs to some class. If the input is an image, the output might be chair, face, dog, boat, etc.
- Regression: y \in \mathbb{R}. The output has infinitely many values. If the input is stock features, the output could be a prediction of tomorrow's stock price.

Goal of Supervised Learning
- Build a model that does best on future data.

Assumptions on Regression Data
- Data (x_1, y_1), ..., (x_N, y_N), where the x_i \in \mathbb{R}^d are independently identically distributed (iid) from D(x)
- The targets are generated from y_i = f(x_i) + \epsilon, where f(x) is a real-valued function defined on x \in \mathbb{R}^d and \epsilon is a random variable with E[\epsilon] = 0 and V[\epsilon] = c, 0 \le c < \infty

Assumptions on Classification Data
- Assume data (x_1, y_1), ..., (x_N, y_N), where x_i \in \mathbb{R}^d and y \in \{c_1, ..., c_K\}, i.e., K classes
- The prior probability of each class is p_k, and each class is iid from a pdf h_k(x)
- The posterior probability of class c_k given x is then

    \Pr(y = c_k \mid x) = \frac{p_k h_k(x)}{\sum_{i=1}^{K} p_i h_i(x)}

Building Supervised Learning Models: Frequentist Model Selection
- The training data (x_1, y_1), ..., (x_N, y_N) and the learning parameters are fed to a learning algorithm, which produces a model M(x)
- The model is used to make predictions \hat{y} = M(x)

Learning Parameters
- These dictate how the learning algorithm will build a model
- Changing the learning parameters changes how good the model is
- Goal: choose the learning parameters that produce the best model

Measuring Model Accuracy: Regression
- Assume a set of data (x_1, y_1), ..., (x_K, y_K)
- Two commonly used metrics:
- Mean square error:

    \mathrm{error}_M(x) = \frac{1}{K} \sum_{i=1}^{K} \left( y_i - M(x_i) \right)^2

- Relative error:

    \mathrm{error}_M(x) = \frac{\sum_{i=1}^{K} \left( y_i - M(x_i) \right)^2}{\sum_{i=1}^{K} \left( y_i - \bar{y} \right)^2}, \quad \bar{y} = \frac{1}{K} \sum_{i=1}^{K} y_i

Measuring Model Accuracy: Classification
- Assume a set of data (x_1, y_1), ..., (x_K, y_K)
- Classification error rate:

    \mathrm{error}_M(x) = \frac{1}{K} \sum_{i=1}^{K} c(x_i, y_i, M(x_i)), \quad c(x_i, y_i, M(x_i)) = \begin{cases} 0 & \text{if } y_i = M(x_i) \\ 1 & \text{otherwise} \end{cases}

Picking the Best Learning Parameters
- Partition the learning data into disjoint sets:
  - Training set (x_1, y_1), ..., (x_T, y_T): used to build the model
  - Validation set (x_1, y_1), ..., (x_V, y_V): used to evaluate the model
- Pick the learning parameters that give the lowest error on the validation set:

    \mathrm{error}_M(x) = \frac{1}{V} \sum_{i=1}^{V} c(x_i, y_i, M(x_i))

How Big Should the Training and Validation Sets Be?
- It depends.
- If you have lots of data for learning: randomly putting half the data into each set is often sufficient.
- If you only have a small data set for learning: usually do N-fold cross validation.

N-Fold Cross Validation
- Partition the data D = \{(x_1, y_1), ..., (x_N, y_N)\} into N disjoint sets T_1, ..., T_N
- For i from 1 to N:
  - Use T_i as the validation set and the remaining data S_i = D \setminus T_i as the training set
  - Compute the error \mathrm{error}_{T_i} on the validation set T_i
- Return the average error over the validation sets:

    \mathrm{error}_M(x) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{error}_{T_i}

- Pick the learning parameters that minimize this error.

Does My Cross Validation Error Reflect the True Error of My Model?
- No! You need to do randomized experiments, e.g., 100 experiments:
  - 90% of the data for learning (use cross validation on this set to pick the learning parameters)
  - 10% for testing
- Report the average test error over the 100 experiments. (A sketch of this selection loop appears after the Linear Regression Learning Algorithms slide below.)

Bayesian Model Selection
- Pick the hypothesis h that has maximum probability given the data D, via Bayes' theorem:

    \Pr(h \mid D) = \frac{\Pr(D \mid h) \Pr(h)}{\Pr(D)}

- The learning parameters are chosen to maximize the probability of the hypothesis given the data.

Generative and Discriminative Classifiers
- Generative classifier models: model the distributions that generate the data (e.g., Bayesian density models):

    \hat{y} = \arg\max_k \; p_k h_k(x)

- Discriminative classifier models: model only the boundaries between classes (e.g., trees, SVMs, nearest neighbor, neural networks, etc.)

Supervised Learning Algorithms I
- Linear regression
- Ridge regression (linear and kernel)
- Lasso regression (linear and kernel)
- Perceptron (classification)
- Support vector machines (classification and regression)

Supervised Learning Algorithms II
- K nearest neighbors (classification and regression)
- Decision trees (1R, stump)
- Neural networks (classification and regression)
- Bagging (classification and regression)
- Random forests (classification and regression)
- Boosting classifiers

Linear Regression
- Main assumptions: the model is a linear weighted sum of the attribute values; attributes and target values are real-valued
- Hypothesis space: fixed size (parametric), limited modeling potential

    \hat{y} = \beta_0 + \sum_{i=1}^{d} \beta_i x_i

- Can be made nonlinear using basis functions (the model is then linear in basis function space):

    \hat{y} = \beta_0 + \sum_{i=1}^{K} \beta_i \phi_i(x)

Linear Regression Learning Algorithms
- Minimum least square error:

    \hat{\beta}^{\mathrm{MSE}} = \arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{d} \beta_j x_{ij} \right)^2

- Ridge regression:

    \hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \left[ \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{d} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{d} \beta_j^2 \right]

- Lasso:

    \hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{d} \beta_j x_{ij} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{d} |\beta_j| \le s, \; s \ge 0
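To make the ridge estimator above concrete, here is a minimal numpy sketch (not from the slides; the function name ridge_fit, the synthetic data, and the choice of lambda are illustrative assumptions). Centering X and y makes the unpenalized intercept \beta_0 drop out of the penalized problem, which then has the closed-form solution (X^T X + \lambda I)\beta = X^T y:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: minimizes ||y - b0 - X b||^2 + lam * ||b||^2."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # center so the intercept is unpenalized
    d = X.shape[1]
    # Solve (Xc^T Xc + lam * I) beta = Xc^T yc
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)
    beta0 = y_mean - x_mean @ beta           # recover the intercept from the means
    return beta0, beta

# Illustrative usage on synthetic data: y = 3*x1 - 2*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)
beta0, beta = ridge_fit(X, y, lam=1.0)
print(beta0, beta)  # beta should land near [3, -2], shrunk slightly toward 0
```

Setting lam = 0 recovers the ordinary least squares solution. The lasso has no closed form because of the \sum |\beta_j| constraint; it is typically solved by coordinate descent or LARS (see the software pointers in the Linear Regression Summary slide below).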
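The N-fold cross validation procedure from earlier in this section can then be used to pick the learning parameter \lambda. A sketch under the same assumptions (numpy, the ridge_fit function and the X, y data from the sketch above, and an illustrative grid of \lambda values):

```python
def cv_error(X, y, lam, n_folds=5):
    """Average validation MSE of ridge regression over n_folds disjoint folds T_i."""
    folds = np.array_split(np.random.default_rng(1).permutation(len(y)), n_folds)
    errors = []
    for val_idx in folds:                                  # T_i: validation fold
        train = np.setdiff1d(np.arange(len(y)), val_idx)   # S_i = D \ T_i: training fold
        b0, b = ridge_fit(X[train], y[train], lam)
        pred = b0 + X[val_idx] @ b
        errors.append(np.mean((y[val_idx] - pred) ** 2))   # mean square error on T_i
    return np.mean(errors)                                 # average over validation sets

# Pick the learning parameter that minimizes the cross validation error
lams = [0.01, 0.1, 1.0, 10.0]
best_lam = min(lams, key=lambda lam: cv_error(X, y, lam))
```

Per the "Does my cross validation error reflect the true error" slide, this selection should itself sit inside an outer loop of randomized train/test splits, with the test error averaged over the repetitions.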
Linear Regression Summary
- Good points: does feature selection (LASSO)
- Bad points: slow learning on very large datasets (> 20,000 examples)
- Software:
  - WEKA: http://www.cs.waikato.ac.nz/ml/weka/index.html
  - LARS: http://www-stat.stanford.edu/~hastie/Papers/LARS/

Perceptron Algorithm
- Finds a linear separating hyperplane:

    \hat{y} = \mathrm{sgn}\left( \beta_0 + \sum_{i=1}^{d} \beta_i x_i \right)

Linear Hyperplanes
- [Figure: two examples, "Linearly Separable" and "Not Linearly Separable"]

Nonlinear Perceptron Algorithm
- Use a nonlinear basis function space:

    \hat{y} = \mathrm{sgn}\left( \beta_0 + \sum_{i=1}^{K} \beta_i \phi_i(x) \right)

- Basis functions can be kernels.

Perceptron Algorithm (continued)
- Works by gradient descent on the loss

    L(\beta_0, \beta_1, ..., \beta_d) = - \sum_{i \in M} y_i \left( \beta_0 + x_i^T \beta \right)

  where M is the set of misclassified training examples, so

    \frac{\partial L}{\partial \beta} = - \sum_{i \in M} y_i x_i, \qquad \frac{\partial L}{\partial \beta_0} = - \sum_{i \in M} y_i

  (A runnable sketch of this update appears at the end of this preview.)

Perceptron Summary
- Good points:
  - Convergence guaranteed if the problem is separable (in basis function space or linear space)
  - Works on large data sets (the algorithm works by gradient descent)
- Bad points:
  - Won't converge if the data isn't separable
- Learning parameters: learning rate, choice of nonlinear basis functions

Support Vector Machines
- Main assumption: build a model using a minimal number of training instances (the support vectors)
- Hypothesis space: variable size (nonparametric); can model any function given the right kernels (e.g., Gaussian)

Linear Support ...
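Returning to the perceptron update rule above: a minimal numpy sketch of stochastic gradient descent on the perceptron loss (not from the slides; the function names, the learning rate rho, and the epoch cap are illustrative assumptions). Each misclassified example i contributes -y_i x_i to the gradient, so the update moves beta in the direction y_i x_i:

```python
import numpy as np

def perceptron_train(X, y, rho=1.0, max_epochs=100):
    """SGD on L = -sum_{i in M} y_i (beta0 + x_i^T beta).

    X: (N, d) inputs; y: labels in {-1, +1}. Stops early once no example
    is misclassified (guaranteed to happen only for separable data).
    """
    N, d = X.shape
    beta0, beta = 0.0, np.zeros(d)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(N):
            if y[i] * (beta0 + X[i] @ beta) <= 0:  # example i is in M (misclassified)
                beta += rho * y[i] * X[i]          # beta  <- beta  + rho * y_i * x_i
                beta0 += rho * y[i]                # beta0 <- beta0 + rho * y_i
                mistakes += 1
        if mistakes == 0:                          # converged: hyperplane separates data
            break
    return beta0, beta

def perceptron_predict(X, beta0, beta):
    return np.sign(beta0 + X @ beta)               # y_hat = sgn(beta0 + x^T beta)
```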
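Similarly, the generative classification rule \hat{y} = \arg\max_k p_k h_k(x) from the Generative and Discriminative Classifiers slide can be made concrete by fitting per-class densities. A sketch assuming diagonal Gaussian densities estimated by maximum likelihood (the density choice and helper names are illustrative, not the slides' prescription):

```python
import numpy as np

def fit_generative(X, y):
    """Estimate class priors p_k and diagonal-Gaussian densities h_k(x)."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = (len(Xk) / len(X),        # prior p_k
                     Xk.mean(axis=0),         # per-feature mean of h_k
                     Xk.var(axis=0) + 1e-9)   # per-feature variance (regularized)
    return params

def log_density(x, mean, var):
    # log of a diagonal Gaussian pdf h_k(x)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def predict(x, params):
    # y_hat = argmax_k p_k * h_k(x), computed in log space for numerical stability
    return max(params, key=lambda k: np.log(params[k][0])
               + log_density(x, params[k][1], params[k][2]))
```

Dividing p_k h_k(x) by \sum_i p_i h_i(x) turns these scores into the posterior \Pr(y = c_k \mid x) from the classification-data slide; the argmax is unchanged.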

