MSU CSE 847 - online - D3015385


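Online Learning
Rong Jin

Batch Learning
Given a collection of training examples $D$, learn a classification model from $D$.
What if the training examples are received one at a time?

Online Learning
For $t = 1, 2, \dots, T$:
    Receive an instance $x_t$
    Predict its class label $\hat{y}_t$
    Receive the true class label $y_t$
    Encounter a loss $\ell(y_t, \hat{y}_t)$
    Update the classification model
Objective: minimize the total loss $\sum_{t=1}^{T} \ell(y_t, \hat{y}_t)$.

Loss Functions
- Zero-one loss: $\ell(y, \hat{y}) = \mathbb{I}(y \neq \hat{y})$
- Hinge loss: $\ell(w; x, y) = \max(0, 1 - y\, w^\top x)$
[Figure: hinge loss and zero-one loss]

Linear Classifiers
We restrict our discussion to linear classifiers $w$:
- Prediction: $\hat{y} = \mathrm{sign}(w^\top x)$
- Confidence: $|w^\top x|$

Separable Set
[Figure: a linearly separable set]

Inseparable Sets
[Figure: linearly inseparable sets]

Why Online Learning?
- Fast
- Memory efficient: processes one example at a time
- Simple to implement
- Formal guarantees: regret and mistake bounds
- Online-to-batch conversions
- No statistical assumptions
- Adaptive
- But: not as good as a well-designed batch algorithm

Update Rules
Online algorithms are based on an update rule that defines $w_{t+1}$ from $w_t$ (and possibly other information).
For linear classifiers: find $w_{t+1}$ from $w_t$ based on the input $(x_t, y_t)$.
Some update rules:
- Perceptron (Rosenblatt)
- ALMA (Gentile)
- ROMMA (Li and Long)
- NORMA (Kivinen et al.)
- MIRA (Crammer and Singer)
- EG (Littlestone and Warmuth)
- Bregman-based (Warmuth)

Perceptron
Initialize $w_1 = 0$
For $t = 1, 2, \dots, T$:
    Receive an instance $x_t$
    Predict its class label $\hat{y}_t = \mathrm{sign}(w_t^\top x_t)$
    Receive the true class label $y_t \in \{-1, +1\}$
    If $\hat{y}_t \neq y_t$ then $w_{t+1} = w_t + y_t x_t$, else $w_{t+1} = w_t$

Geometrical Interpretation
[Figure: geometrical interpretation of the Perceptron update]

Mistake Bound: Separable Case
Assume the data set $D$ is linearly separable with margin $\gamma$, i.e., there exists $w^*$ with $\|w^*\| = 1$ such that $y_i\, w^{*\top} x_i \ge \gamma$ for all $(x_i, y_i) \in D$.
Assume $\|x_i\| \le R$ for all $i$.
Then the number of mistakes made by the Perceptron algorithm is bounded by $R^2 / \gamma^2$.

Mistake Bound: Inseparable Case
Let $w^*$ be the best linear classifier. We measure our progress relative to $w^*$ and consider a round $t$ in which we make a mistake on $(x_t, y_t)$.
Mistake Bound: Inseparable Case, Result 1
Mistake Bound: Inseparable Case, Result 2

Perceptron with Projection
Initialize $w_1 = 0$
For $t = 1, 2, \dots, T$:
    Receive an instance $x_t$
    Predict its class label $\hat{y}_t = \mathrm{sign}(w_t^\top x_t)$
    Receive the true class label $y_t$
    If $\hat{y}_t \neq y_t$ then $w_{t+1} = w_t + y_t x_t$
    If $w_{t+1}$ falls outside the feasible domain, then project it back onto that domain

Remarks
- The mistake bound is measured for a sequence of classifiers.
- The bound does not depend on the dimension of the feature vector.
- The bound holds for all sequences (no i.i.d. assumption).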
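- It is not tight for most real-world data, but it cannot be further improved in general.

Perceptron: Conservative Updates
Update the classifier only when it misclassifies.
Initialize $w_1 = 0$
For $t = 1, 2, \dots, T$:
    Receive an instance $x_t$
    Predict its class label $\hat{y}_t = \mathrm{sign}(w_t^\top x_t)$
    Receive the true class label $y_t$
    If $\hat{y}_t \neq y_t$ then $w_{t+1} = w_t + y_t x_t$

Aggressive Perceptron
Initialize $w_1 = 0$
For $t = 1, 2, \dots, T$:
    Receive an instance $x_t$
    Predict its class label $\hat{y}_t = \mathrm{sign}(w_t^\top x_t)$
    Receive the true class label $y_t$
    If $y_t\, w_t^\top x_t < 1$ (the hinge loss is positive, even if the prediction is correct) then $w_{t+1} = w_t + y_t x_t$

Regret Bound

Learning a Classifier
The evaluation (mistake bound or regret bound) concerns a sequence of classifiers. But at the end of the day, which classifier should be used? The last one? One chosen by cross-validation?

Learning with Expert Advice
Learning to combine the predictions from multiple experts:
- An ensemble of $d$ experts $f_1(x), \dots, f_d(x)$
- Combination weights $w = (w_1, \dots, w_d)$
- Combined classifier $f(x) = \sum_{i=1}^{d} w_i f_i(x)$

Hedge
Simple case: there exists one expert who can perfectly classify all the training examples. What is your learning strategy?
Difficult case: what if we don't have such a perfect expert?

Hedge Algorithm
Initialize the combination weights $w_1, \dots, w_d$
For $t = 1, 2, \dots, T$:
    Receive a training example $(x_t, y_t)$
    Prediction: $\hat{y}_t = \mathrm{sign}\left(\sum_{i=1}^{d} w_i f_i(x_t)\right)$
    If $\hat{y}_t \neq y_t$ then
        For $i = 1, 2, \dots, d$:
            If $f_i(x_t) \neq y_t$ then shrink the weight $w_i$

Mistake Bound
Measure the progress, then derive a lower bound and an upper bound on it; combining the two bounds the number of mistakes.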
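The Perceptron variants above (conservative updates, the aggressive variant, and the projection step) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the lecture's code: $\mathrm{sign}(0)$ is broken toward $+1$, the aggressive condition is taken to be $y_t w_t^\top x_t < 1$, and the `radius` parameter (projection onto the ball $\|w\| \le$ `radius`) is hypothetical. The usage example builds a separable set with margin $\gamma$ for the unit vector $w^* = (1, 0)$ and compares the mistake count with the classical Perceptron bound $R^2/\gamma^2$ (where $\|x_i\| \le R$); because that bound holds for any sequence, it also covers repeated passes over the data.

```python
import numpy as np


def perceptron(X, y, epochs=1, aggressive=False, radius=None):
    """Run the online Perceptron over (X[t], y[t]) for `epochs` passes.

    X: (n, d) array of instances; y: (n,) array of labels in {-1, +1}.
    aggressive=False: conservative rule, update only on a wrong prediction.
    aggressive=True:  assumed rule, update whenever y_t * w^T x_t < 1
                      (i.e. whenever the hinge loss is positive).
    radius: hypothetical parameter; if given, project w back onto the
            ball {w : ||w|| <= radius} after each update.
    Returns the final weight vector and the number of prediction mistakes.
    """
    w = np.zeros(X.shape[1])  # initialize w_1 = 0
    mistakes = 0
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            score = w @ x_t
            y_hat = 1 if score >= 0 else -1  # sign(w^T x), ties go to +1
            if y_hat != y_t:
                mistakes += 1
            needs_update = (y_t * score < 1) if aggressive else (y_hat != y_t)
            if needs_update:
                w = w + y_t * x_t  # w_{t+1} = w_t + y_t x_t
                if radius is not None:
                    norm = np.linalg.norm(w)
                    if norm > radius:
                        w = w * (radius / norm)  # Euclidean projection onto the ball
    return w, mistakes


def predict(w, X):
    """Labels sign(w^T x) for each row of X, with ties going to +1."""
    return np.where(X @ w >= 0, 1, -1)


# Usage: separable data with margin gamma for the unit vector w* = (1, 0).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 2))
X = X[np.abs(X[:, 0]) >= 0.2]          # keep points at distance >= 0.2 from the boundary
y = np.where(X[:, 0] > 0, 1, -1)
gamma = np.min(y * X[:, 0])            # min_i y_i * w*^T x_i
R = np.max(np.linalg.norm(X, axis=1))  # max_i ||x_i||

w, m = perceptron(X, y, epochs=100)
print(m, "mistakes; bound R^2/gamma^2 =", R**2 / gamma**2)
```

Running many passes is a simple way to see the bound at work: total mistakes cannot exceed $R^2/\gamma^2$, so some pass eventually makes no mistakes, after which the conservative rule never changes $w$ again.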

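The Hedge algorithm above can be sketched as a weighted-majority vote over $d$ experts. The details below are assumptions, since the slides' formulas are not recoverable: all weights start at 1, every erring expert has its weight multiplied by a penalty `beta` in $[0, 1)$ when the combined prediction is wrong, and ties in the vote go to $+1$. For this sketch the potential argument works on the total weight $W = \sum_i w_i$: each mistake multiplies $W$ by at most $(1+\beta)/2$ (upper bound), while $W$ stays at least $\beta^{m^*}$, where $m^*$ is the number of mistakes of the best expert (lower bound). Together these give $M \le (m^* \ln(1/\beta) + \ln d) / \ln(2/(1+\beta))$.

```python
import math

import numpy as np


def weighted_majority(expert_preds, y, beta=0.5):
    """Combine d experts online with multiplicative weights.

    expert_preds: (T, d) array; expert_preds[t, i] in {-1, +1} is expert i's
                  prediction f_i(x_t) at round t.
    y: (T,) array of true labels in {-1, +1}.
    beta: assumed penalty factor in [0, 1); beta = 0 discards any expert that errs.
    Returns (number of mistakes of the combined classifier, final weights).
    """
    T, d = expert_preds.shape
    w = np.ones(d)  # assumed initialization: every weight equals 1
    mistakes = 0
    for t in range(T):
        f_t = expert_preds[t]
        y_hat = 1 if w @ f_t >= 0 else -1  # sign(sum_i w_i f_i(x_t)), ties go to +1
        if y_hat != y[t]:                  # update only when the combination errs
            mistakes += 1
            w[f_t != y[t]] *= beta         # shrink the weight of every expert that erred
    return mistakes, w


# Usage: 10 experts guessing at random, except expert 3, which is always right.
rng = np.random.default_rng(1)
T, d, beta = 200, 10, 0.5
y = rng.choice([-1, 1], size=T)
experts = rng.choice([-1, 1], size=(T, d))
experts[:, 3] = y
M, w = weighted_majority(experts, y, beta)
bound = math.log(d) / math.log(2 / (1 + beta))  # m* = 0 for a perfect expert
print(M, "mistakes; bound ln(d)/ln(2/(1+beta)) =", round(bound, 2))
```

With `beta = 0` the sketch simply discards every expert that errs, which is one natural strategy for the simple case with a perfect expert; the bound then becomes $\log_2 d$.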