MIT 9.520 - Online Learning

This preview shows pages 1-3, 20-22, and 41-43 of 43.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 43 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:


Online Learning
Tomaso Poggio and Lorenzo Rosasco
9.520 Class 15, March 30, 2011

About this class
Goal: to introduce theory and algorithms for online learning.

Plan
- Different views on online learning
- From batch to online least squares
- Other loss functions
- Theory

(Batch) Learning Algorithms
A learning algorithm A is a map from the data space into the hypothesis space:
    f_S = A(S),
where S = S_n = (x_0, y_0), ..., (x_{n-1}, y_{n-1}).
We typically assume that:
- A is deterministic,
- A does not depend on the ordering of the points in the training set.
Notation: note the weird numbering of the training set (indices start at 0)!

Online Learning Algorithms
The pure online learning approach is O(1) in time and memory with respect to the data:
    let f_1 = init
    for n = 1, ...
        f_{n+1} = A(f_n, (x_n, y_n))
The algorithm works sequentially and has a recursive definition.

Online Learning Algorithms (cont.)
A related approach (similar to transductive learning) is typically O(1) in time but not in memory with respect to the data:
    let f_1 = init
    for n = 1, ...
        f_{n+1} = A(f_n, S_n, (x_n, y_n))
Also in this case the algorithm works sequentially and has a recursive definition, but it requires storing the past data S_n.

Why Online Learning?
Different motivations/perspectives that often correspond to different theoretical frameworks:
- Biological plausibility.
- Stochastic approximation.
- Incremental optimization.
- Non-i.i.d. data, game-theoretic view.

Online Learning and Stochastic Approximation
Our goal is to minimize the expected risk
    I[f] = E_{(x,y)}[V(f(x), y)] = \int V(f(x), y) d\mu(x, y)
over the hypothesis space H, but the data distribution is not known. The idea is to use the samples to build an approximate solution, and to update that solution as we get more data.

Online Learning and Stochastic Approximation (cont.)
More precisely, if
- we are given samples (x_i, y_i)_i in a sequential fashion, and
- at the n-th step we have an approximation G(f, (x_n, y_n)) of the gradient of I[f],
then we can define the recursion
    let f_1 = init
    for n = 1, ...
        f_{n+1} = f_n + \gamma_n G(f_n, (x_n, y_n))

Incremental Optimization
Here our goal is to solve empirical risk minimization
    I_S[f],
or regularized empirical risk minimization
    I_S^\lambda[f] = I_S[f] + \lambda \|f\|^2,
over the hypothesis space H, when the number of points is so big (say n = 10^8 - 10^9) that standard solvers would not be feasible. Memory is the main constraint here.

Incremental Optimization (cont.)
In this case we can consider
    let f_1 = init
    for t = 1, ...
        f_{t+1} = f_t + \gamma_t G(f_t, (x_{n_t}, y_{n_t}))
where G(f_t, (x_{n_t}, y_{n_t})) is a pointwise estimate of the gradient of I_S or I_S^\lambda.

Epochs
Note that in this case the number of iterations is decoupled from the index of the training set points, so we can look at the data more than once, that is, consider different epochs.

Non-i.i.d. Data, Game-Theoretic View
If the data are not i.i.d.,
we can consider a setting where the data is a finite sequence that will be disclosed to us in a sequential (possibly adversarial) fashion. Then we can see learning as a two-player game where
- at each step nature chooses a sample (x_i, y_i),
- at each step the learner chooses an estimator f_n.
The goal of the learner is to perform as well as if it could view the whole sequence.

Plan
- Different views on online learning
- From batch to online least squares
- Other loss functions
- Theory

Recalling Least Squares
We start by considering a linear kernel, so that
    I_S[f] = (1/n) \sum_{i=0}^{n-1} (y_i - x_i^T w)^2 = (1/n) \|Y - Xw\|^2.
Remember that in this case
    w_n = (X^T X)^{-1} X^T Y = C_n^{-1} \sum_{i=0}^{n-1} x_i y_i,
where C_n = X^T X. (Note that if we regularize, we have (C_n + \lambda I)^{-1} in place of C_n^{-1}.)
Notation: note the weird numbering of the training set (indices start at 0)!

A Recursive Least Squares Algorithm
Then we can consider
    w_{n+1} = w_n + C_{n+1}^{-1} x_n [y_n - x_n^T w_n].

Proof
    w_n = C_n^{-1} (\sum_{i=0}^{n-1} x_i y_i)
    w_{n+1} = C_{n+1}^{-1} (\sum_{i=0}^{n-1} x_i y_i + x_n y_n)
    w_{n+1} - w_n = C_{n+1}^{-1} (x_n y_n) + C_{n+1}^{-1} (C_n - C_{n+1}) C_n^{-1} \sum_{i=0}^{n-1} x_i y_i
    C_{n+1} - C_n = x_n x_n^T
Substituting C_n - C_{n+1} = -x_n x_n^T and C_n^{-1} \sum_{i=0}^{n-1} x_i y_i = w_n gives
    w_{n+1} - w_n = C_{n+1}^{-1} x_n (y_n - x_n^T w_n).
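The stochastic-approximation recursion above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it uses the square loss V(f(x), y) = (y - f(x))^2 with a scalar linear model f(x) = w*x, so the slides' G plays the role of the negative pointwise gradient. The function names (`online_sgd`, `stream`), the step-size schedule gamma_n = gamma0/n, and the synthetic data stream are all assumptions made here.

```python
import random

def online_sgd(samples, steps, gamma0=0.75):
    """Stochastic-approximation recursion f_{n+1} = f_n - gamma_n * grad,
    where grad is the pointwise gradient of (y - w*x)^2 at the current w.
    gamma_n = gamma0 / n is an assumed Robbins-Monro step-size schedule."""
    w = 0.0  # f_1 = init
    for n in range(1, steps + 1):
        x, y = next(samples)
        grad = -2.0 * x * (y - w * x)  # d/dw (y - w*x)^2
        w -= (gamma0 / n) * grad       # descend the pointwise gradient
    return w

def stream(w_true=2.0, noise=0.1, seed=0):
    """Endless supply of noisy samples y = w_true * x + eps (hypothetical data)."""
    rng = random.Random(seed)
    while True:
        x = rng.uniform(-1.0, 1.0)
        yield x, w_true * x + rng.gauss(0.0, noise)
```

With enough steps the iterate should land near the true slope, even though each update sees only one sample and nothing is stored.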
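The incremental-optimization variant, where the iteration counter t is decoupled from the data index and the same point can be revisited over several epochs, can be sketched as follows. The regularized objective matches I_S^lambda[w] = (1/n) sum_i (y_i - w*x_i)^2 + lambda*w^2 in the scalar case; the function name, the values of lambda and gamma0, and the shuffled-epoch visiting order are assumptions of this sketch, not prescribed by the slides.

```python
import random

def incremental_erm(data, epochs=50, gamma0=0.5, lam=0.01, seed=0):
    """Incremental minimization of the regularized empirical risk
    (1/n) sum_i (y_i - w*x_i)^2 + lam * w^2 for a scalar model f(x) = w*x,
    using one pointwise gradient estimate per step. One pass over the
    shuffled data is one epoch; t keeps counting across epochs."""
    rng = random.Random(seed)
    w, t = 0.0, 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # assumed visiting order; one pass = one epoch
        for i in order:
            t += 1
            x, y = data[i]
            # pointwise gradient of the regularized risk at (x_{n_t}, y_{n_t})
            grad = -2.0 * x * (y - w * x) + 2.0 * lam * w
            w -= (gamma0 / t) * grad
    return w
```

Because t, not the data index, drives the step size, running more epochs keeps shrinking the steps while cycling through the same n points, which is exactly what makes this feasible when n is huge and memory is the constraint.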
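The recursive least squares update above can be checked numerically in the scalar case, where C_n is just a number. This sketch seeds C with a small lam (i.e., the regularized variant (C_n + lambda*I)^{-1}) so the inverse exists from the first step; in d dimensions one would instead propagate C^{-1} with a Sherman-Morrison rank-one update rather than inverting C each step. The function name and the value of lam are assumptions of this sketch.

```python
def recursive_least_squares(samples, lam=1e-3):
    """Scalar instance of the recursive update
        w_{n+1} = w_n + C_{n+1}^{-1} x_n (y_n - x_n w_n),
        C_{n+1}  = C_n + x_n^2,
    with C_0 = lam (regularization) so C is invertible from the start."""
    w, C = 0.0, lam  # w_1 = init
    for x, y in samples:
        C += x * x                  # C_{n+1} = C_n + x_n x_n^T (scalar case)
        w += (x * (y - x * w)) / C  # w_{n+1} = w_n + C_{n+1}^{-1} x_n (y_n - x_n w_n)
    return w
```

By the derivation in the proof, after n steps the recursion reproduces the batch regularized solution w_n = (lam + sum x_i^2)^{-1} sum x_i y_i exactly, while touching each sample only once.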