MIT 9.520 - Study References

Loose Ends: stability, various definitions
Tomaso Poggio
9.520 Class 13
March 2010

A reminder: convergence in probability

Let $\{X_n\}$ be a sequence of bounded random variables. We say that

  $\lim_{n \to \infty} X_n = X$ in probability

if

  $\forall \epsilon > 0 \quad \lim_{n \to \infty} \mathbb{P}\{|X_n - X| \geq \epsilon\} = 0,$

or, equivalently, if for each $n$ there exist an $\epsilon_n$ and a $\delta_n$ such that

  $\mathbb{P}\{|X_n - X| \geq \epsilon_n\} \leq \delta_n,$

with $\epsilon_n$ and $\delta_n$ going to zero for $n \to \infty$.

Generalization

A natural requirement for $f_S$ is distribution-independent generalization:

  $\forall \mu, \quad \lim_{n \to \infty} |I_S[f_S] - I[f_S]| = 0$ in probability.

This is equivalent to saying that for each $n$ there exist an $\epsilon_n$ and a $\delta_n$ such that, $\forall \mu$,

  $\mathbb{P}\{|I_{S_n}[f_{S_n}] - I[f_{S_n}]| \geq \epsilon_n\} \leq \delta_n,$

with $\epsilon_n$ and $\delta_n$ going to zero for $n \to \infty$. In other words, the training error of the solution must converge to the expected error and thus be a "proxy" for it; otherwise the solution would not be "predictive".

A desirable additional requirement is universal consistency:

  $\forall \epsilon > 0 \quad \lim_{n \to \infty} \sup_{\mu} \mathbb{P}_S\left\{ I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \epsilon \right\} = 0.$

Uniform stability

Let us recall the notation: $S$ is the training set, and $S^{i,z}$ is the training set obtained by replacing the $i$-th example in $S$ with a new point $z = (x, y)$.

Definition. An algorithm $A$ has uniform stability $\beta$ (is $\beta$-stable) if

  $\forall (S, z) \in Z^{n+1}, \ \forall i, \quad \sup_{z' \in Z} |V(f_S, z') - V(f_{S^{i,z}}, z')| \leq \beta.$

Remarks: uniform stability

- Uniform stability is a strong requirement: the solution has to change very little even when a very unlikely training set is drawn.
- The coefficient $\beta$ is a function of $n$, and should perhaps be written $\beta_n$.

CVloo stability

We first introduce the definition of cross-validation leave-one-out (CVloo) stability.

Definition. The learning map $L$ is distribution-independent CVloo stable if, uniformly for all probability distributions $\mu$,

  $\lim_{n \to \infty} \sup_{i \in \{1, \ldots, n\}} |V(f_{S^i}, z_i) - V(f_S, z_i)| = 0$ in probability,

where $S^i$ denotes the training set $S$ with the $i$-th point removed.

CVloo stability measures the difference in errors at a point $z_i$ between a function obtained from the entire training set and a function obtained from the same training set with the point $z_i$ left out.

Theorem A. For good loss functions the following statements are equivalent for ERM:
- $L$ is distribution-independent CVloo stable;
- ERM generalizes and is universally consistent;
- $\mathcal{H}$ is uniform Glivenko-Cantelli.

Remarks: CVloo stability

- CVloo stability is weaker than uniform stability because (a) it holds in probability and (b) it holds for $z_i$, not for an arbitrary $z$.
- The definition of stability is about the difference between the error on a training point and the error on that same point used as a test point going to zero: it seems plausible that this may imply generalization.
- It turns out that, with some additional technical conditions, CVloo stability implies generalization independently of ERM.
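
As a minimal numerical sketch, assuming regularized least squares with the square loss as the learning algorithm (a standard example of a $\beta$-stable method), the CVloo quantity above can be probed empirically. The helper names rls_fit and sq_loss, the regularization parameter lam, and the synthetic data are illustrative choices, not part of the lecture.

```python
import numpy as np

def rls_fit(X, y, lam):
    """Regularized least squares: w = (X^T X + lam * n * I)^{-1} X^T y (illustrative learner)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

def sq_loss(w, x, y):
    """Square loss V(f, z) for a linear hypothesis f(x) = <w, x>."""
    return (x @ w - y) ** 2

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1

# Synthetic training set S = {z_1, ..., z_n}, z_i = (x_i, y_i).
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w_S = rls_fit(X, y, lam)            # f_S, trained on the full set S

# CVloo stability term: |V(f_{S^i}, z_i) - V(f_S, z_i)|, with S^i = S minus the i-th point.
cvloo = []
for i in range(n):
    keep = np.arange(n) != i
    w_Si = rls_fit(X[keep], y[keep], lam)
    cvloo.append(abs(sq_loss(w_Si, X[i], y[i]) - sq_loss(w_S, X[i], y[i])))

print("sup_i |V(f_{S^i}, z_i) - V(f_S, z_i)| =", max(cvloo))
# For a beta-stable method such as regularized least squares, this quantity is expected
# to shrink as n grows (roughly like 1/(lam * n)); rerunning with larger n illustrates that.
```

Rerunning the sketch with larger $n$ and fixed lam should show the leave-one-out differences shrinking, which is the CVloo-stability behavior that Theorem A ties to generalization and consistency for ERM.
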
Loose Ends: online stability
Tomaso Poggio
March 2010

Batch learning algorithms

We consider samples drawn sequentially, independently and identically, from the distribution on $Z$. The training set $S$ consists of $n$ samples:

  $S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$

The expected error of a function $f$ is defined as

  $I[f] = \int_Z V(f, z)\, d\mu(z) = \mathbb{E}_z V(f, z),$

which is also the expected error on a new sample $z$ drawn from the distribution. The following quantity, called the empirical error, can be computed by a "batch" learning algorithm, given all the training data $S$:

  $I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$

Online learning

Algorithms here take as input a hypothesis $f \in \mathcal{H}$ and a new example $z = (x, y)$ and return a new hypothesis $f' \in \mathcal{H}$. Given an input sequence $S \in Z^n$ with $S = z_1, \ldots, z_n$, the online algorithm uses $z_1$ and the zero hypothesis $f_0$ to generate the first hypothesis $f_1$. After seeing the whole sequence, the algorithm has generated a sequence of hypotheses $f_0, \ldots, f_n$ and has "memory" only of the last example $z_n$.

Online learning algorithms

We define the training error of an online algorithm at iteration $n$ as

  $V(f_n, z_n),$

where the algorithm generates $f_n$ from $f_{n-1}$ after "seeing" $z_n$. We define the average training error of an online algorithm at iteration $n$ as

  $I^n_{emp} = \frac{1}{n} \sum_{i=1}^{n} V(f_i, z_i),$

where the algorithm generates $f_i$ from $f_{i-1}$ after "seeing" $z_i$.

The notion of generalization is not appropriate for online algorithms

An algorithm is said to generalize if the function $f_S$ selected by it satisfies, for all $S$ (with $|S| = n$) and for any probability distribution $\mu$,

  $\lim_{n \to \infty} |I[f_S] - I_S[f_S]| = 0$ in probability.

For an online algorithm that "forgets" past data it is not natural to define the empirical error, so generalization is not a natural concept for online algorithms. Consistency, however, is meaningful for online algorithms. We recall that an algorithm is (universally) consistent if for any distribution $\mu$ and any $\epsilon > 0$

  $\lim_{n \to \infty} \mathbb{P}\left\{ I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \epsilon \right\} = 0.$

A class project: can stability be at the core of online learning?

CV-like condition:

  $\epsilon_n < \left[ -V(f_{n+1}, z_{n+1}) + V(f_n, z_{n+1}) \right] \leq \chi_n.$

Notice that $V(f_n, z_{n+1})$ is the out-of-sample error, since $f_n$ does not depend on $z_{n+1}$, whereas $V(f_n, z_n)$ is the in-sample error, since $f_n$ depends on $z_n$ (and $f_{n-1}$). Notice that $f_n$ depends on $z_n$: thus in $V(f_{n+1}, z_{n+1})$ the hypothesis $f_{n+1}$ is a function of $z_{n+1}$ (and of $f_n$). Thus this is a condition on the cross-validation error.

The upper bound above is key. It makes sure that the update of the hypothesis decreases the error on the new data point (relative to the error on that point made by the previous hypothesis, which was formulated before "seeing" that point), but not by too much. Intuitively, it guarantees that overfitting cannot occur.

Notice that online regularization (which satisfies the condition above) ensures that Regret $= o(T)$, and this in turn ensures consistency of the online learning (Rakhlin, pers. comm.).

Conjecture. The CV-like condition is sufficient for consistency of online learning.

Remark. If the conjecture is true, one could have algorithms which use stability directly (though they would be similar to the special case of online regularization). This may be …
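
To make the online protocol above concrete, here is a minimal sketch assuming linear hypotheses and the square loss; the update rule (a single gradient step on the square loss), the step size eta, and the synthetic stream are stand-ins for whatever rule and data an actual online algorithm would use.

```python
import numpy as np

def sq_loss(w, x, y):
    """Square loss V(f, z) for a linear hypothesis f(x) = <w, x>."""
    return (x @ w - y) ** 2

def online_update(w, x, y, eta=0.05):
    """One online step: new hypothesis f' from the current f and the new example z = (x, y).
    A single gradient step on the square loss stands in for a generic online rule."""
    return w - eta * 2 * (x @ w - y) * x

rng = np.random.default_rng(1)
n, d = 1000, 5
X = rng.standard_normal((n, d))                     # stream z_1, ..., z_n
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w = np.zeros(d)                                     # f_0, the zero hypothesis
train_errors = []                                   # V(f_i, z_i) for i = 1, ..., n
for i in range(n):
    w = online_update(w, X[i], y[i])                # f_i generated from f_{i-1} after "seeing" z_i
    train_errors.append(sq_loss(w, X[i], y[i]))     # training error at iteration i

print("training error V(f_n, z_n):", train_errors[-1])
print("average training error I^n_emp:", np.mean(train_errors))
# Only the current hypothesis w (plus the scalar errors) is carried from step to step:
# the algorithm has "memory" only of the most recent example, as described above.
```

Only the current hypothesis is kept between steps, which is why the empirical error over the whole set, and hence generalization in the batch sense, is not the natural quantity to track here.
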


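
Continuing the same toy setup, and again only as a sketch under the assumptions above (linear hypotheses, square loss, an illustrative gradient-step update), the following records the CV-like quantity $V(f_n, z_{n+1}) - V(f_{n+1}, z_{n+1})$ at each step, together with the regret against the best fixed hypothesis in hindsight, the two quantities appearing in the class-project slides.

```python
import numpy as np

def sq_loss(w, x, y):
    """Square loss V(f, z) for a linear hypothesis f(x) = <w, x>."""
    return (x @ w - y) ** 2

def online_update(w, x, y, eta=0.05):
    """Illustrative online rule: one gradient step on the square loss."""
    return w - eta * 2 * (x @ w - y) * x

rng = np.random.default_rng(2)
n, d = 1000, 5
X = rng.standard_normal((n, d))                     # stream z_1, ..., z_n
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
hyps = [w]                                          # f_0, f_1, ..., f_n
for i in range(n):
    w = online_update(w, X[i], y[i])
    hyps.append(w)

# CV-like gap at each step: V(f_n, z_{n+1}) - V(f_{n+1}, z_{n+1}),
# i.e. how much the update decreased the error on the example that triggered it.
cv_gap = [sq_loss(hyps[i], X[i], y[i]) - sq_loss(hyps[i + 1], X[i], y[i]) for i in range(n)]

# Regret against the best fixed hypothesis in hindsight (ordinary least squares on the stream).
w_star = np.linalg.lstsq(X, y, rcond=None)[0]
alg_loss = sum(sq_loss(hyps[i], X[i], y[i]) for i in range(n))    # each f_n evaluated on z_{n+1}
best_loss = sum(sq_loss(w_star, X[i], y[i]) for i in range(n))
print("mean CV-like gap:", np.mean(cv_gap))
print("regret:", alg_loss - best_loss)
```

With a small enough step size the CV-like gap is typically positive at each step (the update improves the error on the point it just saw, but only by a bounded amount). The fixed step size here is only for simplicity; sublinear-regret guarantees for updates of this kind usually require a decaying or otherwise tuned step size.
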