MIT 9.520 - Beyond a model, towards a theory

School name Massachusetts Institute of Technology

Course 9.520 - Statistical Learning Theory and Applications

Pages 34

Slide titles:
Derived Distance: Beyond a model, towards a theory
Outline
Why Hierarchies?
Learning Invariant Representations: Derived Distance
Derived Distance (2-layer case): Definitions
Derived Distance: Templates
Derived Distance: Neural Similarity
Derived Distance: Neural Similarity (2)
Derived Distance: Iterating the Process
Derived Distance: Example (Top-Down/Recursive)
Derived Distance: Example (2)
Derived Distance: Example (3)
Derived Distance: Connection to the CBCL Model
Derived Distance with Normalized Kernels
Derived Distance: Remarks
Image Classification Experiments
Image Classification Experiments (2)
Image Classification Experiments (3)
Image Classification Experiments (4)
Derived Distance: Three-layer String Example
Derived Distance: Three-layer String Example (2)
Derived Distance: Three-layer String Example (3)
Derived Distance: Three-layer String Example (4)
Derived Distance: Three-layer String Example (6)
Derived Distance: Complexity
A Step Further: Learning Invariant Representations: Slowness
A Step Further: Learning Invariant Representations: RCA
Summary

Derived Distance: Beyond a model, towards a theory

9.520, April 23, 2008
Jake Bouvrie
Work with Steve Smale, Tomaso Poggio, Andrea Caponnetto and Lorenzo Rosasco

Reference: Smale, S., T. Poggio, A. Caponnetto, and J. Bouvrie. Derived Distance: towards a mathematical theory of visual cortex, CBCL Paper, Massachusetts Institute of Technology, Cambridge, MA, November, 2007.

Outline

• Motivation: why should we use hierarchical feature maps or learning architectures? What can iteration do for us?
• Learning invariances: a brief introduction with two examples from the literature at the end.
• Derived distance: towards a theory that can explain why the CBCL model works.
• Derived distance: preliminary experiments.
• Derived distance: open problems and future directions.

How do the learning machines described in the theory compare with brains? One of the most obvious differences is the ability of people and animals to learn from very few examples. A comparison with real brains offers another, related, challenge to learning theory. The “learning algorithms” we have described in this paper correspond to one-layer architectures. Are hierarchical architectures with more layers justifiable in terms of learning theory? Why hierarchies? For instance, the lowest levels of the hierarchy may represent a dictionary of features that can be shared across multiple classification tasks. There may also be the more fundamental issue of sample complexity. Thus our ability of learning from just a few examples, and its limitations, may be related to the hierarchical architecture of cortex.

In the limit: 1 ex.

• Hierarchies can be used to incorporate specific kinds of invariances… (vs. virtual examples).

Tomaso Poggio and Steve Smale. The Mathematics of Learning: Dealing with Data. Notices of the American Mathematical Society (AMS), Vol. 50, No. 5, 537-544, 2003.

Why Hierarchies?

Figure: Bengio & LeCun, 2007

Some Engineered Hierarchical Models…

• Neocognitron, from Fukushima et al., 1980
• Convolutional Neural Networks (LeCun)
• CBCL Model (Figure: T. Serre)
• Hinton’s Deep Autoencoder, from: G. Hinton, Science 2007.

…most with specific invariances built in…

• Hierarchies can be used to incorporate particular pre-defined invariances in a straightforward manner, e.g. by pooling and transformations.
• Combinations of features, and combinations of combinations, agglomerate into a complex object or scene.
• But what if we don’t know how to characterize variation in the data, or even know what kinds of variation we need to capture?
• Nonparametric representations with random templates: look for patterns that we’ve seen before, whatever they might be.
• Learning features and invariances automatically from sequential data, in an unsupervised setting. (Maurer, Wiskott, Caponnetto) – more on this later.

Learning Invariant Representations

• Steve Smale has proposed a simple yet powerful framework for constructing invariant representations with a hierarchy of associations.
• Derived distance can be seen as a simplification of the CBCL model that lends itself well to analysis.
• Some outstanding questions to be answered:
  - Does it reduce sample complexity? Poverty of the stimulus implies some additional apparatus.
  - Does it provide a separation of classes that is more useful for classification than just the image pixels?
  - If so, what are the optimal parameters?
  - How many layers are needed?
  - Does the distance converge to something interesting in the limit of templates or layers?

Learning Invariant Representations: Derived Distance

• Iterated analysis with arbitrary transforms and nonlinearities in between layers.
• Template dictionaries at each layer encode objects in terms of similarities.
• The set of templates gives an empirical approximation to the true distribution of image patches.
• First layer performs operations similar to template matching over the set of allowed transformations.
• At higher layers, we work with representations based on previous layers’ templates.
• Final output is the distance (or similarity) between two objects (images, strings, …).
• Summary: Images are represented as a hierarchy of similarities to templates of increasing complexity, modulo scaling and translations.

Derived Distance: Sketch

Derived Distance (2-layer case): Definitions

• Consider an image defined on R, f : R → [0, 1]. f belongs to Im(R).
• Domains in ℝ²: v ⊂ v′ ⊂ R.
• Im(v) and Im(v′) are subsets of restrictions of the image f ∈ Im(R) to v and v′.
• H is a set of transformations h : v → v′ of the form h = hβ hα, with hα(x) = αx and hβ(x) = x + β. Similar definition for h′ ∈ H′ with h′ : v′ → R.
  - Note that here, the function h translates the entire image over the “receptive field” v. Usually we think of sliding a filter over the image…

Derived Distance (2-layer case): Definitions

A key property:

• A patch of an image is isolated by restricting the image to a transformed domain via composition. f(h(x)) is an image itself from v → [0, 1]. Depending on how h is chosen (via the parameters α, β) we get a transformed piece of f.

[Figure: the domains R, v′ and v, with the label f ∘ hβ′ = f …]
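
The key property, that composing an image f with h(x) = αx + β yields a patch defined on v, can be illustrated with a small sketch. This is not code from the slides: it assumes a discrete image stored as a NumPy array, an integer grid standing in for the domain v, and nearest-neighbour rounding of h(x). The function name `restrict` and its parameters are hypothetical.

```python
import numpy as np

def restrict(f, patch_size, alpha, beta):
    """Return the patch f(h(x)) for x on an n-by-n grid v, with h(x) = alpha * x + beta.

    Assumptions (not from the slides): f is a 2-D array with values in [0, 1],
    v is the integer grid {0..n-1}^2, and h(x) is rounded to the nearest pixel.
    """
    n = patch_size
    # Integer coordinates of the small domain v.
    rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # h = h_beta h_alpha: scale by alpha first, then translate by beta.
    hr = np.rint(alpha * rows + beta[0]).astype(int)
    hc = np.rint(alpha * cols + beta[1]).astype(int)
    # h must map v into the domain on which f is defined.
    if hr.min() < 0 or hc.min() < 0 or hr.max() >= f.shape[0] or hc.max() >= f.shape[1]:
        raise ValueError("h maps part of v outside the image domain")
    # Composition f(h(x)): an image on v with values in [0, 1].
    return f[hr, hc]

# A 6-by-6 toy image with values in [0, 1].
image = np.arange(36, dtype=float).reshape(6, 6) / 35.0

# Pure translation (alpha = 1): a 2-by-2 window starting at row 1, column 2.
translated = restrict(image, 2, 1.0, (1, 2))

# Scaling (alpha = 2): a 3-by-3 patch that samples every second pixel.
scaled = restrict(image, 3, 2.0, (0, 0))
```

With α = 1 the patch is a plain window of the image, and varying β slides that window. With α = 2 the same small grid v covers a region twice as wide, sampled more coarsely.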
