MIT 9.520 - Lecture Notes

HMAX Models: Architecture
Jim Mutch
March 31, 2010

Topics
• Basic concepts:
  – Layers, operations, features, scales, etc.
  – Will use one particular model for illustration; the concepts apply generally.
• Introduce software.
• Model variants.
  – Attempts to find the best parameters.
• Some current challenges.

Example Model
• Our best-performing model for multiclass categorization (with a few simplifications).
• Similar to:
  – J. Mutch and D.G. Lowe. Object class recognition and localization using sparse features with limited receptive fields. IJCV 2008.
  – T. Serre, L. Wolf, and T. Poggio. Object recognition with features inspired by visual cortex. CVPR 2005.
• Results on the Caltech 101 database: around 62%.
• State of the art is in the high 70s, using multiple-kernel approaches.

Layers
• A layer is a 3-D array of units which collectively represent the activity of some set of features (F) at each location in a 2-D grid of points in retinal space (X, Y).
• The number and kind of features change as you go higher in the model.
  – Input: only one feature (pixel intensity).
  – S1 and C1: responses to Gabor filters of various orientations.
  – S2 and C2: responses to more complex features.

Common Retinal Coordinate System for (X, Y)
• The number of (X, Y) positions in a layer gets smaller as you go higher in the model.
  – (X, Y) indices aren't meaningful across layers.
• However, each layer's cells still cover the entire retinal space:
  – with wider spacing,
  – with some loss near the edges.
• Each cell knows its (X, Y) center in a real-valued retinal coordinate system that is consistent across layers.
  – Keeping track of this explicitly turns out to simplify some operations; see the sketch after the next two slides.

Scale Invariance
• Finer scales have more (X, Y) positions.
• Each such position represents a smaller region of the visual field.
• Not all scales are shown (there are 12 in total).
• In a single visual cortical area (e.g. V1) you will find cells tuned to different spatial scales.
• For simplicity, our computational models represent different spatial scales using multiple layers.

Operations
• Every cell is computed using cells in the layer(s) immediately below as inputs.
• We always pool over a local region in (X, Y) …
  … sometimes over one scale at a time,
  … sometimes over multiple scales (tricky!),
  … sometimes over multiple feature types.
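Since every pooling operation works on a local region of retinal space, the coordinate bookkeeping is worth making concrete. Below is a minimal sketch, assuming illustrative grid origins and spacings; the class and field names are ours for exposition, not the interface of the actual HMAX software.

    import numpy as np

    class Layer:
        # A layer is a 3-D array of units (X, Y, feature), plus enough
        # geometry to place each grid cell in the common, real-valued
        # retinal coordinate system shared by all layers.
        def __init__(self, nx, ny, nf, x0, y0, spacing):
            self.units = np.zeros((nx, ny, nf))
            self.x0, self.y0 = x0, y0    # retinal center of cell (0, 0)
            self.spacing = spacing       # retinal distance between cells

        def center(self, i, j):
            # Real-valued retinal (X, Y) center of grid cell (i, j).
            return (self.x0 + i * self.spacing,
                    self.y0 + j * self.spacing)

        def nearest_cell(self, x, y):
            # Grid cell whose center is closest to retinal point (x, y);
            # this is what lets a higher layer pool cells from a lower
            # (or differently scaled) layer by retinal position.
            return (round((x - self.x0) / self.spacing),
                    round((y - self.y0) / self.spacing))

    # Two layers covering the same retinal extent at different densities:
    # their (X, Y) indices don't correspond, but retinal centers do.
    # (The origins and spacings here are made up for illustration.)
    s1 = Layer(246, 246, 4, x0=5.0, y0=5.0, spacing=1.0)
    c1 = Layer(47, 47, 4, x0=12.5, y0=12.5, spacing=5.0)
    print(s1.center(100, 100), c1.nearest_cell(*s1.center(100, 100)))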
S1 (Gabor Filter) Layers
• The image (at the finest scale) is [256 x 256 x 1].
• Only 1 feature at each grid point: image intensity.
• Center 4 different Gabor filters over each pixel position.
• The resulting S1 layer (at the finest scale) is [246 x 246 x 4].
• Can't center filters over pixels near the edges.
• The actual Gabors are 11 x 11.
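The numbers are consistent: an 11 x 11 filter can be centered at 256 - 11 + 1 = 246 positions per axis, which accounts for the shrinkage. Here is a minimal sketch of this stage, assuming textbook Gabor filters; the wavelength, sigma, and aspect ratio below are illustrative values, not the model's tuned parameters.

    import numpy as np
    from scipy.signal import correlate2d

    def gabor(size=11, theta=0.0, wavelength=5.6, sigma=4.5, gamma=0.3):
        # Standard Gabor: a sinusoid along orientation theta under a
        # Gaussian envelope, zero-meaned and unit-normalized.
        r = np.arange(size) - size // 2
        xx, yy = np.meshgrid(r, r)
        x = xx * np.cos(theta) + yy * np.sin(theta)
        y = -xx * np.sin(theta) + yy * np.cos(theta)
        g = np.exp(-(x**2 + gamma**2 * y**2) / (2 * sigma**2)) \
            * np.cos(2 * np.pi * x / wavelength)
        g -= g.mean()
        return g / np.linalg.norm(g)

    def s1_layer(image):
        # "valid" mode skips positions where the 11 x 11 filter would
        # hang off the image edge: 256 - 11 + 1 = 246 per axis.
        thetas = [k * np.pi / 4 for k in range(4)]  # 4 orientations
        maps = [np.abs(correlate2d(image, gabor(theta=t), mode="valid"))
                for t in thetas]
        return np.stack(maps, axis=-1)

    image = np.random.rand(256, 256)  # stand-in for the finest-scale image
    print(s1_layer(image).shape)      # (246, 246, 4)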
C1 (Local Invariance) Layers
• The S1 layer (finest scale) is [246 x 246 x 4].
• For each orientation we compute a local maximum over (X, Y) and scale.
• We also subsample by a factor of 5 in both X and Y.
• The resulting C1 layer (finest scale) is [47 x 47 x 4].
• Pooling over scales is tricky to define because adjacent scales differ by non-integer multiples. The common, real-valued coordinate system helps.

S2 (Intermediate Feature) Layers
• The C1 layer (finest scale) is [47 x 47 x 4].
• We now compute the response to (the same) large dictionary of learned features at each C1 grid position (separately for each scale).
• Each feature is looking for its preferred stimulus: a particular local combination of different Gabor filter responses (each of which is already locally invariant).
• Features can be of different sizes in (X, Y).
• The resulting S2 layer (finest scale) is [44 x 44 x 4000].
• The dictionary of 4000 features is learned by sampling from the C1 layers of training images.
  – We can decide to ignore some orientations at each position.

C2 (Global …
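Putting the C1 and S2 stages above together, here is a minimal single-scale sketch. The C1 pool width (16) is an assumption chosen so the output shapes match the slides, the max over adjacent scales is omitted, and the Gaussian radial basis response in S2 is one common choice rather than necessarily this model's response function.

    import numpy as np

    def c1_layer(s1, pool=16, stride=5):
        # Local max over an (X, Y) neighborhood, per orientation, then
        # subsampling by `stride`. With pool=16 and stride=5, a 246-wide
        # grid yields (246 - 16) / 5 + 1 = 47 positions per axis.
        nx, ny, nf = s1.shape
        out = np.empty(((nx - pool) // stride + 1,
                        (ny - pool) // stride + 1, nf))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = s1[i*stride:i*stride+pool,
                           j*stride:j*stride+pool, :]
                out[i, j, :] = patch.max(axis=(0, 1))
        return out

    def s2_layer(c1, prototypes, size=4, sigma=1.0):
        # Response of every C1 position to each dictionary feature: a
        # Gaussian radial basis function of the distance between the
        # local C1 patch and a stored prototype. All prototypes are
        # `size` x `size` here; in the model they can differ in (X, Y).
        nx, ny, nf = c1.shape
        out = np.empty((nx - size + 1, ny - size + 1, len(prototypes)))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = c1[i:i+size, j:j+size, :]
                for k, p in enumerate(prototypes):
                    out[i, j, k] = np.exp(-np.sum((patch - p)**2)
                                          / (2 * sigma**2))
        return out

    s1 = np.random.rand(246, 246, 4)  # stand-in S1 responses
    c1 = c1_layer(s1)                 # (47, 47, 4)
    # 10 random stand-ins; the model samples 4000 prototypes from the
    # C1 layers of training images.
    protos = [np.random.rand(4, 4, 4) for _ in range(10)]
    s2 = s2_layer(c1, protos)         # (44, 44, 10)
    print(c1.shape, s2.shape)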

