Feedforward theories of visual cortex predict human performance in rapid image categorization

Thomas Serre
Center for Biological and Computational Learning
McGovern Institute for Brain Research
Brain and Cognitive Sciences Department

[Figure: modified from (Ungerleider & Van Essen)]
- Builds upon previous neurobiological models (Hubel & Wiesel, 1959; Fukushima, 1980; Oram & Perrett, 1993; Wallis & Rolls, 1997; Riesenhuber & Poggio, 1999)
- General class of feedforward hierarchical models of object recognition in cortex
- Biophysically plausible operations
- Predicts several properties of cortical neurons (Serre, Kouh, Cadieu, Knoblich, Kreiman, Poggio, 2005)

- Generic dictionary of shape components (from V1 to IT)
  - Unsupervised learning during a developmental-like stage
  - From natural images unrelated to any categorization task
- Task-specific circuits (from IT to PFC)
  - Supervised learning
  - Linear classifier trained to minimize classification error on the training set (~ RBF net)

S1 and C1 units
(Hubel & Wiesel, 1959)
- 4 orientations
- 17 spatial frequencies

From S2 to S4
[Figure: S2 unit]
- Units are increasingly complex and invariant
- e.g., combination of V1-like complex units at different orientations

From C2 to S4
- 2,000 "features" at the C3 level ~ same number of feature columns in IT (Fujita et al., 1992)
- Total ~6,000 types of features with various levels of complexity and invariance

The model predicts several properties of cortical neurons
- In various cortical areas
- Examples from V4
  - Tuning for boundary conformation (Pasupathy & Connor, 2001)
  - Tuning for two-bar stimuli (Reynolds, Chelazzi and Desimone, 1999)
[Figure: V4 neurons, with attention directed away from receptive field (Reynolds, Chelazzi and Desimone, 1999); C2 units (Serre, Kouh, Cadieu, Knoblich, Kreiman and Poggio, 2005)]
- Prediction: the response to the pair is predicted to fall between the responses elicited by the stimuli alone

The model can perform complex recognition tasks very well
- At the level of some of the best computer vision systems
- e.g., constellation models (Leung et al., 1995; Burl et al., 1998; Weber et al., 2000; Fergus et al., 2003; Li et al., 2004)
[Figure: rear-car, airplane, frontal face, motorbike, leaf]

How does the model compare to human observers?

Animal vs. non-animal categorization
- 1,200 stimuli (from Corel database)
- 600 animals in 4 categories: Head, Close-body, Medium-body, Far-body and groups
- 600 matched distractors (½ artificial, ½ natural) to prevent reliance on low-level cues
(Torralba & Oliva, 2003; Oliva & Torralba, in press)

Training and testing the model
- Random splits (good estimate of expected error)
- Split 1,200 stimuli into two sets: Training and Test

Training the model
- Repeat 20 times
- Average model performance over all runs

Results: Model
[Figure: model performance]

Rapid categorization task
Animal present or not?
- Image: 20 ms
- Interval image-mask: 30 ms ISI
- Mask (1/f noise): 80 ms
- ~50 ms SOA (20 ms image + 30 ms ISI), close to performance ceiling in (Bacon-Mace et al., 2005)
(Thorpe et al., 1996; VanRullen & Koch, 2003; Bacon-Mace et al., 2005; Oliva & Torralba, in press)

Results: Human observers
[Figure: model vs. human observers, 50 ms SOA (ISI=30 ms)]

"Simpler" models cannot do the job
[Figure: model; (Renninger & Malik, 2004); (Torralba & Oliva, 2001); Model C1; (n=24); 50 ms SOA (ISI=30 ms)]
(Serre, Oliva and Poggio, in prep)

Results: Image orientation
[Figure: human observers (n=14); upright, 90 deg, inverted; 50 ms SOA (ISI=30 ms)]
(Serre, Oliva and Poggio, in prep)
- Robustness to image orientation is in agreement with previous results (Rousselet et al., 2003; Guyonneau et al., ECVP 2005)

Results: Image orientation
[Figure: human observers (n=14) and model; upright, 90 deg, inverted; 50 ms SOA (ISI=30 ms)]
(Serre, Oliva and Poggio, in prep)

Detailed comparison
- For each individual image
- How many times the image is classified as animal:
  - For humans: across subjects
  - For model: across 20 runs
[Figure: model vs. humans]
- Heads: ρ=0.71
- Close-body: ρ=0.84
- Medium-body: ρ=0.71
- Far-body: ρ=0.60

Good agreement: Correct rejections
[Figure: example images]

Good agreement: Correct detections
[Figure: example images]

Disagreement
[Figure: example images]

Discussion
- The model predicts human performance extremely well when the delay between the stimulus and the mask, i.e. the SOA, is ~50 ms
- What happens for different SOAs?

Discussion
- Why should we expect the model to account for human performance around 50 ms SOA?
[Figure: model vs. humans at 20 ms SOA (ISI=0 ms), 50 ms SOA (ISI=30 ms), 80 ms SOA (ISI=60 ms), and the no-mask condition]
(Serre, Oliva and Poggio, in prep)

Discussion
- What is so special about 50 ms SOA? Possible answer:
  - Nothing!!
  - Mask disrupts signal integration at the neural level
  - Model does not yet account for human level of performance

Discussion
- Alternative answer: 50 ms is a very long time!
  - Within 50 ms most of the information has already been transmitted from one stage to the next (Rolls et al., 1999; Vogels et al., 1995; Keysers et al., 2001)
  - Reading out from IT (~10-20 ms):
    - both object category and identity
    - largely translation and scale invariant (Hung, Kreiman, Poggio, DiCarlo, 2005)
- So what happened after the first 50 ms?

Speculation!!
- Our model is purely feedforward
[Figure: processing stages V1/V2, V4, IT, PFC; timing labels 0-10 ms, >40 ms, >40 ms, >40 ms; annotations "Only local feedback loops" and "No feedback loops". Timing estimates are for monkeys, based on (Thorpe & Fabre-Thorpe, 2001) and (Thorpe, personal communication)]
- Feedback loops may already play a role for SOAs longer than 50 ms
- Discrepancy for longer SOAs may be due to the cortical back-projections

Summary
- I have described a model that is faithful to the anatomy and physiology of the ventral stream of
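
The evaluation protocol from the "Training and testing the model" and "Detailed comparison" slides (random half/half splits, repeated 20 times, with per-image counts of "animal" responses) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the data are synthetic, a toy nearest-mean classifier on a single feature stands in for the model's feature hierarchy and linear classifier, and the split is stratified by class so that both classes always appear in the training half.

```python
import random
from statistics import mean


def fit_nearest_mean(xs, ys):
    """Toy stand-in for the model's classifier: one mean per class (1-D feature)."""
    animal = [x for x, y in zip(xs, ys) if y == 1]
    other = [x for x, y in zip(xs, ys) if y == 0]
    return mean(animal), mean(other)


def predict_nearest_mean(params, x):
    """Return 1 ("animal") if x is closer to the animal mean, else 0."""
    mu_animal, mu_other = params
    return 1 if abs(x - mu_animal) < abs(x - mu_other) else 0


def random_split_eval(features, labels, n_runs=20, seed=0):
    """Repeat a random half/half split n_runs times.

    Returns (mean test accuracy, per-image "animal" votes, per-image test counts).
    The split is stratified by class (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n = len(features)
    by_class = {c: [i for i in range(n) if labels[i] == c] for c in (0, 1)}
    accuracies = []
    animal_votes = [0] * n   # times image i was called "animal" while in a test set
    test_counts = [0] * n    # times image i appeared in a test set
    for _ in range(n_runs):
        train, test = [], []
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            train += shuffled[:half]
            test += shuffled[half:]
        params = fit_nearest_mean([features[i] for i in train],
                                  [labels[i] for i in train])
        correct = 0
        for i in test:
            pred = predict_nearest_mean(params, features[i])
            correct += int(pred == labels[i])
            animal_votes[i] += pred
            test_counts[i] += 1
        accuracies.append(correct / len(test))
    return mean(accuracies), animal_votes, test_counts


# Synthetic stand-in data: 600 "animals" and 600 "distractors", one feature each.
data_rng = random.Random(1)
labels = [1] * 600 + [0] * 600
features = [data_rng.gauss(1.0 if y == 1 else -1.0, 1.0) for y in labels]

accuracy, animal_votes, test_counts = random_split_eval(features, labels)
print(f"mean test accuracy over 20 runs: {accuracy:.3f}")
```

For an image i that appeared in at least one test set, animal_votes[i] / test_counts[i] is the model-side quantity that would be compared with the fraction of human subjects who reported an animal for that image. The correlation itself is left out here because the slides do not say which correlation coefficient ρ denotes.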


MIT 9 459 - Lecture Notes
