From zero to gist in 200 msec: The time course of scene recognition
Aude Oliva & Michelle Greene
Brain and Cognitive Sciences, MIT
SUnS 06

A summary of the gist
• Semantic categories (~20-50 msec; Potter, 1975; Schyns & Oliva, 1994; Thorpe et al., 1997; Rousselet et al., 2005; Greene & Oliva, 2005; Fei Fei et al., 2004; Renniger & Malik, 2002; Castelhano, 2005).
• A few objects (~50 to 150 msec; Potter et al., 2002, 2004; Intraub, 1997; Grill-Spector & Kanwisher, 2004; Fei Fei et al., 2004; Greene & Oliva, in prep; Gordon, 2004; Wolfe, 1998).
• Spatial layout properties (~20-30 msec; mean depth, Torralba & Oliva, 2002; openness, Greene & Oliva, 2005).
• Surface properties (e.g. color distribution, Oliva & Schyns, 2000; Goffaux et al., 2005; temperature, Greene & Oliva, sub.).
• High level semantic properties (30-50 msec; emotional valence, Maljkovic & Martini, 2005; events, Potter, 1975, 2002).
The gist of a scene corresponds to a verbal description of all levels of information (Molly Potter).

Global to Local Scene Representation
Seeing the forest before the trees (Navon, 1977), but the trees compose the forest …

Scene-Centered Representation: Global Properties to Scene Category
Seeing that {enclosed + textured + camouflaged + expansive space} compose the forest …
Greene & Oliva (submitted), Oliva & Torralba (2001)

Scene-Centered Representation
[Diagram labels: Global Properties, Scene Category]
• Enclosed space
• High roughness
• Medium size volume
• High degree of expansion
• High degree of navigability
• Bilateral symmetry
• …
This would explain how we see the “forest” before the “trees”.
Oliva & Torralba (2001), Greene & Oliva (submitted)

Scene-Centered Representation
1) What is the vocabulary of useful global properties? (properties describing the spatial layout and function of the scene)
2) When are the global properties perceived during the course of a glance?
3) What is the relation between global properties and scene category?
Forest

Vocabulary of Global Properties
As a scene is inherently a 3D entity, Oliva & Torralba (2001) proposed that scene recognition could be based on properties diagnostic of the space that the scene subtends.
What are the global properties common to all these streets?
Degree of clutter, openness, perspective, roughness

Vocabulary of Global Properties
Description of the “gist” of the scene
[Figure labels: Degree of Navigation, Degree of Camouflage]
• Spatial layout properties (e.g. openness, expansion, roughness, mean depth; Spatial Envelope Properties, Oliva & Torralba, 2001, 2002)
• Functional properties (e.g. potentiality for navigation, camouflage; Greene & Oliva, 2005, submitted)
• Surface-based properties (e.g. color distribution; texture and material properties)

Scene-Centered Representation
1) What is the vocabulary of diagnostic global properties?
2) When are the global properties perceived during the course of a glance?
3) What is the causal relation between global properties and scene category?
Forest

Time course of global properties
Method: What is the presentation time permitting a 75% correct detection?
(Task: yes-no forced choice: Is the scene open? Is the scene a forest?)
Greene & Oliva (in preparation)

Time course of global properties
Greene & Oliva (in preparation)

Scene-Centered Representation
1) What is the vocabulary of diagnostic global properties?
2) When are the global properties perceived during the course of a glance?
3) What is the relation between global properties and scene category?
Forest!

From Global properties to category
• Method: 10 observers ranked 200 natural scene images (from 8 semantic categories) along 7 global properties relevant for scene gist
• Spatial layout properties (mean depth, openness, expansion)
• Functional properties (degree of navigability, level of camouflage)
• Surface-based properties (degree of “movement”, temperature)
• Each image is represented by a vector of 7 global properties
Greene & Oliva (submitted)

From Global properties to category
Each semantic category can be described by its magnitude along each of the seven global properties.
Each semantic category has a specific “global property signature”.
Greene & Oliva (submitted)

Scene Categorization model
One can train a classifier to take only these 7 values as input and predict the correct semantic category of a novel scene (an ideal observer which takes the maximum likelihood category summed over all the global properties).
Comparisons are made with the correct categorizations given by human observers who saw each scene for only 30 msec.
{Medium/high temperature, Low camouflage, High expansiveness, Large depth, High navigability, High openness, High movement} ∈ “desert”
Greene & Oliva (submitted)

Comparison model – human in progress
• Task: detecting whether an image presented for 30 msec belongs to a particular semantic category (e.g. forest) among a distractor set that shares a particular global property.
[Figure labels: Target category; Closed scene distractors: 18% false alarms; Open scene distractors: 9% false alarms]
• False alarms from the classifier model are correlated with the false alarms made by human observers.
Greene & Oliva (in progress)

Conclusion
• A scene-centered representation based on global properties of a scene is a valid approach for scene gist identification: it provides both the “semantic category” of the image and a description of spatial layout and functional properties of the scene.
• It is not necessary to describe the regions and objects of a scene to recognize its semantic category.
• Global properties are indeed available for processing at the early stage of the glance (~20-30 msec after image
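
The scene categorization model described earlier (a classifier that takes only the 7 global property values and picks the maximum likelihood category, summed over the properties) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the Gaussian likelihood per property, the floor on the standard deviation, the property ordering, the 0-to-1 scale, and the toy training values are all assumptions made for the example.

```python
import math
from collections import defaultdict

# Property order is an assumption; the names follow the seven properties on the slides.
PROPERTIES = ["mean_depth", "openness", "expansion", "navigability",
              "camouflage", "movement", "temperature"]

# Hypothetical toy rankings on a 0-1 scale (not data from the study).
TOY_TRAINING = [
    ("desert", [0.9, 0.9, 0.8, 0.9, 0.1, 0.8, 0.7]),
    ("desert", [0.8, 1.0, 0.9, 0.8, 0.2, 0.7, 0.8]),
    ("forest", [0.3, 0.1, 0.6, 0.5, 0.8, 0.3, 0.4]),
    ("forest", [0.2, 0.2, 0.7, 0.4, 0.9, 0.2, 0.3]),
]


def fit(samples, min_std=0.05):
    """Estimate a per-category mean and standard deviation for each property."""
    by_category = defaultdict(list)
    for category, vector in samples:
        by_category[category].append(vector)
    model = {}
    for category, vectors in by_category.items():
        n = len(vectors)
        means = [sum(v[i] for v in vectors) / n for i in range(len(PROPERTIES))]
        stds = [
            # Floor the spread so a tiny sample cannot produce a zero variance.
            max(math.sqrt(sum((v[i] - means[i]) ** 2 for v in vectors) / n), min_std)
            for i in range(len(PROPERTIES))
        ]
        model[category] = (means, stds)
    return model


def log_likelihood(vector, means, stds):
    """Sum the Gaussian log-likelihoods over all global properties (assumed independent)."""
    return sum(
        -0.5 * ((x - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
        for x, m, s in zip(vector, means, stds)
    )


def classify(model, vector):
    """Return the category whose summed log-likelihood is largest."""
    return max(model, key=lambda c: log_likelihood(vector, *model[c]))


model = fit(TOY_TRAINING)
novel_scene = [0.85, 0.95, 0.85, 0.85, 0.15, 0.75, 0.75]
print(classify(model, novel_scene))
```

Summing log-likelihoods across properties treats the seven properties as independent given the category. That is a simplification chosen here for brevity, and the slides do not state which likelihood model was used.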