A bottom–up model of spatial attention predicts human error patterns in rapid scene recognition

Wolfgang Einhäuser, Division of Biology, California Institute of Technology, Pasadena, CA, USA
T. Nathan Mundhenk, Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Pierre Baldi, School of Information and Computer Sciences, University of California in Irvine, Irvine, CA, USA
Christof Koch, Division of Biology, California Institute of Technology, Pasadena, CA, USA
Laurent Itti, Department of Computer Science, University of Southern California, Los Angeles, CA, USA

Humans demonstrate a peculiar ability to detect complex targets in rapidly presented natural scenes. Recent studies suggest that (nearly) no focal attention is required for overall performance in such tasks. Little is known, however, of how detection performance varies from trial to trial and which stages in the processing hierarchy limit performance: bottom–up visual processing (attentional selection and/or recognition) or top–down factors (e.g., decision-making, memory, or alertness fluctuations)? To investigate the relative contribution of these factors, eight human observers performed an animal detection task in natural scenes presented at 20 Hz. Trial-by-trial performance was highly consistent across observers, far exceeding the prediction of independent errors. This consistency demonstrates that performance is not primarily limited by idiosyncratic factors but by visual processing. Two statistical stimulus properties, contrast variation in the target image and the information-theoretical measure of “surprise” in adjacent images, predict performance on a trial-by-trial basis. These measures are tightly related to spatial attention, demonstrating that spatial attention and rapid target detection share common mechanisms.
To isolate the causal contribution of the surprise measure, eight additional observers performed the animal detection task in sequences that were reordered versions of those all subjects had correctly recognized in the first experiment. Reordering increased surprise before and/or after the target while keeping the target and distractors themselves unchanged. Surprise enhancement impaired target detection in all observers. Consequently, and contrary to several previously published findings, our results demonstrate that attentional limitations, rather than target recognition alone, affect the detection of targets in rapidly presented visual sequences.

Keywords: psychophysics, modeling, attention, saliency, RSVP

Citation: Einhäuser, W., Mundhenk, T. N., Baldi, P., Koch, C., & Itti, L. (2007). A bottom–up model of spatial attention predicts human error patterns in rapid scene recognition. Journal of Vision, 7(10):6, 1–13, http://journalofvision.org/7/10/6/, doi:10.1167/7.10.6.

Introduction

Humans and other primates grasp the “gist” of a complex natural scene even when presented for only a few tens of milliseconds (Biederman, 1981; Evans & Treisman, 2005; Fabre-Thorpe, Richard, & Thorpe, 1998; Li, VanRullen, Koch, & Perona, 2002; Potter & Levy, 1969; Rousselet, Fabre-Thorpe, & Thorpe, 2002; Thorpe, Fize, & Marlot, 1996; VanRullen & Thorpe, 2001). Furthermore, observers can detect with above-chance performance complex target items (such as an animal) in rapidly presented image sequences (rapid serial visual presentation [RSVP]; Evans & Treisman, 2005; Potter & Levy, 1969). Such performance is typically seen as evidence for a rapid, sensory-driven (“bottom–up”) mode of processing, primarily driven by the visual stimulus. This leads to the hypothesis that properties of the stimulus, rather than observer-specific and possibly more idiosyncratic top–down processes, may, to a large extent, determine performance in RSVP.
If so, what are these statistical properties?

Journal of Vision (2007) 7(10):6, 1–13 http://journalofvision.org/7/10/6/ doi: 10.1167/7.10.6 Received December 3, 2006; published June 20, 2007 ISSN 1534-7362 © ARVO

It has been argued that rapid recognition requires little or no focal spatial attention (Li et al., 2002; Rousselet et al., 2002). According to this view, bottom–up attention does not constitute the primary limit for rapid visual processing, but rather, such a limit is found in a later target recognition stage. Indeed, some aspects of overall performance can be captured by models of object recognition; for example, animals that appear farther away are more difficult to detect on average (Serre, Oliva, & Poggio, 2006). However, these studies typically use isolated images followed by masking stimuli. Contrary to these results, when using a stream of images, some of which are targets and most of which act as distractors, one finds two attentional phenomena that limit rapid processing: When two identical items are presented in direct succession, often only one is detected (“repetition blindness”; Kanwisher, 1987), and when a second target item is presented shortly, but not immediately, after a first one, its processing is also impaired (“attentional blink”; Raymond, Shapiro, & Arnell, 1992). Although repetition blindness and attentional blink are distinct phenomena (Chun, 1997), models of such attentional impairments are typically variants of an attentional gating model, as first formalized by Reeves and Sperling (1986): In this view, a salient item (e.g., a target) opens an “attentional gate” for its own and subsequent items’ access to visual short-term memory. Failure to quickly reopen the gate impairs the detection of the second target in attentional blink; furthermore, integration of information according to order and strength within an open gate epoch leads to the loss of order information, a potential cause for repetition blindness.
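
The gating account described above can be made concrete with a toy simulation. The sketch below is only an illustration under simplifying assumptions and is not Reeves and Sperling's (1986) formulation: the function names and the parameters open_len and refractory_len are hypothetical, the gate is treated as all-or-none, and only targets are assumed salient enough to open it.

```python
from typing import List


def gated_items(is_target: List[bool],
                open_len: int = 2,
                refractory_len: int = 3) -> List[bool]:
    """Return, for each item in the stream, whether it passed the gate.

    open_len: number of items admitted once the gate opens (hypothetical).
    refractory_len: number of items after closing during which the gate
        cannot reopen (hypothetical).
    """
    admitted = []
    open_left = 0        # items the open gate will still admit
    refractory_left = 0  # items remaining before the gate can reopen
    for target in is_target:
        # A salient item (here: a target) opens the gate if it is ready.
        if target and open_left == 0 and refractory_left == 0:
            open_left = open_len
        if open_left > 0:
            # The gate admits the current item, target or distractor alike.
            admitted.append(True)
            open_left -= 1
            if open_left == 0:
                refractory_left = refractory_len
        else:
            admitted.append(False)
            if refractory_left > 0:
                refractory_left -= 1
    return admitted


def second_target_detected(lag: int, **gate_params: int) -> bool:
    """Place a second target `lag` items after the first; report its fate."""
    first = 3
    stream = [False] * 14
    stream[first] = True
    stream[first + lag] = True
    return gated_items(stream, **gate_params)[first + lag]


if __name__ == "__main__":
    for lag in range(1, 7):
        print(lag, second_target_detected(lag))
```

With the default (assumed) parameters, a second target at lag 1 rides through the still-open gate, targets at lags 2 to 4 fall into the refractory window and are missed, and detection recovers from lag 5 onward. This mirrors the "shortly, but not immediately" profile of the attentional blink only qualitatively.
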
In attentional blink, the saliency of an item to open a gate arises from its property of being a target or semantically related to the target (Barnard, Scott, Taylor, May, & Knightley, 2004). Items that attract attention because of their emotional content can also lead to an attentional-blink-like recognition impairment, which some, but not all, observers can overcome through volitional control (Most, Chun, Widders, & Zald, 2005). Similarly, odd items (e.g., the rare occurrence of a face in a letter task or vice versa) can impair subsequent processing (Marois, Todd, & Gilbert, 2003), as can items that are visually similar to the target but appear at peripheral locations (Folk, Leber, & Egeth, 2002). However, very little is known quantitatively of the neural mechanisms by which

