I-POMDP: An Infomax Model of Eye Movement

Nicholas J. Butko
Department of Cognitive Science
University of California, San Diego
La Jolla, CA 92093-0515
Email: [email protected]

Javier R. Movellan
Institute for Neural Computation
La Jolla, CA 92093-0445
Email: [email protected]

Abstract—Modeling eye movements during search is important for building intelligent robotic vision systems, and for understanding how humans select relevant information and structure behavior in real time. Previous models of visual search (VS) rely on the idea of "saliency maps," which indicate likely locations for targets of interest. In these models the eyes move to locations with maximum saliency. This approach has several drawbacks: (1) It assumes that oculomotor control is a greedy process, i.e., every eye movement is planned as if no further eye movements would be possible after it. (2) It does not account for temporal dynamics and how information is integrated over time. To address these limitations, we reformulate the problem of VS as an Information-gathering Partially Observable Markov Decision Process (I-POMDP).

We find that, contrary to looking at the most likely target locations, as assumed by most current models, an optimal controller should avoid looking directly at the most likely targets. We also find that the optimal control law depends heavily on the Foveal-Peripheral Operating Characteristic (FPOC) of the visual system. This argues against previous "one-size-fits-all" approaches that have tried to make robotic cameras move their eyes like humans. We show how optimal visual search behavior for standard robotic cameras differs from optimal behavior of a human eye.

I. INTRODUCTION

Personal robots must juggle the varied demands of everyday life in an intelligent way. To do this effectively, they must make sense of the data inundating their sensors. It is important to know which data can be ignored and which data must be processed. Camera movements are a prototypical attentional control mechanism, selecting small regions of the visual world to attend to moment-to-moment. Moreover, a social robot that moves her sensors in a purposive way will appear intelligent and lifelike to the humans interacting with her (Figure 1b).

Studying human eye movements highlights principles that may be important for robots. Recent years have seen an explosion of publications on computational models of visual saliency, which are evaluated in terms of how well they describe where humans will tend to fixate within an image or video (Figure 1a). Recently, these models have become fast enough to run faster than real time on low-end computers while maintaining competitive accuracy [1]. This gives a robot a way to automatically choose where to look while still leaving processing power for other tasks.

Fig. 1: (a) Example Saliency Map. The saliency-map approach to robot eye movements models where humans would look in an image. Top: The original image is the input to a saliency algorithm. Bottom: Bright regions are found salient by the algorithm. (b) Robot Joint Attention. When robots move their eyes in a fashion similar to humans, compelling feelings of intelligence are created, and joint attention is observed.

Models of visual salience can usually be categorized as descriptive or prescriptive. Descriptive models try to match human data directly, either by following psychological theories (e.g., [2], which models Feature-Integration-Theory [3]) or by directly fitting models to human data [4].

In contrast, prescriptive saliency models try to uncover the underlying computational objectives that organize oculomotor control. A popular choice is to frame the goal of eye movement as Visual Search (VS), i.e., finding targets within a visual array. These models postulate that an intelligent agent should fixate its visual sensors on regions of the array where the target is most likely to be, given the visual features across the whole array. Thus the saliency of each pixel x in the visual array is related to the probability that it was rendered by a class of interest, C_x = 1. This can be framed mathematically as

    \mathrm{Salience}(x) = p(C_x = 1 \mid \mathrm{Image}) = \frac{p(\mathrm{Image} \mid C_x = 1)\, p(C_x = 1)}{p(\mathrm{Image})}    (1)

Many salience algorithms, e.g., [5]–[8], can be seen as special cases of this framework. See [1] for a more thorough review.
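As a concrete illustration of Eq. (1), the following minimal sketch computes a posterior salience map from per-pixel feature likelihoods. The likelihood arrays and the uniform prior are illustrative assumptions, not the model of any particular algorithm in [5]–[8]; those methods differ precisely in how they estimate these terms.

```python
import numpy as np

def salience_map(lik_target, lik_background, prior_target=0.01):
    """Posterior salience per Eq. (1): p(C_x = 1 | Image) at each pixel.

    lik_target[x]     stands in for p(Image | C_x = 1)  (assumed given)
    lik_background[x] stands in for p(Image | C_x = 0)  (assumed given)
    prior_target      stands in for p(C_x = 1), assumed uniform here.
    """
    joint_target = lik_target * prior_target
    joint_background = lik_background * (1.0 - prior_target)
    # p(Image) marginalizes over the two class hypotheses at each pixel.
    evidence = joint_target + joint_background
    return joint_target / evidence

# Toy usage with random stand-in likelihoods on a 64x64 array.
rng = np.random.default_rng(0)
s = salience_map(rng.random((64, 64)), rng.random((64, 64)))
print(s.shape, float(s.min()), float(s.max()))  # posteriors lie in (0, 1)
```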
While the above framework is compelling and predicts well where, on average, humans will look in unconstrained tasks, it has important limitations: (1) The framework is atemporal, i.e., it gives no principled account of the order of saccades. (2) There is no mechanism for integrating information across fixations. After a robot saccades, how does she update her saliency map based on what she saw? In fact, saliency models always fixate the same maximally salient location unless Inhibition of Return (IOR) is added; in practice, this means salience is subtracted around the currently fixated location (sketched below). The use of IOR is an ad hoc procedure rather than an emergent property derived from a formal computational framework. (3) Existing salience models are not explicit in their assumptions about the Foveal-Peripheral Operating Characteristic (FPOC), and cannot account for how behavior should ideally differ across individuals with different FPOCs, such as human infants, human adults, and robots. (4) In VS, is the best strategy really to attempt to foveate the object, or can we get more information, in the long run, by leaving it in the periphery? Saliency models postulate a reasonable control strategy (direct foveation of likely targets) without testing for its optimality.
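To make limitation (2) concrete, a typical IOR patch looks like the sketch below: after each fixation at the salience argmax, salience is suppressed in a neighborhood of the fixated location so the next argmax lands elsewhere. The Gaussian window, its width, and the suppression strength are illustrative assumptions; they are exactly the kind of free, ad hoc parameters the framework leaves unprincipled.

```python
import numpy as np

def fixate_with_ior(salience, n_fixations=5, sigma=5.0, strength=1.0):
    """Greedy fixation sequence with ad hoc Inhibition of Return.

    After fixating the current argmax, subtract a Gaussian bump
    (width sigma, height strength -- arbitrary choices) around the
    fixated location so the next fixation moves on.
    """
    s = salience.astype(float)
    h, w = s.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((int(y), int(x)))
        bump = strength * np.exp(-((ys - y) ** 2 + (xs - x) ** 2)
                                 / (2.0 * sigma ** 2))
        s -= bump  # nothing principled fixes sigma or strength
    return fixations
```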
To address these concerns, we frame oculomotor control as a problem in stochastic optimal control. We start with a psychophysical model of visual perception proposed by Najemnik & Geisler [9] and reformulate it as an Information-gathering Partially Observable Markov Decision Process (I-POMDP). We design an I-POMDP with parameters fit to human data, as well as one designed to model robotic vision. We find that the optimal control laws for both make an effort to avoid looking at locations where targets are likely to be, indicating that commonly accepted postulates about visual search can be improved upon. We also find that optimal looking behavior changes with the characteristics of the imaging system, in particular the relative resolution of the foveal and peripheral regions. This argues against the "one-size-fits-all" approach taken previously, which assumed that robotic eye-movement strategies should be the same as those in humans.
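The model is developed in Section II; as rough orientation only, the core loop of an information-gathering search controller can be sketched as follows: maintain a belief over target locations, update it by Bayes' rule after each fixation-dependent noisy observation, and choose the next fixation to maximize expected information gain rather than current posterior probability. Everything below is an illustrative assumption (discrete locations, binary detections, a crude two-level FPOC), not the parameterization of [9], and it shows only the one-step greedy-infomax case; the I-POMDP framing additionally permits planning over sequences of fixations.

```python
import numpy as np

def bayes_update(belief, fixation, obs, p_obs):
    """belief[i] ~ p(target at i | history); one Bayes-rule update."""
    lik = np.array([p_obs(obs, i, fixation) for i in range(len(belief))])
    posterior = belief * lik
    return posterior / posterior.sum()

def entropy(b):
    b = b[b > 0]
    return float(-(b * np.log(b)).sum())

def infomax_fixation(belief, p_obs, obs_values):
    """Pick the fixation maximizing expected entropy reduction --
    not the belief argmax, per the paper's central finding."""
    n, best_a, best_gain = len(belief), 0, -np.inf
    for a in range(n):
        expected_h = 0.0
        for o in obs_values:
            # Predictive probability of observing o after fixating a.
            p_o = sum(belief[i] * p_obs(o, i, a) for i in range(n))
            if p_o > 0:
                expected_h += p_o * entropy(bayes_update(belief, a, o, p_obs))
        gain = entropy(belief) - expected_h
        if gain > best_gain:
            best_a, best_gain = a, gain
    return best_a

# Toy observation model: detection accuracy falls off away from the
# fovea (a crude two-level stand-in for an FPOC; numbers are arbitrary).
def p_obs(o, i, a):
    p_detect = 0.9 if i == a else 0.2
    return p_detect if o == 1 else 1.0 - p_detect

belief = np.ones(10) / 10  # uniform prior over 10 candidate locations
print(infomax_fixation(belief, p_obs, obs_values=(0, 1)))
```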
II. VISUAL SEARCH MODEL

Najemnik &