I-POMDP: An Infomax Model of Eye Movement

Nicholas J. Butko, Javier R. Movellan
Department of Cognitive Science, University of California San Diego, La Jolla, CA 92093-0515. Email: nbutko@cogsci.ucsd.edu
Institute for Neural Computation, La Jolla, CA 92093-0445. Email: movellan@mplab.ucsd.edu

Abstract: Modeling eye movements during search is important for building intelligent robotic vision systems and for understanding how humans select relevant information and structure behavior in real time. Previous models of visual search (VS) rely on the idea of saliency maps, which indicate likely locations for targets of interest. In these models the eyes move to locations with maximum saliency. This approach has several drawbacks: (1) It assumes that oculomotor control is a greedy process, i.e., every eye movement is planned as if no further eye movements would be possible after it. (2) It does not account for temporal dynamics and how information is integrated over time. To address these limitations we reformulate the problem of VS as an Information-gathering Partially Observable Markov Decision Process (I-POMDP). We find that, contrary to looking at the most likely target locations, as assumed by most current models, an optimal controller should avoid looking directly at the most likely targets. We also find that the optimal control law depends heavily on the Foveal-Peripheral Operating Characteristic (FPOC) of the visual system. This argues against previous "one size fits all" approaches that have tried to make robotic cameras that move their eyes like humans. We show how optimal visual search behavior for standard robotic cameras differs from optimal behavior of a human eye.

Fig. 1. (a) Example Saliency Map. (b) Robot Joint Attention. (a) The saliency map approach to robot eye movements models where humans would look in an image. Top: The original image is the input to a saliency algorithm. Bottom: Bright regions are found salient by the algorithm. (b) When robots move their eyes in a fashion similar to humans, compelling feelings of intelligence are created and joint attention is observed.

I. INTRODUCTION

Personal robots must juggle the varied demands of everyday life in an intelligent way. To do this effectively, they must make sense of the data inundating their sensors. It is important to know which data can be ignored and which data must be processed. Camera movements are a prototypical attentional control mechanism, selecting small regions of the visual world to attend to from moment to moment. Moreover, a social robot that moves her sensors in a purposive way will appear intelligent and lifelike to the humans interacting with her (Figure 1b).

Studying human eye movement highlights principles that may be important for robots. Past years have seen an explosion of publications about computational models of visual saliency, which are evaluated in terms of how well they describe where humans will tend to fixate within an image or video (Figure 1a). Recently these models have become fast enough to run on low-end computers faster than real time while maintaining competitive accuracy [1]. This gives a robot a way to automatically choose where to look and still have processing power left over for other tasks.

Models of visual salience can usually be categorized as descriptive or prescriptive. Descriptive models try to match human data directly, either by following psychological theories (e.g., [2], which models Feature Integration Theory [3]) or by directly fitting models to human data [4]. In contrast, prescriptive saliency models try to uncover the underlying computational objectives that organize oculomotor control. A popular choice is to frame the goal of eye movement as Visual Search (VS), i.e., finding targets within a visual array. These models postulate that an intelligent agent should fixate its visual sensors on regions of the array where the target is most likely to be, given visual features across the whole array. Thus saliency for each pixel $x$ in the visual array is related to the probability that it is rendered by a class of interest, $C_x = 1$. This can be framed mathematically as

\[
\mathrm{Salience}(x) = p(C_x = 1 \mid \mathrm{Image}) = \frac{p(\mathrm{Image} \mid C_x = 1)\, p(C_x = 1)}{p(\mathrm{Image})} \tag{1}
\]

Many salience algorithms (e.g., [5]-[8]) can be seen as special cases of this framework. See [1] for a more thorough review.

While the above framework is compelling and predicts well where, on average, humans will look in unconstrained tasks, it has important limitations:

1) The framework is atemporal, i.e., it gives no principled account for the order of saccades.
2) There is no mechanism for integrating information across fixations. After a robot saccades, how does she update her saliency map based on what she saw? In fact, saliency models always fixate the same maximally salient location unless Inhibition of Return (IOR) is added. In practice this means salience is subtracted around the currently fixated location. The use of IOR is an ad hoc procedure rather than an emergent property derived from a formal computational framework.
3) Existing salience models are not explicit in their assumptions about the Foveal-Peripheral Operating Characteristic (FPOC) and cannot account for how behavior should ideally differ across individuals with different FPOCs, like human infants, human adults, and robots.
4) In VS, is the best strategy really to attempt to foveate the object, or can we get more

$S$: A set of states that cannot be directly observed by the agent. $S_t = i \in \{1, \ldots, N\}$ corresponds to the event that the target is at location $i$ at time $t$.
$A$: A set of actions that the agent can take. $A_t = k \in \{1, \ldots, N\}$ corresponds to the event that the agent's center of fixation is at location $k$ at time $t$.
$O$: A set of observations that can be made by the agent. $O_t \in \mathbb{R}^N$ is a vector with elements $O_t^j$ that correspond to noisy sensor evidence at time $t$ about whether or not the target is at location $j$ in the visual array.
$p(S_{t+1} \mid S_t, A_t)$: Dynamics. How the state changes based on the agent's actions. The current task has a stationary target, so $p(S_{t+1} \mid S_t, A_t) = 1$ if $S_{t+1} = S_t$, and $0$ otherwise.
$p(O_t \mid S_t, A_t)$: Observation model. How states and actions combine to yield observations (Section II-A).

A critical concept in POMDPs is the Belief State vector $B_t \in [0, 1]^N$, in which the $i$th element $B_t^i$ represents the probability that the target is in visual array location $i$ given all of the agent's previous eye movements and observations, i.e., $B_t^i \stackrel{\mathrm{def}}{=} p(S_t = i \mid A_{1:t-1}, O_{1:t-1})$. A well-known result in the theory of POMDPs is that the belief state vector at time $t$ can be calculated based only on a single previous eye movement $A_{t-1}$, set of observations $O_{t-1}$, and belief vector $B_{t-1}$.
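As an illustration of the belief-state idea described in the text, the following is a minimal Python sketch of one recursive Bayes step for a stationary target: the new belief at each location is proportional to the old belief times the likelihood of the latest observation given that location and the fixation point. The observation model used here (independent Gaussian evidence whose noise grows linearly with distance from fixation) is an assumption made only for this example; it is a stand-in for an FPOC and is not the observation model of Section II-A. The names noise_std, log_likelihood, and update_belief and all parameter values are hypothetical.

```python
import math

def noise_std(j, k, base=0.5, slope=0.5):
    """ASSUMED stand-in for an FPOC: sensor noise at location j grows
    linearly with its distance from the fixation point k."""
    return base + slope * abs(j - k)

def log_likelihood(obs, i, k):
    """log p(O | S = i, A = k) under the ASSUMED model: each O^j is Gaussian
    with mean 1 if the target is at j (j == i) and mean 0 otherwise."""
    total = 0.0
    for j, o in enumerate(obs):
        mean = 1.0 if j == i else 0.0
        sd = noise_std(j, k)
        total += -0.5 * ((o - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))
    return total

def update_belief(belief, obs, k):
    """One Bayes step for a stationary target:
    new_belief[i] is proportional to p(obs | S = i, A = k) * belief[i]."""
    log_post = []
    for i, b in enumerate(belief):
        if b > 0.0:
            log_post.append(math.log(b) + log_likelihood(obs, i, k))
        else:
            # Locations already ruled out stay at probability zero.
            log_post.append(float("-inf"))
    m = max(log_post)  # subtract the max before exponentiating, for stability
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Usage with made-up numbers: 5 locations, uniform prior, fixation at index 2.
N = 5
belief = [1.0 / N] * N
obs = [0.1, -0.2, 0.9, 0.0, 0.3]
belief = update_belief(belief, obs, k=2)
```

Because the target does not move, no prediction step is needed between fixations; only the previous belief, the fixation, and the new observation enter the update. The update is computed in log space so that very small likelihoods do not underflow.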

