Early Integration of Vision and Manipulation

Giorgio Metta
LIRA-Lab, DIST, University of Genova
Genova, Italy
[email protected]

Paul Fitzpatrick
Artificial Intelligence Lab, Massachusetts Institute of Technology
Cambridge, MA, USA
[email protected]

25, 2002

Address correspondence to Giorgio Metta, LIRA-Lab, DIST – University of Genova, Viale Causa, 13 – I-16145, Genova, Italy (phone: +39 0103532791, fax: +39 0103532948).

Abstract

Vision and manipulation are inextricably intertwined in the primate brain. Tantalizing results from neuroscience are shedding light on the mixed motor and sensory representations used by the brain during reaching, grasping, and object recognition. We now know a great deal about what happens in the brain during these activities, but not necessarily why. Is the integration we see functionally important, or just a reflection of evolution's lack of enthusiasm for sharp modularity? We wish to instantiate these results in robotic form to probe their technical advantages and to find any lacunae in existing models. We believe it would be missing the point to investigate this on a platform where dextrous manipulation and sophisticated machine vision are already implemented in their mature form, and instead follow a developmental approach from simpler primitives. We begin with a precursor to manipulation, simple poking and prodding, and show how it facilitates object segmentation, a long-standing problem in machine vision. The robot can familiarize itself with the objects in its environment by acting upon them. It can then recognize other actors (such as humans) in the environment through their effect on the objects it has learned about. We argue that following causal chains of events out from the robot's body into the environment allows for a very natural developmental progression of visual competence, and relate this idea to results in neuroscience.

Keywords: humanoid robotics, active segmentation, epigenesis
Running title: Vision and Manipulation

1 Vision, action, and development

Robots and animals are actors in their environment, not simply passive observers. They have the opportunity to examine the world using causality, by performing probing actions and learning from the response. Tracing chains of causality from motor action to perception (and back again) is important both for understanding how the brain deals with sensorimotor coordination and for implementing those same functions in an artificial system, such as a humanoid robot. In this paper, we propose that such causal probing can be arranged in a developmental sequence leading to a manipulation-driven representation of objects. We present results for many important steps along the way, describe how they fit into a larger-scale implementation, and discuss in what sense our artificial implementation is substantially in agreement with neuroscience.

Table 1 shows three levels of causal complexity that we address in the paper. The simplest causal chain that an actor – whether robotic or biological – may experience is the perception of its own actions. The temporal aspect is immediate: visual information is tightly synchronized to motor commands. Once this causal connection is established, we can go further and use it to actively explore the boundaries of objects. In this case, there is one more step in the causal chain, and the temporal nature of the response may be delayed, since initiating a reaching movement does not immediately elicit consequences in the environment. Finally, we argue that extending this causal chain further will allow the actor to make a connection between its own actions and the actions of another.
This is reminiscent of what has been observed in the response of the monkey's premotor cortex.

[Figure 1 image not reproduced; left-panel labels include "a cross" and "a binary cross?"]

Figure 1: On the left are three examples of crosses, following (Manzotti and Tagliasco, 2001). The human ability to segment objects is not general-purpose, and improves with experience. On the right is an image of a cube on a table, illustrating the ambiguities that plague machine vision. The edges of the table and cube happen to be aligned (dashed line), the colors of the cube and table are not well separated, and the cube has a potentially confusing surface pattern.

We wished to keep the actions implemented on our robotic system as simple as possible, to avoid obscuring the core issue of development behind an elaborately engineered dextrous system. We found that simple poking gestures (prodding, tapping, swiping, batting, etc.) were rich enough to evoke object affordances such as rolling and to provide the kind of training data needed to bootstrap perception.

type of activity                     nature of causation                                  time profile
sensorimotor coordination            direct causal chain                                  strict synchrony
object probing                       one level of indirection                             fast onset upon contact, potential for delayed effects
constructing mirror representation   complex causation involving multiple causal chains   arbitrarily delayed onset and effects

Table 1: The three levels of causal complexity addressed in the paper.
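The first level in Table 1 – sensorimotor coordination under strict synchrony – can be sketched computationally: if the visual motion signal correlates strongly with the robot's own motor commands at a short, consistent lag, the motion being seen is probably self-generated. The following is an illustrative sketch only; the function name, the correlation threshold, and the one-dimensional signal model are assumptions for exposition, not details of the system described here.

```python
import numpy as np

def detect_self_motion(motor, motion, max_lag=5, threshold=0.8):
    """Decide whether a visual motion signal is self-generated by checking
    for strong correlation with the motor command signal at a short lag.

    motor, motion: 1-D arrays sampled at the same rate.
    Returns (is_self, best_lag).
    """
    motor = np.asarray(motor, dtype=float)
    motion = np.asarray(motion, dtype=float)
    motor -= motor.mean()
    motion -= motion.mean()
    denom = np.linalg.norm(motor) * np.linalg.norm(motion)
    if denom == 0.0:
        return False, 0  # a constant signal carries no evidence either way
    best_corr, best_lag = 0.0, 0
    for lag in range(max_lag + 1):
        # hypothesis: the visual response trails the motor command by `lag` samples
        c = np.dot(motor[:len(motor) - lag], motion[lag:]) / denom
        if c > best_corr:
            best_corr, best_lag = c, lag
    return best_corr >= threshold, best_lag
```

In a real system `motor` might be a joint velocity command and `motion` an optic-flow energy signal; the point is only that strict temporal synchrony makes this first level of causal discovery nearly trivial, which is why it can plausibly come first in a developmental sequence.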
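The second level of Table 1 is the basis of active segmentation: the object is, to a first approximation, whatever region of the image changes when the arm makes contact. As a deliberately minimal sketch of that idea – simple frame differencing around the moment of impact – consider the following; every name and threshold is an assumption for illustration, and the actual system is necessarily more robust than this.

```python
import numpy as np

def segment_by_poking(before, after, threshold=10.0):
    """Boolean mask of pixels that changed between the frame just before
    contact and the frame just after: a crude stand-in for the motion
    evidence that a poke generates."""
    diff = np.abs(after.astype(float) - before.astype(float))
    return diff > threshold

def changed_bounding_box(mask):
    """Bounding box (ymin, ymax, xmin, xmax) of the changed region,
    or None if nothing moved."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max())
```

Note that the difference mask covers both the region the object vacated and the region it now occupies; a practical implementation must disambiguate the two, for instance by tracking motion direction at the instant of contact, which is one reason acting on objects yields richer segmentation evidence than passive observation.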