Early Integration of Vision and Manipulation

Giorgio Metta
LIRA-Lab, DIST
University of Genova
Genova, Italy
[email protected]

Paul Fitzpatrick
Artificial Intelligence Lab
Massachusetts Institute of Technology
Cambridge, MA, USA
[email protected]

25, 2002

Address correspondence to Giorgio Metta, LIRA-Lab, DIST – University of Genova, Viale Causa, 13 – I-16145, Genova, Italy (phone: +39 0103532791, fax: +39 0103532948).

Abstract

Vision and manipulation are inextricably intertwined in the primate brain. Tantalizing results from neuroscience are shedding light on the mixed motor and sensory representations used by the brain during reaching, grasping, and object recognition. We now know a great deal about what happens in the brain during these activities, but not necessarily why. Is the integration we see functionally important, or just a reflection of evolution's lack of enthusiasm for sharp modularity? We wish to instantiate these results in robotic form to probe their technical advantages and to find any lacunae in existing models. We believe it would be missing the point to investigate this on a platform where dextrous manipulation and sophisticated machine vision are already implemented in their mature form, and instead follow a developmental approach from simpler primitives.

We begin with a precursor to manipulation, simple poking and prodding, and show how it facilitates object segmentation, a long-standing problem in machine vision. The robot can familiarize itself with the objects in its environment by acting upon them. It can then recognize other actors (such as humans) in the environment through their effect on the objects it has learned about. We argue that following causal chains of events out from the robot's body into the environment allows for a very natural developmental progression of visual competence, and relate this idea to results in neuroscience.

keywords: humanoid robotics, active segmentation, epigenesis
running title: Vision and Manipulation

1 Vision, action, and development
Robots and animals are actors in their environment, not simply passive observers. They have the opportunity to examine the world using causality, by performing probing actions and learning from the response. Tracing chains of causality from motor action to perception (and back again) is important both to understand how the brain deals with sensorimotor coordination and to implement those same functions in an artificial system, such as a humanoid robot. In this paper, we propose that such causal probing can be arranged in a developmental sequence leading to a manipulation-driven representation of objects. We present results for many important steps along the way, describe how they fit into a larger-scale implementation, and discuss in what sense our artificial implementation is substantially in agreement with neuroscience.

Table 1 shows three levels of causal complexity that we address in the paper. The simplest causal chain that an actor – whether robotic or biological – may experience is the perception of its own actions. The temporal aspect is immediate: visual information is tightly synchronized to motor commands. Once this causal connection is established, we can go further and use it to actively explore the boundaries of objects. In this case, there is one more step in the causal chain, and the temporal nature of the response may be delayed, since initiating a reaching movement doesn't immediately elicit consequences in the environment. Finally, we argue that extending this causal chain further will allow the actor to make a connection between its own actions and the actions of another. This is reminiscent of what has been observed in the response of the monkey's premotor cortex.

type of activity                    | nature of causation                                 | time profile
sensorimotor coordination           | direct causal chain                                 | strict synchrony
object probing                      | one level of indirection                            | fast onset upon contact, potential for delayed effects
constructing mirror representation  | complex causation involving multiple causal chains  | arbitrarily delayed onset and effects

Table 1: The three levels of causal complexity addressed in the paper.

[Figure 1 appears here: three example crosses, one rendered in binary digits (panels labeled "a cross" and "a binary cross?"), alongside a photograph of a cube on a table.]
Figure 1: On the left are three examples of crosses, following (Manzotti and Tagliasco, 2001). The human ability to segment objects is not general-purpose, and improves with experience. On the right is an image of a cube on a table, illustrating the ambiguities that plague machine vision. The edges of the table and cube happen to be aligned (dashed line), the colors of the cube and table are not well separated, and the cube has a potentially confusing surface pattern.

We wished to keep the actions implemented on our robotic system as simple as possible, to avoid obscuring the core issue of development behind an elaborately engineered dextrous system. We found that simple poking gestures (prodding, tapping, swiping, batting, etc.) were rich enough to evoke object affordances such as rolling and to provide the kind of training data needed to bootstrap perception.
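To make the poke-driven segmentation idea concrete, the following minimal Python/OpenCV sketch turns the motion caused by a poke into an object mask. This is an illustration under our own assumptions (simple frame differencing between frames bracketing the moment of contact), not the implementation that ran on the robot; the function name, threshold, and morphology settings are hypothetical.

    import cv2
    import numpy as np

    def segment_poked_object(frame_before, frame_after, motion_thresh=25):
        # Segment the region that moved between two grayscale frames
        # bracketing a poke: one captured just before contact, one just
        # after the object starts to move. (Hypothetical sketch; the
        # threshold and kernel size are arbitrary choices.)
        diff = cv2.absdiff(frame_before, frame_after)
        _, mask = cv2.threshold(diff, motion_thresh, 255, cv2.THRESH_BINARY)

        # Suppress sensor noise and fill small holes in the motion evidence.
        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

        # Keep the largest connected moving region, taken to be the poked
        # object; the moving arm would have to be masked out separately.
        num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        if num <= 1:
            return np.zeros_like(mask)
        largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
        return (labels == largest).astype(np.uint8) * 255

In practice the arm itself is also moving at the moment of contact, so a real system needs a way to exclude it, for instance via the synchrony cue sketched next.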

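The first row of Table 1, strict synchrony between motor commands and visual feedback, can be given a similarly hedged toy form: image regions whose motion energy correlates with the motor signal at near-zero lag are attributed to the robot's own body. The names and the correlation recipe below are our assumptions, not the paper's implementation.

    import numpy as np

    def self_attribution_scores(motor_active, motion_energy, max_lag=5):
        # motor_active:  (T,) array, 1.0 while a motor command is executing.
        # motion_energy: (T, R) array of visual motion energy in R regions.
        # Returns one score per region: the best normalized correlation
        # with the motor signal over short lags. High, near-synchronous
        # correlation marks a region as part of the robot's own body.
        m = (motor_active - motor_active.mean()) / (motor_active.std() + 1e-8)
        T, R = motion_energy.shape
        scores = np.zeros(R)
        for r in range(R):
            v = motion_energy[:, r]
            v = (v - v.mean()) / (v.std() + 1e-8)
            # Self-generated visual motion follows the motor command almost
            # immediately, so only small lags count as strict synchrony.
            scores[r] = max(np.mean(m[:T - lag] * v[lag:])
                            for lag in range(max_lag + 1))
        return scores

Thresholding these scores would tag the synchronously moving regions (the arm) as "self", giving the robot the first causal link in Table 1 and, incidentally, a way to exclude its own arm from the segmentation sketch above.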
