Early Integration of Vision and Manipulation

Giorgio Metta
LIRA-Lab, DIST, University of Genova
Genova, Italy
[email protected]

Paul Fitzpatrick
Artificial Intelligence Lab, Massachusetts Institute of Technology
Cambridge, MA, USA
[email protected]

25, 2002

Address correspondence to Giorgio Metta, LIRA-Lab, DIST – University of Genova, Viale Causa, 13 – I-16145, Genova, Italy (phone: +39 0103532791, fax: +39 0103532948).

Abstract

Vision and manipulation are inextricably intertwined in the primate brain. Tantalizing results from neuroscience are shedding light on the mixed motor and sensory representations used by the brain during reaching, grasping, and object recognition. We now know a great deal about what happens in the brain during these activities, but not necessarily why. Is the integration we see functionally important, or just a reflection of evolution's lack of enthusiasm for sharp modularity? We wish to instantiate these results in robotic form to probe their technical advantages and to find any lacunae in existing models. We believe it would be missing the point to investigate this on a platform where dextrous manipulation and sophisticated machine vision are already implemented in their mature form, and instead follow a developmental approach from simpler primitives. We begin with a precursor to manipulation, simple poking and prodding, and show how it facilitates object segmentation, a long-standing problem in machine vision. The robot can familiarize itself with the objects in its environment by acting upon them. It can then recognize other actors (such as humans) in the environment through their effect on the objects it has learned about. We argue that following causal chains of events out from the robot's body into the environment allows for a very natural developmental progression of visual competence, and relate this idea to results in neuroscience.

Keywords: humanoid robotics, active segmentation, epigenesis
Running title: Vision and Manipulation

1 Vision, action, and development

Robots and animals are actors in their environment, not simply passive observers. They have the opportunity to examine the world using causality, by performing probing actions and learning from the response. Tracing chains of causality from motor action to perception (and back again) is important both for understanding how the brain deals with sensorimotor coordination and for implementing those same functions in an artificial system, such as a humanoid robot. In this paper, we propose that such causal probing can be arranged in a developmental sequence leading to a manipulation-driven representation of objects. We present results for many important steps along the way, describe how they fit into a larger-scale implementation, and discuss in what sense our artificial implementation is substantially in agreement with neuroscience.

Table 1 shows three levels of causal complexity that we address in the paper. The simplest causal chain that an actor – whether robotic or biological – may experience is the perception of its own actions. The temporal aspect is immediate: visual information is tightly synchronized to motor commands. Once this causal connection is established, we can go further and use it to actively explore the boundaries of objects. In this case, there is one more step in the causal chain, and the temporal nature of the response may be delayed, since initiating a reaching movement does not immediately elicit consequences in the environment. Finally, we argue that extending this causal chain further will allow the actor to make a connection between its own actions and the actions of another.
This is reminiscent of what has been observed in the response of the monkey's premotor cortex.

[Figure 1 image not reproduced; left-panel labels include "a cross" and "a binary cross?"]

Figure 1: On the left are three examples of crosses, following (Manzotti and Tagliasco, 2001). The human ability to segment objects is not general-purpose, and improves with experience. On the right is an image of a cube on a table, illustrating the ambiguities that plague machine vision. The edges of the table and cube happen to be aligned (dashed line), the colors of the cube and table are not well separated, and the cube has a potentially confusing surface pattern.

We wished to keep the actions implemented on our robotic system as simple as possible, to avoid obscuring the core issue of development behind an elaborately engineered dextrous system. We found that simple poking gestures (prodding, tapping, swiping, batting, etc.) were rich enough to evoke object affordances such as rolling and to provide the kind of training data needed to bootstrap perception.

type of activity                     nature of causation                                  time profile
sensorimotor coordination            direct causal chain                                  strict synchrony
object probing                       one level of indirection                             fast onset upon contact, potential for delayed effects
constructing mirror representation   complex causation involving multiple causal chains   arbitrarily delayed onset and effects

Table 1: The three levels of causal complexity addressed in the paper.
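The first level in Table 1 – sensorimotor coordination under strict synchrony – can be sketched computationally: if the visual motion signal correlates strongly with the robot's own motor commands at a short, consistent lag, the motion being seen is probably self-generated. The following is an illustrative sketch only; the function name, the correlation threshold, and the one-dimensional signal model are assumptions for exposition, not details of the system described here.

```python
import numpy as np

def detect_self_motion(motor, motion, max_lag=5, threshold=0.8):
    """Decide whether a visual motion signal is self-generated by checking
    for strong correlation with the motor command signal at a short lag.

    motor, motion: 1-D arrays sampled at the same rate.
    Returns (is_self, best_lag).
    """
    motor = np.asarray(motor, dtype=float)
    motion = np.asarray(motion, dtype=float)
    motor -= motor.mean()
    motion -= motion.mean()
    denom = np.linalg.norm(motor) * np.linalg.norm(motion)
    if denom == 0.0:
        return False, 0  # a constant signal carries no evidence either way
    best_corr, best_lag = 0.0, 0
    for lag in range(max_lag + 1):
        # hypothesis: the visual response trails the motor command by `lag` samples
        c = np.dot(motor[:len(motor) - lag], motion[lag:]) / denom
        if c > best_corr:
            best_corr, best_lag = c, lag
    return best_corr >= threshold, best_lag
```

In a real system `motor` might be a joint velocity command and `motion` an optic-flow energy signal; the point is only that strict temporal synchrony makes this first level of causal discovery nearly trivial, which is why it can plausibly come first in a developmental sequence.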
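The second level of Table 1 is the basis of active segmentation: the object is, to a first approximation, whatever region of the image changes when the arm makes contact. As a deliberately minimal sketch of that idea – simple frame differencing around the moment of impact – consider the following; every name and threshold is an assumption for illustration, and the actual system is necessarily more robust than this.

```python
import numpy as np

def segment_by_poking(before, after, threshold=10.0):
    """Boolean mask of pixels that changed between the frame just before
    contact and the frame just after: a crude stand-in for the motion
    evidence that a poke generates."""
    diff = np.abs(after.astype(float) - before.astype(float))
    return diff > threshold

def changed_bounding_box(mask):
    """Bounding box (ymin, ymax, xmin, xmax) of the changed region,
    or None if nothing moved."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max())
```

Note that the difference mask covers both the region the object vacated and the region it now occupies; a practical implementation must disambiguate the two, for instance by tracking motion direction at the instant of contact, which is one reason acting on objects yields richer segmentation evidence than passive observation.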