3D Computer Vision and Video Computing
Topic 5 of Part II: Visual Motion
CSc I6716, Spring 2008
Zhigang Zhu, City College of New York, [email protected]
Cover image/video credits: Rick Szeliski, MSR

Outline of Motion
- Problems and applications
  - The importance of visual motion
  - Problem statement
- The motion field of rigid motion
  - Basics: notation and equations
  - Three important special cases: translation, rotation, and a moving plane
  - Motion parallax
- Optical flow (next lecture)
  - The optical flow equation and the aperture problem
  - Estimating optical flow
  - 3D motion and structure from optical flow
- Feature-based approach
  - Two-frame algorithm
  - Multi-frame algorithm
  - Structure from motion: the factorization method
- Advanced topics
  - Spatio-temporal image and epipolar plane image
  - Video mosaicing and panorama generation
  - Motion-based segmentation and layered representation

The Importance of Visual Motion
- Structure from motion
  - Apparent motion is a strong visual cue for 3D reconstruction
  - More than a multi-camera stereo system
- Recognition by motion (only)
  - Biological visual systems use visual motion to infer properties of the 3D world with little a priori knowledge of it
  - Example: a blurred image sequence
- Visual motion = video!
- Applications [go to the CVPR 2004-2007 sites for workshops]:
  - Video coding and compression: MPEG-1, 2, 4, 7, ...
  - Video mosaicing and layered representation for image-based rendering (IBR)
  - Surveillance (human tracking and traffic monitoring)
  - HCI using human gesture (video camera)
  - Automated production of video instruction programs (VIP)
  - Video texture for image-based rendering
  - ...

Human Tracking
- W4: visual surveillance of human activity
- From: Prof. Larry Davis, University of Maryland, http://www.umiacs.umd.edu/users/lsd/vsam.html
- Tracking moving subjects in video from a stationary camera

Blurred Sequence
- An up-sampling from images of resolution 15x20 pixels
- From: James W. Davis, MIT Media Lab, http://vismod.www.media.mit.edu/~jdavis/MotionTemplates/motiontemplates.html
- Recognition by actions: recognize an object from its motion even if we cannot distinguish it in any single image

Video Mosaicing
- Stereo mosaics from a single video sequence
- From: Z. Zhu, E. M. Riseman, A. R. Hanson, Parallel-perspective stereo mosaics, The Eighth IEEE International Conference on Computer Vision, Vancouver, Canada, July 2001, vol. I, 345-352.
- http://www-cs.engr.ccny.cuny.edu/~zhu/StereoMosaic.html
- Video from a moving camera = multi-frame stereo with multiple cameras

Video in Classroom/Auditorium
- Demo: Bellcore Autoauditorium, a fully automatic, multi-camera system that produces videos without a crew
- http://www.autoauditorium.com/
- An application in e-learning: analyzing the motion of people as well as controlling the motion of the camera

Vision Based Interaction
- Microsoft Research: a vision-based interface, by Matthew Turk (demo)
- Motion and gesture as advanced human-computer interaction (HCI)

Video Texture
- Video textures are derived from video by using a finite-duration input clip to generate a smoothly playing infinite video
- From: Arno Schödl, Richard Szeliski, David H. Salesin, and Irfan Essa, Video textures, Proceedings of SIGGRAPH 2000, pages 489-498, July 2000, http://www.gvu.gatech.edu/perception/projects/videotexture/
- Image (video)-based rendering: realistic synthesis without "vision"

Problem Statement
- Two subproblems
  - Correspondence: which elements of a frame correspond to which elements of the next frame?
  - Reconstruction: given a number of correspondences, and possibly knowledge of the camera's intrinsic parameters, how to recover the 3D motion and structure of the observed world
- Main differences between motion and stereo
  - Correspondence: the disparities between consecutive frames are much smaller, due to dense temporal sampling
  - Reconstruction: the visual motion may be caused by multiple motions (instead of a single 3D rigid transformation)
- The third subproblem, and fourth ...
  - Motion segmentation: which regions of the image plane correspond to different moving objects?
  - Motion understanding: lip reading, gesture, expression, events, ...

Approaches
- Two subproblems
  - Correspondence
    - Differential methods -> dense measure (optical flow)
    - Matching methods -> sparse measure
  - Reconstruction: more difficult than stereo, since motion (the 3D transformation between frames) as well as structure needs to be recovered, and the small baseline causes large errors
- The third subproblem
  - Motion segmentation: a chicken-and-egg problem
    - Which should be solved first, matching or segmentation?
    - Segmentation supplies the elements to match; matching supports segmentation

The Motion Field of Rigid Objects
- Motion: 3D motion (R, T): camera motion (static scene) or single-object motion
  - Only one rigid, relative motion between the camera and the scene (object)
- Image motion field: the 2D vector field of velocities of the image points, induced by the relative motion
- Data: an image sequence
  - Many frames, captured at times t = 0, 1, 2, ...
  - Basics: consider only two consecutive frames
  - We consider a reference frame and its consecutive frame
- The image motion field can be viewed as the disparity map of the two frames captured at the two consecutive camera locations (assuming we have a ...
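The "basics: notation and equations" item can be made concrete with a small numerical sketch. The following assumes the standard perspective-projection motion-field equations for a rigid relative motion with translational velocity T and rotational velocity omega; the function name and the sample numbers are illustrative, not from the slides.

```python
def motion_field(x, y, Z, T, omega, f):
    """Image velocity (vx, vy) at image point (x, y) whose scene point has
    depth Z, for relative rigid motion with translation T = (Tx, Ty, Tz)
    and rotation omega = (wx, wy, wz); f is the focal length.
    Standard perspective motion-field equations (illustrative helper)."""
    Tx, Ty, Tz = T
    wx, wy, wz = omega
    # Translational part: scales with inverse depth 1/Z
    vx_t = (Tz * x - Tx * f) / Z
    vy_t = (Tz * y - Ty * f) / Z
    # Rotational part: independent of depth
    vx_r = -wy * f + wz * y + (wx * x * y - wy * x**2) / f
    vy_r = wx * f - wz * x + (wx * y**2 - wy * x * y) / f
    return vx_t + vx_r, vy_t + vy_r

# Special case 1 (pure translation along the optical axis): the field
# radiates from the focus of expansion at the principal point.
vx, vy = motion_field(x=10.0, y=5.0, Z=100.0,
                      T=(0.0, 0.0, 1.0), omega=(0.0, 0.0, 0.0), f=50.0)
print(vx, vy)  # 0.1 0.05
```

Note how only the translational component carries depth information (this is what motion parallax and the special cases above exploit), while the rotational component is the same for all depths.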
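The correspondence subproblem in its "matching methods -> sparse measure" form reduces, at its simplest, to searching for each block of one frame in the next frame. A minimal sketch using sum-of-squared-differences (SSD) block matching; `best_match` and all its parameters are hypothetical names for illustration:

```python
import numpy as np

def best_match(frame0, frame1, y, x, half=3, search=5):
    """Return the displacement (dy, dx) that best matches the
    (2*half+1)^2 block around (y, x) of frame0 in frame1, by exhaustive
    SSD search over a small window. Because consecutive video frames are
    densely sampled in time, a small search window usually suffices."""
    block = frame0[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    best_ssd, best_d = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = frame1[y + dy - half:y + dy + half + 1,
                          x + dx - half:x + dx + half + 1].astype(float)
            ssd = np.sum((block - cand) ** 2)
            if ssd < best_ssd:
                best_ssd, best_d = ssd, (dy, dx)
    return best_d

# Synthetic check: frame1 is frame0 shifted down 1 pixel and right 2 pixels.
rng = np.random.default_rng(0)
frame0 = rng.random((32, 32))
frame1 = np.roll(np.roll(frame0, 1, axis=0), 2, axis=1)
print(best_match(frame0, frame1, 15, 15))  # (1, 2)
```

Differential methods replace this explicit search with the optical flow equation, trading a sparse but direct measure for a dense one subject to the aperture problem.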