Jayant Choranur Rajachar
Student ID: 1000023648
EE5359 Multimedia Processing: Project Proposal

Title: Multiple B-frame construction using depth map and segmentation

Proposal

In a temporal sequence of images representing a moving scene, there is typically a great deal of similarity between nearby images [1]. Such a sequence could be compressed by encoding each image individually, as in Motion JPEG, but that approach fails to take advantage of the temporal redundancy. A better method is motion compensation, which uses motion vectors to reduce or eliminate the effects of motion in the scene. The first image in the sequence is used as the reference image. Subsequent images are not stored in their entirety; instead, we store a measure of the displacement of the current image with respect to the reference image, referred to as the motion vector. A motion vector may apply to only a part of the image, with the background remaining static, to the whole image, as happens when the camera pans, or to both.

In this project, the individual images (frames) of the sequence are considered to be of three types, as in MPEG: intra frames (I-frames), predicted frames (P-frames), and bidirectional frames (B-frames). An I-frame is encoded using only information from within that frame; in other words, it is encoded spatially, with no information from any other frame, and it serves as the reference image for the rest of the sequence. A P-frame is the next frame stored in the sequence, encoded with motion compensation. B-frames are frames that are not stored in the sequence but generated by the decoder: the decoder compares the I-frame and the next P-frame, or two consecutive P-frames, and uses bidirectional interpolation to create additional frames between them. This ensures a smooth transition between the stored frames while reducing the size of the sequence.

The goal of this project is to use an advanced interpolation technique to generate the maximum number of accurate B-frames. The size of the sequence is to be reduced as much as possible, so the redundancy between consecutive stored frames must also be reduced as much as possible. However, generation of P-frames is not the focus of this project, so two I-frames will be used directly to generate the B-frames. The proposed interpolation technique is to generate a depth map from the two I-frames to estimate the displacement of the B-frame(s) with respect to the I-frames, coupled with segmentation of the I-frames to further refine the bidirectional motion.

A depth map is a two-dimensional array in which the x and y distance information corresponds to the rows and columns of the array, as in an ordinary image, and the corresponding depth readings (z values) are stored in the array's elements (pixels) [2]. It is like a grayscale image, except that the z information replaces the intensity information. Consider a largely static scene in which the motion vectors are the result of the camera panning over the scene. When the camera pans, objects at a greater distance from the viewer appear to move more slowly than objects that are closer to the viewer. Given the depth map of the scene, it is straightforward to estimate the extent to which each static object should be shifted, based on its estimated depth from the viewer.
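To make this idea concrete, the following Python sketch synthesises one B-frame between two I-frames under a simple horizontal-panning model, in which a pixel's apparent shift is taken to be proportional to its inverse depth. This is only an illustrative sketch under those assumptions, not the proposed implementation; the function and parameter names are invented for the example, and the depth map is treated as approximately valid for the interpolated frame as well.

import numpy as np

def depth_to_shift(depth, max_shift):
    # Simple parallax model: apparent motion is proportional to inverse depth,
    # so near objects (small z) move the most and far objects barely move.
    # max_shift is the displacement, between the two I-frames, of the nearest
    # point in the scene (an assumed, illustrative parameter).
    z = np.clip(depth.astype(np.float64), 1e-3, None)
    return max_shift * (z.min() / z)

def interpolate_b_frame(i0, i1, depth, max_shift, t):
    # Synthesise one B-frame at temporal position t in (0, 1) from the two
    # I-frames i0 and i1 (H x W grayscale arrays) and a depth map aligned
    # with i0, assuming a purely horizontal pan.
    h, w = depth.shape
    rows, cols = np.indices((h, w))
    shift = depth_to_shift(depth, max_shift)

    # Forward prediction samples i0 at the position each pixel occupied a
    # fraction t of the pan earlier; backward prediction samples i1 at the
    # position it will occupy after the remaining fraction (1 - t).
    fwd_cols = np.clip(np.rint(cols - t * shift).astype(int), 0, w - 1)
    bwd_cols = np.clip(np.rint(cols + (1.0 - t) * shift).astype(int), 0, w - 1)
    forward = i0[rows, fwd_cols]
    backward = i1[rows, bwd_cols]

    # Bidirectional blend, weighting the temporally closer I-frame more.
    return ((1.0 - t) * forward + t * backward).astype(i0.dtype)

Calling interpolate_b_frame for several values of t (for example 0.25, 0.5, 0.75) would generate multiple B-frames between the same pair of I-frames, which is exactly the behaviour the project aims to optimise.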
A mesh-based representation will be considered for the generation of the depth map [3]. This technique is suitable for real-time rendering, and when the method is applied to video, the mesh-based representation also lends itself to moderately high compression and low-complexity decoding.

The next step is motion estimation using segmentation [4]. Segmentation-based approaches partition the image into regions, with each region assuming a parametric motion model [5]. A color segmentation approach will be used, based on the assumption that similarly colored neighboring pixels have similar motions or depths. Color discontinuities are used to delineate object boundaries and thus motion discontinuities. Moreover, the segments generated across neighboring images have similar shapes and colors, i.e., they are temporally consistent. The motion estimation problem is therefore reduced from pixel or block matching to segment matching (a small illustrative sketch of this step follows the references below). It is expected that this two-step approach will result in greater compression of the video stream without loss in picture quality. Two images that are part of the same sequence will be taken as I-frames, multiple B-frames will be generated between them, and the method may then be extended to MPEG video.

Potential applications

If this method of interpolation results in good compression without significant loss in image quality, it can be used for video coding. A more important application would be the use of this technique for free-viewpoint video (FVV) communication [6]; with a suitably advanced interpolation technique, the number of cameras required for FVV can be minimized. Another interesting possibility is the application of this algorithm in digital cameras to produce simplified 3D images.

References

[1] P. Symes, Video Compression Demystified, McGraw-Hill, 2001.
[2] http://www.cs.cf.ac.uk/Dave/Vision_lecture/node9.html
[3] B.-B. Chai, S. Sethuraman, and H. S. Sawhney, "A depth map representation for real-time transmission and view-based rendering of a dynamic 3D scene," Proceedings of the First International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT '02), 2002.
[4] J. Y. A. Wang and E. H. Adelson, "Representing moving images with layers," IEEE Transactions on Image Processing, vol. 3, no. 5, pp. 625-638, Sept. 1994.
[5] C. L. Zitnick, N. Jojic, and S. B. Kang, "Consistent segmentation for optical flow estimation," IEEE International Conference on Computer Vision, 2005.
[6] H. Kimata, M. Kitahara, K. Kamikura, Y. Yashima, T. Fujii, and M. Tanimoto, "System design of free-viewpoint video communication," The Fourth International Conference on Computer and Information Technology (CIT '04), pp. 52-59, Sept. 2004.
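The sketch below illustrates the segment-matching step referred to in the segmentation discussion above. It is not the segmentation algorithm of [5]: segments are approximated here by a coarse color quantisation purely for illustration, a real implementation would use a proper spatial color segmentation, and all function names and parameters are invented for the example.

import numpy as np

def quantise_colours(frame, levels=4):
    # Coarse RGB quantisation used as a stand-in for colour segmentation:
    # each pixel of an H x W x 3 uint8 frame gets a label in [0, levels**3).
    bins = (frame.astype(np.int32) * levels) // 256
    return bins[..., 0] * levels * levels + bins[..., 1] * levels + bins[..., 2]

def match_segment(mask, i0, i1, search=8):
    # Find the integer displacement (dy, dx) that best aligns one segment of
    # i0 with i1, by minimising the mean absolute difference over the
    # segment's pixels inside a small search window.
    ys, xs = np.nonzero(mask)
    h, w = i0.shape[:2]
    best_cost, best_d = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y2, x2 = ys + dy, xs + dx
            ok = (y2 >= 0) & (y2 < h) & (x2 >= 0) & (x2 < w)
            if not ok.any():
                continue
            cost = np.abs(i0[ys[ok], xs[ok]].astype(np.int32)
                          - i1[y2[ok], x2[ok]].astype(np.int32)).mean()
            if cost < best_cost:
                best_cost, best_d = cost, (dy, dx)
    return best_d

def segment_motion(i0, i1):
    # One motion vector per colour segment instead of per pixel or per block.
    labels = quantise_colours(i0)
    return {int(s): match_segment(labels == s, i0, i1) for s in np.unique(labels)}

Estimating one motion vector per segment, rather than per pixel or per block, is what reduces the matching effort and keeps motion discontinuities aligned with color discontinuities, as assumed in the proposal.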