UT Arlington EE 5359 - Project Proposal

Jayant Choranur Rajachar
Student ID # 1000023648
EE5359 - Multimedia Processing

Title: Multiple B-frame construction using depth map and segmentation

Proposal:

In a temporal sequence of images representing a moving scene, there is typically a great deal of similarity between nearby images [1]. Such a sequence can be compressed by coding each static image individually, as in motion JPEG, but this fails to take advantage of the temporal redundancy. A better method is motion compensation, which uses motion vectors to reduce or eliminate the effects of motion in the scene. The first image in the sequence serves as the reference image. Subsequent images are not stored in their entirety; instead, we store a measure of the displacement of the current image with respect to the reference image, called the motion vector. A motion vector may apply to only part of the image (with the background remaining static), to the whole image (as when the camera pans), or both.

In this project, the individual images, or frames, of the sequence are treated as being of three types, as in MPEG: intraframes (I-frames), predicted frames (P-frames), and bidirectional frames (B-frames). An I-frame is encoded using only information from within that frame; in other words, it is coded spatially, with no information from any other frame. It serves as the reference image for the rest of the sequence. A P-frame is the next frame stored in the sequence, coded with motion compensation. B-frames are not stored in the sequence but are generated by the decoder, which compares the I-frame with the next P-frame, or two consecutive P-frames, and uses bidirectional interpolation to create additional frames between them. This ensures a smooth transition between the stored frames while reducing the size of the sequence.

The goal of this project is to use an advanced interpolative technique to generate the maximum number of accurate B-frames. Since the size of the sequence is to be reduced as much as possible, the redundancy between consecutive stored frames must also be reduced as much as possible. Generation of P-frames, however, is not the focus of this project, so two I-frames will be used directly to generate the B-frames.

The proposed interpolation technique is to generate a depth map from the two I-frames, in order to estimate the displacement of the B-frame(s) with respect to the I-frames, coupled with segmentation of the I-frames to further refine the bidirectional motion. A depth map is a two-dimensional array in which the x and y distance information corresponds to the rows and columns of the array, as in an ordinary image, and the corresponding depth readings (z values) are stored in the array's elements (pixels) [2]. It is like a grayscale image, except that the z information replaces the intensity information.

Consider a largely static scene in which the motion vectors result from the camera panning over the scene. When a camera pans, objects at a greater distance from the viewer move more slowly than objects that are closer. Given the depth map of the scene, it is therefore easy to estimate how far the static objects in the scene should be shifted, based on their estimated depth from the viewer.

A mesh-based representation will be considered for generating the depth map [3]. This technique is suitable for real-time rendering, and when applied to video it also lends itself to moderately high compression and low-complexity decoding. The next step is motion estimation using segmentation [4]. Segmentation-based approaches partition the image into regions, with each region assuming a parametric motion model [5]. A color segmentation approach will be used, under the assumption that similarly colored neighboring pixels have similar motions (or depths). Color discontinuities are used to delineate object boundaries and hence motion discontinuities. Moreover, the segments generated across neighboring images have similar shapes and colors, i.e. they are temporally consistent, so the motion estimation problem is reduced from pixel or block matching to segment matching.

It is expected that this two-step approach will yield greater compression of the video stream without loss of picture quality. Two images from the same sequence will be taken as I-frames and multiple B-frames will be generated between them; this may then be extended to MPEG video. Short illustrative sketches of the three main steps (bidirectional interpolation, depth-based shifting, and segment matching) are given below.
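As a baseline for the B-frame construction described above, here is a minimal sketch (in Python with NumPy) of bidirectional interpolation between two I-frames using exhaustive block matching. It is only a sketch under simplifying assumptions: the function names, the 16x16 block size, and the +/-8 pixel search range are illustrative choices rather than part of the proposal, and occlusions and sub-pixel motion are ignored.

import numpy as np

def block_match(block, ref, top, left, search=8):
    """Exhaustive search: return the (dy, dx) motion vector that
    minimizes the sum of absolute differences (SAD) against `ref`."""
    h, w = block.shape
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            sad = np.abs(block - ref[y:y + h, x:x + w]).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

def interpolate_b_frames(i0, i1, count=1, bsize=16):
    """Generate `count` B-frames between reference frames i0 and i1
    (grayscale float arrays) by bidirectional interpolation."""
    frames = []
    for k in range(1, count + 1):
        t = k / (count + 1)          # temporal position of this B-frame
        b = (1 - t) * i0 + t * i1    # plain cross-fade as a fallback for holes
        for top in range(0, i0.shape[0] - bsize + 1, bsize):
            for left in range(0, i0.shape[1] - bsize + 1, bsize):
                block = i0[top:top + bsize, left:left + bsize]
                dy, dx = block_match(block, i1, top, left)
                matched = i1[top + dy:top + dy + bsize,
                             left + dx:left + dx + bsize]
                # Place the block part-way along its trajectory and blend
                # the two references, weighted by temporal distance.
                y = int(np.clip(round(top + t * dy), 0, i0.shape[0] - bsize))
                x = int(np.clip(round(left + t * dx), 0, i0.shape[1] - bsize))
                b[y:y + bsize, x:x + bsize] = (1 - t) * block + t * matched
        frames.append(b)
    return frames

Exhaustive search is used here only because it is the simplest to state; a practical implementation would use a faster search pattern and a proper hole-filling step.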
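The depth-dependent shift for the panning case can be sketched in the same spirit. The 1/z parallax model, the parameter names (pan, z_ref), and the handling of holes are all assumptions made for illustration, not claims about the method the project will ultimately use.

import numpy as np

def depth_shift(frame, depth, pan=8.0, t=0.5, z_ref=1.0):
    """Warp `frame` toward an intermediate view of a horizontal pan.
    `depth` holds per-pixel z values; nearer pixels (small z) shift
    further than distant ones, as parallax dictates.  `pan` is the
    displacement, in pixels, of an object at reference depth `z_ref`
    between the two I-frames, and `t` is the temporal position of the
    B-frame being built."""
    h, w = frame.shape
    out = np.zeros_like(frame)
    filled = np.zeros((h, w), dtype=bool)
    # Per-pixel horizontal displacement, inversely proportional to depth.
    dx = np.rint(t * pan * z_ref / np.maximum(depth, 1e-6)).astype(int)
    for y in range(h):
        for x in range(w):
            nx = x + dx[y, x]
            if 0 <= nx < w:
                out[y, nx] = frame[y, x]
                filled[y, nx] = True
    # Disocclusion holes are left unfilled; a real method would inpaint
    # them or take them from the second reference frame.
    return out, filled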
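Finally, a minimal sketch of the segment-matching idea. A crude color quantization stands in here for a proper segmentation algorithm, and each segment receives a single translational motion vector; the quantization scheme, parameters, and function names are illustrative assumptions (scipy.ndimage.label is used only to find connected components).

import numpy as np
from scipy import ndimage

def color_segments(img, levels=4):
    """Crude color segmentation: quantize each channel of an 8-bit RGB
    image to a few levels, then take connected components of uniform
    quantized color as the segments."""
    q = (img // (256 // levels)).astype(np.int32)
    key = q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]
    segs, next_id = np.zeros(key.shape, dtype=np.int32), 0
    for color in np.unique(key):
        comp, n = ndimage.label(key == color)
        segs[comp > 0] = comp[comp > 0] + next_id
        next_id += n
    return segs

def segment_motion(i0, i1, segs, search=8):
    """Estimate one translational motion vector per segment by
    minimizing the mean absolute difference over its pixels."""
    g0 = i0.astype(np.float64).mean(axis=-1)   # work on luminance
    g1 = i1.astype(np.float64).mean(axis=-1)
    h, w = g0.shape
    motions = {}
    for s in np.unique(segs):
        ys, xs = np.nonzero(segs == s)
        best, best_mv = np.inf, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                ny, nx = ys + dy, xs + dx
                inside = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
                if not inside.all():
                    continue   # skip offsets that push the segment off-frame
                sad = np.abs(g0[ys, xs] - g1[ny, nx]).mean()
                if sad < best:
                    best, best_mv = sad, (dy, dx)
        motions[s] = best_mv
    return motions

A segmentation method of the kind used in [5] would replace color_segments, and the per-segment translation would generalize to the parametric motion models mentioned above.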
Potential applications:

If this method of interpolation yields good compression without significant loss in image quality, it can be used for video coding. A more important application would be Free-viewpoint Video (FVV) communication [6]: with a sufficiently advanced interpolation technique, the number of cameras required for FVV can be minimized. Another interesting possibility is applying the algorithm in digital cameras to produce simplified 3D images.

References:

[1] P. Symes, Video Compression Demystified, McGraw-Hill, 2001.
[2] http://www.cs.cf.ac.uk/Dave/Vision_lecture/node9.html
[3] B.-B. Chai, S. Sethuraman, and H. S. Sawhney, "A depth map representation for real-time transmission and view-based rendering of a dynamic 3D scene," Proceedings of the First International Symposium on 3D Data Processing Visualization and Transmission (3DPVT '02), 2002.
[4] J. Y. A. Wang and E. H. Adelson, "Representing moving images with layers," IEEE Transactions on Image Processing, vol. 3, no. 5, pp. 625-638, Sept. 1994.
[5] C. L. Zitnick, N. Jojic, and S. B. Kang, "Consistent segmentation for optical flow estimation," IEEE International Conference on Computer Vision, 2005.
[6] H. Kimata, M. Kitahara, K. Kamikura, Y. Yashima, T. Fujii, and M. Tanimoto, "System design of free viewpoint video communication," Fourth International Conference on Computer and Information Technology (CIT '04), Sept. 2004, pp. 52-.

