UT Arlington EE 5359 - EE 5359 PROJECT - D2433944


School name University of Texas at Arlington

Course Ee 5359- Topics in Signal Processing

Pages 10

This preview shows page 1-2-3 out of 10 pages.


EE 5359 MULTIMEDIA PROCESSING, SPRING 2011
Interim Report

FPGA IMPLEMENTATION OF H.264 VIDEO ENCODER

Under guidance of DR. K. R. RAO
DEPARTMENT OF ELECTRICAL ENGINEERING
UNIVERSITY OF TEXAS AT ARLINGTON
SPRING 2011

Presented by
KUSHAL KUNIGAL
1000662485
kushal.kunigal@mavs.uta.edu

Introduction

This project presents a detailed study of the H.264 video encoder and of the transform and quantization algorithms suitable for high-speed implementation on FPGA/ASIC. Along with this, detailed architectures of the intra prediction, integer transform and quantization processors are presented.

Overview

To achieve a real-time H.264 encoding solution, multiple FPGAs and programmable DSPs are often used. Computational complexity alone does not determine whether a functional module should be mapped to hardware or remain in software. The architectural issues that influence the overall design decision are:

Data Locality

In a synchronous design, the ability to access memory in a particular order and granularity while minimizing the number of clock cycles lost to latency, bus contention, alignment, DMA transfer rate and the types of memory used is very important. The data locality issue (Figure 1) is primarily dictated by the physical interfaces between the data unit and the arithmetic unit (or the processing engine) [2].

Fig. 1: H.264 encoder block diagram [2]

Computational Complexity

Programmable DSPs are bounded in computational complexity, as measured by the clock rate of the processor. Signal processing algorithms implemented in the FPGA fabric are typically computationally intensive. By mapping these modules onto the FPGA fabric, the host processor or the programmable DSP gains extra cycles for other algorithms. Furthermore, FPGAs can have multiple clock domains in the fabric, so selected hardware blocks can run at separate clock speeds based on their computational requirements [2].

Block diagram of H.264 Advanced Video Encoder

Fig. 2 shows the block diagram of the H.264 encoder. Therein, the modules designed in this work are shown in grey shades. An input frame or field Fn is processed in units of a macroblock. A macroblock consists of 16x16 pixels. Each macroblock is encoded in intra or inter mode, and for each block in the macroblock a prediction P is formed based on the reconstructed picture samples. In intra mode, P is formed from samples in the current slice that have been previously reconstructed. In inter mode, P is formed by motion-compensated prediction from one or two reference picture(s) selected from the set of reference pictures.

Fig. 2: Modules in H.264 video encoder [3]

Concept

By understanding the ideas behind video compression and their importance, it is possible to implement an efficient, high-performance encoder that consumes less power and takes fewer clock cycles to encode an image frame. The implementation is considered a lite version of the H.264 encoder, similar to the MPEG-4 digital video codec, which is known for achieving high data compression. The same building blocks implemented in the H.264 encoder will be used in the lite version, with the exception of a few optimizing modifications. For example, a full search algorithm was originally suggested for motion estimation, but research showed that the motion estimation process consumes 66-94% of the cycles [3]. Any optimization should therefore start there. Instead of applying the full search algorithm, an alternative algorithm was used, which will be discussed under the background section.

Motion compensation produces the predicted frame from the reference frame and the motion vectors obtained by motion estimation. The residual frame is generated as the difference between the predicted frame and the current frame. To compress the data even further, the Discrete Cosine Transform (DCT), a type of linear transform, will be performed on the residual frame. In addition, quantization will be used to compress the data.

This project will be simulated and synthesized in Xilinx ISE 8.1 to determine chip size and power consumption, with ModelSim used to simulate and observe waveforms. The purpose of this implementation is to improve behavioral VHDL modeling and the FPGA design process, and in addition to learn about video compression systems and to incorporate other simulation tools with VHDL. Hopefully this implementation of the H.264 encoder will result in a design that is efficient and achieves high performance: a hardware design that runs faster, with less power consumption and smaller area.

Modified Encoder Hardware Design

Fig. 3: Modified encoder hardware design, derived from Fig. 2

The encoder will be used to encode a frame from a video sequence (30 frames/second). Each frame is 176 x 144 pixels (QCIF resolution), which is very typical for low bit rate video content in cell phones.

Fig. 4: Reference frame 1 [4]

Fig. 5: Current frame 2 [4]

Fig. 6: Residual of current frame [4]

The following are the different stages in designing the video encoder:

Motion Estimation

Motion estimation is the most computationally demanding task in image compression applications and can require as much as 66-94% of the processor cycles spent in the video encoder [6]. Therefore, an efficient motion estimation algorithm is essential for creating an efficient implementation of the H.264 encoder. This section briefly covers different motion estimation algorithms, the tradeoffs between them, and the proposed algorithm that will be implemented in this project.

Fig. 7: The principle of the block matching motion estimation algorithm is to find the best matching block in the search area of a reference frame for each macroblock in the original frame

Motion estimation attempts to find a region in a previously encoded frame, called the reference frame, that closely matches each macroblock in the current frame. In this implementation, each frame is divided into macroblocks of 4x4 pixels. To find the best matching block, the mean absolute difference (MAD) is very often used as the matching criterion because of its simple operation and efficiency: the candidate block with the smallest MAD is taken as the best match. The displacement between this best matching block and the current macroblock is called the motion vector. The motion vector comprises the horizontal and vertical offsets from the location of the macroblock in the current frame to the location in the reference frame. The full search algorithm, also referred to as the exhaustive block matching algorithm (EBMA), was the suggested algorithm to perform the motion estimation in the
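
As a rough illustration of the block matching and residual steps described in this section, the following Python sketch runs an exhaustive search over a small window using a mean-absolute-difference criterion. It is only a software sketch under stated assumptions, not the report's VHDL design: the function names (`mad`, `full_search`, `residual_block`), the list-of-rows frame layout, and the default search range of +/-2 samples are all hypothetical choices made for the example. The 4x4 block size follows the text.

```python
# Illustrative full-search block matching with a mean-absolute-difference
# (MAD) criterion. Frames are plain lists of rows of integer samples.
# All names and default parameters are assumptions made for this sketch.

def mad(cur, ref, cx, cy, rx, ry, n):
    """MAD between the n x n block of `cur` at (cx, cy)
    and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total / (n * n)


def full_search(cur, ref, cx, cy, n=4, search_range=2):
    """Test every offset within +/- search_range and return
    (dx, dy, cost) of the candidate with the smallest MAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = mad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def residual_block(cur, ref, cx, cy, dx, dy, n=4):
    """Current block minus the motion-compensated reference block."""
    return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
             for i in range(n)]
            for j in range(n)]
```

With a search range of +/-r, this approach evaluates up to (2r + 1)^2 candidates per block, each costing n^2 absolute differences, which gives a sense of why the exhaustive search dominates the encoder's cycle budget and why faster search strategies are attractive.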
