UT Arlington EE 5359 - Performance Analysis of H.264 Encoder on TMS320C64x+ and ARM 9E

This preview shows page 1-2-3-4-24-25-26-50-51-52-53 out of 53 pages.



Slide 1
Project objectives
Slide 3
H.264 Encoder – Profiles [4]
Profile structure of H.264 [3]
H.264 Encoder baseline profile [4]
Baseline profile continued [3]
Layers of H.264 encoder [7]
Video data hierarchy [4]
H.264 Encoder block diagram
Encoding process
Intra prediction
Intra coding - prediction modes for 4X4 blocks [3]
Inter prediction
Slide 15
Sub-pixel motion compensation
Half pixel and quarter pixel interpolation [3]
Integer transform
Quantization
Entropy coding
Deblocking filter
Deblocking filter continued
Slide 23
Slide 24
TI TMS320C64x+ DSP [9]
C64x+ CPU
The .M unit
The .L unit
The .S unit
The .D unit
C64x+ pipeline [10]
C64x+ pipeline block diagram [10]
ARM 9 DSP [14]
ARM 9E CPU
Major blocks of ARM 9E CPU
ARM 9E register structure
Register functionalities
ARM 9E pipeline [14]
ARM 9E pipeline block diagram [14]
Slide 40
Implementation
Optimization
Optimization: C level [27]
Optimization: Assembly level [14]
Results
Conclusion
List of acronyms in alphabetical order
List of acronyms in alphabetical order continued
References
References continued
References continued
References continued
Slide 53

Performance Analysis of H.264 Encoder on TMS320C64x+ and ARM 9E
Nikshep Patil

Project objectives
•Understand the major blocks of the H.264 encoder [2]
•Understand the Texas Instruments [16] TMS320C64x+ DSP architecture
•Understand the ARM 9E [18] DSP architecture
•Port the H.264 encoder to the two platforms
•Analyze the performance of the encoder on the two processors in terms of MIPS
•Identify and optimize the most computationally expensive blocks separately for both the DSP cores
•Achieve a MIPS reduction of about 30%

Part 1 H.264 encoder

H.264 Encoder – Profiles [4]
Seven prominent profiles –
•Baseline profile
•Main profile
•Extended profile
•High Profile
•High 10 Profile
•High 4:2:2 Profile
•High 4:4:4 Profile

Profile structure of H.264 [3]
Fig. 1. The specific coding parts of the profiles in H.264 [3]

H.264 Encoder baseline profile [4]
Primarily designed for –
•Low processing power platforms
•Error prone transmission environments
Features –
•Low coding efficiency
•I- and P- slice coding
•Enhanced error resilience tools such as flexible macroblock ordering (FMO), arbitrary slice ordering (ASO) and redundant slices (RS)
•Context adaptive variable length coding (CAVLC)
Features not included in baseline profile –
•B- slices, SI- or SP- slices
•Interlace coding tools
•Context adaptive binary arithmetic coding (CABAC)

Baseline profile continued [3]
Major applications –
•Video-conferencing
•Mobile video streaming

Layers of H.264 encoder [7]
The H.264 encoder is organized into two layers -
•Network abstraction layer (NAL): packets containing an integer number of bytes, each with a header
–Video coding layer NAL units
–Non video coding layer NAL units
•Video coding layer (VCL): the coded video bitstream

Video data hierarchy [4]
•Video data is organized as -
Picture ---> Slices ---> Macroblocks ---> Sub-macroblocks ---> Blocks ---> Pixels
•The pixel is the most basic building block of a digital image

H.264 Encoder block diagram
Fig. 2. The block diagram of H.264 encoder [3]

Encoding process
The major encoding steps are –
•Intra prediction
•Inter prediction
•Transform and quantization
•Entropy coding
•Deblocking filter

Intra prediction
•Performed in the pixel domain
•Pixel values are predicted as linear interpolations of pixels from the adjacent edges of neighboring macroblocks that have already been decoded
•For luma samples, the prediction block may be formed for each 4X4 subblock, each 8X8 block, or for a 16X16 macroblock
•9 prediction modes for each 4X4 and 8X8 luma block
•4 prediction modes for a 16X16 luma block
•4 prediction modes for chroma blocks

Intra coding - prediction modes for 4X4 blocks [3]

Inter prediction
•Generates a predicted version of a rectangular array of pixels by choosing another similarly sized rectangular array of pixels from a previously decoded reference picture
•Macroblocks are partitioned into smaller sub-blocks. A large partition size is appropriate for homogeneous areas of the frame and a small partition size is beneficial for detailed areas.
•A 16X16 macroblock can be partitioned in four ways: 16X16, 16X8, 8X16 or 8X8
•An 8X8 sub-block can be partitioned in four ways: 8X8, 8X4, 4X8 or 4X4

Inter prediction – Macroblock and sub macroblock partitions [3]

Sub-pixel motion compensation
•Sub-pixel motion compensation provides significantly better compression performance than integer-pixel compensation
•Increases complexity
•Increases coding efficiency at high bitrates and high video resolutions
•For the luma component, sub-pixel samples at half-pixel positions are generated first and are interpolated from neighboring integer-pixel samples using a 6-tap FIR filter with weights (1, -5, 20, 20, -5, 1)/32
•Quarter-pixel samples are produced using bilinear interpolation between neighboring half- or integer-pixel samples
•For the 4:2:0 video format, 1/8 pixel samples are required for the chroma component. These samples are linearly interpolated between integer-pixel chroma samples

Half pixel and quarter pixel interpolation [3]

Integer transform
•The residual signal, which still contains spatial redundancy, is split into 4X4 or 8X8 blocks. The 4X4 transform removes the need for multiplications
•Hierarchical transform structure
•The 4X4 blocks are first transformed with the integer DCT. Then the DC coefficients of neighboring 4X4 transforms for the luma blocks are grouped into 4X4 blocks and transformed again by a Hadamard transform
•A 4X4 Walsh Hadamard transform is used for luma DC coefficients for 16X16 Intra-mode
•A 2X2 Walsh Hadamard transform is used for chroma DC coefficients

Quantization
•The quantized signal Y is obtained from the input signal X using the relation –
Y = X . ROUND(SF/Qstep)
-X is the input signal
-Y is the output signal
-Qstep is the quantization step size
•The quantization parameter (QP) varies from 0 to 51, allowing a total of 52 quantization step sizes
•The quantization step sizes are arranged with logarithmic increments. An increment of QP by 6 corresponds to a doubling of the quantization step size

Entropy coding
•The syntax elements other than the residual data are encoded with Exp-Golomb codes
•A more sophisticated method - CAVLC - is employed for coding the residual data
•In CAVLC, inter-symbol redundancies are exploited by switching VLC tables for various syntax elements depending on already transmitted coding symbols
•The increased adaptivity
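
The Exp-Golomb coding named on the entropy coding slide can be illustrated with a short sketch. The slides do not give the codeword construction; the version below follows the usual description of the unsigned and signed mappings in the H.264 standard, and the function names are hypothetical.

```python
def exp_golomb_ue(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword for code_num, returned as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    value = code_num + 1
    # Prefix of (bit length - 1) zeros, followed by value in binary.
    return "0" * (value.bit_length() - 1) + format(value, "b")


def exp_golomb_se(value: int) -> str:
    """Signed Exp-Golomb codeword: positive v maps to 2v - 1, others to -2v."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb_ue(code_num)


if __name__ == "__main__":
    for n in range(5):
        print(n, exp_golomb_ue(n))
```

Small values get short codewords, which suits syntax elements whose small values are the most likely ones.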

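
The logarithmic arrangement of quantization step sizes described on the quantization slide can be sketched as follows. The six base step sizes for quantization parameter values 0 to 5 are an assumption taken from commonly published H.264 tables, not from the slides, and the names `BASE_QSTEP` and `qstep` are hypothetical.

```python
# Step sizes for quantization parameter values 0..5
# (assumed from commonly published H.264 tables, not from the slides).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantization step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("quantization parameter must be in 0..51")
    # Every increase of 6 in the parameter doubles the step size.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Under this assumption only six base values need to be stored; all 52 step sizes follow by shifting.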

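
The half-pixel and quarter-pixel interpolation described on the sub-pixel motion compensation slide can be sketched in one dimension. The filter weights (1, -5, 20, 20, -5, 1)/32 come from the slide; the rounding offset, the clipping to 8-bit samples, and the function names are assumptions based on the usual description of the standard.

```python
def half_pel(samples, i):
    """Half-pixel value between samples[i] and samples[i + 1] (8-bit luma)."""
    if i < 2 or i + 3 >= len(samples):
        raise IndexError("need two samples before i and three after it")
    e, f, g, h, k, m = samples[i - 2:i + 4]
    # 6-tap FIR filter with weights (1, -5, 20, 20, -5, 1).
    acc = e - 5 * f + 20 * g + 20 * h - 5 * k + m
    # Divide by 32 with rounding, then clip to the 8-bit range (assumed).
    return min(255, max(0, (acc + 16) >> 5))


def quarter_pel(a, b):
    """Quarter-pixel value as the rounded average of two neighboring samples."""
    return (a + b + 1) >> 1


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 255]
    h = half_pel(row, 2)
    print(h, quarter_pel(row[2], h))
```

Because the weights sum to 32, a flat region is reproduced unchanged, while the negative taps sharpen edges compared with plain bilinear averaging.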