UW-Madison ECE 734 - H.264 Performance Optimization Using SSE - D2328570

School: University of Wisconsin, Madison

Course: ECE 734 - VLSI Array Structures for Digital Signal Processing

H.264 Performance Optimization Using SSE

Pieper, S and Tsen, S
{spieper, stsen}@wisc.edu

1.0 PROJECT OVERVIEW AND HIGHLIGHTS

We plan to accelerate the baseline implementation of H.264 by selectively vectorizing computational kernels. This project will require identifying the performance-critical sections of H.264 and replacing these sections of C code with SSE intrinsics. This approach gives us the benefits of hand-optimized assembly while still letting us take advantage of the compiler's ability to perform efficient register allocation and letting us write easily debuggable code. To determine the effectiveness of our algorithm transformations, we will compare our results against an unoptimized version of the code and against a version compiled with full optimizations. To get the best results, we would like to use Intel's compiler, but this will require some installation and may not be possible.

2.0 PROJECT MOTIVATION

The motivation of the project is to study the benefits of vectorized instructions for a current application. H.264 is an emerging standard with many desirable features in terms of compression rate and video quality, but it is also very computationally intensive. If we are able to significantly accelerate its execution on IA32 processors, this would be a very exciting result. It will also be interesting to determine the extent to which SIMD instructions are capable of extracting parallelism and accelerating real applications.

3.0 PRIOR ART

Acceleration of DSP algorithms through the use of explicit parallelism is not a new idea. Some of the original vector processors were supercomputers made by Cray. Similarly, the idea of hand-optimizing critical loops has been around as long as compilers. These ideas were first applied to general-purpose processors with the introduction of the Pentium MMX in 1997. The research that led to the introduction of these instructions can be found in [2] and demonstrated significant performance benefits for multimedia applications.

More recently, a 2003 paper [3] examined using MMX and SSE instructions to accelerate MPEG-4 decoding. Some optimizations to MPEG-4 kernels are described. It is not possible to determine the speedup due to these optimizations, however, as the reported results also include the use of a co-processor. This work is close to what we are interested in, but it predates the advent of H.264, which is also known as MPEG-4 Advanced Video Coding (AVC). It is not clear how useful the optimizations suggested in that paper will be for H.264.

Finally, Intel themselves have developed an optimized H.264 encoder/decoder [4] and discuss the issues relating to its development. Their optimizations draw on a study by Horowitz et al. of the computational complexity of H.264 [5]. We expect both of these papers to be very helpful in guiding our optimizations.

4.0 APPROACH

We plan to profile the H.264 code to determine the most computationally intensive portions. Next, we will choose sections of the code to speed up. We will then rewrite those sections using inline MMX, SSE2, or SSE3 instructions. As final steps, we will evaluate the resulting software speedup and draw conclusions based on the results.

5.0 EXPECTED RESULTS

We expect to achieve a significant speedup over unoptimized code, and some speedup over optimized compiler-generated code. The maximum possible speedup would be 16X, in the case that we could convert every operation to an 8-bit operation performed in parallel, since a 128-bit SSE register holds sixteen 8-bit values. Our likely speedup is much less than this, but could possibly be a factor of 2 or 3 over unoptimized code.

6.0 TASK PLANNING

Task 1 - profile the H.264 code to determine the most intensive portions
Task 2 - choose sections of the code to speed up and search for known optimizations
Task 3 - rewrite the code using inline MMX, SSE2, or SSE3 instructions
Task 4 - evaluate speedup
Task 5 - write final report and prepare final presentation

REFERENCES

1. Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra, Overview of the H.264/AVC Video Coding Standard, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
2. Alex Peleg and Uri Weiser, MMX Technology Extension to the Intel Architecture, IEEE Micro, Vol. 16, No. 4, pp. 42-50, August 1996.
3. Jin-Hau Kuo, Chia-Chiang Ho, Kan-Li Huang, Jim Shiu, and Ja-Ling Wu, A low-cost media-processor based real-time MPEG-4 video decoder, IEEE Transactions on Consumer Electronics, Vol. 49, No. 4, pp. 1488-1497, Nov. 2003.
4. V. Iverson, J. McVeigh, and B. Reese, Real-time H.264-AVC codec on Intel architectures, International Conference on Image Processing, 2004, Vol. 2, pp. 24-27, Oct. 2004.
5. M. Horowitz, A. Joch, F. Kossentini, and A. Hallapuro, H.264/AVC baseline profile decoder complexity analysis, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 704-716, July
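
As an illustration of the rewrite step described in Section 4.0 and the 8-bit parallelism assumed in Section 5.0, the following is a minimal sketch of replacing a scalar C loop with SSE2 intrinsics. The kernel chosen here, a 16x16 sum of absolute differences (SAD), is an assumption made for illustration; the proposal does not name the kernels it will optimize. The function names sad16x16_scalar and sad16x16_sse2 and the stride parameter are hypothetical. The vector version processes one 16-pixel row per iteration with _mm_sad_epu8, and falls back to the scalar loop when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar reference: SAD over a 16x16 block of 8-bit pixels.
 * Hypothetical kernel chosen for illustration only. */
static unsigned sad16x16_scalar(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    assert(stride >= 16);
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            int d = a[y * stride + x] - b[y * stride + x];
            sum += (unsigned)(d < 0 ? -d : d);
        }
    }
    return sum;
}

/* Vectorized version: each iteration handles 16 pixels at once. */
static unsigned sad16x16_sse2(const uint8_t *a, const uint8_t *b, int stride)
{
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    assert(stride >= 16);
    for (int y = 0; y < 16; y++) {
        /* Unaligned loads, since block rows need not be 16-byte aligned. */
        __m128i ra = _mm_loadu_si128((const __m128i *)(a + y * stride));
        __m128i rb = _mm_loadu_si128((const __m128i *)(b + y * stride));
        /* _mm_sad_epu8 yields two partial sums, one per 64-bit lane. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(ra, rb));
    }
    /* Combine the low-lane and high-lane partial sums. */
    return (unsigned)_mm_cvtsi128_si32(acc) +
           (unsigned)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#else
    /* No SSE2 at compile time: use the scalar reference. */
    return sad16x16_scalar(a, b, stride);
#endif
}
```

Keeping the scalar version alongside the intrinsic version gives a reference for checking correctness and a baseline for measuring speedup, which matches the comparison against unoptimized code planned in Section 1.0.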
