UW-Madison ECE 734 - H.264 Performance Optimization Using SSE

H.264 Performance Optimization Using SSE

Pieper, S and Tsen, S
{spieper, stsen}@wisc.edu

1.0 PROJECT OVERVIEW AND HIGHLIGHTS

We plan to accelerate the baseline implementation of H.264 by selectively vectorizing its computational kernels. This project will require identifying the performance-critical sections of H.264 and replacing those sections of C code with SSE intrinsics. This approach gives us the benefits of hand-optimized assembly while still letting us rely on the compiler for efficient register allocation and write easily debuggable code. To determine the effectiveness of our algorithm transformations, we will compare our results against an unoptimized build of the code and against a build compiled with full optimizations. To get the best results we would like to use Intel's compiler, but this will require some installation and may not be possible.

2.0 PROJECT MOTIVATION

The motivation of the project is to study the benefits of vectorized instructions for a current application. H.264 is an emerging standard with many desirable features in terms of compression rate and video quality, but it is also very computationally intensive. If we are able to significantly accelerate its execution on IA32 processors, this would be a very exciting result. It will also be interesting to determine the extent to which SIMD instructions are capable of extracting parallelism and accelerating real applications.

3.0 PRIOR ART

Acceleration of DSP algorithms through the use of explicit parallelism is not a new idea. Some of the original vector processors were supercomputers made by Cray. Similarly, the idea of hand-optimizing critical loops has been around as long as compilers. These ideas were first applied to general-purpose processors with the introduction of the Pentium MMX in 1997. The research that led to the introduction of these instructions can be found in [2]; it demonstrated significant performance benefits for multimedia applications.

More recently, a 2003 paper [3] examined using MMX and SSE instructions to accelerate MPEG-4 decoding and describes some optimizations to MPEG-4 kernels. It is not possible to determine the speedup due to these optimizations, however, because the reported results also include the use of a co-processor. This work is close to what we are interested in, but it predates the advent of H.264, also known as MPEG-4 Advanced Video Coding (AVC). It is not clear how useful the optimizations suggested in this paper will be for H.264.

Finally, Intel has developed its own optimized H.264 encoder/decoder [4] and discusses the issues relating to its development. Its optimizations draw on a study by Horowitz et al. of the computational complexity of H.264 [5]. We expect both of these papers to be very helpful in guiding our optimizations.

4.0 APPROACH

We plan to profile the H.264 code to determine its most computationally intensive portions. Next, we will choose the sections of the code to speed up and rewrite them using inline MMX, SSE2, or SSE3 instructions. As final steps, we will evaluate the resulting software speedup and draw conclusions based on the results.

5.0 EXPECTED RESULTS

We expect to achieve a significant speedup over unoptimized code, and some speedup over optimized compiler-generated code. The maximum possible speedup would be 16X, since a 128-bit SSE register holds sixteen 8-bit values; reaching it would require converting every operation to an 8-bit operation performed in parallel. Our likely speedup is much less than this, but could possibly be a factor of 2 or 3 over unoptimized code.

6.0 TASK PLANNING

Task1 - profile the H.264 code to determine the most intensive portions
Task2 - choose sections of the code to speed up and search for known optimizations
Task3 - rewrite the code using inline MMX, SSE2, or SSE3 instructions
Task4 - evaluate speedup
Task5 - write final report and prepare final presentation

REFERENCES

1. Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra, Overview of the H.264/AVC Video Coding Standard, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
2. Alex Peleg and Uri Weiser, MMX Technology Extension to the Intel Architecture, IEEE Micro, Vol. 16, No. 4, pp. 42-50, August 1996.
3. Jin-Hau Kuo, Chia-Chiang Ho, Kan-Li Huang, Jim Shiu, and Ja-Ling Wu, A low-cost media-processor based real-time MPEG-4 video decoder, IEEE Transactions on Consumer Electronics, Vol. 49, No. 4, pp. 1488-1497, Nov. 2003.
4. V. Iverson, J. McVeigh, and B. Reese, Real-time H.264-AVC codec on Intel architectures, International Conference on Image Processing, 2004, Vol. 2, pp. 24-27, Oct. 2004.
5. M. Horowitz, A. Joch, F. Kossentini, and A. Hallapuro, H.264/AVC baseline profile decoder complexity analysis, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 704-716, July
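
As an illustration of the approach in Section 4.0 and of the 16X bound in Section 5.0, the following is a minimal sketch of what replacing a C kernel with SSE2 intrinsics might look like. The kernel shown, a sum of absolute differences (SAD) of the kind used in motion estimation, is an assumption chosen only for illustration; the proposal does not name which kernels will be vectorized. The function names sad16_scalar, sad16_simd, and sad16x16 are hypothetical, and the scalar fallback exists only so the sketch compiles on machines without SSE2.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Scalar reference: sum of absolute differences over one 16-pixel row.
 * Processes one 8-bit pixel per loop iteration. */
static unsigned sad16_scalar(const uint8_t *a, const uint8_t *b)
{
    unsigned sum = 0;
    for (int i = 0; i < 16; ++i)
        sum += (unsigned)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* Vectorized version (hypothetical name): a single 128-bit register
 * holds all sixteen 8-bit pixels of the row. */
static unsigned sad16_simd(const uint8_t *a, const uint8_t *b)
{
#if HAVE_SSE2
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* PSADBW yields two partial sums, one in the low 16 bits of each
     * 64-bit half of the register. */
    __m128i sad = _mm_sad_epu8(va, vb);
    /* Move the upper half down and add it to the lower half. */
    __m128i upper = _mm_srli_si128(sad, 8);
    return (unsigned)_mm_cvtsi128_si32(_mm_add_epi32(sad, upper));
#else
    return sad16_scalar(a, b);  /* fallback when SSE2 is unavailable */
#endif
}

/* SAD over a 16x16 block whose rows are `stride` bytes apart. */
static unsigned sad16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    assert(stride >= 16);
    unsigned sum = 0;
    for (int y = 0; y < 16; ++y)
        sum += sad16_simd(cur + y * stride, ref + y * stride);
    return sum;
}
```

The scalar loop touches one pixel per iteration, while the SSE2 path handles sixteen pixels per instruction, which is where the theoretical 16X figure comes from. Loads, the final reduction, and the surrounding non-vectorized code all reduce what is achieved in practice, consistent with the more modest expectation of a factor of 2 or 3.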

