OPTIMIZATION OF H.264 BASELINE DECODER ON ARM9TDMI PROCESSOR

by

SANDYA BASAVANAHALLI SHESHADRI

Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT ARLINGTON
December 2005

ACKNOWLEDGEMENTS

I am grateful to the many people who helped me shape this thesis. I am greatly indebted to Dr. K. R. Rao for his support, guidance, and encouragement from the very beginning of my work. His courses, Digital Image Processing and Video Coding Standards, made my understanding of video processing much clearer. He also helped me obtain industry experience at FastVDO LLC, Columbia, MD, where I am gaining a great deal of real-world experience. I thank him for all his support.

Dr. Pankaj Topiwala, president of FastVDO LLC, generously allowed me to run tests on FastVDO’s Baseline decoder© and to use all the resources and licenses for ADS v1.2, without which this work would not have become a reality. I am greatly thankful to him.

I received many helpful comments and suggestions from Patrick Rault, Basavaraj Mudigoudar, Sachin Patil, and Tarun Batia, my colleagues at FastVDO. I thank Dr. Devarajan and Dr. Wang for agreeing to serve on my committee and review my thesis.

Last but not least, I thank my parents and my brother for their love and support in every walk of my life.

November 18, 2005

ABSTRACT

OPTIMIZATION OF H.264 BASELINE DECODER ON ARM9TDMI PROCESSOR

Publication No. ______

Sandya Basavanahalli Sheshadri, MS
The University of Texas at Arlington, 2005
Supervising Professor: Dr. K. R. Rao

By introducing new features and improving existing ones, the emerging H.264 video coding standard achieves significant gains in coding performance over all existing standards across a wide variety of applications.
The coding-efficiency advantages of H.264, however, come at the expense of higher computational complexity. H.264 decoders can exhibit more than double the complexity of H.263 decoders. Furthermore, previous studies have shown that fractional-pixel motion-compensation interpolation and loop filtering consume a significant share of the computational power in emerging H.264 decoders. Since these operations are part of the baseline profile of H.264, there is a need to evaluate new ways of minimizing the complexity of H.264 decoders on low-complexity devices. In particular, new wireless devices have both complexity and bit rate constraints, yet the range of these constraints differs from that of traditional systems (e.g., powerful PCs networked over the best-effort Internet). Under common operational scenarios, a low-complexity wireless handheld may face significantly tighter complexity and power constraints than bit rate limitations (e.g., over a wireless access LAN).

This thesis analyzes the bottlenecks of H.264 decoders on the ARM9TDMI processor, targeted at mobile devices, using performance-profiling tools. Optimizations are performed to achieve real-time decoding. The code is built with the RealView Compiler for ARM and ported to Symbian using Metrowerks© CodeWarrior© for Symbian V3.0 to achieve real-time H.264 decoding on a Nokia 6630 cellphone. The compiler flags were optimized for speed.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF ILLUSTRATIONS
LIST OF TABLES
LIST OF ACRONYMS

Chapter

1. INTRODUCTION
   1.1 Overview: H.264 Video Coding Standard
   1.2 Applications and Design Feature Highlights
   1.3 Layered Structure
       1.3.1 Network Abstraction Layer
       1.3.2 Video Coding Layer
       1.3.3 YCbCr color space and 4:2:0 sampling
       1.3.4 Division of the picture into macroblocks
       1.3.5 Slices and slice groups
   1.4 H.264 Codec
       1.4.1 Encoder (forward path)
       1.4.2 Encoder (reconstruction path)
       1.4.3 Decoder
   1.5 Intra-frame Prediction
   1.6 Inter-frame Prediction
       1.6.1 Inter-frame Prediction in P Slices
   1.7 Transform, Scaling, and Quantization
   1.8 Entropy Coding
   1.9 In-Loop Deblocking Filter

2. ARM9TDMI
   2.1 About the ARM9TDMI
   2.2 Programmer’s Model
       2.2.1 Hardware Fundamentals
       2.2.2 Instruction set extension spaces
       2.2.3 Pipeline implementation and