LOW COMPLEXITY ENCODER USING MACHINE LEARNING

by

THEJASWINI PURUSHOTHAM

Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT ARLINGTON
December 2010

Copyright © by Thejaswini Purushotham 2010
All Rights Reserved

ACKNOWLEDGEMENTS

Multitudes of pixels come together to form a lovely portrait. Similarly, the fruition of a thesis happens because of the encouragement and guidance of numerous people. Thus, I would like to take this opportunity to thank everyone who invested their precious time in me in the last two years.

In the fall of 2008, I walked into the room of Dr. K. R. Rao with the hopes of learning from the master of video coding. Though we were total strangers, he immediately put me at ease by creating a very positive working atmosphere, which deserves my sincere appreciation. His mentoring has undoubtedly had a profound impact on me. I am greatly indebted to him.

I am deeply grateful to Dr. Dongil Han for always being available in the lab and providing me with continued financial support and technical advice.

I would like to thank the other members of my advisory committee, Dr. W. Alan Davis and Dr. Jonathan Bredow, for reviewing this thesis document and offering insightful comments.

My sincere thanks to Pragnesh and Suchethan. The love, affection and encouragement of Bhumika, Bhavana, Gunpreet, Thara, Srikanth and all my friends kept me going through the trying times of my Masters.

Finally, my sincere gratitude and love go out to my mom, Ms. N. Pushpalatha, and my dad, Mr. M. Purushotham. They have been my role models and have shaped me to be the positive and independent person that I am today. My brother Arvind has been very loving and supportive throughout, and this thesis is dedicated to my family.

September 8, 2010

ABSTRACT

LOW COMPLEXITY H.264 ENCODER USING MACHINE LEARNING

Thejaswini Purushotham, M.S.

The University of Texas
at Arlington, 2010

Supervising Professor: K. R. Rao

H.264 is currently one of the most widely accepted video coding standards in the industry. Several software and hardware solutions for the H.264 video encoder exist in the market at present. H.264 is used in such applications as Blu-ray Disc, videos on the internet, digital video broadcast, direct broadcast satellite television service, cable television services, and real-time videoconferencing.

This thesis uses the WEKA (Waikato Environment for Knowledge Analysis) tool to generate the classification rule. WEKA is detailed in Chapter 3. The input attributes to WEKA are calculated from the video sequence to be encoded; the procedure is elaborated in Chapter 4.

For real-time applications like videoconferencing, it is essential that the encoding time taken by the video codec be as low as possible. In the H.264 video codec, the macroblock mode decision in inter frames is computationally the most expensive process, since it uses such features as variable block size motion estimation and quarter-pixel motion compensation. Hence, the goal of this thesis is to reduce the encoding time while conserving the quality and compression ratio.

Machine learning has been used to make the mode decisions and hence reduce the motion estimation time. The proposed machine learning method decreases the encoding time for mode decisions in the H.264 encoder by 42.86405% on average, at the cost of only a 0.01070% decrease in the structural similarity index metric (SSIM).

Motion estimation is the most time-consuming part of the encoder: on average, 60-70% of the total encoding time is spent on motion estimation. The time-consuming sum of absolute differences (SAD) method adopted in the H.264 encoder in the JM 16.2 software has been replaced with a classification rule. Assuming full search (FS), P block types, Q reference frames, and a search range of M x N, M x N x P x Q computations are needed. The classification rule has been implemented as a series of if-else
statements. The time taken to execute the if-else statements is less than the time taken to execute the SAD computation. Hence, this thesis describes a reduction in the H.264 encoder execution time.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF ILLUSTRATIONS
LIST OF TABLES

Chapter

1 INTRODUCTION
  1.1 Significance
  1.2 Summary

2 H.264 VIDEO CODEC
  2.1 Introduction
  2.2 Profiles and Levels
    2.2.1 Baseline Profile
    2.2.2 Main Profile
    2.2.3 High Profile
    2.2.4 Extended Profile
    2.2.5 High Profiles defined in FRExts amendments
    2.2.6 Overview of Scalable Video Codec
      2.2.6.1 Spatial Scalability
      2.2.6.2 Inter-layer intra prediction
      2.2.6.3 Inter-layer macroblock mode and motion prediction
      2.2.6.4 Inter-layer residual prediction
      2.2.6.5 SNR Scalability
      2.2.6.6 Fine Grain Scalability
      2.2.6.7 Medium Grain Scalability
      2.2.6.8 Temporal Scalability
    2.2.7 Levels in H.264
  2.3 H.264 Encoder
    2.3.1 Inter Prediction
    2.3.2 Intra Prediction
    2.3.3 Transform Coding
    2.3.4 Deblocking Filter
    2.3.5 Entropy Coding
    2.3.6 B slices and adaptive weighted prediction
    2.3.7 H.264 Decoder
  2.4 Summary

3 MACHINE LEARNING
  3.1 Machine Learning Methods
    3.1.1 Rote Learning [51]
    3.1.2 Inductive Learning [52]
    3.1.3 Analogy Learning [51]
    3.1.4 Explained Learning
    3.1.5 Learning Based on Neural Networks
    3.1.6 Knowledge Discovery [51]
  3.2 Applications of Machine Learning [51]
  3.3 Weka
  3.4 C4.5 Algorithm
    3.4.1 Algorithm
    3.4.2 Flow and Feature Definitions [57]
    3.4.3 Feature Selection Algorithms
    3.4.4 The C4.5 Tree Construction Algorithm [53]
  3.5 Summary

4 PROPOSED ENCODER
  4.1 Introduction
  4.2 Approach
  4.3 Experimental Results
  4.4 Observations

5 CONCLUSIONS AND FUTURE WORK
  5.1 Conclusions
  5.2 Future Work

APPENDIX
  A. STRUCTURAL SIMILARITY INDEX METRIC (SSIM)
  B. VIDEO SEQUENCES CONSIDERED IN THE THESIS

REFERENCES
BIOGRAPHICAL INFORMATION

LIST OF ILLUSTRATIONS
Figure

1.1 H.264/AVC products to video related markets [51]
2.1 Different profiles in H.264 with distribution of various coding tools among the profiles [8]
2.2 Tools introduced in FRExts and their classification under the new high profiles [11]
2.3 Scalable video coding [50]
2.4 The basic styles of scaling in video coding [50]
2.5 Coding structure example with two spatial layers [47]
2.6 H.264 encoder block diagram [1]
2.7 4x4 luma intra prediction modes in H.264 [1]
2.8 …