Efficient Large Size Transforms for High Performance Video Coding Rajan Joshi Yuriy A Reznik and Marta Karczewicz Qualcomm Inc 5775 Morehouse Drive San Diego CA USA 92121 ABSTRACT This paper describes design of transforms for extended block sizes for video coding The proposed transforms are orthogonal integer transforms based on a simple recursive factorization structure and allow very compact and efficient implementations We discuss techniques used for finding integer and scale factors in these transforms and describe our final design We evaluate efficiency of our proposed transforms in VCEG s H 265 JMKTA framework and show that they achieve nearly identical performance compared to much more complex transforms in the current test model Keywords discrete cosine transform DCT factorization multiplicative complexity scaled transform video coding H 264 H 265 HEVC 1 INTRODUCTION 1 3 Discrete Cosine Transform of type II DCT II is a fundamental operation performed by the majority of today s image and video compression algorithms It was first suggested by N Ahmed T Natarajan and K R Rao 1 and subsequent research provided a number of theoretical arguments for its use such as energy compaction property asymptotic equivalence of DCT to the Karhunen Lo ve transform for signals produced by Markov 1 process with high correlation coefficient etc see e g 3 Chapter 3 for a survey of the related results DCT II of size 8 has served as the transform of choice in H 261 JPEG MPEG 1 MPEG 2 H 263 and MPEG 4 visual standards 2 4 9 More recent standards such as MPEG 4 AVC H 264 10 VC 1 11 and AVS12 have adopted integer approximations of DCT II with transform sizes 4 8 and 16 An emerging JPEG XR image compression standard13 uses overlapping transforms which are also based on 4 point DCT II kernels A new emerging standard High Efficiency Video Coding HEVC currently under development by Joint Collaborative Team of video experts from MPEG and ITU T SG16 JCT VC 14 includes a number of integer transforms of sizes ranging from 4 to 64 As video resolutions keep increasing it is possible that even larger transforms will be considered in the future In this paper we describe design of scaled integer transforms which are numerically stable fully recursive in structure and remain orthogonal with perfect scaling in the absence of quantization As such they are well suitable for use in future video coding applications Described transforms have been proposed to ITU T SG16 Q6 VCEG standardization committee 21 and were also included in Qualcomm s response to JCT VC call for proposals22 This paper is organized as follows In Section 2 we describe design of underlying factorization that we use in the transform Section 3 discusses conversion to integer arithmetic and other implementation aspects Section 4 provides experimental results obtained using this transform in ITU T SG16 Q6 JMKTA video coding model Conclusions are drawn in Section 5 2 FACTORIZATION Let xn n 0 N 1 be a sequence of input samples i e line of pixel values DCT II and its inverse transform over this sequence are defined as follows X kII N 1 2 2n 1 k k xn cos N 2N n 0 Corresponding author Email yreznik ieee org Phone 858 658 1866 k 0 N 1 xn 2 N N 1 X k 0 II k 2n 1 k 2N k cos n 0 N 1 where k 1 2 if k 0 and 1 otherwise DCT IV transform and its inverse are defined as follows 2 N X kIV xk 2 N N 1 x n n 0 N 1 X n 0 IV k cos 2n 1 2k 1 k 0 N 1 4N cos 2n 1 2k 1 n 0 N 1 4 N We note that in video coding we usually work with NxN matrices of input data and so the above transforms need to be applied to all rows and columns in a separable fashion to produce corresponding matrices of transform coefficients2 Hereafter we will adopt such separable model in our design of integer 2D transforms and will focus mainly on speeding up computations of the component 1D transforms For further convenience we omit normalization factors 2 N and k and define matrices 2n 1 k C NII n k cos 2N n k 0 N 1 2n 1 2k 1 C NIV n k cos n k 0 N 1 4N representing coefficients of DCT II and DCT IV transforms correspondingly It is well known that even sized DCT II matrix can be factored into a product containing direct sum of smaller DCT II and DCT IV matrices as follows2 3 C II IN 2 J N 2 0 1 C NII PN N 2 IV CN 2 J N 2 J N 2 I N 2 0 where PN is a permutation matrix producing reordering xi x2i xN 2 i x2 i 1 i 0 1 N 2 1 2 and where IN 2 and JN 2 denote N 2 N 2 identity and order reversal matrices respectively Chen Smith Fralick15 Wang and many other well known DCT II factorizations2 3 rely on factorization 1 as a basic step in their decimation process We next apply the following decomposition to the DCT IV block in 1 CNIV 1 I N 2 1 PNT I N 2 1 0 I N 2 1 I N 2 1 0 IN 2 0 1 0 CNII 2 EN 2 J N 2 0 0 IN 2 CNII 2 0 0 R EN 2 N 3 where PN is a reordering matrix 2 EN 2 is the diagonal sign alteration matrix EN 2 diag 1 k 0 1 N 2 1 k 4 While faster non separable 2D designs can possibly be created 3 their large expanded structure and the need for support of multiple including hybrid e g 8x16 block sizes makes this approach much less appealing in practice RN is the matrix of Givens rotations cos 4 N RN sin 4N and where 4 N cos 4N sin cos 3 4N sin O 3 4N N cos N 1 sin sin 4N N 1 cos 4N N 1 4N N 1 4N N O 3 sin 4N cos 3 4N 5 CNII 2 denotes matrices of the remaining half sized DCT II transforms This decomposition of DCT IV is very similar to the one derived by G Plonka and M Tache17 In our formulation 3 all normalization factors of 1 2 are outside the transform Finally by nesting 1 and 3 we can now close the recursion For example an N 16 DCT II will be decomposed into 8 point DCT II and DCT IV Then 8 point DCT IV will be split into two 4 point DCT II transforms The 8 point DCTII will be split into 4 point DCT II and DCT IV This is continued until we reach 2 point blocks i e simple butterflies for both DCT II and DCT IV We show full flow graph of this factorization in Figure 1 This factorization is numerically stable only planar rotations are used throughout it is fully …
View Full Document