
RATE-DISTORTION OPTIMIZED BITSTREAM EXTRACTOR FOR MOTION SCALABILITY IN SCALABLE VIDEO CODING

Meng-Ping Kao and Truong Nguyen
University of California, San Diego, Department of ECE
http://videoprocessing.ucsd.edu, [email protected], [email protected]

Motion scalability is designed to improve the coding efficiency of a scalable video coding framework, especially in the medium to low range of decoding bit rates or spatial resolutions. To benefit fully from motion scalability, a rate-distortion optimized bitstream extractor, which determines the optimal motion quality layer for each decoding scenario, is required. In this paper, the determination process starts with a brute force search algorithm. Although it guarantees optimal performance within the search domain, it has high computational complexity. Two properties, i.e., the monotonically non-decreasing property and the unimodal property, are then derived to accurately describe the rate-distortion behavior of motion scalability. Based on these two properties, modified search algorithms are proposed that reduce the complexity by a factor of up to 5.

Index Terms: Bitstream extractor, motion scalability, rate distortion optimization, scalable video coding

1. INTRODUCTION

A typical scalable video coding (SVC) infrastructure, as shown in Fig. 1, is composed of three main building blocks, i.e., the encoder, the decoder, and the bitstream extractor. Compared to a conventional non-scalable video codec, the decoder in SVC is allowed to demand a variety of decoding specifications, including different combinations of spatial, temporal, and quality layers. The main task of an SVC bitstream extractor is to fulfill those requests by properly truncating the scalable bitstream.

The design criteria for a generic SVC bitstream extractor can be rather trivial.
In the SVC standard [1], for example, each network abstraction layer unit (NALU) belongs to a certain temporal, spatial, and quality layer and is tagged accordingly through the high-level syntax elements temporal_id ($T$), dependency_id ($D$), and quality_id ($Q$). In the case where a specific spatio-temporal resolution is explicitly indicated by $T = T_t$ and $D = D_t$, the extraction can easily be done by dropping all NALUs with $T > T_t$ or $D > D_t$. If, however, an additional bit rate constraint is imposed that the remaining NALUs fail to meet, some NALUs with $Q > 0$ have to be discarded as well. This is the case where different design principles come into effect, among which rate-distortion (RD) optimized extraction is the most popular [2]. The overall idea is to retain the NALUs with higher RD contribution and thereby optimize the quality under the rate constraint.

When motion scalability [3] is taken into consideration, the bitstream extractor has an additional requirement, i.e., optimal bit allocation between motion and texture [4]. In this paper, we focus on the case where the decoding spatio-temporal resolution ($T$ and $D$ in the SVC standard) is pre-specified and fixed. Under this setup, one of the motion quality (MQ) layers, combined with the corresponding texture information, will provide the best reconstructed quality. As the target bit rate varies, however, the optimal MQ layer changes accordingly. The optimal MQ layer as a function of decoding bit rate, if not provided by the encoder, will be determined by the extractor. Based on this function, the adapted bitstream is guaranteed to have the best decoding quality across all possible rates for this particular spatio-temporal resolution.

Fig. 1. Scalable video coding.

The remainder of the paper is organized as follows. In Section 2, a model-based theoretical justification of motion scalability is presented. It will be clear how and when motion scalability can benefit the coding efficiency.
In Section 3, we propose three approaches for optimal bitstream extraction, i.e., the brute force, model-assisted, and model-based methods. The properties that facilitate more efficient extractor designs are also derived there. Finally, experimental results are provided in Section 4 to verify the effectiveness of our new designs.

2. THEORETICAL JUSTIFICATION OF MOTION SCALABILITY

Motion information has traditionally been coded losslessly because of the complicated impact that corrupted motion may have on the reconstructed video quality. In this section, we briefly review some models that have been developed to describe the behavior of motion scalability. Combined with the exponential distortion-rate model from source coding theory, they allow us to derive a beneficial condition for motion scalability.

978-1-4244-2354-5/09/$25.00 ©2009 IEEE, ICASSP 2009

2.1. Linear Motion Distortion Model

The first work analyzing the distortion introduced by motion vector (MV) quantization was done by Secker [5]. Under a series of assumptions and approximations, he proposed a linear motion distortion model, shown in (1), which describes the linear relationship (with slope $\Psi$) between the squared MV error, $\|\delta\|^2$, and the corresponding mean squared motion compensation (MC) error, known simply as the motion distortion, $D_m$.

$$D_m \approx \Psi \, \|\delta\|^2 \tag{1}$$

Note that $\Psi$ is the isotropic motion sensitivity factor of the reference picture, averaged over all MV errors with magnitude $\|\delta\|$. It is a function of the energy spectral density of the corresponding reference picture, which is highly content dependent.

2.2. Additive Distortion Model

In a generic video coding framework, the MC operation is followed by texture/residual transform coding and quantization.
In the case where motion coding is non-scalable, the total distortion of the reconstructed picture, $D$, can simply be described by the texture distortion, $D_t$, which is introduced by the texture quantization operation.

However, if motion scalability is taken into consideration, the motion distortion may also contribute to the total distortion. The additive distortion model [5] states that the total distortion is the sum of the motion distortion and the texture distortion. The underlying assumption is that the motion error, $m[n] - m^*[n]$, and the texture error, $t[n] - t^*[n]$, are orthogonal to each other.

$$D_m + D_t = \frac{1}{N_1 N_2} \left( \|m[n] - m^*[n]\|^2 + \|t[n] - t^*[n]\|^2 \right) = \frac{1}{N_1 N_2} \|m[n] + t[n] - c[n]\|^2 = D, \tag{2}$$

where $c[n] = m^*[n] + t^*[n]$ is the original picture without distortion.

2.3. Beneficial Condition for Motion Scalability

Although the true distortion-rate model is data dependent and complicated, a simpler model has been derived and used for video texture coding
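
The two distortion models of Sections 2.1 and 2.2 can be checked numerically with a small sketch. The following Python code is only an illustration under stated assumptions, not the authors' implementation: the picture size, the Gaussian error signals, the value of the sensitivity factor, and all function names are hypothetical. The texture error is explicitly orthogonalized against the motion error so that the orthogonality assumption behind (2) holds exactly.

```python
import random


def mse(err):
    """Mean squared value of an error signal given as a list of floats."""
    return sum(e * e for e in err) / len(err)


def motion_distortion(psi, delta):
    """Linear motion distortion model (1): D_m ~= psi * ||delta||^2."""
    dx, dy = delta
    return psi * (dx * dx + dy * dy)


def orthogonalize(a, b):
    """Return b minus its projection onto a, so the result is orthogonal to a."""
    scale = sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)
    return [y - scale * x for x, y in zip(a, b)]


rng = random.Random(0)
n = 64 * 64  # hypothetical picture size N1 * N2

# Hypothetical error signals: m[n] - m*[n] and t[n] - t*[n].
motion_err = [rng.gauss(0.0, 2.0) for _ in range(n)]
texture_err = orthogonalize(motion_err, [rng.gauss(0.0, 1.0) for _ in range(n)])

d_m = mse(motion_err)   # motion distortion
d_t = mse(texture_err)  # texture distortion
# Total error is m[n] + t[n] - c[n], i.e. the sum of the two error signals.
d_total = mse([a + b for a, b in zip(motion_err, texture_err)])

# Hypothetical sensitivity factor and MV error, used only to exercise (1).
d_m_model = motion_distortion(psi=2.0, delta=(0.5, 0.0))
```

With orthogonal errors the cross term vanishes, so `d_total` matches `d_m + d_t` up to floating-point rounding. If the orthogonalization step is skipped, the two generally differ by twice the mean cross-correlation of the errors, which is why the additive model depends on that assumption.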

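The extraction procedure outlined in the Introduction can also be sketched in code. The example below is a minimal sketch under several assumptions and is not the extractor proposed in the paper: the `Nalu` record, its `bits` and `rd_gain` fields, and the greedy ordering by gain per bit are hypothetical choices made for illustration, and dependencies between successive quality layers are ignored. NALUs with Q = 0 are always retained, even if they alone exceed the budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nalu:
    t: int          # temporal_id
    d: int          # dependency_id
    q: int          # quality_id
    bits: int       # size of the unit in bits (hypothetical field)
    rd_gain: float  # distortion reduction if retained (hypothetical field)


def extract(nalus, t_target, d_target, bit_budget):
    """Keep NALUs for the target resolution, then fit Q > 0 units to the budget."""
    # Step 1: drop every NALU above the requested spatio-temporal resolution.
    kept = [n for n in nalus if n.t <= t_target and n.d <= d_target]
    base = [n for n in kept if n.q == 0]
    enhancement = [n for n in kept if n.q > 0]

    # Step 2: greedily retain the quality units with the highest gain per bit.
    enhancement.sort(key=lambda n: n.rd_gain / n.bits, reverse=True)
    used = sum(n.bits for n in base)
    result = list(base)
    for n in enhancement:
        if used + n.bits <= bit_budget:
            result.append(n)
            used += n.bits
    return result


nalus = [
    Nalu(0, 0, 0, 100, 0.0),
    Nalu(0, 0, 1, 50, 5.0),
    Nalu(0, 0, 1, 50, 1.0),
    Nalu(1, 0, 0, 80, 0.0),
    Nalu(0, 1, 0, 90, 0.0),
]
selected = extract(nalus, t_target=0, d_target=0, bit_budget=160)
```

In this usage example, the last two units are dropped in the first step because they exceed the target temporal or spatial layer, and only one of the two quality units fits within the 160-bit budget.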
