Multiplexing the elementary streams of H.264 video and MPEG4 HE AAC v2 audio, de-multiplexing and achieving lip synchronization

Naveen Siddaraju and K. R. Rao, IEEE Fellow
Electrical Engineering Department, University of Texas at Arlington, Arlington, TX
Email: naveen.siddaraju@mavs.uta.edu, rao@uta.edu

Abstract: Television broadcasting applications such as ATSC-M/H and DVB [16] require the encoded audio and video streams to be transmitted across a network in a single transport stream containing fixed-size data packets that can be easily recognized and decoded at the receiver. MPEG2 part 1 specifies two layers of packetization to achieve a transport stream suitable for digital transmission. In a broadcasting system, multiplexing is a process in which two or more elementary streams are converted into a single transport stream, ensuring synchronous playback of the elementary streams and proper buffer behavior at the decoder. This paper presents a scheme to multiplex the elementary streams of H.264 video and HE AAC v2 audio using the MPEG2 systems specifications [4], then de-multiplex the transport stream and play back the decoded elementary streams with lip synchronization (audio-video synchronization).

This paper briefly introduces the two layers of packetization in MPEG2 systems, namely the packetized elementary stream (PES) and the transport stream (TS), and the concept of timestamps. It then explains the proposed multiplexing and de-multiplexing algorithms and the approach followed to achieve synchronization, followed by the results of the implementation.

Index terms: H.264, HEAACv2, multiplexing, MPEG2 systems, PES, TS

I. INTRODUCTION

Mobile broadcast systems are becoming increasingly popular as cellular phones and highly efficient digital video compression techniques merge to enable digital TV and multimedia reception on the move. Mobile television broadcast systems like DVB-H (digital video broadcast, handheld) [16] and ATSC-M/H (advanced television systems committee, mobile/handheld) [17] [18] [21] have relatively small bandwidth allocation, and the processing power at the target device (decoder) also varies. Hence the choice of the compression standards used plays an important role. H.264 [1] [5] [6] and HEAACv2 [2] [7] [13] are the codecs used in the proposed method for the video and audio respectively.

H.264 [5] is the latest and the most advanced video codec available today. It was jointly developed by the VCEG (video coding experts group) of ITU-T (international telecommunication union) and the MPEG (moving pictures experts group) of ISO/IEC (International Organization for Standardization / International Electrotechnical Commission). This standard achieves much greater compression than its predecessors like MPEG-2 video [37] and MPEG4 part 2 visual [38], but the higher coding efficiency comes at the cost of increased complexity. H.264 has been adopted as the video standard for many applications around the world, including ATSC [21].

HEAACv2, or high efficiency advanced audio codec version 2, also known as enhanced aacPlus, is a low bit rate audio codec defined in the MPEG4 audio profile [2] and belonging to the AAC family. It is specifically designed for low bit rate applications such as streaming and mobile broadcasting. HE AAC v2 has been proven to be the most efficient audio compression tool available today. It comes with a fully featured toolset which enables coding in mono, stereo and multichannel modes (up to 48 channels). HEAACv2 [7] is the adopted standard for ATSC-M/H and many other systems around the world.

The encoded bit streams, or elementary streams, of H.264 and HEAACv2 are arranged as a sequence of access units. An access unit is a coded representation of a frame. Since each frame is coded differently, the size of each access unit also varies. In order to transmit multimedia content (audio and video) across a channel, the two streams have to be converted into a single stream of fixed-size packets. For this, the elementary streams have to undergo two layers of packetization (Fig. 1). The first layer of packetization yields the packetized elementary stream (PES). The second layer, where the actual multiplexing takes place, results in a stream of fixed-size packets called the transport stream (TS). These TS packets are what is actually transmitted across the network using broadcast techniques such as those used in ATSC and DVB [16].

Fig. 1 MPEG2 two layers of packetization [22]

A. PACKETIZED ELEMENTARY STREAMS (PES)

PES packets are obtained after the first layer of packetization of the coded audio and video data. This packetization is carried out by sequentially separating the audio and video elementary streams into access units. Hence each PES packet is an encapsulation of one frame of coded data. Each PES packet contains a packet header and the payload data from only one particular stream. The PES header contains information which distinguishes between audio and video PES packets. Since the number of bits used to represent a frame in the bit stream varies (for both audio and video), the size of the PES packets also varies. Figure 2 shows how the elementary stream is converted into a PES stream.

Fig. 2 Conversion of an elementary stream into PES packets [29]

The PES header format used is shown in Table 1. The PES header starts with a 3-byte packet start code prefix, which is always 0x000001, followed by a 1-byte stream id. The stream id is used to uniquely identify a particular stream. The stream id together with the start code prefix is known as the start code (4 bytes). The PES packet length may vary, up to 65535 bytes (the largest value the 2-byte length field can hold). In the case of a longer elementary stream, the packet length may be set as unbounded, i.e. 0, but only for a video stream. The next two bytes in the header are the time stamp field, which contains the playback time information. In the proposed method, the frame number is used to calculate the playback time, which is explained next.

Name                       Size (bytes)   Description
Packet start code prefix   3              0x000001
Stream id                  1              Unique ID to distinguish between audio and video PES
                                          packets. Examples: audio streams (0xC0-0xDF), video
                                          streams (0xE0-0xEF) [3]. Note: the above 4 bytes
                                          together are known as the start code.
PES packet length          2              The PES packet can be of any length. A value of zero
                                          for the PES packet length can be used only when the
                                          PES packet payload is a video elementary stream.
Time stamp                 2              Frame number

Table 1 PES header format [4]

B. TIME STAMP

[...] per second (fps). So, assuming that the decoder has prior knowledge of the fps of the video sequence, the presentation time (PT), or playback time, of a particular video frame can be calculated using (1).

Video
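As a minimal sketch of the header layout in Table 1, the Python below packs and parses the 8-byte PES header described above (start code prefix, stream id, packet length, frame-number time stamp). Several details are assumptions that the text does not state: big-endian byte order, a length field that counts only the payload bytes, the choice of 0xC0 and 0xE0 as the audio and video ids, rejecting frame numbers that do not fit in 16 bits, and the function names build_pes_packet and parse_pes_header, which are hypothetical.

```python
import struct

START_CODE_PREFIX = b"\x00\x00\x01"  # 3-byte packet start code prefix (Table 1)
AUDIO_STREAM_ID = 0xC0               # assumed: first id of the audio range 0xC0-0xDF
VIDEO_STREAM_ID = 0xE0               # assumed: first id of the video range 0xE0-0xEF
HEADER_SIZE = 8                      # 3 (prefix) + 1 (id) + 2 (length) + 2 (time stamp)


def build_pes_packet(stream_id, frame_number, payload):
    """Wrap one access unit (one coded frame) in the simplified PES header."""
    if not 0xC0 <= stream_id <= 0xEF:
        raise ValueError("stream id outside the audio/video ranges")
    if not 0 <= frame_number <= 0xFFFF:
        # Assumption: the sketch rejects frame numbers that overflow 2 bytes.
        raise ValueError("frame number does not fit in the 2-byte time stamp")
    length = len(payload)  # assumption: the length field counts payload bytes only
    if length > 0xFFFF:
        if stream_id < 0xE0:
            raise ValueError("only video packets may use the unbounded length 0")
        length = 0  # unbounded length, allowed for video only
    # ">BHH": big-endian, 1-byte id, 2-byte length, 2-byte time stamp (assumed order)
    header = START_CODE_PREFIX + struct.pack(">BHH", stream_id, length, frame_number)
    return header + payload


def parse_pes_header(packet):
    """Return (kind, stream_id, length, frame_number) read from a PES packet."""
    if len(packet) < HEADER_SIZE or packet[:3] != START_CODE_PREFIX:
        raise ValueError("not a PES packet")
    stream_id, length, frame_number = struct.unpack(">BHH", packet[3:HEADER_SIZE])
    if 0xC0 <= stream_id <= 0xDF:
        kind = "audio"
    elif 0xE0 <= stream_id <= 0xEF:
        kind = "video"
    else:
        kind = "other"
    return kind, stream_id, length, frame_number
```

A de-multiplexer built along these lines could read the stream id to route each packet to the audio or video buffer and keep the frame number for the playback-time calculation.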