UT EE 382C - MPEG-4 Structured Audio Systems


MPEG-4 Structured Audio Systems
Mihir Anandpara
The University of Texas at Austin
[email protected]

Abstract

The MPEG-4 standard has been proposed to provide high-quality audio and video content over the Internet. This content is represented in the form of audiovisual objects. However, different parts of the audiovisual scene are encoded separately depending on the nature of the data to be encoded. The standard calls for aggressive coding techniques that ensure reception of high-quality audio and video at low bit rates. One such encoding scheme for audio, generally applied to synthetic audio and sound effects, is known as Structured Audio. This paper presents a survey of the structured representation of sound and the ways it can be decoded and synthesized at the receiving terminal.

All the different components of the audio scene are decoded separately (Structured Audio is one of these components). The decoded streams are then composed into one coherent sound signal and presented to the user. This audio composition in MPEG-4 is accomplished by the AudioBIFS layer in the MPEG-4 decoder. The nodes in this layer compose the different sounds together and also apply abstract-effects and virtual-reality post-processing to the entire audio scene. Some of this post-processing can itself be represented in the structured audio format used for synthetic sound encoding. This project focuses on the workings of the AudioBIFS layer, specifically the nodes in this layer that use a structured representation to encode audio post-processing effects, such as reverberation.

I. INTRODUCTION

Streaming audio and video content on the Internet has become increasingly popular, and several standards have been proposed for dealing with streaming audio and video. MPEG-4 is the first standard that addresses presentation content as a set of audiovisual objects. The main functionalities in MPEG-4 are content-based coding, universal accessibility, and good coding efficiency [2].
Using the new MPEG standard will result in audio, speech, and video representations that can be transmitted at very low bit rates yet rendered with high fidelity [11].

Traditional audio coding techniques can be divided into two categories. Lossless encoders remove entropic redundancy from the sound signal. This redundancy exists because successive samples of the data are correlated, and some redundancy may be eliminated using this principle. Also, more frequently occurring samples can be encoded with shorter codes than less frequently occurring samples, as in Huffman coding. Lossy encoders (MP3, RealAudio), on the other hand, remove perceptual redundancy from a sound signal. These encoding schemes discard those details of the sound signal that cannot be perceived by the human ear, using psycho-acoustic principles to determine which parts of the signal are redundant and can be eliminated.

The MPEG-4 standard has been developed for state-of-the-art representation, transmission, and decoding of multimedia objects at a low bit rate. MPEG-4 audio can be used for every application requiring advanced sound compression, synthesis, manipulation, and playback [3]. The traditional coding techniques discussed above are not sufficient to represent audio signals containing a large amount of musical content or sound effects while still maintaining bandwidth efficiency. However, sound signals, especially music signals, exhibit another level of redundancy known as structural redundancy. In a soundtrack, many notes or sound events sound the same or very similar, and many soundtracks contain repeating patterns, such as drumbeats. If all parts of the soundtrack can be represented symbolically, a great deal of redundant information can be eliminated [5]. This characteristic of soundtracks motivates the use of a symbolic, structured technique for representing sound signals that can be transmitted at a much lower bandwidth.
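The entropy-coding idea behind the lossless stage can be made concrete with a small sketch (not part of the original paper). The helper name `huffman_code_lengths` and the toy sample values are illustrative assumptions; the point is only that more frequent sample values end up with shorter codes:

```python
import heapq
from collections import Counter

def huffman_code_lengths(samples):
    """Build a Huffman code over the sample values and return each
    value's code length in bits. Frequent values get shorter codes."""
    freq = Counter(samples)
    if len(freq) == 1:  # degenerate case: a single value still needs 1 bit
        return {next(iter(freq)): 1}
    # Heap entries: (subtree frequency, tiebreak, {value: depth so far}).
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# A toy quantized signal where the value 0 dominates.
lengths = huffman_code_lengths([0] * 8 + [1] * 4 + [2] * 2 + [3] * 2)
# The most frequent value (0) receives the shortest code.
```

With the frequencies 8, 4, 2, 2 above, the resulting code lengths are 1, 2, 3, and 3 bits, so the dominant value costs the least to transmit.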
This symbolic representation of sound is referred to as structured audio in MPEG-4.

Structured audio transmits sound in two parts: a description of a model and a set of parameters that make use of the model [1]. Every different signal or sound in a soundtrack is mapped to a particular model, and each model has several parameters associated with it. The parameters can be interpreted as high-level features of the sound signal; they can be varied to control perceptual aspects such as amplitude and pitch. This kind of structure and flexibility lends itself well to applications involving synthesized sound, since the sound of each instrument can be associated with a specific model and decoded accordingly at the receiver.

This paper is organized as follows. Section 2 gives a detailed description of the structured audio component of the MPEG-4 standard, including encoding, parameter representation techniques, and audio post-processing. Section 3 deals with decoding and synthesis of sound from a structured audio representation. Section 4 gives an overview of the objectives and goals of this project.

II. STRUCTURED AUDIO IN MPEG-4

A. Representation of structured audio in MPEG-4: SAOL

The MPEG-4 Audio standard (ISO/IEC 14496, Part 3) [3] contains a subsection on the representation, decoding, and synthesis of structured sound. There are many ways to obtain and use a structured representation of a signal, described in greater detail in [5]. The MPEG-4 standard allows sound-synthesis algorithms to be specified as computer programs. Several computer music languages [9], [4] have been developed that use the concept of unit generators. Unit generators are simply functional blocks, such as oscillators, filters, and envelopes, that are connected together to form a network describing the signal path.

A new language, known as the Structured Audio Orchestra Language (SAOL), has been developed for representing structured audio and effects in MPEG-4 audio scenes.
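The unit-generator idea described above can be sketched in a few lines of Python (an illustration added here, not MPEG-4 or SAOL code; the names `oscil`, `env`, and `connect` are hypothetical). Each function is one functional block, and the signal path is formed by wiring their outputs together:

```python
import math

def oscil(freq_hz, n, sr=8000):
    """Sine-oscillator unit generator: produce n samples at sr Hz."""
    return [math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]

def env(n):
    """Linear-decay envelope unit generator: 1.0 down to 0.0 over n samples."""
    return [1.0 - i / (n - 1) for i in range(n)]

def connect(*streams):
    """Wire unit generators together by multiplying them sample-wise,
    as when an envelope shapes an oscillator's output."""
    return [math.prod(samples) for samples in zip(*streams)]

# Network: oscillator -> envelope -> output.
note = connect(oscil(440.0, 100), env(100))
```

A filter block would slot into the same network the same way: another function from sample streams to a sample stream.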
Any MPEG-4 structured audio scene can be divided into two parts: the orchestra and the score. SAOL defines an orchestra as a set of instruments, where each instrument corresponds to a separate model and describes some digital signal processing algorithm that produces or manipulates sound. The score contains information used to control, at run time, the parameters of the signal-processing algorithms described in the orchestra. The structured audio score is encoded in another language, known as the Structured Audio Score Language (SASL). This separation of the algorithm description from its control lends flexibility.
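The orchestra/score split can be mimicked in a short Python sketch (again an added illustration, not SAOL/SASL; `beep`, `ORCHESTRA`, `SCORE`, and `render` are hypothetical names). The "orchestra" holds the signal-processing algorithms; the "score" holds only timing and parameters, which is all that would need to be transmitted:

```python
import math

SR = 8000  # sample rate in Hz (an assumption for this sketch)

# "Orchestra": each instrument is a DSP algorithm that produces sound.
def beep(pitch_hz, amp, dur_s):
    n = int(dur_s * SR)
    return [amp * math.sin(2 * math.pi * pitch_hz * i / SR) for i in range(n)]

ORCHESTRA = {"beep": beep}

# "Score": pure control data -- (start_s, instrument, pitch_hz, amp, dur_s).
SCORE = [
    (0.00, "beep", 440.0, 0.5, 0.05),
    (0.05, "beep", 660.0, 0.5, 0.05),
]

def render(score, total_s):
    """Run each score event through its instrument and mix into one buffer."""
    out = [0.0] * int(total_s * SR)
    for start_s, name, pitch, amp, dur in score:
        signal = ORCHESTRA[name](pitch, amp, dur)
        offset = int(start_s * SR)
        for i, x in enumerate(signal):
            out[offset + i] += x
    return out

audio = render(SCORE, 0.10)
```

Changing the score (different pitches, times, or amplitudes) requires no change to the orchestra, which is the flexibility the separation buys.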

