(((Virtual Surround Sound)))
Harrison King Hall
November 14, 2006
MIT Course 6.111 Project Presentation

(((Vision)))
Surround sound pervades most home audio devices, but it is infeasible for many applications. By virtualizing the speakers, we can minimize system footprint and cost while maintaining a viable immersive experience.

(((Steps)))
[Figure: Overview of Module Diagram. A source capable of producing a Dolby Digital bitstream feeds Dolby_Digital_Bitstream_In of the Dolby Digital Decoder. The decoder's six channel outputs (Left_Surround_Out, Left_Front_Out, Center_Channel_Out, Right_Front_Out, Right_Surround_Out, Subwoofer_Out) drive the matching inputs of the HRTF Generator, whose Left_Ear and Right_Ear outputs go to the AC97, whose Audio_Out drives headphones or speakers. Audio buses are 18 bits wide; every module receives the 27 MHz clock and reset, and Ready/Downstream_Ready signals handshake data between stages.]

(((Steps))).1 Find a Source
• Take disparate streams of PCM data
• “Freely” available encoded source
• Encode our own (daunting)

(((Steps))).2 Decode the Source
Advanced Television Systems Committee, Inc. Document A/52B
6.1.1.1 Continuous or Burst Input
The encoded AC-3 data may be input to the decoder as a continuous data stream at the nominal bit-rate, or chunks of data may be burst into the decoder at a high rate with a low duty cycle. For burst mode operation, either the data source or the decoder may be the master controlling the burst timing. The AC-3 decoder input buffer may be smaller in size if the decoder can request bursts of data on an as-needed basis. However, the external buffer memory may be larger in this case.
6.1.1.2 Byte or Word Alignment
Most applications of this standard will convey the elementary AC-3 bit stream with byte or (16-bit) word alignment. The synchronization frame is always an integral number of words in length.
[Figure 6.1: Flow diagram of the decoding process.]
[Figure 3: Dolby Digital Decoding Main Operation Ordering]

HRTF
• Now that we have the individual data streams, and there is a well-defined specification for the ideal placement of each speaker around a user in a particular orientation, we can calculate the appropriate time delays and frequency-dependent changes for each source so that the audio on that channel appears to come from any location in the sphere around the user.
• This technology works because a sound does not arrive at both ears at the same time or with the same frequency content. These differences vary from person to person, and without extensive individualized user testing no single ideal solution can be produced. The workaround is to average measurements from a wide variety of listeners to obtain good general-purpose coefficients. Such measurement sets are freely available, and we will consult several of them to get the best fit possible.
• This processing boils down to taking an FFT of each source, combining it for each ear with a transfer function measured relative to the signal a microphone would pick up at the center of the head, and converting the result back to PCM data whose magnitude and phase have been altered across frequency so that it appears to come from a virtual speaker. Smaller effects also influence our perception of where sound is coming from, namely the shape of one's chest and the shape of the ear and ear cavity.
Much of this is dealt with by pinna multipliers, where 10-20 different calculations, each representing a different measurement of the ear, are summed to form a better image of the sound as it would appear from a surround sound system.

1. Bit-stream Ordering and Control Signals
2. Pre-Transform Operations
3. IFFT Transform
4. Post-Transform Operations

(((Steps))).2 Decode the Source: Module Diagram
[Figure: Dolby Decoder module diagram. Modules: Frame_Controller, Input_Buffer, CRC1 (LFSR), SyncFrame_4X (2kX16 BRAM)_Wrapper, Side_Information_Unpacker, IFFT, and PCM_Generator; the 27 MHz clock and reset go to all modules. Channel outputs: front_l, center, front_r, rear_l, rear_r, sub.]

(((Steps))).2.1 Bit-stream Syncing and Control Signals
• Syncframe start detection
  • Syncword detection alone = ~15%
  • Syncword and CRC = ~0.0015%
• Out-of-Control Control Bits
  • Huge number
  • Store in a “pipelined” memory where each address maps to a single agreed-upon value, so we can access only the specific data that we need

(((Steps))).2.2 Pre-Transform Processing
• What are we really looking at in an AC-3 stream?
• “The actual audio information conveyed by the AC-3 bit stream consists of the quantized frequency coefficients.
The coefficients are delivered in floating point form, with each coefficient consisting of an exponent and a mantissa.”
• Steps:
  • Generate the set of exponents for each audio block, or for all 6 audio blocks (determined by A0)
  • Take the set of exponents and determine the number of bits to assign to each mantissa for decoding
  • Decouple and re-matrix the input (if necessary)
  • Scary, huge complications that will likely take a long time to write (i.e., Thanksgiving... maybe)

(((Steps))).2.3 Inverse Fast Fourier Transform
• The IFFT is well defined
• 2 possible (variable) block lengths: 256 or 512
• The 256 length requires 2 transforms to maintain accuracy
• The standard provides specific implementations, but the CoreLogic FFT module might do the same thing...

(((Steps))).2.4 Post-Transform Processing
• Since the windows each contain 256 samples of audio data, we need to overlap and add them together
• THIS GENERATES PCMs
• Buffer them out, and then shuttle them off on request

(((Steps))).3 Head-Related Transfer Function
• Use frequency-dependent phase and magnitude changes to make a virtual speaker appear at some location
[Figure: HRTF for isolated pinna (dB, -15 to 15, vs. frequency, 2-16 kHz, and elevation, 0-200 deg)]
[Figure: Contributions to the HRTF: full HRTF, head and torso, and pinna]
[Figure: A STRUCTURAL ...]
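The per-ear processing described in the HRTF step (FFT of each source, combination with each ear's transfer function, conversion back to PCM) can be sketched as below. This is a minimal software model, not the project's hardware design: it assumes each virtual speaker is described by a pair of head-related impulse responses (the time-domain form of the HRTF), one per ear, and that FFT, spectrum multiplication, and inverse FFT implement the filtering. The function names and the toy impulse responses in the usage example are hypothetical placeholders, not measured HRTF data.

```python
import cmath
from typing import Dict, List, Sequence, Tuple


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse FFT computed as conj(FFT(conj(X))) / N."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def fft_convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Linear convolution: zero-pad both inputs, multiply their spectra, transform back."""
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:  # pad to a power of two long enough to avoid circular wrap-around
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = ifft([a * b for a, b in zip(X, H)])
    return [v.real for v in y[:out_len]]


def render_binaural(
    channels: Dict[str, Sequence[float]],
    hrirs: Dict[str, Tuple[Sequence[float], Sequence[float]]],
) -> Tuple[List[float], List[float]]:
    """Filter every channel with its (left-ear, right-ear) responses and sum per ear."""
    left: List[float] = []
    right: List[float] = []
    for name, samples in channels.items():
        h_left, h_right = hrirs[name]
        for ear, h in ((left, h_left), (right, h_right)):
            filtered = fft_convolve(samples, h)
            if len(ear) < len(filtered):
                ear.extend([0.0] * (len(filtered) - len(ear)))
            for i, v in enumerate(filtered):
                ear[i] += v
    return left, right


if __name__ == "__main__":
    # Hypothetical toy responses: the far ear hears the source 2 samples later at 0.6 gain.
    hrirs = {
        "front_l": ([1.0, 0.0, 0.0], [0.0, 0.0, 0.6]),
        "front_r": ([0.0, 0.0, 0.6], [1.0, 0.0, 0.0]),
    }
    channels = {"front_l": [1.0, 0.5, 0.25], "front_r": [0.0, 1.0, 0.0]}
    left_ear, right_ear = render_binaural(channels, hrirs)
    print([round(v, 6) for v in left_ear])
    print([round(v, 6) for v in right_ear])
```

A real-time version would process fixed-size blocks and overlap-add the convolution tails between blocks rather than filtering whole lists at once; the list-based form just keeps the FFT-multiply-IFFT idea visible.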
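The syncframe start detection in step 2.1 begins with a search for the 16-bit AC-3 syncword, 0x0B77. The sketch below assumes a byte-aligned stream (as the A/52 excerpt on byte or word alignment allows) and uses a hypothetical helper name. It also shows why syncword detection alone is unreliable: any payload byte pair that happens to equal 0x0B 0x77 is reported, so each hit is only a candidate until a second check such as CRC1 confirms it.

```python
from typing import List

# AC-3 syncframes begin with the 16-bit syncword 0x0B77.
AC3_SYNCWORD = 0x0B77


def find_sync_candidates(data: bytes) -> List[int]:
    """Return every byte offset where 0x0B77 appears in a byte-aligned stream.

    Payload bytes can emulate the syncword, so each offset is only a candidate
    frame start until a further check (e.g. CRC1) confirms it.
    """
    return [
        i
        for i in range(len(data) - 1)
        if ((data[i] << 8) | data[i + 1]) == AC3_SYNCWORD
    ]


if __name__ == "__main__":
    stream = bytes([0x0B, 0x77, 0x12, 0x34, 0x0B, 0x77, 0x00])
    print(find_sync_candidates(stream))  # prints [0, 4]
```

In the hardware pipeline the same comparison would run on a 16-bit shift register as each byte arrives, with the CRC1 (LFSR) module deciding which candidates are real frame starts.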
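The overlap-and-add in step 2.4 can be modeled as a small stateful stage. This is a generic 50%-overlap sketch under stated assumptions: the inverse transform is assumed to return 2 × 256 = 512 already-windowed samples per block, and the class name is hypothetical. A/52 specifies its own window and delay-buffer details, which this version does not reproduce.

```python
from typing import List, Sequence


class OverlapAdd:
    """Generic 50%-overlap-add stage (hypothetical, not taken from the A/52 text).

    Each pushed block holds 2 * hop windowed samples from the inverse transform.
    Its first half is added to the saved second half of the previous block,
    yielding hop finished PCM samples; its second half is saved for next time.
    """

    def __init__(self, hop: int = 256) -> None:
        self.hop = hop
        self.delay: List[float] = [0.0] * hop  # second half of the previous block

    def push(self, block: Sequence[float]) -> List[float]:
        if len(block) != 2 * self.hop:
            raise ValueError(f"expected {2 * self.hop} samples, got {len(block)}")
        first, second = block[: self.hop], block[self.hop :]
        pcm = [a + b for a, b in zip(first, self.delay)]
        self.delay = list(second)
        return pcm


if __name__ == "__main__":
    ola = OverlapAdd(hop=2)
    print(ola.push([1.0, 2.0, 3.0, 4.0]))      # prints [1.0, 2.0]
    print(ola.push([10.0, 20.0, 30.0, 40.0]))  # prints [13.0, 24.0]
```

Each call returns exactly `hop` PCM samples, which matches the "buffer them out and shuttle them off on request" behavior: the returned list is what a downstream buffer would hold until the next stage signals it is ready.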