E4896 Music Signal Processing (Dan Ellis) 2013-04-29

ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 14: Source Separation
Dan Ellis
Dept. Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/

1. Sources, Mixtures, & Perception
2. Spatial Filtering
3. Time-Frequency Masking
4. Model-Based Separation

1. Sources, Mixtures, & Perception
• Sound is a linear process (superposition)
    no “opacity” (unlike vision)
    sources → “auditory scenes” (polyphony)
• Humans perceive discrete sources
    .. a subjective construct
[Figure: spectrogram of 02_m+s-15-evil-goodvoice-fade (time / s vs. freq / Hz, level / dB) with analysis labels: Voice (evil), Stab, Rumble, Strings, Choir, Voice (pleasant)]

Spatial Hearing
• People perceive sources based on cues
    spatial (binaural): ITD, ILD
[Figure: Left and Right waveforms of shatr78m3 (time / s); diagram of a source and the L and R ears showing path length difference and head shadow (high freq)]
(Blauert ’96)

Auditory Scene Analysis
• Spatial cues may not be enough/available
    single channel signal
• Brain uses signal-intrinsic cues to form sources
    onset, harmonicity
[Figure: spectrogram of the Reynolds-McAdams Oboe (time / sec vs. freq / kHz, level / dB)]
(Bregman ’90)

Auditory Scene Analysis
“Imagine two narrow channels dug up from the edge of a lake, with handkerchiefs stretched across each one.
Looking only at the motion of the handkerchiefs, you are to answer questions such as: How many boats are there on the lake and where are they?” (after Bregman ’90)
• Quite a challenge!

Audio Mixing
• Studio recording combines separate tracks into, e.g., 2 channels (stereo)
    different levels
    panning
    other effects
• Stereo Intensity Panning
    manipulating ILD only
    constant power
    more channels: use just nearest pair?
[Figure: L and R channel gains (0 to 1) vs. pan position (−1 to 1)]

2. Spatial Filtering
• N sources detected by M sensors
    degrees of freedom: need M ≥ N (else need other constraints)
• Consider 2 x 2 case:
    directional mics → mixing matrix:
    $$\begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix} \hat{s}_1 \\ \hat{s}_2 \end{bmatrix} = \hat{A}^{-1} m$$
[Figure: sources s1, s2 reaching mics m1, m2 through gains a11, a12, a21, a22]

Source Cancelation
• Simple 2 x 2 case example:
    $$\begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} 1 & 0.5 \\ 0.8 & 1 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$$
    $$m_1(t) = s_1(t) + 0.5\, s_2(t)$$
    $$m_2(t) = 0.8\, s_1(t) + s_2(t)$$
    $$\Rightarrow\quad m_1(t) - 0.5\, m_2(t) = 0.6\, s_1(t)$$
    if no delay and linearly-independent sums, can cancel one source per combination

Independent Component Analysis
• Can separate “blind” combinations by maximizing independence of outputs
    kurtosis for independence?
    $$\mathrm{kurt}(y) = E\left[\left(\frac{y - \mu}{\sigma}\right)^4\right] - 3$$
[Figure: mixing network s1, s2 → m1, m2 via a11, a12, a21, a22, with weights adapted by −δ MutInfo / δa; panels: Mixture Scatter (mix 1 vs. mix 2, with s1 and s2 directions marked) and Kurtosis vs. [axis symbol lost in extraction]]
(Bell & Sejnowski ’95)

Microphone Arrays
• If interference is diffuse, can simply boost energy from target direction
    e.g. shotgun mic - delay-and-sum
    off-axis spectral coloration
    many variants - filter & sum, sidelobe cancelation ...
[Figure: delay-and-sum array with delays D and spacing ∆x = c . D; directional response (dB) for λ = 4D, λ = 2D, λ = D]
(Benesty, Chen, Huang ’08)
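Returning to the 2 x 2 Source Cancelation example above, the arithmetic can be checked numerically. The following is a minimal sketch, assuming an instantaneous (no-delay) mix with a known mixing matrix; the two sinusoidal test sources and the sample rate are hypothetical and not from the lecture.

```python
import numpy as np

# Mixing matrix from the Source Cancelation slide: m = A s
A = np.array([[1.0, 0.5],
              [0.8, 1.0]])

# Hypothetical test sources (not from the lecture): two sinusoids, 1 s at 8 kHz
sr = 8000
t = np.arange(sr) / sr
s = np.vstack([np.sin(2 * np.pi * 220 * t),
               np.sin(2 * np.pi * 330 * t)])

# Instantaneous mixtures, shape (2, n_samples)
m = A @ s

# One fixed combination cancels s2: m1 - 0.5 m2 = 0.6 s1
s1_scaled = m[0] - 0.5 * m[1]

# With A known and invertible, both sources are recovered at once
s_hat = np.linalg.solve(A, m)

cancel_error = np.max(np.abs(s1_scaled - 0.6 * s[0]))
unmix_error = np.max(np.abs(s_hat - s))
print(cancel_error, unmix_error)
```

Both errors should be at the level of floating-point rounding. In the blind case A is unknown, which is where a criterion such as kurtosis comes in to choose the unmixing weights.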
3. Time-Frequency Masking
• What if there is only one channel?
    cannot have fixed cancellation
    but could have fast time-varying filtering:
[Figure: two spectrograms (time / s vs. freq / kHz)]
• The trick is finding the right mask...
(Brown & Cooke ’94, Roweis ’01)

Time-Frequency Masking
• Works well for overlapping voices
    time-frequency resolution?
[Figure: spectrograms (time / sec vs. freq / kHz, level / dB) for Male and Female voices; rows: Original, Mix + Oracle Labels, Oracle-based Resynth; audio: cooke-v3n7.wav, cooke-v3msk-ideal.wav, cooke-n7msk-ideal.wav]

Pan-Based Filtering
• Can use time-frequency masking even for stereo
    e.g. calculate “panning index” as ILD
    mask cells matching that ILD
[Figure: spectrogram, ILD / dB, and ILD-masked spectrogram (time / s vs. freq / kHz, level / dB); ILD mask, 1024 pt win, −2.5 .. +1.0 dB]
(Avendano 2003)

Harmonic-based Masking
• Time-frequency masking can be used to pick out harmonics
    given pitch track, know where to expect harmonics
[Figure: two spectrograms (time / s vs. freq / kHz)]
(Denbigh & Zhao 1992)

Harmonic Filtering
• Given pitch track, could use time-varying comb filter to get harmonics
    or: isolate each harmonic by heterodyning:
    $$\hat{x}(t) = \sum_k \hat{a}_k(t) \cos\big(k\, \hat{\omega}_0(t)\, t\big)$$
    $$\hat{a}_k(t) = \mathrm{LPF}\left\{ \left| x(t)\, e^{-j k \hat{\omega}_0(t) t} \right| \right\}$$
[Figure: three spectrograms (time / s vs. freq / kHz)]
(Avery Wang 1995)

Nonnegative Matrix Factorization
• Decomposition of spectrograms into templates + activation
    X = W · H
    fast & forgiving gradient descent algorithm
    fits neatly with time-frequency masking
[Figure: spectrogram X (Time (DFT slices) vs. Frequency (DFT index)), with panels “Bases from all W_t” and “Rows of H”; Virtanen ’03 sounds]
(Lee & Seung ’99, Abdallah & Plumbley ’04, Smaragdis & Brown ’03, Virtanen ’07, Smaragdis ’04)
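The X = W · H decomposition above, and its link to time-frequency masking, can be sketched in a few lines. This is a minimal illustration, assuming the Euclidean-cost multiplicative updates in the style of Lee & Seung (the choice of cost is an assumption, not something the slide specifies). The toy "spectrogram", the sizes, and the function name `nmf` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "spectrogram": 3 spectral templates with random activations
n_freq, n_time, n_comp = 40, 100, 3
X = rng.random((n_freq, n_comp)) @ rng.random((n_comp, n_time))

def nmf(X, n_comp, n_iter=200, eps=1e-9, seed=1):
    """Euclidean-cost NMF by multiplicative updates; returns W, H, error history."""
    init = np.random.default_rng(seed)
    W = init.random((X.shape[0], n_comp)) + eps
    H = init.random((n_comp, X.shape[1])) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative steps keep W and H nonnegative by construction
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

W, H, errors = nmf(X, n_comp)

# Soft time-frequency masks: each component's share of the modeled energy
model = W @ H + 1e-9
masks = [np.outer(W[:, k], H[k]) / model for k in range(n_comp)]

# Applying each mask to the mixture gives one separated spectrogram per component
separated = [mask * X for mask in masks]
print(errors[0], errors[-1])
```

Because the masks sum to (almost exactly) one in every cell, the separated spectrograms add back up to the mixture. In practice X would be a magnitude STFT, and each masked spectrogram would be inverted using the mixture phase.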
4. Model-Based Separation
• When N (sources) > M (sensors), need additional constraints to solve problem
    e.g. assumption of single dominant pitch
• Can assemble into a model M of source s_i
    defines set of “possible” waveforms
    .. probabilistically: $\Pr(s_i \mid M)$
• Source separation from mixture as inference:
    $$\hat{s} = \{\hat{s}_i\} = \arg\max_{s} \Pr(x \mid s, A)\, P(A) \prod_i \Pr(s_i \mid M)$$
    where $\Pr(x \mid s, A) = \mathcal{N}(x \mid A s, \Sigma)$
[Figure: spectrograms (Time vs. Frequency): Stereo - instantaneous mix; Separated source 1; Separated source 2; Separated source 3]
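The inference formulation above has a closed-form answer in one very simplified setting, which may help make the role of the source model concrete. The sketch below assumes a known mixing matrix (so P(A) is constant), noise covariance Σ = noise_var · I, and a zero-mean Gaussian prior for each source standing in for Pr(s_i | M). These are illustrative assumptions, not the models used in the lecture, and all numerical values and function names are hypothetical.

```python
import numpy as np

# Underdetermined toy case: N = 2 sources, M = 1 sensor (all values hypothetical)
A = np.array([[1.0, 1.0]])          # known mixing matrix, so P(A) is constant
noise_var = 0.01                    # Sigma = noise_var * I
prior_var = np.array([4.0, 0.25])   # assumed zero-mean Gaussian prior per source

def log_posterior(s, x):
    """Unnormalized log of Pr(x|s,A) * prod_i Pr(s_i|M) for this Gaussian toy model."""
    resid = x - A @ s
    return -0.5 * (resid @ resid) / noise_var - 0.5 * np.sum(s**2 / prior_var)

def map_estimate(x):
    """Closed-form arg max of log_posterior (a regularized least-squares solve)."""
    precision = A.T @ A / noise_var + np.diag(1.0 / prior_var)
    return np.linalg.solve(precision, A.T @ x / noise_var)

x = np.array([1.7])                 # one observed mixture sample
s_hat = map_estimate(x)
print(s_hat, A @ s_hat)
```

With a single sensor the likelihood alone cannot split x between the two sources; the prior does that, here giving the higher-variance source the larger share. Richer source models (for example a pitch constraint) play the same role, but generally require iterative inference instead of a single linear solve.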