Columbia ELEN E4896 - Source Separation

E4896 Music Signal Processing (Dan Ellis) 2013-04-29

Lecture 14: Source Separation
Dan Ellis
Dept. Electrical Engineering, Columbia University
[email protected]
http://www.ee.columbia.edu/~dpwe/e4896/

1. Sources, Mixtures, & Perception
2. Spatial Filtering
3. Time-Frequency Masking
4. Model-Based Separation

ELEN E4896 MUSIC SIGNAL PROCESSING

1. Sources, Mixtures, & Perception
• Sound is a linear process (superposition)
    - no “opacity” (unlike vision)
    - sources → “auditory scenes” (polyphony)
• Humans perceive discrete sources
    - .. a subjective construct
[Figure: spectrogram of 02_m+s-15-evil-goodvoice-fade (time / s vs. frq / Hz, level / dB) with analysis labels Voice (evil), Stab, Rumble, Strings, Choir, Voice (pleasant)]

Spatial Hearing
• People perceive sources based on cues
    - spatial (binaural): ITD, ILD
[Figure: Left and Right waveforms of shatr78m3 (time / s); head diagram with source, L, R, path length difference, and head shadow (high freq)]
(Blauert ’96)

Auditory Scene Analysis
• Spatial cues may not be enough/available
    - single channel signal
• Brain uses signal-intrinsic cues to form sources
    - onset, harmonicity
[Figure: spectrogram of the Reynolds-McAdams Oboe (time / sec vs. freq / kHz, level / dB)]
(Bregman ’90)

Auditory Scene Analysis
“Imagine two narrow channels dug up from the edge of a lake, with handkerchiefs stretched across each one.
Looking only at the motion of the handkerchiefs, you are to answer questions such as: How many boats are there on the lake and where are they?” (after Bregman ’90)
• Quite a challenge!

Audio Mixing
• Studio recording combines separate tracks into, e.g., 2 channels (stereo)
    - different levels
    - panning
    - other effects
• Stereo Intensity Panning
    - manipulating ILD only
    - constant power
    - more channels: use just nearest pair?
[Figure: L and R channel gains (0 to 1) vs. pan position (−1 to 1)]

2. Spatial Filtering
• N sources detected by M sensors
    - degrees of freedom (else need other constraints)
• Consider 2 x 2 case:
    - directional mics → mixing matrix:
      $\begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$
      $\begin{bmatrix} \hat{s}_1 \\ \hat{s}_2 \end{bmatrix} = \hat{A}^{-1} m$

Source Cancelation
• Simple 2 x 2 case example:
      $\begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} 1 & 0.5 \\ 0.8 & 1 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$
      $m_1(t) = s_1(t) + 0.5 s_2(t)$
      $m_2(t) = 0.8 s_1(t) + s_2(t)$
      $\Rightarrow m_1(t) - 0.5 m_2(t) = 0.6 s_1(t)$
    - if no delay and linearly-independent sums, can cancel one source per combination

Independent Component Analysis
• Can separate “blind” combinations by maximizing independence of outputs
    - kurtosis for independence?
      $\mathrm{kurt}(y) = E\left[\left(\frac{y - \mu}{\sigma}\right)^4\right] - 3$
[Figure: mixing diagram with labels m1, m2, s1, s2, a11, a21, a12, a22 and −δ MutInfo / δa; “Mixture Scatter” plot of mix 1 vs. mix 2 with the s1 and s2 directions marked; kurtosis curve]
(Bell & Sejnowski ’95)

Microphone Arrays
• If interference is diffuse, can simply boost energy from target direction
    - e.g. shotgun mic - delay-and-sum
    - off-axis spectral coloration
    - many variants - filter & sum, sidelobe cancelation ...
[Figure: delay-and-sum chain of delays D and adders; directional responses (dB) for λ = 4D, λ = 2D, λ = D; ∆x = c · D]
(Benesty, Chen, Huang ’08)

3. Time-Frequency Masking
• What if there is only one channel?
    - cannot have fixed cancellation
    - but could have fast time-varying filtering:
• The trick is finding the right mask...
[Figure: two spectrograms (time / s vs. freq / kHz)]
(Brown & Cooke ’94; Roweis ’01)

Time-Frequency Masking
• Works well for overlapping voices
    - time-frequency resolution?
[Figure: spectrograms (time / sec vs. freq / kHz, level / dB) for Male and Female voices in three columns: Original, Mix + Oracle Labels, Oracle-based Resynth; sound files cooke-v3n7.wav, cooke-v3msk-ideal.wav, cooke-n7msk-ideal.wav]

Pan-Based Filtering
• Can use time-frequency masking even for stereo
    - e.g. calculate “panning index” as ILD
    - mask cells matching that ILD
[Figure: spectrograms (time / s vs. freq / kHz, level / dB), ILD / dB map, and ILD mask (1024 pt win, −2.5 .. +1.0 dB)]
(Avendano 2003)

Harmonic-based Masking
• Time-frequency masking can be used to pick out harmonics
    - given pitch track, know where to expect harmonics
[Figure: two spectrograms (time / s vs. freq / kHz)]
(Denbigh & Zhao 1992)

Harmonic Filtering
• Given pitch track, could use time-varying comb filter to get harmonics
    - or: isolate each harmonic by heterodyning:
      $\hat{x}(t) = \sum_k \hat{a}_k(t) \cos(k \hat{\omega}_0(t) t)$
      $\hat{a}_k(t) = \mathrm{LPF}\{ |x(t) e^{-jk\hat{\omega}_0(t) t}| \}$
[Figure: three spectrograms (time / s vs. freq / kHz)]
(Avery Wang 1995)

Nonnegative Matrix Factorization
• Decomposition of spectrograms into templates + activation
      $X = W \cdot H$
    - fast & forgiving gradient descent algorithm
    - fits neatly with time-frequency masking
[Figure: bases from all W_t (1 2 3), rows of H, and spectrogram X with axes Time (DFT slices) vs. Frequency (DFT index); sounds: Virtanen ’03]
(Lee & Seung ’99; Abdallah & Plumbley ’04; Smaragdis & Brown ’03; Virtanen ’07; Smaragdis ’04)

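
The 2 x 2 Source Cancelation example from section 2 can be checked numerically. The following is a minimal sketch in plain Python: the mixing matrix values are the ones on the slide, while the toy source signals and the helper names (`mix`, `invert_2x2`, `unmix`) are hypothetical and chosen only for illustration.

```python
# Mixing matrix from the slide: m1 = s1 + 0.5*s2, m2 = 0.8*s1 + s2.
A = [[1.0, 0.5],
     [0.8, 1.0]]


def mix(A, s1, s2):
    """Instantaneous (no-delay) 2 x 2 mix of two equal-length signals."""
    m1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
    m2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]
    return m1, m2


def invert_2x2(A):
    """Inverse of a 2 x 2 matrix; fails if the two sums are not independent."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        raise ValueError("mixing matrix is singular: sums are linearly dependent")
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def unmix(A_inv, m1, m2):
    """Apply the inverse mixing matrix sample by sample."""
    s1 = [A_inv[0][0] * a + A_inv[0][1] * b for a, b in zip(m1, m2)]
    s2 = [A_inv[1][0] * a + A_inv[1][1] * b for a, b in zip(m1, m2)]
    return s1, s2


# Hypothetical toy sources (not lecture data).
s1 = [1.0, -2.0, 0.5, 3.0]
s2 = [0.0, 1.0, -1.0, 2.0]
m1, m2 = mix(A, s1, s2)

# Single-combination cancelation from the slide: m1 - 0.5*m2 = 0.6*s1.
cancel_s2 = [a - 0.5 * b for a, b in zip(m1, m2)]

# Full recovery of both sources with the inverse matrix.
s1_hat, s2_hat = unmix(invert_2x2(A), m1, m2)
```

The sketch assumes the mixing matrix is known exactly; in the blind case it has to be estimated, which is what the ICA slide addresses.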
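
The “kurtosis for independence?” question on the Independent Component Analysis slide can be explored with a toy experiment. This is only a sketch under strong assumptions that are not in the lecture: two independent uniform-noise sources, a mixing matrix that is a pure rotation by a hypothetical angle `phi`, and a brute-force scan over unmixing angles. It is not the Infomax algorithm of Bell & Sejnowski; it only illustrates that a mixture looks more Gaussian (excess kurtosis closer to zero) than its sources.

```python
import math
import random


def excess_kurtosis(y):
    """kurt(y) = E[((y - mu) / sigma)^4] - 3, which is zero for a Gaussian."""
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / n
    fourth = sum((v - mu) ** 4 for v in y) / n
    return fourth / (var * var) - 3.0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000
# Hypothetical independent sources: uniform noise (excess kurtosis near -1.2).
s1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
s2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Assumed mixing: rotation by phi (an illustrative value, not from the slides).
phi = 0.3
m1 = [math.cos(phi) * a - math.sin(phi) * b for a, b in zip(s1, s2)]
m2 = [math.sin(phi) * a + math.cos(phi) * b for a, b in zip(s1, s2)]


def project(theta):
    """One candidate output: y = cos(theta)*m1 + sin(theta)*m2."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * a + s * b for a, b in zip(m1, m2)]


# Scan unmixing angles in [0, pi/2) and keep the least Gaussian output.
angles = [k * (math.pi / 2) / 64 for k in range(64)]
scores = [abs(excess_kurtosis(project(t))) for t in angles]
best_theta = angles[scores.index(max(scores))]
```

Under these assumptions the scan should peak near `phi`, where the projection lines up with a single source.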
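
The oracle masking idea from the Time-Frequency Masking slides can be sketched on a tiny grid of time-frequency cells. The sketch below assumes the STFT has already been computed; the 3-bin by 4-frame complex values are hypothetical, and the function names `oracle_binary_mask` and `apply_mask` are illustrative rather than taken from any library.

```python
def oracle_binary_mask(S_target, S_interf):
    """1.0 in cells where the target has more energy than the interference."""
    return [[1.0 if abs(a) ** 2 > abs(b) ** 2 else 0.0
             for a, b in zip(row_t, row_i)]
            for row_t, row_i in zip(S_target, S_interf)]


def apply_mask(mask, S_mix):
    """Scale each time-frequency cell of the mixture by its mask value."""
    return [[g * x for g, x in zip(row_m, row_x)]
            for row_m, row_x in zip(mask, S_mix)]


def error_energy(S_est, S_ref):
    """Total squared magnitude of the difference between two grids."""
    return sum(abs(e - r) ** 2
               for row_e, row_r in zip(S_est, S_ref)
               for e, r in zip(row_e, row_r))


# Hypothetical STFT cells (rows = frequency bins, columns = frames).
S_a = [[4 + 0j, 3 + 1j, 0.1 + 0j, 0j],
       [0.2 + 0j, 0j, 0.1j, 0.3 + 0j],
       [0j, 0.1 + 0j, 2 + 2j, 3 + 0j]]
S_b = [[0.1j, 0j, 0.2 + 0j, 0.1 + 0j],
       [5 + 0j, 4 - 1j, 3 + 0j, 0.1 + 0j],
       [0.3 + 0j, 0j, 0.1 + 0j, 0.2j]]

# Sound is linear, so the mixture is the cell-wise sum of the two sources.
S_mix = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(S_a, S_b)]

mask_a = oracle_binary_mask(S_a, S_b)
S_a_hat = apply_mask(mask_a, S_mix)
```

The mask is an oracle because it is computed from the separate sources, which are unavailable in practice; as the slide notes, the trick is finding the right mask from the mixture alone.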
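
The factorization X = W · H on the Nonnegative Matrix Factorization slide can be sketched with the widely used multiplicative updates for squared error. Whether this is the exact variant the slide has in mind is an assumption; the toy spectrogram `X`, the rank, the iteration count, and all function names below are hypothetical. The last helper shows one way the factors could feed time-frequency masking, by giving each component its share of the reconstruction in every cell.

```python
import random


def matmul(P, Q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def transpose(P):
    return [list(col) for col in zip(*P)]


def frob_error(X, W, H):
    """Squared Frobenius norm of X - W.H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative updates; W and H stay nonnegative by construction."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(len(X))]
    H = [[rng.random() + 0.1 for _ in range(len(X[0]))] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


def component_mask(W, H, k, eps=1e-9):
    """Soft mask: share of component k in each cell of the reconstruction."""
    WH = matmul(W, H)
    return [[W[f][k] * H[k][t] / (WH[f][t] + eps) for t in range(len(H[0]))]
            for f in range(len(W))]


# Hypothetical magnitude spectrogram built from two spectral templates.
X = [[1.0, 0.0, 2.0, 0.0, 1.0, 1.0],
     [2.0, 0.0, 4.0, 0.0, 2.0, 2.0],
     [0.0, 1.0, 0.0, 2.0, 1.0, 0.0],
     [0.0, 3.0, 0.0, 6.0, 3.0, 0.0]]

W, H = nmf(X, rank=2)
masks = [component_mask(W, H, k) for k in range(2)]
```

Multiplying a mask cell by cell with the mixture spectrogram gives a per-component estimate, which is one reading of “fits neatly with time-frequency masking”.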
4. Model-Based Separation
• When N (sources) > M (sensors), need additional constraints to solve problem
    - e.g. assumption of single dominant pitch
• Can assemble into a model M of source $s_i$
    - defines set of “possible” waveforms
    - .. probabilistically: $\Pr(s_i \mid M)$
• Source separation from mixture as inference:
      $\hat{s} = \{\hat{s}_i\} = \arg\max_{s} \Pr(x \mid s, A)\, P(A) \prod_i \Pr(s_i \mid M)$
  where
      $\Pr(x \mid s, A) = \mathcal{N}(x \mid As, \Sigma)$
[Figure: spectrograms (Time vs. Frequency, 0 to 4000) titled Stereo − instantaneous mix, Separated source 1, Separated source 2, Separated source 3]
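
The inference objective on the Model-Based Separation slide can be unpacked one step further. The following worked step is a sketch under two assumptions that the slide does not state: the mixing matrix A is treated as known and fixed, so P(A) is a constant, and the covariance of the Gaussian observation model is written as \Sigma. Taking the negative logarithm of the objective and dropping terms that do not depend on s gives:

```latex
\begin{align}
\hat{s} &= \arg\max_{s} \; \Pr(x \mid s, A) \prod_i \Pr(s_i \mid M) \\
        &= \arg\min_{s} \left[ \tfrac{1}{2} (x - As)^{\top} \Sigma^{-1} (x - As)
           - \sum_i \log \Pr(s_i \mid M) \right]
\end{align}
```

The first term asks the estimated sources to explain the observed mixture, and the second penalizes source waveforms that the model M considers unlikely. That second term supplies the additional constraints needed when there are more sources than sensors.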

