Columbia ELEN E4896 - Alignment and Matching

E4896 Music Signal Processing (Dan Ellis) 2013-04-15

Lecture 12: Alignment and Matching
Dan Ellis
Dept. Electrical Engineering, Columbia University
[email protected]  http://www.ee.columbia.edu/~dpwe/e4896/

ELEN E4896 MUSIC SIGNAL PROCESSING

1. Music Alignment
2. Cover Song Detection
3. Echo Nest Analyze


1. Music Alignment
• Often have versions of the same music with unmatched time axes
  - different performances
  - performance vs. score
• Various applications for aligning them
  - synchronizing different tracks (with TSM)
  - synchronized score display
  - ground truth transcriptions
(Kurth et al., 2007)


The Similarity Matrix
• Point-to-point comparison of sequences
  - e.g. Euclidean distance
    $$d_{euc}(i,j) = \sqrt{\sum_k \left| x_i(k) - y_j(k) \right|^2}$$
  - or normalized inner product (cosine distance)
    $$d_{cos}(i,j) = 1 - \frac{\sum_k x_i(k)\, y_j(k)}{\sqrt{\sum_k |x_i(k)|^2 \, \sum_k |y_j(k)|^2}}$$
(Foote 1999)
[Figure: beat-synchronous chroma (C, D, E, G, A, B vs. time / beats) for "Let It Be - Nick Cave" and "Let It Be - The Beatles", with their Euclidean distance matrix d_ij]


Dynamic Programming
• Find best path combining local + transitions
  - works for any kind of similarity matrix
• Finds path {i_k, j_k} to minimize cost
    $$C_{i_{max},j_{max}} = \sum_k d(i_k, j_k) + T(i_k - i_{k-1},\, j_k - j_{k-1})$$
  ... recursively
    $$C_{i,j} = \min_{(x,y) \in \{(1,1),(1,0),(0,1)\}} d(i,j) + T(x,y) + C_{i-x,\,j-y}$$
• Allowable transitions from {i_{k-1}, j_{k-1}} to {i_k, j_k}:
  T(1,0) = 0.1, T(0,1) = 0.1, T(1,1) = 0.0
(Bellman 1957)
[Figure: Local costs d_ij ; C* ; paths. 8 x 8 grid, both axes numbered 1-8, color scale 0-1. Values printed in the figure, row by row as extracted:
  0.1 0.4 0.6 0.8 1.1 1.6 2.3 2.9
  0.3 0.3 0.4 0.6 1.0 1.5 2.1 2.6
  0.5 0.5 0.4 0.5 0.9 1.5 2.2 2.7
  0.8 0.8 0.7 0.6 0.7 1.2 1.8 2.4
  1.2 1.2 1.0 1.0 0.7 0.9 1.2 1.5
  1.4 1.5 1.3 1.2 1.0 0.9 1.3 1.5
  2.1 2.2 2.0 1.8 1.5 1.3 1.2 1.6
  2.7 2.8 2.5 2.3 2.0 1.7 1.6 1.5
 Values along the marked path: 1.5 1.2 0.9 0.7 0.6 0.4 0.3 0.1]


Audio-to-Audio Alignment
• Dynamic programming to get time mapping + phase vocoder time scaling
[Figure: only axis ticks survive in the extracted text]


Audio-Score Alignment
• Aligning a score representation (e.g. MIDI) is a proxy for polyphonic transcription
[Figure: "Let It Be + aligned MIDI labels", freq / Hz vs. time / sec]


Peak Structure Distance
• How do we match spectra to score notes?
  - synthesize audio from MIDI & compare audio?
  - "Peak Structure distance": is energy where we expect?
    $$d_{psd} = 1 - \frac{\sum_k M[k]\,|X[k]|}{\sum_k |X[k]|}$$
  - Predicted spectrum = mask M[k]
  - "Peak Structure" = energy blw mask
(Orio & Schwartz 2001)
[Figure: MIDI "Piano roll" (note, C2-C6, vs. time / frames); Synthesized audio (freq / kHz vs. time / sec); spectrogram panels (freq / bins vs. time / frames)]


2. Cover Song Detection
• Musicians are fond of 'cover versions'
  - usually alter melody, harmony, instrumentation, rhythm, style
  - can be hard to spot even for a human!
• Can try to match via alignment
  - .. with some threshold on best alignment cost?


Smith-Waterman Local Alignment
• Cover version may have different form
  - different number, ordering of verse/chorus/bridge
  - want to find any large aligned regions
• "Local alignment" measure
  - want largest score S*
    $$S_{i,j} = \max_{x,y} \max\{\, 0,\; s(i,j) - P(x,y) + S_{i-x,\,j-y} \,\}$$
  - similarity s(i, j) must exceed penalty P(x,y) on avg. (e.g. 0.96 for diagonal, 1.2 for off-diagonal)
[Figure: "Beatles vs. Carol Woods - cosine dist" and a Smith-Waterman panel (label as extracted: "cd/2,.96.1.2"), both time / beats vs. time / beats]


Local Alignment Cover Detection
• Smith-Waterman needs predictable values
  - use binary similarity based on best transposition
(Serrà & Gómez, 2008)
[Figure panels: Euclidean; Binary; Non-cover]


Cross-correlation Covers System
• DP is good for time-warping, but expensive
  - beat-timing is tempo independent (if it works)
  - simply cross-correlate beat-chroma patches?
[Figure: Query and Candidate beat-chroma (chroma bins vs. beats), "extract" then "cross-correlate". Annotations: how big are the pieces? how do we combine individual scores? also expensive]


Global Cross-Correlation
• Cross-correlate entire beat-chroma matrices
  - ... at all possible transpositions (circular)
  - implicit combination of match quality and duration
• One good matching fragment is sufficient...?
(Ellis & Poliner, 2007)
[Figure: "Elliott Smith - Between the Bars" and "Glen Phillips - Between the Bars" (chroma bins vs. beats @281 BPM); "Cross-correlation" (skew / semitones vs. skew / beats)]


Filtered Cross-Correlation
• Raw correlation not as important as precise local match
  - looking for large contrast at ±1 beat skew
  - i.e. high-pass filter
[Figure: "Cross-correlation" (skew / semitones vs. skew / beats); "Cross-correlation @ skew = +2 semitones", raw and filtered, vs. skew / beats]


Cover Song Results
• 23 Covers found in 8700 song 'uspop2002'
  - popular 'decoys' – normalization issues
[Figure: Test vs. Query matrix, "Cover Songs - dpwe23 - 12/23 correct". Tracks on the axes:]
  Abracadabra/sugar_ray
  Addicted_To_Love/tina_turner
  All_Along_The_Watchtower/dave_matthews_band
  America/simon_and_garfunkel
  Before_You_Accuse_Me/eric_clapton
  Between_The_Bars/glen_phillips
  Blue_Collar_Man/styx
  Caroline_No/brian_wilson
  Cecilia/simon_and_garfunkel
  Claudette/roy_orbison
  Cocaine/nazareth
  Come_Together/beatles
  Day_Tripper/cheap_trick
  Enjoy_The_Silence/tori_amos
  Faith/limp_bizkit
  God_Only_Knows/brian_wilson
  Gold_Dust_Woman/sheryl_crow
  Grand_Illusion/styx
  Hush/milli_vanilli
  I_Can_t_Get_No_Satisfaction/rolling_stones
  I_Love_You/faith_hill
  Let_It_Be/nick_cave
  Take_Me_To_The_River/annie_lennox


Analyzing Cover Song Correlation
• Look inside global cross-correlation to find matching fragments...
    $$\mathrm{xcorr} = \sum_t \sum_f \left( C_1(t,f) \cdot C_2(t,f) \right)$$
  - view along time
[Figure label, truncated in the preview: "Let"]
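
The similarity matrix and the dynamic programming recursion from the slides can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function names (`cosine_distance`, `similarity_matrix`, `dp_align`) are hypothetical, the path is assumed to start at the first cell and end at the last, and only the transition costs T(1,1) = 0.0, T(1,0) = T(0,1) = 0.1 come from the slide.

```python
import math

# Transition penalties from the slide; keys are (step in i, step in j).
TRANSITIONS = {(1, 1): 0.0, (1, 0): 0.1, (0, 1): 0.1}


def cosine_distance(x, y):
    """1 minus the normalized inner product of two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1.0 - dot / norm if norm > 0 else 1.0


def similarity_matrix(X, Y, dist=cosine_distance):
    """Local cost d[i][j] between every frame of X and every frame of Y."""
    return [[dist(x, y) for y in Y] for x in X]


def dp_align(d, transitions=TRANSITIONS):
    """Lowest-cost path from cell (0, 0) to the last cell of d.

    Implements C[i][j] = min over (x, y) of d[i][j] + T(x, y) + C[i-x][j-y].
    Returns (total cost, list of (i, j) cells on the path).
    """
    n, m = len(d), len(d[0])
    inf = float("inf")
    C = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    C[0][0] = d[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            for (x, y), t in transitions.items():
                pi, pj = i - x, j - y
                if pi < 0 or pj < 0:
                    continue  # predecessor falls outside the matrix
                cost = d[i][j] + t + C[pi][pj]
                if cost < C[i][j]:
                    C[i][j] = cost
                    back[i][j] = (pi, pj)
    # Trace the best path back from the final cell to (0, 0).
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        path.append(back[path[-1][0]][path[-1][1]])
    path.reverse()
    return C[n - 1][m - 1], path
```

For audio-to-audio alignment, `X` and `Y` would be per-beat or per-frame feature vectors (e.g. chroma) of the two recordings, and the returned path is the time mapping that a time-scaling stage would then follow.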

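The Smith-Waterman recursion on the local alignment slide can be illustrated the same way. The sketch below is an assumption-laden reading of that formula: `smith_waterman` is a hypothetical name, `s` is taken to be a similarity matrix where larger means more similar, and the default penalties are the example values quoted on the slide (0.96 diagonal, 1.2 off-diagonal).

```python
def smith_waterman(s, p_diag=0.96, p_off=1.2):
    """Best local alignment score over a similarity matrix s.

    Implements S[i][j] = max(0, s(i, j) - P(x, y) + S[i-x][j-y]) over the
    steps (x, y) in {(1, 1), (1, 0), (0, 1)}.
    Returns (best score S*, (i, j) cell where the best alignment ends),
    with 0-based indices into s; the cell is None if nothing scores above 0.
    """
    penalties = {(1, 1): p_diag, (1, 0): p_off, (0, 1): p_off}
    n, m = len(s), len(s[0])
    # S carries an extra zero row and column so every predecessor exists.
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_end = 0.0, None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            value = 0.0  # the max{0, ...} lets an alignment restart anywhere
            for (x, y), p in penalties.items():
                value = max(value, s[i - 1][j - 1] - p + S[i - x][j - y])
            S[i][j] = value
            if value > best:
                best, best_end = value, (i - 1, j - 1)
    return best, best_end
```

Because the score resets to zero whenever the accumulated similarity falls below the accumulated penalty, a cover that shares only one verse or chorus with the original can still produce a large S*, which is the point of using local rather than global alignment here.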

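The global cross-correlation idea (entire beat-chroma matrices, all beat skews, all circular transpositions) can be sketched as a brute-force search. This is only an illustration under stated assumptions: the names `global_chroma_xcorr` and `best_match` are hypothetical, chroma is assumed to have 12 bins, and no normalization or high-pass filtering along the skew axis is applied, although the slides indicate both matter in practice.

```python
def global_chroma_xcorr(C1, C2, n_chroma=12):
    """Cross-correlate two beat-chroma matrices at every skew and transposition.

    C1 and C2 are lists of beats, each beat a list of n_chroma values.
    The score for (skew, r) is the sum over t and f of
    C1[t][f] * C2[t + skew][(f + r) % n_chroma], using overlapping beats only.
    Returns a dict mapping (skew in beats, transposition in semitones) to score.
    """
    scores = {}
    for skew in range(-(len(C1) - 1), len(C2)):
        for r in range(n_chroma):
            total = 0.0
            for t, frame in enumerate(C1):
                u = t + skew
                if 0 <= u < len(C2):
                    other = C2[u]
                    total += sum(
                        frame[f] * other[(f + r) % n_chroma]
                        for f in range(n_chroma)
                    )
            scores[(skew, r)] = total
    return scores


def best_match(scores):
    """Return ((skew, transposition), score) for the largest correlation."""
    return max(scores.items(), key=lambda kv: kv[1])
```

The raw peak grows with the number of overlapping beats as well as with match quality, which is the "implicit combination of match quality and duration" noted on the slide; the triple loop is also slow for full songs, so a practical version would likely use FFT-based correlation instead.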