Columbia ELEN E4896 - MUSIC STRUCTURE SEGMENTATION USING SHIFT-INVARIANT - D2814423

Course Elen E4896- MUSIC SIGNAL PROCESSING

MUSIC STRUCTURE SEGMENTATION USING SHIFT-INVARIANT PROBABILISTIC LATENT COMPONENT ANALYSIS (MIREX 2010)

Ron J. Weiss and Juan Pablo Bello
Music and Audio Research Lab, New York University
{ronw,jpbello}@nyu.edu

ABSTRACT

We describe our music structure segmentation algorithm submitted to the MIREX 2010 evaluation. Our method is based on shift-invariant probabilistic latent component analysis, a variant of convolutive non-negative matrix factorization, applied to chroma features. Repeated harmonic patterns are identified by decomposing a chromagram into a sequence of a small number of repeated basis patterns. The patterns and their locations within a song are simultaneously estimated using an iterative expectation-maximization algorithm. The parameters of the decomposition are then used to compute the long-term structure segmentation by assuming a one-to-one mapping between the identified patterns and segment labels.

1. SEGMENTATION ALGORITHM

Our segmentation system is described in detail in [3]. In the following sections we briefly review the algorithm and describe the extensions implemented for MIREX.

The Python implementation of the algorithm is freely available under the terms of the GNU General Public License. The most recent version can be found online at http://ronw.github.com/siplca-segmentation

1.1 Features

The segmentation algorithm uses beat-synchronous chroma features, computed using the algorithm described in [1], and normalized so that the maximum value in each frame is 1. Example features computed from Good Day Sunshine by The Beatles are shown in the top left pane of Figure 1.

1.2 SI-PLCA

The beat-synchronous chromagram for a given song is decomposed using shift-invariant probabilistic latent component analysis (SI-PLCA) [2] into the convolution of $K$ basis components, $W_k$, and their activations in time, $h_k$.
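To make the idea of "convolving basis components with their activations" concrete, the following is a minimal NumPy sketch of how a chromagram approximation could be assembled from such a decomposition. It is an illustration only, not the authors' implementation: the function names `shift_right` and `reconstruct`, the array layout (`W[f, k, tau]`, `H[k, t]`), and the choice to zero-fill when shifting are all assumptions made for this example.

```python
import numpy as np


def shift_right(x, tau):
    """Shift a 1-D array tau places to the right, zero-filling on the left.

    Zero-filling (rather than wrapping around) is an assumption of this sketch.
    """
    if tau == 0:
        return x.copy()
    out = np.zeros_like(x)
    out[tau:] = x[:-tau]
    return out


def reconstruct(W, z, H):
    """Per-basis contributions to the approximated chromagram.

    W : (F, K, L) array, K chroma templates of length L (hypothetical layout)
    z : (K,) array of mixing weights
    H : (K, T) array of activations over time
    Returns a (K, F, T) array; summing over axis 0 gives the full approximation.
    """
    F, K, L = W.shape
    T = H.shape[1]
    per_basis = np.zeros((K, F, T))
    for k in range(K):
        for tau in range(L):
            # Column tau of template k is placed wherever h_k, delayed by tau, is active.
            per_basis[k] += z[k] * np.outer(W[:, k, tau], shift_right(H[k], tau))
    return per_basis


# Toy usage: 12 chroma bins, 2 bases of length 4, 16 beats.
rng = np.random.default_rng(0)
W_toy = rng.random((12, 2, 4))
z_toy = np.array([0.7, 0.3])
H_toy = np.zeros((2, 16))
H_toy[0, 3] = 1.0  # basis 0 starts at beat 3
V_hat = reconstruct(W_toy, z_toy, H_toy).sum(axis=0)
```

With a single impulse in the activation of basis 0, the approximation contains one scaled copy of that template starting at the impulse position, which is the sense in which the bases act as repeated patterns.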
The decomposition for each point in the chromagram $V$ can be written as follows:

$$v_{ft} \approx \hat{v}_{ft} = \sum_k \sum_\tau z_k \, w_{fk\tau} \, \overset{\rightarrow\tau}{h}_{kt} \qquad (1)$$

where $z_k$ corresponds to the mixing weight for each component and $\overset{\rightarrow\tau}{x}$ shifts $x$ $\tau$ places to the right.

The bases $W_k$ correspond to fixed-length chroma templates that are repeated throughout the song. The corresponding activation function $h_k^T$ denotes when each component is active in time.

The number of components $K$ is fixed to a large number (15) and unneeded components are pruned away by enforcing that the mixing weights $z_k$ have a sparse distribution. This enables the algorithm to automatically identify the optimal number of bases needed to adequately explain the data.

For more details, including the full expectation-maximization algorithm for estimating $W_k$, $z_k$, and $h_k$ from $V$, see [3] and [2].

This document is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 License. http://creativecommons.org/licenses/by-nc-sa/3.0/ © 2010 The Authors.

1.3 Segmentation

Given the SI-PLCA decomposition described in the previous section, we derive the structure segmentation using the following likelihood function:

$$P(t, k) = \sum_f \sum_\tau z_k \, w_{fk\tau} \, \overset{\rightarrow\tau}{h}_{kt} \qquad (2)$$

The quantity in equation (2) corresponds to the probability that the observation at time $t$ comes from basis $k$. An example is shown in Figure 2. We assume that each basis corresponds to a unique segment label and compute the final segmentation from $P(t, k)$ by finding the optimal setting of $k$ at each time frame. We compute this path through equation (2) using the Viterbi algorithm with a simple transition matrix designed to smooth out transitions between segments. The transition matrix is constructed to have a large weight along the diagonal to discourage spurious transitions between segments. The off-diagonal components are uniform, so no preference is given for any particular state.

$$a_{ij} = \begin{cases} p & i = j \\ \frac{1}{K-1}(1 - p) & i \neq j \end{cases} \qquad (3)$$

$p$ was set to 0.9 in the MIREX submission.
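The smoothing step can be sketched as follows, assuming the per-frame scores from equation (2) are already available as a `(T, K)` array. The transition matrix follows equation (3); everything else (the function names `transition_matrix` and `viterbi_path`, working in the log domain, a uniform initial state distribution, and the small constant added before taking logs) is an assumption of this sketch and may differ from the authors' code.

```python
import numpy as np


def transition_matrix(K, p=0.9):
    """Equation (3): p on the diagonal, (1 - p) / (K - 1) elsewhere. Requires K >= 2."""
    A = np.full((K, K), (1.0 - p) / (K - 1))
    np.fill_diagonal(A, p)
    return A


def viterbi_path(P, A, eps=1e-12):
    """Most likely state sequence given frame scores P (T, K) and transitions A (K, K).

    Assumes a uniform initial state distribution and treats P[t, k] as the
    score of frame t under basis k (both are assumptions of this sketch).
    """
    T, K = P.shape
    log_p = np.log(P + eps)
    log_a = np.log(A + eps)
    delta = np.zeros((T, K))           # best log score of any path ending in state k at time t
    psi = np.zeros((T, K), dtype=int)  # best predecessor of state k at time t
    delta[0] = log_p[0] - np.log(K)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_a  # scores[i, j]: arrive in j from i
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_p[t]
    # Backtrack from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path


# Toy usage: two bases, ten frames; frame 2 slightly favours basis 1.
P_toy = np.array([[0.8, 0.2]] * 5 + [[0.2, 0.8]] * 5)
P_toy[2] = [0.4, 0.6]
labels = viterbi_path(P_toy, transition_matrix(2, p=0.9))
```

A frame-wise argmax over `P_toy` would flip to basis 1 at frame 2 and back again. The heavy diagonal in the transition matrix makes that round trip more costly than staying put, which is the smoothing effect the text describes.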
Finally, the segmentation output by the Viterbi algorithm is processed to remove any segments shorter than 32 beats. An example segmentation is shown in Figure 2.

[Figure 1 panels. Left column: "V (Iteration 199)", "Reconstruction", "Basis 0 reconstruction" through "Basis 4 reconstruction". Right column: Z, W0 through W4, H0 through H4.]

Figure 1. Example SI-PLCA decomposition. The top left pane shows the original beat-synchronous chromagram. Directly underneath is the approximation using SI-PLCA. The remaining panes in the left column contain the reconstruction using each basis alone. Finally, the parameters of the decomposition are shown in the right column.

[Figure 2: four stacked panes.]

Figure 2. Structure segmentation derived from the SI-PLCA decomposition shown in Figure 1. The top pane shows the chromagram $V$. The middle panes show $P(t, k)$ from equation (2) and the resulting segmentation, respectively. The bottom pane shows the ground-truth segmentation.

2. REFERENCES

[1] D. P. W. Ellis and G. E. Poliner. Identifying 'cover songs' with chroma features and dynamic programming beat tracking. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages IV–1429–1432, 2007.

[2] P. Smaragdis, B. Raj, and M. Shashanka. Sparse and shift-invariant feature extraction from non-negative data. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 2069–2072, 2008.

[3] R. J. Weiss and J. P. Bello. Identifying Repeated Patterns in Music Using Sparse Convolutive Non-Negative Matrix Factorization. In Proc. International Conference on Music Information Retrieval (ISMIR), Utrecht, Netherlands, August
