E4896 Music Signal Processing (Dan Ellis) 2013-04-08

Lecture 11: Chroma and Chords
Dan Ellis
Dept. Electrical Engineering, Columbia University
[email protected]
http://www.ee.columbia.edu/~dpwe/e4896/

1. Features for Music Audio
2. Chroma Features
3. Chord Recognition

ELEN E4896 MUSIC SIGNAL PROCESSING

1. Features for Music Audio
• Challenges of large music databases
    how to find “what we want”...
• Euclidean metaphor
    music tracks as points in space
• What are the dimensions?
    “sound” - timbre, instruments → MFCC
    melody, chords → Chroma
    rhythm, tempo → Rhythmic bases

MFCCs
• The standard feature for speech recognition
(Logan 2000)
[Figure: MFCC calculation. Processing blocks: FFT X[k], Mel scale freq. warp, log |X[k]|, IFFT, Truncate. Stage labels: Sound, spectra, audspec, cepstra, MFCCs. Axes: time / s, freq / Hz, freq / Mel, quefrency.]

MFCC Example
• Resynthesize by imposing spectrum on noise
    MFCCs capture instruments, not notes
[Figure panels: Let It Be - log-freq specgram (LIB-1); MFCCs; Noise excited MFCC resynthesis (LIB-2). Axes: freq / Hz, coefficient, time / sec.]

MFCC Artist Classification
• 20 Artists x 6 albums each
    train models on 5 albums, classify tracks from last
• Model as MFCC mean + covariance per artist
    “single Gaussian” model
    20 (mean) + 10 x 19 (covariance) parameters
    55% correct (guessing ~5%)
(Ellis 2007)
[Figure: Confusion: MFCCs (acc 55.13%). True artists: aerosmith, beatles, creedence_c_r, cure, dave_matthews_b, depeche_mode, fleetwood_mac, garth_brooks, green_day, led_zeppelin, madonna, metallica, prince, queen, radiohead, roxette, steely_dan, suzanne_vega, tori_amos, u2.]

2.
Chroma Features
• What about modeling tonal content (notes)?
    melody spotting
    chord recognition
    cover songs...
• MFCCs exclude tonal content
• Polyphonic transcription is too hard
    e.g. sinusoidal tracking: confused by harmonics
• Chroma features as solution...
[Figure: recognized vs. true notes. Axes: MIDI note number, time / s.]

Chroma Features
• Idea: Project all energy onto 12 semitones regardless of octave
    maintains main “musical” distinction
    invariant to musical equivalence
    no need to worry about harmonics?
$$C(b) = \sum_{k=0}^{N_M} B\big(12 \log_2(k/k_0) - b\big)\, W(k)\, |X[k]|$$
    W(k) is weighting, B(·) selects arguments that are ≈ 0 mod 12
(Fujishima 1999)
[Figure: spectrogram (freq / kHz vs. time / sec, fft bin) and chroma (chroma bin A to G vs. time / frame).]

Better Chroma
• Problems:
    blurring of bins close to edges
    limitation of FFT bin resolution
• Solutions:
    peak picking - only keep energy at center of peaks
    Instantaneous Frequency - high-resolution estimates
    adapt tuning center based on histogram of pitches
[Figure: spectrogram (freq / kHz vs. time / sec, freq / Hz) and chroma (chroma bin A to G vs. time / frame).]

Chroma Resynthesis
• Chroma describes the notes in an octave... but not the octave
• Can resynthesize by presenting all octaves...
with a smooth envelope
    “Shepard tones” - octave is ambiguous
    endless sequence illusion
$$y_b(t) = \sum_{o=1}^{M} W\!\left(o + \frac{b}{12}\right) \cos\!\left(2^{\,o + b/12}\, w_0 t\right)$$
(Ellis & Poliner 2007)
[Figure panels: 12 Shepard tone spectra (level / dB vs. freq / Hz); Shepard tone resynth (freq / kHz vs. time / sec).]

Chroma Example
• Simple Shepard tone resynthesis
    can also reimpose broad spectrum from MFCCs
[Figure panels: Let It Be - log-freq specgram (LIB-1); Chroma features; Shepard tone resynthesis of chroma (LIB-3); MFCC-filtered shepard tones (LIB-4). Axes: freq / Hz, chroma bin, time / sec.]

Beat-Synchronous Chroma
• Drastically reduce data size by recording one chroma frame per beat
(Bartsch & Wakefield 2001)
[Figure panels: Let It Be - log-freq specgram (LIB-1); Onset envelope + beat times; Beat-synchronous chroma; Beat-synchronous chroma + Shepard resynthesis (LIB-6). Axes: freq / Hz, chroma bin, time / sec.]

3.
Chord Recognition
• Beat-synchronous chroma look like chords
    can we transcribe them?
• Two approaches
    manual templates (prior knowledge)
    learned models (from training data)
[Figure: beat-synchronous chroma (chroma bin vs. time / sec) annotated with note sets C-E-G, B-D-G, A-C-E, A-C-D-F, ...]

Chord Recognition System
• Analogous to speech recognition
    Gaussian models of features for each chord
    Hidden Markov Models for chord transitions
(Sheh & Ellis 2003)
[Figure: system block diagram. Blocks: Audio, 100-1600 Hz BPF, 25-400 Hz BPF, Chroma, Beat track, Resample, Root normalize, Gaussian, Unnormalize, Count transitions, HMM Viterbi, Labels. Data: beat-synchronous chroma features, chord labels, 24x24 transition matrix, 24 Gauss models. Paths: train, test.]

HMMs
• Hidden Markov Models are good for inferring hidden states
    underlying Markov “generative model”
    each state has emission distribution
    observations tell us something about state...
    infer smoothed state sequence
[Figure: state sequence, emission distributions p(x|q) for q = A, q = B, q = C, and observation sequence x_n vs. time step n.]
    Transition probabilities p(q_{n+1} | q_n), rows q_n, columns q_{n+1}:
            S    A    B    C    E
        S   0    1    0    0    0
        A   0   .8   .1   .1    0
        B   0   .1   .8   .1    0
        C   0   .1   .1   .7   .1
        E   0    0    0    0    1
    Example state sequence:
        S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E

HMM Inference
• HMM defines emission distribution p(x|q) and transition probabilities p(q_n | q_{n-1})
• Likelihood of observed given state sequence:
$$p(\{x_n\} \mid \{q_n\}) = \prod_n p(x_n \mid q_n)\, p(q_n \mid q_{n-1})$$
    Worked example, all state paths q0 ... q4:
        Path         Transitions                   Emissions                  Product
        S A A A E    .9 x .7 x .7 x .1 = 0.0441    2.5 x 0.2 x 0.1 = 0.05     0.0022
        S A A B E    .9 x .7 x .2 x .2 = 0.0252    2.5 x 0.2 x 2.3 = 1.15     0.0290
        S A B B E    .9 x .2 x .8 x .2 = 0.0288    2.5 x 2.2 x 2.3 = 12.65    0.3643
        S B B B E    .1 x .8 x .8 x .2 = 0.0128    0.1 x 2.2 x 2.3 = 0.506    0.0065
                     Σ = 0.1109                                               Σ = p(X | M) = 0.4020
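
The worked numbers on the HMM Inference slide can be checked with a short script. This is a minimal sketch, not code from the lecture: the transition and emission values are read back from the factors in the slide's products (for example, A→E = .1 and B→E = .2 are taken from the last factor of each path), and the names `trans`, `emit`, `path_likelihood`, `total_by_enumeration` and `total_by_forward` are hypothetical. It sums the per-path products over every state path, then computes the same total with the standard forward recursion, which avoids enumerating paths.

```python
from itertools import product

# ASSUMPTION: parameters inferred from the factors shown in the slide's
# worked example; S is the start state, E the end state.
trans = {
    ("S", "A"): 0.9, ("S", "B"): 0.1,
    ("A", "A"): 0.7, ("A", "B"): 0.2, ("A", "E"): 0.1,
    ("B", "B"): 0.8, ("B", "E"): 0.2,
}
# Emission likelihoods p(x_n | q) for the three observations.
# These are density values, so they may exceed 1.
emit = [
    {"A": 2.5, "B": 0.1},
    {"A": 0.2, "B": 2.2},
    {"A": 0.1, "B": 2.3},
]
states = ["A", "B"]


def path_likelihood(path):
    """Product of transition and emission terms along S -> path -> E."""
    full = ["S", *path, "E"]
    p = 1.0
    for prev, cur in zip(full[:-1], full[1:]):
        p *= trans.get((prev, cur), 0.0)  # missing transitions count as 0
    for obs, q in zip(emit, path):
        p *= obs[q]
    return p


def total_by_enumeration():
    """p(X | M) as a sum over every possible state path."""
    return sum(path_likelihood(p) for p in product(states, repeat=len(emit)))


def total_by_forward():
    """p(X | M) via the forward recursion (no path enumeration)."""
    alpha = {q: trans.get(("S", q), 0.0) * emit[0][q] for q in states}
    for obs in emit[1:]:
        alpha = {
            q: obs[q] * sum(alpha[r] * trans.get((r, q), 0.0) for r in states)
            for q in states
        }
    return sum(alpha[q] * trans.get((q, "E"), 0.0) for q in states)


if __name__ == "__main__":
    for path in product(states, repeat=len(emit)):
        print("S", *path, "E", round(path_likelihood(path), 4))
    print("enumeration:", round(total_by_enumeration(), 4))
    print("forward:    ", round(total_by_forward(), 4))
```

Both totals agree with the slide's Σ = p(X | M) to four decimal places. Enumeration grows exponentially with the number of frames, while the forward recursion is linear in it, which is why recursions of this kind are used in practice.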