ESTIMATING THE TONALITY OF POLYPHONIC AUDIO FILES: COGNITIVE VERSUS MACHINE LEARNING MODELLING STRATEGIES

Emilia Gómez, Perfecto Herrera
Music Technology Group, Institut Universitari de l’Audiovisual, Universitat Pompeu Fabra
{emilia.gomez,perfecto.herrera}@iua.upf.es
http://www.iua.upf.es/mtg

ABSTRACT

In this paper we evaluate two methods for key estimation from polyphonic audio recordings. Our goal is to compare a strategy using a cognition-inspired model with several machine learning techniques, in order to find a model for determining the tonality (mode and key note) of polyphonic music from audio files. Both approaches take as input a vector of values related to the intensity of each of the pitch classes of a chromatic scale. Both methods are explained and evaluated on a large database of audio recordings of classical pieces.

1. INTRODUCTION

Tonality and the tonal aspects of musical pieces are very relevant for their appreciation. There have been attempts to relate those aspects to mood induction in listeners, and some kind of relatedness (or similarity) between different excerpts sharing a tonality has been reported. Listeners are sensitive to key changes, which are also related to rhythm, structure, style and mood. Key changes can be used, for instance, as cues about the structure of a song, or as features to query for matching pieces in a database. Key and mode can also be used to navigate across digital music collections by computing similarities between the files or selected excerpts from them.

In western music, the term key (or tonality) is usually defined as the relationship between a set of pitches having a tonic as its main tone, after which the key is named. A key is then defined by both its tonic (also called key note, for example: A) and its mode (ex: minor). The tonic is one of the 12 semitones of the chromatic scale within an octave range (ex: A, A#/Bb, B, C, C#/Db, D, D#/Eb, E, F, F#/Gb, G, G#/Ab).
The mode is usually minor or major, depending on the scale used. The major and minor modes together give rise to a total of 24 different tonalities.

Here we compare two approaches for computing the tonality from audio files containing polyphonic music. The first one is based on a tonality model that has been established after perceptual studies, and uses some musical knowledge to estimate the global key note and mode attached to a certain audio segment. The second one is based on machine learning algorithms trained on a database of labelled pieces. After a description of both approaches, we evaluate them, present the results and discuss some of our findings.

2. SYSTEM BLOCK DIAGRAM

The overall system block diagram is presented in Figure 1. In order to estimate the key from polyphonic recordings, we first extract a set of low-level features from the audio signal. These features are then compared to a model of tonality in order to estimate the key of the piece.

Figure 1. System block diagram.

In this study we have assumed that the key is constant over the considered audio segment. That is, any modulations that occur do not affect the overall tonality of the piece, so a single tonality can be estimated for the segment.

3. FEATURE EXTRACTION

The input of the key estimation block in Figure 1 is a vector of low-level features extracted from the audio signal. The features used in this study are the Harmonic Pitch Class Profile (HPCP), based on the Pitch Class Profile descriptor proposed by Fujishima in the context of a chord recognition system [1]. HPCP is a vector of low-level signal features measuring the intensity of each of the 12 pitch classes of the tempered scale within an analysis frame. The feature extraction procedure is summarized as follows; we refer to [2] for a detailed explanation.

1. The instantaneous HPCP vector is computed for each analysis frame using the magnitude of the spectral peaks located within a certain frequency band, considered to contain the most significant frequencies carrying harmonic information. We introduce a weight into the HPCP computation to take into account differences in tuning, and the resolution is changed to less than one semitone. The HPCP vector is normalized for each analysis frame in order to discard energy information.
2. The global HPCP is computed by averaging the instantaneous HPCP within the considered segment.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2004 Universitat Pompeu Fabra.

4. TONALITY COMPUTATION USING A COGNITION-INSPIRED MODEL

This algorithm is based on a key estimation algorithm proposed by Krumhansl et al. and summarized in [3, pp. 77-111]: the probe tone method. It measures the expectation of each of the 12 tones of a chromatic scale after a certain tonal context. This measure quantifies the hierarchy of notes in a given tonal context. The output of the model is a rating for each of the 12 semitones of a chromatic scale (starting from the tonic), shown in Figure 2. The data were produced by experienced musicians following tonal contexts that consisted of tonic triads and chord cadences. This profile is used to estimate the key of a MIDI melodic line by correlating it with a vector containing the relative duration of each of the 12 pitch classes within the MIDI sequence [3].

Figure 2. Probe tone ratings from the study by Krumhansl and Kessler (1982), shown with reference to a major key (top) and a minor key (bottom).

Our approach relies on extending this model to deal with audio recordings in a polyphonic situation.
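As an illustration of the two ideas described so far (averaging normalized per-frame pitch-class vectors, and correlating a pitch-class vector with the probe tone profiles), the following is a minimal Python sketch, not the authors' implementation. It assumes 12-bin vectors with index 0 = C, per-frame normalization by the maximum value, and Pearson correlation against the 24 rotations of the commonly cited Krumhansl-Kessler ratings (this text only shows them in Figure 2). The function names `global_hpcp`, `pearson` and `estimate_key` are hypothetical.

```python
import math

# Commonly cited Krumhansl-Kessler probe tone ratings, index 0 = tonic.
# (Assumption: this text only shows the ratings graphically in Figure 2.)
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

# Assumed convention: pitch-class index 0 corresponds to C.
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def global_hpcp(frames):
    """Average 12-bin frame vectors after normalizing each by its maximum.

    Max-normalization is an assumption; the text only says that frames are
    normalized to discard energy information.
    """
    total = [0.0] * 12
    count = 0
    for frame in frames:
        peak = max(frame)
        if peak <= 0:
            continue  # skip silent frames, which carry no pitch-class information
        for pc in range(12):
            total[pc] += frame[pc] / peak
        count += 1
    return [v / count for v in total] if count else total


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat vector has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)


def estimate_key(pitch_class_vector):
    """Return (tonic name, mode, correlation) of the best-matching key."""
    best = None
    for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
        for tonic in range(12):
            # Rotate the profile so its tonic entry lines up with `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            r = pearson(pitch_class_vector, rotated)
            if best is None or r > best[2]:
                best = (NAMES[tonic], mode, r)
    return best
```

For the MIDI case, `pitch_class_vector` would hold the relative duration of each pitch class; passing the output of `global_hpcp` instead is one simple way to try the same correlation on audio features.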
We consider the profile value for a given pitch class to also represent the hierarchy of a chord in a given key. Given this assumption, we consider all the chords containing a given pitch class when measuring the relevance of this pitch class within a certain key. For instance, the dominant pitch class (i=8) appears in both the tonic and dominant chords, so the profile value for i=8 adds the contributions of the tonic and dominant chords of the key. We consider only the three main triads of the major/minor key (tonic, subdominant and dominant) as the most representative chords.

We also adapt the method to work with audio features (HPCP, related to energy) instead of MIDI. The spectrum of a note is composed of several harmonics,
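
The chord-based weighting described in this section can be read in more than one way, and the exact formula is not given in this excerpt. The Python sketch below shows one possible reading for a major key: each of the three main triads is given the profile value of its root, and every pitch class accumulates the values of the triads that contain it. The ratings are the commonly cited Krumhansl-Kessler major values, and the names `MAJOR_KEY_TRIADS` and `chord_weighted_profile` are hypothetical. Indices are 0-based here, so the dominant pitch class (i=8 in the text) is index 7.

```python
# Commonly cited Krumhansl-Kessler major ratings, index 0 = tonic (assumption:
# this text only shows them graphically in Figure 2).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

# Main triads of a major key as semitone offsets from the tonic:
# (root, third, fifth).
MAJOR_KEY_TRIADS = {
    "tonic": (0, 4, 7),
    "subdominant": (5, 9, 0),
    "dominant": (7, 11, 2),
}


def chord_weighted_profile(profile, triads):
    """Accumulate, for each pitch class, the weights of the triads containing it.

    Assumption: a chord's weight is the profile value of its root.
    """
    weighted = [0.0] * 12
    for root, third, fifth in triads.values():
        chord_weight = profile[root]
        for pc in (root, third, fifth):
            weighted[pc] += chord_weight
    return weighted


major_chord_profile = chord_weighted_profile(KK_MAJOR, MAJOR_KEY_TRIADS)
# Index 7 (the dominant) receives the tonic and dominant chord weights:
# 6.35 + 5.19.
```

Under this reading, pitch classes outside the three triads receive a weight of zero, which may or may not match the authors' profile.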

