The DiTME Project
Interdisciplinary research in music technology
4.2 Automatic ornamentation transcription
The ODTW and ODCF systems markedly improve the detection of slow onsets. However, they do not overcome the problem of detecting ornamentation events, because they assume that closely spaced onset candidates belong to the same onset. This limitation is overcome by the ornamentation detector outlined in Figure 14 (Gainza et al. 2004b; Gainza and Coyle 2007). The system detects audio segments by utilising an onset detector based on comb filters, which is capable of detecting very closely spaced events. In addition, a novel method to remove spurious onsets due to offset events is introduced. The system utilises musical ornamentation theory to decide whether a sequence of audio segments corresponds to an ornamentation structure.
The different parts of the ornamentation transcription system presented here are depicted in Figure 14. First, the onset detection block is described, from which a vector of onset candidates is obtained. Next, spurious onset detections due to offset events are removed. Following this, audio segments are formed and divided into note and ornamentation candidate segments. The pitch of each audio segment is then estimated. Finally, single and multi-note ornaments are transcribed.
Consider Figure 15, where a signal excerpt containing a roll played by a flute is depicted in the top plot. The ODF of the signal generated by the ODCF is depicted in the bottom plot. It can be seen that the ODCF provides a distinctive peak at the location of each new event in the signal, which we denote as on_n.
Every onset candidate on_n is paired with the next onset candidate in time, on_(n+1), to form the audio segment Sg_n = [on_n, on_(n+1)]. Next, a table of audio segments is formed, in which the second and third columns give the beginning and ending of each audio segment. As an example, Table 3 shows the audio segments of the signal depicted in Figure 15.
Next, the audio segments are split by duration into note and ornamentation segment candidates: a segment Sg_n whose duration on_(n+1) - on_n does not exceed Te is labelled an ornamentation candidate, and a longer segment is labelled a note candidate,
where Te is the longest expected ornamentation time for an experienced player, which has been analytically set to Te = 70 ms. The type of each segment Sg_n is shown in the fourth column of the audio segments table, as can be seen in Table 3.
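The segmentation and duration test described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the onset times are hypothetical values chosen to mimic the roll of Figure 15, the function and field names are invented for the example, and treating a segment of exactly Te as an ornamentation candidate is an assumption.

```python
TE_SECONDS = 0.070  # Te = 70 ms, the longest expected ornamentation time


def build_segments(onsets, te=TE_SECONDS):
    """Pair consecutive onsets into segments and label each one by duration."""
    segments = []
    for n, (start, end) in enumerate(zip(onsets, onsets[1:]), start=1):
        # Assumption: a duration equal to Te still counts as an ornament.
        kind = "ornament" if (end - start) <= te else "note"
        segments.append({"n": n, "start": start, "end": end, "type": kind})
    return segments


# Hypothetical onset times in seconds (not taken from Table 3).
onsets = [0.00, 0.30, 0.35, 0.65, 0.70, 1.00]
segments = build_segments(onsets)
segment_types = [segment["type"] for segment in segments]
```

With these illustrative onsets the five segments alternate between note and ornamentation candidates, which is the pattern expected for a roll.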
In order to obtain the pitch of the audio segments, a method similar to that of Brown (1992) is utilised. The fundamental frequency estimate is then refined by using parabolic interpolation (Serra 1989). The pitch of each audio segment Sg_n is shown in the fifth column of Table 3 and is denoted as P(n).
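The refinement step can be illustrated with the standard three-point parabolic interpolation around a spectral peak. The sketch below is generic and does not reproduce the pitch tracker of Brown (1992): it assumes linearly spaced FFT bins, and the function names, sample rate and FFT size are hypothetical.

```python
def parabolic_offset(left, peak, right):
    """Fractional bin offset of the vertex of the parabola through three points."""
    denominator = left - 2.0 * peak + right
    if denominator == 0.0:
        return 0.0  # flat neighbourhood: keep the integer bin
    return 0.5 * (left - right) / denominator


def refine_frequency(magnitudes, peak_bin, sample_rate, fft_size):
    """Convert an interpolated peak position to Hz (assumes linear bin spacing)."""
    offset = parabolic_offset(
        magnitudes[peak_bin - 1], magnitudes[peak_bin], magnitudes[peak_bin + 1]
    )
    return (peak_bin + offset) * sample_rate / fft_size


# Synthetic spectrum: an exact parabola whose true maximum lies at bin 10.3.
magnitudes = [5.0 - (k - 10.3) ** 2 for k in range(20)]
refined_hz = refine_frequency(magnitudes, peak_bin=10, sample_rate=1000, fft_size=100)
```

Because the synthetic spectrum is itself a parabola, the interpolation recovers the peak at bin 10.3, that is 103 Hz for the assumed bin spacing of 10 Hz. On real spectra the interpolation is commonly applied to log magnitudes and is only approximate.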
4.3.1 Single-note ornaments transcription (cuts and strikes)
- The cut momentarily increases the pitch. In the example of Figure 15, the second and third segments in Table 3 are an ornamentation segment and a note segment, respectively. In addition, P(2) = C#6 is higher than P(3) = B5. Consequently, B5 has been ornamented with a cut on C#6, and both segments together form a cut segment.
- The strike separates two notes of the same pitch by momentarily lowering the pitch of the second note. A strike is also present in the example of Figure 15. From Table 3 it can be seen that the fifth segment is a B5 note, which is separated from the preceding B5 note by the strike represented by the fourth segment.
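The two rules above can be written as a short labelling pass over the segment table. This is a sketch under stated assumptions, not the published algorithm: segments are given as (type, pitch name) pairs, the helper names are hypothetical, and the pitch A5 used for the strike is an illustrative value, since the text does not state the pitch of the fourth segment.

```python
NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def note_to_midi(name):
    """Convert a name such as 'C#6' to a MIDI number (C4 = 60)."""
    pitch_class, octave = name[:-1], int(name[-1])
    return 12 * (octave + 1) + NOTE_OFFSETS[pitch_class]


def label_single_note_ornaments(segments):
    """Return {segment number: 'cut' or 'strike'} for ornament segments."""
    labels = {}
    for i in range(len(segments) - 1):
        kind, pitch = segments[i]
        next_kind, next_pitch = segments[i + 1]
        if kind != "ornament" or next_kind != "note":
            continue
        ornament, note = note_to_midi(pitch), note_to_midi(next_pitch)
        if ornament > note:
            labels[i + 1] = "cut"  # pitch momentarily raised
        elif ornament < note and i > 0:
            prev_kind, prev_pitch = segments[i - 1]
            # A strike separates two notes of the same pitch.
            if prev_kind == "note" and prev_pitch == next_pitch:
                labels[i + 1] = "strike"
    return labels


# Pitches of segments 2, 3 and 5 follow the text; A5 is a hypothetical strike pitch.
table = [("note", "B5"), ("ornament", "C#6"), ("note", "B5"),
         ("ornament", "A5"), ("note", "B5")]
ornament_labels = label_single_note_ornaments(table)
```

For this table the second segment is labelled a cut and the fourth a strike, matching the reading of Table 3 given in the text.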
4.3.2 Multi-note ornamentation transcription
Cranns and rolls are formed by combining ornamented and unornamented slurred notes of the same pitch.
The transcription of the most common types of ornamentation has not been attempted previously and is a novel contribution to the field of onset detection and music transcription. The onset times estimated by this system suit the features of Irish traditional music, as each onset is placed at the beginning of the ornamentation event.
- The roll is formed by a note followed by a cut segment and a strike segment. From Table 3 it can be seen that the combination of a B5 note, a cut segment and a strike segment forms a roll, where the three note segments have the same pitch, B5. The short roll version omits the first unornamented note.
- The crann segment structure is similar to that of the roll. The difference lies in the use of cuts alone to ornament the notes. The short crann omits the first unornamented note.
- The shake is a four-note ornament formed by rapid alternations between the principal note and a further note one whole or one half step above it (Larsen 2003). It commences with three ornaments and finishes with the principal note. An example of a shake can be seen in Figure 16 (top plot), where an excerpt of a tin whistle tune is depicted. The ODF generated by the ODCF is depicted in the bottom plot. By obtaining the pitch of those segments, a sequence of three ornaments (F#5, E5, F#5) followed by the principal note E5 is obtained, which corresponds to a shake ornament.
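The multi-note structures listed above lend themselves to simple pattern matching once cuts and strikes have been grouped. The following sketch assumes that each token is a (kind, MIDI pitch of the ornamented note) pair and that a crann consists of two cut segments. The text only states that the crann resembles the roll and uses cuts alone, so that pattern is an assumption, as are all function names.

```python
PATTERNS = {
    ("note", "cut", "strike"): "roll",
    ("cut", "strike"): "short roll",
    ("note", "cut", "cut"): "crann",      # assumed: two cuts in place of cut + strike
    ("cut", "cut"): "short crann",
}


def classify_multi_note(tokens):
    """Match a token sequence against the roll and crann patterns."""
    kinds = tuple(kind for kind, _ in tokens)
    pitches = {pitch for _, pitch in tokens}
    if len(pitches) != 1:
        return None  # all ornamented notes must share the same pitch
    return PATTERNS.get(kinds)


def is_shake(segments):
    """Three ornament segments (upper, principal, upper) then the principal note."""
    if [kind for kind, _ in segments] != ["ornament"] * 3 + ["note"]:
        return False
    upper, principal, upper_again, final = (pitch for _, pitch in segments)
    # The upper note lies one half step (1) or one whole step (2) above.
    return upper == upper_again and principal == final and upper - principal in (1, 2)


B5, E5, F_SHARP_5 = 83, 76, 78  # MIDI numbers
roll_label = classify_multi_note([("note", B5), ("cut", B5), ("strike", B5)])
shake_found = is_shake([("ornament", F_SHARP_5), ("ornament", E5),
                        ("ornament", F_SHARP_5), ("note", E5)])
```

The two examples correspond to the roll read from Table 3 and to the F#5, E5, F#5, E5 sequence of Figure 16.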
Consequently, all of the difficulties encountered by existing onset detection approaches have been dealt with by the systems described in Sections 4.1 to 4.3.
4.4 Multi-pitch estimation using comb filters
When the instruments play in unison, existing periodicity-based pitch detection methods, such as FIR comb filters, can be utilised to transcribe the notes. However, the performance of these methods degrades when harmonic accompaniment is included. In an effort to detect the accompaniment chords, a multi-pitch detection system has been implemented (multi-pitch estimation using comb filters (MPECF); see Gainza et al. 2005b), which combines the structure of the multi-pitch detection model of Tadokoro et al. (2003) with the use of a more accurate comb filter and the weighting method of Martin (1982) and Morgan et al. (1997). The system detects the harmonic chords provided by a guitar accompaniment of a tin whistle.
In order to transcribe the musical chords played by the harmonic accompaniment, a system based on the model of Tadokoro et al. (2003) is utilised, and is depicted in Figure 17. As in Tadokoro et al. (2003), the MPECF filter that produces an amplitude minimum represents the first detected note. Next, other notes in the audio signal are detected by iteratively connecting the output of the filter that has produced the minimum to the input of the parallel comb filter system (see Tadokoro et al. 2003). The same filtering process is repeated until all the notes have been extracted. After the notes have been estimated, any major or minor chord present is transcribed.
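The iterative cascade described above can be sketched with plain FIR comb filters of the form y[n] = x[n] - x[n - N], where N is the period in samples of a candidate note. This is an illustration of the general idea only: it does not include the more accurate comb filter or the weighting method used by the MPECF, and the sample rate, candidate frequencies, stopping threshold and function names are all assumptions chosen so that every period is an integer number of samples.

```python
import math


def comb_filter(signal, delay):
    """FIR comb y[n] = x[n] - x[n - delay], keeping only fully overlapped samples."""
    return [signal[n] - signal[n - delay] for n in range(delay, len(signal))]


def mean_power(signal):
    return sum(v * v for v in signal) / len(signal) if signal else 0.0


def extract_notes(signal, sample_rate, candidates_hz, max_notes=4, floor=1e-6):
    """Repeatedly pick the comb with the smallest output and re-filter that output."""
    reference = mean_power(signal)
    if reference == 0.0:
        return []
    detected, current = [], signal
    for _ in range(max_notes):
        outputs = {f: comb_filter(current, round(sample_rate / f))
                   for f in candidates_hz}
        best = min(outputs, key=lambda f: mean_power(outputs[f]))
        detected.append(best)
        current = outputs[best]  # feed the minimum-output branch back into the bank
        if mean_power(current) < floor * reference:
            break  # assumed stopping rule: residual is negligible
    return detected


SAMPLE_RATE = 12000                        # hypothetical
CANDIDATES_HZ = [200, 300, 400, 500, 600]  # hypothetical candidate notes
mixture = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
           + math.sin(2 * math.pi * 300 * n / SAMPLE_RATE)
           for n in range(6000)]
notes = sorted(extract_notes(mixture, SAMPLE_RATE, CANDIDATES_HZ))
```

In this toy mixture the 300 Hz comb gives the smallest output in the first pass and the 200 Hz comb cancels the residual in the second, so both tones are recovered. A comb tuned to a frequency f also cancels every multiple of f, so a candidate that is a subharmonic of a played note would null it as well, and this sketch makes no attempt to resolve that ambiguity.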
The system has been evaluated using three different databases, comprising synthetic monophonic and polyphonic signals, real guitar chords, and mixtures of guitar chords accompanying tin whistle tunes. The results are accurate for all three databases, and the MPECF system is capable of detecting four simultaneous notes (a three-note chord and a tin whistle note).
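Once a set of simultaneous notes has been estimated, identifying a major or minor chord among them reduces to a pitch-class lookup. The sketch below is one simple way to do this and is not taken from the MPECF publications: the function name is hypothetical and the four example notes (a G major guitar chord with a tin whistle D6) are invented for illustration.

```python
PITCH_CLASS_NAMES = ["C", "C#", "D", "D#", "E", "F",
                     "F#", "G", "G#", "A", "A#", "B"]
TRIADS = {"major": (0, 4, 7), "minor": (0, 3, 7)}  # semitones above the root


def find_triads(midi_notes):
    """List every major or minor triad whose three pitch classes are all present."""
    pitch_classes = {note % 12 for note in midi_notes}
    found = []
    for root in range(12):
        for quality, intervals in TRIADS.items():
            if all((root + step) % 12 in pitch_classes for step in intervals):
                found.append(f"{PITCH_CLASS_NAMES[root]} {quality}")
    return found


# Hypothetical detection result: G3, B3, D4 from the guitar and D6 from the whistle.
detected_notes = [55, 59, 62, 86]
chords = find_triads(detected_notes)
```

Here the whistle note doubles a chord tone, so a single triad is returned. When the melody note adds a new pitch class, more than one triad can match, and a fuller system would need further information, such as register, to choose between them.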