Before starting any deep learning project with MIDI data, make sure you know the difference between MIDI scores and MIDI performances!
This article is for people planning or beginning to work with MIDI files. This format is widely used in the music community, and it has caught the attention of computer music researchers thanks to the availability of datasets.
However, different kinds of information can be encoded in MIDI files. In particular, there is a big difference between MIDI scores and MIDI performances. Not being aware of this results in time wasted on a useless task, or in an incorrect choice of training data and approaches.
I will give a basic introduction to the two formats and provide hands-on examples of how to start working with them in Python.
What is MIDI?
MIDI was introduced as a real-time communication protocol between synthesizers. The main idea is to send a message every time a note is pressed (note on) on a MIDI keyboard, and another message when the note is released (note off). Then the synthesizer on the receiving end knows what sound to produce.
Welcome to MIDI files!
If we collect and save all these messages (making sure to add their time position), then we have a MIDI file that we can use to reproduce a piece. Apart from note-on and note-off, many other kinds of messages exist, for example specifying pedal information or other controllers.
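If you are curious about what the raw message stream looks like, here is a minimal sketch that reads it with the mido library (an assumption on my part, since this article uses Partitura later; the file path is hypothetical):

```python
import mido  # a general-purpose MIDI library, assumed to be installed

mid = mido.MidiFile("example.mid")  # hypothetical path: use any MIDI file
for msg in mid:
    # iterating a MidiFile yields messages; msg.time holds the seconds
    # elapsed since the previous message
    if msg.type in ("note_on", "note_off"):
        # by MIDI convention, a note_on with velocity 0 also means note off
        print(msg.type, "pitch:", msg.note, "velocity:", msg.velocity)
```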
You can think of plotting this information as a piano roll.
Beware, this is not a MIDI file, but only a possible representation of its content! Some software (in this example, Reaper) adds a small piano keyboard next to the piano roll to make it easier to interpret visually.
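To make this concrete, here is a minimal sketch that draws a piano-roll-like plot with matplotlib; the note values are made up for illustration:

```python
import matplotlib.pyplot as plt

# toy notes as (onset_sec, duration_sec, pitch) tuples -- made-up values
notes = [(0.0, 0.5, 60), (0.5, 0.5, 64), (1.0, 1.0, 67), (1.0, 1.0, 72)]

fig, ax = plt.subplots()
for onset, duration, pitch in notes:
    # draw one horizontal bar per note, vertically centered on its pitch
    ax.broken_barh([(onset, duration)], (pitch - 0.4, 0.8))
ax.set_xlabel("time (seconds)")
ax.set_ylabel("MIDI pitch")
plt.show()
```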
How is a MIDI file created?
A MIDI file can be created mainly in two ways: 1) by playing on a MIDI instrument, 2) by manually writing into a sequencer (Reaper, Cubase, GarageBand, Logic) or a musical score editor (for example, MuseScore).
Each way of producing MIDI files also comes with a different kind of file:
- playing on a MIDI instrument → MIDI performance
- manually writing the notes (sequencer or musical score editor) → MIDI score
We will now dive into each type and then summarize their differences.
Before starting, a disclaimer: I will not focus specifically on how the information is encoded, but rather on what information can be extracted from the file. For example, when I say “time is represented in seconds”, it means that we can get seconds, even though the encoding itself is more complex.
MIDI performances
We can find four kinds of information in a MIDI performance:
- When the note starts: note onset
- When the note ends: note offset (or note duration, computed as offset - onset)
- Which note was played: note pitch
- How “strongly” the key was pressed: note velocity
Note onsets and offsets (and durations) are represented in seconds, corresponding to the moments the notes were pressed and released by the person playing the MIDI instrument.
Note pitch is encoded with an integer from 0 (lowest) to 127 (highest); note that more notes can be represented than those playable on a piano, whose range corresponds to 21–108.
Note velocity is also encoded with an integer from 0 (silence) to 127 (maximum intensity).
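As a quick aside, MIDI pitch numbers map to note names with a simple formula. A small sketch (note that MIDI does not encode pitch spelling, so every black key is arbitrarily spelled as a sharp here):

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_pitch_to_name(pitch: int) -> str:
    """Map a MIDI pitch number (0-127) to a name, with middle C (C4) = 60."""
    return f"{PITCH_CLASSES[pitch % 12]}{pitch // 12 - 1}"

# the piano range mentioned above: 21 is A0 and 108 is C8
print(midi_pitch_to_name(21), midi_pitch_to_name(60), midi_pitch_to_name(108))
```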
The vast majority of MIDI performances are piano performances, because most MIDI instruments are MIDI keyboards. Other MIDI instruments (for example, MIDI saxophones, MIDI drums, and MIDI sensors for guitar) exist, but they are not as common.
The biggest dataset of human MIDI performances (classical piano music) is the Maestro dataset by Google Magenta.
The main property of MIDI performances
A fundamental characteristic of MIDI performances is that there are never notes with exactly the same onset or duration (this is possible in theory, but extremely unlikely in practice).
Indeed, even if they really try, players won’t be able to press two (or more) notes at exactly the same time, since there is a limit to the precision humans can achieve. The same is true for note durations. Moreover, this is not even a priority for most musicians, since time deviations can help produce a more expressive or groovy feeling. Finally, consecutive notes can have some silence in between or partially overlap.
For this reason, MIDI performances are sometimes also called unquantized MIDI. Temporal positions are spread over a continuous time scale, not quantized to discrete positions (for digital encoding reasons, the scale is technically discrete, but extremely fine, so we can consider it continuous).
Hands-on example
Let us look at a MIDI performance. We will use the ASAP dataset, available on GitHub.
In your favorite terminal (I’m using PowerShell on Windows), go to a convenient location and clone the repository.
git clone https://github.com/fosfrancesco/asap-dataset
We will also use the Python library Partitura to open the MIDI files, so install it in your Python environment.
pip install partitura
Now that everything is set, let’s open the MIDI file and print the first 10 notes. Since this is a MIDI performance, we will use the load_performance_midi function.
from pathlib import Path
import partitura as pt

# set the path to the asap dataset (change it to your local path!)
asap_basepath = Path('../asap-dataset/')
# select a performance; here we use Bach's Prelude BWV 848 in C#
performance_path = Path("Bach/Prelude/bwv_848/Denisova06M.mid")
print("Loading midi file: ", asap_basepath/performance_path)
# load the performance
performance = pt.load_performance_midi(asap_basepath/performance_path)
# extract the note array
note_array = performance.note_array()
# print the dtype of the note array (helpful to understand how to interpret it)
print("Numpy dtype:")
print(note_array.dtype)
# print the first 10 notes in the note array
print("First 10 notes:")
print(note_array[:10])
The output of this Python program should look like this:
Numpy dtype:
[('onset_sec', '<f4'), ('duration_sec', '<f4'), ('onset_tick', '<i4'), ('duration_tick', '<i4'), ('pitch', '<i4'), ('velocity', '<i4'), ('track', '<i4'), ('channel', '<i4'), ('id', '<U256')]
First 10 notes:
[(1.0286459, 0.21354167, 790, 164, 49, 53, 0, 0, 'n0')
(1.03125 , 0.09765625, 792, 75, 77, 69, 0, 0, 'n1')
(1.1302084, 0.046875 , 868, 36, 73, 64, 0, 0, 'n2')
(1.21875 , 0.07942709, 936, 61, 68, 66, 0, 0, 'n3')
(1.3541666, 0.04166667, 1040, 32, 73, 34, 0, 0, 'n4')
(1.4361979, 0.0390625 , 1103, 30, 61, 62, 0, 0, 'n5')
(1.4361979, 0.04296875, 1103, 33, 77, 48, 0, 0, 'n6')
(1.5143229, 0.07421875, 1163, 57, 73, 69, 0, 0, 'n7')
(1.6380209, 0.06380209, 1258, 49, 78, 75, 0, 0, 'n8')
(1.6393229, 0.21484375, 1259, 165, 51, 54, 0, 0, 'n9')]
You can see that we have the onsets and durations in seconds, plus the pitch and velocity. The other fields are not so relevant for MIDI performances.
Onsets and durations are also represented in ticks. This is closer to the way this information is actually encoded in a MIDI file: a very short temporal unit (= 1 tick) is chosen, and all temporal information is encoded as a multiple of this unit. When you deal with music performances, you can typically ignore the tick values and directly use the information in seconds.
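For reference, the tick-to-seconds conversion under a constant tempo looks like this (a sketch with assumed values; real files can contain tempo changes, in which case the conversion must be accumulated segment by segment, which libraries like Partitura do for you):

```python
PPQ = 480        # ticks per quarter note, stored in the MIDI file header
TEMPO = 500_000  # microseconds per quarter note (500000 = 120 BPM)

def ticks_to_seconds(ticks: int, ppq: int = PPQ, tempo: int = TEMPO) -> float:
    # each tick lasts tempo / ppq microseconds
    return ticks * tempo / (ppq * 1_000_000)

print(ticks_to_seconds(790))  # ~0.82 s with these assumed values
```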
You can verify that there are never two notes with exactly the same onset or the same duration!
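Continuing from the script above, here is a quick check of this claim on the note array; if it holds, both counts equal the total number of notes:

```python
import numpy as np

# count distinct onsets and durations among all notes of the performance
print("unique onsets:   ", np.unique(note_array["onset_sec"]).size, "of", len(note_array))
print("unique durations:", np.unique(note_array["duration_sec"]).size, "of", len(note_array))
```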
MIDI scores
MIDI scores use a much richer set of MIDI messages to encode information such as time signature, key signature, bar, and beat positions.
For this reason, they resemble musical scores (sheet music), although they still miss some essential information, for example, pitch spelling, ties, dots, rests, beams, etc.
The temporal information is not encoded in seconds but in more musically abstract units, like quarter notes.
The main property of MIDI scores
A fundamental characteristic of MIDI scores is that all note onsets are aligned to a quantized grid, defined first by bar positions and then by recursive integer divisions (mainly by 2 and 3, although other divisions such as 5, 7, 11, etc. are used for tuplets).
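A small sketch of this recursive grid idea, using exact fractions (the bar length and the division sequence are assumptions for illustration):

```python
from fractions import Fraction

def grid_positions(bar_quarters=4, divisions=(2, 2, 2, 2)):
    """Onsets in one bar after recursive subdivision; use a 3 for triplets."""
    positions, length = [Fraction(0)], Fraction(bar_quarters)
    for d in divisions:
        length /= d  # each level splits the current unit into d parts
        positions = [p + k * length for p in positions for k in range(d)]
    return positions

# a 4/4 bar down to sixteenth notes: 0, 1/4, 1/2, 3/4, 1, ... (in quarters)
print(grid_positions()[:5])
```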
Hands-on example
We are now going to look at the score of Bach’s Prelude BWV 848 in C#, i.e., the score of the performance we loaded before. Partitura has a dedicated load_score_midi function.
from pathlib import Path
import partitura as pt

# set the path to the asap dataset (change it to your local path!)
asap_basepath = Path('../asap-dataset/')
# select a score; here we use Bach's Prelude BWV 848 in C#
score_path = Path("Bach/Prelude/bwv_848/midi_score.mid")
print("Loading midi file: ", asap_basepath/score_path)
# load the score
score = pt.load_score_midi(asap_basepath/score_path)
# extract the note array
note_array = score.note_array()
# print the dtype of the note array (helpful to understand how to interpret it)
print("Numpy dtype:")
print(note_array.dtype)
# print the first 10 notes in the note array
print("First 10 notes:")
print(note_array[:10])
The output of this Python program should look like this:
Numpy dtype:
[('onset_beat', '<f4'), ('duration_beat', '<f4'), ('onset_quarter', '<f4'), ('duration_quarter', '<f4'), ('onset_div', '<i4'), ('duration_div', '<i4'), ('pitch', '<i4'), ('voice', '<i4'), ('id', '<U256'), ('divs_pq', '<i4')]
First 10 notes:
[(0. , 1.9958333 , 0. , 0.99791664, 0, 479, 49, 1, 'P01_n425', 480)
(0. , 0.49583334, 0. , 0.24791667, 0, 119, 77, 1, 'P00_n0', 480)
(0.5, 0.49583334, 0.25, 0.24791667, 120, 119, 73, 1, 'P00_n1', 480)
(1. , 0.49583334, 0.5 , 0.24791667, 240, 119, 68, 1, 'P00_n2', 480)
(1.5, 0.49583334, 0.75, 0.24791667, 360, 119, 73, 1, 'P00_n3', 480)
(2. , 0.99583334, 1. , 0.49791667, 480, 239, 61, 1, 'P01_n426', 480)
(2. , 0.49583334, 1. , 0.24791667, 480, 119, 77, 1, 'P00_n4', 480)
(2.5, 0.49583334, 1.25, 0.24791667, 600, 119, 73, 1, 'P00_n5', 480)
(3. , 1.9958333 , 1.5 , 0.99791664, 720, 479, 51, 1, 'P01_n427', 480)
(3. , 0.49583334, 1.5 , 0.24791667, 720, 119, 78, 1, 'P00_n6', 480)]
You can see that the onsets of the notes all fall exactly on a grid. If we consider onset_quarter (the third column), we can see that sixteenth notes fall every 0.25 quarters, as expected.
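Continuing from the script above, we can check this over the whole piece (a sketch that assumes a sixteenth-note grid, which fits this prelude; pieces with tuplets would need a finer grid):

```python
import numpy as np

# distance of each onset from the nearest multiple of 0.25 quarters
remainder = note_array["onset_quarter"] % 0.25
on_grid = np.isclose(remainder, 0) | np.isclose(remainder, 0.25)
print("all onsets on the sixteenth-note grid:", bool(on_grid.all()))
```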
Durations are a bit more problematic. For example, in this score, a sixteenth note should have a duration_quarter of 0.25. However, we can see from the Python output that the duration is actually 0.24791667. What happened is that MuseScore, which was used to generate this MIDI file, slightly shortened each note. Why? Just to make the audio rendition of this MIDI file sound a bit better. And it does indeed, at the cost of causing many problems for people using these files for computer music research. Similar problems also exist in widely used datasets, such as the Lakh MIDI Dataset.
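If you need the nominal durations back, one possible workaround (my own, not an official fix) is to snap them to the grid, again assuming a sixteenth-note resolution as above:

```python
import numpy as np

# round each duration to the nearest multiple of 0.25 quarters, keeping
# at least one grid step so that no note collapses to zero length
durations = note_array["duration_quarter"]
snapped = np.maximum(np.round(durations / 0.25), 1) * 0.25
print(durations[:4])  # [0.9979... 0.2479... 0.2479... 0.2479...]
print(snapped[:4])    # [1.   0.25 0.25 0.25]
```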
Given the differences between MIDI scores and MIDI performances we have seen, let me give you some generic guidelines that can help you set up your deep learning system correctly.
Prefer MIDI scores for music generation systems, since the quantized note positions can be represented with a fairly small vocabulary, and other simplifications are possible, such as only considering monophonic melodies.
Use MIDI performances for systems that target the way humans play and perceive music, for example, beat tracking systems, tempo estimators, and emotion recognition systems (focusing on expressive playing).
Use both kinds of data for tasks like score following (input: performance, output: score) and expressive performance generation (input: score, output: performance).
Extra problems
I have presented the main differences between MIDI scores and MIDI performances. However, as often happens, things may be more complex.
For example, some datasets, like the AMAPS datasets, are originally MIDI scores, but the authors introduced time changes at every note to simulate the time deviations of real human players (note that this only happens between notes at different time positions; all notes in a chord will still be perfectly simultaneous).
Moreover, some MIDI exports, like the one from MuseScore, will also try to make the MIDI score more similar to a MIDI performance: by changing the tempo indication if the piece changes tempo, by inserting a very small silence between consecutive notes (we saw this in the example before), and by playing grace notes as a very short note slightly before the reference note onset.
Indeed, grace notes constitute a very annoying problem in MIDI scores. Their duration is unspecified in musical terms; we just generically know that they have to be “short”. And their onset in the score is the same as that of the reference note, but this would sound very weird if we listened to an audio rendition of the MIDI file. Should we then shorten the previous note, or the next note, to make space for the grace note?
Other ornaments are also problematic, since there are no unique rules on how to play them: for example, how many notes should a trill contain? Should a mordent start from the actual note or from the upper note?
Conclusions
MIDI files are great because they explicitly provide information about the pitch, onset, and duration of every note. This means, for example, that compared to audio files, models targeting MIDI data can be smaller and trained with smaller datasets.
This comes at a cost: MIDI files, and symbolically encoded music in general, are complex formats to use, since they encode so many kinds of information in many different ways.
To properly use MIDI data as training data, it is important to be aware of the kind of data it encodes. I hope this article gave you a good starting point to learn more about this topic!
[All figures are from the author.]