Goal
To classify an audio clip into one of ten music genres.
Preamble
The goal of the project was to automatically classify a digital audio clip by genre. Instead of manually labeling every track in a digital library of millions of songs, the idea was to build a tool that could do the job far more quickly.
Data
The data consisted of 1,000 digital .mp3 music recordings downloaded from either freemusicarchive.org or uppbeat.io: 100 recordings from each of the following genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae and rock.
Method
The files were converted to .wav format with the Python pydub library and then segmented into 30-second clips. The clips were then converted into audio spectrograms.
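pydub slices AudioSegment objects by millisecond index, so the 30-second segmentation step reduces to computing millisecond boundaries. A minimal sketch of that slicing arithmetic, assuming (this was not stated in the write-up) that a short trailing remainder is simply discarded:

```python
# Sketch of the 30-second segmentation step. pydub's AudioSegment is
# sliced by millisecond index (clip = audio[start:end]), so the core of
# the step is computing (start_ms, end_ms) boundaries. The helper name
# and the choice to drop a short trailing remainder are assumptions.

SEGMENT_MS = 30 * 1000  # 30-second clips

def segment_bounds(duration_ms: int, segment_ms: int = SEGMENT_MS):
    """Return (start_ms, end_ms) pairs covering full-length segments only."""
    return [
        (start, start + segment_ms)
        for start in range(0, duration_ms - segment_ms + 1, segment_ms)
    ]

# A 95-second recording yields three full 30-second clips;
# the final 5 seconds are discarded.
bounds = segment_bounds(95 * 1000)
```

With pydub the actual slicing would then be `clip = audio[start:end]` for each pair, followed by `clip.export(path, format="wav")`.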
The spectrograms were loaded as 3-channel 720 x 432 pixel .png images into a Keras/TensorFlow model with four heavily tuned layers. The model included normalization, average pooling and dropout stages. It was trained for 50 epochs, by which point it had converged.
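The write-up does not give the exact architecture, so the following is only a sketch of what a four-block Keras CNN with the named ingredients (normalization, average pooling, dropout) over 720 x 432 spectrogram images might look like; filter counts and dropout rate are illustrative assumptions:

```python
# Sketch of a 4-block CNN over 720x432 RGB spectrogram images, assuming
# 720 is width and 432 is height (Keras expects height, width, channels).
# Layer sizes are illustrative; the original hyperparameters were not published.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(432, 720, 3), n_genres=10):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),            # normalize pixel values to [0, 1]
    ])
    for filters in (16, 32, 64, 128):           # four convolutional blocks
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())  # normalization
        model.add(layers.AveragePooling2D())    # average pooling
        model.add(layers.Dropout(0.25))         # dropout
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(n_genres, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
```

Training would then be a standard `model.fit(train_ds, validation_data=val_ds, epochs=50)` call over an image dataset of the spectrogram .png files.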
This model was compared to two other approaches:
1. A Keras/TensorFlow transfer-learning model built on top of an efficientnetv2-xl-21k CNN
2. A Keras/TensorFlow model trained on .mp3 files converted to NumPy arrays
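The second comparison approach feeds raw audio samples rather than spectrogram images. With pydub, `AudioSegment.from_mp3(path).get_array_of_samples()` yields the decoded PCM samples; since the decode itself requires pydub and ffmpeg, the sketch below starts from raw 16-bit PCM bytes and shows only the conversion to a normalized NumPy array (function name and the [-1, 1] scaling are my assumptions):

```python
# Sketch of approach 2: turning decoded audio into a NumPy feature array.
# With pydub the decode would be
#     samples = AudioSegment.from_mp3(path).get_array_of_samples()
# Here we start from raw 16-bit little-endian PCM bytes so the sketch
# runs without pydub/ffmpeg installed.
import numpy as np

def pcm16_to_float_array(pcm_bytes: bytes) -> np.ndarray:
    """Convert 16-bit PCM bytes to float32 samples scaled to [-1, 1]."""
    samples = np.frombuffer(pcm_bytes, dtype="<i2").astype(np.float32)
    return samples / 32768.0  # 2**15, the int16 full-scale value

# Two samples at the int16 extremes: 32767 and -32768.
raw = np.array([32767, -32768], dtype="<i2").tobytes()
x = pcm16_to_float_array(raw)
```

Arrays produced this way can be batched directly into a Keras model, in place of the spectrogram images.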
Results
The transfer-learning approach turned out to be the least accurate. The model trained on NumPy arrays of the .mp3 files fared only slightly worse than the best performer: the custom Keras/TensorFlow model trained on spectrograms.
This was the first time I had worked with audio files, and in the course of this project I discovered that they don't behave like other forms of data. This is likely due to the cyclical, but not strictly time-series, nature of music, as well as the mathematical relationships between notes that distinguish the various genres.
There is certainly still much to explore in this area. For example, identifying more representative music segments or applying other tuning approaches may lead to further model improvement, and identifying which musical instruments appear in a segment could widen the scope of this model.
published February 22, 2022