
2019-11-17T22:40:32.169Z ~ 5 min read

♫ -> ヅ = hmm. Music Genre Classification


♫ - Music Genre Classification

Build a machine learning (ML) model that takes audio files as input and returns a corresponding music genre.

Building a model using fastai and 🚀 deploying it to GCP

Gist of everything


Outcome of assignment

Expected outcome

  • fastai library for model predictions
  • serving the model in GCP w. their new custom prediction feature

Actual outcome

  • trained a fastai model for predictions
  • serving the model in GCP w. Google App Engine


In the interest of time, the model is only trained on 2 classes: ['classical', 'blues']

app -

upload - image from docs/img_data/*


Training the model

  1. Download data
  2. Research prior work on predicting music genre w. this particular data set.
    1. Mel spectrograms of whole songs seem to be the silver standard for this task (solid, but not cutting edge).
  3. Use fastaiv3 to learn about the library's image classification and its approach to making a first model.
  4. Create the data set: process the raw audio files into mel spectrograms that represent whole songs (currently only blues and classical are processed, in the interest of time) - colab:
  5. Train an image classifier using transfer learning w. ResNet as base - colab:
  6. Deploy the model to GCP.
    1. The challenge here was that GCP does not natively support PyTorch models. In March, GCP released custom prediction capabilities and I wanted to get my hands dirty. See Serving a PyTorch text classifier on AI Platform Serving using custom online prediction

    2. Challenge number 2 is preprocessing, which could be handled w. the lambda functionality of the keras library. Hopefully the conversion of the audio file into an image can be done using keras lambdas. See preprocessing w. lambda keras

    3. Challenge number 3 is the use case. What is the actual use case of this? Depending on that:

      Challenge number 2 might not even be relevant, as we might only need this music-genre classification for post analytics. In that case the data could be processed another way, skipping the preprocessing in the API.
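A mel spectrogram maps frequencies onto the mel scale, which spaces bands closer to how pitch is perceived. As a rough illustration of that mapping (not the code from the colab notebooks), the sketch below uses the common HTK-style formula m = 2595 * log10(1 + f / 700). The function names, the 128 bands, and the 11025 Hz upper limit are assumptions chosen for the example.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min, f_max):
    """Return n_mels center frequencies (Hz), evenly spaced on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    # Assumed settings: 128 bands up to 11025 Hz (half of a 22050 Hz sample rate).
    centers = mel_band_centers(n_mels=128, f_min=0.0, f_max=11025.0)
    print(round(centers[0], 1), round(centers[-1], 1))
```

The bands come out narrow at low frequencies and wide at high ones, which is why a mel spectrogram image gives more vertical space to the low end of a song.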

Exact steps made and challenges presented on the way

  1. Download the data set from the Download link
  2. Extract the tar archive into your drive or an environment that colab can read from (as colab does not persist data)
  3. Create the data set for training w. mel spectrograms.
  4. Train on the newly created img_data using fastai in colab
  5. Deploy the model following the newly published guide for deploying PyTorch models on GCP
  6. Current step/progress: the failure when trying to deploy the custom prediction model
$ make create-model
$ make create-version
gcloud alpha ai-platform versions create v2 --model music_genre_classification \
	--origin=gs://music-genre-classification/music-genre-v1.0.0/ \
	--python-version=3.5 \
	--runtime-version=1.13 \
	--package-uris=gs://music-genre-classification/python-prediction/music_genre_prediction-0.1.tar.gz \
	--machine-type=mls1-c4-m4 \
Creating version (this might take a few minutes)......failed.
ERROR: ( Create Version failed. Bad model detected with error:  "Failed to load model: User-provided package music_genre_prediction-0.1.tar.gz failed to install: Command '['python-default', '-m', 'pip', 'install', '--target=/tmp/custom_lib', '--no-cache-dir', '-b', '/tmp/pip_builds', '/tmp/custom_code/music_genre_prediction-0.1.tar.gz']' returned non-zero exit status 1 (Error code: 0)"
make: *** [Makefile:88: create-version] Error 1
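For context, AI Platform custom prediction routines expect the uploaded package to expose a predictor class with a `predict` method and a `from_path` class method. The sketch below shows that shape with a stand-in model so that it runs without PyTorch. `GenrePredictor`, `labels.json`, and the scoring logic are hypothetical and are not the code from the failing package.

```python
import json
import os


class GenrePredictor(object):
    """Shape of a custom prediction routine: predict() plus from_path()."""

    def __init__(self, labels):
        # A real package would also keep the loaded PyTorch model here.
        self._labels = labels

    def predict(self, instances, **kwargs):
        """Return one label per instance; each instance is a list of class scores."""
        results = []
        for scores in instances:
            if len(scores) != len(self._labels):
                raise ValueError(
                    "expected %d scores, got %d" % (len(self._labels), len(scores))
                )
            # Index of the highest score selects the predicted label.
            best = max(range(len(scores)), key=lambda i: scores[i])
            results.append(self._labels[best])
        return results

    @classmethod
    def from_path(cls, model_dir):
        """Build the predictor from the files found in the model directory."""
        with open(os.path.join(model_dir, "labels.json")) as f:
            labels = json.load(f)
        return cls(labels)
```

Note that the pip error in the log above is raised while installing the package, before any predictor code would run, so the packaging itself (for example setup.py and its dependencies) is a likely place to look.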

New approach, since deploying the PyTorch model on the GCP AI Platform did not work

  1. Deploy on Google App Engine

  2. Follow the fastai deployment guide on the fastai website.

    1. Make a downloadable link to the model that you trained
    2. Deploy the app. The number of classes configured in the app needs to be EXACTLY the number of classes the model predicts. (A mismatch turned out to be the cause of an otherwise unexplainable error)
  3. Deployed app is up and running @

  4. cURL requests against the Google App Engine app did not work out of the box. See curl-on-app-engine

  5. Postman request instead

Configuring cURL to work with GAE would have taken some time:

$ curl --request POST \
>   --url \
>   --header 'Accept: */*' \
>   --header 'Accept-Encoding: gzip, deflate' \
>   --header 'Connection: keep-alive' \
>   --header 'Content-Length: 474050' \
>   --header 'Content-Type: multipart/form-data; boundary=--------------------------890612561901535102156936' \
>   --header 'Host:' \
>   --header 'User-Agent: PostmanRuntime/7.19.0' \
>   --header 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
>   --form file=@/home/eleijonmarck/dev/epidemic-sound-ml-assignment/data/img_data/classical/classical00011.png

curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
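The failing command hard-codes `Content-Length`, `Host`, and two different multipart boundaries copied from Postman, values that a client normally computes itself. That may be what triggers the protocol error, although this is unverified. As an alternative to cURL, the sketch below builds a multipart/form-data upload with Python's standard library. The endpoint URL is a placeholder, the helper names are made up, and the form field name `file` is taken from the cURL command above.

```python
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="image/png"):
    """Return (body, content_type_header) for a single-file multipart upload."""
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{n}"; filename="{f}"\r\n'
        "Content-Type: {t}\r\n\r\n"
    ).format(b=boundary, n=field_name, f=filename, t=content_type)
    tail = "\r\n--{b}--\r\n".format(b=boundary)
    body = head.encode("utf-8") + payload + tail.encode("utf-8")
    return body, "multipart/form-data; boundary=" + boundary


def classify(url, image_path):
    """POST an image file to the prediction endpoint and return the raw response."""
    with open(image_path, "rb") as f:
        payload = f.read()
    body, header = build_multipart("file", os.path.basename(image_path), payload)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    # urllib fills in Content-Length and Host on its own.
    with urllib.request.urlopen(request) as response:
        return response.read()


# Example call (placeholder URL, not the deployed app):
# classify("https://example.com/analyze", "classical00011.png")
```

Generating a single boundary and letting the client compute the remaining headers keeps the request internally consistent, which the copied Postman headers are not.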

📊 Data

As a dataset, use the (in)famous GTZAN Genre Collection dataset from

Tzanetakis, George, and Perry Cook. “Musical genre classification of audio signals.” IEEE Transactions on speech and audio processing 10.5 (2002): 293-302.

Download link and homepage for the dataset.

Next steps:

  • create baseline
  • make preprocessing step for images
  • create embeddings of the melspectrogram features for visualizing in t-SNE
  • signal processing on the mel spectrograms to create MFCC features
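For the last item, MFCCs are conventionally obtained by taking the log of the mel band energies and applying a type-II discrete cosine transform. A minimal, unoptimized sketch of that step for a single frame follows. The function names and the example energies are made up for illustration, and a real pipeline would more likely rely on an audio library than on hand-written loops.

```python
import math


def dct_ii(values):
    """Unnormalized type-II discrete cosine transform of a sequence."""
    n = len(values)
    return [
        sum(
            v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values)
        )
        for k in range(n)
    ]


def mfcc_from_mel_energies(mel_energies, n_mfcc=13, eps=1e-10):
    """Log-compress mel band energies and keep the first n_mfcc DCT coefficients."""
    # eps avoids log(0) for silent bands.
    log_energies = [math.log(e + eps) for e in mel_energies]
    return dct_ii(log_energies)[:n_mfcc]


if __name__ == "__main__":
    frame = [1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0, 0.5]  # made-up energies
    print(mfcc_from_mel_energies(frame, n_mfcc=4))
```

Keeping only the first few coefficients summarizes the overall spectral shape of a frame in a handful of numbers, which suits a simple baseline model.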


Hi, I'm Eric. I'm a software engineer and data scientist based in Lisbon. You can follow me on Twitter, see some of my work on GitHub,