
In this article, I show how to use TensorFlow to predict the style of a piece of music.
In my example, I compare techno and classical music.
You can find the code on my GitHub:
https://github.com/victordalet/sound_to_partition
I - Dataset
For the first step, you need to create a dataset folder and, inside it, one folder per music style. For example, I add a techno folder and a classic folder, in which I put my wav files.
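If it helps, here is a minimal sketch of that layout (the dataset, techno and classic folder names come from this article; the script itself is only an illustration):

```python
from pathlib import Path

# Expected layout: dataset/<style>/*.wav
# 'techno' and 'classic' are the two styles used in this article.
for style in ('techno', 'classic'):
    Path('dataset', style).mkdir(parents=True, exist_ok=True)

# Then drop your .wav files into dataset/techno/ and dataset/classic/.
```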
II - Train
I create a train file that takes a max_epochs argument to be filled in.
Modify the classes in the constructor so that they match your directories in the dataset folder.
In the loading and preprocessing method, I walk through the wav files of each class directory and compute their mel spectrograms.
For training, I use Keras convolutional layers and the Model API.
```python
import os
import sys
from typing import List

import librosa
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.image import resize


class Train:

    def __init__(self):
        self.X_train = None
        self.X_test = None
        self.y_train = None
        self.y_test = None
        self.data_dir: str = 'dataset'
        self.classes: List[str] = ['techno', 'classic']
        self.max_epochs: int = int(sys.argv[1])

    @staticmethod
    def load_and_preprocess_data(data_dir, classes, target_shape=(128, 128)):
        data = []
        labels = []

        # Build one (mel spectrogram, label) pair per wav file, resized to a fixed shape
        for i, class_name in enumerate(classes):
            class_dir = os.path.join(data_dir, class_name)
            for filename in os.listdir(class_dir):
                if filename.endswith('.wav'):
                    file_path = os.path.join(class_dir, filename)
                    audio_data, sample_rate = librosa.load(file_path, sr=None)
                    mel_spectrogram = librosa.feature.melspectrogram(y=audio_data, sr=sample_rate)
                    mel_spectrogram = resize(np.expand_dims(mel_spectrogram, axis=-1), target_shape)
                    data.append(mel_spectrogram)
                    labels.append(i)

        return np.array(data), np.array(labels)

    def create_model(self):
        data, labels = self.load_and_preprocess_data(self.data_dir, self.classes)
        labels = to_categorical(labels, num_classes=len(self.classes))  # Convert labels to one-hot encoding
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            data, labels, test_size=0.2, random_state=42
        )

        # Simple CNN: two Conv/MaxPool blocks followed by dense layers
        input_shape = self.X_train[0].shape
        input_layer = Input(shape=input_shape)
        x = Conv2D(32, (3, 3), activation='relu')(input_layer)
        x = MaxPooling2D((2, 2))(x)
        x = Conv2D(64, (3, 3), activation='relu')(x)
        x = MaxPooling2D((2, 2))(x)
        x = Flatten()(x)
        x = Dense(64, activation='relu')(x)
        output_layer = Dense(len(self.classes), activation='softmax')(x)

        self.model = Model(input_layer, output_layer)
        self.model.compile(optimizer=Adam(learning_rate=0.001),
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])

    def train_model(self):
        self.model.fit(self.X_train, self.y_train,
                       epochs=self.max_epochs,
                       batch_size=32,
                       validation_data=(self.X_test, self.y_test))
        test_accuracy = self.model.evaluate(self.X_test, self.y_test, verbose=0)
        print(test_accuracy[1])

    def save_model(self):
        self.model.save('weight.h5')


if __name__ == '__main__':
    train = Train()
    train.create_model()
    train.train_model()
    train.save_model()
```
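Assuming the script above is saved as train.py (the file name is my own choice here, check the repository for the exact one), training for 50 epochs would be launched with `python train.py 50`; the script prints the test accuracy and saves the weights to weight.h5.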
III - Test
To test and use the model, I've created this class, which loads the saved weights and predicts the style of a piece of music.
Don't forget to set the right classes in the constructor.
```python
from typing import List

import librosa
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.image import resize
import tensorflow as tf


class Test:

    def __init__(self, audio_file_path: str):
        self.model = load_model('weight.h5')
        self.target_shape = (128, 128)
        self.classes: List[str] = ['techno', 'classic']
        self.audio_file_path: str = audio_file_path

    def test_audio(self, file_path, model):
        # Recreate the exact preprocessing used at training time
        audio_data, sample_rate = librosa.load(file_path, sr=None)
        mel_spectrogram = librosa.feature.melspectrogram(y=audio_data, sr=sample_rate)
        mel_spectrogram = resize(np.expand_dims(mel_spectrogram, axis=-1), self.target_shape)
        mel_spectrogram = tf.reshape(mel_spectrogram, (1,) + self.target_shape + (1,))

        predictions = model.predict(mel_spectrogram)
        class_probabilities = predictions[0]
        predicted_class_index = np.argmax(class_probabilities)
        return class_probabilities, predicted_class_index

    def test(self):
        class_probabilities, predicted_class_index = self.test_audio(self.audio_file_path, self.model)

        # Print the probability of each class, then the predicted style
        for i, class_label in enumerate(self.classes):
            probability = class_probabilities[i]
            print(f'Class: {class_label}, Probability: {probability:.4f}')

        predicted_class = self.classes[predicted_class_index]
        accuracy = class_probabilities[predicted_class_index]
        print(f'The audio is classified as: {predicted_class}')
        print(f'Accuracy: {accuracy:.4f}')
```
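As a quick usage sketch (the entry point below is my own addition, not from the repository), you can pass the path of a wav file on the command line, mirroring the train script:

```python
import sys

if __name__ == '__main__':
    # Classify the .wav file whose path is given on the command line,
    # e.g. python test.py my_track.wav
    test = Test(sys.argv[1])
    test.test()
```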