
In this article, I show how to use TensorFlow to predict the style of a piece of music.
In my example, I compare techno and classical music.
You can find the code on my GitHub:
https://github.com/victordalet/sound_to_partition
I - Dataset
For the first step, you need to create a dataset folder and, inside it, one folder per music style. For example, I add a techno folder and a classic folder, in which I put my wav files.
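If it helps, here is a minimal sketch of that layout (the dataset, techno and classic folder names come from this article; the script itself is only an illustration):

```python
from pathlib import Path

# Expected layout: dataset/<style>/*.wav
# 'techno' and 'classic' are the two styles used in this article.
for style in ('techno', 'classic'):
    Path('dataset', style).mkdir(parents=True, exist_ok=True)

# Then drop your .wav files into dataset/techno/ and dataset/classic/.
```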
II - Train
I create a train file that takes a max_epochs argument to be filled in.
Modify the classes in the constructor so that they match your directories in the dataset folder.
In the loading and preprocessing method, I walk through the wav files of each class directory and compute their mel spectrograms.
For training, I use Keras convolutional layers and the Model API.
```python
import os
import sys
from typing import List

import librosa
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.image import resize


class Train:

    def __init__(self):
        self.X_train = None
        self.X_test = None
        self.y_train = None
        self.y_test = None
        self.data_dir: str = 'dataset'
        self.classes: List[str] = ['techno', 'classic']
        self.max_epochs: int = int(sys.argv[1])

    @staticmethod
    def load_and_preprocess_data(data_dir, classes, target_shape=(128, 128)):
        data = []
        labels = []

        # Build one (mel spectrogram, label) pair per wav file, resized to a fixed shape
        for i, class_name in enumerate(classes):
            class_dir = os.path.join(data_dir, class_name)
            for filename in os.listdir(class_dir):
                if filename.endswith('.wav'):
                    file_path = os.path.join(class_dir, filename)
                    audio_data, sample_rate = librosa.load(file_path, sr=None)
                    mel_spectrogram = librosa.feature.melspectrogram(y=audio_data, sr=sample_rate)
                    mel_spectrogram = resize(np.expand_dims(mel_spectrogram, axis=-1), target_shape)
                    data.append(mel_spectrogram)
                    labels.append(i)

        return np.array(data), np.array(labels)

    def create_model(self):
        data, labels = self.load_and_preprocess_data(self.data_dir, self.classes)
        labels = to_categorical(labels, num_classes=len(self.classes))  # Convert labels to one-hot encoding
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            data, labels, test_size=0.2, random_state=42
        )

        # Simple CNN: two Conv/MaxPool blocks followed by dense layers
        input_shape = self.X_train[0].shape
        input_layer = Input(shape=input_shape)
        x = Conv2D(32, (3, 3), activation='relu')(input_layer)
        x = MaxPooling2D((2, 2))(x)
        x = Conv2D(64, (3, 3), activation='relu')(x)
        x = MaxPooling2D((2, 2))(x)
        x = Flatten()(x)
        x = Dense(64, activation='relu')(x)
        output_layer = Dense(len(self.classes), activation='softmax')(x)

        self.model = Model(input_layer, output_layer)
        self.model.compile(optimizer=Adam(learning_rate=0.001),
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])

    def train_model(self):
        self.model.fit(self.X_train, self.y_train,
                       epochs=self.max_epochs,
                       batch_size=32,
                       validation_data=(self.X_test, self.y_test))
        test_accuracy = self.model.evaluate(self.X_test, self.y_test, verbose=0)
        print(test_accuracy[1])

    def save_model(self):
        self.model.save('weight.h5')


if __name__ == '__main__':
    train = Train()
    train.create_model()
    train.train_model()
    train.save_model()
```
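Assuming the script above is saved as train.py (the file name is my own choice here, check the repository for the exact one), training for 50 epochs would be launched with `python train.py 50`; the script prints the test accuracy and saves the weights to weight.h5.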
III - Test
To test and use the model, I've created this class, which loads the saved weights and predicts the style of a piece of music.
Don't forget to set the right classes in the constructor.
```python
from typing import List

import librosa
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.image import resize
import tensorflow as tf


class Test:

    def __init__(self, audio_file_path: str):
        self.model = load_model('weight.h5')
        self.target_shape = (128, 128)
        self.classes: List[str] = ['techno', 'classic']
        self.audio_file_path: str = audio_file_path

    def test_audio(self, file_path, model):
        # Recreate the exact preprocessing used at training time
        audio_data, sample_rate = librosa.load(file_path, sr=None)
        mel_spectrogram = librosa.feature.melspectrogram(y=audio_data, sr=sample_rate)
        mel_spectrogram = resize(np.expand_dims(mel_spectrogram, axis=-1), self.target_shape)
        mel_spectrogram = tf.reshape(mel_spectrogram, (1,) + self.target_shape + (1,))

        predictions = model.predict(mel_spectrogram)
        class_probabilities = predictions[0]
        predicted_class_index = np.argmax(class_probabilities)
        return class_probabilities, predicted_class_index

    def test(self):
        class_probabilities, predicted_class_index = self.test_audio(self.audio_file_path, self.model)

        # Print the probability of each class, then the predicted style
        for i, class_label in enumerate(self.classes):
            probability = class_probabilities[i]
            print(f'Class: {class_label}, Probability: {probability:.4f}')

        predicted_class = self.classes[predicted_class_index]
        accuracy = class_probabilities[predicted_class_index]
        print(f'The audio is classified as: {predicted_class}')
        print(f'Accuracy: {accuracy:.4f}')
```
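As a quick usage sketch (the entry point below is my own addition, not from the repository), you can pass the path of a wav file on the command line, mirroring the train script:

```python
import sys

if __name__ == '__main__':
    # Classify the .wav file whose path is given on the command line,
    # e.g. python test.py my_track.wav
    test = Test(sys.argv[1])
    test.test()
```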