DEV Community
victor_dalet

Tensorflow music prediction

In this article, I show how to use TensorFlow to predict the style of a piece of music.
In my example, I compare techno and classical music.

You can find the code on my GitHub:
https://github.com/victordalet/sound_to_partition


I - Dataset

For the first step, create a dataset folder and, inside it, add one folder per music style. For example, I add a techno folder and a classic folder, in which I put my wav sound files.
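Before training, it can help to check that the folder layout is what the training script expects. The following is a minimal sketch, not part of the original repository: the helper name count_wav_files is hypothetical, and it only assumes the dataset/<style>/*.wav layout described above.

```python
from pathlib import Path
from typing import Dict


def count_wav_files(dataset_dir: str) -> Dict[str, int]:
    """Return {class_name: number of .wav files} for each sub-folder."""
    counts = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if class_dir.is_dir():
            # Same filter as the training script: file names ending in '.wav'.
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.name.endswith(".wav")
            )
    return counts
```

Calling count_wav_files("dataset") should list every style folder with its number of clips; a style with very few files, or a count of 0, is worth fixing before training.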

II - Train

I create a training script that takes one command-line argument, max_epochs, which sets the number of training epochs.
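The script reads max_epochs directly from sys.argv[1]. As an alternative sketch (an assumption on my part, not what the repository does), argparse can validate the value and print a usage message when it is missing. The names positive_int and parse_args are hypothetical.

```python
import argparse
from typing import List, Optional


def positive_int(value: str) -> int:
    """argparse type: accept only integers >= 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("max_epochs must be at least 1")
    return number


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train the music style classifier")
    parser.add_argument("max_epochs", type=positive_int, help="number of training epochs")
    # With argv=None, argparse reads the real command line (sys.argv[1:]).
    return parser.parse_args(argv)


# Example with an explicit argument list.
args = parse_args(["10"])
print(args.max_epochs)
```

In the Train constructor, int(sys.argv[1]) could then be replaced by parse_args().max_epochs.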

Modify the classes list in the constructor so that it matches the folder names in your dataset folder.

In the loading and preprocessing method, I load the wav files from each class directory and compute their mel spectrograms, resized to 128x128.

For training, I use Keras convolutional layers and the Keras Model class.
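Since the clips are loaded at their native sample rate (sr=None) and every spectrogram is resized to the same shape, it can be worth checking the sample rate and duration of each file first. The sketch below uses only the standard library wave module and assumes uncompressed PCM wav files; wav_info is a hypothetical helper, not part of the original code.

```python
import wave
from typing import Dict


def wav_info(file_path: str) -> Dict[str, float]:
    """Read basic properties of a PCM wav file without decoding the samples."""
    with wave.open(file_path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        n_frames = wav_file.getnframes()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": sample_rate,
            "duration_s": n_frames / sample_rate,
        }
```

Files whose duration or sample rate differs a lot from the rest of the dataset are good candidates for trimming or resampling before training.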

import os
import sys
from typing import List

import librosa
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from tensorflow.image import resize


class Train:
    def __init__(self):
        self.X_train = None
        self.X_test = None
        self.y_train = None
        self.y_test = None
        self.data_dir: str = 'dataset'
        self.classes: List[str] = ['techno', 'classic']
        self.max_epochs: int = int(sys.argv[1])

    @staticmethod
    def load_and_preprocess_data(data_dir, classes, target_shape=(128, 128)):
        data = []
        labels = []

        for i, class_name in enumerate(classes):
            class_dir = os.path.join(data_dir, class_name)
            for filename in os.listdir(class_dir):
                if filename.endswith('.wav'):
                    file_path = os.path.join(class_dir, filename)
                    audio_data, sample_rate = librosa.load(file_path, sr=None)
                    mel_spectrogram = librosa.feature.melspectrogram(y=audio_data, sr=sample_rate)
                    mel_spectrogram = resize(np.expand_dims(mel_spectrogram, axis=-1), target_shape)
                    data.append(mel_spectrogram)
                    labels.append(i)

        return np.array(data), np.array(labels)

    def create_model(self):
        data, labels = self.load_and_preprocess_data(self.data_dir, self.classes)
        labels = to_categorical(labels, num_classes=len(self.classes))  # Convert labels to one-hot encoding
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            data, labels, test_size=0.2, random_state=42)

        input_shape = self.X_train[0].shape
        input_layer = Input(shape=input_shape)
        x = Conv2D(32, (3, 3), activation='relu')(input_layer)
        x = MaxPooling2D((2, 2))(x)
        x = Conv2D(64, (3, 3), activation='relu')(x)
        x = MaxPooling2D((2, 2))(x)
        x = Flatten()(x)
        x = Dense(64, activation='relu')(x)
        output_layer = Dense(len(self.classes), activation='softmax')(x)
        self.model = Model(input_layer, output_layer)
        self.model.compile(optimizer=Adam(learning_rate=0.001),
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])

    def train_model(self):
        self.model.fit(self.X_train, self.y_train,
                       epochs=self.max_epochs, batch_size=32,
                       validation_data=(self.X_test, self.y_test))
        test_accuracy = self.model.evaluate(self.X_test, self.y_test, verbose=0)
        print(test_accuracy[1])

    def save_model(self):
        self.model.save('weight.h5')


if __name__ == '__main__':
    train = Train()
    train.create_model()
    train.train_model()
    train.save_model()
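To get a feel for the size of the network in the training script, here is a small worked calculation in plain Python. It assumes the Keras defaults that the script relies on (Conv2D with 'valid' padding and stride 1, MaxPooling2D with stride equal to the pool size) and a 128x128x1 input; the function names are hypothetical.

```python
from typing import Dict


def conv_out(size: int, kernel: int) -> int:
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size: int, pool: int) -> int:
    """Output size of 'valid' max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1


def model_summary(input_size: int = 128, n_classes: int = 2) -> Dict[str, int]:
    size, channels, params = input_size, 1, 0
    for filters in (32, 64):
        params += 3 * 3 * channels * filters + filters  # kernel weights + biases
        size = pool_out(conv_out(size, 3), 2)
        channels = filters
    flat = size * size * channels
    params += flat * 64 + 64              # Dense(64)
    params += 64 * n_classes + n_classes  # output Dense (softmax)
    return {"feature_map": size, "flatten": flat, "params": params}


print(model_summary())
# {'feature_map': 30, 'flatten': 57600, 'params': 3705410}
```

Almost all of the roughly 3.7 million parameters sit in the first Dense layer (57600 x 64 weights), which is where this model is most likely to overfit on a small dataset.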

III - Test

To test and use the model, I've created this class, which loads the saved weights and predicts the style of a piece of music.

Don't forget to put the same classes, in the same order as in training, in the constructor.
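One way to avoid keeping the two class lists in sync by hand is to store the list next to the weights at training time and read it back at test time. This is only a sketch of that idea, not something the original code does; the file name classes.json and both function names are assumptions.

```python
import json
from typing import List


def save_classes(classes: List[str], path: str = "classes.json") -> None:
    """Write the class names, in training order, to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(classes, f)


def load_classes(path: str = "classes.json") -> List[str]:
    """Read the class names back in the same order."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The training class could call save_classes(self.classes) when it saves the model, and the test class could set self.classes = load_classes() in its constructor.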

import sys
from typing import List

import librosa
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model
from tensorflow.image import resize


class Test:
    def __init__(self, audio_file_path: str):
        self.model = load_model('weight.h5')
        self.target_shape = (128, 128)
        self.classes: List[str] = ['techno', 'classic']
        self.audio_file_path: str = audio_file_path

    def test_audio(self, file_path, model):
        # Same preprocessing as in training: mel spectrogram resized to 128x128
        audio_data, sample_rate = librosa.load(file_path, sr=None)
        mel_spectrogram = librosa.feature.melspectrogram(y=audio_data, sr=sample_rate)
        mel_spectrogram = resize(np.expand_dims(mel_spectrogram, axis=-1), self.target_shape)
        # Add the batch dimension expected by model.predict
        mel_spectrogram = tf.reshape(mel_spectrogram, (1,) + self.target_shape + (1,))

        predictions = model.predict(mel_spectrogram)
        class_probabilities = predictions[0]
        predicted_class_index = np.argmax(class_probabilities)
        return class_probabilities, predicted_class_index

    def test(self):
        class_probabilities, predicted_class_index = self.test_audio(self.audio_file_path, self.model)

        for i, class_label in enumerate(self.classes):
            probability = class_probabilities[i]
            print(f'Class: {class_label}, Probability: {probability:.4f}')

        predicted_class = self.classes[predicted_class_index]
        # Softmax probability of the predicted class: a confidence for this clip,
        # not an accuracy measured over a test set
        confidence = class_probabilities[predicted_class_index]
        print(f'The audio is classified as: {predicted_class}')
        print(f'Confidence: {confidence:.4f}')


if __name__ == '__main__':
    # Assumed usage: pass the path of the wav file to classify as the first argument
    Test(sys.argv[1]).test()
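The model's softmax output is one probability per class, and the probability of the predicted class is a per-clip confidence rather than an accuracy over a test set. As an illustration only (rank_classes is a hypothetical helper and the probabilities below are made-up example values), the output can be turned into a ranked list of labels like this:

```python
from typing import List, Sequence, Tuple


def rank_classes(probabilities: Sequence[float], classes: List[str]) -> List[Tuple[str, float]]:
    """Pair each class name with its probability, most likely first."""
    if len(probabilities) != len(classes):
        raise ValueError("need exactly one probability per class")
    return sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)


# Made-up probabilities, in the same order as the classes list.
ranked = rank_classes([0.12, 0.88], ["techno", "classic"])
label, confidence = ranked[0]
print(f"{label} ({confidence:.2%})")
# classic (88.00%)
```

With more than two styles, the same ranking gives the top few candidates instead of a single label.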


