Sound scene classification method based on a multi-scale residual attention network
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a sound scene classification method based on a multi-scale residual attention network.
Background
Humans have an inherent ability to identify sound scenes: from experience, a listener can judge from audio alone the scene in which it was recorded, such as a subway or a bus. With the continuous development of signal processing and artificial intelligence technology, it has also become possible for machines to understand sound and judge its source. Acoustic scene classification (ASC) is a multi-class classification task that aims to identify, from an audio segment, the scene in which the audio was recorded. At present, sound scene classification is widely applied in fields such as intelligent wearable devices, audio archiving, interactive robots and security monitoring.
Sound scene classification methods mainly fall into two major categories. The first category comprises methods based on traditional machine learning, such as Gaussian mixture models, hidden Markov models and support vector machines, which suffer from low classification performance and poor generalization capability. The second category comprises methods based on deep learning, such as deep neural networks, convolutional neural networks and recurrent neural networks; however, these usually contain only convolution kernels of a single scale, so the mined features are not rich and comprehensive enough, and they do not consider that features of different regions have different importance.
Therefore, how to fully mine the data features and improve the accuracy of classification of sound scenes is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a sound scene classification method based on a multi-scale residual attention network, which aims to solve the problems in current sound scene classification tasks that the extracted features are of a single scale and not rich enough, and that different regions of the extracted features are not treated as having different importance.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a sound scene classification method based on a multi-scale residual attention network, comprising the steps of:
Step 1, collecting audio data, inputting the audio data into a feature extraction module for feature extraction, and extracting a logarithmic Mel spectrogram together with its first-order difference and second-order difference as input features;
Step 2, constructing a multi-scale residual attention network, and inputting the input features into the network for training to establish a classification model;
Step 3, processing the audio data with the mixup method to obtain data samples and enhance data diversity;
Step 4, inputting the data samples into the classification model for classification, and optimizing the classification model with a focal loss that focuses on hard-to-classify samples;
Step 5, acquiring new sound scene audio and inputting it into the optimized classification model to classify the sound scene, obtaining a sound scene classification result.
Preferably, the specific process of feature extraction in step 1 is as follows:
Step 1.1, performing pre-emphasis processing on the collected audio data so that the high-frequency and low-frequency parts of the sound signal are more balanced;
Step 1.2, framing the pre-emphasized audio data into a plurality of frames of audio signal;
Step 1.3, windowing each frame of the audio signal with a Hanning window function to obtain a short-time windowed signal;
Step 1.4, performing a Fourier transform on the short-time windowed signal to convert it from the time domain to the frequency domain, obtaining a frequency-domain signal;
Step 1.5, passing the obtained frequency-domain signal through a Mel filter bank to obtain a Mel spectrogram of suitable size;
Step 1.6, taking the logarithm of the Mel spectrogram to obtain a logarithmic Mel spectrogram;
Step 1.7, calculating the first-order difference and the second-order difference of the logarithmic Mel spectrogram to obtain the dynamic characteristics of the signal, and stacking the logarithmic Mel spectrogram with its first-order and second-order differences to obtain the final input features.
Preferably, in step 1 the frame overlap rate during framing is 50%, the number of FFT points in the Fourier transform is 2048, and the number of Mel filters is 128.
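Purely as an illustration and not part of the claimed method, this feature extraction can be sketched in Python with the librosa library; the sampling rate and pre-emphasis coefficient below are assumed values rather than values fixed by the method:
```python
# A minimal sketch of steps 1.1-1.7, assuming the librosa library; the sampling
# rate and pre-emphasis coefficient are illustrative choices, not values fixed
# by the method.
import numpy as np
import librosa

def extract_features(path, sr=44100, n_fft=2048, n_mels=128, preemph=0.97):
    y, sr = librosa.load(path, sr=sr)
    y = librosa.effects.preemphasis(y, coef=preemph)       # step 1.1: pre-emphasis
    mel = librosa.feature.melspectrogram(                  # steps 1.2-1.5: framing (50% overlap),
        y=y, sr=sr, n_fft=n_fft, hop_length=n_fft // 2,    # Hanning window, FFT,
        window="hann", n_mels=n_mels)                      # Mel filter bank
    logmel = librosa.power_to_db(mel)                      # step 1.6: logarithmic Mel spectrogram
    d1 = librosa.feature.delta(logmel, order=1)            # step 1.7: first-order difference
    d2 = librosa.feature.delta(logmel, order=2)            # second-order difference
    return np.stack([logmel, d1, d2], axis=0)              # stacked 3 x n_mels x frames feature
```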
Preferably, the specific process of the step 2 is as follows:
Step 2.1, dividing the input feature formed by the logarithmic Mel spectrogram and its first-order and second-order differences into a high-frequency part and a low-frequency part;
Step 2.2, respectively inputting the high-frequency part and the low-frequency part into a channel attention module of the multi-scale residual attention network, which assigns different weights according to the different importance of the features, highlighting important features and suppressing secondary features so as to generate new features;
Step 2.3, inputting the new features extracted by the channel attention module into a multi-scale residual module of the multi-scale residual attention network, and extracting feature information of different precision and different depth to obtain a high-frequency partial feature map and a low-frequency partial feature map;
Step 2.4, splicing the two partial feature maps obtained through the multi-scale residual module along the frequency dimension to obtain all the features;
Step 2.5, classifying all the features by passing them sequentially through a convolution block consisting of a batch normalization layer, a rectified linear unit and a 1×1 convolution layer, a convolution block consisting of a batch normalization layer and a 1×1 convolution layer, a batch normalization layer, a global average pooling layer and a softmax layer.
Preferably, the specific process of generating the new features through the channel attention module in step 2.2 includes:
Step 2.2.1, respectively performing maximum pooling and average pooling operations on the high-frequency and low-frequency input features to obtain two feature maps;
Step 2.2.2, respectively feeding the two feature maps obtained by pooling into a multi-layer perceptron to obtain two perception results;
Step 2.2.3, adding the two perception results obtained through the multi-layer perceptron to obtain a summed result;
Step 2.2.4, applying a sigmoid activation to the summed result to obtain the weight parameters of the input features;
Step 2.2.5, finally, multiplying the weight parameters with the input features to generate the new features.
Preferably, in step 2.3 the new features sequentially pass through a batch normalization layer and a convolution layer, then pass twice through the residual block Residual01 composed of convolution kernels of three different scales (1×1, 3×3 and 5×5), and then pass three times through a combined block composed of the residual block Residual02 (built from the same three convolution kernel scales together with maximum pooling, average pooling and zero padding) and the residual block Residual01, thereby obtaining the high-frequency partial feature map and the low-frequency partial feature map.
Preferably, the formulas for acquiring the data samples with the mixup method in step 3 are as follows:
x = λxi + (1-λ)xj
y = λyi + (1-λ)yj
wherein (xi, yi) and (xj, yj) are two samples selected at random from a training set divided from the collected audio data, xi and xj are the original input vectors, yi and yj are the corresponding label codes, and λ is a hyper-parameter with λ ∈ [0, 1].
Preferably, the acquired new sound scene audio is tested to obtain its true classification results, and the classification accuracy is calculated by comparing these with the sound scene classification results obtained in step 5; the classification model can then be further optimized and corrected according to the classification accuracy, thereby improving the classification accuracy.
Compared with the prior art, the invention discloses a sound scene classification method based on a multi-scale residual attention network. The method uses convolution kernels of several different scales to mine more detailed and global information, and combines an improved residual network structure to extract more semantic information at different levels. A channel attention mechanism is introduced to give the features on different channels different weights according to their importance, so that key features are learned, secondary features are suppressed, and the network's ability to learn features is enhanced. The mixup method is adopted to enhance data diversity, and a focal loss is adopted to focus on hard-to-classify samples, thereby improving the sound scene classification effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a sound scene classification method based on a multi-scale residual attention network provided by the invention;
FIG. 2 is a schematic diagram of a feature extraction process according to the present invention;
FIG. 3 is a schematic view of a channel attention module structure according to the present invention;
Fig. 4 is a schematic structural diagram of a multi-scale residual module provided by the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the present invention provides a sound scene classification method based on a multi-scale residual attention network, comprising the following steps:
S1, inputting the acquired audio data into a feature extraction module, and extracting a logarithmic Mel spectrogram and its first-order and second-order differences as input features, wherein the specific flow of the feature extraction module is shown in fig. 2 and comprises the following steps:
S1.1, performing pre-emphasis processing on the collected audio data so that the high-frequency and low-frequency parts of the sound signal are more balanced, the pre-emphasis formula being:
H(z) = 1 - a·z^(-1) (1)
wherein a is the pre-emphasis coefficient;
S1.2, framing the pre-emphasized audio data into a plurality of frames of audio signal;
S1.3, windowing each frame of the audio signal with a Hanning window function to obtain a short-time windowed signal, the Hanning window being given by:
w(n) = 0.5·[1 - cos(2πn/(N-1))], 0 ≤ n ≤ N-1 (2)
wherein N is the frame length in sampling points;
S1.4, performing a Fourier transform on the short-time windowed signal to convert it from the time domain to the frequency domain and obtain a frequency-domain signal, the transform being:
X(k) = ∑(n=0 to N-1) x(n)·e^(-j2πnk/N), 0 ≤ k ≤ N-1 (3)
wherein x(n) is a frame of the windowed signal and N is the number of FFT points;
S1.5, the frequency-domain signal obtained in the previous step is passed through a Mel filter bank to obtain a Mel spectrogram of suitable size, the center frequency of each triangular filter in the Mel filter bank being shown in formula (4) and the frequency response in formula (5):
f(m) = (N/fs)·M^(-1)(M(fl) + m·(M(fh) - M(fl))/(K+1)) (4)
Hm(k) = 0 for k < f(m-1); (k - f(m-1))/(f(m) - f(m-1)) for f(m-1) ≤ k ≤ f(m); (f(m+1) - k)/(f(m+1) - f(m)) for f(m) < k ≤ f(m+1); 0 for k > f(m+1) (5)
wherein f(m) is the center frequency of the m-th filter, fl and fh are the lower and upper frequency limits of the filter bank respectively, N is the number of FFT points, fs is the sampling frequency, K is the number of Mel filters, M(·) is the Mel-scale transform and M^(-1)(·) is its inverse, M(·) being defined as shown in formula (6):
M(f) = 2595·lg(1 + f/700) (6)
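Purely as an illustrative check and not part of the claimed method, formulas (4) and (6) can be evaluated with a few lines of Python; the sampling rate, FFT size and band limits below are assumed values:
```python
# A small numeric illustration of formulas (4) and (6), assuming Python/NumPy;
# the sampling rate, FFT size and frequency limits are illustrative values.
import numpy as np

def mel(f):                                    # formula (6): Mel-scale transform M(f)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):                                # inverse transform M^(-1)(m)
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def center_bins(K=128, f_low=0.0, f_high=22050.0, n_fft=2048, fs=44100.0):
    m = np.arange(1, K + 1)                    # filter index m = 1..K
    hz = mel_inv(mel(f_low) + m * (mel(f_high) - mel(f_low)) / (K + 1))
    return n_fft / fs * hz                     # formula (4): center frequencies in FFT bins

print(mel(1000.0))        # ~1000 mel at 1 kHz
print(center_bins()[:5])  # centers of the first few triangular filters
```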
S1.6, taking the logarithm of the Mel spectrogram to obtain a logarithmic Mel spectrogram;
S1.7, obtaining a first-order difference and a second-order difference of the logarithmic Mel spectrogram to obtain the dynamic characteristics of the voice signal, and stacking the logarithmic Mel spectrogram and the first-order and second-order differences thereof to obtain the final input characteristics;
S2, constructing a multi-scale residual attention network, and inputting the extracted input features into the multi-scale residual attention network for training to obtain a classification model;
S2.1, dividing the input feature formed by the logarithmic Mel spectrogram and its first-order and second-order differences into a high-frequency part and a low-frequency part;
S2.2, respectively inputting a high-frequency part and a low-frequency part into a channel attention module of the multi-scale residual attention network, distributing different weights according to different importance of the features, highlighting important features and suppressing secondary features, wherein the specific process can be expressed as formulas (7) and (8):
Mc(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) (7)
F' = Mc(F) ⊗ F (8)
wherein F is the input feature map of size (H×W×C), AvgPool(F) and MaxPool(F) are the average pooling and maximum pooling operations respectively, Mc(F) is the weight parameter, F' is the feature obtained through the channel attention module, σ represents the sigmoid function, and ⊗ represents the element-wise product operation;
The channel attention module structure is shown in fig. 3, and comprises the following steps:
S2.2.1, performing maximum pooling and average pooling operations on the input features respectively to obtain two feature maps;
S2.2.2, respectively feeding the two feature maps obtained by pooling into a multi-layer perceptron to obtain two perception results;
S2.2.3, adding the two perception results obtained by the multi-layer perceptron;
S2.2.4, applying a sigmoid activation to the summed result to obtain the weight parameters of the input features;
S2.2.5, finally, multiplying the weight parameters with the input features to generate the new features.
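Purely as an illustration and not part of the claimed method, the channel attention module of S2.2.1-S2.2.5 and formulas (7)-(8) can be sketched in PyTorch; the reduction ratio of the shared multi-layer perceptron is an assumed value:
```python
# A minimal sketch of the channel attention module, assuming PyTorch; the MLP
# reduction ratio is an illustrative assumption.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.mlp = nn.Sequential(               # shared multi-layer perceptron (S2.2.2)
            nn.Linear(channels, hidden),
            nn.ReLU(),
            nn.Linear(hidden, channels))

    def forward(self, f):                        # f: (batch, C, H, W) input feature map F
        avg = self.mlp(f.mean(dim=(2, 3)))       # S2.2.1-S2.2.2: average pooling + MLP
        mx = self.mlp(f.amax(dim=(2, 3)))        # S2.2.1-S2.2.2: maximum pooling + MLP
        mc = torch.sigmoid(avg + mx)             # S2.2.3-S2.2.4: Mc(F), formula (7)
        return f * mc[:, :, None, None]          # S2.2.5: F' = Mc(F) ⊗ F, formula (8)
```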
S2.3, inputting the new features generated by the channel attention module into a Multi-Scale Residual Module (MSRM) of the multi-scale residual attention network, and extracting feature information of different precision and different depth to obtain a high-frequency partial feature map and a low-frequency partial feature map, wherein the structure of the multi-scale residual module is shown in fig. 4 and the steps are as follows:
S2.3.1, passing the generated new features through a batch normalization layer (Batch Normalization, BN) and a convolution layer;
S2.3.2, passing twice through the residual block Residual01 composed of convolution kernels of three different scales of 1×1, 3×3 and 5×5, as sketched below;
S2.3.3, passing three times through a combined block consisting of the residual block Residual02 (composed of convolution kernels of three different scales of 1×1, 3×3 and 5×5 together with maximum pooling, average pooling and zero padding) and the residual block Residual01;
S2.3.4, finally obtaining the high-frequency partial feature map and the low-frequency partial feature map.
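As an illustrative sketch only, the Residual01 block of S2.3.2 could be implemented in PyTorch as below; the text specifies parallel 1×1, 3×3 and 5×5 convolution kernels with a residual connection, while fusing the three branches by concatenation followed by a 1×1 convolution, and the BN/ReLU placement, are assumptions:
```python
# A sketch of the Residual01 building block, assuming PyTorch; branch fusion by
# concatenation + 1x1 convolution is an assumption, not specified in the text.
import torch
import torch.nn as nn

class Residual01(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList([          # parallel 1x1, 3x3 and 5x5 convolutions
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5)])
        self.fuse = nn.Sequential(               # fuse multi-scale branches back to `channels`
            nn.Conv2d(3 * channels, channels, 1), nn.BatchNorm2d(channels), nn.ReLU())

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)  # multi-scale feature maps
        return x + self.fuse(multi)                               # residual connection
```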
S2.4, splicing the two partial feature maps obtained through the multi-scale residual module along the frequency dimension to obtain all the features;
S2.5, classifying all the features by passing them sequentially through a convolution block consisting of a BN layer, a rectified linear unit (Rectified Linear Unit, ReLU) and a 1×1 convolution layer, a convolution block consisting of a BN layer and a 1×1 convolution layer, a BN layer, a global average pooling layer and a softmax layer;
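As a structural illustration only, the overall flow of S2.1-S2.5 can be wired together in PyTorch as sketched below; the branch modules are supplied from outside (for example the ChannelAttention and Residual01 sketches above), and the midpoint frequency split, the branch output channel count and the 10-class output are assumptions:
```python
# A structural sketch of the multi-scale residual attention network, assuming
# PyTorch; the classification head follows the S2.5 description, everything
# else is illustrative.
import torch
import torch.nn as nn

class MultiScaleResAttentionNet(nn.Module):
    def __init__(self, high_branch, low_branch, branch_out_ch, n_classes=10):
        super().__init__()
        self.high_branch = high_branch           # channel attention + multi-scale residual module
        self.low_branch = low_branch
        self.head = nn.Sequential(               # S2.5: (BN, ReLU, 1x1 conv), (BN, 1x1 conv),
            nn.BatchNorm2d(branch_out_ch), nn.ReLU(),          # BN, global average pooling
            nn.Conv2d(branch_out_ch, branch_out_ch, 1),
            nn.BatchNorm2d(branch_out_ch), nn.Conv2d(branch_out_ch, n_classes, 1),
            nn.BatchNorm2d(n_classes), nn.AdaptiveAvgPool2d(1))

    def forward(self, x):                        # x: (batch, 3, n_mels, frames)
        mid = x.size(2) // 2
        low, high = x[:, :, :mid], x[:, :, mid:]               # S2.1: frequency-band split
        feats = torch.cat([self.high_branch(high),
                           self.low_branch(low)], dim=2)       # S2.2-S2.4: per-band processing, concat
        return torch.softmax(self.head(feats).flatten(1), dim=1)  # S2.5: softmax classification

# Example wiring (illustrative): one attention + residual branch per frequency band.
# net = MultiScaleResAttentionNet(
#     nn.Sequential(ChannelAttention(3), Residual01(3)),
#     nn.Sequential(ChannelAttention(3), Residual01(3)), branch_out_ch=3)
```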
S3, enhancing data diversity by adopting the mixup method, wherein mixup can be specifically expressed as:
x = λxi + (1-λ)xj (9)
y = λyi + (1-λ)yj (10)
wherein (xi, yi) and (xj, yj) are two samples selected at random from a training set divided from the collected audio data, xi and xj are the original input vectors, yi and yj are the corresponding label codes, and λ is a hyper-parameter with λ ∈ [0, 1] that controls the degree of mixing of the two random samples;
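Purely as an illustration, mixup per formulas (9) and (10) can be implemented in PyTorch as below; drawing λ from a Beta distribution and pairing samples by a random permutation of the batch are common practice and are assumptions here:
```python
# A minimal sketch of mixup, assuming PyTorch tensors and one-hot label codes;
# the Beta(alpha, alpha) sampling of lambda is an assumption.
import torch

def mixup(x, y, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # λ ∈ [0, 1]
    perm = torch.randperm(x.size(0))                              # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]                         # formula (9)
    y_mix = lam * y + (1 - lam) * y[perm]                         # formula (10)
    return x_mix, y_mix
```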
S4, adopting a focal loss to focus training on hard-to-classify samples, the focal loss function being specifically expressed as:
FL = -∑(i=1 to n) α·yi·(1 - pi)^λ·log(pi) (11)
wherein n represents the number of categories, yi represents the true label code of the i-th category, pi represents the probability that the sample is predicted as the i-th category, α is a weight factor, and λ is a hyper-parameter;
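As an illustration only, a per-batch focal loss consistent with formula (11) can be sketched in PyTorch as follows; integer class targets, softmax outputs, and the default values of α and the focusing exponent (written gamma in the code, λ in the text above) are assumptions:
```python
# A minimal sketch of the focal loss, assuming PyTorch logits and integer
# class targets; alpha and gamma defaults are illustrative.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    log_p = F.log_softmax(logits, dim=1)                        # log class probabilities
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)   # log p of the true class
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()       # down-weights easy samples
```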
S5, acquiring new sound scene audio, and performing sound scene classification on it with the trained classification model to obtain a sound scene classification result.
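Purely for illustration, the inference of S5 could be written as below, reusing the extract_features sketch given earlier; the class-name list and its ordering are assumptions that must match the label encoding used during training:
```python
# A minimal inference sketch for S5, assuming PyTorch and the extract_features
# sketch above; the scene label list and its order are assumptions.
import torch

SCENE_LABELS = ["airport", "bus", "metro", "metro_station", "park",
                "public_square", "shopping_mall", "street_pedestrian",
                "street_traffic", "tram"]

def classify_scene(model, wav_path):
    feats = torch.from_numpy(extract_features(wav_path)).float().unsqueeze(0)  # (1, 3, mels, frames)
    model.eval()
    with torch.no_grad():
        probs = model(feats)                     # class probabilities from the softmax layer
    return SCENE_LABELS[int(probs.argmax(dim=1))]
```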
Examples
Sound scene classification is performed using the public dataset TAU Urban Acoustic Scenes 2020 Mobile Development dataset (TAU for short) from Task 1A of the DCASE2020 challenge (Detection and Classification of Acoustic Scenes and Events 2020). The dataset contains recordings of 10 different acoustic scenes recorded by 9 different devices in 10 European cities. The 10 sound scenes are: airport, shopping mall, metro station, street pedestrian, public square, street traffic, tram, bus, metro and park. The experiment adopts the classification accuracy over the sound scene categories as the criterion for judging the model; the training set is used for training the model parameters, and the test set is used for comparing the performance of the models. The experimental results are shown as the sound scene category classification results in Table 1:
TABLE 1 Sound scene class classification results
From the experimental results, the performance of the proposed multi-scale residual attention network is obviously better than that of a DCASE2020 Task1A baseline system.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the other embodiments, and identical or similar parts between the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.