CN115132223B - A method and system for enhancing audio data annotation accuracy based on time-frequency enhancement - Google Patents

A method and system for enhancing audio data annotation accuracy based on time-frequency enhancement

Info

Publication number
CN115132223B
CN115132223B
Authority
CN
China
Prior art keywords
audio data
spectrogram
audio
mel
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210748518.8A
Other languages
Chinese (zh)
Other versions
CN115132223A (en)
Inventor
刘海
张昭理
何嘉文
刘俊强
王书通
王坤
刘婷婷
杨兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University
Central China Normal University
Original Assignee
Hubei University
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University and Central China Normal University
Priority to CN202210748518.8A
Publication of CN115132223A
Application granted
Publication of CN115132223B
Legal status: Active
Anticipated expiration


Abstract


The present invention discloses a method for enhancing the accuracy of audio data annotation based on time-frequency enhancement. The method includes: a data collection process, in which the teacher's teaching audio is collected and the original signal in the teaching audio is converted into a mel spectrogram via a linear spectrogram as the transition quantity; a data enhancement process, in which the VoiceAugment audio data enhancement algorithm performs data enhancement on the input teaching audio to be annotated, the enhancement technique consisting of a frequency channel masking block and a time frame masking block that strengthen the characteristic attributes of the teaching audio; and an automatic annotation process, in which the ANNA model automatically annotates the teaching audio, the ANNA model consisting of modules for waveform-spectrogram feature acquisition, sound-spectrogram feature acquisition, feature fusion, and emotion annotation. The present invention realizes automatic annotation of the teacher's teaching emotion, improves the annotation speed and accuracy for teaching audio, overcomes the time-consuming and labor-intensive defects of manual annotation, and provides more accurate data labels for services such as emotion prediction in the teacher's classroom.

Description

Audio data annotation precision enhancement method and system based on time-frequency enhancement
Technical Field
The invention relates to the technical field of audio mode recognition, in particular to a method and a system for enhancing audio data annotation precision based on time-frequency enhancement.
Background
With the rapid development of artificial intelligence technology, artificial intelligence appears more and more throughout the education industry, and its application in the field of education and teaching has become a trend. Artificial intelligence plays a vital role in processing the various data produced during education and teaching, such as the audio data of teachers' classroom teaching; such data is in turn vital to the development of artificial intelligence in the education field, since that development cannot be separated from large amounts of data.
In recent years, most education and teaching has taken place online, so the daily data volume is huge and the data itself is quite complicated, which hinders analysis with artificial intelligence technology. In order to later use artificial intelligence to adjust and improve teachers' teaching strategies in a targeted way, an automatic labeling technology is currently needed to automatically label the audio data generated during education and teaching.
Disclosure of Invention
Aiming at at least one defect or improvement requirement of the prior art, the invention provides an automatic teaching emotion labeling method that performs data enhancement on the audio data of teachers' teaching collected during the education process. The method realizes enhancement of this audio data and full utilization of the audio information, and aims to improve the effectiveness of the teaching audio data as well as the speed and accuracy with which it is labeled.
In order to achieve the above object, in a first aspect, the present invention provides a method for enhancing audio data annotation accuracy based on time-frequency enhancement, comprising the steps of:
collecting first audio data, and converting the first audio data into a Mel spectrogram through transition quantity comprising a linear spectrogram;
inputting the Mel spectrogram into an audio attribute labeling model to obtain a spectrogram and a logarithmic Mel spectrogram respectively;
Processing the spectrogram and the logarithmic mel spectrogram by using an audio data enhancement method to obtain enhanced first audio data;
And inputting the enhanced first audio data into the audio attribute labeling model to obtain an attribute tag classification result of the first audio data.
Further, the audio data enhancement method specifically comprises the following steps:
Adopting mixed masking based on small batches to carry out mean mixing on the masking region of the hiding state of the first audio data and the masking region of the hiding state of the second audio data so as to obtain the masking region of the enhanced hiding state of the first audio data; the masking regions include a frequency channel masking region and a time frame masking region.
Further, converting the first audio data into a mel spectrogram through a transition quantity comprising a linear spectrogram specifically includes:
performing framing windowing and fast fourier transform operations on the first audio data to obtain the linear spectrogram;
applying a mel filter bank to the linear spectrogram to obtain a corresponding filter output;
And applying the filter output to an energy spectrum to obtain the Mel spectrogram with the attribute tag.
Further, inputting the mel spectrogram into the audio attribute labeling model to obtain a spectrogram specifically includes:
inputting the Mel spectrogram into a first network structure of the audio attribute labeling model to obtain a corresponding time-frequency-invariant spectrogram feature vector;
The main architecture of the first network structure is CNN14, which applies a one-dimensional convolutional neural network to the time-domain waveform; the first network structure comprises a one-dimensional convolution input layer, a plurality of one-dimensional convolution blocks and a one-dimensional convolution output layer; each one-dimensional convolution block is composed of two convolution layers, and the layer after every two convolution layers is a downsampling layer with a preset stride.
Further, inputting the mel-spectrogram into the audio attribute labeling model to obtain a logarithmic mel-spectrogram specifically includes:
Inputting the Mel spectrogram into a second network structure of the audio attribute labeling model to obtain a corresponding logarithmic Mel spectrogram characteristic vector with unchanged time frequency; the second network structure includes a two-dimensional convolution block.
Further, the method further comprises the steps of:
and inputting the spectrum graph feature vector and the logarithmic Mel spectrogram feature vector into the audio attribute labeling model, and obtaining an attribute tag classification result of the first audio data through feature connection operation and classification operation.
Further, the masking degree is set to a value in [10%, 25%].
In a second aspect, the present invention provides an audio data labeling accuracy enhancement system based on time-frequency enhancement, including:
The acquisition module is used for acquiring first audio data and converting the first audio data into a Mel spectrogram through transition quantity comprising a linear spectrogram;
The conversion module is used for inputting the Mel spectrogram into an audio attribute labeling model to respectively obtain a spectrogram and a logarithmic Mel spectrogram;
The enhancement module is used for processing the spectrogram and the logarithmic mel spectrogram by using an audio data enhancement method to obtain enhanced first audio data;
And the classification module is used for inputting the enhanced first audio data into the audio attribute labeling model to obtain an attribute tag classification result of the first audio data.
In a third aspect, the present invention provides an electronic device comprising at least one processing unit, and at least one storage unit, wherein the storage unit stores a computer program which, when executed by the processing unit, enables the processing unit to perform the steps of any one of the methods described above.
In a fourth aspect, the present invention provides a storage medium storing a computer program executable by an access authentication device, the computer program enabling the access authentication device to carry out the steps of any one of the methods described above when the computer program is run on the access authentication device.
In general, the above technical solutions conceived by the present invention, compared with the prior art, enable the following beneficial effects to be obtained:
(1) The audio data labeling accuracy enhancement method based on time-frequency enhancement uses waveform information and mel spectrum information simultaneously for automatic teaching emotion labeling. A novel data enhancement method processes the teaching audio data; the model then accurately extracts the spectrogram features and the logarithmic mel spectrum features and connects them, which reduces erroneous labeling and is of great significance for improving classroom education evaluation.
(2) The invention enhances data in both the input space and the hidden space of the deep neural network, enhancing the input and intermediate features with the frequency channel masking block and time frame masking block techniques. This lets the model focus both on the most discriminative parts of the features and on the features as a whole, improving generalization. Moreover, the spectrogram and the logarithmic mel spectrogram are combined into a new representation inside the model: the time-frequency-domain spectrogram and the frequency-domain logarithmic mel spectrogram are fused into a time-frequency representation, providing more reference information for the final automatic teaching emotion labeling and thereby improving its accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an audio data labeling accuracy enhancement method based on time-frequency enhancement according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a scenario of audio data acquisition in a teacher teaching environment according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a network structure of an automatic labeling model for teaching emotion provided by an embodiment of the present invention;
fig. 4 is a block schematic diagram of an electronic device suitable for implementing the method described above according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The terms first, second, third and the like in the description, in the claims and in the above drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" or "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Automatic labeling of teaching emotion requires labeling and classifying the audio of teachers' teaching, and classifying the teaching audio according to emotion benefits the efficient development of follow-up work. For the provided teaching audio data, appropriate data enhancement can improve the balance of the data and thereby the accuracy of automatic teaching emotion labeling. Therefore, after the provided teaching audio data is enhanced, the waveform and the sound spectrum of the enhanced audio are used to label it automatically, which improves the utilization of data from the teaching process. Compared with manually labeling the teaching audio, automatic labeling of teaching emotion with data enhancement greatly increases the speed of organizing data labels, saving time, labor and money.
As shown in fig. 1, in one embodiment, a method for enhancing the accuracy of audio data labeling based on time-frequency enhancement mainly includes the following major flow steps.
S1, a data acquisition part.
The audio data may be any section of voice data to be marked, and in this embodiment, the audio data is audio data of teacher teaching in the education and teaching process, that is, recording data of teaching sites, and fig. 2 is a schematic view of a scene of audio data acquisition in the teacher teaching environment. In an alternative embodiment, the collected and acquired audio data of the teacher teaching are first preprocessed as required, specifically:
Preprocessing the input teaching audio data by pre-emphasis, framing and windowing. In a specific example, a Hamming window is adopted as the window function of the windowing operation with a window size of 1024, the framed teaching audio is 100 frames per second, and the frames are then processed with the short-time Fourier transform, which is equivalent to applying a fast Fourier transform to each frame of the teaching audio, yielding a linear spectrogram. The fast Fourier transform has a parameter N indicating how many points are transformed; if the number of points in a frame is smaller than N, the frame is zero-padded. Each point corresponds to a frequency bin, and the frequency represented by point n is f_n = (n-1) * f_s / N, where f_s is the sampling rate of the audio.
Then a bank of 64 mel filters is applied to the linear spectrogram to obtain the filter outputs H_m(k), where m runs from 1 to M, M is the number of filters in the bank (64 here), k is the frequency-point index, and H_m(k) is the output value of the m-th mel filter at point k (the original filter expression is given only as a figure; the standard triangular mel filter is meant). The filter outputs are then applied to the energy spectrum to obtain the mel spectrogram:

MelSpec(m) = Σ_k |X(k)|² · H_m(k)

where |X(k)|² is the energy of the k-th point of the energy spectrum. The output of each filter over its frequency range is used as a weight and multiplied with the energy at the corresponding frequency of the energy spectrum, and the products are finally accumulated, i.e. a weighted sum is computed; MelSpec(m) is the mel spectrogram.
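For illustration, the following is a minimal Python sketch of this collection flow (pre-emphasis, framing and windowing, STFT, 64-band mel filter bank, weighted sum over the energy spectrum). The use of librosa, the 32 kHz sampling rate, and the pre-emphasis coefficient 0.97 are assumptions rather than details from the text; the hop length sr // 100 matches the 100 frames per second stated above.

```python
# Minimal sketch of the S1 data-collection flow; librosa, the sampling rate
# and the pre-emphasis coefficient are assumptions, not taken from the patent.
import numpy as np
import librosa

def mel_spectrogram(path, sr=32000, n_fft=1024, n_mels=64):
    y, sr = librosa.load(path, sr=sr)
    # Pre-emphasis (0.97 is a common default; the patent does not give a value)
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])
    hop = sr // 100                       # 100 frames per second, as in the text
    # Framing + windowing (Hamming window of size 1024) + per-frame FFT = STFT
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop, window="hamming")
    energy = np.abs(S) ** 2               # energy spectrum |X(k)|^2
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # H_m(k)
    return mel_fb @ energy                # MelSpec(m) = sum_k |X(k)|^2 * H_m(k)
```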
Teaching audio files with abnormalities (for example, teaching audio clips with empty content) are cleaned out.
S2, a data enhancement part.
In this embodiment, the VoiceAugment audio data enhancement method is used to enhance the input teaching audio data.
The VoiceAugment audio data enhancement method comprises a frequency channel masking block and a time frame masking block, and is processed by adopting mixed masking based on small batches.
Let x ∈ R^(T×F) denote an intermediate hidden state of the input data, where T and F represent the number of time frames and frequency channels, respectively (the symbol x is reconstructed here; the original expression is rendered as a figure). Applying time frame masking to the consecutive time frames [t0, t0+t] means that the covered elements are replaced with zero or another value, where t is a value selected from a uniform distribution on [0, t'] for a time masking parameter t', and t0 is a value selected from [0, T-t]. Similarly, frequency channel masking is employed for the consecutive frequency channels [f0, f0+f], where f is a value first selected from a uniform distribution on [0, f'] for a frequency masking parameter f', and f0 is a value selected from [0, F-f]. To simplify the process, for one teaching audio sample the same time masking and frequency masking, i.e. the same t0, t, f0 and f, are used for every intermediate hidden state of the same layer in the sample.
In one specific example, there are C feature maps at a given layer of the CNN (convolutional neural network) model; these C feature maps then share the same time and frequency regions in one training iteration when masking a single teaching audio sample.
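For illustration, a minimal sketch of the two masking blocks applied to one hidden state is given below, assuming a NumPy array of shape (T, F) and a zero fill value; the function name and parameters are hypothetical.

```python
# Minimal sketch of the time frame masking block and frequency channel masking
# block on a hidden state x of shape (T, F); NumPy and zero-filling are assumptions.
import numpy as np

def mask_time_and_freq(x, t_param, f_param, rng=None):
    rng = rng or np.random.default_rng()
    T, F = x.shape
    x = x.copy()
    t = int(rng.integers(0, t_param + 1))   # t ~ U[0, t']
    t0 = int(rng.integers(0, T - t + 1))    # t0 ~ U[0, T - t]
    x[t0:t0 + t, :] = 0.0                   # mask t consecutive time frames
    f = int(rng.integers(0, f_param + 1))   # f ~ U[0, f']
    f0 = int(rng.integers(0, F - f + 1))    # f0 ~ U[0, F - f]
    x[:, f0:f0 + f] = 0.0                   # mask f consecutive frequency channels
    return x
```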
For example: to enhance the hidden state of a given layer of the selected teaching audio sample (i.e. the first audio data mentioned in the claims), another teaching audio sample in the same mini-batch is selected at random (i.e. the second audio data mentioned in the claims), and masking is performed with the hidden state of the same layer of that randomly selected sample. The mini-batch-based mixed-masking data enhancement method processes the masking regions of the hidden states of the two selected teaching audio samples by mean mixing.
The specific contents of the data enhancement algorithm based on the small-batch mixed masking are as follows:
input: hidden state of audio sample for selected teacher teachingHidden state of audio sample of another teacher teaching in the same small lotThe number of consecutive time frames t, the index of start time t0, the number of consecutive frequency channels f, the index of start frequency f0.
And (3) outputting: hidden state of audio sample enhancement of selected teacher teaching
The more specific flow is as follows:
Initializing the value of x' to x;
When i epsilon [ t0,t0 +t ], j epsilon [0,F ], calculating x' [ i, j ]:
When i epsilon [0, T ], j epsilon [ f0,f0 +f ], x' i, j is calculated:
and returning the value of x' at the moment to obtain a final result.
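A minimal Python sketch of this mean-mixing flow follows, assuming NumPy arrays of shape (T, F) for the two hidden states and half-open slice ranges; function and variable names are hypothetical.

```python
# Minimal sketch of mini-batch mixed masking: inside the masked regions, the
# selected sample's hidden state x is mean-mixed with the hidden state
# x_other of another sample from the same mini-batch.
import numpy as np

def mix_mask(x, x_other, t0, t, f0, f):
    x_out = x.copy()                       # initialize x' to x
    # Time-frame region: i in [t0, t0 + t), all frequency channels j in [0, F)
    x_out[t0:t0 + t, :] = (x[t0:t0 + t, :] + x_other[t0:t0 + t, :]) / 2.0
    # Frequency-channel region: all time frames i in [0, T), j in [f0, f0 + f)
    x_out[:, f0:f0 + f] = (x[:, f0:f0 + f] + x_other[:, f0:f0 + f]) / 2.0
    return x_out                           # enhanced hidden state x'
```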
In a specific example, the audio data of the teacher teaching is substituted as a selection sample into a data enhancement algorithm based on a small batch of mixed masking, so that the input space and the hidden space of the audio data of the teacher teaching can be subjected to data enhancement, and the audio data of the teacher teaching after the data enhancement can be obtained. The specific process is as follows:
downsampling the input teaching audio data to 22.05 kHz;
applying a short-time Fourier transform with a window size w of 2048 and an overlap rate o of 25% to the teaching audio;
then applying a mel-scale filter bank to the teaching audio;
generating 256 filter bins and 43 frames of data per second;
normalizing the input teaching audio data to zero mean;
The Adam optimizer is used for a total of 350 epochs with an initial learning rate of 1 × 10⁻⁴;
The learning rate decays linearly from epoch 50 until it reaches 5 × 10⁻⁶ at epoch 250, after which training continues for another 100 epochs at the minimum learning rate of 5 × 10⁻⁶ (a sketch of this schedule follows this list);
The hyperparameters in the experiment are t0 = 43 and f0 = 26, so that approximately 10% of the time frames and frequency channels are masked.
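For illustration, a minimal sketch of this optimizer schedule using PyTorch's Adam and a LambdaLR scheduler; the placeholder model and the reading of "stage" as epoch are assumptions.

```python
# Minimal sketch of the training schedule: lr = 1e-4 for epochs 0-49, linear
# decay to 5e-6 by epoch 250, then constant 5e-6 until epoch 350.
import torch

model = torch.nn.Linear(64, 4)                       # placeholder model
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def lr_factor(epoch, floor=5e-6 / 1e-4):             # floor as fraction of base lr
    if epoch < 50:
        return 1.0
    if epoch < 250:
        return 1.0 - (1.0 - floor) * (epoch - 50) / 200.0   # linear decay
    return floor

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lr_factor)
for epoch in range(350):
    # ... one epoch of mini-batch training (zero_grad/backward/opt.step) here ...
    sched.step()                                     # advance the schedule per epoch
```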
Because the time frames and the frequency channels are masked with the corresponding parts of other teachers' teaching audio, interference is introduced that trains the discrimination ability of the network. Mini-batch-based mixed masking is a natural algorithm: it preserves the entire sample information of the selected teaching audio, while the interference from the time frames and frequency channels of other teaching audio samples enhances robustness. The degree of masking has a significant impact on the effectiveness of the data enhancement; too much masking may cause the original teaching audio data to lose a significant amount of information and severely confuse the training of the model. The data enhancement performance is best in the range [10%, 25%]; the masking employed in the example is 10%, which is sufficient to direct the data enhancement algorithm to focus on low-resolution time frames and frequency channels and to increase robustness.
S3, automatically labeling the part.
Fig. 3 is a schematic diagram of the network structure of the teaching emotion automatic labeling model (ANNA model) provided in this embodiment. As shown in fig. 3, the teaching emotion automatic labeling model (i.e. the audio attribute labeling model mentioned in the claims; emotions such as happiness and sadness are in fact attributes of the audio) includes a first network structure and a second network structure; the first network structure obtains a spectrogram from the input mel spectrogram, and the second network structure obtains a logarithmic mel spectrogram from the input mel spectrogram.
And the model connects and predicts the spectrum graph characteristics output by the first network structure and the logarithmic Mel spectrum graph characteristics output by the second network structure, and finally obtains the emotion label classification condition of the audio of the teacher teaching.
In practice, the above audio data enhancement method may also be used to process the spectral features and log mel features to obtain enhanced first audio data; and then, inputting the enhanced first audio data into the automatic teaching emotion labeling model, and obtaining an attribute tag classification result (emotion tag classification condition) of the first audio data.
In this embodiment, the first network structure is a one-dimensional convolutional neural network, which is used to extract a spectrogram of time-frequency information.
In one specific example, CNN14 is used as the primary architecture for the first network structure.
The one-dimensional CNN is applied to the time-domain waveform.
The one-dimensional CNN starts with a convolution layer whose filter size w is 11 and whose stride s is 5, so that the input size is reduced: this operation shortens the input by a factor of 5, reducing memory use and improving overall efficiency.
Then follow 3 convolution blocks, each consisting of two convolution layers; this design aims to enlarge the receptive field of the convolution layers.
The layer after each convolution block is a downsampling layer with a stride s of 4.
Through the stride-5 input layer and the three stride-4 downsampling layers (a total reduction factor of 5 × 4 × 4 × 4 = 320), 32 kHz teaching audio is downsampled to features at 100 frames per second (32000 / 320 = 100).
The output size of the one-dimensional CNN is T × C, where T is the number of frames and C is the number of channels;
by dividing the C channels into C/f groups, each of which carries a frequency dimension f that allows the final spectrogram to learn frequency information, the output can be reshaped into a tensor of size T × (C/f) × f, i.e., the final output spectrogram.
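A minimal PyTorch sketch of this first network structure follows; only the kernel size 11, stride 5, the three two-layer convolution blocks, the stride-4 downsampling and the channel-splitting reshape come from the text, while channel counts, padding, batch normalization and ReLU activations are assumptions.

```python
# Minimal PyTorch sketch of the first network structure (1-D CNN over the
# waveform); channel count and the per-group frequency size f are assumptions.
import torch
import torch.nn as nn

class ConvBlock1d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(ch, ch, kernel_size=3, padding=1), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.Conv1d(ch, ch, kernel_size=3, padding=1), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.MaxPool1d(kernel_size=4),            # downsampling layer, stride 4
        )

    def forward(self, x):
        return self.net(x)

class WaveformBranch(nn.Module):
    def __init__(self, ch=64, f=8):                 # f = per-group frequency size
        super().__init__()
        self.f = f
        self.inp = nn.Conv1d(1, ch, kernel_size=11, stride=5, padding=5)
        self.blocks = nn.Sequential(ConvBlock1d(ch), ConvBlock1d(ch), ConvBlock1d(ch))

    def forward(self, wav):                         # wav: (batch, 1, samples)
        x = self.blocks(self.inp(wav))              # (batch, C, T); 5*4*4*4 = 320x
        b, c, t = x.shape                           # 32 kHz in -> 100 frames/s out
        return x.view(b, c // self.f, self.f, t)    # split C into C/f groups of f
```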
The first network structure obtains a spectrogram by using the input Mel spectrogram, and the flow comprises the following steps:
Obtaining the original teaching audio data, performing framing, windowing and fast Fourier transformation on the audio data to obtain a linear spectrogram, then applying a mel filter to the linear spectrogram and taking the logarithm to obtain a mel spectrogram with emotion labels;
and inputting the Mel spectrogram into a first network structure, and obtaining a corresponding time-frequency-invariant spectrogram feature vector through the network structure.
In this embodiment, a neural network extracts a time-frequency representation of the waveform, and the frequency information the network lacks is learned within the one-dimensional convolutional neural network, so that a new time-frequency transformation, i.e. the spectrogram feature vector, is learned.
In this embodiment, the second network structure is a two-dimensional convolutional neural network used to extract the logarithmic mel spectrogram.
The second network structure obtains a logarithmic mel spectrogram by using the input mel spectrogram, and the flow comprises the following steps:
Obtaining audio data of teacher teaching, carrying out framing windowing and fast Fourier transformation on the audio data to obtain a linear spectrogram, then applying a Mel filter on the linear spectrogram, and then taking logarithm to obtain a Mel spectrogram with emotion labels;
And inputting the Mel spectrogram into a second network structure, and obtaining a corresponding logarithmic Mel spectrogram characteristic vector with unchanged time frequency through the network structure.
Operating on the input data using a two-dimensional convolution block;
generating a corresponding characteristic diagram through the processing of the second network structure;
The automatic labeling model of teaching emotion further comprises a connecting network and a classifying network.
The connecting network is respectively connected with the output sides of the first network structure and the second network structure and is used for connecting the spectrum diagram characteristic output by the first network structure and the logarithmic mel-sound spectrum characteristic output by the second network structure to obtain the connecting characteristic.
In a specific example, the classification network may sequentially employ a ReLU classifier and a Sigmoid classifier, which generate the emotion label classification results of the teaching audio from the connection features.
The final network structure of the model is shown in fig. 3, and includes:
12 convolutional layers with kernel size 3 × 3 and 2 fully-connected layers.
Every 2 convolutional layers are followed by an average pooling layer of size 2 × 2 for downsampling.
After the last convolutional layer, global pooling aggregates the feature map into a fixed-length vector.
An additional fully-connected layer after the global pooling layer extracts embedded features, further improving the representation capability of the features.
Linear classifiers (ReLU and Sigmoid) are applied to the embedded features to implement the classification labeling task.
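A minimal PyTorch sketch of this second network structure and the labeling head follows; the 12 3×3 convolution layers grouped in pairs, the 2×2 average pooling, the global pooling, the extra embedding layer and the ReLU/Sigmoid classifiers come from the text, while channel widths, feature dimensions and the number of emotion labels are assumptions.

```python
# Minimal sketch of the log-mel branch and the labeling head; channel widths
# and label count are assumptions, the layer layout follows the text.
import torch
import torch.nn as nn

def conv_pair(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(),
        nn.AvgPool2d(2),                     # 2x2 average pooling for downsampling
    )

class LogMelBranch(nn.Module):
    def __init__(self, widths=(64, 128, 256, 512, 512, 512)):  # 6 pairs = 12 convs
        super().__init__()
        chans = (1,) + widths
        self.blocks = nn.Sequential(*[conv_pair(a, b) for a, b in zip(chans, chans[1:])])
        self.embed = nn.Linear(widths[-1], 512)   # extra FC layer for embeddings

    def forward(self, logmel):                    # logmel: (batch, 1, mels, frames)
        x = self.blocks(logmel)
        x = x.mean(dim=(2, 3))                    # global pooling -> fixed-length vector
        return self.embed(x)

class LabelHead(nn.Module):
    def __init__(self, dim_wave=512, dim_mel=512, n_labels=4):
        super().__init__()
        self.fc1 = nn.Linear(dim_wave + dim_mel, 512)   # first of the 2 FC layers
        self.fc2 = nn.Linear(512, n_labels)             # second FC layer

    def forward(self, wave_feat, mel_feat):
        z = torch.cat([wave_feat, mel_feat], dim=-1)    # feature connection
        return torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))  # ReLU then Sigmoid
```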
After the automatically marked emotion of the teacher teaching audio is obtained, the emotion of the teacher can be further analyzed, so that the teaching capability of the teacher is improved.
In one embodiment, an audio data annotation accuracy enhancement system based on time-frequency enhancement mainly comprises:
The acquisition module is used for acquiring first audio data and converting the first audio data into a Mel spectrogram through transition quantity comprising a linear spectrogram;
The conversion module is used for inputting the Mel spectrogram into an audio attribute labeling model to respectively obtain a spectrogram and a logarithmic Mel spectrogram;
The enhancement module is used for processing the spectrogram and the logarithmic mel spectrogram by using an audio data enhancement method to obtain enhanced first audio data;
And the classification module is used for inputting the enhanced first audio data into the audio attribute labeling model to obtain an attribute tag classification result of the first audio data.
Fig. 4 schematically shows a block diagram of an electronic device adapted to implement the method described above, according to an embodiment of the invention. The electronic device shown in fig. 4 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the invention.
As shown in fig. 4, the electronic device 1000 described in the present embodiment includes: a processor 1001 which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The processor 1001 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 1001 may also include on-board memory for caching purposes. The processor 1001 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 1003, various programs and data required for the operation of the system 1000 are stored. The processor 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. The processor 1001 performs various operations of the method flow according to the embodiment of the present disclosure by executing programs in the ROM 1002 and/or the RAM 1003. Note that the program may be stored in one or more memories other than the ROM 1002 and the RAM 1003. The processor 1001 may also perform various operations of the method flow according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the disclosure, the electronic device 1000 may also include an input/output (I/O) interface 1005, the input/output (I/O) interface 1005 also being connected to the bus 1004. The system 1000 may also include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
The method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1001. The systems, devices, apparatus, modules or units etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
Embodiments of the present invention also provide a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include one or more memories other than the ROM 1002 and/or RAM 1003 described above.
It should be noted that, in each embodiment of the present invention, the functional modules may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or as software functional modules. If implemented as software functional modules and sold or used as a stand-alone product, the integrated modules may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the invention, in essence, or the part that contributes beyond the prior art, may be embodied in the form of a software product.
The flowcharts or block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined in various combinations and/or combinations, even if such combinations or combinations are not explicitly recited in the disclosure. In particular, various combinations and/or combinations of the features recited in the various embodiments of the disclosure and/or the claims may be made without departing from the spirit and teachings of the disclosure, all of which fall within the scope of the disclosure.
While the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. The scope of the disclosure should, therefore, not be limited to the above-described embodiments, but should be determined not only by the following claims, but also by the equivalents of the following claims.

Claims (9)

CN202210748518.8A | Priority 2022-06-29 | Filed 2022-06-29 | A method and system for enhancing audio data annotation accuracy based on time-frequency enhancement | Active | CN115132223B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210748518.8A | 2022-06-29 | 2022-06-29 | A method and system for enhancing audio data annotation accuracy based on time-frequency enhancement

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210748518.8A | 2022-06-29 | 2022-06-29 | A method and system for enhancing audio data annotation accuracy based on time-frequency enhancement

Publications (2)

Publication Number | Publication Date
CN115132223A (en) | 2022-09-30
CN115132223B (en) | 2024-11-26

Family

ID=83380303

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210748518.8A | A method and system for enhancing audio data annotation accuracy based on time-frequency enhancement (Active) | 2022-06-29 | 2022-06-29

Country Status (1)

Country | Link
CN (1) | CN115132223B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115881112A (en)* | 2022-12-08 | 2023-03-31 | 大连东软信息学院 | Speech recognition data enhancement method based on feature replacement and masking of spectrogram
CN117235668A (en)* | 2023-09-22 | 2023-12-15 | 东南大学 | CNN model fusion-based fault diagnosis method and system for heavy-duty gearbox

Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN112199548A (en)* | 2020-09-28 | 2021-01-08 | 华南理工大学 | Music audio classification method based on convolution cyclic neural network
CN114639377A (en)* | 2022-03-23 | 2022-06-17 | 中南大学 | A comprehensive teaching video voice extraction text method

Family Cites Families (1)

Publication number | Priority date | Publication date | Assignee | Title
CN112634928B (en)* | 2020-12-08 | 2023-09-29 | 北京有竹居网络技术有限公司 | Sound signal processing method and device and electronic equipment


Also Published As

Publication number | Publication date
CN115132223A (en) | 2022-09-30


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
