CN114299418B - A Cantonese lip reading recognition method, device and storage medium - Google Patents

A Cantonese lip reading recognition method, device and storage medium

Info

Publication number
CN114299418B
CN114299418B (application CN202111507949.7A)
Authority
CN
China
Prior art keywords
cantonese
network
lip reading
sequence
reading recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111507949.7A
Other languages
Chinese (zh)
Other versions
CN114299418A (en)
Inventor
肖业伟
滕连伟
朱澳苏
刘烜铭
田丕承
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University
Priority to CN202111507949.7A
Publication of CN114299418A
Application granted
Publication of CN114299418B
Legal status: Active (Current)
Anticipated expiration

Abstract


The present invention discloses a Cantonese lip reading recognition method, device and storage medium. The method comprises obtaining a first Cantonese video clip; cutting out useless clips in the first Cantonese video clip to obtain a second Cantonese video clip; dividing the video sequence and audio sequence in the second Cantonese video clip, segmenting the audio sequence and generating a segmentation timestamp, and generating a label according to the segmentation and the segmentation timestamp; extracting a face image in the video sequence, filtering incomplete face images, and generating a sample image according to the filtered face image and the label; training a preset Cantonese lip reading recognition model according to the sample image to obtain a trained Cantonese lip reading recognition model; and recognizing a target video sequence according to the trained Cantonese lip reading recognition model to obtain a recognition result. The method can collect a Cantonese word-level lip reading sample image data set, and can improve the recognition accuracy of the trained model because useless sequences in the video sequence are eliminated.

Description

Cantonese lip reading recognition method, device and storage medium
Technical Field
The invention relates to the technical field of lip reading recognition, and in particular to a Cantonese lip reading recognition method, device and storage medium.
Background
Lip reading recognition is the recognition of the corresponding text information by observing the sequence of a speaker's mouth movements, hence the name "lip reading". In recent years, with the development of deep learning, lip reading technology has made great breakthroughs, and word-level and sentence-level English lip reading recognition has achieved rather high recognition accuracy.
As the only Chinese dialect openly studied both at home and abroad, Cantonese has a complete word system, can be fully expressed with Chinese characters, and is widely used in Guangdong, Guangxi, Hong Kong, Macau and overseas Chinese communities around the world. Cantonese has approximately 120 million speakers worldwide, which gives the Cantonese lip reading task a huge research prospect. Unlike the English lip reading task, Chinese is a syllable-based language that distinguishes words or sentences through tones and pinyin, which undoubtedly poses a significant challenge to the Chinese lip reading task as a whole. In addition, even when the spoken content is identical, the manner of pronunciation, the pitch and the duration of pronunciation in Cantonese differ from those in Mandarin Chinese. Therefore, a Mandarin Chinese lip reading model cannot be directly transferred to the Cantonese lip reading task.
In related Cantonese lip reading recognition schemes, the model is usually trained by extracting features from a video sequence, but part of the video sequence is often useless, and training the model on sequences that contain such useless parts reduces the recognition accuracy of the model. Moreover, judging from the research progress in the lip reading field at home and abroad, no institution or individual has published a large-scale Cantonese lip reading data set.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a Cantonese lip reading recognition method, device and storage medium, which can improve the recognition accuracy of the trained model.
A first aspect of the invention provides a Cantonese lip reading recognition method, which comprises the following steps:
acquiring a first Cantonese video clip;
cutting out useless segments from the first Cantonese video clip to obtain a second Cantonese video clip, wherein the useless segments include segments with a human voice but no human figure and/or segments in which the human voice and the human figure do not match;
dividing the video sequence and the audio sequence in the second Cantonese video clip, segmenting the audio sequence into words and generating word-segmentation timestamps, and generating labels according to the word segments and the word-segmentation timestamps;
extracting face images from the video sequence, filtering out incomplete face images, and generating a sample sequence according to the filtered face images and the labels;
training a preset Cantonese lip reading recognition model according to the sample sequence to obtain the trained Cantonese lip reading recognition model;
recognizing a target video sequence according to the trained Cantonese lip reading recognition model to obtain a recognition result.
According to the embodiment of the invention, at least the following technical effects are achieved:
After the first Cantonese video clip is obtained, segments with a human voice but no human figure and/or segments in which the human voice and the human figure do not match are eliminated from the first Cantonese video clip to obtain the second Cantonese video clip, and a labeled sample sequence data set is generated from the second Cantonese video clip. The method can collect a word-level Cantonese lip reading sample sequence data set, filling the gap left by the current absence of a large-scale lip reading sample sequence data set; and because useless sequences in the video sequence are eliminated, training the model on the sample sequences can improve the recognition accuracy of the trained model.
According to some embodiments of the invention, before training the preset cantonese lip reading recognition model according to the sample sequence, the method further comprises:
adding boundary information to the sample sequence, and encoding the sample sequence with the added boundary information according to Libjpeg.
According to some embodiments of the invention, the cantonese lip reading recognition model includes a feature extraction network, an LSTM network, a three-layer BiGRU network, and a mutual information maximization network, and the training process of the cantonese lip reading recognition model includes:
extracting features in the sample sequence according to the feature extraction network, and setting mutual information constraint between the features and the labels;
Generating corresponding weights based on different frames of the tags according to the LSTM network;
classifying the characteristics according to the three-layer BiGRU network to obtain an output result;
Generating global average characteristics according to the output result and the weight;
and maximizing mutual information between the global average feature and the tag according to the mutual information maximizing network.
According to some embodiments of the present invention, the feature extraction network includes a 3D CNN network, a spatial maximization pooling layer, a ResNet network, and a global averaging pooling layer connected in sequence, and the extracting the features in the sample sequence according to the feature extraction network and setting mutual information constraints between the features and tags includes:
extracting initial features in the sample sequence according to the 3D CNN network;
compressing the initial feature according to the spatial maximization pooling layer;
dividing the initial features into a plurality of parts, respectively extracting the features of each part according to the ResNet network, and adding mutual information constraint between the features and the labels;
And carrying out average pooling on the characteristics added with mutual information constraint according to the global average pooling layer.
According to some embodiments of the invention, the ResNet network is a ResNet-34 network.
According to some embodiments of the present invention, the LSTM network includes an LSTM layer and a linear layer connected in sequence, and the calculation formula for generating the corresponding weight based on different frames of the tag according to the LSTM network includes:
at = Relu(wlinear × LSTM(G)t + blinear)
Wherein G represents the output result of the spatial maximum pooling layer, wlinear and blinear represent the parameters of the linear layer, LSTM(G)t represents the hidden state of the LSTM layer at time step t, Relu() represents the Relu function, and at represents the weight of the frame sequence at time step t.
According to some embodiments of the invention, the optimization function of the mutual information maximization network comprises:
LossMI = Ep(F,L)[log(MI(F,L))] + Ep(F)p(L)[log(1 - MI(F,L))]
Wherein LossMI represents the optimization function of the global mutual information maximization network, p(F,L) represents the joint distribution of the sample pair (F,L), p(F)p(L) represents the marginal distribution of the sample pair (F,L), MI(F,L) represents the mutual information between F and L, E denotes the mathematical expectation, F represents the global average feature, and L represents the label.
According to some embodiments of the invention, the loss function of the cantonese lip reading recognition model comprises:
LossCE = -Σi=1..c Li log(SLi)
Wherein LossCE represents a cross entropy loss function, c represents the total number of vocabulary classes in the sample sequence, Li represents the i-th label, and SLi represents the score of the label Li output by the three-layer BiGRU network.
In a second aspect of the invention, an electronic device is provided, comprising at least one control processor and a memory communicatively coupled to the at least one control processor, the memory storing instructions executable by the at least one control processor to enable the at least one control processor to perform the cantonese lip reading identification method described above.
In a third aspect of the present invention, there is provided a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the above-described cantonese lip reading recognition method.
It should be noted that the advantages of the second and third aspects of the present invention over the prior art are the same as those of the Cantonese lip reading recognition method described above, and will not be described in detail herein.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a Cantonese lip reading recognition method according to one embodiment of the present invention;
FIG. 2 is a schematic flow chart of a Cantonese lip reading recognition method according to another embodiment of the present invention;
FIG. 3 is a schematic flow chart of a Cantonese lip reading recognition method according to another embodiment of the present invention;
FIG. 4 is a schematic flow chart of a Cantonese lip reading recognition method according to another embodiment of the present invention;
FIG. 5 is a block flow diagram of constructing a sample dataset provided by one embodiment of the present invention;
FIG. 6 is a block flow diagram of a Cantonese lip reading recognition model provided by an embodiment of the invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Referring to fig. 1 to 3, in one embodiment of the present invention, there is provided a cantonese lip reading recognition method including the steps of:
Step S110, a first Cantonese video clip is obtained.
For example, the first Cantonese video clip is obtained by crawling Cantonese video programs, such as Cantonese news broadcasts, Cantonese variety shows, interviews with Cantonese-speaking personalities and talk shows, from the Internet using the you-get tool (you-get is a Python 3 based video, picture and music download tool).
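As an illustration of this crawling step, the following is a minimal Python sketch that drives the you-get command-line tool; the example URL, the output directory and the assumption that you-get is installed on the system are hypothetical and not taken from the patent.

```python
import subprocess
from pathlib import Path

def download_cantonese_clips(urls, out_dir="raw_clips"):
    """Download each Cantonese program with the you-get CLI into out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url in urls:
        # -o sets the output directory; you-get chooses container and quality itself
        subprocess.run(["you-get", "-o", out_dir, url], check=True)

if __name__ == "__main__":
    # hypothetical URL, for illustration only
    download_cantonese_clips(["https://example.com/cantonese_news_episode_1"])
```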
Step S120, useless segments are cut out of the first Cantonese video clip to obtain the second Cantonese video clip, wherein the useless segments include segments with a human voice but no human figure and/or segments in which the human voice and the human figure do not match. Here, the useless segments in the first Cantonese video clip may be cut manually.
Step S130, the video sequence and the audio sequence in the second Cantonese video clip are separated, the audio sequence is segmented into words and word-segmentation timestamps are generated, and labels are generated according to the word segments and the word-segmentation timestamps.
For example, the iFLYTEK speech transcription tool is used to segment the audio sequence into words and generate word-segmentation timestamps; the video sequence and the audio sequence are named according to the same naming scheme, and a video sequence and its corresponding audio sequence share the same name, which facilitates later pairing. All word-segmentation timestamps are expanded by 0.02 s on each side, and labels are generated according to the video sequence name, the word-segmentation timestamp, the word-segmentation pinyin and the order in which the word segments are generated. The audio sequence in this step is used to annotate the sample data set obtained later: the audio is processed with the iFLYTEK speech transcription tool to generate the corresponding text information, which is the text content of the corresponding video sequence, i.e. it serves as the label.
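The following is a hedged Python sketch of this labelling step. The ±0.02 s expansion and the label fields (video name, word order, pinyin, timestamps) follow the description above, while the word-list structure returned by the transcription tool and the CSV output layout are assumptions.

```python
import csv

EXPAND_S = 0.02  # expand each word timestamp by 0.02 s on both sides

def build_labels(video_name, words, out_csv):
    """words: list of dicts such as {"pinyin": "nei5 hou2", "start": 1.24, "end": 1.58},
    in the order produced by the speech transcription tool (hypothetical structure)."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["video", "word_index", "pinyin", "start", "end"])
        for idx, w in enumerate(words):
            start = max(0.0, w["start"] - EXPAND_S)
            end = w["end"] + EXPAND_S
            writer.writerow([video_name, idx, w["pinyin"], f"{start:.2f}", f"{end:.2f}"])
```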
Step S140, extracting face images in the video sequence, filtering incomplete face images, and generating a sample sequence according to the filtered face images and the labels.
For example, the MediaPipe tool is used to extract faces from the video sequence, a filter is trained to remove all images in which the face is detected abnormally (incomplete face crops), and finally a sample sequence data set consisting of a number of face images is obtained.
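A minimal sketch of this face-extraction step using MediaPipe's face detection is shown below; the detection confidence threshold and the particular test for an incomplete face (a bounding box clipped by the frame border) are assumptions rather than values taken from the patent.

```python
import cv2
import mediapipe as mp

def extract_face_crops(video_path):
    """Return the face crops of a video, discarding frames with no face or a clipped face."""
    crops = []
    detector = mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5)
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        result = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not result.detections:
            continue  # no face detected in this frame
        box = result.detections[0].location_data.relative_bounding_box
        x0, y0 = int(box.xmin * w), int(box.ymin * h)
        x1, y1 = x0 + int(box.width * w), y0 + int(box.height * h)
        if x0 < 0 or y0 < 0 or x1 > w or y1 > h:
            continue  # incomplete face: bounding box clipped by the frame border
        crops.append(frame[y0:y1, x0:x1])
    cap.release()
    return crops
```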
And step S150, training a preset cantonese lip reading identification model according to the sample sequence to obtain the trained cantonese lip reading identification model.
And step 160, identifying the target video sequence according to the training-completed cantonese lip reading identification model to obtain an identification result.
After the first Cantonese video clip is obtained, segments with a human voice but no human figure and/or segments in which the human voice and the human figure do not match are eliminated from the first Cantonese video clip to obtain the second Cantonese video clip, and a labeled sample image data set is generated from the second Cantonese video clip. The method can collect a word-level Cantonese lip reading sample data set, filling the gap left by the current absence of a large-scale lip reading sample data set; and because useless sequences in the video sequence are removed, training the model on the sample sequences can improve the recognition accuracy of the trained model.
In related schemes, during model training the boundary information is ambiguous and key frames and useless frames are poorly distinguished. When the boundary information is ambiguous, the useless frames inside the boundary are hard to remove; when key frames and useless frames are poorly distinguished, the model cannot properly select key frames when extracting features from the video sequence, lip reading becomes slow, and the recognition accuracy of the model also decreases because of the influence of the non-key frames (useless frames).
Therefore, based on the above embodiment, the method further includes, before step S150, the steps of:
Step S1401, boundary information is added to the sample sequence, and the sample sequence with the added boundary information is encoded according to Libjpeg. By adding the boundary information, step S1401 solves the problem that useless boundary frames cannot be removed. The training data are encoded with Libjpeg in order to compress the data and thereby speed up the subsequent training process.
Existing schemes generally use a recognition model whose backbone consists of ResNet-18 and a GRU. Unlike the existing schemes, the preset Cantonese lip reading recognition model comprises a feature extraction network, an LSTM network, a three-layer BiGRU network and a mutual information maximization network, and the training process of the Cantonese lip reading recognition model comprises the following steps:
Step S1501, extracting features in the sample sequence according to the feature extraction network, and setting mutual information constraint between the features and the labels.
In one embodiment, the feature extraction network includes a ResNet-34 network and a global average pooling layer. Compared with the commonly used ResNet-18 network, the deeper ResNet-34 network can extract deeper features, and the global average pooling layer is used instead of a fully connected layer in order to structurally regularize the entire network and prevent overfitting.
In another embodiment, the feature extraction network comprises a 3D CNN network, a spatial maximization pooling layer, a ResNet-34 network, and a global averaging pooling layer, connected in sequence. Step S1501 specifically includes the steps of:
Step S15011, extracting initial features in the sample sequence according to the 3D CNN network.
Step S15012, the initial features are compressed according to the spatial maximum pooling layer.
Step S15013, the initial features are divided evenly into a plurality of parts, the features of each part are extracted by the ResNet-34 network, and a mutual information constraint between the features and the labels is added.
Step S15014, the features added with mutual information constraint are averaged and pooled according to the global averaging pooling layer.
In steps S15011 and S15012, a 3D CNN network and a spatial maximum pooling layer are placed at the front end of the ResNet-34 network. The 3D CNN network first performs feature extraction on the initial frames for preliminary temporal alignment, and the spatial maximum pooling layer then compresses the features in the spatial domain; this processing achieves a better recognition effect. In steps S15013 and S15014, the sequence features are divided evenly into T parts according to the number of frames (e.g., T frames) of the input sequence, and the features of each part are extracted by the ResNet-34 network. To improve the ability to associate fine-grained lip movements with the corresponding labels, and thus improve the recognition accuracy of the model, a mutual information constraint is applied between the outputs of ResNet-34 and the labels. Note that mutual information is usually used in feature extraction to measure the degree of association between feature items and categories; when the mutual information is maximized, the features correspond one-to-one to their labels more closely. The obtained features are then fed into the global average pooling layer, which structurally regularizes the whole network to prevent overfitting.
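The following PyTorch sketch illustrates the front end described above (3D CNN, spatial maximum pooling, per-frame ResNet-34, global average pooling). Kernel sizes, channel widths and strides are assumptions, since the patent does not specify them.

```python
import torch
import torch.nn as nn
import torchvision

class VisualFrontEnd(nn.Module):
    """3D CNN -> spatial max pooling -> per-frame ResNet-34 -> global average pooling."""
    def __init__(self):
        super().__init__()
        # 3D convolution over (T, H, W) for preliminary temporal alignment
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
        )
        # spatial max pooling compresses the spatial domain only, not the time axis
        self.spatial_pool = nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1))
        # per-frame ResNet-34 trunk; its built-in average pooling acts as the global average pooling layer
        resnet = torchvision.models.resnet34(weights=None)
        resnet.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
        resnet.fc = nn.Identity()
        self.resnet = resnet

    def forward(self, x):                      # x: (B, 1, T, H, W) grayscale mouth crops
        g = self.spatial_pool(self.conv3d(x))  # (B, 64, T, H', W')
        b, c, t, h, w = g.shape
        frames = g.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        feats = self.resnet(frames)            # (B*T, 512) after global average pooling
        return feats.view(b, t, -1)            # per-frame 512-d features
```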
Step S1502, corresponding weights are generated based on different frames of the tag according to the LSTM network.
At the output of ResNet-34, an LSTM network is added, which includes an LSTM layer and a linear layer. In this embodiment, the purpose of the added LSTM network is to select key frames, that is, to assign different weights to different frames according to the labels so as to distinguish key frames from non-key frames. In this embodiment, a weight may be any positive number, but the weights of useless frames should be as close to 0 as possible. A Relu function can then be introduced to obtain the weights.
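A minimal PyTorch sketch of this frame-weighting branch, implementing at = Relu(wlinear × LSTM(G)t + blinear), is given below; the LSTM hidden size is an assumption, and G is taken to be the per-frame feature sequence produced by the front end.

```python
import torch.nn as nn

class FrameWeightLSTM(nn.Module):
    """Produces a_t = Relu(w_linear * LSTM(G)_t + b_linear), one weight per frame."""
    def __init__(self, in_dim=512, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.linear = nn.Linear(hidden, 1)   # w_linear, b_linear
        self.relu = nn.ReLU()

    def forward(self, g):                    # g: (B, T, in_dim) per-frame features
        h, _ = self.lstm(g)                  # LSTM(G)_t, shape (B, T, hidden)
        a = self.relu(self.linear(h))        # a_t >= 0; weights of useless frames pushed toward 0
        return a.squeeze(-1)                 # (B, T)
```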
And step S1503, classifying the characteristics according to the three-layer BiGRU network to obtain an output result.
The output of the feature extraction network is fed into the three-layer BiGRU network, and the three-layer BiGRU network performs feature classification to obtain the output of the three-layer BiGRU network.
Step S1504, generating global average characteristics according to the output result and the weight.
The global average feature is obtained by weighting the outputs of the three-layer BiGRU network with the weights.
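The following sketch shows one way the three-layer BiGRU back end and the weighted global average feature could be realized; the hidden size, the frame-averaging of the classification scores and the 1/T normalization of the weighted sum are assumptions.

```python
import torch.nn as nn

class BiGRUBackEnd(nn.Module):
    """Three-layer BiGRU classifier plus the weighted global average feature F."""
    def __init__(self, in_dim=512, hidden=256, num_classes=1000):
        super().__init__()
        self.bigru = nn.GRU(in_dim, hidden, num_layers=3, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)   # projects P_t to c classes

    def forward(self, feats, weights):        # feats: (B, T, in_dim), weights: (B, T)
        p, _ = self.bigru(feats)              # P_t, shape (B, T, 2*hidden)
        logits = self.classifier(p).mean(dim=1)          # frame-averaged classification scores
        f = (weights.unsqueeze(-1) * p).mean(dim=1)      # F = (1/T) * sum_t a_t * P_t
        return logits, f
```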
Step S1505, maximizing the mutual information between the global average feature and the tag according to the mutual information maximizing network.
The global average feature and the one-hot vector of the label are concatenated, and the concatenation result is taken as the input of the global mutual information maximization network. In this embodiment, the mutual information maximization network is composed of two linear layers and one sigmoid activation layer; it maximizes the mutual information between the global average feature and a given label. If the global average feature and the label come from the same sample, the output of the global mutual information maximization network is close to 1 (positive sample); if the global average feature and the label are not a paired sample, the output is close to 0 (negative sample).
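A hedged sketch of the mutual information maximization network (two linear layers and a sigmoid) and of the corresponding loss term is given below; the hidden width, the intermediate ReLU and the way negative pairs are formed (shuffling labels within the batch) are assumptions, and the loss is negated so that minimizing it maximizes LossMI.

```python
import torch
import torch.nn as nn

class MIEstimator(nn.Module):
    """Two linear layers and a sigmoid; scores how well F matches a one-hot label."""
    def __init__(self, feat_dim, num_classes, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + num_classes, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, f, label_onehot):       # f: (B, D), label_onehot: (B, C)
        return self.net(torch.cat([f, label_onehot], dim=-1)).squeeze(-1)

def mi_loss(mi_net, f, label_onehot, eps=1e-8):
    pos = mi_net(f, label_onehot)                           # paired (F, L): push toward 1
    perm = torch.randperm(f.size(0), device=f.device)
    neg = mi_net(f, label_onehot[perm])                     # mismatched pairs: push toward 0
    return -(torch.log(pos + eps).mean() + torch.log(1.0 - neg + eps).mean())
```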
The method embodiment has the following beneficial effects:
(1) When building the sample data set used to train the model, after the first Cantonese video clip is acquired, segments with a human voice but no human figure and/or segments in which the human voice and the human figure do not match are removed from the first Cantonese video clip to obtain the second Cantonese video clip, and labeled sample sequences are generated from the second Cantonese video clip. Useless sequences in the video sequence are eliminated, so the recognition accuracy of the trained model can be improved.
(2) The traditional lip reading recognition model adopts a ResNet-18 + GRU structure, whereas this method adopts a ResNet-34 network, which is deeper than the ResNet-18 network, so deeper features can be extracted.
(3) In the lip reading task, boundary information is ambiguous and key frames and useless frames are poorly distinguished. Before the samples in the sample data set are input into the model, boundary information is added to the sample sequences, which solves the problem that useless boundary frames cannot be removed. An LSTM network consisting of an LSTM layer and a linear layer is added at the back end of the feature extraction network, and key frames are selected by the LSTM network through weight generation, so that key frames and non-key frames can be correctly distinguished and the lip reading recognition accuracy of the model can be effectively improved.
(4) To improve the ability to associate fine-grained lip movements with the corresponding labels and further improve the recognition accuracy of the model, the method also designs a mutual information maximization network consisting of two linear layers and a sigmoid activation layer. The global average feature is obtained by weighting the features output by the three-layer BiGRU network with the weights; the global average feature and the label are concatenated and used as the input of the global mutual information maximization network, which performs mutual information maximization on the global average feature and the label. If they form a paired sample, the output of the global mutual information maximization network is close to 1, otherwise it is close to 0; this effectively improves the lip reading recognition accuracy of the model.
Referring to fig. 4 to 6, in order to facilitate understanding of those skilled in the art, according to one embodiment of the present invention, there is provided a cantonese lip reading recognition method, comprising the steps of:
Step S210, constructing a Cantonese lip reading sample data set.
Step S2101, Cantonese television programs are crawled from the Internet with the you-get tool to obtain video clips. The term "television programs" as used herein includes, but is not limited to, Cantonese news broadcasts, Cantonese variety shows, interviews with Cantonese-speaking personalities, and talk shows.
Step S2102, useless segments are removed from the video clips. The useless segments here refer to segments that have a human voice but no human figure, or segments in which the human voice and the human figure do not match.
Step S2103, the audio sequence and the video sequence of each video clip are separated, a speech processing tool (e.g. the iFLYTEK speech transcription function) is used to segment the audio sequence into words and generate timestamps, and the video sequence and the audio sequence are named according to the same naming scheme, with a video sequence and its corresponding audio sequence sharing the same name to facilitate later pairing.
Step S2104, all word-segmentation timestamps are expanded by 0.02 s, and labels are generated according to the video sequence name, the word-segmentation timestamp, the word-segmentation pinyin and the word-segmentation order.
Step S2105, the MediaPipe tool is used to extract faces from the video sequences, and a filter is trained to remove all images in which the detected face is incomplete.
Steps S2101 to S2105 can effectively remove useless segments, and can also ensure that the collected data has environmental diversity and is closer to real-life scenes.
Step S220, constructing a Cantonese lip reading recognition model based on global mutual information maximization.
The Cantonese lip reading recognition model mainly comprises a backbone network formed by combining a ResNet-34 network with a three-layer BiGRU network, and a global mutual information maximization network.
First, a 3D CNN layer and a spatial maximum pooling layer are added before the ResNet-34 network. The 3D CNN layer extracts features from the initial frames of the input video sequence for preliminary temporal alignment, and the spatial maximum pooling layer then compresses the features in the spatial domain, which reduces the training time without affecting the recognition effect. Adding the 3D CNN layer and the spatial maximum pooling layer before the ResNet-34 network allows the Cantonese lip reading network model to achieve a better recognition effect.
Assuming the input video sequence has T frames, the sequence features are divided into T parts according to the number of frames T, and the features of each part are extracted with the ResNet-34 network. To improve the ability to associate fine-grained lip movements with the corresponding label L, a mutual information constraint is imposed between the output of the ResNet-34 network and the label L. After the mutual information constraint is applied, the obtained features are input to a global average pooling layer; global average pooling is used instead of a fully connected layer in order to structurally regularize the entire network and prevent overfitting.
At the output of the ResNet-34 network, an LSTM layer and a linear layer are added and combined into an LSTM network. The added LSTM network assigns a different weight a to each frame according to the label L. For selecting key frames, the weight a may be any positive number, but the weight a of a useless frame should be as close to 0 as possible. A Relu function is then introduced to obtain the weight a:
at = Relu(wlinear × LSTM(G)t + blinear)
In the above equation, G represents the output of the global average pooling layer, wlinear and blinear represent the parameters of the linear layer, LSTM(G)t represents the hidden state of the LSTM layer at time step t, and at represents the weight of the frame sequence at time step t.
The final global average feature F consists of the output P and weights a of the three-layer BiGRU network.
F = (1/T) Σt=1..T at × Pt
Where T represents the length of the entire video sequence, i.e. the total number of frames.
The global average feature F and the one-hot vector of the label L are concatenated as the input to the global mutual information maximization network.
The global mutual information maximization network consists of two linear layers and a sigmoid activation layer, and mutual information between the global average feature and a given tag can be maximized through the global mutual information maximization network. If the global average feature F and the label L are from the same sample, the output of the global mutual information maximization network should be as close to 1 (positive sample) as possible, and if the global average feature F and the label L are not paired samples, the output of the global mutual information maximization network should be as close to 0 (negative sample) as possible, so the global mutual information maximization network can be expressed as:
LossMI = Ep(F,L)[log(MI(F,L))] + Ep(F)p(L)[log(1 - MI(F,L))]
Where LossMI represents the optimization function for global mutual information maximization, p(F,L) represents the joint distribution of the sample pair (F,L), p(F)p(L) represents the marginal distribution of the sample pair (F,L), MI(F,L) represents the mutual information between the two variables F and L, and E denotes the mathematical expectation.
After the output Pt of the three-layer BiGRU network in the backbone network passes through the linear layer, the dimension becomes C-dimension, and the final classification score is as follows:
where c represents the total number of vocabulary classes in the data set, and SLi represents the classification score of the label Li.
The loss function of the entire model is as follows:
Losstotal = -Σi=1..c Li log(SLi) + LossMI
Wherein Li represents the i-th label, Losstotal represents the total loss, the first term of the loss function is the cross entropy loss function, and the second term LossMI is the optimization function of the global mutual information maximization network.
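As a usage note, the two terms could be combined as in the minimal sketch below; the equal weighting of the cross entropy term and the mutual information term is an assumption, and the helper names follow the sketches above.

```python
import torch.nn.functional as nnf

def total_loss(logits, targets, mi_term):
    """logits: (B, C) classification scores before softmax; targets: (B,) class indices;
    mi_term: the value returned by mi_loss above."""
    return nnf.cross_entropy(logits, targets) + mi_term
```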
Step S230, boundary information is added to the data in the data set, and the Cantonese lip reading recognition model is trained.
Step S2301, the boundary information is added. The video sequence is cropped to a size of 88 × 88, and boundary information is then added to the video sequence according to the timestamp information. The boundary information is added as follows: the rounded value of OP (the start timestamp) × 25 (the video frame rate) is selected as the start frame, the 40 frames after it are taken as the input video sequence, and data shorter than 40 frames is screened out. Adding the boundary information solves the problem that useless boundary frames cannot be removed.
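A minimal sketch of this boundary-information step is shown below: the start frame is round(start timestamp × 25), the next 40 frames form the input sequence, and shorter clips are screened out. Approximating the 88 × 88 crop with a resize of the face crops is an assumption.

```python
import cv2

FPS = 25       # video frame rate from the description above
SEQ_LEN = 40   # fixed number of frames per input sequence

def clip_by_boundary(frames, start_timestamp):
    """frames: list of face crops for one video sequence; start_timestamp in seconds."""
    start = round(start_timestamp * FPS)
    clip = frames[start:start + SEQ_LEN]
    if len(clip) < SEQ_LEN:
        return None                          # screened out: shorter than 40 frames
    return [cv2.resize(f, (88, 88)) for f in clip]
```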
Step S2302, the video sequences are encoded with Libjpeg, training parameters such as batch size and number of epochs are set, and the encoded data are input into the Cantonese lip reading recognition model for training to obtain the training weights and the trained Cantonese lip reading recognition model. The purpose of the encoding is to compress the data after the boundary information has been added, thereby increasing the later training speed.
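The following sketch illustrates the encoding step, using OpenCV's JPEG encoder (which is typically backed by libjpeg or libjpeg-turbo) to stand in for Libjpeg; the JPEG quality setting and the pickle container format are assumptions.

```python
import cv2
import pickle

def encode_clip(frames, label, out_path, quality=90):
    """Store one 40-frame sample as JPEG byte strings plus its label."""
    jpegs = [cv2.imencode(".jpg", f, [cv2.IMWRITE_JPEG_QUALITY, quality])[1].tobytes()
             for f in frames]
    with open(out_path, "wb") as fh:
        pickle.dump({"frames": jpegs, "label": label}, fh)
```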
Step S240, the target video sequence is recognized with the trained Cantonese lip reading recognition model.
Step S2401, the PySide toolkit is used to design the UI of the system, which is divided into three parts: a prompter area, a face display area and a result display area.
Step S2402, the training weights and the Cantonese lip reading recognition model are loaded, and the buttons of the UI are matched to the corresponding functions in the code.
Step S2403, recognition starts: the target video sequence is collected through the face display area, and the face in the target video sequence is extracted with MediaPipe.
Step S2404, the network is loaded, and the target video sequence is processed with the Cantonese lip reading recognition model.
Step S2405, the lip reading recognition result produced by the Cantonese lip reading recognition model is shown in the result display area.
Research on lip reading recognition is significant for the large hearing-impaired population, and Cantonese is widely used in China's Guangdong and Guangxi provinces, in Hong Kong and Macau, and in Chinese communities worldwide, so research on Cantonese lip reading is of great significance. At present, no company or large research institution has released a large-scale Cantonese lip reading data set; this application obtains word-level sample data by downloading online video resources, manually cutting them, and screening out useless segments. Because ambiguous boundary information and poorly distinguished key frames and useless frames occur in the lip reading task, the method first preprocesses the data in the sample data set and solves the problem that useless boundary frames cannot be removed by adding boundary information. The method improves on the existing model: the ResNet-18 network is replaced with the ResNet-34 network to extract deeper features, an LSTM network (consisting of an LSTM layer and a linear layer) is added at the output end of the ResNet-34 network, and key frames are selected by the LSTM network by generating weights a; since a is produced by a Relu, the weights of useless frames can become 0. The features output by the three-layer BiGRU network are then weighted with the weights a to obtain the global average feature F. To improve the ability to associate fine-grained lip movements with the corresponding labels and further improve the recognition accuracy of the model, the global average feature F and the label L are concatenated and used as the input of the global mutual information maximization network, which then performs mutual information maximization on F and L; if they form a paired sample, the output of the global mutual information maximization network is close to 1, otherwise it is close to 0.
The method fills the gap left by the absence of a large-scale lip reading sample data set in the field of Cantonese lip reading, and in the Cantonese lip reading recognition model used, the proposed combination of adding boundary information and maximizing global mutual information can effectively improve the accuracy of lip reading recognition.
Referring to fig. 7, the present application further provides a computer device 301, including a memory 310, a processor 320, and a computer program 311 stored in the memory 310 and capable of running on the processor, where the processor 320 implements the cantonese lip reading recognition method as described above when executing the computer program 311.
The processor 320 and the memory 310 may be connected by a bus or other means.
Memory 310 acts as a non-transitory computer readable storage medium that may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, memory 310 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some implementations, memory 310 may optionally include memory located remotely from the processor to which the remote memory may be connected via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software program and instructions required to implement the cantonese lip reading identification method of the above-described embodiments are stored in the memory, and when executed by the processor, the cantonese lip reading identification method of the above-described embodiments is performed, for example, the method steps S110 to S160 in fig. 1 or the method steps S210 to S240 in fig. 4 described above are performed.
Referring to fig. 8, the present application also provides a computer-readable storage medium 401 storing computer-executable instructions 410, the computer-executable instructions 410 being for performing the cantonese lip reading identification method as described above.
The computer-readable storage medium 401 stores computer-executable instructions 410, where the computer-executable instructions 410 are executed by a processor or controller, for example, by a processor in the above-described electronic device embodiment, and may cause the processor to perform the cantonese lip reading identification method in the above-described embodiment, for example, performing the method steps S110 to S160 in fig. 1 or the method steps S210 to S240 in fig. 4 described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of data such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired data and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any data delivery media.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims (10)

1. A Cantonese lip reading recognition method, characterized by comprising the following steps:
obtaining a first Cantonese video clip;
cutting out useless segments from the first Cantonese video clip to obtain a second Cantonese video clip, the useless segments including segments with a human voice but no human figure and/or segments in which the human voice and the human figure do not match;
dividing the video sequence and the audio sequence in the second Cantonese video clip, segmenting the audio sequence into words and generating word-segmentation timestamps, and generating labels according to the word segments and the word-segmentation timestamps;
extracting face images from the video sequence, filtering out incomplete face images, and generating a sample sequence according to the filtered face images and the labels;
training a preset Cantonese lip reading recognition model according to the sample sequence to obtain the trained Cantonese lip reading recognition model;
recognizing a target video sequence according to the trained Cantonese lip reading recognition model to obtain a recognition result.
2. The Cantonese lip reading recognition method according to claim 1, characterized in that, before training the preset Cantonese lip reading recognition model according to the sample sequence, the method further comprises:
adding boundary information to the sample sequence, and encoding the sample sequence with the added boundary information according to Libjpeg.
3. The Cantonese lip reading recognition method according to claim 2, characterized in that the Cantonese lip reading recognition model comprises a feature extraction network, an LSTM network, a three-layer BiGRU network and a mutual information maximization network, and the training process of the Cantonese lip reading recognition model comprises:
extracting features from the sample sequence according to the feature extraction network, and setting a mutual information constraint between the features and the labels;
generating corresponding weights based on different frames of the label according to the LSTM network;
classifying the features according to the three-layer BiGRU network to obtain an output result;
generating a global average feature according to the output result and the weights;
maximizing the mutual information between the global average feature and the label according to the mutual information maximization network.
4. The Cantonese lip reading recognition method according to claim 3, characterized in that the feature extraction network comprises a 3D CNN network, a spatial maximum pooling layer, a ResNet network and a global average pooling layer connected in sequence; extracting features from the sample sequence according to the feature extraction network and setting a mutual information constraint between the features and the labels comprises:
extracting initial features from the sample sequence according to the 3D CNN network;
compressing the initial features according to the spatial maximum pooling layer;
dividing the initial features evenly into a plurality of parts, extracting the features of each part according to the ResNet network, and adding a mutual information constraint between the features and the labels;
average-pooling the features with the added mutual information constraint according to the global average pooling layer.
5. The Cantonese lip reading recognition method according to claim 4, characterized in that the ResNet network is a ResNet-34 network.
6. The Cantonese lip reading recognition method according to claim 4, characterized in that the LSTM network comprises an LSTM layer and a linear layer connected in sequence; the calculation formula for generating the corresponding weights based on different frames of the label according to the LSTM network comprises:
at = Relu(wlinear × LSTM(G)t + blinear)
wherein G represents the output result of the spatial maximum pooling layer, wlinear and blinear represent the parameters of the linear layer, LSTM(G)t represents the hidden state of the LSTM layer at time step t, Relu() represents the Relu function, and at represents the weight of the frame sequence at time step t.
7. The Cantonese lip reading recognition method according to claim 6, characterized in that the optimization function of the mutual information maximization network comprises:
LossMI = Ep(F,L)[log(MI(F,L))] + Ep(F)p(L)[log(1 - MI(F,L))]
wherein LossMI represents the optimization function of the global mutual information maximization network, p(F,L) represents the joint distribution of the sample pair (F,L), p(F)p(L) represents the marginal distribution of the sample pair (F,L), MI(F,L) represents the mutual information between F and L, E denotes the mathematical expectation, F represents the global average feature, and L represents the label.
8. The Cantonese lip reading recognition method according to claim 7, characterized in that the loss function of the Cantonese lip reading recognition model comprises:
LossCE = -Σi=1..c Li log(SLi)
wherein LossCE represents the cross entropy loss function, c represents the total number of vocabulary classes in the sample sequence, and SLi represents the score of the label Li output by the three-layer BiGRU network.
9. An electronic device, characterized by comprising at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, and the instructions are executed by the at least one control processor to enable the at least one control processor to perform the Cantonese lip reading recognition method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are used to cause a computer to perform the Cantonese lip reading recognition method according to any one of claims 1 to 8.
CN202111507949.7A | 2021-12-10 | 2021-12-10 | A Cantonese lip reading recognition method, device and storage medium | Active | CN114299418B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111507949.7A (CN114299418B (en)) | 2021-12-10 | 2021-12-10 | A Cantonese lip reading recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111507949.7A (CN114299418B (en)) | 2021-12-10 | 2021-12-10 | A Cantonese lip reading recognition method, device and storage medium

Publications (2)

Publication Number | Publication Date
CN114299418A (en) | 2022-04-08
CN114299418B | 2025-01-03

Family

ID=80968485

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111507949.7A (Active, CN114299418B (en)) | A Cantonese lip reading recognition method, device and storage medium | 2021-12-10 | 2021-12-10

Country Status (1)

Country | Link
CN (1) | CN114299418B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115019772A (en)* | 2022-06-07 | 2022-09-06 | 湘潭大学 | Guangdong language voice recognition enhancing method based on visual information
CN116386142A (en)* | 2023-04-03 | 2023-07-04 | 湘潭大学 | Conv former-based Guangdong sentence-level lip language identification method
CN116311537A (en)* | 2023-05-18 | 2023-06-23 | 讯龙(广东)智能科技有限公司 | Training method, storage medium and system for video motion recognition algorithm model
WO2024261626A1 (en)* | 2023-06-22 | 2024-12-26 | Technology Innovation Institute – Sole Proprietorship LLC | Visual speech recognition based communication training system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109524006A (en)* | 2018-10-17 | 2019-03-26 | 天津大学 | A kind of standard Chinese lip reading recognition methods based on deep learning
CN110276259A (en)* | 2019-05-21 | 2019-09-24 | 平安科技(深圳)有限公司 | Lip reading recognition methods, device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
KR101092820B1 (en)*2009-09-222011-12-12현대자동차주식회사Lipreading and Voice recognition combination multimodal interface system
CN109409204B (en)*2018-09-072021-08-06北京市商汤科技开发有限公司Anti-counterfeiting detection method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN109524006A (en)*2018-10-172019-03-26天津大学A kind of standard Chinese lip reading recognition methods based on deep learning
CN110276259A (en)*2019-05-212019-09-24平安科技(深圳)有限公司Lip reading recognition methods, device, computer equipment and storage medium

Also Published As

Publication number | Publication date
CN114299418A (en) | 2022-04-08

Similar Documents

Publication | Publication Date | Title
CN114299418B (en) | A Cantonese lip reading recognition method, device and storage medium
CN109117777B (en) | Method and device for generating information
CN112668559B (en) | Multi-mode information fusion short video emotion judgment device and method
CN111738251B (en) | Optical character recognition method, device and electronic device fused with language model
CN109065021B (en) | End-to-end dialect identification method for generating countermeasure network based on conditional deep convolution
CN104735468B (en) | A kind of method and system that image is synthesized to new video based on semantic analysis
KR102433393B1 (en) | Apparatus and method for recognizing character in video contents
CN113766314B (en) | Video segmentation method, device, equipment, system and storage medium
CN110188829B (en) | Neural network training method, target recognition method and related products
CN112348111B (en) | Multi-modal feature fusion method and device in video, electronic equipment and medium
CN107507620A (en) | Voice broadcast sound setting method and device, mobile terminal and storage medium
CN114694070B (en) | Automatic video editing method, system, terminal and storage medium
CN111143617A (en) | A method and system for automatic generation of picture or video text description
CN114880496A (en) | Multimedia information topic analysis method, device, equipment and storage medium
CN112861864A (en) | Topic entry method, topic entry device, electronic device and computer-readable storage medium
CN115311595B (en) | Video feature extraction method and device and electronic equipment
CN112052687A (en) | Semantic feature processing method, device and medium based on deep separable convolution
CN112465596A (en) | Image information processing cloud computing platform based on electronic commerce live broadcast
Cosovic et al. | Classification methods in cultural heritage
CN114051154A (en) | News video strip splitting method and system
CN114398952A (en) | Training text generation method, device, electronic device and storage medium
EP4207771B1 (en) | Video processing method and apparatus
CN115392254A (en) | Interpretable cognitive prediction and discrimination method and system based on target task
CN119580738A (en) | Video processing method, device, equipment and medium based on multimodal information fusion
CN119478763A (en) | Video segmentation method, device, equipment and storage medium based on multimodal features

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
