Disclosure of Invention
The technical problem addressed by the embodiments of the invention is how to generate a face generated image whose mouth shape accurately matches the driving audio and which accurately expresses the emotion contained in the driving audio.
The face image generation method comprises: determining a face image generation model, wherein the face image generation model comprises an audio content feature extraction sub-model, an audio emotion feature extraction sub-model and a diffusion sub-model; inputting driving audio into the audio content feature extraction sub-model for feature extraction to obtain audio content features; inputting the driving audio into the audio emotion feature extraction sub-model for feature extraction to obtain audio emotion features; splicing at least based on the audio content features and the audio emotion features to obtain an audio fusion feature; inputting the audio fusion feature and a noisy reference face image feature into the diffusion sub-model for denoising to obtain a target complete face feature, wherein the noisy reference face image feature is obtained by splicing image features of a reference face image with a noise matrix; and decoding the target complete face feature to obtain a complete face generated image.
Optionally, the face image generation model further comprises a key point feature extraction sub-model, and the splicing at least based on the audio content features and the audio emotion features comprises: splicing the audio content features, the audio emotion features and face key point features, wherein the face key point features are obtained by extracting key points from a face image with the lower half blocked and inputting the extracted key points into the key point feature extraction sub-model for feature extraction.
Optionally, before the splicing at least based on the audio content features and the audio emotion features, the method further comprises: determining a first number of audio segments whose time sequence precedes the driving audio and a second number of audio segments whose time sequence follows the driving audio, recorded as first audios and second audios respectively; inputting each of the first audios and the second audios into the audio content feature extraction sub-model for feature extraction to obtain a plurality of corresponding first audio content features and a plurality of corresponding second audio content features; performing a weighting operation on the plurality of first audio content features, the plurality of second audio content features and the audio content features to obtain a fused audio content feature; and updating the audio content features with the fused audio content feature.
Optionally, the reference face image comprises a complete face image and a face image with the lower half blocked, which come from the same speaker and carry the same emotion; before the audio fusion feature and the noisy reference face image feature are input into the diffusion sub-model for denoising, the method further comprises: performing feature extraction on the face image with the lower half blocked and on the complete face image respectively to obtain a partial face image feature and a complete face image feature, and splicing the partial face image feature, the complete face image feature and the noise matrix to obtain the noisy reference face image feature.
Optionally, inputting the audio fusion feature and the noisy reference face image feature into the diffusion sub-model for denoising to obtain the target complete face feature comprises: inputting the noisy reference face image feature into a first layer network of the diffusion sub-model, inputting the audio fusion feature into each layer network of the diffusion sub-model, and taking the output of the last layer network of the diffusion sub-model as the target complete face feature, wherein from the second layer network of the diffusion sub-model onward, the input data of each layer network are the output data of the previous layer network and the audio fusion feature.
Optionally, the audio emotion feature extraction sub-model comprises a pre-trained emotion classification network and an emotion feature extraction network, and inputting the driving audio into the audio emotion feature extraction sub-model for feature extraction to obtain the audio emotion features comprises: inputting the driving audio into the pre-trained emotion classification network to obtain a predicted emotion type label, encoding the predicted emotion type label to obtain an audio emotion encoding vector, and inputting the audio emotion encoding vector into the emotion feature extraction network for feature extraction to obtain the audio emotion features.
Optionally, encoding the predicted emotion type label to obtain the audio emotion encoding vector comprises: pre-encoding the predicted emotion type label based on a preset emotion encoding length to obtain multiple groups of emotion sub-encodings, each group comprising two identical emotion sub-encodings; for each group, determining the sine value of one emotion sub-encoding and the cosine value of the other emotion sub-encoding, thereby determining the emotion encoding corresponding to each emotion sub-encoding; and determining the audio emotion encoding vector based on the multiple emotion encodings thus obtained.
Optionally, the pre-trained emotion classification network is obtained by inputting a first training data set constructed by a plurality of first sample audios and emotion type labels thereof into an initialized emotion classification network for training based on a first loss function, wherein the first loss function is represented by the following expression:
L_ec = −(1/N)·Σ_{i=1}^{N} y_i·ln(ŷ_i);
wherein L_ec denotes the function value of the first loss function, i denotes the sequence number of a first sample audio, N denotes the total number of first sample audios, y_i denotes the true emotion type label of the i-th first sample audio, ŷ_i denotes the emotion type label predicted for the i-th first sample audio, and ln() denotes the logarithm with base e.
Optionally, determining the face image generation model comprises: constructing a face image generation model to be trained, the face image generation model to be trained comprising an audio content feature extraction sub-model to be trained, an audio emotion feature extraction sub-model to be trained and a diffusion sub-model to be trained; constructing a second training data set from a plurality of reference sample face images, a plurality of second sample audios and a sample noise matrix, wherein the reference sample face images and the second sample audios are aligned one by one in time sequence; and inputting the second training data set into the face image generation model to be trained for iterative training based on a second loss function, so as to obtain the face image generation model.
Optionally, the second loss function is represented by the following expression:
L' = E_{ε∼N(0,1)} [ ‖ε − M(z_t, t, C)‖² ];
wherein L' denotes the function value of the second loss function, t denotes the layer sequence number of a network in the diffusion sub-model to be trained, M() denotes the prediction noise matrix output by the t-th layer network of the diffusion sub-model to be trained, ε denotes the sample noise matrix, ε∼N(0,1) indicates that ε obeys a normal distribution, z_t denotes one item of input data of the t-th layer network of the diffusion sub-model to be trained, namely the output data of the (t−1)-th layer network, C is the other item of input data of the t-th layer network, namely the sample audio fusion feature obtained by splicing at least based on the sample audio content features and the sample audio emotion features, ‖ε − M(z_t, t, C)‖ denotes the Euclidean distance between the sample noise matrix and the prediction noise matrix output by the t-th layer network, and E[·] denotes taking the expected value of the squared Euclidean distance over multiple samplings.
The embodiment of the invention also provides a face image generation apparatus, which comprises a model determining module, an audio feature extraction module, an audio feature splicing module, a complete face feature determining module and a face image generating module. The model determining module is used for determining a face image generation model, the face image generation model comprising an audio content feature extraction sub-model, an audio emotion feature extraction sub-model and a diffusion sub-model; the audio feature extraction module is used for inputting driving audio into the audio content feature extraction sub-model for feature extraction to obtain audio content features, and inputting the driving audio into the audio emotion feature extraction sub-model for feature extraction to obtain audio emotion features; the audio feature splicing module is used for splicing at least based on the audio content features and the audio emotion features to obtain an audio fusion feature; the complete face feature determining module is used for inputting the audio fusion feature and a noisy reference face image feature into the diffusion sub-model for denoising to obtain a target complete face feature, the noisy reference face image feature being obtained by splicing image features of a reference face image with a noise matrix; and the face image generating module is used for decoding the target complete face feature to obtain a complete face generated image.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, performs the steps of the face image generation method.
The embodiment of the invention also provides a terminal which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the face image generation method when running the computer program.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
In the embodiments of the invention, feature information of two dimensions, content and emotion, is separated from the driving audio. Specifically, the driving audio is input into the audio content feature extraction sub-model for feature extraction to obtain audio content features, and the driving audio is input into the audio emotion feature extraction sub-model for feature extraction to obtain audio emotion features; then an audio fusion feature is obtained by splicing at least based on the audio content features and the audio emotion features; the audio fusion feature and the noisy reference face image feature are input into the diffusion sub-model for denoising to obtain a target complete face feature; and finally the target complete face feature is decoded to obtain a complete face generated image.
The audio content features describe the content or semantics expressed by the speaker and correspond to the mouth shape or mouth motion during speaking, while the audio emotion features describe the emotion of the speaker during speaking and correspond to the speaker's real emotional state. Therefore, compared with the prior art, in which only single-dimension features are extracted from the driving audio, combining the audio content features of the semantic (content) dimension and the audio emotion features of the emotion dimension helps to obtain a face generated image that accurately matches the mouth shape contained in the driving audio, finely expresses the true emotion of the speaker, and contains finer texture. Furthermore, the embodiment also introduces image features of the reference face image in addition to the audio content features and the audio emotion features, which helps to generate face images that conform to the basic contour features of the face and have more stable quality.
Detailed Description
In order to make the above objects, features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, fig. 1 is a flowchart of a face image generating method according to an embodiment of the present invention. The face image generation method can be applied to a terminal with a face image generation function, and the terminal can comprise, but is not limited to, a mobile phone, a computer, a tablet personal computer, intelligent wearable equipment (for example, an intelligent watch), vehicle-mounted terminal equipment, a server, a cloud platform and the like.
The method may include steps S11 to S15:
Step S11, determining a face image generation model, wherein the face image generation model comprises an audio content feature extraction sub-model, an audio emotion feature extraction sub-model and a diffusion sub-model;
Step S12, inputting the driving audio into the audio content feature extraction sub-model to perform feature extraction to obtain audio content features, and inputting the driving audio into the audio emotion feature extraction sub-model to perform feature extraction to obtain audio emotion features;
Step S13, splicing at least based on the audio content features and the audio emotion features to obtain an audio fusion feature;
Step S14, inputting the audio fusion feature and a noisy reference face image feature into the diffusion sub-model for denoising to obtain a target complete face feature, wherein the noisy reference face image feature is obtained by splicing image features of the reference face image with a noise matrix;
Step S15, decoding the target complete face feature to obtain a complete face generated image.
Further, the method may also comprise: for multiple segments of driving audio with a time sequence, generating a plurality of corresponding complete face generated images, and splicing the images according to the time sequence to obtain a digital human video.
In the implementation of step S11, the audio content feature extraction sub-model may use an existing neural network capable of extracting audio content features; the audio content features may include text feature information and semantic feature information corresponding to the driving audio and mainly affect the mouth shape and mouth motion of the face generated image. The audio emotion feature extraction sub-model may use an existing neural network capable of extracting audio emotion features; the audio emotion features may include prosodic feature information of the audio (e.g., rhythm, tone, duration) and mainly affect the expression, texture details, etc. of the face generated image.
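As a non-limiting illustrative sketch only, the composition described above may be organized as follows in Python/PyTorch; the class name, attribute names and the simple feature-dimension concatenation are assumptions for illustration, not the actual networks of the embodiment.

```python
# Illustrative sketch (assumed names and structure), not the actual model of the embodiment.
import torch
import torch.nn as nn

class FaceImageGenerationModel(nn.Module):
    def __init__(self, content_extractor: nn.Module, emotion_extractor: nn.Module,
                 diffusion_submodel: nn.Module):
        super().__init__()
        self.content_extractor = content_extractor    # audio content feature extraction sub-model
        self.emotion_extractor = emotion_extractor    # audio emotion feature extraction sub-model
        self.diffusion_submodel = diffusion_submodel  # diffusion (denoising) sub-model

    def forward(self, driving_audio: torch.Tensor, noisy_ref_face_feat: torch.Tensor) -> torch.Tensor:
        content_feat = self.content_extractor(driving_audio)   # mainly affects mouth shape / motion
        emotion_feat = self.emotion_extractor(driving_audio)   # mainly affects expression / texture
        audio_fusion = torch.cat([content_feat, emotion_feat], dim=-1)  # splice the two features
        return self.diffusion_submodel(noisy_ref_face_feat, audio_fusion)  # target complete face feature
```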
In the implementation of step S12, the driving audio is input into the audio content feature extraction sub-model to perform feature extraction, so as to obtain audio content features, and the driving audio is input into the audio emotion feature extraction sub-model to perform feature extraction, so as to obtain audio emotion features.
The driving audio may be recorded in real time on site during a speaker's speech, taken from a previously collected audio database, or extracted from video recorded in real time or pre-recorded. The speaker may be, but is not limited to, a lecturer, a reciter, a participant in a business negotiation, or a speaker in everyday communication.
For example, audio recorded in real time on site or selected from an audio database may be segmented to obtain multiple segments of driving audio in time sequence. Each segment of driving audio has its own duration and corresponding text content, and usually also carries the individual emotion of the speaker. A single segment of driving audio may be used to generate a single complete face generated image. The plurality of complete face generated images corresponding to the multiple segments of driving audio in time sequence may then be spliced according to the time sequence to obtain a face generation video (also called a digital human video).
Further, the time interval of the input driving audio covers the time interval of the single-frame complete face generated image; that is, the time interval of the single-frame complete face generated image lies within the time interval of the driving audio.
Further, a ratio of the occupied duration of the driving audio to the occupied duration of the single-frame full face generated image may be set to be greater than or equal to 5.
Taking a speaking-process video containing face images at 25 frames per second (FPS) as an example, a single frame of face image corresponds to an audio duration of 40 ms (i.e., occupies a 40 ms duration), and the duration of the driving audio input in the embodiment of the invention may be at least 5 times (e.g., 10 times or even tens of times) the audio duration corresponding to a single frame of face image. In this way, the driving audio contains not only the audio segment aligned in time with the complete face generated image but also audio information from before and after that segment.
In the embodiment of the invention, the time interval of the driving audio is set to cover the time interval of the single-frame complete face generated image (for example, the time interval of the single-frame complete face generated image may be set at the middle of the time interval of the driving audio), so that audio information preceding and following the frame is provided, which helps to improve the naturalness and smoothness of the mouth shape and emotion of the complete face generated image produced by the model. Further, the driving audio duration is much longer than the duration of a single-frame complete face generated image (at least 5 times), which, compared with setting the driving audio duration to only 1 or 2 times that duration or less, provides richer audio information for generating the complete face image and further improves image quality.
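As a rough numerical sketch under the assumptions stated above (25 FPS, hence 40 ms of audio per frame, and a 10-times window), the driving-audio interval covering a single frame might be selected as follows; the function name and the centering strategy are illustrative assumptions.

```python
# Illustrative only: pick a driving-audio window whose time interval covers and is centered
# on the single-frame interval. FPS and the 10x window length are assumed values.
FPS = 25
FRAME_MS = 1000 // FPS          # 40 ms of audio corresponds to one video frame
WINDOW_FRAMES = 10              # driving audio spans 10 frame durations (>= the suggested 5x)

def audio_window_for_frame(frame_idx: int) -> tuple[int, int]:
    """Return (start_ms, end_ms) of the driving audio, centered on the frame interval."""
    frame_start = frame_idx * FRAME_MS
    frame_center = frame_start + FRAME_MS // 2
    half_window = WINDOW_FRAMES * FRAME_MS // 2
    return max(0, frame_center - half_window), frame_center + half_window

print(audio_window_for_frame(100))  # frame 100 -> (3820, 4220), a 400 ms window around a 40 ms frame
```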
The audio emotion feature extraction sub-model may further comprise a pre-trained emotion classification network and an emotion feature extraction network. In step S12, inputting the driving audio into the audio emotion feature extraction sub-model for feature extraction to obtain the audio emotion features may specifically comprise: inputting the driving audio into the pre-trained emotion classification network to obtain a predicted emotion type label, encoding the predicted emotion type label to obtain an audio emotion encoding vector, and inputting the audio emotion encoding vector into the emotion feature extraction network for feature extraction to obtain the audio emotion features.
In implementations, emotion type tags may be used to indicate a particular emotion type. In some embodiments, the emotion type may be selected from, but is not limited to, happiness, sadness, anger, fatigue, anxiety, tension, surprise, and the like. In other embodiments, the emotion type may be selected from, but not limited to, positive emotion, neutral emotion, negative emotion.
Without limitation, a scalar or Arabic numeral may be employed as the emotion type label; for example, the emotion type "happy" may be indicated by "0", "sad" by "1", and "tired" by "3".
Further, encoding the predicted emotion type label to obtain the audio emotion encoding vector may comprise: pre-encoding the predicted emotion type label based on a preset emotion encoding length to obtain multiple groups of emotion sub-encodings, each group comprising two identical emotion sub-encodings; for each group, determining the sine value of one emotion sub-encoding and the cosine value of the other emotion sub-encoding, thereby determining the emotion encoding corresponding to each emotion sub-encoding; and determining the audio emotion encoding vector based on the multiple emotion encodings thus obtained.
Specifically, the following formula may be used to determine the audio emotion encoding vector:
P = [sin(2^0·π·E), cos(2^0·π·E), sin(2^1·π·E), cos(2^1·π·E), …, sin(2^(L−1)·π·E), cos(2^(L−1)·π·E)];
wherein P denotes the audio emotion encoding vector, E denotes the predicted emotion type label, L denotes the number of groups of emotion sub-encodings obtained by pre-encoding, 2^(L−1)·π·E denotes one emotion sub-encoding in the L-th group of emotion sub-encodings, sin(2^(L−1)·π·E) denotes the sine value of one emotion sub-encoding in the L-th group, cos(2^(L−1)·π·E) denotes the cosine value of the other emotion sub-encoding in the L-th group, and [·] denotes a vector composed of the listed elements.
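A minimal sketch of this encoding, following the formula above; the value of L and the function name are assumptions for illustration.

```python
import numpy as np

def emotion_encoding_vector(label: float, L: int = 4) -> np.ndarray:
    """Encode a predicted emotion type label E into the 2L-dimensional vector
    P = [sin(2^0*pi*E), cos(2^0*pi*E), ..., sin(2^(L-1)*pi*E), cos(2^(L-1)*pi*E)].
    L (the number of groups of emotion sub-encodings) is an assumed default here."""
    parts = []
    for k in range(L):
        sub = (2 ** k) * np.pi * label        # the emotion sub-encoding 2^k * pi * E
        parts.extend([np.sin(sub), np.cos(sub)])
    return np.asarray(parts)

print(emotion_encoding_vector(1.0, L=3))      # label "1" (e.g. "sad") -> 6-dimensional vector
```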
Further, the pre-trained emotion classification network is obtained by inputting a first training data set constructed by a plurality of first sample audios and emotion type labels thereof into an initialized emotion classification network for training based on a first loss function, wherein the first loss function is represented by the following expression:
L_ec = −(1/N)·Σ_{i=1}^{N} y_i·ln(ŷ_i);
wherein L_ec denotes the function value of the first loss function, i denotes the sequence number of a first sample audio, N denotes the total number of first sample audios, y_i denotes the true emotion type label of the i-th first sample audio, ŷ_i denotes the emotion type label predicted for the i-th first sample audio, and ln() denotes the logarithm with base e.
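A minimal sketch of a cross-entropy style loss consistent with the symbol definitions above; the one-hot form of the true labels, the 1/N normalization, and all names are assumptions for illustration.

```python
import numpy as np

def emotion_classification_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """First loss sketch: L_ec = -(1/N) * sum_i y_i * ln(y_hat_i).
    y_true: one-hot (or probability) targets, shape (N, num_classes) -- assumed form
    y_pred: predicted class probabilities, shape (N, num_classes)"""
    eps = 1e-12                                            # avoid ln(0)
    n = y_true.shape[0]
    return float(-(y_true * np.log(y_pred + eps)).sum() / n)

# Toy usage: 2 first sample audios, 3 emotion classes (e.g. "happy", "sad", "tired")
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(emotion_classification_loss(y_true, y_pred))
```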
In the implementation of step S13, the audio fusion feature is obtained by stitching at least based on the audio content feature and the audio emotion feature.
In a specific embodiment, the audio content feature and the audio emotion feature may be directly spliced to obtain the audio fusion feature.
In another embodiment, the face image generating model may further include a key point feature extraction sub-model, and the stitching operation in step S13 may include stitching the audio content feature, the audio emotion feature, and the face key point feature, where the face key point feature is obtained by extracting a key point from a face image with a lower half blocked, and inputting the extracted key point into the key point feature extraction sub-model to perform feature extraction.
The face image with the blocked lower half (may be simply referred to as a mask face image) may be obtained by covering the lower half of the complete face image with a mask. Regarding the shape of the mask, it may be set in connection with actual needs. For example, it may be selected from, but not limited to, a semicircle, rectangle, or half face shape.
It should be noted that the lower part to be blocked should at least contain the region where the mouth of the full face image is located. For example, the region of the lower half may include only the lip region, or may include the nose tip to chin region, or may include the region below the eyes to chin region.
The face key points refer to key points of the regions other than the blocked lower-half region of the complete face image. Specifically, they may include contour key points and/or key points of multiple core regions within the contour, selected from, for example but not limited to, eye contour key points, the eyeball center point, nose contour key points, the nose tip center point, the eyebrow center point and head contour key points.
Regarding the method for extracting key points from the face image with the lower half blocked, an existing face key point extraction algorithm or model may be adopted, and the extracted key points are generally represented as two-dimensional or three-dimensional coordinate points.
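As a non-limiting sketch, the face image with the lower half blocked and its key points might be produced as follows; the rectangular half-height mask and the detect_landmarks callback are hypothetical placeholders for illustration, not an actual API of the embodiment.

```python
import numpy as np

def mask_lower_half(face_image: np.ndarray) -> np.ndarray:
    """Block the lower half of a complete face image with a rectangular mask
    (one of the mask shapes mentioned above); the 50% split point is an assumption."""
    masked = face_image.copy()
    h = masked.shape[0]
    masked[h // 2:, :] = 0                     # cover everything from mid-height down to the chin
    return masked

def keypoints_from_masked_face(face_image: np.ndarray, detect_landmarks) -> np.ndarray:
    """detect_landmarks is a hypothetical callback standing in for any existing face
    key point extraction algorithm; it is assumed to return (K, 2) 2-D coordinates."""
    masked = mask_lower_half(face_image)
    return np.asarray(detect_landmarks(masked))
```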
In some embodiments, the driving audio and the lower occluded face image may originate from the same speaker and contain the same emotion. For example, a video recording can be performed for a speaking process of a speaker, an audio stream and a face image stream are extracted from the recorded video, then the audio stream and the face image stream are segmented or sampled respectively to obtain at least one section of driving audio and at least one frame of complete face image aligned in time sequence, and then the lower half part of the complete face image is shielded.
In other embodiments, the driving audio may be extracted from a video recorded during a speaker's speech, and the face image with the lower half blocked may be obtained by blocking the lower half of a complete face image obtained by modeling in advance; for example, the complete face image may be a high-quality face image with high definition and a standard contour obtained by modeling in advance.
In the embodiment of the invention, on the basis of combining the audio content features and the audio emotion features, the face key point features extracted from the face image with the lower half blocked are further introduced. On one hand, the face key point features contain important information such as the head contour, the contours of key regions and the face pose, so that the complete face generated image produced by the model is more stable and standard, and the complete face generated images at consecutive time steps are more consistent. On the other hand, the face key point features contain no mouth or mouth-shape information of the face image, so that the model concentrates on learning the mouth-shape information in the driving audio; interference from mouth or mouth-shape information in the image is avoided, and the complete face generated image produced by the model matches the mouth shape of the driving audio more closely.
In a specific implementation, the extracted audio content features, audio emotion features and face key point features are usually in a matrix or vector form, and the total number of rows and/or total number of columns of the matrix of each feature are usually consistent, so that the subsequent feature stitching operation is facilitated.
Without limitation, as a scheme for stitching two features, for example, each row of elements of one feature matrix may be spliced in full to a preset position of the same row of the other feature matrix (i.e., every two rows with the same row number are spliced). In this way, the information of each feature is fully retained, ensuring the completeness and validity of the information contained in the audio fusion feature.
For another example, each row of elements of one feature matrix may be clipped, and then the remaining elements of the row may be spliced to the preset positions of the same row of the other feature matrix. Therefore, invalid elements can be removed, the data volume is reduced, and the subsequent operation efficiency is improved.
Wherein the preset position (i.e., splice position or insert position) may be a position after the last element of each row of the matrix being spliced, or a position before the first element of each row, or other suitable position.
As a non-limiting example, each row element of the matrix of audio emotion features may be completely stitched to a position after the last element of the same row of the matrix of audio content features according to the original order of the row elements to obtain a preliminary stitched matrix, and then each row element of the matrix of face key point features may be completely stitched to a position after the last element of the same row of the preliminary stitched matrix according to the original order of the row elements.
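A toy sketch of this row-wise splicing order (emotion features appended after the content features, then key point features appended to the preliminary result); all shapes are assumptions for illustration.

```python
import numpy as np

# Assumed feature shapes: 4 rows each, so rows with the same row number can be spliced.
audio_content = np.random.rand(4, 8)
audio_emotion = np.random.rand(4, 3)
face_keypoints = np.random.rand(4, 5)

preliminary = np.concatenate([audio_content, audio_emotion], axis=1)   # (4, 11) preliminary stitched matrix
audio_fusion = np.concatenate([preliminary, face_keypoints], axis=1)   # (4, 16) audio fusion feature
print(audio_fusion.shape)
```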
It should be noted that, the above-described embodiment does not limit the feature stitching manner, stitching sequence, and stitching position, and in a specific implementation, the stitching operation may be performed in other suitable manners, so long as it is beneficial to generate a complete face generated image with better quality.
In the implementation of step S14, the audio fusion feature and the noisy reference face image feature are input into the diffusion submodel for denoising processing, so as to obtain a target complete face feature, where the noisy reference face image feature is obtained by stitching the image feature of the reference face image with a noise matrix.
Specifically, the diffusion submodel is used for denoising the noisy reference face image feature under the condition of the audio fusion feature, and the finally output target complete face image feature is the denoised complete face image feature. The noise matrix may be a preset or randomly generated matrix, for example, a gaussian noise matrix (i.e., a noise matrix that conforms to a normal distribution). The diffusion submodel can adopt the existing neural network model which can realize denoising of the image characteristics. For example, it may be selected from, but not limited to, a U-Net model comprising a multi-layer network, a full convolutional network (Fully Convolutional Networks, FCN), and the like.
In a specific embodiment, the diffusion sub-model adopts a U-Net model comprising a multi-layer network, and step S14 specifically comprises: inputting the noisy reference face image feature into the first layer network of the diffusion sub-model, inputting the audio fusion feature into each layer network of the diffusion sub-model, and taking the output of the last layer network of the diffusion sub-model as the target complete face feature, wherein from the second layer network of the diffusion sub-model onward, the input data of each layer network are the output data of the previous layer network and the audio fusion feature.
In the embodiment of the invention, the audio fusion feature is the key input that guides the model to generate a complete face image accurately matching the mouth shape and expression of the driving audio. Therefore, using a multi-layer network structure and feeding the audio fusion feature into every layer of the diffusion sub-model, rather than using a single-layer structure or feeding the audio fusion feature only into the first layer, allows deeper interaction between the audio fusion feature and the noisy reference face image feature, and finally yields a complete face generated image whose mouth shape and expression match the driving audio more closely.
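A structural sketch of this per-layer conditioning; for readability it uses simple linear layers rather than an actual U-Net, and all dimensions, names and the activation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Sketch only: the first layer takes the noisy reference face image feature, every layer
    additionally takes the same audio fusion feature, and from the second layer onward each
    layer also takes the previous layer's output. The last layer's output is the target feature."""
    def __init__(self, feat_dim: int, cond_dim: int, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(feat_dim + cond_dim, feat_dim) for _ in range(num_layers)]
        )

    def forward(self, noisy_ref_feat: torch.Tensor, audio_fusion: torch.Tensor) -> torch.Tensor:
        x = noisy_ref_feat
        for layer in self.layers:
            x = torch.relu(layer(torch.cat([x, audio_fusion], dim=-1)))  # condition every layer
        return x   # taken as the target complete face feature
```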
Further, in step S14, before the audio fusion feature and the noisy reference face image feature are input into the diffusion sub-model for denoising, the method may further comprise: performing feature extraction on the face image with the lower half blocked and on the complete face image respectively to obtain a partial face image feature and a complete face image feature, and splicing the partial face image feature, the complete face image feature and the noise matrix to obtain the noisy reference face image feature.
Regarding the manner of stitching the face image features and the noise matrix, specific details of feature stitching in the foregoing step S13 may be referred to, which is not described herein.
In a specific implementation, the same or different image feature extraction models may be used to perform feature extraction on the face image with the lower half blocked and the whole face image respectively.
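A minimal sketch of assembling the noisy reference face image feature from the two image features and a Gaussian noise matrix, using the same row-wise splicing as before; all shapes are assumptions for illustration.

```python
import numpy as np

partial_face_feat = np.random.rand(4, 16)     # from the face image with the lower half blocked
complete_face_feat = np.random.rand(4, 16)    # from the complete face image
noise_matrix = np.random.randn(4, 16)         # Gaussian (normally distributed) noise matrix

noisy_ref_face_feat = np.concatenate(
    [partial_face_feat, complete_face_feat, noise_matrix], axis=1
)                                             # (4, 48) noisy reference face image feature
print(noisy_ref_face_feat.shape)
```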
Compared with introducing only the features of the complete face image, the embodiment of the invention combines the features of the face image with the lower half blocked and the features of the complete face image. The complete face image features contain the basic contour information and key-region information of the whole face, which reduces the randomness of the complete face generated image produced by the model and avoids generating face images with extreme mouth shapes or emotional states; the features of the face image with the lower half blocked contain the basic contour and key-region information of the parts other than the mouth region, which further enhances the stability and temporal consistency of the complete face generated image produced by the model while avoiding interference from mouth information in the image, so that the model can concentrate on learning the mouth (i.e., mouth shape and mouth motion) characteristics of the driving audio.
Regarding the manner of acquiring the face image with the lower half blocked and the complete face image, reference may be made to the foregoing related description, which is not repeated here.
Further, determining the face image generation model may comprise: constructing a face image generation model to be trained, the face image generation model to be trained comprising an audio content feature extraction sub-model to be trained, an audio emotion feature extraction sub-model to be trained and a diffusion sub-model to be trained; constructing a second training data set from a plurality of reference sample face images, a plurality of second sample audios and a sample noise matrix, wherein the reference sample face images and the second sample audios are aligned one by one in time sequence; and inputting the second training data set into the face image generation model to be trained for iterative training based on a second loss function, so as to obtain the face image generation model.
Wherein the second loss function is represented by the following expression:
L' = E_{ε∼N(0,1)} [ ‖ε − M(z_t, t, C)‖² ];
wherein L' denotes the function value of the second loss function, t denotes the layer sequence number of a network in the diffusion sub-model to be trained, M() denotes the prediction noise matrix output by the t-th layer network of the diffusion sub-model to be trained, ε denotes the sample noise matrix, ε∼N(0,1) indicates that ε obeys a normal distribution, z_t denotes one item of input data of the t-th layer network of the diffusion sub-model to be trained, namely the output data of the (t−1)-th layer network of the diffusion sub-model to be trained, and C is the other item of input data of the t-th layer network, namely the sample audio fusion feature obtained by splicing at least based on the sample audio content features and the sample audio emotion features;
‖ε − M(z_t, t, C)‖ denotes the Euclidean distance between the sample noise matrix and the prediction noise matrix output by the t-th layer network of the diffusion sub-model to be trained, and E[·] denotes taking the expected value of the squared Euclidean distance over multiple samplings. The number of samplings may be set according to the actual application scenario, which is not limited by the embodiment of the invention.
In a specific implementation, the sample noise matrix ε may be a randomly generated Gaussian sample noise matrix, i.e., a sample noise matrix that obeys a normal distribution. The input data of the first layer network of the diffusion sub-model to be trained specifically comprise the noisy reference sample face image feature and the sample audio fusion feature, wherein the noisy reference sample face image feature is obtained by splicing the image features of the reference sample face image with the sample noise matrix.
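A minimal sketch of the second loss under the reconstruction above, approximating the expectation over multiple samplings by a batch mean; the function name, shapes and the batch-mean approximation are assumptions for illustration.

```python
import torch

def diffusion_training_loss(pred_noise: torch.Tensor, sample_noise: torch.Tensor) -> torch.Tensor:
    """Second loss sketch: expected squared Euclidean distance between the sample noise
    matrix epsilon and the predicted noise M(z_t, t, C); the expectation is approximated
    here by the mean over a batch of samplings (an assumption)."""
    sq_dist = ((sample_noise - pred_noise) ** 2).sum(dim=(-2, -1))   # ||eps - M(z_t, t, C)||^2 per sample
    return sq_dist.mean()                                            # empirical expectation

# Toy usage: a batch of 8 samplings with noise matrices of shape (4, 16)
eps = torch.randn(8, 4, 16)        # Gaussian sample noise matrices
pred = torch.randn(8, 4, 16)       # placeholder predicted noise from the t-th layer network
print(diffusion_training_loss(pred, eps))
```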
Further, the plurality of reference sample face images may include a plurality of complete sample face images and a plurality of lower half-blocked sample face images, wherein the plurality of complete sample face images and the plurality of lower half-blocked sample face images are aligned one by one in time sequence, and the complete sample face images and the lower half-blocked sample face images aligned one by one in time sequence may be from the same sample speaker and have the same emotion.
Correspondingly, the noise-carrying reference sample face image features can be obtained by splicing the image features of the sample face image with the lower part blocked, the image features of the complete sample face image and the noise matrix.
Specifically, the sample audio content features are obtained by inputting the second sample audio into the audio content feature extraction sub-model to be trained for feature extraction, and the sample audio emotion features are obtained by inputting the second sample audio into the audio emotion feature extraction sub-model to be trained for feature extraction.
Corresponding to the process of generating the complete face generated image by the face image generating model (namely, the model reasoning process), in the model training process, the splicing operation of obtaining the sample audio fusion characteristic C can adopt at least the following two embodiments.
In a specific embodiment, the sample audio content features and the sample audio emotion features may be directly spliced to obtain the sample audio fusion feature C.
In another embodiment, the face image generation model to be trained further comprises a key point feature extraction sub-model to be trained, and the splicing operation comprises splicing the sample audio content features, the sample audio emotion features and the sample face key point features, wherein the sample face key point features are obtained by inputting the extracted key points into the key point feature extraction sub-model to be trained after extracting the key points of the sample face image with the lower part being blocked.
For more details of the model training process, reference is made to the foregoing and related descriptions of the steps of the embodiment shown in fig. 1 with respect to the model reasoning process, which are not repeated here.
Referring to fig. 2, fig. 2 is a partial flowchart of another face image generation method according to an embodiment of the present invention. The other face image generation method may include steps S11 to S15 shown in fig. 1, and may further include steps S21 to S24, wherein the steps S21 to S24 may be performed before the step S13.
In step S21, a first number of audio segments whose time sequence precedes the driving audio and a second number of audio segments whose time sequence follows the driving audio are determined and recorded as first audios and second audios, respectively.
In a specific implementation, the first number and the second number may be set appropriately according to the actual application scenario. The first and second amounts may be selected from suitable values in the interval [5,10], such as 8, without limitation.
In step S22, each of the first audio and the second audio is input into the audio content feature extraction sub-model to perform feature extraction, so as to obtain a plurality of corresponding first audio content features and a plurality of second audio content features.
In step S23, weighting operation is performed on the plurality of first audio content features, the plurality of second audio content features, and the audio content features, so as to obtain a fused audio content feature.
For example, for each group of elements located at the same position in the first audio content features, the second audio content features and the audio content features, a weighted sum or average value may be calculated to obtain a fusion element; all the fusion elements thus obtained constitute the fused audio content feature.
In a specific embodiment, the audio content feature extraction sub-model may include an audio content feature extraction network for extracting the audio content features from the driving audio, and may further include a time domain filter for performing a weighting operation. The weights of the time domain filter for performing the weighting operation can be learned and obtained in the training process of the face image generation model.
In another embodiment, the weighting operation may be performed using a preset weight. Wherein the weight of the audio content feature may be greater than the weights of the first and second audio content features.
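A minimal sketch of the preset-weight variant of this weighting operation; the specific weight values, names and shapes are assumptions for illustration (in the embodiment the weights may instead be learned by the time-domain filter).

```python
import numpy as np

def fuse_audio_content_features(prev_feats, current_feat, next_feats, current_weight: float = 0.5):
    """Weighted fusion of the first audio content features (preceding), the current audio
    content feature, and the second audio content features (following). Giving the current
    feature the larger preset weight is an assumption."""
    context = list(prev_feats) + list(next_feats)
    context_weight = (1.0 - current_weight) / len(context)          # split remainder evenly
    fused = current_weight * current_feat
    for feat in context:
        fused = fused + context_weight * feat
    return fused                                                     # the fused audio content feature

# Toy usage: 2 preceding and 2 following audio content features of shape (4, 8)
prev = [np.random.rand(4, 8) for _ in range(2)]
nxt = [np.random.rand(4, 8) for _ in range(2)]
cur = np.random.rand(4, 8)
print(fuse_audio_content_features(prev, cur, nxt).shape)
```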
In step S24, the audio content features are updated with the fused audio content features.
In the embodiment of the invention, by fusing the audio content features of the preceding and following time steps into the audio content features to be spliced, the audio content information of the surrounding time sequence is better incorporated when generating the current complete face generated image, so that the complete face generated images at consecutive time steps have smoother and more consistent mouth shapes and expression states. Further, stitching these smoother consecutive complete face generated images helps to obtain a high-quality digital human video with more natural and coherent mouth-shape and expression transitions, improving user experience.
Referring to fig. 3 and 4, fig. 3 is a schematic diagram of a face image generation model according to an embodiment of the present invention, and fig. 4 is a schematic diagram of a diffusion sub-model in the face image generation model shown in fig. 3.
The face image generation model includes, without limitation, an audio content feature extraction sub-model, an audio emotion feature extraction sub-model, a diffusion sub-model, and may further include a key point feature extraction sub-model, and may further include a coding sub-model and a decoding sub-model.
The coding sub-model may be used to perform feature extraction on the reference face image to obtain the reference face image features; the reference face image specifically comprises a complete face image and a face image with the lower half blocked (which may be simply called a mask face image) that come from the same speaker and carry the same emotion. The decoding sub-model may be used to decode the target complete face feature output by the diffusion sub-model to obtain the complete face generated image.
The diffusion sub-model adopts a U-Net model comprising a multi-layer network, input data of the diffusion sub-model at least comprises two items, wherein one item of input data is a noisy reference face image feature, the noisy reference face image feature can be obtained by splicing the reference face image feature output by the coding sub-model and a noise matrix, and the other item of input data is an audio fusion feature obtained by splicing the audio content feature output by the audio content extraction sub-model, the audio emotion feature output by the audio emotion extraction sub-model and the face key point feature output by the key point feature extraction sub-model.
Further, the noisy reference face image features are input into a first layer network of the diffusion sub-model, the audio fusion features are input into each layer network of the diffusion sub-model, and an output result of a last layer network of the diffusion sub-model is used as the target whole face features.
Regarding the functions and operation processes of the other sub-models of the face image generation model, refer specifically to the relevant contents in any of the embodiments of fig. 1 to fig. 2, and are not repeated here.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a face image generating apparatus according to an embodiment of the present invention. The face image generation apparatus may include:
A model determining module 51, configured to determine a face image generation model, where the face image generation model includes an audio content feature extraction sub-model, an audio emotion feature extraction sub-model, and a diffusion sub-model;
The audio feature extraction module 52 is configured to input driving audio to the audio content feature extraction sub-model to perform feature extraction to obtain audio content features, and input the driving audio to the audio emotion feature extraction sub-model to perform feature extraction to obtain audio emotion features;
an audio feature stitching module 53, configured to stitch at least based on the audio content feature and the audio emotion feature to obtain an audio fusion feature;
The complete face feature determining module 54 is configured to input the audio fusion feature and the noisy reference face image feature into the diffusion sub-model for denoising to obtain a target complete face feature, wherein the noisy reference face image feature is obtained by splicing the image features of the reference face image with a noise matrix;
And the face image generating module 55 is configured to decode the target complete face feature to obtain a complete face generated image.
Regarding the principle, specific implementation and beneficial effects of the face image generating apparatus, please refer to the foregoing and the related descriptions of the face image generating method shown in fig. 1 to 2, which are not repeated herein.
The embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the face image generation method according to any of the above embodiments are performed. The computer-readable storage medium may include non-volatile or non-transitory memory, and may also include an optical disk, a mechanical hard disk, a solid state disk, and the like.
Specifically, in the embodiment of the invention, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be appreciated that the memory in the embodiments of the application may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM).
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the facial image generation method in any embodiment when running the computer program.
It should be understood that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent three cases: A alone, both A and B, and B alone. The character "/" herein indicates an "or" relationship between the associated objects before and after it.
The term "plurality" as used in the embodiments of the present application means two or more.
The first, second, etc. descriptions in the embodiments of the present application are only used for illustrating and distinguishing the description objects, and no order is used, nor is the number of the devices in the embodiments of the present application limited, and no limitation on the embodiments of the present application should be construed.
It should be noted that the serial numbers of the steps in the present embodiment do not represent a limitation on the execution sequence of the steps.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention shall be subject to the appended claims.