CN118052723A - Intelligent design system for face replacement

Info

Publication number
CN118052723A
Authority
CN
China
Prior art keywords
face
image
key point
facial
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311680842.1A
Other languages
Chinese (zh)
Other versions
CN118052723B (en)
Inventor
黄家鸿
陈照军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shidai Technology Group Co ltd
Original Assignee
Shenzhen Shidai Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shidai Technology Group Co ltd
Priority to CN202311680842.1A
Publication of CN118052723A
Application granted
Publication of CN118052723B
Status: Active
Anticipated expiration

Abstract

The invention belongs to the technical field of computer vision and discloses an intelligent design system for face replacement. The system comprises a face detection and tracking module, which acquires face region images from an input video stream; a face feature extraction module, which extracts a facial key point coordinate sequence and deep learning features from the face region image and derives facial region features from them; a face matching degree evaluation module, which compares the facial region features of the face region image to be replaced with those of the target face image, divides both images into n sub-regions, and obtains n replacement matching degrees; and an adaptive image fusion module, which sets a matching degree threshold interval to obtain comparison results and, according to those results, applies an image fusion algorithm to fuse and replace the target portrait image and the face region image to be replaced. The system thereby ensures accurate localization of the face region and achieves adaptive fusion of facial features.

Description

Translated from Chinese

An intelligent design system for face replacement

Technical Field

The present invention relates to the field of computer vision technology, and more specifically to an intelligent design system for face replacement.

Background Art

The patent with application publication number CN116403260A discloses an AI face replacement method comprising the following steps: S1, establishing a face library; S2, face extraction; S3, model training; S4, face conversion. S1 specifically includes S11, establishing a face generation library, and S12, establishing a face replacement library. S2 specifically includes S21, face detection; S22, facial key point detection and alignment; and S23, face segmentation. S3 specifically includes S31, generalized generative model training, and S32, specific face model training. S4 specifically includes S41, face swapping, and S42, face fusion. Through face segmentation, that AI face replacement method can eliminate irregular occluders contained in the face pictures obtained in the face detection step, such as hair, hands and glasses, so that the model converges stably during training and does not treat occluders as face information, making the final face replacement more seamless and smooth and improving the accuracy and stability of the replaced face.

The problem with the existing technology is the lack of technical means for fully automated, precise localization, representation, analysis and processing of faces in video frames. The main challenges include: face detection and recognition in video-stream scenarios is difficult to guarantee; the representation and understanding of complex facial features are relatively weak; there is no decision-making capability for adaptively analyzing and processing facial regions; and there is no systematic solution that organically integrates face localization, analysis and processing for adaptive face fusion. These problems restrict the intelligent processing of facial regions in current video editing applications.

In view of this, the present invention proposes an intelligent design system for face replacement to solve the above problems.

Summary of the Invention

In order to overcome the above defects of the prior art and achieve the above purpose, the present invention provides the following technical solution: an intelligent design system for face replacement, comprising:

a face detection and tracking module, used to perform face detection on an input video stream and obtain face region images;

a face feature extraction module, used to extract a facial key point coordinate sequence and deep learning features from a face region image, and to obtain facial region features from the facial key point coordinate sequence and the deep learning features;

a face matching degree evaluation module, used to compare facial region features between the face region image to be replaced and the target face image, divide the face region image to be replaced and the target face image into n sub-regions, and obtain n replacement matching degrees between the n sub-regions of the two images;

an adaptive image fusion module, which sets a matching degree threshold interval and obtains comparison results by comparing the n replacement matching degrees against the threshold interval; according to the comparison results, an image fusion algorithm is applied to fuse and replace the target portrait image and the face region image to be replaced. The modules are connected by wired and/or wireless means to realize data transmission between them.

Furthermore, performing face detection on the input video stream to obtain face region images comprises:

loading the pre-trained MTCNN face detection model into memory; reading the input video stream frame by frame and extracting RGB images; feeding each extracted RGB image into the pre-trained MTCNN face detection model to generate a coordinate box for each face; and cropping, according to each face coordinate box, the region where the face box lies from the input RGB image to generate a face region image.

The pre-training process of the MTCNN face detection model comprises:

dividing the training of the MTCNN face detection model into three stages: training the extraction network, training the optimization network and training the output network.

Furthermore, training the extraction network comprises:

collecting m images containing faces as a training set, where m is an integer greater than 1; annotating faces in the training set images manually or by computer to obtain an image training set with face label boxes.

The extraction network adopts a convolutional neural network structure composed of convolutional layers, activation function layers, pooling layers and fully connected layers.

The input layer takes the three-channel pixel data of an input RGB image; b convolution kernels perform convolution on the input RGB image to obtain feature maps; the activation function layer applies a nonlinear activation to the feature maps output by the convolutional layer; the pooling layer reduces the dimensionality of the feature maps, giving reduced feature maps; and the fully connected layer converts the reduced feature maps into a feature vector.

The loss function layer uses the cross entropy function H(P,Q) = -∑_i P(i)·log(Q(i));

where H(P,Q) is the cross entropy loss between the true distribution P and the model's predicted distribution Q; P(i) is the probability of the i-th category in the true distribution; and Q(i) is the probability of the i-th category in the model's predicted distribution.

A regression box loss function is defined to measure the error between the network's predicted boxes and the label boxes.

The regression box loss function is:

L_reg = (1/N) ∑_{k=1}^{N} ∑_E E², where E takes the values p_x-g_x, p_y-g_y, p_w-g_w and p_h-g_h;

where (p_x, p_y) is the center coordinate of the predicted box, p_w is its width and p_h is its height; (g_x, g_y) is the center coordinate of the label box, g_w is its width and g_h is its height; and N is the number of images in the image training set.

The back-propagation algorithm updates the network parameters according to the loss function so that the predicted boxes steadily approach the label boxes; by iteratively optimizing the model, the extraction network is finally obtained.

Furthermore, training the optimization network comprises the following steps:

using the extraction network to evaluate all candidate boxes generated for the training set images, computing the IoU between each candidate box and its label box; IoU measures the overlap between the two.

If a candidate box has IoU > 0.5, it is labeled a positive sample; if IoU ≤ 0.5, it is labeled a negative sample.

The network structure of the optimization network is defined to include convolutional layers, pooling layers and fully connected layers. The training objective of the optimization network is to output a binary judgment of whether the current candidate box contains a face, with the judgment expected to agree with the annotated label; labels are positive or negative samples.

The binary cross entropy loss is used as the loss function of the optimization network:

Loss(o,y) = -(y·log(o) + (1-y)·log(1-o));

where o is the predicted probability that the current sample is positive, and y is the sample's label: 1 for positive samples, 0 for negative samples.

Images with candidate boxes are fed into the optimization network to obtain binary judgments; each judgment is compared with its label and the binary cross entropy loss is computed. The network parameters are updated by back-propagation and related algorithms; the loss decreases gradually during optimization, and when its rate of decrease slows, training ends, yielding the optimization network.

Furthermore, training the output network comprises the following steps:

after filtering by the optimization network, the remaining high-quality candidate box images serve as training samples for the output network; the output network is built with the same network structure as the optimization network.

The loss function of the output network is defined as a coordinate regression loss, which measures the sum of squared errors between the output network's predicted box and the label box in position and size.

The input of the output network is a candidate box image, and its output is the adjusted predicted box coordinates, comprising the center coordinates, width and height of the predicted box.

The parameters of the output network are predefined; given its current parameters, the network predicts adjusted candidate box coordinates. The coordinate regression loss between the predicted box and the image's label box is computed; the gradient of the loss with respect to each parameter is obtained by chain-rule differentiation; and the parameters are fine-tuned along the negative gradient direction to lower the loss. When the rate of decrease slows, training ends, yielding the output network.

After all three stages are trained, connecting them in series constitutes the complete MTCNN face detection model.

Furthermore, extracting the facial key point coordinate sequence comprises:

constructing a key point detection convolutional neural network comprising a convolutional layers (where a denotes the number of convolutional layers), pooling layers and fully connected layers. The input of the key point detection network is a face region image, which is an RGB image; its output is the facial key point coordinates, c key points in total.

v face feature images are collected and annotated to form the face feature image set;

the collected face feature images cover different genders, ages and ethnicities with varied facial expressions and poses; an image annotation tool is used, manually or by computer, to mark the coordinates of the c facial key points in each image.

The annotated face feature image set is fed into the key point detection convolutional neural network; the loss function is the Euclidean distance between the predicted key point coordinates and the annotated key point coordinates.

The loss on a single face feature image is L_i = ∑_{j=1}^{c} ||ŷ_ij - y_ij||₂;

where ŷ_ij is the coordinate of the j-th key point of the i-th image predicted by the key point detection network; y_ij is the coordinate of the manually annotated j-th key point of the i-th image; ||ŷ_ij - y_ij||₂ is the Euclidean distance between the predicted coordinate ŷ_ij and the annotated coordinate y_ij; and c is the number of facial key points.

Over the face feature image set, the overall loss of the key point detection model is L_z = (1/v) ∑_{i=1}^{v} L_i;

where v is the total number of images in the face feature image set.

The network parameters θ of the key point detection network are defined and initialized; images from the face feature image set are fed in for training, forward propagation computes the loss L_z, the back-propagation algorithm computes the gradient ∇_θ L_z of the loss with respect to θ, and gradient descent updates θ ← θ - α·∇_θ L_z, where α is the learning rate. Through iterative optimization the loss L_z is gradually reduced, yielding the facial key point detection model.

The face region image is fed into the facial key point detection model, which predicts and outputs c key point coordinates; these c coordinates constitute the facial key point coordinate sequence.

Extracting the deep learning features comprises:

predefining an intermediate layer of the facial key point detection model; feeding the face region image into the model; and extracting the deep feature vector output by that intermediate layer; this deep feature vector is the deep learning feature.

Obtaining the facial region features comprises:

the deep feature vector is a fixed-length vector, while the facial key point coordinate sequence has variable length; the key point coordinate sequence is flattened, each key point represented by a vector of length 2; the deep feature vector is concatenated with the flattened facial key point coordinate sequence, giving a fixed-length vector.

Furthermore, obtaining the replacement matching degrees comprises:

using the face feature extraction module to extract the facial region features of the face region image A to be replaced and of the target face image B; dividing them into n pairs of facial region features according to the n sub-regions; each facial region feature is a high-dimensional vector containing two parts of information, the deep learning features and the facial key point coordinate sequence; and computing the cosine similarity for each of the n pairs:

sim(a_i, b_i) = (a_i·b_i) / (||a_i||·||b_i||);

where a_i is the facial region feature of the i-th sub-region of the face region image A to be replaced; b_i is the facial region feature of the i-th sub-region of the target face image B; a_i·b_i is the inner product of the vectors a_i and b_i; and ||a_i|| and ||b_i|| are the L2 norms of a_i and b_i.

The value of sim(a_i, b_i) is taken as the replacement matching degree between the corresponding sub-regions of the face region image to be replaced and the target face image.

Furthermore, the matching degree threshold interval is set as follows:

a training data set of u pairs of face images is collected, each pair being pictures of the same person taken in different scenes; a matching degree score_ij is computed for each pair; over the whole training data set, the mean ρ = (1/u)·∑ score_ij and the standard deviation σ = sqrt((1/u)·∑ (score_ij - ρ)²) of all pair matching degrees are computed, where u is the number of face image pairs in the training data set.

The upper limit of the matching degree threshold interval is set to ρ+3σ and the lower limit to ρ-3σ; the matching degree threshold interval is therefore [ρ-3σ, ρ+3σ].

The comparison result is a high match, a medium match or a low match:

if a replacement matching degree among the n is greater than or equal to ρ+3σ, the comparison result of the corresponding sub-region is a high match; if it lies within the matching degree threshold interval, the comparison result is a medium match; if it is less than or equal to ρ-3σ, the comparison result is a low match.

Furthermore, applying an image fusion algorithm according to the comparison results to fuse and replace the target portrait image and the face region image to be replaced comprises:

sub-regions of the target portrait image and the face region image to be replaced whose comparison result is a high match are fused and replaced using the Poisson fusion algorithm;

sub-regions whose comparison result is a medium match are fused and replaced using the color transfer algorithm;

sub-regions whose comparison result is a low match are fused and replaced using the geometric transformation algorithm; this achieves the fusion replacement between the target portrait image and the face region image to be replaced.

An intelligent design method for face replacement, implemented on the basis of the above intelligent design system for face replacement, comprises: S1, performing face detection on an input video stream to obtain face region images;

S2, extracting a facial key point coordinate sequence and deep learning features from a face region image, and obtaining facial region features from them;

S3, comparing facial region features between the face region image to be replaced and the target face image, dividing both into n sub-regions, and obtaining n replacement matching degrees between the n sub-regions of the two images;

S4, setting a matching degree threshold interval and obtaining comparison results by comparing the n replacement matching degrees against it; according to the comparison results, applying an image fusion algorithm to fuse and replace the target portrait image and the face region image to be replaced.

Technical effects and advantages of the intelligent design system for face replacement of the present invention:

The system realizes fully automated detection, recognition, localization, matching analysis and adaptive fusion replacement of faces in video frames. It integrates face detection, feature representation, similarity judgment and image fusion modules, accurately identifying, representing and analyzing face regions and selecting the optimal algorithm according to the judgment results to achieve natural fusion replacement of faces. In this way it guarantees the processing quality while also automating the choice of processing strategy and allowing flexible extension of the system. Overall, the system has face processing capabilities that support a variety of intelligent video applications: it both ensures accurate localization of face regions and achieves adaptive representation, analysis and fusion of complex facial features and textures.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the intelligent design system for face replacement of the present invention;

FIG. 2 is a schematic diagram of the facial key points of the present invention;

FIG. 3 is a schematic diagram of the intelligent design method for face replacement of the present invention;

FIG. 4 is a schematic diagram of an electronic device of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the scope of protection of the present invention.

Embodiment 1

Referring to FIG. 1, the intelligent design system for face replacement of this embodiment comprises: a face detection and tracking module, used to perform face detection on an input video stream and obtain face region images;

a face feature extraction module, used to extract a facial key point coordinate sequence and deep learning features from a face region image, and to obtain facial region features from them;

a face matching degree evaluation module, used to compare facial region features between the face region image to be replaced and the target face image, divide both images into n sub-regions, and obtain n replacement matching degrees between the n sub-regions of the two images;

an adaptive image fusion module, which sets a matching degree threshold interval and obtains comparison results by comparing the n replacement matching degrees against it; according to the comparison results an image fusion algorithm is applied to fuse and replace the target portrait image and the face region image to be replaced. The modules are connected by wired and/or wireless means to realize data transmission between them.

Further, performing face detection on the input video stream to obtain face region images comprises:

loading the pre-trained MTCNN face detection model into memory; reading the input video stream frame by frame and extracting RGB images; feeding each extracted RGB image into the pre-trained MTCNN model to generate a coordinate box for each face; and cropping, according to each face coordinate box, the corresponding region of the input RGB image to generate a face region image.
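For illustration only (not part of the patent text), a minimal sketch of this detection loop, assuming the MTCNN implementation from the third-party facenet-pytorch package as a stand-in for the patent's pre-trained model, and a hypothetical input path:

```python
import cv2
from facenet_pytorch import MTCNN  # third-party MTCNN implementation

detector = MTCNN(keep_all=True)    # "load the pre-trained model into memory"
cap = cv2.VideoCapture("input.mp4")

face_crops = []
while True:
    ok, frame = cap.read()         # read the video stream frame by frame
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # extract the RGB image
    boxes, _ = detector.detect(rgb)               # coordinate box per face
    if boxes is None:
        continue
    for x1, y1, x2, y2 in boxes.astype(int):
        face_crops.append(rgb[y1:y2, x1:x2])      # crop the face region image
cap.release()
```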

The pre-training process of the MTCNN face detection model comprises:

dividing the training into three stages: training the extraction network, training the optimization network, and training the output network.

Training the extraction network comprises:

collecting m images containing faces as a training set, where m is an integer greater than 1; annotating faces in the training set images manually or by computer to obtain an image training set with face label boxes.

The extraction network adopts a convolutional neural network structure composed of convolutional layers, activation function layers, pooling layers and fully connected layers.

The input layer takes the three-channel pixel data of an input RGB image; b convolution kernels perform convolution on the input image to obtain feature maps; an activation function layer, for example a ReLU layer, applies a nonlinear activation to the feature maps output by the convolutional layer; a pooling layer reduces the dimensionality of the feature maps, giving reduced feature maps; and a fully connected layer converts the reduced feature maps into a feature vector.

A loss function layer, for example mean squared error or cross entropy, measures the gap between the extraction network's predicted output and the true label.

In a preferred embodiment, the loss function layer uses the cross entropy

H(P,Q) = -∑_i P(i)·log(Q(i));

where H(P,Q) is the cross entropy loss between the true distribution P and the model's predicted distribution Q; P(i) is the probability of the i-th category in the true distribution, which for the face extraction task can be viewed as the label of whether a face is present; and Q(i) is the probability of the i-th category in the model's predicted distribution.
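As a hedged illustration, the cross entropy above can be computed directly in numpy; the arrays below are hypothetical:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(P,Q) = -sum_i P(i) * log(Q(i)); eps guards against log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q + eps))

# For the face/no-face task the "true distribution" is a one-hot label:
label = [1.0, 0.0]        # face present
pred  = [0.9, 0.1]        # model's predicted distribution
print(cross_entropy(label, pred))  # ~0.105
```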

A regression box loss function is defined to measure the error between the network's predicted boxes and the label boxes.

The regression box loss function is:

L_reg = (1/N) ∑_{k=1}^{N} ∑_E E², where E takes the values p_x-g_x, p_y-g_y, p_w-g_w and p_h-g_h;

where (p_x, p_y) is the center coordinate of the predicted box, p_w is its width and p_h is its height; (g_x, g_y) is the center coordinate of the label box, g_w is its width and g_h is its height; and N is the number of images in the image training set.

The back-propagation algorithm updates the network parameters according to the loss function so that the predicted boxes steadily approach the label boxes; by iteratively optimizing the model, the extraction network is finally obtained.
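A minimal sketch of this regression box loss under the reconstruction above (squared offsets summed per box and averaged over the N training boxes); the box values are hypothetical:

```python
import numpy as np

def box_regression_loss(pred, gt):
    """Sum of squared offsets E^2 for E in {px-gx, py-gy, pw-gw, ph-gh},
    averaged over the N boxes. pred, gt: arrays of shape (N, 4) as (x, y, w, h)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return np.mean(np.sum((pred - gt) ** 2, axis=1))

pred = [[50, 60, 32, 40]]
gt   = [[48, 63, 30, 44]]
print(box_regression_loss(pred, gt))  # (2^2 + 3^2 + 2^2 + 4^2) / 1 = 33.0
```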

It should be noted that, in the parameter settings of the extraction network, the number and size of convolution kernels, the choice of activation function, the pooling method and so on all affect the final extraction result; the best parameter combination is obtained through repeated adjustment and testing.

The steps of training the optimization network comprise:

using the extraction network to evaluate all candidate boxes generated for the training set images, computing the IoU between each candidate box and its label box; IoU measures the overlap between the two.

If a candidate box has IoU > 0.5, it essentially contains the face accurately and is labeled a positive sample; if IoU ≤ 0.5, the candidate box barely overlaps the label box, the face cannot be correctly detected, and the box is labeled a negative sample.

The network structure of the optimization network is defined to include convolutional layers, pooling layers and fully connected layers. The training objective of the optimization network is to output a binary judgment of whether the current candidate box contains a face, with the judgment expected to agree with the annotated label; labels are positive or negative samples.

The binary cross entropy loss is used as the loss function of the optimization network:

Loss(o,y) = -(y·log(o) + (1-y)·log(1-o));

where o is the predicted probability that the current sample is positive, and y is the sample's label: 1 for positive samples, 0 for negative samples.

Images with candidate boxes are fed into the optimization network to obtain binary judgments; each judgment is compared with its label and the binary cross entropy loss is computed; the network parameters are updated by back-propagation and related algorithms to lower the loss and improve judgment accuracy.

As iteration proceeds, the binary cross entropy loss decreases gradually; when its rate of decrease slows, the optimization network approaches saturation and training ends, yielding the optimization network.

When candidate boxes are fed into this model, most negative samples are filtered out, improving the precision of subsequent detection.
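A hedged PyTorch sketch of one training step of such a binary candidate-box classifier; the tiny network, crop size and batch are illustrative assumptions, not the patent's architecture:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the optimization network: conv -> pool -> fc -> sigmoid.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 12 * 12, 1), nn.Sigmoid(),
)
criterion = nn.BCELoss()  # Loss(o, y) = -(y*log(o) + (1-y)*log(1-o))
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

crops = torch.rand(8, 3, 24, 24)              # batch of candidate-box crops
labels = torch.randint(0, 2, (8, 1)).float()  # 1 = face, 0 = background

optimizer.zero_grad()
out = net(crops)                # predicted probability o per candidate
loss = criterion(out, labels)   # binary cross entropy against labels y
loss.backward()                 # back-propagation
optimizer.step()                # parameter update
```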

The steps of training the output network comprise:

after filtering by the optimization network, the remaining high-quality candidate box images serve as training samples for the output network;

the output network is built with the same network structure as the optimization network.

The loss function of the output network is defined as a coordinate regression loss, which measures the sum of squared errors between the output network's predicted box and the label box in position and size.

To explain, the coordinate regression loss is computed as follows (see the sketch after this list):

1. compute the difference between the predicted box's center abscissa px and the label box's center abscissa gx, and square it;

2. compute the difference between the predicted box's center ordinate py and the label box's center ordinate gy, and square it;

3. compute the difference between the predicted box width pw and the label box width gw, and square it;

4. compute the difference between the predicted box height ph and the label box height gh, and square it.

The abscissas and ordinates are all defined in the image coordinate system; each coordinate point is the position of a pixel in that coordinate system.

The sum of the four squared terms above is the final value of the coordinate regression loss.
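A minimal sketch of steps 1-4 plus one adjustment along the negative gradient, using autograd on the box coordinates as a stand-in for the chain-rule parameter update described next; all values are illustrative:

```python
import torch

def coord_regression_loss(pred, label):
    """Steps 1-4: squared differences in center x, center y, width, height."""
    return ((pred - label) ** 2).sum()

pred = torch.tensor([50.0, 60.0, 32.0, 40.0], requires_grad=True)  # (px, py, pw, ph)
label = torch.tensor([48.0, 63.0, 30.0, 44.0])                     # (gx, gy, gw, gh)

loss = coord_regression_loss(pred, label)   # 2^2 + 3^2 + 2^2 + 4^2 = 33
loss.backward()                             # chain-rule gradients w.r.t. pred
with torch.no_grad():
    pred -= 0.1 * pred.grad                 # one step along the negative gradient
print(loss.item(), pred.tolist())
```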

The input of the output network is a candidate box image, and its output is the adjusted predicted box coordinates, comprising the center coordinates, width and height of the predicted box.

The parameters of the output network are predefined; given its current parameters, the network predicts adjusted candidate box coordinates. The coordinate regression loss between the predicted box and the image's label box is computed; the gradient of the loss with respect to each parameter is obtained by chain-rule differentiation; and the parameters are fine-tuned along the negative gradient direction to lower the loss. When the rate of decrease slows, training ends, yielding the output network.

Adjusting the output network parameters gradually shrinks the differences in position and size between the predicted box and the label box, making the coordinate predictions more accurate.

After all three stages are trained, using them in series constitutes the complete MTCNN face detection model.

Further, the extraction of the facial key point coordinate sequence comprises:

constructing a key point detection convolutional neural network comprising a convolutional layers (where a denotes the number of convolutional layers), pooling layers and fully connected layers.

The input of the key point detection network is a face region image, which is an RGB image.

The output of the key point detection network is the facial key point coordinates; the facial key points include eye corner points, brow center points, mouth corner points and so on, c key points in total, as shown in FIG. 2.

Taking 68 key points as an example, there are 17 contour feature points, including 8 points for the left and right eye corners and brow corners, 4 nose bridge points, 1 point each for the left and right nostril wings, 2 upper lip points and 2 jaw points;

the two eyebrows have 22 feature points in total, 11 per eyebrow, describing the outline and detail of the brow shape; the two eyes have 20 feature points in total, 10 per eye, covering the outlines of the upper and lower eyelids and the eyeball; the nose has 9 feature points, comprising 1 point at the nose tip, 4 points on the two sides of the nostril wings and 4 points on the midline of the nose base.

v face feature images are collected and annotated to form the face feature image set;

the collected face feature images cover different genders, ages and ethnicities with varied facial expressions and poses; an image annotation tool is used, manually or by computer, to mark the coordinates of the c facial key points in each image.

The annotated face feature image set is fed into the key point detection convolutional neural network; the loss function is the Euclidean distance between the predicted key point coordinates and the annotated key point coordinates.

The loss on a single face feature image is L_i = ∑_{j=1}^{c} ||ŷ_ij - y_ij||₂;

where ŷ_ij is the coordinate of the j-th key point of the i-th image predicted by the key point detection network; y_ij is the coordinate of the manually annotated j-th key point of the i-th image; ||ŷ_ij - y_ij||₂ is the Euclidean distance between the predicted coordinate ŷ_ij and the annotated coordinate y_ij; and c is the number of facial key points.

Over the face feature image set, the overall loss of the key point detection model is L_z = (1/v) ∑_{i=1}^{v} L_i;

where v is the total number of images in the face feature image set.

The network parameters θ of the key point detection network are defined and initialized;

images from the face feature image set are fed in for training; forward propagation computes the loss L_z; the back-propagation algorithm computes the gradient ∇_θ L_z of the loss with respect to the parameters θ; gradient descent then updates θ ← θ - α·∇_θ L_z, where α is the learning rate.

Through iterative optimization the loss L_z is gradually reduced, yielding the facial key point detection model.

The face region image is fed into the facial key point detection model, which predicts and outputs c key point coordinates; these c coordinates constitute the facial key point coordinate sequence.
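A hedged sketch of this Euclidean-distance keypoint loss and one gradient-descent update in PyTorch; the backbone is an illustrative stand-in, and c = 68 is assumed from the example above:

```python
import torch
import torch.nn as nn

c = 68  # number of facial key points (assumed from the 68-point example)

# Illustrative stand-in for the keypoint-detection CNN: conv blocks -> fc -> 2c outputs.
net = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
    nn.Flatten(), nn.Linear(64 * 4 * 4, 2 * c),
)

def keypoint_loss(pred, target):
    """Mean over images of the summed Euclidean distances ||yhat_ij - y_ij||_2."""
    pred = pred.view(-1, c, 2)
    target = target.view(-1, c, 2)
    return torch.linalg.norm(pred - target, dim=2).sum(dim=1).mean()

opt = torch.optim.SGD(net.parameters(), lr=0.001)  # alpha: learning rate
imgs = torch.rand(4, 3, 96, 96)      # batch of face region images
gt = torch.rand(4, 2 * c) * 96       # annotated keypoint coordinates

loss = keypoint_loss(net(imgs), gt)  # forward pass, loss L_z on the batch
opt.zero_grad(); loss.backward()     # gradient of L_z w.r.t. theta
opt.step()                           # theta <- theta - alpha * grad
```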

Further, the extraction of the deep learning features comprises:

predefining an intermediate layer of the facial key point detection model; feeding the face region image into the model; and extracting the deep feature vector output by that intermediate layer; this deep feature vector is the deep learning feature.

It should be noted that the chosen intermediate layer is an intermediate convolutional layer of the facial key point detection model, which may also be called the feature extraction layer; for example, in a ResNet-50 model the third convolutional stage can be chosen as the feature extraction layer. A single face region image is input and the model's forward propagation is started; the computation passes through the defined feature extraction layer and outputs the recognition result. At the feature extraction layer, its activation map is output; this activation map retains the high-level semantic information of the input face image and serves as the deep feature representation of the face image. The deep feature representation is post-processed, including resizing the feature map and L2-normalizing the feature values to the same scale, finally forming a fixed-length deep feature vector.
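A hedged sketch of extracting such an intermediate activation map with a forward hook, using torchvision's ResNet-50 (named in the example above) as the backbone; pooling to a fixed length and L2 normalization follow the post-processing described:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()  # stand-in backbone; weights omitted here
features = {}

def hook(module, inputs, output):
    features["layer3"] = output  # activation map of the chosen intermediate layer

model.layer3.register_forward_hook(hook)  # layer3 = third residual stage

img = torch.rand(1, 3, 224, 224)          # a face region image, resized
with torch.no_grad():
    model(img)                            # forward pass fills features["layer3"]

fmap = features["layer3"]                 # shape (1, 1024, 14, 14) for 224x224 input
vec = F.adaptive_avg_pool2d(fmap, 1).flatten()  # pool to a fixed-length vector
vec = F.normalize(vec, dim=0)             # L2-normalize to a common scale
print(vec.shape)                          # torch.Size([1024])
```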

Further, obtaining the facial region features comprises:

the deep feature vector is a fixed-length vector, for example 1024-dimensional, recording the global information of the image; the facial key point coordinate sequence has variable length, for example c pairs of horizontal and vertical coordinates, and mainly describes local detail; the key point coordinate sequence is flattened, each key point represented by a vector of length 2;

the deep feature vector is directly concatenated with the flattened facial key point coordinate sequence, giving a fixed-length vector that contains both the global information of the deep features and the local information of the key points.
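A one-line sketch of the concatenation, with illustrative dimensions (1024-dimensional deep vector, c = 68 keypoints):

```python
import numpy as np

deep_vec = np.random.rand(1024)      # fixed-length deep feature vector
keypoints = np.random.rand(68, 2)    # c = 68 keypoints, (x, y) each

region_feature = np.concatenate([deep_vec, keypoints.flatten()])
print(region_feature.shape)          # (1160,) = 1024 + 2*68
```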

Further, obtaining the replacement matching degrees comprises:

using the face feature extraction module to extract the facial region features of the face region image A to be replaced and of the target face image B, and dividing them into n pairs of facial region features according to the n sub-regions.

Each facial region feature is a high-dimensional vector containing two parts of information, the deep learning features and the facial key point coordinate sequence.

The cosine similarity is computed for each of the n pairs of facial region features:

sim(a_i, b_i) = (a_i·b_i) / (||a_i||·||b_i||);

where a_i is the facial region feature of the i-th sub-region of the face region image A to be replaced; b_i is the facial region feature of the i-th sub-region of the target face image B; a_i·b_i is the inner product of the vectors a_i and b_i; and ||a_i|| and ||b_i|| are the L2 norms of a_i and b_i.

It should be noted that sim(a_i, b_i) ranges over [-1, 1]: it equals 1 when the vectors a_i and b_i point in exactly the same direction, 0 when they are perpendicular, and -1 when they point in exactly opposite directions.

The value of sim(a_i, b_i) indicates how different the two facial region features are: the larger the value, the more similar they are, with values close to 1 representing a high match. The value of sim(a_i, b_i) is taken as the matching degree between the face region image to be replaced and the target face image.

It should be noted that the L2 norm is computed by taking the square root of the sum of the squares of the vector's elements.
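A minimal numpy sketch of the per-sub-region cosine similarity; shapes are illustrative:

```python
import numpy as np

def cosine_similarity(a, b):
    """sim(a, b) = (a . b) / (||a|| * ||b||), in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# n sub-region feature pairs from images A and B (illustrative shapes)
n, dim = 4, 1160
A = np.random.rand(n, dim)
B = np.random.rand(n, dim)
match_degrees = [cosine_similarity(A[i], B[i]) for i in range(n)]
print(match_degrees)
```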

Further, the matching degree threshold interval is set as follows:

a training data set of u pairs of face images is collected, each pair being pictures of the same person taken in different scenes; a matching degree score_ij is computed for each pair; over the whole training data set, the mean ρ = (1/u)·∑ score_ij and the standard deviation σ = sqrt((1/u)·∑ (score_ij - ρ)²) of all pair matching degrees are computed, where u is the number of face image pairs in the training data set.

According to Gaussian distribution theory, values greater than or equal to ρ+3σ can be regarded as high-match samples, so the upper limit of the matching degree threshold interval is set to ρ+3σ; values less than or equal to ρ-3σ can be regarded as low-match samples, so the lower limit is set to ρ-3σ; the matching degree threshold interval is therefore [ρ-3σ, ρ+3σ].

Further, the comparison result is a high match, a medium match or a low match:

if a replacement matching degree among the n is greater than or equal to ρ+3σ, the comparison result of the corresponding sub-region is a high match; if it lies within the matching degree threshold interval, the comparison result is a medium match; if it is less than or equal to ρ-3σ, the comparison result is a low match.
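A hedged sketch of the ρ±3σ interval and the three-way comparison; the score distribution is simulated:

```python
import numpy as np

scores = np.random.uniform(0.4, 0.9, size=1000)  # score_ij over u training pairs
rho, sigma = scores.mean(), scores.std()
lo, hi = rho - 3 * sigma, rho + 3 * sigma        # interval [rho-3s, rho+3s]

def classify(sim):
    """Map a sub-region matching degree to a fusion strategy."""
    if sim >= hi:
        return "high match"    # -> Poisson fusion
    if sim <= lo:
        return "low match"     # -> geometric transformation
    return "medium match"      # -> color transfer

print([classify(s) for s in (0.95, 0.65, 0.1)])
```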

Further, applying an image fusion algorithm according to the comparison results to fuse and replace the target portrait image and the face region image to be replaced comprises:

sub-regions of the target portrait image and the face region image to be replaced whose comparison result is a high match are fused and replaced using the Poisson fusion algorithm;

sub-regions whose comparison result is a medium match are fused and replaced using the color transfer algorithm;

sub-regions whose comparison result is a low match are fused and replaced using the geometric transformation algorithm; this achieves the fusion replacement between the target portrait image and the face region image to be replaced.

The process of fusion replacement by the Poisson fusion algorithm comprises:

Step 101: extract the high-match sub-regions corresponding to the face image A to be replaced and the target face image B, and mark them as image I and image J respectively.

Step 102: compute the gradient direction field of image I and image J at every pixel position, obtaining the guidance fields BI and BJ, which represent the brightness variation trend of images I and J at each pixel.

It should be noted that the gradient direction field is computed as follows:

let GxI and GyI denote the gradient magnitude maps of image I in the x and y directions respectively; GxI(x,y) and GyI(x,y) are the horizontal and vertical gradient components at position (x,y). The x and y directions are defined in the image coordinate system, where x and y are the horizontal and vertical pixel coordinates.

Compute the gradient magnitude AmpI(x,y) = sqrt(GxI(x,y)² + GyI(x,y)²);

compute the gradient direction at each position θI(x,y) = arctan(GyI(x,y) / GxI(x,y));

construct the guidance field BI = (AmpI(x,y), θI(x,y)).

The same processing is applied to image J, yielding the guidance fields BI and BJ.

Step 103: substitute the guidance fields BI and BJ into the Poisson equation and solve it to obtain the density distribution maps of images I and J, which represent the pixel density at each pixel, as density map AI and density map AJ.

The Poisson equation is solved as follows:

Laplacian(g) = div(g0 Δ g0), where g is the image pixel density distribution map to be solved; g0 is the given guidance field; Δ is the Laplacian operator; and div is the divergence operation.

Let g be the density map AI and g0 the guidance field BI, giving Laplacian(AI) = div(BI Δ BI);

discretizing Laplacian(AI) = div(BI Δ BI) by finite differences yields the density map AI; the same processing applied to the guidance field BJ yields the density map AJ.

Step 104: define a fusion weight parameter β, whose value lies between 0 and 1;

compute the new pixel density distribution Den = β × AI + (1-β) × AJ.

Step 105: according to the new pixel density distribution map Den, control the fusion contribution of every pixel of image I and image J, finally generating the new fused image and completing the Poisson fusion of the face region sub-images.
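The patent describes its own guidance-field formulation above; as a practical, clearly substituted stand-in, OpenCV's seamlessClone performs classic Poisson blending and is sketched here (file names hypothetical):

```python
import cv2
import numpy as np

src = cv2.imread("subregion_B.png")   # high-match sub-region of target face B
dst = cv2.imread("subregion_A.png")   # corresponding sub-region of face A

# Blend src into dst by solving the Poisson equation over the masked area.
mask = 255 * np.ones(src.shape[:2], dtype=np.uint8)
center = (dst.shape[1] // 2, dst.shape[0] // 2)
fused = cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("fused_subregion.png", fused)
```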

The process of fusion replacement by the color transfer algorithm comprises:

Step 201: extract the medium-match sub-regions corresponding to the face image A to be replaced and the target face image B, and mark them as image Qx and image Qy respectively.

Step 202: convert the RGB images Qx and Qy to the LAB color space, obtaining images Qx1 and Qy1.

Step 203: compute the mean meanI and standard deviation devI of image Qx1 in the LAB color space, and the mean meanJ and standard deviation devJ of image Qy1 in the LAB color space.

Step 204: assign the meanJ and devJ values of image Qy1 to image Qx, so that image Qx adopts the same color mean and standard deviation as image Qy; this completes the fusion replacement of the medium-match sub-regions of the face region.
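A hedged sketch of steps 201-204 as LAB-space statistics matching (Reinhard-style color transfer); file names are hypothetical, and OpenCV's LAB conversion is used:

```python
import cv2
import numpy as np

qx = cv2.imread("subregion_A.png")  # medium-match sub-region of face A
qy = cv2.imread("subregion_B.png")  # corresponding sub-region of target face B

# Steps 202-203: convert to LAB and take per-channel mean/std.
qx1 = cv2.cvtColor(qx, cv2.COLOR_BGR2LAB).astype(np.float32)
qy1 = cv2.cvtColor(qy, cv2.COLOR_BGR2LAB).astype(np.float32)
mean_i, dev_i = qx1.mean(axis=(0, 1)), qx1.std(axis=(0, 1))
mean_j, dev_j = qy1.mean(axis=(0, 1)), qy1.std(axis=(0, 1))

# Step 204: give Qx the color mean and standard deviation of Qy.
out = (qx1 - mean_i) / (dev_i + 1e-6) * dev_j + mean_j
out = np.clip(out, 0, 255).astype(np.uint8)
cv2.imwrite("color_transferred.png", cv2.cvtColor(out, cv2.COLOR_LAB2BGR))
```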

The fusion replacement performed by the geometric transformation algorithm proceeds as follows:

Step 301: extract the low-match sub-region images of the face image A to be replaced and the target face image B, denoted sub-image src and sub-image tar respectively;

Step 302: extract the facial key point coordinate sequence rc detected within sub-image src, and the facial key point coordinate sequence ar detected within sub-image tar;

Step 303: based on the differences between key point sequences rc and ar, compute the geometric transformation matrix Me mapping sub-image src to sub-image tar;

Step 304: apply the transformation matrix Me to remap every pixel coordinate of sub-image src, producing a new sub-image src' whose facial shape matches sub-image tar;

Step 305: fuse the geometrically transformed sub-image src' with the pixel values of the corresponding sub-image tar, completing the fusion replacement of the low-match sub-regions of the face region;

Specifically, a function is built to construct the geometric transformation matrix. Its inputs are the key point coordinate sequences rc and ar; using least squares, it computes the optimal affine transformation matrix Me mapping sequence rc onto sequence ar. Me is a 3x3 matrix encoding the translation, rotation, and scaling required by the coordinate transform, and the function outputs Me;
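A sketch of this least-squares fit, assuming rc and ar are (c, 2) NumPy arrays of matched key points; np.linalg.lstsq is an implementation choice, not named in the embodiment:

import numpy as np

def affine_from_keypoints(rc, ar):
    src_h = np.hstack([rc, np.ones((rc.shape[0], 1))])  # homogeneous points
    # Least squares: src_h @ X ~= ar; X is the 3x2 transpose of Me's top rows.
    x, *_ = np.linalg.lstsq(src_h, ar, rcond=None)
    me = np.eye(3)
    me[:2, :] = x.T        # encodes translation, rotation and scaling
    return me              # 3x3 matrix Me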

Traverse the coordinate (xi, yi) of every pixel in sub-image src; substitute (xi, yi) into the transformation matrix Me to compute the mapped coordinate (xi', yi');

Reconstruct all mapped pixel positions on a blank image to generate sub-image src', whose facial shape matches sub-image tar; for each pixel of sub-image src, compute the distance weight wr between its RGB color value and the new coordinate (xi', yi') in src';

Using the distance weight wr and the RGB color value at the corresponding position (xi', yi') of sub-image tar, compute the fused pixel value RGB' at that position; this per-pixel fusion process generates the fused new sub-image on src';

Note that the distance weight wr is obtained as follows:

Compute the Euclidean distance between the mapped coordinate and the original coordinate, DR = √((xi'−xi)² + (yi'−yi)²); define the Gaussian kernel f(d) = exp(−d²/(2τ²)), where τ is the specified Gaussian kernel parameter and d is the variable entering the computation;

Applying the Gaussian kernel to the Euclidean distance DR gives f(DR) = exp(−DR²/(2τ²));

f(DR) is then the distance weight wr.
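Putting steps 304 and 305 and the distance weight together, a hedged sketch follows; the mixing rule RGB' = wr·src + (1−wr)·tar is an assumption, since the embodiment only states that RGB' is computed from wr and tar's color at (xi', yi'):

import numpy as np

def remap_and_fuse(src, tar, me, tau=2.0):
    h, w = src.shape[:2]
    out = tar.astype(np.float64).copy()
    for yi in range(h):
        for xi in range(w):
            xp, yp, _ = me @ np.array([xi, yi, 1.0])      # step 304 remap
            xi_p, yi_p = int(round(xp)), int(round(yp))
            if 0 <= xi_p < w and 0 <= yi_p < h:
                d_r = np.hypot(xi_p - xi, yi_p - yi)          # distance DR
                w_r = np.exp(-(d_r ** 2) / (2.0 * tau ** 2))  # weight wr
                out[yi_p, xi_p] = (w_r * src[yi, xi] +        # assumed mixing
                                   (1.0 - w_r) * tar[yi_p, xi_p])
    return out.astype(np.uint8)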

This embodiment achieves fully automated detection, recognition, localization, matching analysis, and adaptive fusion replacement of faces in video frames. The system integrates face detection, feature representation, similarity judgment, and image fusion modules; it can accurately recognize, represent, and analyze the face region, and selects the optimal algorithm according to the judgment result to achieve natural fusion replacement of faces. This guarantees the processing quality while also enabling automatic selection of the processing strategy and flexible extension of the system. Overall, the system offers face-processing capability that supports a variety of intelligent video applications: it both guarantees accurate localization of the face region and achieves adaptive representation, analysis, and fusion of complex facial features and textures.

Example 2

Referring to FIG. 3, and to the description of Embodiment 1 for details not elaborated here, an intelligent design method for face replacement is provided, comprising: S1, performing face detection on the input video stream to obtain a face region image;

S2, extracting the facial key point coordinate sequence and deep learning features from the face region image, and obtaining the facial region features from them;

S3, comparing facial region features between the face region image to be replaced and the target face image, dividing both images into n sub-regions, and obtaining the n replacement matching degrees between the corresponding sub-regions;

S4, setting a matching degree threshold interval and obtaining comparison results by comparing the n replacement matching degrees against it; according to the comparison results, applying an image fusion algorithm to fuse and replace the target portrait image and the face region image to be replaced.
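As a minimal sketch of the matching logic behind S3 and S4: cosine similarity between sub-region feature vectors, classified against a threshold interval whose limits ρ ± 3σ follow the construction described in the claims; all names here are illustrative:

import numpy as np

def replacement_match(a, b):
    # Cosine similarity sim(ai, bi) between two sub-region feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def compare(score, rho, sigma):
    # Interval [rho - 3*sigma, rho + 3*sigma]; high / medium / low selects
    # Poisson fusion, color transfer, or geometric transformation.
    if score >= rho + 3 * sigma:
        return "high"
    if score <= rho - 3 * sigma:
        return "low"
    return "medium"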

Example 3

Referring to FIG. 4, according to another aspect of the present application an electronic device is also provided. The electronic device may include an input device, an arithmetic unit, a controller, a main memory, and an output device. The main memory stores computer-readable code which, when run by one or more processors, can execute the intelligent design method for face replacement described above.

The method or system according to the embodiments of the present application can also be implemented with the architecture of the electronic device shown in FIG. 4. As shown in FIG. 4, the electronic device may include an input device, an arithmetic unit, a controller, a main memory, an output device, and so on. A storage device in the electronic device, such as the ROM 503 or a hard disk, can store the intelligent design method for face replacement provided herein. The electronic device may further include a user interface. Of course, the architecture shown in FIG. 4 is only exemplary; when implementing different devices, one or more components of the electronic device shown in FIG. 4 may be omitted according to actual needs.

In addition, according to the embodiments of the present application, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the present application provides a non-transitory machine-readable storage medium storing machine-readable instructions that can be run by a processor to execute instructions corresponding to the method steps provided herein, namely the intelligent design method for face replacement. When the computer program is executed by a central processing unit (CPU), it performs the functions defined in the method of the present application.

Claims (10)

1. An intelligent design system for face replacement, characterized by comprising: a face detection and tracking module for performing face detection on an input video stream and obtaining a face region image; a face feature extraction module for extracting a facial key point coordinate sequence and deep learning features from the face region image and obtaining facial region features from them; a face matching degree evaluation module for comparing facial region features between the face region image to be replaced and the target face image, dividing both images into n sub-regions, and obtaining n replacement matching degrees between the n sub-regions of the two images; and an adaptive image fusion module that sets a matching degree threshold interval, obtains comparison results by comparing the n replacement matching degrees against the interval, and, according to the comparison results, applies an image fusion algorithm to fuse and replace the target portrait image and the face region image to be replaced; the modules are connected by wired and/or wireless means to transmit data between them.

2. The intelligent design system for face replacement according to claim 1, characterized in that performing face detection on the input video stream to obtain the face region image comprises: loading the pre-trained MTCNN face detection model into memory; reading the input video stream frame by frame and extracting RGB images; feeding each extracted RGB image into the pre-trained MTCNN model to generate a coordinate box for every face; and cropping, from the input RGB image, the region enclosed by each face box to generate the face region image; the pre-training of the MTCNN face detection model is divided into three stages: training the extraction network, training the optimization network, and training the output network.

3. The intelligent design system for face replacement according to claim 2, characterized in that training the extraction network comprises: collecting m images containing faces as a training set, m being an integer greater than 1; annotating the faces manually or by computer to obtain an image training set with face label boxes; the extraction network adopts a convolutional neural network structure composed of convolutional, activation, pooling, and fully connected layers; the input layer takes the three-channel pixel data of the input RGB image; b convolution kernels convolve the input to obtain feature maps; the activation layer applies a non-linearity to the convolutional output; the pooling layer reduces the feature maps' dimensionality, giving reduced feature maps; the fully connected layer turns the reduced feature maps into a feature vector; the loss layer uses the cross entropy H(P,Q) = −Σi P(i) log(Q(i)), where H(P,Q) is the cross-entropy loss between the true distribution P and the model's predicted distribution Q, and P(i) and Q(i) are the probabilities of the i-th class under the true and predicted distributions; a regression box loss measures the error between the network's predicted box and the label box: the loss is (1/N) Σ Σ E², summed over the training images and over the four offsets, where E takes the values px−gx, py−gy, pw−gw and ph−gh; (px, py), pw and ph are the center coordinates, width and height of the predicted box; (gx, gy), gw and gh are those of the label box; N is the number of images in the image training set; back-propagation updates the network parameters according to the loss so that the predicted box continually approaches the label box; iterative optimization of the model finally yields the extraction network.

4. The intelligent design system for face replacement according to claim 3, characterized in that training the optimization network comprises: using the extraction network to evaluate all candidate boxes generated from the training set images; computing the IoU of each candidate box with the label box, the IoU expressing their overlap; labeling a candidate box as a positive sample if its IoU > 0.5 and as a negative sample if its IoU ≤ 0.5; defining the optimization network's structure, comprising convolutional, pooling, and fully connected layers; defining its training target as a binary judgment of whether the current candidate box contains a face, expected to agree with the assigned label, the labels being positive and negative samples; using the binary cross entropy Loss(o,y) = −(y log(o) + (1−y) log(1−o)) as the loss, where o is the predicted probability that the current sample is positive and y is its label, 1 for positive and 0 for negative samples; feeding images with candidate boxes into the network to obtain the binary result, comparing it with the label and computing the loss; updating the network parameters by back-propagation, the loss decreasing gradually during optimization; training ends when the rate of decrease slows, yielding the optimization network.

5. The intelligent design system for face replacement according to claim 4, characterized in that training the output network comprises: using the high-quality candidate box images that pass the optimization network's filtering as training samples; building the output network with the same structure as the optimization network; defining its loss as a coordinate regression loss measuring the sum of squared errors in position and size between the predicted box and the label box; the input is a candidate-box image and the output is the adjusted predicted box coordinates, comprising the box center coordinates, width, and height; with predefined parameters, the network predicts the adjusted candidate box coordinates; the coordinate regression loss between this prediction and the image's label box is computed; the gradient of the loss with respect to each parameter is obtained by the chain rule, and the parameters are fine-tuned along the negative gradient to lower the loss; training ends when the decrease slows, yielding the output network; once all three stages are trained, they are cascaded in series to form the complete MTCNN face detection model.

6. The intelligent design system for face replacement according to claim 5, characterized in that the extraction of the facial key point coordinate sequence comprises: building a key point detection convolutional neural network comprising a convolutional layers, pooling layers, and fully connected layers; its input is the face region image, an RGB image, and its output the coordinates of the c facial key points; collecting and annotating v facial feature images to form a facial feature image set, covering different genders, ages, and races with varied facial expressions and poses; marking the c key point coordinates of the face in each image manually or by computer with an image annotation tool; feeding the annotated image set into the network, the loss being the Euclidean distance between predicted and annotated key points; the loss on a single image is Li = Σ_{j=1..c} ||ŷij − yij||, where ŷij is the network's prediction for the j-th key point of the i-th image, yij is the manually annotated coordinate of that key point, ||ŷij − yij|| is their Euclidean distance, and c is the number of facial key points; over the whole image set the overall loss is Lz = (1/v) Σ_{i=1..v} Li, where v is the number of images in the set; the network parameters θ are defined and initialized; images are fed forward and the loss Lz computed; back-propagation yields the gradient of Lz with respect to θ, and gradient descent θ ← θ − α·∇θLz is applied with learning rate α; iterative optimization gradually reduces Lz, yielding the facial key point detection model; the face region image is fed to the model, which predicts and outputs the c key point coordinates that form the facial key point coordinate sequence; the deep learning features are extracted as follows: an intermediate layer of the key point detection model is predefined; the face region image is fed into the model; the deep feature vector output by that intermediate layer is extracted, and this vector is the deep learning feature; the facial region features are obtained as follows: the deep feature vector has fixed length while the key point coordinate sequence has variable length; the key point sequence is flattened, each key point represented by a length-2 vector; the deep feature vector is concatenated with the flattened key point sequence, giving one fixed-length vector.

7. The intelligent design system for face replacement according to claim 6, characterized in that the replacement matching degree is obtained by: using the face feature extraction module to extract the facial region features of face region image A to be replaced and of target face image B; partitioning them into n pairs of facial region features by sub-region; each facial region feature is a high-dimensional vector containing two parts of information, the deep learning features and the key point coordinate sequence; computing the cosine similarity sim(ai, bi) = (ai·bi)/(||ai||·||bi||) for each of the n pairs, where ai and bi are the facial region features of the i-th sub-region of images A and B, ai·bi is their inner product, and ||ai|| and ||bi|| are their L2 norms; the value sim(ai, bi) is taken as the replacement matching degree between the corresponding sub-regions of the two images.

8. The intelligent design system for face replacement according to claim 7, characterized in that the matching degree threshold interval is set by: collecting a training data set of u pairs of face images, each pair showing the same person photographed in different scenes; computing the matching degree scoreij for every pair; over the whole training set, computing the mean ρ = (1/u) Σ scoreij and the standard deviation σ = √((1/u) Σ (scoreij − ρ)²), u being the number of face image pairs in the training set; setting the upper limit of the interval to ρ+3σ and the lower limit to ρ−3σ, giving the matching degree threshold interval [ρ−3σ, ρ+3σ]; the comparison result is a high match, a medium match, or a low match: a sub-region whose replacement matching degree is greater than or equal to ρ+3σ is a high match; one whose degree lies within the threshold interval is a medium match; one whose degree is less than or equal to ρ−3σ is a low match.

9. The intelligent design system for face replacement according to claim 8, characterized in that fusing and replacing the target portrait image and the face region image to be replaced according to the comparison results comprises: fusing and replacing the sub-regions whose comparison result is a high match with the Poisson fusion algorithm; fusing and replacing the sub-regions whose result is a medium match with the color transfer algorithm; and fusing and replacing the sub-regions whose result is a low match with the geometric transformation algorithm; thereby achieving the fusion replacement between the target portrait image and the face region image to be replaced.

10. An intelligent design method for face replacement, implemented on the intelligent design system for face replacement of any one of claims 1 to 9, characterized by comprising: S1, performing face detection on an input video stream to obtain a face region image; S2, extracting a facial key point coordinate sequence and deep learning features from the face region image and obtaining facial region features from them; S3, comparing facial region features between the face region image to be replaced and the target face image, dividing both images into n sub-regions, and obtaining the n replacement matching degrees between the corresponding sub-regions; S4, setting a matching degree threshold interval, obtaining comparison results by comparing the n replacement matching degrees against it, and, according to the comparison results, applying an image fusion algorithm to fuse and replace the target portrait image and the face region image to be replaced.
Priority Applications (1)

Application Number: CN202311680842.1A; Priority/Filing Date: 2023-12-08; Title: An intelligent design system for face replacement (status: Active, granted as CN118052723B)

Publications (2)

CN118052723A (application publication), 2024-05-17
CN118052723B (granted patent), 2024-11-22

Family ID: 91045618




Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
