CN115661900B - A method for converting thermal infrared to visible light images of faces based on prior information - Google Patents

A method for converting thermal infrared to visible light images of faces based on prior information

Info

Publication number
CN115661900B
Authority
CN
China
Prior art keywords
face
image
visible light
thermal infrared
facial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211325764.9A
Other languages
Chinese (zh)
Other versions
CN115661900A (en)
Inventor
吴先健
高新波
张颜
王楠楠
梁凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202211325764.9A
Publication of CN115661900A
Application granted
Publication of CN115661900B
Legal status: Active (current)
Anticipated expiration

Abstract


The present invention belongs to the field of computer vision and artificial intelligence, and specifically relates to a method for converting thermal infrared face images to visible light face images based on prior information. Within a contrastive learning framework, the invention uses a face parsing map as prior information to guide the generation network in learning the local texture information of face images. The prior-information-based face thermal infrared-visible light generation network model mainly comprises a face parsing map conditional network module, a spatial feature transform layer, an attention module, a generator network module and a discriminator. The model performs the conversion through the spatial feature transform layer (STL), which takes the mapped face parsing features as a prior condition and generates a pair of modulation parameters; these apply an affine transformation to the face features in the generation network, adaptively optimizing the quality of the generated face image. Training is jointly supervised by a purpose-designed face gradient enhancement loss, which helps suppress artifacts in the generated face images, improves local texture detail, and makes the generated image restore the corresponding face attribute information as faithfully as possible.

Description

Facial thermal infrared-visible light image conversion method based on prior information
Technical Field
The invention belongs to the field of computer vision and artificial intelligence, and particularly relates to a facial thermal infrared-visible light image conversion method based on prior information.
Background
With the spread of sensing instruments in daily life, image data of different modalities have become abundant. In computer vision, learning the conversion and mapping relations between images of different modalities is an important research direction. The thermal infrared image sensor is an imaging modality that acquires information describing a target and its scene from the radiant energy they emit, which makes it well suited to security monitoring and military reconnaissance. Unlike near-infrared images, face thermal infrared images suffer from poor visibility, low contrast, a lack of detailed texture information, and sensitivity to temperature differences, making them difficult to recognize and analyze effectively. Visible light face images, by contrast, carry rich facial texture detail with high contrast and high visibility, and many mature visible light face recognition and analysis methods exist. Converting thermal infrared face images into visible light face images therefore allows the face information in thermal infrared imagery to be recognized and analyzed efficiently.
At present, several cross-modality generation frameworks address the conversion between thermal infrared and visible light face images. These general conversion frameworks fall into two broad classes: Generative Adversarial Networks (GAN) and their variants, and variational autoencoders (VAE) and their variants; most current visible-thermal infrared face image conversion algorithms are built on GANs. One example is pix2pix, a general U-Net-based image conversion framework that uses an L1 loss to measure the difference between the generated image and the real image. This method requires pixel-aligned face thermal infrared-visible light pairs; such pixel-level paired datasets are small and hard to collect, and cannot support the training of a high-quality face synthesis model. CycleGAN is a classical unpaired image conversion framework that performs bidirectional conversion and introduces a cycle-consistency loss, which benefits the generation of image details to some extent, yet its synthesized face images still show distortion and artifacts.
In summary, when the above methods perform the thermal infrared-to-visible face generation task, the generated visible light face images can suffer from face attribute errors such as incorrect skin color, image artifacts, and loss of texture detail, which strongly degrade the quality and visual effect of the generated images.
Disclosure of Invention
The invention provides a facial thermal infrared-visible light image conversion method based on prior information, which comprises: obtaining a face infrared image to be converted and the corresponding paired visible light image, and inputting the face infrared image into a trained prior-information-based face thermal infrared-visible light generation network model to obtain a synthesized face visible light image. The prior-information-based face thermal infrared-visible light generation network model comprises a face parsing map conditional network module, a spatial feature transform layer, an attention module, a generator network module and a discriminator.
The process of training the prior-information-based face thermal infrared-visible light generation network model comprises the following steps:
S1, acquiring a thermal infrared-visible light image dataset and classifying the skin color information labels of the images in the dataset;
S2, preprocessing the paired images in the dataset and inputting the preprocessed images into the prior-information-based face thermal infrared-visible light generation network model;
S3, extracting features from the preprocessed face parsing map with the face parsing map conditional network module to obtain face prior information features;
S4, extracting and encoding features of the preprocessed face thermal infrared image with the generator network module to obtain encoded face feature information;
S5, inputting the face prior information features and the encoded face feature information into the spatial feature transform layer to generate a pair of modulation parameters;
S6, passing the face feature information through multi-layer residual transformation and the decoder to obtain the corresponding synthesized face visible light image, which is input into the discriminator for discrimination training;
S7, inputting the synthesized face visible light image and the face thermal infrared image into the encoder and two MLP mapping layers to obtain the corresponding face thermal infrared image features and synthesized face visible light image features;
S8, optimizing the parameters of the model with the Adam optimizer and outputting the optimal parameters when the loss function of the model is minimal, thereby obtaining the optimal prior-information-based face thermal infrared-visible light generation network model.
Preferably, the generator network module comprises an encoder Genc, a converter and a decoder Gdec. The encoder Genc mainly consists of 3 CIR layers, each formed by a convolution, InstanceNorm normalization and a ReLU activation function; the encoder extracts features from the input image. The converter comprises 9 residual blocks, each consisting of a spatial feature transform layer (STL) followed by a CIR layer; the residual blocks mainly enhance the feature maps extracted by the encoder. The decoder Gdec comprises two CTIR layers, one Reflect operation layer and one convolution layer, where each CTIR layer consists of a deconvolution, InstanceNorm normalization and a ReLU activation function; the decoder performs up-sampling, gradually reconstructing the learned face features to the original image size.
Preferably, the spatial feature transform layer processes two input face features: the face prior information features produced by the face parsing map conditional module, and the feature output GF of each layer of the generation network. The face prior information features pass through two convolution layers each to obtain a pair of parameters α and β; the face feature output GF of the generation network is first multiplied element-wise by α and then added to β, yielding the output of the whole STL network. This lets the network spatially remap the face feature information within the generation network, adaptively optimizing the quality of the generated face image.
Preferably, the attention module performs contrastive learning between the face thermal infrared input image and the synthesized face visible light image, through the following steps:
S71, extract multi-layer features from the face thermal infrared image and the synthesized face visible light image: each is passed through the encoder G_enc and a two-layer MLP network layer H_l to obtain the face feature maps F_H ∈ R^{C×H×W} and F_V ∈ R^{C×H×W}, respectively;
S72, apply reshape and transpose operations to the feature map of the face thermal infrared image to obtain the two-dimensional matrices Q_H ∈ R^{HW×C} and V_H ∈ R^{HW×C};
S73, construct the global attention contrastive loss from the two-dimensional matrices Q_H ∈ R^{HW×C} and V_H ∈ R^{HW×C}.
Furthermore, the model offers two attention methods, global attention and local attention, each of which can be used to build the contrastive learning loss; this embodiment uses global attention. The global attention contrastive loss is constructed as follows: multiply Q_H by its transpose K_H ∈ R^{C×HW} to obtain a matrix, and apply a Softmax normalization to each row, yielding the global attention matrix A_global ∈ R^{HW×HW}; compute the entropy H_s of each row of the global attention matrix with the entropy formula; sort the rows in ascending order of entropy; route the value features V_H ∈ R^{HW×C} and V_V ∈ R^{HW×C} of the source-domain face thermal infrared image and the target-domain synthesized face visible light image according to the sorted matrix; finally, use the routed value features V_H and V_V to build the global contrastive loss.
Further, the local attention contrastive loss is constructed as follows. Local attention uses a fixed-size k×k window that slides over the source-domain face thermal infrared image with stride 1, which strengthens spatial information interaction within a local region. Multiply Q_H by its local transpose K_H^local ∈ R^{C×k²} to obtain a matrix, then apply a Softmax normalization to each row, yielding the local attention matrix A_local ∈ R^{HW×k²}. Compute the entropy H_s of each row of A_local, sort the rows in ascending order of entropy, and route the values V_H^local of the source-domain face thermal infrared image and V_V^local of the target-domain synthesized face visible light image according to the sorted matrix, thereby building a multi-layer local contrastive loss.
Preferably, the loss function of the model is:
L = λ1·L_ConH + λ2·L_ConG(H) + λ3·L_Pcl + λ4·L_Gm + λ5·L_gan
where λ1, λ2, λ3, λ4 and λ5 are the hyperparameters weighting the contrastive learning loss L_ConH, the identity-preserving contrastive learning loss L_ConG(H), the pixel-level consistency loss L_Pcl, the gradient enhancement loss L_Gm and the adversarial loss L_gan, respectively.
The invention has the beneficial effects that:
The invention uses the face parsing map as prior information to guide the generation network in learning the local texture information of face images. Taking the mapped face parsing features as the prior condition, the spatial feature transform layer (STL) generates a pair of modulation parameters that apply an affine transformation to the face features of the generation network, adaptively optimizing the quality of the generated face image; this helps suppress artifacts in the generated face images and improves local texture detail. The invention also designs a face gradient enhancement loss for model training: gradient maps are extracted from the synthesized face visible light image and the face visible light ground-truth image (GT), and the gradient enhancement loss sharpens the facial details of the synthesized image while preserving better face contour information.
Drawings
FIG. 1 is a schematic flow chart of a face thermal infrared-visible light image synthesis method of the invention;
FIG. 2 is a schematic diagram of a frame structure of a face thermal infrared-visible image synthesis method according to the present invention;
FIG. 3 is a schematic diagram of a spatial feature transformation mapping layer method according to the present invention;
FIG. 4 is a schematic diagram of a face analysis diagram conditional network module method according to the present invention;
FIG. 5 is a schematic diagram of an attention module method according to the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
A facial thermal infrared-visible light image conversion method based on prior information, as shown in figs. 1 and 2, proceeds as follows. First, paired face thermal infrared-visible light datasets and the corresponding face prior datasets (face parsing maps) are prepared, skin color information is classified and labels extracted, and the data are preprocessed. A prior-information-based face thermal infrared-visible light generation network model is then constructed, comprising a face parsing map conditional network (FPCN), a spatial feature transform layer (STL), a generator network module G and a discriminator D. The face generation network model is trained: attention operations on the source-domain and target-domain images select salient anchors and positive and negative samples, and training is optimized with several loss functions combined with the Adam optimizer, updating the network parameters until the optimal model is obtained. Finally, a face thermal infrared image is input into the optimal generation model to obtain the synthesized face visible light image.
The face thermal infrared-visible light image conversion method based on prior information comprises: obtaining a face infrared image to be converted and the corresponding paired visible light image, and inputting the face infrared image into the trained prior-information-based face thermal infrared-visible light generation network model to obtain a synthesized face visible light image. The model comprises a face parsing map conditional network module, a spatial feature transform layer, an attention module, a generator network module and a discriminator.
The process of training the prior-information-based face thermal infrared-visible light generation network model comprises the following steps:
S1, acquiring a thermal infrared-visible light image dataset and classifying the skin color information labels of the images in the dataset;
S2, preprocessing the paired images in the dataset and inputting the preprocessed images into the prior-information-based face thermal infrared-visible light generation network model;
S3, extracting features from the preprocessed face parsing map with the face parsing map conditional network module to obtain face prior information features;
S4, extracting and encoding features of the preprocessed face thermal infrared image with the generator network module to obtain encoded face feature information;
S5, inputting the face prior information features and the encoded face feature information into the spatial feature transform layer to generate a pair of modulation parameters;
S6, passing the face feature information through multi-layer residual transformation and the decoder to obtain the corresponding synthesized face visible light image, which is input into the discriminator for discrimination training;
S7, inputting the synthesized face visible light image and the face thermal infrared image into the encoder and two MLP mapping layers to obtain the corresponding face thermal infrared image features and synthesized face visible light image features;
S8, optimizing the parameters of the model with the Adam optimizer and outputting the optimal parameters when the loss function of the model is minimal, thereby obtaining the optimal prior-information-based face thermal infrared-visible light generation network model.
Another embodiment of training the prior-information-based face thermal infrared-visible light generation network model, as shown in fig. 1, comprises:
S1, acquiring a thermal infrared-visible light image dataset, setting the number of iterations, and classifying the skin color information labels of the images in the dataset;
S2, preprocessing the paired images in the dataset and inputting the preprocessed images into the prior-information-based face thermal infrared-visible light generation network model;
S3, extracting features from the preprocessed face parsing map with the face parsing map conditional network module to obtain face prior information features;
S4, extracting and encoding features of the preprocessed face thermal infrared image with the generator network module to obtain encoded face feature information;
S5, inputting the face prior information features and the encoded face feature information into the spatial feature transform layer to generate a pair of modulation parameters;
S6, passing the face feature information through multi-layer residual transformation and the decoder to obtain the corresponding synthesized face visible light image, which is input into the discriminator for discrimination training;
S7, inputting the synthesized face visible light image and the face thermal infrared image into the encoder and two MLP mapping layers to obtain the corresponding face thermal infrared image features and synthesized face visible light image features;
S8, optimizing the parameters of the model with the Adam optimizer, back-propagating, and incrementing the iteration count by 1; if the current iteration count equals the set iteration count, outputting the optimal parameters to obtain the optimal prior-information-based face thermal infrared-visible light generation network model; otherwise, returning to step S3.
Acquiring the paired infrared-visible light image dataset comprises capturing corresponding face thermal infrared and visible light images with a dual-mode thermal infrared/visible light camera at a size of 256×256. The faces are then aligned: the RetinaFace face detection algorithm locates 5 facial key points, and the faces are cropped accordingly to obtain the paired face thermal infrared-visible light dataset. Skin color information is classified and the corresponding labels are extracted from the cropped face thermal infrared-visible light dataset for training the subsequent model, and the corresponding face prior information (the face parsing map dataset) is generated with a state-of-the-art face parsing synthesis model.
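As a concrete illustration of the alignment step, here is a minimal sketch that warps a capture onto a fixed landmark template with a similarity transform. `detect_keypoints` is a hypothetical stand-in for a RetinaFace detector returning the 5 landmarks, and the template coordinates are assumed values, not figures from the patent.

```python
import cv2
import numpy as np

# Assumed 5-point landmark template (eyes, nose tip, mouth corners) for a
# 256x256 crop; the patent does not specify these coordinates.
REF_POINTS = np.float32([[89, 110], [167, 110], [128, 152],
                         [99, 196], [157, 196]])

def align_face(img, detect_keypoints):
    """Align one face. detect_keypoints is a hypothetical RetinaFace wrapper
    returning 5 (x, y) landmarks for the largest face in img."""
    pts = np.float32(detect_keypoints(img))
    # Estimate a similarity transform from detected points to the template.
    m, _ = cv2.estimateAffinePartial2D(pts, REF_POINTS)
    return cv2.warpAffine(img, m, (256, 256))  # aligned 256x256 face crop
```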
In this embodiment, as shown in fig. 4, the face parsing map conditional network module (FPCN) takes the face parsing map as input and processes it with 3 convolution layers, using 1×1 and 3×3 convolution kernels to extract face features.
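A minimal sketch of such a 3-layer conditional stack follows; the input channel count (one channel per parsing class) and the output width are assumptions, since the text only fixes the kernel sizes.

```python
import torch.nn as nn

class FPCN(nn.Module):
    """Face parsing conditional network: 3 convolution layers mixing
    1x1 and 3x3 kernels, as described; channel widths are assumed."""
    def __init__(self, parse_ch=19, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(parse_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),
        )

    def forward(self, parsing_map):
        # Returns the face prior information features fed to the STLs.
        return self.net(parsing_map)
```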
Spatial feature transform layer (STL): as shown in fig. 3, the module has two inputs, the face prior information features generated by the FPCN module and the feature output GF of each layer of the generation network. The face prior information features pass through a multi-layer convolution operation and a sigmoid activation function to obtain a pair of parameters α and β. The face feature output GF in the generation network is first multiplied element-wise by the modulation parameters and then added, yielding the output of the whole STL network. This lets the network spatially remap the face feature information within the generation network, adaptively optimizing the quality of the generated face image.
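The sketch below captures this modulation; branch depth and channel widths are illustrative assumptions (the text states only multi-layer convolutions with a sigmoid producing α and β).

```python
import torch.nn as nn

class STL(nn.Module):
    """Spatial feature transform layer: maps the FPCN prior features to a
    pair of modulation parameters (alpha, beta) and applies GF*alpha + beta."""
    def __init__(self, prior_ch=64, feat_ch=256):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(prior_ch, feat_ch, 1), nn.ReLU(inplace=True),
                nn.Conv2d(feat_ch, feat_ch, 1), nn.Sigmoid())
        self.to_alpha, self.to_beta = branch(), branch()

    def forward(self, prior_feat, gf):
        # Element-wise product first, then addition, as described above.
        return gf * self.to_alpha(prior_feat) + self.to_beta(prior_feat)
```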
The generator network module G mainly comprises an encoder Genc, a converter consisting of 9 Residual blocks and a decoder Gdec.
In this embodiment, the encoder Genc consists essentially of 3 CIR layers, each composed of a convolution, InstanceNorm normalization and a ReLU activation function. A Reflect operation layer (boundary filling) is inserted before the 3 CIR layers; it pads the image symmetrically along its edges so that spatial resolution is preserved through the convolution. The main function of the encoder Genc is to extract face features.
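A sketch of this encoder follows; kernel sizes and strides mirror the common CycleGAN/CUT layout and are assumptions, since the text only fixes the CIR composition and the reflection padding.

```python
import torch.nn as nn

def cir(in_ch, out_ch, k, s, p):
    """One CIR layer: Convolution + InstanceNorm + ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=s, padding=p),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True))

# Encoder Genc: reflection padding followed by 3 CIR layers.
encoder = nn.Sequential(
    nn.ReflectionPad2d(3),          # symmetric boundary filling
    cir(3, 64, k=7, s=1, p=0),      # 256x256 -> 256x256
    cir(64, 128, k=3, s=2, p=1),    # -> 128x128
    cir(128, 256, k=3, s=2, p=1))   # -> 64x64 face features
```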
In this embodiment, the converter is composed of 9 residual blocks. Each residual block consists of a spatial feature transform layer followed by a CIR layer and mainly performs feature enhancement. With the face parsing map as prior information, the features of the generation network are remapped by the modulation parameters produced by the spatial feature transform layer, which greatly improves the texture detail of the generated face image.
In this embodiment, the decoder Gdec consists, in order, of two CTIR layers, one Reflect operation layer and one convolution layer. Each CTIR layer consists of a deconvolution, InstanceNorm normalization and a ReLU activation function. The decoder gradually reconstructs the features into a face visible light image of the original size.
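A matching sketch of the decoder is given below; channel widths, kernel sizes and the output activation are assumptions consistent with the encoder sketch above.

```python
import torch.nn as nn

def ctir(in_ch, out_ch):
    """One CTIR layer: transposed Convolution + InstanceNorm + ReLU."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 3, stride=2,
                           padding=1, output_padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True))

# Decoder Gdec: two CTIR up-sampling layers, a Reflect layer and a final
# convolution back to a 3-channel visible light image.
decoder = nn.Sequential(
    ctir(256, 128),                 # 64x64 -> 128x128
    ctir(128, 64),                  # -> 256x256
    nn.ReflectionPad2d(3),
    nn.Conv2d(64, 3, kernel_size=7),
    nn.Tanh())                      # assumed output activation
```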
In this embodiment, the discriminator uses the PatchGAN structure: instead of a single scalar, it outputs an N×N matrix of scores, each judging one patch through its local receptive field, which gives good resolution and detail retention when assessing image sharpness. Each 70×70 image block of the original picture is judged for authenticity; the final output is a 30×30 matrix whose mean value serves as the True/False output.
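For reference, the standard 70×70 PatchGAN layout below yields exactly a 30×30 score map on a 256×256 input; it is a sketch of the well-known pix2pix discriminator, which the text names but does not spell out layer by layer.

```python
import torch.nn as nn

def d_block(in_ch, out_ch, stride):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 4, stride, 1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True))

# 70x70 PatchGAN: each output score sees a 70x70 patch of the input.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
    d_block(64, 128, 2),
    d_block(128, 256, 2),
    d_block(256, 512, 1),
    nn.Conv2d(512, 1, 4, 1, 1))  # 256x256 input -> 1x30x30 score map
# The mean of the 30x30 map serves as the True/False output.
```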
As shown in fig. 5, the process of performing contrastive learning on the target image and the input image with the attention module comprises:
Step 1, extract multi-layer features from the source-domain and target-domain images: the source-domain image (face thermal infrared input image) and the target-domain image (synthesized face visible light image) are each passed through the encoder G_enc and a two-layer MLP network layer H_l. The face thermal infrared input image yields multi-layer feature maps through the encoder G_enc; L encoded feature layers are selected, each layer having S spatial positions, and each layer's features are projected into a shared embedding space by the 2-layer MLP network. Different spatial positions of different layers represent different image blocks. The mapped image-block features are {y_l} = H_l(G_enc^l(x)), where G_enc^l(x) denotes the l-th layer output features and l ∈ {1, 2, 3, ..., L}. The target-domain image is treated correspondingly: among the encoded and mapped image blocks of the target-domain synthesized face visible light image, one block is taken as the anchor, the block at the corresponding position of the source-domain face thermal infrared input image is taken as the positive sample, and blocks at other positions of the same face thermal infrared input image are taken as negative samples.
Step 2, the attention module in contrastive learning. The attention module mainly addresses the choice of positions for positive and negative samples in contrastive learning: constraining randomly chosen image blocks may be unsuitable, because some positions of the image contain little source-domain saliency information. Only blocks containing salient domain information should be selected; a contrastive loss built this way better preserves cross-domain consistency. The model offers two attention methods, global and local attention, to construct the contrastive learning loss; this embodiment uses global attention.
Step 2.1, the source-domain face thermal infrared input image and the target-domain synthesized face visible light image are passed through the encoder G_enc and the two-layer MLP network layer H_l to obtain the feature maps F_H ∈ R^{C×H×W} and F_V ∈ R^{C×H×W}, respectively. With the global attention method, the features of the face thermal infrared image are first reshaped and transposed into the two-dimensional matrices Q_H ∈ R^{HW×C} and V_H ∈ R^{HW×C}; Q_H is multiplied by its transpose K_H ∈ R^{C×HW} to obtain a matrix, and a Softmax normalization over each row yields the global attention matrix A_global ∈ R^{HW×HW}. Entropy can serve as an index of feature importance, so importance is measured by the entropy value H_s of each row of A_global. With i and j indexing the rows and columns of A_global, the entropy H_s is computed as:
H_s(i) = -Σ_j A_global(i, j) · log A_global(i, j)
After computing the entropy of each row of A_global, the rows are sorted in ascending order of entropy and the N rows with the smallest entropy are selected as the global attention matrix A_global-s ∈ R^{N×HW}. This matrix routes the value features V_H ∈ R^{HW×C} and V_V ∈ R^{HW×C} of the source-domain face thermal infrared image and the target-domain synthesized face visible light image, respectively.
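The row-entropy selection and routing can be sketched as follows, with tensor shapes following the text; the small epsilon for numerical stability is an implementation assumption.

```python
import torch
import torch.nn.functional as F

def global_route(q_h, v_h, v_v, n_rows):
    """Entropy-based global attention routing, following the shapes in the
    text: q_h, v_h, v_v are (HW, C) matrices from the two domains."""
    attn = F.softmax(q_h @ q_h.t(), dim=1)              # A_global: (HW, HW)
    # Row entropy Hs; low entropy = concentrated, salient attention.
    ent = -(attn * torch.log(attn + 1e-8)).sum(dim=1)   # (HW,)
    idx = ent.argsort()[:n_rows]                        # N smallest entropies
    attn_s = attn[idx]                                  # A_global-s: (N, HW)
    # Route value features of both domains with the same selected rows.
    return attn_s @ v_h, attn_s @ v_v                   # each (N, C)
```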
As shown in fig. 5, A_global-s is applied to features from both the face thermal infrared input image and the synthesized face visible light image, routing the corresponding value features (V_H and V_V) to form the anchor, positive samples and negative samples. The positive and negative samples lie in the source-domain face thermal infrared input image and the anchor lies in the target-domain synthesized face visible light image, from which the corresponding global contrastive loss is established.
Step 2.2, the local attention method differs in that it uses a fixed-size k×k window sliding over the source-domain face thermal infrared input image with stride 1, which strengthens spatial information interaction and connectivity within a local region. The feature map of the face thermal infrared image is first reshaped and transposed into the two-dimensional matrices Q_H ∈ R^{HW×C} and V_H ∈ R^{HW×C}; unlike global attention, Q_H is multiplied by its local transpose K_H^local ∈ R^{C×k²}, and a Softmax normalization over each row yields the local attention matrix A_local ∈ R^{HW×k²}. The saliency of features is again measured by the entropy H_s of each row of A_local; as with global attention, the N rows with the smallest entropy are selected to form the local attention matrix A_local-s, which routes the values V_H^local of the source-domain face thermal infrared image and V_V^local of the target-domain synthesized face visible light image, finally constructing the corresponding multi-layer local contrastive loss.
The contrastive loss function L_Con is established between the source-domain face thermal infrared image and the target-domain synthesized face visible light image:
L_Con(v, h+, h-) = -log[ exp(v·h+ / τ) / (exp(v·h+ / τ) + Σ_{n=1}^{N-1} exp(v·h_n^- / τ)) ]
where τ = 0.07 is a temperature hyperparameter, v is an anchor from the target-domain synthesized face visible light image, and h+ and h_n^- are the positive sample and the N-1 negative samples from the source-domain face thermal infrared image, respectively. This contrastive loss is denoted L_ConH.
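This is the InfoNCE objective used by CUT; a minimal sketch with the stated temperature τ = 0.07 is given below (feature normalization is an assumption).

```python
import torch
import torch.nn.functional as F

def con_loss(anchor, positive, negatives, tau=0.07):
    """L_Con for one anchor: anchor (C,), positive (C,), negatives (N-1, C).
    Cross-entropy with the positive logit at index 0 realizes -log softmax."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    l_pos = (anchor * positive).sum(-1, keepdim=True) / tau   # (1,)
    l_neg = (negatives @ anchor) / tau                        # (N-1,)
    logits = torch.cat([l_pos, l_neg]).unsqueeze(0)           # (1, N)
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
```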
A positive sample and N-1 negative samples are likewise selected within the target-domain synthesized face visible light image; similar to an identity-preservation loss, this ensures that G(H) stays similar to H in features and structure and prevents the generator from altering the synthesized face image excessively. This loss is denoted L_ConG(H).
The corresponding face gradient maps ∇I_G(H) and ∇I_V are extracted from the target-domain synthesized face visible light image I_G(H) and the face visible light ground-truth image I_V (GT), and the gradient enhancement loss L_Gm is constructed between them. The gradient enhancement loss reduces the generation of face artifacts and preserves better face contour information.
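A sketch of such a loss follows, assuming Sobel filters as the gradient operator and an L1 distance between gradient magnitude maps; the text fixes neither choice explicitly.

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
KERNELS = torch.stack([SOBEL_X, SOBEL_X.t()]).unsqueeze(1)  # (2,1,3,3)

def grad_map(img):
    """Gradient magnitude of a (B,3,H,W) image, computed on its grayscale."""
    gray = img.mean(dim=1, keepdim=True)
    g = F.conv2d(gray, KERNELS, padding=1)        # (B,2,H,W): dx, dy
    return torch.sqrt((g ** 2).sum(dim=1) + 1e-8)

def gradient_enhancement_loss(fake, real):
    """L_Gm between synthesized image I_G(H) and ground truth I_V."""
    return F.l1_loss(grad_map(fake), grad_map(real))
```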
Considering that many thermal infrared-visible light datasets provide corresponding paired images in both domains, the unsupervised algorithm can be extended with an additional constraint that minimizes the L1 distance between the synthesized image and the real visible light image. This supervised penalty is complementary to the unsupervised loss of CUT, and the additional regularization supplements the unsupervised image synthesis algorithm. This loss is referred to as the pixel-level consistency loss and is denoted L_Pcl:
L_Pcl = ||I_G(H) - I_V||_1
where I_V is the corresponding face visible light ground-truth image (GT).
For the adversarial loss between the generator G and the discriminator D, the face thermal infrared input image is denoted I_H and the pseudo visible light image generated by the face generation network model is denoted I_G(H). The skin color label condition is denoted Z: skin color information is one-hot encoded, and during training the class labels and the image data are concatenated (cat operation) as input. The face parsing map (the face prior information condition) is denoted P, and ξ denotes the minimum L1 loss between the face visible light image synthesized by the generation network and the real face visible light image. The generation network learns the mapping under the face prior information and face label conditions, G: (I_H, P, Z) → I_G(H), and its adversarial loss is:
L_gan = E[log D(I_V, Z)] + E[log(1 - D(G(I_H, P, Z), Z))]
The total loss function of the model is:
L = λ1·L_ConH + λ2·L_ConG(H) + λ3·L_Pcl + λ4·L_Gm + λ5·L_gan
where λ1, λ2, λ3, λ4 and λ5 are the hyperparameters weighting the contrastive learning loss L_ConH, the identity-preserving contrastive learning loss L_ConG(H), the pixel-level consistency loss L_Pcl, the gradient enhancement loss L_Gm and the adversarial loss L_gan, respectively.
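Combining the terms is then a weighted sum; the λ values below are illustrative placeholders, since the text leaves them as hyperparameters.

```python
def total_loss(l_con_h, l_con_gh, l_pcl, l_gm, l_gan,
               lam=(1.0, 1.0, 10.0, 1.0, 1.0)):  # assumed weights
    """L = λ1·L_ConH + λ2·L_ConG(H) + λ3·L_Pcl + λ4·L_Gm + λ5·L_gan."""
    l1, l2, l3, l4, l5 = lam
    return l1 * l_con_h + l2 * l_con_gh + l3 * l_pcl + l4 * l_gm + l5 * l_gan
```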
Under the contrastive learning framework, the invention uses the face parsing map as prior knowledge to guide the generation network in learning the local texture information of face images. A spatial feature transform layer (STL) is introduced that, conditioned on the mapped face parsing features, generates a pair of modulation parameters and applies an affine transformation to the face features of the generation network, adaptively optimizing face image generation. The invention also designs a face gradient enhancement loss to reduce the generation of face artifacts. In addition, skin color label condition information is attached to the input face thermal infrared image and the paired visible light image, so that the generated image restores the corresponding skin color information as faithfully as possible.
While the foregoing describes embodiments, aspects and advantages of the present invention, it will be understood that these embodiments are merely exemplary, and that any changes, substitutions and alterations made without departing from the spirit and principles of the invention fall within its scope.

Claims (9)

1. A method for converting thermal infrared to visible light face images based on prior information, characterized by comprising: obtaining a face infrared image to be converted and the corresponding paired visible light image; inputting the face infrared image into a trained prior-information-based face thermal infrared-visible light generation network model to obtain a synthesized face visible light image; the prior-information-based face thermal infrared-visible light generation network model comprises a face parsing map conditional network module, a spatial feature transform layer, an attention module, a generator network module and a discriminator; the process of training the prior-information-based face thermal infrared-visible light generation network model comprises:
S1: acquiring a thermal infrared-visible light image dataset and classifying the skin color information labels of the images in the dataset;
S2: preprocessing the paired images in the dataset and inputting the preprocessed images into the prior-information-based face thermal infrared-visible light generation network model;
S3: extracting features from the preprocessed face parsing map with the face parsing map conditional network module to obtain face prior information features;
S4: extracting and encoding features of the preprocessed face thermal infrared image with the generator network module to obtain encoded face feature information;
S5: inputting the face prior information features and the encoded face feature information into the spatial feature transform layer to generate a pair of modulation parameters; mapping the encoded face feature information according to the modulation parameters to obtain mapped face feature information;
S6: passing the face feature information through multi-layer residual transformation and the decoder to obtain the corresponding synthesized face visible light image, which is input into the discriminator for discrimination training;
S7: inputting the synthesized face visible light image and the face thermal infrared image into the encoder and two MLP mapping layers to obtain the corresponding face thermal infrared image features and synthesized face visible light image features; inputting the face thermal infrared image features and the synthesized face visible light image features into the attention module for contrastive learning, and computing the loss function of the model according to the contrastive learning formula;
S8: optimizing the parameters of the model with the Adam optimizer; when the loss function of the model is minimal, outputting the optimal parameters to obtain the optimal prior-information-based face thermal infrared-visible light generation network model.
2. The method of claim 1, characterized in that classifying the skin color information labels of the face images in the dataset comprises: labels 0, 1 and 2 denote white, yellow and black skin color, respectively; the label information is added to the model for learning, using one-hot encoding.
3. The method of claim 1, characterized in that preprocessing the paired images comprises: performing face alignment on the image data in the dataset, locating 5 facial key points with the RetinaFace face detection algorithm, and cropping the face images according to the located key points; resizing the processed images to 256×256.
4. The method of claim 1, characterized in that the face parsing map conditional network module comprises 3 convolution layers with 1×1 kernels; the preprocessed face parsing image is processed by the 3 convolution layers to obtain the face prior information features.
5. The method of claim 1, characterized in that the generator network module comprises an encoder Genc, a converter and a decoder Gdec; the encoder Genc mainly consists of 3 CIR layers, each composed of a convolution, InstanceNorm normalization and a ReLU activation function, and extracts features from the input image; the converter consists of 9 residual blocks, each composed of a spatial feature transform layer (STL) followed by a CIR layer, and mainly enhances the feature maps extracted by the encoder; the decoder Gdec comprises two CTIR layers, one Reflect operation layer and one convolution layer, where each CTIR layer consists of a deconvolution, InstanceNorm normalization and a ReLU activation function; the decoder performs up-sampling, gradually reconstructing the learned face features to the original image size.
6. The method of claim 1, characterized in that processing the input face features with the spatial feature transform layer comprises: passing the face prior information features and the encoded face feature information through two convolution layers each to obtain a pair of parameters α and β; multiplying the generated parameter pair element-wise with the feature output GF of each layer of the generation network and then adding, thereby obtaining the output of the whole STL network.
7. The method of claim 1, characterized in that performing contrastive learning between the face thermal infrared input image and the synthesized face visible light image with the attention module comprises:
S71: performing multi-layer feature extraction on the face thermal infrared image features and the synthesized face visible light image features: each is passed through an encoder G_enc and a two-layer MLP network layer H_l to obtain the face feature maps F_H ∈ R^{C×H×W} and F_V ∈ R^{C×H×W}, respectively;
S72: reshaping and transposing the feature map of the face thermal infrared image to obtain the two-dimensional matrices Q_H ∈ R^{HW×C} and V_H ∈ R^{HW×C};
S73: constructing the global attention contrastive loss from the two-dimensional matrices Q_H ∈ R^{HW×C} and V_H ∈ R^{HW×C}.
8. The method of claim 7, characterized in that constructing the global attention contrastive loss comprises: multiplying Q_H by its transpose K_H ∈ R^{C×HW} to obtain a matrix; applying a Softmax normalization to each row of the matrix to obtain the global attention matrix A_global ∈ R^{HW×HW}; computing the entropy H_s of each row of the global attention matrix with the entropy formula and sorting the rows in ascending order of entropy; routing the value features V_H ∈ R^{HW×C} and V_V ∈ R^{HW×C} of the source-domain face thermal infrared image and the target-domain synthesized face visible light image according to the sorted matrix; finally routing the corresponding value features V_H and V_V to construct the global contrastive loss.
9. The method of claim 1, characterized in that the loss function of the model is:
L = λ1·L_ConH + λ2·L_ConG(H) + λ3·L_Pcl + λ4·L_Gm + λ5·L_gan
where λ1, λ2, λ3, λ4 and λ5 are the hyperparameters of the contrastive learning loss, identity-preserving contrastive learning loss, pixel-level consistency loss, gradient enhancement loss and adversarial loss, respectively.
CN202211325764.9A | 2022-10-27 | 2022-10-27 | A method for converting thermal infrared to visible light images of faces based on prior information | Active | CN115661900B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211325764.9A | 2022-10-27 | 2022-10-27 | CN115661900B (en): A method for converting thermal infrared to visible light images of faces based on prior information

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211325764.9A | 2022-10-27 | 2022-10-27 | CN115661900B (en): A method for converting thermal infrared to visible light images of faces based on prior information

Publications (2)

Publication Number | Publication Date
CN115661900A (en) | 2023-01-31
CN115661900B (en) | 2025-08-08

Family

ID=84993297

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211325764.9A (Active) | CN115661900B (en): A method for converting thermal infrared to visible light images of faces based on prior information | 2022-10-27 | 2022-10-27

Country Status (1)

Country | Link
CN | CN115661900B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103268485A (en)* | 2013-06-09 | 2013-08-28 | 上海交通大学 | A face recognition method based on sparse regularization realizing fusion of multi-band face image information
CN107341481A (en)* | 2017-07-12 | 2017-11-10 | 深圳奥比中光科技有限公司 | Identification using structured light images

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP3616622B2 (en)* | 2002-08-26 | 2005-02-02 | 株式会社東芝 | Infrared imaging device
CN101202845B (en)* | 2007-11-14 | 2011-05-18 | 北京大学 | Method and device for converting an infrared image into a visible light image
CN113298094B (en)* | 2021-06-10 | 2022-11-04 | 安徽大学 | An RGB-T salient object detection method based on modality correlation and dual perceptual decoder


Also Published As

Publication number | Publication date
CN115661900A (en) | 2023-01-31

Similar Documents

Publication | Title
CN113283444B (en) | Heterogeneous image migration method based on a generative adversarial network
CN111709902B (en) | Infrared and visible light image fusion method based on a self-attention mechanism
CN112418095B (en) | Facial expression recognition method and system combining an attention mechanism
CN111985405B (en) | A face age synthesis method and system
CN111145131A (en) | Infrared and visible light image fusion method based on a multi-scale generative adversarial network
Wang et al. | UIE-Convformer: underwater image enhancement based on convolution and feature fusion Transformer
CN108648197A (en) | An object candidate region extraction method based on image background masks
Liu et al. | Multiscale underwater image enhancement in RGB and HSV color spaces
CN113706407B (en) | Infrared and visible light image fusion method based on separation and characterization
CN116137043B (en) | Infrared image colorization method based on convolution and Transformer
CN112990340B (en) | Self-learning migration method based on feature sharing
CN113128424 (en) | Graph convolutional neural network action recognition method based on an attention mechanism
Xu et al. | Depth map denoising network and lightweight fusion network for enhanced 3D face recognition
CN114972378A (en) | Brain tumor MRI image segmentation method based on a mask attention mechanism
CN117333410B (en) | Infrared and visible light image fusion method based on Swin Transformer and GAN
CN113706404B (en) | A depression-angle face image correction method and system based on a self-attention mechanism
Lan et al. | An optimized GAN method based on Que-Attn and contrastive learning for underwater image enhancement
Huang et al. | RDCa-Net: residual dense channel attention symmetric network for infrared and visible image fusion
Zhou et al. | Attention transfer network for nature image matting
CN114663802B (en) | Cross-modal video migration method for surveillance video based on feature spatio-temporal constraints
Shihabudeen et al. | Deep learning L2-norm fusion for infrared and visible images
Yao et al. | A forecast-refinement neural network based on DyConvGRU and U-Net for radar echo extrapolation
CN115049739A (en) | Binocular vision stereo matching method based on edge detection
CN115661900B (en) | A method for converting thermal infrared to visible light images of faces based on prior information
CN117097876B (en) | Event camera image reconstruction method based on a neural network

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
