Background
In criminal investigation, public security departments hold citizen photo databases and combine them with face recognition technology to determine the identity of a criminal suspect. In practice, however, a photograph of the suspect is often difficult to obtain, while a sketch portrait of the suspect can be produced through the cooperation of an artist and a witness, enabling subsequent face retrieval and recognition. Because of the great difference between a sketch portrait and an ordinary face photograph, directly applying traditional face recognition methods rarely yields a satisfactory recognition result. Converting the photos in the citizen photo database into portraits effectively reduces this texture difference between the two modalities and thereby improves the recognition rate.
Gao et al., in the paper "X. Gao, J. Zhou, D. Tao, and X. Li, 'Local face sketch synthesis learning,' Neurocomputing, vol. 71, no. 10-12, pp. 1921-1930, Jun. 2008," propose using embedded hidden Markov models to generate pseudo-portraits. The method first partitions the photos and portraits in the training library into blocks, then models each pair of corresponding photo and portrait blocks with an embedded hidden Markov model. Given an arbitrary photo, it is partitioned into blocks in the same way; for each block, a selective-ensemble strategy picks a subset of the learned models to generate candidate pseudo-portrait blocks, which are fused to obtain the final pseudo-portrait. The disadvantage of this method is that the selective-ensemble technique takes a weighted average over the generated pseudo-portraits, so the background is not clean and the details are blurred, which reduces the quality of the generated portrait.
Zhou et al., in the paper "H. Zhou, Z. Kuang, and K. Wong, Markov Weight Fields for Face Sketch Synthesis" (In Proc. IEEE Int. Conference on Computer Vision, pp. 1091-1097, 2012), propose a face sketch synthesis method based on Markov weight fields. The method uniformly partitions the training images and the input test image into blocks, and for each test image block searches for several nearest neighbors to obtain candidate blocks in the modality of the image to be synthesized. It then models the test image blocks, the neighboring blocks, and the candidate blocks with a Markov graph model to obtain reconstruction weights. Finally, composite image blocks are reconstructed from the reconstruction weights and candidate blocks and stitched together to obtain the composite image. The disadvantage of this method is that the image-block features are raw pixel values, whose representational power is insufficient and which are strongly affected by environmental noise such as illumination.
A face portrait synthesis method based on a direction graph model is disclosed in the patent "Face portrait synthesis method based on a direction graph model" filed by Xidian University (application number: CN201610171867.2, filing date: 2016.03.24, publication number: CN105869134A). The method uniformly partitions the training images and the input test image into blocks, and for each test photo block searches for several neighboring photo blocks and their corresponding neighboring portrait blocks. It then extracts direction features from the test photo block and the neighboring photo blocks, and models these direction features with a Markov graph model to obtain the reconstruction weights with which the neighboring portrait blocks reconstruct the synthesized portrait block. Finally, synthesized portrait blocks are reconstructed from the reconstruction weights and neighboring portrait blocks and stitched together to obtain the synthesized portrait. The disadvantage of this method is that the image-block features are hand-designed high-frequency features with insufficient adaptive capacity, so the features are not fully learned.
Disclosure of Invention
The present invention aims to overcome the above-mentioned deficiencies of the prior art by providing a face portrait synthesis method based on deep graph model feature learning, which can synthesize high-quality portraits unaffected by environmental noise such as illumination.
The specific steps for realizing the purpose of the invention are as follows:
(1) generating a sample set:
(1a) taking M face photos from the face photo sample set to form a training face photo sample set, where 2 ≤ M ≤ U-1 and U denotes the total number of face photos in the sample set;
(1b) forming a testing face photo set by the remaining face photos in the face photo sample set;
(1c) taking, from the face portrait sample set, the face portraits that correspond one-to-one to the photos in the training face photo sample set, to form a training face portrait sample set;
(2) generating an image block set:
(2a) randomly selecting a test face photo from the test face photo set, dividing the test face photo into photo blocks with the same size and the same overlapping degree, and forming a test photo block set;
(2b) dividing each photo in the training face photo sample set into photo blocks with the same size and the same overlapping degree to form a training photo sample block set;
(2c) dividing each portrait in a training face portrait sample set into portrait blocks with the same size and the same overlapping degree to form a training portrait sample block set;
(3) extracting depth features:
(3a) inputting all photo blocks in the training photo sample block set and the test photo block set into the VGG deep convolutional network for object recognition, pre-trained on the ImageNet object recognition database, and performing forward propagation;
(3b) taking the 128-channel feature map output by the middle layer of the deep convolutional network VGG as the depth feature of each photo block, where the coefficient of the l-th layer feature map for the i-th test photo block is u_{i,l}, subject to Σ_{l=1}^{128} u_{i,l} = 1, where Σ denotes the summation operation, i denotes the sequence number of a test photo block, i = 1, 2, ..., N, N denotes the total number of test photo blocks, l denotes the sequence number of a feature map layer, and l = 1, ..., 128;
(4) solving the face image block reconstruction coefficient:
(4a) using the K-nearest-neighbor search algorithm, finding in the training photo sample block set the 10 neighboring training photo blocks most similar to each test photo block, and simultaneously selecting from the training portrait sample block set the 10 neighboring training portrait blocks that correspond one-to-one to these photo blocks, where the coefficient of each neighboring training portrait block is w_{i,k}, k denotes the sequence number of a neighboring training portrait block, and k = 1, ..., 10;
(4b) using the Markov graph model formula to jointly model the depth features of all test photo blocks, the depth features of all neighboring training photo blocks, all neighboring training portrait blocks, the feature-map coefficients u_{i,l}, and the neighboring-training-portrait-block coefficients w_{i,k};
(4c) solving the Markov graph model formula to obtain the face portrait block reconstruction coefficients w_{i,k};
(5) reconstructing the face portrait blocks:
multiplying the 10 neighboring training portrait blocks corresponding to each test photo block by their respective coefficients w_{i,k} and summing the products; the result is the reconstructed face portrait block corresponding to that test photo block;
(6) synthesizing the face portrait:
stitching the reconstructed face portrait blocks corresponding to all test photo blocks to obtain the synthesized face portrait.
Compared with the prior art, the invention has the following advantages:
1. Because depth features extracted by a deep convolutional network replace the raw pixel values of the image blocks, the invention overcomes the insufficient representational power of the features used in the prior art and their sensitivity to environmental noise such as illumination, and is therefore robust to such noise.
2. Because the invention uses a Markov graph model to jointly model the depth-feature-map coefficients and the face portrait block reconstruction coefficients, it overcomes the unclean background and unclear details of face portraits synthesized by the prior art; the synthesized face portraits have clean backgrounds and clear details.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, the specific steps of the present invention are as follows.
Step 1, generating a sample set.
M face photos are taken from the face photo sample set to form a training face photo sample set, where 2 ≤ M ≤ U-1 and U denotes the total number of face photos in the sample set.
And forming a test face photo set by the face photos left in the face photo sample set.
The face portraits that correspond one-to-one to the photos of the training face photo sample set are taken from the face portrait sample set to form a training face portrait sample set.
And 2, generating an image block set.
And randomly selecting a test face photo from the test face photo set, dividing the test face photo into photo blocks with the same size and the same overlapping degree, and forming a test photo block set.
Dividing each photo in the training face photo sample set into photo blocks with the same size and the same overlapping degree to form a training photo sample block set.
Each face image in the training face image sample set is divided into image blocks with the same size and the same overlapping degree to form a training image sample block set.
The overlapping degree means that the area of the overlapping area between two adjacent image blocks is 1/2 of the area of each image block.
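The partitioning in step 2 can be sketched in Python with NumPy. The patch size of 20 pixels is an illustrative assumption; the text fixes only the 1/2 overlap, not the block size.

```python
import numpy as np

def extract_patches(image, patch_size=20, overlap=0.5):
    """Divide an image into equally sized, equally overlapping blocks.
    With overlap=0.5, adjacent blocks share half their area, as in the
    method above. `patch_size` is an illustrative choice."""
    step = int(patch_size * (1 - overlap))  # stride between block origins
    h, w = image.shape[:2]
    patches, positions = [], []
    for y in range(0, h - patch_size + 1, step):
        for x in range(0, w - patch_size + 1, step):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            positions.append((y, x))        # keep positions for stitching later
    return patches, positions
```

Recording each block's origin alongside the block itself is what later allows the reconstructed portrait blocks to be placed back at their original positions.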
And 3, extracting depth features.
And inputting all the photo blocks in the training photo block set and the test photo block set into a deep convolution network VGG for object recognition which is trained on an object recognition database ImageNet, and carrying out forward propagation.
The 128-channel feature map output by the middle layer of the deep convolutional network VGG is taken as the depth feature of each photo block, where the coefficient of the l-th layer feature map for the i-th test photo block is u_{i,l}, subject to Σ_{l=1}^{128} u_{i,l} = 1, where Σ denotes the summation operation, i denotes the sequence number of a test photo block, i = 1, 2, ..., N, N denotes the total number of test photo blocks, l denotes the sequence number of a feature map layer, and l = 1, ..., 128.
The middle layer refers to the activation function layer of the deep convolutional network VGG.
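As a toy illustration of what such depth features are, the sketch below cross-correlates a grayscale patch with a small filter bank (the operation used in CNN "convolution" layers) and applies ReLU, yielding one feature map per filter. In the method itself a VGG network pre-trained on ImageNet supplies 128 such maps; the filter bank here is a purely hypothetical stand-in.

```python
import numpy as np

def conv_relu_features(patch, filters):
    """Toy stand-in for the VGG middle layer: valid-mode cross-correlation
    of a grayscale patch with a filter bank, followed by ReLU, giving one
    feature map per filter. Illustrative only; the actual method takes the
    128-channel activation-layer output of a pretrained VGG."""
    n_f, fh, fw = filters.shape
    h, w = patch.shape
    out = np.zeros((n_f, h - fh + 1, w - fw + 1))
    for l in range(n_f):                        # one feature map per filter
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[l, y, x] = np.sum(patch[y:y + fh, x:x + fw] * filters[l])
    return np.maximum(out, 0.0)                 # ReLU activation layer
```

The point of using such activations instead of raw pixels is that each feature map responds to a learned pattern rather than to absolute intensity, which is what makes the features less sensitive to illumination.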
And 4, solving the face image block reconstruction coefficient.
Using the K-nearest-neighbor search algorithm, the 10 neighboring training photo blocks most similar to each test photo block are found in the training photo sample block set, and simultaneously the 10 neighboring training portrait blocks that correspond one-to-one to these photo blocks are selected from the training portrait sample block set, where the coefficient of each neighboring training portrait block is w_{i,k}, k denotes the sequence number of a neighboring training portrait block, and k = 1, ..., 10.
The K neighbor search algorithm comprises the following specific steps:
step one, calculating Euclidean distances between the depth feature vector of each test photo block and the depth feature vectors of all training photo blocks;
secondly, sequencing all the training photo blocks according to the sequence of the Euclidean distance values from small to large;
and thirdly, selecting the first 10 training photo blocks as neighbor training photo blocks.
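The three steps above can be sketched directly; the function and argument names are illustrative, and each depth feature is assumed to be flattened into a vector.

```python
import numpy as np

def knn_search(query, candidates, k=10):
    """Return the indices of the k candidate feature vectors closest to
    `query` in Euclidean distance, following the three steps above."""
    dists = np.linalg.norm(candidates - query, axis=1)  # step 1: distances
    order = np.argsort(dists)                           # step 2: sort ascending
    return order[:k]                                    # step 3: keep first k
```

The same indices are then used to pick out the corresponding portrait blocks from the training portrait sample block set, since photo and portrait blocks correspond one-to-one.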
The Markov graph model formula is used to jointly model the depth features of all test photo blocks, the depth features of all neighboring training photo blocks, all neighboring training portrait blocks, the feature-map coefficients u_{i,l}, and the neighboring-training-portrait-block coefficients w_{i,k}.
The Markov graph model formula is as follows:

min_{w,u} Σ_{i=1}^{N} Σ_{l=1}^{128} u_{i,l} ||d_l(x_i) - Σ_{k=1}^{10} w_{i,k} d_l(x_{i,k})||^2 + Σ_{(i,j)} ||Σ_{k=1}^{10} w_{i,k} o_{i,k} - Σ_{k=1}^{10} w_{j,k} o_{j,k}||^2

where the second summation runs over pairs (i, j) of adjacent image blocks, min denotes the minimization operation, Σ denotes the summation operation, ||·||^2 denotes the modulus-squaring operation, w_{i,k} denotes the coefficient of the k-th neighboring training portrait block of the i-th test photo block, o_{i,k} denotes the pixel-value vector of the overlapping portion of the k-th neighboring training portrait block of the i-th test photo block, w_{j,k} denotes the coefficient of the k-th neighboring training portrait block of the j-th test photo block, o_{j,k} denotes the pixel-value vector of the overlapping portion of the k-th neighboring training portrait block of the j-th test photo block, u_{i,l} denotes the coefficient of the l-th layer depth feature map of the depth features of the i-th test photo block, d_l(x_i) denotes the l-th layer feature map of the depth features of the i-th test photo block, and d_l(x_{i,k}) denotes the l-th layer feature map of the depth features of the k-th neighboring training photo block of the i-th test photo block.
The Markov graph model formula is solved to obtain the face portrait block reconstruction coefficients w_{i,k}.
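A drastically simplified version of this solve can be written per block: the sketch below obtains weights w_{i,k} by constrained least squares on a single block's (flattened) features, with the sum-to-one constraint handled in closed form as in locally linear embedding. It deliberately omits the overlap-compatibility term that couples adjacent blocks in the joint model, so it is an illustrative sketch under that stated simplification, not the full solver.

```python
import numpy as np

def local_reconstruction_weights(feature, neighbor_features, eps=1e-6):
    """Solve for one block's weights w_{i,k} by least-squares reconstruction
    of its feature vector from its K neighbors' feature vectors, subject to
    sum(w) = 1. Simplified: the joint model's overlap-compatibility term
    between adjacent blocks is dropped here."""
    diffs = neighbor_features - feature           # K x D difference vectors
    G = diffs @ diffs.T                           # local K x K Gram matrix
    G += eps * np.trace(G) * np.eye(G.shape[0])   # regularize near-singular G
    w = np.linalg.solve(G, np.ones(G.shape[0]))   # unnormalized solution
    return w / w.sum()                            # enforce sum(w) = 1
```

In the full method, the coefficients of all blocks (and the feature-map coefficients u_{i,l}) are optimized jointly rather than block by block.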
And 5, reconstructing the face image block.
The 10 neighboring training portrait blocks corresponding to each test photo block are multiplied by their respective coefficients w_{i,k}, and the products are summed; the result is the reconstructed face portrait block corresponding to that test photo block.
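This weighted combination is a one-liner; the portrait blocks are stacked along a leading axis of length K = 10.

```python
import numpy as np

def reconstruct_patch(weights, neighbor_portrait_patches):
    """Weighted sum of the K neighboring training portrait blocks using the
    reconstruction coefficients w_{i,k}: contracts the length-K weight
    vector against the leading axis of the (K, h, w) patch stack."""
    return np.tensordot(weights, neighbor_portrait_patches, axes=1)
```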
And 6, synthesizing the face portrait.
And splicing the reconstructed face image blocks corresponding to all the test image blocks to obtain a synthesized face image.
The reconstructed portrait blocks corresponding to all the test photo blocks are stitched as follows:
firstly, placing each reconstructed portrait block at its original position in the image;
secondly, computing the average of the pixel values in the overlapping portion between each pair of adjacent reconstructed face portrait blocks;
and thirdly, replacing the pixel values of each overlapping portion with that average to obtain the synthesized face portrait.
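The three stitching steps above amount to accumulate-and-average; the sketch below assumes each block's origin position was recorded during partitioning.

```python
import numpy as np

def stitch_patches(patches, positions, out_shape):
    """Place each reconstructed portrait block at its recorded (y, x)
    position and average pixel values wherever blocks overlap, yielding
    the synthesized portrait (the three steps described above)."""
    acc = np.zeros(out_shape)
    count = np.zeros(out_shape)
    for patch, (y, x) in zip(patches, positions):
        h, w = patch.shape
        acc[y:y + h, x:x + w] += patch      # accumulate overlapping values
        count[y:y + h, x:x + w] += 1        # how many blocks cover each pixel
    return acc / np.maximum(count, 1)       # average; guard uncovered pixels
```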
The effects of the present invention are further illustrated by the following simulation experiments.
1. Simulation experiment conditions are as follows:
The computer configuration for the simulation experiments was an Intel(R) Core i7-4790 3.6 GHz CPU with 16 GB of memory under a Linux operating system; the programming language was Python, and the database was the CUHK student database of the Chinese University of Hong Kong.
The prior art comparison method used in the simulation experiment of the present invention includes the following two methods:
One is the method based on locally linear embedding, denoted LLE in the experiments; the reference is "Q. Liu, X. Tang, H. Jin, H. Lu, and S. Ma" (A Nonlinear Approach for Face Sketch Synthesis and Recognition. In Proc. IEEE Int. Conference on Computer Vision, pp. 1005-1010, 2005);
The other is the method based on the Markov weight field model, denoted MWF in the experiments; the reference is "H. Zhou, Z. Kuang, and K. Wong. Markov Weight Fields for Face Sketch Synthesis" (In Proc. IEEE Int. Conference on Computer Vision, pp. 1091-1097, 2012).
2. Simulation experiment contents:
the invention has a group of simulation experiments.
Portraits are synthesized on the CUHK student database and compared with portraits synthesized by the locally linear embedding LLE method and the Markov weight field model MWF method.
3. Simulation experiment results and analysis:
The results of the simulation experiment of the present invention are shown in fig. 2, in which fig. 2(a) is a test photo taken arbitrarily from the test face photo set, fig. 2(b) is a portrait synthesized using the prior-art locally linear embedding LLE method, fig. 2(c) is a portrait synthesized using the prior-art Markov weight field model MWF method, and fig. 2(d) is a portrait synthesized using the method of the present invention.
As can be seen from fig. 2, because depth features replace the raw pixel values of the image blocks, the method is more robust to environmental noise such as illumination; therefore, for photos strongly affected by illumination, the portrait synthesized by the invention has higher quality and less noise than those synthesized by the locally linear embedding LLE and Markov weight field model MWF methods.