Disclosure of Invention
The technical problem to be solved by the present invention is to provide a watermark removal method based on a generative adversarial network, which realizes automatic watermark removal and improves the watermark removal effect.
The invention is realized as follows: a watermark removal method based on a generative adversarial network comprises the following steps:
step S10, building a generator based on an attentive recurrent network and a contextual autoencoder;
step S20, building a discriminator based on the attentive recurrent network and PatchGAN;
step S30, inputting a plurality of watermark sample pictures into a conditional generative adversarial network composed of the generator and the discriminator for adversarial training;
and step S40, inputting a watermark picture into the adversarially trained generator to generate a de-watermarked picture.
Further, in step S10,
each time step of the attentive recurrent network comprises at least two ResNet layers, a convolutional LSTM (ConvLSTM) unit, and convolutional layers (Convs) for generating an attention map $A_N$, where N is a positive integer; the attentive recurrent network is used for locating the regions from which the watermark is to be removed;
the contextual autoencoder consists of a U-Net structure of 16 Conv-ReLU blocks and removes the watermark from the regions located by the attentive recurrent network.
Further, in step S10, the loss function of the generative network of the generator is:
$L_G = 10^{-2} \cdot L_{GAN}(O) + L_{ATT}(\{A\}, M) + L_M(\{S\}, \{T\}) + L_P(O, T)$;
$L_{GAN}(O) = \log(1 - D(O))$;
$L_P(O, T) = L_{MSE}(VGG(O), VGG(T))$;
where $L_G$ represents the loss value of the generative network; O represents the de-watermarked picture generated by the generator; T represents the watermark-free picture corresponding to O; D represents the discriminative network; M represents a binary mask; $L_{GAN}(O)$ represents the adversarial loss of the generative network; $L_{ATT}(\{A\}, M)$ represents the loss function of the attentive recurrent network, i.e. the mean square error between the attention map $A_t$ output at time step t and the binary mask M, with N = 5 and θ = 0.9; $L_{MSE}(\cdot)$ represents the mean square error; $L_M(\{S\}, \{T\})$ represents the multi-scale loss function of the contextual autoencoder, where $S_i$ represents the i-th output extracted from the contextual autoencoder, $T_i$ represents the watermark-free picture scaled down to the same size as $S_i$, and $\lambda_i$ represents the weights for pictures of different sizes; $L_P(O, T)$ represents the perceptual loss function of the contextual autoencoder: several features are extracted from pictures O and T with a pre-trained VGG feature network, their mean square errors are computed and then summed.
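For concreteness, one plausible expanded form of the attention and multi-scale losses, consistent with the definitions above (the exact summations are assumptions; the text itself only fixes N = 5, θ = 0.9 and the per-scale weights $\lambda_i$):
$L_{ATT}(\{A\}, M) = \sum_{t=1}^{N} \theta^{\,N-t} \, L_{MSE}(A_t, M)$;
$L_M(\{S\}, \{T\}) = \sum_{i} \lambda_i \, L_{MSE}(S_i, T_i)$.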
Further, in step S20, the loss function of the discriminative network of the discriminator is:
$L_D(T, O) = -\log(D(T)) - \log(1 - D(O)) + \gamma \cdot L_{map}(O, T, A_N)$;
$L_{map}(O, T, A_N) = L_{MSE}(D_{map}(O), A_N) + L_{MSE}(D_{map}(T), 0)$;
where $L_D(T, O)$ represents the loss value of the discriminative network; γ represents the weight of the $L_{map}(O, T, A_N)$ loss; $L_{map}(O, T, A_N)$ represents the difference between the attention mask generated by an interior layer of the discriminator and the attention map; $D_{map}(\cdot)$ represents the process by which the discriminator generates the attention mask; 0 represents an attention map containing only zero values.
Further, step S30 specifically includes:
step S31, inputting a plurality of watermark sample pictures into the generator to generate de-watermarked sample pictures;
step S32, inputting the watermark sample pictures and the de-watermarked sample pictures into the discriminator;
step S33, the discriminator judging whether all the de-watermarked sample pictures are real and whether they match the corresponding watermark sample pictures; if so, the adversarial training is finished and the method proceeds to step S40; if not, the method returns to step S31 to continue the adversarial training.
Further, in step S30, the objective function of the conditional generative adversarial network is:
$L_{cGAN}(G, D) = \mathbb{E}_{s,x}[\log D(s, x)] + \mathbb{E}_{s}[\log(1 - D(s, G(s)))]$;
where $L_{cGAN}(G, D)$ represents the objective function of the conditional generative adversarial network; s represents a watermark sample picture; x represents the real picture corresponding to the watermark sample picture; D(s, x) represents inputting the watermark sample picture and the real picture into the discriminator; D(s, G(s)) represents inputting the watermark sample picture and the de-watermarked sample picture generated by the generator into the discriminator; the first expectation is taken over the joint distribution of watermark sample pictures and their corresponding real pictures; the second expectation is taken over the distribution of the de-watermarked sample pictures.
The invention has the advantages that:
1. Watermark sample pictures are input into a conditional generative adversarial network composed of a generator and a discriminator for adversarial training, and watermark pictures are then input into the adversarially trained generator to generate de-watermarked pictures; that is, the adversarially trained generator removes watermarks automatically and in batches, which improves de-watermarking efficiency and suits the processing of large batches of images with complex backgrounds and complex watermarks. The generator is built on the attentive recurrent network and the contextual autoencoder, and the discriminator on the attentive recurrent network and PatchGAN: the generator produces an attention map through the attentive recurrent network to locate the regions from which the watermark is to be removed, the contextual autoencoder removes the watermark in the located regions, and the discriminator concentrates its attention on those regions based on the attention map. Unlike the prior art, the watermark region does not need to be marked in advance; all watermarked regions are noticed automatically, which suits pictures with complex watermarks, finally realizes automatic watermark removal, and greatly improves the removal effect.
2. A conditional generative adversarial network (C-GAN) replaces the traditional generative adversarial network (GAN): the watermark sample picture and the de-watermarked sample picture produced by the generator are both input into the discriminator, which must judge not only whether the de-watermarked sample picture is real but also whether it matches the watermark sample picture, greatly improving the de-watermarking effect.
3. PatchGAN replaces the traditional GAN in building the discriminator, taking into account the influence of different parts of the image on the discriminator, so the trained model pays more attention to image detail; this yields a representation of the overall difference that is more accurate than a single scalar output and fuses the local and global features of the image.
Detailed Description
The general idea of the technical scheme in the embodiments of the present application is as follows: the traditional picture de-watermarking problem is converted into an image-to-image translation task that converts a watermarked picture into a de-watermarked picture; through continuous adversarial training between the generator and the discriminator, the de-watermarked picture generated by the generator becomes realistic enough, thereby achieving the desired de-watermarking effect.
Referring to fig. 1 to 4, a preferred embodiment of a watermark removal method based on a generative adversarial network according to the present invention includes the following steps:
step S10, building a generator (Generator) based on the attentive recurrent network and the contextual autoencoder; the inputs of the contextual autoencoder are the watermark picture and the attention map generated by the attentive recurrent network, and its output is the de-watermarked picture;
step S20, building a discriminator (Discriminator) based on the attentive recurrent network and PatchGAN; the generator is used for generating the de-watermarked picture, and the discriminator is used for judging whether the de-watermarked picture is real;
The discriminator of a conventional GAN maps its input to a single real number, i.e. the probability that the input sample is real, whereas the PatchGAN discriminator maps its input to an N × N patch matrix X, in which the value of $X_{i,j}$ represents the probability that patch (i, j) is real, and the mean of all $X_{i,j}$ is the final output of the discriminator. $X_{i,j}$ is in essence a feature map output by a convolutional layer: each position of the feature map can be traced back to a region of the original image, and its influence on the final result can be read off the discriminator output, so the discriminator pays more attention to the details of the generated image, i.e. it is more sensitive to high frequencies. For this reason PatchGAN is used instead of the traditional GAN to build the discriminator, as the sketch below illustrates.
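A minimal PatchGAN-style discriminator sketch in PyTorch; the layer widths and depth here are illustrative assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 1, 4, stride=1, padding=1),  # patch matrix X
            nn.Sigmoid(),
        )

    def forward(self, img):
        x = self.net(img)  # X[i, j]: probability that patch (i, j) is real
        return x, x.mean(dim=(1, 2, 3))  # mean of X[i, j]: final output

# Each X[i, j] has a limited receptive field tracing back to one region of
# the input, so the loss penalizes unrealistic local detail (high frequency).
d = PatchDiscriminator()
patch_probs, score = d(torch.randn(1, 3, 256, 256))
```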
Step S30, inputting a plurality of watermark sample pictures into a conditional generative adversarial network (C-GAN) composed of the generator and the discriminator for adversarial training;
a traditional generative adversarial network (GAN) only judges whether the de-watermarked sample picture is real; it cannot guarantee that an input watermark sample picture produces the corresponding de-watermarked sample picture, leaving a loophole that is easy to exploit;
and step S40, inputting the watermark picture into the adversarially trained generator to generate the de-watermarked picture.
In step S10,
each time step of the attentive recurrent network comprises at least two ResNet layers, a convolutional LSTM (ConvLSTM) unit, and convolutional layers (Convs) for generating an attention map $A_N$, where N is a positive integer. The attentive recurrent network locates the regions from which the watermark is to be removed, so that the generative network pays more attention to the watermark region and its surrounding structure, and the discriminative network can better evaluate the local consistency of the restored region.
The attention map is a matrix of values from 0 to 1, with larger values indicating more attention. It is a non-binary map representing attention that increases gradually from the non-watermarked region to the watermarked region; even within the watermarked region the attention varies, because the transparency of the watermark differs from place to place, and parts of the watermark that do not completely occlude the background still convey some background information. A sketch of one such network follows.
The contextual autoencoder consists of a U-Net structure of 16 Conv-ReLU blocks and removes the watermark from the regions located by the attentive recurrent network.
The U-Net network is an encoder-decoder, but differs from the traditional encoder-decoder in its feature skip-layer connections. In the traditional GAN generative network structure, all information must flow through every layer from input to output, which undoubtedly lengthens training time. For the image de-watermarking task, although the input image must undergo a complex conversion into the target image, the input and output images share essentially the same structure; that is, low-level information is shared between them during the conversion and does not itself need to be converted, so the traditional GAN generative network structure is wasteful here. Adjusting the network structure to the needs of image conversion, a U-Net structure realizes this information sharing between input and output: connections between encoder and decoder parts of the same size, also known as skip connections, give the generative model the ability to skip some subsequent steps, so that low-level detail at each resolution is preserved and part of the information can be transmitted directly through the connections during training. A toy example follows.
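A toy U-Net sketch of the skip-connection idea; the patent's contextual autoencoder has 16 Conv-ReLU blocks, whereas the shallow depth and channel widths here are illustrative only. The input is the watermark picture concatenated with the attention map from the attentive recurrent network:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(4, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU())
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, img, attention):
        e1 = self.enc1(torch.cat([img, attention], dim=1))
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.out(d1)  # de-watermarked picture
```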
In step S10, the loss function of the generative network of the generator is:
$L_G = 10^{-2} \cdot L_{GAN}(O) + L_{ATT}(\{A\}, M) + L_M(\{S\}, \{T\}) + L_P(O, T)$;
$L_{GAN}(O) = \log(1 - D(O))$;
$L_P(O, T) = L_{MSE}(VGG(O), VGG(T))$;
where $L_G$ represents the loss value of the generative network; O represents the de-watermarked picture generated by the generator; T represents the watermark-free picture corresponding to O; D represents the discriminative network; M represents a binary mask; $L_{GAN}(O)$ represents the adversarial loss of the generative network; $L_{ATT}(\{A\}, M)$ represents the loss function of the attentive recurrent network, i.e. the mean square error between the attention map $A_t$ output at time step t and the binary mask M, with N = 5 and θ = 0.9; a larger N is expected to produce a better attention map, but a very large N requires more video memory, so N is set to 5; $L_{MSE}(\cdot)$ represents the mean square error; $L_M(\{S\}, \{T\})$ represents the multi-scale loss function of the contextual autoencoder, where $S_i$ represents the i-th output extracted from the contextual autoencoder, $T_i$ represents the watermark-free picture scaled down to the same size as $S_i$, and $\lambda_i$ represents the weights for pictures of different sizes; the values of $\lambda_i$ are set to 0.6, 0.8 and 1 respectively, so that the output pictures of the last, third-from-last and fifth-from-last layers of the contextual autoencoder are 1/4, 1/2 and 1 times the original size respectively; $L_P(O, T)$ represents the perceptual loss function of the contextual autoencoder: several features are extracted from pictures O and T with a pre-trained VGG feature network, their mean square errors are computed and then summed.
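A sketch of the generator loss under one plausible reading of the definitions above; the θ-decayed summation over time steps and the VGG16 feature tap point are assumptions, while the $10^{-2}$ factor, N = 5, θ = 0.9 and $\lambda_i$ = 0.6/0.8/1.0 follow the text:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

vgg_features = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)  # frozen, pre-trained feature network

def generator_loss(d_out, attention_maps, mask, multi_scale_outs, target,
                   theta=0.9, lambdas=(0.6, 0.8, 1.0)):
    n = len(attention_maps)  # N = 5 time steps
    # L_GAN(O) = log(1 - D(O)): minimized when the discriminator is fooled
    l_gan = torch.log(1.0 - d_out + 1e-8).mean()
    # L_ATT: MSE between each A_t and the binary mask M, decayed by theta
    l_att = sum(theta ** (n - 1 - t) * F.mse_loss(a, mask)
                for t, a in enumerate(attention_maps))
    # L_M: multi-scale MSE against the clean picture resized to each S_i
    l_m = sum(lam * F.mse_loss(s, F.interpolate(target, size=s.shape[-2:]))
              for lam, s in zip(lambdas, multi_scale_outs))
    # L_P = L_MSE(VGG(O), VGG(T)): perceptual loss on VGG features
    l_p = F.mse_loss(vgg_features(multi_scale_outs[-1]), vgg_features(target))
    return 1e-2 * l_gan + l_att + l_m + l_p  # L_G
```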
In step S20, the loss function of the discriminative network of the discriminator is:
$L_D(T, O) = -\log(D(T)) - \log(1 - D(O)) + \gamma \cdot L_{map}(O, T, A_N)$;
$L_{map}(O, T, A_N) = L_{MSE}(D_{map}(O), A_N) + L_{MSE}(D_{map}(T), 0)$;
where $L_D(T, O)$ represents the loss value of the discriminative network; γ represents the weight of the $L_{map}(O, T, A_N)$ loss; $L_{map}(O, T, A_N)$ represents the difference between the attention mask generated by an interior layer of the discriminator and the attention map (Attention Map); $D_{map}(\cdot)$ represents the process by which the discriminator generates the attention mask; 0 represents an attention map containing only zero values.
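A matching sketch of the discriminator loss; the value of γ is not fixed by the text, so the 0.05 default here is purely illustrative, and `d_map_real`/`d_map_fake` stand for the interior-layer masks $D_{map}(T)$ and $D_{map}(O)$:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake, d_map_fake, d_map_real, a_n, gamma=0.05):
    # -log D(T) - log(1 - D(O)): classify real as real, fake as fake
    l_real = -torch.log(d_real + 1e-8).mean()
    l_fake = -torch.log(1.0 - d_fake + 1e-8).mean()
    # L_map: the mask from the fake input should match A_N; from the real
    # input it should be all zeros (no watermark region to attend to)
    l_map = (F.mse_loss(d_map_fake, a_n)
             + F.mse_loss(d_map_real, torch.zeros_like(d_map_real)))
    return l_real + l_fake + gamma * l_map  # L_D(T, O)
```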
Step S30 specifically includes:
step S31, inputting a plurality of watermark sample pictures into the generator to generate de-watermarked sample pictures;
step S32, inputting the watermark sample pictures and the de-watermarked sample pictures into the discriminator;
step S33, the discriminator judging whether all the de-watermarked sample pictures are real and whether they match the corresponding watermark sample pictures; if so, the adversarial training is finished and the method proceeds to step S40; if not, the method returns to step S31 to continue the adversarial training (see the training-loop sketch after these steps).
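In practice, steps S31 to S33 are realized as alternating gradient updates of the discriminator and the generator; a minimal training-loop sketch, assuming the model and loss sketches above plus a hypothetical paired `dataloader` yielding (watermarked picture, clean picture, binary mask M) triples:

```python
import torch

# generator / discriminator: placeholder models per the sketches above;
# the discriminator is assumed to return (score, interior attention mask)
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for watermarked, clean, mask in dataloader:
    # S31: generate de-watermarked samples (attention maps, multi-scale
    # outputs, and the final picture, per the generator sketches above)
    att_maps, scales, fake = generator(watermarked)
    # S32: feed (watermark picture, candidate) pairs to the discriminator
    d_real, map_real = discriminator(watermarked, clean)
    d_fake, map_fake = discriminator(watermarked, fake.detach())
    d_opt.zero_grad()
    discriminator_loss(d_real, d_fake, map_fake, map_real,
                       att_maps[-1].detach()).backward()
    d_opt.step()
    # S33, in effect: push G until its outputs pass as real and matched
    d_fake2, _ = discriminator(watermarked, fake)
    g_opt.zero_grad()
    generator_loss(d_fake2, att_maps, mask, scales, clean).backward()
    g_opt.step()
```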
In step S30, the objective function of the conditional generative adversarial network is:
$L_{cGAN}(G, D) = \mathbb{E}_{s,x}[\log D(s, x)] + \mathbb{E}_{s}[\log(1 - D(s, G(s)))]$;
where $L_{cGAN}(G, D)$ represents the objective function of the conditional generative adversarial network; s represents a watermark sample picture; x represents the real picture corresponding to the watermark sample picture; D(s, x) represents inputting the watermark sample picture and the real picture into the discriminator; D(s, G(s)) represents inputting the watermark sample picture and the de-watermarked sample picture generated by the generator into the discriminator; the first expectation is taken over the joint distribution of watermark sample pictures and their corresponding real pictures; the second expectation is taken over the distribution of the de-watermarked sample pictures.
The generator of the C-GAN algorithm ordinarily generates an image from random noise, but here the random noise would be drowned out by the watermark sample picture, so the random-noise input is omitted.
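The pairing in D(s, x) and D(s, G(s)) is commonly realized by concatenating the condition s with the candidate picture along the channel axis before the discriminator's first convolution; a minimal illustration (this channel-concatenation convention is an assumption, not stated in the text):

```python
import torch

def conditional_d_input(s, candidate):
    return torch.cat([s, candidate], dim=1)  # 6-channel discriminator input

s = torch.randn(1, 3, 256, 256)        # watermark sample picture
x = torch.randn(1, 3, 256, 256)        # corresponding real picture x
d_in_real = conditional_d_input(s, x)  # pairing for D(s, x)
```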
In summary, the invention has the advantages that:
1. Watermark sample pictures are input into a conditional generative adversarial network composed of a generator and a discriminator for adversarial training, and watermark pictures are then input into the adversarially trained generator to generate de-watermarked pictures; that is, the adversarially trained generator removes watermarks automatically and in batches, which improves de-watermarking efficiency and suits the processing of large batches of images with complex backgrounds and complex watermarks. The generator is built on the attentive recurrent network and the contextual autoencoder, and the discriminator on the attentive recurrent network and PatchGAN: the generator produces an attention map through the attentive recurrent network to locate the regions from which the watermark is to be removed, the contextual autoencoder removes the watermark in the located regions, and the discriminator concentrates its attention on those regions based on the attention map. Unlike the prior art, the watermark region does not need to be marked in advance; all watermarked regions are noticed automatically, which suits pictures with complex watermarks, finally realizes automatic watermark removal, and greatly improves the removal effect.
2. A conditional generative adversarial network (C-GAN) replaces the traditional generative adversarial network (GAN): the watermark sample picture and the de-watermarked sample picture produced by the generator are both input into the discriminator, which must judge not only whether the de-watermarked sample picture is real but also whether it matches the watermark sample picture, greatly improving the de-watermarking effect.
3. PatchGAN replaces the traditional GAN in building the discriminator, taking into account the influence of different parts of the image on the discriminator, so the trained model pays more attention to image detail; this yields a representation of the overall difference that is more accurate than a single scalar output and fuses the local and global features of the image.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.