Method and system for removing adversarial noise from adversarial samples of a deep neural network
Technical Field
The invention belongs to the field of image processing, and particularly relates to a method and system for removing adversarial noise from adversarial samples of a deep neural network.
Background
Adversarial samples (adversarial examples) are a typical defect widespread in a variety of deep neural networks. In recent years, deep neural networks have achieved remarkable results in many machine learning areas, such as automated driving, object detection, object classification, medical image-assisted diagnosis, and so forth. An important reason for this progress is the strong fitting capability of neural networks, which can easily fit almost any nonlinear function. Beginning in 2014, however, researchers found that specific artificially crafted image samples, which the human eye can hardly distinguish from the original samples, can cause a deep neural network to make a very different or even completely opposite prediction. For example, in the field of automated driving, a "STOP" sign can be modified at the pixel level so that the modified sign cannot be distinguished from the original "STOP" sign by human eyes, yet the recognition and classification component of an automated driving model judges it to be "Turn Left" or even "Speed Up", which is likely to cause a serious traffic accident. Such artificial samples that cause a deep neural network to make false predictions are referred to as adversarial samples, and the process of generating such samples is referred to as an adversarial attack. Because adversarial samples are difficult for human eyes to distinguish, removing them by hand alone is impractical and carries high labor cost, so improving the robustness of neural network models against adversarial samples has broad development prospects and important practical significance; in academic research, the technology for improving this robustness is called adversarial defense.
At present, there are many adversarial attack methods against neural networks, falling into three types: black-box, gray-box, and white-box attacks. In a white-box attack, the attacker can obtain all of the model's parameters, gradients, inputs, outputs, and other information. In a gray-box attack, the attacker can obtain only some of the model's parameters, while in a black-box attack, the attacker can only query the model a limited number of times, obtaining a limited set of inputs and corresponding outputs, with no knowledge of the model's parameters at all. Many of these attack methods are based on the model's gradient, so the gradient of the model is also important information that can be exploited in adversarial defense.
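For illustration, the following is a minimal Python (PyTorch) sketch of the well-known fast gradient sign method (FGSM; Goodfellow et al., 2014), a gradient-based white-box attack; the model and the perturbation budget epsilon here are illustrative assumptions, not part of the invention:

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x_clean, y_true, epsilon=0.1):
        # White-box attack: uses the gradient of the loss w.r.t. the input image.
        x = x_clean.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y_true)
        loss.backward()
        # Step each pixel by epsilon in the direction that increases the loss.
        x_adver = x + epsilon * x.grad.sign()
        return x_adver.clamp(0.0, 1.0).detach()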
Currently, adversarial defenses for neural network classification models can be divided into three main types: 1) adversarial training; 2) removal of adversarial noise based on generative models; 3) model fusion.
Adversarial-training-based defense is based on the idea of data augmentation. While the model is trained with clean image samples, adversarial attacks (often using more than one attack method) are carried out on the model to generate adversarial image samples, which are then added to the training set and used to train the model. This method can effectively enhance the robustness of the model against adversarial image samples, but its drawbacks are high computational cost, slow speed, and an inability to defend against adversarial image samples generated by attack methods not represented in the training set.
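A minimal sketch of one step of this data-augmentation idea is given below; it reuses the hypothetical fgsm_attack routine from the previous sketch, and the 1:1 mixing of clean and adversarial batches is an illustrative assumption, not a prescription of the prior art:

    def adversarial_training_step(model, optimizer, x_clean, y_true):
        # Data augmentation: generate adversarial samples for the current
        # model state, then train on clean and adversarial samples together.
        x_adver = fgsm_attack(model, x_clean, y_true)
        optimizer.zero_grad()
        loss = (F.cross_entropy(model(x_clean), y_true)
                + F.cross_entropy(model(x_adver), y_true))
        loss.backward()
        optimizer.step()
        return loss.item()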
Methods for removing adversarial noise based on generative models mainly reconstruct the adversarial image sample with a generative model, and the adversarial noise can be removed during reconstruction. Generative models used by such methods include generative adversarial networks, variational autoencoders, ordinary autoencoders, and the like; usually a classifier is attached after the generative model, and training schemes include separate training, joint training, and so on. In joint training, the classifier can help the generative-model part learn the classification decision boundary, and the samples reconstructed by the generative model can be used for classification. Well-known methods include MagNet by Meng et al. in 2017 and Defense-GAN by Samangouei et al. in 2018. Current generative-model-based methods basically use only the classifier to generate adversarial samples, and the attacker cannot obtain information about the defense mechanism, such as the structure and gradients of the generative model; this setting is also called a "gray-box attack".
Model-fusion-based adversarial defense trains multiple classification models simultaneously, and during testing the trained models are fused in various ways, such as voting, averaging, or weighted averaging. To achieve an ideal attack effect, an attacker must successfully attack several classification models at the same time. A well-known method is the self-orthogonal amplification super-network method proposed by Bian et al. in 2020. Because multiple models must be trained simultaneously, the training time is still long and the computational complexity is high.
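As an illustration of the averaging method of fusion, a short sketch follows; the list of independently trained models is an assumption for illustration only:

    def ensemble_predict(models, x):
        # Averaging method: fuse the softmax outputs of all trained
        # classifiers, then take the class with the highest mean probability.
        probs = torch.stack([F.softmax(m(x), dim=1) for m in models])
        return probs.mean(dim=0).argmax(dim=1)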
Disclosure of Invention
The invention provides a method for removing adversarial noise from adversarial samples of a deep neural network, aiming at the problem that the classification accuracy of a deep neural network classification model drops remarkably when it faces artificially generated adversarial samples.
According to one aspect of the invention, a method for removing adversarial noise from adversarial samples of a deep neural network is provided, which comprises the following steps:
performing end-to-end training of a conditional variational autoencoder and a classifier with clean image samples, to obtain the trained conditional variational autoencoder and classifier;
and inputting the adversarial image to be denoised into the trained conditional variational autoencoder and classifier, to obtain a denoised sample and a denoised class of the adversarial image.
Preferably, the performing end-to-end training of the conditional variational autoencoder and the classifier with clean image samples, to obtain the trained conditional variational autoencoder and classifier, includes:
S101, directly connecting the conditional variational autoencoder with the classifier, and inputting a clean image paired with its corresponding class label;
the encoder in the conditional variational autoencoder encodes the input clean image and the corresponding class label into a feature space, obtaining the mean and standard deviation of the distribution obeyed by the latent variable in the feature space:
μ, σ = Encoder(x_clean, y_true)
where μ denotes the mean of the encoded latent variable z, σ is the standard deviation of the encoded latent variable z, x_clean is the input clean image, and y_true is the class label corresponding to the clean image;
S102, sampling in the feature space to obtain a latent variable z:
z = μ + ε·σ
ε ~ N(0, I)
where ε denotes a random variable obeying the standard normal distribution N(0, I); this resampling (reparameterization) method yields a latent variable z obeying the normal distribution N(μ, σ²);
S103, inputting the latent variable z into the decoder in the conditional variational autoencoder to reconstruct the clean input image:
x_recon = Decoder(z, y_true)
where x_recon denotes the clean input image reconstructed by the decoder; the decoding process is also supervised, requiring the class label y_true as an additional input;
S104, inputting the reconstructed clean input image into the classifier to obtain the class y_pred predicted from the reconstructed sample:
y_pred = Classifier(x_recon)
S105, forming a combined loss function from the losses of the conditional variational autoencoder and the classifier, and performing end-to-end training with the Adam optimization algorithm:
L1 = −[x_clean·log(x_recon) + (1 − x_clean)·log(1 − x_recon)]
L2 = μ² + σ² − log(σ²) − 1
L3 = −[y_true·log(y_pred) + (1 − y_true)·log(1 − y_pred)]
L = L1 + L2 + L3 + L_reg
The loss function of the trained conditional variational autoencoder and classifier model is divided into four parts: the conditional variational autoencoder losses L1 and L2, the classification loss L3 of the classifier, and the regularization loss L_reg that controls the size of the conditional variational autoencoder's parameters. L1 and L3 use cross-entropy loss, L2 uses the KL-divergence loss, and the regularization loss L_reg is defined as the two-norm of the encoder and decoder parameters.
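A minimal sketch of this combined loss in Python (PyTorch) follows; the names, the use of probability outputs in [0, 1] for x_recon and y_pred, and the regularization weight lambda_reg are illustrative assumptions, and L2 follows the un-halved form written above:

    import torch
    import torch.nn.functional as F

    def joint_loss(x_clean, x_recon, mu, sigma, y_onehot, y_pred,
                   enc_dec_params, lambda_reg=1e-4):
        # L1: reconstruction cross entropy between the clean input and its
        # reconstruction (both assumed to lie in [0, 1]).
        l1 = F.binary_cross_entropy(x_recon, x_clean, reduction='sum')
        # L2: KL divergence between N(mu, sigma^2) and the prior N(0, I).
        l2 = (mu.pow(2) + sigma.pow(2) - torch.log(sigma.pow(2)) - 1).sum()
        # L3: classification cross entropy on the reconstructed sample
        # (y_pred assumed to be class probabilities, y_onehot one-hot labels).
        l3 = F.binary_cross_entropy(y_pred, y_onehot, reduction='sum')
        # L_reg: two-norm of the encoder and decoder parameters.
        l_reg = sum(p.pow(2).sum() for p in enc_dec_params)
        return l1 + l2 + l3 + lambda_reg * l_reg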
Preferably, the conditional variational autoencoder includes:
a basic conditional variational autoencoder;
multiple fully connected layers additionally added at the label input of the basic conditional variational autoencoder, for raising the dimensionality of the label and the weight of the label in the encoding operation;
multiple batch normalization (BN) layers additionally added at the label input of the basic conditional variational autoencoder, for avoiding variance shift during encoding; and,
a BN layer additionally added after the mean output by the basic conditional variational autoencoder, for improving the encoding effect.
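A minimal PyTorch sketch of such a modified encoder follows; the layer widths, latent dimension, and 28×28 input size are illustrative assumptions and do not reproduce the structure recorded in Table 1:

    import torch
    import torch.nn as nn

    class ConditionalEncoder(nn.Module):
        def __init__(self, num_classes=10, img_dim=784, latent_dim=32):
            super().__init__()
            # Extra fully connected + BN stack on the label input: raises the
            # label's dimensionality and its weight in the encoding operation;
            # the BN layers counteract variance shift during encoding.
            self.label_net = nn.Sequential(
                nn.Linear(num_classes, 128), nn.BatchNorm1d(128), nn.ReLU(),
                nn.Linear(128, 256), nn.BatchNorm1d(256), nn.ReLU(),
            )
            self.body = nn.Sequential(nn.Linear(img_dim + 256, 512), nn.ReLU())
            self.fc_mu = nn.Linear(512, latent_dim)
            self.bn_mu = nn.BatchNorm1d(latent_dim)  # extra BN after the mean
            self.fc_logvar = nn.Linear(512, latent_dim)

        def forward(self, x, y_onehot):
            h = self.body(torch.cat([x.flatten(1), self.label_net(y_onehot)],
                                    dim=1))
            mu = self.bn_mu(self.fc_mu(h))
            sigma = torch.exp(0.5 * self.fc_logvar(h))  # standard deviation
            return mu, sigma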
Preferably, the inputting of the adversarial image to be denoised into the trained conditional variational autoencoder and classifier, to obtain the denoised sample and class of the adversarial image, comprises:
S201, encoding the adversarial image to be denoised into the feature space with the trained conditional variational autoencoder;
S202, resampling latent variables based on the feature space;
S203, reconstructing the original adversarial image to be denoised multiple times based on the latent variables;
S204, screening an image sample with the adversarial noise removed from the reconstructed images;
S205, detecting the adversarial-noise removal effect on the screened image sample with the trained classifier.
Preferably, in S201, the trained conditional variational autoencoder encodes the adversarial image to be denoised, including:
traversing all class labels y_i of the dataset, pairing each in turn with the input adversarial image x_adver to form image-label pairs (x_adver, y_i);
inputting each pair (x_adver, y_i) into the trained conditional variational autoencoder and encoding it to obtain the mean μ_i and standard deviation σ_i of the feature space, used for resampling and decoding:
μ_i, σ_i = Encoder(x_adver, y_i)
where i = 0, 1, ..., C − 1 and C is the number of class labels, resulting in C values of μ_i and σ_i.
Preferably, S202, resampling latent variables based on the feature space, includes:
obtaining latent variables z_i obeying the normal distribution N(μ_i, σ_i²) by the resampling method:
z_i = μ_i + ε_i·σ_i
ε_i ~ N(0, I)
where ε_i denotes a random variable obeying the standard normal distribution N(0, I).
Preferably, S203, reconstructing the original adversarial image to be denoised multiple times based on the latent variables, includes:
x_test,i = Decoder(z_i, y_i),
where x_test,i denotes the C images obtained by the multiple reconstructions.
Preferably, in S204, screening an image sample with the adversarial noise removed from the reconstructed images includes:
taking the loss function value as the evaluation standard, and selecting the reconstructed sample with the minimum loss value as the sample closest to the original input, i.e., the image sample with the adversarial noise removed:
where y_mid denotes the class label corresponding to the selected reconstructed sample with the adversarial noise removed, and x_mid denotes the selected reconstructed sample itself.
Preferably, in S205, the classifier detects the adversarial-noise removal effect, including:
inputting x_mid into the classifier to obtain the classifier's prediction y_pred:
y_pred = Classifier(x_mid).
The ratio of predictions matching the true labels is counted to judge the noise-removal effect: if the ratio reaches a set threshold, the adversarial samples no longer have an attack effect, i.e., the adversarial noise causing the attack effect has been removed.
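A minimal sketch of S201-S205 follows, assuming the hypothetical encoder, decoder, and classifier modules from the other sketches (in eval mode, with probability outputs and flattened images); taking "the loss function value" to mean the reconstruction and KL terms is one reasonable reading, stated here as an assumption:

    import torch
    import torch.nn.functional as F

    def remove_adversarial_noise(encoder, decoder, classifier, x_adver,
                                 num_classes=10):
        best_loss, x_mid, y_mid = float('inf'), None, None
        x_flat = x_adver.flatten(1)
        with torch.no_grad():
            for i in range(num_classes):      # S201: traverse all class labels
                y_i = F.one_hot(torch.tensor([i]),
                                num_classes=num_classes).float()
                mu_i, sigma_i = encoder(x_adver, y_i)
                z_i = mu_i + torch.randn_like(sigma_i) * sigma_i  # S202
                x_test_i = decoder(z_i, y_i)  # S203: reconstruct
                # S204: screen by loss value (reconstruction + KL terms).
                loss = (F.binary_cross_entropy(x_test_i, x_flat,
                                               reduction='sum')
                        + (mu_i.pow(2) + sigma_i.pow(2)
                           - torch.log(sigma_i.pow(2)) - 1).sum())
                if loss.item() < best_loss:
                    best_loss, x_mid, y_mid = loss.item(), x_test_i, i
            y_pred = classifier(x_mid).argmax(dim=1)  # S205: detect removal
        return x_mid, y_mid, y_pred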
According to a second aspect of the present invention, there is provided a chip system comprising a processor coupled to a memory, the memory storing program instructions; when the program instructions stored in the memory are executed by the processor, any of the above-mentioned methods for removing adversarial noise from adversarial samples of a deep neural network is implemented.
Compared with the prior art, the invention has the following beneficial effects:
By removing the noise in adversarial image samples, the invention can improve the robustness of the model against adversarial image samples; the image samples with the adversarial noise removed have clear features and can be correctly classified, which also improves the reliability and safety of subsequent tasks.
The method is easily applied to many application scenarios with higher safety requirements, such as automated driving, image detection, and face recognition.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of a method for removing adversarial noise from adversarial samples of a deep neural network according to an embodiment of the present invention;
FIG. 2 is a process diagram of the end-to-end training of the conditional variational autoencoder and classifier according to another preferred embodiment of the present invention;
FIG. 3 is a process diagram for testing the robustness of the overall model against adversarial images according to another preferred embodiment of the present invention;
FIG. 4 is the algorithmic pseudocode for testing the robustness of the overall model against adversarial images according to another preferred embodiment of the present invention;
FIG. 5 shows typical clean image samples used in another preferred embodiment of the present invention;
FIG. 6 shows typical adversarial image samples used in another preferred embodiment of the present invention;
FIG. 7 shows image samples with the adversarial noise removed according to another preferred embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention; all such variations fall within the scope of the present invention.
The invention provides an embodiment of a method for removing adversarial noise from adversarial samples of a deep neural network, which comprises the following steps:
performing end-to-end training of a conditional variational autoencoder and a classifier with clean image samples, to obtain the trained conditional variational autoencoder and classifier;
and inputting the adversarial image to be denoised into the trained conditional variational autoencoder and classifier, to obtain a denoised sample and a denoised class of the adversarial image.
Based on further optimization of the above, a preferred embodiment is provided. As shown in fig. 1, the flow of the method for removing adversarial noise from adversarial samples of a deep neural network includes:
S1, training the model with clean images;
S2, encoding the input adversarial image sample into the feature space;
S3, resampling latent variables;
S4, reconstructing the input based on the latent variables;
S5, screening the denoised image sample;
S6, predicting the class with the classifier;
S7, obtaining the denoised sample and denoised class.
In this embodiment, adversarial image samples are encoded into a feature space not far from that of the clean image samples, latent variables are sampled in that feature space, and image samples with the adversarial noise removed are decoded; adversarial image samples produced by a variety of attack methods can be processed, and the adversarial noise can be removed quickly and effectively.
To better train the model, the present invention provides another preferred embodiment. As shown in fig. 2, the process of end-to-end training of the conditional variational autoencoder and classifier in this embodiment is explained. Referring to fig. 2, S1, training the model with clean images includes:
S101, the conditional variational autoencoder is directly connected with the classifier, i.e., the reconstructed output of the conditional variational autoencoder serves as the input of the classifier. In this embodiment, the conditional variational autoencoder additionally adds multiple fully connected layers at the label input of the basic conditional variational autoencoder, raising the dimensionality of the label and the weight of the label in the encoding operation; multiple BN layers are added to avoid variance shift during encoding, which would otherwise harm generalization; and a BN layer is added after the mean output by the encoder, improving the encoder's encoding effect.
A clean image is input together with its corresponding label, and the clean image is encoded: during training, a clean image and its corresponding class label are input at the image input and the class input of the conditional variational autoencoder, respectively:
μ, σ = Encoder(x_clean, y_true)
where μ denotes the mean of the encoded latent variable z, σ is the standard deviation of the encoded latent variable z, x_clean is the input clean image, and y_true is the class label corresponding to the clean image; the encoder encodes the input clean image and the corresponding label pair into the feature space, obtaining the mean and standard deviation of the distribution obeyed by the latent variable in the feature space, which are used for the next sampling step;
S102, sampling the feature space to obtain the latent variable:
z = μ + ε·σ
ε ~ N(0, I)
where ε denotes a random variable obeying the standard normal distribution N(0, I); the resampling method yields a latent variable z obeying the normal distribution N(μ, σ²). In the conditional variational autoencoder, the variable obtained from the input image by dimensionality reduction, encoding, and resampling is referred to as the latent variable, denoted by the letter z in this embodiment. Its function is to characterize the input image; the input image is decoded and reconstructed from this variable.
S103, reconstructing the clean input image with the decoder:
x_recon = Decoder(z, y_true)
where x_recon denotes the clean input image reconstructed by the decoder; the decoding process is also supervised, requiring the class label y_true as an additional input. In the conditional variational autoencoder, the encoder part and the decoder part each take the image and the corresponding class label as simultaneous inputs, and the encoding and decoding operations both act on these inputs, so the class label affects both the encoding effect and the decoding effect, i.e., the process is supervised.
S104, inputting the reconstructed image into the classifier to obtain the class y_pred predicted from the reconstructed sample:
y_pred = Classifier(x_recon)
S105, forming a combined loss function from the losses of the conditional variational autoencoder and the classifier, and performing end-to-end training with the Adam optimization algorithm; end-to-end training in this embodiment means training from the input of the conditional variational autoencoder to the output of the classifier.
L1 = −[x_clean·log(x_recon) + (1 − x_clean)·log(1 − x_recon)]
L2 = μ² + σ² − log(σ²) − 1
L3 = −[y_true·log(y_pred) + (1 − y_true)·log(1 − y_pred)]
L = L1 + L2 + L3 + L_reg
The loss function of the model is divided into four parts: the conditional variational autoencoder losses L1 and L2, the classification loss L3 of the classifier, and the regularization loss L_reg that controls the size of the conditional variational autoencoder's parameters. L1 and L3 use cross-entropy loss, L2 uses the KL-divergence loss, and the regularization loss L_reg is defined as the two-norm of the encoder and decoder parameters.
After training is completed, the model (conditional variational autoencoder and classifier) can be used in the subsequent testing process on adversarial image samples.
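A minimal sketch of this end-to-end Adam training loop follows, tying together the hypothetical ConditionalEncoder and joint_loss sketches above with assumed Decoder and Classifier modules (the classifier is assumed to output probabilities); batch size, learning rate, and epoch count are illustrative:

    import torch
    import torch.nn.functional as F

    def train(encoder, decoder, classifier, loader, num_classes=10,
              epochs=20, lr=1e-3):
        params = (list(encoder.parameters()) + list(decoder.parameters())
                  + list(classifier.parameters()))
        optimizer = torch.optim.Adam(params, lr=lr)
        for _ in range(epochs):
            for x_clean, y_true in loader:                # clean images only
                y_onehot = F.one_hot(y_true, num_classes).float()
                mu, sigma = encoder(x_clean, y_onehot)    # S101: encode
                z = mu + torch.randn_like(sigma) * sigma  # S102: resample
                x_recon = decoder(z, y_onehot)            # S103: decode
                y_pred = classifier(x_recon)              # S104: classify
                loss = joint_loss(x_clean.flatten(1), x_recon, mu, sigma,
                                  y_onehot, y_pred,
                                  list(encoder.parameters())
                                  + list(decoder.parameters()))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()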
To test the robustness of the above overall model against adversarial images, the present invention provides a preferred embodiment. As shown in fig. 3, the process of testing the robustness of the entire model against adversarial images in this embodiment is explained. FIG. 4 is the algorithmic pseudocode corresponding to fig. 3. In this embodiment, the conditional variational autoencoder and classifier obtained in the previous embodiment are combined into a supervised combination model; the adversarial image sample is encoded into a feature space not far from that of the clean image samples, latent variables are sampled in the feature space, and the image sample with the adversarial noise removed is decoded.
Specifically, in this embodiment the image samples come from the public MNIST dataset, which contains 10 classes of pictures, so the number of classes C in the present invention is 10. Fig. 5a and 5b show typical clean images used in this embodiment. Fig. 6a and 6b show typical adversarial image samples used in this embodiment.
Specifically, S2, the conditional variational autoencoder encodes the original image, including:
because the input adversarial images to be denoised (fig. 6a and 6b) lack labels during testing, all class labels are traversed, each in turn is paired with the input adversarial image to form an image-label pair, and the pair is input into the conditional variational autoencoder:
μ_i, σ_i = Encoder(x_adver, y_i)
where i = 0, 1, ..., 9 and C = 10 is the number of class labels, yielding 10 values of μ_i and σ_i in total;
Specifically, S3, latent variables are sampled based on the feature space:
z_i = μ_i + ε_i·σ_i
ε_i ~ N(0, I)
where ε_i denotes a random variable obeying the standard normal distribution N(0, I); latent variables z_i obeying the normal distribution N(μ_i, σ_i²) are obtained by the resampling method;
Specifically, S4, the original input image is reconstructed multiple times based on the latent variables:
x_test,i = Decoder(z_i, y_i)
where x_test,i denotes the 10 images obtained by the multiple reconstructions;
Specifically, in step S5, the image sample with the adversarial noise removed is screened from the reconstructed images:
where y_mid denotes the class label corresponding to the selected reconstructed sample with the adversarial noise removed, and x_mid denotes the selected reconstructed sample itself.
To screen out the sample closest to the original input, the loss function value is used as the evaluation standard, and the reconstructed sample with the minimum loss value is selected as the sample closest to the original input. Since the original deep neural network was trained on clean samples, the selected reconstructed sample can be regarded as the clean image sample closest to the original input, i.e., the image sample with the adversarial noise removed.
Specifically, S6, the classifier detects the adversarial-noise removal effect:
y_pred = Classifier(x_mid)
The reconstructed sample is input into the classifier part of the model; if it can be correctly classified by the classifier part, the reconstructed sample no longer has an adversarial effect. This process is called the detection of whether the adversarial noise has been removed.
In this embodiment, the quality of the noise-removal effect is judged by counting the ratio of predictions that match the true labels. A prediction identical to the true label means the model classified correctly. A large matching ratio indicates that the adversarial samples no longer have an attack effect, i.e., the adversarial noise causing the attack effect has been removed. On the MNIST dataset of this embodiment, the classification accuracy of the model on adversarial samples reaches 90% or more, indicating that the model has a good adversarial-noise removal effect.
In the art, multiple datasets are used to measure how effectively a model removes adversarial noise, such as MNIST, CIFAR10, CIFAR100, and so on; different datasets have different threshold ranges for this matching ratio.
Fig. 7a and 7b show the image samples after the adversarial noise is removed in this embodiment. It can be seen that they are comparable to the clean image samples (fig. 5a and 5b).
Specifically, S7 obtains the true semantic label of the original image; y_pred denotes the prediction for the input image with the adversarial noise removed. Because the adversarial noise has been removed by the previous steps, and the classifier part was also trained on reconstructions of clean input images, the classifier can be relied upon to output the correct label for the reconstructed image, i.e., robustness against the attack noise in the input sample is maintained.
The correct labels are included in the public dataset; during testing, the number of predicted labels y_pred that match the correct labels is counted, and this match count is used as the index for evaluating the classification performance of the classification model.
The structure of the conditional variational autoencoder in this example is recorded in Table 1. The classifier structure may be chosen according to the specific application; the common classification neural network ResNet-18 is used as the classification model in this example.
Table 1. Conditional variational autoencoder structure
In this embodiment, because training uses only clean images, the trained model projects input images only into the feature spaces corresponding to the respective clean image classes, and ideally the reconstructed images decoded from those feature spaces should also conform to the distribution obeyed by clean images. Therefore, the model can maintain high robustness under attack by a variety of attack methods.
By removing the noise in adversarial image samples, the method can improve the robustness of the model against adversarial image samples; the image samples with the adversarial noise removed have clear features and can be correctly classified, improving the reliability and safety of subsequent tasks, and the method is easily applied to many application scenarios with higher safety requirements, such as automated driving, image detection, and face recognition.
Based on the same concept as the foregoing embodiments, an embodiment is provided in which a chip system includes a processor coupled to a memory storing program instructions; when the program instructions stored in the memory are executed by the processor, the method for removing adversarial noise from adversarial samples of a deep neural network in any of the foregoing embodiments is implemented.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The above-described preferred features may be used in any combination without conflict with each other.