Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a generation-based adversarial attack method.
The purpose of the invention is achieved by the following technical solution: a generation-based adversarial attack method, comprising the following steps:
calculating similarity encoding data of a tracking template and a seed point set of a search area, and fusing the features extracted from the tracking template and the similarity encoding data to obtain enhanced features;
inputting the enhanced features into a binomial distribution coding layer, wherein the binomial distribution coding layer learns a Bernoulli distribution for each point to describe the filtering state of the point; and obtaining an adversarial template by distillation using the filtering state.
Further, calculating the similarity encoding data of the tracking template and the seed point set of the search area includes:
extracting a first seed point set of the tracking template and a second seed point set of the search area, respectively, by downsampling;
and taking the computed cosine distance between the first seed point set and the second seed point set as latent similarity encoding data.
Further, the cosine distance is passed through a convolution and a symmetric function before being used as the latent similarity encoding data.
Further, fusing the features extracted from the tracking template with the similarity encoding data to obtain enhanced features includes:
upsampling the first seed point set to obtain latent features of the tracking template;
and concatenating the latent features with the similarity encoding data repeated M times to obtain the enhanced features.
Further, the method further comprises:
using the latent similarity encoding data as a feature loss L_feat to distinguish the tracking template from the search space in the latent feature space:
where Sim' represents the latent similarity encoding data of the tracking template and the search space, d_2 represents the dimension of the similarity encoding, and i represents the index of the point.
Further, the binomial distribution coding layer learning a Bernoulli distribution for each point to describe the filtering state of the point comprises:
performing point filtering using a stretched binary concrete distribution, wherein the distribution is stretched to the interval (γ, ζ), with γ < 0 and ζ > 1, and attaching the filtering state to each point of the tracking template.
Further, inputting the enhanced features to the point filtering implemented with the stretched binary concrete distribution and attaching the filtering state to each point of the tracking template includes:
given a random variable s that lies in the interval (0, 1) and follows a binary concrete distribution, let q_s(s | φ) denote the probability density of the distribution and Q_s(s | φ) its cumulative probability; the binary concrete distribution is parameterized by φ = (log α, β), where log α represents the location and β represents the temperature; the binary concrete distribution is reparameterized using a random variable u ~ U(0, 1) that follows a uniform distribution, and is expressed as:
s = Sigmoid((log u - log(1 - u) + log α) / β)
stretching the binary concrete distribution to the interval (γ, ζ), where γ < 0 and ζ > 1, and then clipping it with a hard-sigmoid to obtain the hard concrete distribution:
where z represents the filtering state, z_i ∈ {0, 1};
Obtaining the adversarial template by distillation using the filtering state comprises the following step:
the points whose filtering state z_i is 0 are filtered out by a filter, generating the adversarial template.
Further, the method further comprises:
using L0 regularization as a filtering loss function, where the L0 regularization is defined as the cumulative probability that the hard concrete distribution is greater than zero:
Further, the method further comprises:
generating, using the adversarial template, a plurality of proposals and their corresponding probability scores as candidate regions of the target position; the proposal with the highest probability score is selected as the final prediction.
Further, the method further comprises:
using a localization loss function L_loc to simultaneously decrease the scores of all proposals aggregated into a group, defined as:
where R represents the proposals sorted by score, and p, q, and r each represent a range of subscripts of the proposals aggregated into a group.
Further, the method further comprises:
using the L2 distance as a perceptual loss function L_perc to constrain the change in the data, defined as:
where P̂_tmp represents the adversarial template, P_tmp represents the tracking template, x̂_i represents a point of the adversarial template, and x_i represents a point of the tracking template.
The invention has the beneficial effects that:
(1) In an exemplary embodiment of the present invention, feature extraction is performed by similarity encoding and by fusing the similarity encoding with the tracking template. First, similarity encoding allows the template and the search space to be distinguished in the feature space, and further in the abstract space. Second, fusing the similarity encoding with the tracking template embeds the similarity encoding into the tracking template as latent features, enhancing the features of the tracking template. Third, point filtering is implemented by fitting a Bernoulli distribution in a differentiable form (i.e., a Bernoulli distribution over the drop probability of each point is learned to describe the filtering state of the point); the advantage is that the neural network can be trained by gradient descent, and a Bernoulli distribution is a natural model for point filtering. By adding the similarity encoding as an enhanced feature to the learning of the neural network, the latent feature space can be better mined.
(2) In another exemplary embodiment of the present invention, the similarity is repeated M times and fused, with the effect that the features of the template are enhanced and the differences between the template and the search area in the feature space are mined.
(3) In yet another exemplary embodiment of the present invention, for the similarity data, due to the permutation invariance of the point cloud, a symmetric function is further used, ensuring the output of the same similarity under different point sequences.
(4) In yet another exemplary embodiment of the present invention, various losses are used to improve the different effects.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if," as used herein, may be interpreted as "when" or "in response to a determination," depending on the context.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The background of the present exemplary embodiment is explained first. Consider a video sequence of N frames stored in the form of point clouds. 3D object tracking aims to locate the tracking template P_tmp in successive frames. Note that the tracking template P_tmp is obtained from the first frame s_1, i.e., P_tmp ∈ s_1.
In the prior art, a 3D tracker generates a search region P_area in the current frame by expanding the range of the prediction result of the previous frame, and generates, within the search region P_area, a number of proposals and their corresponding probability scores as candidates for the target location.
In the present exemplary embodiment, an adversarial attack on 3D target tracking needs to perturb the 3D tracker so that it predicts a result that deviates from the real target position.
Thus, the tracking template P_tmp represents the tracking target specified in the first frame, and the search region P_area represents the expanded area of the prediction result of the previous frame.
Referring to fig. 1, fig. 1 shows a generation-based adversarial attack method disclosed in an exemplary embodiment of the present application, including the following steps:
S11: feature extraction: calculating similarity encoding data of a tracking template and a seed point set of a search area, and fusing the features extracted from the tracking template with the similarity encoding data to obtain enhanced features;
S13: adversarial distillation: inputting the enhanced features into a binomial distribution coding layer, wherein the binomial distribution coding layer learns a Bernoulli distribution for each point to describe the filtering state of the point; and obtaining an adversarial template by distillation using the filtering state.
Specifically, in this exemplary embodiment, the flow of the proposed generation-based adversarial attack method, which targets a 3D tracker, is shown in fig. 1:
In step S11, given a tracking template P_tmp containing M_1 points and a search region P_area containing M_2 points, the tracking template P_tmp and the search space P_area are encoded into a description of the template features, i.e., the similarity encoding data Sim', with dimension M_1 × (d_1 + d_2), where d_1 represents the number of features of each point after upsampling and d_2 represents the number of dimensions of the similarity encoding; the features extracted from the tracking template P_tmp are then fused with the similarity encoding data Sim' to obtain the enhanced features.
Then, in step S13, a binomial distribution coding layer is used to learn a Bernoulli distribution over the drop probability of each point, which describes the filtering state of the point, thereby distilling an adversarial template.
Feature extraction is performed both by similarity encoding and by fusing the similarity encoding with the tracking template, with the following respective advantages: (1) similarity encoding mines the difference between the template and the search area in the feature space and encodes it; (2) fusing the similarity encoding with the tracking template treats the encoded similarity as a feature relating the search area and the template in the latent space, and uses it to enhance the features of the template. Without this approach, the attack reduces the accuracy of the original model by 6.5%; with it, the attack reduces the model accuracy by 22.9%.
Point filtering is implemented by fitting a Bernoulli distribution in a differentiable form (i.e., a Bernoulli distribution over the drop probability of each point is learned to describe the filtering state of the point). The advantage is that the discrete filtering state can be described in a way that is compatible with gradient descent, so the filtering operation can be learned.
Preferably, in an exemplary embodiment, calculating the similarity encoding data of the tracking template and the seed point set of the search area includes:
S31: extracting a first seed point set of the tracking template and a second seed point set of the search area, respectively, by downsampling;
S33: taking the computed cosine distance between the first seed point set and the second seed point set as latent similarity encoding data.
Specifically, in step S31 of the exemplary embodiment, the downsampling may be implemented by PointNet++ (i.e., the backbone network in fig. 1): the template P_tmp and the search region P_area are each downsampled by a farthest point sampling algorithm to obtain S_1 and S_2 seed points, respectively, each with multi-scale features f ∈ R^n, forming a first seed point set S_tmp (S_1 × n) and a second seed point set S_area (S_2 × n).
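The farthest point sampling step mentioned above can be illustrated with a minimal sketch. It covers only the index selection, not the PointNet++ feature extraction, and the function name `farthest_point_sampling`, the starting index, and the toy points are assumptions made for illustration rather than details from the source.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Pick k indices from `points` (a list of 3D tuples) by farthest point sampling."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # The next seed is the point farthest from all seeds chosen so far.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

# Toy cloud (hypothetical data): two points near the origin and two far away.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
seeds = farthest_point_sampling(points, k=3)
```

The near-duplicate point at (0.1, 0, 0) is selected last, which is the property that makes this sampler spread seed points evenly over the cloud.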
In step S33, for better feature extraction, the cosine distance between the seed point sets S_tmp and S_area is computed as a branch of the latent similarity data Sim'. The encoded latent similarity can effectively fuse the template P_tmp and the search space P_area.
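As a rough illustration of step S33, the sketch below compares every template seed feature with every search-area seed feature. The source calls the measure a "cosine distance"; this sketch assumes the usual cosine similarity (dot product over the product of norms), and the helper names and toy feature vectors are hypothetical.

```python
import math

def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)  # eps guards against zero vectors

def similarity_matrix(seeds_tmp, seeds_area):
    """Return an S1 x S2 matrix; entry [i][j] compares template seed i with search seed j."""
    return [[cosine_similarity(f, g) for g in seeds_area] for f in seeds_tmp]

# Toy seed features with n = 2 (hypothetical values).
S_tmp = [[1.0, 0.0], [0.0, 1.0]]                 # S1 = 2 template seeds
S_area = [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]   # S2 = 3 search-area seeds
sim = similarity_matrix(S_tmp, S_area)
```

Each row of `sim` describes how one template seed relates to the whole search area, which is the raw material for the encoding Sim'.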
Preferably, in an exemplary embodiment, the cosine distance is passed through a convolution and a symmetric function before being used as the latent similarity encoding data.
It should be noted that, because point clouds are permutation invariant, a symmetric function is further used to ensure that the same similarity output is obtained under different point orderings.
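The effect of the symmetric function can be checked with a small sketch. The source does not name the symmetric function or the convolution shape; max pooling over the search-area seeds and a single-channel 1x1 convolution (one shared affine map) are assumptions here, as are the weight and bias values.

```python
def pointwise_conv(row, weight, bias):
    """A 1x1 convolution on a single-channel row: the same affine map applied to every entry."""
    return [weight * v + bias for v in row]

def symmetric_encode(sim, weight=2.0, bias=0.5):
    """Encode each template seed's similarity row, then max-pool over the search seeds."""
    return [max(pointwise_conv(row, weight, bias)) for row in sim]

# Similarity rows for two template seeds against three search seeds (hypothetical values).
sim = [[1.0, 0.7, -1.0], [0.0, 0.7, 0.0]]
# The same data with the search seeds listed in a different order.
shuffled = [[row[j] for j in (2, 0, 1)] for row in sim]

encoded = symmetric_encode(sim)
encoded_shuffled = symmetric_encode(shuffled)
```

Because the maximum does not depend on the order of its arguments, `encoded` and `encoded_shuffled` are identical, which is the permutation invariance the text asks for.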
Preferably, in an exemplary embodiment, fusing the features extracted from the tracking template with the similarity encoding data to obtain enhanced features includes:
S51: upsampling the first seed point set to obtain latent features of the tracking template;
S53: concatenating the latent features with the similarity encoding data repeated M times to obtain the enhanced features.
Specifically, in step S51, the first seed point set S_tmp is upsampled to generate latent features for each point in the original tracking template.
In step S53, the latent features of P_tmp are concatenated with the similarity repeated M times to obtain the enhanced features.
It should be noted that repeating the similarity M times is implemented by copying, so that the similarity is treated as a per-point feature and concatenated onto the features of each point.
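A minimal sketch of this copy-and-concatenate fusion follows. It assumes the similarity encoding is a single vector of d_2 values that is copied to every one of the M_1 template points, which matches the stated M_1 × (d_1 + d_2) shape; the function name and numbers are illustrative only.

```python
def fuse_features(latent, sim_code):
    """Append the same similarity code to every point's latent feature vector.

    latent:   M1 rows of d1 features each
    sim_code: one vector of d2 similarity values, copied to each of the M1 points
    """
    return [list(feat) + list(sim_code) for feat in latent]

latent = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # M1 = 2 points, d1 = 3 features
sim_code = [0.9, -0.2]                        # d2 = 2 similarity dimensions
enhanced = fuse_features(latent, sim_code)    # M1 rows of d1 + d2 = 5 values
```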
Preferably, in an exemplary embodiment, the method further comprises:
using the latent similarity encoding data as a feature loss L_feat to distinguish the tracking template from the search space in the latent feature space:
where Sim' represents the latent similarity encoding data of the tracking template and the search space, d_2 denotes the dimension of the similarity encoding, and i denotes the index of the dimension.
That is, the mean value of the similarity encoding is taken as the loss L_feat; by reducing this loss, the similarity between the search space and the target in the feature space is reduced, so that the two can be distinguished in the feature space.
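Read literally, the description above (a mean of the similarity encoding over its d_2 dimensions, indexed by i) suggests a loss of roughly the following form. This is a reconstruction from the surrounding text under that reading, not the formula as filed; any additional averaging over points is not recoverable from the text.

```latex
L_{feat} = \frac{1}{d_2} \sum_{i=1}^{d_2} \mathrm{Sim}'_i
```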
Preferably, in an exemplary embodiment, the binomial distribution coding layer learning a Bernoulli distribution for each point to describe the filtering state of the point includes:
performing point filtering using a stretched binary concrete distribution, wherein the distribution is stretched to the interval (γ, ζ), with γ < 0 and ζ > 1, and attaching the filtering state to each point of the tracking template.
Specifically, in the exemplary embodiment, the point filtering module learns the probability that each point is filtered out. The binomial distribution coding layer implements point filtering with a point-scale filter, which learns a Bernoulli distribution for each point to describe the filtering state of the point. In particular, a point-scale filtering state z_i ∈ {0, 1} is attached to each point of the tracking template. The points whose filtering state z_i is 0 are filtered out by the filter, thereby generating the adversarial template.
However, the Bernoulli distribution is discrete and therefore not differentiable. A binary concrete distribution is therefore used instead, which is a smooth approximation of the Bernoulli distribution and is continuously differentiable. Meanwhile, to ensure that the point filtering module can effectively filter points, its value must be able to reach exactly 0 or 1, so the binary concrete distribution is stretched to the interval (γ, ζ), where γ < 0 and ζ > 1.
Preferably, in an exemplary embodiment, inputting the enhanced features to the point filtering implemented with the stretched binary concrete distribution and attaching the filtering state to each point of the tracking template includes:
S71: the binary concrete distribution is a smooth approximation of the Bernoulli distribution and is continuously differentiable; given a random variable s that lies in the interval (0, 1) and follows a binary concrete distribution, let q_s(s | φ) denote the probability density of the distribution and Q_s(s | φ) its cumulative probability; the binary concrete distribution is parameterized by φ = (log α, β), where log α represents the location and β represents the temperature; the binary concrete distribution is reparameterized using a random variable u ~ U(0, 1) that follows a uniform distribution, and is expressed as:
s = Sigmoid((log u - log(1 - u) + log α) / β)
S73: to ensure that the point filtering module can effectively filter points, its value must be able to reach exactly 0 or 1; the binary concrete distribution is therefore stretched to the interval (γ, ζ), where γ < 0 and ζ > 1, and then clipped with a hard-sigmoid to obtain the hard concrete distribution:
where z represents the filtering state, z_i ∈ {0, 1};
Obtaining the adversarial template by distillation using the filtering state comprises the following step:
the points whose filtering state z_i is 0 are filtered out by a filter to generate the adversarial template.
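Steps S71 and S73 can be sketched as follows. The reparameterization is the formula given in S71; the stretch `s * (ζ - γ) + γ` and the clip `min(1, max(0, ·))` are the standard hard concrete construction and are assumed here because the source formula is not reproduced. The defaults γ = -0.1, ζ = 1.1, β = 2/3 and all function names are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hard_concrete(log_alpha, beta, gamma=-0.1, zeta=1.1, rng=random):
    """Draw one filtering state z in [0, 1] for a point with location log_alpha."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)   # u ~ U(0, 1), kept away from 0 and 1
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / beta)  # binary concrete sample
    s_bar = s * (zeta - gamma) + gamma             # stretch from (0, 1) to (gamma, zeta)
    return min(1.0, max(0.0, s_bar))               # hard-sigmoid clip back to [0, 1]

def distill(points, log_alphas, beta=2.0 / 3.0, rng=random):
    """Keep only the points whose sampled filtering state is non-zero."""
    states = [sample_hard_concrete(a, beta, rng=rng) for a in log_alphas]
    kept = [p for p, z in zip(points, states) if z > 0.0]
    return kept, states

# Hypothetical usage: a strongly negative log_alpha makes a point very likely to be dropped.
rng = random.Random(0)
template = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
adversarial, states = distill(template, [10.0, -10.0, 10.0], rng=rng)
```

Stretching beyond (0, 1) before clipping is what places probability mass at exactly 0 and exactly 1, so a point can be dropped outright while the sample stays differentiable in log α.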
Preferably, in an exemplary embodiment, since L0 regularization does not cause the filtering state values to collapse, it is used to penalize the binomial distribution coding layer. The binomial distribution coding layer is constrained by the L0 regularization to minimize the number of points filtered out. Thus, the method further comprises:
using L0 regularization as a filtering loss function, where the L0 regularization is defined as the cumulative probability that the hard concrete distribution is greater than zero:
The cumulative probability of being greater than zero is the probability of being non-zero, that is, the probability that the filtering state is 1; the number of points filtered out is controlled through this probability.
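For the binary concrete sample s = Sigmoid((log u - log(1 - u) + log α) / β) stretched to (γ, ζ), the probability of a non-zero state has the closed form Sigmoid(log α - β log(-γ / ζ)). The sketch below evaluates that expression and sums it over points. Whether the source sums or averages over points, or penalizes the complement of this probability, cannot be recovered from the text; the sum, the defaults, and the function names are assumptions.

```python
import math

def prob_nonzero(log_alpha, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
    """Closed-form probability that the stretched variable is greater than zero."""
    return 1.0 / (1.0 + math.exp(-(log_alpha - beta * math.log(-gamma / zeta))))

def l0_penalty(log_alphas, **kwargs):
    """Expected number of points whose filtering state is non-zero."""
    return sum(prob_nonzero(a, **kwargs) for a in log_alphas)

# Hypothetical locations: one point almost surely kept, one almost surely dropped.
penalty = l0_penalty([10.0, -10.0])
```

Because the expression is differentiable in log α, it can be used directly as a term in a gradient-based loss.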
More preferably, in an exemplary embodiment, as shown in fig. 1, the method further comprises:
generating, using the adversarial template, a plurality of proposals and their corresponding probability scores as candidate regions of the target position; the proposal with the highest probability score is selected as the final prediction.
In addition, the search region P_area and the adversarial template are input into the victim deep model (Victim Deep Model) to obtain a plurality of proposals, and the victim deep model selects the proposal with the highest score as the prediction result of the current frame, thereby performing target tracking.
Preferably, in an exemplary embodiment, the tracker predicts a number of proposals and their corresponding probability scores as candidate regions of the target position, and the proposal with the highest probability score is selected as the final prediction result. However, other high-scoring proposals also lie close to the real position, so their predictions are also fairly accurate. Therefore, a localization loss function L_loc is used, which accumulates the scores of the proposals within specified ranges into groups, so that the scores of all proposals aggregated into a group, and hence the scores of the better proposals, are reduced simultaneously. It is defined as:
where R represents the proposals sorted by score, and p, q, and r each represent a range of subscripts of the proposals aggregated into a group.
Preferably, in an exemplary embodiment, the method further comprises:
to make the attack imperceptible to the human eye, the L2 distance is used as a perceptual loss function L_perc to constrain the change in the data, defined as:
where P̂_tmp represents the adversarial template, P_tmp represents the tracking template, x̂_i represents a point of the adversarial template, and x_i represents a point of the tracking template.
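One plausible reading of this perceptual loss is a sum of per-point Euclidean distances between the adversarial template and the tracking template. The sketch assumes a one-to-one correspondence between the two point lists (for example, comparing only the retained points with their originals); the source formula is not reproduced, so both the summation and the pairing are assumptions.

```python
import math

def perceptual_loss(adv_points, tmp_points):
    """Sum of Euclidean (L2) distances between corresponding points."""
    if len(adv_points) != len(tmp_points):
        raise ValueError("this sketch assumes a one-to-one point correspondence")
    return sum(math.dist(a, x) for a, x in zip(adv_points, tmp_points))

# Hypothetical example: only the second point has moved, by one unit along z.
tmp = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
adv = [(0.0, 0.0, 0.0), (1.0, 1.0, 2.0)]
loss = perceptual_loss(adv, tmp)
```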
Summarizing all of the exemplary embodiments, the target loss function can be expressed as:
L = L_feat + a * L_loc + b * L_0 + c * L_perc
where a, b, c are hyper-parameters, used to balance the terms in the loss function.
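The weighted sum above translates directly into code. The default weights and the loss values in the example are placeholders, since the source gives no values for a, b, and c.

```python
def total_loss(l_feat, l_loc, l_0, l_perc, a=1.0, b=1.0, c=1.0):
    """Combine the four loss terms as L = L_feat + a*L_loc + b*L_0 + c*L_perc."""
    return l_feat + a * l_loc + b * l_0 + c * l_perc

# Hypothetical loss values and weights, chosen only to show the arithmetic.
objective = total_loss(1.0, 2.0, 3.0, 4.0, a=0.5, b=2.0, c=0.25)
```

The feature loss carries no weight of its own in the formula, so a, b, and c set the importance of the other three terms relative to it.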
Based on any one of the above exemplary embodiments, an exemplary embodiment of the present invention provides a storage medium having stored thereon computer instructions that, when executed, perform the steps of the generation-based adversarial attack method.
Based on any one of the above exemplary embodiments, an exemplary embodiment of the present invention provides a terminal, which includes a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the computer instructions to perform the steps of the generation-based adversarial attack method.
Based on such understanding, the technical solution of the present embodiments, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing an apparatus to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It is to be understood that the above-described embodiments are merely illustrative and do not limit the invention, and that those skilled in the art may make various other modifications and changes based on the above teachings. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications may be made without departing from the spirit or scope of the invention.