where Sim' denotes the latent similarity encoding data of the tracking template and the search space, d₂ denotes the dimension of the similarity encoding, and i denotes the index of the dimension.

Further, the binomial distribution coding layer learning a Bernoulli distribution for each point to describe the filtering state of the point comprises:

performing point filtering using the stretched binary concrete distribution and attaching the filtering state to each point of the tracking template, wherein the binary concrete distribution is stretched to the (γ, ζ) interval, where γ < 0 and ζ > 1.

Further, inputting the enhanced features, performing point filtering using the stretched binary concrete distribution, and attaching the filtering state to each point of the tracking template comprises:

given a random variable s that follows a binary concrete distribution and lies in the (0, 1) interval, with q_s(s | φ) as the probability density of the distribution and Q_s(s | φ) as its cumulative probability; the binary concrete distribution is parameterized by φ = (log α, β), where log α denotes the location and β denotes the temperature; the binary concrete distribution is reparameterized with a uniformly distributed random variable u ~ U(0, 1) and expressed as:

s = Sigmoid((log u - log(1 - u) + log α) / β)

stretching the binary concrete distribution to the (γ, ζ) interval, where γ < 0 and ζ > 1, and then truncating it with a hard-sigmoid to obtain the hard-concrete distribution:

where z denotes the filtering state, z_i ∈ {0, 1};

Obtaining the adversarial template by distillation using the filtering state comprises:

points whose filtering state z_i = 0 are filtered out by the filter, generating the adversarial template.

Further, the method further comprises:

using L₀ regularization as a filtering loss function, where the L₀ regularization is defined as the cumulative probability of the hard-concrete distribution above zero:

Further, the method further comprises:

generating a plurality of proposals and their corresponding probability scores using the adversarial template, as candidate regions for the target position; the proposal with the highest probability score is selected as the final prediction result.

Further, the method further comprises:

using a localization loss function L_loc to simultaneously reduce the scores of all proposals aggregated into a group, defined as:

where R denotes the proposals sorted by score, and p, q, and r each denote a subscript range of the proposals aggregated into a group.

Further, the method further comprises:

using the L2 distance as a perceptual loss function L_perc, which constrains the change in the data and is defined as:

where P_tmp denotes the tracking template and x_i denotes a point of the tracking template, and the remaining two symbols denote the adversarial template and a point of the adversarial template, respectively.

The beneficial effects of the invention are:

(1) In an exemplary embodiment of the present invention, feature extraction uses similarity encoding together with fusion of the similarity encoding and the tracking template. First, the similarity encoding allows the template and the search space to be distinguished in the feature space and, further, in the abstract space. Second, fusing the similarity encoding with the tracking template embeds the similarity encoding into the tracking template as a latent feature, enhancing the features of the tracking template. Third, point filtering is realized by fitting a Bernoulli distribution in a differentiable form (that is, a Bernoulli distribution over the discard probability of each point is learned to describe the filtering state of the point); this allows the neural network to learn by gradient descent, and a Bernoulli distribution is a natural model for point filtering. By adding the similarity encoding to the learning of the neural network as an enhanced feature, the latent feature space can be mined more effectively.

(2) In another exemplary embodiment of the present invention, the similarity is repeated M times and fused with the template features, which enhances the features of the template and mines the differences between the template and the search area in the feature space.

(3) In yet another exemplary embodiment of the present invention, because point clouds are permutation invariant, a symmetric function is further applied to the similarity data, ensuring that the same similarity is output under different point orderings.

(4) In yet another exemplary embodiment of the present invention, several loss functions are used, each improving a different aspect of the attack.

Drawings

Fig. 1 is a flow chart of a method provided in an exemplary embodiment of the invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if," as used herein, may be interpreted as "when" or "in response to a determination," depending on the context.

In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The background of the present exemplary embodiment is explained first. A video sequence comprising N frames is stored in the form of point clouds.

3D object tracking aims to locate a tracking template P_tmp in successive frames. Note that the tracking template P_tmp is obtained from the first frame s₁, i.e., P_tmp ∈ s₁.

In the prior art, a 3D tracker generates a search region P_area in the current frame by expanding the range of the prediction result in the previous frame, and generates within the search region P_area a number of proposals and their corresponding probability scores as candidates for the target location.

In the present exemplary embodiment, an adversarial attack on 3D object tracking needs to perturb the 3D tracker so that its prediction deviates from the real target position.

Thus, the tracking template P_tmp denotes the tracking target specified in the first frame, and the search region P_area denotes the expanded area of the prediction result of the previous frame.

Referring to Fig. 1, Fig. 1 shows a generation-based adversarial attack method disclosed in an exemplary embodiment of the present application, comprising the following steps:

S11: feature extraction: calculating similarity encoding data of a tracking template and a seed point set of a search area, and fusing the features extracted from the tracking template with the similarity encoding data to obtain enhanced features;

S13: adversarial distillation: inputting the enhanced features into a binomial distribution coding layer, wherein the binomial distribution coding layer learns a Bernoulli distribution for each point to describe the filtering state of the point; and obtaining an adversarial template by distillation using the filtering state.

Specifically, in this exemplary embodiment, the proposed generation-based adversarial attack method targets a 3D tracker, and its flow is shown in Fig. 1:

In step S11, given a tracking template P_tmp containing M₁ points and a search region P_area containing M₂ points, the similarity between the tracking template P_tmp and the search space P_area is encoded into a description of the template features, i.e., the similarity encoding data Sim', with dimension M₁ × (d₁ + d₂), where d₁ denotes the number of features of a point after upsampling and d₂ denotes the number of dimensions of the similarity encoding. The features extracted from the tracking template P_tmp are then fused with the similarity encoding data Sim' to obtain the enhanced features.

Then, in step S13, a binomial distribution coding layer is used to learn a Bernoulli distribution over the drop probability of each point, which describes the filtering state of the point, thereby distilling an adversarial template.

Feature extraction uses similarity encoding together with fusion of the similarity encoding and the tracking template, with the following respective advantages: (1) the similarity encoding mines the difference between the template and the search area in the feature space and encodes it; (2) fusing the similarity encoding with the tracking template takes the encoded similarity as a latent-space feature of the search area and the template, and uses it to enhance the features of the template. Without this approach, the attack reduces the accuracy of the original model by 6.5%; with it, the attack reduces the model accuracy by 22.9%.

Point filtering is realized by fitting a Bernoulli distribution in a differentiable form (i.e., a Bernoulli distribution over the discard probability of each point is learned to describe the filtering state of the point). The advantage is that the discrete filtering states can be described in a way that supports gradient descent, so that the filtering operation can be learned.

Preferably, in an exemplary embodiment, calculating the similarity encoding data of the tracking template and the seed point set of the search area comprises:

S31: extracting a first seed point set of the tracking template and a second seed point set of the search area, respectively, by downsampling;

S33: taking the computed cosine distance between the first seed point set and the second seed point set as the latent similarity encoding data.

Specifically, in step S31 of the exemplary embodiment, the downsampling may be implemented by PointNet++ (i.e., the backbone network in Fig. 1). The template P_tmp and the search region P_area are each downsampled by a farthest point sampling algorithm to obtain S₁ and S₂ seed points, respectively, each with a multi-scale feature f ∈ Rⁿ, forming a first seed point set S_tmp (S₁ × n) and a second seed point set S_area (S₂ × n).
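The farthest point sampling step mentioned above can be illustrated with a minimal pure-Python sketch. It shows only the index selection, not the multi-scale feature learning performed by PointNet++; the function name `farthest_point_sampling`, the `start` parameter, and the toy points are assumptions made for illustration.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Pick k indices so that each new pick is the point farthest
    from all points picked so far (hypothetical helper)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen

# Toy template with four points; keep three seed points.
template = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
seeds = farthest_point_sampling(template, k=3)
```

The near-duplicate point at index 1 is the one left out, which is the behavior that makes this sampler spread seed points evenly over the object.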

In step S33, for better feature extraction, the cosine distance between the seed point sets S_tmp and S_area is computed as a branch of the latent similarity data Sim'. The encoded latent similarity can effectively fuse the template P_tmp and the search space P_area.

Preferably, in an exemplary embodiment, the cosine distance is passed through a convolution and a symmetric function, and the result is used as the latent similarity encoding data.

It should be noted that, because point clouds are permutation invariant, a symmetric function is further used to ensure that the same similarity is output under different point orderings.
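A minimal sketch of the similarity computation and the symmetric function, under stated assumptions: the cosine measure is computed pairwise between seed features, max pooling over the search-area seeds stands in for the symmetric function (the source does not name which one is used), and the convolution step is omitted. All function names are hypothetical.

```python
import math

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, eps)

def similarity_matrix(s_tmp, s_area):
    """Rows: template seeds (S1); columns: search-area seeds (S2)."""
    return [[cosine(f, g) for g in s_area] for f in s_tmp]

def pool_max(sim):
    """Assumed symmetric function: max over the search-seed axis,
    so the result does not depend on the order of the search seeds."""
    return [max(row) for row in sim]

s_tmp = [[1.0, 0.0], [0.0, 1.0]]
s_area = [[1.0, 0.0], [1.0, 1.0], [0.0, -1.0]]
pooled = pool_max(similarity_matrix(s_tmp, s_area))
# Reordering the search seeds leaves the pooled similarity unchanged.
shuffled = pool_max(similarity_matrix(s_tmp, list(reversed(s_area))))
```

Any other order-independent reduction (sum, mean) would satisfy the same permutation-invariance requirement.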

Preferably, in an exemplary embodiment, fusing the features extracted from the tracking template with the similarity encoding data to obtain enhanced features comprises:

S51: upsampling the first seed point set to obtain latent features of the tracking template;

S53: concatenating the latent features with the similarity encoding data, repeated multiple times, to obtain the enhanced features.

Specifically, in step S51, the first seed point set S_tmp is upsampled to generate latent features for each point in the original tracking template.

In step S53, the latent features of P_tmp are concatenated with the similarity repeated M times to obtain the enhanced features.

It should be noted that repeating the similarity M times is realized by copying, so that the similarity is treated as a per-point feature and concatenated onto the features of each point.
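The copy-and-concatenate step can be sketched as follows. This is an illustration only: `fuse` is a hypothetical name, and the toy sizes (two points, d₁ = 3, d₂ = 2) are chosen for readability.

```python
def fuse(latent_feats, sim_code):
    """Append the same similarity code (length d2) to every point's
    latent feature (length d1), giving rows of length d1 + d2."""
    return [list(f) + list(sim_code) for f in latent_feats]

latent = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # two points, d1 = 3
sim_code = [0.9, 0.7]                         # d2 = 2
enhanced = fuse(latent, sim_code)             # two rows of length 5
```

Each row of `enhanced` carries the point's own latent feature followed by an identical copy of the similarity code, matching the M₁ × (d₁ + d₂) shape described for the enhanced features.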

Preferably, in an exemplary embodiment, the method further comprises:

using the latent similarity encoding data as a feature loss L_feat, so as to distinguish the tracking template from the search space in a latent feature space:

where Sim' denotes the latent similarity encoding data of the tracking template and the search space, d₂ denotes the dimension of the similarity encoding, and i denotes the index of the dimension.

That is, the mean value of the similarity encoding is taken as the loss L_feat. Reducing this loss reduces the similarity between the search space and the target in the feature space, so that the two can be distinguished in the feature space.
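The original formula is not preserved in this text. Based only on the description of the loss as the mean of the similarity encoding over its d₂ dimensions, one consistent way to write it is the following; treat it as a reconstruction under that assumption rather than the exact published expression.

```latex
L_{feat} = \frac{1}{d_2} \sum_{i=1}^{d_2} Sim'_i
```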

Preferably, in an exemplary embodiment, the binomial distribution coding layer learning a Bernoulli distribution for each point to describe the filtering state of the point comprises:

Specifically, in the exemplary embodiment, the point filtering module learns the probability that each point is filtered out. The binomial distribution coding layer implements point filtering through a point-scale filter, which learns a Bernoulli distribution for each point to describe the filtering state of the point. In particular, a point-scale filtering state z_i ∈ {0, 1} is attached to each point of the tracking template.

Points whose filtering state z_i is 0 are filtered out by the filter, thereby generating the adversarial template.

However, the Bernoulli distribution is discontinuous and therefore not differentiable. A binary concrete distribution is therefore used, which is a smooth approximation of the Bernoulli distribution and is continuously differentiable. Meanwhile, to ensure that the point filtering module can actually filter points, its output must be able to take the value 0 or 1, so the binary concrete distribution is stretched to the (γ, ζ) interval, where γ < 0 and ζ > 1.

Preferably, in an exemplary embodiment, inputting the enhanced features, performing point filtering using the stretched binary concrete distribution, and attaching the filtering state to each point of the tracking template comprises:

S71: the binary concrete distribution is a smooth approximation of the Bernoulli distribution and is continuously differentiable; given a random variable s that follows a binary concrete distribution and lies in the (0, 1) interval, with q_s(s | φ) as the probability density of the distribution and Q_s(s | φ) as its cumulative probability; the binary concrete distribution is parameterized by φ = (log α, β), where log α denotes the location and β denotes the temperature; the binary concrete distribution is reparameterized with a uniformly distributed random variable u ~ U(0, 1) and expressed as:

s = Sigmoid((log u - log(1 - u) + log α) / β)

S73: to ensure that the point filtering module can actually filter points, its output must be able to take the value 0 or 1; the binary concrete distribution is therefore stretched to the (γ, ζ) interval, where γ < 0 and ζ > 1, and then truncated with a hard-sigmoid to obtain the hard-concrete distribution:

where z denotes the filtering state, z_i ∈ {0, 1};

points whose filtering state z_i = 0 are filtered out by the filter, generating the adversarial template.
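Steps S71 and S73 can be sketched in a few lines of Python. The sampling line follows the reparameterization formula given above. The stretch-and-clip step uses the usual hard-concrete form, s(ζ - γ) + γ clipped to [0, 1], which is an assumption here because the original formula is not preserved. The default values γ = -0.1, ζ = 1.1, β = 2/3 and all function names are illustrative, and keeping every point with z > 0 is a choice of this sketch.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sample_binary_concrete(log_alpha, beta, u):
    """Reparameterized sample s in (0, 1) from a uniform draw u in (0, 1)."""
    return sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / beta)

def hard_concrete(s, gamma=-0.1, zeta=1.1):
    """Stretch s to (gamma, zeta), then clip to [0, 1] (hard-sigmoid)."""
    stretched = s * (zeta - gamma) + gamma
    return min(1.0, max(0.0, stretched))

def filter_points(points, log_alphas, beta=2.0 / 3.0, rng=None):
    """Drop every point whose sampled filtering state z is exactly 0."""
    rng = rng or random.Random(0)
    kept = []
    for p, la in zip(points, log_alphas):
        u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep log() finite
        z = hard_concrete(sample_binary_concrete(la, beta, u), )
        if z > 0.0:
            kept.append(p)
    return kept
```

Because γ < 0 and ζ > 1, part of the stretched probability mass falls outside [0, 1] and is clipped onto the exact values 0 and 1, which is what lets a continuous, differentiable sample still produce hard keep or drop decisions.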

Preferably, in an exemplary embodiment, since L₀ regularization does not cause a collapse of the filtering state values, it is used to penalize the binomial distribution coding layer. The binomial distribution coding layer is constrained by L₀ regularization so as to minimize the number of points filtered out. Thus, the method further comprises:

using L₀ regularization as a filtering loss function, where the L₀ regularization is defined as the cumulative probability of the hard-concrete distribution above zero:

The cumulative probability above zero is the probability of being "not equal to 0", that is, the probability that the filtering state is 1; the number of points to be filtered is controlled by this probability.
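The formula for this term is not preserved in the text. For a hard-concrete gate as commonly formulated, the probability of a non-zero state has the closed form Sigmoid(log α - β log(-γ/ζ)); the sketch below assumes that form and the illustrative defaults γ = -0.1, ζ = 1.1, β = 2/3. The function names are hypothetical, and how the term is signed and weighted in the overall loss is left to the surrounding method.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def prob_nonzero(log_alpha, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
    """Probability that a hard-concrete filtering state is not 0
    (assumed closed form; requires gamma < 0 < zeta)."""
    return sigmoid(log_alpha - beta * math.log(-gamma / zeta))

def l0_term(log_alphas, **kwargs):
    """Sum of per-point non-zero probabilities: the expected number
    of points that survive filtering."""
    return sum(prob_nonzero(la, **kwargs) for la in log_alphas)
```

Since the expression is differentiable in log α, it can be optimized by gradient descent together with the other loss terms.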

More preferably, in an exemplary embodiment, as shown in Fig. 1, the method further comprises:

inputting the search region P_area and the adversarial template into the victim deep model (Victim Deep Model) to obtain a plurality of proposals; the victim deep model selects the proposal with the highest score as the prediction result for the current frame, thereby realizing target tracking.

Preferably, in an exemplary embodiment, the tracker predicts a number of proposals and their corresponding probability scores as candidate regions for the target position, and the proposal with the highest probability score is selected as the final prediction result. However, other high-scoring proposals are also located close to the real position, so their predictions are also fairly accurate. Therefore, a localization loss function L_loc is used, which accumulates the scores of the proposals within a specified range into a group, so that the scores of all proposals aggregated into the group, including the better proposals, are reduced simultaneously. It is defined as:
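The exact expression for L_loc, including the roles of the subscript ranges p, q, and r, is not preserved in this text. The sketch below therefore illustrates only the two ideas stated in prose: the tracker picks the highest-scoring proposal, and the scores of proposals within a rank range are accumulated into one group quantity. The function names and the (box, score) tuple layout are assumptions.

```python
def top_proposal(proposals):
    """proposals: list of (box, score); return the highest-scoring one."""
    return max(proposals, key=lambda p: p[1])

def group_score(proposals, start, stop):
    """Sum the scores of the proposals ranked start..stop-1 after
    sorting by descending score (hypothetical grouping helper)."""
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)
    return sum(score for _, score in ranked[start:stop])

proposals = [("a", 0.2), ("b", 0.9), ("c", 0.7), ("d", 0.1)]
best = top_proposal(proposals)           # the tracker's prediction
top2 = group_score(proposals, 0, 2)      # scores of the two best proposals
```

Lowering a group quantity such as `top2` pushes down every proposal in the group at once, rather than only the single best one.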

Preferably, in an exemplary embodiment, the method further comprises:

so that the attack is not perceptible to the human eye, using the L2 distance as a perceptual loss function L_perc, which constrains the change in the data and is defined as:

where P_tmp denotes the tracking template and x_i denotes a point of the tracking template, and the remaining symbols denote the adversarial template and its points.
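As the formula itself is missing from the text, the following is only a sketch of an L2-distance perceptual term. It assumes the adversarial points and the tracking-template points are paired one-to-one and sums the per-point Euclidean distances; the source does not state how filtered-out points are paired or whether the distances are summed, averaged, or squared. The name `perceptual_loss` is hypothetical.

```python
import math

def perceptual_loss(adv_points, tmp_points):
    """Sum of Euclidean (L2) distances between corresponding points."""
    if len(adv_points) != len(tmp_points):
        raise ValueError("point sets must be paired one-to-one")
    return sum(math.dist(a, x) for a, x in zip(adv_points, tmp_points))

# One point moved by a 3-4-5 offset, one point unchanged.
loss = perceptual_loss([(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)],
                       [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)])
```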

Combining all of the exemplary embodiments, the overall loss function can be expressed as:

L = L_feat + a * L_loc + b * L₀ + c * L_perc

where a, b, c are hyper-parameters, used to balance the terms in the loss function.
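The weighted sum can be written directly as code. The default weights of 1.0 are placeholders; the source does not give values for a, b, and c.

```python
def total_loss(l_feat, l_loc, l0, l_perc, a=1.0, b=1.0, c=1.0):
    """L = L_feat + a * L_loc + b * L0 + c * L_perc."""
    return l_feat + a * l_loc + b * l0 + c * l_perc

# Example with arbitrary term values and weights.
example = total_loss(1.0, 2.0, 3.0, 4.0, a=0.5, b=2.0, c=0.25)
```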

Based on any one of the above exemplary embodiments, an exemplary embodiment of the present invention provides a storage medium having stored thereon computer instructions that, when executed, perform the steps of the generation-based adversarial attack method.

Based on any one of the above exemplary embodiments, an exemplary embodiment of the present invention provides a terminal, which includes a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the computer instructions to perform the steps of the generation-based adversarial attack method.

Based on such understanding, the technical solution of the present embodiments, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing an apparatus to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

It is to be understood that the above-described embodiments are merely illustrative and do not limit the invention, and that various other modifications and changes will be apparent to persons skilled in the art based on the above teachings. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications may be made without departing from the spirit or scope of the invention.

Claims


1. A generation-based adversarial attack method, characterized by comprising the following steps:

calculating similarity encoding data of a tracking template and a seed point set of a search area, and fusing the features extracted from the tracking template with the similarity encoding data to obtain enhanced features;

inputting the enhanced features into a binomial distribution coding layer, wherein the binomial distribution coding layer learns a Bernoulli distribution for each point to describe the filtering state of the point; and obtaining an adversarial template by distillation using the filtering state.

2. The generation-based adversarial attack method according to claim 1, characterized in that calculating the similarity encoding data of the tracking template and the seed point set of the search area comprises:

extracting a first seed point set of the tracking template and a second seed point set of the search area, respectively, by downsampling;

taking the computed cosine distance between the first seed point set and the second seed point set as the latent similarity encoding data.

3. The generation-based adversarial attack method according to claim 2, characterized in that fusing the features extracted from the tracking template with the similarity encoding data to obtain enhanced features comprises:

upsampling the first seed point set to obtain latent features of the tracking template;

concatenating the latent features with the similarity encoding data, repeated multiple times, to obtain the enhanced features.

4. The generation-based adversarial attack method according to claim 2, characterized in that the method further comprises:

using the latent similarity encoding data as a feature loss L_feat, so as to distinguish the tracking template from the search space in a latent feature space:

where Sim' denotes the latent similarity encoding data of the tracking template and the search space, d₂ denotes the dimension of the similarity encoding, and i denotes the index of the dimension.

5. The generation-based adversarial attack method according to claim 1, characterized in that the binomial distribution coding layer learning a Bernoulli distribution for each point to describe the filtering state of the point comprises:

performing point filtering using the stretched binary concrete distribution and attaching the filtering state to each point of the tracking template, wherein the binary concrete distribution is stretched to the (γ, ζ) interval, where γ < 0 and ζ > 1.

6. The generation-based adversarial attack method according to claim 5, characterized in that inputting the enhanced features, performing point filtering using the stretched binary concrete distribution, and attaching the filtering state to each point of the tracking template comprises:

given a random variable s that follows a binary concrete distribution and lies in the (0, 1) interval, with q_s(s | φ) as the probability density of the distribution and Q_s(s | φ) as its cumulative probability; the binary concrete distribution is parameterized by φ = (log α, β), where log α denotes the location and β denotes the temperature; the binary concrete distribution is reparameterized with a uniformly distributed random variable u ~ U(0, 1) and expressed as:

s = Sigmoid((log u - log(1 - u) + log α) / β)

stretching the binary concrete distribution to the (γ, ζ) interval, where γ < 0 and ζ > 1, and then truncating it with a hard-sigmoid to obtain the hard-concrete distribution:

where z denotes the filtering state, z_i ∈ {0, 1};

obtaining the adversarial template by distillation using the filtering state comprises:

points whose filtering state z_i = 0 are filtered out by the filter, generating the adversarial template.

7. The generation-based adversarial attack method according to claim 6, characterized in that the method further comprises:

using L₀ regularization as a filtering loss function, where the L₀ regularization is defined as the cumulative probability of the hard-concrete distribution above zero:

8. The generation-based adversarial attack method according to claim 1, characterized in that the method further comprises:

generating a plurality of proposals and their corresponding probability scores using the adversarial template, as candidate regions for the target position; the proposal with the highest probability score is selected as the final prediction result.

9. The generation-based adversarial attack method according to claim 8, characterized in that the method further comprises:

using a localization loss function L_loc to simultaneously reduce the scores of all proposals aggregated into a group, defined as:

where R denotes the proposals sorted by score, and p, q, and r each denote a subscript range of the proposals aggregated into a group.

10. The generation-based adversarial attack method according to claim 1, characterized in that the method further comprises:

using the L2 distance as a perceptual loss function L_perc, which constrains the change in the data and is defined as:

where P_tmp denotes the tracking template and x_i denotes a point of the tracking template, and the remaining two symbols denote the adversarial template and a point of the adversarial template, respectively.