CN115019103B - Small-sample target detection method based on coordinate attention group optimization - Google Patents

Small-sample target detection method based on coordinate attention group optimization

Info

Publication number: CN115019103B (application number CN202210697675.0A)
Authority: CN (China)
Prior art keywords: target, support, query, matrix, feature map
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN115019103A
Inventors: 李平 (Li Ping), 陈家俊 (Chen Jiajun), 徐向华 (Xu Xianghua)
Current assignee: Hangzhou Dianzi University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Hangzhou Dianzi University

2022-06-20: Application CN202210697675.0A filed by Hangzhou Dianzi University; priority to CN202210697675.0A
2022-09-06: Publication of CN115019103A
2025-02-14: Application granted; publication of CN115019103B

Abstract

(Translated from Chinese)

The present invention discloses a small-sample target detection method based on coordinate attention group optimization. The method first samples image data to obtain a support set and a query set, and uses a deep visual feature extractor to obtain a support feature map set and a query feature map set. The support feature map set and an initialized target parameter matrix set are then fed into a coordinate attention guidance module to obtain a target fusion matrix set. Next, a group optimization module is constructed, in which the support feature map set is turned into an updated support target class vector set by way of the ground-truth target bounding boxes and a label propagation algorithm. Finally, the sets obtained above are fed into a query target prediction module to obtain the bounding boxes and categories of the query-set sample targets. The method uses the coordinate attention mechanism so that the detector can adaptively attend to the regions of the query set where targets are located, and obtains discriminative support target class vectors through the group optimization module, thereby improving the performance of small-sample target detection.

Description

Small sample target detection method based on coordinate attention group optimization
Technical Field
The invention belongs to the field of computer technology, in particular computer vision. It concerns target detection in small-sample (few-shot) scenarios, and specifically relates to a small-sample target detection method based on coordinate attention group optimization.
Background
Object detection is a fundamental computer vision task that aims to locate and identify people or objects of interest in an image. Traditional methods extract target features with a sliding-window mechanism and hand-crafted descriptors such as the Scale-Invariant Feature Transform (SIFT) and the Histogram of Oriented Gradients (HOG), then predict target categories with a support vector machine. Over the past decade, deep learning methods based on convolutional neural networks have been widely applied to target detection and have markedly improved its performance. However, deep learning methods typically require a large amount of labeled data for training, which makes them difficult to apply directly in data-scarce scenarios. Moreover, manually annotating the bounding box and category of every target in an image consumes substantial labor, which greatly limits the industrial adoption of deep learning methods. To address this lack of data and high labeling cost, researchers proposed the task of small-sample target detection, which aims to detect targets of the same classes in unlabeled images with the help of only a few labeled images. Such methods can be applied to scenarios such as industrial product defect detection and rare-animal detection while reducing annotation cost. For example, industrial defect detection normally requires a large number of defective product samples to train a model; in real production the proportion of defective products is small and it is hard to collect enough qualifying samples, so training data are insufficient and the model overfits. A small-sample target detection model can instead detect surface defects from only a few defective samples, avoiding these problems.
Currently, most small-sample target detection methods use the two-stage Faster R-CNN (Faster Region-based Convolutional Neural Network) model as the detection framework and train it in a meta-learning fashion. During data preprocessing, these methods split the data set into a training set and a test set according to the target categories appearing in the images, so that the two sets share no categories. Categories in the training set are called base classes, and categories in the test set are called novel classes. During training, at each iteration the model randomly selects some samples containing base-class targets from the training set as the support set, and randomly selects some of the remaining samples containing targets of the same classes as the query set. The support set consists of samples with known labels, and the model must detect the targets in the query samples using only the few support samples; the support set and query set together form the data of one small-sample target detection task. These methods typically adopt a two-branch architecture in which a backbone network with shared parameters processes both the support set and the query set to obtain support feature maps and query feature maps. The query feature map is then cropped by candidate boxes output by a Region Proposal Network (RPN) to produce query target feature maps, while the support feature map is reduced by global pooling to support target class feature vectors. Finally, an attention module fuses the support target class feature vectors with the query target feature maps, and the fused feature maps are fed to a detector that outputs the bounding boxes and categories of the query-set targets. By learning task-agnostic prior knowledge from the small-sample detection tasks generated at each training iteration, the model can quickly adapt to new detection tasks in practical use: after training, it detects targets of novel classes in unlabeled samples from only a few labeled samples containing those classes.
These methods have the following shortcomings: 1) they adopt a two-stage detection framework and rely on generated candidate boxes to locate novel-class targets, yet the model is trained only on base-class samples, so when the novel classes differ greatly from the base classes it is difficult to output candidate boxes that localize the novel targets; 2) global pooling tends to discard the spatial position features of targets, which hinders accurate localization; 3) each support sample is treated as an independent individual, so the similarity relations among the different targets of a support set within a specific detection task are not fully exploited, and misclassification easily occurs when targets of different classes look alike. In summary, to alleviate the difficulty existing methods have in capturing the spatial position features of support targets and in separating similar targets of different classes, a method is urgently needed that localizes query-set targets from the spatial positions of support-set targets without depending on candidate boxes, and that recognizes query-set targets by exploiting the similarity relations among support-set targets.
Disclosure of Invention
The invention aims to provide a small-sample target detection method based on coordinate attention group optimization that addresses the shortcomings of the prior art. Instead of generating candidate boxes in advance, the method uses coordinate attention to capture the key spatial position features of support-set targets along the horizontal and vertical directions of the feature map and guides the detector to focus on the regions of the query set that contain targets, thereby improving localization accuracy. At the same time, a group optimization module updates the support target class vectors, exploiting the similarity relations among the different targets of the support set to obtain more discriminative class features and thereby improving recognition accuracy.
The method first obtains an image data set containing target bounding boxes and category annotations, then performs the following operations:
Step (1): sample the image data set to obtain a support set and a query set, input both into a deep visual feature extraction module, and output a support feature map set and a query feature map set;
Step (2): construct a coordinate attention guidance module whose input is a randomly initialized target parameter matrix set and the support feature map set, and whose output is a target fusion matrix set;
Step (3): construct a group optimization module whose input is the support feature map set and the bounding-box annotations of the support set, and whose output is a support target class vector set;
Step (4): construct a query target prediction module whose input is the query feature map set, the target fusion matrix set, and the support target class vector set, and whose output is the predicted query target bounding boxes and class probabilities;
Step (5): optimize the small-sample target detection model composed of the coordinate attention guidance module, the group optimization module, and the query target prediction module with a stochastic gradient descent algorithm, and for a new support set and query set obtain the target bounding boxes and categories of the query-set images through steps (1)–(4).
Further, step (1) specifically comprises:
(1-1) First scale the images in the data set to the same size, then randomly sample image samples without replacement to obtain a support set and a query set, where N_s denotes the number of image samples in the support set, x_i^s the i-th support image sample, N_q the number of image samples in the query set, and x_j^q the j-th query image sample; every sample lies in the real number domain with image height H, image width W, and 3 RGB channels.
Each support image sample x_i^s carries an annotation listing, for each of its Φ_i targets, the category c_{i,φ} ∈ {1, …, C} of the φ-th target and the four-dimensional vector b_{i,φ} formed by the upper-left and lower-right corner coordinates of the φ-th target bounding box; the support set contains CK targets in total, CK = C × K, where C is the number of target classes and K the number of targets per class.
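For illustration, the following is a minimal Python sketch of the episode sampling in (1-1). The function name sample_episode, the per-class indexing of the data set, and the simplifying assumption that each image is indexed under a single class are assumptions for the example, not part of the patent:

```python
import random

def sample_episode(images_by_class, C=5, K=2, n_query=4):
    """Sample one C-way K-shot detection episode: a support set with K
    labeled images per class and a query set of n_query images.
    `images_by_class` maps class id -> list of (image, annotation) pairs."""
    classes = random.sample(sorted(images_by_class), C)
    support, query_pool = [], []
    for c in classes:
        samples = random.sample(images_by_class[c], K + 1)  # without replacement
        support.extend(samples[:K])     # K support targets per class
        query_pool.extend(samples[K:])  # remaining samples feed the query set
    query = random.sample(query_pool, min(n_query, len(query_pool)))
    return support, query
```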
(1-2) Construct a deep visual feature extraction module consisting of a deep convolutional network and a two-dimensional convolutional layer, where the deep convolutional network is a 50-layer residual network ResNet-50 pre-trained on the ImageNet data set and the two-dimensional convolutional layer has 1×1 kernels.
(1-3) Input the support set and the query set into the deep visual feature extraction module to obtain the support feature map set and the query feature map set, where H′, W′, and 256 denote the height, width, and channel number of a single feature map.
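A minimal PyTorch sketch of this extraction module follows, assuming torchvision ≥ 0.13; the class name and the choice to cut ResNet-50 after its last residual stage are assumptions consistent with step (1-2):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class DeepVisualFeatureExtractor(nn.Module):
    """ResNet-50 backbone (ImageNet-pretrained) followed by a 1x1 conv
    that projects the 2048-channel map down to 256 channels."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        # keep everything up to (and including) the last residual stage,
        # dropping the average-pool and classification head
        self.body = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(2048, 256, kernel_size=1)

    def forward(self, x):                # x: (N, 3, H, W)
        return self.proj(self.body(x))  # (N, 256, H', W')
```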
Still further, step (2) specifically comprises:
(2-1) Randomly initialize the target parameter matrix set {A_j}, where A_j is the target parameter matrix for the j-th query image sample, M is the number of targets to be detected in a single query image sample, and 256 is the row-vector dimension of a single target parameter matrix. Construct the coordinate attention guidance module from a coordinate attention sub-module and a cross attention sub-module: the coordinate attention sub-module computes a weight for every coordinate position along the horizontal and vertical directions of a feature map, yielding a set of spatial position attention feature maps, and the cross attention sub-module fuses the target parameter matrix set with the spatial position attention feature map set.
(2-2) For the i-th support feature map, perform average pooling along the horizontal coordinate direction and the vertical coordinate direction respectively to obtain a horizontal feature map, whose value in the μ-th channel at coordinate position (h′, 1) averages the support feature map's μ-th channel across row h′, and a vertical feature map, whose value in the μ-th channel at coordinate position (1, w′) averages it down column w′, where 1 ≤ μ ≤ 256, 1 ≤ h′ ≤ H′, 1 ≤ w′ ≤ W′.
(2-3) Feed the horizontal and vertical feature maps in turn through a two-dimensional convolutional layer and an activation function layer to obtain the horizontal and vertical weight feature maps, where Conv1(·) denotes a two-dimensional convolutional layer with 1×1 kernels and σ(·) the Sigmoid activation function.
(2-4) Compute the spatial position attention feature map from the support feature map, the horizontal weight feature map, and the vertical weight feature map: its value in the μ-th channel at coordinate position (h′, w′) combines the support feature map value at (h′, w′) with the horizontal weight at (h′, 1) and the vertical weight at (1, w′) of the same channel.
(2-5) Apply steps (2-2)–(2-4) to all feature maps of the support feature map set to obtain the spatial position attention feature map set, then unfold all its feature maps along the spatial dimensions and concatenate them into the spatial target position features.
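A minimal PyTorch sketch of the coordinate attention sub-module of steps (2-2)–(2-4) follows. Whether the horizontal and vertical branches share the 1×1 convolution weights is not specified in the text; separate layers and an elementwise product are assumptions here:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Pool the support feature map along each spatial axis, turn the
    pooled maps into per-coordinate weights with a 1x1 conv + sigmoid,
    and reweight the input feature map elementwise."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f):  # f: (N, C, H', W')
        g_h = torch.sigmoid(self.conv_h(f.mean(dim=3, keepdim=True)))  # (N, C, H', 1)
        g_w = torch.sigmoid(self.conv_w(f.mean(dim=2, keepdim=True)))  # (N, C, 1, W')
        return f * g_h * g_w  # weights broadcast over both spatial axes
```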
(2-6) Input each target parameter matrix A_j together with the spatial target position features into the cross attention sub-module to obtain the target fusion matrix set, where softmax(·) is the normalized exponential function and the superscript T denotes the transpose operation.
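As an illustration of (2-6), a sketch of the cross-attention fusion is given below. The patent states only that a softmax and a transpose are used; the scaled dot-product form is an assumption:

```python
import torch

def cross_attention_fuse(A, F):
    """Fuse a target parameter matrix A (M x 256) with the flattened
    spatial target position features F (L x 256), L = H' * W',
    via softmax(A F^T / sqrt(d)) F."""
    attn = torch.softmax(A @ F.T / F.shape[-1] ** 0.5, dim=-1)  # (M, L)
    return attn @ F                                             # (M, 256)
```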
Still further, step (3) specifically comprises:
(3-1) Construct the group optimization module, where a group is the set of support targets belonging to the same category: a support target probability matrix and a similarity matrix are computed from the support target class labels and the obtained support target vector set, and a group propagation matrix is derived from the two.
(3-2) Input the support feature map set obtained in step (1-3), together with every bounding box b_{i,φ} in the annotations of the support image samples obtained in step (1-1), into a region-of-interest (ROI) pooling layer to obtain the support target feature map set, where the ROI pooling layer max-pools the feature map within each corresponding target bounding box region and O_{c,k} denotes the feature map of the k-th target of the c-th class in the set.
(3-3) Pass the support target feature map set through a convolution and a global average pooling operation in turn to obtain the support target vector set, where the feature vector of the k-th target of the c-th class is produced by Conv2(·), a two-dimensional convolutional layer with 3×3 kernels, followed by GAP(·), global average pooling over the spatial dimensions of the feature map.
(3-4) Compute the support target class vector set and from it the support target probability matrix P: the entry P_{u,v} in row u and column v is the probability that the u-th support target vector belongs to the v-th class, with 1 ≤ u ≤ CK, 1 ≤ v ≤ C, and u = (c−1)×K + k; it is obtained from exp(·), the exponential function with base e, applied to the negative Euclidean distance dist(·,·) between the u-th support target vector and each class vector, normalized over the classes.
(3-5) Use a Gaussian kernel function to compute the similarity matrix Z between support target vectors, and construct the group propagation matrix from Z and the support target probability matrix P: the entry Z_{u,w} in row u and column w is the Gaussian-kernel similarity between the u-th and w-th support target vectors, with 1 ≤ w ≤ CK and kernel hyperparameter γ > 0; in the group propagation matrix Λ = ZP, the entry Λ_{u,v} is the probability that the u-th support target belongs to the v-th class, obtained as a similarity-weighted sum over samples.
(3-6) Initialize an iteration upper limit Ψ (100 ≤ Ψ ≤ 120) and iteratively optimize the support target probability matrix P with the group propagation matrix Λ, the similarity matrix Z, and the label propagation algorithm until the number of iterations reaches Ψ, yielding the updated matrix P^(Ψ); at initialization P^(0) = P and Λ^(0) = Λ, and at the ψ-th iteration the label propagation update recomputes Λ^(ψ) from Z and P^(ψ) and uses it to produce P^(ψ+1).
(3-7) Use the updated support target probability matrix P^(Ψ) and the support target vector set to compute the new support target class vector set, where the class vector of the c-th class weights the support target vectors by P^(Ψ)_{u,c}, the entry of P^(Ψ) in row u and column c, i.e. the updated probability that the u-th support target vector belongs to the c-th class.
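A sketch of the group optimization of steps (3-4)–(3-7) follows. The softmax over negative Euclidean distances and the Gaussian kernel are as described; the exact label-propagation update rule is not given in the text, so the blended, row-normalized update (weight alpha) is an assumption:

```python
import torch

def group_optimize(o, labels, C, gamma=1.0, alpha=0.5, psi=100):
    """o: (CK, 256) support target vectors; labels: (CK,) class ids.
    Returns updated class vectors and the propagated probability matrix."""
    # initial class vectors: per-class means of the support target vectors
    r = torch.stack([o[labels == c].mean(0) for c in range(C)])   # (C, 256)
    d = torch.cdist(o, r)                                         # (CK, C) Euclidean distances
    P = torch.softmax(-d, dim=1)              # support target probability matrix (3-4)
    Z = torch.exp(-torch.cdist(o, o) ** 2 / gamma)  # Gaussian-kernel similarity (3-5)
    P0 = P.clone()
    for _ in range(psi):                       # label propagation (3-6)
        P = alpha * (Z @ P) + (1.0 - alpha) * P0  # Lambda = Z P, blended with the start
        P = P / P.sum(dim=1, keepdim=True)        # assumed row normalization
    # updated class vectors: probability-weighted sums of support vectors (3-7)
    r_new = (P.T @ o) / P.sum(0).unsqueeze(1)
    return r_new, P
```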
Further, step (4) specifically comprises:
(4-1) Construct the query target prediction module, which consists of a transformer sub-module, a target classification function, and a bounding box prediction sub-module.
Unfold each feature map of the query feature map set along the spatial dimensions to obtain the query feature matrix set, and compute the position encoding matrix E of the query samples from the matrix G, whose entry in row κ and column ω is a sinusoid of κ whose form depends on the parity of ω, with 1 ≤ κ ≤ H′×W′, 1 ≤ ω ≤ 256, and mod denoting the remainder operation.
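A sketch of this position encoding follows, using the standard sinusoidal scheme (sine for even columns, cosine for odd), which matches the parity (mod) rule described; the 10000 frequency constant is an assumption carried over from the Transformer literature:

```python
import torch

def position_encoding(L, d=256):
    """Sinusoidal position-coding matrix of shape (L, d), L = H' * W'."""
    pos = torch.arange(L, dtype=torch.float32).unsqueeze(1)   # (L, 1) row index kappa
    idx = torch.arange(d, dtype=torch.float32).unsqueeze(0)   # (1, d) column index omega
    angle = pos / torch.pow(10000.0, 2 * torch.div(idx, 2, rounding_mode="floor") / d)
    # even columns get sin, odd columns get cos (parity of omega)
    return torch.where(idx.long() % 2 == 0, torch.sin(angle), torch.cos(angle))
```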
(4-2) Input the query feature matrix set and the position encoding matrix E into the encoder of the transformer sub-module to obtain the query target encoding feature set; the encoder consists of one attention layer and two fully connected layers, the j-th query target encoding feature being produced with an element-by-element addition of features and position encodings and FFN(·), a feed-forward neural network composed of the two fully connected layers.
(4-3) Input the query target encoding feature set and the target fusion matrix set obtained in step (2-6) into the decoder of the transformer sub-module to obtain the query target decoding feature set; the decoder consists of two attention layers and two fully connected layers, the j-th query target decoding feature being computed from the intermediate result matrix produced by the first attention layer.
(4-4) From the query target decoding feature set and the support target class vector set obtained in step (3-7), compute the query set's predicted class probabilities: the probability that the m-th target of the j-th query sample belongs to the c-th class (1 ≤ m ≤ M) is computed from the m-th row vector of the decoding feature matrix and the c-th support target class vector.
(4-5) Input the query target decoding feature set into the bounding box prediction sub-module, a multi-layer perceptron of three fully connected layers, to obtain the query set's predicted target bounding box set; the m-th row of the j-th query sample's predicted bounding box matrix is the predicted bounding box of its m-th target, given by the upper-left and lower-right corner coordinates.
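A sketch of the prediction heads of steps (4-4)–(4-5) follows. Scoring each decoded target against the support class vectors by negative Euclidean distance is an assumption consistent with step (3-4), and the matching of predictions to ground-truth targets is assumed to be given elsewhere:

```python
import torch
import torch.nn as nn

class QueryPredictionHead(nn.Module):
    """Classify each decoded query target by softmax over negative distances
    to the support class vectors, and regress its box with a 3-layer MLP."""
    def __init__(self, d=256, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),   # (x1, y1, x2, y2)
        )

    def forward(self, D, r):  # D: (M, d) decoded targets, r: (C, d) class vectors
        probs = torch.softmax(-torch.cdist(D, r), dim=1)  # (M, C) class probabilities
        boxes = self.mlp(D).sigmoid()                     # (M, 4), normalized coordinates
        return probs, boxes
```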
Still further, step (5) is specifically:
(5-1) Compute the target classification loss from the query set's predicted class probabilities and the cross-entropy loss function, where y_{j,m,c} ∈ {0,1} is the ground-truth indicator that the m-th target of the j-th query sample belongs to the c-th class.
(5-2) Compute the bounding box loss from the query set's predicted target bounding box set, where B_{j,m} denotes the ground-truth bounding box of the m-th target of the j-th query sample and the loss includes the intersection-over-union of the ground-truth and predicted boxes.
(5-3) Sum the target classification loss and the bounding box loss into the total loss, optimize the small-sample target detection model composed of the coordinate attention guidance module, the group optimization module, and the query target prediction module with a stochastic gradient descent algorithm, and train iteratively until convergence to obtain the optimized model.
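A sketch of the loss of steps (5-1)–(5-3) follows. Because the head above outputs probabilities rather than logits, the classification term is negative log-likelihood; the exact box-loss mix (here L1 plus GIoU, as in DETR-style detectors) is an assumption, since the text only states that the intersection-over-union of true and predicted boxes enters the loss:

```python
import torch
import torch.nn.functional as F
from torchvision.ops import generalized_box_iou

def detection_loss(probs, boxes, gt_labels, gt_boxes):
    """probs: (M, C) class probabilities; boxes, gt_boxes: (M, 4) in
    (x1, y1, x2, y2) form; gt_labels: (M,) class ids."""
    cls_loss = F.nll_loss(torch.log(probs + 1e-8), gt_labels)  # cross entropy on probabilities
    l1 = F.l1_loss(boxes, gt_boxes)
    giou = 1.0 - torch.diag(generalized_box_iou(boxes, gt_boxes)).mean()
    return cls_loss + l1 + giou  # total loss of step (5-3)
```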
(5-4) Sample a new image data set to obtain a support set and a query set, input them into the optimized small-sample target detection model, execute steps (1)–(4) in order, output the class probabilities and bounding box sets of the targets in the query-set image samples, and select the class index with the highest probability as the predicted class.
The small-sample target detection method based on coordinate attention group optimization has the following characteristics: 1) it detects the targets in the query set directly with a transformer module, without depending on generated candidate boxes; 2) it uses coordinate attention to capture the spatial position features of support-set targets from the two spatial directions of the feature map and fuses these spatial features into the target parameter matrices, so that the detector can adaptively attend, according to the support set, to the query-set regions that contain targets; 3) it updates the support target class vectors with a label propagation algorithm inside the group optimization module, fully exploiting the similarity relations among different targets in the support set, so that the updated support target class vectors are more discriminative in the embedding space.
The invention is suited to target detection tasks in small-sample settings and has the following beneficial effects: 1) the bounding boxes and categories of query targets are predicted directly by the transformer module, avoiding the inaccurate candidate boxes generated when novel classes differ greatly from base classes; 2) coordinate attention captures the key spatial position features of the support targets and guides the detector to dynamically adjust its attention to the relevant regions of the query samples according to the current targets' spatial position features, effectively improving query-target localization accuracy; 3) the group optimization module fully exploits the similarity relations among the different targets of the support set to update the support target class vectors, obtaining more discriminative class features that help the model distinguish targets of different classes and thus improve query-target classification accuracy. The coordinate attention mechanism and group optimization mechanism provided by the invention markedly improve the performance of the small-sample target detection model and can be applied in practice to industrial product defect detection, rare-animal detection, and similar fields.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed Description
A small-sample target detection method based on coordinate attention group optimization proceeds as shown in Fig. 1: first, an image data set is sampled to obtain a support set and a query set, and a deep visual feature extractor produces a support feature map set and a query feature map set; next, the support feature map set and an initialized target parameter matrix set are fed into the coordinate attention guidance module to obtain a target fusion matrix set; then a group optimization module is built, in which the support feature map set is updated through the ground-truth target bounding boxes and a label propagation algorithm to obtain a support target class vector set; finally, the query feature map set, the target fusion matrix set, and the support target class vector set are fed into the query target prediction module to obtain the bounding boxes and categories of the targets in the query-set samples. The method captures the spatial position features of the support-set targets with coordinate attention so that the detector can adaptively attend to the query-set regions containing targets, improving localization accuracy, while the group optimization module yields discriminative support target class vectors, improving classification accuracy.
The method first obtains an image data set containing target bounding boxes and category annotations, then performs the following operations:
Step (1): sample the image data set to obtain a support set and a query set, input both into the deep visual feature extraction module, and output a support feature map set and a query feature map set. Specifically:
(1-1) First scale the images in the data set to the same size, then randomly sample image samples without replacement to obtain a support set and a query set, where the superscript s denotes "support" and q denotes "query": N_s is the number of image samples in the support set, x_i^s the i-th support image sample, N_q the number of image samples in the query set, and x_j^q the j-th query image sample; every sample lies in the real number domain with image height H, image width W, and 3 RGB channels.
Each support image sample x_i^s carries an annotation listing, for each of its Φ_i targets, the category c_{i,φ} ∈ {1, …, C} of the φ-th target and the four-dimensional vector b_{i,φ} formed by the upper-left and lower-right corner coordinates of the φ-th target bounding box; the support set contains CK targets in total, CK = C × K, where C is the number of target classes and K the number of targets per class.
(1-2) Construct a deep visual feature extraction module consisting of a deep convolutional network and a two-dimensional convolutional layer, where the deep convolutional network is a 50-layer Residual Network (ResNet-50) pre-trained on the ImageNet data set and the two-dimensional convolutional layer has 1×1 kernels.
(1-3) Input the support set and the query set into the deep visual feature extraction module to obtain the support feature map set and the query feature map set, where H′, W′, and 256 denote the height, width, and channel number of a single feature map.
Step (2): construct the coordinate attention guidance module, whose input is the randomly initialized target parameter matrix set and the support feature map set and whose output is the target fusion matrix set. Specifically:
(2-1) Randomly initialize the target parameter matrix set {A_j}, where A_j is the target parameter matrix for the j-th query image sample, M is the number of targets to be detected in a single query image sample, and 256 is the row-vector dimension of a single target parameter matrix. Construct the coordinate attention guidance module from a coordinate attention sub-module and a cross attention sub-module: the coordinate attention sub-module computes a weight for every coordinate position along the horizontal and vertical directions of a feature map, so that positions containing target features receive larger weights, yielding the spatial position attention feature map set; the cross attention sub-module fuses the target parameter matrix set with the spatial position attention feature map set.
(2-2) For the i-th support feature map, perform average pooling along the horizontal coordinate direction and the vertical coordinate direction respectively ("hor" denotes horizontal, "ver" vertical) to obtain a horizontal feature map, whose value in the μ-th channel at coordinate position (h′, 1) averages the support feature map's μ-th channel across row h′, and a vertical feature map, whose value in the μ-th channel at coordinate position (1, w′) averages it down column w′, where 1 ≤ μ ≤ 256, 1 ≤ h′ ≤ H′, 1 ≤ w′ ≤ W′.
(2-3) Feed the horizontal and vertical feature maps in turn through a two-dimensional convolutional layer and an activation function layer to obtain the horizontal and vertical weight feature maps, where Conv1(·) denotes a two-dimensional convolutional layer with 1×1 kernels and σ(·) the Sigmoid activation function.
(2-4) Compute the spatial position attention feature map from the support feature map and the horizontal and vertical weight feature maps: its value in the μ-th channel at coordinate position (h′, w′) combines the support feature map value at (h′, w′) with the horizontal weight at (h′, 1) and the vertical weight at (1, w′) of the same channel.
(2-5) Apply steps (2-2)–(2-4) to all feature maps of the support feature map set to obtain the spatial position attention feature map set, then unfold all its feature maps along the spatial dimensions and concatenate them into the spatial target position features.
(2-6) Input each target parameter matrix A_j together with the spatial target position features into the cross attention sub-module to obtain the target fusion matrix set, where softmax(·) is the normalized exponential function and the superscript T denotes the transpose operation.
Step (3): construct the group optimization module, whose input is the support feature map set and the bounding-box annotations of the support set and whose output is the support target class vector set. Specifically:
(3-1) Construct the group optimization module, where a group is the set of support targets belonging to the same category: a support target probability matrix and a similarity matrix are computed from the support target class labels and the obtained support target vector set, and a group propagation matrix is derived from the two.
(3-2) Input the support feature map set obtained in step (1-3), together with every bounding box b_{i,φ} in the annotations of the support image samples obtained in step (1-1), into a Regions-of-Interest (ROI) pooling layer to obtain the support target feature map set, where the ROI pooling layer max-pools the feature map within each corresponding target bounding box region and O_{c,k} denotes the feature map of the k-th target of the c-th class in the set.
(3-3) Pass the support target feature map set through a convolution and a global average pooling operation in turn to obtain the support target vector set, where the feature vector of the k-th target of the c-th class is produced by Conv2(·), a two-dimensional convolutional layer with 3×3 kernels, followed by GAP(·), global average pooling over the spatial dimensions of the feature map.
(3-4) Compute the support target class vector set and from it the support target probability matrix P: the entry P_{u,v} in row u and column v is the probability that the u-th support target vector belongs to the v-th class, with 1 ≤ u ≤ CK, 1 ≤ v ≤ C, and u = (c−1)×K + k; it is obtained from exp(·), the exponential function with base e, applied to the negative Euclidean distance dist(·,·) between the u-th support target vector and each class vector, normalized over the classes.
(3-5) Use a Gaussian kernel function to compute the similarity matrix Z between support target vectors, and construct the group propagation matrix from Z and the support target probability matrix P: the entry Z_{u,w} in row u and column w is the Gaussian-kernel similarity between the u-th and w-th support target vectors, with 1 ≤ w ≤ CK and kernel hyperparameter γ > 0; in the group propagation matrix Λ = ZP, the entry Λ_{u,v} is the probability that the u-th support target belongs to the v-th class, obtained as a similarity-weighted sum over samples.
(3-6) Initialize an iteration upper limit Ψ (100 ≤ Ψ ≤ 120) and iteratively optimize the support target probability matrix P with the group propagation matrix Λ, the similarity matrix Z, and the label propagation algorithm until the number of iterations reaches Ψ, yielding the updated matrix P^(Ψ); at initialization P^(0) = P and Λ^(0) = Λ, and at the ψ-th iteration the label propagation update recomputes Λ^(ψ) from Z and P^(ψ) and uses it to produce P^(ψ+1).
(3-7) Use the updated support target probability matrix P^(Ψ) and the support target vector set to compute the new support target class vector set, where the class vector of the c-th class weights the support target vectors by P^(Ψ)_{u,c}, the entry of P^(Ψ) in row u and column c, i.e. the updated probability that the u-th support target vector belongs to the c-th class.
Step (4): construct the query target prediction module, whose input is the query feature map set, the target fusion matrix set, and the support target class vector set and whose output is the predicted query target bounding boxes and class probabilities. Specifically:
(4-1) Construct the query target prediction module from a transformer (Transformer) sub-module, a target classification function, and a bounding box prediction sub-module.
Unfold each feature map of the query feature map set along the spatial dimensions to obtain the query feature matrix set, and compute the position encoding matrix E of the query samples from the matrix G, whose entry in row κ and column ω is a sinusoid of κ whose form depends on the parity of ω, with 1 ≤ κ ≤ H′×W′, 1 ≤ ω ≤ 256, and mod denoting the remainder operation.
(4-2) Input the query feature matrix set and the position encoding matrix E into the encoder of the transformer sub-module to obtain the query target encoding feature set; the encoder consists of one attention layer and two fully connected layers, the j-th query target encoding feature being produced with an element-by-element addition of features and position encodings and FFN(·), a feed-forward neural network composed of the two fully connected layers.
(4-3) Input the query target encoding feature set and the target fusion matrix set obtained in step (2-6) into the decoder of the transformer sub-module to obtain the query target decoding feature set; the decoder consists of two attention layers and two fully connected layers, the j-th query target decoding feature being computed from the intermediate result matrix produced by the first attention layer.
(4-4) From the query target decoding feature set and the support target class vector set obtained in step (3-7), compute the query set's predicted class probabilities: the probability that the m-th target of the j-th query sample belongs to the c-th class (1 ≤ m ≤ M) is computed from the m-th row vector of the decoding feature matrix and the c-th support target class vector.
(4-5) Input the query target decoding feature set into the bounding box prediction sub-module, a multi-layer perceptron of three fully connected layers, to obtain the query set's predicted target bounding box set; the m-th row of the j-th query sample's predicted bounding box matrix is the predicted bounding box of its m-th target, given by the upper-left and lower-right corner coordinates.
Step (5): optimize the small-sample target detection model composed of the coordinate attention guidance module, the group optimization module, and the query target prediction module with a stochastic gradient descent algorithm, and for a new support set and query set obtain the target bounding boxes and categories of the query-set images through steps (1)–(4). Specifically:
(5-1) Compute the target classification loss from the query set's predicted class probabilities and the cross-entropy loss function, where y_{j,m,c} ∈ {0,1} is the ground-truth indicator that the m-th target of the j-th query sample belongs to the c-th class.
(5-2) Compute the bounding box loss from the query set's predicted target bounding box set, where B_{j,m} denotes the ground-truth bounding box of the m-th target of the j-th query sample and the loss includes the intersection-over-union of the ground-truth and predicted boxes.
(5-3) Sum the target classification loss and the bounding box loss into the total loss, optimize the model with a stochastic gradient descent algorithm, and train iteratively until convergence to obtain the optimized small-sample target detection model.
(5-4) Sample a new image data set to obtain a support set and a query set, input them into the optimized model, execute steps (1)–(4) in order, output the class probabilities and bounding box sets of the targets in the query-set image samples, and select the class index with the highest probability as the predicted class.
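To tie the modules together, a minimal training-loop sketch for step (5-3) is given below. The interface of `model` (bundling the feature extractor, coordinate attention guidance, group optimization, and query prediction modules and returning the total episode loss) is an assumption for the example:

```python
import torch

def train(model, episodes, epochs=10, lr=1e-3):
    """Stochastic gradient descent over per-episode total losses."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for support, query in episodes:
            loss = model(support, query)  # total loss from steps (5-1)-(5-3)
            opt.zero_grad()
            loss.backward()
            opt.step()
```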
The description of the present embodiment is merely an enumeration of implementation forms of the inventive concept; the scope of protection of the present invention should not be construed as limited to the specific forms set forth in the embodiments, but also covers equivalent technical means conceivable by those skilled in the art according to the inventive concept.

Claims (6)

Translated fromChinese
1.基于坐标注意力群组优化的小样本目标检测方法,其特征在于,首先获取含有目标边界框和类别标注的图像数据集合,然后进行如下操作:1. A small sample target detection method based on coordinate attention group optimization is characterized in that a set of image data containing target bounding boxes and category annotations is first obtained, and then the following operations are performed:步骤(1)对图像数据集进行采样,获得支持集和查询集,将两者输入到深度视觉特征提取模块,输出支持特征图集合和查询特征图集合;Step (1) sampling the image data set to obtain a support set and a query set, inputting the two into a deep visual feature extraction module, and outputting a support feature map set and a query feature map set;步骤(2)构建坐标注意力引导模块,输入为随机初始化的目标参数矩阵集合和支持特征图集合,输出为目标融合矩阵集合;Step (2) constructs a coordinate attention guidance module, the input of which is a randomly initialized target parameter matrix set and a support feature map set, and the output is a target fusion matrix set;步骤(3)构建群组优化模块,输入为支持特征图集合和支持集对应的边界框标注,输出为支持目标类向量集合;Step (3) constructing a group optimization module, the input of which is the support feature map set and the bounding box annotation corresponding to the support set, and the output is the support target class vector set;步骤(4)构建查询目标预测模块,输入为查询特征图集合、目标融合矩阵集合和支持目标类向量集合,输出为预测的查询目标边界框和类别概率;Step (4) constructing a query target prediction module, the input of which is a query feature map set, a target fusion matrix set and a support target class vector set, and the output is a predicted query target bounding box and class probability;步骤(5)利用随机梯度下降算法优化由坐标注意力引导模块、群组优化模块和查询目标预测模块组成的小样本目标检测模型,对新的支持集和查询集通过步骤(1)~(4)得到查询集中图像的目标边界框和类别。Step (5) uses a stochastic gradient descent algorithm to optimize a small sample target detection model consisting of a coordinate attention guidance module, a group optimization module, and a query target prediction module, and obtains the target bounding box and category of the image in the query set through steps (1) to (4) for the new support set and query set.2.如权利要求1所述的基于坐标注意力群组优化的小样本目标检测方法,其特征在于,步骤(1)具体是:2. The small sample target detection method based on coordinate attention group optimization as claimed in claim 1, characterized in that step (1) specifically comprises:(1-1)首先将数据集中的图像缩放到相同大小,对图像样本进行不放回随机采样,得到支持集和查询集其中,为实数域,Ns表示支持集中的图像样本个数,表示第i个支持图像样本,Nq表示查询集中的图像样本个数,表示第j个查询图像样本,H表示图像高度,W表示图像宽度,3表示RGB通道数量;(1-1) First, scale the images in the dataset to the same size and perform random sampling without replacement to obtain the support set. 
and queryset in, is the real number domain,Ns represents the number of image samples in the support set, represents the i-th support image sample, Nq represents the number of image samples in the query set, represents the j-th query image sample, H represents the image height, W represents the image width, and 3 represents the number of RGB channels;每个支持图像样本有标注其中,Φi表示中的目标数,ci,φ∈{1,…,C}表示中第φ个目标的类别,表示中第φ个目标边界框的左上角和右下角坐标组成的四维向量,支持集中共包含CK个目标,CK=C×K,即C表示目标类别数,K表示每类目标数;Each support image sample With annotation Among them, Φi represents The number of targets in , ci,φ∈ {1,…,C} represents The category of the φth target in , express The four-dimensional vector consisting of the coordinates of the upper left corner and lower right corner of the φth target bounding box in the support set The CPC contains CK targets, CK = C × K, that is, C represents the number of target categories, and K represents the number of targets in each category;(1-2)构建由深度卷积网络和一个二维卷积层组成的深度视觉特征提取模块,其中,深度卷积网络是在ImageNet数据集预训练的50层残差网络ResNet-50,二维卷积层的卷积核尺寸为1×1;(1-2) Construct a deep visual feature extraction module consisting of a deep convolutional network and a two-dimensional convolutional layer, where the deep convolutional network is a 50-layer residual network ResNet-50 pre-trained on the ImageNet dataset, and the convolution kernel size of the two-dimensional convolutional layer is 1×1;(1-3)将支持集和查询集输入到深度视觉特征提取模块,得到支持特征图集合和查询特征图集合其中,表示第i个支持特征图,表示第j个查询特征图,H′、W′和256分别表示单个特征图的高、宽和通道数,(1-3) will support the set and queryset Input into the deep visual feature extraction module to obtain a set of supporting feature maps and query feature graph collection in, represents the i-th support feature map, represents the j-th query feature map, H′, W′ and 256 represent the height, width and number of channels of a single feature map respectively.3.如权利要求2所述的基于坐标注意力群组优化的小样本目标检测方法,其特征在于,步骤(2)具体是:3. 
The small sample target detection method based on coordinate attention group optimization as claimed in claim 2, characterized in that step (2) specifically comprises:(2-1)随机初始化,得到目标参数矩阵集合Aj表示第j个查询图像样本对应的目标参数矩阵,M表示单个查询图像样本中待检测的目标个数,256表示单个目标参数矩阵中行向量维度;构建坐标注意力引导模块,所述坐标注意力引导模块由坐标注意力子模块和交叉注意力子模块组成,其中坐标注意力子模块从特征图的水平和垂直两个方向计算各个坐标位置的权重,得到空间位置注意特征图集合,交叉注意力子模块用于融合目标参数矩阵集合与空间位置注意特征图集合;(2-1) Random initialization to obtain the target parameter matrix set Aj represents the target parameter matrix corresponding to the j-th query image sample, M represents the number of targets to be detected in a single query image sample, and 256 represents the row vector dimension in a single target parameter matrix; construct a coordinate attention guidance module, the coordinate attention guidance module consists of a coordinate attention submodule and a cross attention submodule, wherein the coordinate attention submodule calculates the weight of each coordinate position from the horizontal and vertical directions of the feature map to obtain a set of spatial position attention feature maps, and the cross attention submodule is used to fuse the target parameter matrix set with the set of spatial position attention feature maps;(2-2)对第i个支持特征图分别沿水平坐标方向和垂直坐标方向做平均池化操作,得到水平特征图和垂直特征图表示张量第μ个通道中坐标位置为(h′,w′)处的值,1≤μ≤256,1≤h′≤H′,1≤w′≤W′,则:张量第μ个通道中坐标位置为(h′,1)处的值张量第μ个通道中坐标位置为(1,w′)处的值(2-2) For the i-th support feature map Perform average pooling operations along the horizontal coordinate direction and the vertical coordinate direction to obtain the horizontal feature map and vertical feature map Representing a tensor The value at the coordinate position (h′, w′) in the μth channel, 1≤μ≤256, 1≤h′≤H′, 1≤w′≤W′, then: The value at the coordinate position (h′,1) in the μth channel Tensor The value at the coordinate position (1,w′) in the μth channel(2-3)将水平特征图和垂直特征图依次输入二维卷积层和激活函数层,得到水平权重特征图和垂直权重特征图其中Conv1(·)表示卷积核尺寸为1×1的二维卷积层,σ(·)表示Sigmoid激活函数;(2-3) The horizontal feature map and vertical feature map Input the two-dimensional convolution layer and activation function layer in sequence to obtain the horizontal weight feature map And vertical weight feature map Where Conv1 (·) represents a two-dimensional convolutional layer with a convolution kernel size of 1×1, and σ(·) represents the Sigmoid activation function;(2-4)根据支持特征图水平权重特征图和垂直权重特征图计算得到空间位置注意特征图张量第μ个通道中坐标位置为(h′,w′)处的值表示张量第μ个通道中坐标位置为(h′,1)处的值,表示张量第μ个通道中坐标位置为(1,w′)处的值;(2-4) According to the support feature map Horizontal weight feature map And vertical weight feature map Calculate the spatial position attention feature map Tensor The value at coordinate (h′, w′) in the μth channel Representing a tensor The value at the coordinate position (h′,1) in the μth channel, Representing a tensor The value at the coordinate position (1, w′) in the μth channel;(2-5)对支持特征图集合中所有特征图执行步骤(2-2)~(2-4),得到空间位置注意特征图集合将集合中所有特征图的按空间维度展开并拼接,得到空间目标位置特征(2-5) Support feature map set All feature maps in Execute steps (2-2) to (2-4) to obtain a set of spatial position attention feature maps Will gather All feature maps in Expand and concatenate according to the spatial dimension to obtain the spatial target position feature(2-6)将中每个目标参数矩阵Aj与空间目标位置特征输入到交叉注意力子模块中,得到目标融合矩阵集合softmax(·)为归一化指数函数,上标T表示转置操作。(2-6) Each target parameter matrix Aj in the spatial target position feature Input into the cross attention submodule to obtain the target fusion matrix set Softmax(·) is a normalized exponential function, and the superscript T indicates the transposition 
operation.4.如权利要求3所述的基于坐标注意力群组优化的小样本目标检测方法,其特征在于,步骤(3)具体是:4. The small sample target detection method based on coordinate attention group optimization as claimed in claim 3, characterized in that step (3) specifically comprises:(3-1)构建群组优化模块,群组是属于同一类别的支持目标集合,根据支持目标类标记和得到的支持目标向量集合计算支持目标概率矩阵和相似度矩阵,并通过支持目标概率矩阵和相似度矩阵得到群组传播矩阵;(3-1) Construct a group optimization module. A group is a set of support targets belonging to the same category. The support target probability matrix and similarity matrix are calculated according to the support target class labels and the obtained support target vector set, and the group propagation matrix is obtained through the support target probability matrix and the similarity matrix.(3-2)将步骤(1-3)中得到的支持特征图集合和步骤(1-1)中得到的每个支持图像样本对应标注中每个边界框bi,φ输入到感兴趣区域池化层中,得到支持目标特征图集合其中感兴趣区域池化层表示特征图在对应目标边界框区域内做最大池化操作,Oc,k表示集合中第c类中第k个目标的特征图;(3-2) Set the support feature map obtained in step (1-3) And each support image sample obtained in step (1-1) Corresponding annotation Each bounding boxbi,φ in is input into the region of interest pooling layer to obtain a set of supporting target feature maps The region of interest pooling layer represents the maximum pooling operation of the feature map in the corresponding target bounding box area, and Oc,k represents the feature map of the kth target in the cth class in the set;(3-3)对支持目标特征图集合依次通过卷积和全局平均池化操作,得到支持目标向量集合其中,集合中第c类中第k个目标的特征向量GAP(·)表示在特征图空间维度上的全局平均池化操作,Conv2(·)表示卷积核尺寸为3×3的二维卷积层;(3-3) Support target feature map set Through convolution and global average pooling operations in sequence, we get the set of support target vectors. Among them, the feature vector of the kth target in the cth class in the set GAP(·) represents the global average pooling operation in the spatial dimension of the feature map, and Conv2 (·) represents a two-dimensional convolutional layer with a convolution kernel size of 3×3;(3-4)计算支持目标类向量集合通过支持目标类向量集合得到支持目标概率矩阵其中,集合中第c类的支持目标类向量矩阵P第u行第v列的值,表示第u个支持向量属于第v类的概率1≤u≤CK,1≤v≤C,u=(c-1)×K+k,表示集合中第u个支持目标向量,即exp(·)表示自然常数e为底的指数函数,dist(·,·)表示欧式距离函数,则(3-4) Calculate the set of support target class vectors The support target probability matrix is obtained by the support target class vector set. Among them, the support target class vector of the cth class in the set The value of the u-th row and v-th column of the matrix P represents the probability that the u-th support vector belongs to the v-th class. 
1≤u≤CK, 1≤v≤C, u=(c-1)×K+k, Representing a collection The u-th support target vector in exp(·) represents the exponential function with the natural constant e as the base, dist(·,·) represents the Euclidean distance function, then(3-5)利用高斯核函数计算支持目标向量间的相似度矩阵通过相似度矩阵Z和支持目标概率矩阵P构建群组传播矩阵其中,矩阵Z第u行第w列的值,表示支持目标向量间的相似度1≤w≤CK,高斯核函数中的超参数γ>0,表示集合中第w个支持目标向量,群组传播矩阵Λ=ZP,Λ中第u行第v列的值Λu,v表示支持目标向量根据样本间相似度加权求和得到的属于第v类的概率;(3-5) Use the Gaussian kernel function to calculate the similarity matrix between support target vectors Construct the group propagation matrix through the similarity matrix Z and the support target probability matrix P Among them, the value of the uth row and wth column of the matrix Z represents the support target vector and Similarity between 1≤w≤CK, the hyperparameter γ in the Gaussian kernel function is greater than 0, Representing a collection The w-th supporting target vector in the group propagation matrix Λ=ZP, and the value Λu,v in the u-th row and v-th column in Λ represents the supporting target vector The probability of belonging to the vth class is obtained by weighted summation of similarities between samples;(3-6)初始化迭代上限值Ψ,通过群组传播矩阵Λ、相似度矩阵Z和标签传播算法迭代优化支持目标概率矩阵P,直至迭代次数达到上限值Ψ,得到更新后的支持目标概率矩阵P(Ψ),初始迭代时P(0)=P,Λ(0)=Λ,标签传播算法迭代为100≤ψ≤120,表示第次迭代时的支持目标概率矩阵和群组传播矩阵,表示矩阵第u行第v列的值和矩阵第u行第v列的值,表示矩阵第u行第v列的值和第u行第c列的值;(3-6) Initialize the iteration upper limit value Ψ, and iteratively optimize the support target probability matrix P through the group propagation matrix Λ, the similarity matrix Z and the label propagation algorithm until the number of iterations reaches the upper limit value Ψ, and obtain the updated support target probability matrix P(Ψ) . In the initial iteration, P(0) = P, Λ(0) = Λ, and the label propagation algorithm iteration is 100≤ψ≤120, and Indicates The support target probability matrix and group propagation matrix at the iteration, and Representation Matrix The value and matrix of row u and column v The value of row u and column v, and Representation Matrix The value of the uth row and vth column and the value of the uth row and cth column;(3-7)利用更新后的支持目标概率矩阵P(Ψ)和支持目标向量集合计算得到新的支持目标类向量集合其中,集合中第c类的支持目标类向量P(Ψ)表示第Ψ次迭代结束后的目标概率矩阵,为矩阵P(Ψ)第u行第c列的值,表示更新后第u个支持目标向量属于第c类的概率。(3-7) Using the updated support target probability matrix P(Ψ) and the support target vector set Calculate the new set of support target class vectors Among them, the support target class vector of the cth class in the set P(Ψ) represents the target probability matrix after the Ψth iteration. is the value of the u-th row and c-th column of the matrix P(Ψ) , indicating the u-th support target vector after the update The probability of belonging to class c.5.如权利要求4所述的基于坐标注意力群组优化的小样本目标检测方法,其特征在于,步骤(4)具体是:5. 
The small sample target detection method based on coordinate attention group optimization as described in claim 4, characterized in that step (4) specifically comprises:(4-1)构建查询目标预测模块,模块由转换器子模块、目标分类函数和边界框预测子模块组成;(4-1) Constructing a query target prediction module, which consists of a converter submodule, a target classification function, and a bounding box prediction submodule;将查询特征图集合中每个特征图沿空间维度展开得到查询特征矩阵集合计算查询样本的位置编码矩阵矩阵G第κ行第ω列的值1≤κ≤H′×W′,1≤ω≤256,mod表示取余数运算;Query the feature graph collection Each feature map is expanded along the spatial dimension to obtain the query feature matrix set Calculate the position encoding matrix of the query sample The value of the κth row and ωth column of the matrix G 1≤κ≤H′×W′, 1≤ω≤256, mod means remainder operation;(4-2)将查询特征矩阵集合和位置编码矩阵E输入到转换器子模块的编码器中,得到查询目标编码特征集合编码器由一个注意力层和两个全连接层组成,第j个查询目标编码特征表示逐元素相加操作,FFN(·)表示两个全连接层组成的前馈神经网络;(4-2) Set the query feature matrix And the position encoding matrix E is input into the encoder of the converter submodule to obtain the query target encoding feature set The encoder consists of an attention layer and two fully connected layers. The j-th query target encodes features represents the element-by-element addition operation, and FFN(·) represents a feed-forward neural network consisting of two fully connected layers;(4-3)将查询目标编码特征集合和步骤(2-6)中得到的目标融合矩阵集合输入到转换器子模块的解码器中,得到查询目标解码特征集合解码器由两个注意力层和两个全连接层组成,第j个查询目标解码特征表示输入到第一个注意力层得到的中间结果矩阵,(4-3) Set the query target encoding feature set And the target fusion matrix set obtained in step (2-6) Input into the decoder of the converter submodule to obtain the query target decoding feature set The decoder consists of two attention layers and two fully connected layers. The j-th query target decoding feature express The intermediate result matrix obtained by inputting into the first attention layer,(4-4)通过查询目标解码特征集合和步骤(3-6)得到的支持目标类向量集合计算查询集的目标预测类别概率,第j个查询样本中第m个目标属于第c类的概率1≤m≤M,表示矩阵第m行的向量;(4-4) Decoding feature sets by querying the target And the set of support target class vectors obtained in steps (3-6) Calculate the target prediction category probability of the query set, the probability that the mth target in the jth query sample belongs to the cth category 1≤m≤M, Representation Matrix The vector of the mth row;(4-5)将查询目标解码特征集合输入到边界框预测子模块得到查询集的预测目标边界框集合边界框预测子模块为三个全连层组成的多层感知器,第j个查询样本的预测边界框矩阵中第m行表示第j个查询样本中第m个目标的预测边界框,为边界框的左上角坐标,为边界框的右下角坐标。(4-5) Decode the query target feature set Input to the bounding box prediction submodule to obtain the predicted target bounding box set of the query set The bounding box prediction submodule is a multilayer perceptron consisting of three fully connected layers. The predicted bounding box matrix of the jth query sample is The mth row represents the predicted bounding box of the mth target in the jth query sample, is the coordinate of the upper left corner of the bounding box, The coordinates of the lower right corner of the bounding box.6.如权利要求5所述的基于坐标注意力群组优化的小样本目标检测方法,其特征在于,步骤(5)具体是:6. 
6. The small-sample target detection method based on coordinate attention group optimization according to claim 5, characterized in that step (5) specifically comprises:

(5-1) Computing the target classification loss from the predicted class probabilities of the query-set targets with the cross-entropy loss function, where yj,m,c∈{0,1} denotes the ground-truth label indicating whether the m-th target in the j-th query sample belongs to class c;

(5-2) Computing the bounding-box loss from the predicted target bounding-box set of the query set, where Bj,m denotes the ground-truth bounding box of the m-th target in the j-th query sample and the loss depends on the intersection-over-union between the ground-truth and predicted bounding boxes;

(5-3) Combining the target classification loss and the bounding-box loss into the total loss, optimizing the small-sample target detection model composed of the coordinate attention guidance module, the group optimization module, and the query target prediction module with the stochastic gradient descent algorithm, and training the model iteratively until convergence to obtain the optimized small-sample target detection model;

(5-4) Sampling a new image dataset to obtain a support set and a query set, feeding them into the optimized small-sample target detection model, executing steps (1)-(4) in sequence, outputting the target class probabilities and the bounding-box set of the image samples in the query set, and selecting the class index with the maximum probability as the predicted class.
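The claim fixes the loss ingredients (cross-entropy over class probabilities, ground-truth boxes, intersection-over-union) but the exact combination in (5-3) was lost with the formula images. The sketch below assumes the common mix of cross-entropy, an IoU term, and an L1 regression term with unit weights w_iou and w_l1; treat it as one plausible instantiation rather than the patent's definitive loss.

```python
import torch
import torch.nn.functional as F

def box_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """IoU of row-aligned box pairs, boxes given as (x1, y1, x2, y2)."""
    lt = torch.max(a[:, :2], b[:, :2])                 # intersection top-left
    rb = torch.min(a[:, 2:], b[:, 2:])                 # intersection bottom-right
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area = lambda x: (x[:, 2:] - x[:, :2]).clamp(min=0).prod(dim=1)
    return inter / (area(a) + area(b) - inter + 1e-7)

def total_loss(probs, labels, pred_boxes, gt_boxes, w_iou=1.0, w_l1=1.0):
    """Sketch of (5-1)-(5-3); the unit weights and the L1 term are assumed.
    probs: (M, C) predicted class probabilities from step (4-4);
    labels: (M,) ground-truth class ids; boxes: (M, 4) matched pairs."""
    cls_loss = F.nll_loss(torch.log(probs + 1e-7), labels)        # (5-1)
    iou_loss = (1.0 - box_iou(pred_boxes, gt_boxes)).mean()       # (5-2)
    l1_loss = F.l1_loss(pred_boxes, gt_boxes)
    return cls_loss + w_iou * iou_loss + w_l1 * l1_loss           # (5-3)
```

Per (5-3), this scalar would then be minimized with torch.optim.SGD over sampled episodes until the loss converges.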
CN202210697675.0A | 2022-06-20 | 2022-06-20 | Small-sample target detection method based on coordinate attention group optimization | Active | CN115019103B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210697675.0A (published as CN115019103B) | 2022-06-20 | 2022-06-20 | Small-sample target detection method based on coordinate attention group optimization

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210697675.0A (published as CN115019103B) | 2022-06-20 | 2022-06-20 | Small-sample target detection method based on coordinate attention group optimization

Publications (2)

Publication Number | Publication Date
CN115019103A (en) | 2022-09-06
CN115019103B (en) | 2025-02-14

Family

ID=83075723

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210697675.0A (Active, published as CN115019103B) | Small-sample target detection method based on coordinate attention group optimization | 2022-06-20 | 2022-06-20

Country Status (1)

Country | Link
CN (1) | CN115019103B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116091787B (en) * | 2022-10-08 | 2024-06-18 | Central South University | Small sample target detection method based on feature filtering and feature alignment
CN115578592B (en) * | 2022-10-18 | 2025-07-18 | The 38th Research Institute of China Electronics Technology Group Corporation | SAR image small sample target detection method and system based on meta learning
CN115984630B (en) * | 2023-02-10 | 2025-02-07 | Hangzhou Dianzi University | Small sample open set image recognition method based on low-dimensional contrast adaptation
CN116052108A (en) * | 2023-02-21 | 2023-05-02 | Zhejiang Gongshang University | Transformer-based traffic scene small sample target detection method and device
CN118628857A (en) * | 2023-03-07 | 2024-09-10 | Tencent Technology (Shenzhen) Co., Ltd. | Image annotation processing method, device, computer equipment and readable storage medium
CN115953665B (en) * | 2023-03-09 | 2023-06-02 | Wuhan Artificial Intelligence Research Institute | A target detection method, device, equipment and storage medium
CN116109907B (en) * | 2023-04-17 | 2023-08-18 | Chengdu Xumi Yuntu Architectural Design Co., Ltd. | Target detection method, target detection device, electronic equipment and storage medium
CN117036897B (en) * | 2023-05-29 | 2025-09-26 | North University of China | A few-shot object detection method based on Meta RCNN
CN118887547B (en) * | 2024-09-27 | 2024-12-27 | China University of Petroleum (East China) | Cross-domain small sample SAR oil spill detection method based on category perception distance

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112818903A (en) * | 2020-12-10 | 2021-05-18 | Beihang University | Small sample remote sensing image target detection method based on meta-learning and cooperative attention
CN112990282A (en) * | 2021-03-03 | 2021-06-18 | South China University of Technology | Method and device for classifying fine-grained small sample images

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112215223B (en) * | 2020-10-16 | 2024-03-19 | Tsinghua University | Multidirectional scene character recognition method and system based on multi-element attention mechanism

Also Published As

Publication number | Publication date
CN115019103A (en) | 2022-09-06

Similar Documents

Publication | Title
CN115019103B (en) | Small-sample target detection method based on coordinate attention group optimization
CN109977918B (en) | An Optimization Method for Object Detection and Localization Based on Unsupervised Domain Adaptation
CN111709311B (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion
CN117237733B (en) | Breast cancer full-slice image classification method combining self-supervision and weak supervision learning
Chen et al. | Research on recognition of fly species based on improved RetinaNet and CBAM
CN104573669B (en) | Image object detection method
CN106096561B (en) | Infrared pedestrian detection method based on image block deep learning features
CN113657414B (en) | Object identification method
CN105678284B (en) | A kind of fixed bit human body behavior analysis method
CN115457332B (en) | Image multi-label classification method based on graph convolutional neural network and class activation mapping
CN114692732B (en) | A method, system, device and storage medium for online label updating
CN109961089A (en) | Few-shot and zero-shot image classification methods based on metric learning and meta-learning
Bochinski et al. | Deep active learning for in situ plankton classification
CN117152416B (en) | A sparse attention target detection method based on improved DETR model
CN111931505A (en) | Cross-language entity alignment method based on subgraph embedding
CN110163117B (en) | Pedestrian re-identification method based on self-excitation discriminant feature learning
CN115311502A (en) | A small sample scene classification method for remote sensing images based on multi-scale dual-stream architecture
CN109753897A (en) | Behavior recognition method based on memory unit reinforcement-temporal dynamic learning
CN113222068A (en) | Remote sensing image multi-label classification method based on adjacency matrix guidance label embedding
CN117333948A (en) | An end-to-end multi-target broiler behavior recognition method integrating spatiotemporal attention mechanism
Rezatofighi et al. | Learn to predict sets using feed-forward neural networks
CN112364747B (en) | Target detection method under limited sample
CN108960005B (en) | Method and system for establishing and displaying object visual label in intelligent visual Internet of things
Ding et al. | DeoT: an end-to-end encoder-only Transformer object detector
CN105787045A (en) | Precision enhancing method for visual media semantic indexing

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
