Background
In recent years, with the rapid development of deep-learning-based computer vision, target detection has become a popular research direction and is widely applied in fields such as video surveillance, industrial inspection and medical treatment. Using computer vision in these fields is of practical significance because it reduces the consumption of manpower and material resources.
Target detection is a basic and important task on which image segmentation, object tracking, keypoint detection and similar tasks typically rely. In target detection, the number, size and pose of the objects differ from image to image, so the output is unstructured, which distinguishes it sharply from image classification.
In practical scenarios, however, deep-learning-based target detection is very sensitive to the scale of targets and to changes in that scale, especially when detecting small targets. There are three main reasons for this:
First, if the target to be detected is small in scale, features such as edge information and gray-level information are easily lost as the network deepens, so fewer features remain in the high-level semantic information; in addition, noise in the image may mislead the network into learning wrong features;
Second, the size of the receptive field mapped back to the original image plays an important role in whether detection succeeds. When the receptive field is small, more spatial structure features are preserved, but abstract semantic information may be scarce; conversely, when the receptive field is large, the preserved semantic information is richer, but the spatial structure information of the target may be lost;
Third, a convolutional neural network extracts features in a discrete manner, which makes sub-pixel accuracy difficult to achieve. An error of one pixel in a deep layer of the network may correspond to 8, 16 or more pixels in a shallow layer; this has little effect on large targets but a great effect on small targets. It is therefore important to improve the detection of small objects and to reduce the size of the model without decreasing accuracy.
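The effect described in the third point can be illustrated with simple arithmetic. The sketch below is only an illustration under assumed values: it takes square boxes and treats a one-cell error on a feature map with down-sampling factor 8 or 16 as a horizontal shift of 8 or 16 pixels in the input image, then computes the overlap (IoU) with the unshifted box. The function name and the box sizes are hypothetical and are not taken from the method itself.

```python
def iou_after_shift(box_size: float, shift: float) -> float:
    """IoU between a square box and a copy of it shifted horizontally by `shift` pixels."""
    overlap_width = max(0.0, box_size - shift)
    intersection = overlap_width * box_size
    union = 2 * box_size * box_size - intersection
    return intersection / union


# Assumed down-sampling factors (strides) and assumed box side lengths in pixels.
for stride in (8, 16):
    for size in (16, 32, 256):
        print(f"stride={stride:2d}  box={size:3d}px  IoU={iou_after_shift(size, stride):.3f}")
```

Under these assumptions a 16-pixel box shifted by 16 pixels no longer overlaps its original position at all, while a 256-pixel box shifted by the same amount keeps an IoU above 0.88.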
At present, target detection methods for small targets mainly follow these directions:
First, following the idea of an image pyramid, the input image is scaled up or down to construct a pyramid whose image scale increases or decreases gradually from top to bottom, and a fixed-size window is then slid over each layer to detect the target of interest. However, since the image at every resolution must pass through the convolutional neural network, the amount of computation is large and detection is slow;
Second, image features are fused so that the semantic information of shallow features and the spatial structure information of deep features are both improved. However, since feature-level fusion uses extracted image features as the fusion information, many detail features are lost;
Third, the scale and distribution of the anchor frames are adjusted. In actual use, however, a large number of anchor frames are typically required to ensure sufficient overlap with the real frames, yet only a small portion of them actually overlap with the real frames, which creates a large imbalance between positive and negative anchor frames and slows training.
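The imbalance between positive and negative anchor frames mentioned in the third direction can be made concrete with a small count. The following is a minimal sketch under assumed settings: a 416 × 416 image, square anchors of side 16, 32 and 64 pixels placed on a grid with spacing 8, a single small real frame, and an IoU threshold of 0.5 for a positive anchor. All of these values and the function names are hypothetical.

```python
import numpy as np


def make_anchors(image_size, stride, sizes):
    """Square anchors of each size centred on every cell of a stride-spaced grid."""
    centers = np.arange(stride / 2, image_size, stride)
    cx, cy = np.meshgrid(centers, centers)
    cx, cy = cx.ravel(), cy.ravel()
    per_size = [
        np.stack([cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2], axis=1)
        for s in sizes
    ]
    return np.concatenate(per_size, axis=0)


def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)


anchors = make_anchors(image_size=416, stride=8, sizes=(16, 32, 64))
real_frame = np.array([198.0, 198.0, 214.0, 214.0])  # one assumed 16 x 16 small target
positive = iou(real_frame, anchors) >= 0.5

n_total = len(anchors)
n_positive = int(positive.sum())
print(f"{n_positive} positive of {n_total} anchors")
```

With these assumed settings only a tiny fraction of the anchors count as positive for the small target, which is the imbalance the text refers to.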
Existing research addresses only the detection of small targets as such; improving the robustness of the algorithm to changes in target scale while keeping small-target detection lightweight remains a difficult problem in target detection.
Disclosure of Invention
The invention aims to provide a small object target detection method and system based on a lightweight multi-scale attention mechanism, which improve the detection precision for small objects while reducing the size of the model, and which solve the problem that detection precision and a lightweight network cannot be achieved together in existing methods.
The invention solves the above problems by the following technical means:
The first aspect of the invention provides a small object target detection method based on a YOLOv4 lightweight multi-scale attention mechanism, which comprises the following steps:
Step 1, extracting features by using GhostNet as the backbone feature extraction network of the YOLOv4 target detection architecture;
Step 2, using a multi-scale attention module to capture, in both the spatial and channel dimensions, discriminative features in the small target image from the features extracted in step 1;
Step 3, applying the Soft-NMS algorithm to the feature map with discriminative small-target features obtained in step 2, to reduce the confidence of detection frames that overlap the current optimal detection frame.
A second aspect of the present invention provides a small object target detection system based on a YOLOv4 lightweight multi-scale attention mechanism, comprising:
A first feature extraction module, which performs feature extraction by using GhostNet as the backbone feature extraction network of the YOLOv4 target detection architecture;
A second feature extraction module, connected with the first feature extraction module, which uses the multi-scale attention module to capture, in both the spatial and channel dimensions, discriminative features in the small target image from the features extracted by the first feature extraction module;
A detection output module, connected with the second feature extraction module, which applies the Soft-NMS algorithm to the feature map output by the second feature extraction module to reduce the confidence of detection frames that overlap the current optimal detection frame.
A third aspect of the present invention provides a small object target detection apparatus, comprising:
Memory, and
A processor coupled to the memory, the processor configured to execute, based on instructions stored in the memory, the small object target detection method based on the YOLOv4 lightweight multi-scale attention mechanism.
A fourth aspect of the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the small object target detection method based on the YOLOv4 lightweight multi-scale attention mechanism.
Compared with the prior art, the invention has the beneficial effects that:
According to the invention, GhostNet is used as the backbone feature extraction network of the YOLOv4 target detection architecture, which lightens the network a first time while preserving precision. A multi-scale attention module is provided that lightens the network a second time and captures important discriminative features in small target images in both the spatial and channel dimensions. The Soft-NMS algorithm reduces the confidence of detection frames that overlap the current optimal detection frame. Small object classes in a picture can thus be obtained in real time, efficiently and accurately by modifying only a few parameters. Because the method can obtain the small object classes for images captured by different image acquisition devices and in different scenes, it is robust.
Detailed Description
In order that the above objects, features and advantages of the present invention may be more clearly understood, the invention is described in more detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the described embodiments are only some, not all, of the embodiments of the present invention, and all other embodiments obtained without inventive effort fall within the scope of the invention.
Example 1
This embodiment provides a small object target detection method based on a YOLOv4 lightweight multi-scale attention mechanism, which comprises the following steps:
Step 1, extracting features by using GhostNet as the backbone feature extraction network of the YOLOv4 target detection architecture;
As shown in fig. 1, the YOLOv4 target detection architecture includes:
Step 1.1, extracting preliminary features from the original image through the YOLOv4 target detection architecture with GhostNet as the backbone network;
Step 1.2, for the extracted preliminary features, conveying strong semantic features from top to bottom through the FPN layer and strong positioning features from bottom to top through the PAN structure, so that features from different backbone layers are aggregated for the different detection layers.
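The two aggregation directions of step 1.2 can be sketched as follows. This is a simplified illustration, not the architecture of fig. 1: it assumes three backbone feature maps that already share one channel count, fuses by element-wise addition, uses nearest-neighbour up-sampling and 2 × 2 max pooling for resizing, and omits all learned convolutions. The function names and the sizes 52, 26 and 13 are assumptions.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour up-sampling of an (h, w, c) feature map by a factor of 2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def downsample2x(x):
    """2 x 2 max pooling with stride 2 on an (h, w, c) feature map (h and w even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def fpn_pan(c3, c4, c5):
    """Top-down (FPN-style) pass followed by a bottom-up (PAN-style) pass."""
    # Top-down: deep, semantically strong features flow to the shallower levels.
    p5 = c5
    p4 = c4 + upsample2x(p5)
    p3 = c3 + upsample2x(p4)
    # Bottom-up: shallow, well-localised features flow back to the deeper levels.
    n3 = p3
    n4 = p4 + downsample2x(n3)
    n5 = p5 + downsample2x(n4)
    return n3, n4, n5


# Assumed backbone outputs at three resolutions with 4 channels each.
c3, c4, c5 = (np.ones((s, s, 4)) for s in (52, 26, 13))
n3, n4, n5 = fpn_pan(c3, c4, c5)
print(n3.shape, n4.shape, n5.shape)
```

After both passes every detection layer has received contributions from every backbone layer, which is the aggregation the step describes.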
Step 2, using a multi-scale attention module to capture, in both the spatial and channel dimensions, discriminative features in the small target image from the features extracted in step 1;
As shown in fig. 2, the specific steps of capturing discriminative features in small target images in both the spatial and channel dimensions using the multi-scale attention module are:
A spatial attention mechanism module and a channel attention mechanism module are constructed. The spatial attention mechanism module enables the convolutional neural network to learn efficiently which regions to focus on: spatial information in the original image is mapped to another space so that important features in the image are preserved, and by combining maximum pooling and average pooling, discriminative global and local features can be learned adaptively. The channel attention mechanism module assigns a weight to each of the feature maps of the n channels to represent the correlation between that channel's feature map and the important features; the larger the weight, the more important features the feature map of that channel contains;
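The two attention modules can be sketched in a few lines. The text does not give their exact layers, so the following is an assumed, simplified form: spatial attention pools across channels with both maximum and average pooling, mixes the two pooled maps with scalar weights and applies a sigmoid to obtain one weight per position; channel attention averages each channel over space and maps the result through a hypothetical learned matrix and a sigmoid to obtain one weight per channel. All names and parameters here are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def spatial_attention(x, w_max=1.0, w_avg=1.0, bias=0.0):
    """x: (h, w, c). Returns x reweighted by one attention value per spatial position."""
    pooled_max = x.max(axis=2, keepdims=True)    # maximum pooling across channels
    pooled_avg = x.mean(axis=2, keepdims=True)   # average pooling across channels
    mask = sigmoid(w_max * pooled_max + w_avg * pooled_avg + bias)  # (h, w, 1)
    return x * mask


def channel_attention(x, weight):
    """x: (h, w, c); weight: (c, c) hypothetical learned matrix. One weight per channel."""
    descriptor = x.mean(axis=(0, 1))        # (c,) global average of each channel
    scale = sigmoid(descriptor @ weight)    # (c,) larger value = more important channel
    return x * scale


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 3))
out_spatial = spatial_attention(x)
out_channel = channel_attention(x, rng.normal(size=(3, 3)))
print(out_spatial.shape, out_channel.shape)
```

Both functions keep the shape of the input and only rescale it, so they can be placed before or after other layers without changing tensor sizes.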
The constructed spatial attention mechanism module and channel attention mechanism module are combined to construct a multi-scale attention mechanism, which uses 4 branches to perform multi-scale feature extraction on the input feature map. The first branch uses a 1×1 convolution operation to adjust the channel number to match that of the feature maps output by the other three branches. The second branch uses two cascaded asymmetric convolution operations, 1×3 and 3×1, and the third branch uses two cascaded asymmetric convolution operations, 1×5 and 5×1; cascading two asymmetric convolutions effectively reduces the number of network parameters and allows more nonlinear activation layers to be introduced, improving the nonlinear learning capacity. The fourth branch first uses a 3×3 maximum pooling operation to extract feature texture and then performs a 1×1 convolution operation to adjust the channel number to match that of the feature maps output by the other three branches;
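The parameter saving from cascaded asymmetric convolutions can be checked by counting weights. The sketch below compares a square k × k convolution with a 1 × k convolution followed by a k × 1 convolution, assuming equal input and output channel counts and ignoring biases; the channel count of 64 and the function name are assumptions chosen only for illustration.

```python
def conv_params(kernel_h: int, kernel_w: int, c_in: int, c_out: int) -> int:
    """Number of weights in a convolution layer, biases ignored."""
    return kernel_h * kernel_w * c_in * c_out


c = 64  # assumed channel count, identical for input and output

for k in (3, 5):
    square = conv_params(k, k, c, c)
    cascaded = conv_params(1, k, c, c) + conv_params(k, 1, c, c)
    print(f"{k}x{k}: {square} weights, 1x{k} + {k}x1: {cascaded} weights, "
          f"ratio {cascaded / square:.2f}")
```

For a kernel size k the cascade needs 2k weights per channel pair instead of k², so the saving grows with the kernel size: two thirds of the weights for k = 3 and two fifths for k = 5.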
First, the input feature map tensor is fed into the spatial attention mechanism module, which adds spatial attention to obtain a feature map tensor S, wherein w, h and c are the width, height and channel number of the feature map respectively;
Then, 1 × 1 convolution kernels are used to perform a convolution operation on the feature map tensor S to obtain a feature map tensor D;
Then, the 4 branches of the multi-scale attention mechanism each perform multi-scale feature extraction on the feature map tensor D to obtain multi-scale feature map tensors P1, P2, P3 and P4; the feature map tensors P1, P2, P3 and P4 are fused by a Concat operation to obtain a feature map tensor Q; and the feature map tensor Q is fed into the channel attention mechanism module, which adds channel attention to obtain a feature map tensor C;
Finally, the feature map tensors S and C are fused by an Add operation to obtain a feature map tensor that serves as the output of the multi-scale attention mechanism.
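The data flow through the tensors S, D, P1 to P4, Q and C can be traced with a small NumPy sketch. Several details are assumptions made so that the shapes work out: each branch outputs c/4 channels so that Q, and therefore C, has the same c channels as S and the Add operation is valid; D has a hypothetical intermediate channel count; a ReLU sits between the cascaded asymmetric convolutions; and the attention functions are simplified stand-ins. Weights are random, so only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(z, 0.0)


def conv2d_same(x, k):
    """Zero-padded stride-1 convolution. x: (h, w, c_in); k: (kh, kw, c_in, c_out)."""
    kh, kw, _, c_out = k.shape
    ph, pw = kh // 2, kw // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros((h, w, c_out))
    for i in range(kh):
        for j in range(kw):
            out += xp[i:i + h, j:j + w, :] @ k[i, j]
    return out


def maxpool3x3_same(x):
    """3 x 3 max pooling with stride 1; padding never wins the maximum."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    return np.max([xp[i:i + h, j:j + w, :] for i in range(3) for j in range(3)], axis=0)


def spatial_attention(x):
    # Simplified stand-in: one weight per position from channel-wise max and mean.
    return x * sigmoid(x.max(axis=2, keepdims=True) + x.mean(axis=2, keepdims=True))


def channel_attention(x, weight):
    # Simplified stand-in: one weight per channel from its spatial average.
    return x * sigmoid(x.mean(axis=(0, 1)) @ weight)


def init(kh, kw, c_in, c_out):
    return rng.normal(scale=0.1, size=(kh, kw, c_in, c_out))


def multi_scale_attention(f, p):
    s = spatial_attention(f)                                      # S
    d = conv2d_same(s, p["reduce"])                               # D, 1x1 convolution
    p1 = conv2d_same(d, p["b1"])                                  # branch 1: 1x1
    p2 = conv2d_same(relu(conv2d_same(d, p["b2a"])), p["b2b"])    # branch 2: 1x3 then 3x1
    p3 = conv2d_same(relu(conv2d_same(d, p["b3a"])), p["b3b"])    # branch 3: 1x5 then 5x1
    p4 = conv2d_same(maxpool3x3_same(d), p["b4"])                 # branch 4: pool then 1x1
    q = np.concatenate([p1, p2, p3, p4], axis=2)                  # Q, Concat
    c = channel_attention(q, p["ca"])                             # C
    return s + c                                                  # Add


c_in, c_mid = 16, 8      # assumed channel counts
c_branch = c_in // 4     # assumed so that Concat restores c_in channels
params = {
    "reduce": init(1, 1, c_in, c_mid),
    "b1": init(1, 1, c_mid, c_branch),
    "b2a": init(1, 3, c_mid, c_branch), "b2b": init(3, 1, c_branch, c_branch),
    "b3a": init(1, 5, c_mid, c_branch), "b3b": init(5, 1, c_branch, c_branch),
    "b4": init(1, 1, c_mid, c_branch),
    "ca": rng.normal(scale=0.1, size=(c_in, c_in)),
}
feature_map = rng.normal(size=(13, 13, c_in))
output = multi_scale_attention(feature_map, params)
print(output.shape)
```

The output has the same width, height and channel number as the input, so under these assumptions the module can be inserted after the backbone without changing the shapes expected by the detection layers.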
Step 3, applying the Soft-NMS algorithm to the feature map with discriminative small-target features obtained in step 2, to reduce the confidence of detection frames that overlap the current optimal detection frame;
The decay formula by which the Soft-NMS algorithm reduces the confidence of a detection frame overlapping the current optimal detection frame is as follows:
Wherein Si is the confidence of detection frame bi, and the formula includes a parameter for adjusting the degree of attenuation.
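As an illustration of how such a decay operates, the sketch below assumes the Gaussian penalty form of Soft-NMS, s_i ← s_i · exp(-IoU(M, b_i)² / σ), where M is the current optimal detection frame and σ is taken to be the parameter that adjusts the degree of attenuation. The choice of this form, the function names and the default values of sigma and the score threshold are assumptions, not the formula of this embodiment.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping frames instead of deleting them.

    Returns the indices of the kept frames in selection order and the decayed scores.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size > 0:
        # Select the current optimal detection frame M.
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Decay every remaining score according to its overlap with M.
        overlaps = iou(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
        # Drop frames whose confidence has decayed below the threshold.
        remaining = remaining[scores[remaining] > score_threshold]
    return keep, scores


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
keep, decayed = soft_nms(boxes, scores)
print(keep, np.round(decayed, 3))
```

In this usage example the second frame overlaps the best frame heavily, so its confidence is reduced but the frame is retained, whereas hard NMS with a typical threshold would have removed it; the distant third frame keeps its original confidence.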
Effect comparison
By using the small object target detection method based on the YOLOv4 lightweight multi-scale attention mechanism provided by this embodiment, the accurate class of a small object can be obtained by detecting the image. Fig. 3 compares the performance of the method of this embodiment with different algorithms in different scenes: the first column shows the labelled pictures, the second column shows the detection results of adding GhostNet to YOLOv4, the third column shows the detection results of adding Soft-NMS to YOLOv4, and the fourth column shows the detection results of the present method. In the second and third columns, many small objects such as people, vehicles and animals are missed, whereas the method of this embodiment detects all of these small objects without missed detections. For small blurred targets, targets occluded front-to-back and densely distributed targets, the method of this embodiment detects the object classes accurately where the other algorithms do not. The results show that the method of this embodiment is superior to YOLOv4.
Example 2
This embodiment provides a small object target detection system based on a YOLOv4 lightweight multi-scale attention mechanism, comprising:
A first feature extraction module, which performs feature extraction by using GhostNet as the backbone feature extraction network of the YOLOv4 target detection architecture;
A second feature extraction module, connected with the first feature extraction module, which uses the multi-scale attention module to capture, in both the spatial and channel dimensions, discriminative features in the small target image from the features extracted by the first feature extraction module;
A detection output module, connected with the second feature extraction module, which applies the Soft-NMS algorithm to the feature map output by the second feature extraction module to reduce the confidence of detection frames that overlap the current optimal detection frame.
For the specific implementation of the system of this embodiment, refer to the method described in embodiment 1; it is not repeated here.
Example 3
The present embodiment provides a small object target detection apparatus, including:
Memory, and
A processor coupled to the memory, the processor configured to perform, based on instructions stored in the memory, the small object target detection method based on the YOLOv4 lightweight multi-scale attention mechanism described in embodiment 1.
The memory may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
The small object target detection apparatus may also include an input/output interface, a network interface, a storage interface, and the like. These interfaces, the memory and the processor may be connected by a bus, for example. The input/output interface provides a connection interface for input/output devices such as a display, a mouse, a keyboard and a touch screen. The network interface provides a connection interface for various networking devices. The storage interface provides a connection interface for external storage devices such as an SD card and a USB flash drive.
Example 4
This embodiment provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the small object target detection method based on the YOLOv4 lightweight multi-scale attention mechanism described in embodiment 1.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more non-transitory computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto; any variation or substitution that a person skilled in the art can readily conceive falls within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.