Underwater transparent biological detection based on fusion of an event camera and a color frame image
Technical Field
The invention relates to the technical field of underwater biological detection, in particular to a method for detecting underwater transparent organisms based on the fusion of an event camera and a color frame image.
Background
The United States has proposed that the 21st century is the century of the ocean, and the ocean plays an increasingly important role in international competition. Meanwhile, underwater organisms are closely associated with humans. The area of China's maritime territory is close to one third of its land area; in recent years, China's demand for the research and development of ocean resources has continuously expanded and its degree of utilization has continuously increased, making the development of underwater target detection technology ever more urgent and important. Underwater target detection has therefore become one of the important problems in marine research today.
Since the underwater environment is much more complex than the environment on land, a conventional camera struggles to capture underwater organisms that are moving rapidly, especially relatively transparent ones. Meanwhile, under underwater illumination attenuation, an underwater image acquired with an ordinary camera unavoidably suffers from a limited visible range, blurring, low contrast, non-uniform illumination, distorted colors and the like.
An event camera is a bio-inspired visual sensor whose working principle differs markedly from that of a frame-based camera. Instead of outputting intensity image frames at a constant rate, an event camera outputs only local pixel-level brightness changes (referred to as "events"): when the brightness change at a pixel exceeds a set threshold, the camera timestamps it with microsecond resolution and emits it into an asynchronous event stream. These unique advantages make the event camera well suited to underwater application scenes. By fusing the event frames generated by the event camera with the color frames generated by an ordinary frame-based camera, the invention combines the useful information in the two images and improves the utilization rate of image information.
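As an illustrative sketch only (not part of the invention), the event-generation model just described, in which a pixel fires an event when its log-brightness change exceeds a contrast threshold, can be simulated as follows; the function name and the threshold value C = 0.2 are assumptions:

```python
import numpy as np

def generate_events(frame_prev, frame_curr, t, C=0.2):
    """Emit an event at every pixel whose log-brightness change
    exceeds the contrast threshold C (hypothetical sketch)."""
    eps = 1e-6  # avoid log(0)
    dlog = np.log(frame_curr + eps) - np.log(frame_prev + eps)
    ys, xs = np.nonzero(np.abs(dlog) >= C)
    polarity = np.sign(dlog[ys, xs]).astype(int)  # +1 brighter, -1 darker
    # each event: (x, y, timestamp, polarity)
    return [(int(x), int(y), t, int(p)) for x, y, p in zip(xs, ys, polarity)]
```

A real event camera timestamps each event asynchronously with microsecond resolution; here a single timestamp t stands in for the whole inter-frame interval.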
Underwater target detection is the most widely used basic application task in underwater detection technology, and is an important method of automatic underwater data analysis. The YOLOX network is an existing object detection network that mainly comprises an input end, a backbone network, a neck network and a head output end. The input end acquires an input image and scales it to the input size required by the network. The backbone network extracts general image feature representations; CSPDarknet53 is used as the backbone. The neck network fuses feature maps of different scales using a spatial pyramid pooling network, and improves the feature extraction capability of the network using a top-down feature pyramid network and a bottom-up path aggregation network. The head output end outputs the target detection result.
Disclosure of Invention
The invention utilizes deep learning technology and aims to provide a method for detecting underwater transparent organisms based on the fusion of an event camera and a color frame image.
In underwater operation scenes, in particular underwater robot operation, an RGB image obtained with an ordinary camera commonly exhibits color fading, low contrast, blurred details and the like, and for fast-swimming underwater organisms an ordinary camera can hardly capture a clear motion form. An event camera asynchronously outputs events for changes in intensity, each comprising the coordinates of a pixel, the polarity of the intensity change and a timestamp. Because image-based object detection techniques are now mature, the events are first converted into images, after which the color frame images (APS) and event frame images (DVS) supplement each other's image information through an image fusion technique. The fused images are then divided proportionally into a training set, a validation set and a test set. The mainstream object detection network YOLOX is modified, and the fused images are fed into the modified YOLOX for training. Finally, an ablation experiment and a comparison experiment are carried out on the trained YOLOX model to verify the effectiveness of the improved model.
The overall framework of the method is shown in fig. 1 and can be divided into the following five steps: linearly fusing the event frames and the color frames to obtain fused images; dividing the fused images into a training set, a validation set and a test set according to a preset proportion; improving the YOLOX network; training the improved YOLOX network to obtain a detection model for underwater transparent organisms; and predicting the image to be detected using the trained YOLOX underwater transparent organism detection model.
(1) The event frames and the color frames are linearly fused to obtain a fused image
The event data comprise five parts: the abscissa x of the pixel, the ordinate y of the pixel, a polarity of +1 for an increase in luminance, a polarity of -1 for a decrease in luminance, and a timestamp. According to the coordinates and the polarity changes of the pixels, the event data within an accumulation time can be converted into an event image of the same size as the frame image. Because target detection algorithms for images are relatively mature at present, the event data of the DVS channel are converted into an image and then linearly fused with the APS color image.
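A minimal sketch of this accumulation step (the neutral-gray initialization at 128 and the ±50 step per event are assumptions, not from the patent) could look like:

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate events (x, y, timestamp, polarity) gathered during one
    accumulation window into a grayscale event image (illustrative sketch)."""
    frame = np.full((height, width), 128, dtype=np.int32)  # neutral gray
    for x, y, _t, p in events:
        frame[y, x] += 50 if p > 0 else -50  # ON events brighten, OFF darken
    return np.clip(frame, 0, 255).astype(np.uint8)
```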
(2) Dividing the fusion image into a training set, a verification set and a test set according to a preset proportion
The invention divides the fused images into a training set, a validation set and a test set in an 8:1:1 ratio, i.e. the 6497 images are divided into 5197 training images, 649 validation images and 651 test images.
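A sketch of such a split (the function name and seed are assumptions; with 6497 inputs it reproduces the 5197/649/651 counts above):

```python
import random

def split_dataset(paths, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle reproducibly and split into train/val/test by the 8:1:1 ratio."""
    paths = sorted(paths)                 # deterministic starting order
    random.Random(seed).shuffle(paths)
    n_train = int(len(paths) * ratios[0])
    n_val = int(len(paths) * ratios[1])
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```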
(3) Modifications to YOLOX networks
Although YOLOX is currently a mainstream object detection network, it still has room for improvement. The invention proposes three improvements to the feature fusion and loss function parts of the YOLOX network. First, an adaptive spatial feature fusion (ASFF) structure is added to the feature fusion part; the ASFF structure suppresses inconsistency between scales by learning spatial filtering information, adaptively adjusting the spatial weight of the features of each scale during fusion, which improves the scale invariance of the features. Second, the IoU loss used for the YOLOX localization loss is replaced by an α-IoU function, which can be used for accurate bounding box regression. Finally, owing to the complex and varied underwater environment, occlusion and overlap between organisms can make foreground positive samples and background negative samples hard to distinguish, and the resulting imbalance between the numbers of positive and negative samples can destabilize the model; the invention therefore changes the cross-entropy function of the confidence loss into Focal Loss to balance the numbers of positive and negative samples.
(4) Training the improved YOLOX network to obtain a detection model of the underwater transparent organism
The improved YOLOX model is trained with preset parameters to obtain a trained model. Whether the trained model meets the expected requirement is judged according to the evaluation result on the validation set; if it does, the model is saved as the optimal model; if it does not, the training parameters are adjusted and the judgment is repeated on the validation evaluation result until the requirement is met.
(5) Predicting the image to be detected using the YOLOX underwater transparent organism detection model
The test set is tested with the trained optimal model, the transparent underwater organisms are detected, and the corresponding precision values are obtained.
Compared with the prior art, the invention has the following beneficial effects and obvious advantages: an event camera can capture fast-moving objects well and clearly record the form of a moving object under dark illumination, while its low power consumption makes it well suited to underwater scenes. Therefore, for the problems that the underwater environment is harsh and an ordinary camera struggles to capture rapidly moving underwater organisms, the event stream generated by the event camera is converted into event frames, which are then fused with the RGB frames to enhance image contrast; at the same time, the YOLOX target detection algorithm is improved in three aspects to raise the detection precision of underwater transparent organisms, which helps to effectively protect scarce underwater organisms.
Drawings
FIG. 1 is a schematic diagram of the YOLOX network model of the present invention;
FIG. 2 is a schematic flow chart of the present invention;
FIG. 3 is a schematic diagram of an event camera of the present invention;
fig. 4 is a schematic diagram of the ASFF structure of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide an underwater transparent biological detection method based on fusion of an event camera and a color frame image, which can improve the accuracy of underwater transparent biological detection, thereby achieving the purpose of protecting underwater organisms.
Because the feature fusion part of the prior-art YOLOX model uses an FPN structure, whose fusion method resizes the different feature layers to the same size and then sums them, the scale inconsistency between feature layers increases the noise of the fused feature maps, leading to a poor detection effect. In addition, YOLOX uses IoU loss for localization; when a predicted box and a ground-truth box do not intersect, the IoU is 0 and the loss function provides no gradient. At the same time, for a one-stage network such as YOLOX, the numbers of positive and negative samples are extremely unbalanced, which leads to instability of the trained model.
In order to solve the drawbacks of the prior art, the present invention provides the following examples:
Step 1: an underwater transparent organism data set is acquired. The image data set is an underwater transparent organism detection data set published on the internet; the annotation files are in XML format and comprise the image size, the position coordinates of the target boxes and the category labels. The data set is divided into five categories: jellyfish, red jellyfish, cat fish, salad fish and beautiful white shrimp.
Step 2: the event frames and RGB frames of the underwater transparent organism data set are linearly fused by means of a function in OpenCV to obtain 6497 fused images, and the fused images are divided into a training set, a validation set and a test set in an 8:1:1 ratio.
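The patent does not name the OpenCV function; a natural candidate is cv2.addWeighted, whose pixel-wise blend can be sketched in plain NumPy as follows (the weight alpha = 0.6 is an assumption):

```python
import numpy as np

def linear_fuse(aps, dvs, alpha=0.6):
    """Linear fusion of an APS color frame with a DVS event frame,
    equivalent to cv2.addWeighted(aps, alpha, dvs, 1 - alpha, 0).
    Both inputs are assumed to share the same shape (H, W, 3)."""
    fused = alpha * aps.astype(np.float32) + (1.0 - alpha) * dvs.astype(np.float32)
    return np.clip(fused, 0, 255).astype(np.uint8)
```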
Step 3: based on the YOLOX network, an adaptive spatial feature fusion (ASFF) module is added. ASFF can be represented by the following formula:
y_ij^l = α_ij^l · x_ij^(1→l) + β_ij^l · x_ij^(2→l) + γ_ij^l · x_ij^(3→l),
wherein α_ij^l, β_ij^l and γ_ij^l are the spatial importance weights of the level-1 to level-3 feature maps that the network learns adaptively, with α_ij^l + β_ij^l + γ_ij^l = 1, and x_ij^(n→l) denotes the feature vector at position (i, j) of the feature map adjusted from level n (n = 1, 2, 3) to level l;
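The ASFF fusion step can be sketched with NumPy; here the three spatial weight maps are produced by a softmax over learned logits so that they sum to 1 at every location (in the real network the logits come from 1×1 convolutions, which are omitted from this sketch):

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def asff_fuse(x1, x2, x3, logits):
    """ASFF at one output level (sketch). x1..x3: (C, H, W) feature maps
    already resized to this level; logits: (3, H, W) learned weight logits."""
    w = softmax(logits, axis=0)   # alpha, beta, gamma; sum to 1 per pixel
    return w[0] * x1 + w[1] * x2 + w[2] * x3
```

With equal logits the weights are all 1/3, so the output is simply the mean of the three feature maps.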
Step 4: based on the YOLOX network, the IoU loss used in the localization loss is replaced by α-IoU, which can be used for accurate bounding box regression and is obtained by unified exponentiation of the existing IoU-based loss functions. Its basic form is:
L_α-IoU = 1 - IoU^α,
wherein α is a weight coefficient; by adjusting α, different levels of bounding box regression accuracy can be achieved more flexibly, and α = 3 was found to give the best effect in experiments.
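In its basic form L = 1 - IoU^α, the loss can be sketched as follows (boxes as (x1, y1, x2, y2); illustrative only, not the patent's implementation):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def alpha_iou_loss(pred, target, alpha=3.0):
    """alpha-IoU localization loss, L = 1 - IoU**alpha (alpha = 3 per the text)."""
    return 1.0 - iou(pred, target) ** alpha
```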
Step 5: based on the YOLOX network, the cross-entropy function of the confidence loss, BCELoss, is changed to the focal loss function, which is defined as follows:
L_fl = -α_t (1 - p_t)^γ log(p_t),
wherein α_t is a weighting factor that suppresses the imbalance between the numbers of positive and negative samples; γ is a focusing parameter, a balance coefficient that controls the weight of hard-to-classify samples, and γ = 2 gave the best effect in actual experiments; p_t is the predicted probability of the true class and reflects how difficult the sample is to classify.
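A scalar sketch of the focal loss defined above (α_t = 0.25 is an assumed default; γ = 2 as stated in the text); note how a hard sample (low p_t) is weighted far more heavily than an easy one:

```python
import numpy as np

def focal_loss(p, y, alpha_t=0.25, gamma=2.0):
    """Binary focal loss L = -alpha_t * (1 - p_t)**gamma * log(p_t).
    p: predicted foreground probability; y: label (1 positive, 0 negative)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)          # numerical safety
    p_t = np.where(y == 1, p, 1 - p)        # probability of the true class
    return float(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))
```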
To verify the effectiveness of the proposed method, the improved YOLOX is first trained using the fused images obtained by fusing event frames with color frames. The learning rate follows a cosine schedule with an initial value of 0.01, stochastic gradient descent is used as the optimizer, and training is stopped after 250 epochs; the loss reaches convergence at around 170 epochs.
Ablation experiments were carried out on the improved YOLOX algorithm, as shown in Table 1. It can be seen from the table that the improved YOLOX algorithm with the fused image as input achieves the highest accuracy, with an mAP 2.58% higher than that of the unimproved algorithm using only the color frame image as input. The improvement in accuracy demonstrates both the effectiveness of the improved YOLOX algorithm and the effectiveness of the fused image obtained by fusing event frames and color frames.
TABLE 1 ablation experiments of Yolox
Comparative experiments were carried out on the improved YOLOX algorithm. As shown in Table 2, the improved YOLOX algorithm is compared with other mainstream classical target detection algorithms; the improved YOLOX improves mAP by 2.18%, 5.65%, 4.75%, 2.47% and 2.58% over EfficientDet, Faster-RCNN, SSD, RetinaNet and YOLOX respectively, with the largest improvement in accuracy relative to Faster-RCNN.
TABLE 2 average precision value mAP for each class of YOLOX (%)
The improved YOLOX algorithm achieves a greatly improved accuracy, demonstrating its effectiveness and superiority. Therefore, fusing event frames with color frames and improving YOLOX can effectively raise the detection precision of underwater transparent organisms, so that they can be protected more effectively.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.