CN112801027A - Vehicle target detection method based on event camera - Google Patents

Vehicle target detection method based on event camera

Info

Publication number
CN112801027A
Authority
CN
China
Prior art keywords
dvs
aps
image
event
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110182127.XA
Other languages
Chinese (zh)
Other versions
CN112801027B (en)
Inventor
孙艳丰
刘萌允
齐娜
施云惠
尹宝才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110182127.XA
Publication of CN112801027A
Application granted
Publication of CN112801027B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a vehicle target detection method based on an event camera, studied using deep learning technology. An event camera can generate frames and event data asynchronously, which helps greatly in overcoming motion blur and extreme lighting conditions. First, events are converted into an event image; then the frame image and the event image are fed into a fusion convolutional neural network simultaneously, with additional convolutional layers for extracting features from the event image. A fusion module in the intermediate layers of the network fuses the features of the two. Finally, the loss function is redesigned to improve the effectiveness of vehicle target detection. The method compensates for the shortcomings of using only frame images for target detection in extreme scenes: by fusing event images with frame images in the fusion convolutional neural network, vehicle target detection in extreme scenes is enhanced.

Description

Vehicle target detection method based on event camera
Technical Field
The invention discloses a vehicle target detection method for extreme scenes, based on an event camera and deep learning technology. It belongs to the field of computer vision and particularly relates to technologies such as deep learning and target detection.
Background
With the rapid development of the automobile industry, autonomous driving technology has received extensive attention from academia and industry in recent years. Vehicle target detection is a challenging task in autonomous vehicle technology and an important application in autonomous driving and intelligent transportation systems, where it plays a key role. The purpose of vehicle target detection is to accurately locate the other vehicles in the surrounding environment and avoid accidents with them.
A great deal of current target detection research uses deep neural networks to enhance detection systems. These studies basically use frame-based cameras, i.e., Active Pixel Sensors (APS). Accordingly, many of the detected objects are stationary or slow-moving, and the lighting conditions are suitable. In practice, vehicles encounter a variety of complex and extreme scenarios. Under extreme lighting and fast motion, images from conventional frame-based cameras suffer from overexposure and motion blur, which poses a significant challenge to target detection.
Dynamic Vision Sensors (DVS) have the key features of high dynamic range and low latency. These characteristics enable them to capture environmental information and generate images faster than standard cameras. At the same time, they are not affected by motion blur, which complements frame cameras in extreme cases. Furthermore, their low latency and short response time can make autonomous vehicles more responsive. A Dynamic and Active Pixel Vision Sensor (DAVIS) can output regular gray-scale frames and asynchronous events through its APS and DVS channels, respectively. Regular gray-scale frames provide the main information for target detection, while asynchronous events provide information about fast motion and illumination changes. With this insight, detection performance can be improved by combining the two kinds of data.
In recent years, deep learning algorithms have achieved great success and are widely used in image classification and target detection. Deep neural networks have excellent feature extraction and strong learning capabilities, and can identify target categories and locate target positions in a recognition task. A Convolutional Neural Network (CNN) based on boundary regression can directly regress the position and class of a target from an input image without searching for candidate regions. However, this requires that the objects in the image fed into the CNN be sharp, whereas objects in images generated in extreme scenes may be blurred. Using a CNN alone on frame images generated in extreme scenes therefore cannot meet the requirement.
The proposed CNN-based vehicle detection method fuses the frame and event data output by a DAVIS camera. The event data are reconstructed into an image, the frame image and the event image are fed into a convolutional neural network simultaneously, and the features extracted from the event image are fused with those extracted from the frame image in the intermediate layers of the network through a fusion module. At the final detection layer, the loss function of the network is redesigned, adding a loss term for the DVS features. The dataset used in the experiments is a self-built vehicle target detection dataset (Dataset of APS and DVS, DAD). Comparison of different input modes shows that the vehicle detection results are significantly improved under different environmental conditions. Meanwhile, compared with other methods, such as networks using a single image input and networks using two kinds of data input simultaneously, the method proposed here achieves a significant improvement.
Disclosure of Invention
The invention provides a vehicle target detection method based on an event camera, using deep learning technology. Since an ordinary camera produces motion blur, overexposure, or under-exposure in fast-moving and extreme-brightness scenes, the event data generated by an event camera are used to enhance the detection effect. The event camera asynchronously outputs events for changes in brightness, each consisting of pixel coordinates, brightness polarity, and a timestamp, so the events are first converted into images; image-based target detection technology is mature, and detection on events is thus realized through image detection techniques. The frame image (APS) and the event image (DVS) are fed simultaneously into a fusion convolutional network framework (ADF) for convolution operations, with feature extraction and feature fusion performed inside the framework. In this way the features of both images are extracted, and the final features carry effective information from both. Finally, the loss function of the model is modified: a loss term for the DVS is added on top of the loss term computed on the APS alone. The overall framework of the method is shown in Fig. 1, and the method can be divided into the following four steps: converting the event data into an event image, extracting features with the overall fusion convolutional neural network framework, fusing features through the fusion module, and performing target detection on the extracted features through the detection layer.
(1) Converting event data into an event image
Considering that current target detection algorithms for images are relatively mature, the event data of the DVS channel are converted into an image and then sent into the network together with the APS image for target detection. The event data have five components: the pixel abscissa x, the pixel ordinate y, a polarity of +1 for a brightness increase, a polarity of -1 for a brightness decrease, and a timestamp. According to the changes in pixel coordinates and polarity, the event data within the accumulation time are converted into an event image of the same size as the frame image.
(2) Overall framework for feature extraction
The invention uses Darknet-53 as the basic framework and, on top of the convolution operations performed on the APS image alone, adds convolutional layers for extracting features from the DVS image. Because the data of the DVS channel are sparse, fewer convolutional layers are used to extract features at the different resolutions. As in Darknet-53, the DVS channel still uses successive 3×3 and 1×1 convolutional layers. The specific numbers of convolutional layers are shown in Table 1.
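As an illustration only, a lighter DVS feature-extraction branch built from successive 3×3 and 1×1 convolutions in the Darknet style might look like the following sketch; the channel widths and the number of blocks here are placeholder assumptions, not the configuration of Table 1.

```python
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, kernel_size, stride=1):
    """A Darknet-style convolution block (3x3 or 1x1) with batch norm and LeakyReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

# A lighter DVS branch: a stride-2 3x3 convolution between resolutions followed by
# alternating 1x1 / 3x3 blocks. Channel widths and block counts are placeholders.
dvs_branch = nn.Sequential(
    conv_bn_leaky(1, 32, 3),
    conv_bn_leaky(32, 64, 3, stride=2),   # downsample to the next resolution
    conv_bn_leaky(64, 32, 1),
    conv_bn_leaky(32, 64, 3),
)
```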
(3) Fusion module
In the network structure, a fusion module is designed with reference to ResNet. The fusion module extracts the main DVS features at different resolutions and then fuses them with the APS features of the same size, so as to guide the network to learn more detailed features of both the APS and the DVS simultaneously. The fusion module is shown in Fig. 2.
(4) Target detection on the extracted features through the detection layer
The loss function of the network is modified at the detection layer: the loss function for the APS features is a cross-entropy loss, including losses on coordinates, classes and confidence. On this basis, a cross-entropy loss is also used to compute the loss on the DVS features. Finally, the detection results of the APS and the DVS are combined. A result from the APS or the DVS alone may still be correct, and taking only the intersection of the two results would lose many correct detections. Taking the union of the two sets of results reduces errors and improves accuracy.
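As a sketch of taking the union of the two result sets (illustrative only; the patent does not specify the merging code), detections from the APS and DVS heads can be pooled and deduplicated with a standard IOU-based non-maximum suppression so that a box found by either channel is kept:

```python
def merge_detections(aps_boxes, dvs_boxes, iou_threshold=0.5):
    """Union of APS and DVS detections: pool both lists, sort by confidence, and
    greedily keep boxes that do not heavily overlap an already-kept box (NMS).
    Each box is (x1, y1, x2, y2, score); the format is an assumption."""

    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    pooled = sorted(aps_boxes + dvs_boxes, key=lambda b: b[4], reverse=True)
    kept = []
    for box in pooled:
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept
```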
Compared with the prior art, the invention has the following obvious advantages and beneficial effects:
Based on the APS images and DVS data generated by the event camera, the invention uses convolutional neural network technology to detect vehicles in extreme scenes. Compared with using only traditional APS images, the event data are converted into event images and recognized with mature deep learning methods. A fusion module is then added to the convolutional neural network to perform feature-level fusion of the two kinds of information. Finally, by modifying the loss function, the network's ability to identify targets when the image suffers from target blur, unsuitable illumination and similar problems is improved, and good results are obtained in extreme scenes.
Drawings
FIG. 1 is a block diagram of an overall network architecture;
FIG. 2 is a schematic diagram of a fusion module;
FIG. 3 is a graph of experimental results;
Detailed Description
Based on the above description, a specific implementation flow is given below; the protection scope of this patent is, however, not limited to this implementation flow.
Step 1: event data converted into event image
Based on the generation mechanism of the event, there are three reconstruction methods to convert the event into the frame. They are a fixed event number method, a leaky integrator method, and a fixed time interval method, respectively. In the present invention, it is an object to be able to detect fast moving objects. The event reconstruction is set to a fixed frame length of 10ms using a fixed time interval method. In each time interval, according to the pixel position generated by the event, at the corresponding pixel point generated with polarity, the event with the polarity increased is drawn as a white pixel, the event with the polarity decreased is drawn as a black pixel, and the background color of the image is gray. And finally generating an event image with the same size as the APS image.
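A minimal sketch of this reconstruction step, assuming events arrive as (x, y, polarity, timestamp) tuples with timestamps in microseconds (an assumption; the patent does not fix the units): events inside each fixed 10 ms window are drawn onto a gray background, white for increasing polarity and black for decreasing polarity.

```python
import numpy as np

def events_to_image(events, t_start, height, width, interval_us=10_000):
    """Accumulate the events falling in [t_start, t_start + interval_us) into one image.

    events: iterable of (x, y, polarity, timestamp), polarity in {+1, -1},
            timestamps assumed to be in microseconds (10 ms window = 10_000 us).
    Returns a uint8 image: gray (128) background, white (255) where brightness
    increased, black (0) where it decreased, the same size as the APS frame.
    """
    img = np.full((height, width), 128, dtype=np.uint8)   # gray background
    for x, y, polarity, t in events:
        if t_start <= t < t_start + interval_us:
            img[y, x] = 255 if polarity > 0 else 0
    return img
```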
Step 2: feature extraction via a network ensemble framework
APS images and DVS images are simultaneously input into a network framework, and features are extracted through respective 3 × 3 and 1 × 1 convolutional layers, except that the number of convolutional layers for extracting the features is different, and the DVS is less than that of the APS. The network (2) predicts the input APS image and also predicts the DVS image. Both APS and DVS images are divided into S × S grids, each grid predicts B bounding boxes, and predicts C classes altogether. Each bbox was introduced into the Gaussian model, predicting 8 coordinate values, μ _ x, ε _ x, μ _ y, ε _ y, μ _ w, ε _ w, μ _ h, ε _ h. A confidence score p is also predicted. So at the last input detection layer of the network is a tensor of 2 × S × B × (C + 9). The three size tensors of the APS channel and the three same size tensors of the DVS channel are fed into the detection layer, respectively.
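To make the tensor bookkeeping concrete, a small illustrative helper computes the per-scale output size implied above: each box carries 8 Gaussian coordinate values, one confidence score and C class scores, and the APS and DVS channels together contribute the leading factor of 2. The grid size, box count and class count in the example call are assumptions.

```python
def detection_tensor_shape(S, B, C):
    """Per-scale, per-channel output shape: an S x S grid, B boxes per cell, and
    C + 9 values per box (8 Gaussian coordinate values, 1 confidence score, C classes)."""
    return (S, S, B, C + 9)

# The APS and DVS channels together give the leading factor of 2 in 2 x S x S x B x (C + 9).
# Example with an assumed 13 x 13 grid, 3 prior boxes and a single 'vehicle' class:
print((2,) + detection_tensor_shape(S=13, B=3, C=1))  # (2, 13, 13, 3, 10)
```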
Step 3: Fusion module
After the APS and the DVS pass through their respective convolutional layers, the resulting features F_aps and F_dvs are fed into the fusion module. F_aps and F_dvs first undergo a given transformation operation T_c: F → U, F ∈ R, U ∈ R^(M×N×C), U = [u_1, u_2, …, u_C], yielding the transformed features U_aps and U_dvs, where u_c is the feature matrix of size M×N of the c-th of the C channels. Simply put, the T_c operation is taken to be a convolution.
After obtaining the transformed feature U_dvs, the global information of all channels in the feature is considered and compressed into a single channel to obtain the aggregated information z_c. This is done by the global average pooling operation T_sq(U_dvs), formally expressed as:
z_c = T_sq(U_dvs) = (1 / (M × N)) Σ_{i=1}^{M} Σ_{j=1}^{N} u_c(i, j)   (1)
where u_c(i, j) is the (i, j)-th value of the feature matrix. In order to make use of the aggregated information z_c from the squeeze operation, an excitation operation is performed: the convolutional feature information of the channels is fused and the channel-wise dependency s is obtained, i.e.:
s = T_ex(z, E) = δ(E_2 σ(E_1 z))   (2)
where σ denotes the ReLU activation function, δ denotes the sigmoid activation function, and E_1 and E_2 are two weights. This is implemented with two fully connected layers.
The activation s is then used to rescale U_aps through the T_scale operation, giving the feature block U′:
U′ = T_scale(U_aps, s) = U_aps · s   (3)
Finally, the DVS feature block is fused with the APS feature to obtain the final fused feature F′_aps:
F′_aps = T_fuse(U′, U_dvs)   (4)
A splicing (concatenation) operation is used in the specific implementation.
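A minimal PyTorch-style sketch of a fusion block consistent with equations (1)-(4): squeeze the DVS features by global average pooling, excite through two fully connected layers (ReLU then sigmoid), rescale the APS features channel-wise, and splice (concatenate) them with the DVS features. Class and argument names, and the reduction ratio, are assumptions for illustration, not the patent's implementation.

```python
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """SE-style fusion guided by the DVS branch, following equations (1)-(4)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.squeeze = nn.AdaptiveAvgPool2d(1)        # T_sq: global average pooling, eq. (1)
        self.excite = nn.Sequential(                  # T_ex: two fully connected layers, eq. (2)
            nn.Linear(channels, hidden),              # weight E1
            nn.ReLU(inplace=True),                    # sigma (ReLU)
            nn.Linear(hidden, channels),              # weight E2
            nn.Sigmoid(),                             # delta (sigmoid)
        )

    def forward(self, u_aps, u_dvs):
        b, c, _, _ = u_dvs.shape
        z = self.squeeze(u_dvs).view(b, c)            # aggregated information z_c
        s = self.excite(z).view(b, c, 1, 1)           # channel-wise dependency s
        u_prime = u_aps * s                           # T_scale: rescale the APS features, eq. (3)
        return torch.cat([u_prime, u_dvs], dim=1)     # splicing with the DVS features, eq. (4)
```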
Step 4: Target detection on the extracted features through the detection layer
In the same way as for the APS part, the DVS detection results are added at the detection layer; binary cross-entropy loss is applied to the objects and classes detected by the DVS, and the negative log-likelihood (NLL) loss of the coordinate box is:
L_x^DVS = -Σ_{i=1}^{W} Σ_{j=1}^{H} Σ_{k=1}^{K} γ_ijk log( N(x^G_ijk | μ_x(i, j, k), ε_x(i, j, k)) + ξ )   (5)
where L_x^DVS is the NLL loss of the x coordinate of the DVS, W and H are the numbers of grid cells along the width and height respectively, and K is the number of prior boxes. The outputs of the detection layer at the k-th prior box of grid (i, j) are μ_x(i, j, k), denoting the x coordinate, and ε_x(i, j, k), denoting the uncertainty of the x coordinate. x^G_ijk is the Ground Truth of the x coordinate, which is computed in Gaussian YOLOv3 from the width and height of the resized image and the k-th prior box. ξ is a fixed value of 10^-9. The losses of the remaining coordinates y, w and h are defined in the same way as for the x coordinate.
γ_ijk = ω_scale × δ^obj_ijk   (6)
ω_scale = 2 - w^G × h^G   (7)
ω_scale provides different weights according to the object size (w^G, h^G) during training. δ^obj_ijk in (6) is a parameter that is applied in the loss only when the prior box contains the anchor that best fits the current object; its value is 1 or 0, determined by the intersection over union (IOU) of the Ground Truth and the k-th prior box of grid (i, j).
(Equation (8), the confidence loss, is given as an image in the original.) The value of C_ijk depends on whether the bounding box of the grid cell fits the predicted object: if it fits, C_ijk = 1; otherwise, C_ijk = 0. τ_noobj indicates that the k-th prior box of the grid does not fit the target. The loss also involves a term representing the correct category and an indicator that the k-th prior box of the grid is not responsible for predicting the target.
The class loss is as follows. (Equation (9), the class loss, is given as an image in the original.) P_ij denotes the probability that the currently detected object is the correct object.
The loss function of the DVS part is:
L_DVS = (L_x + L_y + L_w + L_h) + L_conf + L_class   (10)
where L_DVS denotes the sum of the coordinate losses, class loss and confidence loss of the DVS channel.
L_APS and L_DVS are identical in form, so the overall network loss function is:
L = L_APS + L_DVS   (11)
By adding the loss function of the DVS channel, the model becomes more robust to data from extreme environments, and the accuracy of the algorithm is improved.
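For illustration, a hedged sketch of the per-coordinate Gaussian negative log-likelihood of equation (5) and the combination of equations (10)-(11); it treats the predicted uncertainty as a variance and assumes the predictions, ground truth and γ weights have already been gathered per prior box, which the patent does not spell out.

```python
import math
import torch

def gaussian_nll(mu, uncertainty, target, xi=1e-9):
    """Per-coordinate negative log-likelihood -log(N(target | mu, uncertainty) + xi),
    mirroring eq. (5); the predicted uncertainty is treated as a variance here."""
    pdf = torch.exp(-(target - mu) ** 2 / (2 * uncertainty)) / torch.sqrt(
        2 * math.pi * uncertainty
    )
    return -torch.log(pdf + xi)

def coordinate_loss(mu, uncertainty, target, gamma):
    """Gamma-weighted sum of NLL terms over grid cells and prior boxes; gamma
    combines the 0/1 object indicator with the size weight omega_scale (eq. (6)-(7))."""
    return (gamma * gaussian_nll(mu, uncertainty, target)).sum()

def total_loss(l_aps, l_dvs):
    """Overall network loss of eq. (11): L = L_APS + L_DVS, where each term sums
    the coordinate, confidence and class losses of its channel (eq. (10))."""
    return l_aps + l_dvs
```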
To verify the validity of the proposed scheme, experiments were first performed on a custom dataset. Comparative experiments were carried out for different input modes, such as inputting only the APS image, inputting only the DVS image, inputting a pixel-wise superposition of the APS and DVS images, and inputting both images simultaneously; the experimental results are shown in Table 2. The effect of the different input modes is further shown in Fig. 3, where each column corresponds to one input mode and four scenes (fast motion, over-bright illumination, dark illumination, and normal) were selected for each method. In a scene where the object moves rapidly, the DVS-only input can detect a fast-moving vehicle but may miss a relatively stationary one; conversely, the APS-only input can detect a relatively stationary vehicle but not a fast-moving one. The effect of inputting the superposition of APS and DVS pixels is comparable to that of inputting the APS image alone. When the two images are input simultaneously, vehicles are detected well whether they are moving rapidly or stationary. Under too-strong or too-dark illumination, neither the APS-only input nor the superimposed image gives a good detection result, whereas inputting the APS and DVS images simultaneously allows the two sets of features to be fused well, so that the DVS compensates for the shortcomings of the APS. The DVS-only input performs worst in the normal scene, because only brightness changes generate information, and regions without brightness change correspond to the background and cannot be recognized. In general, the method of fusing the two images inside the ADF network is significantly superior to the other methods.
At the same time, several state-of-the-art single-input networks were selected for comparison on the custom dataset, as shown in Table 3. As can be seen from the table, when the proposed model is given only a single image as input, its performance is not as good as that of the other networks, because the network itself is designed for dual input. When the model receives frames and events simultaneously, the experimental results improve, demonstrating the benefit of using event data for recognition.
In addition, on the PKU-DDD17-CAR dataset, the method is compared with the JDF network, which also takes both kinds of data as input; the results are shown in Table 4. The event data in the dataset are converted into images and then fed into the ADF network. The results of inputting only the frame image and of inputting the frame image and the event data simultaneously are compared respectively. Although the proposed network is inferior to the JDF network when only a frame image is input, it outperforms the JDF network when the two kinds of data are input simultaneously.
TABLE 1  Number of convolutional layers in the network framework
(Table rendered as an image in the original.)
TABLE 2  Experimental results on the custom dataset
(Table rendered as an image in the original.)
TABLE 3  Comparison with single-image-input networks
(Table rendered as an image in the original.)
TABLE 4  Comparison of two data inputs into different networks
(Table rendered as an image in the original.)

Claims (5)

(Translated from Chinese)
1. A vehicle target detection method based on an event camera, characterized in that: based on the APS images and DVS data generated by the event camera, convolutional neural network technology is used to detect vehicle targets in extreme scenes, and the event data are converted into event images; according to the changes in pixel coordinates and polarity, the event data within the accumulation time are converted into an event image of the same size as the frame image; using a mature convolutional neural network based on the darknet-53 framework, convolutional layers for extracting features from the DVS image are added on top of the convolution operations performed only on the APS image, the DVS channel still using successive 3×3 and 1×1 convolutional layers; a fusion module is then added to the convolutional neural network, which, after extracting DVS features at different resolutions, weights the APS features of the same size so as to guide the network to learn more detailed features of both the APS and the DVS simultaneously; the loss function of the network is modified at the detection layer, the loss function for the APS features being a cross-entropy loss including losses on coordinates, classes and confidence; the cross-entropy loss function is also used to compute the loss on the DVS features.
2. The event-camera-based vehicle target detection method according to claim 1, characterized in that the events are converted into images using the fixed-time-interval method; in order to achieve detection at a speed of 100 frames per second (FPS), the frame reconstruction is set to a fixed frame length of 10 ms; within each time interval, according to the pixel positions at which events are generated, events with increasing polarity are drawn as white pixels and events with decreasing polarity as black pixels at the corresponding pixels, and the background color of the image is gray; finally an event image of the same size as the APS image is generated.
3. The event-camera-based vehicle target detection method according to claim 1, characterized in that successive 3×3 and 1×1 convolutional layers for extracting features from the DVS image are added; the APS image and the DVS image are input into the network framework simultaneously and features are extracted through their respective 3×3 and 1×1 convolutional layers, the difference being that the numbers of feature-extraction convolutional layers differ, with fewer for the DVS than for the APS; while the network predicts on the input APS image, it also predicts on the DVS image; both the APS image and the DVS image are divided into S×S grids, each grid predicts B bounding boxes, and C classes are predicted in total; each bbox is fed into the Gaussian model, and 8 coordinate values are predicted: μ_x, ε_x, μ_y, ε_y, μ_w, ε_w, μ_h, ε_h; a confidence score p is also predicted; therefore what is fed into the last detection layer of the network is a tensor of 2 × S × S × B × (C + 9); the three tensors of different sizes from the APS channel and the three tensors of the same sizes from the DVS channel are fed into the detection layer respectively.
4. The event-camera-based vehicle target detection method according to claim 1, characterized in that the fusion module effectively fuses the two sets of features; after the APS and the DVS pass through their respective convolutional layers, the features F_aps and F_dvs are obtained and fed into the fusion module; F_aps and F_dvs first undergo a given transformation operation T_c: F → U, F ∈ R, U ∈ R^(M×N×C), U = [u_1, u_2, …, u_C], yielding the transformed features U_aps and U_dvs, where u_c is the feature matrix of size M×N of the c-th of the C channels; simply put, the T_c operation is taken to be a convolution;
after obtaining the transformed feature U_dvs, the global information of all channels in the feature is considered and compressed into a single channel to obtain the aggregated information z_c; this is done by the global average pooling operation T_sq(U_dvs), formally expressed as:
z_c = T_sq(U_dvs) = (1 / (M × N)) Σ_{i=1}^{M} Σ_{j=1}^{N} u_c(i, j)   (1)
where u_c(i, j) is the (i, j)-th value of the feature matrix; in order to make use of the aggregated information z_c from the squeeze operation, an excitation operation is performed, the convolutional feature information of the channels is fused, and the channel-wise dependency s is obtained, i.e.:
s = T_ex(z, E) = δ(E_2 σ(E_1 z))   (2)
where σ denotes the ReLU activation function, δ denotes the sigmoid activation function, and E_1 and E_2 are two weights; two fully connected layers are used to implement this operation;
the activation s is used to rescale U_aps through the T_scale operation, giving the feature block U′:
U′ = T_scale(U_aps, s) = U_aps · s   (3)
finally, the DVS feature block is fused with the APS feature to obtain the final fused feature F′_aps:
F′_aps = T_fuse(U′, U_dvs)   (4)
a splicing (concatenation) operation is used in the specific implementation.
5. The event-camera-based vehicle target detection method according to claim 1, characterized in that a loss term for the DVS features is added at the detection layer; in the same way as for the APS part, the DVS detection results are added at the detection layer, binary cross-entropy loss is applied to the objects and classes detected by the DVS, and the negative log-likelihood (NLL) loss of the coordinate box is:
L_x^DVS = -Σ_{i=1}^{W} Σ_{j=1}^{H} Σ_{k=1}^{K} γ_ijk log( N(x^G_ijk | μ_x(i, j, k), ε_x(i, j, k)) + ξ )   (5)
where L_x^DVS is the NLL loss of the x coordinate of the DVS; W and H are the numbers of grid cells along the width and height respectively, and K is the number of prior boxes; the outputs of the detection layer at the k-th prior box of grid (i, j) are μ_x(i, j, k), denoting the x coordinate, and ε_x(i, j, k), denoting the uncertainty of the x coordinate; x^G_ijk is the Ground Truth of the x coordinate, which is computed in Gaussian YOLOv3 from the width and height of the resized image and the k-th prior box; ξ is a fixed value of 10^-9; the losses of the remaining coordinates y, w and h are defined in the same way as for the x coordinate;
γ_ijk = ω_scale × δ^obj_ijk   (6)
ω_scale = 2 - w^G × h^G   (7)
ω_scale provides different weights according to the object size (w^G, h^G) during training; δ^obj_ijk in (6) is a parameter applied in the loss only when the prior box contains the anchor that best fits the current object; its value is 1 or 0, determined by the intersection over union (IOU) of the Ground Truth and the k-th prior box of grid (i, j);
(equation (8), the confidence loss, is given as an image in the original;) the value of C_ijk depends on whether the bounding box of the grid cell fits the predicted object: if it fits, C_ijk = 1, otherwise C_ijk = 0; τ_noobj indicates that the k-th prior box of the grid does not fit the target; the loss also involves a term representing the correct category and an indicator that the k-th prior box of the grid is not responsible for predicting the target;
the class loss is as follows (equation (9), the class loss, is given as an image in the original), where P_ij denotes the probability that the currently detected object is the correct object;
the loss function of the DVS part is:
L_DVS = (L_x + L_y + L_w + L_h) + L_conf + L_class   (10)
where L_DVS denotes the sum of the coordinate losses, class loss and confidence loss of the DVS channel;
L_APS and L_DVS are identical in form, so the loss function of the entire network is:
L = L_APS + L_DVS   (11).
Priority Applications (1)

Application Number: CN202110182127.XA — Priority date: 2021-02-09 — Filing date: 2021-02-09 — Title: Vehicle target detection method based on event camera — Status: Active — Granted publication: CN112801027B (en)
Publications (2)

Publication Number — Publication Date
CN112801027A — 2021-05-14
CN112801027B (en) — 2024-07-12
