Technical Field
The present invention relates to image detection technology, and in particular to a target detection method, apparatus, and storage medium.
Background
Driver-assistance systems impose strict accuracy requirements on the detection of targets such as vehicles and pedestrians. Current detection techniques are relatively accurate for rigid targets such as vehicles, traffic signs, and lane lines, but their accuracy for non-rigid targets such as pedestrians or cyclists is considerably lower.
At present, pedestrian detection methods mainly operate on single frames of a video stream, using either traditional feature-extraction-plus-classification approaches or deep-learning methods based on convolutional neural networks. In the traditional approach, pedestrian features are hand-designed in advance and then classified with a machine-learning algorithm. For example, the histogram of oriented gradients (HOG) of the image is used as the feature and a support vector machine (SVM) performs binary classification; the HOG feature is obtained by computing image gradients and accumulating statistics over their orientations and magnitudes. Deep-learning methods instead learn features automatically through a convolutional neural network. Popular methods include Faster R-CNN, which performs secondary classification on extracted candidate boxes; the multi-scale-feature-layer detectors SSD (single shot multibox detector) and YOLO; and improved algorithms based on feature pyramid networks (FPN) built on image pyramids.
Because targets such as pedestrians deform in many ways, detection with any of the above methods requires, to achieve high accuracy, enlarging the dataset to cover enough samples and increasing model capacity to cover the possible deformations. This not only increases the difficulty of detection but also leaves the detection accuracy low.
Summary of the Invention
To solve the problems in the prior art, the present invention provides a target detection method, apparatus, and storage medium that both reduce the difficulty of detection and improve its accuracy.
In a first aspect, an embodiment of the present invention provides a target detection method, including:
performing initial detection to obtain a target to be detected in a current frame image of video data;
matching the target to be detected with at least one target in the frame image preceding the current frame image; and
if a target matching the target to be detected exists in the preceding frame image, determining the category and position information of the target to be detected from its feature layers in the current frame image and in the m frame images preceding the current frame image, where m is a positive integer.
Optionally, matching the target to be detected with at least one target in the preceding frame image includes:
acquiring a candidate box of the target to be detected in the current frame image; and
matching the candidate box with at least one target in the preceding frame image.
Optionally, matching the candidate box with at least one target in the preceding frame image includes:
tracking the at least one target into the current frame image to obtain a tracking box of each target in the current frame image;
computing the intersection-over-union (IOU) between each tracking box and the candidate box; and
determining that the target corresponding to a tracking box whose IOU exceeds a preset threshold is successfully matched with the candidate box.
Optionally, computing the IOU between each tracking box and the candidate box includes:
computing the IOU according to the formula IOU = (TkBBox ∩ CandBBox) / (TkBBox ∪ CandBBox), where TkBBox is the tracking box and CandBBox is the candidate box.
Optionally, determining the category and position information of the target to be detected from its feature layers in the current frame image and in the m preceding frame images includes:
feeding the feature layers of the target to be detected in the current frame image and in the m preceding frame images into a long-term recurrent convolutional network (LRCN) to obtain the position information of the target to be detected and the probability that it belongs to each category;
selecting the category with the largest probability value as an intermediate category; and
determining the category of the target to be detected in the current frame image from the probability value of the intermediate category and the probability value of the category of the target to be detected in the preceding frame image.
Optionally, determining the category of the target to be detected in the current frame image from the probability value of the intermediate category and the probability value of its category in the preceding frame image includes:
comparing the probability value of the intermediate category with the probability value of the category of the target to be detected in the preceding frame image;
if the probability value of the intermediate category is greater than or equal to the probability value of the category in the preceding frame image, taking the intermediate category as the category of the target to be detected in the current frame image; and
if the probability value of the intermediate category is less than the probability value of the category in the preceding frame image, taking the category of the target to be detected in the preceding frame image as its category in the current frame image.
Optionally, before the feature layers of the target to be detected in the current frame image and in the m preceding frame images are fed into the long-term recurrent convolutional network (LRCN), the method further includes:
scaling the feature layers of the target to be detected in the current frame image and in the m preceding frame images to obtain feature layers of a preset size;
and feeding the feature layers of the target to be detected in the current frame image and in the m preceding frame images into the LRCN includes:
feeding the feature layers of the preset size into the LRCN.
In a second aspect, an embodiment of the present invention provides a target detection apparatus, including:
a detection module, configured to perform initial detection to obtain a target to be detected in a current frame image of video data;
a matching module, configured to match the target to be detected with at least one target in the frame image preceding the current frame image; and
a determining module, configured to, when the matching module finds a target in the preceding frame image that matches the target to be detected, determine the category and position information of the target to be detected from its feature layers in the current frame image and in the m frame images preceding the current frame image, where m is a positive integer.
Optionally, the matching module is specifically configured to:
acquire a candidate box of the target to be detected in the current frame image; and
match the candidate box with at least one target in the preceding frame image.
Optionally, the matching module is specifically configured to:
track the at least one target into the current frame image to obtain a tracking box of each target in the current frame image;
compute the intersection-over-union (IOU) between each tracking box and the candidate box; and
determine that the target corresponding to a tracking box whose IOU exceeds a preset threshold is successfully matched with the candidate box.
Optionally, the matching module is specifically configured to:
compute the IOU according to the formula IOU = (TkBBox ∩ CandBBox) / (TkBBox ∪ CandBBox), where TkBBox is the tracking box and CandBBox is the candidate box.
Optionally, the determining module is specifically configured to:
feed the feature layers of the target to be detected in the current frame image and in the m preceding frame images into a long-term recurrent convolutional network (LRCN) to obtain the position information of the target to be detected and the probability that it belongs to each category;
select the category with the largest probability value as an intermediate category; and
determine the category of the target to be detected in the current frame image from the probability value of the intermediate category and the probability value of the category of the target to be detected in the preceding frame image.
Optionally, the determining module is specifically configured to:
compare the probability value of the intermediate category with the probability value of the category of the target to be detected in the preceding frame image;
if the probability value of the intermediate category is greater than or equal to the probability value of the category in the preceding frame image, take the intermediate category as the category of the target to be detected in the current frame image; and
if the probability value of the intermediate category is less than the probability value of the category in the preceding frame image, take the category of the target to be detected in the preceding frame image as its category in the current frame image.
Optionally, the determining module is specifically configured to:
scale the feature layers of the target to be detected in the current frame image and in the m preceding frame images to obtain feature layers of a preset size; and
feed the feature layers of the preset size into the LRCN.
In a third aspect, an embodiment of the present invention provides a terminal device, including:
a processor;
a memory; and
a computer program,
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program including instructions for performing the method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program that causes a server to perform the method according to the first aspect.
With the target detection method, apparatus, and storage medium provided by the present invention, a target to be detected in the current frame image of video data is obtained through initial detection and matched with at least one target in the preceding frame image; if a matching target exists in the preceding frame image, the category and position information of the target to be detected are determined from its feature layers in the current frame image and in the m preceding frame images. Because the terminal device, when determining the category and position information of the target to be detected in the current frame image, first matches it against the targets in the preceding frame image and, after a successful match, uses the feature layers of the target in the current frame image together with those in the m preceding frame images, detection is no longer based on a single frame as in the prior art; the multiple frames capture changes in the target's pose, which both reduces the difficulty of detection and improves its accuracy.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of Embodiment 1 of a target detection method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of extracting candidate boxes;
FIG. 3 is a schematic flowchart of the LRCN algorithm;
FIG. 4 is a schematic diagram of a pedestrian time-series stream;
FIG. 5 is a schematic structural diagram of Embodiment 1 of a target detection apparatus provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a terminal device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The target detection method provided by the embodiments of the present invention can be applied to scenarios in which target objects are detected in images, and in particular to the detection of non-rigid targets whose pose changes or that deform in various ways. At present, non-rigid targets such as pedestrians are detected mainly from single frames of a video stream, using either traditional feature-extraction-plus-classification approaches or deep-learning methods based on convolutional neural networks. However, because such targets deform in many ways, these methods require, to achieve high accuracy, enlarging the dataset to cover enough samples and increasing model capacity to cover the possible deformations, which not only increases the difficulty of detection but also leaves the accuracy low.
In view of the above problems, the embodiments of the present invention propose a target detection method in which a target to be detected in the current frame image of video data is obtained through initial detection and matched with at least one target in the preceding frame image; if a matching target exists in the preceding frame image, the category and position information of the target to be detected are determined from its feature layers in the current frame image and in the m preceding frame images. Because the terminal device matches the target to be detected against the targets in the preceding frame image and, after a successful match, determines its category and position from feature layers spanning the current frame and the m preceding frames, detection is no longer based on a single frame as in the prior art; the multiple frames capture changes in the target's pose, which both reduces the difficulty of detection and improves its accuracy.
The technical solutions of the present invention are described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
FIG. 1 is a schematic flowchart of Embodiment 1 of the target detection method provided by an embodiment of the present invention. The method may be performed by any apparatus that executes the target detection method, and the apparatus may be implemented in software and/or hardware. In this embodiment, the apparatus may be integrated into a terminal device. As shown in FIG. 1, the method includes the following steps:
Step 101: perform initial detection to obtain a target to be detected in the current frame image of the video data.
In this embodiment, a camera captures video data in real time and sends it to the terminal device. Upon receiving the video data, the terminal device extracts the current frame image and performs initial detection on it with a region proposal network (RPN) to determine whether each object in the current frame image is a target to be detected. There may be one target to be detected or several. In this embodiment, the targets to be detected may include non-rigid targets such as pedestrians or cyclists.
The terminal device may be, for example, a mobile phone, a tablet, a wearable device, or an in-vehicle device.
Step 102: match the target to be detected with at least one target in the frame image preceding the current frame image.
In this embodiment, every frame image contains at least one target. After obtaining the target to be detected in the current frame image, the terminal device matches it with at least one target in the preceding frame image. Note that when there are multiple targets to be detected, each of them may be matched separately with at least one target in the preceding frame image.
In one possible implementation, matching the target to be detected with at least one target in the preceding frame image includes acquiring a candidate box of the target to be detected in the current frame image and matching the candidate box with at least one target in the preceding frame image.
Specifically, FIG. 2 is a schematic diagram of extracting candidate boxes. As shown in FIG. 2, after the current frame image is obtained from the video data, a region proposal network (RPN) is used to extract candidate boxes 1 from it; the feature layers of the targets to be detected computed by this network also need to be saved. Each extracted candidate box 1 contains one target to be detected.
After the candidate boxes 1 are extracted, they are matched against the targets in the preceding frame image. In the embodiments of the present invention, a tracking algorithm may be used for matching. In a specific implementation, the at least one target is tracked into the current frame image to obtain a tracking box of each target in the current frame image, the intersection-over-union (IOU) between each tracking box and the candidate box is computed, and the target corresponding to a tracking box whose IOU exceeds a preset threshold is determined to be successfully matched with the candidate box.
Specifically, all targets in the preceding frame image may be tracked into the current frame with the kernelized correlation filter (KCF) algorithm to obtain their tracking boxes in the current frame. After the tracking boxes are obtained, the IOU between each tracking box and each candidate box of a target to be detected is computed.
In one possible implementation, the IOU may be computed according to the formula IOU = (TkBBox ∩ CandBBox) / (TkBBox ∪ CandBBox), where TkBBox is the tracking box and CandBBox is the candidate box; that is, the intersection of the tracking box and the candidate box is computed first, then their union, and the ratio of the two gives the IOU.
After the IOU is computed, it is compared with a preset threshold. If the IOU exceeds the threshold, the target corresponding to that tracking box is successfully matched with the candidate box; otherwise, the match fails. The preset threshold may be chosen according to the actual situation or experience; its specific value is not limited in the embodiments of the present invention.
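The IOU computation and thresholded matching described above can be sketched in plain Python. This is an illustrative sketch only, not the claimed implementation; the (x1, y1, x2, y2) box format, the function names, and the default threshold of 0.5 are assumptions:

```python
def iou(tk_bbox, cand_bbox):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(tk_bbox[0], cand_bbox[0])
    iy1 = max(tk_bbox[1], cand_bbox[1])
    ix2 = min(tk_bbox[2], cand_bbox[2])
    iy2 = min(tk_bbox[3], cand_bbox[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_tk = (tk_bbox[2] - tk_bbox[0]) * (tk_bbox[3] - tk_bbox[1])
    area_cand = (cand_bbox[2] - cand_bbox[0]) * (cand_bbox[3] - cand_bbox[1])
    union = area_tk + area_cand - inter
    return inter / union if union > 0 else 0.0

def match_targets(tracking_boxes, candidate_boxes, threshold=0.5):
    """Pair tracking boxes with candidate boxes whose IOU exceeds the threshold.

    Returns a list of (tracking index, candidate index) pairs; a candidate
    left unpaired corresponds to a newly appearing target, and a tracking box
    left unpaired corresponds to a target that has left the frame.
    """
    matches = []
    for ti, tk in enumerate(tracking_boxes):
        for ci, cand in enumerate(candidate_boxes):
            if iou(tk, cand) > threshold:
                matches.append((ti, ci))
    return matches
```

For example, a tracking box (0, 0, 10, 10) and a candidate box (5, 0, 15, 10) overlap in a 5×10 region, giving IOU = 50 / 150 = 1/3, which would fail a 0.5 threshold.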
Note that if a candidate box in the current frame matches no target in the preceding frame image, the corresponding target to be detected may be a target newly appearing in the current frame, and it may be marked as starting at the current frame. Conversely, if a target in the preceding frame image matches no candidate box in the current frame, that target has disappeared from the current frame and is discarded.
Step 103: if a target matching the target to be detected exists in the preceding frame image, determine the category and position information of the target to be detected from its feature layers in the current frame image and in the m frame images preceding the current frame image, where m is a positive integer.
In this embodiment, the terminal device may compute the feature layer of each target to be detected with the region proposal network. If the terminal device finds a target in the preceding frame image that matches the target to be detected, it acquires the feature layers of the target to be detected in the current frame image and in the m preceding frame images, and determines the category and position information of the target to be detected from these feature layers.
In one possible implementation, this determination includes feeding the feature layers of the target to be detected in the current frame image and in the m preceding frame images into a long-term recurrent convolutional network (LRCN) to obtain the position information of the target to be detected and the probability that it belongs to each category; selecting the category with the largest probability value as an intermediate category; and determining the category of the target to be detected in the current frame image from the probability value of the intermediate category and the probability value of its category in the preceding frame image.
Specifically, the terminal device may compute the feature layer of each target to be detected in the current frame image with the region proposal network (RPN); likewise, when detecting each earlier frame image, it computes and saves the feature layers of the target to be detected in those frames.
When the terminal device determines that a target matching the target to be detected exists in the preceding frame image, the target to be detected appears in both the preceding and the current frame images. The terminal device then retrieves the saved convolutional-layer features of the target in the current frame image and in the m preceding frame images and passes them as input to a time-series network, for example an LRCN. The LRCN is composed of several long short-term memory (LSTM) layers; each layer receives the feature input of the target in the corresponding frame, outputs the position and category information of the target to be detected in that frame, and passes its state to the next layer.
Figure 3 is a flow diagram of the LRCN algorithm. As shown in Figure 3, after obtaining the tracking boxes in the current frame for all targets in the previous frame image tracked by the KCF algorithm, the tracking boxes are matched against the candidate boxes of the targets to be detected. If the matching succeeds, the CNN (convolutional neural network) feature layers of the target in the current frame image and in the previous m frame images are obtained and fed as input into the LSTM network, yielding the position information of the target and its probability value for each category.
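As an illustration of the LSTM layers described above (not part of the claimed embodiment), a single LSTM time step can be sketched in NumPy; the weight layout and function names are assumptions made for this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step: compute the input/forget/output gates and the
    candidate cell value from [h_prev, x], then update the state (h, c)."""
    H = h_prev.shape[0]
    z = np.concatenate([h_prev, x]) @ W + b   # all four gates at once, shape (4*H,)
    i = sigmoid(z[:H])                        # input gate
    f = sigmoid(z[H:2 * H])                   # forget gate
    o = sigmoid(z[2 * H:3 * H])               # output gate
    g = np.tanh(z[3 * H:])                    # candidate cell state
    c = f * c_prev + i * g                    # keep part of the old cell, add new
    h = o * np.tanh(c)                        # hidden state passed to the next step
    return h, c
```

In the scheme described above, the per-frame ROI features of one target would be fed into such a cell frame by frame, with (h, c) carried across frames so the output for the current frame depends on the previous m frames.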
Here, m can be set according to the actual situation or experience, for example to 10 or 15; this embodiment places no restriction on the specific value of m.
In addition, the number and kinds of categories can be preset, for example background, pedestrian, bicycle and car. After the terminal device inputs the feature layers into the LRCN, it obtains the coordinate position of the target to be detected in the current frame image and the probability values of the target for each category.
For example, if the current frame is the 30th frame, the feature layer of the target to be detected in the 30th frame image and its feature layers in frames 20-29 are input into the LRCN. This yields the coordinate position of the target in the current frame image as well as its probability value for each category, for example 0.1 for background, 0.7 for pedestrian, 0.1 for bicycle and 0.1 for car.
After the probability values of the target for each category have been determined, the category with the largest probability value is selected as the intermediate category; for example, pedestrian is selected as the intermediate category.
Further, the probability value of the determined intermediate category is compared with the probability value of the target's category in the previous frame image. If the probability value of the intermediate category is greater than or equal to the probability value of the target's category in the previous frame image, the intermediate category is determined as the target's category in the current frame image; if it is smaller, the target's category in the previous frame image is determined as its category in the current frame image.
Specifically, for every frame image the category of the target in that frame is determined in the manner above. Therefore, after determining the intermediate category, the terminal device compares its probability value with that of the target's category in the previous frame image; when the probability value of the intermediate category is greater than or equal to that of the previous frame's category, the intermediate category is taken as the target's category in the current frame image. For example, if the intermediate category is pedestrian with probability 0.7 and the target's category in the previous frame image is also pedestrian with probability 0.6, the intermediate category pedestrian is taken as the target's category in the current frame image. Likewise, if the intermediate category is pedestrian with probability 0.7 and the previous frame's category is bicycle with probability 0.6, pedestrian is taken as the target's category in the current frame image.
Conversely, if the probability value of the intermediate category is smaller than that of the target's category in the previous frame image, the previous frame's category is taken as the target's category in the current frame image. For example, if the intermediate category is pedestrian with probability 0.7 and the previous frame's category is also pedestrian with probability 0.8, pedestrian is kept as the category in the current frame image; if the intermediate category is pedestrian with probability 0.7 and the previous frame's category is bicycle with probability 0.8, bicycle is taken as the category in the current frame image.
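The selection-and-comparison rule above (take the most probable category, then fall back to the previous frame's category when its stored probability is higher) can be sketched as follows; the function name and the dictionary-based interface are illustrative assumptions, not part of the embodiment:

```python
def smooth_category(probs, prev_category, prev_prob):
    """probs: {category: probability} output for the current frame.
    prev_category / prev_prob: the category kept for this target in the
    previous frame and its probability.
    Returns the (category, probability) to keep for the current frame."""
    intermediate = max(probs, key=probs.get)        # category with largest probability
    if probs[intermediate] >= prev_prob:            # current evidence wins (or ties)
        return intermediate, probs[intermediate]
    return prev_category, prev_prob                 # keep the previous frame's category
```

With the example above, `probs = {'background': 0.1, 'pedestrian': 0.7, 'bicycle': 0.1, 'car': 0.1}` against a previous result of `('pedestrian', 0.6)` keeps pedestrian, while against `('bicycle', 0.8)` the previous category bicycle is kept.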
Further, before the feature layers of the target to be detected in the current frame image and in the previous m frame images are input into the LRCN, the method also includes scaling these feature layers to obtain feature layers of a preset size, so that only feature layers of the preset size need to be input into the LRCN.
Specifically, Figure 4 is a schematic diagram of the pedestrian time-series flow. As shown in Figure 4, the size of the target to be detected differs from frame to frame, so before the features are input into the LRCN network this embodiment applies the region-of-interest (ROI) scaling used in Fast R-CNN, which scales the convolutional layer to a fixed size. Concretely: assuming the ROI is h×w and the scaled feature size is H×W, the ROI is divided into H×W cells, each of size h/H×w/W; max pooling is applied within each cell, finally producing a feature layer of size H×W.
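The cell-wise max pooling just described can be sketched for a single-channel feature map; the function name is an assumption, and the sketch assumes the ROI is at least as large as the output grid (no upsampling):

```python
import numpy as np

def roi_max_pool(feat, H, W):
    """Scale an h×w feature map to a fixed H×W by dividing it into an
    H×W grid of cells and taking the maximum inside each cell."""
    h, w = feat.shape
    assert h >= H and w >= W, "sketch assumes the ROI is not upsampled"
    ys = np.linspace(0, h, H + 1).astype(int)   # row boundaries of the h/H cells
    xs = np.linspace(0, w, W + 1).astype(int)   # column boundaries of the w/W cells
    out = np.empty((H, W), dtype=feat.dtype)
    for i in range(H):
        for j in range(W):
            out[i, j] = feat[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out
```

Applying this to every saved ROI produces fixed-size H×W inputs for the LRCN regardless of how the target's size changes across frames.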
In addition, since one frame image contains multiple targets, the convolutional features can be computed directly over the whole image when computing the feature layer; the corresponding feature layer is then extracted according to the coordinates and size of the candidate box of the target to be detected and given the ROI scaling.
Further, training and detection can also proceed per target: for each target in each frame, the corresponding convolutional features are first computed for that frame, then ROI-scaled to the fixed size and passed into the LRCN network.
In the target detection method provided by this embodiment of the present invention, the target to be detected in the current frame image of the video data is obtained through initial detection and matched against at least one target in the previous frame image of the current frame image; if the previous frame image contains a matching target, the category and position information of the target are determined from its feature layers in the current frame image and in the previous m frame images. Because the terminal device matches the target against the previous frame image before determining its category and position information, and, once matched, uses the feature layers from the current frame image and from the previous m frame images jointly, the method avoids the prior-art practice of detecting targets from a single frame alone; moreover, using multiple frames allows pose changes of the target to be detected, which not only lowers the difficulty of detection but also improves its accuracy.
Figure 5 is a schematic structural diagram of Embodiment 1 of the target detection apparatus provided by an embodiment of the present invention. The apparatus may be an independent terminal device or a device integrated into a terminal device, and may be implemented in software, hardware, or a combination of software and hardware. As shown in Figure 5, the apparatus includes:
a detection module 11, configured to obtain, through initial detection, the target to be detected in the current frame image of the video data;
a matching module 12, configured to match the target to be detected against at least one target in the previous frame image of the current frame image; and
a determining module 13, configured to, when the matching module finds a target in the previous frame image that matches the target to be detected, determine the category and position information of the target to be detected from its feature layers in the current frame image and in the previous m frame images of the current frame image, where m is a positive integer.
The target detection apparatus provided by this embodiment of the present invention can execute the above method embodiment; its implementation principle and technical effect are similar and are not repeated here.
Optionally, the matching module 12 is specifically configured to:
obtain the candidate box of the target to be detected in the current frame image; and
match the candidate box against at least one target in the previous frame image.
Optionally, the matching module 12 is specifically configured to:
track the at least one target into the current frame image to obtain the tracking box of each target in the current frame image;
compute the intersection-over-union (IOU) between each tracking box and the candidate box; and
determine that the target whose tracking box has an IOU greater than a preset threshold is successfully matched with the candidate box.
Optionally, the matching module 12 is specifically configured to:
compute the IOU according to the formula IOU = (TkBBox ∩ CandBBox)/(TkBBox ∪ CandBBox), where TkBBox is the tracking box and CandBBox is the candidate box.
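As an illustration of the formula above for axis-aligned boxes given as (x1, y1, x2, y2), a minimal sketch (function names and the example threshold value are assumptions; the document only says the threshold is preset):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # 0 when boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def matches(tracking_box, candidate_box, threshold=0.5):
    """Matching test described above; 0.5 is an assumed example threshold."""
    return iou(tracking_box, candidate_box) > threshold
```

A tracking box and a candidate box are declared the same target exactly when their overlap relative to their combined area clears the threshold.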
Optionally, the determining module 13 is specifically configured to:
input the feature layers of the target to be detected in the current frame image and in the previous m frame images into a long-term recurrent convolutional network (LRCN) to obtain the position information of the target and its probability value for each category;
select the category with the largest probability value as an intermediate category; and
determine the category of the target in the current frame image from the probability value of the intermediate category and the probability value of the target's category in the previous frame image.
Optionally, the determining module 13 is specifically configured to:
compare the probability value of the intermediate category with the probability value of the target's category in the previous frame image;
if the probability value of the intermediate category is greater than or equal to that of the target's category in the previous frame image, determine the intermediate category as the target's category in the current frame image; and
if the probability value of the intermediate category is smaller than that of the target's category in the previous frame image, determine the target's category in the previous frame image as its category in the current frame image.
Optionally, the determining module 13 is specifically configured to:
scale the feature layers of the target to be detected in the current frame image and in the previous m frame images to obtain feature layers of a preset size; and
input the feature layers of the preset size into the LRCN.
The target detection apparatus provided by this embodiment of the present invention can execute the above method embodiment; its implementation principle and technical effect are similar and are not repeated here.
Figure 6 is a schematic structural diagram of a terminal device provided by an embodiment of the present invention. As shown in Figure 6, the terminal device may include a transmitter 60, a processor 61, a memory 62, a receiver 64 and at least one communication bus 63. The communication bus 63 implements the communication connections between the components. The memory 62 may include high-speed RAM and may also include non-volatile memory (NVM), for example at least one disk memory; the memory 62 may store various computer programs for completing various processing functions and implementing the method steps of any of the foregoing embodiments.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program, the computer program causing a server to execute the target detection method provided by any of the foregoing embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware under the direction of program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, without such modifications or replacements departing in essence from the scope of the technical solutions of the embodiments of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810214503.7A | 2018-03-15 | 2018-03-15 | Object detection method, device and storage medium |
| Publication Number | Publication Date |
|---|---|
| CN108388879A | 2018-08-10 |
| CN108388879B | 2022-04-15 |
| CP03 | Change of name, title or address |