CN110610510B - Target tracking method, device, electronic device and storage medium - Google Patents

Target tracking method, device, electronic device and storage medium

Info

Publication number
CN110610510B
CN110610510B
Authority
CN
China
Prior art keywords
frame
target
detection
detection data
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910810400.1A
Other languages
Chinese (zh)
Other versions
CN110610510A (en)
Inventor
许明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910810400.1A
Publication of CN110610510A
Application granted
Publication of CN110610510B
Legal status: Active
Anticipated expiration


Abstract

Translated from Chinese



The application discloses a target tracking method, a device, an electronic device, and a storage medium, belonging to the technical field of image processing. The method includes: acquiring multiple frame images of a video to be tracked, and obtaining multiple key frames from the frame images; inputting the key frames into a detection network to obtain detection data, where the detection data includes position information of the target object; and tracking the target object in the frame images by using the detection data and a kernel correlation filter algorithm. When multiple frame images are acquired, the embodiments of the application select multiple key frames from them; introducing key frames effectively removes blurred images from the frame sequence and, to a certain extent, improves the accuracy of target tracking.


Description

Translated from Chinese
Target tracking method, device, electronic device and storage medium

Technical Field

The present application relates to the technical field of image processing, and more specifically, to a target tracking method, device, electronic device, and storage medium.

Background

Target tracking is a technology that has been developed and put into application only in recent years, and it remains an important topic and research hotspot. By estimating the position and extent of a tracked target in a continuous sequence of video images, target tracking determines motion information such as the target's speed, direction, and trajectory, enabling analysis and understanding of the moving target's behavior so that higher-level tasks can be completed. Target tracking is clearly one of the core problems of video analysis and is widely applied in video surveillance, human-computer interaction, virtual reality, and other fields. Although research on this technology has made great progress over the past few decades, factors such as scale changes, illumination changes, fast motion, partial occlusion, and complex backgrounds mean that finding a tracking method that is fast, stable, and adapts to object changes remains a challenge.

Summary of the Invention

In view of this, the present invention proposes a target tracking method, device, electronic device, and storage medium to solve the above problems.

In a first aspect, an embodiment of the present application provides a target tracking method applied to an electronic device. The method includes: acquiring multiple frame images of a video to be tracked, and obtaining multiple key frames from the frame images; inputting the key frames into a detection network to obtain detection data, where the detection data includes position information of a target object; and tracking the target object in the frame images by using the detection data and a kernel correlation filter algorithm.

In a second aspect, an embodiment of the present application provides a target tracking apparatus applied to an electronic device. The apparatus includes a key frame acquisition module, a detection data acquisition module, and a target tracking module. The key frame acquisition module is configured to acquire multiple frame images of a video to be tracked and obtain multiple key frames from the frame images. The detection data acquisition module is configured to input the key frames into a detection network to obtain detection data, where the detection data includes position information of a target object. The target tracking module is configured to track the target object in the frame images by using the detection data and a kernel correlation filter algorithm.

In a third aspect, an embodiment of the present application further provides an electronic device, including one or more processors, a memory, and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the above method.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where program code is stored in the computer-readable storage medium, and the program code can be invoked by a processor to execute the above method.

Compared with the prior art, in the target tracking method, device, electronic device, and storage medium provided by the embodiments of the present application, multiple frame images of the video to be tracked are acquired first, multiple key frames are obtained from these frame images, and the key frames are then input into a detection network to obtain detection data, where the detection data may include position information of the target object; finally, the target object in the frame images is tracked by using the obtained detection data and a kernel correlation filter algorithm. After multiple frame images are acquired, the embodiments of the present application do not feed them directly into the detection network; instead, multiple key frames are first selected from the frame images, and only these key frames are input into the detection network. By introducing key frames, the application can track the target object more accurately and effectively.

Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can also be obtained from these drawings without creative effort.

Fig. 1 shows the overall framework diagram of target tracking provided by an embodiment of the present application;

Fig. 2 shows a flowchart of a target tracking method provided by an embodiment of the present application;

Fig. 3 shows a flowchart of a target tracking method provided by another embodiment of the present application;

Fig. 4 shows a detailed flowchart of step S302 in the target tracking method provided by another embodiment of the present application;

Fig. 5 shows a schematic diagram of obtaining multiple key frames in the target tracking method provided by another embodiment of the present application;

Fig. 6 shows a flowchart of a target tracking method provided by yet another embodiment of the present application;

Fig. 7 shows a block diagram of the target tracking apparatus provided by an embodiment of the present application;

Fig. 8 shows a block diagram of the electronic device provided by an embodiment of the present application;

Fig. 9 shows a storage unit provided by an embodiment of the present application for storing or carrying program code that implements the target tracking method according to the embodiments of the present application.

Detailed Description

In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application.

With the continuous development of computer technology, people's requirements for smart cameras keep increasing, and different smart-camera functions have been developed accordingly. For example, with functions such as Baidu's BaiduLens and Huawei's smart object recognition, a smart camera can display detailed information about objects on its screen in real time. Usually multiple objects are displayed on the screen; to present the label of each object intuitively, the label must follow the object in real time, which imposes a tracking requirement on the electronic device. Existing target tracking methods usually rely on a CNN detection network to give the position of the target object, so the label tends to jump around the object and stutter during tracking. Although some target tracking methods address this problem, the price of the improved accuracy is usually reduced real-time performance.

Therefore, to overcome the above drawbacks, the present application provides the overall target tracking framework shown in Fig. 1. As can be seen from Fig. 1, target tracking can be divided into two parts: the first part obtains the data to be filtered, and the second part obtains the detection frame. In other words, through the implementation of this application, the data to be filtered and the detection frame corresponding to the original image can both be obtained, and combining these two pieces of data allows the target to be tracked. Specifically, the original image can be input into an initialization extractor, which performs feature extraction on the original image.

In the embodiment of the present application, the initialization extractor may include an object detection network, which may be MobileNet-SSD (a deep convolutional neural network), GOTURN (Generic Object Tracking Using Regression Networks), or the like. After feature extraction by the initialization extractor, the feature data of the tracked target can be obtained and stored in a database (DB) of the electronic device, and the target can then be tracked using this feature data. The embodiment of the present application can sample and analyze the feature data of the tracked target through KCF (Kernel Correlation Filter) and thereby obtain the new position of the tracked target, that is, realize tracking of the target. In addition, as can be seen from Fig. 1, during target tracking the application can use an update extractor to continuously obtain new detection frames and thereby continuously update the feature data stored in the database.

As shown in Fig. 2, an embodiment of the present application provides a target tracking method applied to an electronic device. The method may include steps S201 to S203.

Step S201: Obtain multiple frame images of the video to be tracked, and acquire multiple key frames from the multiple frame images.

In the embodiment of the present invention, the electronic device may be a mobile phone, a notebook computer, a tablet personal computer, a palmtop computer, a laptop computer, a personal digital assistant (PDA), a mobile Internet device (MID), a wearable device (for example, a smart watch such as an iWatch, a smart bracelet, or a pedometer), or another electronic device on which an instant messaging application client can be installed and deployed.

In one embodiment, after the video to be tracked is acquired, the video can be divided into frames to obtain multiple frame images, and related parameters of the video are stored, where the related parameters may include the video duration, video encoding format, total number of video frames, video recording time point, video exposure time, video frame rate, the latitude and longitude of the recording location, and so on. In the embodiment of the present application, the frame images can be acquired starting from the first frame of the video and obtained frame by frame in order; therefore, the corresponding state and parameters must be saved after each frame image is acquired. For example, when obtaining frame images from a video file, the file name of the video file, the corresponding decoding type, and so on need to be saved. The specific way in which the frame images of the video to be tracked are obtained is not limited here.
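
As an illustration of this frame-splitting step, the following is a minimal Python/OpenCV sketch that decodes a video into frame images and records a few of the parameters mentioned above (frame rate and total frame count); the function name and the choice of which parameters to store are assumptions, not part of the patent.

```python
import cv2

def split_video_into_frames(video_path):
    """Decode a video file into a list of frame images and record basic parameters."""
    cap = cv2.VideoCapture(video_path)
    params = {
        "fps": cap.get(cv2.CAP_PROP_FPS),                        # video frame rate
        "total_frames": int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),  # total number of frames
    }
    frames = []
    while True:
        ok, frame = cap.read()  # read frames one by one, starting from the first frame
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames, params
```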

When the electronic device acquires multiple frame images of the video to be tracked, multiple key frames can be obtained from the frame images, where the key frames may be any number of images randomly selected from the frame images, or images selected according to certain rules. The embodiment of the present application may select key frames by comparing the pixel values of the frame images, the pixel value of a key frame being higher than that of an ordinary frame image; it may also select key frames by comparing the brightness values of the frame images, that is, a key frame has a higher brightness value; or it may obtain the distortion value, balance value, or resolution of each frame image to select multiple key frames. Obviously, a key frame mainly refers to an image of relatively good quality among the frame images. The embodiment of the present application may use any one of the above methods, or a combination of them, to obtain the key frames; the specific way of obtaining them is not described in detail here.

Step S202: Input the multiple key frames into the detection network to obtain detection data, the detection data including the position information of the target.

In one implementation, the electronic device can input the acquired key frames into a detection network and use the detection network to perform feature extraction, thereby obtaining detection data, which may include the position information of the target object. In the embodiment of the present application, the detection network may be an object detection network, which may also be called an object extraction network; its main function is to segment the target object from the target image and recognize it. In other words, an object detection network combines segmentation and recognition of the target object. Commonly used object detection networks include the GOTURN network, the MobileNet-SSD deep convolutional neural network, the Faster R-CNN neural network, the YOLO neural network, and the SPP-Net (Spatial Pyramid Pooling) neural network; the neural networks commonly used for video tracking are GOTURN and MobileNet-SSD.

The GOTURN neural network is a target tracking algorithm trained offline with a convolutional neural network. It extracts features with a CNN classification network pre-trained on an existing large-scale classification dataset, and then classifies them with an observation model to obtain tracking results. Its tracking process passes the current frame and the previous frame through a VGG-Net (Very Deep Convolutional Networks for Large-Scale Image Recognition) network to obtain the feature representations of the two frames, and then through an offline-trained decision network to obtain the position of the target in the current frame. The GOTURN network tracks complete targets well in simple scenes and is fairly robust to scale changes and deformations.

The MobileNet-SSD deep convolutional neural network is based on a streamlined architecture that builds a lightweight deep neural network mainly through depthwise separable convolutions; that is, MobileNet-SSD is a lightweight deep network model proposed mainly for mobile terminals. It can include two simple global hyperparameters that strike an effective balance between latency and accuracy, allowing the model builder to choose a model of appropriate size according to the constraints of the problem. At the cost of a slight reduction in accuracy, the MobileNet-SSD deep convolutional neural network provides strong real-time performance.

In one embodiment, the MobileNet-SSD deep convolutional neural network can perform target detection using a feature pyramid. During detection, MobileNet (the base network) can be used as a feature extractor to extract multiple feature maps of different sizes, and Softmax classification and position regression are performed on these feature maps simultaneously. For a feature map of width W and height H, W×H prior boxes are set, with their centers evenly distributed over the entire map, and the accurate position of the target is then obtained through Softmax classification and bounding box regression. In addition, the MobileNet-SSD network can run on a neural compute stick, a deep learning device with a USB interface that is slightly larger than a USB flash drive, consumes 1 W, and reaches a floating-point performance of about 100 GFLOPS, offering low power consumption and high performance.
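
As a hedged illustration of the statement that the W×H prior-box centers are distributed evenly over the image, the sketch below lays out one center per feature-map cell in normalized coordinates; this follows the usual SSD convention and is not a formulation given in the patent.

```python
def prior_box_centers(feature_w, feature_h):
    """One prior-box center per feature-map cell, evenly spread in normalized [0, 1] coordinates."""
    return [((i + 0.5) / feature_w, (j + 0.5) / feature_h)
            for j in range(feature_h) for i in range(feature_w)]

# Example: a 19x19 feature map yields 361 evenly distributed centers.
centers = prior_box_centers(19, 19)
```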

From the above introduction it can be seen that, compared with the GOTURN neural network, the MobileNet-SSD deep convolutional neural network offers higher detection accuracy, stronger robustness, and better real-time performance for target tracking; that is, it produces detection boxes with higher precision and at a higher speed. It can also preprocess the image according to the detection result and transmit the preprocessed image to the next-level classification model. Therefore, the embodiment of the present application uses the MobileNet-SSD deep convolutional neural network as the detection network, mainly because it balances the speed and accuracy of target detection well.

In one implementation, the MobileNet-SSD deep convolutional neural network can include four parts: the first part is the input layer, which is mainly used to input images, that is, the key frames; the second part is an improved MobileNet convolutional network, which is mainly used to extract features from the input key frames; the third part is the SSD meta-structure, which is used for classification regression and bounding box regression; and the fourth part is the output layer, which is mainly used to output detection results. The SSD meta-structure is a regression model that uses the features output by different convolutional layers for classification regression and bounding box regression. It not only alleviates the contradiction between translation invariance and translation variability well, but also achieves a good compromise between detection accuracy and speed, that is, it maintains high detection accuracy while improving detection speed.

The process of obtaining detection data with the MobileNet-SSD deep convolutional neural network includes: acquiring the first key frame, obtaining the target detection frame corresponding to the target object in the first key frame, and storing the position information corresponding to the target detection frame as detection data; reading the next key frame, obtaining the target detection frame corresponding to the target object in that key frame, and storing the position information corresponding to the target detection frame as detection data; and repeating the above steps until the target detection frame corresponding to the target object in every key frame has been obtained.
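
A rough sketch of this per-key-frame detection loop is shown below, using OpenCV's DNN module with a Caffe-format MobileNet-SSD model; the model file names, the 300×300 input size, and the scaling constants are assumptions based on common MobileNet-SSD deployments, not values specified in the patent.

```python
import cv2
import numpy as np

# Assumed model files for a Caffe-format MobileNet-SSD deployment.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel")

def detect_key_frames(key_frames):
    """Run the detection network on each key frame and store box positions as detection data."""
    detection_data = []
    for frame in key_frames:
        h, w = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)
        net.setInput(blob)
        out = net.forward()  # shape (1, 1, N, 7): [_, class_id, confidence, x1, y1, x2, y2]
        boxes = []
        for i in range(out.shape[2]):
            confidence = float(out[0, 0, i, 2])
            box = (out[0, 0, i, 3:7] * np.array([w, h, w, h])).astype(int)
            boxes.append((confidence, tuple(box)))
        detection_data.append(boxes)  # position information of the targets in this key frame
    return detection_data
```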

Step S203: Track the target object in the plurality of frame images by using the detection data and the kernel correlation filter algorithm.

In one implementation, after the detection data is obtained and stored in the database, the Kernel Correlation Filter (KCF) algorithm can be used to track the target object in the frame images. The kernel correlation filter algorithm is a target tracking algorithm based on an online learning model. The KCF tracking process can be divided into four steps: extracting target features, training the template, predicting the target position, and updating the template. KCF uses a ridge regression model; ridge regression is a linear regression model, and because it has a simple closed-form solution, its classification performance can be comparable to that of more complex models such as SVM.
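
To make the remark about the closed-form solution concrete, the snippet below computes the textbook ridge regression solution w = (XᵀX + λI)⁻¹Xᵀy with NumPy; this is the standard formulation underlying KCF's regression step, not the patent's own derivation.

```python
import numpy as np

def ridge_regression(X, y, lam=1e-2):
    """Closed-form ridge regression: w = (X^T X + lam * I)^(-1) X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```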

The specific process of tracking the target object in the frame images using the detection data and the kernel correlation filter algorithm is: transmit the position information of the target object contained in the detection data to the kernel correlation filter algorithm, and use the kernel correlation filter algorithm to track the detected target, thereby obtaining the target's motion trajectory. The kernel correlation filter algorithm is a discriminative tracking method: during tracking it trains a target detector, uses this detector to check whether the predicted position in the next key frame is the target, and then uses the new detection result to update the detector. When training the target detector, the target region is generally taken as a positive sample and the surrounding region as negative samples; the closer a region is to the target, the more likely it is to be a positive sample.
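
A minimal sketch of handing a detected box to a KCF tracker with OpenCV follows. Depending on the OpenCV build, the factory function is cv2.TrackerKCF_create or cv2.legacy.TrackerKCF_create, and the (x, y, w, h) box format is an assumption about how the detection data is stored.

```python
import cv2

def track_with_kcf(frames, init_box):
    """Initialize a KCF tracker with a detected box and track it through subsequent frames."""
    tracker = cv2.TrackerKCF_create()  # cv2.legacy.TrackerKCF_create() on some OpenCV builds
    tracker.init(frames[0], init_box)  # init_box = (x, y, w, h) taken from the detection data
    trajectory = [init_box]
    for frame in frames[1:]:
        ok, box = tracker.update(frame)  # predict the target's new position in this frame
        if ok:
            trajectory.append(tuple(int(v) for v in box))
    return trajectory  # the target's motion trajectory
```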

After the target tracking method proposed in the embodiment of the present application acquires multiple frame images of the video to be tracked, multiple key frames can be obtained from the frame images. Because the overall quality of these key frames is better than that of the raw frame images, feeding them into the detection network yields more accurate detection data, which improves the efficiency and accuracy of target detection to a certain extent. At the same time, the application uses the kernel correlation filter algorithm to track the target object in the frame images, which satisfies the requirements of tracking stability and timeliness.

Another embodiment of the present application provides a target tracking method applied to an electronic device. Referring to Fig. 3, the method may include steps S301 to S304.

Step S301: Obtain multiple frame images of the video to be tracked, and divide the multiple frame images into N image groups.

From the above introduction it can be seen that, after acquiring the video to be tracked, the electronic device can divide it into frames to obtain multiple frame images. To improve the accuracy of target tracking, the embodiment of the present application can obtain a fixed number of key frames from the frame images; therefore, the frame images are first divided into N image groups, with each group containing the same number of frame images. For example, if a video to be tracked is divided into 14400 frame images, these can be divided into 720 image groups, each containing 20 frame images.
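
A minimal sketch of this grouping step, assuming the group size of 20 frames used in the example above:

```python
def split_into_groups(frames, group_size=20):
    """Divide the frame images into image groups of equal size."""
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]

# 14400 frame images -> 720 image groups of 20 frames each
```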

Step S302: Obtain a key frame from each of the image groups to obtain multiple key frames.

In one implementation, after the frame images of the video to be tracked are divided into N image groups, one key frame can be obtained from each group, yielding multiple key frames. For example, after 14400 frame images are divided into 720 image groups, one key frame can be obtained from each group, that is, one key frame is selected from every 20 frame images, giving 720 key frames in total. The specific steps for obtaining one key frame from each image group are shown in Fig. 4, from which it can be seen that obtaining the key frames may include steps S401 to S405.

Step S401: Obtain a candidate frame from the current image group.

In one implementation, the first frame image in the current image group can be used as the candidate frame, or a frame image can be randomly selected from the current image group as the candidate frame. In a specific implementation where each image group contains 20 frame images, the embodiment of the present application may take the first of these 20 frames as the candidate frame, or the tenth; which frame in the current image group is used as the candidate frame is not explicitly limited here.

Step S402: Determine whether the candidate frame is a clear image.

In one implementation, judging whether the candidate frame is a clear image may include: obtaining the pixel mean of the candidate frame and judging whether it is greater than a preset threshold; when the pixel mean is greater than the preset threshold, the candidate frame is a clear image, and the method proceeds to step S403. Here, the pixel mean refers to averaging all R-channel, G-channel, and B-channel pixels of the candidate frame and then averaging the means of the three channels; the resulting value is the pixel mean. When the pixel mean is less than or equal to the preset threshold, it can be determined that the candidate frame is not a clear image, that is, it is a blurred image; in this case, the next frame image after the candidate frame can be obtained, and the method proceeds to step S404.
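
As a hedged sketch of the pixel-mean test described above, the code below averages the three colour channels of a candidate frame and compares the result with a preset threshold; the threshold value is an illustrative assumption.

```python
import numpy as np

def is_clear_by_pixel_mean(frame_bgr, threshold=80.0):
    """Average the B, G and R channel means, then compare with a preset threshold."""
    channel_means = frame_bgr.reshape(-1, 3).mean(axis=0)  # mean of each colour channel
    pixel_mean = float(channel_means.mean())               # average of the three channel means
    return pixel_mean > threshold
```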

In a specific embodiment, the current image group may contain 20 frame images. First, the first of the 20 frame images is used as the candidate frame, its pixel mean is obtained, and whether the pixel mean is greater than the preset threshold is judged. If it is greater than the preset threshold, the first frame image is a clear image and can be used as the key frame; if the pixel mean is less than or equal to the preset threshold, the second of the 20 frame images is obtained and judged in the same way, and so on until a clear image is obtained. In summary, the embodiment of the present application may use Laplacian edge detection to judge whether the candidate frame is a blurred image, may use the Brenner sharpness algorithm, or may use the Tenengrad gradient function; which method is used to judge the sharpness of the candidate frame is not explicitly limited here.
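
For the alternative sharpness checks mentioned here, a common practice is to threshold the variance of the Laplacian response or the Tenengrad (Sobel gradient) energy; the sketch below shows both measures, and the thresholds a caller would apply to them are left as assumptions.

```python
import cv2
import numpy as np

def laplacian_sharpness(frame_bgr):
    """Variance of the Laplacian response; low values suggest a blurred image."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def tenengrad_sharpness(frame_bgr):
    """Tenengrad score: mean squared Sobel gradient magnitude."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    return float(np.mean(gx ** 2 + gy ** 2))
```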

Step S403: Use the candidate frame as the key frame of the current image group.

If the candidate frame is determined to be a clear image, it is used as the key frame of the current image group, after which the next image group can be obtained, that is, the method proceeds to step S405.

Step S404: Acquire the next frame image after the candidate frame, and use it as the new candidate frame.

If the candidate frame is determined not to be a clear image, the next frame image after the candidate frame is obtained and used as the new candidate frame, and the method returns to the operation of judging whether the candidate frame is a clear image, that is, step S402. For example, after the first frame image of the first image group is obtained and judged to be a blurred image, the second frame image of the first image group can be obtained and used as the candidate frame to judge whether its pixel mean is greater than the preset threshold. If its pixel mean is still less than or equal to the preset threshold, the third frame image of the first image group is obtained and judged in the same way, and so on until a key frame meeting the condition is obtained.

It should be noted that if the pixel means of all frame images in the current image group are less than or equal to the preset threshold, the pixel means of these frame images can be sorted and the frame image with the largest pixel mean selected as the key frame of the current image group; alternatively, the last frame image in the current image group can be used directly as the key frame, or no key frame is taken from the current image group and the key frame is obtained from the next image group instead.

Step S405: Obtain the next image group, use it as the current image group, and obtain the key frame of the current image group.

In one implementation, if the candidate frame is a clear image, then after it is used as the key frame of the current image group, the next image group can be obtained and used as the current image group, and the key frame of that group is obtained. To make it clearer how multiple key frames are obtained from the frame images, the present application gives a specific example in which one key frame is obtained from every 20 frames. First, the initial frame of the first image group (numbered 1) is used as the candidate frame and judged for blur; if it is not blurred, this initial frame is input into the detection network as a key frame. The second image group is then obtained, and its first frame (numbered 21) is used as the candidate frame; if frame 21 is blurred, the second frame of that group (numbered 22) is obtained and used as the candidate frame, and if frame 22 is not blurred it is input into the detection network. The third image group is then obtained and its second frame (numbered 42) is used as the candidate frame, and so on, until the key frames of all image groups have been obtained.

Fig. 5 gives a specific schematic diagram of obtaining multiple key frames. As can be seen from Fig. 5, the key frames are mainly obtained by a key frame extractor. Before key frames are selected, the frame images are sent to the key frame extractor as a data stream at a frame rate of 30 fps (frames sent per second), so the time interval of every 20 frames is 2/3 ≈ 0.67 s. This is also the update frequency of the convolution kernel of the kernel correlation filter algorithm, which fully satisfies the scene update frequency needed in actual use while avoiding running the detection network model too frequently and causing power consumption problems.

In one embodiment, the key frame extractor can select one key frame from every 20 frame images. Before that, a candidate frame is obtained and judged for blur using Laplacian edge detection; if it is not blurred, it is determined to be a key frame, 20 frames are skipped, and the next candidate frame is obtained and judged in the same way. If it is blurred, the next frame after the current frame is selected and the blur judgment is performed again, and so on. The initial frame can be regarded as a special key frame (the first key frame), and the blur judgment can be whether the pixel mean of the candidate frame is greater than a preset threshold; if it is, the candidate frame is used as a key frame and sent to the detection network.
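
Putting the pieces together, the sketch below mirrors the extractor described above: it takes the first frame of each 20-frame group as the candidate, walks forward within the group while the candidate is judged blurred, and collects one key frame per group. It reuses the is_clear_by_pixel_mean check from the earlier sketch, and the fallback of using the group's last frame is one of the options mentioned above.

```python
def extract_key_frames(frames, group_size=20, is_clear=is_clear_by_pixel_mean):
    """Select one key frame per group: the first candidate in the group that passes the sharpness test."""
    key_frames = []
    for start in range(0, len(frames), group_size):
        group = frames[start:start + group_size]
        for candidate in group:           # advance frame by frame while the candidate is blurred
            if is_clear(candidate):
                key_frames.append(candidate)
                break
        else:
            key_frames.append(group[-1])  # fallback: use the last frame of the group
    return key_frames
```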

It should be noted that the embodiment of the present application may obtain one key frame from each image group, two key frames from each group, or three key frames from each group; how many key frames are obtained from each image group is not explicitly limited here and can be chosen according to the actual situation. For example, if the overall quality of the video to be tracked is relatively good, a few more key frames can be obtained from each image group, whereas if the overall quality and sharpness of the video are poor, the number of key frames obtained from each group can be appropriately reduced.

Step S303: Input the multiple key frames into the detection network to obtain detection data, the detection data including the position information of the target.

Step S304: Track the target object in the plurality of frame images by using the detection data and the kernel correlation filter algorithm.

To make target tracking more accurate, the target tracking method proposed in this embodiment of the application divides the acquired frame images into N image groups and obtains one key frame from each group, yielding multiple key frames; a key frame can be obtained by judging the pixel mean of the image. That is, the embodiment of the present application introduces a key frame extraction mechanism, which can effectively remove blurred and shaky images from the frame images and improve the accuracy of the target position information obtained by the detection network, thereby improving the tracking capability of the kernel correlation filter algorithm, while also reducing the running frequency of the detection network model and the power consumption of the entire electronic device.

Yet another embodiment of the present application provides a target tracking method applied to an electronic device. Referring to Fig. 6, the method may include steps S601 to S606.

Step S601: Obtain multiple frame images of the video to be tracked, and acquire multiple key frames from the multiple frame images.

Step S602: Input the multiple key frames into the detection network to obtain detection data, the detection data including the position information of the target.

In one implementation, to make target tracking more accurate, the multiple key frames can be input into the detection network to obtain the target detection frame corresponding to the target object in each key frame; the confidence of each target detection frame is then obtained, and whether the confidence is greater than a confidence threshold is judged. If the confidence of the target detection frame is greater than the confidence threshold, the position information corresponding to the target detection frame is used as detection data. Here, the confidence of a target bounding box refers to the probability that the candidate box actually contains the target; the calculation of this confidence depends on the specific detection network and is not described in detail in this embodiment of the invention.

In addition, if the confidence of the target detection frame is less than or equal to the confidence threshold, the position information corresponding to that frame is not used as detection data; that is, detection frames whose confidence is less than or equal to the confidence threshold are not displayed when the screen of the electronic device shows the target. For example, suppose the confidence threshold is 0.5 and the targets contained in a key frame are a person, a dog, and a cat, and the detection network obtains a confidence of 0.88 for the person's detection frame, 0.75 for the dog's, and 0.45 for the cat's. The confidences for the person and the dog both exceed the threshold of 0.5, so the position information corresponding to their detection frames can be used as detection data, while the position information of the cat's detection frame cannot; the screen of the electronic device therefore displays only the detection frames of the person and the dog, not the cat's.
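
A minimal sketch of the confidence filter follows, using the 0.5 threshold and the detections from the example above; the (label, confidence, box) layout is an assumption about how the detection results are stored.

```python
def filter_by_confidence(detections, confidence_threshold=0.5):
    """Keep only detection frames whose confidence exceeds the confidence threshold."""
    return [(label, conf, box) for (label, conf, box) in detections if conf > confidence_threshold]

detections = [("person", 0.88, (10, 20, 110, 220)),
              ("dog",    0.75, (150, 40, 260, 200)),
              ("cat",    0.45, (300, 60, 380, 160))]
print(filter_by_confidence(detections))  # only the person and dog boxes are kept as detection data
```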

Step S603: Judge whether the detection data has been acquired successfully.

From the above introduction it can be seen that, after the electronic device obtains multiple key frames, it can input them into the detection network, use the detection network to extract features from these key frames, and thereby obtain detection data, which can include the position information of the target object; the electronic device can then store the detection data in the database. However, detection data may fail to be obtained when a key frame has not been input into the detection network, when frames are lost while key frames are being input, or when a parameter error warning occurs while the detection network is running. Therefore, it is necessary to judge whether the detection data has been acquired successfully: if it has, the target object in the frame images is tracked using the detection data and the kernel correlation filter algorithm, that is, the method proceeds to step S604; if it has not, tracking of the target object in the target frame image among the frame images is suspended, that is, the method proceeds to step S605.

Step S604: Track the target object in the plurality of frame images by using the detection data and the kernel correlation filter algorithm.

Step S605: Suspend tracking of the target object in the target frame image among the plurality of frame images.

In one implementation, if the detection data has not been acquired successfully, tracking of the target object in the target frame image among the frame images is suspended, and the next frame image after the target frame is then used as the new target frame image, that is, the method proceeds to step S606.

Step S606: Use the next frame image of the target frame as a new target frame image.

When it is determined that the electronic device has not successfully acquired the detection data, tracking of the target object in the target frame image can be suspended, the next frame image after the target frame is used as the new target frame image, and the method returns to the operation of judging whether the detection data has been acquired successfully, that is, step S603. Therefore, before the database stores the initial detection data, images transmitted to the kernel correlation filter algorithm are discarded without processing; once the initialization extractor has finished and the kernel correlation filter algorithm can access the detection data in the database, target tracking begins. While target tracking is in progress, the embodiment of the present application still extracts new features based on the position information of the target object obtained after the key frames are fed into the detection network and updates the database accordingly, and the latest data can also be used immediately for target tracking.

From Fig. 1 it can be seen that, in the embodiment of the present application, the operation of inputting the multiple key frames into the detection network and the operation of tracking the target object in the frame images using the detection data and the kernel correlation filter algorithm can be performed in parallel. In other words, obtaining detection data with the detection network is independent of the structure of the kernel correlation filter algorithm. In the software framework design, the detection network and the kernel correlation filter algorithm run in their own threads; the frame images are split by the key frame extractor and transmitted to the thread of the detection network and the thread of the kernel correlation filter algorithm respectively, and the only interaction between the two threads is through the database: the detection network obtains detection data and stores it in the database, while the kernel correlation filter algorithm extracts the detection data from the database and uses it to track the target.
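
The two-thread design described here can be sketched as follows: a detection thread runs the detection network on key frames and writes boxes into a shared, lock-protected store standing in for the database, while the tracking thread reads whatever is currently stored and never blocks on the detector. The DetectionDatabase class and the reuse of detect_key_frames from the earlier sketch are assumptions made for illustration.

```python
import threading

class DetectionDatabase:
    """Shared store through which the detection thread and the KCF thread interact."""
    def __init__(self):
        self._lock = threading.Lock()
        self._latest_boxes = None

    def write(self, boxes):
        with self._lock:
            self._latest_boxes = boxes

    def read(self):
        with self._lock:
            return self._latest_boxes

db = DetectionDatabase()

def detection_worker(key_frames):
    for boxes in detect_key_frames(key_frames):  # detection network runs in its own thread
        db.write(boxes)

def tracking_worker(frames):
    boxes = db.read()  # the KCF thread reads the latest detection data without waiting
    # ... initialize and update the KCF tracker with `boxes` while iterating over `frames` ...

# Usage (frames / key_frames come from the earlier splitting and key-frame extraction sketches):
# threading.Thread(target=detection_worker, args=(key_frames,), daemon=True).start()
# threading.Thread(target=tracking_worker, args=(frames,), daemon=True).start()
```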

From the above introduction it can be seen that the tracking of the kernel correlation filter algorithm does not need to wait for the output of the detection network, which gives good real-time behavior, while the detection network's box precision and the number of boxes it produces can be adjusted independently, thereby controlling the accuracy and the number of targets the kernel correlation filter algorithm tracks. At the same time, the detection results of the detection network can be used not only for target tracking but also for image preprocessing; even when transmitted to a network with poor real-time performance, relatively accurate classification results can still be obtained.

The target tracking method proposed in the embodiments of the present application can obtain more accurate position information using the detection network, thereby improving the tracking accuracy output by the kernel correlation filter algorithm, and the number of detection boxes can be controlled through the confidence threshold, further improving tracking accuracy. At the same time, the detection network and the kernel correlation filter algorithm operate in parallel: the kernel correlation filter algorithm updates its data from the detection network's results without being blocked, and it can keep running even if the detection network stops, which improves the real-time performance of target tracking.

请参阅图7,本申请实施例提出的一种目标跟踪装置700,应用于电子设备,具体地,该目标跟踪装置700包括:关键帧获取模块701、检测数据获取模块702和目标跟踪模块703。Please refer to FIG. 7 , anobject tracking device 700 proposed by the embodiment of the present application is applied to electronic equipment. Specifically, theobject tracking device 700 includes: a keyframe acquisition module 701 , a detectiondata acquisition module 702 and anobject tracking module 703 .

关键帧获取模块701,用于获取待跟踪视频的多个帧图像,并从所述多个帧图像中获取多个关键帧。The keyframe obtaining module 701 is configured to obtain multiple frame images of the video to be tracked, and obtain multiple key frames from the multiple frame images.

进一步的,关键帧获取模块701还用于将所述多个帧图像分为N个图像组,从每个所述图像组中各获取一个关键帧,得到多个关键帧。Further, the keyframe acquiring module 701 is further configured to divide the plurality of frame images into N image groups, acquire a key frame from each of the image groups, and obtain multiple key frames.

进一步的,关键帧获取模块701还用于从当前图像组中获取一个备选帧,判断所述备选帧是否为清晰图像,如果所述备选帧是清晰图像,则将所述备选帧作为所述当前图像组的关键帧。Further, the keyframe acquisition module 701 is also used to acquire an alternative frame from the current image group, and judge whether the alternative frame is a clear image, and if the alternative frame is a clear image, then the alternative frame as the key frame of the current image group.

进一步的,关键帧获取模块701还用于如果所述备选帧是清晰图像,则将所述备选帧作为所述当前图像组的关键帧之后,获取下一个图像组,并将所述下一个图像组作为当前图像组,获取当前图像组中的关键帧。Further, the keyframe acquisition module 701 is also used for if the candidate frame is a clear image, after the candidate frame is used as the key frame of the current image group, the next image group is acquired, and the next An image group is used as the current image group, and the keyframes in the current image group are obtained.

Further, the keyframe acquisition module 701 is also configured to obtain the pixel mean of the candidate frame and determine whether the pixel mean is greater than a preset threshold; when the pixel mean is greater than the preset threshold, the candidate frame is a clear image.

Further, the keyframe acquisition module 701 is also configured to, if the candidate frame is not a clear image, obtain the frame image following the candidate frame, take that next frame image as a new candidate frame, and return to the operation of determining whether the candidate frame is a clear image.
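Putting the above keyframe-selection steps together, a minimal sketch might look as follows. The group size, the clarity threshold, and the helper names (`pixel_mean`, `select_keyframe`, `select_keyframes`) are assumptions introduced here for illustration; the fallback to the frame with the largest pixel mean follows the behaviour recited in the claims.

```python
import numpy as np

GROUP_SIZE = 10           # assumed number of frames per image group
CLARITY_THRESHOLD = 40.0  # assumed preset threshold on the pixel mean

def pixel_mean(frame):
    """Mean of the R, G and B channel means (frame is an H x W x 3 array)."""
    return float(frame.reshape(-1, 3).mean(axis=0).mean())

def select_keyframe(group):
    """Return the first frame whose pixel mean exceeds the threshold; if none
    does, fall back to the frame with the largest pixel mean."""
    means = [pixel_mean(f) for f in group]
    for frame, m in zip(group, means):
        if m > CLARITY_THRESHOLD:
            return frame
    return group[int(np.argmax(means))]

def select_keyframes(frames):
    """Split the frames into groups and pick one keyframe per group."""
    groups = [frames[i:i + GROUP_SIZE] for i in range(0, len(frames), GROUP_SIZE)]
    return [select_keyframe(g) for g in groups]
```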

The detection data acquisition module 702 is configured to input the plurality of keyframes into a detection network to obtain detection data, the detection data including position information of the target object.

Further, the detection data acquisition module 702 is also configured to input the plurality of keyframes into the detection network to obtain the target detection box corresponding to the target object in each keyframe, obtain the confidence of the target detection box, and determine whether the confidence of the target detection box is greater than a confidence threshold; if the confidence of the target detection box is greater than the confidence threshold, the position information corresponding to the target detection box is used as the detection data.
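As a sketch of this filtering step (the `detector` interface returning `(box, score)` pairs and the threshold value are assumptions made for illustration):

```python
CONFIDENCE_THRESHOLD = 0.6  # assumed value; a higher threshold yields fewer, more reliable boxes

def detection_data_for(detector, keyframe):
    """Keep only detection boxes whose confidence exceeds the threshold; the
    surviving box positions become the detection data used by the tracker."""
    return [box for box, score in detector(keyframe) if score > CONFIDENCE_THRESHOLD]
```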

The target tracking module 703 is configured to track the target object in the plurality of frame images by using the detection data and the kernel correlation filtering algorithm.

Further, the target tracking module 703 is also configured to determine whether the detection data has been successfully obtained and, if so, to track the target object in the plurality of frame images by using the detection data and the kernel correlation filtering algorithm.

Further, the target tracking module 703 is also configured to, if the detection data has not been successfully obtained, suspend tracking of the target object in the target frame image among the plurality of frame images, take the frame image following the target frame as the new target frame image, and return to the operation of determining whether the detection data has been successfully obtained.
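A minimal tracking loop along these lines, written against OpenCV's KCF tracker, is sketched below. It assumes the opencv-contrib-python package (which provides `cv2.TrackerKCF_create`), an (x, y, w, h) box format, and a `get_detection()` callable that returns the most recent detection data or `None`; re-initialising the tracker from each fresh detection is one possible way to "update from the detection results", not necessarily the claimed one.

```python
import cv2

def track_frames(frames, get_detection):
    """Track the target through `frames`, refreshing the KCF tracker whenever
    new detection data is available and skipping frames until the first
    detection arrives."""
    tracker = None
    results = []
    for frame in frames:
        box = get_detection()                 # latest detection data, or None
        if box is not None:
            tracker = cv2.TrackerKCF_create()
            tracker.init(frame, tuple(int(v) for v in box))
        if tracker is None:                   # no detection data yet: pause on this frame
            results.append(None)
            continue
        ok, tracked_box = tracker.update(frame)
        results.append(tracked_box if ok else None)
    return results
```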

In addition, the operation of inputting the plurality of keyframes into the detection network and the operation of tracking the target object in the plurality of frame images by using the detection data and the kernel correlation filtering algorithm are performed in parallel.
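One way to realise this parallelism is to run the detection network in a background thread that publishes its latest result through a small thread-safe queue, so the tracking loop reads it without ever blocking. The arrangement below is an assumed sketch (the names and the single-slot queue are illustrative choices), not the claimed implementation.

```python
import threading
import queue

latest_detection = queue.Queue(maxsize=1)   # holds only the newest detection result

def detection_worker(detector, keyframes, conf_threshold=0.6):
    """Background thread: run the detector on keyframes and publish confident boxes."""
    for kf in keyframes:
        boxes = [b for b, s in detector(kf) if s > conf_threshold]
        if not boxes:
            continue
        try:
            latest_detection.get_nowait()   # drop the stale result, if any
        except queue.Empty:
            pass
        latest_detection.put(boxes[0])

def get_detection():
    """Non-blocking read used by the tracking loop; it never waits for the detector."""
    try:
        return latest_detection.get_nowait()
    except queue.Empty:
        return None

# Usage: start the detection thread, then run the tracking loop in the main thread.
# threading.Thread(target=detection_worker, args=(detector, keyframes), daemon=True).start()
# Tracking keeps running even if the detection thread stops.
```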

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In the several embodiments provided in the present application, the coupling between modules may be electrical, mechanical, or of another form.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist separately and physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.

Referring to FIG. 8, a structural block diagram of an electronic device provided by an embodiment of the present application is shown. The electronic device 100 may be any electronic device capable of running application programs, such as a smartphone, a tablet computer, or an e-book reader. The electronic device 100 in the present application may include one or more of the following components: a processor 110, a memory 120, a screen 130, and one or more application programs, where the one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110, and the one or more programs are configured to perform the methods described in the foregoing method embodiments.

The processor 110 may include one or more processing cores. The processor 110 connects the various parts of the electronic device 100 through various interfaces and lines, and performs the various functions of the electronic device 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 120 and by invoking the data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one of the following hardware forms: digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 110 may integrate one of, or a combination of, a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, and the application programs; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may instead be implemented on a separate communication chip rather than integrated into the processor 110.

Specifically, the processor 110 may include one of, or a combination of, a central processing unit (CPU), a multimedia display processor, a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, and the client; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may instead be implemented on a separate communication chip rather than integrated into the processor 110.

The memory 120 may include random access memory (RAM) and may also include read-only memory (ROM). The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments described herein, and so on. The data storage area may store data created by the electronic device 100 during use (such as a phone book, audio and video data, and chat records).

The screen 130 is used to display information entered by the user, information provided to the user, and the various graphical user interfaces of the electronic device; these graphical user interfaces may be composed of graphics, text, icons, numbers, video, and any combination thereof. In one example, a touch screen may be disposed on the display panel so as to form a single unit with the display panel.

Referring to FIG. 9, a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application is shown. The computer-readable medium 1000 stores program code, and the program code may be invoked by a processor to perform the methods described in the foregoing method embodiments.

The computer-readable storage medium 1000 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 1000 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 1000 has storage space for program code 1010 that performs any of the method steps described above. The program code may be read from, or written into, one or more computer program products. The program code 1010 may, for example, be compressed in a suitable form.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (9)

1. A target tracking method, applied to an electronic device, the method comprising:
obtaining a plurality of frame images of a video to be tracked, and dividing the plurality of frame images into N image groups; obtaining a candidate frame from a current image group;
computing the means of all R-channel pixels, G-channel pixels, and B-channel pixels in the candidate frame respectively, then averaging the means corresponding to the R-channel, G-channel, and B-channel pixels to obtain a pixel mean of the candidate frame, and determining whether the pixel mean is greater than a preset threshold;
when the pixel mean is greater than the preset threshold, determining that the candidate frame is a clear image and using the candidate frame as the keyframe of the current image group; if the pixel means of all frame images in the current image group are less than or equal to the preset threshold, sorting the pixel means of all frame images in the current image group and selecting the frame image with the largest pixel mean as the keyframe of the current image group;
obtaining a next image group, taking the next image group as the current image group, and obtaining the keyframe of the current image group, so as to obtain a plurality of keyframes, the plurality of keyframes comprising the keyframe of each image group;
inputting the plurality of keyframes into a detection network to obtain detection data, the detection data including position information of a target object; and
tracking the target object in the plurality of frame images by using the detection data and a kernel correlation filtering algorithm.

2. The method according to claim 1, wherein the method further comprises:
if the candidate frame is not a clear image, obtaining the frame image following the candidate frame, taking that next frame image as a new candidate frame, and returning to the operation of determining whether the candidate frame is a clear image.

3. The method according to claim 1, wherein tracking the target object in the plurality of frame images by using the detection data and the kernel correlation filtering algorithm comprises:
determining whether the detection data has been successfully obtained; and
if the detection data has been successfully obtained, tracking the target object in the plurality of frame images by using the detection data and the kernel correlation filtering algorithm.

4. The method according to claim 3, wherein the method further comprises:
if the detection data has not been successfully obtained, suspending tracking of the target object in a target frame image among the plurality of frame images; and
taking the frame image following the target frame as a new target frame image, and returning to the operation of determining whether the detection data has been successfully obtained.

5. The method according to claim 1, wherein the operation of inputting the plurality of keyframes into the detection network and the operation of tracking the target object in the plurality of frame images by using the detection data and the kernel correlation filtering algorithm are performed in parallel.

6. The method according to claim 1, wherein inputting the plurality of keyframes into the detection network to obtain the detection data comprises:
inputting the plurality of keyframes into the detection network to obtain a target detection box corresponding to the target object in each keyframe;
obtaining a confidence of the target detection box, and determining whether the confidence of the target detection box is greater than a confidence threshold; and
if the confidence of the target detection box is greater than the confidence threshold, using the position information corresponding to the target detection box as the detection data.

7. A target tracking apparatus, applied to an electronic device, the apparatus comprising:
a keyframe acquisition module, configured to obtain a plurality of frame images of a video to be tracked and divide the plurality of frame images into N image groups; obtain a candidate frame from a current image group; compute the means of all R-channel pixels, G-channel pixels, and B-channel pixels in the candidate frame respectively, then average the means corresponding to the R-channel, G-channel, and B-channel pixels to obtain a pixel mean of the candidate frame, and determine whether the pixel mean is greater than a preset threshold; when the pixel mean is greater than the preset threshold, determine that the candidate frame is a clear image and use the candidate frame as the keyframe of the current image group; if the pixel means of all frame images in the current image group are less than or equal to the preset threshold, sort the pixel means of all frame images in the current image group and select the frame image with the largest pixel mean as the keyframe of the current image group; obtain a next image group, take the next image group as the current image group, and obtain the keyframe of the current image group, so as to obtain a plurality of keyframes, the plurality of keyframes comprising the keyframe of each image group;
a detection data acquisition module, configured to input the plurality of keyframes into a detection network to obtain detection data, the detection data including position information of a target object; and
a target tracking module, configured to track the target object in the plurality of frame images by using the detection data and a kernel correlation filtering algorithm.

8. An electronic device, comprising:
one or more processors;
a memory; and
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the method according to any one of claims 1 to 6.

9. A computer-readable storage medium, wherein the computer-readable storage medium stores program code, and the program code may be invoked by a processor to perform the method according to any one of claims 1 to 6.
CN201910810400.1A | 2019-08-29 | 2019-08-29 | Target tracking method, device, electronic device and storage medium | Active | CN110610510B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910810400.1A (CN110610510B) | 2019-08-29 | 2019-08-29 | Target tracking method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910810400.1A (CN110610510B) | 2019-08-29 | 2019-08-29 | Target tracking method, device, electronic device and storage medium

Publications (2)

Publication Number | Publication Date
CN110610510A (en) | 2019-12-24
CN110610510B (en) | 2022-12-16

Family

ID=68890436

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910810400.1A (Active, CN110610510B (en)) | Target tracking method, device, electronic device and storage medium

Country Status (1)

Country | Link
CN (1) | CN110610510B (en)

Also Published As

Publication number | Publication date
CN110610510A (en) | 2019-12-24

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
