CN107945210B - Target tracking method based on deep learning and environment self-adaption - Google Patents

Target tracking method based on deep learning and environment self-adaption
Info

Publication number
CN107945210B
Authority
CN
China
Prior art keywords
target
samples
frame
positive
negative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711237457.4A
Other languages
Chinese (zh)
Other versions
CN107945210A (en)
Inventor
周圆
李孜孜
曹颖
杜晓婷
杨鸿宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201711237457.4A
Publication of CN107945210A
Application granted
Publication of CN107945210B
Status: Active
Anticipated expiration

Abstract

Translated from Chinese

The invention discloses a target tracking algorithm based on deep learning and environment adaptation. The algorithm has two parts. The first is preprocessing: information is extracted from each frame of the tracked video, and the sampled positive and negative samples are then screened further using saliency detection and a convolutional neural network. The second is a convolutional neural network implementing the VGG model: a three-layer convolutional network first extracts the target features, fully connected layers then classify target against background, and the position of the target to be tracked is finally obtained before tracking proceeds to the next frame. Compared with the prior art, the invention (1) makes accurate use of image preprocessing information while reducing computational complexity, so that tracking is more accurate, and the invention is therefore original; and (2) provides a tracker that adapts to a variety of complex scenes and has broad application prospects.

Figure 201711237457

Description

Translated from Chinese

Target tracking method based on deep learning and environment adaptation

Technical Field

The invention relates to target tracking in computer vision and, more particularly, to a target tracking algorithm that adapts to the environment by means of deep learning.

Background

Humans perceive and communicate with the outside world through their senses, but human attention and field of view are very limited. In many applications human vision is therefore severely constrained, or even inefficient. With the rapid development of digital computer technology, computer vision has attracted growing attention: the aim is for computers to stand in for the human "eye", process visual information intelligently, and make up for the shortcomings of human vision. Computer vision is a highly interdisciplinary subject that draws on artificial neural networks, psychology, physics, computer graphics, and mathematics.

Target tracking is currently one of the most active topics in computer vision and is receiving increasing attention. It has a wide range of applications, including motion analysis, behavior recognition, surveillance, and human-computer interaction. It has significant research value and broad application prospects in science and engineering, and it attracts many researchers in China and abroad.

Deep learning has been applied successfully to image processing and offers a new approach to target tracking. In tracking, the deep architecture automatically learns more abstract and essential features from the collected samples, which are then used to test new sequences. Tracking techniques that incorporate deep learning have gradually surpassed traditional tracking methods in performance and have become a new trend in the field.

To date, no target tracking algorithm based on deep learning and environment adaptation has been reported in papers or other literature published in China or abroad.

Summary of the Invention

In view of the prior art described above, the invention proposes a target tracking method based on deep learning and environment adaptation. It uses a convolutional neural network whose parameters are adjusted adaptively, combined with the preprocessing advantages of saliency detection, so that the tracker achieves high accuracy across a variety of tracking scenarios.

The target tracking method based on deep learning and environment adaptation comprises the following steps:

Step 1. An image of 107×107 pixels is used as input.

Step 2. Preprocessing, comprising positive-sample preprocessing and negative-sample preprocessing.

Positive-sample preprocessing: first, a sampling procedure is carried out. Based on the ground truth, a rectangle larger than the target's ground-truth box is taken around the target in the positive sample and used as the sampling box, and the proportion of the sampling box occupied by the saliency map of the positive sample is computed. If the proportion exceeds a set threshold, the sample is treated as a pure positive sample; if it is below the threshold, the sample is discarded. Then, a saliency detection algorithm detects the shape of the target and yields a saliency map. The saliency map is binarized, the binarized map replaces the original frame, and the whole binarized frame is sampled using the sampling procedure above.

Negative-sample preprocessing: negative samples are screened with a hard example mining algorithm. The sampled negatives are passed forward once through the convolutional neural network and sorted by loss, and those with the largest loss are selected as "hard examples" and used to train the network.

Sample counts: in offline multi-domain training, 50 positive and 200 negative samples are drawn from each frame. Positive samples have an overlap ratio of ≥0.7 with the ground-truth box and negative samples an overlap ratio of ≤0.5, and samples are selected according to this criterion. Likewise, for online learning, [Figure GDA0002697040580000021] positive samples and [Figure GDA0002697040580000022] negative samples are collected under the same overlap criterion.

Step 3. A bounding box regression model is used when training on the first frame. For the given first frame of the test video sequence, a three-layer convolutional network is used to extract the target features and to train a linear bounding box regression model that predicts the target position. In each subsequent frame of the video sequence, the bounding box regression model is used to adjust the predicted position of the target's bounding box.

Compared with the prior art, the invention has the following effects:

(1) It makes accurate use of image preprocessing information while reducing computational complexity, so that tracking is more accurate; the invention is therefore original.

(2) The tracker adapts to a variety of complex scenes and has broad application prospects.

Description of Drawings

Fig. 1 shows the overall framework of the target tracking method based on deep learning and environment adaptation: Fig. 1(a) is the basic model of the tracking algorithm, Fig. 1(b) is the saliency detection model, and Fig. 1(c) is the deep learning tracking model.

Fig. 2 shows the tracking test results on the Diving sequence.

Fig. 3 shows the tracking test results on the ball sequence.

Detailed Description

The target tracking method based on deep learning and environment adaptation consists of two parts. The first is preprocessing: information is extracted from each frame of the tracked video, and the sampled positive and negative samples are then screened further using saliency detection and a convolutional neural network. The second is a convolutional neural network implementing the VGG model: a three-layer convolutional network first extracts the target features, fully connected layers then classify target against background, and the position of the target to be tracked is finally obtained before tracking proceeds to the next frame.

The procedure is described in detail below.

Step 1. An image of 107×107 pixels is used as input. The input size is chosen so that the feature map output by the convolutional layers matches it, and the input to the fully connected layer must be a one-dimensional vector.
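The role of the 107×107 input size can be illustrated with a short calculation. The sketch below assumes VGG-M-style layer hyperparameters (kernel sizes, strides, and a 512-channel third convolution); these values are not stated in this document and are used only for illustration. Under that assumption the three convolutional layers reduce the input to a 3×3 feature map, which is then flattened into a one-dimensional vector.

```python
def out_size(size: int, kernel: int, stride: int) -> int:
    """Spatial size after an unpadded convolution or pooling layer."""
    return (size - kernel) // stride + 1

# (name, kernel, stride) -- assumed VGG-M-style values, not taken from the patent.
ASSUMED_LAYERS = [
    ("conv1", 7, 2),
    ("pool1", 3, 2),
    ("conv2", 5, 2),
    ("pool2", 3, 2),
    ("conv3", 3, 1),
]

def trace_sizes(input_size: int = 107) -> list:
    """Return [(layer_name, spatial_size), ...] for the assumed layer stack."""
    sizes = []
    size = input_size
    for name, kernel, stride in ASSUMED_LAYERS:
        size = out_size(size, kernel, stride)
        sizes.append((name, size))
    return sizes

def flattened_length(input_size: int = 107, channels: int = 512) -> int:
    """Length of the 1-D vector obtained by flattening the last feature map."""
    final = trace_sizes(input_size)[-1][1]
    return final * final * channels
```

With these assumed layers, `trace_sizes(107)` ends at a 3×3 map, so the flattened vector has 3 × 3 × 512 entries.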

Step 2. Preprocessing, comprising positive-sample preprocessing and negative-sample preprocessing.

(1) Positive-sample preprocessing: the positive samples collected by conventional methods sometimes consist mostly of background and are effectively negative samples. Such "positive samples" introduce errors into the training of the convolutional neural network. The invention therefore screens the collected positive samples so that they are purer. The method is as follows.

First, based on the ground truth, a rectangle is taken around the target in the positive sample; the rectangle must be larger than the target's ground-truth box. The proportion of the sampling box occupied by the saliency map is computed. If the proportion exceeds a set threshold, the sample is fed to the network as a pure positive sample; if it is below the threshold, the sample is discarded. This ensures that nearly all of the positive samples obtained are pure.

Then, saliency detection is performed, that is, detection of the objects that stand out within a region. A saliency detection algorithm roughly detects the shape of the target; the resulting saliency map is binarized and inserted back into the original frame, and the whole binarized frame is sampled using the sampling procedure above. The saliency method is later used to verify the target.

The positive-sample screening in this step is a general method that can be used in most tracking algorithms; applying it to the pre-trained network influences the parameters of the whole network.
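The positive-sample screening described above can be sketched as follows. This is a minimal illustration: the saliency detector itself is not specified in the document, so the saliency map is taken as a given array, and the binarization level, the ratio threshold, and all function names are hypothetical.

```python
import numpy as np

def binarize(saliency: np.ndarray, level: float = 0.5) -> np.ndarray:
    """Turn a saliency map with values in [0, 1] into a 0/1 mask.
    `level` is an assumed cut-off, not a value from the patent."""
    return (saliency >= level).astype(np.uint8)

def salient_ratio(mask: np.ndarray, box: tuple) -> float:
    """Fraction of pixels inside `box` = (x, y, w, h) that are salient."""
    x, y, w, h = box
    patch = mask[y:y + h, x:x + w]
    return float(patch.mean()) if patch.size else 0.0

def filter_positives(mask: np.ndarray, boxes: list, min_ratio: float = 0.6) -> list:
    """Keep only sampling boxes whose salient fraction exceeds `min_ratio`
    (an assumed threshold); the rest are discarded as impure positives."""
    return [b for b in boxes if salient_ratio(mask, b) > min_ratio]
```

A box that tightly covers the salient region passes the filter, while a box that is mostly background is dropped.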

(2) Negative-sample preprocessing

In tracking-by-detection, most negative samples are redundant, and only a few representative ones are useful for training the tracker. With ordinary SGD this easily leads to tracker drift. The most common remedy is hard example mining, which is applied here to screen the negative samples: the sampled negatives are passed forward once through the convolutional neural network and sorted by loss, and those with the largest loss are selected. Because these samples are close to the positive samples without being positive, they are called "hard examples". Training the network on them helps it learn the difference between positive and negative samples.
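The selection step of hard example mining can be sketched as below. The document only says that samples are ranked by loss; using the cross-entropy of the background class as that loss is an assumption of this sketch, as are the function names. The input is the positive-class probability that one forward pass assigns to each negative sample.

```python
import numpy as np

def negative_loss(pos_scores: np.ndarray) -> np.ndarray:
    """Cross-entropy loss of negative samples given the network's
    positive-class probability for each sample (true label: background)."""
    eps = 1e-12
    return -np.log(np.clip(1.0 - pos_scores, eps, 1.0))

def mine_hard_negatives(pos_scores: np.ndarray, keep: int) -> np.ndarray:
    """Return indices of the `keep` negatives with the largest loss,
    ordered from hardest to easiest."""
    losses = negative_loss(pos_scores)
    order = np.argsort(-losses, kind="stable")
    return order[:keep]
```

Negatives that the network scores as likely targets receive the largest loss and are therefore the ones kept for training.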

Step 3. A bounding box regression model is used when training on the first frame. For the given first frame of the test video sequence, a three-layer convolutional network is used to extract the target features and to train a linear regression model that predicts the target position. In each subsequent frame of the video sequence, the regression model is used to adjust the position of the target's bounding box, and the fully connected layers classify target against background in the image. The image patch with the highest target probability is taken as the target to be tracked, which gives the target position, and tracking then proceeds to the next frame.

Positive-sample preprocessing can also use a long-term and short-term update strategy, in which the network is updated again with positive samples collected over a period of time. During tracking, as soon as the target is found to be lost, the short-term update strategy is applied; the positive samples it uses to update the network are those collected within the short-term period. Both update strategies use the negative samples collected by the short-term update model. Ts and Tl are two frame index sets; the short-term length is set to 20 frames and the long-term length to 100 frames. The purpose of this strategy is to keep the samples as "fresh" as possible, which benefits the tracking result.
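The bookkeeping behind the two frame index sets can be sketched as below. The class and method names are hypothetical. Following the online tracking procedure in this document, both sets start as {1}, and each set drops its smallest (oldest) index once it grows past its limit of 20 or 100 entries.

```python
class UpdateWindows:
    """Short-term and long-term frame index sets (Ts and Tl)."""

    def __init__(self, short_len: int = 20, long_len: int = 100):
        self.short_len = short_len   # limit on |Ts|
        self.long_len = long_len     # limit on |Tl|
        self.short = {1}             # Ts <- {1}
        self.long = {1}              # Tl <- {1}

    def add_frame(self, t: int) -> None:
        """Add frame index t to both sets, then drop the smallest index
        from any set that has grown past its limit."""
        self.short.add(t)
        self.long.add(t)
        if len(self.short) > self.short_len:
            self.short.remove(min(self.short))
        if len(self.long) > self.long_len:
            self.long.remove(min(self.long))
```

After many frames the short-term set holds only the most recent 20 indices and the long-term set the most recent 100, which is what keeps the training samples fresh.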

After the neural network has been trained offline, the video sequences under test are tracked online, so the overall tracking algorithm needs an online tracking part. The online tracking algorithm proceeds as follows:

Input: filters {w1, ..., w5} of the pretrained convolutional neural network (CNN); initial target state x1

Output: estimated target state [Figure GDA0002697040580000051]

(1) Randomly initialize the weights w6 of the sixth fully connected layer, so that w6 has a random initial value;

(2) Train a bounding box regression model;

(3) Draw positive samples [Figure GDA0002697040580000052] and negative samples [Figure GDA0002697040580000053];

(4) Screen the positive samples using the saliency network;

(5) Use the drawn positive samples [Figure GDA0002697040580000054] and negative samples [Figure GDA0002697040580000055] to update the fully connected weights {w4, w5, w6}, where w4, w5, and w6 are the weights of fully connected layers 4, 5, and 6 respectively;

(6) Initialize the long-term and short-term index sets: Ts ← {1} and Tl ← {1};

(7) Repeat the following operations:

Draw candidate samples of the target [Figure GDA0002697040580000056];

Find the optimal target state [Figure GDA0002697040580000058] using the formula [Figure GDA0002697040580000057], where [Figure GDA0002697040580000059] is a candidate sample; the formula states that the candidate with the highest positive score from the convolutional neural network is the optimal target state [Figure GDA00026970405800000510];

If [Figure GDA00026970405800000511], then draw the training samples [Figure GDA00026970405800000512] and [Figure GDA00026970405800000513], and update the index sets:

Ts ← Ts ∪ {t}, Tl ← Tl ∪ {t}

where t denotes the t-th frame and Ts and Tl are the short-term and long-term index sets; that is, the frame index t is merged into each set, updating both frame index sets;

If the size of the short-term frame index set exceeds the set value of 20, i.e. |Ts| > τs, remove the smallest element from the short-term index set Ts: [Figure GDA0002697040580000061], where v denotes a value in the short-term index set;

If the size of the long-term frame index set exceeds the set value of 100, i.e. |Tl| > τl, remove the smallest element from the long-term index set Tl: [Figure GDA0002697040580000062];

Use the bounding box regression model to adjust the predicted target position [Figure GDA0002697040580000063];

If [Figure GDA0002697040580000064], update the weights {w4, w5, w6} using the positive and negative samples in the short-term model;

Otherwise, update the weights {w4, w5, w6} using the positive and negative samples in the short-term model.
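The candidate-selection step of the loop above, in which the highest-scoring candidate becomes the new target state, can be sketched as below. The function names are hypothetical. The success condition in the document is given only as an image, so the 0.5 threshold used here is an assumption for illustration, not the patent's value.

```python
import numpy as np

def select_target(candidates: np.ndarray, pos_scores: np.ndarray):
    """Pick the candidate state with the highest positive score.

    candidates: (N, 4) array of boxes (x, y, w, h)
    pos_scores: (N,) positive-class scores from the network
    """
    best = int(np.argmax(pos_scores))
    return candidates[best], float(pos_scores[best])

def tracking_succeeded(best_score: float, threshold: float = 0.5) -> bool:
    """Decide whether the frame counts as a confident detection.
    The threshold is an assumed value; the document's condition
    survives only as a figure reference."""
    return best_score > threshold
```

In a full tracker, a confident frame would contribute new training samples and its index would be added to Ts and Tl, while a low score would trigger the short-term update.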

The embodiments of the invention are described in further detail below with reference to the drawings.

The target tracking method based on deep learning and environment adaptation proposed in this patent is verified below. Simulation experiments compare the training error of the algorithm with that of the unimproved algorithm, and a large number of experimental results confirm the effectiveness of the algorithm. The results are presented as tracked target boxes.

Candidate target generation: to generate candidate targets in each frame, N = 256 samples are drawn,

[Figure GDA0002697040580000065]

where [Figure GDA0002697040580000066] denotes the previous target state; the covariance matrix is a diagonal matrix with parameter 0.09r², where r is the mean of the width and height of the target box in the previous frame. Each candidate target box is 1.5 times the size of the initial-state target box.
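Candidate generation can be sketched as a Gaussian draw around the previous target state. The sampling formula itself survives only as a figure reference, so the following is an illustration under stated assumptions: the state is reduced to the box centre, the box width and height are kept fixed, and the function name and box layout (cx, cy, w, h) are hypothetical. Only N = 256 and the variance 0.09r² come from the text.

```python
import numpy as np

def sample_candidates(prev_box, n: int = 256, seed=None) -> np.ndarray:
    """Draw n candidate boxes around the previous target box.

    prev_box: (cx, cy, w, h) of the target in the previous frame.
    Centre offsets are Gaussian with variance 0.09 * r**2, where r is the
    mean of the previous width and height. Keeping w and h fixed is a
    simplifying assumption of this sketch.
    """
    cx, cy, w, h = prev_box
    r = (w + h) / 2.0
    cov = np.diag([0.09 * r ** 2, 0.09 * r ** 2])
    rng = np.random.default_rng(seed)
    centres = rng.multivariate_normal([cx, cy], cov, size=n)
    sizes = np.tile([w, h], (n, 1))
    return np.hstack([centres, sizes])
```

Each row of the result is one candidate box; the network would then score all rows and the best one would be kept.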

Training data: in offline multi-domain training, 50 positive and 200 negative samples are drawn from each frame. Positive and negative samples have overlap ratios of ≥0.7 and ≤0.5 with the ground-truth box respectively, and samples are selected according to this criterion. Likewise, for online learning, [Figure GDA0002697040580000067] positive samples and [Figure GDA0002697040580000068] negative samples are collected under the same overlap criterion, except that in the first frame [Figure GDA0002697040580000071] positive samples and [Figure GDA0002697040580000072] negative samples are taken. For bounding box regression, 1000 training samples are used.
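The overlap criterion for choosing training samples can be sketched as below. Interpreting the overlap ratio as intersection over union (IoU) is an assumption of this sketch, and the function names and the (x, y, w, h) box layout are hypothetical; the 0.7 and 0.5 limits come from the text.

```python
def iou(a, b) -> float:
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def label_sample(box, gt, pos_min: float = 0.7, neg_max: float = 0.5):
    """Return 'positive', 'negative', or None (sample not used)."""
    overlap = iou(box, gt)
    if overlap >= pos_min:
        return "positive"
    if overlap <= neg_max:
        return "negative"
    return None
```

Boxes whose overlap falls strictly between 0.5 and 0.7 are ambiguous and are left out of both sets.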

Network learning: for multi-domain learning with K branches, the learning rate of the convolutional layers is set to 0.0001 and that of the fully connected layers to 0.001. When the fully connected layers are first trained, 30 iterations are run, with the learning rate of fully connected layers 4 and 5 set to 0.0001 and that of the sixth fully connected layer set to 0.001.
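The per-layer learning rates for the initial training of the fully connected layers can be expressed as a small configuration plus a plain gradient step. This is only a sketch: the layer names `fc4`, `fc5`, and `fc6` are hypothetical labels, and the document does not say whether momentum or weight decay is used, so neither appears here.

```python
import numpy as np

# Learning rates quoted in the text for the initial fully connected training.
INITIAL_LR = {"fc4": 0.0001, "fc5": 0.0001, "fc6": 0.001}

def sgd_step(params: dict, grads: dict, lrs: dict) -> dict:
    """One plain SGD step with a separate learning rate per layer."""
    return {name: params[name] - lrs[name] * grads[name] for name in params}
```

Giving the randomly initialized sixth layer a rate ten times larger lets it adapt quickly while the pretrained layers change more slowly.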

Table 1 gives the results of the improved algorithm, which adds the saliency preprocessing network; Table 2 gives the results of the unimproved algorithm, which has no preprocessing network.

Table 1. Training results of the improved algorithm

[Figure GDA0002697040580000073]

Table 2. Training results of the unimproved algorithm

[Figure GDA0002697040580000074]

Claims (1)

1. A target tracking method based on deep learning and environment self-adaptation, characterized by comprising the following steps:
step (1), using an image of 107×107 pixels as input;
step (2), preprocessing, comprising positive-sample preprocessing and negative-sample preprocessing; wherein the positive-sample preprocessing comprises: first, executing a sampling procedure: based on the ground truth, taking a rectangle larger than the ground-truth box of the target around the target in the positive sample as a sampling box, and calculating the proportion of the sampling box occupied by the saliency map of the positive sample; if the proportion is larger than a set threshold, treating the sample as a pure positive sample, and if it is smaller than the set threshold, discarding the sample; then, detecting the shape of the target with a saliency detection algorithm to obtain a saliency map, binarizing the saliency map, replacing the original frame with the binarized saliency map, and sampling the whole binarized frame according to the preceding sampling procedure; the negative-sample preprocessing comprises: screening the negative samples with a hard example mining algorithm, passing the sampled samples forward once through the convolutional neural network, sorting the samples by loss, selecting those with the largest loss as "hard examples", and training the network with these samples; wherein: in offline multi-domain training, 50 positive samples and 200 negative samples are drawn from each frame, the positive samples and negative samples having overlap ratios of ≥0.7 and ≤0.5 with the ground-truth box respectively, and the positive and negative samples are selected according to this criterion; likewise, for online learning, [Figure FDA0002697040570000011] positive samples and [Figure FDA0002697040570000012] negative samples are collected under the same overlap criterion;
step (3), using a bounding box regression model when training on the first frame, specifically comprising: for the given first frame of the test video sequence, using a three-layer convolutional network to extract the target features and to train a linear bounding box regression model that predicts the target position; and in each subsequent frame of the video sequence, using the bounding box regression model to adjust the predicted position of the bounding box of the corresponding target.
CN201711237457.4A | 2017-11-30 | 2017-11-30 | Target tracking method based on deep learning and environment self-adaption | Active | CN107945210B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201711237457.4A (CN107945210B (en)) | 2017-11-30 | 2017-11-30 | Target tracking method based on deep learning and environment self-adaption

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201711237457.4A (CN107945210B (en)) | 2017-11-30 | 2017-11-30 | Target tracking method based on deep learning and environment self-adaption

Publications (2)

Publication Number | Publication Date
CN107945210A (en) | 2018-04-20
CN107945210B (en) | 2021-01-05

Family

ID=61946958

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201711237457.4A | Active | CN107945210B (en) | 2017-11-30 | 2017-11-30 | Target tracking method based on deep learning and environment self-adaption

Country Status (1)

Country | Link
CN (1) | CN107945210B (en)

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
