Technical Field
The invention belongs to the field of computer vision and relates to a target tracking method, in particular to a long-term occlusion-robust tracking method based on convolutional features and global search detection.
Background Art
The main task of target tracking is to obtain the position and motion information of a specific target in a video sequence; it has a wide range of applications in video surveillance, human-computer interaction and other fields. During tracking, factors such as illumination changes, cluttered backgrounds, and rotation or scaling of the target all increase the difficulty of the problem, and when the target is occluded for a long time, tracking failure becomes especially likely.
The tracking method (TLD for short) proposed in "Tracking-learning-detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409-1422" was the first to combine a traditional tracking algorithm with a detection algorithm, using the detection results to refine the tracking results and thereby improving the reliability and robustness of the system. Its tracking algorithm is based on optical flow, and its detection algorithm generates a large number of detection windows, each of which must be accepted by three detectors before it can become part of the final detection result. For the occlusion problem, TLD offers a practical and effective approach that allows long-term tracking of the target. However, TLD uses shallow hand-crafted features, whose ability to represent the target is limited, and the design of its detection algorithm is rather complex, so there is room for improvement.
Summary of the Invention
Technical Problem to Be Solved
To overcome the deficiencies of the prior art, the present invention proposes a long-term occlusion-robust tracking method based on convolutional features and global search detection. It addresses the problem that, while tracking a moving target in video, long-term occlusion, the target moving out of the field of view, and similar factors cause the appearance model to drift, which easily leads to tracking failure.
Technical Solution
A long-term occlusion-robust tracking method based on convolutional features and global search detection, characterized by the following steps:
Step 1: Read the first frame of the video and the initial position information of the target, [x, y, w, h], where x and y are the horizontal and vertical coordinates of the target center and w and h are the width and height of the target. Denote the coordinate point (x, y) as P, denote the initial target region of size w×h centered on P as R_init, and denote the scale of the target as scale, initialized to 1.
Step 2: With P as the center, determine a region R_bkg containing both the target and background information; the size of R_bkg is M×N, with M = 2w and N = 2h. Using VGGNet-19 as the CNN model, extract the convolutional feature map z_target_init of this region at the conv5-4 convolutional layer. Then build the target model of the tracking module (one filter per feature channel t ∈ {1, 2, ..., T}, where T is the number of channels of the CNN feature map) from z_target_init. The calculation is as follows:
Here the uppercase variables are the frequency-domain representations of the corresponding lowercase variables; m and n are the arguments of the Gaussian filter template, with m ∈ {1, 2, ..., M} and n ∈ {1, 2, ..., N}; σ_target is the bandwidth of the Gaussian kernel; ⊙ denotes element-wise multiplication; an overline denotes the complex conjugate; and λ1 is a regularization parameter (introduced to keep the denominator from being zero), set to 0.0001.
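A plausible reconstruction of the target-model equations, assuming the standard multi-channel correlation-filter (MOSSE/DCF-style) formulation and using g for the Gaussian filter template and W_target^t for the per-channel model purely as notational assumptions, is:

```latex
g(m,n) = \exp\!\left(-\frac{(m-M/2)^2 + (n-N/2)^2}{2\,\sigma_{target}^{2}}\right),
\qquad
W_{target}^{t} \;=\; \frac{\bar{G} \odot Z_{target\_init}^{t}}
{\sum_{t'=1}^{T} Z_{target\_init}^{t'} \odot \bar{Z}_{target\_init}^{t'} \;+\; \lambda_{1}}
```

Under this reading, G and Z_target_init^t are the 2-D DFTs of g and of channel t of z_target_init, which is consistent with the symbols defined above.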
Step 3: With P as the center, extract S image sub-blocks at different scales, with S set to 33. Each sub-block has size w×h×s, where the variable s is the scale factor of the sub-block, s ∈ [0.7, 1.4]. Then extract the HOG features of each image sub-block and concatenate them into an S-dimensional HOG feature vector, referred to here as the scale feature vector and denoted z_scale_init. Finally, build the scale model W_scale of the tracking module from z_scale_init; the calculation is analogous to that of the target model in step 2 (with the scale feature vector taking the place of the convolutional feature map), specifically:
Here s' is the argument of the Gaussian function, s' ∈ {1, 2, ..., S}; σ_scale is the bandwidth of the Gaussian kernel; and λ2 is a regularization parameter, set to 0.0001.
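As in step 2, the scale-model equations can be sketched under the same correlation-filter assumption, here applied to the single S-dimensional scale feature vector (the template name g_scale is a notational assumption):

```latex
g_{scale}(s') = \exp\!\left(-\frac{(s'-S/2)^2}{2\,\sigma_{scale}^{2}}\right),
\qquad
W_{scale} \;=\; \frac{\bar{G}_{scale} \odot Z_{scale\_init}}
{Z_{scale\_init} \odot \bar{Z}_{scale\_init} \;+\; \lambda_{2}}
```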
Step 4: Extract grayscale features from the initial target region R_init; the resulting grayscale feature representation is a two-dimensional matrix, referred to here as the target appearance representation matrix and denoted A_k, where the subscript k is the current frame number, initially k = 1. Then initialize the filter model D of the detection module to A_1, i.e., D = A_1, and initialize the set of historical target representation matrices A_his. The purpose of A_his is to store the target appearance representation matrix of the current frame and of every previous frame, i.e., A_his = {A_1, A_2, ..., A_k}; initially A_his = {A_1}.
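As an illustration of step 4, the following minimal Python sketch initializes the detection filter D and the history set A_his from the grayscale appearance of R_init; OpenCV and NumPy are assumed, and the function name and the BGR input format are illustrative choices rather than part of the method:

```python
import cv2
import numpy as np

def init_detection_module(frame_bgr, x, y, w, h):
    """Step 4 sketch: build A_1 from the grayscale patch R_init, set D = A_1,
    and start the history set A_his = {A_1}."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    top, left = int(round(y - h / 2)), int(round(x - w / 2))
    A1 = gray[top:top + h, left:left + w]   # target appearance representation matrix A_k (k = 1)
    D = A1.copy()                           # detection filter model D = A_1
    A_his = [A1]                            # history of appearance matrices
    return D, A_his
```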
Step 5: Read the next frame of the video and, still with P as the center, extract the scaled target search region of size R_bkg × scale. Extract the convolutional features of this search region with the CNN of step 2 and resample them to the size of R_bkg by bilinear interpolation to obtain the convolutional feature map z_target_cur of the current frame, then use the target model to compute the target confidence map f_target as follows:
In this computation, the final operation is the inverse Fourier transform, which returns the response to the spatial domain. Finally, update the coordinates of P by setting (x, y) to the coordinates of the maximum response value in f_target:
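One way to realize the conv5-4 feature extraction and the confidence-map computation of step 5 is sketched below with PyTorch/torchvision and NumPy. The layer index 34 for conv5-4 in torchvision's VGG-19, the omission of ImageNet mean/std normalization, and the assumption that the target model is stored in the frequency domain with any required conjugation already applied are all assumptions of this sketch, not statements of the patent:

```python
import numpy as np
import torch
import torch.nn.functional as F
import torchvision.models as models

# VGG-19 truncated after conv5-4 (index 34 of torchvision's feature stack)
backbone = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[:35].eval()

def conv5_4_features(patch_rgb, out_size):
    """Extract the conv5-4 feature map of an RGB patch (H, W, 3, floats in [0, 1])
    and resample it to out_size = (M, N) by bilinear interpolation."""
    x = torch.from_numpy(patch_rgb).permute(2, 0, 1).unsqueeze(0).float()
    with torch.no_grad():
        feat = backbone(x)                                            # (1, T, m', n')
    feat = F.interpolate(feat, size=out_size, mode="bilinear", align_corners=False)
    return feat.squeeze(0).numpy()                                    # (T, M, N)

def target_confidence_map(W_target_freq, z_target_cur):
    """Sum the per-channel filter responses, transform back to the spatial domain,
    and return the confidence map together with its peak location (the new P)."""
    Z = np.fft.fft2(z_target_cur, axes=(-2, -1))
    f_target = np.real(np.fft.ifft2(np.sum(W_target_freq * Z, axis=0)))
    py, px = np.unravel_index(np.argmax(f_target), f_target.shape)
    return f_target, (px, py)
```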
Step 6: With P as the center, extract S image sub-blocks at different scales, extract the HOG features of each sub-block, and concatenate them to obtain the scale feature vector z_scale_cur of the current frame (computed in the same way as z_scale_init in step 3). Then use the scale model W_scale to compute the scale confidence map:
Finally, update the scale of the target, scale, as follows:
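Read as a correlation-filter scale search, the scale equations can be sketched as follows; whether the previous scale is multiplied by the best factor, as written here, is an assumption of this sketch:

```latex
f_{scale} \;=\; \mathcal{F}^{-1}\!\left( W_{scale} \odot Z_{scale\_cur} \right),
\qquad
scale \;\leftarrow\; scale \cdot s\!\left(\arg\max_{s'} f_{scale}(s')\right)
```

Here s(·) denotes the scale factor (in [0.7, 1.4]) of the sub-block whose index gives the largest response in f_scale.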
At this point the output of the tracking module for the current frame (frame k) is available: the image sub-block TPatch_k of size R_init × scale centered on P at coordinates (x, y). In addition, the maximum response value of the already computed f_target is abbreviated TPeak_k, i.e., TPeak_k = f_target(x, y).
Step 7: The detection module convolves the filter model D with the entire image of the current frame in a global search, computing the similarity between the filter model D and every position in the current frame. Take the j values with the highest responses (j is set to 10) and, with the position corresponding to each of these j values as a center, extract j image sub-blocks of size R_init × scale. These j image sub-blocks form a set DPatches_k, which is the output of the detection module for frame k.
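A minimal sketch of the global search in step 7, assuming OpenCV's normalized cross-correlation as the concrete similarity measure (the method only specifies correlating D with the whole frame, so matchTemplate is an illustrative choice) and omitting non-maximum suppression:

```python
import cv2
import numpy as np

def global_search_detect(frame_gray, D, j=10):
    """Step 7 sketch: slide the (rescaled) filter model D over the whole frame,
    take the j highest-response positions and return their window centers.
    frame_gray and D are float32 grayscale arrays, D smaller than the frame."""
    response = cv2.matchTemplate(frame_gray, D, cv2.TM_CCOEFF_NORMED)
    top = np.argsort(response.ravel())[::-1][:j]      # indices of the j largest responses
    ys, xs = np.unravel_index(top, response.shape)
    h, w = D.shape
    return [(int(x) + w // 2, int(y) + h // 2) for x, y in zip(xs, ys)]
```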
Step 8: For each image sub-block in the set DPatches_k output by the detection module, compute the pixel overlap ratio between that sub-block and the TPatch_k output by the tracking module, which yields j values; retain the highest of them. If this highest overlap ratio is smaller than the threshold (set to 0.05), the target is judged to be completely occluded, the learning rate β used by the tracking module when updating its models must be suppressed, and the method proceeds to step 9; otherwise the update is performed with the initial learning rate β_init (set to 0.02) and the method proceeds to step 10. β is computed as follows:
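Step 8 can be illustrated with the sketch below. The overlap is computed as intersection-over-union of the two boxes; because the exact suppression formula for β is not reproduced above, scaling β_init by the overlap is only one plausible form of the suppression:

```python
def overlap_ratio(box_a, box_b):
    """Pixel overlap (intersection over union) of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / float(union) if union > 0 else 0.0

def learning_rate(best_overlap, beta_init=0.02, threshold=0.05):
    """Occlusion test of step 8: below the threshold the update is suppressed
    (the scaling by best_overlap is an assumed form of suppression)."""
    return beta_init if best_overlap >= threshold else beta_init * best_overlap
```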
Step 9: Using the center of each image sub-block in DPatches_k, extract j target search regions of size R_bkg × scale, and for each of them extract a convolutional feature map and compute a target confidence map by the method of step 5; this yields the maximum response value over each of the j target search regions. Compare these j response values and denote the largest DPeak_k. If DPeak_k is greater than TPeak_k, update the coordinates of P again, setting (x, y) to the coordinates corresponding to DPeak_k, and recompute the target scale feature vector and the target scale, scale (in the same way as in step 6).
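The decision rule at the heart of step 9 is a comparison between the tracker's peak TPeak_k and the best detector-side peak DPeak_k; a small sketch (function and argument names are illustrative):

```python
import numpy as np

def fuse_tracker_and_detector(t_peak, t_center, d_peaks, d_centers):
    """Step 9 sketch: if the best confidence among the detected candidates (DPeak_k)
    exceeds the tracker's TPeak_k, move P to the detector's estimate; otherwise
    keep the tracker's result."""
    best = int(np.argmax(d_peaks))
    if d_peaks[best] > t_peak:
        return d_centers[best], float(d_peaks[best])
    return t_center, float(t_peak)
```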
Step 10: The optimal position center of the target in the current frame is determined as P, and the optimal scale as scale. Mark the new target region R_new in the image, i.e., a rectangular box centered on P with width w×scale and height h×scale. In addition, abbreviate as z_target the already computed convolutional feature map from which the optimal target position center P was obtained; likewise, abbreviate as z_scale the scale feature vector from which the optimal target scale was obtained.
Step 11: Using z_target and z_scale together with the target model and the scale model W_scale of the tracking module established in the previous frame, update the two models by weighted summation as follows:
Here β is the learning rate computed in step 8.
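Assuming the usual linear-interpolation form of correlation-filter model updating, the weighted summation of step 11 can be written as the following sketch, where the hatted models are the ones computed from z_target and z_scale in the current frame:

```latex
W_{target}^{t} \;\leftarrow\; (1-\beta)\, W_{target}^{t} + \beta\, \widehat{W}_{target}^{t},
\qquad
W_{scale} \;\leftarrow\; (1-\beta)\, W_{scale} + \beta\, \widehat{W}_{scale}
```

With β suppressed under complete occlusion, the models are effectively frozen, which is the drift-prevention behavior described in step 8.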
Step 12: Extract grayscale features from the new target region R_new to obtain the target appearance representation matrix A_k of the current frame, and add A_k to the set of historical target representation matrices A_his. If the number of elements in A_his is greater than c (c is set to 20), randomly select c elements from A_his to form a three-dimensional matrix C_k, each slice of which corresponds to one element of A_his (i.e., one two-dimensional matrix A_k); otherwise form C_k from all elements of A_his. Then average C_k to obtain a two-dimensional matrix, which serves as the new filter model D of the detection module, computed as follows:
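Step 12 can be sketched as follows; the random sampling of at most c stored matrices and the simple mean follow the prose above, while the function and variable names are illustrative:

```python
import random
import numpy as np

def update_detection_filter(A_his, A_k, c=20):
    """Step 12 sketch: append the current appearance matrix A_k to the history,
    stack at most c stored matrices into C_k and average them to obtain the new D."""
    A_his.append(A_k)
    chosen = random.sample(A_his, c) if len(A_his) > c else A_his
    C_k = np.stack(chosen, axis=2)       # three-dimensional matrix C_k; C_k[:, :, i] is one stored A
    D = C_k.mean(axis=2)                 # new filter model D of the detection module
    return D, A_his
```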
Step 13: Check whether all image frames of the video have been processed. If so, the algorithm ends; otherwise return to step 5 and continue.
Beneficial Effects
In the long-term occlusion-robust tracking method based on convolutional features and global search detection proposed by the present invention, a tracking module and a detection module are designed and work together during tracking. The tracking module uses a convolutional neural network (CNN) to extract convolutional features of the target for building a robust target model, builds a scale model from Histogram of Oriented Gradients (HOG) features, and determines the position center and the scale of the target by correlation filtering. The detection module extracts grayscale features to build a filter model of the target and detects the target quickly over the entire image by global search, thereby judging whether occlusion has occurred. Once the target is completely occluded (or its appearance changes drastically for other reasons), the detection module uses the detection results to correct the position of the tracked target and suppresses the model update of the tracking module, preventing the introduction of unnecessary noise that would cause model drift and tracking failure.
Advantages: The use of convolutional features and a multi-scale correlation filtering method in the tracking module strengthens the feature representation of the tracked target's appearance model, making the tracking results highly robust to illumination changes, target scale changes, target rotation and similar factors. In addition, the introduced global search detection mechanism allows the detection module to find the target again when long-term occlusion has caused tracking to fail, so the tracking module can recover from the error and the target can be tracked continuously over a long period even when its appearance changes.
Description of the Drawings
Figure 1: Flow chart of the long-term occlusion-robust tracking method based on convolutional features and global search detection.
Detailed Description of the Embodiments
The present invention is now further described with reference to the embodiments and the accompanying drawing:
In a specific embodiment, the method is carried out according to steps 1 to 13 exactly as set out in the Technical Solution above; the overall processing flow is shown in Figure 1.