Technical Field
The invention relates to the field of image processing and computer vision tracking, and in particular to a target tracking method based on adaptive color features and spatio-temporal context.
Background Art
Video target tracking is one of the fundamental problems in machine vision, drawing on theory from many fields, including image processing, pattern recognition, artificial intelligence, and automatic control. Video tracking underpins subsequent applications such as target recognition and behavior recognition, and has broad application prospects in military guidance, visual navigation, intelligent transportation, medical diagnosis, and many other fields.
Research on moving target tracking can be roughly divided into two approaches: (a) without relying on prior knowledge, detect moving targets directly from the image sequence, perform target recognition, and finally track the moving target of interest; (b) relying on prior knowledge, first build a model of the moving target and then match it in real time in the subsequent image sequence. Around these two approaches, many effective target detection and tracking algorithms have been developed. For tracking that does not rely on prior knowledge, detection is the most important step: target detection extracts the tracked target from the background. Moving target tracking is then the process of finding, in the image sequence, the position of the candidate target region most similar to the target template, that is, locating the target in the sequence of images.
Accurately locating the target requires building an appearance model from visual features that describe it. Visual features with good separability are the key to accurately segmenting and extracting the tracked target from the background of the field of view, so the choice of visual features is a prerequisite for robust tracking. If the selected visual features are strongly separable, even a simple tracking algorithm can track reliably. Commonly used visual features include color, edges, optical flow, wavelets, and context.
Owing to illumination, occlusion, and changes in the target itself caused by its movement, rotation, and scaling, robust tracking remains a major challenge.
Summary of the Invention
The object of the invention is to solve the above problems by providing a target tracking method and system based on adaptive color features and spatio-temporal context, which has the advantage of achieving effective multi-scale target tracking.
To achieve the above object, the invention adopts the following technical solution:
A target tracking method based on adaptive color features and spatio-temporal context comprises the following steps:
Step (1): after the first frame of the input video sequence, select the tracking target with a rectangular box;
Step (2): compute the target confidence map of the first frame;
Step (3): the target confidence map comprises a context prior model and a spatial context model;
the local context information of the tracking target is modeled using the focus-of-attention property of the biological visual system together with color features, yielding the context prior model, i.e., the prior probability;
the spatial context model, which describes the spatial relationship between the tracking target and the context information around it, i.e., the conditional probability, is obtained by applying a fast Fourier transform to the context prior model followed by an inverse fast Fourier transform;
Step (4): learn and update the spatio-temporal context model of frame t from the spatial context model of step (3);
Step (5): using the spatio-temporal context model obtained in step (4), perform a convolution on the image of frame t+1, compute the target confidence map of frame t+1, and take the maximum-likelihood probability position of the confidence map as the best target position;
Step (6): take the best target position obtained in step (5) as the new tracking target and, based on the target confidence map of frame t+1 from step (5), repeat steps (3)-(5) to determine the best target position in frame t+2;
so that the best target position is determined in every frame.
Further, the target confidence map of step (2) is given by:
c(x) = P(x|o) = Σ_{c(z)∈X^c} P(x|c(z),o)·P(c(z)|o) = b·e^(−|(x−x*)/α|^β)   (1)
where the position maximizing c(x) is the target position; x denotes a position and o denotes the presence of the target; the feature set X^c = {c(z)} = {(I(z), z) | z ∈ Ω_c(x*)} describes the image, I(z) being the adaptive color feature value of the image at position z and Ω_c(x*) the local region around the target position x*; P(x|c(z),o) is the conditional probability, P(c(z)|o) the prior probability, b a normalization constant, α a scale parameter, and β a shape parameter.
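As a quick illustration, the following is a minimal NumPy sketch of the ideal confidence map of equation (1); the values of b, α, and β are placeholders, not values prescribed by the method:

```python
import numpy as np

def confidence_map(shape, x_star, b=1.0, alpha=2.25, beta=1.0):
    """Ideal confidence map c(x) = b * exp(-|(x - x*)/alpha|^beta), eq. (1)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d = np.hypot(ys - x_star[0], xs - x_star[1])  # distance to target center (row, col)
    return b * np.exp(-((d / alpha) ** beta))

c = confidence_map((64, 64), x_star=(32, 32))
assert c.argmax() == np.ravel_multi_index((32, 32), c.shape)  # peak lies at x*
```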
Further, the context prior model of step (3) describes the prior probability; the context prior model is:
P(c(z)|o) = I(z)·ω_σ(z−x*)   (2)
where
ω_σ(z) = a·e^(−|z|²/σ²)   (3)
is a weight function, a being a normalization parameter and σ a scale parameter; as a rule, the closer a point is to the target position x*, the more important it is for tracking and the larger its weight; I(z) denotes the adaptive color feature value of the pixel at z.
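The weight function of equation (3) can be sketched in the same way; the normalization parameter a below is again an illustrative placeholder:

```python
import numpy as np

def gaussian_weight(shape, x_star, sigma, a=1.0):
    """Weight function w_sigma(z - x*) = a * exp(-|z - x*|^2 / sigma^2), eq. (3)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (ys - x_star[0]) ** 2 + (xs - x_star[1]) ** 2
    return a * np.exp(-d2 / sigma ** 2)
```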
The adaptive color feature values are obtained as follows.
RGB colors are first refined into 11 color names: black, blue, brown, grey, green, orange, pink, purple, red, white, and yellow;
then PCA (Principal Component Analysis) reduces the 11 dimensions to 2, selecting the two most salient color components:
Step a): the color features form a tensor X_{M×N×K}, where X is the target tracking image, M is the number of image rows, N the number of image columns, and K the number of color features, K = 11;
Step b): compute the mean of the color features of all pixels in the tensor of step a), and center the color value of every pixel by this mean, obtaining the centered color feature vector x̂_z, i.e., the mean color information of all pixels is subtracted from the color information of each pixel. The mean is computed as:
x̄ = (1/MN)·Σ_z x_z   (4)
where x̄ is the mean of the color features of all pixels, z indexes the pixels, MN is the number of pixels, and x_z is the color information of the z-th pixel;
Step c): compute the covariance matrix of the centered color feature vectors x̂_z obtained in step b):
C_z = (1/MN)·Σ_z x̂_z·x̂_z^T   (5)
Step d): perform an eigenvalue decomposition of the covariance matrix of step c), obtaining a set of eigenvalues in descending order and the corresponding eigenvectors y:
|C_z − λE| = 0;   (6)
(C_z − λE)y = 0;   (7)
where C_z is the covariance matrix, E is the identity matrix, λ is an eigenvalue of C_z, and y is the eigenvector corresponding to λ;
Step e): from the eigenvalues and eigenvectors obtained in step d), form the feature matrix:
C_z = W·Λ_j·W^T   (8)
where C_z is the covariance matrix, W = [w_1, w_2, …, w_m] is the matrix of eigenvectors, and Λ_j is the diagonal matrix of the eigenvalues;
Step f): from the eigenvalues in descending order, select the color feature principal components corresponding to the first a eigenvalues; the principal components are selected by:
(Σ_{j=1}^{a} λ_j)/(Σ_{j=1}^{m} λ_j) ≥ 0.995;   (9)
B_z = [w_1, w_2, …, w_a]^T;   (10)
where λ_j is the j-th eigenvalue, j = 1…m; w_i is the i-th eigenvector, i = 1…a; a is the number of retained principal components, a = 2; the threshold 0.995 means that the retained principal components carry 99.5% of the information of the original features; B_z is the principal component projection matrix, whose rows are the normalized eigenvectors of C_z corresponding to the largest eigenvalues;
Step g): with the principal component projection matrix obtained in step f), project the color features to obtain the color feature vector after principal component analysis; the principal component mapping is:
I(z) = B_z·x̂_z   (11)
where I(z) is the data after principal component projection and B_z is the principal component projection matrix.
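A compact NumPy sketch of steps a) through g), assuming the M×N×11 color-name tensor has already been computed (the RGB-to-color-name lookup itself is outside the scope of this snippet):

```python
import numpy as np

def adaptive_color_features(X, a=2):
    """PCA-reduce an (M, N, K) color-name tensor to a components; steps a)-g)."""
    M, N, K = X.shape
    x = X.reshape(M * N, K)                 # step a): one K-dim vector per pixel
    x_hat = x - x.mean(axis=0)              # step b): centering, eq. (4)
    C = (x_hat.T @ x_hat) / (M * N)         # step c): covariance, eq. (5)
    lam, W = np.linalg.eigh(C)              # step d): eigendecomposition, eqs. (6)-(8)
    order = np.argsort(lam)[::-1]           # sort eigenvalues in descending order
    lam, W = lam[order], W[:, order]
    kept = lam[:a].sum() / lam.sum()        # step f): eq. (9), diagnostic (~0.995)
    Bz = W[:, :a].T                         # projection matrix, eq. (10)
    I = (x_hat @ Bz.T).reshape(M, N, a)     # step g): projection, eq. (11)
    return I, Bz
```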
The spatial context model is learned online from the context prior model and the confidence map, and is continuously updated as tracking proceeds.
Further, the spatial context model of step (3) describes a conditional probability function, specifically:
P(x|c(z),o) = h^sc(x−z);   (12)
where h^sc(x−z) is a function of the relative distance and direction between the target position x and a point z of the local region, i.e., it encodes the spatial relationship between the target and its spatial context.
From formulas (1), (2), and (12), the confidence map can be written as:
c(x) = b·e^(−|(x−x*)/α|^β) = h^sc(x) ⊗ (I(x)·ω_σ(x−x*))   (13)
where ⊗ denotes convolution;
the convolution is accelerated with the fast Fourier transform, specifically:
F(b·e^(−|(x−x*)/α|^β)) = F(h^sc(x)) ⊙ F(I(x)·ω_σ(x−x*))   (14)
where F denotes the Fourier transform and ⊙ denotes element-wise multiplication;
after two Fourier transforms and one inverse Fourier transform, the learned spatial context model is:
h^sc(x) = F^(−1)( F(b·e^(−|(x−x*)/α|^β)) / F(I(x)·ω_σ(x−x*)) )   (15)
where F^(−1) denotes the inverse Fourier transform;
The spatio-temporal context model is learned from the spatial context model of formula (15); specifically:
H^stc_(t+1) = (1−ρ)·H^stc_t + ρ·h^sc_t   (16)
where ρ is a learning parameter and H^stc_t denotes the spatio-temporal context model at frame t.
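A sketch of the FFT-based learning, update, and detection of equations (13)-(19), assuming a single-channel feature map for brevity; the learning rate ρ and the small ε guarding the spectral division are illustrative choices:

```python
import numpy as np
from numpy.fft import fft2, ifft2

def learn_spatial_context(conf, prior, eps=1e-6):
    """h_sc = F^-1( F(c(x)) / F(I(x) * w_sigma) ), eq. (15)."""
    return np.real(ifft2(fft2(conf) / (fft2(prior) + eps)))

def update_stc(H_stc, h_sc, rho=0.075):
    """H_stc_{t+1} = (1 - rho) * H_stc_t + rho * h_sc_t, eq. (16)."""
    return (1.0 - rho) * H_stc + rho * h_sc

def detect(H_stc, prior_next):
    """c_{t+1}(x) = F^-1( F(H_stc_{t+1}) . F(I_{t+1} * w_sigma) ), eqs. (18)-(19)."""
    conf = np.real(ifft2(fft2(H_stc) * fft2(prior_next)))
    return np.unravel_index(conf.argmax(), conf.shape), conf  # peak = x*_{t+1}
```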
Step (5) is specifically as follows:
at frame t+1, extract the local region Ω_c(x*) around the target region of frame t, obtaining the feature set:
X^c = {c(z) = (I(z), z) | z ∈ Ω_c(x*)};   (17)
then find the maximum-likelihood probability position of the confidence map at frame t+1:
x*_(t+1) = argmax_{x∈Ω_c(x*_t)} c_(t+1)(x)   (18)
where c_(t+1)(x) is specifically:
c_(t+1)(x) = F^(−1)( F(H^stc_(t+1)) ⊙ F(I_(t+1)(x)·ω_(σ_t)(x−x*_t)) );   (19)
Meanwhile, the scale parameter of formula (3) is updated, specifically:
s′_t = sqrt( c_t(x*_t)/c_(t−1)(x*_(t−1)) ),
s̄_t = (1/n)·Σ_(i=1)^n s′_(t−i),
s_(t+1) = (1−λ)·s_t + λ·s̄_t,
σ_(t+1) = s_t·σ_t;   (20)
where s′_t is the estimated scale between two consecutive frames; c_t(·) is the confidence map; s_(t+1) is the estimated target scale, obtained through a filter; s̄_t is the average of the estimated scales over n consecutive frames; and λ > 0 is a fixed filter parameter.
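A sketch of the scale filtering of equation (20); the window n and the filter parameter λ are free parameters of the method, and the values below are assumptions:

```python
import numpy as np

def update_scale(conf_peaks, s_t, sigma_t, n=5, lam=0.25):
    """Filtered scale update per eq. (20); conf_peaks[k] holds c_k(x_k*)."""
    # s'_k: pairwise scale ratios between consecutive frames
    s_prime = [np.sqrt(conf_peaks[k] / conf_peaks[k - 1])
               for k in range(len(conf_peaks) - n, len(conf_peaks))]
    s_bar = np.mean(s_prime)                  # average over the last n frames
    s_next = (1.0 - lam) * s_t + lam * s_bar  # filtered target scale s_{t+1}
    sigma_next = s_t * sigma_t                # scale parameter of eq. (3) for frame t+1
    return s_next, sigma_next
```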
A target tracking system based on adaptive color features and spatio-temporal context comprises:
a tracking target selection module: after the first frame of the input video sequence, the tracking target is selected with a rectangular box;
a first-frame target confidence map calculation module: computes the target confidence map of the first frame;
a spatial context model calculation module: the target confidence map comprises a context prior model and a spatial context model;
the local context information of the tracking target is modeled using the focus-of-attention property of the biological visual system together with color features, yielding the context prior model, i.e., the prior probability;
the spatial context model, which describes the spatial relationship between the tracking target and the context information around it, i.e., the conditional probability, is obtained by applying a fast Fourier transform to the context prior model followed by an inverse fast Fourier transform;
a spatio-temporal context model calculation module: learns and updates the spatio-temporal context model of frame t from the spatial context model;
a best target position determination module: uses the obtained spatio-temporal context model to perform a convolution on the image of frame t+1, computes the target confidence map of frame t+1, and takes the maximum-likelihood probability position of the confidence map as the best target position;
an all-frame best target position determination module: takes the best target position as the new tracking target and, based on the target confidence map of frame t+1, passes in turn through the spatial context model calculation module, the spatio-temporal context model calculation module, and the best target position determination module to determine the best target position in frame t+2;
so that the best target position is determined in every frame.
Beneficial effects of the invention:
(1) The invention tracks with a dense spatio-temporal scene model: within a Bayesian framework, the spatio-temporal relationship between the target and the dense information of its local surroundings is modeled, and the confidence map is computed taking the prior information about the target position in the previous frame into account, which effectively reduces ambiguity in the target location.
(2) The invention describes the target with adaptive color features and applies PCA dimensionality reduction to them, removing redundant information from the color features and solving the multi-channel feature fusion problem, so that the appearance description of the target is more accurate and robust; it copes effectively with illumination changes while also reducing computational cost and increasing speed.
(3) The invention computes the estimated scale of the target in the current frame from the ratio of the confidence values at the best target positions of two consecutive frames; to avoid introducing noise and over-sensitive adaptation, it uses the average estimated scale over several consecutive frames, and finally obtains the final target scale estimate by filtering, so that targets of changing scale can be tracked effectively.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the basic flow of the invention.
Fig. 2(a) shows learning the spatial context model at frame t;
Fig. 2(b) shows detecting the target position in frame t+1;
Fig. 3 is a schematic diagram of the color feature dimensionality reduction of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, the target tracking method based on adaptive color features and spatio-temporal context comprises the following steps:
Step (1): after the first frame of the input video sequence, select the tracking target with a rectangular box;
Step (2): compute the target confidence map of the first frame;
Step (3): the target confidence map comprises a context prior model and a spatial context model;
the local context information of the tracking target is modeled using the focus-of-attention property of the biological visual system together with color features, yielding the context prior model, i.e., the prior probability;
the spatial context model, which describes the spatial relationship between the tracking target and the context information around it, i.e., the conditional probability, is obtained by applying a fast Fourier transform to the context prior model followed by an inverse fast Fourier transform;
Step (4): learn and update the spatio-temporal context model of frame t from the spatial context model of step (3), as shown in Fig. 2(a);
Step (5): using the spatio-temporal context model obtained in step (4), perform a convolution on the image of frame t+1, compute the target confidence map of frame t+1, and take the maximum-likelihood probability position of the confidence map as the best target position, as shown in Fig. 2(b);
Step (6): take the best target position obtained in step (5) as the new tracking target and, based on the target confidence map of frame t+1 from step (5), repeat steps (3)-(5) to determine the best target position in frame t+2;
so that the best target position is determined in every frame.
Further, the target confidence map of step (2) is given by:
c(x) = P(x|o) = Σ_{c(z)∈X^c} P(x|c(z),o)·P(c(z)|o) = b·e^(−|(x−x*)/α|^β)   (1)
where the position maximizing c(x) is the target position; x denotes a position and o denotes the presence of the target; the feature set X^c = {c(z)} = {(I(z), z) | z ∈ Ω_c(x*)} describes the image, I(z) being the adaptive color feature value of the image at position z and Ω_c(x*) the local region around the target position x*; P(x|c(z),o) is the conditional probability, P(c(z)|o) the prior probability, b a normalization constant, α a scale parameter, and β a shape parameter.
Further, the context prior model of step (3) describes the prior probability; the context prior model is:
P(c(z)|o) = I(z)·ω_σ(z−x*)   (2)
where
ω_σ(z) = a·e^(−|z|²/σ²)   (3)
is a weight function, a being a normalization parameter and σ a scale parameter; as a rule, the closer a point is to the target position x*, the more important it is for tracking and the larger its weight; I(z) denotes the adaptive color feature value of the pixel at z.
The adaptive color feature values are obtained as follows.
RGB colors are first refined into 11 color names: black, blue, brown, grey, green, orange, pink, purple, red, white, and yellow;
then PCA (Principal Component Analysis) reduces the 11 dimensions to 2, selecting the two most salient color components, as shown in Fig. 3:
Step a): the color features form a tensor X_{M×N×K}, where X is the target tracking image, M is the number of image rows, N the number of image columns, and K the number of color features, K = 11;
Step b): compute the mean of the color features of all pixels in the tensor of step a), and center the color value of every pixel by this mean, obtaining the centered color feature vector x̂_z, i.e., the mean color information of all pixels is subtracted from the color information of each pixel. The mean is computed as:
x̄ = (1/MN)·Σ_z x_z   (4)
where x̄ is the mean of the color features of all pixels, z indexes the pixels, MN is the number of pixels, and x_z is the color information of the z-th pixel;
Step c): compute the covariance matrix of the centered color feature vectors x̂_z obtained in step b):
C_z = (1/MN)·Σ_z x̂_z·x̂_z^T   (5)
Step d): perform an eigenvalue decomposition of the covariance matrix of step c), obtaining a set of eigenvalues in descending order and the corresponding eigenvectors y:
|C_z − λE| = 0;   (6)
(C_z − λE)y = 0;   (7)
where C_z is the covariance matrix, E is the identity matrix, λ is an eigenvalue of C_z, and y is the eigenvector corresponding to λ;
Step e): from the eigenvalues and eigenvectors obtained in step d), form the feature matrix:
C_z = W·Λ_j·W^T   (8)
where C_z is the covariance matrix, W = [w_1, w_2, …, w_m] is the matrix of eigenvectors, and Λ_j is the diagonal matrix of the eigenvalues;
Step f): from the eigenvalues in descending order, select the color feature principal components corresponding to the first a eigenvalues; the principal components are selected by:
(Σ_{j=1}^{a} λ_j)/(Σ_{j=1}^{m} λ_j) ≥ 0.995;   (9)
B_z = [w_1, w_2, …, w_a]^T;   (10)
where λ_j is the j-th eigenvalue, j = 1…m; w_i is the i-th eigenvector, i = 1…a; a is the number of retained principal components, a = 2; the threshold 0.995 means that the retained principal components carry 99.5% of the information of the original features; B_z is the principal component projection matrix, whose rows are the normalized eigenvectors of C_z corresponding to the largest eigenvalues;
Step g): with the principal component projection matrix obtained in step f), project the color features to obtain the color feature vector after principal component analysis; the principal component mapping is:
I(z) = B_z·x̂_z   (11)
where I(z) is the data after principal component projection and B_z is the principal component projection matrix.
The spatial context model is learned online from the context prior model and the confidence map, and is continuously updated as tracking proceeds.
Further, the spatial context model of step (3) describes a conditional probability function, specifically:
P(x|c(z),o) = h^sc(x−z);   (12)
where h^sc(x−z) is a function of the relative distance and direction between the target position x and a point z of the local region, i.e., it encodes the spatial relationship between the target and its spatial context.
From formulas (1), (2), and (12), the confidence map can be written as:
c(x) = b·e^(−|(x−x*)/α|^β) = h^sc(x) ⊗ (I(x)·ω_σ(x−x*))   (13)
where ⊗ denotes convolution.
The convolution is accelerated with the fast Fourier transform, specifically:
F(b·e^(−|(x−x*)/α|^β)) = F(h^sc(x)) ⊙ F(I(x)·ω_σ(x−x*))   (14)
where F denotes the Fourier transform and ⊙ denotes element-wise multiplication.
After two Fourier transforms and one inverse Fourier transform, the learned spatial context model is:
h^sc(x) = F^(−1)( F(b·e^(−|(x−x*)/α|^β)) / F(I(x)·ω_σ(x−x*)) )   (15)
where F^(−1) denotes the inverse Fourier transform.
Step (4): the spatio-temporal context model is learned from the spatial context model of formula (15); specifically:
H^stc_(t+1) = (1−ρ)·H^stc_t + ρ·h^sc_t   (16)
where ρ is a learning parameter and H^stc_t denotes the spatio-temporal context model at frame t.
Step (5) is specifically as follows:
at frame t+1, extract the local region Ω_c(x*) around the target region of frame t, obtaining the feature set:
X^c = {c(z) = (I(z), z) | z ∈ Ω_c(x*)};   (17)
then find the maximum-likelihood probability position of the confidence map at frame t+1:
x*_(t+1) = argmax_{x∈Ω_c(x*_t)} c_(t+1)(x)   (18)
where c_(t+1)(x) is specifically:
c_(t+1)(x) = F^(−1)( F(H^stc_(t+1)) ⊙ F(I_(t+1)(x)·ω_(σ_t)(x−x*_t)) );   (19)
Meanwhile, the scale parameter of formula (3) is updated, specifically:
s′_t = sqrt( c_t(x*_t)/c_(t−1)(x*_(t−1)) ),
s̄_t = (1/n)·Σ_(i=1)^n s′_(t−i),
s_(t+1) = (1−λ)·s_t + λ·s̄_t,
σ_(t+1) = s_t·σ_t;   (20)
where s′_t is the estimated scale between two consecutive frames; c_t(·) is the confidence map; s_(t+1) is the estimated target scale, obtained through a filter; s̄_t is the average of the estimated scales over n consecutive frames; and λ > 0 is a fixed filter parameter.
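Putting the pieces together, a hypothetical frame-by-frame driver for the method is sketched below; it reuses the functions from the earlier sketches, keeps a single feature channel for brevity, and the helpers box_center and color_names are assumed to exist elsewhere:

```python
def track(frames, init_box, sigma=20.0):
    """Hypothetical driver loop; helper names and parameter values are assumptions."""
    x_star = box_center(init_box)                            # step (1): initial target
    H_stc = None
    for t, frame in enumerate(frames):
        I, _ = adaptive_color_features(color_names(frame))   # adaptive color features
        prior = I[..., 0] * gaussian_weight(I.shape[:2], x_star, sigma)  # eq. (2)
        if t == 0:
            conf = confidence_map(prior.shape, x_star)       # step (2): eq. (1)
            H_stc = learn_spatial_context(conf, prior)       # step (3): eq. (15)
            continue
        x_star, conf = detect(H_stc, prior)                  # step (5): eqs. (18)-(19)
        h_sc = learn_spatial_context(conf, prior)            # step (3) at the new position
        H_stc = update_stc(H_stc, h_sc)                      # step (4): eq. (16)
    return x_star
```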
While the specific embodiments of the invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that, on the basis of the technical solution of the invention, various modifications or variations that can be made without creative effort still fall within the scope of protection of the invention.