Disclosure of Invention
The invention aims to provide an anti-occlusion moving target tracking device and method based on ROI prediction and multi-module learning, which have high accuracy, high real-time performance and high robustness.
The technical scheme for realizing the aim of the invention is that the anti-occlusion moving target tracking device based on ROI prediction and multi-module learning comprises a tracking module, a detection module, a learning module and a synthesis module, wherein:
The tracking module comprises a multi-feature extraction module and a correlation filtering module; a scale filter is added on the basis of the position filter to serve as the algorithm framework of the tracking module, a multi-feature extraction and fusion optimization framework is added, and PCA (principal component analysis) dimension reduction and QR decomposition are carried out on the position filter and the scale filter;
The detection module comprises an ROI prediction module and a cascade classification module, wherein the ROI prediction module performs position estimation by using square root volume Kalman filtering and obtains an ROI region from the estimated position, which serves as the input of the cascade classification module; a fHOG-SVM classifier is adopted in the cascade classification module;
The tracking module and the detection module work synchronously, their parameters are mutually corrected, learned and updated through the learning module, and the synthesis module obtains the final output position through coordinated work among the multiple modules, thereby realizing tracking of a single moving target.
An anti-occlusion moving target tracking method based on ROI prediction and multi-module learning comprises the following steps:
Step 1, multi-feature extraction, namely converting an input RGB three-channel image into a single-channel gray-level image, carrying out color space standardization on the image by adopting a gamma correction method, calculating image gradients including the gradient value and gradient direction of each pixel point, then constructing 9-dimensional HOG feature vectors, obtaining the 36-dimensional feature vector corresponding to each cell through normalization and truncation, extracting 31-dimensional features through PCA dimension reduction, combining the features of each cell to obtain fHOG features of MxNx31 dimensions from one MxN image, and splicing the fHOG features with the gray level features of MxNx1 to obtain fusion features of MxNx32 dimensions;
Step 2, correlation filter tracking, namely setting a position filter and a scale filter; for the position filter, firstly initializing the expected two-dimensional Gaussian output of the target, collecting a sample centered on the target position, reducing the fused feature from 32 dimensions to 18 dimensions by PCA dimension reduction, extracting the 18-dimensional feature for each pixel point of the sample, multiplying it by a two-dimensional Hamming window as the test input, and then determining the new target position by inverse Fourier transform;
Step 3, ROI region prediction, namely taking the position of the target in the image at the previous moment as an observation value, estimating the position of the target in the image at the next moment by utilizing the square root volume Kalman filtering algorithm, delimiting a region with the aspect ratio of the previous frame and four times its area, and sending the region to the detection module as the ROI region of the current frame;
Step 4, cascade classification, namely setting an image element variance classifier, a fHOG-SVM classifier and a nearest neighbor classifier; the ROI region serves as the input of the cascade classification module, namely the region to be detected; firstly a plurality of sliding windows to be detected with different scales are generated in the region to be detected and sent into the image element variance classifier, which calculates the pixel gray variance of each window to be detected against the target frame image and regards a test sample whose variance is smaller than half of the target sample variance as a negative sample; then the positive samples serve as the input of the fHOG-SVM classifier, fHOG features are extracted and sent to the SVM classifier to obtain positive and negative sample classification results; finally the positive sample windows obtained by the first two classifiers serve as the input of the nearest neighbor classifier, the similarity of each window with the online model is matched in sequence and the positive sample space of the online model is updated, thereby obtaining the final positive samples of the cascade classification module, namely the output of the detection module;
Step 5, learning and updating, namely adopting P-N learning: firstly, the tracking module predicts the target position of the current frame; if the predicted position is detected as a negative sample by the detection module, the P expert corrects this sample, incorrectly classified as negative, into a positive sample and sends it into the training set; then the N expert compares the positive samples generated by the detection module with the positive sample obtained by the P expert and selects the most reliable sample as the output position;
Step 6, multi-module synthesis, namely obtaining the final output position through coordinated work among the multiple modules, thereby realizing tracking of a single moving target.
Compared with the prior art, the method has the following remarkable advantages: (1) a scale filter is added on the basis of a position filter to serve as the algorithm framework of the tracking module; tracking robustness is improved by adding a multi-feature extraction and fusion optimization framework, and PCA (principal component analysis) dimension reduction and QR (orthogonal triangular) decomposition are carried out on the position filter and the scale filter to reduce the calculation amount and improve real-time performance; (2) in the detection module, position estimation is carried out by utilizing square root volume Kalman filtering, which improves detection robustness when the target is occluded; an ROI (region of interest) is obtained from the estimated position and used as the input of the cascade classifier, which narrows the search range, reduces the calculation amount and improves real-time performance; in the cascade classifier, a fHOG-SVM classifier is adopted to replace the original random fern classifier; fHOG features keep good invariance to image scale and illumination changes and, relative to HOG features, reduce the calculation amount and increase algorithm speed, so combining fHOG with an SVM improves detection accuracy and real-time performance; (3) the tracking module and the detection module work independently and synchronously, their parameters are mutually learned and updated through the learning module, and the final position is jointly estimated from both outputs, so that even after the target is lost or occluded it can be re-detected and tracking can be resumed by reinitializing the tracking module from the detection result.
Detailed Description
The invention discloses an anti-occlusion moving target tracking device based on ROI prediction and multi-module learning, which comprises a tracking module, a detection module, a learning module and a synthesis module, wherein:
The tracking module comprises a multi-feature extraction module and a correlation filtering module; a scale filter is added on the basis of the position filter to serve as the algorithm framework of the tracking module, a multi-feature extraction and fusion optimization framework is added, and PCA (principal component analysis) dimension reduction and QR decomposition are carried out on the position filter and the scale filter;
The detection module comprises an ROI prediction module and a cascade classification module, wherein the ROI prediction module performs position estimation by using square root volume Kalman filtering and obtains an ROI region from the estimated position, which serves as the input of the cascade classification module; a fHOG-SVM classifier is adopted in the cascade classification module;
The tracking module and the detection module work synchronously, their parameters are mutually corrected, learned and updated through the learning module, and the synthesis module obtains the final output position through coordinated work among the multiple modules, thereby realizing tracking of a single moving target.
As a specific example, the multi-feature extraction module stitches and fuses the gray feature of MxNx1 with the fast histogram of oriented gradients (fHOG) feature of MxNx31, where M and N are positive integers. The 31-dimensional fHOG feature is obtained by normalization and truncation, which yields a 36-dimensional feature vector for each cell, followed by principal component analysis (PCA) dimension reduction; the feature comprises 18 dimensions of signed fHOG gradients, 9 dimensions of unsigned fHOG gradients, and 4 dimensions derived from the normalization operations of the current cell with its 4 diagonally neighboring cells.
As a specific example, the basic framework of the correlation filtering module is the discriminative scale space tracking algorithm framework: the position filter and the scale filter are used to sequentially perform target positioning and scale evaluation, and principal component analysis (PCA) and orthogonal triangular (QR) decomposition are performed on the position filter and the scale filter respectively to optimize the algorithm framework.
As a specific example, the ROI prediction module performs ROI region estimation by square root volume Kalman filtering: with the predicted target position of the current frame as the center, a region is delimited with the aspect ratio of the previous frame and four times its area, and this region is sent to the cascade classification module as the ROI region of the current frame.
As a specific example, the cascade classification module includes an image element variance classifier, a fHOG-SVM classifier, and a nearest neighbor classifier, which are specifically as follows:
The ROI prediction module predicts the region where the current frame target is most likely to appear, and this region serves as the input of the cascade classification module, namely the region to be detected. Firstly, a plurality of sliding windows to be detected with different scales are generated in the region to be detected and sent into the image element variance classifier, which calculates the pixel gray variance of each window to be detected against the target frame image; a test sample whose variance is smaller than half of the target sample variance is regarded as a negative sample. Then the positive samples serve as the input of the fHOG-SVM classifier, fHOG features are extracted and sent into the SVM classifier to obtain positive and negative sample classification results. Finally, the positive sample windows obtained by the first two classifiers serve as the input of the nearest neighbor classifier, the similarity of each window with the online model is matched in sequence and the positive sample space of the online model is updated, thereby obtaining the final positive samples of the cascade classification module, namely the output of the detection module.
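As an illustration of the first cascade stage, the variance test described above can be sketched in Python (the function names `patch_variance` and `variance_filter` are illustrative, not part of the device; patches are flattened lists of gray values):

```python
import statistics

def patch_variance(patch):
    """Gray-level variance of a flattened image patch (list of pixel values)."""
    return statistics.pvariance(patch)

def variance_filter(windows, target_patch):
    """First cascade stage: a window whose gray variance is below half the
    target patch's variance is treated as a negative sample and rejected;
    surviving windows are passed on to the fHOG-SVM stage."""
    threshold = 0.5 * patch_variance(target_patch)
    return [w for w in windows if patch_variance(w) >= threshold]
```

A flat, texture-free window is rejected immediately, while a window with contrast comparable to the target survives to the next stage.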
The invention discloses an anti-shielding moving target tracking method based on ROI prediction and multi-module learning, which comprises the following steps:
Converting the gray level of an input RGB three-channel image into a single-channel image, carrying out color space standardization on the image by adopting a gamma correction method, calculating image gradients including gradient values and gradient directions of each pixel point, then constructing 9-dimensional HOG feature vectors, obtaining 36-dimensional feature vectors corresponding to each cell through normalization and truncation, extracting 31-dimensional features through PCA dimension reduction, combining the features of each cell, obtaining fHOG features of MxNx31 dimensions from one MxN image, and splicing the fHOG features with the gray level features of MxNx1 to obtain fusion features of MxNx32 dimensions;
Setting a position filter and a scale filter, firstly initializing the expected two-dimensional Gaussian output of a target aiming at the position filter, collecting a sample by taking the target position as the center, reducing the fused characteristic from 32 dimensions to 18 dimensions by using PCA dimension reduction, extracting the 18-dimensional characteristic for each pixel point of the sample, multiplying the 18-dimensional characteristic by a two-dimensional Hamming window as a test input, and then determining a new target position by using Fourier inverse transformation;
Step 3, predicting an area of the ROI, namely taking the position of the target in the image at the previous moment as an observation value, estimating the position of the target in the image at the next moment by utilizing a square root volume Kalman filtering algorithm, dividing the area by the length-width ratio and four times of the area of the previous frame, and taking the area as the ROI area of the current frame to send into a detection module;
Step 4, cascade classification, namely setting an image element variance classifier, a fHOG-SVM classifier and a nearest neighbor classifier, taking the ROI area as an input of a cascade classification module, namely a region to be detected, firstly generating a plurality of sliding windows to be detected with different scales in the region to be detected, sending the sliding windows to be detected into the image element variance classifier, calculating pixel gray variances of the window to be detected and a target frame image, considering a test sample with the variance smaller than half of the target sample variance as a negative sample, then taking a positive sample as an input of the fHOG-SVM classifier, extracting fHOG characteristics, sending the positive sample to the SVM classifier to obtain a positive and negative sample class result, finally taking positive sample windows obtained by the first two classifiers as an input of the nearest neighbor classifier, sequentially matching the similarity of each window and the online model, and updating a positive sample space of the online model, thereby obtaining a final positive sample of the cascade classification module, namely an output of the detection module;
In the P-N learning, firstly, predicting the target position of the current frame by using a tracking module, correcting a positive sample which is incorrectly divided into negative samples by a P expert into a positive sample if the predicted position is detected as the negative sample by the detection module, and sending the positive sample into a training set;
And 6, integrating multiple modules, namely obtaining a final output position through coordination work among the multiple modules, and realizing tracking of a single moving target.
As a specific example, in step 2, for a position filter, firstly, initializing an expected two-dimensional gaussian output of a target, taking a sample with the target position as the center, reducing the fused feature from 32 dimensions to 18 dimensions by using PCA dimension reduction, extracting the 18-dimensional feature for each pixel point of the sample, multiplying the feature by a two-dimensional hamming window as a test input, and then determining a new position of the target by using inverse fourier transform, specifically as follows:
Firstly, the target sample selected in the initial image is set as a positive sample f, and a two-dimensional Gaussian function is selected as the desired output g, so that the following cost is minimized:

ε = || Σ_{l=1..d} h^l * f^l − g ||^2 + λ Σ_{l=1..d} || h^l ||^2

Wherein * denotes circular convolution, f^l represents the feature of the l-th channel, h^l represents the filter of the l-th channel, l ∈ {1, 2, ..., d}, d is the dimension of the selected feature, and λ is a regularization parameter;
The above expression is converted into the complex frequency domain and solved using Parseval's formula:

H^l = conj(G) · F^l / ( Σ_{k=1..d} conj(F^k) · F^k + λ )

Wherein H^l, F^l, G are the corresponding variables obtained from h^l, f^l, g by the discrete Fourier transform DFT, and conj(G) is the complex conjugate of G;
The filter parameters are updated with a new training sample ft:

A_t^l = (1 − η) A_{t−1}^l + η · conj(G_t) · F_t^l
B_t = (1 − η) B_{t−1} + η · Σ_{k=1..d} conj(F_t^k) · F_t^k

Wherein A_t^l and B_t are the numerator and denominator of the filter corresponding to the current training sample ft, A_{t−1}^l and B_{t−1} are the filter numerator and denominator of the previous frame, and η is the learning rate;
If zt is an image sample and Z_t^l is the variable obtained from it by the discrete Fourier transform, the output yt is:

y_t = DFT^{−1} { Σ_{l=1..d} conj(A_{t−1}^l) · Z_t^l / ( B_{t−1} + λ ) }

Wherein A_{t−1}^l and B_{t−1} are the numerator and denominator of the filter in the previous frame, yt is the correlation score, and the state estimate of the current target position is obtained by searching for the maximum correlation score.
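The position-filter equations above can be illustrated with a minimal single-channel, one-dimensional sketch in Python (the actual device uses multi-channel fused fHOG features, PCA and two-dimensional FFTs; the naive DFT below is for clarity only, and all function names are illustrative):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def train_filter(f, g):
    """Numerator A = conj(G)*F and denominator B = conj(F)*F of the filter,
    from a training sample f and the desired Gaussian output g."""
    F, G = dft(f), dft(g)
    A = [G[k].conjugate() * F[k] for k in range(len(f))]
    B = [(F[k].conjugate() * F[k]).real for k in range(len(f))]
    return A, B

def detect(A, B, z, lam=1e-4):
    """Response y = IDFT( conj(A)*Z / (B + lam) ); the argmax of the real
    correlation score gives the new target position."""
    Z = dft(z)
    Y = [A[k].conjugate() * Z[k] / (B[k] + lam) for k in range(len(z))]
    y = [v.real for v in idft(Y)]
    return max(range(len(y)), key=y.__getitem__)
```

If the test sample is a circular shift of the training sample, the peak of the response moves by exactly that shift, which is the mechanism used for target localization.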
As a specific example, in step 3, the position of the target in the image at the previous moment is taken as the observation value, the position of the target in the image at the next moment is estimated by the square root volume Kalman filter algorithm, a region is delimited with the aspect ratio of the previous frame and four times its area, and the region is sent to the detection module as the ROI region of the current frame, specifically as follows:
For a discrete nonlinear dynamic target tracking system with additive noise:

x_t = f(x_{t−1}) + w_{t−1}
y_t = h(x_t) + v_t

Wherein x_t and y_t represent the state and the measurement of the system at time t, f(·) and h(·) are the nonlinear state transition function and the nonlinear measurement function respectively, the process noise w_{t−1} and the measurement noise v_t are mutually independent, and w_{t−1} ~ N(0, Q_t), v_t ~ N(0, R_t);
The state estimation includes a time update and a measurement update. When a tracking failure is detected, the state parameters x_{t−1} and S_{t−1} of the last successful frame are used to initialize the filter, and then the filter gain K_t, the new state estimate x̂_t and the square-root factor S_t of the error covariance are calculated by:

K_t = P_{xy,t} ( S_{yy,t} S_{yy,t}^T )^{−1}
x̂_t = x̂_{t|t−1} + K_t ( y_t − ŷ_{t|t−1} )
S_t = Tria( [ χ_t − K_t γ_t,  K_t S_{R,t} ] )

Wherein P_{xy,t} is the cross-covariance matrix of the measurement prediction, S_{yy,t} is the square root of the auto-covariance matrix of the measurement prediction, x̂_{t|t−1} is the system state prediction, ŷ_{t|t−1} is the measurement prediction, χ_t and γ_t are the centered weight matrices, S_{R,t} is the square root of the covariance matrix of the measurement noise, and Tria(·) denotes triangularization, e.g. by QR decomposition;
Taking the position v = (i, j) of the target in the image at the previous moment as the observation value, the position of the target in the image at the next moment is estimated; a region is then delimited with the aspect ratio of the previous frame and four times its area, and this region is sent to the detection module as the ROI region of the current frame.
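As a simplified stand-in for the square root volume Kalman filter (the square-root covariance propagation is omitted here), the predict-then-ROI idea can be sketched with a linear constant-velocity Kalman filter on one image axis, followed by the four-times-area ROI construction; all names and the noise parameters are illustrative assumptions:

```python
def mat2_mul(A, B):
    """2x2 matrix product."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def mat2_t(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

F = [[1.0, 1.0], [0.0, 1.0]]          # constant-velocity transition, dt = 1

def predict(x, P, q=1e-4):
    """Time update: state x = [position, velocity], covariance P."""
    x_pred = [x[0] + x[1], x[1]]
    P_pred = mat2_mul(mat2_mul(F, P), mat2_t(F))
    P_pred[0][0] += q; P_pred[1][1] += q   # process noise
    return x_pred, P_pred

def update(x, P, z, r=1e-2):
    """Measurement update with scalar position measurement z (H = [1, 0])."""
    s = P[0][0] + r                        # innovation covariance
    K = [P[0][0] / s, P[1][0] / s]         # Kalman gain
    innov = z - x[0]
    x_new = [x[0] + K[0]*innov, x[1] + K[1]*innov]
    P_new = [[(1 - K[0])*P[0][0], (1 - K[0])*P[0][1]],
             [P[1][0] - K[1]*P[0][0], P[1][1] - K[1]*P[0][1]]]  # (I - K H) P
    return x_new, P_new

def roi_box(cx, cy, w, h):
    """ROI centered on the predicted position, same aspect ratio as the
    previous frame and four times its area (width and height doubled)."""
    return (cx - w, cy - h, 2*w, 2*h)
```

Feeding in measurements along a constant-velocity trajectory, the predicted position converges to the true next position, and the ROI is then delimited around it.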
As a specific example, in step 4, the SVM solves the nonlinear problem by using a kernel function; its main idea is to create a hyperplane in the feature space as the decision surface, so that the isolation margin between positive samples and negative samples is maximized and the two classes are separated. Assume the hyperplane is:
wx+b=0
Wherein w represents the normal vector, which determines the direction of the hyperplane, and b represents the offset, which determines the distance between the hyperplane and the origin;
Given the training sample set train = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, x_i ∈ R^n, y_i ∈ {+1, −1}, where i denotes the i-th sample and n the sample capacity, the classification surface must satisfy y_i (w · x_i + b) ≥ 1, i = 1, 2, ..., n, and the optimal hyperplane problem translates into:

min_{w,b} (1/2) ||w||^2  subject to  y_i (w · x_i + b) ≥ 1, i = 1, 2, ..., n
A Lagrangian function is introduced:

L(w, b, α) = (1/2) ||w||^2 − Σ_{i=1..n} α_i [ y_i (w · x_i + b) − 1 ]
The optimal solution should satisfy ∂L/∂w = 0 and ∂L/∂b = 0, yielding the optimal Lagrange multipliers α_i*, the optimal weight normal vector w* and the optimal offset b*:

w* = Σ_{i=1..n} α_i* y_i x_i
b* = y_j − w* · x_j  for any support vector x_j
So the optimal hyperplane is w*x + b* = 0, and the optimal classification function is f(x) = sgn{ w*x + b* };
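The optimal classification function can be illustrated on a hand-solved toy problem: for support vectors (1, 1) labeled +1 and (−1, −1) labeled −1, the KKT conditions give α* = (0.25, 0.25), w* = (0.5, 0.5) and b* = 0. This is a sketch with a linear kernel and assumed values, not the fHOG-SVM classifier itself:

```python
def svm_decision(support_vectors, alphas, labels, b, x):
    """f(x) = sgn( sum_i alpha_i * y_i * <x_i, x> + b ), linear kernel."""
    s = sum(a * y * sum(si * xi for si, xi in zip(sv, x))
            for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if s >= 0 else -1

# Hand-solved toy problem: w* = 0.25*(1,1) - 0.25*(-1,-1)*(-1) = (0.5, 0.5), b* = 0
svs, alphas, labels, b = [(1, 1), (-1, -1)], [0.25, 0.25], [1, -1], 0.0
```

Points on the positive side of w*·x + b* = 0 are classified +1, the rest −1.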
The similarity between the positive samples output by the fHOG-SVM classifier and the online model is then compared to classify the samples and to update the positive sample space of the online model; the similarity calculation is specifically as follows:
S_r = S+ / ( S+ + S− )

Wherein S_r is the correlation similarity, S+ is the positive similarity and S− is the negative similarity, defined as follows:

S+ = max_{p_i+ ∈ M} S(p, p_i+),  S− = max_{p_i− ∈ M} S(p, p_i−)

Wherein M represents the target model of the sample library, p_i+ represents a positive sample, p_i− represents a negative sample, and p represents the sample to be tested;
The calculation formula of S is as follows:
S(p_i, p_j) = 0.5 ( NCC(p_i, p_j) + 1 )
Wherein NCC is defined as follows:

NCC(p_i, p_j) = (1/n) Σ_{k=1..n} ( p_i(k) − μ_i ) ( p_j(k) − μ_j ) / ( σ_i σ_j )

Wherein μ_i, σ_i are the mean and standard deviation of image block p_i, μ_j, σ_j are the mean and standard deviation of image block p_j, and n is the number of pixels in a block;
Finally, the calculated S_r values are compared; the larger S_r is, the greater the possibility that the sample is the target. A threshold γ is set, and samples with S_r > γ are regarded as positive samples; otherwise they are negative samples and are discarded. Meanwhile, new positive samples are added to the positive sample library of the online model for subsequent matching; the size of this library is fixed, so new samples are simply added while the library is not full, and once the upper limit is exceeded some old samples are randomly deleted before new samples are added.
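The nearest-neighbor similarity computation can be sketched as follows (a minimal Python version of S, NCC and S_r on flattened gray patches; the threshold value γ = 0.6 and all function names are assumptions):

```python
import math

def ncc(p, q):
    """Normalized cross-correlation between two flattened image blocks."""
    n = len(p)
    mu_p, mu_q = sum(p) / n, sum(q) / n
    sd_p = math.sqrt(sum((v - mu_p) ** 2 for v in p) / n)
    sd_q = math.sqrt(sum((v - mu_q) ** 2 for v in q) / n)
    return sum((p[i] - mu_p) * (q[i] - mu_q) for i in range(n)) / (n * sd_p * sd_q)

def similarity(p, q):
    """S(p, q) = 0.5 * (NCC(p, q) + 1), mapped into [0, 1]."""
    return 0.5 * (ncc(p, q) + 1.0)

def relative_similarity(p, positives, negatives):
    """S_r = S+ / (S+ + S-), with S+ and S- the best matches against the
    positive and negative samples of the online model."""
    s_pos = max(similarity(p, m) for m in positives)
    s_neg = max(similarity(p, m) for m in negatives)
    return s_pos / (s_pos + s_neg)

def nn_classify(p, positives, negatives, gamma=0.6):
    """Sample is positive when S_r exceeds the threshold gamma (assumed value)."""
    return relative_similarity(p, positives, negatives) > gamma
```

A patch that correlates perfectly with a stored positive sample and anti-correlates with the negatives gets S_r close to 1 and is accepted.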
As a specific example, the step 6 is specifically as follows:
The synthesis module obtains the final output position through coordination among the multiple modules, and four cooperation modes are distinguished according to the operation results of the detection module and the tracking module:
(1) Tracking success, detection success
Successful detection means that at least one sliding window passes the detection module and, after the passing windows are clustered, the final clustering result has only one cluster center; failed detection means that no sliding window passes the detection module, or that several windows pass but the clustering result has multiple cluster centers. Successful tracking means that the tracking module outputs a feature rectangular frame; failed tracking means that no rectangular frame is output;
If tracking succeeds and detection succeeds, the detection results are clustered to obtain the relevant output, and the overlap rate and credibility between the cluster center and the tracking module output are judged: if the overlap rate is lower than the threshold 0.5 and the credibility of the detection module is high, the detection module corrects the result of the tracking module; if the overlap rate is higher than the threshold 0.5, the weighted average of the detection module and tracking module results is used as the final output;
(2) Tracking success, detection failure
If tracking is successful but detection fails, directly taking the output of the tracking module as the final output of the current frame;
(3) Tracking failure, detection success
If tracking fails but detection succeeds, the output sample frames of the detection module are clustered; if the final clustering result has only one cluster center, the clustering result is used as the final output and the tracking module is reinitialized with it, namely the re-detection process re-entered after the target disappears;
(4) Tracking failure, detection failure
If both the tracking module and the detection module fail, the result of the current frame is considered invalid and is discarded.
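The four cooperation modes can be summarized as a decision function (a sketch: the clustering, overlap computation and credibility estimate are supplied by the caller, and the behaviour in the low-overlap, low-credibility case, which the text does not specify, is an assumption):

```python
def integrate(track_box, det_clusters, det_confident, overlap, threshold=0.5):
    """Synthesis module: combine tracker and detector outputs.
    track_box: tracker box or None; det_clusters: list of cluster-center
    boxes from the detector; overlap: function returning the overlap rate."""
    track_ok = track_box is not None
    det_ok = det_clusters is not None and len(det_clusters) == 1
    if track_ok and det_ok:                       # (1) both succeed
        det_box = det_clusters[0]
        if overlap(track_box, det_box) < threshold:
            # Low overlap: trust the detector only when it is credible.
            # (Low-overlap, low-credibility case is an assumption: keep tracker.)
            return det_box if det_confident else track_box
        return tuple(0.5 * (t + d) for t, d in zip(track_box, det_box))
    if track_ok:                                  # (2) tracking only
        return track_box
    if det_ok:                                    # (3) detection only: re-detect
        return det_clusters[0]
    return None                                   # (4) both fail: discard
```

With a high overlap the two boxes are averaged; with only one module succeeding, its output is passed through; with both failing, the frame yields no output.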
The invention is described in further detail below with reference to the accompanying drawings and specific examples.
Examples
Referring to fig. 1, the anti-occlusion moving target tracking method based on ROI prediction and multi-module learning of the present invention involves a feature extraction module, a correlation filtering tracking module, an ROI prediction module, a cascade detection module, a learning module and a synthesis module; the concrete flow of the algorithm is as follows:
Step one, multi-feature extraction
Converting an input RGB three-channel image into a single-channel gray-level image, carrying out color space standardization on the image by adopting a gamma correction method, and calculating image gradients including the gradient value and gradient direction of each pixel point; then constructing a 9-dimensional HOG feature vector. The 36-dimensional feature vector corresponding to each cell is obtained through normalization and truncation, and 31-dimensional features are extracted through PCA dimension reduction. Combining the features of each cell, fHOG features of MxNx31 dimensions are obtained from one MxN image and spliced with the MxNx1 gray level features to obtain MxNx32-dimensional fusion features. Through multi-feature fusion, robustness to illumination and rapid appearance changes is improved.
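The gamma correction and per-pixel gradient computation of step one can be sketched as follows (gray images are lists of rows; the value γ = 0.5 and the edge replication at image borders are assumptions for illustration):

```python
import math

def gamma_correct(gray, gamma=0.5):
    """Color-space standardization: each pixel is normalized to [0, 1] and
    raised to the power gamma."""
    return [[(v / 255.0) ** gamma for v in row] for row in gray]

def gradients(img):
    """Per-pixel gradient magnitude and unsigned orientation (0-180 degrees,
    matching a 9-bin HOG) using centered [-1, 0, 1] differences with edge
    replication at the borders."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang
```

The orientation map would then be binned into the 9 HOG directions per cell before normalization and truncation.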
Step two, relevant filtering tracking
The correlation filter tracking module comprises two parts, namely a position filter and a scale filter. Firstly, the expected two-dimensional Gaussian output of the target is initialized and a sample is collected with the target position as the center; to reduce the calculation amount and improve the running speed, the fused feature is reduced from 32 dimensions to 18 dimensions by PCA dimension reduction. The 18-dimensional feature is extracted for each pixel point of the sample and multiplied by a two-dimensional Hamming window as the test input, and then yt is obtained by inverse Fourier transform; the maximum of yt gives the new position of the target. The scale filter adopts the same design method as the position filter: firstly, the expected one-dimensional Gaussian output of the scale filter is initialized, samples at different scales are extracted with the target position as the center, each sample is passed through a one-dimensional Hamming window and serves as a test input, then yt is obtained by inverse Fourier transform, and the maximum of yt gives the new target scale.
Step three, ROI prediction region
The detection module would otherwise need to generate multi-scale sliding windows as samples over the whole image, which greatly increases the calculation amount and reduces algorithm speed; by predicting the target position at the next moment, the ROI region, namely the region of interest, is determined, which narrows the search range, reduces the detection samples and improves real-time performance. Taking the position v = (i, j) of the target in the image at the previous moment as the observation value, the position of the target in the image at the next moment is estimated by the square root volume Kalman filter algorithm, a region is delimited with the aspect ratio of the previous frame and four times its area, and the region is sent to the detection module as the ROI region of the current frame.
Step four, cascade detection
The cascade detection module comprises an image element variance classifier, a fHOG-SVM classifier and a nearest neighbor classifier.
The ROI prediction module predicts the region where the current frame target is most likely to appear, and this region serves as the input of the cascade classifier, namely the region to be detected. Firstly, samples to be detected are obtained in the region to be detected by multi-scale displacement and sent into the image element variance classifier; the pixel gray variance of each window to be detected and of the target frame image is calculated, and a sample to be detected whose variance is smaller than half the variance of the target frame image is regarded as a negative sample. Variance filtering can reduce the number of scanning windows in the input region by about half.
Then, positive samples obtained by the image element variance classifier are used as input of the fHOG-SVM classifier, fHOG features are extracted, and positive and negative sample class results are obtained by sending the positive and negative samples to the SVM classifier. The SVM solves the problem of nonlinearity using a kernel function, the main idea being to create a hyperplane in the feature space as a decision surface, so that the isolation edge between positive and negative samples is maximized, separating the positive and negative samples.
The similarity between the positive samples output by the fHOG-SVM classifier and the online model is compared to classify the samples and update the positive sample space of the online model. The similarity calculation is specifically as follows:
S_r = S+ / ( S+ + S− )

Wherein S_r is the correlation similarity, S+ is the positive similarity and S− is the negative similarity, defined as follows:

S+ = max_{p_i+ ∈ M} S(p, p_i+),  S− = max_{p_i− ∈ M} S(p, p_i−)

Wherein M represents the target model of the sample library, p_i+ represents a positive sample, p_i− represents a negative sample, and p represents the sample to be tested. The calculation formula of S is as follows:
S(p_i, p_j) = 0.5 ( NCC(p_i, p_j) + 1 )
Wherein NCC is defined as follows:

NCC(p_i, p_j) = (1/n) Σ_{k=1..n} ( p_i(k) − μ_i ) ( p_j(k) − μ_j ) / ( σ_i σ_j )

Wherein μ_i, σ_i are the mean and standard deviation of image block p_i, μ_j, σ_j are the mean and standard deviation of image block p_j, and n is the number of pixels in a block.
Finally, the calculated S_r values are compared; the larger S_r is, the greater the possibility that the sample is the target. A threshold γ is set, and a sample with S_r > γ is considered positive; otherwise it is negative and is discarded. Meanwhile, new positive samples are added to the positive sample library of the online model for subsequent matching; the size of this library is fixed, so new samples are simply added while it is not full, and once the upper limit is exceeded some old samples are randomly deleted before new samples are added.
Step five, learning and updating
The algorithm uses a P-N learning mode and optimizes the performance of the classifiers in the detection module through online learning, thereby improving their generalization ability. In P-N learning, the tracking module first predicts the target position of the current frame; if the predicted position is detected as a negative sample by the detection module, the P expert corrects this sample, incorrectly classified as negative, into a positive sample and sends it into the training set. Then the N expert compares the positive samples generated by the detection module with the positive sample obtained by the P expert and selects the most reliable sample as the output position.
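The P-expert correction described above can be sketched as a single update step (a minimal illustration; the box representation and the label convention, +1 positive and −1 negative, are assumptions):

```python
def pn_learning_step(tracker_box, detector_label, training_set):
    """One P-expert correction: the tracker's prediction is assumed to lie on
    a continuous trajectory, so if the detector labels that position as a
    negative sample (-1), the P expert relabels it positive (+1) and adds
    it to the training set."""
    if tracker_box is not None and detector_label == -1:
        training_set.append((tracker_box, +1))   # correct the false negative
    return training_set
```

When the detector already agrees with the tracker, no correction is made; the N expert (not shown) would then pick the most reliable of the candidate positives as the output position.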
Step six, multi-module synthesis
The synthesis module obtains the final output position through coordinated work among the multiple modules; according to the operation results of the detection module and the tracking module, four cooperation modes can be distinguished:
(1) Tracking success, detection success
If tracking succeeds and detection succeeds, the detection results are clustered to obtain the relevant output, and the overlap rate and credibility between the cluster center and the tracking module output are judged: if the overlap rate is low and the credibility of the detection module is high, the detection module corrects the result of the tracking module; if the overlap rates are close, the weighted average of the detection module and tracking module results is used as the final output.
(2) Tracking success, detection failure
If tracking is successful but detection fails, the output of the tracking module is directly taken as the final output of the current frame.
(3) Tracking failure, detection success
If tracking fails but detection succeeds, the output sample frames of the detection module are clustered; if the final clustering result has only one cluster center, that result is used as the final output and the tracking module is reinitialized with it, namely the re-detection process re-entered after the target disappears. If there are several cluster centers, then although windows passed the detection module they indicate several different positions, and the detection is considered failed.
(4) Tracking failure, detection failure
If both the tracking module and the detection module fail, the result of the current frame is considered invalid and is discarded.
The invention provides a target tracking method aimed at the problems of scale change, illumination change, target occlusion and target disappearance encountered by a single moving target during long-term tracking; it can overcome these problems to re-acquire and re-track the target, and the algorithm has high real-time performance and high robustness. The invention also provides an anti-occlusion moving target tracking device based on ROI prediction and multi-module learning, which comprises a tracking module, a detection module, a learning module and a synthesis module: the tracking module, based on multi-feature extraction and a correlation filtering algorithm, and the detection module, based on ROI prediction and fHOG-SVM, each predict the target position in a single frame; the synthesis module combines the results of the tracking module and the detection module into the output, while the learning and updating module corrects the tracking module and the detection module, improving the classification and generalization level of the detection module and greatly improving the stability of the algorithm. In summary, the invention takes the tracking-detection-learning framework as its background, optimizes and improves the algorithm of each module, solves the four main problems encountered in long-term tracking, and has the advantages of high real-time performance, high robustness and high detection accuracy.