Background
Target tracking is of great significance to fields such as robotics, unmanned aerial vehicles (UAVs), autonomous driving, navigation and guidance. For example, in human-computer interaction, a camera continuously tracks human behavior, and the robot interprets human posture, motion and gestures through a series of analysis steps, enabling more natural communication between humans and machines. In UAV target tracking, visual information about the target is continuously acquired and transmitted to a ground control station, where the video image sequence is analyzed algorithmically to obtain the real-time position of the tracked target, so that the target is kept within the UAV's field of view at all times.
When an object in a video is tracked with the KCF (kernelized correlation filter) algorithm, fast or abrupt motion of the object may carry it partly outside the KCF search area, causing tracking failure. One way to ensure that the search area covers the target is to enlarge it, but this introduces boundary effects that cause the filter to learn too much background information. An algorithm is therefore needed that can both enlarge the search area and suppress the background information learned by the filter.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: although spatially regularized correlation filtering algorithms can effectively mitigate the boundary effect, their regularization strategy only applies different penalty coefficients to the filter coefficients according to the coefficients' positions, which does not help the filter learn target information. A regularization strategy is therefore needed that both suppresses the boundary effect and lets the filter better learn target information.
To solve the above technical problem, the present invention provides a target tracking method based on an appearance-adaptive spatially regularized correlation filter, characterized by comprising the following steps:
(1) initializing the learning rate of the filter, the maximum number of ADMM iterations, the Lagrange penalty factor and the size of the search box;
(2) extracting an image block containing the target from the t-th frame, converting each pixel of the image block into a sample, and putting all samples into an array D in order;
(3) clustering the samples in the array D with the K-means algorithm, with the following settings: the similarity between samples is measured by Euclidean distance, and the 5 initial centroids are the four vertices and the center point of the rectangular image block; the class of each sample is finally obtained;
(4) arranging the samples, in their original order, into a matrix P of the same size as the original image block, where each element of P is the class of the corresponding sample obtained in step (3); cropping from P a matrix P1 with the same center point as P, whose size is 0.6 times the current target size; counting the number of elements of each class in P1 and sorting the counts, the class with the largest count being regarded as the class where the target is located and named the target class; if the current frame is not the first frame, additionally adding to the target class the class closest to all previous target classes; setting each element of P to 1 if its class belongs to the target class and to 0 otherwise, which yields the Mask matrix of the region where the target is located; and, according to the Mask matrix, resetting positions with value 1 to 0.01 and positions with value 0 to 100000, the result being named the weight matrix w;
(5) solving for the filter with the alternating direction method of multipliers (ADMM), where the objective function L(f, g) of the filter is as follows:
where f is the filter, g is the auxiliary variable, y is the label generated by the Gaussian function,
represents the d-th feature channel of the target image block of the t-th frame, D represents the total number of feature channels,
is the Lagrange multiplier, and μ is the Lagrange penalty factor;
the ADMM algorithm minimizes the objective function by iteratively solving the following subproblems:
each of the above subproblems has a closed-form solution:
a bar over a matrix denotes its frequency-domain form, and every element of the matrix N is 1;
(6) the filter trained in step (5) is denoted as
and the previous filter is updated by the following formula:
where
and η is the learning rate of the filter.
(7) if frame t is not the last frame, using h_t to score the candidate samples to obtain a response map, taking the position with the maximum response value as the target center point, and returning to step (2); otherwise, ending the tracking.
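The objective function and the closed-form subproblem solutions of step (5) are not reproduced in this text. As an illustration only, the following is a minimal single-channel (D = 1) sketch of such an ADMM solver in Python with NumPy. It assumes the common spatially regularized form L(f, g) = 1/2 ||y - x ⋆ f||² + 1/2 ||w ⊙ g||² + ζᵀ(f - g) + (μ/2) ||f - g||², where ⋆ is circular cross-correlation, ⊙ is the element-wise product and ζ is the Lagrange multiplier. This objective, the growth schedule for μ (beta, mu_max), the iteration count and the function name train_filter_admm are assumptions, not the patented formulas. The f-subproblem is solved independently at every frequency, and the g-subproblem element-wise in the spatial domain, with the all-ones matrix N appearing in the g-update denominator.

```python
import numpy as np

def train_filter_admm(x, y, w, mu=1.0, beta=10.0, mu_max=1e4, iters=4):
    """Single-channel ADMM sketch for a spatially weighted correlation filter (hypothetical).

    Assumed objective (not the patent's exact formula):
        0.5*||y - x (*) f||^2 + 0.5*||w . g||^2 + zeta^T (f - g) + mu/2 * ||f - g||^2
    with (*) = circular cross-correlation, ifft2(conj(X) * F), and '.' = element-wise product.
    """
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    g = np.zeros(x.shape)                 # auxiliary variable (spatial domain)
    zeta = np.zeros(x.shape)              # Lagrange multiplier
    N = np.ones(x.shape)                  # all-ones matrix
    for _ in range(iters):
        # f-subproblem: independent closed form at every frequency
        G, Z = np.fft.fft2(g), np.fft.fft2(zeta)
        F = (X * Y + mu * G - Z) / (np.abs(X) ** 2 + mu)
        f = np.real(np.fft.ifft2(F))      # F is Hermitian, so f is real up to round-off
        # g-subproblem: element-wise closed form in the spatial domain
        g = (mu * f + zeta) / (w * w + mu * N)
        # Lagrange multiplier update, then (assumed) growth of the penalty factor
        zeta = zeta + mu * (f - g)
        mu = min(beta * mu, mu_max)
    return g
```

With weights set as in step (4) (0.01 inside the target mask, 100000 outside), the denominator of the g-update is huge outside the mask, so the coefficients there are driven toward zero; this is how the weight matrix limits the background information the filter learns.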
Preferably, in step (2), each sample comprises 5 dimensions, in the order: R channel value, G channel value, B channel value, X-axis coordinate and Y-axis coordinate, and each dimension is normalized so that its values lie in the [0,1] interval.
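A minimal sketch of building the array D from an RGB image block follows. It assumes row-major pixel order and per-dimension min-max scaling; both are assumptions, since the text only requires the values to lie in [0,1]. The preferred embodiment above normalizes all five dimensions, while the detailed embodiment mentions normalizing only the first three, so the hypothetical flag normalize_coords selects between the two readings. The function name build_samples is also hypothetical.

```python
import numpy as np

def build_samples(patch, normalize_coords=True):
    """Flatten an H x W x 3 RGB patch into the array D of [R, G, B, X, Y] samples.

    Pixels are taken in row-major order; each selected dimension is min-max scaled
    to [0, 1] (the min-max choice is an assumption)."""
    height, width, _ = patch.shape
    ys, xs = np.mgrid[0:height, 0:width]            # Y = row index, X = column index
    D = np.column_stack([patch.reshape(-1, 3).astype(float),
                         xs.ravel().astype(float),
                         ys.ravel().astype(float)])
    dims = range(5) if normalize_coords else range(3)
    for c in dims:
        lo, hi = D[:, c].min(), D[:, c].max()
        # a constant dimension carries no information; map it to 0
        D[:, c] = (D[:, c] - lo) / (hi - lo) if hi > lo else 0.0
    return D
```

Normalizing the coordinates as well keeps the spatial dimensions on the same scale as the colour dimensions, so that neither dominates the Euclidean distance used by K-means in step (3).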
The invention effectively restricts what the correlation filter learns, reduces the background information in the filter and suppresses its boundary effect. Compared with conventional spatially regularized correlation filters, the method suppresses the target region and the background region to different degrees, and does so more accurately. It also allows the search range of the correlation filter to be enlarged, which improves robustness to large target displacements.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
With reference to fig. 1, the target tracking method based on the appearance adaptive spatial regularization correlation filter provided by the present invention includes the following steps:
(1) The learning rate of the filter, the maximum number of ADMM iterations, the Lagrange penalty factor and the size of the search box are initialized.
(2) An image block containing the target is extracted from the t-th frame, and each pixel of the image block is converted into a sample. Each sample has 5 dimensions, in the order: R channel value, G channel value, B channel value, X-axis coordinate and Y-axis coordinate. The first 3 dimensions are normalized per dimension so that their values lie in the [0,1] interval. Finally, all samples are put into the array D in order.
(3) The samples in the array D are clustered with the K-means algorithm, with the following settings: the similarity between samples is measured by Euclidean distance, and the 5 initial centroids are the four vertices and the center point of the rectangular image block. The class of each sample is finally obtained.
(4) The samples are arranged, in their original order, into a matrix P of the same size as the original image block; each element of P is the class of the corresponding sample obtained in step (3). A matrix P1 with the same center point as P is cropped from P; its size is 0.6 times the current target size. The number of elements of each class in P1 is counted and sorted, and the class with the largest count is regarded as the class where the target is located and is named the target class. In addition, if the current frame is not the first frame, the class closest to all previous target classes is also added to the target class. Each element of P is set to 1 if its class belongs to the target class and to 0 otherwise, which yields the Mask matrix of the region where the target is located. Finally, according to the Mask matrix, positions with value 1 are reset to 0.01 and positions with value 0 are reset to 100000; the result is named the weight matrix w.
(5) The filter is solved with the alternating direction method of multipliers (ADMM); the objective function of the filter is as follows:
where f is the filter, g is the auxiliary variable, y is the label generated by the Gaussian function,
represents the d-th feature channel of the target image block of the t-th frame, D represents the total number of feature channels,
is the Lagrange multiplier and μ is the Lagrange penalty factor. ADMM minimizes the objective function by iteratively solving the following subproblems:
Each of these subproblems has a closed-form solution, as shown in the following equations:
In the above formulas, a bar over a matrix denotes its frequency-domain form, and every element of the matrix N is 1.
(6) The filter trained in step (5) is denoted as
and the previous filter is updated by the following formula:
where
and η is the learning rate of the filter.
(7) If frame t is not the last frame, the filter
is used to score the candidate samples and obtain a response map; the position with the maximum response value is taken as the target center point, and the procedure returns to step (2). Otherwise, tracking ends.
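Steps (6) and (7) can be sketched as follows. Because the update formula is missing from this text, update_filter assumes the linear-interpolation update commonly used by correlation-filter trackers, h_t = (1 - η)·h_{t-1} + η·ĥ_t, where ĥ_t is the newly trained filter. detect scores a single-channel candidate patch in the frequency domain and returns the location of the maximum response. Both function names and the single-channel restriction are assumptions made for illustration.

```python
import numpy as np

def update_filter(h_prev, h_new, eta):
    """Assumed linear-interpolation update: h_t = (1 - eta) * h_{t-1} + eta * h_new."""
    if h_prev is None:                     # first frame: nothing to blend with
        return np.array(h_new, dtype=float)
    return (1.0 - eta) * np.asarray(h_prev) + eta * np.asarray(h_new)


def detect(h, z):
    """Score the candidate patch z with filter h and locate the response peak.

    Uses the circular cross-correlation r[n] = sum_m z[m] * h[m + n], evaluated in
    the frequency domain as ifft2(conj(fft2(z)) * fft2(h))."""
    response = np.real(np.fft.ifft2(np.conj(np.fft.fft2(z)) * np.fft.fft2(h)))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, (int(peak[0]), int(peak[1]))
```

The peak index is a circular shift: indices larger than half the patch size correspond to negative displacements, and mapping the peak back to image coordinates depends on how the search box was cropped, which the text does not specify.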
Fig. 2 shows the process of obtaining the weight matrix according to the embodiment of the present invention.
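The weight-matrix procedure of steps (3) and (4) can be sketched as below, assuming the samples in D are in row-major order. kmeans_fixed_init runs plain Lloyd iterations from the five prescribed initial centroids (four corners and the center of the block), and weight_matrix selects the target class(es) and produces the Mask matrix and w. The text does not define "closest to all previous target classes", so the sketch assumes it means the current class whose centroid has the smallest summed Euclidean distance to the stored centroids of all previous target classes. Both function names and the prev_target_centroids parameter are hypothetical.

```python
import numpy as np

def kmeans_fixed_init(D, height, width, max_iter=100):
    """Lloyd's K-means (Euclidean distance) on the height*width samples in D (row-major).

    The 5 initial centroids are the samples at the four patch corners and the patch centre."""
    init_idx = [0, width - 1, (height - 1) * width, height * width - 1,
                (height // 2) * width + width // 2]
    centroids = D[init_idx].astype(float)           # fancy indexing returns a copy
    labels = np.full(len(D), -1)
    for _ in range(max_iter):
        # assign every sample to its nearest centroid (squared Euclidean distance)
        dist = ((D[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                   # assignments stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = D[labels == k]
            if len(members):                        # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return labels, centroids


def weight_matrix(labels, centroids, height, width, target_size,
                  prev_target_centroids=None, ratio=0.6, w_target=0.01, w_background=1e5):
    """Build the Mask matrix and the weight matrix w from the cluster labels."""
    P = labels.reshape(height, width)               # class label of every pixel
    th = max(1, int(round(ratio * target_size[0])))
    tw = max(1, int(round(ratio * target_size[1])))
    r0, c0 = (height - th) // 2, (width - tw) // 2
    P1 = P[r0:r0 + th, c0:c0 + tw]                  # centred crop, 0.6 x target size
    counts = np.bincount(P1.ravel(), minlength=len(centroids))
    target = {int(counts.argmax())}                 # most frequent class = target class
    if prev_target_centroids is not None and len(prev_target_centroids) > 0:
        # Assumed reading: the class whose centroid has the smallest summed distance
        # to every previous target centroid is added to the target class.
        prev = np.asarray(prev_target_centroids, dtype=float)
        d = np.linalg.norm(centroids[:, None, :] - prev[None, :, :], axis=2).sum(axis=1)
        target.add(int(d.argmin()))
    mask = np.isin(P, sorted(target)).astype(np.uint8)   # 1 = target region
    weights = np.where(mask == 1, w_target, w_background)
    return mask, weights, sorted(target)
```

Keeping the cluster centroids alongside the labels is what makes the non-first-frame rule computable: the centroids of each frame's target classes can be stored and passed in as prev_target_centroids for the next frame.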