CN113920172A - Target tracking method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113920172A
Authority
CN
China
Prior art keywords
previous
current
infrared
color
feature
Prior art date
Legal status
Granted
Application number
CN202111519551.5A
Other languages
Chinese (zh)
Other versions
CN113920172B (en)
Inventor
周圣垚
周俊琨
吉翔
Current Assignee
Chengdu Ruiyanxinchuang Technology Co ltd
Original Assignee
Chengdu Ruiyanxinchuang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Ruiyanxinchuang Technology Co ltd
Priority to CN202111519551.5A
Publication of CN113920172A
Application granted
Publication of CN113920172B
Legal status: Active
Anticipated expiration

Abstract

The application provides a target tracking method, a device, equipment and a storage medium, wherein the target tracking method comprises the following steps: acquiring a current color image and a current infrared image corresponding to a current frame, and a previous color image and a previous infrared image corresponding to a previous frame; performing fusion processing on the current color image, the current infrared image, the previous color image and the previous infrared image to obtain fusion features; inputting the fusion features into a prediction network to obtain a prediction feature map, wherein the prediction feature map comprises a plurality of predicted positions of an object in the current frame and a preliminary probability corresponding to each predicted position; obtaining a target probability corresponding to each predicted position according to the target position of the object in the previous frame, the plurality of predicted positions of the object in the current frame and the preliminary probability corresponding to each predicted position; and acquiring the maximum target probability from the target probabilities corresponding to the predicted positions, and taking the predicted position corresponding to the maximum target probability as the target position of the object in the current frame.

Description

Target tracking method, device, equipment and storage medium
Technical Field
The present application relates to the field of target tracking technologies, and in particular, to a target tracking method, apparatus, device, and storage medium.
Background
With the rapid development of deep learning technology and the popularization of intelligent cameras, the demand for tracking arbitrary targets is growing rapidly, especially in intelligent security, automatic aerial photography by unmanned aerial vehicles, rare animal protection, accident search and rescue, and the like.
However, existing tracking schemes typically use visible light alone as the visual information. Tracking scenes are becoming more and more complex, and the tracked target is often affected by factors such as illumination and occlusion, which causes tracking to fail.
Disclosure of Invention
In view of the above, a target tracking method, device, equipment and storage medium are provided to solve the technical problem in the prior art that tracking easily fails.
In a first aspect, a target tracking method is provided, including:
acquiring a current color image and a current infrared image corresponding to a current frame, and a previous color image and a previous infrared image corresponding to a previous frame;
performing fusion processing on the current color image, the current infrared image, the previous color image and the previous infrared image to obtain fusion characteristics;
inputting the fusion characteristics into a prediction network to obtain a prediction characteristic graph, wherein the prediction characteristic graph comprises a plurality of prediction positions of an object in a current frame and a preliminary probability corresponding to each prediction position;
obtaining a target probability corresponding to each predicted position according to a target position of the object in a previous frame, a plurality of predicted positions of the object in a current frame and a preliminary probability corresponding to each predicted position;
and acquiring the maximum target probability from the target probability corresponding to each predicted position, and taking the predicted position corresponding to the maximum target probability as the target position of the object in the current frame.
Firstly, a current color image and a current infrared image corresponding to a current frame, and a previous color image and a previous infrared image corresponding to a previous frame, are acquired; the current color image, the current infrared image, the previous color image and the previous infrared image are fused to obtain fusion features; the fusion features are then input into a prediction network to obtain a prediction feature map, wherein the prediction feature map comprises a plurality of predicted positions of the object in the current frame and a preliminary probability corresponding to each predicted position; a target probability corresponding to each predicted position is obtained according to the target position of the object in the previous frame, the plurality of predicted positions of the object in the current frame and the preliminary probability corresponding to each predicted position; and finally, the maximum target probability is taken from the target probabilities corresponding to the predicted positions, and the predicted position corresponding to the maximum target probability is taken as the target position of the object in the current frame. In this way, more target information (both color and infrared) is used during tracking, which strengthens tracking to a certain extent. Furthermore, the target probability corresponding to each predicted position is obtained according to the target position of the object in the previous frame, so prior information, namely the target position of the object in the previous frame, is also used when determining the target position of the object in the current frame; this prior information reduces the probability of tracking failure to a certain extent.
Optionally, the fusing the current color image, the current infrared image, the previous color image, and the previous infrared image to obtain a fusion characteristic includes:
respectively extracting the characteristics of the current color image, the current infrared image, the previous color image and the previous infrared image to obtain a current color characteristic, a current infrared characteristic, a previous color characteristic and a previous infrared characteristic;
and fusing according to the current color feature, the current infrared feature, the previous color feature and the previous infrared feature to obtain a fused feature.
The above steps explain how the fusion features are obtained: feature extraction is performed first, and fusion is then performed on the extracted features to obtain the fusion features.
Optionally, the fusing according to the current color feature, the current infrared feature, the previous color feature, and the previous infrared feature to obtain a fused feature includes:
calculating the correlation degree of the current color characteristic and the previous color characteristic to obtain the color correlation degree;
calculating the relevance of the current infrared characteristic and the previous infrared characteristic to obtain the infrared relevance;
and averaging the color correlation and the infrared correlation to obtain fusion characteristics.
The fusion features are obtained by averaging the color correlation and the infrared correlation, and can mine the color features and the infrared features between two frames to a certain extent.
Optionally, the fusing according to the current color feature, the current infrared feature, the previous color feature, and the previous infrared feature to obtain a fused feature includes:
calculating the correlation degree of the current color characteristic and the previous color characteristic to obtain the color correlation degree;
calculating the relevance of the current infrared characteristic and the previous infrared characteristic to obtain the infrared relevance;
calculating the relevance of the current color feature and the previous infrared feature to obtain the color infrared relevance;
calculating the correlation degree of the current infrared characteristic and the previous color characteristic to obtain the infrared color correlation degree;
and averaging the color correlation, the infrared correlation, the color infrared correlation and the infrared color correlation to obtain the fusion characteristic.
The fusion characteristics are obtained by averaging the color correlation, the infrared correlation, the color infrared correlation and the infrared color correlation, and can fully mine the characteristics among different types of images, thereby improving the prediction rate.
Optionally, the performing feature extraction on the current color image, the current infrared image, the previous color image, and the previous infrared image respectively to obtain a current color feature, a current infrared feature, a previous color feature, and a previous infrared feature includes:
inputting the current color image and the previous color image into a color feature extraction network respectively to obtain a current color feature and a previous color feature;
and respectively inputting the current infrared image and the previous infrared image into an infrared feature extraction network to obtain the current infrared feature and the previous infrared feature.
Because infrared images and color images differ greatly, using only one feature extraction network to extract the features of both the infrared image and the color image may give a poor feature extraction result; therefore, separate feature extraction networks are trained for the infrared images and for the color images.
Optionally, the fusing the current color image, the current infrared image, the previous color image, and the previous infrared image to obtain a fusion characteristic includes:
fusing the current color image and the current infrared image to obtain a current fused image;
fusing the previous color image and the previous infrared image to obtain a previous fused image;
respectively extracting the features of the current fusion image and the previous fusion image to obtain the current fusion feature and the previous fusion feature;
and calculating the relevance of the current fusion feature and the previous fusion feature to obtain the fusion feature.
The method comprises the steps of firstly fusing color images and infrared images with the same frame sequence to obtain a current fused image and a previous fused image, then respectively extracting the characteristics of the current fused image and the previous fused image to obtain a current fused characteristic and a previous fused characteristic, and finally calculating the relevance of the current fused characteristic and the previous fused characteristic to obtain a fused characteristic.
Optionally, the obtaining, according to the target position of the object in the previous frame, the multiple predicted positions of the object in the current frame, and the preliminary probability corresponding to each predicted position, the target probability corresponding to each predicted position includes:
calculating to obtain a plurality of predicted aspect ratios according to a plurality of predicted positions of the object in the current frame;
calculating a target aspect ratio according to a target position of the object in a previous frame;
calculating to obtain an absolute difference matrix according to each predicted aspect ratio and the target aspect ratio;
obtaining a window matrix according to the target position of the object in the previous frame and the prediction characteristic diagram;
dividing the window matrix by the absolute difference matrix to obtain a target window matrix;
combining the preliminary probabilities corresponding to each predicted position to obtain a probability matrix;
and multiplying the target window matrix by the probability matrix to obtain a target probability matrix, wherein the target probability matrix comprises a target probability corresponding to each position.
The initial probability of the predicted position is updated, and the target probability corresponding to the predicted position is obtained, so that the accuracy of position prediction is improved.
In a second aspect, there is provided a target tracking apparatus, comprising:
the acquisition module is used for acquiring a current color image and a current infrared image corresponding to a current frame, and a previous color image and a previous infrared image corresponding to a previous frame;
the fusion module is used for carrying out fusion processing on the current color image, the current infrared image, the previous color image and the previous infrared image to obtain fusion characteristics;
the prediction module is used for inputting the fusion characteristics into a prediction network to obtain a prediction characteristic graph, and the prediction characteristic graph comprises a plurality of prediction positions of the object in the current frame and a preliminary probability corresponding to each prediction position;
the target module is used for obtaining a target probability corresponding to each predicted position according to a target position of the object in a previous frame, a plurality of predicted positions of the object in a current frame and a preliminary probability corresponding to each predicted position;
and the position module is used for acquiring the maximum target probability from the target probability corresponding to each predicted position, and taking the predicted position corresponding to the maximum target probability as the target position of the object in the current frame.
Optionally, the fusion module is specifically configured to:
respectively extracting the characteristics of the current color image, the current infrared image, the previous color image and the previous infrared image to obtain a current color characteristic, a current infrared characteristic, a previous color characteristic and a previous infrared characteristic;
and fusing according to the current color feature, the current infrared feature, the previous color feature and the previous infrared feature to obtain a fused feature.
Optionally, the fusion module is specifically configured to:
calculating the correlation degree of the current color characteristic and the previous color characteristic to obtain the color correlation degree;
calculating the relevance of the current infrared characteristic and the previous infrared characteristic to obtain the infrared relevance;
and averaging the color correlation and the infrared correlation to obtain fusion characteristics.
Optionally, the fusion module is specifically configured to:
calculating the correlation degree of the current color characteristic and the previous color characteristic to obtain the color correlation degree;
calculating the relevance of the current infrared characteristic and the previous infrared characteristic to obtain the infrared relevance;
calculating the relevance of the current color feature and the previous infrared feature to obtain the color infrared relevance;
calculating the correlation degree of the current infrared characteristic and the previous color characteristic to obtain the infrared color correlation degree;
and averaging the color correlation, the infrared correlation, the color infrared correlation and the infrared color correlation to obtain the fusion characteristic.
Optionally, the fusion module is specifically configured to:
inputting the current color image and the previous color image into a color feature extraction network respectively to obtain a current color feature and a previous color feature;
and respectively inputting the current infrared image and the previous infrared image into an infrared feature extraction network to obtain the current infrared feature and the previous infrared feature.
Optionally, the fusion module is specifically configured to:
fusing the current color image and the current infrared image to obtain a current fused image;
fusing the previous color image and the previous infrared image to obtain a previous fused image;
respectively extracting the features of the current fusion image and the previous fusion image to obtain the current fusion feature and the previous fusion feature;
and calculating the relevance of the current fusion feature and the previous fusion feature to obtain the fusion feature.
Optionally, the target module is specifically configured to:
calculating to obtain a plurality of predicted aspect ratios according to a plurality of predicted positions of the object in the current frame;
calculating a target aspect ratio according to a target position of the object in a previous frame;
calculating to obtain an absolute difference matrix according to each predicted aspect ratio and the target aspect ratio;
obtaining a window matrix according to the target position of the object in the previous frame and the prediction characteristic diagram;
dividing the window matrix by the absolute difference matrix to obtain a target window matrix;
combining the preliminary probabilities corresponding to each predicted position to obtain a probability matrix;
and multiplying the target window matrix by the probability matrix to obtain a target probability matrix, wherein the target probability matrix comprises a target probability corresponding to each position.
In a third aspect, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the target tracking method as described above when executing the computer program.
In a fourth aspect, a computer readable storage medium is provided, in which computer program instructions are stored, which, when read and executed by a processor, perform the steps of the object tracking method as described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a schematic flow chart illustrating an implementation of a target tracking method in an embodiment of the present application;
fig. 2 is an example of correlation calculation according to a current color feature and a previous color feature in the embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a structure of a target tracking apparatus according to an embodiment of the present disclosure;
fig. 4 is a block diagram of an internal structure of a computer device in the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In one embodiment, a target tracking method is provided. The execution subject of the target tracking method according to the embodiment of the present invention is a computer device capable of implementing the target tracking method according to the embodiment of the present invention, and the computer device may include, but is not limited to, a terminal and a server. The terminal comprises a desktop terminal and a mobile terminal, wherein the desktop terminal comprises but is not limited to a desktop computer and a vehicle-mounted computer; mobile terminals include, but are not limited to, cell phones, tablets, laptops, and smartwatches. The server includes a high performance computer and a cluster of high performance computers.
In one embodiment, as shown in fig. 1, there is provided a target tracking method, including:
step 100, acquiring a current color image and a current infrared image corresponding to a current frame, and a previous color image and a previous infrared image corresponding to a previous frame.
The current frame is the frame of the video at the current time. It can be understood that a video is composed of consecutive frames, and during target tracking the position of the target needs to be determined in every frame, or in interval frames, where an interval frame is a frame taken at a certain time interval; for example, if the time interval is 0.5 seconds, the position of the target is determined every 0.5 seconds and the target is thereby tracked.
The current color image is the color image corresponding to the current frame, where the color image includes an RGB image; the current infrared image is the infrared image corresponding to the current frame, where an infrared image is a visible image obtained by a thermal infrared imager receiving infrared radiation from the target and the environment and converting the invisible radiation into a visible image through photoelectric conversion; the previous color image is the color image corresponding to the previous frame; and the previous infrared image is the infrared image corresponding to the previous frame. The current frame and the previous frame have a certain relationship in frame order. For example, if the current frame and the previous frame are two consecutive frames, the frame order difference is 1; as another example, if the current frame and the previous frame form a pair of interval frames with a time interval of 0.5 seconds, and 0.5 seconds corresponds to 15 frames, then the frame order of the current frame minus the frame order of the previous frame equals 15, i.e., the frame order difference is 15.
And 200, performing fusion processing on the current color image, the current infrared image, the previous color image and the previous infrared image to obtain fusion characteristics.
The fusion characteristic is a characteristic of fusing color information and infrared information.
Step 300, inputting the fusion characteristics into a prediction network to obtain a prediction characteristic map, wherein the prediction characteristic map comprises a plurality of prediction positions of the object in the current frame and a preliminary probability corresponding to each prediction position.
In order to prevent confusion with the concepts of "target probability", "target position", and the like, in the embodiment of the present invention, the target to be tracked in the image is also referred to as an object, where the object includes a foreground in the image, and the object may be a human, an animal, a plant, or an object, and is not specifically limited herein.
The prediction network is a network capable of predicting the position of an object in an image. For example, the predicted network is a network constructed based on Yolo or fast-RCNN.
The prediction feature map is a prediction matrix output by the prediction network; it records a plurality of predicted positions of the object in the current frame and a preliminary probability corresponding to each predicted position, where the preliminary probability is the probability that the object is at the predicted position, and a predicted position can be one of the centers of the image regions. Assuming the size of the color image is 640 × 640 × 3 and the image is divided into 32 × 32-pixel image regions, the size of the prediction feature map may be 20 × 20 × 5, where 5 refers to 5 channels: the first channel records the preliminary probability that the object is at the center of a given image region (20 × 20 = 400 image regions, hence 400 centers), and the 2nd to 5th channels record the distances from the center to the four sides of the prediction frame.
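As an illustration only, the following NumPy sketch shows how such a 20 × 20 × 5 prediction feature map could be read out; the array name, the left/top/right/bottom side ordering and the pixel scaling of the distances are assumptions, not part of the disclosure.

```python
import numpy as np

# Hypothetical 20 x 20 x 5 prediction feature map as described above.
# Channel 0: preliminary probability that the object is centred in that image region.
# Channels 1-4: distances from the region centre to the four sides of the prediction
# frame (the left/top/right/bottom ordering and the pixel scaling are assumptions).
pred_map = np.random.rand(20, 20, 5).astype(np.float32)
stride = 32   # 640 / 20: each cell corresponds to a 32 x 32 pixel image region

prelim_prob = pred_map[..., 0]                 # (20, 20) preliminary probabilities
left, top, right, bottom = (pred_map[..., k] * stride for k in (1, 2, 3, 4))

# Centre of each image region in the original 640 x 640 image.
ys, xs = np.meshgrid(np.arange(20), np.arange(20), indexing="ij")
cx, cy = xs * stride + stride / 2, ys * stride + stride / 2

# One prediction frame (x1, y1, x2, y2) per predicted position.
boxes = np.stack([cx - left, cy - top, cx + right, cy + bottom], axis=-1)   # (20, 20, 4)
```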
Step 400, obtaining a target probability corresponding to each predicted position according to the target position of the object in the previous frame, the plurality of predicted positions of the object in the current frame and the preliminary probability corresponding to each predicted position.
The target probability is the final probability for each predicted position, i.e., the probability ultimately used to determine the target position. When determining the target position of the object in the current frame, the preliminary probability is not adopted directly; instead, the target position of the object in the previous frame is used as a prior position. It can be understood that the time between the previous frame and the current frame is very short, so even a moving object does not undergo a large displacement; the prior position therefore provides a useful reference for the target position of the object in the current frame, and the preliminary probability can be updated according to the target position of the object in the previous frame to obtain the target probability.
And 500, acquiring a maximum target probability from the target probability corresponding to each prediction position, and taking the prediction position corresponding to the maximum target probability as the target position of the object in the current frame.
And the maximum target probability is the maximum target probability in the target probabilities, and the predicted position corresponding to the maximum target probability is used as the target position of the object in the current frame, so that the target position of the object in the current frame is determined.
Firstly, a current color image and a current infrared image corresponding to a current frame, and a previous color image and a previous infrared image corresponding to a previous frame, are acquired; the current color image, the current infrared image, the previous color image and the previous infrared image are fused to obtain fusion features; the fusion features are then input into a prediction network to obtain a prediction feature map, wherein the prediction feature map comprises a plurality of predicted positions of the object in the current frame and a preliminary probability corresponding to each predicted position; a target probability corresponding to each predicted position is obtained according to the target position of the object in the previous frame, the plurality of predicted positions of the object in the current frame and the preliminary probability corresponding to each predicted position; and finally, the maximum target probability is taken from the target probabilities corresponding to the predicted positions, and the predicted position corresponding to the maximum target probability is taken as the target position of the object in the current frame. In this way, more target information (both color and infrared) is used during tracking, which strengthens tracking to a certain extent. Furthermore, the target probability corresponding to each predicted position is obtained according to the target position of the object in the previous frame, so prior information, namely the target position of the object in the previous frame, is also used when determining the target position of the object in the current frame; this prior information reduces the probability of tracking failure to a certain extent.
In one embodiment, the step 200 of performing fusion processing on the current color image, the current infrared image, the previous color image, and the previous infrared image to obtain a fusion feature includes:
step 201, feature extraction is respectively performed on the current color image, the current infrared image, the previous color image and the previous infrared image to obtain a current color feature, a current infrared feature, a previous color feature and a previous infrared feature.
Firstly, training a feature extraction network, and then using the trained feature extraction network to perform feature extraction on a current color image, a current infrared image, a previous color image and a previous infrared image to obtain a current color feature, a current infrared feature, a previous color feature and a previous infrared feature. For example, the feature extraction network is VGG or Resnet.
And 202, fusing according to the current color feature, the current infrared feature, the previous color feature and the previous infrared feature to obtain a fused feature.
Since the current color feature, the current infrared feature, the previous color feature, and the previous infrared feature are already obtained, these features are fused to obtain a fused feature. For example, the current color feature, the current infrared feature, the previous color feature, and the previous infrared feature are summed to obtain a summed feature, and the summed feature is divided by 4 to obtain a fused feature.
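A minimal NumPy sketch of the averaging example just given; the placeholder arrays and their 20 × 20 × 5 shape are assumptions for illustration.

```python
import numpy as np

# Assumed placeholders for the four extracted features, all with the same shape (e.g. 20 x 20 x 5).
cur_color_feat, cur_ir_feat, prev_color_feat, prev_ir_feat = (
    np.random.rand(20, 20, 5).astype(np.float32) for _ in range(4))

# Fusion by averaging, as in the example above: sum the four features and divide by 4.
fused_feature = (cur_color_feat + cur_ir_feat + prev_color_feat + prev_ir_feat) / 4.0
```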
The above embodiments illustrate how the fusion features are obtained, that is, feature extraction is performed first, and then fusion is performed according to the extracted features, so as to obtain the fusion features.
In one embodiment, the step 202 of fusing according to the current color feature, the current infrared feature, the previous color feature and the previous infrared feature to obtain a fused feature includes:
step 202A, a correlation calculation is performed on the current color feature and the previous color feature to obtain a color correlation.
The correlation measures how related two features are. It can be understood that target tracking tracks the same target, and the essence of target tracking is tracking its features: if the features of two targets are completely the same, the two targets are very likely the same target. Therefore, to obtain the fusion features, the correlation between features can be calculated first, and the fusion features used for position prediction are then obtained from the correlations. The color correlation is the correlation between the color images.
In one example, the current color feature and the previous color feature are both 3 × 3 × 2 in size, and the previous color feature is used as a convolution kernel to perform correlation calculation, as shown in fig. 2, thereby obtaining a color correlation of 3 × 3 × 2 in size. It can be seen that the dimensions of the current color feature, the previous color feature and the color correlation are the same, i.e. if the size of the current color feature is m × n × d, the size of the previous color feature and the color correlation is m × n × d.
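For illustration, a minimal PyTorch sketch of such a correlation, with the previous feature used as the convolution kernel so that the output keeps the input dimensions; the depthwise, "same"-padded formulation and the channels-first layout are assumptions about how the computation shown in Fig. 2 is carried out.

```python
import torch
import torch.nn.functional as F

def cross_correlation(cur_feat: torch.Tensor, prev_feat: torch.Tensor) -> torch.Tensor:
    """Correlate cur_feat with prev_feat used as a convolution kernel.

    Both tensors are assumed to be (C, H, W); each channel is correlated with the
    corresponding channel of the kernel, and padding keeps the spatial size, matching
    the 3 x 3 x 2 example in the text (this formulation is an assumption).
    """
    c, h, w = cur_feat.shape
    x = cur_feat.unsqueeze(0)            # (1, C, H, W)
    kernel = prev_feat.unsqueeze(1)      # (C, 1, kH, kW): one kernel per channel
    return F.conv2d(x, kernel, padding=(h // 2, w // 2), groups=c).squeeze(0)

# Example with the 3 x 3 x 2 features from the text (channels-first here).
cur_color = torch.rand(2, 3, 3)
prev_color = torch.rand(2, 3, 3)
color_corr = cross_correlation(cur_color, prev_color)   # shape (2, 3, 3)
```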
And step 202B, performing correlation calculation on the current infrared characteristic and the previous infrared characteristic to obtain the infrared correlation.
The infrared correlation is the correlation between infrared images, and the calculation method of the infrared correlation is the same as the calculation method of the color correlation, and the previous infrared feature is also used as a convolution kernel, which is not described in detail herein.
And step 202C, averaging the color correlation and the infrared correlation to obtain fusion characteristics.
Assuming the color correlation is a matrix A and the infrared correlation is a matrix B, the fused feature is (A + B)/2. Continuing the above example, the fused feature also has dimensions of 3 × 3 × 2.
In the above embodiment, the fusion feature is obtained by averaging the color correlation and the infrared correlation, and the color feature and the infrared feature between two frames before and after the two frames can be mined to a certain extent.
In one embodiment, the step 202 of fusing according to the current color feature, the current infrared feature, the previous color feature and the previous infrared feature to obtain a fused feature includes:
step 202a, performing correlation calculation on the current color feature and the previous color feature to obtain color correlation.
Here, the calculation of the color correlation is the same as the calculation of the color correlation in step 202A, and is not described in detail here.
And step 202b, performing correlation calculation on the current infrared characteristic and the previous infrared characteristic to obtain the infrared correlation.
Here, the calculation of the infrared correlation is the same as the calculation of the color correlation in step 202A, and is not described in detail here.
And step 202c, performing correlation calculation on the current color feature and the previous infrared feature to obtain color infrared correlation.
The color infrared correlation, which is the correlation between the current color image and the previous infrared image, is calculated in the same manner as the color correlation in step 202A, and will not be described in detail here.
And step 202d, performing correlation calculation on the current infrared characteristic and the previous color characteristic to obtain infrared color correlation.
The infrared color correlation, which is the correlation between the current infrared image and the previous color image, is calculated in the same manner as the color correlation in step 202A, and will not be described in detail here.
And step 202e, averaging the color correlation, the infrared correlation, the color infrared correlation and the infrared color correlation to obtain a fusion characteristic.
Assuming the color correlation is a matrix A, the infrared correlation is a matrix B, the color-infrared correlation is a matrix C and the infrared-color correlation is a matrix D, the fusion feature is (A + B + C + D)/4. Continuing the above example, the fusion feature also has dimensions of 3 × 3 × 2.
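A short sketch of this four-way fusion, reusing the cross_correlation helper and the colour features from the earlier sketch; the infrared placeholders are assumptions with the same shape.

```python
import torch

# Assumed infrared features with the same shape as the colour features above.
cur_ir = torch.rand(2, 3, 3)
prev_ir = torch.rand(2, 3, 3)

A = cross_correlation(cur_color, prev_color)   # colour correlation
B = cross_correlation(cur_ir, prev_ir)         # infrared correlation
C = cross_correlation(cur_color, prev_ir)      # colour-infrared correlation
D = cross_correlation(cur_ir, prev_color)      # infrared-colour correlation

fused_feature = (A + B + C + D) / 4.0          # same dimensions as each correlation
```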
In the embodiment, the fusion features are obtained by averaging the color correlation, the infrared correlation, the color infrared correlation and the infrared color correlation, and features among different types of images can be sufficiently mined, so that the prediction rate is improved.
In one embodiment, the performing feature extraction on the current color image, the current infrared image, the previous color image, and the previous infrared image to obtain a current color feature, a current infrared feature, a previous color feature, and a previous infrared feature in step 201 includes:
step 201A, inputting the current color image and the previous color image into a color feature extraction network respectively to obtain the current color feature and the previous color feature.
The color feature extraction network, which is a network for extracting features from color images, may include, but is not limited to VGG or Resnet, and the training samples used in the training process are color images. In one example, the dimensions of the input (color image) of the color feature extraction network are 640 × 640 × 3, and the dimensions of the output (color feature) of the color feature extraction network are 20 × 20 × 5.
Step 201B, inputting the current infrared image and the previous infrared image into an infrared feature extraction network respectively to obtain the current infrared feature and the previous infrared feature.
The infrared feature extraction network is a network for extracting features from infrared images, and may include, but is not limited to VGG or Resnet, and the training samples used in the training process are infrared images. In one example, the dimension of the input (infrared image) of the infrared feature extraction network is 640 × 640 × 1, and the dimension of the output (infrared feature) of the infrared feature extraction network is 20 × 20 × 5.
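A minimal PyTorch sketch of two separate extraction networks with the input/output dimensions quoted above; the tiny stride-32 CNN used here is only a stand-in assumption for the VGG or Resnet backbones mentioned in the text, and all layer choices are illustrative.

```python
import torch
import torch.nn as nn

def make_backbone(in_channels: int) -> nn.Sequential:
    """Stand-in feature extractor: five stride-2 blocks give an overall stride of 32,
    so a 640 x 640 input is reduced to a 20 x 20 map with 5 output channels."""
    layers, c = [], in_channels
    for out_c in (16, 32, 64, 64, 5):
        layers += [nn.Conv2d(c, out_c, kernel_size=3, stride=2, padding=1), nn.ReLU()]
        c = out_c
    return nn.Sequential(*layers)

color_net = make_backbone(in_channels=3)   # colour feature extraction network (assumption)
ir_net = make_backbone(in_channels=1)      # infrared feature extraction network (assumption)

cur_color_feat = color_net(torch.rand(1, 3, 640, 640))   # -> (1, 5, 20, 20)
cur_ir_feat = ir_net(torch.rand(1, 1, 640, 640))         # -> (1, 5, 20, 20)
```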
In the above embodiment, because infrared images and color images differ greatly, using only one feature extraction network to extract the features of the infrared image and the features of the color image at the same time may give a poor feature extraction result; therefore, separate feature extraction networks are trained for the infrared images and for the color images.
In one embodiment, the step 200 of performing fusion processing on the current color image, the current infrared image, the previous color image, and the previous infrared image to obtain a fusion feature includes:
and step 20A, fusing the current color image and the current infrared image to obtain a current fused image.
The current color image is a three-channel image (an R channel, a G channel and a B channel) with a size of 640 × 640 × 3, and the current infrared image is a single-channel image with a size of 640 × 640 × 1; the 4 channels are combined to obtain a current fused image with a size of 640 × 640 × 4.
And step 20B, fusing the previous color image and the previous infrared image to obtain a previous fused image.
The previous color image is a three-channel image (an R channel, a G channel and a B channel) with a size of 640 × 640 × 3, and the previous infrared image is a single-channel image with a size of 640 × 640 × 1; the 4 channels are combined to obtain a previous fused image with a size of 640 × 640 × 4.
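A minimal NumPy sketch of this channel combination; the placeholder arrays are assumptions standing in for the actual images.

```python
import numpy as np

# Assumed placeholders: colour image (640 x 640 x 3) and infrared image (640 x 640 x 1)
# for the current frame and for the previous frame.
cur_color_img = np.zeros((640, 640, 3), dtype=np.float32)
cur_ir_img = np.zeros((640, 640, 1), dtype=np.float32)
prev_color_img = np.zeros((640, 640, 3), dtype=np.float32)
prev_ir_img = np.zeros((640, 640, 1), dtype=np.float32)

# Channel combination: the R, G, B channels and the infrared channel are stacked
# into a single 4-channel image, giving 640 x 640 x 4 fused images.
cur_fused_img = np.concatenate([cur_color_img, cur_ir_img], axis=-1)
prev_fused_img = np.concatenate([prev_color_img, prev_ir_img], axis=-1)
```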
And 20C, respectively extracting the features of the current fusion image and the previous fusion image to obtain the current fusion feature and the previous fusion feature.
The feature extraction model is trained in advance, and is used for performing feature extraction on the current fusion image and the previous fusion image, for example, the feature extraction model is obtained by training according to VGG or Resnet, and the dimensions of the current fusion feature and the previous fusion feature are 20 × 20 × 5.
And 20D, calculating the correlation of the current fusion feature and the previous fusion feature to obtain the fusion feature.
The calculation method of the correlation between the current fusion feature and the previous fusion feature is the same as the calculation method of the color correlation, and will not be described in detail here.
In the above embodiment, the color images and the infrared images with the same frame sequence are fused to obtain the current fused image and the previous fused image, then the current fused image and the previous fused image are respectively subjected to feature extraction to obtain the current fused feature and the previous fused feature, and finally the current fused feature and the previous fused feature are subjected to correlation calculation to obtain the fused feature.
In one embodiment, the step 400 of obtaining a target probability corresponding to each predicted position according to a target position of an object in a previous frame, a plurality of predicted positions of the object in a current frame, and a preliminary probability corresponding to each predicted position includes:
step 401, a plurality of prediction aspect ratios are calculated according to a plurality of prediction positions of the object in the current frame.
Since the size of the prediction feature map is 20 × 20 × 5, there are 400 predicted positions in total, and each predicted position corresponds to one prediction frame with one length and one width. Because the 2nd to 5th channels record the distances from the center of the image region to the four sides of the prediction frame, the length and the width of each prediction frame can be determined from the values of the 2nd to 5th channels; dividing the length of a prediction frame by its width gives the predicted aspect ratio of that prediction frame, yielding 400 predicted aspect ratios.
Step 402, a target aspect ratio is calculated according to the target position of the object in the previous frame.
The target aspect ratio is the ratio of the length to the width of a target frame corresponding to the target position of the object in the previous frame. The target position of the object in the previous frame is determined, and therefore the target aspect ratio may also be calculated according to the method of step 401.
And 403, calculating to obtain an absolute difference matrix according to each predicted aspect ratio and the target aspect ratio.
The target aspect ratio is subtracted from each predicted aspect ratio to obtain an aspect ratio difference, and the absolute value of the difference gives an absolute difference. Since there are 400 predicted aspect ratios and 1 target aspect ratio, 400 absolute differences are computed, giving an absolute difference matrix of size 20 × 20 × 1.
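A minimal NumPy sketch of steps 401 to 403; the side ordering of channels 1-4, the concrete target frame dimensions and the small constant guarding the division are assumptions for illustration.

```python
import numpy as np

# Assumed prediction feature map (20 x 20 x 5): channel 0 holds the preliminary probability,
# channels 1-4 the distances from the region centre to the four sides of the prediction frame.
pred_map = np.random.rand(20, 20, 5).astype(np.float32)

left, top, right, bottom = (pred_map[..., k] for k in (1, 2, 3, 4))  # side ordering assumed
pred_length = left + right                       # horizontal extent of each prediction frame
pred_width = top + bottom                        # vertical extent of each prediction frame
pred_ratio = pred_length / (pred_width + 1e-6)   # step 401: 400 predicted aspect ratios

# Step 402: target aspect ratio from the target frame in the previous frame
# (the concrete length and width values here are just examples).
target_ratio = 96.0 / 64.0

# Step 403: absolute difference matrix between each predicted ratio and the target ratio.
abs_diff = np.abs(pred_ratio - target_ratio)     # shape (20, 20)
```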
And step 404, obtaining a window matrix according to the target position of the object in the previous frame and the prediction characteristic diagram.
Mapping the target position of the object in the previous frame onto the prediction feature map gives a mapping position. For example, if the size of the previous color image is 640 × 640 × 3, the size of the prediction feature map is 20 × 20 × 5 and the target position (the center point of the target frame) of the object in the previous frame is (56, 56), then mapping (56, 56) onto the prediction feature map gives the mapping position (1, 1); denote the mapping position by (a, b). A window space matrix of size 20 × 20 × 1 is created, with an initial value of 0 at every position. For the point at position (i, j) in the window space matrix, compute 0.5 × [1 − cos(2π × x/(M + 1))] × 0.5 × [1 − cos(2π × y/(M + 1))], where x = |i − a|, y = |j − b| and M = 20, and use the result as the value of the point at position (i, j); repeating this for all 20 × 20 positions in the window space matrix gives the window matrix.
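A direct NumPy sketch of this window matrix, implementing the formula exactly as stated above; the mapping position (a, b) = (1, 1) is taken from the example in the text.

```python
import numpy as np

M = 20
a, b = 1, 1   # mapping position of the previous target centre on the feature map (example above)

window = np.zeros((M, M), dtype=np.float32)
for i in range(M):
    for j in range(M):
        x, y = abs(i - a), abs(j - b)
        # Value at (i, j) as given by the formula in the text.
        window[i, j] = (0.5 * (1 - np.cos(2 * np.pi * x / (M + 1))) *
                        0.5 * (1 - np.cos(2 * np.pi * y / (M + 1))))
```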
Step 405, the window matrix is divided by the absolute difference matrix to obtain a target window matrix.
The size of the window matrix is 20 × 20 and the size of the absolute difference matrix is 20 × 20; dividing the window matrix by the absolute difference matrix means element-wise division of the values at corresponding positions, which gives the target window matrix.
And 406, combining the preliminary probabilities corresponding to each prediction position to obtain a probability matrix.
The size of the probability matrix is 20 × 20 × 1, and the value at each position is a preliminary probability.
Step 407, multiplying the target window matrix by the probability matrix to obtain a target probability matrix, where the target probability matrix includes a target probability corresponding to each position.
Multiplying the target window matrix by the probability matrix means element-wise multiplication of the values at corresponding positions, which gives the target probability matrix, i.e., the updated preliminary probabilities.
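A short sketch of steps 405 to 407 and the final selection of step 500, reusing window, abs_diff and prelim_prob from the sketches above; the small constant guarding the division is an assumption, not part of the text.

```python
import numpy as np

# window, abs_diff and prelim_prob are the 20 x 20 matrices from the earlier sketches.
target_window = window / (abs_diff + 1e-6)    # step 405: element-wise division
prob_matrix = prelim_prob                     # step 406: preliminary probabilities as a matrix
target_prob = target_window * prob_matrix     # step 407: element-wise multiplication

# Step 500: the predicted position with the maximum target probability is taken as the
# target position of the object in the current frame (row/column index on the feature map).
best_row, best_col = np.unravel_index(np.argmax(target_prob), target_prob.shape)
```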
According to the embodiment, the initial probability of the predicted position is updated, and the target probability corresponding to the predicted position is obtained, so that the accuracy of position prediction is improved.
In one embodiment, there is provided a target tracking apparatus 300 comprising:
an obtaining module 301, configured to obtain a current color image and a current infrared image corresponding to a current frame, and a previous color image and a previous infrared image corresponding to a previous frame;
a fusion module 302, configured to perform fusion processing on the current color image, the current infrared image, the previous color image, and the previous infrared image to obtain fusion characteristics;
the prediction module 303 is configured to input the fusion feature into a prediction network to obtain a prediction feature map, where the prediction feature map includes multiple prediction positions of the object in the current frame and a preliminary probability corresponding to each prediction position;
a target module 304, configured to obtain a target probability corresponding to each predicted position according to a target position of the object in a previous frame, a plurality of predicted positions of the object in a current frame, and a preliminary probability corresponding to each predicted position;
a location module 305, configured to obtain a maximum target probability from the target probabilities corresponding to each predicted location, and use the predicted location corresponding to the maximum target probability as a target location of the object in the current frame.
In an embodiment, the fusion module 302 is specifically configured to:
respectively extracting the characteristics of the current color image, the current infrared image, the previous color image and the previous infrared image to obtain a current color characteristic, a current infrared characteristic, a previous color characteristic and a previous infrared characteristic;
and fusing according to the current color feature, the current infrared feature, the previous color feature and the previous infrared feature to obtain a fused feature.
In an embodiment, the fusion module 302 is specifically configured to:
calculating the correlation degree of the current color characteristic and the previous color characteristic to obtain the color correlation degree;
calculating the relevance of the current infrared characteristic and the previous infrared characteristic to obtain the infrared relevance;
and averaging the color correlation and the infrared correlation to obtain fusion characteristics.
In an embodiment, the fusion module 302 is specifically configured to:
calculating the correlation degree of the current color characteristic and the previous color characteristic to obtain the color correlation degree;
calculating the relevance of the current infrared characteristic and the previous infrared characteristic to obtain the infrared relevance;
calculating the relevance of the current color feature and the previous infrared feature to obtain the color infrared relevance;
calculating the correlation degree of the current infrared characteristic and the previous color characteristic to obtain the infrared color correlation degree;
and averaging the color correlation, the infrared correlation, the color infrared correlation and the infrared color correlation to obtain the fusion characteristic.
In an embodiment, the fusion module 302 is specifically configured to:
inputting the current color image and the previous color image into a color feature extraction network respectively to obtain a current color feature and a previous color feature;
and respectively inputting the current infrared image and the previous infrared image into an infrared feature extraction network to obtain the current infrared feature and the previous infrared feature.
In an embodiment, the fusion module 302 is specifically configured to:
fusing the current color image and the current infrared image to obtain a current fused image;
fusing the previous color image and the previous infrared image to obtain a previous fused image;
respectively extracting the features of the current fusion image and the previous fusion image to obtain the current fusion feature and the previous fusion feature;
and calculating the relevance of the current fusion feature and the previous fusion feature to obtain the fusion feature.
In one embodiment, the target module 304 is specifically configured to:
calculating to obtain a plurality of predicted aspect ratios according to a plurality of predicted positions of the object in the current frame;
calculating a target aspect ratio according to a target position of the object in a previous frame;
calculating to obtain an absolute difference matrix according to each predicted aspect ratio and the target aspect ratio;
obtaining a window matrix according to the target position of the object in the previous frame and the prediction characteristic diagram;
dividing the window matrix by the absolute difference matrix to obtain a target window matrix;
combining the preliminary probabilities corresponding to each predicted position to obtain a probability matrix;
and multiplying the target window matrix by the probability matrix to obtain a target probability matrix, wherein the target probability matrix comprises a target probability corresponding to each position.
In one embodiment, as shown in fig. 4, a computer device is provided, which may be a terminal or a server in particular. The computer device comprises a processor, a memory and a network interface which are connected through a system bus, wherein the memory comprises a nonvolatile storage medium and an internal memory, the nonvolatile storage medium of the computer device stores an operating system and also stores a computer program, and when the computer program is executed by the processor, the processor can realize the target tracking method. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM). The internal memory may also have stored therein a computer program that, when executed by the processor, causes the processor to perform a target tracking method. Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
The object tracking method provided by the present application may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 4. The memory of the computer device may store therein the various program templates that make up the target tracking apparatus. Such as an acquisition module 301, a fusion module 302, and a prediction module 303.
A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring a current color image and a current infrared image corresponding to a current frame, and a previous color image and a previous infrared image corresponding to a previous frame;
performing fusion processing on the current color image, the current infrared image, the previous color image and the previous infrared image to obtain fusion characteristics;
inputting the fusion characteristics into a prediction network to obtain a prediction characteristic graph, wherein the prediction characteristic graph comprises a plurality of prediction positions of an object in a current frame and a preliminary probability corresponding to each prediction position;
obtaining a target probability corresponding to each predicted position according to a target position of the object in a previous frame, a plurality of predicted positions of the object in a current frame and a preliminary probability corresponding to each predicted position;
and acquiring the maximum target probability from the target probability corresponding to each predicted position, and taking the predicted position corresponding to the maximum target probability as the target position of the object in the current frame.
In one embodiment, a computer readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of:
acquiring a current color image and a current infrared image corresponding to a current frame, and a previous color image and a previous infrared image corresponding to a previous frame;
performing fusion processing on the current color image, the current infrared image, the previous color image and the previous infrared image to obtain fusion characteristics;
inputting the fusion characteristics into a prediction network to obtain a prediction characteristic graph, wherein the prediction characteristic graph comprises a plurality of prediction positions of an object in a current frame and a preliminary probability corresponding to each prediction position;
obtaining a target probability corresponding to each predicted position according to a target position of the object in a previous frame, a plurality of predicted positions of the object in a current frame and a preliminary probability corresponding to each predicted position;
and acquiring the maximum target probability from the target probability corresponding to each predicted position, and taking the predicted position corresponding to the maximum target probability as the target position of the object in the current frame.
It should be noted that the above-mentioned target tracking method, target tracking apparatus, computer device and computer readable storage medium belong to a general inventive concept, and the contents in the embodiments of the target tracking method, target tracking apparatus, computer device and computer readable storage medium may be mutually applicable.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit its scope; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims (10)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
