
Multi-target tracking method and corresponding video analysis system

Info

Publication number
CN110751674A
CN110751674A (application CN201810821668.0A)
Authority
CN
China
Prior art keywords: target, information, matching, detected, predicted
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810821668.0A
Other languages
Chinese (zh)
Inventor
刘吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xilinx Technology Beijing Ltd
Original Assignee
Beijing Shenjian Intelligent Technology Co Ltd
Application filed by Beijing Shenjian Intelligent Technology Co Ltd
Priority to CN201810821668.0A
Publication of CN110751674A
Status: Pending (current)

Abstract

A multi-target tracking method and a corresponding video analysis system implementing the scheme are disclosed. The method comprises the following steps: acquiring detected position information of targets; acquiring predicted position information of targets; matching the acquired predicted position information with the detected position information; in the case where it is determined that the detected position information of a certain target matches its predicted position information, updating the state information of the target based on its detected position; in the case where it is determined that the predicted position of a certain target has no matching detected position, determining that the target is a disappeared target; and in the case where it is determined that the detected position of a certain target has no matching predicted position, determining that the target is a new target or a reappearing target. Therefore, through reasonable allocation of the functional modules, high-speed tracking at low computational cost can be realized while meeting high tracking-accuracy requirements.

Description

Multi-target tracking method and corresponding video analysis system
Technical Field
The invention relates to the field of image processing, in particular to a multi-target tracking method and a corresponding video analysis system.
Background
Target detection and tracking has long been an important research direction in academia and industry. For example, video monitoring systems, as an important component of smart security and smart traffic in Internet-of-Things applications for integrated urban public safety management, face great challenges in deep deployment. Moreover, object detection and tracking has tremendous utility and potential in areas such as vehicle-assisted driving, transportation, and gaming.
With the rapid development of target detection algorithms and target attribute analysis algorithms in recent years, the accuracy of target detection and attribute analysis has kept improving, but so has the required amount of computation. When these algorithms need to be deployed locally on embedded terminals such as cameras and unmanned aerial vehicles for reasons of real-time performance and security, the power consumption limits of the embedded terminals constrain the affordable computation to a much smaller order of magnitude.
Therefore, how to realize accurate multi-target tracking while meeting the power consumption limitation has become a major problem to be solved in the field of target tracking.
Disclosure of Invention
In view of at least one of the above problems, the present invention provides a multi-target tracking scheme and a corresponding video analysis system implementing the same, which can achieve high-speed tracking with low computational power and meet high tracking accuracy requirements through reasonable deployment of each functional module.
According to one aspect of the invention, a multi-target tracking method is provided, which comprises the following steps: acquiring detected position information of targets; acquiring predicted position information of targets; matching the acquired predicted position information with the detected position information; in the case where it is determined that the detected position information of a certain target matches its predicted position information, updating the state information of the target based on its detected position; in the case where it is determined that the predicted position of a certain target has no matching detected position, determining that the target is a disappeared target; and in the case where it is determined that the detected position of a certain target has no matching predicted position, determining that the target is a new target or a reappearing target.
Thus, by matching prediction and detection and subsequent processing steps, various situations (e.g., object loss, reappearance, new object, etc.) encountered in object detection are flexibly coped with, and by reasonable cooperation of the detection, tracking and analysis modules, the amount of computation required to correctly display the object (and its attributes) is reduced while ensuring video analysis accuracy.
Acquiring the detected position information of a target may include: performing object detection on the currently input image frame to obtain detected position information of at least one target; the image frames used for object detection may preferably be selected at intervals from the consecutive video image frames according to a predetermined rule. Thereby, the amount of computation required for video analysis is further reduced by relaxing the detection requirements.
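As an illustration of such interval-based frame selection, a minimal Python sketch follows; the generator interface and the fixed interval of 10 are illustrative assumptions, not details given by the patent.

```python
from typing import Iterable, Iterator, Tuple
import numpy as np

def frames_for_detection(frames: Iterable[np.ndarray],
                         interval: int = 10) -> Iterator[Tuple[int, np.ndarray, bool]]:
    """Yield (index, frame, run_detection) under a fixed-interval rule.

    Every frame is still forwarded to the tracker; only every
    `interval`-th frame is flagged for the (expensive) detector.
    """
    for i, frame in enumerate(frames):
        yield i, frame, (i % interval == 0)
```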
Obtaining the predicted position information of a target may include: deriving the predicted position of a certain target from the previous state information of the target detected in a previous frame; preferably, the target may be modeled based on its previous state information so that its position in the current frame is predicted from the state model. The prediction of each target's position in subsequent frames can thus be realized through relatively cheap model computation. Further, when the detected position information of a certain target is determined to match its predicted position information, updating the position state information of the target using the detected position includes correcting the target's state model with the detected position. A correction mechanism is thereby introduced to improve the accuracy of the prediction model.
The previous state information of a certain target may include: the position, velocity, acceleration, deformation velocity, and/or template information of the target. Modeling based on the previous state information of the target includes modeling the target using at least one of: a Kalman filter; a linear filter; a kernel correlation filter (KCF) tracker; a mean shift (MeanShift) tracker; and a continuously adaptive mean shift (CamShift) tracker, with the type of state information determined by the particular model used. Therefore, the required prediction mechanism can be flexibly selected for each application scenario.
The multi-target tracking method of the invention further comprises: storing all active targets as a target state list in which target numbers correspond to state information. Acquiring the predicted position information of the targets then includes deriving the predicted position of each active target in the target state list from its previous state information, and updating the state information of a target based on its detected position includes updating the corresponding entry of that target. Therefore, changes of target states are managed comprehensively by introducing the state list, further improving the operating efficiency of the multi-target tracking framework.
Matching the acquired predicted position information and detected position information may include: taking, as the matching criterion, the pair of predicted and detected target rectangular frames that has the highest degree of overlap, provided it exceeds a predetermined threshold. In addition, acquiring the detected position information of a target includes acquiring the position information of a detected rectangular frame surrounding the target, while acquiring the predicted position information of a target includes acquiring both the position information of a predicted rectangular frame surrounding the target and the number information of the target. When it is determined that the detected position information of a certain target matches its predicted position information, updating the position state information of the target using its detected position includes: assigning the number of the matched predicted target to the matched detected target, and updating the state information of the numbered target using the matched detection.
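As an illustration of the overlap criterion described above, the following minimal sketch computes the degree of overlap (IoU, intersection over union) of two axis-aligned rectangular frames; the (x1, y1, x2, y2) corner format is an assumption made for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```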
The multi-target tracking method of the invention may further comprise: determining, based on the matching result for a certain target, the marking state of that target in the displayed image frame. Specifically, this may include: marking the target in the currently displayed image frame when the number of consecutive matches, or the matching frequency, of the predicted and detected positions of the target exceeds a first threshold; and canceling the marking of a previously marked target in the currently displayed image frame when the number of consecutive times that the predicted position of the target has no matching detected position exceeds a second threshold. This improves the robustness of correctly displaying appearing targets and hiding disappeared targets.
In the case where it is determined that the predicted position of a certain target has no matching detected position, determining the target as a disappeared target includes: determining the target to be a disappeared target when the number of consecutive times that its predicted position has no matching detected position exceeds a third threshold. In the case where the predicted position of a certain target has no matching detected position but the target has not yet been determined to be a disappeared target, the position of the target acquired by a tracker is used as the output position of the target. The tracker may be any one of the following: a kernel correlation filter (KCF) tracker; a mean shift (MeanShift) tracker; and a continuously adaptive mean shift (CamShift) tracker.
In the case where it is determined that the predicted position of a certain target has no matching detected position, determining the target as a disappeared target includes: storing the target number and target features of the disappeared target into a disappeared-target list; and deleting the corresponding entry of the disappeared target from the target state list, where the target state list holds all active targets stored with target numbers corresponding to state information. Therefore, the scheduling efficiency of the multi-target tracking framework is further improved by combining the disappeared-target list with the state list.
In the case where it is determined that the detected position of a certain target has no matching predicted position, determining that the target is a new target or a reappearing target includes: extracting the target features of the target; comparing the extracted target features with the target features stored in the disappeared-target list; if matching target features exist in the disappeared-target list, judging the target to be a reappearing target and re-assigning it the number of the matched target features; and if no matching target features exist in the disappeared-target list, judging the target to be a new target and assigning it a new number. Further, this unmatched-detection step may also comprise: deleting the entry corresponding to the reappearing target from the disappeared-target list; and/or storing the new target as a new entry in the target state list, where the target state list holds all active targets stored with target numbers corresponding to state information.
According to another aspect of the present invention, there is provided a multi-target tracking apparatus including: a plurality of single-target trackers, each predicting the position of a certain target in the current frame based at least on the previous state information of that target; a matching unit for comparing the target predicted positions of the current frame given by the single-target trackers with the target detected positions of the current frame obtained from the outside; and a multi-target tracking unit for operating the single-target trackers based on the matching results of the matching unit, which: updates the parameters of the single-target tracker corresponding to a target based on the detected position of the target when the matching unit determines that the detected position information of the target matches its predicted position information; deletes the single-target tracker corresponding to a target when it is determined that the predicted position of the target has no matching detected position; and creates a new single-target tracker for a target when it is determined that the detected position of the target has no matching predicted position.
Preferably, the apparatus may further comprise a target re-recognizer for determining, in the case where the matching unit determines that the detected position of a certain target has no matching predicted position, whether the target has appeared before; the multi-target tracking unit reconstructs the previous single-target tracker for the target when the target re-recognizer determines that the target has appeared before. The target re-recognizer also maintains a disappeared-target list storing the numbers and features of targets determined to have disappeared, and determines whether a target has appeared before by comparing its extracted features with the target features stored in the disappeared-target list.
The multi-target tracking unit may also maintain a target state list holding all active targets stored with target numbers corresponding to state information.
The multi-target tracking apparatus of the present invention may further include a display indicating unit for determining, based on the matching result of the matching unit for a specific target, whether to output an instruction to change the marking state of that target. The display indicating unit may be further configured to: issue an instruction to mark a target in the currently displayed image frame when the matching unit determines that the number of consecutive matches, or the matching frequency, of the predicted and detected positions of the target exceeds a first threshold; and issue an instruction to cancel the marking of a previously marked target in the currently displayed image frame when the number of consecutive times that the matching unit finds no detected position matching the predicted position of the target exceeds a second threshold.
Each single-target tracker may record the number of consecutive times that the predicted position of its target has no matching detected position, and the multi-target tracking unit determines the target to be a disappeared target, and deletes the corresponding single-target tracker, when the number of unmatched predictions recorded by that tracker exceeds a third threshold.
The single-target tracker may acquire the target predicted position used for matching against the detected position with a filter model, and acquire the target predicted position used for output with a tracker model while the recorded number of unmatched predictions has not yet reached the third threshold. The filter model may be established using at least one of: a Kalman filter; and a linear filter. The tracker model may be established using at least one of: a kernel correlation filter (KCF) tracker; a mean shift (MeanShift) tracker; and a continuously adaptive mean shift (CamShift) tracker. The type of state information stored in the target state list is determined by the model used and includes at least one of: the position, velocity, acceleration, deformation velocity, and/or template information of the target.
According to another aspect of the present invention, there is also provided a video analysis system, including: a frame buffer queue for storing continuously input video image frames; a target detection module for processing consecutive image frames from the frame buffer queue to determine the detected position information of targets contained in the current frame; a target tracking module, implemented by the multi-target tracking apparatus described above, for tracking the positions of moving targets, matching the obtained tracking position information against the detected position information produced by the target detection module, and adjusting the tracking operation based on the matching results; and a target analysis module for performing feature extraction on a target when the target is determined to be a new target on the basis of a detected position that has no matching predicted position.
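As one possible reading of this system layout, a minimal processing-loop sketch is given below; the module interfaces (detect(), tracker.step(), tracker.predict_only(), analyze()) and the return of newly created targets from the tracker are illustrative assumptions, not the patent's API.

```python
from collections import deque

def run_video_analysis(frames, detect, tracker, analyze, interval: int = 10):
    """Frame buffer queue -> detection (at intervals) -> tracking -> analysis."""
    frame_queue = deque(frames)           # the frame buffer queue
    idx = 0
    while frame_queue:
        frame = frame_queue.popleft()
        if idx % interval == 0:           # detection only on selected frames
            detections = detect(frame)    # detected positions in this frame
            new_targets = tracker.step(detections)   # match / update / create
        else:
            new_targets = []
            tracker.predict_only(frame)   # keep tracking between detections
        for target in new_targets:        # feature extraction only for new targets
            analyze(frame, target)
        idx += 1
```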
The object detection module may select a currently input image frame for object detection at intervals from consecutive video image frames according to a predetermined rule.
The target detection module and the target analysis module may be implemented at least in part by a GPU, FPGA, or ASIC circuit capable of high-parallelism computation. Preferably, the two modules may share at least part of a GPU, FPGA or ASIC circuit capable of performing convolutional neural network computations.
Therefore, the multi-target tracking framework provided by the invention integrates the work of the modules of the video analysis system by introducing matching between predicted and detected positions, and by distinguishing matches, unmatched detections and unmatched predictions together with their subsequent handling, thereby reducing the computational requirements while preserving tracking accuracy. Furthermore, by introducing the target state list in combination with the disappeared-target list, the scheme also improves the resource scheduling efficiency of the system as a whole and eliminates unnecessary computation.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
Fig. 1 shows the overall framework of a common real-time video structured intelligent analysis system.
FIG. 2 shows a schematic flow diagram of a multi-target tracking method according to one embodiment of the invention.
FIG. 3 shows a schematic flow diagram of determining that a predicted location matches a detected location, according to one embodiment of the invention.
FIG. 4 shows a schematic flow diagram of determining that a predicted location has no matching detected location, according to one embodiment of the invention.
FIG. 5 illustrates a schematic flow diagram of determining that a detected location has no matching predicted location, according to one embodiment of the present invention.
FIG. 6 illustrates a schematic diagram of a multi-target tracking device, according to one embodiment of the invention.
Fig. 7 shows one example of a SoC that may be used to implement the video analysis system of the present invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
At present, the application bottleneck of target detection and tracking lies in how to efficiently extract video information and how to perform standardized data exchange, interconnection and semantic interoperation with other information systems. To solve this problem, video structured description techniques have been proposed. Video structured description technology transforms the traditional video monitoring system into a new generation of intelligent, semantic, information-rich video monitoring systems.
Video structured description is a technology for extracting video content information: it adopts processing means such as spatio-temporal segmentation, feature extraction and object identification to organize, according to semantic relations, textual information that can be understood by both computers and people. Fig. 1 shows the overall framework of a common real-time video structured intelligent analysis system.
As shown in fig. 1, the real-time video structured intelligent analysis system 20 collects a data stream from a data source 10, which may be a real-time input to a camera or a stored video file. The local system 20 performs structured analysis of the collected data stream and stores the corresponding analysis results in a local or remote database 30.
The real-time video structured intelligent analysis system 20 may include a video codec module 21, a frame buffer 22, and a video analysis module 23. Video codec module 21 encodes or decodes the data stream from data source 10 into specified format frame data. Frame buffer 22 buffers the video frame data for use by video analysis module 23.
The video analysis module 23 may be broadly divided into a target detection module, a target tracking module, and a target recognition and analysis module. The target detection module performs target detection on the input video stream by using a deep learning algorithm, and extracts information such as the position and the category of a target to be analyzed from the frame image. The target tracking module tracks and deduplicates the target output by the target detection module by utilizing a deep learning or traditional algorithm, so that the repeated operation of the target analysis module is avoided, the analysis quality is improved, and the analysis calculation amount is reduced. And the target recognition and analysis module extracts a target sub-image from the frame image according to the output result of the target detection module and analyzes each target by utilizing a deep learning algorithm. The specific analysis content may vary according to different application scenarios, and the common analysis content includes target identification comparison, target attribute analysis, and the like.
In recent years, with the rapid development of target detection and target attribute analysis algorithms, the accuracy of target detection and attribute analysis has kept improving, but so has the required amount of computation. When these algorithms need to be deployed locally on embedded terminals such as cameras and unmanned aerial vehicles for reasons of real-time performance and security, the power consumption limits of the embedded terminals constrain the affordable computation to a much smaller order of magnitude. Therefore, a multi-target tracking algorithm needs to be introduced so that, after a detection pass gives the target positions in a certain frame, the tracking algorithm gives the position of each target over the following frames; meanwhile, target attribute analysis analyzes a given target only once over several consecutive frames to obtain its attributes. This reduces the number of target detection and attribute analysis computations by roughly an order of magnitude, allowing a sufficiently accurate algorithm to be deployed under the limited computing power of an embedded terminal.
In addition, because current detection algorithms are mainly trained on still pictures, the detected target frame tends to jitter across adjacent frames in video scenes and the stability of the detection algorithm is poor; a multi-target tracking algorithm therefore also needs to be introduced to reduce the jitter of the target frame and improve stability.
However, in existing video analysis systems the multi-target tracking algorithm needs a target detection result for every frame and needs to extract target features relatively frequently. These requirements are computationally expensive and hard to sustain on embedded systems. To solve these problems, the invention provides a high-speed multi-target tracking framework which, by reasonably allocating the functional modules, achieves high-speed tracking at low computational cost while meeting higher tracking accuracy requirements.
FIG. 2 shows a schematic flow diagram of a multi-target tracking method according to one embodiment of the invention. It should be noted that the sequence numbers of the respective steps in the following methods are merely used as a representation of the steps for the convenience of description, and should not be construed as representing the execution order of the respective steps. The method need not be performed in the exact order shown, unless explicitly stated; similarly, blocks may be performed in parallel, rather than sequentially. It should also be understood that the method may be implemented on a variety of devices as well.
As shown in FIG. 2, a multi-target tracking method 200 according to one embodiment of the invention may include the following steps. The method 200 may be performed, for example, by the video analysis module 23 in the video analysis system 20 shown in fig. 1, and more specifically by a target tracking module within the video analysis module 23. It should be understood that an existing video analysis system 20 in which the multi-target tracking framework of the present invention is deployed thereby also constitutes a new video analysis system.
In step S210, detected position information of the target is acquired. In step S220, predicted position information of the target is acquired. It should be appreciated that steps S210 and S220 described above may be performed in any relative order, or may be performed simultaneously. Subsequently, in step S230, the acquired predicted position information and the detected position information are matched.
The method 200 takes different processing actions for different matching results. In the case where it is determined that the detected position information of a certain target matches its predicted position information, the position state information of the target is updated using the detected position in step S240. If it is determined that the predicted position of a certain target has no matching detected position, the target is determined to be a disappeared target in step S250. In the case where it is determined that the detected position of a certain target has no matching predicted position, it is determined in step S260 that the target is a new target or a reappearing target.
Thus, by matching prediction and detection and subsequent processing steps, various situations (e.g., object loss, reappearance, new object, etc.) encountered in object detection are flexibly coped with, and by reasonable cooperation of the detection, tracking and analysis modules, the amount of computation required to correctly display the object (and its attributes) is reduced while ensuring video analysis accuracy.
Step S210 may include performing object detection on the currently input image frame to acquire detected position information of at least one target. Here, object detection may be performed on every continuously input image frame, or frames may be extracted from the continuous video input at, for example, a predetermined or variable interval. For example, the object detection module in the system of fig. 1 may read image frames at predetermined intervals from the frame buffer 22 and process them to determine the targets contained in the current image frame and their detected position information. Compared with the prior art, which requires detection on every frame, such interval detection greatly reduces the amount of computation required for video analysis. In one embodiment, the object detection module may extract the detected position information of a plurality of targets from the processed image frame; the extracted detected position information may preferably be a set of rectangular frames surrounding the respective targets.
The target prediction in step S220 may be performed for each active target. Here, an "active target" refers to a target that was detected in a previous image frame and has not yet disappeared (disappearing meaning being determined to no longer be present in the image frames). In one embodiment, step S220 may include deriving the predicted position information of a certain target based on the previous state information of the target detected in a previous frame. Here, "previous frame" may be read flexibly depending on the specific implementation. When the prediction is made based only on the latest state, the "previous frame" may be the last frame before the current frame in which the target was detected. When the prediction needs to be made based on the historical states of the target, "previous frame" may generally refer to several frames before the current frame in which the target was detected. When a target first appears (as will be described in detail below in connection with step S260), feature extraction may be performed on the target. For example, the target analysis module shown in fig. 1 may analyze the sub-image in the new target's rectangular box using a deep learning algorithm to extract the required features and assign a number to the target. The initial features and number of the new target may then be provided to the target prediction module for predicting the position of the target based on its initial state information.
Preferably, the target can be modeled based on its previous state information so that its position in the current frame is predicted from its state model. After new detection information is obtained (e.g., after the predicted position and the detected position are determined to match in step S240), the state model may be corrected using the new detection information to give a smoother motion trajectory.
In practice, one or more models (and their required state information) may be selected from various existing models to model the change in position of a target in a tracking manner, depending on the particular application.
In one embodiment, a Kalman filter may be used to predict and update the position of a target in subsequent frames using its motion state information. When a Kalman filter is used for prediction, the position of a target in the current frame is predicted by modeling the historical motion trajectory of the target; after new detection information is obtained, the motion model of the target is corrected to give a smoother trajectory. In one embodiment, in order to obtain input meeting the requirements of the Kalman filter, the detected rectangular frame is processed into four parameters: the horizontal and vertical coordinates of its center point, its area, and its aspect ratio. The center coordinates and the area are assumed to move at uniform velocity and the aspect ratio to follow uniform linear motion, which yields a 7 x 4-dimensional driving matrix; the other basic parameters are then determined to obtain a filter usable in actual prediction. In another embodiment, the prediction may also be performed using a linear filter. Its input may likewise be the center coordinates, area and aspect ratio of the rectangular frame, but the filter parameters are given by least squares over the historical trajectory. When a filter is used for target motion trajectory prediction, the required previous state information may include target position information (for example, the above four parameters may be obtained from the coordinates of two opposite corners of a rectangular frame), target velocity, acceleration, deformation velocity, and so on. This state information may be acquired from the detected position information of the target and, optionally, from the extracted target feature information.
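The following minimal numpy sketch illustrates a Kalman filter over such a state; the 7-dimensional state layout [cx, cy, s, r, vcx, vcy, vs] (center coordinates, area, aspect ratio, and velocities of the first three) and all noise values are illustrative assumptions rather than parameters given by the patent.

```python
import numpy as np

class BoxKalmanFilter:
    """Constant-velocity Kalman filter over [cx, cy, s, r, vcx, vcy, vs]."""

    def __init__(self, box):
        cx, cy, s, r = box
        self.x = np.array([cx, cy, s, r, 0.0, 0.0, 0.0])  # state vector
        self.P = np.eye(7) * 10.0                          # state covariance
        self.F = np.eye(7)                                 # transition: pos += vel
        self.F[0, 4] = self.F[1, 5] = self.F[2, 6] = 1.0
        self.H = np.zeros((4, 7))                          # measure (cx, cy, s, r)
        self.H[:4, :4] = np.eye(4)
        self.Q = np.eye(7) * 0.01                          # process noise (assumed)
        self.R = np.eye(4)                                 # measurement noise (assumed)

    def predict(self):
        """Predict the target position in the current frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]

    def update(self, box):
        """Correct the motion model with a matched detected position."""
        y = np.asarray(box, dtype=float) - self.H @ self.x  # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(7) - K @ self.H) @ self.P
```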
In another embodiment, the position of the target may be predicted and updated using appearance information of the target via KCF (kernel correlation filtering). KCF is a template-matching type of tracking method. After an initial position is given (e.g., the position of a target within the frame in which it was determined to be newly detected), the KCF tracker extracts the HOG (histogram of oriented gradients) features of the target as its template information. In subsequent frames, features are extracted by a sliding-window method near the position where the target appeared in the previous frame, their similarity to the template information is computed, and the position with the greatest similarity is taken as the target's new position. After the position in the current frame is determined, the template information is updated using the new position to acquire new template information for the target. Therefore, when the KCF tracker is used to predict the target position, the required previous state information may include the previous template information of the target; as before, this state information may be acquired from the detected position information of the target and optionally from the extracted target feature information. In other embodiments, the target position may also be predicted using the mean shift (MeanShift) and continuously adaptive mean shift (CamShift) trackers, which likewise use previous template information for prediction.
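As a highly simplified illustration of the sliding-window template matching described above (using raw-pixel normalized cross-correlation rather than HOG features or the actual kernelized KCF formulation), a sketch might look as follows; the grayscale frame, search radius, and window handling are all illustrative assumptions.

```python
import numpy as np

def match_template(frame: np.ndarray, template: np.ndarray,
                   prev_xy: tuple, radius: int = 16) -> tuple:
    """Slide `template` around `prev_xy` in a grayscale frame; return best (x, y)."""
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-8)
    best_score, best_xy = -np.inf, prev_xy
    x0, y0 = prev_xy
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0:
                continue                      # window outside the frame
            patch = frame[y:y + th, x:x + tw]
            if patch.shape != template.shape:
                continue                      # window outside the frame
            p = (patch - patch.mean()) / (patch.std() + 1e-8)
            score = float((p * t).mean())     # normalized cross-correlation
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy
```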
It should be understood that the target detected position information in the present invention is the target position information obtained by running the convolutional-neural-network detection computation on the current frame, for example a rectangular frame surrounding each target in the current frame, whereas target position prediction in the present invention derives the target's position in the current frame from the target state information of previous frames. Depending on the model the prediction relies on, such prediction may or may not require computational processing of the predicted frame itself in addition to the state information of the previous frame. For example, when a filter (e.g., a Kalman filter) is used, the predicted position of the target is obtained directly from the predicted motion trajectory, and the motion model is continuously corrected using the matched detected positions; using a Kalman filter therefore requires no additional processing of the frame in which the target is predicted. By contrast, when a tracker is used (e.g., a KCF, MeanShift, or CamShift tracker), besides the target template based on the previous frame's state information, additional processing (e.g., sliding-window processing) of the predicted frame itself is required to determine the position of the target in it. In embodiments where the computation budget must be considered, it is therefore preferable to acquire the predicted position of the target using a filter (e.g., a Kalman filter) that predicts purely from the model without computing over the current frame. In a preferred embodiment, different prediction models may also be employed for different prediction scenarios: as described below, a cheaper filter model is used for the prediction that is matched against detected positions and requires no processing of the current frame, while a costlier tracker model is used for the prediction that must still be output when no match is found and the current frame has to be analyzed.
In general, the current frame has a plurality of predicted targets, predicted from previous information, and a plurality of detected targets, detected from the current frame itself. Thus, the matching of step S230 may involve matching between multiple predicted targets and multiple detected targets. In a specific application, the matching determination in step S230 may be made based on the IoU (intersection over union). In one embodiment, step S230 may take as the matching criterion the pair of predicted and detected target rectangular frames that has the highest degree of overlap and exceeds a predetermined threshold. For example, the rectangular frames detected in the current frame may be collected into a sequence (empty if the current frame has no detection result); meanwhile, a group of single-target trackers is maintained for the active targets, each giving a predicted frame for the current frame; the two groups of target frames are then IoU-matched. Specifically, the IoU may be computed between every pair of rectangular frames across the two sequences, the pair with the largest IoU exceeding a certain threshold is selected as a match, the two frames are deleted from their sequences, and the process repeats.
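A minimal sketch of this greedy matching loop, reusing the iou() helper sketched earlier; the 0.3 threshold and the dictionary/list interfaces are illustrative assumptions.

```python
def greedy_iou_match(pred_boxes: dict, det_boxes: list, thresh: float = 0.3):
    """Greedily pair predicted boxes (keyed by target number) with detections.

    Returns (matches, unmatched_pred_ids, unmatched_det_indices), mirroring
    the three cases handled in steps S240, S250 and S260.
    """
    preds = dict(pred_boxes)              # target number -> predicted box
    dets = dict(enumerate(det_boxes))     # detection index -> detected box
    matches = []
    while preds and dets:
        score, tid, di = max(((iou(p, d), t, i)
                              for t, p in preds.items()
                              for i, d in dets.items()),
                             key=lambda item: item[0])
        if score < thresh:
            break                         # no remaining pair overlaps enough
        matches.append((tid, di))
        del preds[tid], dets[di]
    return matches, list(preds), list(dets)
```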
Since detection is made only on the current frame, the acquired detected position information of a target may include only the position information of the detected rectangular frame surrounding the target. Prediction, by contrast, is made at least from the previous state information of an active target, so acquiring the predicted position information of a target includes acquiring not only the position information of the predicted rectangular frame surrounding the target but also the number information of the target. In other words, a predicted target is a target that has been previously detected and numbered. Thus, when it is determined that the detected position information of a certain target matches its predicted position information, the number of the matching predicted target is assigned to the matching detected target, and the state information of the numbered target, for example the parameters of the Kalman filter or the target template information of the KCF tracker used to predict the target, can be updated using the matched detection.
In an extreme case, the number of detected target positions acquired in step S210 may be zero: when there is no target within the current frame, or no target is detected using the SSD or YOLO model, step S210 cannot acquire any detected target (or its position information) from the current frame. Likewise, the number of predicted target positions acquired in step S220 may be zero: when there was no target in the previous frame (i.e., the number of active targets is zero), step S220 cannot acquire the position information of any predicted target. When no target position is acquired in either step S210 or step S220, no subsequent operation is performed and the method jumps to processing the next frame. When no detected target is acquired in step S210 but one or more predicted target positions are acquired in step S220, the unmatched-prediction step S250 may be performed for each predicted target position. When no predicted target is obtained in step S220 but one or more detected target positions are obtained in step S210, the unmatched-detection step S260 may be performed for each detected target.
Targets are usually in motion, so poor shooting angles, occlusion and similar problems can cause false detections and missed detections. Here, a detector "missed detection" means that the detector fails to detect a target actually present in the current frame, while a "false detection" means that the detector detects something else as a target. To reduce the influence of false and missed detections on the tracking scheme, the multi-target tracking scheme of the invention can improve its robustness by jointly considering the matching results of several consecutive frames.
In one embodiment, the multi-target tracking method of the invention further includes determining, based on the matching result for a certain target, the marking state of that target in the displayed image frame. In one embodiment, rectangular boxes surrounding matched targets may be displayed on the output monitor, for example in real time, or the rectangular boxes of unmatched targets may be removed from the display. As described above, to reduce the influence of detector false detections on correct marking, a target may be marked in the currently displayed image frame only when the number of consecutive matches, or the matching frequency, of its predicted and detected positions exceeds a first threshold. For example, the marker for a target, such as a rectangular box surrounding it, may not be displayed on the monitor until the detected and predicted positions of the target have matched three consecutive times; alternatively, the rectangular frame may be displayed after the detected and predicted positions have matched more than four times within six consecutive matching determinations. Likewise, to reduce the effect of detector missed detections on correct marking, a previously marked target may be unmarked in the currently displayed image frame when the number of consecutive times that its predicted position has no matching detected position exceeds a second threshold. For a target rectangular frame already displayed on the monitor, its display is cancelled only when, for example, no detected position has matched the predicted position three times in a row.
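A minimal sketch of the threshold logic above; the counter fields and the thresholds of 3 are illustrative assumptions (the patent only requires configurable first and second thresholds).

```python
from dataclasses import dataclass

@dataclass
class DisplayState:
    hits: int = 0       # consecutive prediction/detection matches
    misses: int = 0     # consecutive unmatched predictions
    shown: bool = False

    def on_match(self, first_threshold: int = 3) -> None:
        self.hits += 1
        self.misses = 0
        if self.hits >= first_threshold:
            self.shown = True       # issue "mark target" instruction

    def on_miss(self, second_threshold: int = 3) -> None:
        self.misses += 1
        self.hits = 0
        if self.misses >= second_threshold:
            self.shown = False      # issue "cancel marking" instruction
```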
Also to mitigate detector missed detections, in step S250 a target may be determined to be a disappeared target when the number of consecutive times that its predicted position has no matching detected position exceeds a third threshold, which may preferably be greater than the second threshold. For example, for a target rectangular frame already displayed in the monitor, if no detected position matches the predicted position three consecutive times (i.e., the second threshold is 3), the display of that frame is cancelled; if the next two predictions still have no matching detected position (i.e., the third threshold is 5), the target behind that prediction is determined to be a disappeared target, i.e., the numbered target is no longer considered an active target.
Here, an unmatched detection does not directly equal a determination that the target has disappeared; disappearance is only determined when the number of unmatched predictions reaches a certain threshold. This makes it possible to distinguish the routine prediction that is matched against detected positions from the prediction that must still be output while a target is unmatched. For example, the Kalman filter and the linear filter require no additional processing of the frame in which the target is predicted and have a small prediction cost, so they can be used for the routine prediction matched against detections. By contrast, a tracker (e.g., a KCF, MeanShift, or CamShift tracker) requires, besides the target template based on the previous frame's state information, additional processing (e.g., sliding-window processing) of the predicted frame itself to determine the target's position in it. Prediction with a tracker (in a sense itself a detection on the current frame) clearly costs more computation, but it is also more accurate. Thus, when the predicted position lacks correction from a detected position but still needs to be output, the more accurate tracker model that actually processes the current frame can be used.
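A minimal sketch of this two-model policy; the class layout and method names are illustrative assumptions, with the BoxKalmanFilter and match_template helpers sketched earlier standing in for the filter and tracker models.

```python
class SingleTargetPredictor:
    """Cheap filter prediction for matching; costlier tracker prediction for output."""

    def __init__(self, kalman, template, third_threshold: int = 5):
        self.kalman = kalman              # e.g. a BoxKalmanFilter instance
        self.template = template          # appearance template for the tracker model
        self.unmatched = 0                # consecutive unmatched predictions
        self.third_threshold = third_threshold

    def predict_for_matching(self):
        return self.kalman.predict()      # no processing of the frame itself

    def output_position(self, frame, prev_xy):
        if self.unmatched == 0:
            return self.kalman.x[:2]      # matched: the filter state suffices
        # unmatched but not yet disappeared: fall back to the tracker model,
        # which does process the current frame
        return match_template(frame, self.template, prev_xy)

    def is_disappeared(self) -> bool:
        return self.unmatched >= self.third_threshold
```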
To improve the efficiency of the multi-target tracking scheme, the active targets can be managed in a unified way. Thus, in one embodiment, the multi-target tracking method of the invention further comprises storing all active targets as a target state list in which target numbers correspond to state information, which greatly facilitates the management of multiple targets.
Thus, in step S220, obtaining the predicted position information of the targets may include reading the previous state information of each active target in the target state list to obtain its predicted position information. In step S240, when the predicted position of a certain target matches a detected position in the current frame, the detected position may be used to update the state information under that target's numbered entry. In step S250, when the predicted position of a certain target has no matching detected position in the current frame, the target may be determined to be a disappeared target and the corresponding entry removed from the target state list. Under the more robust disappearance criterion described above, when the predicted position of a target has no matching detected position in the current frame, the number of unmatched predictions may be recorded in the target state list, and only when it reaches the third threshold is the target determined to be disappeared and its entry removed. Similarly, in step S260, if a detected target is determined to be a new target, a new number may be assigned to it and stored as a new entry in the target state list.
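A minimal sketch of such a target state list as a dictionary keyed by target number; the entry fields are illustrative assumptions.

```python
target_state_list = {}      # target number -> state entry

def add_target(tid, box, features):
    target_state_list[tid] = {
        "box": box,             # last known position
        "features": features,   # appearance features for re-identification
        "unmatched": 0,         # consecutive unmatched predictions
        "shown": False,         # current marking state on the monitor
    }

def update_on_match(tid, detected_box):
    entry = target_state_list[tid]
    entry["box"] = detected_box
    entry["unmatched"] = 0

def update_on_miss(tid, third_threshold=5):
    """Return True when the target should be declared disappeared."""
    entry = target_state_list[tid]
    entry["unmatched"] += 1
    if entry["unmatched"] >= third_threshold:
        del target_state_list[tid]      # entry moves to the disappeared-target list
        return True
    return False
```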
The target state list of the invention may also be used together with a disappeared-target list, further enhancing the overall efficiency of the scheme. The disappeared-target list stores targets determined to have disappeared, thereby optimizing the handling of reappearing targets. Step S250 may then include storing the target number and target features of the disappeared target into the disappeared-target list, and deleting the corresponding entry from the target state list. In step S260, the disappeared-target list may be used to determine whether an unmatched detection is a reappearing target or a new target. Thus, step S260 may include: extracting the target features of a target (for example, using the target analysis module shown in fig. 1) when its detected position is determined to have no matching predicted position; comparing the extracted features with the target features stored in the disappeared-target list; if matching target features exist in the disappeared-target list, judging the target to be a reappearing target and re-assigning it the number of the matched features; and if no matching features exist, judging the target to be a new target and assigning it a new number. Correspondingly, the entry corresponding to a reappearing target is deleted from the disappeared-target list, and/or an entry for a new target is created in the target state list.
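A minimal sketch of this re-identification step, using cosine similarity between feature vectors with a 0.8 threshold; both are illustrative assumptions (the patent itself mentions a metric function constructed by deep learning).

```python
import numpy as np

disappeared_list = {}       # target number -> stored feature vector

def reidentify(features: np.ndarray, thresh: float = 0.8):
    """Return the number of a matching disappeared target, or None for a new target."""
    best_tid, best_score = None, thresh
    f = features / (np.linalg.norm(features) + 1e-8)
    for tid, stored in disappeared_list.items():
        s = stored / (np.linalg.norm(stored) + 1e-8)
        score = float(f @ s)            # cosine similarity
        if score > best_score:
            best_tid, best_score = tid, score
    if best_tid is not None:
        del disappeared_list[best_tid]  # the target is active again
    return best_tid
```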
The multi-target tracking method of the present invention and the preferred embodiment thereof have been described above in connection with fig. 2. To further facilitate understanding, a specific example of operation under different match determinations is given below in conjunction with fig. 3-5.
FIG. 3 shows a schematic flow diagram of determining that a predicted location matches a detected location, according to one embodiment of the invention. As shown in fig. 3, step S240 of the multi-target tracking method of the invention may include the following sub-steps. In sub-step S341, the state information in the target's corresponding entry of the target state list is updated using the position given by the detector. In sub-step S342, it is determined whether the target meets the display condition, e.g., whether the number of consecutive matches has reached the first threshold. In sub-step S343, if the display condition is satisfied, a display instruction is output to the outside (e.g., a monitor). The display instruction may include the position information of the rectangular box surrounding the target and the target number; where the number needs to be displayed, the instruction may cause it to be shown, for example, at the upper-left corner of the rectangular frame. In sub-step S344, if the display condition is not satisfied, no display instruction is output; if the target is not yet displayed, it remains hidden. In another embodiment, sub-step S342 may also determine whether the target is already displayed and keep it displayed if so. Subsequently, in sub-step S345, if the target state list includes a display-status indication for the target, the target entry may be updated according to whether the target is displayed. If the update is only made when the display status changes, it is located directly after the display branch, i.e., after sub-step S343, as shown at sub-step S345 in fig. 3; if hiding also involves updating the list, sub-step S345 can instead be located on the trunk after the two branches converge.
FIG. 4 shows a schematic flow diagram of determining that a predicted location has no matching detected location, according to one embodiment of the invention. As shown in fig. 4, step S250 of the multi-target tracking method of the invention may include the following sub-steps. Since the current-frame position predicted from the previous state information of a certain target cannot be matched with any target position given by the detector, the target can be considered to have no real position given by the detector in the current frame. Thus, in sub-step S451, the target's corresponding entry in the target state list is updated, e.g., the number of unmatched predictions is incremented. Subsequently, in sub-step S452, it is determined whether the target satisfies the hiding condition, e.g., whether the consecutive non-matches reach the second threshold. In sub-step S453, if the hiding condition is satisfied, a hiding instruction is output to the outside (e.g., a monitor). The hiding instruction may include the target's number so that, for example, the monitor hides the corresponding displayed target. In sub-step S454, if the hiding condition is not satisfied, no hiding instruction is output; if the target has not been hidden, it remains displayed. In another embodiment, sub-step S452 may also determine whether the target is already hidden and keep it hidden if so. Subsequently, in sub-step S455, it is determined whether the target meets the disappearance condition, e.g., whether the consecutive non-matches reach the third threshold. In sub-step S456, if the disappearance condition is satisfied, the disappeared-target list and the target state list are updated: the target's number and information are added to the disappeared-target list, and the target entry is deleted from the target state list. In one embodiment, after the disappearance condition is determined to be satisfied, the target's features may additionally be computed and stored as those of the newly disappeared target. In sub-step S457, if the disappearance condition is not satisfied, the target entry may be updated with the number of non-matches, provided the target state list includes a to-be-disappeared status indication.
FIG. 5 illustrates a schematic flow diagram of determining that a detected location has no matching predicted location, according to one embodiment of the present invention. As shown in fig. 5, step S260 of the multi-target tracking method of the invention may include the following sub-steps. Since a certain target position detected by the detector cannot be matched with any position predicted from the stored state information, the target given by the detector can be considered new relative to the previous frame; it then needs to be further determined whether the target has appeared before. In sub-step S561, the features of the target are computed; for example, the target sub-image is subjected to feature extraction using the target analysis module shown in fig. 1. Subsequently, a feature determination is made in sub-step S562: the features are compared with each of the target features stored in the disappeared-target list to determine whether they are the same. In sub-step S563, if the features match target features in the disappeared-target list, the detected target is assigned the number of the matching features from that list. Subsequently, in sub-step S565, the corresponding entry in the disappeared-target list is deleted. If the features match none of the target features in the disappeared-target list, the detected target is identified as a new target and assigned a new number in sub-step S564. Subsequently, whether it is a reappearing or a new target, the target state list is updated in sub-step S566, i.e., the target is added as a new entry to the target state list. In a specific application, a deep convolutional network can be used to extract the features of the target sub-image, and a metric function constructed by deep learning can judge whether two target features are close: when the score exceeds a certain threshold, the target is considered the reappearance of a previously seen target.
It should be appreciated that the multi-target tracking method described above in connection with FIGS. 2-5 may be performed by a target tracking module in a video analysis system, with its state updated for every frame. While the target detection module detects targets in the continuously input video image frames, the target tracking module tracks each target frame by frame and performs the matching determination described above to determine the state of each target.
More preferably, the multi-target tracking method of the invention is particularly suitable for situations where computing power is limited. For example, when the target detection module cannot perform target detection on every input image frame due to limitations on computing power and power consumption, it may instead detect, for example, one frame out of every ten input frames (a fixed interval), or detect only key frames meeting certain requirements (a variable interval), while the target tracking module of the invention keeps tracking each target continuously and performs the matching and subsequent operations described above on each frame that has a detection update. Thus, while a target is in the display state, for example while a corresponding rectangular frame is displayed in the monitor, the continuous display of the rectangular frame is maintained using the prediction result on frames without a detection result, and is updated using the detection result on frames with a detection result (provided the display condition is satisfied).
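A minimal sketch of such an interleaved loop follows; the tracker, detector, and display interfaces and the fixed ten-frame interval are illustrative assumptions:

```python
DETECT_INTERVAL = 10  # illustrative fixed interval: detect on every tenth frame

def run(frames, tracker, detector, display):
    """Track on every frame; detect and re-match only on sampled frames."""
    for frame_index, frame in enumerate(frames):
        boxes = tracker.predict(frame)                    # prediction on every frame
        if frame_index % DETECT_INTERVAL == 0:
            detections = detector(frame)                  # detection on sampled frames only
            boxes = tracker.match_and_update(detections)  # matching + state updates
        display(frame, boxes)  # predictions keep the rectangular frames on screen
```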
In one embodiment, the multi-target tracking scheme of the present invention can also be implemented as a multi-target tracking device, which may be a new target tracking module in a video analysis platform that executes the multi-target tracking method, or a relatively independent target tracking device. The multi-target tracking device may be realized as a software functional module, realized on dedicated hardware circuitry, or realized as part of a dedicated SoC for intelligent video analysis.
FIG. 6 illustrates a schematic diagram of a multi-target tracking device, according to one embodiment of the invention. As shown in fig. 6, the multi-target tracking apparatus 600 may include a plurality of single target trackers 610, a matching unit 620, and a multi-target tracking unit 630.
Each single target tracker 610 is configured to perform position tracking on a current active target, which predicts a position of the target in a current frame based on at least previous state information of the target.
The matching unit 620 is configured to compare the predicted position of each target in the current frame, given by the corresponding single-target tracker 610, with the detected target positions of the current frame obtained from the outside (e.g., from the target detection module). The multi-target tracking unit 630 may then operate all of the single-target trackers 610 based on the matching results of the matching unit to achieve efficient and accurate tracking of multiple active targets. Specifically, in the case where the matching unit 620 determines that the detected position information of a certain target matches its predicted position information, the multi-target tracking unit 630 may update the parameters of the single-target tracker 610 corresponding to that target based on the detected position. In the case where the matching unit 620 determines that the predicted position of a certain target has no matching detected position, the multi-target tracking unit 630 may delete the single-target tracker 610 corresponding to that target (e.g., once the disappearance condition described above is satisfied). In the case where the matching unit 620 determines that a detected position has no matching predicted position, the multi-target tracking unit 630 may create a new single-target tracker 610 for that target.
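Although the disclosure does not fix a particular matching algorithm, one common realization, shown below purely as an assumed sketch, scores every predicted/detected box pair by intersection-over-union (IoU) and solves the assignment with the Hungarian method; the threshold value and function names are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

IOU_THRESHOLD = 0.3  # illustrative minimum overlap for a valid match

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def match(predicted, detected):
    """Return (matched pairs, unmatched prediction idxs, unmatched detection idxs)."""
    if not predicted or not detected:
        return [], list(range(len(predicted))), list(range(len(detected)))
    cost = np.array([[1.0 - iou(p, d) for d in detected] for p in predicted])
    rows, cols = linear_sum_assignment(cost)  # Hungarian assignment
    pairs = [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= IOU_THRESHOLD]
    matched_p = {r for r, _ in pairs}
    matched_d = {c for _, c in pairs}
    return (pairs,
            [i for i in range(len(predicted)) if i not in matched_p],
            [j for j in range(len(detected)) if j not in matched_d])
```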
Similarly, the multi-target tracking unit 630 may maintain a target state list in which all active targets are stored as correspondences between target numbers and state information. Based on the matching results, the multi-target tracking unit 630 may update corresponding entries of the target state list, create new entries, or delete the entries of disappeared targets, thereby facilitating management of all the single-target trackers 610.
The single-target tracker may be built on the state information of its corresponding numbered target using at least one of the following models: a Kalman filter; a kernelized correlation filter (KCF) tracker; a mean shift (MeanShift) tracker; and a continuously adaptive mean shift (CamShift) tracker. The type of state information stored in the target state list is determined by the model used and includes at least one of: position, velocity, acceleration, deformation velocity, and/or template information of the target. As described above, when modeling with a Kalman filter, prediction can be achieved from the horizontal and vertical coordinates of the center point of the rectangular frame, the area and aspect ratio of the rectangular frame, and their corresponding velocities, without processing the content of subsequent frames. When MeanShift tracking is used, each subsequent frame needs to be processed to update the template information of the corresponding target.
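As a non-limiting sketch of the Kalman case just described, the state may be laid out as follows; the seven-dimensional layout mirrors the quantities named above (center coordinates, area, aspect ratio, and their velocities), while the noise parameter q and the choice to hold the aspect ratio constant are illustrative assumptions:

```python
import numpy as np

def make_kalman_matrices():
    """Matrices for a constant-velocity model over x = [cx, cy, s, r, vx, vy, vs]:
    box center (cx, cy), box area s, aspect ratio r, and their velocities
    (the aspect ratio is assumed constant between frames)."""
    F = np.eye(7)            # state transition: each position integrates its velocity
    for i in range(3):
        F[i, i + 4] = 1.0
    H = np.zeros((4, 7))     # measurement model: the detector observes [cx, cy, s, r]
    H[:4, :4] = np.eye(4)
    return F, H

def kalman_predict(x, P, F, q=1e-2):
    """One prediction step; note that no image data of the new frame is touched."""
    x = F @ x
    P = F @ P @ F.T + q * np.eye(7)
    return x, P
```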
In one embodiment, the multi-target tracking device 600 may further include a target re-identifier (not shown) to assist the multi-target tracking unit 630 in further handling unmatched detected targets. Specifically, when the matching unit 620 determines that the detected position of a certain target has no matching predicted position, the target re-identifier may determine whether that target has appeared before, and if the target re-identifier determines that it has, the multi-target tracking unit 630 may reconstruct the previous single-target tracker for the target. Specifically, when the matching unit 620 determines that a detected target is unmatched, the detected target may be sent to an external target analysis module for feature extraction. Accordingly, the target re-identifier may maintain a disappeared-target list storing the numbers and features of targets determined to have disappeared. The target re-identifier may thus compare the extracted feature of the unmatched detected target with the target features in the disappeared-target list to determine whether the target is a reappearing target.
As described above, one single-target tracker is maintained for each active target. Besides performing position prediction using, for example, a Kalman filter or a KCF tracker, this single-target tracker also needs to maintain the correspondence between the target number and the target box. In order to improve the robustness of the system, mitigate false alarms and missed detections of the detection algorithm on certain frames, and improve the tracking effect, the single-target tracker can adopt a Schmitt-trigger-like hysteresis: a target is considered lost only after it has been missing for several consecutive frames, and considered present only after it has appeared in several consecutive frames. The single-target tracker thus organically combines the filter, the tracker, and the target number, while exposing corresponding interfaces to the upper multi-target tracking framework, so that the whole multi-target tracking model can be conveniently realized.
For this reason, the multi-target tracking unit 630 does not establish or delete a single-target tracker on the basis of a single unmatched result, but performs the corresponding action only when a predetermined count is reached. In one embodiment, the single-target tracker 610 may record the number of consecutive frames in which the predicted position of its corresponding target fails to match any detected position, and the multi-target tracking unit 630 may determine that the target has disappeared, and delete the corresponding single-target tracker, once that count exceeds a third threshold. In this case, the single-target tracker 610 may include a filter for producing the predicted position that is matched against the detected positions, and a tracker for outputting the predicted position on frames without detection results. In one embodiment, the filter may be a computationally light Kalman filter or linear filter that does not require processing of the current frame itself. The tracker may then be any one of the following: a kernelized correlation filter (KCF) tracker; a mean shift (MeanShift) tracker; and a continuously adaptive mean shift (CamShift) tracker. Such a tracker does process the current frame (though without the neural network computation of target detection), and therefore enables a more accurate position prediction, meeting the accuracy required, for example, for outputting a rectangular frame.
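Purely as an illustrative sketch of such a composition (the class layout, the method names, and the injected kalman/kcf objects with their predict/correct/update interfaces are assumptions, not the disclosed interface), a single-target tracker might look like:

```python
class SingleTargetTracker:
    """Binds a target number to a cheap filter plus a frame-based tracker."""

    def __init__(self, target_id, kalman, kcf, vanish_threshold=10):
        self.target_id = target_id
        self.kalman = kalman                      # e.g. Kalman filter: no frame access
        self.kcf = kcf                            # e.g. KCF tracker: processes frames
        self.vanish_threshold = vanish_threshold  # the "third threshold"
        self.consecutive_misses = 0

    def predict_for_matching(self):
        """Cheap predicted position, compared against detections when available."""
        return self.kalman.predict()

    def predict_for_display(self, frame):
        """Frame-based predicted position, output on frames without detections."""
        return self.kcf.update(frame)

    def on_matched(self, detected_box):
        """Reset the miss counter and correct the filter with the detected box."""
        self.consecutive_misses = 0
        self.kalman.correct(detected_box)

    def on_unmatched(self):
        """Count a miss; returns True once the tracker should be deleted."""
        self.consecutive_misses += 1
        return self.consecutive_misses >= self.vanish_threshold
```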
In addition, in one embodiment, the multi-target tracking device of the present invention may further include a display indication unit (not shown) configured to determine, based on the matching results of the matching unit for a specific target, whether to output to the outside an instruction to change the indication state of that target. Similarly, in order to eliminate the influence of the detector's false and missed detections on the correctness of the target display, a target may be displayed only after it has appeared in several consecutive frames, and considered lost only after it has been missing for several consecutive frames. Thus, the display indication unit may be further configured to: issue an instruction to mark the target in the currently displayed image frame when the matching unit 620 determines that the number of consecutive matches (or the matching frequency) between the predicted and detected positions of a certain target exceeds a first threshold; and issue an instruction to cancel the marking of a previously marked target in the currently displayed image frame when the matching unit 620 determines that the number of consecutive frames in which the predicted position of that target fails to match any detected position exceeds a second threshold.
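A minimal sketch of such a display indication unit is given below; the per-target counters realize the hysteresis described above, and the threshold values, names, and string-valued instructions are illustrative assumptions:

```python
SHOW_AFTER = 3  # "first threshold": consecutive matches before marking
HIDE_AFTER = 3  # "second threshold": consecutive misses before unmarking

class DisplayIndicator:
    """Schmitt-trigger-like hysteresis around per-target match results."""

    def __init__(self):
        self.hits = {}    # target_id -> consecutive matched frames
        self.misses = {}  # target_id -> consecutive unmatched frames
        self.shown = set()

    def update(self, target_id, matched):
        """Return an indication-state instruction, or None if nothing changes."""
        if matched:
            self.hits[target_id] = self.hits.get(target_id, 0) + 1
            self.misses[target_id] = 0
            if target_id not in self.shown and self.hits[target_id] >= SHOW_AFTER:
                self.shown.add(target_id)
                return f"mark target {target_id}"    # instruction to the monitor
        else:
            self.misses[target_id] = self.misses.get(target_id, 0) + 1
            self.hits[target_id] = 0
            if target_id in self.shown and self.misses[target_id] >= HIDE_AFTER:
                self.shown.discard(target_id)
                return f"unmark target {target_id}"  # cancel the previous marking
        return None
```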
The multi-target tracking framework described in connection with FIG. 6 may be implemented as the target tracking module within the video analysis system shown in FIG. 1, thereby yielding a new video analysis system. The system may include: a frame buffer queue for storing continuously input video image frames; a target detection module for processing successive image frames from the frame buffer queue to determine the detected position information of targets contained in the current frame; a target tracking module, implemented by the multi-target tracking device described above with reference to FIG. 6, for tracking the positions of moving targets, matching the acquired tracking position information of the targets with the detected position information acquired by the target detection module, and adjusting the tracking operation based on the matching results; and a target analysis module for performing a feature extraction operation on a target when the target is determined to be a new target on the basis of a detected position that matches no predicted position.
A video analysis system incorporating the multi-target tracking framework of the present invention allows the target detection module to select image frames at intervals from the consecutive video image frames for target detection according to predetermined rules, thereby reducing the computational requirements of target detection. By matching target detection results with target prediction results and performing the corresponding update, delete, and create operations, the accuracy requirements of the video analysis system can still be met.
In practical use, part or all of the functions of the video analysis system can be realized by digital circuits. The target detection operation performed by the target detection module and/or the target analysis operation performed by the target analysis module may be implemented at least in part using neural network computation. For example, the target detection module and the target analysis module may be implemented at least in part by GPU, FPGA, or ASIC circuitry capable of highly parallel computation. In a preferred embodiment, the target detection module and the target analysis module share at least part of a GPU, FPGA, or ASIC circuit capable of performing convolutional neural network computation.
In one embodiment, the video analysis system of the present invention may be implemented in a system on a chip (SoC) that includes a general-purpose processor, memory, and digital circuitry. FIG. 7 shows one example of an SoC that may be used to implement the video analysis system of the present invention.
In one embodiment, the deep learning networks required by the present system, such as convolutional neural networks, may be implemented by the digital circuit portion (e.g., an FPGA or ASIC chip) of the SoC. For example, the FPGA circuit or ASIC chip may implement part or all of the target detection module and the target analysis module in the video analysis system of the present invention. Because CNN computation is highly parallel, implementing the target detection and attribute analysis functions in logic hardware or custom circuitry offers natural computational advantages and can achieve lower power consumption than a software implementation.
In one embodiment, all the CNN parameters obtained in prior training may be stored in a memory (e.g., the main memory) of the system on chip. When target detection is subsequently performed, the parameters of the CNN layers are first read from the main memory so as to perform the neural network computation on the input image, thereby obtaining nonlinear features. Subsequently, a large block of contiguous features (e.g., the features of all channels of a particular region) is read from the main memory into the cache module of the logic hardware at once. This reduces the latency of data reads when computing the next region and increases the utilization of each main-memory access, thereby improving overall computational efficiency. The cache module of the logic hardware may include the frame buffer queue and the to-be-analyzed target library; by setting the image frame parameters (such as lifetime marker values) appropriately, image frames can be stored with maximal efficiency and key frames retained in time, improving video analysis efficiency without increasing the cache requirements.
It is understood that the various components included in the video analysis system of the present invention, such as the frame buffer module, the target detection module, and the target analysis module, may be implemented wholly or partially in hardware, or wholly or partially in software. In one embodiment, the frame buffer module may be a cache module in logic hardware, the deep learning parts of the target detection module and the target analysis module may be implemented by logic hardware, and the specific operations of the multi-target tracking framework of the present invention may be implemented as threads running under the control of a processor.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the steps defined in the above-described method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
In addition, references to "first," "second," and "third" in this disclosure are intended only to distinguish between the objects they modify. For example, the terms "first threshold," "second threshold," and "third threshold" merely denote thresholds used for different purposes, and their respective values may be the same or different.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments and their practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (33)
