Rapid multi-target detection method and device under a static complex scene
Technical Field
The invention relates to the technical field of information monitoring, in particular to a multi-target detection method and device for a photoelectric detection system.
Background
For a still camera, the existing technology can accurately detect foreground moving objects against a simple background. For a complex, dynamic background, however, the scene contains various dynamic interferences, such as swaying leaves, fountains and illumination changes, so the background in the video keeps changing to a greater or lesser extent; the detection accuracy of the prior art drops sharply, a large number of false alarms are generated, and targets cannot be detected accurately and quickly.
Many methods exist for extracting foreground moving objects from the dynamically changing complex background captured by a static camera, including the inter-frame difference, background difference and optical flow methods. However, these methods have low detection precision and detect objects incompletely, even leaving a "hole" phenomenon on slowly moving objects; some involve complex calculation and hardware requirements so high that existing hardware platforms cannot meet them; and they are easily affected by factors such as noise and illumination change, so they cannot satisfy practical applications.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a rapid multi-target detection method under a static complex scene, and a device that performs detection using the method.
The invention provides a rapid multi-target detection method under a static complex scene, which comprises the following steps:
collecting images in an input video, and establishing a background model;
matching each pixel in the current frame image with the background model; if the matching is successful, the pixel is marked as background, otherwise it is marked as foreground;
processing the foreground and background binary images containing the candidate targets, removing false alarms, and obtaining the number and labels of the candidate target regions;
screening and sorting the candidate target regions, obtaining the number and serial numbers of the effective target regions, and then outputting the effective target regions;
and rapidly completing multi-target detection under a static complex scene.
Further, the specific implementation manner is as follows:
collecting and storing at least 1 frame of image of input video data, and establishing and storing a background model by using a background modeling algorithm;
when the current frame comes, matching each pixel in the image with the established background model, if the matching is successful, marking the pixel as the background, otherwise, marking the pixel as the foreground;
obtaining foreground and background binary images containing candidate targets, and performing cluster analysis on them to obtain the number and labels of the candidate target regions;
screening and sorting the candidate target region labels, removing false alarms, obtaining the number and serial numbers of the real target regions, and then outputting them;
and rapidly completing multi-target detection under a static complex scene.
Further, the method for establishing the background model comprises the following steps:
obtaining a modeled image from input video data;
taking a plurality of frames of images for modeling, preferably 50-200 frames;
performing codebook modeling on each pixel in each frame of image to obtain a codebook model;
simplifying the codebook model by checking all code elements in the codebook of each pixel in the image; if the longest non-update time λ of a code element in the codebook exceeds 50, the flag bit corresponding to that code element is set to 0;
and completing background modeling.
Further, the method of codebook modeling is as follows:
setting the initialized codebook model to be empty, performing grayscale processing on the input image, and establishing a codebook for each pixel of the input image, wherein each codebook has space for 12 code elements, the valid flag bit is initialized to 0, and the learning range of the code elements is determined according to the reserved code-element space; the structure of each code element is C_i = {Y_max, Y_min, Y_low, Y_high, t_last, λ},
where Y_max and Y_min are respectively the maximum and minimum gray values of the current pixel; Y_low and Y_high are the lower and upper learning limits of the code element, both initialized to the gray value of the current pixel; t_last is the time of the last match, i.e. the update time, of the code element, with initial value 0; λ is the longest non-update time of the code element, with initial value 0, incremented by 1 whenever the code element is not updated;
equally dividing each frame of image into four blocks for parallel operation; when a new frame of image arrives, the codebook time is incremented by 1, and the pixel gray value is limited to a range of the current gray value ±15 to ±25, preferably ±20, so the lower learning limit of the code element is Y_low = Y - 20 and the upper limit is Y_high = Y + 20, where the codebook time is defined as the current modeling frame number, with initial value 0, incremented by 1 for each modeled frame;
if the gray value Y of the pixel is between the lower and upper learning limits of a code element, namely Y_low ≤ Y ≤ Y_high, the matching is successful; the codebook time is incremented by 1, and the maximum and minimum values and the learning limits of the code element are updated according to the following formula:
C_t = {max(Y_max, Y), min(Y_min, Y), Y - 20, Y + 20, t, λ};
if the gray value Y of the pixel is outside the learning limits of every code element, the pixel is considered to have found no matching code element, and a new code element C_L is created according to the following formula, with the number of code elements in the codebook incremented by 1:
C_L = {Y_max, Y_min, Y_low, Y_high, 0, 0};
then the upper and lower learning limits of the code element are updated, and its valid flag bit is set to 1;
the longest non-update time λ of the other code elements of the codebook is incremented by 1;
all code elements whose valid flag bit is 1 are checked; if the longest non-update time λ of a code element exceeds 50, its valid flag bit is set to 0, and the modeling is finished.
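For illustration, the per-pixel training procedure above can be written as the following minimal C++ sketch. The field layout mirrors C_i = {Y_max, Y_min, Y_low, Y_high, t_last, λ}, and the ±20 margin, the 12-element capacity and the staleness threshold of 50 are taken from the text; all identifiers are illustrative, and the actual FPGA implementation is of course structured differently.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// One code element: C_i = {Y_max, Y_min, Y_low, Y_high, t_last, lambda}
// plus the valid flag bit.
struct CodeElement {
    std::uint8_t yMax = 0, yMin = 0;  // max/min gray value matched so far
    int yLow = 0, yHigh = 0;          // learning lower/upper limits (Y -/+ 20)
    int tLast = 0;                    // time of the last match (update)
    int lambda = 0;                   // longest non-update time; the text only
                                      // says it grows when not updated
    bool valid = false;               // valid flag bit
};

// Codebook of a single pixel, with the reserved space of 12 code elements.
struct PixelCodebook {
    std::array<CodeElement, 12> elems;
    int size = 0;  // code elements in use
    int time = 0;  // codebook time = current modeling frame number

    // Train on the gray value Y of this pixel in one modeling frame.
    void train(std::uint8_t y) {
        ++time;  // codebook time advances once per frame
        int matched = -1;
        for (int i = 0; i < size && matched < 0; ++i) {
            CodeElement& c = elems[i];
            if (c.valid && c.yLow <= y && y <= c.yHigh) {
                // match: C_t = {max(Ymax,Y), min(Ymin,Y), Y-20, Y+20, t, lambda}
                c.yMax = std::max(c.yMax, y);
                c.yMin = std::min(c.yMin, y);
                c.yLow = y - 20;
                c.yHigh = y + 20;
                c.tLast = time;
                matched = i;
            }
        }
        if (matched < 0 && size < 12) {
            // no match: create C_L = {Y, Y, Y-20, Y+20, 0, 0}, flag set to 1
            elems[size] = {y, y, y - 20, y + 20, 0, 0, true};
            matched = size++;
        }
        // every other code element goes one more frame without an update
        for (int i = 0; i < size; ++i)
            if (i != matched && elems[i].valid) ++elems[i].lambda;
    }

    // End of modeling: invalidate stale elements (lambda above 50).
    void prune() {
        for (int i = 0; i < size; ++i)
            if (elems[i].lambda > 50) elems[i].valid = false;
    }
};
```

The text does not say whether λ is reset on a successful match, so this sketch simply accumulates non-update frames, which is consistent with the pruning rule above.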
Furthermore, each pixel of the current frame is matched with the codebook model of the pixel at the corresponding position in the background model; if the gray value of the pixel in the current frame lies between the lower and upper limits of the codebook, the pixel is marked as background, otherwise it is marked as foreground;
matching formula:
Y_low ≤ Y ≤ Y_high;
where Y_low and Y_high are the lower and upper limits of the pixel's codebook obtained after background model training, and Y is the gray value of the pixel in the current frame.
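As a minimal sketch, the matching rule can be applied frame-wide as the following C++ function; bgLow and bgHigh stand for hypothetical per-pixel limit maps produced by the background training (names are illustrative), and the 0/1 marking matches the binary image described in the detailed embodiment below.

```cpp
#include <cstdint>
#include <vector>

// Classify each pixel: 0 = background (Y_low <= Y <= Y_high), 1 = foreground.
// frame holds the gray values of the current frame in row-major order;
// bgLow/bgHigh are hypothetical per-pixel limit maps from background training.
std::vector<std::uint8_t> classifyFrame(const std::vector<std::uint8_t>& frame,
                                        const std::vector<int>& bgLow,
                                        const std::vector<int>& bgHigh) {
    std::vector<std::uint8_t> mask(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i) {
        const int y = frame[i];
        mask[i] = (bgLow[i] <= y && y <= bgHigh[i]) ? 0 : 1;
    }
    return mask;
}
```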
Further, the candidate target regions containing background are cluster-screened and sorted, false alarms are removed, and the regions are then stored in the form of a queue. When a new frame arrives, the newly obtained candidate target regions are matched against all target regions in the current queue; if the matching succeeds, the current target region replaces the corresponding target region in the queue; if the matching fails, the target region is added to the tail of the queue. The duration of each successfully matched target region in the queue is increased by 1; each target region in the queue that fails to match has its disappearance countdown decreased by 1, and any target region whose disappearance countdown reaches 0 is deleted from the queue.
Further, the matching of a candidate target region with an existing target region in the current queue should satisfy the following requirements at the same time (a sketch follows this list):
the coordinate difference of each corner point of the two target regions is less than 20%;
the difference in the number of pixel points in the two target regions is less than 20%.
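A hedged sketch of this two-part test is given below. The patent fixes both thresholds at 20% but does not state what the differences are normalized by, so relative differences are an assumption here; the struct fields beyond the corners and pixel count, including the initial disappearance countdown, are likewise assumptions carried by the queue sketches further on.

```cpp
#include <algorithm>
#include <cstdlib>

// Candidate or queued target region. Names are illustrative; the initial
// disappearance countdown value is an assumption, not from the patent.
struct Region {
    int x0, y0, x1, y1;   // corner coordinates of the bounding rectangle
    int pixelCount;       // foreground pixels inside the region
    int duration = 0;     // frames this region has persisted in the queue
    int countdown = 3;    // disappearance countdown (initial value assumed)
    int label = -1;       // bound number 0-9, -1 while unbound
};

// Relative difference strictly below 20% (normalization is an assumption).
static bool within20pc(int a, int b) {
    const int denom = std::max(std::abs(a), std::abs(b));
    return denom == 0 || std::abs(a - b) * 5 < denom;
}

// Both conditions must hold simultaneously for a match.
bool regionsMatch(const Region& cand, const Region& queued) {
    return within20pc(cand.x0, queued.x0) && within20pc(cand.y0, queued.y0) &&
           within20pc(cand.x1, queued.x1) && within20pc(cand.y1, queued.y1) &&
           within20pc(cand.pixelCount, queued.pixelCount);
}
```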
Furthermore, the method for numbering and sorting the target regions after cluster screening is as follows:
when the duration of a target region in the target queue exceeds 5, a number from 0 to 9 is bound to it; a bound number is not released until the bound target region is cleared from the queue; each number can be bound to only one target region at a time, so at most 10 target regions carry numbers at any time.
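Under these rules, number assignment can be sketched as follows, reusing the Region struct from the previous sketch; handing out free numbers in ascending order is an assumption, as the patent does not specify the order.

```cpp
#include <array>
#include <deque>
// Region is the struct defined in the region-matching sketch above.

// Bind a free number 0-9 to every region whose duration exceeds 5. A binding
// only ends when the region leaves the queue, so at most 10 regions are
// numbered at any time.
void assignNumbers(std::deque<Region>& queue) {
    std::array<bool, 10> used{};            // numbers currently bound
    for (const Region& r : queue)
        if (r.label >= 0) used[r.label] = true;
    for (Region& r : queue) {
        if (r.label >= 0 || r.duration <= 5) continue;
        for (int n = 0; n < 10; ++n) {
            if (!used[n]) { r.label = n; used[n] = true; break; }
        }
    }
}
```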
Compared with the prior art, the invention has the following beneficial effects: the method is simple, effective and easy to implement; it overcomes the high false alarm rate, low detection rate and high false detection rate of traditional methods, and realizes rapid and effective detection of multiple targets in a static complex scene.
The invention further provides a rapid multi-target detection device under a static complex scene, which comprises a processor module and a programmable device connected to each other. A background modeling algorithm and a codebook comparison algorithm are arranged in the programmable device, which collects images from the input video, establishes the background model, performs codebook comparison of each pixel in the current frame image against the background model to obtain the foreground and background binary images, and outputs the processor results.
A target screening algorithm and a numbering and sorting algorithm are arranged in the processor module, which mainly processes the foreground and background binary images containing candidate targets, removes false alarms, obtains the number and labels of the candidate target regions, screens and sorts the candidate target regions, obtains the number and serial numbers of the effective target regions, and outputs these to the programmable logic module.
The processor module is a DSP module, and the programmable device is an FPGA module.
The invention simplifies and optimizes the classic codebook model in the programmable device and combines it with the target screening, sorting and numbering performed in the DSP module, thereby overcoming the high false alarm rate, low detection rate and high false detection rate of traditional methods, realizing rapid and effective detection of multiple targets in a static complex scene, and achieving good results in actual scene applications.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic view of the connection structure of the device of the present invention.
FIG. 3 is a flow chart of a method of background modeling of the present invention method.
FIG. 4 is a flow chart of the matching method of the present invention.
FIG. 5 is a diagram of the detection results of the invention for vehicle targets 3 km away in a static complex scene.
FIG. 6 is a diagram of the vehicle detection results of the invention for multiple moving targets in haze weather in a static complex scene.
FIG. 7 is a diagram of the detection results of the invention for close-range vehicles with occluded targets in a static complex scene.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
The invention is based on an embedded device for rapid multi-target detection in a static complex scene. Taking vehicle detection as an example, the input is an image sequence containing vehicle targets in a ground static complex scene.
The invention provides a rapid multi-target detection embedded device for a static complex scene, which realizes detection by means of the FPGA module shown in FIG. 2 and the DSP module connected to it. As shown in FIG. 1, the method comprises the following steps:
(1) Acquiring and storing 50-200 frames of images of the input video data using the FPGA module, and establishing and storing a background model with a background modeling algorithm; the number of images is selected according to the scene variation and the actual application requirements;
(2) when the current frame arrives, matching each pixel in the image against the established background model in the FPGA module; a pixel is marked as background if the matching succeeds, otherwise as foreground, yielding foreground and background binary images containing the candidate targets, which the FPGA module sends to the DSP module;
(3) the DSP module performs cluster analysis on the binary images containing the foreground and background to obtain the number and labels of the candidate target regions;
(4) screening and sorting the candidate target region labels, removing false alarms, and obtaining the number and serial numbers of the real target regions; the DSP module sends the resulting numbered target region information to the FPGA module, which outputs it.
The invention weighs the performance and real-time behavior of the algorithm against the available resources of the FPGA module and the realizability in hardware. The invention reduces the size of the original input image to 1/4 by interpolation, so the required memory is reduced by 3/4 and the computation to 1/4.
On the other hand, because the classical codebook model is designed for color video images and requires the R, G and B pixel information, the invention converts the input RGB image into a YUV image and extracts only the Y-channel data for modeling, further reducing storage and computation.
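The two reductions can be sketched as follows; the standard BT.601 luma weights stand in for the RGB-to-YUV conversion, and a 2x2 average stands in for the unspecified interpolation, halving each dimension so that 1/4 of the pixels remain. The function name and the exact kernels are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// rgb holds interleaved R,G,B bytes of a w x h frame (3*w*h values).
// Returns the Y (luma) channel downscaled to (w/2) x (h/2).
std::vector<std::uint8_t> toLumaQuarter(const std::vector<std::uint8_t>& rgb,
                                        int w, int h) {
    // Y channel from interleaved RGB (BT.601 weights).
    std::vector<std::uint8_t> y(static_cast<std::size_t>(w) * h);
    for (int i = 0; i < w * h; ++i) {
        const std::uint8_t* p = &rgb[3 * static_cast<std::size_t>(i)];
        y[i] = static_cast<std::uint8_t>(0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]);
    }
    // Halve each dimension with a 2x2 box average: 1/4 of the pixels remain.
    const int w2 = w / 2, h2 = h / 2;
    std::vector<std::uint8_t> small(static_cast<std::size_t>(w2) * h2);
    for (int r = 0; r < h2; ++r)
        for (int c = 0; c < w2; ++c) {
            const int sum = y[(2 * r) * w + 2 * c] + y[(2 * r) * w + 2 * c + 1] +
                            y[(2 * r + 1) * w + 2 * c] + y[(2 * r + 1) * w + 2 * c + 1];
            small[r * w2 + c] = static_cast<std::uint8_t>(sum / 4);
        }
    return small;
}
```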
The process of establishing the background model by using the reduced and optimized codebook method is shown in fig. 3:
(1) In the FPGA module, the initialized codebook model is set to be empty, grayscale processing is performed on the input image, and a codebook is established for each pixel of the input image; the FPGA module reserves space for 12 code elements in each pixel codebook, the valid flag bit is initialized to 0, and the learning range of the code elements is determined according to the reserved code-element space; the structure of each code element is: C_i = {Y_max, Y_min, Y_low, Y_high, t_last, λ},
where Y_max and Y_min are respectively the maximum and minimum gray values of the current pixel; Y_low and Y_high are the lower and upper learning limits of the code element, both initialized to the gray value of the current pixel; t_last is the update time of the last match of the code element, with initial value 0; λ is the longest non-update time of the code element, with initial value 0, incremented by 1 whenever the code element is not updated;
(2) In the modeling process, the FPGA equally divides each frame of image into four blocks and operates on them in parallel; when new frame data arrives, the codebook time is incremented by 1, and the pixel gray value is limited to a range of the current gray value ±15 to ±25, preferably ±20 in the invention, so the lower learning limit of the code element is Y_low = Y - 20 and the upper limit is Y_high = Y + 20, where the codebook time is defined as the current modeling frame number, with initial value 0, incremented by 1 for each modeled frame;
(3) in the model training process, if the gray value Y of a pixel is between the lower and upper learning limits of a code element, namely Y_low ≤ Y ≤ Y_high, the pixel is considered to have found a matching code element; the codebook time is incremented by 1, and the maximum and minimum values and the learning limits of the code element are updated according to the following formula:
C_t = {max(Y_max, Y), min(Y_min, Y), Y - 20, Y + 20, t, λ};
if the codebook is empty or contains no matching code element, a new code element C_L is created as follows:
C_L = {Y_max, Y_min, Y_low, Y_high, 0, 0};
then the upper and lower learning limits of the code element are updated, and its valid flag bit is set to 1;
the longest non-update time λ of the other code elements of the codebook is incremented by 1;
at the end of modeling, all code elements whose valid flag bit is 1 are checked; if the longest non-update time λ of a code element exceeds 50, its valid flag bit is set to 0.
After background modeling is completed, each pixel of the current frame is matched with the codebook model of the pixel at the corresponding position in the background model according to the following matching rule:
Y_low ≤ Y ≤ Y_high;
where Y_low and Y_high are the lower and upper bounds of the pixel's codebook obtained in the background model training stage, and Y is the gray value of the pixel in the current frame.
If the formula is satisfied, the matching is considered successful and the pixel is marked as background, recorded as 0; otherwise the pixel is marked as foreground, recorded as 1. A binary image containing the candidate targets is thus obtained, and the FPGA module sends it to the DSP module.
In the invention, the number of image frames used for background modeling is 100, so 100 images are modeled; after modeling is finished, the codebook model is simplified.
As shown in FIG. 4, once the model is built and the current frame arrives, the candidate target regions obtained in the above steps are cluster-screened and sorted, false alarms are removed, and the regions are stored in a queue. When a new frame arrives, the newly obtained candidate target regions are matched against all target regions in the current queue. The matching rule is: a candidate target region matches an existing target region in the queue only if, simultaneously, the coordinate difference of each corner point of the two regions is less than 20% and the difference in the number of pixel points in the two regions is less than 20%; otherwise the matching fails.
If the matching succeeds, the current target region replaces the corresponding (matched) target region in the queue; if the matching fails, the target region is added to the tail of the queue. The duration of each successfully matched target region in the queue is increased by 1; each target region in the queue that fails to match has its disappearance countdown decreased by 1, and any target region whose disappearance countdown reaches 0 is deleted from the queue.
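The per-frame queue maintenance just described can be sketched as follows, reusing Region and regionsMatch from the earlier sketch. How the countdown is replenished on a successful match is not stated in the text, so here replacing the entry by the candidate simply restores the candidate's initial countdown.

```cpp
#include <cstddef>
#include <deque>
#include <vector>
// Region and regionsMatch are from the region-matching sketch above.

// Per-frame queue update: a matched entry is replaced by the current region
// (duration + 1, bound number kept), an unmatched candidate joins the tail,
// and old entries that matched nothing lose one from the disappearance
// countdown and are deleted when it reaches 0.
void updateQueue(std::deque<Region>& queue, const std::vector<Region>& candidates) {
    const std::size_t n = queue.size();       // entries present before this frame
    std::vector<bool> hit(n, false);
    for (const Region& cand : candidates) {
        bool matched = false;
        for (std::size_t i = 0; i < n && !matched; ++i) {
            if (!hit[i] && regionsMatch(cand, queue[i])) {
                Region updated = cand;                  // replace matched entry
                updated.duration = queue[i].duration + 1;
                updated.label = queue[i].label;         // keep any bound number
                queue[i] = updated;
                hit[i] = matched = true;
            }
        }
        if (!matched) queue.push_back(cand);            // new target joins the tail
    }
    for (std::size_t i = n; i-- > 0; )                  // unmatched old entries
        if (!hit[i] && --queue[i].countdown == 0)
            queue.erase(queue.begin() + static_cast<std::ptrdiff_t>(i));
}
```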
The target regions after cluster screening are then numbered, with the following rule: when the duration of a target region in the target queue exceeds 5, a number from 0 to 9 is bound to it; a bound number is not released until the bound target region is cleared from the queue; each number can be bound to only one target region at a time, so at most 10 target regions carry numbers at any time.
In order to verify the effectiveness of the method for detecting multiple targets in scenes with different actual complexity, actual scene data is adopted for testing, and the targets in the scenes are vehicles.
FIG. 5 shows a scene, shot by a 1080p dome camera on an iron tower, of vehicles passing along a road 3 km away in a static complex scene. For the prior art, a row of cars parked on the roadside strongly interferes with the detection of the actual moving vehicles; because of the long distance the target imaging is small, and the cluttered background and the interference of similar targets pose a great challenge for target detection. The method can accurately eliminate the interference of the pseudo targets and finally obtains an accurate detection result, which is very clear in the red rectangular frames in the figure.
FIG. 6 shows a 3 km expressway scene shot in haze weather by a 1080p dome camera on an iron tower; the haze, the interference of the dynamic background, and the interleaving and occlusion of multiple targets greatly increase the difficulty of the detection task. After the method eliminates the dynamic background interference, the real targets, namely the vehicles moving on the expressway, are marked very clearly in the red rectangular frames in the figure.
FIG. 7 shows a 500 m urban intersection scene shot by a network dome camera in a static complex scene, where targets are easily occluded by trees and buildings. The results show that the method can still detect the targets rapidly and accurately under these challenges, very clearly in the red rectangular frames in the figure, which solves the problem in practical application to a great extent.
In addition, in order to verify the advantages of the method of the invention over the prior art, it is compared with two mainstream static-scene target detection methods, GMM and VIBE, using the detection rate, the false alarm rate and the computational complexity as evaluation criteria, and actual complex scene data are tested on an embedded platform (DSP + FPGA); the results are shown in Table 1 below. Because GMM was only simulated on a PC and is very time-consuming, its implementation on the embedded platform is not considered. As the table shows, compared with the prior art, the method achieves a higher detection rate and a lower false alarm rate in complex scenes, and the optimized algorithm runs on the embedded platform in real time and is easy to implement.
Table 1. Comparison of the effect of the method of the invention with the prior art:
| Evaluation index | GMM | VIBE | Method of the invention |
| Detection rate | 90.9% | 92.5% | 95.6% |
| False alarm rate | 16.8% | 24.5% | 4.9% |
| Time consumed | >40 ms | 20 ms | 25 ms |
The invention simplifies and optimizes the classic codebook model in the programmable device and combines it with the target screening, sorting and numbering performed in the DSP module, thereby overcoming the high false alarm rate, low detection rate and high false detection rate of traditional multi-target detection methods, realizing rapid and effective detection of multiple targets in a static complex scene, and achieving good results in actual scene applications.
Details not described in the present invention belong to the common general knowledge of a person skilled in the art.
It will be appreciated by those skilled in the art that the above embodiments are illustrative only and not intended to be limiting of the invention, and that changes may be made to the above embodiments without departing from the true spirit and scope of the invention, which is defined by the appended claims.