Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method and a device for detecting third party construction threats.
In order to achieve the above object, the present invention adopts the following technical scheme.
In a first aspect, the present invention provides a third party construction threat detection method, comprising the steps of:
building a training data set based on images acquired at a third party construction site, and training a target detection model;
inputting the video image into the trained target detection model for feature extraction to obtain three image features with different sizes;
performing target recognition based on the three image features with different sizes, and outputting the category of the third party construction threat.
Further, the step of establishing a training data set based on images acquired at a third party construction site comprises the following steps:
performing data acquisition at a third party construction site, grouping the acquired images according to the type of threat object, and selecting a similar number of image samples for each type;
cleaning the selected image samples and removing unclear image samples;
labeling the cleaned image samples with target frames and categories that indicate the position and size of each target;
denoising the labeled image samples;
converting the denoised images into gray images, and dividing the image samples into a training data set and a test data set at a ratio of 4:1.
Still further, the categories of third party construction threats include: safety helmets, reflective clothing, pneumatic picks, soil piles, construction fences, water horses (water-filled barriers), construction warning signs, bulldozers, excavators, road rollers, spades, earthmoving vehicles, roadblocks and engineering protective fences.
Furthermore, a Gaussian filter based on a convolution operation is used to denoise the image samples, and the convolution formula of the Gaussian kernel is:
G(i, j) = Σ_m Σ_n f(i − m, j − n) × (1 / (2πσ²)) × exp(−(m² + n²) / (2σ²))
where f(i, j) is the pixel value at coordinates (i, j) of the input image sample, G(i, j) is the pixel value at coordinates (i, j) after Gaussian smoothing, and σ is the standard deviation of the Gaussian kernel.
Still further, the color RGB images are converted to gray scale using the rgb2gray function in the skimage image processing library.
Further, the target detection model comprises an input end, a backbone network Backbone, a neck network Neck, and an output end.
Further, when training the target detection model, data enhancement processing is performed on the input image samples: every 4 input image samples are randomly scaled, randomly cropped and randomly arranged, and then stitched together.
Further, the method further comprises: when training the target detection model, automatically screening the target detection frame according to the following steps:
obtaining the confidence si that a target exists in each target detection frame, where si is the confidence that a target exists in the i-th target detection frame, i = 1, 2, …, N; N is the number of target detection frames;
calculating the intersection over union (IOU) of each target detection frame and each candidate frame according to the formula:
IOU(Ai, Bj) = (Ai ∩ Bj) / (Ai ∪ Bj)
where Ai is the i-th target detection frame, Bj is the j-th candidate frame, the candidate frames are the labeled target frames, Ai ∩ Bj denotes the area of the intersection of Ai and Bj, Ai ∪ Bj denotes the area of the union of Ai and Bj, j = 1, 2, …, M; M is the number of candidate frames;
calculating the score of each target detection frame according to the formula:
Si = si × max_j IOU(Ai, Bj), j = 1, 2, …, M
if Si > S0, the target detection frame Ai is retained; otherwise, the target detection frame Ai is deleted; S0 is a set threshold.
Still further, the method further comprises:
If Si > S0, the target detection frame Ai is retained, and all candidate frames Bj satisfying si × IOU(Ai, Bj) > S0 are found and denoted Bjk, k = 1, 2, …, K;
calculating the position of the target detection frame Ai according to the formula:
Ai_O = Σ_k [ IOU(Ai, Bjk) × Bjk_O ] / Σ_k IOU(Ai, Bjk), k = 1, 2, …, K
where Ai_O is the center position coordinate of the target detection frame Ai, and Bjk_O is the center position coordinate of the candidate frame Bjk.
In a second aspect, the present invention provides a third party construction threat detection apparatus based on deep learning, including:
the model training module is used for establishing a training data set based on images acquired at a third party construction site and training a target detection model;
the feature extraction module is used for inputting the video image into the trained target detection model for feature extraction to obtain three image features with different sizes;
the target recognition module is used for performing target recognition based on the three image features with different sizes and outputting the category of the third party construction threat.
Compared with the prior art, the invention has the following beneficial effects.
According to the invention, a training data set is established based on images acquired at a third party construction site, and a target detection model is trained; a video image is input into the trained target detection model for feature extraction to obtain three image features with different sizes; target recognition is performed based on the three image features with different sizes, and the category of the third party construction threat is output, thereby realizing automatic detection of third party construction threats. Because the target detection model extracts three image features with different sizes from the input video image and performs target recognition based on all three, third party construction threats can be accurately recognized.
Detailed Description
The present invention will be further described with reference to the drawings and the detailed description below, in order to make the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a flowchart of a third party construction threat detection method according to an embodiment of the invention, including the steps of:
Step 101, building a training data set based on images acquired at a third party construction site, and training a target detection model;
Step 102, inputting a video image into the trained target detection model for feature extraction to obtain three image features with different sizes;
Step 103, performing target recognition based on the three image features with different sizes, and outputting the category of the third party construction threat.
In this embodiment, step 101 is mainly used for building a training data set and training a target detection model. In this embodiment, the target detection model is used to identify third party construction threats in video images captured on site, so as to judge whether third party construction is being performed. Third party construction refers to activities in which personnel who are not company staff carry out engineering construction within 50 m on either side of the center line of a gas pipeline, such as road construction, bridge construction, factory building construction, blasting within 500 m, and riverbed dredging within 100 m downstream, and includes construction that directly or indirectly causes hazards and risks to the pipeline. In order to enable the target detection model to accurately identify third party construction threats, this embodiment establishes a training data set by collecting real construction scene video images from third party construction sites, trains the target detection model with the training data set, and optimizes the model parameters by establishing a loss function and applying back propagation.
In this embodiment, step 102 is mainly used for image feature extraction. Video images acquired in real time are input into the trained target detection model, and image features are extracted to obtain three image features with different sizes. Because there are many types of third party construction threats and their sizes range widely from large to small, convolution kernels of a single size have difficulty extracting the features of all the objects at the same time; three convolution kernels of different sizes are therefore used, so that image features of objects of different sizes can be extracted effectively.
In this embodiment, step 103 is mainly used for third party construction threat identification. In this embodiment, target recognition is performed based on three image features of different sizes extracted in the previous step, a target detection frame (circumscribed rectangle of the target image) is generated, confidence is calculated, and finally, the category of each object is obtained based on the highest confidence.
As an optional embodiment, the building of a training data set based on images collected at a third party construction site includes:
performing data acquisition at a third party construction site, grouping the acquired images according to the type of threat object, and selecting a similar number of image samples for each type;
cleaning the selected image samples and removing unclear image samples;
labeling the cleaned image samples with target frames and categories that indicate the position and size of each target;
denoising the labeled image samples;
converting the denoised images into gray images, and dividing the image samples into a training data set and a test data set at a ratio of 4:1.
The present embodiment provides a technical solution for creating a training data set. First, third party construction data are collected, with a similar number of images collected for each type, for example about 300 images per type. Second, the data are cleaned, and pictures in which the target is too small or the image is unclear are discarded. Then the images are labeled with a labeling tool, marking the position and type of each object in the picture so that the target frame covers the upper, lower, left and right edges of the object as closely as possible. After labeling, the labeling-frame files, namely the data-set matching files, are generated, yielding a third party construction data sample library. In actual use, images also have to be recognized in night scenes; to match the on-site camera equipment and the actually deployed recognition scene, the images in the library are additionally denoised and converted to gray images. Denoising eliminates the influence of noise generated during on-site image acquisition on the recognition and extraction of third party construction images, and solves problems such as image blurring and excessive noise points caused by noise interference. A gray-image data set is finally obtained and serves as the actual data set. The data set is divided into a training set and a test set, with 80% used for training and 20% for testing.
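By way of illustration only, the following is a minimal Python sketch of this preprocessing and 4:1 split, assuming a recent version of the skimage library; the directory layout, sigma value and helper names are hypothetical and not part of the embodiment:

import os
import random
from skimage import io
from skimage.color import rgb2gray
from skimage.filters import gaussian

def prepare_samples(image_dir, out_dir, sigma=1.0):
    # Denoise each labeled sample with a Gaussian filter and convert it to a gray image.
    os.makedirs(out_dir, exist_ok=True)
    names = sorted(os.listdir(image_dir))
    for name in names:
        img = io.imread(os.path.join(image_dir, name))
        img = gaussian(img, sigma=sigma, channel_axis=-1)  # Gaussian denoising
        img = rgb2gray(img)                                # color RGB -> gray image
        io.imsave(os.path.join(out_dir, name), (img * 255).astype("uint8"))
    return names

def split_4_to_1(names, seed=0):
    # Divide the sample list into a training set and a test set at a 4:1 ratio.
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * 0.8)
    return names[:n_train], names[n_train:]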
As an alternative embodiment, the categories of the third party construction threat include: safety helmets, reflective clothing, pneumatic picks, soil piles, construction fences, water horses (water-filled barriers), construction warning signs, bulldozers, excavators, road rollers, spades, earthmoving vehicles, roadblocks and engineering protective fences.
The present embodiment gives the classes of third party construction threats. This example lists 14 threat categories in total, all of which are either objects required for construction, such as excavators and bulldozers, or objects representative of third party construction, such as water horses.
As an alternative embodiment, a Gaussian filter based on a convolution operation is used to denoise the image samples, and the convolution formula of the Gaussian kernel is:
G(i, j) = Σ_m Σ_n f(i − m, j − n) × (1 / (2πσ²)) × exp(−(m² + n²) / (2σ²))    (1)
where f(i, j) is the pixel value at coordinates (i, j) of the input image sample, G(i, j) is the pixel value at coordinates (i, j) after Gaussian smoothing, and σ is the standard deviation of the Gaussian kernel.
The embodiment provides a technical scheme for image noise reduction. In this embodiment, a Gaussian filter based on a convolution operation is used to denoise the image samples, and the convolution formula of the Gaussian kernel is shown in formula (1). Most image noise is Gaussian noise, so Gaussian filters are widely used for image denoising. Gaussian filtering is a linear smoothing filter suitable for removing Gaussian noise. Its principle can be understood simply as a weighted average over the pixel values of the image: the value of each pixel is replaced by a weighted average of that pixel and the other pixels in its neighborhood.
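For illustration, a small numpy sketch of this weighted-average principle is given below; it assumes a gray-scale image stored as a 2-D numpy array, with an assumed kernel radius and zero padding at the border:

import numpy as np

def gaussian_kernel(sigma, radius=2):
    # Build a normalized 2-D Gaussian kernel with standard deviation sigma.
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_smooth(image, sigma=1.0, radius=2):
    # Each output pixel is a weighted average of its neighbourhood, as in formula (1).
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(image.astype(float), radius, mode="constant")
    out = np.zeros(image.shape, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1] * k)
    return out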
As an alternative embodiment, the rgb2gray function in the skimage image processing library is used to convert the color RGB images to gray scale.
The present embodiment provides a technical solution for converting a color RGB image into a gray image. The embodiment uses the rgb2gray function in skimage to implement the gray-image conversion; in the Python language, the rgb2gray function is imported from the skimage image processing library in the following manner:
from skimage.color import rgb2gray
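A minimal usage example follows; the file name is hypothetical:

from skimage import io
from skimage.color import rgb2gray

rgb_image = io.imread("site_frame_001.jpg")  # H x W x 3 color image
gray_image = rgb2gray(rgb_image)             # H x W gray image with values in [0, 1]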
As an alternative embodiment, the object detection model includes an input end, a backbone network Backbone, a neck network Neck, and an output end.
The embodiment provides the specific network structure of the target detection model. As shown in fig. 2, the target detection model mainly comprises an input end, a backbone network Backbone, a neck network Neck, and an output end. Each part is described separately below.
(A) Input end
The input end performs random scaling, cropping and other processing on the input images, which increases the effective sample size of the data set and improves the training of the model; in addition, the input end adaptively adds the smallest possible black border to each image according to its size, so that all images in the data set have a uniform size.
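A rough Python sketch of this adaptive padding is given below, assuming a square target size of 640 pixels (an illustrative value, not fixed by the embodiment) and the skimage resize function:

import numpy as np
from skimage.transform import resize

def letterbox(image, target=640):
    # Scale the image while keeping its aspect ratio, then add the smallest black border.
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    resized = resize(image, (new_h, new_w), preserve_range=True)
    canvas = np.zeros((target, target) + image.shape[2:], dtype=resized.dtype)
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas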
(B) Backbone network
The Focus structure in the backbone network is responsible for slicing the input pictures, and the CSP structure is responsible for feature map convolution, which improves the learning effect and reduces the computational complexity.
(C) Neck network
The FPN structure in the Neck network up-samples feature information along a top-down path, fuses the sampled high-level feature information with the low-level feature information, and finally computes a predicted feature map. The PAN structure works in the opposite direction to the FPN: it down-samples feature information along a bottom-up path, fuses the sampled low-level feature information with the high-level feature information, and finally computes a predicted feature map.
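As a purely illustrative sketch of this fusion (not the exact structure of the model), the following PyTorch-style code assumes three backbone feature maps c3, c4, c5 with 256 channels each; the channel count and the 1x1/3x3 convolutions are assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFpnPan(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.reduce = nn.Conv2d(channels * 2, channels, kernel_size=1)
        self.down = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, c3, c4, c5):
        # top-down path (FPN): up-sample deep features and fuse them with shallower ones
        p4 = self.reduce(torch.cat([c4, F.interpolate(c5, scale_factor=2)], dim=1))
        p3 = self.reduce(torch.cat([c3, F.interpolate(p4, scale_factor=2)], dim=1))
        # bottom-up path (PAN): down-sample and fuse in the opposite direction
        n4 = self.fuse(torch.cat([p4, self.down(p3)], dim=1))
        n5 = self.fuse(torch.cat([c5, self.down(n4)], dim=1))
        # three predicted feature maps of different sizes
        return p3, n4, n5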
(D) Output end
The bounding box loss function at the output end adopts the CIoU_Loss function, and NMS (Non-Maximum Suppression) is responsible for screening prediction frames, removing prediction frames that overlap strongly (high IOU) but have low confidence. The output end is responsible for making predictions on the input image, generating detection frames with higher confidence together with their labels (category and confidence), and outputting the detection result.
As an alternative embodiment, when training the target detection model, data enhancement processing is performed on the input image samples: every 4 input image samples are randomly scaled, randomly cropped and randomly arranged, and then stitched together.
The embodiment provides a technical scheme for data enhancement. Data enhancement is applied mainly in the model training stage to increase the number of training samples and thereby improve the training accuracy of the model. In this embodiment, every 4 image samples are randomly scaled, randomly cropped and randomly arranged, and then stitched together, so that the number of distinct image samples obtained is far greater than 4, as sketched below.
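A rough numpy sketch of this stitching follows; the output size and scale range are assumed values, and the remapping of labeled target frames onto the stitched image is omitted for brevity:

import random
import numpy as np
from skimage.transform import resize

def stitch_four(samples, out_size=640):
    # Randomly scale, crop and arrange 4 gray-scale images, then stitch them into one.
    half = out_size // 2
    random.shuffle(samples)                       # random arrangement
    canvas = np.zeros((out_size, out_size), dtype=float)
    corners = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (top, left) in zip(samples, corners):
        scale = random.uniform(0.5, 1.5)          # random scaling
        h = max(1, int(img.shape[0] * scale))
        w = max(1, int(img.shape[1] * scale))
        img = resize(img, (h, w))
        y = random.randint(0, max(0, h - half))   # random crop position
        x = random.randint(0, max(0, w - half))
        patch = img[y:y + half, x:x + half]
        canvas[top:top + patch.shape[0], left:left + patch.shape[1]] = patch
    return canvas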
As an alternative embodiment, the method further comprises: when training the target detection model, automatically screening the target detection frame according to the following steps:
obtaining the confidence si that a target exists in each target detection frame, where si is the confidence that a target exists in the i-th target detection frame, i = 1, 2, …, N; N is the number of target detection frames;
calculating the intersection over union (IOU) of each target detection frame and each candidate frame according to the formula:
IOU(Ai, Bj) = (Ai ∩ Bj) / (Ai ∪ Bj)    (2)
where Ai is the i-th target detection frame, Bj is the j-th candidate frame, the candidate frames are the labeled target frames, Ai ∩ Bj denotes the area of the intersection of Ai and Bj, Ai ∪ Bj denotes the area of the union of Ai and Bj, j = 1, 2, …, M; M is the number of candidate frames;
calculating the score of each target detection frame according to the formula:
Si = si × max_j IOU(Ai, Bj), j = 1, 2, …, M    (3)
if Si > S0, the target detection frame Ai is retained; otherwise, the target detection frame Ai is deleted; S0 is a set threshold.
The embodiment provides a technical scheme for automatically screening the target detection frames. Target detection frame screening is typically performed during the model training stage. In the training stage, the model receives the input images and the labeled target frames; the labeled target frames are used as the candidate frames in this embodiment. Each target detection frame is scored based on its confidence and its degree of overlap with the candidate frames, and the target detection frames are then screened based on their scores. First, the confidence si that a target exists in each target detection frame is obtained; si is given by the model, and the larger si is, the greater the likelihood that a target exists in the detection frame. Next, the intersection over union of each target detection frame and each candidate frame is calculated according to formula (2); the IOU reflects the degree of overlap between the two frames, and the higher the overlap between a detection frame and a candidate frame, the greater the possibility that a target exists in the detection frame. To improve screening accuracy, this embodiment scores each target detection frame by the product of its confidence and its IOU. Since there are M candidate frames, each target detection frame corresponds to M IOU values, and the product of the maximum of these values and the confidence is taken as the score Si of the target detection frame, as shown in formula (3). Finally, the score Si of each target detection frame is compared with the set threshold S0: if Si > S0, the target detection frame Ai is retained; otherwise, the target detection frame Ai is deleted. A small sketch of this screening rule follows.
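The following plain-Python sketch illustrates the screening rule under the assumption that boxes are (x1, y1, x2, y2) tuples; the threshold value is an assumed example:

def iou(a, b):
    # Intersection over union of two boxes, formula (2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def screen_detections(detections, confidences, candidates, s0=0.5):
    # Keep a detection frame Ai when Si = si * max_j IOU(Ai, Bj) exceeds S0, formula (3).
    kept = []
    for a, s in zip(detections, confidences):
        score = s * max(iou(a, b) for b in candidates)
        if score > s0:
            kept.append(a)
    return kept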
As an alternative embodiment, the method further comprises:
If Si > S0, the target detection frame Ai is retained, and all candidate frames Bj satisfying si × IOU(Ai, Bj) > S0 are found and denoted Bjk, k = 1, 2, …, K;
calculating the position of the target detection frame Ai according to the formula:
Ai_O = Σ_k [ IOU(Ai, Bjk) × Bjk_O ] / Σ_k IOU(Ai, Bjk), k = 1, 2, …, K    (4)
where Ai_O is the center position coordinate of the target detection frame Ai, and Bjk_O is the center position coordinate of the candidate frame Bjk.
The embodiment provides a technical scheme for determining the center position coordinate of the retained target detection frame Ai. Ai is a target detection frame retained after screening; Ai itself has a center position coordinate, but its error is relatively large. For this purpose, this embodiment obtains the center position coordinate of Ai from the candidate frames matched with it. All candidate frames Bj satisfying si × IOU(Ai, Bj) > S0, denoted Bjk, are regarded as the candidate frames matched with Ai, and a weighted average of the center position coordinates of all Bjk is calculated to obtain the center position coordinate of Ai; the calculation is given in formula (4), with the IOU of Ai and Bjk as the weighting coefficient. When the number of Bjk is K = 1, that is, there is only one matched candidate frame, the center position coordinate of that candidate frame is taken as the center position coordinate of Ai. A small sketch of this calculation follows.
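The sketch below illustrates formula (4), reusing the iou helper from the previous sketch; the threshold value is again an assumed example:

def box_center(b):
    # Center coordinate of a box given as (x1, y1, x2, y2).
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)

def refined_center(a, confidence, candidates, s0=0.5):
    # IOU-weighted average of the centers of all Bj with si * IOU(Ai, Bj) > S0.
    matched = [b for b in candidates if confidence * iou(a, b) > s0]
    if not matched:
        return box_center(a)                      # no matched candidate frame
    weights = [iou(a, b) for b in matched]
    cx = sum(w * box_center(b)[0] for w, b in zip(weights, matched)) / sum(weights)
    cy = sum(w * box_center(b)[1] for w, b in zip(weights, matched)) / sum(weights)
    return cx, cy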
Fig. 3 is a schematic diagram of a third party construction threat object detection apparatus according to an embodiment of the invention, the apparatus comprising:
The model training module 11 is used for establishing a training data set based on images acquired at a third party construction site and training a target detection model;
The feature extraction module 12 is configured to input a video image into the trained target detection model for feature extraction, so as to obtain three image features with different sizes;
The target recognition module 13 is used for performing target recognition based on three image features with different sizes and outputting the category of the third party construction threat.
The device of this embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1, and its implementation principle and technical effects are similar, and are not described here again.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.