Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without making any inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In addition, it should be noted that, in the technical scheme of the invention, the related processes of collection, storage, use, processing, transmission, provision, disclosure, and the like of the relevant data all conform to applicable laws and regulations and do not violate public order and good customs.
Example 1
Fig. 1 is a flowchart of a construction operation detection model training method according to an embodiment of the present invention. The method is applicable to automatic detection of construction operations based on a network model, so as to improve detection efficiency and detection accuracy. The method may be performed by a construction operation detection model training device, which may be implemented in the form of hardware and/or software and configured in an electronic device. The electronic device may be a terminal device or a server; the embodiment of the present invention is not limited thereto.
As shown in Fig. 1, the method for training a construction operation detection model provided by the embodiment of the invention specifically includes the following steps:
S110, determining a construction operation image of a construction operation process of the engineering vehicle; the construction operation image simultaneously comprises the engineering vehicle and a construction operator.
Specifically, in response to a requirement for acquiring construction operation images of the construction operation process, construction operation images of the engineering vehicle in the construction operation process are acquired, and the acquired images that include both the engineering vehicle and construction operators are taken as construction operation images.
Optionally, according to the construction operation process of the engineering vehicle, the mode of determining the construction operation image includes, but is not limited to, photographing and collecting through a camera and collecting through acquiring video information of the engineering vehicle in the construction operation process.
According to the technical scheme provided by the embodiment of the invention, the construction operation image simultaneously comprising the engineering vehicle and the construction operator is determined according to the construction operation process of the engineering vehicle, so that an available data set is constructed based on the acquired construction operation image.
S120, generating a sample training set comprising at least one construction work image; the construction work image has a sample tag of a work vehicle standard area and a worker standard area.
Specifically, the construction operation image is preprocessed to generate a construction operation image with sample labels for the engineering vehicle standard area and the operator standard area, and a sample training set comprising at least one construction operation image is generated from the labeled construction operation images.
Optionally, the sample training set may be generated by dividing the construction work images that carry the engineering vehicle standard area and operator standard area sample tags, so as to generate a sample training set including at least one construction work image. The embodiment of the invention does not limit the manner in which the sample training set is generated.
According to the technical scheme provided by the embodiment of the invention, the construction operation images with the engineering vehicle standard area and the operator standard area sample labels are pre-generated, and the sample training set comprising at least one construction operation image is determined, so that the sample training set conforming to the model training conditions is constructed conveniently, and further, the subsequent model training is ensured to be carried out smoothly.
S130, inputting the sample training set into a pre-constructed network model to obtain a model output, and performing model training on the network model according to the engineering vehicle prediction area output by the model and the engineering vehicle standard area, and according to the operator prediction area output by the model and the operator standard area, to obtain a target detection model used for construction operation detection.
Specifically, a model output result comprising an engineering vehicle prediction area and an operator prediction area can be obtained by inputting the sample training set into the pre-constructed network model. The network model can then be trained according to the model output result, the engineering vehicle standard area, and the operator standard area. When a preset model training ending condition is met, model training is stopped and the target detection model is obtained, and construction operation detection is performed with the obtained target detection model.
Optionally, the model training ending condition may be a preset number of model training iterations: when the number of training iterations reaches the preset number, model training ends, that is, the model training ending condition is reached. Alternatively, a loss value may be computed with a preset loss function from the engineering vehicle prediction area and the engineering vehicle standard area, and from the operator prediction area and the operator standard area, and whether training is finished may be determined according to whether the loss value reaches a preset threshold or tends to be stable. The network model comprises an effective feature extraction network layer, a feature fusion network layer, and a classification regression network layer.
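The two stopping criteria described above (a fixed iteration budget, or a loss value that falls below a threshold or stabilises) can be sketched as a simple check run after each training iteration. This is an illustrative sketch only; the function name, the plateau window (`patience`), and the stability tolerance (`eps`) are assumptions, not specified by the embodiment:

```python
def should_stop(loss_history, max_iters, threshold=0.01, patience=3, eps=1e-3):
    """Decide whether model training should end, per the criteria above:
    either the preset iteration count is reached, or the loss value has
    dropped below a preset threshold, or it has stabilised recently."""
    iters = len(loss_history)
    if iters >= max_iters:                 # preset number of iterations reached
        return True
    if not loss_history:
        return False
    if loss_history[-1] <= threshold:      # loss reached the preset threshold
        return True
    if iters >= patience:                  # loss "tends to be stable"
        recent = loss_history[-patience:]
        if max(recent) - min(recent) < eps:
            return True
    return False
```

In a real training loop, the per-iteration loss would be appended to `loss_history` and the loop would break as soon as `should_stop` returns `True`.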
Specifically, the network model used for model training includes an effective feature extraction network layer, a feature fusion network layer, and a classification regression network layer. The effective feature extraction network layer is used for extracting features from the image input into the model, that is, for training on the sample training set of the dataset constructed from the construction operation images, extracting feature information from the sample training set, and constructing a feature set of the sample training set. The feature fusion network layer is used for fusing the feature information input into it, so as to integrate that feature information and acquire deep feature information. The classification regression network layer is used for judging the feature information input into it and obtaining the final prediction box results through confidence thresholding and non-maximum suppression, so as to realize classification and regression of each training sample in the sample training set.
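The three-layer flow described above can be sketched as plain function composition: the backbone extracts features, the fusion layer integrates them, and the head produces the prediction. The names and toy stand-ins below are illustrative assumptions, not the actual network:

```python
def build_model(backbone, fusion, head):
    """Chain the three layers described above: effective feature extraction
    (backbone), feature fusion, then classification/regression (head)."""
    def model(batch):
        features = backbone(batch)   # extract the feature set from the input
        fused = fusion(features)     # integrate features into deep information
        return head(fused)           # final classification and regression
    return model

# Toy stand-ins for the three layers, purely to show the data flow.
toy = build_model(
    backbone=lambda x: [v * 2 for v in x],
    fusion=lambda feats: sum(feats),
    head=lambda fused: {"score": fused},
)
```

Calling `toy([1, 2, 3])` runs the input through all three stages in order, mirroring how the sample training set flows through the real network.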
According to the technical scheme provided by the embodiment of the invention, the network model comprising the effective feature extraction network layer, the feature fusion network layer and the classification regression network layer is constructed, the feature information of the input image is extracted through the effective feature extraction network layer, the feature information is fused through the feature fusion network layer, so that the model can extract the feature information of a deeper level, and classification and regression of each training sample in the sample training set are realized according to the fused feature information.
The embodiment of the invention determines the construction operation image of the construction operation process of the engineering vehicle, the construction operation image simultaneously comprising the engineering vehicle and construction operators; generates a sample training set comprising at least one construction work image, the construction operation image being provided with sample labels of an engineering vehicle standard area and an operator standard area; and inputs the sample training set into a pre-constructed network model to obtain a model output, performing model training on the network model according to the engineering vehicle prediction area output by the model and the engineering vehicle standard area, and according to the operator prediction area output by the model and the operator standard area, to obtain a target detection model for construction operation detection. The network model comprises an effective feature extraction network layer, a feature fusion network layer, and a classification regression network layer. According to the technical scheme provided by the embodiment of the invention, automatic detection of the construction operation image in the construction operation process of the engineering vehicle is realized, and model training is carried out using the sample labels of the engineering vehicle standard area and the operator standard area together with the model output result, so that the feature extraction capability of the target detection model is improved, and the detection accuracy and the detection efficiency of construction operations are improved.
Example two
Fig. 2A is a flowchart of another training method for a construction operation detection model according to the second embodiment of the present invention, where the technical solution of the embodiment of the present invention is further optimized based on the above-mentioned alternative technical solutions.
Further, the step of inputting the sample training set into a pre-constructed network model to obtain a model output, and training the network model according to the engineering vehicle prediction area and standard area and the operator prediction area and standard area until the preset model training ending condition is met, to obtain a target detection model for construction operation detection, is refined as follows: the sample training set is input into the effective feature extraction network layer of the network model to obtain the effective feature parameters output by that layer; the effective feature parameters are input into the feature fusion network layer of the network model for feature fusion, to obtain the fusion feature parameters output by the feature fusion network layer; the fusion feature parameters are input into the classification regression network layer of the network model to perform area prediction of the construction operation image, to obtain the prediction result output by the classification regression network layer; and, according to the prediction result, the network model is trained based on the engineering vehicle standard area and the operator standard area of the sample label until the preset model training ending condition is met, to obtain the target detection model. Automatic detection of construction operations is thereby performed based on the network model, improving detection efficiency and detection accuracy. It should be noted that, for parts not described in this embodiment, reference may be made to the related descriptions of other embodiments, which are not repeated here.
As shown in fig. 2A, another construction operation detection model training method provided by the embodiment of the present invention specifically includes the following steps:
S210, determining a construction operation image of a construction operation process of the engineering vehicle; the construction operation image simultaneously comprises the engineering vehicle and a construction operator.
Specifically, among the images of the engineering vehicle in the construction process, an image containing both the engineering vehicle and a construction operator is used as a construction operation image of the construction operation process of the engineering vehicle.
Optionally, determining a construction work image of a construction work process of the engineering vehicle includes: acquiring an operation video clip in the construction operation process of the engineering vehicle acquired by image acquisition equipment; extracting image frames simultaneously containing engineering vehicles and construction operators from the operation video clips as key frame images; performing image processing on the key frame image by adopting at least one image processing mode to obtain a diffusion frame image corresponding to the key frame image; and determining the key frame image and the corresponding diffusion frame image as a construction work image.
Specifically, in response to a requirement for acquiring construction operation images of the engineering vehicle in the construction operation process, an image acquisition device can acquire an operation video segment of the engineering vehicle during construction operation. Each frame image in the operation video segment that simultaneously contains the engineering vehicle and construction operators is extracted and taken as a key frame image. At least one image processing mode is then applied to each key frame image to obtain the diffusion frame image corresponding to each key frame image, and the key frame images together with their corresponding diffusion frame images are taken as construction operation images.
Optionally, when acquiring the operation video clip of the construction operation process of the engineering vehicle, the image acquisition device may be a camera; the embodiment of the invention does not limit the type of the image acquisition device. When key frame images are extracted from the operation video clips, the extraction may be performed manually, so as to ensure that each extracted frame image simultaneously contains the engineering vehicle and construction operators. When at least one image processing mode is applied to each key frame image, data expansion may be performed by means including but not limited to rotation, flipping, and scaling, so as to generate a larger number of diffusion frame images corresponding to each key frame image; the key frame images and their corresponding diffusion frame images are used as the construction operation images.
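The data expansion step above can be sketched in plain Python on a key frame represented as a 2-D grid of pixel values. This is a minimal illustration assuming rotation and flips only (scaling is omitted for brevity), and the function name is hypothetical:

```python
def diffuse_frame(frame):
    """Expand one key frame into several 'diffusion' frames using simple
    transforms: horizontal flip, vertical flip, and 90-degree rotation.
    `frame` is a 2-D grid of pixel values (a list of rows)."""
    h_flip = [row[::-1] for row in frame]                 # mirror left-right
    v_flip = [row[:] for row in frame[::-1]]              # mirror top-bottom
    rot90 = [list(col) for col in zip(*frame[::-1])]      # rotate 90 degrees
    return [h_flip, v_flip, rot90]

key_frame = [[1, 2],
             [3, 4]]
diffusion_frames = diffuse_frame(key_frame)
```

The key frame plus its diffusion frames would then together form one group of construction operation images.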
According to the technical scheme provided by the embodiment of the invention, the image acquisition equipment is used for acquiring the operation video clips in the construction operation process of the engineering vehicle, the image frames simultaneously comprising the engineering vehicle and construction operators are used as key frame images, the diffusion frame images corresponding to the key frame images are obtained by carrying out image processing on the key frame images, and the key frames and the corresponding diffusion frame images are determined to be construction operation images.
S220, generating a sample training set comprising at least one construction work image; the construction work image has a sample tag of a work vehicle standard area and a worker standard area.
Specifically, a dataset is constructed based on the construction work images to generate a sample training set comprising at least one construction work image. First, before the dataset is constructed, the areas where the engineering vehicle and the operators are located in each construction operation image are marked, and sample labels with the engineering vehicle standard marking area and the operator standard marking area are determined for each construction operation image. The dataset is constructed based on the construction operation images containing the sample labels, and the data is then divided according to the constructed dataset to generate a sample training set comprising at least one construction operation image.
When marking the areas where the engineering vehicle and the operators are located in each construction operation image, the marking can be performed through large-scale data annotation platforms, including but not limited to the umbrella cloud intelligent marking platform, Scale AI, Labelbox, and Appen; the embodiment of the invention does not limit the annotation platform selected for the labeling work. When the constructed dataset is divided, it can be divided into a training set, a verification set, and a test set at a ratio of 7:2:1, or into a training set and a test set at a ratio of 7:3.
In particular, in order to ensure the rigor of subsequent model feature extraction and experiments, when the dataset is divided, the images generated by data expansion of a given construction operation image can be treated as one group; that is, each construction operation image and the images generated from it by data expansion should be divided as a whole into the training set, the verification set, or the test set. This avoids the situation where the model extracts feature information from one copy of a construction operation image during training and is then tested on another copy of the same image in the test set, which would affect the accuracy of the model prediction results. In a specific implementation, the embodiment of the present invention does not limit the dataset division to this method.
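The group-level 7:2:1 division described above can be sketched as follows. The identifiers, ratio handling, and seed are illustrative assumptions; the essential point is that a key frame and its diffusion frames always land in the same subset:

```python
import random

def split_by_group(groups, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split the dataset 7:2:1 at the *group* level: each key frame and all
    of its diffusion frames (one group) go into the same subset, so the
    model is never tested on augmented copies of images it trained on.
    `groups` is a list of lists of image ids (hypothetical identifiers)."""
    rng = random.Random(seed)
    order = groups[:]
    rng.shuffle(order)
    n = len(order)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    # flatten each subset back into individual images
    flat = lambda subset: [img for g in subset for img in g]
    return flat(train), flat(val), flat(test)
```

A 7:3 train/test split would follow the same pattern with `ratios=(0.7, 0.0, 0.3)` and the empty verification set discarded.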
According to the technical scheme provided by the embodiment of the invention, the sample training set comprising at least one construction operation image is generated, the sample labels of the engineering vehicle standard area and the operator standard area are arranged in each construction operation image, and the available data set is constructed to ensure the smooth development of the steps such as the subsequent model training and the like.
S230, inputting the sample training set into an effective feature extraction network layer in the network model to extract features, and obtaining effective feature parameters output by the effective feature extraction network layer.
The network model comprises an effective feature extraction network layer, a feature fusion network layer and a classification regression network layer.
Specifically, the pre-constructed network model comprises an effective feature extraction network layer, a feature fusion network layer and a classification regression network layer. Firstly, inputting a sample training set into a pre-constructed network model, extracting feature information in the sample training set through an effective feature extraction network layer in the network model, and obtaining effective feature parameters output by the effective feature extraction network layer. The effective feature extraction network layer is a trunk feature extraction network of the network model and is used for extracting feature information in an input sample training set and constructing a feature set of the sample training set.
In one implementation of a specific embodiment, the overall structure of the network model is detailed in Fig. 2B. The effective feature extraction network layer of this embodiment corresponds to CSPDarknet (Cross Stage Partial Darknet), the feature fusion network layer corresponds to an FPN (Feature Pyramid Network), and the classification regression network layer is the Yolo Head (YOLO decoupled head). In the CSPDarknet module, Inputs represents the input sample training set; Focus represents splitting a high-resolution feature map into multiple low-resolution feature maps using a slicing operation; Conv2D_BN_SiLU represents performing a convolution operation on the feature map, where SiLU means using the SiLU activation function, the purpose of introducing the activation function being to increase the nonlinear fitting capability of the neural network; CSPLayer (Cross Stage Partial, CSP) is used for extracting feature information; and SPPBottleneck applies receptive fields of different sizes to the same image so that feature information at different scales can be captured, after which the feature maps are connected together and reduced in dimension through a fully connected layer to finally obtain feature vectors of fixed size. In the FPN module, Concat represents splicing of feature maps; UpSampling2D represents an upsampling operation on the two-dimensional feature map; and DownSample represents a downsampling operation on the two-dimensional feature map. The Yolo Head performs multi-scale target detection on the extracted feature maps, carries out the final regression prediction, and outputs the category and probability value predicted by the model. In the CSPDarknet module, the first dotted frame from top to bottom indicates the first residual block, the second dotted frame indicates the second residual block, the third dotted frame indicates the third residual block, and the fourth dotted frame indicates the fourth residual block.
Optionally, the effective feature extraction network layer includes a first residual block, a second residual block, a third residual block, and a fourth residual block; the number and the size of the convolution kernels in the first, second, third, and fourth residual blocks are set to be different. Correspondingly, inputting the sample training set into the effective feature extraction network layer in the network model for feature extraction, to obtain the effective feature parameters output by the effective feature extraction network layer, includes: inputting the sample training set into the first residual block for effective feature extraction, to obtain the first effective feature parameters output by the first residual block; inputting the first effective feature parameters into the second residual block for effective feature extraction, to obtain the second effective feature parameters output by the second residual block; inputting the second effective feature parameters into the third residual block for effective feature extraction, to obtain the third effective feature parameters output by the third residual block; inputting the third effective feature parameters into the fourth residual block for effective feature extraction, to obtain the fourth effective feature parameters output by the fourth residual block; and generating effective feature parameters comprising the second effective feature parameters, the third effective feature parameters, and the fourth effective feature parameters.
Specifically, the effective feature extraction network layer in the network model includes a first residual block, a second residual block, a third residual block, and a fourth residual block. The four residual blocks differ in the number and size of their convolution kernels, so their output results also differ. When the sample training set is input into the network model, the first residual block of the effective feature extraction network layer extracts the effective feature information in the sample training set and outputs the first effective feature parameters. The first effective feature parameters output by the first residual block are input into the second residual block for effective feature extraction, yielding the second effective feature parameters output by the second residual block. The second effective feature parameters are input into the third residual block for effective feature extraction, yielding the third effective feature parameters output by the third residual block. The third effective feature parameters are input into the fourth residual block for effective feature extraction, yielding the fourth effective feature parameters output by the fourth residual block. Then, effective feature parameters comprising feature information at three different scales are generated from the second effective feature parameters output by the second residual block, the third effective feature parameters output by the third residual block, and the fourth effective feature parameters output by the fourth residual block.
It can be understood that the second effective feature parameter, the third effective feature parameter and the fourth effective feature parameter are respectively output by residual blocks with different sizes and different numbers of convolution kernels, so that the scales between the different effective feature parameters are also different, and the size and the number of the convolution kernels in the different residual blocks are not limited in the embodiment of the invention.
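The sequential chain of residual blocks above, with the outputs of blocks two to four kept as the multi-scale effective feature parameters, can be sketched like this. The toy blocks below are hypothetical stand-ins (each just "halves the resolution") used only to show the data flow:

```python
def extract_effective_features(sample, blocks):
    """Run the input through four residual blocks in sequence, as described
    above, and keep the outputs of blocks 2-4 as the three effective
    feature parameters at different scales. `blocks` is a list of four
    callables standing in for the real residual blocks."""
    outputs = []
    x = sample
    for block in blocks:
        x = block(x)          # each block feeds the next one
        outputs.append(x)
    # the second, third, and fourth block outputs form the feature set
    return outputs[1:]

# Toy blocks that simply halve a scalar "resolution" (illustrative only).
toy_blocks = [lambda x: x // 2] * 4
```

With an input "resolution" of 320, the chain produces 160, 80, 40, 20, and the three retained scales are 80, 40, and 20, mirroring the three different-scale effective feature parameters.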
According to the technical scheme provided by the embodiment of the invention, the effective characteristic parameters including the second effective characteristic parameter, the third effective characteristic parameter and the fourth effective characteristic parameter are generated through the effective characteristic extraction network layer in the network model, so that different characteristic information can be conveniently extracted through different residual blocks in the model, and further more characteristic information can be extracted from the sample training set.
S240, inputting the effective characteristic parameters into a characteristic fusion network layer in the network model to perform characteristic fusion, and obtaining fusion characteristic parameters output by the characteristic fusion network layer.
Specifically, the effective characteristic parameters including the second effective characteristic parameter, the third effective characteristic parameter and the fourth effective characteristic parameter are input into a characteristic fusion network layer in a network model to perform characteristic fusion, and three fusion characteristic parameters output through the characteristic fusion network layer are obtained.
In particular, the extracted features may be upsampled and downsampled in the feature fusion network. The significance of the upsampling operation is that it enlarges the feature map so as to facilitate the extraction of more feature information; the significance of the downsampling operation is that it compresses the feature map, reducing complexity by reducing the number of parameters. The embodiment of the invention does not limit whether upsampling and downsampling operations are adopted in the network model.
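The two operations above can be sketched on a feature map represented as a list of rows. This is a minimal nearest-neighbour/stride sketch, not the network's actual resampling kernels:

```python
def upsample2d(fmap, factor=2):
    """Nearest-neighbour upsampling of a 2-D feature map (list of rows),
    enlarging it so finer feature information can be extracted."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]  # repeat columns
        for _ in range(factor):                         # repeat rows
            out.append(list(wide))
    return out

def downsample2d(fmap, factor=2):
    """Stride-based downsampling: keep every `factor`-th element in both
    directions, compressing the feature map and reducing parameters."""
    return [row[::factor] for row in fmap[::factor]]
```

Upsampling a 1x2 map by a factor of 2 yields a 2x4 map; downsampling that map recovers the original, illustrating how the two operations trade resolution against parameter count.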
According to the technical scheme provided by the embodiment of the invention, the effective characteristic parameters are subjected to characteristic fusion through the characteristic fusion network layer in the network model, so that characteristic information of different scales is conveniently synthesized, the risk of overfitting of the model is reduced, more characteristic information is extracted, and the performance and generalization capability of the model are improved.
S250, inputting the fusion characteristic parameters into a classification regression network layer in the network model to conduct regional prediction of the construction operation image, and obtaining a prediction result output by the classification regression network layer.
Specifically, three feature fusion parameters output by the feature fusion network layer are input into a classification regression network layer in the network model, and the region prediction of the construction operation image is performed, so that a prediction result output by the classification regression network layer can be obtained.
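The confidence thresholding and non-maximum suppression used to turn raw head outputs into final prediction boxes (as described for the classification regression layer) can be sketched in plain Python. The `(box, score)` format and the threshold values below are illustrative assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(preds, score_thr=0.5, iou_thr=0.5):
    """Filter raw predictions into final boxes: drop low-score boxes, then
    suppress boxes that overlap a higher-scoring kept box.
    `preds` is a list of (box, score) pairs (hypothetical format)."""
    kept = []
    for box, score in sorted(preds, key=lambda p: -p[1]):
        if score < score_thr:
            continue  # below the confidence threshold
        if all(iou(box, k) < iou_thr for k, _ in kept):
            kept.append((box, score))
    return kept
```

Applied to overlapping candidate boxes for the same engineering vehicle or operator, only the highest-confidence box per object survives.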
According to the technical scheme provided by the embodiment of the invention, the classification regression network layer in the network model is used for determining the area prediction result of the construction operation image based on the fusion characteristic parameters, and outputting the prediction result so as to realize automatic detection of the construction operation image area.
S260, according to the prediction result, performing model training on the network model based on the engineering vehicle standard region and the operator standard region of the sample label, until the preset model training ending condition is met, to obtain the target detection model.
Specifically, according to the prediction result output by the classification regression network layer in the network model, model training is carried out on the network model based on the engineering vehicle standard region and the operator standard region of the sample label. In the process of training the network model, if the preset model training ending condition is met, model training can be stopped, and a target detection model adapted to the sample training set data is obtained.
The preset model training ending condition may be a manually preset number of training iterations; it may be that the ratio of correct prediction results of the model on the test set to the total number of test samples is not smaller than a preset accuracy; or it may be determined by comparing the average value of the model's prediction results on the test set with a preset accuracy.
Particularly, when model training is performed with the object detection model constructed according to the embodiment of the invention, the deep learning framework adopted includes, but is not limited to, PyTorch, TensorFlow, Keras, and Caffe; the embodiment of the invention does not limit the selection of the deep learning framework.
In one embodiment, the activation function selected in the target detection model may be the SiLU (Sigmoid Linear Unit) activation function, an improved combination of Sigmoid and ReLU (Rectified Linear Unit). It is unbounded above, bounded below, smooth, and non-monotonic, may be regarded as a smooth ReLU activation function, and performs better than ReLU on deep models. The formula of the SiLU activation function is:

SiLU(x) = x · sigmoid(x) = x / (1 + e^(-x))

where x represents the input feature.
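The formula can be checked with a minimal plain-Python sketch, independent of any particular deep learning framework:

```python
import math

# SiLU(x) = x * sigmoid(x) = x / (1 + e^(-x)):
# unbounded above, bounded below, smooth, non-monotonic.
def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))

# SiLU(0) = 0; for large x it approaches x (ReLU-like); for very negative x
# it approaches 0 from below, which gives the small negative lower bound.
print(silu(0.0), silu(5.0), silu(-5.0))
```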
The embodiment of the invention determines the construction operation image of the construction operation process of the engineering vehicle, where the construction operation image simultaneously comprises the engineering vehicle and the construction operator; generates a sample training set comprising at least one construction operation image, where the construction operation image has a sample label of an engineering vehicle standard area and an operator standard area; inputs the sample training set into the effective feature extraction network layer in the network model and extracts features through different residual blocks to obtain effective feature parameters output by the effective feature extraction network layer; inputs the effective feature parameters into the feature fusion network layer in the network model to perform feature fusion, obtaining fusion feature parameters output by the feature fusion network layer; inputs the fusion feature parameters into the classification regression network layer in the network model to perform region prediction of the construction operation image, obtaining the prediction result output by the classification regression network layer; and, according to the prediction result, performs model training on the network model based on the engineering vehicle standard region and the operator standard region of the sample label to obtain a target detection model that meets the preset model training end condition. The network model comprises the effective feature extraction network layer, the feature fusion network layer and the classification regression network layer.
According to the technical scheme provided by the embodiment of the invention, automatic detection of the construction operation image in the construction operation process of the engineering vehicle is realized, and model training is carried out based on the sample labels having the engineering vehicle standard area and the operator standard area together with the model output results, so that the feature extraction capability of the target detection model is improved, and the detection accuracy and detection efficiency of construction operation detection are improved.
Example 3
Fig. 3 is a flowchart of a construction operation detection method provided in a third embodiment of the present invention, where the embodiment is applicable to a case of performing automatic detection of a construction operation based on a network model to improve detection efficiency and detection accuracy, the method may be performed by a construction operation detection device, where the construction operation detection device may be implemented in a form of hardware and/or software, and the construction operation detection device may be configured in an electronic device, where the electronic device may be a terminal device or a server, and the embodiment of the present invention is not limited thereto.
As shown in fig. 3, the method for detecting construction operation provided by the embodiment of the invention specifically includes the following steps:
S310, acquiring a construction operation image to be detected.
Specifically, in response to a demand for detecting the construction operation condition, a construction operation image of the construction operation process is first acquired, and the construction operation image to be detected is determined based on it.
According to the technical scheme provided by the embodiment of the invention, the construction operation image in the construction operation process is obtained in advance, and the construction operation image to be detected is determined, so that the construction operation image representing the construction operation process is determined, the acquisition of image data is completed, and the smooth development of the follow-up work is ensured.
S320, inputting the construction operation image to be detected into the target detection model to obtain a construction operation detection result.
The target detection model is generated by the construction operation detection model training method in the embodiment.
Specifically, the construction operation detection result can be obtained by inputting the construction operation image to be detected into the target detection model for inference. The target detection model is generated by the construction operation detection model training method in the above embodiment. The method for generating the target detection model is not limited by the embodiment of the invention.
Optionally, the construction operation detection result comprises an operator prediction result and an engineering vehicle prediction result; correspondingly, after inputting the construction operation image to be detected into the target detection model to obtain the construction operation detection result, the method comprises the following steps: determining a false alarm rate detection result of a construction operation image to be detected based on a preset personnel confidence threshold value and a preset vehicle confidence threshold value according to personnel confidence in an operation personnel prediction result and vehicle confidence in an engineering vehicle prediction result; determining an overlapping rate detection result of the construction work image to be detected according to the region overlapping condition of the operator prediction region in the operator prediction result and the region overlapping condition of the engineering vehicle prediction region in the engineering vehicle prediction result; and judging whether the constructor is constructed in a safe operation range in a construction scene corresponding to the construction operation image to be detected based on the preset safe construction operation area according to the operator prediction area in the construction operation detection result, and determining whether to generate early warning information according to the judgment result.
Specifically, by inputting the construction operation image to be detected into the target detection model, the construction operation detection result can be obtained, where the detection result includes, but is not limited to, an operator prediction result and an engineering vehicle prediction result. That is, after the target detection model performs the relevant feature extraction operations on the to-be-detected construction operation image input into it, the prediction results for the operator and the engineering vehicle in the construction operation image are output. Optionally, the prediction results may include the prediction category and the prediction probability value of the targets identified by the model as the operator and the engineering vehicle, which is not limited in the embodiment of the present invention.
Specifically, after the construction operation detection result of the target detection model is obtained, the false alarm rate detection result of the construction operation image to be detected can be determined according to the personnel confidence in the operator prediction result and the vehicle confidence in the engineering vehicle prediction result, based on the preset personnel confidence threshold and the preset vehicle confidence threshold. The preset personnel confidence threshold and the preset vehicle confidence threshold may be manually specified in advance, which is not limited in the embodiment of the present invention. Likewise, the method for determining the false alarm rate detection result of the construction operation image to be detected is not limited in the embodiment of the invention.
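The confidence-threshold filtering described above can be sketched as follows; the class names, threshold values, and detection tuple format are illustrative assumptions, not taken from the original:

```python
# Hedged sketch: count detections whose confidence falls below the preset
# per-class threshold as potential false alarms, and report their ratio.
def false_alarm_rate(detections, person_thresh=0.5, vehicle_thresh=0.6):
    thresholds = {"person": person_thresh, "vehicle": vehicle_thresh}
    if not detections:
        return 0.0
    false_alarms = sum(1 for cls, conf in detections if conf < thresholds[cls])
    return false_alarms / len(detections)

dets = [("person", 0.9), ("vehicle", 0.3), ("person", 0.4)]
print(false_alarm_rate(dets))  # 2 of 3 detections below threshold -> 0.666...
```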
Optionally, after the detection result of the target detection model on the construction operation is obtained, the overlapping rate detection result of the construction operation image to be detected can also be determined according to the region overlapping condition of the operator prediction region in the operator prediction result and the region overlapping condition of the engineering vehicle prediction region in the engineering vehicle prediction result. When determining the overlapping rate detection result of the construction operation image to be detected according to the region overlapping condition of the operator prediction region, a Non-Maximum Suppression (NMS) algorithm may be used to eliminate overlapping detection results, merging detection results with similar shapes and confidence scores into one detection result, which is not limited in the embodiment of the invention.
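A minimal plain-Python sketch of IoU-based non-maximum suppression; the box format (x1, y1, x2, y2, score) and the overlap threshold are illustrative assumptions:

```python
# Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2, ...).
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

# Keep the highest-scoring box, suppress any box overlapping it too strongly.
def nms(boxes, iou_thresh=0.5):
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept

dets = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (20, 20, 30, 30, 0.7)]
print(len(nms(dets)))  # the two overlapping boxes collapse into one -> 2
```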
Optionally, after the detection result of the construction operation by the target detection model is obtained, whether the constructor is working within the safe operation range in the construction scene corresponding to the construction operation image to be detected is judged according to the operator prediction region in the construction operation detection result and based on the preset safe construction operation area, and then whether to generate early warning information is further determined according to the judgment result. The range of the preset safe construction operation area may be manually specified in advance, which is not limited in the embodiment of the present invention.
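The safe-area judgment can be sketched as a point-in-rectangle test on the center of each predicted operator box; the safe-area coordinates and box format below are illustrative assumptions:

```python
# Hedged sketch: flag a worker whose predicted box center falls outside a
# preset rectangular safe construction operation area (coordinates illustrative).
SAFE_AREA = (100, 100, 500, 400)  # x1, y1, x2, y2

def outside_safe_area(box, safe=SAFE_AREA):
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return not (safe[0] <= cx <= safe[2] and safe[1] <= cy <= safe[3])

def warnings_for(worker_boxes):
    return [b for b in worker_boxes if outside_safe_area(b)]

workers = [(120, 150, 160, 250), (600, 300, 640, 380)]
print(warnings_for(workers))  # only the second worker triggers a warning
```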
The early warning information may be generated by visually displaying the positions and bounding boxes of the constructors and the engineering vehicles, and the range of the warning area. For example, constructors and engineering vehicles can be marked by rectangular frames of different colors, and warning areas can be marked by red rectangular frames; a text description can be added on the graphical interface, for example, indicating that a constructor has been detected in the construction area and the warning area has been triggered, so that a user can intuitively understand the content of the alarm information; and alarm information requiring a timely response can be configured to be delivered automatically by alarm emails, SMS messages, and telephone calls, so that the alarm information can be grasped in time and responded to promptly.
In a specific implementation of this embodiment, the operations performed after the construction operation detection result of the target detection model is obtained may be executed in parallel; no particular execution order is required.
According to the embodiment of the invention, the construction operation detection result comprising the operator prediction result and the engineering vehicle prediction result can be obtained by acquiring the construction operation image to be detected and inputting it into the target detection model, where the target detection model is generated by the construction operation detection model training method in the above embodiment. After the construction operation image to be detected is input into the target detection model to obtain the construction operation detection result, subsequent processing includes, but is not limited to, determining the false alarm rate detection result of the construction operation image to be detected, determining the overlapping rate detection result of the construction operation image to be detected, judging whether construction personnel are within the safe operation range in the construction scene corresponding to the construction operation image to be detected, and further determining whether to generate early warning information. According to the technical scheme provided by the embodiment of the invention, automatic detection of construction operation is performed based on the network model, so that the detection efficiency and the detection accuracy are improved, and based on the detection result and the early warning information, alarm information can be grasped in time and responded to promptly.
Example 4
Fig. 4 is a schematic structural diagram of a construction operation detection model training device according to a fourth embodiment of the present invention. As shown in fig. 4, the construction operation detection model training device includes: a construction image determination module 410, a training set generation module 420, and a model training module 430. Wherein:
a construction image determining module 410 for determining a construction work image of a construction work process of the engineering vehicle; the construction operation image simultaneously comprises an engineering vehicle and construction operators;
a training set generation module 420 for generating a sample training set including at least one construction work image; the construction operation image is provided with a sample label of an engineering vehicle standard area and an operator standard area;
the model training module 430 is configured to input the sample training set to a pre-constructed network model to obtain a model output, and perform model training on the network model according to the model output engineering vehicle prediction area and the engineering vehicle standard area, and according to the model output operator prediction area and the operator standard area, so as to obtain a target detection model for performing construction operation detection, where the preset model training end condition is satisfied;
The network model comprises an effective feature extraction network layer, a feature fusion network layer and a classification regression network layer.
The embodiment of the invention determines the construction operation image of the construction operation process of the engineering vehicle; the construction operation image simultaneously comprises an engineering vehicle and construction operators; generating a sample training set comprising at least one construction work image; the construction operation image is provided with a sample label of an engineering vehicle standard area and an operator standard area; inputting a sample training set into a pre-constructed network model to obtain model output, and carrying out model training on the network model according to an engineering vehicle prediction area and an engineering vehicle standard area output by the model and an operator prediction area and an operator standard area output by the model to obtain a target detection model for construction operation detection; the network model comprises an effective feature extraction network layer, a feature fusion network layer and a classification regression network layer. According to the technical scheme provided by the embodiment of the invention, the automatic detection of construction operation is performed based on the network model, so that the detection efficiency and the detection accuracy are improved.
Optionally, the model training module 430 includes:
the effective parameter acquisition unit is used for inputting the sample training set into an effective feature extraction network layer in the network model to perform feature extraction so as to obtain effective feature parameters output by the effective feature extraction network layer;
the fusion parameter acquisition unit is used for inputting the effective characteristic parameters into a characteristic fusion network layer in the network model to perform characteristic fusion, so as to obtain fusion characteristic parameters output by the characteristic fusion network layer;
the prediction result determining unit is used for inputting the fusion characteristic parameters into the classification regression network layer in the network model to perform regional prediction of the construction operation image, so as to obtain a prediction result output by the classification regression network layer;
the detection model acquisition unit is used for carrying out model training on the network model based on the engineering vehicle standard region and the operator standard region of the sample label according to the prediction result to obtain a target detection model which meets the preset model training ending condition.
Optionally, the effective parameter acquiring unit includes: the effective feature extraction network layer comprises a first residual block, a second residual block, a third residual block and a fourth residual block; the number and the size of convolution kernels in the first residual block, the second residual block, the third residual block and the fourth residual block are set to be different;
Correspondingly, inputting the sample training set to an effective feature extraction network layer in the network model for feature extraction to obtain effective feature parameters output by the effective feature extraction network layer, wherein the method comprises the following steps:
the first parameter determination subunit is used for inputting the sample training set into the first residual block to extract effective features and obtain first effective feature parameters output by the first residual block;
the second parameter determining subunit is used for inputting the first effective characteristic parameters into the second residual block to extract effective characteristics so as to obtain second effective characteristic parameters output by the second residual block;
the third parameter determining subunit is used for inputting the second effective characteristic parameters into the third residual block to extract effective characteristics so as to obtain third effective characteristic parameters output by the third residual block;
the fourth parameter determining subunit is used for inputting the third effective characteristic parameters into the fourth residual block to extract the effective characteristics and obtain fourth effective characteristic parameters output by the fourth residual block;
and the characteristic parameter generation subunit is used for generating the effective characteristic parameters comprising the second effective characteristic parameter, the third effective characteristic parameter and the fourth effective characteristic parameter.
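The four-residual-block chain above can be sketched in PyTorch as follows. The channel counts, strides, and kernel sizes are illustrative assumptions, since the original only states that they differ between blocks and that the last three outputs form the effective feature parameters:

```python
import torch
import torch.nn as nn

# Hedged sketch of a residual block with a strided skip connection.
class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel, stride=2, padding=kernel // 2),
            nn.SiLU(),
            nn.Conv2d(out_ch, out_ch, kernel, padding=kernel // 2),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=2)  # match shape of main path
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.conv(x) + self.skip(x))

# Hedged sketch of the effective feature extraction network layer: four chained
# residual blocks; the outputs of blocks 2-4 are returned as effective features.
class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.b1 = ResidualBlock(3, 32)
        self.b2 = ResidualBlock(32, 64)
        self.b3 = ResidualBlock(64, 128)
        self.b4 = ResidualBlock(128, 256)

    def forward(self, x):
        f1 = self.b1(x)
        f2 = self.b2(f1)
        f3 = self.b3(f2)
        f4 = self.b4(f3)
        return f2, f3, f4  # effective feature parameters passed on to fusion

feats = Backbone()(torch.randn(1, 3, 256, 256))
print([tuple(f.shape) for f in feats])
```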
Optionally, the construction image determining module 410 includes:
The operation video acquisition unit is used for acquiring operation video fragments in the construction operation process of the engineering vehicle acquired by the image acquisition equipment;
a key frame image acquisition unit for extracting image frames simultaneously containing the engineering vehicle and the construction operator from the operation video clip as key frame images;
the diffusion frame obtaining unit is used for carrying out image processing on the key frame image by adopting at least one image processing mode to obtain a diffusion frame image corresponding to the key frame image;
and a job image determining unit for determining the key frame image and the corresponding diffusion frame image as a construction job image.
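The derivation of diffusion frame images from a key frame can be sketched with two simple, dependency-free augmentations. The original does not name the image-processing modes, so horizontal flip and brightness shift are illustrative choices; images are nested lists of pixel values here to keep the sketch self-contained:

```python
# Hedged sketch: two illustrative image-processing modes applied to a key frame.
def horizontal_flip(img):
    return [row[::-1] for row in img]

def brightness_shift(img, delta=20):
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def diffusion_frames(key_frame):
    return [horizontal_flip(key_frame), brightness_shift(key_frame)]

key = [[10, 20], [30, 40]]
samples = [key] + diffusion_frames(key)  # key frame plus its diffusion frames
print(len(samples))  # 3 construction operation images from one key frame
```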
The construction operation detection model training device provided by the embodiment of the invention can execute the construction operation detection model training method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example 5
Fig. 5 is a schematic structural diagram of a construction operation detection device according to a fifth embodiment of the present invention. As shown in fig. 5, the construction operation detection device includes: a job image acquisition module 510 and a detection result obtaining module 520. Wherein:
the job image acquisition module 510 is used for acquiring a construction job image to be detected;
The detection result obtaining module 520 is configured to input a construction operation image to be detected into the target detection model, so as to obtain a construction operation detection result;
the target detection model can be generated by adopting the construction operation detection model training method described above.
According to the embodiment of the invention, the construction operation image to be detected is obtained, and the construction operation image to be detected is input into the target detection model to obtain the construction operation detection result, where the target detection model can be generated by adopting the construction operation detection model training method described above. According to the technical scheme provided by the embodiment of the invention, automatic detection of construction operation is performed based on the network model, so that the detection efficiency and the detection accuracy are improved.
Optionally, the construction operation detection result obtained by the detection result obtaining module 520 includes an operator prediction result and an engineering vehicle prediction result; correspondingly, the device further includes, after the detection result obtaining module 520, a post-detection processing unit, including:
the false alarm rate determining subunit is used for determining a false alarm rate detection result of the construction operation image to be detected based on a preset personnel confidence threshold and a preset vehicle confidence threshold according to the personnel confidence in the operator prediction result and the vehicle confidence in the engineering vehicle prediction result;
the overlapping rate determination subunit is used for determining an overlapping rate detection result of the construction operation image to be detected according to the region overlapping condition of the operator prediction region in the operator prediction result and the region overlapping condition of the engineering vehicle prediction region in the engineering vehicle prediction result; and
the early warning information generation subunit is used for judging whether constructors in a construction scene corresponding to the construction operation image to be detected are constructed in a safe operation range or not based on a preset safe construction operation area according to an operator prediction area in a construction operation detection result, and determining whether to generate early warning information or not according to a judgment result.
The construction operation detection device provided by the embodiment of the invention can execute the construction operation detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example 6
Fig. 6 shows a schematic diagram of an electronic device 600 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes at least one processor 610, and a memory, such as a Read-Only Memory (ROM) 620 and a Random Access Memory (RAM) 630, communicatively coupled to the at least one processor 610. The memory stores computer programs executable by the at least one processor, and the processor 610 may perform various suitable actions and processes according to the computer programs stored in the ROM 620 or loaded from the storage unit 680 into the RAM 630. Various programs and data required for the operation of the electronic device 600 may also be stored in the RAM 630. The processor 610, the ROM 620, and the RAM 630 are connected to each other by a bus 640. An input/output (I/O) interface 650 is also connected to the bus 640.
Various components in electronic device 600 are connected to I/O interface 650, including: an input unit 660 such as a keyboard, a mouse, etc.; an output unit 670 such as various types of displays, speakers, and the like; a storage unit 680 such as a magnetic disk, an optical disk, or the like; and a communication unit 690 such as a network card, modem, wireless communication transceiver, etc. The communication unit 690 allows the electronic device 600 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.
The processor 610 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 610 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, or microcontroller. The processor 610 performs the various methods and processes described above, such as the construction operation detection model training method.
In some embodiments, the construction operation detection model training method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 680. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 620 and/or the communication unit 690. When the computer program is loaded into the RAM 630 and executed by the processor 610, one or more steps of the construction operation detection model training method described above may be performed. Alternatively, in other embodiments, the processor 610 may be configured to perform the construction operation detection model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.