Detailed Description
The technical solution of the present invention will be described in detail with reference to the accompanying drawings and preferred embodiments.
In one embodiment, as shown in fig. 1, the invention relates to a method for segmenting open-pit mine area ground features based on a semantic segmentation technology, which comprises the following steps:
Step one (S100): obtain an image dataset of a mining area in RGB three-channel color. The images in the dataset cover different ground features of different open-pit mines.
Step two (S200): produce a mining area training set from the image dataset.
Preferably, in step two, producing the mining area training set from the image dataset comprises: labeling the ground features on each image in the image dataset, the labeled images forming the mining area training set, wherein the ground features include the mining area, the tailings pond, the dumping site, the mine pile, the industrial area, and the like.
Step three (S300): preprocess the images in the mining area training set to obtain a preprocessed mining area training set.
Preferably, in step three, the image preprocessing enlarges the mining area training set by a random transformation method, whose specific steps are as follows (a code sketch of these transformations is given after the list):
Step 3.1: randomly rotate the training set images by 90, 180, or 270 degrees;
Step 3.2: randomly flip the training set images left-right;
Step 3.3: apply a random gamma transformation to the training set images;
Step 3.4: apply blurring to the training set images;
Step 3.5: apply bilateral filtering and Gaussian filtering to the training set images;
Step 3.6: add random noise to the training set images;
Step 3.7: crop the training set images.
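The following is a minimal sketch of steps 3.1 to 3.7, assuming a Python implementation with NumPy and OpenCV; the function name, probabilities, and parameter values are illustrative and not taken from the text. For segmentation training, the geometric transforms (rotation, flip, crop) would also have to be applied identically to the label mask.

```python
import cv2
import numpy as np

def random_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Step 3.1: random rotation by 90, 180, or 270 degrees (or none)
    image = np.rot90(image, k=int(rng.integers(0, 4)))
    # Step 3.2: random left-right flip
    if rng.random() < 0.5:
        image = np.fliplr(image)
    image = np.ascontiguousarray(image)  # OpenCV requires contiguous memory
    # Step 3.3: random gamma transformation
    gamma = rng.uniform(0.7, 1.5)
    image = np.clip(255.0 * (image / 255.0) ** gamma, 0, 255).astype(np.uint8)
    # Step 3.4: blurring (a Gaussian blur is used here as the blur operation)
    if rng.random() < 0.5:
        image = cv2.GaussianBlur(image, (5, 5), 0)
    # Step 3.5: bilateral filtering
    if rng.random() < 0.5:
        image = cv2.bilateralFilter(image, 9, 75, 75)
    # Step 3.6: additive random noise
    noise = rng.normal(0.0, 10.0, image.shape)
    image = np.clip(image.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    # Step 3.7: random 256 x 256 crop (assumes the source image is at least 256 x 256)
    h, w = image.shape[:2]
    y = int(rng.integers(0, h - 256 + 1))
    x = int(rng.integers(0, w - 256 + 1))
    return image[y:y + 256, x:x + 256]
```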
Step four (S400): train different deep learning models with the preprocessed mining area training set to obtain at least two trained deep learning models.
To meet the requirements of the subsequent model fusion, several deep learning algorithms are selected and trained with the training set. Preferably, in step four, the different deep learning models include the multilayer convolutional neural network Unet and the multilayer convolutional neural network DeepLab V3. When these two networks are selected as the deep learning models, each is trained separately; the two training processes are independent of order. In this embodiment, the description proceeds by training Unet first and DeepLab V3 second, by way of example.
Further, when the deep learning model is the multilayer convolutional neural network Unet, training the multilayer convolutional neural network Unet by using the preprocessed mining area training set comprises the following steps:
Step 4.1: constructing the multilayer convolutional neural network Unet
The specific structure of the multilayer convolutional neural network Unet is shown in fig. 2, and the size of an input image block (i.e., an original image) is 256 × 256;
the first layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 64;
the second layer is a maximum pooling operation;
the third layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 128;
the fourth layer is a maximum pooling operation;
the fifth layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 256;
the sixth layer is a maximum pooling operation;
the seventh layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 512;
the eighth layer is a maximum pooling operation;
the ninth layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 1024;
the tenth layer is an upsampling operation whose output is concatenated with the seventh-layer convolution results to generate a feature map with 1024 channels;
the eleventh layer is a convolution operation, the convolution kernel size is 3 × 3, and the number of convolution kernels is 512;
the twelfth layer is an upsampling operation whose output is concatenated with the fifth-layer convolution results to generate a feature map with 512 channels;
the thirteenth layer is a convolution operation, the convolution kernel size is 3 × 3, and the number of convolution kernels is 256;
the fourteenth layer is an upsampling operation whose output is concatenated with the third-layer convolution results to generate a feature map with 256 channels;
the fifteenth layer is a convolution operation, the convolution kernel size is 3 × 3, and the number of convolution kernels is 128;
the sixteenth layer is an upsampling operation whose output is concatenated with the first-layer convolution results to generate a feature map with 128 channels;
the seventeenth layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 64;
the eighteenth layer is a convolution operation, the convolution kernel size is 1 × 1, and the number of convolution kernels is 5.
In the constructed multilayer convolutional neural network Unet, the convolution operations extract high-level features of the image. The input to each maximum pooling operation generally comes from the preceding convolution operation; its main function is to provide strong robustness: because it takes the maximum within a small region, the pooled result is unchanged if other values in the region vary slightly or the image translates slightly. Pooling also reduces the number of parameters and helps prevent overfitting; since a pooling operation generally has no parameters, backpropagation only needs derivatives with respect to its inputs and requires no weight update. The upsampling operations restore image detail for the subsequent semantic segmentation, and concatenating the convolved high-resolution features strikes a compromise between high resolution and more abstract features, making the prediction more accurate.
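As a concrete reference, the following is a minimal sketch of the eighteen layers above written with TensorFlow/Keras (the description later mentions TensorFlow); the activations, padding, and upsampling mode are assumptions, since the text specifies only kernel sizes and counts.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # two 3x3 convolutions, as in layers 1, 3, 5, 7, 9 and 17 of the text
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(num_classes=5):
    inputs = layers.Input((256, 256, 3))
    c1 = conv_block(inputs, 64)        # layer 1
    p1 = layers.MaxPooling2D()(c1)     # layer 2
    c2 = conv_block(p1, 128)           # layer 3
    p2 = layers.MaxPooling2D()(c2)     # layer 4
    c3 = conv_block(p2, 256)           # layer 5
    p3 = layers.MaxPooling2D()(c3)     # layer 6
    c4 = conv_block(p3, 512)           # layer 7
    p4 = layers.MaxPooling2D()(c4)     # layer 8
    c5 = conv_block(p4, 1024)          # layer 9
    # decoder: upsample and concatenate with the encoder skip connections
    u1 = layers.Concatenate()([layers.UpSampling2D()(c5), c4])          # layer 10
    c6 = layers.Conv2D(512, 3, padding="same", activation="relu")(u1)   # layer 11
    u2 = layers.Concatenate()([layers.UpSampling2D()(c6), c3])          # layer 12
    c7 = layers.Conv2D(256, 3, padding="same", activation="relu")(u2)   # layer 13
    u3 = layers.Concatenate()([layers.UpSampling2D()(c7), c2])          # layer 14
    c8 = layers.Conv2D(128, 3, padding="same", activation="relu")(u3)   # layer 15
    u4 = layers.Concatenate()([layers.UpSampling2D()(c8), c1])          # layer 16
    c9 = conv_block(u4, 64)                                             # layer 17
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(c9)   # layer 18
    return tf.keras.Model(inputs, outputs)
```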
Step 4.2: offline training of the constructed multilayer convolutional neural network Unet
Using the preprocessed mining area training set, training parameters are set, a stochastic gradient descent algorithm performs steepest-descent optimization on the error gradient of the constructed multilayer convolutional neural network Unet, and the network is trained offline to obtain at least one trained Unet deep learning model. For example, the training parameter set in this step is the number of training iterations; setting it to 40000 and to 45000 and training the constructed Unet on the preprocessed mining area training set with stochastic gradient descent yields two trained Unet deep learning models, as sketched below.
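A sketch of the offline training in step 4.2, under the same assumptions as above; `train_ds` is a hypothetical tf.data.Dataset of (image, label-mask) pairs, and the learning rate is illustrative. Only the iteration counts of 40000 and 45000 come from the text.

```python
import tensorflow as tf

def train_unet(dataset: tf.data.Dataset, steps: int) -> tf.keras.Model:
    model = build_unet(num_classes=5)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # stochastic gradient descent
        loss="sparse_categorical_crossentropy",
    )
    # offline training for a fixed number of iterations
    model.fit(dataset.repeat(), steps_per_epoch=steps, epochs=1)
    return model

# two trained Unet checkpoints at the two iteration counts named in the text
unet_models = [train_unet(train_ds, steps) for steps in (40000, 45000)]
```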
Further, when the deep learning model is the multilayer convolutional neural network DeepLab V3, training DeepLab V3 with the preprocessed mining area training set comprises the following steps:
Step 4.3: constructing the multilayer convolutional neural network DeepLab V3
The specific structure of the multilayer convolutional neural network DeepLab V3 is shown in fig. 3, and the size of an input image block (i.e., an original image) is 256 × 256;
the first layer consists of three 3 × 3 convolution operations and one maximum pooling operation, with 128 convolution kernels;
the second layer consists of 3 residual units, each containing a 1 × 1 convolution with 64 kernels, a 3 × 3 convolution with 64 kernels, and a 1 × 1 convolution with stride 1 and 256 kernels;
the third layer consists of 4 residual units, each containing a 1 × 1 convolution with 128 kernels, a 3 × 3 convolution with 128 kernels, and a 1 × 1 convolution with stride 2 and 512 kernels;
the fourth layer consists of 6 residual units, each containing a 1 × 1 convolution with 256 kernels, a 3 × 3 atrous (hole) convolution with 256 kernels and rate 2, and a 1 × 1 convolution with 1024 kernels;
the fifth layer consists of 3 residual units, each containing a 1 × 1 convolution with 512 kernels, a 3 × 3 atrous convolution with 512 kernels and rate 4, and a 1 × 1 convolution with 2048 kernels;
the sixth layer comprises five branches: the 1st branch is a 1 × 1 convolution; the 2nd, 3rd, and 4th branches are 3 × 3 atrous convolutions with rates 12, 24, and 36 respectively; the 5th branch is a global average pooling operation followed by a 1 × 1 convolution; each of the five branches has 256 convolution kernels, and finally the five branches are merged and a 1 × 1 convolution outputs new 256-dimensional features (a code sketch of this branch structure is given after the list);
the seventh layer comprises a 3 × 3 convolution operation, a 1 × 1 convolution operation with 5 kernels, and a bilinear upsampling operation.
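The following is a minimal sketch of the five-branch sixth layer (the atrous spatial pyramid pooling module of DeepLab V3), under the same TensorFlow/Keras assumptions as the Unet sketch; `layers.Resizing` and `keepdims=True` assume TF 2.6 or later.

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp(x, filters=256):
    h, w = x.shape[1], x.shape[2]
    # branch 1: 1x1 convolution
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    # branches 2-4: 3x3 atrous convolutions with rates 12, 24, 36
    b2 = layers.Conv2D(filters, 3, dilation_rate=12, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, dilation_rate=24, padding="same", activation="relu")(x)
    b4 = layers.Conv2D(filters, 3, dilation_rate=36, padding="same", activation="relu")(x)
    # branch 5: global average pooling + 1x1 convolution, resized back to the map size
    b5 = layers.GlobalAveragePooling2D(keepdims=True)(x)
    b5 = layers.Conv2D(filters, 1, activation="relu")(b5)
    b5 = layers.Resizing(h, w, interpolation="bilinear")(b5)
    # merge the five branches and output 256-dimensional features with a 1x1 convolution
    merged = layers.Concatenate()([b1, b2, b3, b4, b5])
    return layers.Conv2D(filters, 1, activation="relu")(merged)
```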
Step 4.4: offline training of the constructed multilayer convolutional neural network DeepLab V3
Using the preprocessed mining area training set, training parameters are set, a stochastic gradient descent algorithm performs steepest-descent optimization on the error gradient of the constructed multilayer convolutional neural network DeepLab V3, and the network is trained offline to obtain at least one trained DeepLab V3 deep learning model. For example, the training parameter set in this step is the number of training iterations; setting it to 40000, 45000, and 50000 and training the constructed DeepLab V3 on the preprocessed mining area training set with stochastic gradient descent yields three trained DeepLab V3 deep learning models.
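Step 4.4 follows the same pattern as the Unet training sketch; `build_deeplab_v3` is a hypothetical constructor assembling the seven layers described above (including the `aspp` module), and `train_ds` is the same assumed dataset.

```python
def train_deeplab(dataset, steps):
    # build_deeplab_v3 is a hypothetical constructor for the seven layers above
    model = build_deeplab_v3(num_classes=5)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy")
    model.fit(dataset.repeat(), steps_per_epoch=steps, epochs=1)
    return model

# three trained checkpoints at the iteration counts named in the text
deeplab_models = [train_deeplab(train_ds, steps) for steps in (40000, 45000, 50000)]
```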
Step five (S500): perform ground-feature segmentation on the same target mining area image with each trained deep learning model to obtain the corresponding ground-feature segmentation results.
After at least two trained deep learning models have been obtained in step four, step five applies each trained model to the same target mining area image, producing one ground-feature segmentation result per model.
For example, when the multilayer convolutional neural networks Unet and DeepLab V3 are selected as the deep learning models, and adjusting the training parameters yields two trained Unet models and three trained DeepLab V3 models, this step uses all five models to segment the same target mine image, obtaining the ground-feature segmentation result corresponding to each model.
Step six (S600): fuse the ground-feature segmentation results to obtain the final mining area ground-feature segmentation result.
In step six, the ground-feature segmentation results corresponding to all the trained deep learning models obtained in step five are fused to obtain the final mining area ground-feature segmentation result.
Preferably, in step six, the results are fused by majority voting over the multiple predictions, which comprises the following steps:
Step 6.1: classify a pixel of the target mining area image with each trained deep learning model, and take the class predicted by the most models as the classification result of that pixel;
Step 6.2: traverse all pixels of the target mine area image by the method of step 6.1 to obtain the classification results of all pixels in the image;
Step 6.3: draw an image matrix from the classification results of all pixels to obtain the final mining area ground-feature segmentation result.
In semantic segmentation, a model predicts a class for every pixel of the image; each class corresponds to a pixel value, and the pixel values corresponding to the predicted classes are drawn into an image matrix to form the prediction result. When fusing the ground-feature segmentation results, each trained deep learning model classifies a given pixel (any point of the target mine area image), and the class predicted by the most models is taken as that pixel's classification; after all pixels are traversed by the method of step 6.1, the classification results of all pixels in the target mine area image are obtained; finally, the image matrix drawn from these classification results is the final mining area ground-feature segmentation result. Per-pixel majority voting in the model fusion removes pixels with obvious classification errors and improves the predictive ability of the model to a great extent (a voting sketch follows).
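A minimal sketch of the per-pixel majority vote of steps 6.1 to 6.3 in NumPy; `models` is assumed to be the list of five trained Keras models from step five, and `image` a preprocessed (H, W, 3) array of the target mine image.

```python
import numpy as np

def fuse_by_voting(models, image: np.ndarray, num_classes: int = 5) -> np.ndarray:
    # each model predicts an (H, W) class map for the same target image
    class_maps = [np.argmax(m.predict(image[np.newaxis]), axis=-1)[0] for m in models]
    votes = np.stack(class_maps)              # shape (n_models, H, W)
    # count the votes per class at every pixel and keep the majority class
    counts = np.stack([(votes == c).sum(axis=0) for c in range(num_classes)])
    return counts.argmax(axis=0)              # final (H, W) segmentation matrix
```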
Compared with the prior art, the invention has the following beneficial effects:
1) the open-pit mine area ground-feature segmentation method based on the semantic segmentation technology trains deep learning models with image blocks of different ground features from different open-pit mines and segments the open-pit mine ground features with the trained models, overcoming the inability of traditional image processing algorithms to segment mine area ground features accurately;
2) the method can efficiently identify ground features of the same type across different mine areas; it is developed with Google's open-source deep learning framework TensorFlow and accelerates the deep learning algorithms with NVIDIA GPUs, so it can meet the real-time requirements of industrial applications;
3) the method adopts per-pixel majority voting in its model fusion, which removes pixels with obvious classification errors and greatly improves the predictive ability of the model;
4) the method offers high precision, a high degree of automation, and a simple processing flow, and is of great significance for ground-feature classification in open-pit mine areas.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present invention, and although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the invention, and these fall within the protection scope of the invention. Therefore, the protection scope of this patent shall be subject to the appended claims.