Detailed Description
The technical solution of the present invention will be described in detail with reference to the accompanying drawings and preferred embodiments.
In one embodiment, as shown in fig. 1, the invention relates to a method for segmenting open-pit mine area ground features based on a semantic segmentation technology, which comprises the following steps:
Step one (S100): obtain an image dataset of a mining area in RGB three-channel color. The images in the dataset cover different ground features of different open-pit mines.
Step two (S200): produce a mining area training set from the image dataset.
Preferably, in step two, producing the mining area training set from the image dataset comprises: labeling the ground features on each image in the image dataset, the labeled images forming the mining area training set, wherein the ground features include the mining area, the tailings pond, the dumping site, the mine pile, the industrial area, and the like.
Step three (S300): preprocess the images in the mining area training set to obtain a preprocessed mining area training set.
Preferably, in step three, the image preprocessing enlarges the mining area training set by a random transformation method, whose specific steps are as follows (a code sketch of these transformations is given after the list):
Step 3.1: randomly rotate the training set images by 90, 180, or 270 degrees;
Step 3.2: randomly flip the training set images left-right;
Step 3.3: apply a random gamma transformation to the training set images;
Step 3.4: apply blurring to the training set images;
Step 3.5: apply bilateral filtering and Gaussian filtering to the training set images;
Step 3.6: add random noise to the training set images;
Step 3.7: crop the training set images.
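The following is a minimal sketch of steps 3.1 to 3.7, assuming a Python implementation with NumPy and OpenCV; the function name, probabilities, and parameter values are illustrative and not taken from the text. For segmentation training, the geometric transforms (rotation, flip, crop) would also have to be applied identically to the label mask.

```python
import cv2
import numpy as np

def random_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Step 3.1: random rotation by 90, 180, or 270 degrees (or none)
    image = np.rot90(image, k=int(rng.integers(0, 4)))
    # Step 3.2: random left-right flip
    if rng.random() < 0.5:
        image = np.fliplr(image)
    image = np.ascontiguousarray(image)  # OpenCV requires contiguous memory
    # Step 3.3: random gamma transformation
    gamma = rng.uniform(0.7, 1.5)
    image = np.clip(255.0 * (image / 255.0) ** gamma, 0, 255).astype(np.uint8)
    # Step 3.4: blurring (a Gaussian blur is used here as the blur operation)
    if rng.random() < 0.5:
        image = cv2.GaussianBlur(image, (5, 5), 0)
    # Step 3.5: bilateral filtering
    if rng.random() < 0.5:
        image = cv2.bilateralFilter(image, 9, 75, 75)
    # Step 3.6: additive random noise
    noise = rng.normal(0.0, 10.0, image.shape)
    image = np.clip(image.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    # Step 3.7: random 256 x 256 crop (assumes the source image is at least 256 x 256)
    h, w = image.shape[:2]
    y = int(rng.integers(0, h - 256 + 1))
    x = int(rng.integers(0, w - 256 + 1))
    return image[y:y + 256, x:x + 256]
```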
Step four (S400): train different deep learning models with the preprocessed mining area training set to obtain at least two trained deep learning models.
To meet the requirements of the subsequent model fusion, several deep learning algorithms are selected and trained with the training set. Preferably, in step four, the different deep learning models include the multilayer convolutional neural network Unet and the multilayer convolutional neural network DeepLab V3. When these two networks are selected as the deep learning models, each is trained separately; the two training processes are independent of order. In this embodiment, the description proceeds by training Unet first and DeepLab V3 second, by way of example.
Further, when the deep learning model is the multilayer convolutional neural network Unet, training the multilayer convolutional neural network Unet by using the preprocessed mining area training set comprises the following steps:
Step 4.1: constructing the multilayer convolutional neural network Unet
The specific structure of the multilayer convolutional neural network Unet is shown in fig. 2, and the size of an input image block (i.e., an original image) is 256 × 256;
the first layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 64;
the second layer is a maximum pooling operation;
the third layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 128;
the fourth layer is a maximum pooling operation;
the fifth layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 256;
the sixth layer is a maximum pooling operation;
the seventh layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 512;
the eighth layer is a maximum pooling operation;
the ninth layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 1024;
the tenth layer is an upsampling operation whose output is concatenated with the seventh-layer convolution results to generate a feature map with 1024 channels;
the eleventh layer is a convolution operation, the convolution kernel size is 3 × 3, and the number of convolution kernels is 512;
the twelfth layer is an upsampling operation whose output is concatenated with the fifth-layer convolution results to generate a feature map with 512 channels;
the thirteenth layer is a convolution operation, the convolution kernel size is 3 × 3, and the number of convolution kernels is 256;
the fourteenth layer is an upsampling operation whose output is concatenated with the third-layer convolution results to generate a feature map with 256 channels;
the fifteenth layer is a convolution operation, the convolution kernel size is 3 × 3, and the number of convolution kernels is 128;
the sixteenth layer is an upsampling operation whose output is concatenated with the first-layer convolution results to generate a feature map with 128 channels;
the seventeenth layer is two convolution operations, the convolution kernel size is 3 × 3, and the number of convolution kernels is 64;
the eighteenth layer is a convolution operation, the convolution kernel size is 1 × 1, and the number of convolution kernels is 5.
In the constructed multilayer convolutional neural network Unet, the convolution operations extract high-level features of the image. The input to each maximum pooling operation generally comes from the preceding convolution operation; its main function is to provide strong robustness: because it takes the maximum within a small region, the pooled result is unchanged if other values in the region vary slightly or the image translates slightly. Pooling also reduces the number of parameters and helps prevent overfitting; since a pooling operation generally has no parameters, backpropagation only needs derivatives with respect to its inputs and requires no weight update. The upsampling operations restore image detail for the subsequent semantic segmentation, and concatenating the convolved high-resolution features strikes a compromise between high resolution and more abstract features, making the prediction more accurate.
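As a concrete reference, the following is a minimal sketch of the eighteen layers above written with TensorFlow/Keras (the description later mentions TensorFlow); the activations, padding, and upsampling mode are assumptions, since the text specifies only kernel sizes and counts.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # two 3x3 convolutions, as in layers 1, 3, 5, 7, 9 and 17 of the text
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(num_classes=5):
    inputs = layers.Input((256, 256, 3))
    c1 = conv_block(inputs, 64)        # layer 1
    p1 = layers.MaxPooling2D()(c1)     # layer 2
    c2 = conv_block(p1, 128)           # layer 3
    p2 = layers.MaxPooling2D()(c2)     # layer 4
    c3 = conv_block(p2, 256)           # layer 5
    p3 = layers.MaxPooling2D()(c3)     # layer 6
    c4 = conv_block(p3, 512)           # layer 7
    p4 = layers.MaxPooling2D()(c4)     # layer 8
    c5 = conv_block(p4, 1024)          # layer 9
    # decoder: upsample and concatenate with the encoder skip connections
    u1 = layers.Concatenate()([layers.UpSampling2D()(c5), c4])          # layer 10
    c6 = layers.Conv2D(512, 3, padding="same", activation="relu")(u1)   # layer 11
    u2 = layers.Concatenate()([layers.UpSampling2D()(c6), c3])          # layer 12
    c7 = layers.Conv2D(256, 3, padding="same", activation="relu")(u2)   # layer 13
    u3 = layers.Concatenate()([layers.UpSampling2D()(c7), c2])          # layer 14
    c8 = layers.Conv2D(128, 3, padding="same", activation="relu")(u3)   # layer 15
    u4 = layers.Concatenate()([layers.UpSampling2D()(c8), c1])          # layer 16
    c9 = conv_block(u4, 64)                                             # layer 17
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(c9)   # layer 18
    return tf.keras.Model(inputs, outputs)
```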
Step 4.2: offline training of the constructed multilayer convolutional neural network Unet
Using the preprocessed mining area training set, training parameters are set, a stochastic gradient descent algorithm performs steepest-descent optimization on the error gradient of the constructed multilayer convolutional neural network Unet, and the network is trained offline to obtain at least one trained Unet deep learning model. For example, the training parameter set in this step is the number of training iterations; setting it to 40000 and to 45000 and training the constructed Unet on the preprocessed mining area training set with stochastic gradient descent yields two trained Unet deep learning models, as sketched below.
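A sketch of the offline training in step 4.2, under the same assumptions as above; `train_ds` is a hypothetical tf.data.Dataset of (image, label-mask) pairs, and the learning rate is illustrative. Only the iteration counts of 40000 and 45000 come from the text.

```python
import tensorflow as tf

def train_unet(dataset: tf.data.Dataset, steps: int) -> tf.keras.Model:
    model = build_unet(num_classes=5)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # stochastic gradient descent
        loss="sparse_categorical_crossentropy",
    )
    # offline training for a fixed number of iterations
    model.fit(dataset.repeat(), steps_per_epoch=steps, epochs=1)
    return model

# two trained Unet checkpoints at the two iteration counts named in the text
unet_models = [train_unet(train_ds, steps) for steps in (40000, 45000)]
```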
Further, when the deep learning model is the multilayer convolutional neural network DeepLab V3, training DeepLab V3 with the preprocessed mining area training set comprises the following steps:
Step 4.3: constructing the multilayer convolutional neural network DeepLab V3
The specific structure of the multilayer convolutional neural network DeepLab V3 is shown in fig. 3, and the size of an input image block (i.e., an original image) is 256 × 256;
the first layer consists of three 3 × 3 convolution operations and one maximum pooling operation, with 128 convolution kernels;
the second layer consists of 3 residual units, each containing a 1 × 1 convolution with 64 kernels, a 3 × 3 convolution with 64 kernels, and a 1 × 1 convolution with stride 1 and 256 kernels;
the third layer consists of 4 residual units, each containing a 1 × 1 convolution with 128 kernels, a 3 × 3 convolution with 128 kernels, and a 1 × 1 convolution with stride 2 and 512 kernels;
the fourth layer consists of 6 residual units, each containing a 1 × 1 convolution with 256 kernels, a 3 × 3 atrous (hole) convolution with 256 kernels and rate 2, and a 1 × 1 convolution with 1024 kernels;
the fifth layer consists of 3 residual units, each containing a 1 × 1 convolution with 512 kernels, a 3 × 3 atrous convolution with 512 kernels and rate 4, and a 1 × 1 convolution with 2048 kernels;
the sixth layer comprises five branches: the 1st branch is a 1 × 1 convolution; the 2nd, 3rd, and 4th branches are 3 × 3 atrous convolutions with rates 12, 24, and 36 respectively; the 5th branch is a global average pooling operation followed by a 1 × 1 convolution; each of the five branches has 256 convolution kernels, and finally the five branches are merged and a 1 × 1 convolution outputs new 256-dimensional features (a code sketch of this branch structure is given after the list);
the seventh layer comprises a 3 × 3 convolution operation, a 1 × 1 convolution operation with 5 kernels, and a bilinear upsampling operation.
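The following is a minimal sketch of the five-branch sixth layer (the atrous spatial pyramid pooling module of DeepLab V3), under the same TensorFlow/Keras assumptions as the Unet sketch; `layers.Resizing` and `keepdims=True` assume TF 2.6 or later.

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp(x, filters=256):
    h, w = x.shape[1], x.shape[2]
    # branch 1: 1x1 convolution
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    # branches 2-4: 3x3 atrous convolutions with rates 12, 24, 36
    b2 = layers.Conv2D(filters, 3, dilation_rate=12, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, dilation_rate=24, padding="same", activation="relu")(x)
    b4 = layers.Conv2D(filters, 3, dilation_rate=36, padding="same", activation="relu")(x)
    # branch 5: global average pooling + 1x1 convolution, resized back to the map size
    b5 = layers.GlobalAveragePooling2D(keepdims=True)(x)
    b5 = layers.Conv2D(filters, 1, activation="relu")(b5)
    b5 = layers.Resizing(h, w, interpolation="bilinear")(b5)
    # merge the five branches and output 256-dimensional features with a 1x1 convolution
    merged = layers.Concatenate()([b1, b2, b3, b4, b5])
    return layers.Conv2D(filters, 1, activation="relu")(merged)
```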
Step 4.4: offline training of the constructed multilayer convolutional neural network DeepLab V3
Using the preprocessed mining area training set, training parameters are set, a stochastic gradient descent algorithm performs steepest-descent optimization on the error gradient of the constructed multilayer convolutional neural network DeepLab V3, and the network is trained offline to obtain at least one trained DeepLab V3 deep learning model. For example, the training parameter set in this step is the number of training iterations; setting it to 40000, 45000, and 50000 and training the constructed DeepLab V3 on the preprocessed mining area training set with stochastic gradient descent yields three trained DeepLab V3 deep learning models.
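Step 4.4 follows the same pattern as the Unet training sketch; `build_deeplab_v3` is a hypothetical constructor assembling the seven layers described above (including the `aspp` module), and `train_ds` is the same assumed dataset.

```python
def train_deeplab(dataset, steps):
    # build_deeplab_v3 is a hypothetical constructor for the seven layers above
    model = build_deeplab_v3(num_classes=5)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy")
    model.fit(dataset.repeat(), steps_per_epoch=steps, epochs=1)
    return model

# three trained checkpoints at the iteration counts named in the text
deeplab_models = [train_deeplab(train_ds, steps) for steps in (40000, 45000, 50000)]
```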
Step five (S500): perform ground-feature segmentation on the same target mining area image with each trained deep learning model to obtain the corresponding ground-feature segmentation results.
After at least two trained deep learning models have been obtained in step four, step five applies each trained model to the same target mining area image, producing one ground-feature segmentation result per model.
For example, when the multilayer convolutional neural networks Unet and DeepLab V3 are selected as the deep learning models, and adjusting the training parameters yields two trained Unet models and three trained DeepLab V3 models, this step uses all five models to segment the same target mine image, obtaining the ground-feature segmentation result corresponding to each model.
Step six (S600): fuse the ground-feature segmentation results to obtain the final mining area ground-feature segmentation result.
In step six, the ground-feature segmentation results corresponding to all the trained deep learning models obtained in step five are fused to obtain the final mining area ground-feature segmentation result.
Preferably, in step six, the results are fused by majority voting over the multiple predictions, which comprises the following steps:
Step 6.1: classify a pixel of the target mining area image with each trained deep learning model, and take the class predicted by the most models as the classification result of that pixel;
Step 6.2: traverse all pixels of the target mine area image by the method of step 6.1 to obtain the classification results of all pixels in the image;
Step 6.3: draw an image matrix from the classification results of all pixels to obtain the final mining area ground-feature segmentation result.
In semantic segmentation, a model predicts a class for every pixel of the image; each class corresponds to a pixel value, and the pixel values corresponding to the predicted classes are drawn into an image matrix to form the prediction result. When fusing the ground-feature segmentation results, each trained deep learning model classifies a given pixel (any point of the target mine area image), and the class predicted by the most models is taken as that pixel's classification; after all pixels are traversed by the method of step 6.1, the classification results of all pixels in the target mine area image are obtained; finally, the image matrix drawn from these classification results is the final mining area ground-feature segmentation result. Per-pixel majority voting in the model fusion removes pixels with obvious classification errors and improves the predictive ability of the model to a great extent (a voting sketch follows).
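A minimal sketch of the per-pixel majority vote of steps 6.1 to 6.3 in NumPy; `models` is assumed to be the list of five trained Keras models from step five, and `image` a preprocessed (H, W, 3) array of the target mine image.

```python
import numpy as np

def fuse_by_voting(models, image: np.ndarray, num_classes: int = 5) -> np.ndarray:
    # each model predicts an (H, W) class map for the same target image
    class_maps = [np.argmax(m.predict(image[np.newaxis]), axis=-1)[0] for m in models]
    votes = np.stack(class_maps)              # shape (n_models, H, W)
    # count the votes per class at every pixel and keep the majority class
    counts = np.stack([(votes == c).sum(axis=0) for c in range(num_classes)])
    return counts.argmax(axis=0)              # final (H, W) segmentation matrix
```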
Compared with the prior art, the invention has the following beneficial effects:
1) the open-pit mine area ground-feature segmentation method based on the semantic segmentation technology trains deep learning models with image blocks of different ground features from different open-pit mines and segments the open-pit mine ground features with the trained models, overcoming the inability of traditional image processing algorithms to segment mine area ground features accurately;
2) the method can efficiently identify ground features of the same type across different mine areas; it is developed with Google's open-source deep learning framework TensorFlow and accelerates the deep learning algorithms with NVIDIA GPUs, so it can meet the real-time requirements of industrial applications;
3) the method adopts per-pixel majority voting in its model fusion, which removes pixels with obvious classification errors and greatly improves the predictive ability of the model;
4) the method offers high precision, a high degree of automation, and a simple processing flow, and is of great significance for ground-feature classification in open-pit mine areas.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present invention, and although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the invention, and these fall within the protection scope of the invention. Therefore, the protection scope of this patent shall be subject to the appended claims.