Unmanned aerial vehicle remote sensing image road information extraction method based on a multi-scale generative adversarial network

Technical Field
The invention relates to the technical field of automatic processing of unmanned aerial vehicle remote sensing images, and in particular to a method for extracting road information from high-resolution unmanned aerial vehicle remote sensing images based on a generative adversarial network fused with multi-scale image processing.
Background
Unmanned aerial vehicle remote sensing is one of the development trends of remote sensing; it offers strong timeliness, pertinence and high flexibility in data acquisition, and is an important means of acquiring remote sensing data. Roads are among the most common ground-feature information in remote sensing images, and extracting road information is of great significance in fields that bear on the national economy and people's livelihood, such as military strategy, space mapping, urban construction, traffic management and traffic navigation.
In recent years, with the rapid development of deep learning, a wide range of machine-learning fields, including computer vision, have rapidly been taken over by deep learning, covering image classification, target detection and image semantic segmentation. Compared with traditional algorithms, deep learning improves accuracy by roughly 20%-30%, which is mainly attributable to the strong ability of convolutional neural networks to learn image features, an ability that traditional algorithms based on pixel and boundary identification cannot match.
Although many existing convolutional neural network models have performed well in image semantic segmentation, some features in the training data are sometimes not learned well. In the image semantic segmentation task in particular, segmentation models typically predict the class of each pixel; this may be highly accurate at the pixel level, but the correlation between pixels is easily ignored, so that objects in the segmentation result are incomplete, or the sizes and shapes of some objects differ from those in the labels. Moreover, the complexity and variability of real scenes mean that convolutional neural networks often lack generalization; the factors behind this lack of generalization include large variation in target objects, occlusion and overlap of objects in different scenes, a lack of high-resolution features, illumination changes, and the like.
The generative adversarial network is a method proposed to solve the above problems. The generative model in a generative adversarial network learns, through a convolutional neural network, to map random noise to images similar to the input data distribution, while the discrimination network measures the difference between the generated fake image and the original input image, driving the generated fake image as close as possible to the original image until the two can no longer be distinguished.
In the image semantic segmentation task, the generative model in a generative adversarial network learns the features of the input RGB image to generate a pixel-level probability map of label-class predictions, and the discrimination network judges the difference between the probability map generated by the generative model and the real sample label. Compared with a traditional convolutional neural network, the generative adversarial network model not only improves the completeness of individual objects in the semantic segmentation result, but also keeps objects mutually independent and improves segmentation accuracy.
Owing to the characteristics of the unmanned aerial vehicle remote sensing platform, the flight heights of different flights often differ, so the imaged size of the same ground object varies. For a road region, when the unmanned aerial vehicle flies low, the road region may occupy more than 90% of a single image, or even 100%; when it flies high, the road region may occupy only 10% of a single image or even less. In a conventional convolutional neural network structure, once the network model is designed, performing convolution with a large kernel to extract features tends to ignore small targets, while performing convolution with a small kernel easily produces discontinuities in the segmentation of large targets, degrading image segmentation accuracy.
Disclosure of Invention
The invention aims to solve the problem that road region extraction accuracy suffers when, owing to inconsistent unmanned aerial vehicle flight heights, the road region occupies too large or too small a proportion of a single remote sensing image at imaging time, and provides an unmanned aerial vehicle image road information extraction method based on a multi-scale generative adversarial network.
To achieve this purpose, the invention provides an unmanned aerial vehicle image road information extraction method based on a multi-scale generative adversarial network, characterized by comprising the following steps:
(1) obtaining training data
Cut an original unmanned aerial vehicle remote sensing image into a series of n × n remote sensing images, then make label images marking the road area, and take each remote sensing image together with its corresponding label image as training data;
(2) building a generating network
2.1) in the generating network, the RGB three-channel image of the n × n remote sensing image is subjected to a convolution operation and a deconvolution operation to obtain RGB three-channel images of sizes 0.5n × 0.5n and 2n × 2n, respectively;
2.2) in the generating network, the 2n × 2n RGB three-channel image obtained in step 2.1) is passed through an end-to-end trained image segmentation network to obtain a 2n × 2n classification probability feature map, and an n × n probability feature map is then obtained by a convolution operation;
2.3) in the generating network, the 0.5n × 0.5n RGB three-channel image obtained in step 2.1) is passed through an end-to-end trained image segmentation network of the same structure as in step 2.2) to obtain a 0.5n × 0.5n classification probability feature map, and an n × n probability feature map is then obtained by a deconvolution operation;
2.4) in the generating network, the RGB image of the n × n remote sensing image is passed through an image segmentation network of the same structure as in step 2.2) to obtain an n × n classification probability feature map, namely an n × n probability feature map;
2.5) in the generating network, the three n × n probability feature maps obtained in steps 2.2), 2.3) and 2.4) are finally fused by pixel-by-pixel addition to obtain the output feature map of the generating network;
(3) an n × n remote sensing image from the training data is input into the generating network constructed in step (2) to obtain an output feature map; the output feature map and the n × n remote sensing image are each subjected to one convolution operation, and the resulting feature maps are concatenated as the input of the discrimination network, which produces an output between 0 and 1; the discrimination network treats this input as a fake sample, so its expected output is 0, and the error is obtained by subtracting the discrimination network's output from this expected output;
(4) the n × n remote sensing image from the training data and its corresponding label image are each subjected to one convolution operation, and the resulting feature maps are concatenated as the input of the discrimination network, which produces an output between 0 and 1; the discrimination network treats this input as a real sample, so its expected output is 1, and the error is obtained by subtracting the discrimination network's output from this expected output;
(5) the errors obtained in steps (3) and (4) are back-propagated to update the parameters of the generating network and the discrimination network, wherein the end-to-end trained image segmentation networks in steps 2.2), 2.3) and 2.4) share weight parameters;
(6) the network is trained through steps (3), (4) and (5) on all remote sensing images and their corresponding label images in the training data obtained in step (1), so that the generating network and the discrimination network of the generative adversarial network reach an equilibrium state in which the output feature map (i.e., the fake map) produced by the generating network differs so little from the label images that the discrimination network cannot tell whether its input comes from a label image or from the generating network's output feature map (i.e., the fake map);
(7) the generating network of the generative adversarial network in the equilibrium state is taken out and applied on its own: a remote sensing image captured by an actual unmanned aerial vehicle is cut into a series of n × n remote sensing images, these are used as input, and the output feature map of the generating network is taken as the segmentation result, namely the extracted road region image.
The object of the invention is thus achieved.
The invention relates to an unmanned aerial vehicle image road information extraction method based on a multi-scale generative adversarial network. First, a convolution operation and a deconvolution operation are applied to the remote sensing image to obtain an image with its length and width halved and an image with its length and width doubled. Second, the images at the three scales are passed through an end-to-end trained image segmentation network to obtain pixel-level prediction probability maps, i.e., output feature maps, at the three corresponding scales. Third, the pixel-level output feature maps of the three scales are unified to the size of the original training images through convolution and deconvolution operations, and the features of the three scales are fused by pixel-by-pixel addition. Finally, the output feature map fusing the three scales is input into the discrimination network and compared against the real sample label to obtain an error, which is back-propagated to update the parameters of the generating network and the discrimination network. After training on a sufficient amount of data, the generating network and the discrimination network of the generative adversarial network reach an equilibrium in which the fake map produced by the generating network differs little from the real label image, so that the discrimination network cannot tell whether its input comes from the label image or from the fake map produced by the generating network. In application, the output of the generating network of the generative adversarial network is taken as the segmentation result, namely the extracted road region image.
According to the invention, the features of unmanned aerial vehicle remote sensing images are learned by a convolutional neural network, the advantages of the generative adversarial network are exploited, and a multi-scale image processing method is fused in, so that the road region can be extracted well even when it occupies too large or too small a proportion of a single image, improving the road region segmentation accuracy in unmanned aerial vehicle remote sensing images.
Drawings
FIG. 1 is a diagram of the overall structure of a generative adversarial network;
FIG. 2 is a flowchart of an embodiment of the unmanned aerial vehicle image road information extraction method based on a multi-scale generative adversarial network according to the present invention;
FIG. 3 is a diagram of the fused multi-scale generating network architecture of the present invention;
FIG. 4 is a diagram of a discrimination network architecture;
FIG. 5 is a set of comparison images of road region images output by the generating network of the present invention with and without fused multi-scale features;
FIG. 6 is another set of comparison images of road region images output by the generating network of the present invention with and without fused multi-scale features.
Detailed Description
The following description of embodiments of the present invention is provided with reference to the accompanying drawings so that those skilled in the art can better understand the present invention. It should be expressly noted that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.
Fig. 1 is a diagram of the overall structure of the generative adversarial network.
As shown in fig. 1, in the generative adversarial network, the remote sensing image is input into the generating network to obtain the fake map, i.e., the output feature map; the label image or the fake map, together with the remote sensing image, is input into the discrimination network to obtain a true/false probability, and the error is obtained by subtracting this probability from the expected output of 1 or 0. The error is back-propagated and the parameters of the generating network and the discrimination network are updated. The training remote sensing images and their corresponding label images are input continuously until the generating network and the discrimination network reach an equilibrium state in which the output feature map, i.e., the fake map, produced by the generating network differs little from the label image, so that the discrimination network cannot tell whether its input comes from the label image or from the fake map. The resulting generating network can then be applied to the segmentation of remote sensing images captured by an actual unmanned aerial vehicle.
Fig. 2 is a flowchart of an embodiment of the unmanned aerial vehicle image road information extraction method based on a multi-scale generative adversarial network.
In this embodiment, as shown in fig. 2, the unmanned aerial vehicle image road information extraction method based on a multi-scale generative adversarial network of the present invention includes the following steps:
step S1: obtaining training data
An original unmanned aerial vehicle remote sensing image is cut into a series of n × n remote sensing images, label images marking the road area are then made, and each remote sensing image together with its corresponding label image is taken as training data.
In this embodiment, the original unmanned aerial vehicle remote sensing images are cut into a series of 500 × 500 remote sensing images, and label images marking the road areas are then made manually. To verify the object segmentation capability of the present invention, 90% of the remote sensing images and their corresponding label images are used as training data in this embodiment, and the remaining 10% are used as test data.
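For concreteness, a minimal sketch of this tiling and train/test split follows. The file layout, helper names and the use of PIL are illustrative assumptions rather than part of the original disclosure; labels are assumed to share filenames with their images.

```python
# A minimal sketch of the tiling step (assumption: raw UAV images and their
# road-label masks are ordinary image files with matching names).
import os
import random
from PIL import Image

def tile_image(path, n=500):
    """Cut one image into non-overlapping n x n tiles (edge remainders dropped)."""
    img = Image.open(path)
    w, h = img.size
    return [img.crop((x, y, x + n, y + n))
            for y in range(0, h - n + 1, n)
            for x in range(0, w - n + 1, n)]

def build_dataset(image_dir, label_dir, n=500, train_ratio=0.9, seed=0):
    pairs = []
    for name in sorted(os.listdir(image_dir)):
        imgs = tile_image(os.path.join(image_dir, name), n)
        labs = tile_image(os.path.join(label_dir, name), n)
        pairs.extend(zip(imgs, labs))
    random.Random(seed).shuffle(pairs)
    split = int(train_ratio * len(pairs))
    return pairs[:split], pairs[split:]   # 90% training, 10% test
```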
Step S2: building a generative network
As shown in fig. 3, in the generating network, the RGB three-channel image I of the n × n remote sensing image is subjected to a convolution operation and a deconvolution operation to obtain an RGB three-channel image I2 of size 0.5n × 0.5n and an RGB three-channel image I1 of size 2n × 2n, respectively. In this embodiment, I1 and I2 are 1000 × 1000 and 250 × 250, respectively.
The obtained 2n × 2n RGB three-channel image I1 is passed through an end-to-end trained image segmentation network to obtain a 2n × 2n classification probability feature map I3, and an n × n probability feature map I4 is then obtained by a convolution operation;
The obtained 0.5n × 0.5n RGB three-channel image I2 is passed through an end-to-end trained image segmentation network of the same structure to obtain a 0.5n × 0.5n classification probability feature map I5, and an n × n probability feature map I6 is then obtained by a deconvolution operation;
The RGB image I of the n × n remote sensing image is passed through an image segmentation network of the same structure to obtain an n × n classification probability feature map, namely the n × n probability feature map I7;
Finally, the three obtained n × n probability feature maps I4, I6 and I7 are fused by pixel-by-pixel addition, combining the image features of the three scales to obtain the output feature map I8 of the generating network.
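To make this structure concrete, a minimal PyTorch sketch of the fused multi-scale generating network is given below. The patent does not disclose the internal layers of the end-to-end image segmentation network, so SegNet here is a small stand-in, and the convolution/deconvolution kernel sizes are illustrative assumptions; what the sketch does capture is the three-branch structure, and reusing the single self.seg instance at all three scales is what realizes the shared weight parameters required later in step S5.

```python
# Sketch of the multi-scale generating network under stated assumptions.
import torch
import torch.nn as nn

class SegNet(nn.Module):
    """Stand-in end-to-end segmentation network: RGB -> per-pixel class probabilities."""
    def __init__(self, classes=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, classes, 1),
        )
    def forward(self, x):
        return torch.softmax(self.body(x), dim=1)

class MultiScaleGenerator(nn.Module):
    def __init__(self, classes=2):
        super().__init__()
        self.seg = SegNet(classes)                                     # one instance, shared weights
        self.down_in = nn.Conv2d(3, 3, 4, stride=2, padding=1)         # I  -> I2 (0.5n x 0.5n)
        self.up_in = nn.ConvTranspose2d(3, 3, 4, stride=2, padding=1)  # I  -> I1 (2n x 2n)
        self.down_out = nn.Conv2d(classes, classes, 4, stride=2, padding=1)          # I3 -> I4
        self.up_out = nn.ConvTranspose2d(classes, classes, 4, stride=2, padding=1)   # I5 -> I6

    def forward(self, x):                              # x: n x n RGB image I
        i4 = self.down_out(self.seg(self.up_in(x)))    # 2n branch, brought back to n x n
        i6 = self.up_out(self.seg(self.down_in(x)))    # 0.5n branch, brought back to n x n
        i7 = self.seg(x)                               # native-scale branch
        return i4 + i6 + i7                            # pixel-by-pixel fusion -> I8
```

Because the three branches call the same SegNet object, the "networks of the same structure" are literally one set of parameters, so no extra weight-sharing machinery is needed during back-propagation.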
Step S3: the remote sensing image with the size of n × n in the training data is input to the generation network constructed in step S2, and an output feature map is obtained. As shown in fig. 4, the output feature map and the remote sensing image with the size of n × n are respectively subjected to a convolution operation, the feature maps obtained by convolution are connected to be used as the input of the discrimination network, an output between 0 and 1 is obtained after the feature maps pass through the discrimination network, the discrimination network takes the input as the input of the false image, the expected output of the discrimination network is 0 at the moment, and the output of the discrimination network and the expected output at the moment are subtracted to obtain an error.
Step S4: the remote sensing image with the size of n multiplied by n in the training data and the corresponding label image are respectively subjected to convolution operation once, then feature maps obtained by convolution are connected to be used as the input of a discrimination network, an output between 0 and 1 is obtained after the feature maps pass through the discrimination network, the discrimination network takes the input as the input of a real image, the expected output of the discrimination network is 1 at the moment, and the output of the discrimination network and the expected output at the moment are subtracted to obtain an error.
Step S5: and (4) reversely propagating the errors obtained in the steps S3 and S4, updating the generated network and judging network parameters, wherein the weight parameters are shared by the three end-to-end trained image segmentation networks in the step S2.
Step S6: all the remote sensing images and the corresponding label images in the training data obtained in the step S1 are subjected to training of the generation network through the steps S3, S4 and S5, so that the generation network and the discrimination network in the generation countermeasure network reach a balanced state, and the output characteristic diagram, namely the false diagram, generated by the generation network is slightly different from the label images, so that the discrimination network cannot discriminate whether the input image is from the label image or the output characteristic diagram, namely the false diagram, generated by the generation network.
Step S7: the generated network in the generated countermeasure network in the balanced state is independently taken out and applied, the remote sensing images shot by the actual unmanned aerial vehicle are cut into a series of remote sensing images with the size of n multiplied by n, the remote sensing images are used as input, and the output characteristic diagram of the generated network is used as the segmentation result, namely the extracted road area image.
When the test data are input into the generating network and the obtained road region images are compared with the label images, the extraction effect is good.
FIG. 5 is a set of comparison images of road region images output by the generating network of the present invention with and without fused multi-scale features.
In fig. 5, the first column shows the input original unmanned aerial vehicle remote sensing images, the second column the corresponding manually labeled label images, the third column the road region images output by the generating network with fused multi-scale features, and the fourth column the road region images output by the generating network without fused multi-scale features.
The comparison shows that when the image background is relatively simple and the outline of the road target region is distinct, both the network with the fused multi-scale feature structure and the network without it extract the road region information well.
FIG. 6 is another set of comparison images of road region images output by the generating network of the present invention with and without fused multi-scale features.
In fig. 6, the first column shows the input original unmanned aerial vehicle remote sensing images, the second column the corresponding manually labeled label images, the third column the road region images output by the generating network with fused multi-scale features, and the fourth column the road region images output by the generating network without fused multi-scale features.
The comparison shows that when the road region in the image is partially shaded and other objects with features similar to roads are present, the road information extracted by the network with the fused multi-scale feature structure is more accurate than that extracted by the network without it.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the present invention, it should be understood that the present invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are permissible as long as they remain within the spirit and scope of the present invention as defined by the appended claims, and all inventions utilizing the inventive concept are protected.