Disclosure of Invention
The invention provides a building identification method and device based on remote sensing images, which overcome the prior-art defects that the middle part and the edges of large buildings are lost when buildings are identified in high-resolution remote sensing images, and which thereby identify buildings in high-resolution remote sensing images more accurately.
In a first aspect, the present invention provides a building identification method based on remote sensing images, including: obtaining a target remote sensing image containing a building; and inputting the target remote sensing image into a building identification model, and outputting an identification result of the building in the target remote sensing image; wherein the building identification model comprises a semantic segmentation neural network combining spatial attention and channel attention, and is obtained by training according to a remote sensing image sample and the labeling information of the building in the remote sensing image sample.
According to the building identification method based on remote sensing images provided by the invention, inputting the target remote sensing image into the building identification model and outputting the identification result of the building in the target remote sensing image comprises: inputting the target remote sensing image into a feature extraction layer, and outputting feature information of the target remote sensing image; inputting the feature information into an attention layer, and outputting enhanced feature information containing the associated information of the feature information in the spatial dimension and the channel dimension; and inputting the enhanced feature information into an upsampling layer, and outputting the identification result of the building in the target remote sensing image.
According to the building identification method based on remote sensing images provided by the invention, inputting the feature information into the attention layer and outputting the enhanced feature information containing the associated information of the feature information in the spatial dimension and the channel dimension comprises: inputting the feature information into a spatial attention layer, and outputting first feature information containing the associated information of the feature information in the spatial dimension; inputting the feature information into a channel attention layer, and outputting second feature information containing the associated information of the feature information in the channel dimension; and inputting the first feature information and the second feature information into a fusion layer, and outputting the enhanced feature information containing the associated information of the feature information in the spatial dimension and the channel dimension.
According to the building identification method based on remote sensing images provided by the invention, inputting the feature information into the spatial attention layer and outputting the first feature information containing the associated information of the feature information in the spatial dimension comprises: determining the weight of each spatial feature in the feature information according to the correlation between each feature and the features at other spatial positions in the feature information; and assigning the weight of each spatial feature to the corresponding feature in the feature information to obtain the first feature information; and/or, inputting the feature information into the channel attention layer and outputting the second feature information containing the associated information of the feature information in the channel dimension comprises: determining the weight of each channel in the feature information according to the correlation between each channel and the other channels in the feature information; and assigning the weight of each channel to the corresponding channel in the feature information to obtain the second feature information.
According to the building identification method based on remote sensing images provided by the invention, obtaining the target remote sensing image containing the building comprises: acquiring a full-color image and a multispectral image that are captured by a remote sensing satellite and contain the building; and fusing the full-color image and the multispectral image to obtain the target remote sensing image.
According to the building identification method based on remote sensing images provided by the invention, the semantic segmentation neural network comprises one of a fully convolutional neural network and a UNet neural network.
According to the building identification method based on remote sensing images provided by the invention, the method further comprises: dividing the remote sensing image samples into a training set and a test set; inputting the remote sensing image samples of the training set into the building identification model, and adjusting the parameters of the building identification model according to the labeling information of the buildings in the remote sensing image samples of the training set; inputting the remote sensing image samples of the test set into the building identification model with the adjusted parameters, and testing that model according to the labeling information of the buildings in the remote sensing image samples of the test set; and obtaining a trained building identification model according to the test result of the parameter-adjusted building identification model on the test set.
In a second aspect, the present invention further provides a building identification apparatus based on remote sensing images, including: a processing module, used for obtaining a target remote sensing image containing a building; and a building identification module, used for inputting the target remote sensing image into a building identification model and outputting an identification result of the building in the target remote sensing image; wherein the building identification model comprises a semantic segmentation neural network combining spatial attention and channel attention, and is obtained by training according to a remote sensing image sample and the labeling information of the building in the remote sensing image sample.
In the building identification method and device based on remote sensing images provided by the invention, a target remote sensing image containing a building is obtained; the target remote sensing image is input into a building identification model, and an identification result of the building in the target remote sensing image is output; the building identification model comprises a semantic segmentation neural network combining spatial attention and channel attention, and is obtained by training according to a remote sensing image sample and the labeling information of the building in the remote sensing image sample. Spatial attention and channel attention are used to enhance the feature expression of the semantic segmentation neural network in the spatial dimension and the channel dimension, and the improved semantic segmentation neural network is used to identify the buildings in the remote sensing image. In this way, the loss of the middle part and the edges of large buildings that occurs when buildings are identified in high-resolution remote sensing images can be mitigated, and the buildings in high-resolution remote sensing images can be identified more accurately.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention will be described in detail below with reference to the embodiments and the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic flow chart of some embodiments of a building identification method based on remote sensing images according to the present invention. As shown in fig. 1, the building identification method based on remote sensing images comprises the following steps:
Step 101, obtaining a target remote sensing image containing a building.
Step 102, inputting the target remote sensing image into a building identification model, and outputting an identification result of the building in the target remote sensing image.
In some embodiments, in step 101, neither the material, use, etc. of the building nor the type of the target remote sensing image is limited. Optionally, the target remote sensing image may be a remote sensing image taken by an aerial camera, or a remote sensing image taken by a satellite.
In some embodiments, in step 102, the building identification model may include a semantic segmentation neural network combining spatial attention and channel attention, which is trained according to the remote sensing image sample and the labeling information of the building in the remote sensing image sample. The embodiment of the invention does not limit the type of the semantic segmentation neural network. As an example, the semantic segmentation neural network may employ a deep convolutional encoder-decoder architecture (A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, SegNet for short). As an example, the labeling information of the building in the remote sensing image sample may be a mask image of the corresponding remote sensing image sample.
The embodiment of the invention discloses a building identification method based on remote sensing images, in which a target remote sensing image containing a building is obtained; the target remote sensing image is input into a building identification model, and an identification result of the building in the target remote sensing image is output; the building identification model comprises a semantic segmentation neural network combining spatial attention and channel attention, and is obtained by training according to a remote sensing image sample and the labeling information of the building in the remote sensing image sample. Spatial attention and channel attention are used to enhance the feature expression of the semantic segmentation neural network in the spatial dimension and the channel dimension, and the improved semantic segmentation neural network is used to identify the buildings in the remote sensing image. In this way, the loss of the middle part and the edges of large buildings that occurs when buildings are identified in high-resolution remote sensing images can be mitigated, and the buildings in high-resolution remote sensing images can be identified more accurately.
Referring to fig. 2, fig. 2 is a flow chart of a building identification method based on remote sensing images according to another embodiment of the invention. As shown in fig. 2, the building identification method based on remote sensing images comprises the following steps:
Step 201, obtaining a target remote sensing image containing a building.
In some optional implementations, obtaining a target remote sensing image containing a building includes: acquiring a full-color image and a multispectral image that are captured by a remote sensing satellite and contain the building; and fusing the full-color image and the multispectral image to obtain the target remote sensing image. The embodiment of the invention does not limit the method used to fuse the full-color image and the multispectral image.
Here, the full-color (panchromatic) image may refer to a black-and-white image covering the whole visible light band (generally defined as 0.4 μm to 0.7 μm) acquired by a remote sensor. A multispectral image may refer to an image that contains multiple bands, sometimes only 3 bands (a color image is one example) but sometimes many more, even hundreds. Each band is a grayscale image that represents the brightness of the scene as recorded by the sensor used to create that band. In such an image, each pixel is associated with a string of values, i.e. a vector, consisting of the pixel's values in the different bands. This string of values is called the spectral signature of the pixel. A high-resolution remote sensing image can be obtained by fusing the full-color image and the multispectral image.
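The fusion method is left open by the embodiment. Purely as an illustration, the following minimal sketch uses a Brovey-style ratio fusion, which is an assumption and not a method named in this description. The function name `brovey_pansharpen`, the nearest-neighbour upsampling, and the default `scale=4` (for example, a 0.8 m full-color grid and a 3.2 m multispectral grid) are likewise hypothetical choices.

```python
import numpy as np

def brovey_pansharpen(pan, ms, scale=4, eps=1e-6):
    """Fuse a full-color image (H, W) with a multispectral image (B, H/scale, W/scale).

    Assumed Brovey-style fusion; the description does not prescribe any fusion method.
    """
    # Nearest-neighbour upsampling of every band onto the full-color grid.
    ms_up = np.repeat(np.repeat(ms, scale, axis=1), scale, axis=2).astype(np.float64)
    if ms_up.shape[1:] != pan.shape:
        raise ValueError("full-color and upsampled multispectral sizes differ")
    # Rescale each band so that the mean over bands follows the full-color intensity.
    intensity = ms_up.mean(axis=0)
    return ms_up * (pan / (intensity + eps))

pan = np.full((8, 8), 0.5)                                      # toy full-color image
ms = np.stack([np.full((2, 2), v) for v in (0.2, 0.4, 0.6)])    # toy 3-band image
fused = brovey_pansharpen(pan, ms)
print(fused.shape)  # (3, 8, 8)
```

The output keeps the band count of the multispectral input and the pixel grid of the full-color input, which is the property the fusion step relies on.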
Step 202, inputting the target remote sensing image into the feature extraction layer, and outputting feature information of the target remote sensing image.
In some alternative implementations, the building identification model may include a feature extraction layer, an attention layer, and an upsampling layer, e.g., for a deep convolutional encoder-decoder architecture, the feature extraction layer is an encoder, the upsampling layer is a decoder, and the attention layer is disposed between the encoder and the decoder. In some embodiments, the feature extraction layer may obtain feature information of the target remote sensing image by performing feature extraction processing on the input target remote sensing image. In this embodiment of the present invention, the feature information may include a feature map, a feature vector, and the like, which is not limited in this embodiment of the present invention. For example, the feature extraction layer may include a convolution layer, a pooling layer, and the like, and the implementation manner of the feature extraction layer is not limited in the embodiment of the present invention.
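As a rough illustration of a feature extraction layer built from a convolution layer and a pooling layer, the sketch below applies one 3 × 3 convolution with ReLU followed by 2 × 2 max pooling. The function names, the random kernel values, the ReLU activation and the zero padding are all assumptions made for the example; the embodiment does not fix any of them.

```python
import numpy as np

def conv3x3_relu(x, kernels):
    """x: (C_in, H, W); kernels: (C_out, C_in, 3, 3). Zero padding keeps H and W."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((kernels.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + w]  # input shifted by one kernel offset
            # Sum over input channels for this offset (cross-correlation form).
            out += np.einsum("oc,chw->ohw", kernels[:, :, dy, dx], window)
    return np.maximum(out, 0.0)  # ReLU

def max_pool2x2(x):
    """Halve H and W by taking the maximum of each non-overlapping 2 x 2 block."""
    c, h, w = x.shape
    h2, w2 = h // 2, w // 2
    return x[:, :h2 * 2, :w2 * 2].reshape(c, h2, 2, w2, 2).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.random((3, 16, 16))              # toy 3-band image
kernels = rng.standard_normal((8, 3, 3, 3))  # hypothetical, untrained weights
features = max_pool2x2(conv3x3_relu(image, kernels))
print(features.shape)  # (8, 8, 8)
```

Stacking several such stages would yield progressively smaller feature maps with more channels, which is the role of the encoder in an encoder-decoder architecture.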
Step 203, inputting the feature information into the attention layer, and outputting enhanced feature information containing the associated information of the feature information in the space dimension and the channel dimension.
In some optional implementations, the attention layer may include a spatial attention layer, a channel attention layer, and a fusion layer. The feature information output by the feature extraction layer may be input into the spatial attention layer, which outputs first feature information containing the associated information of the feature information in the spatial dimension; the feature information output by the feature extraction layer is also input into the channel attention layer, which outputs second feature information containing the associated information of the feature information in the channel dimension; finally, the first feature information and the second feature information are input into the fusion layer, which outputs enhanced feature information containing the associated information of the feature information in the spatial dimension and the channel dimension. Through the attention layer, rich contextual relationships can be established over the local features, so that broader context information is encoded into the local features and the representation capability of the features is enhanced.
In some optional implementations, inputting the feature information into the spatial attention layer and outputting first feature information containing the associated information of the feature information in the spatial dimension may include: determining the weight of each spatial feature in the feature information according to the correlation between each feature and the features at other spatial positions in the feature information; and assigning the weight of each spatial feature to the corresponding feature in the feature information to obtain the first feature information; and/or, inputting the feature information into the channel attention layer and outputting second feature information containing the associated information of the feature information in the channel dimension may include: determining the weight of each channel in the feature information according to the correlation between each channel and the other channels in the feature information; and assigning the weight of each channel to the corresponding channel in the feature information to obtain the second feature information. The first feature information and the second feature information can capture explicit semantic similarity and long-range relationships, and can capture global dependencies and long-range context information more effectively, so that a better feature representation is learned for scene segmentation.
Step 204, inputting the enhanced feature information into the upsampling layer, and outputting an identification result of the building in the target remote sensing image.
In some embodiments, the upsampling layer may obtain the identification result of the building in the target remote sensing image by upsampling the input enhanced feature information, which contains the associated information of the feature information in the spatial dimension and the channel dimension. For example, the upsampling layer may include convolution layers, upsampling operations, and the like, and the implementation of the upsampling layer is not limited by the embodiment of the present invention. Optionally, the upsampling layer may also restore detail information by combining information from the feature extraction layer. As can be seen from fig. 2, compared with the description of some embodiments corresponding to fig. 1, the building identification method based on remote sensing images in some embodiments corresponding to fig. 2 shows how the identification result is obtained through the building identification model. An attention layer is added after the feature extraction layer to enhance the feature information: similar features of inconspicuous objects are selectively aggregated and their feature representation is highlighted, so that the influence of salient objects is avoided. The method adaptively integrates similar features at any scale from a global perspective, and makes full use of the spatial and channel relationships to effectively enhance the feature representation.
In some embodiments of the present invention, the building identification model needs to be trained before the building in the target remote sensing image is identified through it. The remote sensing image samples can be divided into a training set and a test set; the remote sensing image samples of the training set are input into the building identification model, and the parameters of the building identification model are adjusted according to the labeling information of the buildings in the remote sensing image samples of the training set; the remote sensing image samples of the test set are input into the building identification model with the adjusted parameters, and that model is tested according to the labeling information of the buildings in the remote sensing image samples of the test set; and the trained building identification model is obtained according to the test result of the parameter-adjusted building identification model on the test set. The invention does not limit the proportions of the training set and the test set.
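Testing the model against the labeling information requires some measure of agreement between a predicted mask and the labeled mask. The description does not name a metric; the sketch below assumes intersection over union (IoU) on binary building masks, and the function name `building_iou` is hypothetical.

```python
import numpy as np

def building_iou(pred_mask, label_mask):
    """Intersection over union of two binary building masks (assumed metric)."""
    pred = np.asarray(pred_mask, dtype=bool)
    label = np.asarray(label_mask, dtype=bool)
    union = np.logical_or(pred, label).sum()
    if union == 0:
        return 1.0  # both masks are empty; treated here as perfect agreement
    return float(np.logical_and(pred, label).sum() / union)

label = [[0, 1, 1],
         [0, 1, 1]]   # labeled mask: 4 building pixels
pred = [[0, 1, 0],
        [0, 1, 1]]    # predicted mask: misses one building pixel
print(building_iou(pred, label))  # 0.75
```

Averaging such a score over the test set gives one possible test result on which the trained model could be accepted or further adjusted.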
As an example, a remote sensing image sample library can be established by collecting high-resolution remote sensing images acquired by the Beijing No. 2 satellite over Beijing, and the buildings in the remote sensing image samples can be identified through the building identification model, namely a U-Attention-Net network (UANet network for short), whose structure is shown in fig. 4. The training of the UANet network can be realized with reference to the following steps:
Step 1: collecting high-resolution remote sensing images acquired by the Beijing No. 2 satellite over Beijing. The remote sensing images comprise a full-color image with a resolution of 0.8 m and blue, green, red and near-infrared multispectral images with a resolution of 3.2 m. A high-resolution remote sensing image with a resolution of 0.8 m and containing 3 bands (red, green and blue) is obtained by fusing the full-color image and the multispectral images.
Step 2: labeling the buildings in randomly selected high-resolution remote sensing images by human-computer interactive interpretation. Human-computer interactive interpretation, i.e. human-computer interactive image interpretation, is a method that takes remote sensing digital images as the basic information source and, in a suitable software and hardware working environment, uses the computer's high-speed data processing and image processing software to extract and edit image information, thereby helping the interpreter to interpret remote sensing images.
Step 3: dividing the high-resolution remote sensing images with labeled buildings, taking 90% of them as the training set and 10% of them as the test set.
Step 4: constructing the UANet neural network structure (as shown in fig. 4), in which spatial attention and channel attention mechanisms are added on the basis of the UNet neural network structure.
Step 5: identifying the buildings in the high-resolution remote sensing images through UANet. The parameters of UANet are adjusted by using the training set and the test set to form the final building identification model, so that UANet identifies the buildings in high-resolution remote sensing images.
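The 90% / 10% division of the labeled images can be sketched as follows. The shuffling, the fixed seed and the file names are assumptions for illustration only; the description specifies just the proportions used in this example.

```python
import random

def split_samples(samples, train_fraction=0.9, seed=0):
    """Shuffle the labeled samples and split them into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]

tiles = [f"tile_{i:03d}.tif" for i in range(50)]   # hypothetical labeled image tiles
train_set, test_set = split_samples(tiles)
print(len(train_set), len(test_set))  # 45 5
```

Every tile ends up in exactly one of the two sets, so the test result is computed on images that were not used to adjust the parameters.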
The UANet network architecture shown in fig. 4 is based on the UNet neural network structure, with an attention mechanism added in the step preceding the upsampling process. As shown in fig. 4, the first half of the UANet network performs feature extraction and the second half performs upsampling. UANet adopts a distinct feature fusion mode, namely concatenation: the features are concatenated along the channel dimension to form thicker features. That is, the upsampling part fuses the outputs of the feature extraction part, which in effect fuses multi-scale features together. Taking the last upsampling as an example, its features come both from the output of the first convolution block (same-scale features) and from the output of the upsampling (large-scale features). Such connections run through the whole network, and it can be seen that there are four fusion processes in the network shown in the figure.
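The concatenation-based fusion can be illustrated with array shapes alone. In the sketch below, the nearest-neighbour upsampling, the function names and the channel counts are assumptions; only the concatenation along the channel dimension is taken from the description.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map (assumed method)."""
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)

def fuse_skip(decoder_feat, encoder_feat):
    """Upsample the decoder features, then concatenate with same-scale encoder features."""
    up = upsample2x(decoder_feat)
    if up.shape[1:] != encoder_feat.shape[1:]:
        raise ValueError("spatial sizes must match before concatenation")
    # Concatenation along the channel axis yields a "thicker" feature map.
    return np.concatenate([encoder_feat, up], axis=0)

encoder_feat = np.ones((64, 32, 32))     # same-scale features from the extraction path
decoder_feat = np.zeros((128, 16, 16))   # coarser features entering the upsampling step
fused = fuse_skip(decoder_feat, encoder_feat)
print(fused.shape)  # (192, 32, 32)
```

The channel count of the fused map is the sum of the two inputs, which is why the features are described as becoming thicker rather than being added together.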
The spatial attention mechanism in the attention model is constructed on the idea of using the relevance between any two features to mutually enhance the expression of the respective features. The specific operation is divided into three steps:
1) let A be a C × H × W feature map (C, H and W respectively denote the number of channels, the length and the width) extracted by the preceding convolutional network; A is convolved to generate three new feature maps B1, B2 and B3, and B1 is reshaped into C × N dimensions (where N = H × W) and transposed into an N × C matrix b1;
2) B2 is likewise reshaped into a C × N matrix b2, b1 is multiplied by b2, and an N × N matrix S is obtained through a softmax function;
3) B3 is reshaped into a C × N matrix d, which is multiplied by the transpose of S and by a scaling coefficient to obtain a C × N matrix b3; b3 is reshaped into C × H × W dimensions and added element-wise to the original feature map A to obtain the feature map E1 carrying the spatial attention weights.
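The three spatial attention steps can be sketched in NumPy as below. Several details are assumptions made for the example: the three convolutions are taken to be 1 × 1 convolutions represented by C × C weight matrices `w1`, `w2`, `w3`, the softmax is applied along each row of the N × N matrix, and the scaling coefficient is passed in as a plain number `scale`. The variable names `b1`, `b2` and `d` are chosen here for the reshaped maps.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(A, w1, w2, w3, scale):
    """A: (C, H, W) feature map; w1, w2, w3: (C, C) weights of assumed 1x1 convolutions."""
    C, H, W = A.shape
    flat = A.reshape(C, H * W)        # C x N, with N = H * W
    b1 = (w1 @ flat).T                # step 1: first map, reshaped and transposed, N x C
    b2 = w2 @ flat                    # step 2: second map, reshaped, C x N
    S = softmax(b1 @ b2, axis=-1)     # N x N weights between spatial positions
    d = w3 @ flat                     # step 3: third map, reshaped, C x N
    out = scale * (d @ S.T)           # weighted aggregation over positions, C x N
    return out.reshape(C, H, W) + A   # element-wise sum with the original map gives E1

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 3))
w1, w2, w3 = (rng.standard_normal((4, 4)) for _ in range(3))
E1 = spatial_attention(A, w1, w2, w3, scale=0.5)
print(E1.shape)  # (4, 3, 3)
```

With `scale` set to zero the output reduces to A, which shows that the attention term acts as a weighted correction added on top of the original features.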
The channel attention mechanism in the attention model is constructed by modeling the relevance between channels so as to enhance the response to specific semantics in each channel. The specific operation is divided into three steps:
1) as above, let A be the C × H × W feature map extracted by the preceding convolutional network; A is reshaped into two C × N (N = H × W) feature maps, denoted F and G, and a third reshaped copy of A, denoted I, is transposed into an N × C matrix i;
2) G is multiplied by i, and a C × C matrix X is obtained through a softmax function;
3) X is multiplied by F and by a scale coefficient to obtain a C × N matrix J; J is reshaped into a C × H × W matrix E, and E is added element-wise to the original feature map A to obtain the feature map E2 carrying the channel attention weights.
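The channel attention steps can be sketched in the same way. The names F, G, i, X, J and E follow the text; the row-wise softmax and the plain numeric `scale` argument standing in for the scale coefficient are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(A, scale):
    """A: (C, H, W) feature map; returns E2 with the same shape."""
    C, H, W = A.shape
    F = A.reshape(C, H * W)           # step 1: C x N reshaped copies of A
    G = A.reshape(C, H * W)
    i = A.reshape(C, H * W).T         # transposed copy, N x C
    X = softmax(G @ i, axis=-1)       # step 2: C x C weights between channels
    J = scale * (X @ F)               # step 3: weighted aggregation over channels, C x N
    E = J.reshape(C, H, W)
    return E + A                      # element-wise sum with the original map gives E2

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 3))
E2 = channel_attention(A, scale=0.5)
print(E2.shape)  # (4, 3, 3)
```

Unlike the spatial branch, this sketch has no convolution weights of its own: the channel weights are computed directly from the reshaped feature map, so the only extra parameter is the scale coefficient.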
Finally, to make full use of the context information, E1 is fused with E2: E1 and E2 are each transformed, and an element-wise sum is performed to realize the feature fusion. A convolution is then applied to obtain the final prediction feature map. As shown in the effect diagram of fig. 5, the recognition performance of the UANet network on large buildings is superior to that of the UNet network. Specifically, when buildings in high-resolution remote sensing images are identified, the loss of the middle part and the edges of large buildings that occurs with the UNet network is effectively mitigated, and the buildings in the high-resolution remote sensing images are identified more accurately; in addition, the UANet network effectively enhances the feature representation without adding many parameters.
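The fusion step can be sketched as an element-wise sum followed by a convolution. In this simplified example the per-branch transformation applied before the sum is omitted (treated as the identity), and the final convolution is assumed to be a 1 × 1 convolution with a hypothetical K × C weight matrix `w_out`; neither choice is specified in the description.

```python
import numpy as np

def fuse_attention(E1, E2, w_out):
    """E1, E2: (C, H, W) attention outputs; w_out: (K, C) weights of an assumed 1x1 conv."""
    if E1.shape != E2.shape:
        raise ValueError("E1 and E2 must have the same shape")
    fused = E1 + E2                              # element-wise sum of the two branches
    C, H, W = fused.shape
    scores = w_out @ fused.reshape(C, H * W)     # a 1x1 convolution is a per-pixel matmul
    return scores.reshape(-1, H, W)              # (K, H, W) prediction feature map

E1 = np.ones((4, 2, 2))
E2 = 2.0 * np.ones((4, 2, 2))
w_out = np.ones((1, 4))
prediction = fuse_attention(E1, E2, w_out)
print(prediction.shape)  # (1, 2, 2)
```

Because the two branches are summed rather than concatenated, the fused map keeps C channels, which is consistent with the remark that the attention mechanism adds few parameters.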
The building identification device based on remote sensing images provided by the invention is described below; the device described below and the building identification method based on remote sensing images described above may be referred to in correspondence with each other.
As shown in fig. 3, the building identification apparatus 300 based on remote sensing images of some embodiments includes a processing module 301 and a building identification module 302: the processing module 301 is configured to obtain a target remote sensing image containing a building; the building identification module 302 is configured to input the target remote sensing image into a building identification model, and output an identification result of the building in the target remote sensing image; the building identification model comprises a semantic segmentation neural network combining spatial attention and channel attention, and is obtained by training according to a remote sensing image sample and the labeling information of the building in the remote sensing image sample.
In an optional implementation of some embodiments, the building identification module includes: a feature extraction layer, used for receiving the target remote sensing image and outputting the feature information of the target remote sensing image; an attention layer, used for receiving the feature information and outputting enhanced feature information containing the associated information of the feature information in the spatial dimension and the channel dimension; and an upsampling layer, used for receiving the enhanced feature information and outputting the identification result of the building in the target remote sensing image.
In an alternative implementation of some embodiments, the attention layer includes: a spatial attention layer, used for receiving the feature information and outputting first feature information containing the associated information of the feature information in the spatial dimension; a channel attention layer, used for receiving the feature information and outputting second feature information containing the associated information of the feature information in the channel dimension; and a fusion layer, used for receiving the first feature information and the second feature information and outputting enhanced feature information containing the associated information of the feature information in the spatial dimension and the channel dimension.
In an alternative implementation of some embodiments, the spatial attention layer is configured to determine the weight of each spatial feature in the feature information according to the correlation between each feature and the features at other spatial positions in the feature information, and to assign the weight of each spatial feature to the corresponding feature in the feature information to obtain the first feature information; and/or the channel attention layer is configured to determine the weight of each channel in the feature information according to the correlation between each channel and the other channels in the feature information, and to assign the weight of each channel to the corresponding channel in the feature information to obtain the second feature information.
In an alternative implementation of some embodiments, the processing module 301 includes: an input unit, used for acquiring a full-color image and a multispectral image that are captured by a remote sensing satellite and contain the building; and a fusion unit, used for fusing the full-color image and the multispectral image to obtain the target remote sensing image.
In an alternative implementation of some embodiments, the semantic segmentation neural network comprises one of a fully convolutional neural network and a UNet neural network.
In an optional implementation of some embodiments, the apparatus 300 further comprises a training module, used for: dividing the remote sensing image samples into a training set and a test set; inputting the remote sensing image samples of the training set into the building identification model, and adjusting the parameters of the building identification model according to the labeling information of the buildings in the remote sensing image samples of the training set; inputting the remote sensing image samples of the test set into the building identification model with the adjusted parameters, and testing that model according to the labeling information of the buildings in the remote sensing image samples of the test set; and obtaining the trained building identification model according to the test result of the parameter-adjusted building identification model on the test set.
It is to be understood that the modules recited in the apparatus 300 correspond to the steps in the method described with reference to fig. 1. Thus, the operations, features and advantages of the method described above are also applicable to the apparatus 300 and the modules and units included therein, and are not described herein again.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.