(II) Background technology:
Remote sensing is a method of obtaining information about an object without physical contact, in contrast to field observation. Remote sensing currently uses spaceborne or airborne sensors to detect and identify objects on the earth's surface, in the ocean, and in the atmosphere. Depending on the source of the received electromagnetic waves, remote sensing is divided into active and passive remote sensing: in the former, the electromagnetic signal is emitted by the satellite or aircraft itself, while in the latter, the received signal is reflected sunlight. Remote sensing technology is widely used in geography, land surveying, and most geoscience disciplines, and it also has important applications in military, intelligence, commercial, economic, planning, and humanitarian fields. When a remote sensing instrument converts the received electromagnetic signal into an image, that image is called a remote sensing image.
Semantic segmentation means that a computer automatically assigns each pixel in an image to the semantic category of the target it belongs to, producing a pixel-wise classification result. The task is challenging because objects of the same semantic category are often composed of multiple components whose appearance may differ greatly.
A fully convolutional neural network is a neural network model that has developed rapidly in recent years. It can learn the semantic information of an image end to end and therefore performs well on semantic segmentation tasks. A remote sensing image contains far more information than a natural image, but, limited by the altitude of the satellite, its resolution is usually low and a large amount of detail is lost. Semantic segmentation of remote sensing images therefore remains both a difficult problem and an active research topic.
Change detection refers to using a pair of remote sensing images of the same area acquired at different times (time phases) to extract the regions that changed significantly between the earlier and later phases. The technique can be applied to land supervision, urban planning, natural disaster assessment, and similar fields. However, under the influence of illumination, weather, season, and camera state, the imaging quality of remote sensing images often fluctuates considerably, which makes automatic extraction of change information difficult.
Semantic-level change detection distinguishes changes according to how the ground-object categories in the earlier and later time phases change; land-use monitoring is an example of semantic-level change detection. Because conventional change detection research is generally limited to small areas and rarely reaches the semantic level, this task still relies on manual annotation. In general, the land-use monitoring task for a single prefecture-level city alone consumes dozens of working hours, and monitoring an entire region takes months of coordinated effort by a whole department. To address these problems, the invention provides a semantic-level change detection method based on remote sensing images. The method not only extracts the changed regions from a remote sensing image pair but also determines the type of change, for example a building changing into vegetation, which greatly increases the degree of automation of change detection and reduces the government's supervision cost.
(III) Summary of the invention:
The invention aims to provide a semantic-level change detection method based on remote sensing images. Building on semantic segmentation and incorporating the idea of change detection, the method constructs a fully convolutional neural network for change detection in remote sensing images, with the goal of marking the changed regions in a remote sensing image pair and determining the type of change.
The invention is realized by the following technical scheme:
The invention is a semantic-level change detection method for multi-temporal remote sensing image pairs. The specific steps are as follows:
Step one: prepare the data set.
First, multi-temporal remote sensing images of the same area are collected, and the overlapping region and the visible light bands are extracted to form true-color remote sensing image pairs. Second, the remote sensing images are manually annotated with semantic labels for training and testing: the labeled data are used as the training set for neural network training, and the remaining data are used as the test set for verifying the performance of the neural network.
Step two: build the fully convolutional neural network.
The neural network comprises two modules: a semantic feature extraction module and a change feature extraction module. The semantic feature extraction module uses shared parameters to extract consistent semantic features from the earlier-phase and later-phase images separately. The change feature extraction module first fuses the semantic features of the two phases and then abstracts them further to obtain change features. Cascading the two modules yields good change detection performance.
The specific steps for constructing the neural network are as follows:
1. Create a model file (a .prototxt file).
2. Build the neural network: because Caffe uses the layer as its basic building block, the detailed parameters of each layer must be configured in sequence.
3. Set up the input layer: at the input of the neural network, a data layer named "data" reads in the original remote sensing image pairs and their labels.
4. Set up the concatenation layer: because the neural network comprises two modules that are trained within one framework, a concatenation layer is needed to join the two groups of features extracted by the semantic feature extraction module for use by the subsequent change feature extraction module.
5. Set up the output layer: on top of the change feature extraction module, an output layer converts the extracted features into specific categories. The number of convolution kernels N in this layer equals the number of classes to be distinguished in the actual task.
6. Set up the loss function layer: because the remote sensing image interpretation task in the invention is a pixel-wise classification task, cross entropy is used as the loss function to guide the training of the neural network.
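The role of the output layer in item 5 can be illustrated with a small sketch. It assumes the N output channels are available as a list of N score maps (a nested-list layout chosen for illustration, not Caffe's blob format) and assigns each pixel the class whose channel has the highest score. The function name is hypothetical.

```python
def scores_to_labels(score_maps):
    """Convert N per-class score maps (each H x W) into one H x W label map.

    score_maps[k][y][x] is the score of class k at pixel (y, x); this
    nested-list layout is an assumption made for the sketch.
    """
    num_classes = len(score_maps)
    height = len(score_maps[0])
    width = len(score_maps[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Pick the class whose output channel responds most strongly.
            best = max(range(num_classes), key=lambda k: score_maps[k][y][x])
            row.append(best)
        labels.append(row)
    return labels


# Three classes (N = 3) on a 2 x 2 image.
scores = [
    [[0.1, 0.7], [0.2, 0.1]],  # class 0
    [[0.8, 0.2], [0.3, 0.1]],  # class 1
    [[0.1, 0.1], [0.5, 0.8]],  # class 2
]
label_map = scores_to_labels(scores)
print(label_map)
```

With N output channels, the label map can take exactly N distinct values, which is why the number of kernels in the output layer must match the number of classes.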
Step three: train the fully convolutional neural network. Using the data set prepared in step one and the fully convolutional neural network built in step two, training is carried out with the Caffe deep learning framework. After sufficient training, the corresponding neural network parameters are saved.
The specific method is as follows:
1. Stochastic gradient descent is selected as the optimization method; the learning rate is set to $10^{-5}$ and the maximum number of iterations to 100,000.
2. The weights of the neural network are initialized with a uniform distribution.
3. Training data are input, and the pixel-wise change detection result is obtained by forward propagation.
4. The loss function between the neural network output and the label is computed.
5. The weights and biases in the neural network are adjusted by back-propagating the error.
Steps 3, 4 and 5 are repeated until the number of iterations reaches the maximum; optimization then stops and the neural network parameters at that point are saved.
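The optimizer settings in item 1 might be written into a Caffe solver definition roughly as follows. This is a sketch only: the file name "train.prototxt" and the constant learning-rate policy are assumptions, and the text is generated from Python purely so that the values can be checked.

```python
def render_solver(net_path, base_lr, max_iter):
    """Render a minimal Caffe solver definition as text."""
    lines = [
        'net: "%s"' % net_path,      # model file to train (hypothetical name)
        'type: "SGD"',               # stochastic gradient descent
        'base_lr: %g' % base_lr,     # learning rate
        'lr_policy: "fixed"',        # assumption: constant learning rate
        'max_iter: %d' % max_iter,   # maximum number of iterations
    ]
    return "\n".join(lines) + "\n"


solver_text = render_solver("train.prototxt", 1e-5, 100000)
print(solver_text)
```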
Step four: detect changes in remote sensing images. First, the visible light bands are extracted from the input remote sensing images and paired, following the format of the training set data. Second, the neural network optimized in step three is used to label the test set data or data outside the data set. If an input image exceeds the GPU memory limit, it is first cut into small image pairs, change detection is run pair by pair, and the results are stitched together to obtain the final result.
The advantages of the invention are as follows. By constructing a fully convolutional neural network suited to extracting semantic-level change information, change information is extracted from an input pair of registered remote sensing images. This information contains not only the binary indication of whether a pixel changed but also the semantic information corresponding to the type of change. In addition, because the method extracts robust semantic-level change features, it can handle massive volumes of remote sensing data with a high degree of automation, which helps to greatly reduce labor cost.
(V) Specific embodiments:
For a better understanding of the technical solution of the invention, embodiments are further described below with reference to the accompanying drawings:
The invention runs on Ubuntu 16.04, and the deep learning framework used is Caffe, which allows neural networks to be trained and tested efficiently. Under this framework, a configuration file is first written to describe the specific structure of the neural network and to set the hyper-parameters required for training. During training, the parameters of the neural network are optimized with the error back-propagation algorithm and stochastic gradient descent. Finally, semantic-level change information is extracted from remote sensing image pairs using the test mode of the Caffe framework.
The fully convolutional neural network proposed by the invention is shown in fig. 1. The activation function layers and down-sampling layers are omitted from the figure because they contain no trainable parameters. Each cuboid represents a series of convolutional layers in the neural network, the solid arrows indicate the direction of data flow, and the detailed network configuration is given in Table 1. The flow chart of the overall algorithm is shown in fig. 2. The computer used to run the algorithm was configured as follows: Intel(R) Core(TM) i7-6700K processor at 4.0 GHz, 32 GB of memory, and an NVIDIA TITAN X Pascal graphics card.
The semantic level change detection method comprises the following steps:
Step one: prepare the data set. The input data required by the invention are registered true-color remote sensing image pairs. When selecting the area covered by the data set, several remote sensing images of the same area at different time phases are first obtained, and their overlapping region is cropped out to form a remote sensing image pair. The visible light bands are then extracted to produce a true-color remote sensing image pair, which ensures that the input format meets the requirements of the invention. In addition, because the neural network technique used in the invention is a supervised learning method, the remote sensing image pairs must also be manually annotated with semantic labels for training and testing. A semantic label has the same size as the input image pair, and the value of each pixel indicates whether the image pair changed at that position and, if so, the type of change. The labeled remote sensing image pairs are then cut into small image pairs and divided into two parts, training data and test data, which are used to train the neural network and to verify its performance, respectively. Because of limited GPU memory, each cropped remote sensing image pair is 500x500 pixels.
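As an illustration of how a pixel value can encode both the presence and the type of a change, the sketch below derives a three-class label from two binary building masks and splits image-pair identifiers into training and test parts. The class indices, the mask representation, and the function names are assumptions made for the example; in the method itself the labels are produced by manual annotation.

```python
import random

NO_CHANGE = 0            # hypothetical class index: no semantic-level change
BUILDING_TO_OTHER = 1    # hypothetical class index: building -> non-building
OTHER_TO_BUILDING = 2    # hypothetical class index: non-building -> building


def encode_change_label(before_mask, after_mask):
    """Build a pixel-wise change label from two binary masks (1 = building)."""
    label = []
    for row_before, row_after in zip(before_mask, after_mask):
        row = []
        for b, a in zip(row_before, row_after):
            if b == a:
                row.append(NO_CHANGE)
            elif b == 1:
                row.append(BUILDING_TO_OTHER)
            else:
                row.append(OTHER_TO_BUILDING)
        label.append(row)
    return label


def split_pairs(pair_ids, train_fraction, seed=0):
    """Shuffle image-pair identifiers and split them into training and test parts."""
    ids = list(pair_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


before = [[1, 1], [0, 0]]
after = [[1, 0], [1, 0]]
label = encode_change_label(before, after)
train_ids, test_ids = split_pairs(range(10), 0.5)
print(label, train_ids, test_ids)
```

The label has the same height and width as the input masks, matching the requirement that the semantic label be the same size as the image pair.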
Step two: build the fully convolutional neural network. Under the Caffe framework, the fully convolutional neural network for semantic-level change detection is built layer by layer in a configuration file. As shown in fig. 1, the network is divided into two parts: a semantic feature extraction module and a change feature extraction module. In the semantic feature extraction module, the network adopts a parameter-sharing strategy and extracts semantic features from the earlier-phase and later-phase remote sensing images with the same parameters, which ensures that the features are consistent. In the change feature extraction module, the two groups of features representing the earlier-phase and later-phase remote sensing images are fused and further abstracted, and semantic change information is learned with the help of the corresponding labels.
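The parameter-sharing idea can be sketched with a toy feature extractor. Here a 1x1 convolution (a per-pixel weighted sum of channels) stands in for the whole semantic feature extraction module; the function names and the nested-list image layout are assumptions for the example, not the network of fig. 1.

```python
def pointwise_features(image, weights):
    """Apply a 1x1 convolution: each output channel is a weighted sum of input channels.

    image[y][x] is a list of channel values; weights[k] holds the weights
    of output channel k.
    """
    return [
        [[sum(w * c for w, c in zip(kernel, pixel)) for kernel in weights] for pixel in row]
        for row in image
    ]


def siamese_concat(before, after, shared_weights):
    """Extract features from both phases with the SAME weights, then concatenate per pixel."""
    feat_before = pointwise_features(before, shared_weights)
    feat_after = pointwise_features(after, shared_weights)
    return [
        [fb + fa for fb, fa in zip(row_b, row_a)]
        for row_b, row_a in zip(feat_before, feat_after)
    ]


weights = [[1, 0, 0], [0, 1, 1]]   # two output channels over three input channels
before = [[[1, 2, 3]]]             # a 1 x 1 image with three channels
after = [[[3, 2, 1]]]
fused = siamese_concat(before, after, weights)
print(fused)
```

Because both phases pass through the same weights, identical inputs always produce identical features, so any difference between the two halves of the fused vector reflects a difference in the images rather than in the extractor.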
Feature extraction is typically performed with multiple stacked convolutional layers. A convolutional layer usually contains several convolution kernels, each representing one type of feature, so that rich features can be extracted. The input of a convolutional layer is typically a feature map or the original image, and its output is a more abstract feature map. By combining many types of features and stacking convolutional layers, semantic-level features that are highly linearly separable can be obtained through supervised learning. In addition, to handle non-linear problems, a non-linear activation function layer is added after each convolutional layer; the invention uses the ReLU activation function.
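A minimal sketch of one convolution followed by ReLU on a single-channel image is given below. It assumes stride 1 and no padding, and, as is common in deep learning frameworks, it computes a cross-correlation (the kernel is not flipped). The function names are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a single-channel image without padding (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[y + i][x + j] for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative response with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


image = [[1, 3, 2], [0, 5, 1]]
edge_kernel = [[-1, 1]]                 # responds to left-to-right increases
response = conv2d_valid(image, edge_kernel)
activated = relu(response)
print(response, activated)
```

The kernel here acts as one "type of feature": it responds positively where intensity increases from left to right, and ReLU discards the negative responses.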
Furthermore, the invention uses pooling layers to improve the generalization ability of the neural network and its robustness to small deformations. The pooling operation is essentially down-sampling and comes in two forms, max pooling and mean pooling: a sliding window traverses the input image with a set stride, and the output value is the maximum or the mean of all data in each window.
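The sliding-window procedure can be sketched as follows. The example assumes a square window and, as a simplification, drops windows that would extend past the border; the function name and the `mode` parameter are illustrative.

```python
def pool2d(feature_map, window, stride, mode="max"):
    """Down-sample a single-channel feature map with max or mean pooling."""
    height, width = len(feature_map), len(feature_map[0])
    out = []
    for y in range(0, height - window + 1, stride):
        row = []
        for x in range(0, width - window + 1, stride):
            values = [feature_map[y + i][x + j] for i in range(window) for j in range(window)]
            row.append(max(values) if mode == "max" else sum(values) / len(values))
        out.append(row)
    return out


feature_map = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
max_pooled = pool2d(feature_map, window=2, stride=2, mode="max")
mean_pooled = pool2d(feature_map, window=2, stride=2, mode="mean")
print(max_pooled, mean_pooled)
```

With a 2x2 window and stride 2, each output value summarizes one non-overlapping block, so the 4x4 input is reduced to 2x2.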
The specific steps for constructing the neural network are as follows:
1. Create a model file, i.e. a .prototxt file.
2. Build the neural network. Because Caffe uses the layer as its basic building block, the detailed parameters of each layer must be configured according to the content and order of Table 1. For example, the second convolutional layer in the conv1 group is named "conv1_2", takes the layer "conv1_1" as its lower (input) layer, has 64 output channels (that is, 64 convolution kernels), and uses convolution kernels of size 3 × 3.
3. Set up the input layer. One layer before the convolutional layer conv1_1, a data layer named "data" is placed; it reads in the original remote sensing image pairs and their labels.
4. Set up the concatenation layer. Because the neural network comprises two modules that are trained within one framework, a concatenation layer (concat layer) is needed to join the two groups of features extracted by the semantic feature extraction module for use by the subsequent change feature extraction module.
5. Set up the output layer. On top of the change feature extraction module, an output layer converts the extracted features into specific categories. The number of convolution kernels N in this layer equals the number of classes to be distinguished in the actual task.
6. Set up the loss function layer. Because the remote sensing image interpretation task in the invention is a pixel-wise classification task, cross entropy is used as the loss function to guide the training of the neural network.
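The layer definitions described in items 2 and 4 might look roughly like the prototxt text produced below. This is a sketch generated from Python for illustration: only the names "conv1_2" and "conv1_1", the 64 output channels, and the 3 × 3 kernel come from the description, while the layer name "concat", the input names "feat_t1" and "feat_t2", and the absence of padding are assumptions.

```python
def render_conv_layer(name, bottom, num_output, kernel_size):
    """Render one Caffe convolution layer in prototxt text format."""
    return (
        'layer {\n'
        '  name: "%s"\n'
        '  type: "Convolution"\n'
        '  bottom: "%s"\n'
        '  top: "%s"\n'
        '  convolution_param {\n'
        '    num_output: %d\n'
        '    kernel_size: %d\n'
        '  }\n'
        '}\n'
    ) % (name, bottom, name, num_output, kernel_size)


def render_concat_layer(name, bottoms):
    """Render a Caffe Concat layer that joins its inputs along the channel axis."""
    bottom_lines = "".join('  bottom: "%s"\n' % b for b in bottoms)
    return (
        'layer {\n'
        '  name: "%s"\n'
        '  type: "Concat"\n'
        '%s'
        '  top: "%s"\n'
        '  concat_param { axis: 1 }\n'
        '}\n'
    ) % (name, bottom_lines, name)


conv_text = render_conv_layer("conv1_2", "conv1_1", 64, 3)
# "feat_t1" and "feat_t2" are hypothetical names for the two phases' features.
concat_text = render_concat_layer("concat", ["feat_t1", "feat_t2"])
print(conv_text + concat_text)
```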
TABLE 1
Step three: train the fully convolutional neural network. The neural network built in step two is trained on the data set prepared in step one using the Caffe deep learning framework. The specific method is as follows:
1. Create a solver file. In the solver file, first, the model file used for training is specified. Second, stochastic gradient descent with a momentum term is selected as the optimization method, the learning rate is set to $10^{-5}$, and the maximum number of iterations is set to 100,000. Finally, the training mode is selected and training is run on the GPU. The parameter update of stochastic gradient descent with a momentum term is:
$$w_{i+1} := w_i + v_{i+1}$$
where $v$ denotes the parameter update increment, $w$ denotes the convolutional-layer parameters, $\epsilon$ denotes the learning rate, and $L$ denotes the value of the loss function.
2. Initialize the weights of the neural network with a uniform distribution. For each convolutional layer, the mean is set to 0 and the variance to the reciprocal of the number of convolution kernels.
3. Input the training data and obtain the pixel-wise change detection result by forward propagation. There are three main operations: convolution, pooling, and activation. The pooling operation down-samples the input feature map. The convolution and activation operations are computed as follows ($x_l$ denotes the feature map input to the $l$-th layer and $w_l$ the convolution kernel parameters of the $l$-th layer):
Convolution operation: $$x_{l+1} = w_l * x_l$$
Activation operation (ReLU): $$x_{l+1} = \max(0, x_l)$$
4. Compute the loss function between the neural network output and the label. In the invention, cross entropy is used as the loss function to measure the quality of the neural network output. It is computed over the $N$ pixels and $K$ classes of the network output from $p$, the probability the neural network predicts for each class, and the corresponding ground-truth probability.
5. Adjust the weights and biases in the neural network by back-propagating the error. Error back-propagation is a typical application of the chain rule: for each layer, one computes the derivative of the error with respect to that layer's weights and biases and the derivative with respect to that layer's input, which is passed on as the error of the preceding layer. In this way the updates to the weights and biases can be computed layer by layer, starting from the derivative of the loss function in step 4 with respect to the network output.
Steps 3, 4 and 5 are repeated until the number of iterations reaches the maximum; optimization then stops and the neural network parameters at that point are saved.
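The training procedure above can be sketched on a toy problem. The example below is not the network of the invention: a per-pixel linear classifier followed by softmax stands in for the full network, the velocity update v := momentum * v - lr * gradient is one common form of the momentum term, and the momentum coefficient 0.9 and learning rate 0.1 are demonstration values chosen as assumptions (the description specifies a learning rate of 10^-5). The uniform initialization uses the fact that U(-a, a) has variance a^2/3, so a variance of 1/n corresponds to a = sqrt(3/n).

```python
import math
import random


def init_uniform(rows, cols, fan, rng):
    """Uniform init with mean 0 and variance 1/fan, i.e. U(-a, a) with a = sqrt(3/fan)."""
    bound = math.sqrt(3.0 / fan)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    peak = max(scores)                      # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, pixel):
    """Per-pixel linear scores followed by softmax; a stand-in for the full network."""
    return softmax([sum(w * c for w, c in zip(row, pixel)) for row in weights])


def train(pixels, labels, num_classes, lr, momentum, iterations, seed=0):
    rng = random.Random(seed)
    dim = len(pixels[0])
    weights = init_uniform(num_classes, dim, num_classes, rng)    # initialization
    velocity = [[0.0] * dim for _ in range(num_classes)]
    losses = []
    for _ in range(iterations):
        grad = [[0.0] * dim for _ in range(num_classes)]
        loss = 0.0
        for pixel, label in zip(pixels, labels):
            probs = forward(weights, pixel)                        # forward propagation
            loss -= math.log(probs[label]) / len(pixels)           # mean cross entropy
            for k in range(num_classes):                           # back-propagation
                # For softmax + cross entropy, d(loss)/d(score_k) = p_k - 1[k == label].
                delta = (probs[k] - (1.0 if k == label else 0.0)) / len(pixels)
                for d in range(dim):
                    grad[k][d] += delta * pixel[d]
        for k in range(num_classes):
            for d in range(dim):
                velocity[k][d] = momentum * velocity[k][d] - lr * grad[k][d]
                weights[k][d] += velocity[k][d]                    # w_{i+1} := w_i + v_{i+1}
        losses.append(loss)
    return weights, losses


pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
weights, losses = train(pixels, labels, num_classes=2, lr=0.1, momentum=0.9, iterations=100)
print(losses[0], losses[-1])
```

Note that the weights are initialized once, before the loop; only forward propagation, loss computation, and back-propagation are repeated in each iteration.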
Step four: detect changes in remote sensing images. The neural network trained in step three is used to extract change information from the images in the test set. The specific method is as follows: 1. Load the neural network parameters obtained in step three into the Caffe deep learning framework. 2. Input test data of 500x500 pixels into the neural network. 3. Forward-propagate the data to obtain the change detection result. In addition, for large-scale application, the registered remote sensing image pair can first be cut into small image pairs of 500x500 pixels; items 1 to 3 above are then repeated to perform change detection on each pair, and the results are finally stitched together in order to obtain the complete change detection result.
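The cut, detect, and stitch procedure for large images can be sketched as follows. The sketch assumes single-channel images stored as lists of rows whose height and width are multiples of the tile size, and it uses a trivial per-pixel comparison as a placeholder for the trained network; all function names are hypothetical.

```python
def split_into_tiles(image, tile):
    """Cut an H x W image (list of rows) into tile x tile blocks, row by row.

    Assumes H and W are multiples of the tile size; padding is not handled.
    """
    tiles = []
    for top in range(0, len(image), tile):
        for left in range(0, len(image[0]), tile):
            tiles.append([row[left:left + tile] for row in image[top:top + tile]])
    return tiles


def stitch_tiles(tiles, tiles_per_row):
    """Reassemble tiles produced by split_into_tiles into one full-size map."""
    rows = []
    for start in range(0, len(tiles), tiles_per_row):
        group = tiles[start:start + tiles_per_row]
        for r in range(len(group[0])):
            rows.append([value for t in group for value in t[r]])
    return rows


def placeholder_detector(tile_before, tile_after):
    """Stand-in for the trained network: marks pixels whose values differ."""
    return [
        [int(b != a) for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(tile_before, tile_after)
    ]


def detect_large_pair(before, after, tile, detect_fn):
    """Run a tile-level detector on every tile pair and stitch the results in order."""
    results = [
        detect_fn(tb, ta)
        for tb, ta in zip(split_into_tiles(before, tile), split_into_tiles(after, tile))
    ]
    return stitch_tiles(results, len(before[0]) // tile)


before = [[0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[2][3] = 1
change_map = detect_large_pair(before, after, tile=2, detect_fn=placeholder_detector)
print(change_map)
```

In practice the tile size would be 500 and `detect_fn` would wrap a forward pass of the trained network; the tiles must be stitched in the same order in which they were cut.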
The experimental results are as follows. The method uses 200 labeled remote sensing image pairs of 500x500 pixels as training data and another 200 unlabeled remote sensing image pairs as test data. Pixels are labeled with three classes: two represent semantic-level changes (building changed to non-building, and non-building changed to building), and the third represents no semantic-level change. As shown in fig. 3 and fig. 4, the images contain two kinds of change. In the first kind, farmland, forest, and house roofs change color between the earlier and later time phases, but because their category is unchanged, this is not a semantic-level change. In the second kind, shown on the left side of fig. 3 and fig. 4, forest in the earlier phase becomes buildings such as houses and hardened ground in the later phase, so the category changes (a semantic-level change). As can be seen from fig. 5 and fig. 6, the proposed method marks the change type well according to semantic information and is not affected by non-semantic changes, so it has good robustness and accuracy.
The experimental results show that the method handles semantic-level change detection in remote sensing images well, with a high degree of automation and labeling accuracy; it can greatly reduce labor cost and has broad application prospects and value.