Background
Cracking is one of the unavoidable forms of pavement damage. At present, civil engineers most often detect cracks by manual visual inspection or laboratory experiments. However, these methods are cumbersome and labor intensive, and they reduce the efficiency of pavement maintenance.
To address this, many image processing methods have been proposed for rapidly detecting and recognizing road surface cracks, for example: a directional edge detection method for analyzing noisy road surface images; a structural state monitoring method based on adaptive visual crack detection; an automatic pixel-level detection method for asphalt pavement cracks using a convolutional neural network (CrackNet); a combination-based heuristic optimization edge detection algorithm; automatic detection methods based on convolutional neural networks (CNN); and the like.
However, the current detection methods have some problems:
1. In actual road surface maintenance work, pictures taken on site by a road surface detection system generally contain a lot of background noise due to uneven illumination, low camera resolution, stains, and the like, and conventional research has not sufficiently considered this. Thus, a CNN model trained on clean, qualified photographs may not be suitable for noisy, original live photographs.
2. Some studies have trained their CNN models using labeled images with pre-manually identified crack locations and pre-identified crack types.
3. Automatic crack identification and classification have been studied extensively, but less research has addressed the morphological characteristics of cracks and the different repair activities required by their different causes.
Therefore, the invention provides a rapid pavement crack identification method based on deep learning, which divides pavement cracks into four types: transverse cracks, longitudinal cracks, oblique cracks and alligator skin-like cracks. Unlike other CNN methods, this method can identify pavement crack patterns from a limited number of live images that have not been manually labeled in advance for training. The advantage of the invention is that unprocessed field images of pavement cracks can be learned without manual operation from the very beginning of the learning stage, even if the images contain some background noise. The internal preprocessing step reduces useless information and simplifies the CNN input, thereby improving performance and helping civil engineers carry out the corresponding maintenance in time and repair cracks before they deteriorate and cause destructive effects.
Disclosure of Invention
The technical scheme adopted by the invention is a pavement crack rapid identification method based on a deep convolutional neural network, which comprises two parts of image preprocessing and CNN (convolutional neural network), as shown in figure 1, and comprises the following specific steps:
Step one: preprocessing the live photo.
First, the live photographs are uniformly resized to 856 × 856 pixels. Second, the exposure of the live photograph is adjusted. The original live photograph is not uniformly illuminated, and the masking effect on the live image can be achieved only by correcting the non-uniform exposure. Two methods are used for the different types of pavement cracks:
1) If the crack is a transverse crack, an oblique crack or an alligator skin-like crack, an average value method is adopted. The average pixel value $P_m$ is taken over each single field image, where the subscript m denotes the mean. Each column is then traversed in a loop with loop index i = 1, 2, 3, …: the pixel values $c(:, i)$ of column i are recorded and their average $c_m(i)$ is taken. The pixel values of the column are then adjusted so that the adjusted pixel value equals the original pixel value plus the difference between the column average and the image average: $c(:, i) = c(:, i) + (c_m(i) - P_m)$.
2) If the crack is a longitudinal crack, a mask processing method is adopted, and the following model is used for the image with uneven illumination:
$$I'(x, y) = I(x, y) + B(x, y) \tag{1}$$
where $I'(x, y)$ is the non-uniformly illuminated image, $I(x, y)$ is the image under ideal uniform illumination, $B(x, y)$ is the background image reflecting the luminance distribution, x is the abscissa of the image matrix, and y is the ordinate of the image matrix.
Typically, a grayscale offset is added to the result:
$$I(x, y) = I'(x, y) - B(x, y) + \text{offset} \tag{2}$$
where the grayscale offset is typically set to the average of the low-pass filtered result, so that the average intensity of the image after masking is consistent with that of the original image.
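The two exposure corrections above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the background B(x, y) is estimated with a simple box (mean) filter as one possible low-pass filter, and the window size `k` is an assumed parameter that the text does not specify. The average value method follows the stated formula literally.

```python
import numpy as np

def column_mean_adjust(img):
    """Average value method: c(:, i) = c(:, i) + (c_m(i) - P_m), as written in the text."""
    img = img.astype(np.float64)
    p_m = img.mean()            # P_m: mean pixel value of the whole image
    c_m = img.mean(axis=0)      # c_m(i): mean pixel value of each column
    return np.clip(img + (c_m - p_m), 0, 255)   # broadcast the shift over all rows

def box_blur(img, k):
    """Box (mean) filter with edge padding; stands in for the low-pass filter (assumption)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def mask_adjust(img, k=31):
    """Mask method: I = I' - B + offset, with offset = mean of the low-pass result."""
    img = img.astype(np.float64)
    background = box_blur(img, k)    # B(x, y); k is assumed to be odd
    offset = background.mean()       # keeps the average intensity unchanged
    return np.clip(img - background + offset, 0, 255)
```

In practice a Gaussian filter could replace the box filter; the structure of equation (2) stays the same.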
Next, the grayscale resolution is reduced to 32 levels, with the gray range divided into the intervals 0-7, 8-15, …, 248-255. Each gray value is replaced by the lower bound of its interval, e.g. 10 becomes 8 and 250 becomes 248.
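The reduction to 32 gray levels amounts to integer division by the interval width. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def quantize_gray(img, levels=32):
    """Map each 8-bit gray value to the lower bound of its interval."""
    step = 256 // levels                       # 8 gray values per interval
    return (img.astype(np.uint8) // step) * step

print(quantize_gray(np.array([10, 250])))      # the two examples from the text
```

For the two values used in the text, 10 maps to 8 and 250 maps to 248.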
Then, a histogram of the pixel intensity values is used to fit the distribution of the gray pixel values, a threshold is established, and each pixel is judged to be background or not. Binary (black and white) visualization is achieved with a thresholding method based on the mean from the previous step. Since the crack is usually darker than the road surface, dark pixels are set as foreground and bright pixels as background: if the pixel value is greater than the threshold, the pixel is set as background; otherwise it is set as foreground. The background is set to 0 and the foreground to 255.
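A minimal sketch of the binarization, assuming the threshold is simply the image mean when none is supplied (the histogram fitting described in the text is not reproduced here, and the function name is hypothetical). Dark pixels become foreground (255) and bright pixels become background (0):

```python
import numpy as np

def binarize(img, threshold=None):
    """Dark (crack) pixels -> 255 foreground, bright (pavement) pixels -> 0 background."""
    if threshold is None:
        threshold = img.mean()     # assumption: mean-based threshold
    return np.where(img < threshold, 255, 0).astype(np.uint8)
```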
Then, the crack shape is enhanced by noise removal based on connected components. The noise comes mainly from the texture of the road surface, and the area of each noise region is smaller than that of a crack. Based on this finding, the present invention applies connected-component denoising. All connected objects in the image are searched and the area of each object is examined. If the area is less than the threshold, the object is considered noise; if the area is greater than the threshold, it is considered a crack.
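The area-based denoising can be sketched with a breadth-first search over foreground pixels. This is an illustrative version only: the function name and the `min_area` parameter are hypothetical, 8-connectivity is an assumption, and objects whose area equals the threshold are kept (the text does not specify that case).

```python
from collections import deque
import numpy as np

def remove_small_components(binary, min_area):
    """Keep only connected foreground objects with at least min_area pixels."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros_like(binary)
    for y in range(h):
        for x in range(w):
            if binary[y, x] == 0 or seen[y, x]:
                continue
            # Collect one connected object starting from (y, x).
            queue = deque([(y, x)])
            seen[y, x] = True
            component = []
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):          # 8-connected neighbourhood (assumption)
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] != 0 and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            # Small objects are treated as noise and dropped.
            if len(component) >= min_area:
                for cy, cx in component:
                    out[cy, cx] = 255
    return out
```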
Finally, the image is adjusted for CNN input, and dilation and erosion are applied to reconnect the cracks. This step includes two substeps: first, the image is resized to 156 × 156 pixels (a smaller image for the CNN training input); second, dilation followed by erosion is performed with a structuring element of size (6, 6) to connect some unconnected cracks.
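Dilation followed by erosion is a morphological closing. The sketch below assumes a square 6 × 6 structuring element of ones and implements both operations with sliding-window maximum and minimum in NumPy; the helper names are hypothetical and the resizing substep is omitted.

```python
import numpy as np

def _window_reduce(img, k, reduce_fn, pad_value, before):
    """Apply reduce_fn (max or min) over every k x k window of img."""
    after = k - 1 - before
    padded = np.pad(img, ((before, after), (before, after)), constant_values=pad_value)
    h, w = img.shape
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return reduce_fn(np.stack(windows), axis=0)

def close_gaps(binary, k=6):
    """Dilation then erosion with a k x k square, to reconnect broken cracks."""
    # Dilation: window maximum, zero padding outside the image.
    dilated = _window_reduce(binary, k, np.max, 0, k - 1 - k // 2)
    # Erosion: window minimum with the mirrored window, so the original
    # foreground pixels are always preserved.
    return _window_reduce(dilated, k, np.min, 255, k // 2)
```

A gap narrower than the structuring element between two crack segments is filled, while isolated background far from any crack stays unchanged.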
Fig. 2 shows the results of six-step image preprocessing of field photographs of four pavement crack types.
Step two: a CNN architecture.
The developed architecture contains 6 hidden units, each consisting of a convolutional layer, a ReLU activation function and a max pooling layer. Each convolutional layer has a depth of 50, that is, 50 filters per convolutional layer, and each filter is 5 × 5. The filters use zero padding and a stride of 1. The ReLU activation function is then applied, as shown in equation (3). After the ReLU activation comes the max pooling layer, with a window size of 2 × 2 and a stride of 2 × 2. The final architecture is shown in fig. 3. The input vector x from the previous layer of the neural network enters the neuron, is passed through the activation function, and is output to the neurons of the next layer. The ReLU function used in the present invention is represented as follows:
$$f(x) = \max(0, x) \tag{3}$$
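As a rough illustration, the ReLU activation and the spatial size of the feature maps after each hidden unit can be computed as below. The sketch assumes that "filled with 0" means zero padding of 2 pixels per side (so a 5 × 5 convolution with stride 1 keeps the spatial size) and that pooling rounds down; both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def relu(x):
    """ReLU activation: negative inputs become 0, positive inputs pass through."""
    return np.maximum(0, x)

def feature_map_sizes(input_size=156, units=6, kernel=5, pad=2,
                      stride=1, pool=2, pool_stride=2):
    """Side length of the feature map at the input and after each hidden unit."""
    sizes = [input_size]
    size = input_size
    for _ in range(units):
        size = (size + 2 * pad - kernel) // stride + 1   # convolution
        size = (size - pool) // pool_stride + 1          # 2 x 2 max pooling
        sizes.append(size)
    return sizes

print(feature_map_sizes())   # [156, 78, 39, 19, 9, 4, 2]
```

Under these assumptions a 156 × 156 input shrinks to 78, 39, 19, 9, 4 and finally 2 pixels per side after the six units. Without padding, the feature map would become too small for a 5 × 5 filter before the sixth unit.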
For the loss function, the traditional categorical cross-entropy is used. The CNN is trained to output a probability for each of the C classes for every image; the smaller the difference between the predicted value and the true value for an input sample image, the better the CNN model. The categorical cross-entropy loss CE is:
$$CE = -\sum_{i=1}^{C} t_i \log\big(f(s)_i\big) \tag{4}$$
$$f(s)_i = \frac{e^{s_i}}{\sum_{j=1}^{C} e^{s_j}} \tag{5}$$
where $t_i$ and $s_i$ are the ground truth and the CNN score, respectively, for each class i of the C classes, and $f(s)_i$ is the Softmax function.
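A small numerical sketch of the Softmax function and the categorical cross-entropy for a single image with C = 4 crack classes. The function names are hypothetical, and the maximum score is subtracted before exponentiation only for numerical stability (it does not change the result).

```python
import numpy as np

def softmax(scores):
    """f(s)_i = exp(s_i) / sum_j exp(s_j), computed in a numerically stable way."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(target, scores):
    """CE = -sum_i t_i * log(f(s)_i) for a one-hot target t."""
    return float(-np.sum(target * np.log(softmax(scores))))

# An untrained network that scores all four classes equally:
target = np.array([1.0, 0.0, 0.0, 0.0])
print(cross_entropy(target, np.zeros(4)))
```

With equal scores for all four classes the loss equals ln 4 ≈ 1.386, which is the value to expect from a model that has learned nothing about the crack types.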
For the learning method, the Adam method is used to optimize the model parameters. Adam is a simple and computationally efficient algorithm for gradient-based optimization of stochastic objective functions.
The method combines the advantages of Adagrad, which handles sparse gradients well, and RMSprop, which handles non-stationary objectives well, and it is also suitable for large data sets and high-dimensional spaces. Adam is used in the present invention because it adapts well to a wide range of non-convex optimization problems.
Adam stores an exponentially decaying average of past squared gradients, $v_t$. It also keeps an exponentially decaying average of past gradients, $m_t$, and tends to prefer flat minima in the error surface. The decaying averages of past gradients and past squared gradients, $m_t$ and $v_t$, are calculated as follows:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{6}$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \tag{7}$$
where $g_t$ is the gradient at step t, and $m_t$ and $v_t$ are estimates of the first moment (mean) and the second moment (uncentered variance) of the gradient, respectively. The algorithm carries out stochastic gradient descent with a single base learning rate and updates all weights in the CNN.
Because $m_t$ and $v_t$ are initialized as zero vectors, they are biased toward 0. These biases are corrected as:
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \tag{8}$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \tag{9}$$
These corrected estimates are then used to update the parameters:
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t \tag{10}$$
where $\theta$ denotes the model parameters and $\eta$ the learning rate.
The default value of $\beta_1$ is 0.9, the default value of $\beta_2$ is 0.999, and the default value of $\epsilon$ is $10^{-8}$. Each epoch is one complete pass of neural network training through the entire data set, including the forward and backward passes. In the present invention the learning rate is initialized to 0.001, and after each epoch the learning rate decays to 0.001/i, where i is the number of iterations.
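One step of the standard Adam update with the default values given above can be sketched as follows. The function names are hypothetical, and reading i in the decay rule 0.001/i as the epoch index is an assumption.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the step count starting at 1."""
    m = beta1 * m + (1 - beta1) * grad           # decaying mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # decaying mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def decayed_lr(epoch, base_lr=0.001):
    """Learning rate after the given epoch (1-based), following 0.001 / i."""
    return base_lr / epoch

theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, np.array([2.0]), m, v, t=1)
print(theta)   # very close to 0.999: the first step has size lr regardless of gradient scale
```

The first update illustrates why the bias correction matters: without it, the step computed from the zero-initialized averages would be much smaller than the learning rate.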
Typically, the mini-batch size is set to $2^n$ (n = 1, 2, 3, …) to fit CPU/GPU memory. For convenience, the mini-batch size is chosen to be 20, since the current study uses a limited number of live photographs. For the CNN on unprocessed live photographs, 6 hidden units and 150 epochs were chosen as the hyper-parameters. Note that the purpose of the epoch comparison is to find a reasonable training time for possible applications in actual pavement maintenance programs.
Step three: adjusting the CNN hyper-parameters and determining the optimal CNN framework.
For the number of hidden units, the hidden unit is repeated n times, where n = 2, 3, 4, 5, 6, so that 5 different architectures were tested in total. The effectiveness and efficiency of the architectures under different n are compared using the average result of 5 independent repeated runs for each combination of hidden units and epochs. The CNN trained for 150 epochs generally has higher model accuracy than those trained for 100 and 120 epochs, because of the more sufficient iterative computation. For a fixed number of epochs, the model accuracy increases with the number of hidden units and then stabilizes around 0.9. Fig. 4 shows the validation accuracy results. The three epoch curves (100, 120, 150) also generally increase with the number of hidden units, eventually settling between 0.88 and 0.92. Figure 4 depicts the results of the test accuracy. The test accuracy of 6 hidden units with 150 epochs is the highest, at 0.844, so this configuration can be used as the optimal framework for crack pattern identification. The results of the hyper-parameter tuning experiment for different hidden units are shown in fig. 5. In general, the architecture with 150 epochs and 6 hidden units has higher test accuracy than those with 2 to 5 hidden units. Fig. 6 shows the adjustment of the CNN architecture for n hidden units.
As to the filter size, since the present invention works on a small data set, the filter size may affect network performance. The experimental results are shown in fig. 7. A filter size of 5 × 5 is currently selected; it has higher accuracy (0.844) and a reasonable computation time (1925.23 seconds). The results are shown in table 1.
TABLE 1 comparison of different size filters
As to the mini-batch size, it affects the learning rate of the neural network. It is usually chosen to be $2^n$ to fit CPU/GPU memory. For convenience, 20 is currently used. To investigate the effect on the learning rate, mini-batch sizes of 16 ($2^4$), 20 and 32 ($2^5$) were compared, and the comparison is shown in FIG. 8. Each size was averaged over 5 independent repeated runs with 6 hidden units and a 5 × 5 filter. The CNN prediction accuracy with a mini-batch size of 16 is better than that with a mini-batch size of 32.
Overfitting is a common problem in machine learning. If a model is over-fitted, the resulting model is hardly usable. To address overfitting, a model ensemble method is commonly adopted, in which several models are trained and combined. In that case, however, both training and testing the multiple models take a lot of time. Dropout can effectively mitigate overfitting and achieves a regularization effect to a certain extent. Therefore, dropout is finally used to prevent overfitting by preventing complex co-adaptation of neurons. The probability of randomly omitting a hidden neuron is set to 0.2. The results are shown in table 2.
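A minimal sketch of dropout with an omission probability of 0.2. The rescaling of the surviving activations by 1/(1 - p) ("inverted dropout") is an assumption about the implementation, as the text only gives the probability; the function name is hypothetical.

```python
import numpy as np

def dropout(activations, p_drop=0.2, rng=None, training=True):
    """Randomly zero each hidden activation with probability p_drop during training."""
    if not training:
        return activations                       # no neurons are dropped at test time
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(activations.shape) >= p_drop
    # Rescale the kept activations so their expected value is unchanged (assumption).
    return activations * keep / (1.0 - p_drop)
```

Because a different random subset of neurons is omitted for every mini-batch, no neuron can rely on the presence of specific other neurons, which is the co-adaptation the text refers to.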
TABLE 2 Comparison of the network with dropout and the standard neural network
Detailed Description
The method adopts a ZOYON-RTM intelligent road detection vehicle to acquire the field image of the asphalt pavement crack. The detection system is provided with an advanced vehicle-mounted sensor system, a vehicle-mounted computer and an embedded integrated multi-sensor synchronous control unit.
The pavement damage detection system is provided with a linear array camera with 2048 pixel/line resolution and an infrared laser pavement auxiliary lighting system, and ensures all-weather detection of pavement cracks. When the test vehicle runs at a speed of 5-100 km/h in the daytime, the line scanning camera behind the vehicle body can continuously shoot road surface images at a high speed. Meanwhile, the infrared filter is used for removing shadows generated by sunlight irradiation. These high quality images are of sufficient resolution to ensure that the human eye can directly identify the pavement cracks.
Although the road surface detection system is designed with a noise reduction tool, the crack pattern in an image is still difficult to recognize with the naked eye because of background noise. Actual pavement cracks are generally divided into several basic types; the invention mainly studies four of them, namely transverse cracks, longitudinal cracks, oblique cracks and alligator skin-like cracks.
By visual judgment, the pavement cracks in the original images are divided into four basic forms: transverse cracks (151 images), longitudinal cracks (119 images), oblique cracks (68 images) and alligator skin-like cracks (133 images), as shown in fig. 9. There are 46 test photos, and the number of test photos in each category is proportional to the size of its training set.
Then, a typical CNN architecture with 6 hidden units, 5 × 5 filters and 150 epochs is tested on unprocessed live photographs (single CNN) and on preprocessed live photographs (the method of the present invention), and the results are compared; the validation results are shown in fig. 8. The actual crack in each picture is identified by human visual inspection and compared with the output of the two methods. The test results show that, for the single CNN, all test cracks were identified as transverse cracks. Over 5 independent repeated experiments, the average training accuracy of the model was 0.31, the validation accuracy was 0.28, and the prediction accuracy on the test crack forms was 0.30. It can be concluded that a single CNN has very low accuracy on a limited set of unprocessed live photographs. The CNN learns little from this small dataset, since random prediction among four classes already reaches 25% accuracy. In comparison, for a typical test, the method of the present invention has a model accuracy of 0.9159, a validation accuracy of 0.8729 and a test accuracy of 0.8478, with a computation time of 3059.21 seconds. Compared with the single CNN, the method can identify the crack type of most test photos, with satisfactory accuracy and reasonable computation time.
Fig. 10 and Fig. 11 compare the single CNN model with the method of the present invention. The loss and accuracy in Fig. 10 remain constant throughout the epochs, while in Fig. 11 the loss decreases and the accuracy increases. This contrast indicates that the CNN learns the crack types from the training images more effectively after image preprocessing. The accuracy and loss change rapidly at first, which shows that the CNN model learns quickly. After a certain time, the curves change slowly and finally stabilize at about 0.8-0.9 (accuracy) and 0.2-0.3 (loss).