CN111126505A


Info

Publication number: CN111126505A
Application number: CN201911385351.8A
Authority: CN
Inventors: 侯越; 彭勃; 王俊涛; 杨湛宁; 陈逸涵; 曹丹丹
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2019-12-28
Filing date: 2019-12-28
Publication date: 2020-05-08
Anticipated expiration: 2039-12-28
Also published as: CN111126505B

Abstract


The invention discloses a deep-learning-based method for rapid identification of pavement cracks. First, the field photo is resized. Second, the exposure of the field photo is adjusted: the original field photos are unevenly illuminated, and the uneven exposure must be corrected to achieve the masking effect on the field image. A histogram of pixel intensity values is used to fit the distribution of grayscale pixel values. The image is then binarized with a thresholding method based on the mean from the previous step; if a pixel value is greater than the threshold, the pixel is set as background. The crack shape is enhanced with a connected-component-based method, which is applied for denoising: all connected objects in the image are searched and their areas are checked. If the area of an object is smaller than the threshold, it is considered noise; if the area is larger than the threshold, it is considered a crack. Finally, the CNN input is adjusted, and dilation and erosion are applied to reconnect the cracks. The CNN hyperparameters are tuned to determine the optimal CNN architecture.

Description

Pavement crack rapid identification method based on deep learning

Technical Field

The invention belongs to the field of image processing and relates to a deep-learning-based method for rapid identification of pavement cracks. The method is applied to unprocessed original field pictures of pavement cracks and identifies four crack types.

Background

Cracking is one of the unavoidable forms of pavement damage. At present, civil engineers most often detect cracks by manual visual inspection or laboratory experiments. However, these methods are cumbersome and labor-intensive, and they reduce the efficiency of pavement maintenance.

In this regard, many image processing methods have been proposed for rapidly detecting and recognizing pavement cracks, for example: a directional edge detection method for analyzing pavement images with noise; a structural condition monitoring method based on adaptive visual crack detection; an automatic method for pixel-level detection of asphalt pavement cracks using a convolutional neural network (CrackNet); a combination-based heuristic optimized edge detection algorithm; and automatic detection methods based on convolutional neural networks (CNN).

However, the current detection methods have some problems:

1. In actual pavement maintenance work, pictures taken on site by a pavement detection system generally contain a lot of background noise caused by uneven illumination, low camera resolution, stains and the like, and this has not been sufficiently considered in previous research. As a result, a CNN model trained on clean, qualified photographs may not be suitable for noisy original field photographs.

2. Some studies have trained their CNN models using labeled images with pre-manually identified crack locations and pre-identified crack types.

3. Automatic crack identification and classification have been studied extensively, but less research addresses the morphological characteristics of cracks and the different repair activities required by their different causes.

Therefore, the invention provides a deep-learning-based method for rapid identification of pavement cracks, which divides pavement cracks into four types: transverse cracks, longitudinal cracks, oblique cracks and alligator cracks. Unlike other CNN methods, this method can identify pavement crack patterns by training on a limited number of field images that have not been manually labeled in advance. The advantage of the invention is that unprocessed field images of pavement cracks can be learned from without manual operation at the very early learning stage, even if the images contain some background noise. The internal preprocessing steps reduce useless information and simplify the CNN input, thereby improving performance and helping civil engineers carry out the corresponding maintenance in time and repair cracks before they deteriorate and become destructive.

Disclosure of Invention

The technical scheme adopted by the invention is a pavement crack rapid identification method based on a deep convolutional neural network, which comprises two parts of image preprocessing and CNN (convolutional neural network), as shown in figure 1, and comprises the following specific steps:

Step one: preprocessing of the field photo.

First, the field photographs are uniformly resized to 856 × 856 pixels. Second, the exposure of the field photograph is adjusted. The original field photographs are unevenly illuminated, and the masking effect on the field image can be achieved only after the uneven exposure is corrected. Two methods are used for the different types of pavement cracks:

1) If the crack is a transverse crack, an oblique crack or an alligator crack, an average value method is adopted. The average pixel value $P_m$ of each individual field image is computed, where the subscript $m$ denotes the mean. Each column is then traversed in a loop with index $i = 1, 2, 3, \ldots$: the pixel values $c(:, i)$ of the column are recorded and their mean $c_m(i)$ is taken. The pixel values of the column are then adjusted so that the adjusted value equals the original pixel value plus the difference between the column mean and the image mean:

$$c(:, i) = c(:, i) + \left(c_m(i) - P_m\right)$$

2) If the crack is a longitudinal crack, a mask processing method is adopted, and the following model is adopted for the image with uneven illumination:

$$I'(x, y) = I(x, y) + B(x, y) \tag{1}$$

where I' (x, y) is a non-uniform illumination image, I (x, y) is an image uniformly illuminated in an ideal state, B (x, y) is a background image reflecting a luminance distribution, x represents an abscissa of an image matrix, and y represents an ordinate of the image matrix.

Typically, a grayscale offset is added after the result:

$$I(x, y) = I'(x, y) - B(x, y) + \text{offset} \tag{2}$$

wherein the gray scale offset is typically set to the average of the low pass filtered results to make the average intensity of the image after masking consistent with the original image.
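The two illumination corrections above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the box (mean) filter used as the low-pass filter, its window size `k`, the edge replication at the borders and the clipping to the 0-255 range are all assumptions that do not come from the text. The column adjustment follows the sign convention of the formula exactly as written.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def column_mean_adjust(img):
    """Average value method: c(:, i) = c(:, i) + (c_m(i) - P_m) for every column i."""
    img = np.asarray(img, dtype=np.float64)
    p_m = img.mean()               # P_m: mean pixel value of the whole image
    c_m = img.mean(axis=0)         # c_m(i): mean pixel value of each column i
    adjusted = img + (c_m - p_m)   # per-column shift, broadcast over the rows
    return np.clip(adjusted, 0, 255)  # assumption: stay in the 8-bit range


def box_lowpass(img, k):
    """Mean filter with an odd k x k window (assumed low-pass filter, edges replicated)."""
    pad = k // 2
    padded = np.pad(np.asarray(img, dtype=np.float64), pad, mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(2, 3))


def mask_correct(img, k=5):
    """Mask method: I = I' - B + offset, with offset = mean of the low-pass result."""
    img = np.asarray(img, dtype=np.float64)
    background = box_lowpass(img, k)        # B(x, y): estimated background
    offset = background.mean()              # gray-level offset
    return np.clip(img - background + offset, 0, 255)
```

Before clipping, the mask correction leaves the mean intensity of the image unchanged, because the mean of the subtracted background is added back as the offset.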

Next, the grayscale resolution is reduced to 32 levels over the grayscale range, with intervals 0-7, 8-15, ..., 248-255. Each gray value is mapped to the lower bound of its interval, e.g. 10 becomes 8 and 250 becomes 248.
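As a small illustration of this quantization, the sketch below maps every 8-bit gray value to the lower bound of its 8-wide interval using integer division. The function name is hypothetical.

```python
import numpy as np


def quantize_32_levels(img):
    """Reduce 256 gray levels to 32 by snapping each value to the start of its interval."""
    img = np.asarray(img, dtype=np.uint8)
    return (img // 8) * 8   # 0-7 -> 0, 8-15 -> 8, ..., 248-255 -> 248
```

For instance, `quantize_32_levels([10, 250])` yields the values 8 and 248 given in the text.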

Then, a histogram of pixel intensity values is used to fit the distribution of grayscale pixel values, a threshold is established, and each pixel is classified as background or not. Binary (black and white) visualization is achieved with a thresholding method based on the mean from the previous step. Since a crack is usually darker than the road surface, dark pixels are set as foreground and bright pixels as background: if the pixel value is greater than the threshold, the pixel is set as background; otherwise it is set as foreground. The background is set to 0 and the foreground to 255.
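A minimal sketch of this binarization is given below. It assumes that the threshold is simply the mean gray value of the image; the text does not spell out how the threshold is derived from the fitted histogram, so this choice and the function name are illustrative only.

```python
import numpy as np


def binarize_by_mean(img):
    """Pixels brighter than the threshold become background (0), the rest foreground (255)."""
    img = np.asarray(img, dtype=np.float64)
    threshold = img.mean()   # assumption: mean gray value used as the threshold
    return np.where(img > threshold, 0, 255).astype(np.uint8)
```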

Then, the crack shape is enhanced (noise is removed) using a connected-component-based method. The noise comes mainly from the texture of the road surface, and its area is smaller than that of the cracks. Based on this observation, the invention applies connected-component denoising: all connected objects in the image are searched and the area of each is examined. If the area is less than the threshold, the object is considered noise; if the area is greater than the threshold, it is considered a crack.
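The area-based denoising can be sketched with a plain breadth-first search over the binary image. The sketch assumes 8-connectivity and treats an object whose area equals the threshold as a crack; the text specifies neither detail, and the function name and the `min_area` parameter are hypothetical.

```python
from collections import deque

import numpy as np


def remove_small_components(binary, min_area):
    """Keep only foreground components whose pixel count is at least min_area."""
    binary = np.asarray(binary)
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=np.uint8)
    for r in range(h):
        for c in range(w):
            if binary[r, c] == 0 or seen[r, c]:
                continue
            # Breadth-first search collects one 8-connected component (assumed connectivity).
            queue, pixels = deque([(r, c)]), []
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] != 0 and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            # Components below the area threshold are dropped as texture noise.
            if len(pixels) >= min_area:
                for y, x in pixels:
                    out[y, x] = 255
    return out
```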

Finally, the CNN input is adjusted, and dilation and erosion are applied to reconnect the cracks. This step includes two substeps: first, the image is resized to 156 × 156 pixels (a smaller image for the CNN training input); second, dilation and then erosion are performed with a structuring element of size (6,6) to connect some disconnected cracks.
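Dilation followed by erosion with the same structuring element is a morphological closing. The sketch below implements it in NumPy with a square 6 × 6 window. The function names, the placement of the anchor inside the even-sized window and the border fill values are assumptions; resizing to 156 × 156 is left out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(img, k, before, fill, reduce_fn):
    """Apply reduce_fn over every k x k window; 'before' sets the anchor position."""
    after = k - 1 - before
    padded = np.pad(img, ((before, after), (before, after)), constant_values=fill)
    return reduce_fn(sliding_window_view(padded, (k, k)), axis=(2, 3))


def close_cracks(binary, k=6):
    """Dilation then erosion with a k x k square, bridging gaps narrower than k pixels."""
    binary = np.asarray(binary, dtype=np.uint8)
    lo = (k - 1) // 2
    hi = k - 1 - lo
    # Dilation: window maximum, border padded with background (0).
    dilated = _window_reduce(binary, k, hi, 0, np.max)
    # Erosion: window minimum with the mirrored anchor, so the result is not shifted.
    return _window_reduce(dilated, k, lo, 255, np.min)
```

With this sketch, a crack interrupted by a 3-pixel gap is returned as one continuous segment, while isolated background far from any crack is unchanged.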

Fig. 2 shows the results of six-step image preprocessing of field photographs of four pavement crack types.

Step two: a CNN architecture.

The developed architecture contains 6 hidden units, each consisting of a convolutional layer, a ReLU activation function and a max pooling layer. Each convolutional layer has a depth of 50, i.e. 50 filters in total, and each filter is 5 × 5. Zero padding is used with a stride of 1. The ReLU activation function is then applied, as shown in equation (3). After the ReLU activation comes the max pooling layer, with a window size of 2 × 2 and a stride of 2 × 2. The final architecture is shown in Fig. 3. The input vector x, which comes from the previous layer of the network operating on the processed image, enters the neuron, is passed through the activation function, and is output to the neurons of the next layer. The ReLU function used in the present invention is expressed as follows:

$$f(x) = \max(0, x) \tag{3}$$

For the loss function, the conventional categorical cross entropy is used. The CNN is trained to output a probability over the $C$ classes for each image; the smaller the difference between the predicted value and the true value for a single input sample image, the better the CNN model. The categorical cross-entropy loss CE is:

$$f(s)_i = \frac{e^{s_i}}{\sum_{j=1}^{C} e^{s_j}} \tag{4}$$

$$CE = -\sum_{i=1}^{C} t_i \log\left(f(s)_i\right) \tag{5}$$

where $t_i$ and $s_i$ are the ground-truth label and the CNN score, respectively, for each class $i$ of the $C$ classes, and $f(s)_i$ is the Softmax function.
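As a worked illustration of the Softmax function and the categorical cross entropy, the sketch below evaluates the loss for one sample. The function names are hypothetical, and the subtraction of the maximum score is a standard numerical-stability step that is not mentioned in the text.

```python
import numpy as np


def softmax(scores):
    """f(s)_i = exp(s_i) / sum_j exp(s_j), computed in a numerically stable way."""
    scores = np.asarray(scores, dtype=np.float64)
    exp = np.exp(scores - scores.max())   # shifting by the maximum does not change the result
    return exp / exp.sum()


def cross_entropy(targets, scores):
    """CE = -sum_i t_i * log(f(s)_i) for one sample with one-hot targets t."""
    targets = np.asarray(targets, dtype=np.float64)
    return float(-np.sum(targets * np.log(softmax(scores))))
```

With four classes and equal scores, each class receives probability 1/4, so the loss for any one-hot target is log 4, which is the loss of a network that has learned nothing about the four crack types.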

For the learning method, the Adam method is used to optimize the model parameters. Adam is a simple and computationally efficient algorithm for gradient-based optimization of stochastic objective functions.

The method combines the advantages of Adagrad, which handles sparse gradients well, and RMSprop, which handles non-stationary objectives well, and it is also suitable for large data sets and high-dimensional spaces. Adam is used in the present invention because it adapts well to a wide range of non-convex optimization problems.

Adam keeps an exponentially decaying average of past squared gradients $v_t$. It also keeps an exponentially decaying average of past gradients $m_t$, which gives a preference for flat minima in the error surface. The decaying averages of past gradients and past squared gradients, $m_t$ and $v_t$, are computed as follows:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{6}$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \tag{7}$$

where $g_t$ is the gradient at step $t$, and $m_t$ and $v_t$ are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradient, respectively. The algorithm performs stochastic gradient descent on the image matrices with a single base learning rate and updates all weights in the CNN.

Since $m_t$ and $v_t$ are initialized as vectors of zeros, they are biased toward 0. These biases are corrected by computing:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \tag{8}$$

$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \tag{9}$$

These bias-corrected estimates are then used to update the parameters:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t \tag{10}$$

where $\theta$ denotes the model parameters and $\eta$ the learning rate. The default value of $\beta_1$ is 0.9, the default value of $\beta_2$ is 0.999, and the default value of $\epsilon$ is $10^{-8}$. Each epoch (iteration) is one complete pass of neural network training through the entire data set, including the forward and backward passes. In the present invention the learning rate is initialized to 0.001, and after each epoch it decays to 0.001/i, where i is the number of epochs completed.
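The Adam update with bias correction and the per-epoch learning-rate decay can be sketched as follows. This is a minimal illustration of the standard Adam rule with the default values quoted above; the function names and the argument layout are hypothetical, and `epoch` is assumed to be counted from 1.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1); returns the new theta, m and v."""
    m = beta1 * m + (1 - beta1) * grad           # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


def decayed_lr(epoch, base_lr=0.001):
    """Learning rate 0.001 / i, assuming i is the 1-based epoch index."""
    return base_lr / epoch
```

On the very first step the bias-corrected ratio has magnitude close to 1, so each parameter moves by roughly the learning rate regardless of the gradient scale.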

Typically, the mini-batch size is set to $2^n$ ($n = 1, 2, 3, \ldots$) to fit CPU/GPU memory. For convenience, a mini-batch size of 20 is chosen, since the current study uses a limited number of field photographs. For the CNN on unprocessed field photographs, 6 hidden units and 150 epochs were chosen as the hyperparameters. Note that the purpose of the epoch comparison is to find a reasonable training time for possible application in actual pavement maintenance programs.

Step three: adjusting the CNN hyperparameters and determining the optimal CNN architecture.

For the number of hidden units, the hidden unit is repeated n times, where n = 2, 3, 4, 5, 6, so that 5 different architectures were tested in total. The effectiveness and efficiency of the architectures under different n were compared using the average of 5 independent repeated runs for each combination of hidden units and epochs. The CNN trained for 150 epochs generally has higher model accuracy than those trained for 100 and 120 epochs, because of the more sufficient iterative computation. For a fixed number of epochs, the model accuracy increases with the number of hidden units and then stabilizes around 0.9. Fig. 4 shows the validation accuracy: the three epoch curves (100, 120, 150) also generally rise with the number of hidden units, eventually settling between 0.88 and 0.92. Fig. 4 also depicts the test accuracy: the test accuracy of 6 hidden units with 150 epochs is the highest, at 0.844, so this configuration can be used as the optimal architecture for crack pattern identification. The results of the hyperparameter tuning experiment for different numbers of hidden units are shown in Fig. 5. In general, the architecture with 150 epochs and 6 hidden units has a higher test accuracy than those with 2 to 5 hidden units. Fig. 6 shows the adjustment of the CNN architecture for n hidden units.
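To make the effect of the number of hidden units concrete, the sketch below traces the spatial size of the feature map through n hidden units for the 156 × 156 input. It assumes that the zero padding keeps the convolution output the same size as its input ("same" padding) and that pooling rounds down on odd sizes; neither detail is stated in the text, and the function name is hypothetical.

```python
def feature_map_sizes(input_size=156, hidden_units=6, pool=2):
    """Spatial size after each hidden unit (convolution + ReLU + 2x2 max pooling)."""
    sizes = []
    size = input_size
    for _ in range(hidden_units):
        # Assumed 'same' zero padding with stride 1: the convolution keeps the size.
        # The 2x2 max pooling with stride 2 then halves it, rounding down.
        size = size // pool
        sizes.append(size)
    return sizes
```

Under these assumptions, `feature_map_sizes(156, 6)` gives 78, 39, 19, 9, 4 and 2, so a sixth hidden unit is the deepest that still leaves a feature map larger than a single pixel.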

As to the filter size, since the present invention works on a small data set, the filter size may affect network performance. The experimental results are shown in Fig. 7, and the filter size currently selected is 5 × 5, which gives higher accuracy (0.844) and a reasonable computation time (1925.23 seconds). The results are shown in Table 1.

TABLE 1 comparison of different size filters

For the mini-batch size, the mini-batch size affects the learning rate of the neural network. Usually it is chosen to be $2^n$ to fit CPU/GPU memory. For convenience, 20 is currently used. To investigate the effect on the learning rate, mini-batch sizes of 16 ($2^4$), 20 and 32 ($2^5$) were compared, and the comparison is shown in Fig. 8. Each size was averaged over 5 independent repeated runs with 6 hidden units and a 5 × 5 filter. It can be seen that the CNN prediction accuracy with a mini-batch size of 16 is better than that with a mini-batch size of 32.

Overfitting is a common problem in machine learning. If the model is overfitted, the resulting model is hardly usable. To address overfitting, model ensembling is generally adopted, i.e. several models are trained and combined. In that case, training the models takes a lot of time, and testing the multiple models is time-consuming as well. Dropout can effectively mitigate overfitting and achieves a regularization effect to a certain extent. Thus, dropout is finally used to prevent overfitting by preventing complex co-adaptation of neurons. The dropout probability, i.e. the probability of randomly omitting a hidden neuron, is set to 0.2. The comparison is shown in Table 2.

TABLE 2 application dropout vs. Standard neural network
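For illustration, dropout with an omission probability of 0.2 can be sketched as below. The sketch uses the common "inverted" variant, which rescales the surviving activations during training so that nothing needs to change at test time; the text does not say which variant is used, and the function name is hypothetical.

```python
import numpy as np


def dropout(activations, p=0.2, rng=None):
    """Randomly zero each activation with probability p and rescale the rest by 1/(1-p)."""
    rng = np.random.default_rng() if rng is None else rng
    activations = np.asarray(activations, dtype=np.float64)
    keep = rng.random(activations.shape) >= p   # True for neurons that are kept
    return activations * keep / (1.0 - p)       # assumption: inverted-dropout scaling
```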

Drawings

Fig. 1 shows a crack type calculation procedure.

FIG. 2 is a six-step image pre-processing result of an in-situ photograph of four pavement crack types.

Fig. 3 is a CNN architecture diagram.

FIG. 4 shows experimental results for different hidden units and epochs.

Fig. 5 shows the hyperparameter tuning for different hidden units.

Fig. 6 shows CNN architecture adjustment for n hidden units.

Fig. 7 is an experimental diagram of filter size.

FIG. 8 is a mini-batch size comparison chart.

Fig. 9 shows four basic forms of pavement cracks.

Fig. 10 is a single CNN model learning process.

FIG. 11 is a learning process of the method of the present invention.

Detailed Description

The method adopts a ZOYON-RTM intelligent road detection vehicle to acquire the field image of the asphalt pavement crack. The detection system is provided with an advanced vehicle-mounted sensor system, a vehicle-mounted computer and an embedded integrated multi-sensor synchronous control unit.

The pavement damage detection system is provided with a linear array camera with 2048 pixel/line resolution and an infrared laser pavement auxiliary lighting system, and ensures all-weather detection of pavement cracks. When the test vehicle runs at a speed of 5-100 km/h in the daytime, the line scanning camera behind the vehicle body can continuously shoot road surface images at a high speed. Meanwhile, the infrared filter is used for removing shadows generated by sunlight irradiation. These high quality images are of sufficient resolution to ensure that the human eye can directly identify the pavement cracks.

Although the pavement detection system is designed with a noise reduction tool, crack patterns in the images are difficult to recognize with the naked eye because of background noise. Actual pavement cracks are generally divided into several basic types, and the invention mainly studies four of them, namely transverse cracks, longitudinal cracks, oblique cracks and alligator cracks.

By visual judgment, the pavement cracks in the original images are divided into four basic forms: transverse cracks (151 images), longitudinal cracks (119 images), oblique cracks (68 images) and alligator cracks (133 images), as shown in Fig. 9. There are 46 test photos, and the number of test photos in each category is proportional to the size of the training set.

Then, for the typical CNN architecture with 6 hidden units, 5 × 5 filters and 150 epochs, the results on unprocessed field photographs (single CNN) and on preprocessed field photographs (the method of the present invention) are compared; the validation results are shown in Fig. 8. The actual cracks in each picture were identified by human visual inspection and compared with the outputs of the two approaches. The test results show that, for the single CNN, all test cracks were identified as transverse cracks. Over 5 independent repeated experiments, the average model accuracy was 0.31, the validation accuracy was 0.28, and the prediction accuracy on the test crack forms was 0.30. It can be concluded that a single CNN has very low accuracy on a limited set of unprocessed field photographs. The CNN learns little from this small data set, since random prediction over four classes already reaches 25% accuracy. In comparison, for a typical run of the method of the present invention, the model accuracy is 0.9159, the validation accuracy is 0.8729 and the test accuracy is 0.8478, with a computation time of 3059.21 seconds. Compared with the single CNN, the method identifies the crack types of most test photos with satisfactory accuracy and reasonable computation time.

Figs. 10 and 11 compare the single CNN model with the method of the present invention. Comparing the two figures, the loss and accuracy in Fig. 10 remain constant across the epochs, while in Fig. 11 the loss decreases and the accuracy increases. This contrast indicates that the CNN learns the crack types from the training images more effectively after image preprocessing. The accuracy and loss change rapidly at first, which shows that the CNN model learns quickly. After a certain time, the trends change slowly and finally stabilize at about 0.8-0.9 (accuracy) and 0.2-0.3 (loss).