CN111598095B - Urban road scene semantic segmentation method based on deep learning - Google Patents

Urban road scene semantic segmentation method based on deep learning

Info

Publication number
CN111598095B
CN111598095B
Authority
CN
China
Prior art keywords
image
layer
network
residual error
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010156966.XA
Other languages
Chinese (zh)
Other versions
CN111598095A (en)
Inventor
宋秀兰
魏定杰
孙云坤
何德峰
余世明
卢为党
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202010156966.XA
Publication of CN111598095A
Application granted
Publication of CN111598095B
Legal status: Active
Anticipated expiration

Abstract

A deep learning-based urban road scene semantic segmentation method comprises the following steps: 1) acquiring images at the front end of the vehicle; 2) expanding the input data of the annotated images and the original images: randomly cropping, splicing, or adding different types of noise to the images, transforming them with an image affine matrix, and finally keeping the original resolution through transformations such as padding and cropping to obtain a data set; 3) training the network with the data-expanded images and the annotated images, the residual U-net network comprising a down-sampling part, a bridge part, an up-sampling part and a classification part; 4) modifying the acquisition time interval T, feeding the subsequently obtained images into the trained deep-learning model, outputting predicted semantic segmentation images, and transmitting the different gray levels in the images back to the processor. The invention uses a smaller data set, prevents the gradient from decreasing too quickly, and ensures that no overfitting occurs during training.

Description

Translated from Chinese

A semantic segmentation method for urban road scenes based on deep learning

Technical Field

The present invention belongs to the field of intelligent vehicles and relates to a method for semantic segmentation of urban road scenes based on deep learning.

Background Art

In recent years, with the continuous progress of urbanization, urban road conditions have become increasingly complex. Pedestrians, traffic lights, zebra crossings and different kinds of vehicles all affect the speed and obstacle-avoidance measures of intelligent vehicles. Deep-learning semantic segmentation can identify the environment around the vehicle well and produce different feedback. Semantic segmentation assigns a preset category to every pixel of an image, which not only lets an intelligent vehicle understand its surroundings in real time while driving but also reduces the occurrence of traffic accidents. Research on deep learning for urban road environments has therefore long been a hot topic in the field of vehicle intelligence. Existing deep-learning semantic segmentation methods include neural networks such as SegNet, FCN and ResNet. Although these networks do not require a traditional object-recognition pipeline, learn features automatically without hand-crafted design, and can be trained on a large number of images to obtain a suitable model that outputs segmentation results, training them still suffers from the following problems: 1. overfitting caused by an excessive number of weights; 2. the gradient may decrease too quickly because of the large number of network layers; 3. long training times caused by the large data sets required. These problems make it difficult for deep-learning networks to output accurate segmentation results, so intelligent vehicles cannot obtain real-time feedback about their surroundings under complex road conditions, which is a safety hazard. It is therefore still very valuable to design a network that uses a smaller data set, prevents the gradient from decreasing too quickly, and guarantees that no overfitting occurs during training.

Summary of the Invention

In order to overcome the shortcomings of the prior art and to enable an intelligent vehicle to recognize its surroundings well in complex environments such as urban roads, the present invention proposes a method for semantic segmentation of urban road scenes based on deep learning that uses a smaller data set, prevents the gradient from decreasing too quickly, and ensures that no overfitting occurs during training.

The technical solution adopted by the present invention to solve the technical problem is:

A method for semantic segmentation of urban road scenes based on deep learning, the method comprising the following steps:

1) Image acquisition at the front end of the vehicle: urban road images are collected at regular intervals with the time interval set to T, and images with a resolution of h×w are fed into the image detection module to obtain valid images; the images are then passed to the annotation module, where the system uses the publicly available graphical annotation software Labelme 3.11.2. Through its scene-segmentation annotation function, vehicles, pedestrians, bicycles, traffic lights and neon lights in the image are framed and labelled as different categories. The generated annotation image represents the different object classes by different gray levels, from which the gray-level list "list" and the number of object categories K stored in the image are obtained;

2) Data expansion of the annotated images and the original images: the images are randomly cropped, spliced, or corrupted with different types of noise, and are then transformed with an image affine matrix; the affine transformation is given by formula (1):

$$\begin{bmatrix} a' \\ b' \\ 1 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 & s_x \\ c_3 & c_4 & s_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ 1 \end{bmatrix} \qquad (1)$$

In the affine matrix, s_x is the horizontal translation and s_y the vertical translation, c_1 is the factor by which the image abscissa is enlarged or reduced, c_4 the factor for the ordinate, and c_2 and c_3 control the shearing transformation; (a, b) is the original pixel position and (a′, b′) the transformed position. Finally, transformations such as padding and cropping keep the original resolution of the image, yielding the data set;
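For illustration only, a minimal Python sketch of this augmentation step (not part of the claimed method): it applies the affine matrix of formula (1) with OpenCV, using nearest-neighbour interpolation for the annotation mask so the gray levels are preserved; the file names and the random parameter ranges are assumptions.

```python
import cv2
import numpy as np

def affine_augment(image, label, c=(1.0, 0.0, 0.0, 1.0), s=(0.0, 0.0)):
    """Apply the affine matrix of formula (1) to an image and its annotation mask.

    c = (c1, c2, c3, c4) sets scaling/shear, s = (sx, sy) the translation.
    The mask is warped with nearest-neighbour interpolation so gray levels stay exact.
    """
    h, w = image.shape[:2]
    M = np.float32([[c[0], c[1], s[0]],
                    [c[2], c[3], s[1]]])                     # [a', b'] = M @ [a, b, 1]
    img_t = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)
    lab_t = cv2.warpAffine(label, M, (w, h), flags=cv2.INTER_NEAREST)
    return img_t, lab_t                                      # output keeps the original h×w size

# Example with random scaling and translation (ranges are illustrative only)
rng = np.random.default_rng(0)
img = cv2.imread("frame_0001.png")                           # hypothetical file names
lab = cv2.imread("frame_0001_label.png", cv2.IMREAD_GRAYSCALE)
img_aug, lab_aug = affine_augment(
    img, lab,
    c=(rng.uniform(0.9, 1.1), 0.0, 0.0, rng.uniform(0.9, 1.1)),
    s=(rng.uniform(-10, 10), rng.uniform(-10, 10)),
)
```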

3) Training of the network with the data-expanded images and the annotation images: the residual U-net network consists of four parts, namely a down-sampling part, a bridge part, an up-sampling part and a classification part;

The training parameters are the image length h, image width w, loss value L, number of network iterations epochs, batch size batch_size and validation-set ratio rate. The data set is split into a training set and a validation set according to rate. During training, images are fed into the residual U-net network in batches of batch_size; L is computed from the predicted images output by the network and the actual label images, and back-propagation adjusts the network parameters so that L tends towards its minimum. The network is trained repeatedly up to the number of iterations, and the network parameters are tuned with the validation set during the iterations. The optimal network model is finally obtained.
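A minimal PyTorch sketch of this training procedure, purely as an illustration: `ResUNet`-style model and `RoadDataset` objects stand in for the residual U-net and the expanded data set, the Adam optimizer and learning rate are assumptions the text does not specify, and the default hyper-parameter values are those of the embodiment described later.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split

def train(model, dataset, epochs=30, batch_size=4, rate=0.1, lr=1e-3, device="cuda"):
    """Split the data set by `rate`, train in batches of `batch_size`, minimise L."""
    n_val = int(len(dataset) * rate)                       # validation-set ratio
    train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)

    criterion = nn.CrossEntropyLoss()                      # pixel-wise cross entropy, cf. formula (3)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device)

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:                # labels: per-pixel class indices
            images, labels = images.to(device), labels.to(device)
            loss = criterion(model(images), labels)        # L between prediction and annotation
            optimizer.zero_grad()
            loss.backward()                                # back-propagation adjusts the parameters
            optimizer.step()

        model.eval()
        with torch.no_grad():                              # monitor L on the validation set
            val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                           for x, y in val_loader) / max(len(val_loader), 1)
        print(f"epoch {epoch + 1}/{epochs}  validation loss {val_loss:.4f}")
    return model
```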

4) Road condition classification: the acquisition time interval T is modified, the subsequently acquired images are fed into the trained deep-learning model, the predicted semantic segmentation images are output, and the different gray levels in the images are returned to the processor, so that the vehicle can identify which categories of objects are present ahead and react accordingly.

Further, in step 3), the down-sampling part is divided into four levels, each consisting of one residual network, namely the first- to fourth-level residual networks. The layers of the first-level residual network are connected in the order: convolution layer, batch-normalization layer, softmax function layer, convolution layer and fusion layer; finally, the input image is fused with the processed feature image in the fusion layer through an identity connection. The second- to fourth-level residual networks have the same form, with the connection order: batch-normalization layer, softmax function layer, convolution layer, batch-normalization layer, softmax function layer, convolution layer and fusion layer; the input feature image is likewise fused with the processed feature image in the fusion layer through an identity connection. The convolution layers use 3×3 kernels, and the dimensions of the two convolution kernels at the four levels are 64, 128, 256 and 512, respectively. The levels are connected by 2×2 pooling layers with stride 2, whose dimension changes are the same as those of the convolution layers at each level.
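A sketch of one such encoder residual block in PyTorch, for illustration only: the layer order (BN, softmax function layer, 3×3 convolution, repeated, then fusion) follows the description of the second- to fourth-level blocks; the 1×1 projection used when the channel count changes is an implementation assumption, since the text does not say how the identity connection handles differing dimensions.

```python
from torch import nn

class EncoderResBlock(nn.Module):
    """Pre-activation residual block of the down-sampling path (levels 2-4).

    Layer order per the description: BN -> softmax -> 3x3 conv -> BN -> softmax -> 3x3 conv,
    then fusion of the block input with the processed features (identity connection).
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.Softmax(dim=1),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.Softmax(dim=1),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        # assumption: project the identity with a 1x1 conv when channel counts differ
        self.skip = nn.Identity() if in_ch == out_ch else nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.body(x) + self.skip(x)                 # fusion layer

# Encoder levels use 64, 128, 256 and 512 channels; levels are joined by 2x2 max pooling
pool = nn.MaxPool2d(kernel_size=2, stride=2)
```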

Still further, in step 3), the bridge part prepares for splicing the high- and low-level feature information of the network. It consists of two batch-normalization layers, two softplus function layers and two 3×3 convolution layers with dimension 1024; it has no fusion layer, so no identity connection is needed, and the layers are connected in the same order as in the second-level residual network. Finally, an up-sampling layer adjusts the feature image to the size required for splicing.
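A small illustrative sketch of the bridge part with the same caveats as above: two BN/softplus/3×3-convolution stages at 1024 channels, no fusion layer, followed by an up-sampling layer; the 512 input channels come from the fourth encoder level, and bilinear up-sampling is an assumption since the interpolation mode is not specified.

```python
from torch import nn

# Bridge part: BN -> softplus -> 3x3 conv, twice, at 1024 channels, then up-sampling; no fusion layer.
bridge = nn.Sequential(
    nn.BatchNorm2d(512), nn.Softplus(), nn.Conv2d(512, 1024, kernel_size=3, padding=1),
    nn.BatchNorm2d(1024), nn.Softplus(), nn.Conv2d(1024, 1024, kernel_size=3, padding=1),
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
)
```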

Furthermore, in step 3), the up-sampling part also consists of four levels of residual networks, namely the fifth- to eighth-level residual networks. The form of the residual networks and the connection order of their layers are basically the same as in the down-sampling part, except that the identity connections of the fifth- to seventh-level residual networks are replaced by a 1×1 convolution layer, while the eighth-level residual network is unchanged. The dimensions of the convolution layers in the up-sampling residual networks are 512, 256, 128 and 64, respectively. The levels are connected by up-sampling layers and splicing layers; the splicing layer concatenates the high- and low-level information of the corresponding size as follows:

(3.1) the feature image output by the fourth-level residual network after its pooling layer is spliced with the feature image output by the bridge part;

(3.2) the feature image output by the third-level residual network after its pooling layer is spliced with the feature image output by the fifth-level residual network after its up-sampling layer;

(3.3) the feature image output by the second-level residual network after its pooling layer is spliced with the feature image output by the sixth-level residual network after its up-sampling layer;

(3.4) the feature image output by the first-level residual network after its pooling layer is spliced with the feature image output by the seventh-level residual network after its up-sampling layer;

After splicing, the dimension of the feature image changes; the 1×1 convolution layers that replace the identity connections adjust the feature-image dimension, the four 1×1 convolution layers having dimensions 512, 256, 128 and 64, respectively. The feature images are finally fused in the fusion layer.
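A corresponding decoder sketch, again purely illustrative: the up-sampling layer, the splicing (concatenation) of high- and low-level features and the 1×1 convolution replacing the identity connection follow the text, while bilinear up-sampling is an assumption.

```python
import torch
from torch import nn

class DecoderResBlock(nn.Module):
    """Up-sampling residual block (levels 5-7): identity replaced by a 1x1 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.Softmax(dim=1),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.Softmax(dim=1),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)   # replaces the identity connection

    def forward(self, decoder_feat, encoder_feat):
        up = nn.functional.interpolate(decoder_feat, scale_factor=2,
                                       mode="bilinear", align_corners=False)  # up-sampling layer
        x = torch.cat([up, encoder_feat], dim=1)               # splicing layer
        return self.body(x) + self.proj(x)                     # fusion layer

# Output dimensions of the decoder levels, as in the description
decoder_channels = [512, 256, 128, 64]
```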

In step 3), the classification part consists of a 1×1 convolution layer and a softmax layer. Since urban road image segmentation involves six classes (vehicles, pedestrians, bicycles, traffic lights, neon lights and background), a 6-channel feature image is obtained through the 1×1 convolution layer; the pixel values of this raw feature image are not probabilities, so the softmax layer converts the output into a probability distribution. The softmax function is given by formula (2):

$$g_k(x) = \frac{\exp\bigl(d_k(x)\bigr)}{\sum_{k'=1}^{K} \exp\bigl(d_{k'}(x)\bigr)} \qquad (2)$$

Here d_k(x) is the value of pixel x on channel k, K is the number of object categories, and g_k(x) ∈ [0,1] is the probability that pixel x belongs to class k; the channel with the highest probability gives the predicted class;
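For illustration, the classification part can be sketched in PyTorch as a 1×1 convolution to K = 6 channels followed by a channel-wise softmax, after which each pixel is assigned the class of highest probability; the 64 input channels and the 224×224 resolution are taken from the rest of the description and the embodiment.

```python
import torch
from torch import nn

K = 6                                             # vehicles, pedestrians, bicycles, traffic lights, neon lights, background
head = nn.Sequential(nn.Conv2d(64, K, kernel_size=1),   # 1x1 convolution to 6 channels
                     nn.Softmax(dim=1))                 # formula (2) over the channel dimension

features = torch.randn(1, 64, 224, 224)           # output of the last up-sampling level
probs = head(features)                            # g_k(x): shape (1, K, 224, 224), sums to 1 over K
pred_classes = probs.argmax(dim=1)                # per-pixel class index, shape (1, 224, 224)

# Note: for training, the pre-softmax logits are usually passed to the cross-entropy loss;
# the explicit softmax here mirrors formula (2) for producing the per-pixel probabilities.
```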

The cross-entropy loss function is then used to evaluate the deviation between the prediction and the ground truth; the loss function is given by formula (3):

$$L = -\sum_{x} \sum_{k=1}^{K} \hat{g}_k(x)\,\log\bigl(g_k(x)\bigr) \qquad (3)$$

Here t(x) is the class corresponding to pixel x, so g_{t(x)}(x) is the predicted probability of that class, and ĝ_k(x) is the probability, taken from the annotated image, that pixel x belongs to class k. The smaller the value of the loss function, the closer the predicted image is to the annotated image. By back-propagating the loss, the internal parameters of the neural network are continuously optimised so that the loss keeps decreasing towards its ideal value;

Finally, when training the model, the number of iterations epochs, the batch size batch_size and the validation-set ratio rate must also be determined. The validation-set ratio splits the obtained image set into a training set and a validation set; the images of the training set are then fed into the network in batches of the chosen batch size until all training images have been input, which completes one iteration. The model is trained repeatedly for the chosen number of iterations to obtain the optimal neural network model.

The main execution parts of the present invention are image acquisition and processing, training of the neural network, and recognition of images with the trained model. The implementation can be divided into the following three stages:

First, image data acquisition: the acquisition interval T is set, images are collected on different urban road sections and passed through the detection module to obtain a valid image set; the images are then annotated with Labelme 3.11.2, whose instance scene-segmentation annotation function frames the target objects in the images and labels their categories. The software generates annotation images in which different gray levels mark different objects. From the gray levels of the annotation images, the gray-level list list = [] and the number of object categories K are obtained; finally, both the images and the annotation images are expanded by the data-expansion module to obtain the data set.
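As an illustration of how the gray-level list can be used, a small NumPy helper that converts a Labelme gray-level mask into consecutive class indices and back; the concrete gray values are the ones given later in the embodiment, and the function names are arbitrary.

```python
import numpy as np

# Gray levels of the embodiment: background, neon light, traffic light, vehicle, pedestrian, bicycle
gray_list = [0, 20, 80, 140, 180, 230]
K = len(gray_list)

def mask_to_classes(mask):
    """(h, w) uint8 annotation image -> (h, w) int64 class-index map for training."""
    classes = np.zeros(mask.shape, dtype=np.int64)
    for idx, gray in enumerate(gray_list):
        classes[mask == gray] = idx
    return classes

def classes_to_mask(classes):
    """Inverse mapping, used when the predicted segmentation is returned as gray levels."""
    return np.asarray(gray_list, dtype=np.uint8)[classes]
```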

Second, network parameters and training: image length h, image width w, loss value L, number of iterations epochs, batch size batch_size and validation-set ratio rate. The data set is split into a training set and a validation set according to rate. During training, images are fed into the residual U-net network in batches of batch_size; L is computed from the predicted images output by the network and the actual label images, and back-propagation adjusts the network parameters so that L tends towards its minimum. The network is trained repeatedly up to the number of iterations, and the network parameters are tuned with the validation set during the iterations. The optimal network model is finally obtained.

Third, road condition classification: the acquisition interval T is modified, the subsequently acquired images are fed into the trained deep-learning model, the predicted semantic segmentation images are output, and the different gray levels in the images are returned to the processor, so that the vehicle can identify which categories of objects are present ahead and react accordingly.

The beneficial effects of the present invention are mainly as follows. 1. The network design takes into account the overly fast gradient decrease, the need for large data sets, and the overfitting that may occur when training deep-learning networks; batch normalization, residual networks and splicing of high- and low-level information are therefore added to the network, which effectively reduces gradient degradation and the loss of image information and helps improve the accuracy of semantic segmentation. 2. The deep-learning road-condition detection system is simple in design, easy to understand, uses a small data set, offers high real-time performance, and is highly practical and adaptable.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the implementation flow of the deep-learning urban road scene semantic segmentation system.

Figure 2 shows the overall model design of the residual U-net network used in the deep-learning urban road scene semantic segmentation system.

Figure 3 shows the network form of the second- to fifth-level residual networks in the residual U-net network used by the deep-learning urban road scene semantic segmentation system.

Figure 4 shows example results of the deep-learning semantic segmentation of urban road scenes.

DETAILED DESCRIPTION

The method of the present invention is described in further detail below with reference to the accompanying drawings.

Referring to FIGS. 1 to 4, a method for semantic segmentation of urban road scenes based on deep learning comprises the following steps:

1) Image acquisition at the front end of the vehicle: urban road images are collected at regular intervals with the time interval set to T, and images with a resolution of h×w are fed into the image detection module to obtain valid images; the images are then passed to the annotation module, where the system uses the publicly available graphical annotation software Labelme 3.11.2. Through its scene-segmentation annotation function, vehicles, pedestrians, bicycles, traffic lights, neon lights and other objects in the image are framed and labelled as different categories. The generated annotation image represents the different object classes by different gray levels, from which the gray-level list "list" and the number of object categories K stored in the image are obtained;

2) Data expansion of the annotated images and the original images: the images are randomly cropped, spliced, or corrupted with different types of noise, and are then transformed with an image affine matrix; the affine transformation is given by formula (1):

$$\begin{bmatrix} a' \\ b' \\ 1 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 & s_x \\ c_3 & c_4 & s_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ 1 \end{bmatrix} \qquad (1)$$

In the affine matrix, s_x is the horizontal translation and s_y the vertical translation, c_1 is the factor by which the image abscissa is enlarged or reduced, c_4 the factor for the ordinate, and c_2 and c_3 control the shearing transformation; (a, b) is the original pixel position and (a′, b′) the transformed position. Finally, transformations such as padding and cropping keep the original resolution of the image, yielding the data set;

3) Training of the network with the data-expanded images and the annotation images: the residual U-net network consists of four parts, namely a down-sampling part, a bridge part, an up-sampling part and a classification part;

The training parameters are the image length h, image width w, loss value L, number of network iterations epochs, batch size batch_size and validation-set ratio rate. The data set is split into a training set and a validation set according to rate. During training, images are fed into the residual U-net network in batches of batch_size; L is computed from the predicted images output by the network and the actual label images, and back-propagation adjusts the network parameters so that L tends towards its minimum. The network is trained repeatedly up to the number of iterations, and the network parameters are tuned with the validation set during the iterations. The optimal network model is finally obtained.

4) Road condition classification: the acquisition time interval T is modified, the subsequently acquired images are fed into the trained deep-learning model, the predicted semantic segmentation images are output, and the different gray levels in the images are returned to the processor, so that the vehicle can identify which categories of objects are present ahead and react accordingly.

Further, in step 3), the down-sampling part is divided into four levels, each consisting of one residual network, namely the first- to fourth-level residual networks. The layers of the first-level residual network are connected in the order: convolution layer, batch-normalization layer, softmax function layer, convolution layer and fusion layer; finally, the input image is fused with the processed feature image in the fusion layer through an identity connection. The second- to fourth-level residual networks have the same form, with the connection order: batch-normalization layer, softmax function layer, convolution layer, batch-normalization layer, softmax function layer, convolution layer and fusion layer; the input feature image is likewise fused with the processed feature image in the fusion layer through an identity connection. The convolution layers use 3×3 kernels, and the dimensions of the two convolution kernels at the four levels are 64, 128, 256 and 512, respectively. The levels are connected by 2×2 pooling layers with stride 2, whose dimension changes are the same as those of the convolution layers at each level.

The bridge part prepares for splicing the high- and low-level feature information of the network. It consists of two batch-normalization layers, two softplus function layers and two 3×3 convolution layers with dimension 1024; it has no fusion layer, so no identity connection is needed, and the layers are connected in the same order as in the second-level residual network. Finally, an up-sampling layer adjusts the feature image to the size required for splicing.

The up-sampling part also consists of four levels of residual networks, namely the fifth- to eighth-level residual networks. The form of the residual networks and the connection order of their layers are basically the same as in the down-sampling part, except that the identity connections of the fifth- to seventh-level residual networks are replaced by a 1×1 convolution layer, while the eighth-level residual network is unchanged. The dimensions of the convolution layers in the up-sampling residual networks are 512, 256, 128 and 64, respectively. The levels are connected by up-sampling layers and splicing layers; the splicing layer concatenates the high- and low-level information of the corresponding size as follows:

(3.1) the feature image output by the fourth-level residual network after its pooling layer is spliced with the feature image output by the bridge part;

(3.2) the feature image output by the third-level residual network after its pooling layer is spliced with the feature image output by the fifth-level residual network after its up-sampling layer;

(3.3) the feature image output by the second-level residual network after its pooling layer is spliced with the feature image output by the sixth-level residual network after its up-sampling layer;

(3.4) the feature image output by the first-level residual network after its pooling layer is spliced with the feature image output by the seventh-level residual network after its up-sampling layer;

After splicing, the dimension of the feature image changes; the 1×1 convolution layers that replace the identity connections adjust the feature-image dimension, the four 1×1 convolution layers having dimensions 512, 256, 128 and 64, respectively. The feature images are finally fused in the fusion layer.

The classification part consists of a 1×1 convolution layer and a softmax layer. Since urban road image segmentation involves six classes (vehicles, pedestrians, bicycles, traffic lights, neon lights and background), a 6-channel feature image is obtained through the 1×1 convolution layer; the pixel values of this raw feature image are not probabilities, so the softmax layer converts the output into a probability distribution. The softmax function is given by formula (2):

$$g_k(x) = \frac{\exp\bigl(d_k(x)\bigr)}{\sum_{k'=1}^{K} \exp\bigl(d_{k'}(x)\bigr)} \qquad (2)$$

Here d_k(x) is the value of pixel x on channel k, K is the number of object categories, and g_k(x) ∈ [0,1] is the probability that pixel x belongs to class k; the channel with the highest probability gives the predicted class.

The cross-entropy loss function is then used to evaluate the deviation between the prediction and the ground truth; the loss function is given by formula (3):

$$L = -\sum_{x} \sum_{k=1}^{K} \hat{g}_k(x)\,\log\bigl(g_k(x)\bigr) \qquad (3)$$

Here t(x) is the class corresponding to pixel x, so g_{t(x)}(x) is the predicted probability of that class, and ĝ_k(x) is the probability, taken from the annotated image, that pixel x belongs to class k. The smaller the value of the loss function, the closer the predicted image is to the annotated image. By back-propagating the loss, the internal parameters of the neural network are continuously optimised so that the loss keeps decreasing towards its ideal value.

Finally, when training the model, the number of iterations epochs, the batch size batch_size and the validation-set ratio rate must also be determined. The validation-set ratio splits the obtained image set into a training set and a validation set; the images of the training set are then fed into the network in batches of the chosen batch size until all training images have been input, which completes one iteration. The model is trained repeatedly for the chosen number of iterations to obtain the optimal neural network model.

The main execution parts of this embodiment are image acquisition and processing, training of the neural network, and recognition of images with the trained model. The implementation can be divided into the following three stages:

First, image data acquisition: the acquisition interval is set to T = 4 s, images are collected on different urban road sections and passed through the detection module to obtain 1000 valid images; the images are then annotated with Labelme 3.11.2, whose instance scene-segmentation annotation function frames the various targets in the images and labels their categories. The software generates annotation images in which different gray levels mark the different target categories. The gray-level list list = [0, 20, 80, 140, 180, 230] gives the pixel values of the different targets, namely background, neon lights, traffic lights, vehicles, pedestrians and bicycles, for a total of K = 6 categories; finally, both the images and the annotation images are expanded by the data-expansion module to obtain the data set.

Second, on the network parameter setting interface, the network parameters are entered as follows: image length h = 224, image width w = 224, loss function L, number of iterations epochs = 30, batch size batch_size = 4 and validation-set ratio rate = 0.1. The set of 3000 images is split into a training set of 2700 images and a validation set of 300 images. During training, images are fed into the residual U-net network four at a time according to batch_size until the whole training set has been processed; the loss L is computed from the predicted images output by the network and the actual label images, and back-propagation adjusts the network parameters so that L tends towards its minimum, completing one iteration. The network is trained for 30 iterations, and the network parameters are tuned with the validation set during the iterations; a suitable network model is finally obtained.

Third, the acquisition interval is changed to T = 0.2 s, the subsequently acquired images are fed into the trained deep-learning model, real-time semantic segmentation results are output, and the different gray levels in the images are returned to the processor, so that the vehicle can identify which categories of objects are present ahead and react accordingly.
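A rough sketch of this on-line stage (illustrative only): a frame is grabbed every T seconds, passed through the trained model, and the predicted gray-level map is returned to the processor. `capture_frame`, `send_to_processor` and the file `model.pt` are hypothetical placeholders for the vehicle's actual acquisition and communication interfaces; `classes_to_mask` is the helper from the earlier sketch.

```python
import time
import cv2
import torch

T = 0.2                                              # acquisition interval of the embodiment (s)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.load("model.pt", map_location=device)  # trained residual U-net (placeholder file)
model.eval()

while True:
    frame = capture_frame()                          # hypothetical: h×w BGR image from the front camera
    x = cv2.resize(frame, (224, 224)).transpose(2, 0, 1)[None] / 255.0
    with torch.no_grad():
        probs = model(torch.as_tensor(x, dtype=torch.float32, device=device))
    classes = probs.argmax(dim=1)[0].cpu().numpy()   # per-pixel class indices
    send_to_processor(classes_to_mask(classes))      # hypothetical: return gray-level image
    time.sleep(T)
```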

The actual system design, the network construction process and the results are shown in FIGS. 1 to 4: FIG. 1 shows the implementation flow of the deep-learning urban road scene semantic segmentation system; FIG. 2 shows the overall model design of the residual U-net network used in the system; FIG. 3 shows the network form of the second- to fifth-level residual networks inside the residual U-net network; FIG. 4 shows example results of the semantic segmentation of urban road scenes.

The above describes the good deep-learning semantic segmentation of urban road scenes achieved by one embodiment of the present invention. It should be pointed out that the above embodiment is intended to illustrate the present invention rather than to limit it; any modification made to the present invention within its spirit and the scope of protection of the claims falls within the scope of protection of the present invention.

Claims (1)

1. A deep learning-based urban road scene semantic segmentation method is characterized by comprising the following steps:
1) Image acquisition at the front end of the vehicle: urban road images are collected at regular intervals with the time interval set to T, and images with a resolution of h×w are passed through image detection to obtain valid images; the valid images are then annotated with the publicly available graphical annotation software Labelme3.11.2, whose scene-segmentation annotation function frames and labels the vehicles, pedestrians, bicycles, traffic lights and neon lights in the image as different categories; the generated annotated image represents the objects of the different categories by different gray levels, and the gray-level list and the number of object categories K stored in the image are obtained from the different gray levels of the annotated image;
2) Expanding the input data of the annotated image and the original image: the images are randomly cropped, spliced or corrupted with different types of noise, and are then transformed with an image affine matrix, the affine transformation being shown in formula (1):
$$\begin{bmatrix} a' \\ b' \\ 1 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 & s_x \\ c_3 & c_4 & s_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ 1 \end{bmatrix} \qquad (1)$$
in the affine matrix, s_x is the horizontal translation and s_y the vertical translation, c_1 is the magnification or reduction factor of the image abscissa, c_4 the magnification or reduction factor of the ordinate, and c_2 and c_3 control the image shearing transformation; (a, b) is the original pixel position and (a′, b′) the transformed position; finally, the original resolution of the image is maintained through transformations such as padding and cropping to obtain a data set;
3) The data-expanded images and the annotated images are used for network training, the residual U-net network consisting of four parts, namely a down-sampling part, a bridge part, an up-sampling part and a classification part;
the training parameters comprise the image length h, the image width w, the loss value L, the number of network iterations epochs, the batch size batch_size and the validation-set ratio rate; the data set is divided into a training set and a validation set according to rate; during training, images are fed into the residual U-net network in batches of batch_size, L is calculated from the predicted images output by the network and the actual label images, and back-propagation adjusts the parameters in the network so that the output L tends to be minimized; the network is trained repeatedly up to the number of iterations, the network parameters being adjusted with the validation set during the iterations, and the optimal network model is finally obtained;
4) Road condition classification: the acquisition time interval T is modified, the subsequently obtained images are input into the trained deep-learning model, predicted semantic segmentation images are output, the different gray levels in the images are transmitted back to a processor, and the vehicle identifies the object categories present in the position ahead;
in step 3), the down-sampling part is divided into four levels, each level consisting of one residual network, namely the first- to fourth-level residual networks; the layers in the first-level residual network are connected in the order: convolution layer, batch-normalization layer, softmax function layer, convolution layer and fusion layer, the input image being fused with the processed feature image in the fusion layer through an identity connection; the second- to fourth-level residual networks have the same form, with the connection order: batch-normalization layer, softmax function layer, convolution layer, batch-normalization layer, softmax function layer, convolution layer and fusion layer, the input feature image likewise being fused with the processed feature image in the fusion layer through an identity connection; the convolution layers consist of 3×3 convolution kernels, the dimensions of the two convolution kernels at the four levels being 64, 128, 256 and 512, respectively, and the levels are connected by 2×2 pooling layers with stride 2, whose dimension changes are the same as those of the convolution layers at each level;
in step 3), the bridge part prepares for splicing the high- and low-level dimensional information of the network and comprises two batch-normalization layers, two softplus function layers and two 3×3 convolution layers with dimension 1024, without a fusion layer; the layers are connected in the same order as in the second-level residual network, and finally an up-sampling layer adjusts the feature image to the splicing size;
in step 3), the up-sampling part also consists of four levels of residual networks, namely the fifth- to eighth-level residual networks; the form of the residual networks and the connection order of their layers are basically the same as those of the residual networks of the down-sampling part, except that the identity connections of the fifth- to seventh-level residual networks are replaced by a 1×1 convolution layer while the eighth-level residual network is unchanged; the dimensions of the convolution layers in the up-sampling residual networks are 512, 256, 128 and 64, respectively; the levels are connected by up-sampling layers and splicing layers, the splicing layer splicing the high- and low-dimensional information of the corresponding size, wherein the splicing is as follows:
(3.1) the feature image output by the fourth-level residual network after its pooling layer is spliced with the feature image output by the bridge part;
(3.2) the feature image output by the third-level residual network after its pooling layer is spliced with the feature image output by the fifth-level residual network after its up-sampling layer;
(3.3) the feature image output by the second-level residual network after its pooling layer is spliced with the feature image output by the sixth-level residual network after its up-sampling layer;
(3.4) the feature image output by the first-level residual network after its pooling layer is spliced with the feature image output by the seventh-level residual network after its up-sampling layer;
the dimension of the spliced feature image changes, the 1×1 convolution layers that replace the identity connections are used to adjust the feature-image dimension, the dimensions of the four 1×1 convolution layers being 512, 256, 128 and 64, respectively, and the feature images are finally fused in the fusion layer;
in the step 3), the classification part is composed of a 1 × 1 convolution layer and a softmax layer, since the urban road image segmentation relates to six classes of vehicles, pedestrians, bicycles, traffic lights, neon lights and backgrounds, the feature images of 6 channels are obtained through the 1 × 1 convolution layer, but the pixel values of the original feature images are not probability values, so that the output is converted into probability distribution through the softmax layer, and the softmax function is shown in formula (2):
$$g_k(x) = \frac{\exp\bigl(d_k(x)\bigr)}{\sum_{k'=1}^{K} \exp\bigl(d_{k'}(x)\bigr)} \qquad (2)$$
wherein d_k(x) is the value of pixel x on channel k, K is the number of object categories, and g_k(x) ∈ [0,1] is the probability that pixel x belongs to class k; the channel with the highest probability corresponds to the predicted class;
the deviation of the prediction from the actual is then evaluated using a cross-entropy loss function, see equation (3):
$$L = -\sum_{x} \sum_{k=1}^{K} \hat{g}_k(x)\,\log\bigl(g_k(x)\bigr) \qquad (3)$$
where t(x) is the class corresponding to pixel x, so g_{t(x)}(x) is the predicted probability of that class, and ĝ_k(x) is the probability that pixel x of the annotated image belongs to class k; therefore, the smaller the value of the loss function, the closer the predicted image is to the annotated image, and the internal parameters of the neural network are continuously optimised through back-propagation of the loss function, so that the loss function keeps decreasing towards its ideal value.
CN202010156966.XA | Filed 2020-03-09 | Urban road scene semantic segmentation method based on deep learning | Active | CN111598095B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010156966.XA | 2020-03-09 | 2020-03-09 | Urban road scene semantic segmentation method based on deep learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010156966.XA | 2020-03-09 | 2020-03-09 | Urban road scene semantic segmentation method based on deep learning

Publications (2)

Publication Number | Publication Date
CN111598095A (en) | 2020-08-28
CN111598095B (en) | 2023-04-07

Family

ID=72181296

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010156966.XA (Active, CN111598095B) | Urban road scene semantic segmentation method based on deep learning | 2020-03-09 | 2020-03-09

Country Status (1)

Country | Link
CN | CN111598095B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN108062756A (en) * | 2018-01-29 | 2018-05-22 | Chongqing University of Technology | Image semantic segmentation method based on a deep fully convolutional network and conditional random field
CN109145983A (en) * | 2018-08-21 | 2019-01-04 | University of Electronic Science and Technology of China | Real-time scene image semantic segmentation method based on a lightweight network
CN110111335A (en) * | 2019-05-08 | 2019-08-09 | Nanchang Hangkong University | Urban traffic scene semantic segmentation method and system based on adaptive adversarial learning
CN110147794A (en) * | 2019-05-21 | 2019-08-20 | Northeastern University | Real-time segmentation method for unmanned-vehicle outdoor scenes based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party

Title
Research on semantic segmentation methods for traffic scenes based on convolutional neural networks; Li Linhui et al.; Journal on Communications; 2018-04-25 (Issue 04); full text *
Image semantic segmentation based on multi-scale feature extraction; Xiong Zhiyong et al.; Journal of South-Central University for Nationalities (Natural Science Edition); 2017-09-15 (Issue 03); full text *
Scene semantic segmentation network based on color-depth images and deep learning; Dai Juting et al.; Science Technology and Engineering; 2018-07-18 (Issue 20); full text *
Semantic segmentation of newly added buildings in remote sensing images based on deep learning; Chen Yiming et al.; Computer & Digital Engineering; 2019-12-20 (Issue 12); full text *

Also Published As

Publication number | Publication date
CN111598095A (en) | 2020-08-28


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
