Technical Field
The present invention relates to the technical field of autonomous driving and artificial intelligence, and in particular to a drivable area segmentation method based on a deep neural network.
Background Art
Intelligent driving vehicles offer advantages in alleviating traffic congestion, avoiding traffic accidents, enhancing the driving experience, improving road utilization, and reducing energy consumption, giving them broad commercial prospects. "Made in China 2025", released by China, explicitly defines intelligent driving vehicles and lists intelligent driving as a key development direction in China's artificial intelligence field.
At present, the smart car is an emerging technology arising from the new round of the artificial intelligence technology revolution. The "China Artificial Intelligence Series White Paper — Intelligent Transportation" points out the importance of smart car technology and the strategic position of the automotive industry. Smart cars integrate a variety of cutting-edge technologies, including modern sensing, information and communication, automatic control, computer technology, and artificial intelligence. The development of smart cars is therefore not only key to the transformation and upgrading of the automobile industry, but also a strategic commanding height for future automotive technology in the commercial field, and a reflection of a country's current level of high technology.
Advanced Driving Assistance Systems (ADAS) are a vital component of intelligent vehicle technology; they effectively improve driving safety and reduce the incidence of road traffic accidents. An ADAS first acquires the vehicle's own driving state and information about the surrounding driving environment through the vehicle's internal and external sensors, improving the driver's perception of the current driving state and environment; after processing this information, it presents the driver with the driving information that requires attention, helping the driver operate the vehicle safely. This also provides a key source of information for the subsequent driving decision and planning stage; combined with the vehicle's control system, the vehicle can be made progressively more intelligent, up to the level of automatic driving.
Drivable area segmentation is an important link in visual perception technology and has significant value in applications such as advanced driving assistance systems, road obstacle detection, and target recognition. Extraction of the drivable area is a key ADAS technology: it uses sensor perception to sense the road environment around the vehicle, identifies and segments the area that can be driven in the current scene, and prevents lane departure or illegal driving. It provides important information and cues for the collision avoidance and warning functions of ADAS, helping the system or the driver make control decisions.
In the early stage of the development of smart car technology, the main research and application scenarios were fixed tracks, such as trams. Researchers combined wireless communication technology with magnetic induction, cables, and other equipment to build automatically controlled vehicle systems, giving vehicles intelligent driving functions; such work includes achievements from countries such as the United States, Japan, Germany, and Italy. The U.S. National Highway Traffic Safety Administration divides the development of smart cars into four stages (see Table 1); the functions of commercial smart-car products currently on the market are mainly concentrated in the second stage.
Table 1. Development stages of autonomous driving
Compared with other countries, China started slightly later in the development and research of smart cars. Beginning in the late 1980s, the National University of Defense Technology successively developed the vision-based CITAVT series of intelligent vehicles. Among them, the self-developed CITAVT-IV intelligent vehicle was converted from a jeep and was designed to achieve autonomous driving in structured road environments; during road tests the vehicle reached speeds of up to 110 km/h, and it also supported a low-speed autonomous driving mode on unstructured roads. In 1988, Tsinghua University began developing the THMR series of smart vehicles with the support of the Commission of Science, Technology and Industry for National Defense and the national "863 Program". The THMR-V smart vehicle can automatically track lane lines on structured roads, perform road tracking in quasi-structured environments, and, in complex driving scenarios, avoid obstacles and be driven remotely with visual telepresence, with a top speed of 150 km/h. In 2013, SAIC began cooperating with AVIC in the field of intelligent driving; at the Shanghai Auto Show two years later they demonstrated iGS, a self-developed intelligent driving car capable of remote-controlled parking, automatic cruising, automatic car following, lane keeping, lane changing, autonomous overtaking, and other functions. Other domestic automakers, including Changan Automobile, BAIC Group, and Great Wall Motors, have also launched research, development tests, and road tests of smart cars and plan to gradually improve and extend their functions.
At the same time, domestic Internet companies and emerging intelligent-driving solution providers have also entered the field. Baidu launched its driverless car project in 2013; its technical core is the "Baidu Car Brain", which comprises four modules: high-precision maps, positioning, perception, and intelligent decision-making and control. In 2015, Baidu's driverless car achieved, for the first time in China, fully automatic driving under mixed urban, ring-road, and highway conditions, reaching a top speed of 100 km/h during testing. In January 2018, Didi announced the establishment of an artificial intelligence laboratory in Beijing to cooperate with automakers and suppliers in developing smart cars, and has obtained qualification for road testing of autonomous vehicles in Beijing. Many domestic AI companies, such as TuSimple, Pony.ai, Horizon Robotics, and ArcSoft, are also seeking entry points into the intelligent driving field based on their own strengths.
In driving scenarios, most of the road information people attend to comes from the visual information they capture, and this visual information plays a crucial role in driving decisions. Analogously, for smart cars the on-board camera acts like the human visual system, collecting information about the surrounding driving environment in real time. More importantly, compared with lidar-based solutions, camera-based vision solutions are low-cost, simple to install, and rich in information; this is also the approach adopted by mainstream ADAS products, such as Mobileye's advanced driver assistance system. In recent years, numerous excellent deep neural networks have brought a technological revolution to the field of image processing. Road scene understanding is a key component of autonomous driving decision-making and safe operation. Given the structured nature of road driving, all autonomous vehicles must follow road rules. Current assisted-driving systems typically rely on visual perception of road markings and traffic rules, and therefore depend on simple road structures and road markings (for example, well-maintained highways); extending these systems to more complex driving scenarios remains a challenge.
Traditional vision-based drivable road estimation methods usually use preprocessing steps to remove shadows and exposure artifacts, extract low-level road and lane-line features through feature detection and the temporal fusion of road and lane features across consecutive frames, and finally fit a road model. Although these methods are effective in well-maintained road environments, they degrade severely, or even fail, in driving scenes with occlusions, shadows, or dim lighting. B. Ma and A. S. Huang each proposed schemes that fuse image information with information captured by radar or LIDAR to address such problems in assisted-driving scenarios, but radar or LIDAR significantly increases research and development costs. Recently, research combining deep learning with computer vision has made clear progress in image processing, and the progress related to image semantic segmentation is particularly exciting. This technique provides fine-grained semantic understanding of the input image: it classifies objects in the image at the pixel level and is therefore more robust than earlier feature-based methods.
Summary of the Invention
The technical problem to be solved by the present invention is that current drivable area segmentation techniques are limited by the road environment and by lighting and shadows; to overcome this technical defect, the present invention provides a drivable area segmentation method based on a deep neural network.
A drivable area segmentation method based on a deep neural network, comprising:
Step 1. Based on the Mask-RCNN model, use a ResNet deep residual network as the feature extraction network to extract features from the input image;
Step 2. Use an RPN to process the extracted feature maps to distinguish foreground from background;
Step 3. Locate the contours of the classified foreground and background through RoIAlign;
Step 4. Perform category recognition on the located foreground through a fully connected network to obtain the specific classification of the foreground;
Step 5. Based on the specific foreground classification results and the distinguished background, use a fully convolutional neural network to segment the drivable and non-drivable areas.
Further, in Step 1, the ResNet deep residual network structure introduces residual learning modules and shortcut connections: on top of ordinary convolution, a linear connection is introduced between the input and output of successive layers. Standard 3x3 convolution kernels with ReLU activation are used, together with typical convolution and max pooling operations. An FPN feature pyramid network is adopted; by changing how the network layers are connected, the FPN fuses features of multiple scales and different semantic strengths.
Further, in Step 2, the foreground includes pedestrians, animals, vehicles, traffic signs, and road obstacles in the input image, and the background includes the sky, green belts, lakes, and empty road in the input image.
Further, in Step 3, RoIAlign is an improvement on RoIPooling: in Mask-RCNN, RoIAlign replaces the RoIPooling operation of Faster R-CNN, computing each RoI block using bilinear interpolation and then aggregating through a pooling operation.
Further, in Step 4, the fully connected layers of the fully connected network integrate the final feature maps, which carry a certain degree of class discrimination, and finally map the feature maps into multi-dimensional vectors; the fully connected layers pass these vectors to the classification layer and the regression layer.
Further, in Step 5, the fully convolutional network adopts a downsampling/upsampling encoder-decoder design and can share convolutional features.
Compared with the prior art, the advantages of the present invention are:
For complex and changeable driving scenarios, the present invention provides a drivable area segmentation method based on a deep neural network. Built on the Mask-RCNN model, it uses ResNet as the feature extraction network within Mask-RCNN and, combining the advantages of FPN, fuses features of different resolutions and semantic strengths from different feature maps, effectively improving the accuracy of drivable area segmentation. The method exhibits good robustness and adaptability across different driving scenarios.
Description of the Drawings
The present invention is further described below with reference to the accompanying drawings and embodiments, in which:
Fig. 1 is a flowchart of the drivable area segmentation method based on a deep neural network of the present invention;
Fig. 2 is a structural diagram of the RPN of the present invention;
Fig. 3 is a structural diagram of RoIAlign of the present invention;
Fig. 4 shows example results of extracting drivable areas on unstructured roads in an embodiment of the present invention.
Detailed Description of the Embodiments
For a clearer understanding of the technical features, objectives, and effects of the present invention, specific embodiments are now described in detail with reference to the accompanying drawings.
A drivable area segmentation method based on a deep neural network, as shown in Fig. 1, comprises:
Step 1. Based on the Mask-RCNN model, use a ResNet deep residual network as the feature extraction network to extract features from the input image;
Step 2. Use an RPN to process the extracted feature maps to distinguish foreground from background. The foreground refers to pedestrians, animals, vehicles, traffic signs, road obstacles, and the like in the input image; the background refers to the sky, green belts, lakes, empty road, and the like;
Step 3. Precisely locate the contours of the classified foreground and background through RoIAlign;
Step 4. Perform category recognition on the precisely located foreground through a fully connected network to obtain the specific classification of the foreground;
Step 5. Based on the specific foreground classification results and the distinguished background, use a fully convolutional neural network to segment the drivable and non-drivable areas. A high-level sketch of this five-step pipeline is given after this list.
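For illustration only: the five steps above map onto the standard Mask-RCNN inference pipeline, and torchvision's maskrcnn_resnet50_fpn bundles a ResNet-FPN backbone (Step 1), an RPN (Step 2), RoIAlign (Step 3), box/class heads (Step 4), and a fully convolutional mask branch (Step 5). The sketch below is a minimal example under those assumptions, not the patented implementation itself; in particular, the drivable-class label id and score threshold are hypothetical.

```python
import torch
import torchvision

# Pre-trained Mask-RCNN with a ResNet-50 + FPN backbone: the backbone performs
# feature extraction (Step 1), the built-in RPN generates proposals (Step 2),
# RoIAlign pools them (Step 3), the box/class heads classify them (Step 4),
# and the fully convolutional mask branch segments them (Step 5).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 1024, 1024)  # placeholder for a real road image in [0, 1]

with torch.no_grad():
    output = model([image])[0]  # dict with 'boxes', 'labels', 'scores', 'masks'

# Hypothetical post-processing: keep confident masks of an assumed "drivable
# road" class id and merge them into one binary drivable-area map.
DRIVABLE_LABEL = 1  # assumed class id; dataset-specific in practice
keep = (output["scores"] > 0.5) & (output["labels"] == DRIVABLE_LABEL)
drivable_mask = (output["masks"][keep, 0] > 0.5).any(dim=0)
print(drivable_mask.shape)  # torch.Size([1024, 1024]) boolean map
```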
The ResNet deep residual network structure differs from earlier neural network architectures in that it introduces residual learning modules and shortcut connections: on top of ordinary convolution, a linear connection is introduced between the input and output of successive layers. This not only effectively avoids the overfitting problems caused by very deep stacks of layers, but also makes better use of low-level image features, improving accuracy to a certain extent. Standard 3x3 convolution kernels with ReLU (Rectified Linear Unit) activation are used, together with typical convolution and max pooling operations. A Feature Pyramid Network (FPN) is adopted; by changing how the network layers are connected, the FPN fuses features of multiple scales and different semantic strengths, and this accuracy gain comes with almost no additional computation.
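As a concrete illustration of the residual learning and shortcut connection described above, the following is a minimal PyTorch sketch of a basic block with two 3x3 convolutions and an identity shortcut; it is a generic ResNet-style block, not necessarily the exact layer configuration used in this method.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Generic ResNet-style basic block: two 3x3 convolutions plus a
    shortcut (identity) connection, with ReLU activations."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                      # the shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity              # residual addition: F(x) + x
        return self.relu(out)

# Usage: a 64-channel feature map passes through with its spatial size unchanged.
block = BasicResidualBlock(64)
y = block(torch.rand(1, 64, 56, 56))  # -> torch.Size([1, 64, 56, 56])
```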
The RPN (Region Proposal Network) is a deep neural network built entirely from convolutions; its structure is shown in Fig. 2. Its role is to distinguish foreground from background by generating region proposal boxes while simultaneously refining the proposal coordinates. The RPN slides a window over the feature map at the end of the shared convolutional layers of the feature extraction network; each sliding window is mapped to a low-dimensional vector and fed into a classification layer and a bounding-box regression layer to obtain region proposals. Centered on each sliding-window position, anchors of K scales are placed; for each anchor, the classification layer outputs the probability of belonging to the foreground or the background, while the regression layer outputs the bounding-box coordinate information.
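To make the sliding-window description concrete, here is a minimal sketch of an RPN head of the kind used in Faster R-CNN and Mask-RCNN, assuming PyTorch; the channel width (256) and anchor count (K = 9) are illustrative assumptions rather than values specified by the patent.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sliding-window RPN head: a shared 3x3 convolution followed by two
    sibling 1x1 convolutions -- one scoring foreground/background per
    anchor, one regressing 4 box offsets per anchor."""

    def __init__(self, in_channels: int = 256, num_anchors: int = 9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(in_channels, num_anchors, kernel_size=1)
        self.bbox_deltas = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

    def forward(self, feature_map: torch.Tensor):
        t = torch.relu(self.conv(feature_map))  # per-window low-dim vector
        scores = self.objectness(t)             # (N, K, H, W) fg/bg logits
        deltas = self.bbox_deltas(t)            # (N, 4K, H, W) box offsets
        return scores, deltas

head = RPNHead()
scores, deltas = head(torch.rand(1, 256, 32, 32))
print(scores.shape, deltas.shape)  # [1, 9, 32, 32] [1, 36, 32, 32]
```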
RoIAlign is an improvement on RoIPooling and, as shown in Fig. 3, is a region feature aggregation method. In Mask-RCNN, RoIAlign replaces the RoIPooling operation of Faster R-CNN: sample values within each RoI block are computed by bilinear interpolation and then aggregated by a pooling operation. This avoids quantizing the RoI blocks and thereby solves the inaccurate target localization caused by the quantization steps of RoI pooling.
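As an illustration of this operation, torchvision ships a ready-made roi_align; the sketch below, with hypothetical box coordinates and feature-map stride, shows how continuous (non-quantized) RoI coordinates are sampled by bilinear interpolation into a fixed-size output.

```python
import torch
from torchvision.ops import roi_align

features = torch.rand(1, 256, 50, 50)  # feature map from the backbone

# One RoI in (batch_index, x1, y1, x2, y2) format; the fractional coordinates
# are intentional -- RoIAlign samples them with bilinear interpolation instead
# of rounding them to the feature grid, as RoIPooling would.
rois = torch.tensor([[0, 10.3, 14.7, 35.2, 41.9]])

pooled = roi_align(
    features, rois,
    output_size=(7, 7),      # fixed-size output per RoI
    spatial_scale=1.0 / 16,  # assumed stride between image and feature map
    sampling_ratio=2,        # bilinear samples per bin before averaging
)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```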
Foreground object classification and bounding-box localization regression are the two core steps of object detection. Classification reports the meaningful target classes contained in the input image and outputs a confidence value for each class; localization gives the specific position of each target object in the image and outputs its bounding box. Fully connected layers integrate the final feature maps, which carry a certain degree of class discrimination, and finally map them into n-dimensional vectors. The fully connected layers pass their outputs to a classification layer and a regression layer; the loss function consists of a classification loss and a bounding-box regression loss, the latter being used to refine the target position information. The model adopts a detection algorithm based on region proposals, following the most important step of the Faster R-CNN method: the proposals generated by the RPN replace the Selective Search algorithm of R-CNN. This simplifies model training, avoids the large amounts of time and storage consumed during training, and at the same time improves detection accuracy, which also helps improve the accuracy of drivable area segmentation.
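The following sketch, assuming PyTorch, shows the shape of such a two-branch fully connected head and its combined loss; the layer widths, class count, and the simplified regression loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoxHead(nn.Module):
    """Fully connected head: integrates pooled RoI features into an
    n-dimensional vector, then branches into a classification layer and a
    bounding-box regression layer."""

    def __init__(self, in_features: int = 256 * 7 * 7, num_classes: int = 81):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
        )
        self.cls_score = nn.Linear(1024, num_classes)      # class confidences
        self.bbox_pred = nn.Linear(1024, num_classes * 4)  # per-class box offsets

    def forward(self, roi_features: torch.Tensor):
        x = self.fc(roi_features)
        return self.cls_score(x), self.bbox_pred(x)

head = BoxHead()
logits, deltas = head(torch.rand(8, 256, 7, 7))  # 8 pooled RoIs

# Combined loss = classification loss + bounding-box regression loss.
# Simplified: real Faster R-CNN applies the regression loss only to the
# target class of positive RoIs.
labels = torch.randint(0, 81, (8,))
target_deltas = torch.rand(8, 81 * 4)
loss = F.cross_entropy(logits, labels) + F.smooth_l1_loss(deltas, target_deltas)
print(loss.item())
```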
Traditional CNN models use large receptive fields, which leads to rough edges in the segmentation result, and the use of max pooling layers further aggravates this coarseness. The fully convolutional network was the first architecture to accomplish end-to-end semantic segmentation of images, and it introduced the idea of an operation inverse to convolution: where convolutional downsampling during feature extraction has discarded many low-level image features, the features trained by the residual network are re-projected by a 1x1 convolution to a dimensionality suited to the segmentation task. Upsampling is performed with transposed convolutions, and skip connection layers are introduced to fuse low-level features with high-level features. The Mask-RCNN model introduces a fully convolutional network in its mask branch; the advantage of this network is its downsampling/upsampling encoder-decoder design and its ability to share convolutional features, which greatly reduces storage requirements and improves the processing efficiency of the drivable area segmentation task.
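A minimal sketch of such a fully convolutional mask branch, assuming PyTorch: a small stack of 3x3 convolutions, a transposed convolution for 2x upsampling, and a 1x1 convolution projecting to per-class mask logits. The depths and widths are illustrative; Mask-RCNN's published branch uses similar but not identical settings.

```python
import torch
import torch.nn as nn

class MaskBranch(nn.Module):
    """Fully convolutional mask head: 3x3 convolutions over the RoI
    features, a transposed convolution for 2x upsampling, and a 1x1
    convolution mapping to one mask logit per class."""

    def __init__(self, in_channels: int = 256, num_classes: int = 81):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
        )
        self.upsample = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2)
        self.mask_logits = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, roi_features: torch.Tensor) -> torch.Tensor:
        x = self.convs(roi_features)
        x = torch.relu(self.upsample(x))  # 14x14 -> 28x28
        return self.mask_logits(x)        # (N, num_classes, 28, 28)

branch = MaskBranch()
masks = branch(torch.rand(8, 256, 14, 14))
print(masks.shape)  # torch.Size([8, 81, 28, 28])
```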
Embodiment 1: The evaluation results of instance segmentation on the COCO dataset are shown in Table 2.
Table 2. Evaluation results of different feature extraction networks
Table 2 shows that as the number of layers in the ResNet deep residual network increases, its feature representation capability grows stronger. The FPN, which fuses features of different resolutions and semantic strengths from different feature maps, represents features better than the single output feature map of the residual network alone.
Embodiment 2: A Mask-RCNN model with ResNet101-FPN as the feature extraction network was trained. The batch size was set to 1 in this embodiment; because the batch size is small, the BN layers were frozen during training. A pre-trained weight initialization strategy was adopted, and training was divided into three main stages: first the Mask-RCNN head layers were trained, then the residual network weights were fine-tuned, both stages with an initial learning rate of 1×10⁻³; finally, building on the results of the first two stages, all network layer weights were fine-tuned with an initial learning rate of 1×10⁻⁴. The entire model was trained for 160 epochs, with 1000 iterations per epoch.
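The three-stage schedule could be expressed as follows, assuming PyTorch and torchvision; the embodiment does not publish its exact parameter grouping, so the backbone/head split and optimizer settings below are assumptions (torchvision also ships only a ResNet-50 backbone, standing in here for ResNet101-FPN).

```python
import torch
import torchvision

# Stand-in model: Mask-RCNN with a ResNet + FPN backbone and pre-trained weights.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Batch size 1 -> freeze the BatchNorm layers, as in the embodiment.
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        m.eval()
        for p in m.parameters():
            p.requires_grad = False

def params(model, include_backbone: bool):
    """Select trainable parameters, optionally excluding the backbone."""
    for name, p in model.named_parameters():
        if p.requires_grad and (include_backbone or not name.startswith("backbone")):
            yield p

# Stage 1: train the Mask-RCNN head layers only, initial lr = 1e-3.
opt1 = torch.optim.SGD(params(model, include_backbone=False), lr=1e-3, momentum=0.9)
# Stage 2: fine-tune the residual backbone as well, initial lr = 1e-3.
opt2 = torch.optim.SGD(params(model, include_backbone=True), lr=1e-3, momentum=0.9)
# Stage 3: fine-tune all layers, initial lr = 1e-4;
# 160 epochs in total, 1000 iterations per epoch (training loop omitted).
opt3 = torch.optim.SGD(params(model, include_backbone=True), lr=1e-4, momentum=0.9)
```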
During model training in this embodiment, all input images were set to a size of 1024×1024 by zero padding. Data augmentation was applied, randomly shifting the input images horizontally. Because negative samples usually dominate among the candidate boxes generated by the RPN, the ratio of positive to negative candidate boxes was set to 1:3 to alleviate the class imbalance problem.
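A minimal sketch of this preprocessing, assuming PyTorch: the helper zero-pads an image to 1024×1024, and the quota arithmetic illustrates the 1:3 positive-to-negative sampling ratio (the per-image sample count of 256 is a hypothetical value, not one given in the embodiment).

```python
import torch
import torch.nn.functional as F

def pad_to_square(image: torch.Tensor, size: int = 1024) -> torch.Tensor:
    """Zero-pad a (C, H, W) image on the right and bottom to size x size."""
    _, h, w = image.shape
    return F.pad(image, (0, size - w, 0, size - h))

image = torch.rand(3, 720, 1024)  # demo input no larger than 1024 x 1024
padded = pad_to_square(image)
print(padded.shape)  # torch.Size([3, 1024, 1024])

# Illustrative 1:3 positive-to-negative quota for RPN candidate boxes:
# of a hypothetical 256 sampled candidates per image, at most 64 are positive.
total_samples = 256
num_pos = total_samples // 4       # one part positive
num_neg = total_samples - num_pos  # three parts negative
print(num_pos, num_neg)  # 64 192
```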
Similarly, a Mask-RCNN model with ResNet50-FPN as the feature extraction network was trained with the same hyperparameters. The algorithm was then evaluated on the trained networks using samples from the test dataset; Table 3 gives the evaluation results from the test phase.
Table 3. Evaluation results
The evaluation results show that using a ResNet deep residual network as the feature extraction network effectively achieves drivable area segmentation across complex and diverse driving scenes, with good robustness and adaptability.
Embodiment 3: Example results of extracting drivable areas on unstructured roads using the Mask-RCNN model. As shown in Fig. 4, the first row contains the input test images and the second row their corresponding ground-truth segmentation maps.
Unstructured roads are less structured roads such as non-arterial urban roads and rural streets, which have no lane lines or clear road boundaries. Such roads are often affected by shadows, water marks, and mud, making the drivable area difficult to distinguish. The results of this embodiment show that the Mask-RCNN model can effectively extract the drivable area.
The present invention provides a drivable area segmentation method based on a deep neural network for complex and changeable driving scenarios. The method is built on the Mask-RCNN model, uses ResNet-50 and ResNet-101 as the feature extraction networks within Mask-RCNN, and, by combining the advantages of FPN, fuses features of different resolutions and semantic strengths from different feature maps, effectively improving the accuracy of drivable area segmentation. The results of the embodiments show that the method has good robustness and adaptability across different driving scenarios.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many further forms without departing from the purpose of the present invention or the scope protected by the claims, all of which fall within the protection of the present invention.