





Technical Field
The invention belongs to the field of depth estimation in computer vision, and in particular relates to a sparse depth densification method based on a multi-scale convolutional neural network.
Background Art
In autonomous driving, the perception system based on computer vision is the most fundamental component. At present, visible-light cameras are the sensors most commonly used in autonomous driving perception systems; they have the advantages of low cost and mature technology. However, visible-light cameras also have obvious drawbacks. First, the RGB images they capture contain only color information, so when the target texture is complex the perception system is prone to misjudgment. Second, visible-light cameras fail in certain environments; for example, at night with insufficient illumination it is difficult for a camera to work normally. Lidar is another sensor frequently used in autonomous driving perception systems. Lidar is not easily affected by lighting conditions, and the point cloud data it collects is three-dimensional. A depth image can be obtained directly from the point cloud: it is formed by projecting the point cloud onto a two-dimensional plane, and the value of each pixel indicates the distance from that point to the sensor. Compared with RGB images, the distance information contained in depth images is more helpful for tasks such as object recognition and segmentation. However, lidar is expensive, the collected point cloud is very sparse, and the resulting depth map is likewise sparse, which limits its usefulness to a certain extent.
Summary of the Invention
In view of the above problems, the object of the present invention is to provide a method for densifying sparse depth using a multi-scale network.
The sparse depth densification method based on a multi-scale network of the present invention comprises the following steps:
Constructing a multi-scale network model:
The multi-scale network model comprises L (L≥2) input branches; the outputs of the L branches are added point-wise and fed into an information fusion layer, which is followed by an upsampling processing layer serving as the output layer of the multi-scale network model;
Among the L input branches, one branch takes the original image as its input, and the remaining L-1 branches take as input the down-sampled images obtained from the original image at different down-sampling factors; the output image of the output layer of the multi-scale network model has the same size as the original image;
The input data of each of the L input branches consists of an RGB image and a sparse depth map. The sparse depth map of the original image is down-sampled as follows: given a preset down-sampling factor K, the sparse depth map is divided into grids of pixels, each grid containing K×K original input pixels; a flag value si is set for each original input pixel according to its depth value, with si=0 if the depth value of the current original input pixel is 0 and si=1 otherwise, where i indexes the K×K original input pixels of each grid; the depth value pnew of each grid is then obtained from the formula pnew = Σi(si·pi) / Σi(si), where pi denotes the depth value of original input pixel i;
The network structure of the branch whose input is the original image is the first network structure;
The network structure of a branch whose input is a down-sampled image of the original image is as follows: K/2 upsampling convolution blocks D with 16 channels are appended after the first network structure, where K denotes the down-sampling factor of the original image for that branch;
The first network structure comprises fourteen layers, namely:
The first layer consists of an input layer and a pooling layer; the input layer has a 7*7 convolution kernel, 64 channels, and a convolution stride of 2; the pooling layer uses max pooling with a 3*3 kernel and a pooling stride of 2;
The second and third layers have the same structure, each being a 64-channel R1 residual convolution block;
The fourth layer is a 128-channel R2 residual convolution block;
The fifth layer is a 128-channel R1 residual convolution block;
The sixth layer is a 256-channel R2 residual convolution block;
The seventh layer is a 256-channel R1 residual convolution block;
The eighth layer is a 512-channel R2 residual convolution block;
The ninth layer is a 512-channel R1 residual convolution block;
The tenth layer is a convolution layer with a 3*3 kernel, 256 channels, and a convolution stride of 1;
The eleventh layer is a 128-channel upsampling convolution block D; the output of the eleventh layer is concatenated channel-wise with the output of the seventh layer and then fed into the twelfth layer;
The twelfth layer is a 64-channel upsampling convolution block D; the output of the twelfth layer is concatenated channel-wise with the output of the fifth layer and then fed into the thirteenth layer;
The thirteenth layer is a 32-channel upsampling convolution block D; the output of the thirteenth layer is concatenated channel-wise with the output of the third layer and then fed into the fourteenth layer;
The fourteenth layer is a 16-channel upsampling convolution block D;
The R1 residual convolution block comprises two convolution layers of identical structure, each with a 3*3 kernel, a convolution stride of 1, and an adjustable number of channels; the input data of the R1 residual convolution block is added point-wise to the output of the second layer and passed through a ReLU activation function, which serves as the output layer of the R1 residual convolution block;
The R2 residual convolution block comprises a first, a second, and a third convolution layer; the input data of the R2 residual convolution block enters two parallel branches, and the outputs of the two branches are added point-wise and passed through a ReLU activation function, which serves as the output layer of the R2 residual convolution block; one branch consists of the first and second convolution layers connected in sequence, and the other branch consists of the third convolution layer;
The first convolution layer has a 3*3 kernel and a convolution stride of 2, the second convolution layer has a 3*3 kernel and a convolution stride of 1, and the third convolution layer has a 1*1 kernel and a convolution stride of 2; the number of channels of each layer is adjustable;
The upsampling convolution block D comprises two magnification modules and one convolution layer; the input data of the upsampling convolution block D enters two parallel branches, and the outputs of the two branches are added point-wise and passed through a ReLU activation function, which serves as the output layer of the upsampling convolution block D; one branch consists of the first magnification module and the convolution layer connected in sequence, and the other branch consists of the second magnification module;
The convolution layer of the upsampling convolution block D has a 3*3 kernel, a convolution stride of 1, and an adjustable number of channels;
The magnification module of the upsampling convolution block D comprises four parallel convolution layers with the same number of channels, whose kernel sizes are 3*3, 3*2, 2*3, and 2*2 respectively, all with a convolution stride of 1; the input data of the magnification module passes through the four convolution layers and the outputs are stitched together as the output of the magnification module;
The information fusion module is a convolution layer with a 3*3 kernel, 1 channel, and a convolution stride of 1;
Deep-learning training is performed on the constructed multi-scale network model, and the densified result of the image to be processed is obtained with the trained multi-scale network model.
In summary, by adopting the above technical solution, the beneficial effects of the present invention are as follows: the present invention estimates depth by combining a sparse point cloud with an image, where the sparse depth guides the RGB image and the RGB image supplements the sparse depth; by combining the advantages of the two data forms and performing depth estimation with the multi-scale network model of the present invention, the accuracy of depth estimation is improved.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the down-sampling of the present invention in a specific embodiment;
Fig. 2 is a schematic diagram of the residual convolution blocks in a specific embodiment, where Fig. 2-a shows the type-one residual convolution block and Fig. 2-b shows the type-two residual convolution block;
Fig. 3 is a schematic diagram of the upsampling convolution block in a specific embodiment, where Fig. 3-a shows the magnification module and Fig. 3-b shows the complete upsampling convolution block;
Fig. 4 is a schematic diagram of the multi-scale network structure adopted in a specific embodiment;
Fig. 5 compares the results of the present invention with those of an existing method in a specific embodiment, where Fig. 5-a is the input RGB image, Fig. 5-b is the sparse depth map, Fig. 5-c is the depth estimate of Fig. 5-b by the existing method, and Fig. 5-d is the depth estimate of Fig. 5-b by the present invention.
Detailed Description of Embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the embodiments and the accompanying drawings.
To meet the demand of specific scenarios (such as autonomous driving) for high-quality depth images, the present invention proposes a method for densifying sparse depth using a multi-scale network. Existing depth estimation methods mainly obtain dense depth directly from RGB images, but estimating a depth image directly from a two-dimensional image suffers from inherent ambiguity. To solve this problem, the present invention estimates depth by combining a sparse point cloud with an image: the sparse depth guides the RGB image, the RGB image supplements the sparse depth, the advantages of the two data forms are combined, and depth estimation is performed at multiple scales simultaneously, which improves the accuracy of depth estimation.
The present invention uses a multi-scale convolutional neural network to effectively fuse RGB image data and sparse point cloud data and finally produce a dense depth image. The sparse point cloud is projected onto a two-dimensional plane to generate a sparse depth map aligned with the RGB image; the sparse depth map and the RGB image are then concatenated into an RGBD (RGB + Depth Map) image, which is fed into the multi-scale convolutional neural network for training and testing, finally yielding a dense depth map. Estimating depth from the combination of an RGB image and a sparse point cloud allows the distance information contained in the point cloud to guide the conversion of the RGB image into a depth map; the multi-scale network exploits information from the original data at different resolutions, which on the one hand enlarges the receptive field and on the other hand makes the input depth map at the small resolution denser, yielding higher accuracy.
The multi-scale sparse depth densification method proposed by the present invention is implemented as follows:
(1) Down-sampling of the input data:
The feasible down-sampling factor depends strongly on the size of the input data. For an input image of size M*N, the feasible range of down-sampling factors is [2, min(M,N)*2^-5].
The sampling procedure is as follows. Let K denote the selected down-sampling factor. The input sparse depth map is divided into grids of pixels, each grid containing K*K original input pixels, so the input image is divided into (M/K)*(N/K) grids. Fig. 1 is a schematic diagram for a down-sampling factor of 2. The K*K pixels of a grid are denoted as the pixel set P={p1, p2, ..., pK*K}.
Since the sparse depth map contains values whose depth is zero, which are called invalid values, a flag value s is constructed to mark them: if the depth value of a pixel is not 0, the pixel is considered valid and s is set to 1; otherwise it is invalid and s is set to 0. The flag value set corresponding to the pixel set P is thus S={s1, s2, ..., sK*K}.
The new depth value after the above down-sampling is pnew = Σn(sn·pn) / Σn(sn), where pn denotes the depth value of an original pixel and sn denotes its flag value.
The above operation is performed on every divided grid, yielding a new depth map with smaller resolution that is denser (referred to as the small-resolution depth map). Compared with traditional down-sampling methods, the small-resolution depth map obtained in this way is denser, and because the influence of invalid values is removed, its depth values are also more accurate. The RGB image is down-sampled with the traditional bilinear interpolation method. The result is a small-resolution image and a small-resolution sparse depth map.
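The following is a minimal Python sketch of this validity-masked down-sampling, assuming the sparse depth map is stored as a two-dimensional array whose invalid pixels hold 0; the function name downsample_sparse_depth and the choice of leaving cells without any valid pixel at 0 are illustrative assumptions rather than part of the method as described above.

```python
import numpy as np

def downsample_sparse_depth(depth: np.ndarray, k: int) -> np.ndarray:
    """Average only the valid (non-zero) depths inside each k x k grid cell."""
    h, w = depth.shape
    assert h % k == 0 and w % k == 0, "assumes the image size is divisible by k"
    blocks = depth.reshape(h // k, k, w // k, k)       # each k x k grid cell becomes one block
    valid = (blocks > 0).astype(depth.dtype)           # flag s: 1 for valid pixels, 0 for invalid
    num_valid = valid.sum(axis=(1, 3))                 # sum of s over each cell
    depth_sum = (blocks * valid).sum(axis=(1, 3))      # sum of s*p over each cell
    # p_new = sum(s*p) / sum(s); cells with no valid pixel are left at 0 (assumption).
    return np.divide(depth_sum, num_valid,
                     out=np.zeros_like(depth_sum), where=num_valid > 0)

# Example: factor-2 down-sampling of a 4x4 sparse depth map.
sparse = np.array([[0., 2., 0., 0.],
                   [4., 0., 0., 6.],
                   [1., 1., 0., 0.],
                   [1., 1., 0., 0.]])
print(downsample_sparse_depth(sparse, 2))   # [[3. 6.] [1. 0.]]
```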
(2) Constructing the residual convolution blocks:
The residual convolution block is an important component of the multi-scale network of the present invention and is used to extract features from the input data; it comes in two types.
Type one: the residual convolution block R1 is constructed as follows. As shown in Fig. 2-a, the first layer of the block is a convolution layer with a 3*3 kernel, n channels, and a convolution stride of 1. The second layer has the same structure as the first. The input data is then added point-wise to the output of the second layer, and finally a ReLU activation function is applied. The structure of the residual convolution block is fixed, but the number of channels of its convolution layers is adjustable, and different residual convolution blocks are obtained by adjusting the channel number; the type-one residual convolution block is therefore named n-channel R1. The input and output of R1 have the same size, and no down-sampling is performed.
Type two: the residual convolution block R2 is constructed as follows. As shown in Fig. 2-b, the first layer of the block is a convolution layer with a 3*3 kernel, n channels, and a convolution stride of 2. The second layer is also a convolution layer, with a 3*3 kernel, n channels, and a convolution stride of 1. In parallel, the input data is passed through a convolution layer with a 1*1 kernel, n channels, and a convolution stride of 2, and this output is added point-wise to the output of the second layer; finally, a ReLU activation function is applied. Following the naming convention of R1, the type-two residual convolution block is named n-channel R2. The input of R2 is twice the size of its output; the purpose of this operation is to enlarge the receptive field of the convolution kernels and better extract global features.
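The two residual convolution blocks can be sketched in PyTorch as follows. The class names are illustrative, the padding values are chosen so that the stated strides produce the stated size changes, and, since no activation between the two convolutions is specified above, none is inserted.

```python
import torch
import torch.nn as nn

class R1(nn.Module):
    """Type one: two 3x3 stride-1 convolutions plus an identity skip (no down-sampling)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.conv1(x))
        return self.relu(out + x)                      # point-wise addition, then ReLU

class R2(nn.Module):
    """Type two: 3x3 stride-2 then 3x3 stride-1 convolutions, with a 1x1 stride-2 skip."""
    def __init__(self, in_channels: int, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.skip = nn.Conv2d(in_channels, channels, kernel_size=1, stride=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.conv1(x))
        return self.relu(out + self.skip(x))           # halves the spatial size

x = torch.randn(1, 64, 56, 56)
print(R1(64)(x).shape)        # torch.Size([1, 64, 56, 56])
print(R2(64, 128)(x).shape)   # torch.Size([1, 128, 28, 28])
```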
(3) Constructing the upsampling convolution block:
The upsampling convolution block is also an important part of the multi-scale network; its role is to magnify the input, and each upsampling convolution block doubles the size of its input. It is constructed as follows. The basic module of the upsampling convolution block is the magnification module; as shown in Fig. 3-a, the magnification module consists of four parallel convolution layers, all with n channels, whose kernel sizes are 3*3, 3*2, 2*3, and 2*2 respectively. The input passes through these four convolution layers and the outputs are stitched together, so the output is twice as large as the input. As shown in Fig. 3-b, the upsampling convolution block consists of two branches. The first layer of branch one is an n-channel magnification module followed by a ReLU activation function, and its second layer is a convolution layer with a 3*3 kernel and n channels. Branch two has only one layer, an n-channel magnification module. The output of branch one is added point-wise to the output of branch two, and finally a ReLU activation function is applied. Following the naming convention of R1 and R2, the upsampling convolution block is named n-channel D.
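A minimal PyTorch sketch of the magnification module and of the upsampling convolution block D follows. The text above specifies the four kernel sizes and that the four outputs are stitched into a map of twice the input size; the asymmetric padding and the pixel-shuffle interleaving used here to realize that stitching are assumptions, as are the class names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Magnify(nn.Module):
    """Four parallel convolutions (3x3, 3x2, 2x3, 2x2) interleaved into a 2x larger map."""
    def __init__(self, in_channels: int, channels: int):
        super().__init__()
        self.kernels = [(3, 3), (3, 2), (2, 3), (2, 2)]
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_channels, channels, kernel_size=k) for k in self.kernels])

    def forward(self, x):
        outs = []
        for (kh, kw), conv in zip(self.kernels, self.convs):
            # Pad so every branch keeps the input's spatial size (asymmetric for even kernels).
            pad = (kw - 2, 1, kh - 2, 1)                # (left, right, top, bottom)
            outs.append(conv(F.pad(x, pad)))
        # Interleave the four maps into a 2x2 grid per input pixel -> doubled resolution.
        stacked = torch.stack(outs, dim=2)              # (N, n, 4, H, W)
        n, c, _, h, w = stacked.shape
        return F.pixel_shuffle(stacked.reshape(n, c * 4, h, w), 2)

class UpBlockD(nn.Module):
    """Upsampling block D: (magnify -> ReLU -> 3x3 conv) + (magnify), then ReLU."""
    def __init__(self, in_channels: int, channels: int):
        super().__init__()
        self.mag1 = Magnify(in_channels, channels)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.mag2 = Magnify(in_channels, channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(self.relu(self.mag1(x))) + self.mag2(x))

x = torch.randn(1, 256, 8, 10)
print(UpBlockD(256, 128)(x).shape)   # torch.Size([1, 128, 16, 20])
```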
(4) Constructing the multi-scale convolutional network:
The multi-scale network can be built with multiple scales, i.e., multiple branches. Like the down-sampling factor, the number of branches that can be built is limited by the size of the input image: for an image of size M*N, the upper limit on the number of branches is log2(min(M,N)*2^-5)+1. The construction is illustrated here with two branches: one branch takes the original resolution as input, and the other takes 1/K of the original resolution as input, where K is the down-sampling factor of the input image. Finally, the information of the two branches is fused.
The first branch, i.e., the branch whose input is at the original resolution, is constructed as follows:
The first layer consists of an input layer and a pooling layer. The input layer has a 7*7 convolution kernel, 64 channels, and a convolution stride of 2. The pooling layer uses max pooling with a 3*3 kernel and a pooling stride of 2. The original input has size M*N*4; after the first layer its size becomes (M/4)*(N/4)*64, i.e., the spatial size becomes 1/4 of the original and the number of channels becomes 64.
The second layer is a 64-channel R1 residual convolution block, denoted R11.
The third layer has the same structure as the second and is denoted R12.
The fourth layer is a 128-channel R2 residual convolution block, denoted R21.
The fifth layer is a 128-channel R1 residual convolution block, denoted R13.
The sixth layer is a 256-channel R2 residual convolution block, denoted R22.
The seventh layer is a 256-channel R1 residual convolution block, denoted R14.
The eighth layer is a 512-channel R2 residual convolution block, denoted R23.
The ninth layer is a 512-channel R1 residual convolution block, denoted R15.
The tenth layer is a convolution layer with a 3*3 kernel, 256 channels, and a convolution stride of 1.
The eleventh layer is a 128-channel upsampling convolution block D, denoted D1.
The output of D1 is then concatenated channel-wise with the output of the seventh layer R14, where the output size of R14 is (M/16)*(N/16)*256 and the output size of D1 is (M/16)*(N/16)*128, so the concatenated size becomes (M/16)*(N/16)*384. The significance of this concatenation is that some of the original information lost during convolution can be recovered, making the result more accurate.
The twelfth layer is a 64-channel upsampling convolution block D, denoted D2; the output of D2 is then concatenated channel-wise with the output of R13.
The thirteenth layer is a 32-channel upsampling convolution block D, denoted D3; the output of D3 is then concatenated channel-wise with the output of R12.
The fourteenth layer is a 16-channel upsampling convolution block D, denoted D4.
At this point, the network structure of the branch whose input is at the original resolution is complete.
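A minimal PyTorch sketch of this fourteen-layer branch is given below, reusing the R1, R2, and UpBlockD classes from the sketches above. The max-pooling padding, the 4-channel RGBD input, the channel counts after each concatenation, and the use of an input size divisible by 32 (so that the decoder outputs and the encoder skips line up exactly) are inferred from the sizes stated above and should be read as assumptions.

```python
import torch
import torch.nn as nn

class FirstBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(                       # layer 1: 7x7/64/stride-2 conv + 3x3 max pool
            nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
        self.r11, self.r12 = R1(64), R1(64)              # layers 2-3
        self.r21, self.r13 = R2(64, 128), R1(128)        # layers 4-5
        self.r22, self.r14 = R2(128, 256), R1(256)       # layers 6-7
        self.r23, self.r15 = R2(256, 512), R1(512)       # layers 8-9
        self.conv10 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1)   # layer 10
        self.d1 = UpBlockD(256, 128)                     # layer 11
        self.d2 = UpBlockD(128 + 256, 64)                # layer 12 (D1 output concat R14 output)
        self.d3 = UpBlockD(64 + 128, 32)                 # layer 13 (D2 output concat R13 output)
        self.d4 = UpBlockD(32 + 64, 16)                  # layer 14 (D3 output concat R12 output)

    def forward(self, rgbd):                             # rgbd: (batch, 4, H, W), H and W divisible by 32
        x = self.stem(rgbd)                              # 64 ch,  H/4
        skip3 = self.r12(self.r11(x))                    # 64 ch,  H/4
        skip5 = self.r13(self.r21(skip3))                # 128 ch, H/8
        skip7 = self.r14(self.r22(skip5))                # 256 ch, H/16
        x = self.conv10(self.r15(self.r23(skip7)))       # 256 ch, H/32
        x = torch.cat([self.d1(x), skip7], dim=1)        # channel-wise concatenation
        x = torch.cat([self.d2(x), skip5], dim=1)
        x = torch.cat([self.d3(x), skip3], dim=1)
        return self.d4(x)                                # 16 ch, H/2

print(FirstBranch()(torch.randn(1, 4, 320, 256)).shape)  # torch.Size([1, 16, 160, 128])
```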
The second branch, i.e., the branch whose input is at 1/K of the original resolution, is constructed as follows:
Its first fourteen layers are identical to those of the branch whose input is at the original resolution; after them, a number of 16-channel upsampling convolution blocks D corresponding to the input size of the branch must be appended. For a branch whose input is at 1/K of the original resolution (down-sampling factor K), K/2 upsampling convolution blocks are added. Fig. 4 shows a two-branch case in which the input of the second branch is at 1/2 of the original resolution (down-sampling factor 2), so the number of upsampling convolution blocks D to be added to the second branch is 1. The multi-resolution case is analogous: if the input is at 1/4 of the original resolution, two 16-channel upsampling convolution blocks are added, and so on.
After the branches are constructed, the information of the two branches must be fused. The information fusion structure is as follows: the output of the first branch is added point-wise to the output of the second branch and used as the input of the information fusion module. The network structure of the information fusion module is a convolution layer with a 3*3 kernel and 1 channel; finally, the output of this layer is linearly up-sampled to obtain the final result with the same size as the original input.
For information fusion in the case of more than two branches, the outputs of all branches are likewise added point-wise and fed into the information fusion module.
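A minimal PyTorch sketch of the two-branch model and of the information fusion step is given below, reusing FirstBranch and UpBlockD from the sketches above; the use of bilinear interpolation for the final linear up-sampling and the class name MultiScaleNet are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleNet(nn.Module):
    def __init__(self, k: int = 2):
        super().__init__()
        self.branch_full = FirstBranch()                  # input at the original resolution
        self.branch_small = FirstBranch()                 # input at 1/k of the original resolution
        # k/2 extra 16-channel upsampling blocks bring the small branch to the same output size.
        self.extra_up = nn.Sequential(*[UpBlockD(16, 16) for _ in range(k // 2)])
        self.fuse = nn.Conv2d(16, 1, kernel_size=3, stride=1, padding=1)    # information fusion module

    def forward(self, rgbd_full, rgbd_small):
        out_full = self.branch_full(rgbd_full)
        out_small = self.extra_up(self.branch_small(rgbd_small))
        fused = self.fuse(out_full + out_small)           # point-wise addition, then 3x3 conv
        # Linear up-sampling back to the original input resolution.
        return F.interpolate(fused, size=rgbd_full.shape[-2:], mode='bilinear', align_corners=False)

net = MultiScaleNet(k=2)
pred = net(torch.randn(1, 4, 320, 256), torch.randn(1, 4, 160, 128))
print(pred.shape)   # torch.Size([1, 1, 320, 256])
```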
(5) Setting of the loss function:
In this embodiment, the Smooth L1 loss function is adopted, i.e., loss = (1/N)·Σ smoothL1(d − dg), with smoothL1(x) = 0.5x^2 for |x| < 1 and |x| − 0.5 otherwise, where d denotes the depth value estimated by the convolutional neural network, dg denotes the standard (ground-truth) depth value, and N denotes the total number of pixels in a depth map.
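PyTorch provides this loss directly as nn.SmoothL1Loss, which averages the element-wise Smooth L1 term over all pixels; the short sketch below uses random tensors purely to illustrate the call.

```python
import torch
import torch.nn as nn

criterion = nn.SmoothL1Loss()              # mean of the element-wise Smooth L1 term over the N pixels
pred = torch.rand(1, 1, 304, 228)          # depth estimated by the network (illustrative values)
gt = torch.rand(1, 1, 304, 228)            # standard (ground-truth) depth
print(criterion(pred, gt))
```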
(6) Training and testing of the model:
In this embodiment, the training data comes from the public NYU-Depth-v2 dataset, which contains RGB images and dense depth maps of size 640*480. For training, 48,000 RGB images and their corresponding dense depth maps are used; for testing, 654 RGB images and their corresponding dense depth maps are used. The input of the network is an RGB image and a sparse depth map; since the dataset does not contain sparse depth maps, a sparse depth map is obtained by randomly sampling 1000 points from the dense depth map and is combined with the RGB image into an RGBD image as the input.
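The sparse input can be generated from the dense ground truth as in the following sketch; sampling only among pixels that already have a valid depth, and the function names, are assumptions.

```python
import numpy as np

def make_sparse_depth(dense_depth: np.ndarray, num_samples: int = 1000) -> np.ndarray:
    """Keep a random subset of valid depth pixels and zero out the rest."""
    sparse = np.zeros_like(dense_depth)
    ys, xs = np.nonzero(dense_depth)                        # candidate pixels with a valid depth
    idx = np.random.choice(len(ys), size=num_samples, replace=False)
    sparse[ys[idx], xs[idx]] = dense_depth[ys[idx], xs[idx]]
    return sparse

def make_rgbd(rgb: np.ndarray, dense_depth: np.ndarray) -> np.ndarray:
    """Concatenate the RGB image (H x W x 3) with a sparse depth channel into an H x W x 4 RGBD image."""
    sparse = make_sparse_depth(dense_depth)
    return np.concatenate([rgb, sparse[..., None]], axis=-1)
```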
During training, the RGBD image is down-sampled to 320*240 and then center-cropped to 304*228 (this is the original image fed to the multi-scale network model); this image is used as the input of the first branch, and it is further down-sampled by a factor of two using the method described in step (1) to obtain a 152*114 RGBD image as the input of the second branch. With 8 images per training step, one pass over the dataset takes 6,000 steps, and training over the whole dataset 15 times requires 90,000 steps in total. A varying learning rate is used: the initial learning rate is set to 0.01, and after every 5 passes over the dataset the learning rate is divided by 10, so the final learning rate is 0.0001. After training, the parameters of the model are saved.
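The training schedule described above can be sketched as follows, reusing MultiScaleNet from the earlier sketch; the SGD optimizer, its momentum, the data loader train_loader (batch size 8), and the checkpoint file name are assumptions, since they are not specified above.

```python
import torch
import torch.nn as nn

model = MultiScaleNet(k=2)
criterion = nn.SmoothL1Loss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)           # optimizer choice is an assumption
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)   # divide the LR by 10 every 5 epochs

for epoch in range(15):                              # 48000 images / 8 per step = 6000 steps per epoch
    for rgbd_full, rgbd_small, gt in train_loader:   # train_loader is assumed to yield both branch inputs
        optimizer.zero_grad()
        loss = criterion(model(rgbd_full, rgbd_small), gt)
        loss.backward()
        optimizer.step()
    scheduler.step()                                 # 0.01 -> 0.001 -> 0.0001 over the 15 epochs

torch.save(model.state_dict(), 'multiscale_depth.pth')   # save the trained parameters
```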
During testing, the parameters of the model are loaded and the data is processed in the same way as during training; the processed data is fed into the model, which outputs the final result. Fig. 5 shows some comparisons between the output of the present invention and an existing deep-learning method. Overall, the results of the present invention are sharper, and the comparison within the black boxes shows that the details are reproduced better by the present invention.
The above is only a specific embodiment of the present invention. Unless otherwise stated, any feature disclosed in this specification may be replaced by other equivalent features or alternative features serving a similar purpose; all the disclosed features, or all the steps of the methods or processes, may be combined in any way, except for mutually exclusive features and/or steps.