CN112561979A - Self-supervision monocular depth estimation method based on deep learning - Google Patents


Info

Publication number
CN112561979A
CN112561979A (application CN202011562061.9A)
Authority
CN
China
Prior art keywords
pixel
depth estimation
right view
view
monocular depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011562061.9A
Other languages
Chinese (zh)
Other versions
CN112561979B (en)
Inventor
雷建军
孙琳
彭勃
张哲
刘秉正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202011562061.9A
Publication of CN112561979A
Application granted
Publication of CN112561979B
Status: Active
Anticipated expiration


Abstract

The invention discloses a self-supervised monocular depth estimation method based on deep learning. The method comprises: extracting pyramid features of the original right view Ir and the synthesized left view Ĩl respectively; performing a horizontal correlation operation on the pyramid features to obtain multi-scale correlation features Fc, and obtaining refined multi-scale correlation features Fm; feeding Fm into the visual cue prediction network of the binocular cue prediction module to generate auxiliary visual cues Dr, and reconstructing a right view Ĩr from the synthesized left view Ĩl; optimizing the binocular cue prediction module with the image reconstruction loss L_rec^r between the reconstructed right view Ĩr and the real right view Ir; using the visual cues Dr generated by the binocular cue prediction module to constrain the disparity map Dl predicted by the monocular depth estimation network, with a consistency loss enforcing agreement between the two; and constructing an occlusion-guided constraint that assigns different weights to the reconstruction errors of pixels in occluded and non-occluded regions.

Description

A self-supervised monocular depth estimation method based on deep learning

Technical Field

The invention relates to the fields of computer vision and depth estimation, and in particular to a self-supervised monocular depth estimation method based on deep learning.

Background Art

As one of the basic tasks of computer vision, depth perception is widely applicable to fields such as autonomous driving, augmented reality, robot navigation, and 3D reconstruction. Although active sensors (e.g., LiDAR, structured light, and time-of-flight) have been widely used to directly acquire scene depth, active sensor devices are usually bulky, expensive, and power-hungry. In contrast, methods that predict depth from RGB (color) images are inexpensive and easy to deploy. Among existing image-based depth estimation methods, monocular depth estimation does not rely on multiple acquisitions of the perceived environment and has therefore received extensive attention from researchers.

In recent years, monocular depth estimation methods based on deep learning have made significant progress. Among them, methods based on supervised learning usually require large datasets with ground-truth depth annotations to train the depth estimation model. In practical applications, high-quality pixel-level annotation of a large number of images is a challenging task, which greatly limits the applicability of supervised monocular depth estimation methods. Unlike methods that use depth labels directly as supervision, self-supervised methods aim to exploit monocular videos or stereo image pairs to provide indirect supervision signals for network training. Studying self-supervised monocular depth estimation methods that require no depth annotation is therefore of great significance and application value.

A basic technique in self-supervised monocular depth estimation is to predict a disparity map from the source view with a monocular depth estimation model, synthesize the target view from the predicted disparity map and the source view, and constrain the training of the depth estimation model with the reconstruction error between the synthesized target view and the real target view. Finally, a depth map can be computed from the predicted disparity map using the camera parameters. However, existing methods usually focus only on using the synthesized target view to construct the supervision signal, and do not sufficiently explore and exploit the geometric correlation between the source view and the synthesized target view. In addition, because occlusions exist between the source and target views, directly minimizing the appearance difference between the synthesized and real target views during disparity learning leads to inaccurate disparity predictions near occluded regions. It is therefore crucial to study how to fully exploit the geometric correlation between the source view and the synthesized target view, and how to handle the occlusions between the source and target views.

Summary of the Invention

Current self-supervised monocular depth estimation methods usually focus only on using the synthesized target view to construct the supervision signal; they neither fully exploit the geometric correlation between the source view and the synthesized target view nor analyze and handle the occlusions between the source and target views. To address these problems, the present invention proposes a self-supervised monocular depth estimation method based on deep learning, which generates auxiliary visual cues by exploring the correlation between the source view and the synthesized target view, and uses the generated visual cues to infer occluded regions and build an occlusion-guided constraint, improving the performance of self-supervised monocular depth estimation, as described below.

A self-supervised monocular depth estimation method based on deep learning, the method comprising:

1) extracting pyramid features of the original right view and the synthesized left view respectively, performing a horizontal correlation operation on the pyramid features to obtain multi-scale correlation features Fc, and obtaining refined multi-scale correlation features Fm;

2) feeding Fm into the visual cue prediction network of the binocular cue prediction module to generate auxiliary visual cues Dr, reconstructing a right view Ĩr from the synthesized left view Ĩl, and optimizing the binocular cue prediction module with the image reconstruction loss L_rec^r between the reconstructed right view Ĩr and the real right view Ir;

3) using the visual cues Dr generated by the binocular cue prediction module to constrain the disparity map Dl predicted by the monocular depth estimation network, with a consistency loss enforcing agreement between the two;

4) constructing an occlusion-guided constraint that assigns different weights to the reconstruction errors of pixels in occluded regions and pixels in non-occluded regions.

Specifically, the multi-scale correlation features Fc are obtained as:

Fc(x, y, d) = Fr(x, y) · Fl(x + d, y)

where Fr(x, y) and Fl(x, y) denote the values of the feature maps Fr and Fl at position (x, y), · denotes the dot product, and d denotes a candidate disparity value.

The refined multi-scale correlation features Fm are obtained as:

Fm = Concat[Fc, Conv(Fr)]

where Conv(·) denotes a convolution operation and Concat[·, ·] denotes concatenation at the same scale.

Further, the consistency loss enforcing agreement between the two is:

L_con = (1/N) Σ_p | Dl(p) − w(Dl, Dr)(p) |

where w(·) denotes the warping operation used to align Dr and Dl pixel by pixel, and N denotes the total number of pixels.

The occlusion-guided constraint is:

L_rec^l = (1/N) Σ_p (Ml(p) + γ) · ( α · (1 − SSIM(Il(p), Ĩl(p))) / 2 + (1 − α) · | Il(p) − Ĩl(p) | )

L_rec^r = (1/N) Σ_p (Mr(p) + γ) · ( α · (1 − SSIM(Ir(p), Ĩr(p))) / 2 + (1 − α) · | Ir(p) − Ĩr(p) | )

where · denotes the dot product, p denotes the pixel index, N denotes the total number of pixels, γ denotes a bias, SSIM(Il(p), Ĩl(p)) is the structural similarity at pixel p between the real left view and the synthesized left view, Il(p) and Ĩl(p) are the values of pixel p in the real and synthesized left views, Ml(p) and Mr(p) are the values of pixel p in the left and right occlusion masks, SSIM(Ir(p), Ĩr(p)) is the structural similarity at pixel p between the real right view and the synthesized right view, and Ir(p) and Ĩr(p) are the values of pixel p in the real and synthesized right views.

The loss function used to train the whole network is:

L = L_rec^l + λM · L_rec^r + λcon · L_con + λes · L_es

where λM, λcon and λes denote the weights of the different loss terms.

The beneficial effects of the technical solution provided by the present invention are:

1. The present invention proposes a binocular cue prediction module that generates auxiliary visual cues by exploring the correlation between the source view and the synthesized view, thereby enabling self-supervised monocular depth estimation;

2. The present invention proposes an occlusion-guided constraint that provides correct guidance for the supervision of the depth estimation network and improves the accuracy of depth estimation near occluded regions.

Description of the Drawings

Fig. 1 is a flowchart of the self-supervised monocular depth estimation method based on deep learning;

Fig. 2 is a schematic comparison of the results of the method of the present invention and other methods.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below.

An embodiment of the present invention provides a self-supervised monocular depth estimation method based on deep learning. Referring to Fig. 1, the method comprises the following steps:

1. Building the monocular depth estimation network

For the original right view Ir, a monocular depth estimation network learns the right-to-left disparity map Dl from Ir. The monocular depth estimation network adopts an encoder-decoder structure with skip connections: the encoder uses ResNet50 to extract features from the right view, and the decoder consists of successive deconvolutions and skip connections that gradually restore the feature maps to the resolution of the input image. After the disparity map Dl is obtained from the monocular depth estimation network, the right view Ir and the disparity map Dl are used to synthesize the left view Ĩl, and the image reconstruction loss L_rec^l between the synthesized left view Ĩl and the real left view Il serves as the objective function for optimizing the monocular depth estimation network. A minimal sketch of such a network is given below.
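As a rough illustration (not the patent's exact configuration), a minimal PyTorch sketch of such an encoder-decoder disparity network with a ResNet50 encoder and skip connections might look as follows; the decoder widths, ELU activations, and the sigmoid disparity head are assumptions.

```python
# Minimal sketch of the monocular disparity network described above:
# a ResNet50 encoder and a decoder of successive upsampling blocks
# with skip connections. Channel widths and the disparity head are
# assumptions, not the patent's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class MonoDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=None)
        # Encoder stages whose outputs also serve as skip connections.
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)  # 1/2, 64 ch
        self.stage1 = nn.Sequential(resnet.maxpool, resnet.layer1)        # 1/4, 256 ch
        self.stage2 = resnet.layer2                                       # 1/8, 512 ch
        self.stage3 = resnet.layer3                                       # 1/16, 1024 ch
        self.stage4 = resnet.layer4                                       # 1/32, 2048 ch
        # Decoder blocks: fuse an upsampled feature with its skip feature.
        self.up4 = self._block(2048 + 1024, 512)
        self.up3 = self._block(512 + 512, 256)
        self.up2 = self._block(256 + 256, 128)
        self.up1 = self._block(128 + 64, 64)
        self.pred = nn.Conv2d(64, 1, 3, padding=1)  # 1-channel disparity

    @staticmethod
    def _block(c_in, c_out):
        return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                             nn.ELU(inplace=True))

    @staticmethod
    def _fuse(x, skip, block):
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear",
                          align_corners=False)
        return block(torch.cat([x, skip], dim=1))

    def forward(self, right_view):
        s0 = self.stem(right_view)
        s1 = self.stage1(s0)
        s2 = self.stage2(s1)
        s3 = self.stage3(s2)
        s4 = self.stage4(s3)
        x = self._fuse(s4, s3, self.up4)
        x = self._fuse(x, s2, self.up3)
        x = self._fuse(x, s1, self.up2)
        x = self._fuse(x, s0, self.up1)
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)
        return torch.sigmoid(self.pred(x))  # right-to-left disparity Dl
```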

2. Building the binocular cue prediction module

For the original right view Ir and the synthesized left view Ĩl, a binocular cue prediction module is used to learn their geometric correspondence and generate auxiliary visual cues. First, the pyramid features Fr and Fl of the original right view and the synthesized left view are extracted, respectively.

To learn the geometric correspondence between the two views, the pyramid features Fr and Fl of the same scale undergo a horizontal correlation operation to obtain the multi-scale correlation features Fc:

Fc(x, y, d) = Fr(x, y) · Fl(x + d, y)   (1)

where Fr(x, y) and Fl(x, y) denote the values of the feature maps Fr and Fl at position (x, y), · denotes the dot product, and d denotes a candidate disparity value. The number of channels of Fc equals the number of candidate disparity values.

This horizontal correlation operation encodes the geometric correspondence between the original right view and the synthesized left view at different scales.

To predict a more accurate disparity map in the binocular cue prediction module, the detailed information of the right feature Fr is retained to further refine the multi-scale correlation features Fc:

Fm = Concat[Fc, Conv(Fr)]   (2)

where Fm denotes the refined multi-scale correlation features, Conv(·) denotes a convolution operation, and Concat[·, ·] denotes concatenation at the same scale. A sketch of both operations is given below.
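The following sketch illustrates how the horizontal correlation of equation (1) and the refinement of equation (2) could be computed at a single pyramid scale; the max_disp value, the zero-padded shifting, and the channel-averaged dot product are illustrative assumptions, not values taken from the patent.

```python
# Sketch of the horizontal correlation (Eq. 1) and feature refinement
# (Eq. 2) at one pyramid scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

def horizontal_correlation(feat_r, feat_l, max_disp=24):
    """For each candidate disparity d, correlate F_r(x, y) with
    F_l(x + d, y); the output has one channel per candidate d."""
    volumes = []
    for d in range(max_disp):
        # Shift the left features so that position x holds F_l(x + d, y),
        # zero-padding the vacated right border.
        shifted = F.pad(feat_l[:, :, :, d:], (0, d, 0, 0))
        # Channel-wise dot product (averaged over channels), Eq. (1).
        volumes.append((feat_r * shifted).mean(dim=1, keepdim=True))
    return torch.cat(volumes, dim=1)  # F_c: (B, max_disp, H, W)

class RefineCorrelation(nn.Module):
    """F_m = Concat[F_c, Conv(F_r)] (Eq. 2), keeping right-view detail."""
    def __init__(self, c_feat, c_out=32):
        super().__init__()
        self.conv = nn.Conv2d(c_feat, c_out, 3, padding=1)

    def forward(self, f_c, feat_r):
        return torch.cat([f_c, self.conv(feat_r)], dim=1)
```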

Fm is then fed into the visual cue prediction network of the binocular cue prediction module to generate the auxiliary visual cues Dr, which represent the horizontal offsets of corresponding pixels between the original right view and the synthesized left view. Based on the visual cues Dr, a right view Ĩr is reconstructed from the synthesized left view Ĩl, and the image reconstruction loss L_rec^r between the reconstructed right view Ĩr and the real right view Ir is used to optimize the binocular cue prediction module.

The visual cue prediction network is an encoder-decoder network composed of an encoder with 13 residual blocks and a decoder with 6 deconvolution blocks; the residual blocks are those of ResNet50, and each deconvolution block consists of a convolutional layer and an upsampling layer, as sketched below.
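As a structural illustration only, the building blocks of such a cue prediction network might be sketched as follows; the strides, channel widths, and bilinear upsampling are assumptions.

```python
# Structural sketch of the cue prediction network's building blocks:
# an encoder of 13 residual blocks and a decoder of 6 "deconvolution"
# blocks, each a convolution plus an upsampling layer.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out))
        self.skip = (nn.Identity() if stride == 1 and c_in == c_out
                     else nn.Conv2d(c_in, c_out, 1, stride=stride))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

def deconv_block(c_in, c_out):
    # One decoder block: a convolution followed by 2x upsampling.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ELU(inplace=True),
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False))
```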

To let the binocular cue prediction module assist the monocular depth estimation network, the Dr generated by the binocular cue prediction module is used to constrain the disparity map Dl predicted by the monocular depth estimation network. Since Dr and Dl represent the geometric correspondences from left view to right view and from right view to left view, respectively, the embodiment of the present invention uses a consistency loss to enforce agreement between Dr and Dl:

L_con = (1/N) Σ_p | Dl(p) − w(Dl, Dr)(p) |   (3)

where w(·) denotes the warping operation used to align Dr and Dl pixel by pixel, so that the consistency between Dr and Dl can be measured directly with an L1 loss; a sketch of this constraint is given below.
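A minimal sketch of the consistency constraint, assuming a horizontal bilinear warp for w(·), disparities expressed in pixels, and the sign convention shown in the comments (the patent does not spell these out):

```python
# Sketch of the consistency constraint of Eq. (3): warp one disparity
# map into the other's view and penalise the L1 difference.
import torch
import torch.nn.functional as F

def warp_horizontal(src, disp):
    """Bilinearly sample src at x + disp(x, y) for every pixel.
    disp: (B, 1, H, W) horizontal offsets in pixels (sign assumed)."""
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=src.device),
                            torch.arange(w, device=src.device),
                            indexing="ij")
    xs = xs.unsqueeze(0).float() + disp.squeeze(1)        # shifted x coords
    grid_x = 2.0 * xs / (w - 1) - 1.0                     # normalise to [-1, 1]
    grid_y = (2.0 * ys.float() / (h - 1) - 1.0).unsqueeze(0).expand_as(grid_x)
    grid = torch.stack([grid_x, grid_y], dim=-1)          # (B, H, W, 2)
    return F.grid_sample(src, grid, align_corners=True)

def consistency_loss(d_l, d_r):
    # L1 between Dl and Dr aligned to Dl's view, averaged over pixels.
    return (d_l - warp_horizontal(d_r, d_l)).abs().mean()
```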

In addition, to improve the local smoothness of the disparity maps, an edge-aware smoothness loss L_es is used to regularize Dr and Dl:

L_es = (1/N) Σ_p ( | ∂x D(p) | · e^(−| ∂x I(p) |) + | ∂y D(p) | · e^(−| ∂y I(p) |) )   (4)

where ∂x denotes the first-order differential operator in the horizontal direction, ∂y denotes the first-order differential operator in the vertical direction, and D and I stand for a disparity map (Dr or Dl) and its corresponding view. A sketch of this term follows.
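A compact sketch of such an edge-aware smoothness term; the exponential down-weighting by image gradients is the common formulation of this loss and is assumed here.

```python
# Sketch of an edge-aware smoothness term for Eq. (4): first-order
# disparity gradients, attenuated across strong image edges.
import torch

def edge_aware_smoothness(disp, image):
    dx_d = (disp[:, :, :, 1:] - disp[:, :, :, :-1]).abs()
    dy_d = (disp[:, :, 1:, :] - disp[:, :, :-1, :]).abs()
    dx_i = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs().mean(1, keepdim=True)
    dy_i = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
```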

3. Building the occlusion-guided constraint

To address the occlusion problem, the embodiment of the present invention constructs an occlusion-guided constraint that assigns different weights to the reconstruction errors of pixels in occluded regions and pixels in non-occluded regions, thereby providing correct guidance for the supervision of disparity estimation.

First, Dr and Dl are used to identify the pixels of the left and right views that belong to occluded regions. Since the disparity values of Dr and Dl should be consistent everywhere except in occluded regions, the difference maps Diffl and Diffr are computed as:

Diffl = | Dl − w(Dl, Dr) |   (5)
Diffr = | Dr − w(Dr, Dl) |   (6)

where w(·) denotes the warping operation and |·| denotes the absolute value.

The values of the difference maps are much larger in occluded regions than in non-occluded regions, so the binary occlusion masks Ml and Mr are obtained by detecting the outliers of the difference maps Diffl and Diffr:

Ml(p) = [ Diffl(p) < λ · (1/(W·H)) Σ_q Diffl(q) ]   (7)
Mr(p) = [ Diffr(p) < λ · (1/(W·H)) Σ_q Diffr(q) ]   (8)

where W and H denote the width and height of the difference maps, [·] denotes the Iverson bracket, whose value is 1 when the condition inside the bracket is satisfied and 0 otherwise, and λ denotes a balance constant. In the occlusion masks Ml and Mr, positions with value 0 correspond to pixels in occluded regions, and positions with value 1 correspond to pixels in non-occluded regions.

Specifically, the left occlusion mask Ml marks pixels that are visible in the left view but not in the right view, and the right occlusion mask Mr marks pixels that are visible in the right view but not in the left view. A sketch of this mask construction follows.
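A sketch of the mask construction of equations (5)-(8), reusing the warp_horizontal helper from the consistency-loss sketch above; thresholding each difference map at λ times its own mean is one plausible reading of the Iverson-bracket rule, not a confirmed detail.

```python
# Sketch of the occlusion masks of Eqs. (5)-(8). Assumes the
# warp_horizontal helper from the consistency-loss sketch is in scope.
import torch

def occlusion_masks(d_l, d_r, lam=1.0):
    # Difference maps (Eqs. 5 and 6): large values flag occluded pixels.
    diff_l = (d_l - warp_horizontal(d_r, d_l)).abs()
    diff_r = (d_r - warp_horizontal(d_l, d_r)).abs()
    # Binary masks (Eqs. 7 and 8): 1 marks non-occluded pixels, 0 marks
    # outliers whose difference exceeds lam times the per-image mean.
    m_l = (diff_l < lam * diff_l.mean(dim=(2, 3), keepdim=True)).float()
    m_r = (diff_r < lam * diff_r.mean(dim=(2, 3), keepdim=True)).float()
    return m_l, m_r
```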

Using the occlusion masks Ml and Mr, the embodiment of the present invention applies the occlusion-guided constraint to provide more accurate guidance for the supervision of disparity estimation:

L_rec^l = (1/N) Σ_p (Ml(p) + γ) · ( α · (1 − SSIM(Il(p), Ĩl(p))) / 2 + (1 − α) · | Il(p) − Ĩl(p) | )   (9)

L_rec^r = (1/N) Σ_p (Mr(p) + γ) · ( α · (1 − SSIM(Ir(p), Ĩr(p))) / 2 + (1 − α) · | Ir(p) − Ĩr(p) | )   (10)

where · denotes the dot product, p denotes the pixel index, N denotes the total number of pixels, γ denotes a bias, SSIM(Il(p), Ĩl(p)) is the structural similarity at pixel p between the real left view and the synthesized left view, Il(p) and Ĩl(p) are the values of pixel p in the real and synthesized left views, Ml(p) and Mr(p) are the values of pixel p in the left and right occlusion masks, SSIM(Ir(p), Ĩr(p)) is the structural similarity at pixel p between the real right view and the synthesized right view, Ir(p) and Ĩr(p) are the values of pixel p in the real and synthesized right views, and α = 0.85. A sketch of this term follows.
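A sketch of an occlusion-guided photometric term in the spirit of equations (9) and (10); the 3x3 average-pooled SSIM and the additive role of the bias γ are assumptions, since the patent text does not fully specify them.

```python
# Sketch of an occlusion-guided photometric term: an SSIM + L1 mix
# with alpha = 0.85, weighted by the occlusion mask plus a bias gamma.
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM computed with a 3x3 average-pooling window."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).clamp(0, 1)

def occlusion_guided_loss(real, synth, mask, alpha=0.85, gamma=0.1):
    photo = (alpha * (1 - ssim(real, synth)) / 2
             + (1 - alpha) * (real - synth).abs())
    # Non-occluded pixels (mask == 1) get weight 1 + gamma; occluded
    # pixels (mask == 0) keep only the small bias gamma.
    return ((mask + gamma) * photo).mean()
```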

The loss function used to train the whole network is:

L = L_rec^l + λM · L_rec^r + λcon · L_con + λes · L_es   (11)

where λM, λcon and λes denote the weights of the different loss terms.

4. Training the deep-learning-based self-supervised monocular depth estimation network

During training, the deep-learning-based self-supervised monocular depth estimation network comprises the monocular depth estimation network, the binocular cue prediction module, and the occlusion-guided constraints (equations (9) and (10)). Training proceeds in four stages.

In the first stage, the monocular depth estimation network is trained with the left-view image reconstruction loss L_rec^l and the edge-aware smoothness loss L_es. In the second stage, the weights of the monocular depth estimation network are frozen, and the binocular cue prediction module is trained with the right-view image reconstruction loss L_rec^r and the edge-aware smoothness loss L_es. In the third stage, L_rec^l and L_rec^r are used to jointly optimize the monocular depth estimation network and the binocular cue prediction module. Finally, in the fourth stage, the occlusion-guided constraints are embedded into the whole network, which is trained jointly with the full loss L; the loss weights {λM, λcon, λes} are set to {1.0, 1.0, 0.1}, respectively. A skeleton of this schedule is sketched below.
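A skeleton of the four-stage schedule; the optimizer, learning rate, epoch counts, and the names depth_net, cue_net, and stage4_loss are placeholders rather than values from the patent. Only the staging and the loss weights follow the text.

```python
# Skeleton of the four-stage training schedule described above.
import torch

lam_m, lam_con, lam_es = 1.0, 1.0, 0.1  # {lambda_M, lambda_con, lambda_es}

def run_stage(params, loss_fn, loader, epochs):
    opt = torch.optim.Adam(params, lr=1e-4)
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            loss_fn(batch).backward()
            opt.step()

# Stage 1: L_rec^l + L_es, updating the depth network only.
# Stage 2: L_rec^r + L_es, updating the cue module only (depth frozen).
# Stage 3: both reconstruction losses, updating both networks jointly.
# Stage 4: full objective with occlusion guidance and consistency:
#   L = L_rec^l + lam_m * L_rec^r + lam_con * L_con + lam_es * L_es
# e.g. run_stage(list(depth_net.parameters()) + list(cue_net.parameters()),
#                stage4_loss, loader, epochs=20)
```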

Fig. 2 shows a comparison of the root-mean-square error of the predicted depth maps. The compared algorithms, 3Net and Monodepth2, are both self-supervised monocular depth estimation methods; the smaller the root-mean-square error, the more accurate the predicted depth map. As shown, both 3Net and Monodepth2 yield larger root-mean-square errors, because they only use the synthesized target view to construct the supervision signal, without further exploiting the geometric correlation between the source view and the synthesized target view, and without handling the occlusions between the source and target views. As can be seen from Fig. 2, by exploring the correlation between the source view and the synthesized target view to generate auxiliary visual cues and by building the occlusion-guided constraint, the method of the present invention obtains more accurate depth maps.

In the embodiments of the present invention, the models of the devices are not limited unless otherwise specified; any device capable of performing the above functions may be used.

Those skilled in the art will understand that the accompanying drawing is only a schematic diagram of a preferred embodiment, and that the above serial numbers of the embodiments of the present invention are for description only and do not indicate any preference among the embodiments.

The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

1. A self-supervised monocular depth estimation method based on deep learning, characterized in that the method comprises:
1) extracting pyramid features of the original right view and the synthesized left view respectively, performing a horizontal correlation operation on the pyramid features to obtain multi-scale correlation features Fc, and obtaining refined multi-scale correlation features Fm;
2) feeding Fm into the visual cue prediction network of the binocular cue prediction module to generate auxiliary visual cues Dr, reconstructing a right view Ĩr from the synthesized left view Ĩl, and optimizing the binocular cue prediction module with the image reconstruction loss L_rec^r between the reconstructed right view Ĩr and the real right view Ir;
3) using the visual cues Dr generated by the binocular cue prediction module to constrain the disparity map Dl predicted by the monocular depth estimation network, with a consistency loss enforcing agreement between the two;
4) constructing an occlusion-guided constraint that assigns different weights to the reconstruction errors of pixels in occluded regions and pixels in non-occluded regions.

2. The self-supervised monocular depth estimation method based on deep learning according to claim 1, characterized in that the multi-scale correlation features Fc are obtained as:
Fc(x, y, d) = Fr(x, y) · Fl(x + d, y)
where Fr(x, y) and Fl(x, y) denote the values of the feature maps Fr and Fl at position (x, y), · denotes the dot product, and d denotes a disparity value.

3. The self-supervised monocular depth estimation method based on deep learning according to claim 2, characterized in that the refined multi-scale correlation features Fm are obtained as:
Fm = Concat[Fc, Conv(Fr)]
where Conv(·) denotes a convolution operation and Concat[·, ·] denotes concatenation at the same scale.

4. The self-supervised monocular depth estimation method based on deep learning according to claim 1, characterized in that the consistency loss enforcing agreement between the two is:
L_con = (1/N) Σ_p | Dl(p) − w(Dl, Dr)(p) |
where w(·) denotes the warping operation used to align Dr and Dl pixel by pixel.

5. The self-supervised monocular depth estimation method based on deep learning according to claim 4, characterized in that the occlusion-guided constraint is:
L_rec^l = (1/N) Σ_p (Ml(p) + γ) · ( α · (1 − SSIM(Il(p), Ĩl(p))) / 2 + (1 − α) · | Il(p) − Ĩl(p) | )
L_rec^r = (1/N) Σ_p (Mr(p) + γ) · ( α · (1 − SSIM(Ir(p), Ĩr(p))) / 2 + (1 − α) · | Ir(p) − Ĩr(p) | )
where · denotes the dot product, p denotes the pixel index, N denotes the total number of pixels, γ denotes a bias, SSIM(Il(p), Ĩl(p)) is the structural similarity at pixel p between the real left view and the synthesized left view, Il(p) and Ĩl(p) are the values of pixel p in the real and synthesized left views, Ml(p) and Mr(p) are the values of pixel p in the left and right occlusion masks, SSIM(Ir(p), Ĩr(p)) is the structural similarity at pixel p between the real right view and the synthesized right view, and Ir(p) and Ĩr(p) are the values of pixel p in the real and synthesized right views;
the loss function used to train the whole network is:
L = L_rec^l + λM · L_rec^r + λcon · L_con + λes · L_es
where λM, λcon and λes denote the weights of the different loss terms.
CN202011562061.9A | 2020-12-25 (priority) | 2020-12-25 (filed) | Self-supervision monocular depth estimation method based on deep learning | Active | CN112561979B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011562061.9A (CN112561979B, en) | 2020-12-25 | 2020-12-25 | Self-supervision monocular depth estimation method based on deep learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202011562061.9A (CN112561979B, en) | 2020-12-25 | 2020-12-25 | Self-supervision monocular depth estimation method based on deep learning

Publications (2)

Publication Number | Publication Date
CN112561979A | 2021-03-26
CN112561979B | 2022-06-28

Family

ID=75032828

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202011562061.9A | Active | CN112561979B (en) | 2020-12-25 | 2020-12-25 | Self-supervision monocular depth estimation method based on deep learning

Country Status (1)

Country | Link
CN | CN112561979B (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN110163246A (en)* | 2019-04-08 | 2019-08-23 | Hangzhou Dianzi University | The unsupervised depth estimation method of monocular light field image based on convolutional neural networks
US20200351489A1* | 2019-05-02 | 2020-11-05 | Niantic, Inc. | Self-supervised training of a depth estimation model using depth hints
US20200364876A1* | 2019-05-17 | 2020-11-19 | Magic Leap, Inc. | Methods and apparatuses for corner detection using neural network and corner detector
CN110490919A (en)* | 2019-07-05 | 2019-11-22 | Tianjin University | A kind of depth estimation method of the monocular vision based on deep neural network
CN111445476A (en)* | 2020-02-27 | 2020-07-24 | Shanghai Jiao Tong University | Monocular depth estimation method based on multimodal unsupervised image content decoupling
CN111508013A (en)* | 2020-04-21 | 2020-08-07 | University of Science and Technology of China | Stereo matching method
CN111899295A (en)* | 2020-06-06 | 2020-11-06 | Southeast University | Monocular scene depth prediction method based on deep learning
CN111696148A (en)* | 2020-06-17 | 2020-09-22 | University of Science and Technology of China | End-to-end stereo matching method based on convolutional neural network
CN111899280A (en)* | 2020-07-13 | 2020-11-06 | Harbin Engineering University | Monocular vision odometer method adopting deep learning and mixed pose estimation

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
ANDREA PILZER et al.: "Progressive fusion for unsupervised binocular depth estimation using cycled networks", ATEX, 17 September 2019 *
HYESEUNG PARK et al.: "Relativistic approach for training self-supervised adversarial depth prediction model using symmetric consistency", Digital Object Identifier, 25 November 2020 *
WEI CHEN et al.: "A unified framework for depth prediction from a single image and binocular stereo matching", Remote Sensing, 10 February 2020 *
YANLING TIAN et al.: "Multi-scale dilated convolution network based depth estimation in intelligent transportation systems", IEEE, 31 December 2019 *
ZHOU Yuncheng et al.: "Depth estimation method for tomato plant images based on self-supervised learning", Transactions of the Chinese Society of Agricultural Engineering, no. 24, 23 December 2019 *
LIANG Zhengfa: "Research on key technologies of visual perception enhancement", China Doctoral Dissertations Full-text Database, Information Science and Technology, 15 February 2020 *
XIONG Wei et al.: "Monocular visual odometry based on the deep learning feature point method", Computer Engineering and Science, 15 January 2020 *
MA Chengqi et al.: "Anti-occlusion monocular depth estimation algorithm", Computer Engineering and Applications, 12 May 2020 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN113221744A (en)* | 2021-05-12 | 2021-08-06 | Tianjin University | Monocular image 3D object detection method based on deep learning
CN113221744B (en)* | 2021-05-12 | 2022-10-04 | Tianjin University | A 3D object detection method for monocular images based on deep learning
CN113192149A (en)* | 2021-05-20 | 2021-07-30 | Xi'an Jiaotong University | Image depth information monocular estimation method, device and readable storage medium
CN113192149B (en)* | 2021-05-20 | 2024-05-10 | Xi'an Jiaotong University | Image depth information monocular estimation method, apparatus and readable storage medium
US12307694B2 | 2022-04-07 | 2025-05-20 | Toyota Research Institute, Inc. | Self-supervised monocular depth estimation via rigid-motion embeddings
CN115294199A (en)* | 2022-07-15 | 2022-11-04 | Dalian Ocean University | Underwater image enhancement and depth estimation method, device and storage medium

Also Published As

Publication Number | Publication Date
CN112561979B (en) | 2022-06-28

Similar Documents

Publication | Title
CN112561979B (en) | Self-supervision monocular depth estimation method based on deep learning
CN110490919B (en) | Monocular vision depth estimation method based on deep neural network
CN110782490A (en) | A video depth map estimation method and device with spatiotemporal consistency
CN111028281B (en) | Depth information calculation method and device based on light field binocular system
Ye et al. | DRM-SLAM: Towards dense reconstruction of monocular SLAM with scene depth fusion
CN106875437B (en) | RGBD three-dimensional reconstruction-oriented key frame extraction method
AU2017324923A1 (en) | Predicting depth from image data using a statistical model
CN114359509B (en) | Multi-view natural scene reconstruction method based on deep learning
CN110610486B (en) | Monocular image depth estimation method and device
CN114170286B (en) | Monocular depth estimation method based on unsupervised deep learning
CN115035171A (en) | Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
CN103702103B (en) | Based on the grating stereo printing images synthetic method of binocular camera
CN115511759B (en) | Point cloud image depth completion method based on cascading feature interaction
Duan et al. | RGB-Fusion: Monocular 3D reconstruction with learned depth prediction
CN114266900B (en) | Monocular 3D target detection method based on dynamic convolution
CN114219900B (en) | Three-dimensional scene reconstruction method, reconstruction system and application based on mixed reality glasses
CN111354030A (en) | Method for generating unsupervised monocular image depth map embedded into SENET unit
Basak et al. | Monocular depth estimation using encoder-decoder architecture and transfer learning from single RGB image
CN117765187A (en) | Monocular saphenous nerve mapping method based on multi-modal depth estimation guidance
CN114923491A (en) | A three-dimensional multi-target online tracking method based on feature fusion and distance fusion
Li et al. | Scale-aware monocular SLAM based on convolutional neural network
CN110428461B (en) | Monocular SLAM method and device combined with deep learning
CN116597135A (en) | RGB-D multimodal semantic segmentation method
CN116051832A (en) | Three-dimensional labeling method and device for vehicle
Ren et al. | Layer-wise feature refinement for accurate three-dimensional lane detection with enhanced bird's eye view transformation

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
