

Technical Field
The present invention relates to the fields of computer vision and depth estimation, and in particular to a self-supervised monocular depth estimation method based on deep learning.
Background
As one of the fundamental tasks of computer vision, depth perception is widely used in autonomous driving, augmented reality, robot navigation, 3D reconstruction, and other fields. Although active sensors (e.g., LiDAR, structured light, and time-of-flight) have been widely used to acquire scene depth directly, active sensing devices are usually bulky, expensive, and power-hungry. In contrast, methods that predict depth from RGB (color) images are inexpensive and easy to deploy. Among existing image-based depth estimation methods, monocular depth estimation does not rely on multiple acquisitions of the perceived environment and has therefore received extensive attention from researchers.
In recent years, monocular depth estimation methods based on deep learning have made significant progress. Among them, supervised methods usually require large datasets with ground-truth depth annotations to train the depth estimation model. In practice, producing high-quality pixel-level annotations for a large number of images is challenging, which greatly limits the applicability of supervised monocular depth estimation. Unlike methods that use depth labels directly as supervision, self-supervised methods exploit monocular videos or stereo image pairs to provide indirect supervision signals for network training. Studying self-supervised monocular depth estimation methods that require no depth annotation is therefore of great significance and application value.
A basic technique in self-supervised monocular depth estimation is to predict a disparity map from the source view with a monocular depth estimation model, synthesize the target view from the predicted disparity map and the source view, and constrain the training of the depth estimation model with the reconstruction error between the synthesized target view and the real target view. Finally, a depth map can be computed from the predicted disparity map using the camera parameters. However, existing methods usually focus only on using the synthesized target view to construct the supervision signal, and do not fully explore and exploit the geometric correlation between the source view and the synthesized target view. In addition, because occlusions exist between the source and target views, directly minimizing the appearance difference between the synthesized and real target views during disparity learning leads to inaccurate disparity predictions near occluded regions. In self-supervised monocular depth estimation it is therefore crucial to study how to fully explore and exploit the geometric correlation between the source view and the synthesized target view, and how to handle the occlusions between the source and target views.
Summary of the Invention
Current self-supervised monocular depth estimation methods usually focus only on using the synthesized target view to construct the supervision signal, do not fully exploit the geometric correlation between the source view and the synthesized target view, and neither analyze nor handle the occlusions between the source and target views. To address these problems, the present invention proposes a self-supervised monocular depth estimation method based on deep learning, which generates auxiliary visual cues by exploring the correlation between the source view and the synthesized target view, and uses the generated visual cues to infer occluded regions and build occlusion-guided constraints, thereby improving the performance of self-supervised monocular depth estimation, as described below:
A self-supervised monocular depth estimation method based on deep learning, the method comprising:
1) extracting pyramid features of the original right view and of the synthesized left view, applying a horizontal correlation operation to the pyramid features to obtain multi-scale correlation features Fc, and obtaining refined multi-scale correlation features Fm;
2) feeding Fm into the visual cue prediction network of the binocular cue prediction module to generate auxiliary visual cues Dr, reconstructing the right view from the synthesized left view, and optimizing the binocular cue prediction module with the image reconstruction loss between the reconstructed right view and the real right view Ir;
3) using the visual cues Dr generated by the binocular cue prediction module to constrain the disparity map Dl predicted by the monocular depth estimation network, and enforcing the agreement between the two with a consistency loss;
4) constructing an occlusion-guided constraint that assigns different weights to the reconstruction errors of occluded-region pixels and non-occluded-region pixels.
Wherein the multi-scale correlation features Fc are obtained as:
Fc = Fr(x, y) · Fl(x + d, y)
where Fr(x, y) and Fl(x, y) denote the values of the feature maps Fr and Fl at position (x, y), respectively, · denotes the dot product, and d denotes a candidate disparity value.
Wherein the refined multi-scale correlation features Fm are obtained as:
Fm = Concat[Fc, Conv(Fr)]
where Conv(·) denotes a convolution operation and Concat[·,·] denotes concatenation at the same scale.
Further, enforcing the agreement between the two with the consistency loss is specifically:
where w(·) denotes a warping operation used to align Dr and Dl pixel by pixel.
Wherein the occlusion-guided constraint is specifically:
where · denotes the dot product, p denotes the pixel index, N denotes the total number of pixels, γ denotes a bias, SSIM(Il(p), ·) denotes the structural similarity at pixel p between the real left view and the synthesized left view, Il(p) denotes the value of pixel p in the real left view, Ml(p) and Mr(p) denote the values of pixel p in the left and right occlusion masks, respectively, SSIM(Ir(p), ·) denotes the structural similarity at pixel p between the real right view and the synthesized right view, and Ir(p) denotes the value of pixel p in the real right view;
The loss function finally used to train the entire network is expressed as follows:
where λM, λcon and λes denote the weights of the different loss terms.
The technical solution provided by the present invention has the following beneficial effects:
1. The present invention proposes a binocular cue prediction module that generates auxiliary visual cues by exploring the correlation between the source view and the synthesized view, thereby enabling self-supervised monocular depth estimation;
2. The present invention proposes an occlusion-guided constraint that provides correct guidance for the supervision of the depth estimation network and improves the accuracy of depth estimation near occluded regions.
Description of the Drawings
FIG. 1 is a flowchart of a self-supervised monocular depth estimation method based on deep learning;
FIG. 2 is a schematic diagram of the comparison between the method of the present invention and other methods.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below.
An embodiment of the present invention provides a self-supervised monocular depth estimation method based on deep learning. Referring to FIG. 1, the method includes the following steps:
1. Constructing the monocular depth estimation network
For the original right view Ir, a monocular depth estimation network is used to learn the right-to-left disparity map Dl from Ir. The monocular depth estimation network adopts an encoder-decoder structure with skip connections: the encoder uses ResNet50 to extract features from the right view, and the decoder consists of successive deconvolutions and skip connections that gradually restore the feature maps to the resolution of the input image. After the disparity map Dl is obtained from the monocular depth estimation network, the left view is synthesized from the right view Ir and the disparity map Dl, and the image reconstruction loss between the synthesized left view and the real left view Il is used as the objective function to optimize the monocular depth estimation network.
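The view synthesis step above can be illustrated with a short differentiable-warping sketch. This is a minimal PyTorch-style example, not the patented implementation: it assumes rectified stereo images, disparities expressed as a fraction of the image width, and a sign convention that depends on the camera setup; the helper name warp_with_disparity is hypothetical and is reused by the later sketches.

```python
# Hedged sketch: synthesize one view by horizontally warping the other view
# with a predicted disparity map (bilinear sampling via grid_sample).
import torch
import torch.nn.functional as F

def warp_with_disparity(src, disp):
    """src: (B, C, H, W) view to be warped; disp: (B, 1, H, W) horizontal
    disparity expressed as a fraction of the image width (assumption)."""
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=src.device),
        torch.linspace(-1.0, 1.0, w, device=src.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    offset = torch.zeros_like(grid)
    offset[..., 0] = 2.0 * disp.squeeze(1)  # shift only the horizontal coordinate
    return F.grid_sample(src, grid + offset, mode="bilinear",
                         padding_mode="border", align_corners=True)

# e.g. synthesized_left = warp_with_disparity(right_view, disp_l)
```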
2. Constructing the binocular cue prediction module
For the original right view Ir and the synthesized left view, the binocular cue prediction module is used to learn their geometric correspondence and to generate auxiliary visual cues. First, the pyramid features Fr and Fl of the original right view and of the synthesized left view are extracted, respectively.
To learn the geometric correspondence between the two views, a horizontal correlation operation is applied to the pyramid features Fr and Fl of the same scale to obtain the multi-scale correlation features Fc:
Fc = Fr(x, y) · Fl(x + d, y)    (1)
where Fr(x, y) and Fl(x, y) denote the values of the feature maps Fr and Fl at position (x, y), respectively, · denotes the dot product, and d denotes a candidate disparity value. The number of channels of Fc equals the number of candidate disparity values.
The above horizontal correlation operation encodes the geometric correspondence between the original right view and the synthesized left view at different scales.
To predict a more accurate disparity map in the binocular cue prediction module, the detailed information of the right feature Fr is retained to further refine the multi-scale correlation features Fc:
Fm = Concat[Fc, Conv(Fr)]    (2)
where Fm denotes the refined multi-scale correlation features, Conv(·) denotes a convolution operation, and Concat[·,·] denotes concatenation at the same scale.
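As a concrete illustration of Equations (1) and (2) at a single pyramid scale, the sketch below computes the horizontal correlation volume and concatenates it with convolved right-view features. The maximum disparity, kernel size, and number of refinement channels are assumptions, not values fixed by the present description.

```python
# Hedged sketch of the horizontal correlation (Eq. (1)) and feature
# refinement (Eq. (2)) for one pyramid scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrelationRefine(nn.Module):
    def __init__(self, in_channels, max_disp=24, refine_channels=32):
        super().__init__()
        self.max_disp = max_disp
        self.conv_r = nn.Conv2d(in_channels, refine_channels, 3, padding=1)

    def forward(self, feat_r, feat_l):
        """feat_r, feat_l: (B, C, H, W) features of the original right view and
        of the synthesized left view at the same scale."""
        corr = []
        for d in range(self.max_disp):
            # F_l(x + d, y): shift the left features d pixels to the left.
            shifted = F.pad(feat_l, (0, d))[:, :, :, d:]
            # Dot product over the channel dimension -> one correlation channel per d.
            corr.append((feat_r * shifted).sum(dim=1, keepdim=True))
        fc = torch.cat(corr, dim=1)                       # Eq. (1): (B, max_disp, H, W)
        fm = torch.cat([fc, self.conv_r(feat_r)], dim=1)  # Eq. (2): refined features
        return fm
```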
Fm is then fed into the visual cue prediction network of the binocular cue prediction module to generate the auxiliary visual cues Dr, which represent the horizontal offsets of corresponding pixels between the original right view and the synthesized left view. According to the visual cues Dr, the right view is reconstructed from the synthesized left view, and the image reconstruction loss between the reconstructed right view and the real right view Ir is used to optimize the binocular cue prediction module.
The visual cue prediction network is an encoder-decoder network composed of an encoder containing 13 residual blocks and a decoder containing 6 deconvolution blocks; the residual blocks are those used in ResNet50, and each deconvolution block consists of a convolutional layer and an upsampling layer.
To let the binocular cue prediction module assist the monocular depth estimation network, the Dr generated by the binocular cue prediction module is used to constrain the disparity map Dl predicted by the monocular depth estimation network. Since Dr and Dl characterize the geometric correspondences from the left view to the right view and from the right view to the left view, respectively, the embodiment of the present invention uses a consistency loss to enforce the agreement between Dr and Dl. The consistency loss is expressed as follows:
where w(·) denotes the warping operation used to align Dr and Dl pixel by pixel, so that the consistency between Dr and Dl can be measured directly with an L1 loss.
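A minimal sketch of this consistency term is given below, assuming the warping operation w(·) is realized with the warp_with_disparity helper sketched above and that the L1 distance is averaged over all pixels.

```python
# Hedged sketch: L1 consistency between D_l and D_r after pixel-wise alignment.
import torch

def consistency_loss(disp_l, disp_r):
    """disp_l: disparity from the monocular network; disp_r: visual cue from
    the binocular cue prediction module; both (B, 1, H, W)."""
    aligned_r = warp_with_disparity(disp_r, disp_l)   # w(D_r, D_l)
    return torch.mean(torch.abs(disp_l - aligned_r))  # mean L1 difference
```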
In addition, to improve the local smoothness of the disparity maps, an edge-aware smoothness loss is used to regularize Dr and Dl. The edge-aware smoothness loss is expressed as follows:
where the two operators denote the first-order differential operators in the horizontal and vertical directions, respectively.
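The exact smoothness formula is not reproduced in the excerpt above, so the sketch below uses the commonly adopted image-gradient-weighted form as an assumption: first-order disparity gradients are penalized less where the corresponding image has strong edges.

```python
# Hedged sketch of an edge-aware smoothness term for a disparity map.
import torch

def edge_aware_smoothness(disp, img):
    """disp: (B, 1, H, W) disparity; img: (B, 3, H, W) associated color view."""
    dx_d = torch.abs(disp[:, :, :, 1:] - disp[:, :, :, :-1])  # horizontal first-order difference
    dy_d = torch.abs(disp[:, :, 1:, :] - disp[:, :, :-1, :])  # vertical first-order difference
    dx_i = torch.mean(torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1]), dim=1, keepdim=True)
    dy_i = torch.mean(torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]), dim=1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
```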
3. Constructing the occlusion-guided constraint
To address the occlusion problem, the embodiment of the present invention constructs an occlusion-guided constraint that assigns different weights to the reconstruction errors of occluded-region pixels and non-occluded-region pixels, thereby providing correct guidance for the supervision of disparity estimation.
First, Dr and Dl are used to identify the pixels in the left and right views that belong to occluded regions. Since the disparity values in Dr and Dl should be consistent everywhere except in occluded regions, the difference maps Diffl and Diffr are computed as follows:
Diffl = |Dl - w(Dl, Dr)|    (5)
Diffr = |Dr - w(Dr, Dl)|    (6)
where w(·) denotes the warping operation and |·| denotes the absolute value.
The values in the difference maps are much larger in occluded regions than in non-occluded regions, so the binary occlusion masks Ml and Mr are obtained by detecting the outliers in the difference maps Diffl and Diffr, as follows:
where W and H denote the width and height of the difference maps, [·] denotes the Iverson bracket, which takes the value 1 when the condition inside the bracket holds and 0 otherwise, and λ denotes a balance constant. In the occlusion masks Ml and Mr, positions with value 0 correspond to pixels in occluded regions, while positions with value 1 correspond to pixels in non-occluded regions.
Specifically, the left occlusion mask Ml marks pixels that are visible in the left view but not in the right view, and the right occlusion mask Mr marks pixels that are visible in the right view but not in the left view.
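The sketch below illustrates one way to realize the difference maps of Equations (5)-(6) and the outlier test. Because the exact thresholding formula is not reproduced in the excerpt, the test used here (a pixel is marked occluded when its difference exceeds λ times the per-image mean difference) and the value of λ are assumptions; it again reuses the warp_with_disparity helper sketched earlier.

```python
# Hedged sketch: difference maps (Eqs. (5)-(6)) and binary occlusion masks.
import torch

def occlusion_masks(disp_l, disp_r, lam=1.5):
    diff_l = torch.abs(disp_l - warp_with_disparity(disp_l, disp_r))  # Eq. (5)
    diff_r = torch.abs(disp_r - warp_with_disparity(disp_r, disp_l))  # Eq. (6)
    # 1 = non-occluded pixel, 0 = occluded pixel (outlier in the difference map).
    mask_l = (diff_l < lam * diff_l.mean(dim=(2, 3), keepdim=True)).float()
    mask_r = (diff_r < lam * diff_r.mean(dim=(2, 3), keepdim=True)).float()
    return mask_l, mask_r
```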
Using the occlusion masks Ml and Mr, the embodiment of the present invention applies the occlusion-guided constraint to provide more accurate guidance for the supervision of disparity estimation, as follows:
where · denotes the dot product, p denotes the pixel index, N denotes the total number of pixels, γ denotes a bias, SSIM(Il(p), ·) denotes the structural similarity at pixel p between the real left view and the synthesized left view, Il(p) denotes the value of pixel p in the real left view, Ml(p) and Mr(p) denote the values of pixel p in the left and right occlusion masks, respectively, SSIM(Ir(p), ·) denotes the structural similarity at pixel p between the real right view and the synthesized right view, Ir(p) denotes the value of pixel p in the real right view, and α = 0.85.
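Since Equations (9)-(10) themselves are not reproduced in the excerpt, the sketch below shows one plausible reading of the occlusion-guided term: the standard SSIM + L1 photometric error (α = 0.85) is weighted by the occlusion mask, with occluded pixels contributing only through a small bias γ. The ssim_map argument stands for any per-pixel SSIM implementation and is a placeholder, not a specific library call.

```python
# Hedged sketch of an occlusion-guided photometric loss for one view pair.
import torch

def occlusion_guided_loss(real, synth, mask, ssim_map, alpha=0.85, gamma=0.1):
    """real, synth: (B, 3, H, W) views; mask: (B, 1, H, W), 1 = non-occluded."""
    photometric = alpha * 0.5 * (1.0 - ssim_map(real, synth)) \
                  + (1.0 - alpha) * torch.abs(real - synth).mean(dim=1, keepdim=True)
    weight = mask + gamma * (1.0 - mask)   # occluded pixels get only a small weight
    return (weight * photometric).mean()
```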
The loss function finally used to train the entire network is expressed as follows:
where λM, λcon and λes denote the weights of the different loss terms.
4. Training the self-supervised monocular depth estimation network based on deep learning
In this training process, the deep-learning-based self-supervised monocular depth estimation network comprises the monocular depth estimation network, the binocular cue prediction module, and the occlusion-guided constraints (Equations (9) and (10)), and training is divided into four stages.
In the first stage, the monocular depth estimation network is trained with the left-view image reconstruction loss and the edge-aware smoothness loss. In the second stage, the weights of the monocular depth estimation network are frozen, and the binocular cue prediction module is trained with the right-view image reconstruction loss and the edge-aware smoothness loss. In the third stage, the monocular depth estimation network and the binocular cue prediction module are optimized jointly. Finally, in the fourth stage, the occlusion-guided constraints are embedded into the entire network and the whole network is trained jointly, with the loss weights {λM, λcon, λes} set to {1.0, 1.0, 0.1}, respectively.
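The four-stage schedule can be organized as below. The helper, the network and loss names, the optimizer, learning rate, and epoch counts are all assumed placeholders; only the division into four stages and the weights {1.0, 1.0, 0.1} come from the description above.

```python
# Hedged sketch of the staged training procedure.
import torch

LAMBDA_M, LAMBDA_CON, LAMBDA_ES = 1.0, 1.0, 0.1  # weights from the description above

def train_stage(nets, loss_fns, loader, epochs, lr=1e-4):
    """Optimize only the networks in `nets` with the sum of the callables in
    `loss_fns`, each mapping a batch to a scalar loss."""
    params = [p for net in nets for p in net.parameters()]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            loss = sum(fn(batch) for fn in loss_fns)
            loss.backward()
            opt.step()

# Stage 1: monocular depth network, left-view reconstruction + smoothness losses.
# train_stage([mono_net], [left_recon_loss, smooth_loss], loader, epochs=20)
# Stage 2: freeze mono_net; train the binocular cue prediction module.
# train_stage([cue_net], [right_recon_loss, smooth_loss], loader, epochs=20)
# Stage 3: joint optimization of both networks.
# train_stage([mono_net, cue_net],
#             [left_recon_loss, right_recon_loss, smooth_loss], loader, epochs=10)
# Stage 4: add the occlusion-guided and consistency terms with the weights above.
# train_stage([mono_net, cue_net],
#             [lambda b: LAMBDA_M * occ_guided_loss(b),
#              lambda b: LAMBDA_CON * consist_loss(b),
#              lambda b: LAMBDA_ES * smooth_loss(b)], loader, epochs=10)
```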
FIG. 2 shows the root-mean-square errors of the predicted depth maps compared with two self-supervised monocular depth estimation algorithms, 3Net and Monodepth2; the smaller the root-mean-square error, the more accurate the predicted depth map. As shown in the figure, both 3Net and Monodepth2 yield larger root-mean-square errors, because both methods construct the supervision signal only from the synthesized target view, do not further exploit the geometric correlation between the source view and the synthesized target view, and do not handle the occlusions between the source and target views. As can be seen from FIG. 2, by exploring the correlation between the source view and the synthesized target view to generate auxiliary visual cues and by constructing occlusion-guided constraints, the method of the present invention obtains more accurate depth maps.
In the embodiments of the present invention, the models of the devices are not limited unless otherwise specified, as long as the devices can perform the functions described above.
Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred embodiment, and that the above serial numbers of the embodiments of the present invention are for description only and do not indicate the superiority or inferiority of the embodiments.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.