CN115272435A - A light field depth estimation method based on light field sequence feature analysis - Google Patents

A light field depth estimation method based on light field sequence feature analysis

Info

Publication number
CN115272435A
CN115272435A
Authority
CN
China
Prior art keywords
light field
epi
image
network
depth estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210721840.1A
Other languages
Chinese (zh)
Other versions
CN115272435B (en)
Inventor
韩磊
杨庆
焦良葆
路绳方
郑胜男
施展
俞翔
黄晓华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Institute of Technology
Original Assignee
Nanjing Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Institute of Technology
Priority to CN202210721840.1A
Publication of CN115272435A
Application granted
Publication of CN115272435B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a light field depth estimation method based on light field sequence feature analysis, which comprises: extracting the central sub-aperture image from 4D light field data expressed with the two-plane parameterization, and computing an EPI composite image; designing an LFRNN network that takes the central sub-aperture image and the EPI composite image as input and outputs a disparity map, the network comprising a local depth estimation module based on light field sequence analysis and a global depth optimization module based on a conditional random field model; training and evaluating the LFRNN network in two stages, local depth estimation and global optimization; and testing and applying the LFRNN network to evaluate its performance. The invention takes a new approach by analyzing the light field from the perspective of sequence data and designs a depth feature extraction sub-network based on a recurrent neural network, which markedly improves local depth estimation; it also models global depth information and designs an end-to-end optimization network, which markedly improves the accuracy and robustness of depth estimation.

Description

Translated from Chinese
A Light Field Depth Estimation Method Based on Light Field Sequence Feature Analysis

Technical Field

The invention belongs to the technical field of computer vision and artificial intelligence, and in particular relates to a light field depth estimation method based on light field sequence feature analysis.

Background Art

Microlens light field cameras have entered the consumer electronics market and have considerable industrial and academic value. They provide a new way to address the depth estimation problem. On the one hand, light field imaging records not only the position of each ray but also its direction, providing a geometric basis for depth estimation; on the other hand, a microlens light field camera offers monocular multi-view image acquisition, which simplifies the deployment of vision systems and lays a physical foundation for extending depth estimation applications.

Depth estimation based on microlens light field cameras has become a research hotspot over the past decade and can be roughly divided into two categories: traditional depth estimation and learning-based depth estimation. Traditional methods mainly extract features with structural operators and invert depth information from the light field imaging geometry. For example, Tao et al. apply variance and mean operators to sheared EPI regions and fuse the defocus and correspondence depth cues to estimate scene depth. Wanner et al. use the 2D structure tensor to estimate the slopes of lines in EPI images and thereby obtain depth. Zhang et al. introduce a parallelogram operator to locate lines in EPI images for depth estimation. The Chinese invention patent "A light-field-based depth estimation" (ZL201510040975.1) also uses the structure tensor for initial depth estimation. Traditional methods based on structural operators are highly interpretable, but the descriptive power of such operators is limited, so the accuracy of depth estimation faces a bottleneck.

In recent years, with the rise of deep learning, learning-based light field depth estimation methods have gained favor. Heber et al. first proposed a deep-learning-based light field depth estimation method, using a convolutional neural network to extract features from EPI patches and regressing the corresponding depth values. Shin et al. apply convolutions to input streams of EPI images along multiple directions and fuse them to obtain depth values. Han et al. propose generating EPI composite images and then estimating scene depth with multi-stream convolution and skip-layer fusion. Tsai et al. introduce an attention mechanism to select more effective sub-aperture images, then extract features with convolutions and obtain scene depth. The Chinese invention patent "A light field image depth estimation method based on a hybrid convolutional neural network" (ZL201711337965.X) discloses a method that uses convolutional neural networks to extract information from horizontal and vertical EPI patches and then performs regression and fusion to obtain a depth map. The Chinese invention patent application "A light field depth estimation method based on multimodal information" (Publication No. CN 112767466 A) uses convolution and atrous convolution to analyze the focal stack and the central view and then predicts scene depth.

Theoretical modeling of depth estimation, light field data extraction methods, and neural network design all affect depth estimation performance. At present, learning-based methods have become the mainstream of light field depth estimation and have made great progress; however, the accuracy of depth estimation and its robustness to occlusion and noise still need to be improved, and technical aspects such as light field data extraction and neural network data processing are in urgent need of innovation. To this end, the invention discloses a light field image processing method based on vector sequence analysis and designs an end-to-end depth estimation network that integrates local depth estimation and global optimization. Using this network for light field depth estimation markedly improves accuracy and provides good support for applications such as 3D reconstruction and 3D defect detection.

Summary of the Invention

Purpose of the invention: the invention provides a light field depth estimation method based on light field sequence feature analysis, which can produce highly accurate depth results and support applications such as light field 3D reconstruction and defect detection.

Technical solution: the light field depth estimation method based on light field sequence feature analysis according to the invention specifically comprises the following steps:

(1) Extract the central sub-aperture image from the 4D light field data, where (iC, jC) denotes the viewing-angle coordinates of the central sub-aperture image;

(2) Compute the EPI composite image ISEPI from the 4D light field data;

(3) Construct the light field neural network model LFRNN, which receives the EPI composite image ISEPI and the central sub-aperture image as input and outputs a disparity map D with the same resolution as the central sub-aperture image; the LFRNN model comprises a local depth estimation module based on light field sequence analysis and a depth optimization module based on a conditional random field model;

(4) Train the light field neural network model LFRNN constructed in step (3) to obtain the optimal network parameter set P: training is carried out in two stages, both using the mean absolute error as the loss function; the first stage trains only the local depth estimation module based on light field sequence analysis to obtain its optimal parameter set P1; the second stage freezes P1, trains the whole network, and updates the parameters of the depth optimization module based on the conditional random field model, yielding the optimal parameter set P of the LFRNN network.

Further, step (1) is implemented as follows:

The 4D light field data is the decoded representation of a light field image captured by a light field camera, written as L: (i, j, k, l) → L(i, j, k, l), where (i, j) are the pixel index coordinates within a microlens image, also called the viewing-angle coordinates, (k, l) are the index coordinates of the microlens centers, i, j, k, l are all integers, and L(i, j, k, l) is the radiance of the ray passing through position (k, l) under viewing angle (i, j). The central pixel of each microlens image is extracted and these pixels are arranged by microlens position index to obtain a two-dimensional image, i.e. the central sub-aperture image, whose value at (k, l) is L(iC, jC, k, l), where (iC, jC) denotes the viewing-angle coordinates of the central sub-aperture image.

Further, step (2) is implemented as follows:

(21) According to the dimensions of the input 4D light field, initialize ISEPI as an all-zero matrix: in the 4D light field L: (i, j, k, l) → L(i, j, k, l), the angular resolution is NAi×NAj, i.e. i∈[0, NAi), j∈[0, NAj), and the spatial resolution is NSk×NSl, i.e. k∈[0, NSk), l∈[0, NSl); ISEPI is then a (NSk×NAj)×NSl two-dimensional matrix, initialized to all zeros (a concrete dimension check is given after step (23) below);

(22) For each row of the third dimension k of the 4D light field, with row index k*, compute its corresponding EPI image E(k*) and use E(k*) to update part of ISEPI:

the process of computing the EPI image corresponding to the k*-th row of the third dimension from the 4D light field data can be viewed as a mapping E(k*): (j, l) → L(iC, j, k*, l), i.e. the two-dimensional slice obtained by fixing the first and third dimensions of the 4D light field and varying the other two, with i = iC and k = k*;

the resulting E(k*) is used to update a region of ISEPI, i.e. ISEPI((k*-1)×NAj : k*×NAj, 0 : NSl) = E(k*); here, ISEPI((k*-1)×NAj : k*×NAj, 0 : NSl) denotes the block of ISEPI from row (k*-1)×NAj to row k*×NAj-1 and from column 0 to column NSl-1;

(23) Perform the operation of step (22) on every row of the third dimension of the 4D light field to compute the EPI composite image ISEPI.
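For concreteness, the following small dimension check (Python) illustrates the shape bookkeeping above; the 9×9 angular and 512×512 spatial resolutions are assumed example values (typical of HCI benchmark scenes) and are not prescribed by the method:

```python
# Illustrative dimension check for the EPI composite image ISEPI (assumed example sizes).
NAi, NAj = 9, 9        # angular resolution
NSk, NSl = 512, 512    # spatial resolution

ISEPI_shape = (NSk * NAj, NSl)   # each row k* contributes an NAj x NSl block
print(ISEPI_shape)               # -> (4608, 512)

# Row block occupied by the EPI image of row k* (0-based indexing in this sketch):
k_star = 10
rows = (k_star * NAj, (k_star + 1) * NAj)   # half-open range: rows 90..98 of ISEPI
print(rows)                                 # -> (90, 99)
```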

Further, the local depth estimation module based on light field sequence analysis in step (3) comprises a sliding-window processing layer, a sequence feature extraction sub-network, and a feature map reshaping layer;

The sliding-window processing layer slides over the EPI composite image ISEPI to extract EPI patches IEPI-p, which are fed to the sequence feature extraction sub-network; the window size is (NAj, 16), the horizontal stride is 1, the vertical stride is NAj, and positions where the window extends beyond ISEPI are zero-padded;

The sequence feature extraction sub-network is a recurrent neural network that extracts the sequence features of an EPI patch IEPI-p and comprises serialization splitting, a bidirectional GRU layer, and a fully connected network. The serialization splitting is based on the observation that the depth-encoding lines on an EPI image are distributed across multiple columns of pixels: each column of pixels of an NAj×16 EPI patch IEPI-p is treated as a column vector Gy = (IEPI-p(0, y), IEPI-p(1, y), …, IEPI-p(NAj-1, y))ᵀ, where x and y denote the row and column coordinates of pixels on IEPI-p and IEPI-p(x, y) is the gray value of the pixel at (x, y); an NAj×16 EPI patch IEPI-p can thus be serialized into 16 column vectors Gy, with 0 ≤ y ≤ 15 and y an integer. The vectors Gy are fed in turn as the input to the subsequent bidirectional GRU layer at each time step. The bidirectional GRU layer consists of GRU units in two directions; each direction has dimension 256, and each GRU unit is set to a non-sequence output mode, receiving vector inputs at 16 time steps and producing 1 output, so the bidirectional GRU layer produces 512 outputs in total. The fully connected network contains two fully connected layers: the first receives the 512 outputs of the bidirectional GRU layer and produces 16 outputs, and is configured with a ReLU activation function; the second receives the 16 outputs of the previous layer and outputs 1 disparity value, with no activation function;

The feature map reshaping layer reshapes the sequence of (NSk×NSl) disparity values into an NSk×NSl matrix, called the feature map and denoted U.

Further, the depth optimization module based on the conditional random field model in step (3) comprises two parts: central sub-aperture image kernel parameter extraction and feature map iterative optimization; the kernel parameter extraction part computes filter kernel parameters from the input central sub-aperture image; the feature map iterative optimization part, taking the conditional random field as its theoretical basis, iteratively optimizes the feature map using the filter kernel parameters obtained by the kernel parameter extraction part, yielding the disparity map D;

The kernel parameter extraction part takes the central sub-aperture image as input and computes a space-and-color convolution kernel F1 and a spatial convolution kernel F2:

F1(i, j) = exp(-‖pi - pj‖²/(2θα²) - ‖ci - cj‖²/(2θβ²))

F2(i, j) = exp(-‖pi - pj‖²/(2θγ²))

where pi and pj denote the position information of the i-th and j-th pixels of the central sub-aperture image, ci and cj denote their color information, and θα, θβ, θγ are user-defined bandwidth radii;

The feature map iterative optimization part comprises four modules: parallel filtering, unary term addition, normalization factor calculation, and normalization. The parallel filtering module filters the current iteration input μt-1 through two paths: the first path filters μt-1 with the kernel F1 (denoted F1 ⊗ μt-1) and multiplies the filtered result by the weight parameter θ1, giving θ1(F1 ⊗ μt-1); the second path filters μt-1 with the kernel F2 and multiplies the filtered result by the weight parameter θ2, giving θ2(F2 ⊗ μt-1). In the first iteration, μt-1 is initialized to the feature map U; θ1 and θ2 are randomly initialized and updated through network training. The results of the two paths are added element-wise to give the output of the parallel filtering module, Q = θ1(F1 ⊗ μt-1) + θ2(F2 ⊗ μt-1). The unary term addition module superimposes the feature map U on the result Q of the parallel filtering module, giving V = U + Q. The normalization factor calculation module internally performs the same parallel filtering and unary term addition operations to obtain the normalization factor γ, except that it operates on the all-ones matrix J rather than on μt-1 and the feature map U; concretely, γ = J + θ1(F1 ⊗ J) + θ2(F2 ⊗ J). The normalization module divides the result V of the unary term addition module by the normalization factor γ element-wise, giving the output of this iteration, μt = V ⊘ γ. The output of the last iteration is the optimized disparity map D.

Beneficial effects: compared with the prior art, the invention has the following beneficial effects: 1. based on light field imaging geometry, the invention analyzes the light field from the novel perspective of sequence data and designs a depth feature extraction sub-network based on a recurrent neural network, replacing the feature extraction approach of conventional convolutional neural networks and markedly improving local depth estimation; 2. based on conditional random field theory, the invention models global depth information and designs an end-to-end optimization network, markedly improving the accuracy and robustness of depth estimation.

Brief Description of the Drawings

Fig. 1 is a flow chart of the present invention;

Fig. 2 is the two-plane representation of the 4D light field in the present invention;

Fig. 3 is a flow chart of computing the EPI composite image in the present invention;

Fig. 4 is an example of a central sub-aperture image and an EPI image in the present invention;

Fig. 5 is a schematic diagram of an EPI composite image in the present invention;

Fig. 6 is the architecture diagram of the LFRNN network designed by the present invention;

Fig. 7 is the structure diagram of the sequence feature extraction sub-network designed by the present invention;

Fig. 8 is the structure diagram of the depth optimization module based on the conditional random field model designed by the present invention;

Fig. 9 is a schematic comparison of the results of the present invention and existing methods.

Detailed Description of the Embodiments

The present invention is described in further detail below with reference to the accompanying drawings.

As shown in Fig. 1, the light field depth estimation method based on light field sequence feature analysis disclosed by the present invention comprises the following steps:

Step 1. Extract the central sub-aperture image from the 4D light field data, where (iC, jC) denotes the viewing-angle coordinates of the central sub-aperture image.

The 4D light field data is the decoded representation of a light field image captured by a light field camera. As shown in Fig. 2, light field data is usually expressed with the two-plane parameterization (2PP): Π and Ω are parallel planes, representing the viewing-angle plane and the position plane respectively; a ray is represented by its intersections with the two planes, and all rays not parallel to the two planes form the light field. The light field is usually written as L: (i, j, k, l) → L(i, j, k, l), where (i, j) are the pixel index coordinates within a microlens image, also called the viewing-angle coordinates, (k, l) are the index coordinates of the microlens centers, i, j, k, l are all integers, and L(i, j, k, l) is the radiance of the ray passing through position (k, l) under viewing angle (i, j). The central sub-aperture image is extracted by taking the central pixel of each microlens image and arranging these pixels by microlens position index into a two-dimensional image, whose value at (k, l) is L(iC, jC, k, l), where (iC, jC) denotes the viewing-angle coordinates of the central sub-aperture image.
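As an illustration only, a minimal NumPy sketch of this extraction, assuming the decoded 4D light field is stored as an array L of shape (NAi, NAj, NSk, NSl) with the central view at the middle angular indices (the array name and memory layout are assumptions, not part of the patent):

```python
import numpy as np

def center_subaperture(L):
    """Extract the central sub-aperture image from a decoded 4D light field.

    L: ndarray of shape (NAi, NAj, NSk, NSl); L[i, j, k, l] is the radiance of the ray
    through microlens position (k, l) seen from viewing angle (i, j).
    """
    NAi, NAj, _, _ = L.shape
    i_c, j_c = NAi // 2, NAj // 2      # viewing-angle coordinates (iC, jC) of the central view
    return L[i_c, j_c, :, :]           # (NSk, NSl) central sub-aperture image
```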

Step 2. Compute the EPI composite image ISEPI from the 4D light field data; as shown in Fig. 3, this comprises the following steps:

(2.1) According to the dimensions of the input 4D light field, initialize ISEPI as an all-zero matrix. In the 4D light field L: (i, j, k, l) → L(i, j, k, l), assume the angular resolution is NAi×NAj, i.e. i∈[0, NAi), j∈[0, NAj), and the spatial resolution is NSk×NSl, i.e. k∈[0, NSk), l∈[0, NSl); ISEPI is then a (NSk×NAj)×NSl two-dimensional matrix, initialized to all zeros.

(2.2) For each row of the third dimension (k) of the 4D light field, with row index k*, compute its corresponding EPI image E(k*) and use E(k*) to update part of ISEPI.

The process of computing the EPI image corresponding to the k*-th row of the third dimension from the 4D light field data can be viewed as a mapping E(k*): (j, l) → L(iC, j, k*, l), i.e. the two-dimensional slice obtained by fixing the first and third dimensions of the 4D light field and varying the other two, with i = iC and k = k*. As shown in Fig. 4, the upper image is the central sub-aperture image generated from the light field data of a scene, and the lower image is the EPI image corresponding to the row marked by the solid line in the central sub-aperture image.

Then, the resulting E(k*) is used to update a region of ISEPI, i.e. ISEPI((k*-1)×NAj : k*×NAj, 0 : NSl) = E(k*); here, ISEPI((k*-1)×NAj : k*×NAj, 0 : NSl) denotes the block of ISEPI from row (k*-1)×NAj to row k*×NAj-1 and from column 0 to column NSl-1.

(2.3) Performing the operation of step (2.2) on every row of the third dimension of the 4D light field yields the EPI composite image ISEPI. For illustration, Fig. 5 shows a cropped region of an EPI composite image: the region corresponds to the 14 rows of pixels above and below the solid line on the central sub-aperture image in Fig. 4.
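A minimal NumPy sketch of steps (2.1)-(2.3) under the same assumed array layout as in the step 1 sketch; note that the loop index k_star is 0-based here, whereas the text above numbers the row blocks as (k*-1)×NAj to k*×NAj-1:

```python
import numpy as np

def epi_composite(L):
    """Build the EPI composite image ISEPI from a 4D light field of shape (NAi, NAj, NSk, NSl)."""
    NAi, NAj, NSk, NSl = L.shape
    i_c = NAi // 2                                       # fix i = iC (central angular row)
    ISEPI = np.zeros((NSk * NAj, NSl), dtype=L.dtype)    # step (2.1): all-zero matrix
    for k_star in range(NSk):                            # step (2.2): one EPI slice per row k*
        epi = L[i_c, :, k_star, :]                       # fix k = k*, vary (j, l) -> (NAj, NSl) slice
        ISEPI[k_star * NAj:(k_star + 1) * NAj, :] = epi  # write the corresponding row block
    return ISEPI                                         # step (2.3): all row blocks filled
```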

Step 3. Construct the light field neural network model LFRNN, which receives the EPI composite image ISEPI and the central sub-aperture image as input and outputs a disparity map D with the same resolution as the central sub-aperture image. As shown in Fig. 6, the LFRNN model comprises a local depth estimation module based on light field sequence analysis and a depth optimization module based on a conditional random field model.

The local depth estimation module based on light field sequence analysis comprises a sliding-window processing layer, a sequence feature extraction sub-network, and a feature map reshaping layer. The sliding-window processing layer slides over the EPI composite image ISEPI to extract EPI patches IEPI-p, which are fed to the sequence feature extraction sub-network. The window size is (NAj, 16), the horizontal stride is 1, the vertical stride is NAj, and positions where the window extends beyond ISEPI are zero-padded.
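A sketch of the sliding-window processing under the same assumptions; the patent does not state whether each (NAj, 16) window is left-aligned at column l or centered on it, so the left-aligned variant with right-side zero padding shown here is an assumption:

```python
import numpy as np

def epi_patches(ISEPI, NAj, win=16):
    """Cut ISEPI into (NAj, win) patches: vertical stride NAj, horizontal stride 1,
    zero padding where the window slides past the right edge of ISEPI."""
    rows, NSl = ISEPI.shape
    NSk = rows // NAj
    padded = np.pad(ISEPI, ((0, 0), (0, win - 1)), mode="constant")  # zero fill on the right
    patches = np.empty((NSk * NSl, NAj, win), dtype=ISEPI.dtype)
    idx = 0
    for k in range(NSk):                              # one row block per spatial row k
        block = padded[k * NAj:(k + 1) * NAj, :]
        for l in range(NSl):                          # horizontal stride 1
            patches[idx] = block[:, l:l + win]
            idx += 1
    return patches    # one patch per pixel (k, l) of the central sub-aperture image
```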

The sequence feature extraction sub-network is a recurrent neural network specifically designed to extract the sequence features of an EPI patch IEPI-p, comprising serialization splitting, a bidirectional GRU layer, and a fully connected network, as shown in Fig. 7. The serialization splitting is an EPI-patch serialization mechanism motivated by the observation that the depth-encoding lines on an EPI image are distributed across multiple columns of pixels. Specifically, each column of pixels of an NAj×16 EPI patch IEPI-p is treated as a column vector Gy = (IEPI-p(0, y), IEPI-p(1, y), …, IEPI-p(NAj-1, y))ᵀ, where x and y denote the row and column coordinates of pixels on IEPI-p and IEPI-p(x, y) is the gray value of the pixel at (x, y). An NAj×16 EPI patch IEPI-p can therefore be serialized into 16 column vectors Gy, with 0 ≤ y ≤ 15 and y an integer. These vectors are fed in turn as the input to the subsequent bidirectional GRU layer at each time step.

The bidirectional GRU layer consists of GRU units in two directions; each direction has dimension 256, and each GRU unit is set to a non-sequence output mode, i.e. it receives vector inputs at 16 time steps and produces 1 output. The bidirectional GRU layer produces 512 outputs in total.

The subsequent fully connected network contains two fully connected layers. The first fully connected layer receives the 512 outputs of the bidirectional GRU layer and produces 16 outputs; it is configured with a ReLU activation function. The second fully connected layer receives the 16 outputs of the previous layer and outputs 1 disparity value; it has no activation function.
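A minimal PyTorch sketch of the sequence feature extraction sub-network described above; treating the final hidden states of the two GRU directions (256 values each) as the layer's 512 outputs is an interpretation, and the class and argument names are illustrative only:

```python
import torch
import torch.nn as nn

class SeqFeatureNet(nn.Module):
    """Maps one (NAj, 16) EPI patch to a single disparity value."""
    def __init__(self, n_aj):
        super().__init__()
        # 16 time steps, one NAj-dimensional column vector Gy per step
        self.gru = nn.GRU(input_size=n_aj, hidden_size=256,
                          batch_first=True, bidirectional=True)
        self.fc1 = nn.Linear(512, 16)   # 512 GRU outputs -> 16, with ReLU
        self.fc2 = nn.Linear(16, 1)     # 16 -> 1 disparity value, no activation

    def forward(self, patches):                    # patches: (B, NAj, 16)
        cols = patches.permute(0, 2, 1)            # (B, 16, NAj): columns as the sequence
        _, h_n = self.gru(cols)                    # h_n: (2, B, 256), final state per direction
        h = torch.cat([h_n[0], h_n[1]], dim=1)     # (B, 512)
        return self.fc2(torch.relu(self.fc1(h)))   # (B, 1)
```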

The task of the feature map reshaping layer is to reshape the sequence of (NSk×NSl) disparity values into an NSk×NSl matrix, called the feature map and denoted U. The preceding sliding-window processing layer, sliding over the EPI composite image ISEPI with the prescribed strides, extracts (NSk×NSl) EPI patches IEPI-p; each patch yields 1 disparity value after processing by the sequence feature extraction sub-network, so all patches together produce (NSk×NSl) disparity values, which the feature map reshaping layer reshapes into an NSk×NSl matrix, denoted U.

The depth optimization module based on the conditional random field model comprises two parts, central sub-aperture image kernel parameter extraction and feature map iterative optimization, as shown in Fig. 8. The main function of the kernel parameter extraction part is to compute filter kernel parameters from the input central sub-aperture image; the feature map iterative optimization part, taking the conditional random field as its theoretical basis, iteratively optimizes the feature map using the filter kernel parameters obtained by the kernel parameter extraction part, yielding the disparity map D.

The kernel parameter extraction part takes the central sub-aperture image as input and computes the parameters of two globally connected convolution kernels. 1) The space/color convolution kernel F1 is computed as F1(i, j) = exp(-‖pi - pj‖²/(2θα²) - ‖ci - cj‖²/(2θβ²)), where pi and pj denote the position information of the i-th and j-th pixels of the central sub-aperture image, ci and cj denote their color information, and θα and θβ are user-defined bandwidth radii (both set to 1 here). 2) The spatial convolution kernel F2 is computed as F2(i, j) = exp(-‖pi - pj‖²/(2θγ²)), where again pi and pj denote the positions of the i-th and j-th pixels and θγ is a user-defined bandwidth radius (set to a fixed value in this embodiment).

The feature map iterative optimization part comprises four modules: parallel filtering, unary term addition, normalization factor calculation, and normalization.

The parallel filtering module filters the current iteration input μt-1 through two paths. The first path filters μt-1 with the kernel F1 (denoted F1 ⊗ μt-1) and multiplies the filtered result by the weight parameter θ1, giving θ1(F1 ⊗ μt-1). Similarly, the second path filters μt-1 with the kernel F2 and multiplies the filtered result by the weight parameter θ2, giving θ2(F2 ⊗ μt-1). In the first iteration, μt-1 is initialized to the feature map U; θ1 and θ2 are randomly initialized and are updated through network training. The results of the two paths are added element-wise to give the output of the parallel filtering module, Q = θ1(F1 ⊗ μt-1) + θ2(F2 ⊗ μt-1).

The unary term addition module superimposes the feature map U on the result Q of the parallel filtering module, giving V = U + Q.

The normalization factor calculation module internally performs the same parallel filtering and unary term addition operations to obtain the normalization factor γ; the difference is that it operates on the all-ones matrix J rather than on μt-1 and the feature map U. Concretely, the normalization factor calculation module computes γ = J + θ1(F1 ⊗ J) + θ2(F2 ⊗ J).

The normalization module divides the result V of the unary term addition module by the normalization factor γ element-wise, giving the output of this iteration, μt = V ⊘ γ.

The feature map iterative optimization part is an iterative process composed of the four modules above; typically 6 iterations suffice to achieve the desired optimization effect. The output of the last iteration is the optimized disparity map D.
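A NumPy sketch of the feature map iterative optimization, using the dense kernels from the previous sketch and the notation Q, V, γ, ⊘ introduced above; the fixed-weight loop below is an illustration of the computation, not the trained end-to-end layer itself:

```python
import numpy as np

def crf_refine(U, F1, F2, theta1, theta2, num_iters=6):
    """Iteratively optimize the feature map U into the disparity map D."""
    H, W = U.shape
    u = U.reshape(-1)                  # unary term: the feature map, flattened to length N
    mu = u.copy()                      # first iteration starts from U
    ones = np.ones_like(u)             # all-ones matrix J, flattened
    # Normalization factor: the same parallel filtering + unary addition, applied to J
    gamma = ones + theta1 * (F1 @ ones) + theta2 * (F2 @ ones)
    for _ in range(num_iters):         # typically 6 iterations
        Q = theta1 * (F1 @ mu) + theta2 * (F2 @ mu)   # parallel filtering
        V = u + Q                                     # unary term addition
        mu = V / gamma                                # element-wise normalization
    return mu.reshape(H, W)            # output of the last iteration: disparity map D
```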

Step 4. Train the LFRNN described in step 3 to obtain the optimal network parameter set P. Training is carried out in two stages, both using the mean absolute error as the loss function. The first stage trains only the local depth estimation module based on light field sequence analysis to obtain its optimal parameter set P1; the second stage freezes P1, trains the whole network to update the parameters of the depth optimization module based on the conditional random field model, and finally yields the optimal parameter set P of the LFRNN network.

Training the LFRNN network comprises the following steps:

(4.1) Prepare the light field dataset and divide it into a training set, a validation set and a test set. The light field dataset must contain scene light field data and the ground-truth disparity of each scene. Specifically, the publicly available HCI light field dataset can be used, light field data can be synthesized with the Blender simulation software, or light field data and ground-truth depth can be acquired with a light field camera and a ranging device. The light field dataset is randomly divided into training, validation and test sets in a 5:3:2 ratio.

(4.2) Prepare the input data and ground-truth data required for network training. The input data comprise the central sub-aperture image and the EPI composite image, computed from the light field dataset according to steps 1 and 2 respectively; the ground-truth data are provided directly by the light field dataset.

(4.3) Train and validate the local depth estimation module based on light field sequence analysis as an independent network. First, the input is the EPI composite image, the output feature map serves as the estimated disparity, and the ground-truth data provided by the dataset serve as the disparity ground truth; the mean absolute error is computed from these and back-propagated to optimize the network parameters, and after training the optimal parameter set P1 of this module is obtained. The batch size is set to 64 and the number of epochs to 10000; the learning rate is 0.1×10⁻³ for the first 2000 epochs and 0.1×10⁻⁴ for the remaining 8000 epochs. Second, the generalization ability of this network module is verified on the validation set.

(4.4) Train and validate the LFRNN to obtain the optimal parameter set P. First, the local depth estimation module based on light field sequence analysis is used as a pretrained network: its parameter set P1 is loaded and its parameter updates are frozen. Then, the EPI composite image and the central sub-aperture image are input, the estimated disparity is output, the mean absolute error is computed against the disparity ground truth, and back-propagation optimizes the parameters of the depth optimization module based on the conditional random field model in the LFRNN network, finally yielding the optimal parameter set P of the LFRNN. The batch size is set to 64, the number of epochs to 3000, and the learning rate to 0.1×10⁻⁴. Finally, the generalization ability of the whole network is tested on the validation set.
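A schematic PyTorch sketch of the two-stage training in steps (4.3) and (4.4); the attribute names lfrnn.local_module and lfrnn.crf_module, the optimizer choice, and the single pass over each loader are assumptions for illustration, while the batch size, epoch counts and learning-rate schedule stated above should be followed in practice:

```python
import torch
import torch.nn as nn

def train_two_stage(lfrnn, loader_stage1, loader_stage2, device="cuda"):
    mae = nn.L1Loss()   # mean absolute error, used as the loss in both stages

    # Stage 1 (4.3): train only the local depth estimation module.
    opt1 = torch.optim.Adam(lfrnn.local_module.parameters(), lr=1e-4)
    for epi_patches, gt_disp in loader_stage1:
        pred = lfrnn.local_module(epi_patches.to(device))
        loss = mae(pred, gt_disp.to(device))
        opt1.zero_grad(); loss.backward(); opt1.step()

    # Stage 2 (4.4): freeze P1 and train the CRF-based optimization module end to end.
    for p in lfrnn.local_module.parameters():
        p.requires_grad = False
    opt2 = torch.optim.Adam(lfrnn.crf_module.parameters(), lr=1e-5)
    for (epi_img, center_img), gt_disp in loader_stage2:
        pred = lfrnn(epi_img.to(device), center_img.to(device))
        loss = mae(pred, gt_disp.to(device))
        opt2.zero_grad(); loss.backward(); opt2.step()
    return lfrnn
```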

Testing and practical use of the LFRNN network. For the test set described in step 4, or for 4D light field data captured by a light field camera, the central sub-aperture image is obtained by the processing of step 1 and the EPI composite image by the processing of step 2; the resulting central sub-aperture image and EPI composite image are then fed to the LFRNN network described in step 3; next, the optimal parameter set P described in step 4 is loaded and a forward computation is performed to obtain the disparity map D.
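A sketch of this test-time pipeline, reusing the helper functions sketched in steps 1 and 2 and the assumed lfrnn interface from the training sketch:

```python
import torch

def estimate_depth(lfrnn, L, device="cuda"):
    """Run the full pipeline on a decoded 4D light field L of shape (NAi, NAj, NSk, NSl)."""
    center = center_subaperture(L)    # step 1: central sub-aperture image
    isepi = epi_composite(L)          # step 2: EPI composite image
    lfrnn.eval()
    with torch.no_grad():             # step 3: forward pass with the trained parameter set P
        d = lfrnn(torch.as_tensor(isepi).float().unsqueeze(0).to(device),
                  torch.as_tensor(center).float().unsqueeze(0).to(device))
    return d.squeeze(0).cpu().numpy() # disparity map D
```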

Fig. 9 gives an example of the performance comparison between the proposed method and other neural-network-based depth estimation methods. Taking four typical scenes as examples, the mainstream light field depth estimation methods EPINet, FusionNet and VommaNet are compared. The first column shows the central sub-aperture image of each scene, and the second to fifth columns show the processing results of the method disclosed by the present invention, EPINet, FusionNet and VommaNet respectively; results for the same scene are arranged in the same row. The evaluation metric is the mean squared error (MSE); the number above each result image is the MSE value obtained by the corresponding method on that scene. A gray-scale bar is appended to each row, indicating the error distribution of the result at each pixel position: the lighter the color, the smaller the error, and the darker the color, the larger the error. As can be seen from Fig. 9, the LFRNN depth estimation method disclosed by the present invention achieves the best MSE on the first two example scenes; on the last two example scenes its overall MSE is inferior to that of VommaNet, but the depth estimates of most pixels are closer to the ground truth and the visual quality is clearly better than the VommaNet results.

Claims (5)

Translated from Chinese
1. A light field depth estimation method based on light field sequence feature analysis, characterized by comprising the following steps:

(1) extracting the central sub-aperture image from the 4D light field data, where (iC, jC) denotes the viewing-angle coordinates of the central sub-aperture image;

(2) computing the EPI composite image ISEPI from the 4D light field data;

(3) constructing the light field neural network model LFRNN, which receives the EPI composite image ISEPI and the central sub-aperture image as input and outputs a disparity map D with the same resolution as the central sub-aperture image; the LFRNN model comprises a local depth estimation module based on light field sequence analysis and a depth optimization module based on a conditional random field model;

(4) training the light field neural network model LFRNN constructed in step (3) to obtain the optimal network parameter set P: training is carried out in two stages, both using the mean absolute error as the loss function; the first stage trains only the local depth estimation module based on light field sequence analysis to obtain its optimal parameter set P1; the second stage freezes P1, trains the whole network, and updates the parameters of the depth optimization module based on the conditional random field model, yielding the optimal parameter set P of the LFRNN network.

2. The light field depth estimation method based on light field sequence feature analysis according to claim 1, characterized in that step (1) is implemented as follows:

the 4D light field data is the decoded representation of a light field image captured by a light field camera, written as L: (i, j, k, l) → L(i, j, k, l), where (i, j) are the pixel index coordinates within a microlens image, also called the viewing-angle coordinates, (k, l) are the index coordinates of the microlens centers, i, j, k, l are all integers, and L(i, j, k, l) is the radiance of the ray passing through position (k, l) under viewing angle (i, j); the central pixel of each microlens image is extracted and these pixels are arranged by microlens position index to obtain a two-dimensional image, i.e. the central sub-aperture image, whose value at (k, l) is L(iC, jC, k, l), where (iC, jC) denotes the viewing-angle coordinates of the central sub-aperture image.
3. The light field depth estimation method based on light field sequence feature analysis according to claim 1, characterized in that step (2) is implemented as follows:

(21) according to the dimensions of the input 4D light field, initialize ISEPI as an all-zero matrix: in the 4D light field L: (i, j, k, l) → L(i, j, k, l), the angular resolution is NAi×NAj, i.e. i∈[0, NAi), j∈[0, NAj), and the spatial resolution is NSk×NSl, i.e. k∈[0, NSk), l∈[0, NSl); ISEPI is then a (NSk×NAj)×NSl two-dimensional matrix, initialized to all zeros;

(22) for each row of the third dimension k of the 4D light field, with row index k*, compute its corresponding EPI image E(k*) and use E(k*) to update part of ISEPI:

the process of computing the EPI image corresponding to the k*-th row of the third dimension from the 4D light field data is viewed as a mapping E(k*): (j, l) → L(iC, j, k*, l), i.e. the two-dimensional slice obtained by fixing the first and third dimensions of the 4D light field and varying the other two, with i = iC and k = k*;

the resulting E(k*) is used to update a region of ISEPI, i.e. ISEPI((k*-1)×NAj : k*×NAj, 0 : NSl) = E(k*), where ISEPI((k*-1)×NAj : k*×NAj, 0 : NSl) denotes the block of ISEPI from row (k*-1)×NAj to row k*×NAj-1 and from column 0 to column NSl-1;

(23) perform the operation of step (22) on every row of the third dimension of the 4D light field to compute the EPI composite image ISEPI.
4. The light field depth estimation method based on light field sequence feature analysis according to claim 1, characterized in that the local depth estimation module based on light field sequence analysis in step (3) comprises a sliding-window processing layer, a sequence feature extraction sub-network, and a feature map reshaping layer;

the sliding-window processing layer slides over the EPI composite image ISEPI to extract EPI patches IEPI-p, which are fed to the sequence feature extraction sub-network; the window size is (NAj, 16), the horizontal stride is 1, the vertical stride is NAj, and positions where the window extends beyond ISEPI are zero-padded;

the sequence feature extraction sub-network is a recurrent neural network that extracts the sequence features of an EPI patch IEPI-p and comprises serialization splitting, a bidirectional GRU layer, and a fully connected network; the serialization splitting is based on the observation that the depth-encoding lines on an EPI image are distributed across multiple columns of pixels: each column of pixels of an NAj×16 EPI patch IEPI-p is treated as a column vector Gy = (IEPI-p(0, y), IEPI-p(1, y), …, IEPI-p(NAj-1, y))ᵀ, where x and y denote the row and column coordinates of pixels on IEPI-p and IEPI-p(x, y) is the gray value of the pixel at (x, y); an NAj×16 EPI patch IEPI-p can thus be serialized into 16 column vectors Gy, with 0 ≤ y ≤ 15 and y an integer; the vectors Gy are fed in turn as the input to the subsequent bidirectional GRU layer at each time step; the bidirectional GRU layer consists of GRU units in two directions, each direction has dimension 256, and each GRU unit is set to a non-sequence output mode, receiving vector inputs at 16 time steps and producing 1 output, so the bidirectional GRU layer produces 512 outputs in total; the fully connected network contains two fully connected layers: the first receives the 512 outputs of the bidirectional GRU layer and produces 16 outputs and is configured with a ReLU activation function, and the second receives the 16 outputs of the previous layer and outputs 1 disparity value, with no activation function;

the feature map reshaping layer reshapes the sequence of (NSk×NSl) disparity values into an NSk×NSl matrix, called the feature map and denoted U.
5. The light field depth estimation method based on light field sequence feature analysis according to claim 1, characterized in that the depth optimization module based on the conditional random field model in step (3) comprises two parts: central sub-aperture image kernel parameter extraction and feature map iterative optimization. The central sub-aperture image kernel parameter extraction part computes the filter kernel parameters from the input central sub-aperture image. The feature map iterative optimization part, taking the conditional random field as its theoretical basis, iteratively optimizes the feature map with the filter kernel parameters obtained by the kernel parameter extraction part to obtain the disparity map D.

The central sub-aperture image kernel parameter extraction part takes the central sub-aperture image as input and computes the spatial-and-color convolution kernel F1 and the spatial convolution kernel F2:

F1(i, j) = exp( -|pi - pj|²/(2θα²) - |ci - cj|²/(2θβ²) )

F2(i, j) = exp( -|pi - pj|²/(2θγ²) )

where pi and pj denote the positions of the i-th and j-th pixels of the central sub-aperture image, ci and cj denote the colors of the i-th and j-th pixels of the central sub-aperture image, and θα, θβ, θγ are user-defined bandwidth radii.
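As a sketch of how these two kernels might be evaluated (the original formula images are not reproduced here, so the exact notation is a reconstruction; NumPy is used purely for illustration, and a practical implementation would normally approximate such dense pairwise kernels rather than store full N×N matrices):

```python
import numpy as np


def crf_kernels(positions, colors, theta_alpha, theta_beta, theta_gamma):
    """Pairwise kernel weights between pixels of the central sub-aperture image.
    positions: (N, 2) pixel coordinates p_i; colors: (N, 3) pixel colors c_i.
    Returns two (N, N) matrices: F1 mixes spatial and color distances, F2 is
    spatial only."""
    dp2 = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=-1)
    dc2 = np.sum((colors[:, None, :] - colors[None, :, :]) ** 2, axis=-1)
    f1 = np.exp(-dp2 / (2.0 * theta_alpha ** 2) - dc2 / (2.0 * theta_beta ** 2))
    f2 = np.exp(-dp2 / (2.0 * theta_gamma ** 2))
    return f1, f2
```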
The feature map iterative optimization part comprises four modules: parallel filtering, unary term addition, normalization factor calculation, and normalization. The parallel filtering module filters the input of the current iteration, μt-1, through two paths: the first path filters μt-1 with the convolution kernel F1 and multiplies the filtered result by the weight parameter θ1; the second path filters μt-1 with the convolution kernel F2 and multiplies the filtered result by the weight parameter θ2. In the first iteration, μt-1 is initialized to the feature map U; θ1 and θ2 are randomly initialized and are updated through network training. The results of the two paths are added element-wise to give the output of the parallel filtering module. The unary term addition module superimposes the feature map U onto the output of the parallel filtering module. The normalization factor calculation module internally performs the same parallel filtering and unary term addition operations to obtain the normalization factor γ, but the object of its data processing is the all-ones matrix J rather than μt-1 and the feature map U. The normalization module divides the result of the unary term addition module element-wise by the normalization factor γ, yielding the output μt of the current iteration. The output of the last iteration is the optimized disparity map D.
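Under one reading of these steps, a single optimization loop could be sketched as follows. The filter callables, the iteration count, and the interpretation that the all-ones matrix J replaces both the filter input and the unary term inside the normalization factor are assumptions; the claim's formula images are not reproduced here.

```python
import torch


def refine_feature_map(u, apply_f1, apply_f2, theta1, theta2, num_iters=5):
    """Sketch of the feature map iterative optimization.
    u            : the N_Sk x N_Sl feature map U
    apply_f1/f2  : callables filtering a same-sized map with the F1 / F2 kernels
    theta1/theta2: learned path weights."""
    mu = u                                   # mu_0 is initialized to U
    ones = torch.ones_like(u)                # all-ones matrix J
    for _ in range(num_iters):
        # Parallel filtering: two paths, weighted and summed element-wise.
        m = theta1 * apply_f1(mu) + theta2 * apply_f2(mu)
        # Unary term addition: superimpose the feature map U.
        numer = u + m
        # Normalization factor: same operations applied to J (assumed reading).
        gamma = ones + theta1 * apply_f1(ones) + theta2 * apply_f2(ones)
        # Normalization: element-wise division gives this iteration's output.
        mu = numer / gamma
    return mu                                # last iteration's output = disparity map D
```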
CN202210721840.1A | 2022-06-24 | 2022-06-24 | A light field depth estimation method based on light field sequence feature analysis | Active | CN115272435B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210721840.1A (granted as CN115272435B) | 2022-06-24 | 2022-06-24 | A light field depth estimation method based on light field sequence feature analysis


Publications (2)

Publication Number | Publication Date
CN115272435A | 2022-11-01
CN115272435B (en) | 2025-05-06

Family

ID=83762435

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210721840.1A (Active, granted as CN115272435B (en)) | A light field depth estimation method based on light field sequence feature analysis | 2022-06-24 | 2022-06-24

Country Status (1)

Country | Link
CN (1) | CN115272435B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN107993260A (en)* | 2017-12-14 | 2018-05-04 | 浙江工商大学 | A kind of light field image depth estimation method based on mixed type convolutional neural networks
CN112116646A (en)* | 2020-09-23 | 2020-12-22 | 南京工程学院 | Light field image depth estimation method based on depth convolution neural network
CN112288789A (en)* | 2020-10-26 | 2021-01-29 | 杭州电子科技大学 | Light field depth self-supervision learning method based on occlusion region iterative optimization

Cited By (3)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN116070687A (en)* | 2023-03-06 | 2023-05-05 | 浙江优众新材料科技有限公司 | Neural network light field representation method based on global ray space affine transformation
CN119903750A (en)* | 2025-01-10 | 2025-04-29 | 南京大学 | Design method and device of monocular depth imaging equipment based on vector light field control
CN119903750B (en)* | 2025-01-10 | 2025-09-05 | 南京大学 | Design method and device of monocular depth imaging equipment based on vector light field control

Also Published As

Publication number | Publication date
CN115272435B (en) | 2025-05-06

Similar Documents

Publication | Publication Date | Title
CN110458939B (en) | Indoor scene modeling method based on visual angle generation
CN110443882B (en) | Light field microscopic three-dimensional reconstruction method and device based on deep learning algorithm
Fahringer et al. | Volumetric particle image velocimetry with a single plenoptic camera
CN106846463B (en) | Three-dimensional reconstruction method and system of microscopic image based on deep learning neural network
EP4191539A1 (en) | Method for performing volumetric reconstruction
CN110047144A (en) | A kind of complete object real-time three-dimensional method for reconstructing based on Kinectv2
CN114170290B (en) | Image processing method and related equipment
CN111260707B (en) | Depth estimation method based on light field EPI image
CN110570522A (en) | A multi-view 3D reconstruction method
CN105981050A (en) | Method and system for exacting face features from data of face images
CN117172134B (en) | Multi-scale DEM modeling method of lunar surface based on fused terrain features
CN112116646B (en) | A light field image depth estimation method based on deep convolutional neural network
CN115272435A (en) | A light field depth estimation method based on light field sequence feature analysis
CN114820299B (en) | A method and device for super-resolution image restoration of non-uniform motion blur
CN112330795A (en) | Human 3D Reconstruction Method and System Based on Single RGBD Image
CN117974895B (en) | A pipeline monocular video 3D reconstruction and depth prediction method and system
Jancosek et al. | Scalable multi-view stereo
CN111028273A (en) | A light field depth estimation method based on multi-stream convolutional neural network and its implementation system
CN117557739A (en) | Three-dimensional reconstruction method and device for surface exposed body
CN111582437B (en) | Construction method of parallax regression depth neural network
Lin et al. | A-SATMVSNet: An attention-aware multi-view stereo matching network based on satellite imagery
Ashfaq et al. | 3D Point Cloud Generation to Understand Real Object Structure via Graph Convolutional Networks.
CN114463175A (en) | Mars image super-resolution method based on deep convolution neural network
Fahringer et al. | The effect of grid resolution on the accuracy of tomographic reconstruction using a plenoptic camera
CN113486928A (en) | Multi-view image alignment method based on rational polynomial model differentiable tensor expression

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
