








Technical Field
The present invention belongs to the technical field of computer vision and artificial intelligence, and in particular relates to a light field depth estimation method based on light field sequence feature analysis.
Background Art
Microlens light field cameras have entered the consumer electronics market and offer substantial value for industrial application and academic research. They provide a new route to solving the depth estimation problem. On the one hand, light field imaging records not only the position of light rays but also their direction, providing a geometric basis for depth estimation; on the other hand, a microlens light field camera captures multiple views with a single objective (monocular multi-view acquisition), which simplifies the deployment of vision systems and lays a physical foundation for extending depth estimation applications.
Depth estimation based on microlens light field cameras has become a research hotspot over the past decade, and existing methods fall roughly into two categories: traditional depth estimation and learning-based depth estimation. Traditional methods mainly extract features with structural operators and invert depth information from the light field imaging geometry. For example, Tao et al. measure cropped EPIs with variance and mean operators and fuse the defocus and correspondence depth cues to estimate scene depth; Wanner et al. use the 2D structure tensor to estimate line slopes in EPI images and thereby obtain depth; Zhang et al. introduce a parallelogram operator to locate line positions in EPI images for depth estimation. The Chinese invention patent "A Depth Estimation Based on Light Field" (ZL201510040975.1) likewise uses the structure tensor for initial depth estimation. Structural-operator methods are highly interpretable, but the descriptive power of such operators is limited, which caps the achievable accuracy of depth estimation.
In recent years, with the rise of deep learning, learning-based light field depth estimation methods have gained favor. Heber et al. first proposed a deep-learning-based light field depth estimation method, using a convolutional neural network to extract features from EPI patches and regressing the corresponding depth values. Shin et al. apply convolutions to input streams of EPI images in multiple directions and fuse them to obtain depth values. Han et al. propose generating EPI synthetic images and then estimating scene depth with multi-stream convolution and skip-layer fusion. Tsai et al. introduce an attention mechanism to select more effective sub-aperture images, then extract features by convolution to obtain scene depth. The Chinese invention patent "A Light Field Image Depth Estimation Method Based on a Hybrid Convolutional Neural Network" (ZL201711337965.X) discloses a method that uses convolutional neural networks to extract information from horizontal and vertical EPI patches and then fuses the regressions to obtain a depth map. The Chinese invention patent application "A Light Field Depth Estimation Method Based on Multimodal Information" (Publication No. CN 112767466 A) analyzes the focal stack and the central view with convolution and atrous convolution to predict scene depth.
Theoretical modeling of depth estimation, light field data extraction methods, and neural network design all affect depth estimation performance. Learning-based methods have become the mainstream of light field depth estimation and have made great progress; however, accuracy as well as robustness to occlusion and noise still need improvement, and the technical links of light field data extraction and neural network data processing are in particular need of innovation. To this end, the present invention discloses a light field image processing method based on vector sequence analysis and designs an end-to-end depth estimation network that integrates local depth estimation and global optimization. Using this network for light field depth estimation significantly improves accuracy and provides solid support for applications such as 3D reconstruction and 3D defect detection.
Summary of the Invention
Object of the invention: the present invention provides a light field depth estimation method based on light field sequence feature analysis that produces high-accuracy depth results and supports applications such as light field 3D reconstruction and defect detection.
Technical solution: the light field depth estimation method based on light field sequence feature analysis according to the present invention comprises the following steps:
(1) Extract the central sub-aperture image I_(iC,jC) from the 4D light field data, where (iC, jC) denotes the view coordinates of the central sub-aperture image;
(2) Compute the EPI synthetic image ISEPI from the 4D light field data;
(3) Construct the light field neural network model LFRNN, which takes ISEPI and I_(iC,jC) as input and outputs a disparity map D with the same resolution as the central sub-aperture image; the LFRNN model comprises a local depth estimation module based on light field sequence analysis and a depth optimization module based on a conditional random field model;
(4) Train the LFRNN model constructed in step (3) to obtain the optimal network parameter set P. Training is carried out in two stages, both using the mean absolute error as the loss function: the first stage trains only the local depth estimation module based on light field sequence analysis, yielding that module's optimal parameter set P1; the second stage freezes P1 and trains the whole network, updating the parameters of the depth optimization module based on the conditional random field model to obtain the optimal parameter set P of the LFRNN network.
Further, step (1) is implemented as follows:
The 4D light field data is the decoded representation of the light field image captured by a light field camera, denoted L: (i, j, k, l) → L(i, j, k, l), where (i, j) are the pixel index coordinates within a microlens image, also called the view coordinates, (k, l) are the index coordinates of the microlens centers, i, j, k, l are all integers, and L(i, j, k, l) is the radiance of the ray passing through position (k, l) under view (i, j). The central pixel of each microlens image is extracted and the pixels are arranged by microlens position index into a two-dimensional image, i.e. I_(iC,jC)(k, l) = L(iC, jC, k, l), where (iC, jC) denotes the view coordinates of the central sub-aperture image.
Further, step (2) is implemented as follows:
(21) Initialize ISEPI as an all-zero matrix according to the dimensions of the input 4D light field:
In the 4D light field L: (i, j, k, l) → L(i, j, k, l), the angular resolution is NAi × NAj, i.e. i ∈ [0, NAi), j ∈ [0, NAj), and the spatial resolution is NSk × NSl, i.e. k ∈ [0, NSk), l ∈ [0, NSl); ISEPI is then a two-dimensional matrix of size (NSk × NAj) × NSl, initialized to all zeros;
(22) For each row of the third dimension k of the 4D light field, with row index k*, compute the corresponding EPI image E_k* and use E_k* to update the corresponding region of ISEPI:
The computation of the EPI image corresponding to row k* of the third dimension from the 4D light field data is regarded as a mapping E_k*(j, l) = L(iC, j, k*, l), i.e. the two-dimensional slice obtained by fixing the first and third dimensions of the 4D light field and varying the other two, with i = iC and k = k*;
Use the resulting E_k* to update the corresponding region of ISEPI, i.e. ISEPI((k*−1)×NAj : k*×NAj, 0 : NSl) = E_k*, where ISEPI((k*−1)×NAj : k*×NAj, 0 : NSl) denotes the block of ISEPI from row (k*−1)×NAj to row k*×NAj−1 and from column 0 to column NSl−1;
(23) Apply the operation of step (22) to every row of the third dimension of the 4D light field to compute the EPI synthetic image ISEPI.
Further, the local depth estimation module based on light field sequence analysis in step (3) comprises a sliding-window processing layer, a sequence feature extraction sub-network, and a feature map reshaping layer;
The sliding-window processing layer slides over the EPI synthetic image ISEPI to crop EPI patches IEPI-p and feeds them to the sequence feature extraction sub-network; the window size is (NAj, 16), the horizontal stride is 1, the vertical stride is NAj, and zero padding is used where the window extends beyond ISEPI;
The sequence feature extraction sub-network is a recurrent neural network that extracts the sequence features of an EPI patch IEPI-p; it comprises a serialization split, a bidirectional GRU layer, and a fully connected network. The serialization split rests on the observation that the lines encoding depth information in an EPI image are distributed across multiple columns of pixels: each column of the NAj × 16 EPI patch IEPI-p is treated as a column vector Gy = (g(0, y), g(1, y), …, g(NAj−1, y))T, where x and y are the row and column coordinates of a pixel in IEPI-p and g(x, y) is the gray value of the pixel at (x, y) in IEPI-p; an NAj × 16 EPI patch IEPI-p can thus be serialized into 16 column vectors Gy, 0 ≤ y ≤ 15 with y an integer, and the vectors Gy are fed in turn to the subsequent bidirectional GRU layer, one per time step. The bidirectional GRU layer consists of GRU units in two directions; each direction has dimension 256, and each GRU unit is set to non-sequence mode, receiving vector inputs at 16 time steps and producing one output, so the bidirectional GRU layer produces 512 outputs in total. The fully connected network contains two fully connected layers: the first receives the 512 outputs of the bidirectional GRU layer, produces 16 outputs, and is configured with a ReLU activation function; the second receives the 16 outputs of the previous layer and outputs one disparity value, without an activation function;
The feature map reshaping layer reshapes the (NSk × NSl) disparity values into an NSk × NSl matrix, called the feature map and denoted U.
Further, the depth optimization module based on the conditional random field model in step (3) comprises two parts: central sub-aperture image kernel parameter extraction and iterative feature map optimization. The kernel parameter extraction part computes the filter kernel parameters from the input central sub-aperture image; the iterative feature map optimization part, taking the conditional random field as its theoretical basis, iteratively optimizes the feature map using the filter kernels obtained by the kernel parameter extraction part, yielding the disparity map D;
The central sub-aperture image kernel parameter extraction part takes the central sub-aperture image I_(iC,jC) as input and computes the spatial-and-color convolution kernel F1 and the spatial convolution kernel F2:
F1(i, j) = exp(−|pi − pj|²/(2θα²) − |ci − cj|²/(2θβ²)),  F2(i, j) = exp(−|pi − pj|²/(2θγ²)),
where pi and pj are the position information of the i-th and j-th pixels of the central sub-aperture image, ci and cj are the color information of the i-th and j-th pixels, and θα, θβ, θγ are user-defined bandwidth radii;
The iterative feature map optimization part comprises four modules: parallel filtering, unary term addition, normalization factor computation, and normalization. The parallel filtering module filters the input of the current iteration, μt−1, through two paths: the first path filters μt−1 with the kernel F1, giving F1 ∗ μt−1, and multiplies the filtering result by the weight parameter θ1, giving θ1·(F1 ∗ μt−1); the second path filters μt−1 with the kernel F2, giving F2 ∗ μt−1, and multiplies the filtering result by the weight parameter θ2, giving θ2·(F2 ∗ μt−1). In the first iteration μt−1 is initialized to the feature map U; θ1 and θ2 are randomly initialized and updated by network training. The results of the two paths are added element-wise to give the output of the parallel filtering module, Q = θ1·(F1 ∗ μt−1) + θ2·(F2 ∗ μt−1). The unary term addition module superposes the feature map U on the result Q of the parallel filtering module, giving V = U + Q. The normalization factor computation module performs the same parallel filtering and unary term addition operations internally to obtain the normalization factor γ, except that it operates on the all-ones matrix J rather than on μt−1 and the feature map U; concretely, γ = J + θ1·(F1 ∗ J) + θ2·(F2 ∗ J). The normalization module divides the result V of the unary term addition module by the normalization factor γ element-wise, giving the output of the current iteration, μt = V / γ. The output of the final iteration is the optimized disparity map D.
Beneficial effects: compared with the prior art, the present invention has the following beneficial effects: 1. Based on the light field imaging geometry, the invention takes the novel approach of analyzing the light field from the perspective of sequence data and designs a depth feature extraction sub-network based on a recurrent neural network, which replaces the feature extraction approach of conventional convolutional neural networks and significantly improves local depth estimation; 2. Based on conditional random field theory, the invention models global depth information and designs an end-to-end optimization network, which significantly improves the accuracy and robustness of depth estimation.
Brief Description of the Drawings
Fig. 1 is a flowchart of the present invention;
Fig. 2 shows the two-plane representation of the 4D light field used in the present invention;
Fig. 3 is a flowchart of computing the EPI synthetic image in the present invention;
Fig. 4 shows an example central sub-aperture image and EPI image in the present invention;
Fig. 5 is a schematic diagram of an EPI synthetic image in the present invention;
Fig. 6 shows the architecture of the LFRNN network designed by the present invention;
Fig. 7 shows the structure of the sequence feature extraction sub-network designed by the present invention;
Fig. 8 shows the structure of the depth optimization module based on the conditional random field model designed by the present invention;
Fig. 9 is a schematic comparison of the results of the present invention with those of existing methods.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the light field depth estimation method based on light field sequence feature analysis disclosed by the present invention comprises the following steps:
Step 1: Extract the central sub-aperture image I_(iC,jC) from the 4D light field data, where (iC, jC) denotes the view coordinates of the central sub-aperture image.
The 4D light field data is the decoded representation of the light field image captured by a light field camera. As shown in Fig. 2, the light field data is usually expressed with the two-plane parameterization (2PP): Π and Ω are parallel planes representing the view plane and the position plane respectively, a ray is represented by its intersections with the two planes, and all rays not parallel to the two planes form the light field. The light field is usually denoted L: (i, j, k, l) → L(i, j, k, l), where (i, j) are the pixel index coordinates within a microlens image, also called the view coordinates, (k, l) are the index coordinates of the microlens centers, i, j, k, l are all integers, and L(i, j, k, l) is the radiance of the ray passing through position (k, l) under view (i, j). The central sub-aperture image is extracted by taking the central pixel of each microlens image and arranging these pixels by microlens position index into a two-dimensional image, i.e. I_(iC,jC)(k, l) = L(iC, jC, k, l), where (iC, jC) denotes the view coordinates of the central sub-aperture image.
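As an illustration of step 1, the following NumPy sketch extracts the central sub-aperture image from a decoded 4D light field array. The array layout (i, j, k, l) and the function name are assumptions made for illustration and are not prescribed by the patent.

```python
import numpy as np

def extract_central_view(L):
    """Extract the central sub-aperture image from a 4D light field.

    L: ndarray of shape (NA_i, NA_j, NS_k, NS_l), indexed as L[i, j, k, l].
    Returns the (NS_k, NS_l) image I_(iC,jC)(k, l) = L(iC, jC, k, l).
    """
    NA_i, NA_j = L.shape[0], L.shape[1]
    iC, jC = NA_i // 2, NA_j // 2       # view coordinates of the central view
    return L[iC, jC, :, :]              # pixels arranged by microlens position index (k, l)

# Example with a synthetic 9x9-view light field of 512x512 spatial resolution
L = np.random.rand(9, 9, 512, 512).astype(np.float32)
center = extract_central_view(L)        # shape (512, 512)
```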
Step 2: Compute the EPI synthetic image ISEPI from the 4D light field data. As shown in Fig. 3, this comprises the following steps:
(2.1) Initialize ISEPI as an all-zero matrix according to the dimensions of the input 4D light field.
In the 4D light field L: (i, j, k, l) → L(i, j, k, l), assume the angular resolution is NAi × NAj, i.e. i ∈ [0, NAi), j ∈ [0, NAj), and the spatial resolution is NSk × NSl, i.e. k ∈ [0, NSk), l ∈ [0, NSl). ISEPI is then a two-dimensional matrix of size (NSk × NAj) × NSl, initialized to all zeros.
(2.2) For each row of the third dimension (k) of the 4D light field (row index k*), compute the corresponding EPI image E_k* and use E_k* to update the corresponding region of ISEPI.
The computation of the EPI image corresponding to row k* of the third dimension from the 4D light field data can be regarded as a mapping E_k*(j, l) = L(iC, j, k*, l), i.e. the two-dimensional slice obtained by fixing the first and third dimensions of the 4D light field and varying the other two, with i = iC and k = k*. As shown in Fig. 4, the upper image is the central sub-aperture image generated from the light field data of a scene, and the lower image is the EPI image corresponding to the row marked by the solid line in the central sub-aperture image.
Then, use the resulting E_k* to update the corresponding region of ISEPI, i.e. ISEPI((k*−1)×NAj : k*×NAj, 0 : NSl) = E_k*, where ISEPI((k*−1)×NAj : k*×NAj, 0 : NSl) denotes the block of ISEPI from row (k*−1)×NAj to row k*×NAj−1 and from column 0 to column NSl−1.
(2.3) Applying step (2.2) to every row of the third dimension of the 4D light field yields the EPI synthetic image ISEPI. For illustration, Fig. 5 shows a cropped region of an EPI synthetic image, namely the part corresponding to the 14 rows of pixels above and below the solid line in the central sub-aperture image of Fig. 4.
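A minimal sketch of step 2, which assembles the EPI synthetic image by stacking the EPI slices of all rows of the third dimension. Zero-based row indexing is assumed here (the block for row k* occupies rows k*×NAj through (k*+1)×NAj−1), and the function name is illustrative only.

```python
import numpy as np

def build_epi_synthetic_image(L):
    """Compute the EPI synthetic image I_SEPI from a 4D light field.

    L: ndarray of shape (NA_i, NA_j, NS_k, NS_l).
    Returns I_SEPI of shape (NS_k * NA_j, NS_l).
    """
    NA_i, NA_j, NS_k, NS_l = L.shape
    iC = NA_i // 2                                          # central view index of the first dimension
    I_SEPI = np.zeros((NS_k * NA_j, NS_l), dtype=L.dtype)   # step (2.1): all-zero matrix
    for k_star in range(NS_k):                              # steps (2.2)-(2.3): every row of dimension k
        epi = L[iC, :, k_star, :]                           # E_k*(j, l) = L(iC, j, k*, l), shape (NA_j, NS_l)
        I_SEPI[k_star * NA_j:(k_star + 1) * NA_j, :] = epi
    return I_SEPI

I_SEPI = build_epi_synthetic_image(np.random.rand(9, 9, 512, 512).astype(np.float32))
print(I_SEPI.shape)   # (4608, 512), i.e. (NS_k * NA_j, NS_l)
```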
Step 3: Construct the light field neural network model LFRNN, which takes ISEPI and I_(iC,jC) as input and outputs a disparity map D with the same resolution as the central sub-aperture image. As shown in Fig. 6, the LFRNN model comprises a local depth estimation module based on light field sequence analysis and a depth optimization module based on a conditional random field model.
The local depth estimation module based on light field sequence analysis comprises a sliding-window processing layer, a sequence feature extraction sub-network, and a feature map reshaping layer. The sliding-window processing layer slides over the EPI synthetic image ISEPI to crop EPI patches IEPI-p and feeds them to the sequence feature extraction sub-network. The window size is (NAj, 16), the horizontal stride is 1, and the vertical stride is NAj; zero padding is used where the window extends beyond ISEPI. A sketch of this patch extraction is given below.
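The sliding-window cropping can be sketched as follows; the placement of the zero padding (roughly centring each 16-pixel window on its column) is an assumption made for illustration, since the patent states only the window size, the strides, and that zero padding is used beyond the image border.

```python
import numpy as np

def extract_epi_patches(I_SEPI, NA_j, win_w=16):
    """Slide an (NA_j, win_w) window over the EPI synthetic image.

    Horizontal stride 1, vertical stride NA_j; zero padding where the window
    exceeds I_SEPI. Returns an array of shape (NS_k * NS_l, NA_j, win_w),
    i.e. one EPI patch I_EPI-p per spatial pixel (k, l).
    """
    H, NS_l = I_SEPI.shape
    NS_k = H // NA_j
    pad_left = win_w // 2                       # assumption: near-centred window per column
    pad_right = win_w - 1 - pad_left
    padded = np.pad(I_SEPI, ((0, 0), (pad_left, pad_right)), mode='constant')
    patches = np.empty((NS_k * NS_l, NA_j, win_w), dtype=I_SEPI.dtype)
    idx = 0
    for k in range(NS_k):                       # vertical stride NA_j: one band of rows per k
        band = padded[k * NA_j:(k + 1) * NA_j, :]
        for l in range(NS_l):                   # horizontal stride 1: one window per column
            patches[idx] = band[:, l:l + win_w]
            idx += 1
    return patches
```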
The sequence feature extraction sub-network is a recurrent neural network designed specifically to extract the sequence features of an EPI patch IEPI-p; it comprises a serialization split, a bidirectional GRU layer, and a fully connected network, as shown in Fig. 7. The serialization split is an EPI-patch serialization mechanism based on the observation that the lines encoding depth information in an EPI image are distributed across multiple columns of pixels. Specifically, each column of the NAj × 16 EPI patch IEPI-p is treated as a column vector Gy = (g(0, y), g(1, y), …, g(NAj−1, y))T, where x and y are the row and column coordinates of a pixel in IEPI-p and g(x, y) is the gray value of the pixel at (x, y) in IEPI-p. An NAj × 16 EPI patch IEPI-p can therefore be serialized into 16 column vectors Gy, where 0 ≤ y ≤ 15 and y is an integer. These vectors are fed in turn to the subsequent bidirectional GRU layer, one per time step.
The bidirectional GRU layer consists of GRU units in two directions; each direction has dimension 256, and each GRU unit is set to non-sequence mode, i.e. it receives vector inputs at 16 time steps and produces one output. The bidirectional GRU layer therefore produces 512 outputs in total.
The subsequent fully connected network contains two fully connected layers. The first receives the 512 outputs of the bidirectional GRU layer, produces 16 outputs, and is configured with a ReLU activation function; the second receives the 16 outputs of the previous layer and outputs one disparity value, without an activation function.
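The following tf.keras sketch shows one way to realize the sequence feature extraction sub-network described above: serialization of the (NAj, 16) patch into 16 column vectors, a 256-unit bidirectional GRU in non-sequence mode, and the two fully connected layers. The choice of framework and of layer names is an assumption; the patent specifies only the structure.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_sequence_feature_subnet(NA_j):
    """Per-patch regressor mapping an (NA_j, 16) EPI patch to one disparity value."""
    patch = layers.Input(shape=(NA_j, 16), name='epi_patch')
    # Serialization split: treat each of the 16 columns as one time-step vector G_y of length NA_j.
    seq = layers.Permute((2, 1))(patch)                                        # shape (16, NA_j)
    # Bidirectional GRU, 256 units per direction, non-sequence mode (one output per direction).
    feat = layers.Bidirectional(layers.GRU(256, return_sequences=False))(seq)  # 512 outputs
    hidden = layers.Dense(16, activation='relu')(feat)    # first FC layer: 512 -> 16, ReLU
    disparity = layers.Dense(1, activation=None)(hidden)  # second FC layer: 16 -> 1, no activation
    return Model(patch, disparity, name='sequence_feature_subnet')

subnet = build_sequence_feature_subnet(NA_j=9)
subnet.summary()
```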
The task of the feature map reshaping layer is to reshape the (NSk × NSl) disparity values into an NSk × NSl matrix, called the feature map and denoted U. The preceding sliding-window processing layer crops (NSk × NSl) EPI patches IEPI-p from the EPI synthetic image ISEPI at the configured strides; each patch yields one disparity value after passing through the sequence feature extraction sub-network, so all patches together produce (NSk × NSl) disparity values, which the reshaping layer converts with a Reshape operation into an NSk × NSl matrix, denoted U.
The depth optimization module based on the conditional random field model comprises two parts: central sub-aperture image kernel parameter extraction and iterative feature map optimization, as shown in Fig. 8. The main function of the kernel parameter extraction part is to compute the filter kernel parameters from the input central sub-aperture image; the iterative feature map optimization part, taking the conditional random field as its theoretical basis, iteratively optimizes the feature map using the filter kernels produced by the kernel parameter extraction part to obtain the disparity map D.
The central sub-aperture image kernel parameter extraction part takes the central sub-aperture image I_(iC,jC) as input and computes the parameters of two globally connected convolution kernels: 1) the spatial/color kernel F1, computed as F1(i, j) = exp(−|pi − pj|²/(2θα²) − |ci − cj|²/(2θβ²)), where pi and pj are the position information of the i-th and j-th pixels of the central sub-aperture image, ci and cj are the color information of the i-th and j-th pixels, and θα, θβ are user-defined bandwidth radii (both set to 1 here); 2) the spatial kernel F2, computed as F2(i, j) = exp(−|pi − pj|²/(2θγ²)), where again pi and pj are the positions of the i-th and j-th pixels of the central sub-aperture image and θγ is a user-defined bandwidth radius (set to a fixed value in this embodiment).
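A naive, dense NumPy sketch of the two kernels defined above, evaluating F1 and F2 between every pair of pixels of a small central view. The exponential forms follow the bandwidth-radius description in the text; a practical implementation would use an efficient high-dimensional filtering scheme rather than the O(N²) matrices built here, and the default bandwidth values are placeholders.

```python
import numpy as np

def compute_crf_kernels(center_view, theta_alpha=1.0, theta_beta=1.0, theta_gamma=1.0):
    """Dense spatial/color kernel F1 and spatial kernel F2 for an (H, W, 3) central view."""
    H, W, _ = center_view.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    p = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)   # pixel positions p_i
    c = center_view.reshape(-1, 3).astype(np.float64)                   # pixel colors c_i

    d_pos = np.sum((p[:, None, :] - p[None, :, :]) ** 2, axis=-1)       # |p_i - p_j|^2
    d_col = np.sum((c[:, None, :] - c[None, :, :]) ** 2, axis=-1)       # |c_i - c_j|^2

    F1 = np.exp(-d_pos / (2 * theta_alpha ** 2) - d_col / (2 * theta_beta ** 2))
    F2 = np.exp(-d_pos / (2 * theta_gamma ** 2))
    return F1, F2   # each of shape (H*W, H*W)

# Tiny example; a full-resolution view would make the dense matrices impractically large.
F1, F2 = compute_crf_kernels(np.random.rand(16, 16, 3))
```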
The iterative feature map optimization part comprises four modules: parallel filtering, unary term addition, normalization factor computation, and normalization.
The parallel filtering module filters the input of the current iteration, μt−1, through two paths. The first path filters μt−1 with the kernel F1, giving F1 ∗ μt−1, and then multiplies the filtering result by the weight parameter θ1, giving θ1·(F1 ∗ μt−1). Similarly, the second path filters μt−1 with the kernel F2, giving F2 ∗ μt−1, and multiplies the filtering result by the weight parameter θ2, giving θ2·(F2 ∗ μt−1). In the first iteration μt−1 is initialized to the feature map U; θ1 and θ2 are randomly initialized and updated by network training. The results of the two paths are added element-wise to give the output of the parallel filtering module, Q = θ1·(F1 ∗ μt−1) + θ2·(F2 ∗ μt−1).
The unary term addition module superposes the feature map U on the result Q of the parallel filtering module, giving V = U + Q.
The normalization factor computation module also performs parallel filtering and unary term addition internally to obtain the normalization factor γ; the difference is that it operates on the all-ones matrix J rather than on μt−1 and the feature map U. Concretely, γ = J + θ1·(F1 ∗ J) + θ2·(F2 ∗ J).
The normalization module divides the result V of the unary term addition module by the normalization factor γ element-wise, giving the output of the current iteration, μt = V / γ.
The iterative feature map optimization part is an iterative process built from these four modules; six iterations are usually sufficient to obtain a satisfactory result. The output of the final iteration is the optimized disparity map D.
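A NumPy sketch of the iterative optimization as reconstructed above (parallel filtering, unary term addition, normalization factor, normalization), using dense kernels such as those from the previous sketch. In the trained network θ1 and θ2 are learned weights; the fixed values below are placeholders for illustration.

```python
import numpy as np

def optimize_feature_map(U, F1, F2, theta1=0.5, theta2=0.5, num_iters=6):
    """Iteratively optimize the feature map U (shape (H, W)) into the disparity map D."""
    H, W = U.shape
    u = U.reshape(-1)                     # flatten so the dense kernels act as global filters
    J = np.ones_like(u)                   # all-ones matrix J
    gamma = J + theta1 * (F1 @ J) + theta2 * (F2 @ J)    # normalization factor
    mu = u.copy()                         # mu_0 initialized to the feature map U
    for _ in range(num_iters):            # six iterations are usually sufficient
        Q = theta1 * (F1 @ mu) + theta2 * (F2 @ mu)      # parallel filtering
        V = u + Q                         # unary term addition
        mu = V / gamma                    # normalization (element-wise division)
    return mu.reshape(H, W)               # output of the last iteration = disparity map D

# Usage with the kernels from the previous sketch:
# F1, F2 = compute_crf_kernels(center_view); D = optimize_feature_map(U, F1, F2)
```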
Step 4: Train the LFRNN described in step 3 to obtain the optimal network parameter set P. Training is carried out in two stages, both using the mean absolute error as the loss function. The first stage trains only the local depth estimation module based on light field sequence analysis, yielding that module's optimal parameter set P1; the second stage freezes P1 and trains the whole network, thereby updating the parameters of the depth optimization module based on the conditional random field model and finally obtaining the optimal parameter set P of the LFRNN network.
Training the LFRNN network comprises the following steps:
(4.1) Prepare a light field dataset and divide it into training, validation, and test sets. The dataset must contain scene light field data together with ground-truth disparity. Specifically, the publicly available HCI light field dataset can be used, light field data can be synthesized with the Blender simulation software, or light field data and ground-truth depth can be captured with a light field camera and a ranging device. The light field dataset is randomly split into training, validation, and test sets at a ratio of 5:3:2.
(4.2) Prepare the input data and ground-truth data required for network training. The input data comprise the central sub-aperture image and the EPI synthetic image, computed from the light field dataset according to step 1 and step 2 respectively; the ground-truth data are provided directly by the light field dataset.
(4.3) Train and validate the local depth estimation module based on light field sequence analysis as an independent network. First, the EPI synthetic image is taken as input, the output feature map is taken as the estimated disparity, and the ground truth provided by the dataset is taken as the disparity reference; the mean absolute error is computed from these and back-propagated to optimize the network parameters, and training yields the module's optimal parameter set P1. The hyperparameter batch size is set to 64 and the number of epochs to 10000; the learning rate is 0.1×10^-3 for the first 2000 epochs and 0.1×10^-4 for the remaining 8000 epochs. Second, the generalization ability of this network module is verified on the validation set.
(4.4) Train and validate the LFRNN to obtain the optimal parameter set P. First, the local depth estimation module based on light field sequence analysis is used as a pre-trained network: its parameter set P1 is loaded and its parameter updates are frozen. Then, the EPI synthetic image and the central sub-aperture image are fed in, the estimated disparity is output, the mean absolute error is computed against the ground-truth disparity, and back-propagation optimizes the parameters of the depth optimization module based on the conditional random field model in the LFRNN, finally yielding the optimal parameter set P of the LFRNN. The batch size is set to 64, the number of epochs to 3000, and the learning rate to 0.1×10^-4. Finally, the generalization ability of the whole network is tested on the validation set.
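A condensed tf.keras sketch of the two-stage training schedule (mean absolute error loss, then freezing the pre-trained local module). The model constructors and data tensors named below are hypothetical, and a single learning rate per stage is assumed (the patent additionally splits stage one into 0.1×10^-3 and 0.1×10^-4 phases).

```python
import tensorflow as tf

# Stage 1: train the local depth estimation module alone with mean absolute error.
local_module = build_local_depth_module()      # hypothetical constructor of the sequence-analysis module
local_module.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss='mae')
local_module.fit(epi_patches, gt_disparity, batch_size=64, epochs=10000)
local_module.save_weights('P1.weights.h5')     # optimal parameter set P1

# Stage 2: freeze P1 and train the whole LFRNN, updating only the CRF-based optimization module.
lfrnn = build_lfrnn(local_module)              # hypothetical constructor wiring both modules together
local_module.trainable = False                 # freeze the local depth estimation module
lfrnn.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss='mae')
lfrnn.fit([epi_synthetic_images, center_views], gt_disparity, batch_size=64, epochs=3000)
lfrnn.save_weights('P.weights.h5')             # optimal parameter set P of the LFRNN
```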
Testing and practical use of the LFRNN network. For the test set described in step 4, or for 4D light field data captured by a light field camera, the central sub-aperture image is obtained by processing according to step 1 and the EPI synthetic image by processing according to step 2; the resulting central sub-aperture image and EPI synthetic image are then input to the LFRNN network described in step 3, the optimal parameter set P described in step 4 is loaded, and a forward pass produces the disparity map D.
Fig. 9 gives an example performance comparison between the proposed method and other neural-network-based depth estimation methods. Four typical scenes are used to compare against the mainstream light field depth estimation methods EPINet, FusionNet, and VommaNet. The first column shows the central sub-aperture image of each scene, and the second to fifth columns show the results of the method disclosed in the present invention, EPINet, FusionNet, and VommaNet, respectively; the results for the same scene are arranged in the same row. The evaluation metric is the mean squared error (MSE), and the number above each result image is the MSE achieved by the corresponding method on that scene. A gray-scale bar is appended to each row, indicating the error distribution of the result at each pixel position: the lighter the color, the smaller the error; the darker the color, the larger the error. As Fig. 9 shows, the LFRNN depth estimation method disclosed in the present invention achieves the best MSE on the first two example scenes; on the last two scenes its overall MSE does not match that of VommaNet, but the depth estimates of most pixels are closer to the ground truth and the visual quality is clearly better than the VommaNet results.