Disclosure of Invention
In view of the problems in the prior art, the invention provides a high-fidelity radiance field reconstruction method and system based on polarization normal estimation. The invention aims to accelerate the convergence of three-dimensional Gaussian ellipsoid optimization and to make the generated mesh surface smoother, thereby addressing the slow convergence of existing radiance field reconstruction and the unsmooth surfaces obtained when meshes are generated.
In order to solve the technical problems, the invention adopts the following technical scheme:
A high-fidelity radiance field reconstruction method based on polarization normal estimation comprises the following steps:
S1, extracting the degree of polarization and the angle of polarization from a polarized image sequence, and extracting a position code from an RGB image sequence;
S2, inputting a multichannel tensor consisting of the polarized image sequence, the degree of polarization, the angle of polarization and the position code into a trained attention-mechanism image segmentation network to obtain an estimate of the polarization normal vector, wherein the attention-mechanism image segmentation network is an improved version of the image segmentation network UNet++ in which an attention gate is added to the residual connections of UNet++, and the functional expression of the attention gate is:
$$\hat{x} = \alpha \odot x^{l},$$
$$q_{att} = \psi^{T}\,\sigma_1\left(W_x^{T} x + W_g^{T} g + b_g\right) + b_\psi,$$
$$\alpha = \sigma_2\left(q_{att}(x, g; \Theta_{att})\right),$$
wherein $\hat{x}$ is the output feature of the attention gate, $x$ is the input feature of the attention gate, $q_{att}$ is the additive attention output, $\alpha$ is the output after nonlinear activation and resampling, $\psi$ is a linear transformation matrix, $\sigma_1$ is the ReLU activation function, $W_x$ is the weight coefficient of the input feature $x$, $W_g$ is the weight coefficient of the input gating signal $g$, $g$ is the input gating signal, $b_g$ and $b_\psi$ are offsets, $\sigma_2$ is the Sigmoid activation function, $x^{l}$ is the output of the previous layer, and $\Theta_{att}$ is the set of the parameters $W_x$, $W_g$ and $\psi$;
S3, extracting a camera pose matrix from the RGB image sequence using the structure-from-motion (SfM) algorithm and establishing a sparse point cloud;
S4, replacing the normal vector of the original DN-Splatter algorithm with the estimate of the polarization normal vector to obtain an improved DN-Splatter algorithm, and realizing high-fidelity radiance field reconstruction from the established sparse point cloud using the improved DN-Splatter algorithm.
Optionally, the functional expressions for extracting the degree of polarization and the angle of polarization from the polarized image sequence in step S1 are:
$$\rho = \frac{\sqrt{S_1^{2} + S_2^{2}}}{S_0},$$
$$\varphi = \frac{1}{2}\arctan\frac{S_2}{S_1},$$
$$S_0 = I_0 + I_{90},$$
$$S_1 = I_0 - I_{90},$$
$$S_2 = 2I_{45} - I_0 - I_{90},$$
wherein $\rho$ is the degree of polarization, $\varphi$ is the angle of polarization, the degree of polarization $\rho$ characterizes the proportion of the polarized light intensity in the beam to the total intensity, the angle of polarization $\varphi$ characterizes the angle between the linearly polarized components, $S_0$, $S_1$ and $S_2$ are the components of the Stokes vector, $I_0$ is the polarized intensity in the 0 degree direction, $I_{45}$ is the polarized intensity in the 45 degree direction, and $I_{90}$ is the polarized intensity in the 90 degree direction.
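Optionally, the functional expression for extracting the position code from the RGB image sequence in step S1 is:
$$P = \left(\frac{X}{Z},\ \frac{Y}{Z},\ 1\right)^{T} = \left(\frac{u - c_x}{f_x},\ \frac{v - c_y}{f_y},\ 1\right)^{T},$$
wherein $P$ is the position code, $(u, v)$ are the two-dimensional image coordinates, $f_x$ and $f_y$ are the focal lengths of the camera in the x and y directions, $c_x$ and $c_y$ are the coordinates of the principal point of the camera in the x and y directions, and $X$, $Y$ and $Z$ are the coordinates of the real-world point in the camera coordinate system.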
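Optionally, step S2 is preceded by a step of training the attention-mechanism image segmentation network with polarized image sequence samples carrying polarization normal vector labels, wherein the functional expression for computing the polarization normal vector label is:
$$\mathbf{n} = \left[-\frac{\partial h(x,y)}{\partial x},\ -\frac{\partial h(x,y)}{\partial y},\ 1\right]^{T} = \left[\tan\theta\cos\varphi,\ \tan\theta\sin\varphi,\ 1\right]^{T},$$
wherein $\mathbf{n}$ is the polarization normal vector, $h(x,y)$ is the height of the point $(x,y)$ on the surface of the three-dimensional Gaussian ellipsoid, $x$ and $y$ are the x-axis and y-axis coordinates of that point, $\theta$ is the incident angle, and $\varphi$ is the incident azimuth angle, with:
$$\rho = \frac{\left(\eta - \frac{1}{\eta}\right)^{2}\sin^{2}\theta}{2 + 2\eta^{2} - \left(\eta + \frac{1}{\eta}\right)^{2}\sin^{2}\theta + 4\cos\theta\sqrt{\eta^{2} - \sin^{2}\theta}},$$
$$\varphi = \frac{1}{2}\operatorname{arctan2}\left(S_2, S_1\right),$$
$$\left[S_0,\ S_1,\ S_2\right]^{T} = \left[\tfrac{1}{2}\left(I_0 + I_{45} + I_{90} + I_{135}\right),\ I_0 - I_{90},\ I_{45} - I_{135}\right]^{T},$$
wherein $\eta$ is the refractive index, $\rho$ is the degree of polarization, $\operatorname{arctan2}$ is the signed arctangent function, $S_0$, $S_1$ and $S_2$ are the components of the Stokes vector, $I_0$ is the polarized intensity in the 0 degree direction, $I_{45}$ is the polarized intensity in the 45 degree direction, $I_{90}$ is the polarized intensity in the 90 degree direction, and $I_{135}$ is the polarized intensity in the 135 degree direction.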
Optionally, when the improved DN-Splatter algorithm is used to realize high-fidelity radiance field reconstruction from the established sparse point cloud, the loss function adopted is a weighted sum of a photometric error loss and a Gaussian smoothness loss, and the functional expression of the photometric error loss is:
$$L_{pho} = (1 - \lambda)\left\| I - \hat{I} \right\|_{1} + \lambda\left(1 - \mathrm{SSIM}(I, \hat{I})\right),$$
wherein $L_{pho}$ is the photometric error loss, $\lambda$ is a coefficient, $I$ is the target frame, $\hat{I}$ is the reconstructed frame, and $\mathrm{SSIM}$ is the structural similarity measure, whose functional expression is:
$$\mathrm{SSIM}(x, y) = \frac{\left(2\mu_x\mu_y + c_1\right)\left(2\sigma_{xy} + c_2\right)}{\left(\mu_x^{2} + \mu_y^{2} + c_1\right)\left(\sigma_x^{2} + \sigma_y^{2} + c_2\right)},$$
wherein $\mathrm{SSIM}(x, y)$ is the structural similarity measure of $x$ and $y$, $\mu_x$ and $\mu_y$ are the means of $x$ and $y$ respectively, $\sigma_x^{2}$ and $\sigma_y^{2}$ are the variances of $x$ and $y$ respectively, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_1$ and $c_2$ are constants that avoid division by zero.
Optionally, the functional expression of the Gaussian smoothness loss is:
$$L_{smooth} = \sum_{p}\left|\nabla D(p)\right| \cdot \left(e^{-\left|\nabla I(p)\right|}\right)^{T},$$
wherein $L_{smooth}$ is the Gaussian smoothness loss, $p$ is a pixel, $\nabla D(p)$ is the gradient of the depth between pixel $p$ and its neighborhood, $T$ is the transpose operation, and $\nabla I(p)$ is the gradient of the intensity between pixel $p$ and its neighborhood.
In addition, the invention also provides a high-fidelity radiance field reconstruction system based on polarization normal estimation, which comprises a microprocessor and a memory connected to each other, wherein the microprocessor is configured to execute the high-fidelity radiance field reconstruction method based on polarization normal estimation.
Furthermore, the invention provides a computer-readable storage medium storing a computer program that is configured to be executed by a processor to perform the high-fidelity radiance field reconstruction method based on polarization normal estimation.
Furthermore, the invention provides a computer program product comprising a computer program that is configured to be executed by a processor to perform the high-fidelity radiance field reconstruction method based on polarization normal estimation.
Compared with the prior art, the invention has the following advantages. First, to improve the precision and robustness of three-dimensional reconstruction of scenes and objects, the method introduces polarization information into the model reconstruction process. Because the polarization state of light changes when light interacts with the surfaces of different materials, a polarized image can be processed to compute normal vectors at the pixel level. By means of a deep learning network, the method effectively resolves the ambiguity problem in polarization normal vector estimation and ensures the accuracy of scene-level normal vector estimation for multiple objects. Second, the method uses the normal vectors estimated from polarization as a prior constraint for radiance field optimization and reconstruction. This accelerates the convergence of three-dimensional Gaussian ellipsoid optimization, makes the generated mesh surface smoother, and induces the Gaussian ellipsoids to flatten along the normal direction so that they fit planes more smoothly. The optimization convergence rate and the overall accuracy of radiance field reconstruction are thereby improved. The method offers good real-time performance, smooth fitting and high accuracy, and effectively addresses the slow convergence of existing radiance field reconstruction and the unsmooth surfaces obtained when meshes are generated.
Detailed Description
In order to make the present invention better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical problems addressed by the high-fidelity radiance field reconstruction method based on polarization normal estimation are the low precision of scene-level polarization normal vector estimation and the unsmooth surfaces produced by radiance field reconstruction. As shown in fig. 1 and fig. 2, the high-fidelity radiance field reconstruction method based on polarization normal estimation of this embodiment comprises the following steps:
S1, extracting the degree of polarization and the angle of polarization from a polarized image sequence, and extracting a position code from an RGB image sequence;
S2, inputting a multichannel tensor consisting of the polarized image sequence, the degree of polarization, the angle of polarization and the position code into a trained attention-mechanism image segmentation network to obtain an estimate of the polarization normal vector;
S3, extracting a camera pose matrix from the RGB image sequence using the structure-from-motion (SfM) algorithm and establishing a sparse point cloud;
S4, replacing the normal vector of the original DN-Splatter algorithm with the estimate of the polarization normal vector to obtain an improved DN-Splatter algorithm, and realizing high-fidelity radiance field reconstruction from the established sparse point cloud using the improved DN-Splatter algorithm.
The degree of polarization and the angle of polarization are calculated from the Stokes vector of the input polarized images captured in the four polarization directions. In step S1 of this embodiment, the functional expressions for extracting the degree of polarization and the angle of polarization from the polarized image sequence are:
$$\rho(x,y) = \frac{\sqrt{S_1(x,y)^{2} + S_2(x,y)^{2}}}{S_0(x,y)},$$
$$\varphi(x,y) = \frac{1}{2}\arctan\frac{S_2(x,y)}{S_1(x,y)},$$
$$S_0(x,y) = I_0(x,y) + I_{90}(x,y),$$
$$S_1(x,y) = I_0(x,y) - I_{90}(x,y),$$
$$S_2(x,y) = 2I_{45}(x,y) - I_0(x,y) - I_{90}(x,y),$$
wherein $\rho$ is the degree of polarization, $\varphi$ is the angle of polarization, the degree of polarization $\rho$ characterizes the proportion of the polarized light intensity in the beam to the total intensity, the angle of polarization $\varphi$ characterizes the angle between the linearly polarized components, $S_0$, $S_1$ and $S_2$ are the components of the Stokes vector, $I_0$ is the polarized intensity in the 0 degree direction, $I_{45}$ is the polarized intensity in the 45 degree direction, $I_{90}$ is the polarized intensity in the 90 degree direction, and $(x,y)$ are the coordinates of the pixel.
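As an illustration of step S1, the following is a minimal sketch that computes the degree and angle of polarization per pixel from three polarizer-angle intensities. It assumes the common three-angle Stokes relations and images stored as nested lists of floats. The function names and the `eps` guard are hypothetical and are not taken from the embodiment. `math.atan2` is used in place of a plain arctangent so that a zero $S_1$ does not cause a division by zero.

```python
import math

def stokes_from_intensities(i0, i45, i90):
    """Linear Stokes components from intensities behind 0/45/90 degree polarizers."""
    s0 = i0 + i90            # total intensity
    s1 = i0 - i90            # horizontal vs. vertical component
    s2 = 2.0 * i45 - s0      # diagonal component, using I45 + I135 = S0
    return s0, s1, s2

def dolp_aolp(i0, i45, i90, eps=1e-8):
    """Degree and angle of linear polarization for one pixel (eps is an assumed guard)."""
    s0, s1, s2 = stokes_from_intensities(i0, i45, i90)
    rho = math.sqrt(s1 * s1 + s2 * s2) / (s0 + eps)
    phi = 0.5 * math.atan2(s2, s1)
    return rho, phi

def polarization_maps(img0, img45, img90):
    """Apply dolp_aolp to every pixel of three equally sized 2D images."""
    rho_map, phi_map = [], []
    for row0, row45, row90 in zip(img0, img45, img90):
        pairs = [dolp_aolp(a, b, c) for a, b, c in zip(row0, row45, row90)]
        rho_map.append([p[0] for p in pairs])
        phi_map.append([p[1] for p in pairs])
    return rho_map, phi_map
```

For light fully polarized at 45 degrees, for example, `dolp_aolp(0.5, 1.0, 0.5)` yields a degree of polarization close to 1 and an angle of about π/4.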
Since the position of the observation direction affects the measured polarization information, the two-dimensional image coordinates are mapped to three dimensions through the camera intrinsic parameters and normalized to form the position code. Fig. 3 is a schematic diagram of the principle of mapping the two-dimensional image coordinates to three-dimensional coordinates in this embodiment, where O is the origin, $(u, v)$ are the two-dimensional image coordinates, and $(X, Y, Z)$ are the three-dimensional coordinates. In step S1 of this embodiment, the functional expression for extracting the position code from the RGB image sequence is:
$$P = \left(\frac{X}{Z},\ \frac{Y}{Z},\ 1\right)^{T} = \left(\frac{u - c_x}{f_x},\ \frac{v - c_y}{f_y},\ 1\right)^{T},$$
wherein $P$ is the position code, $f_x$ and $f_y$ are the focal lengths of the camera in the x and y directions, $c_x$ and $c_y$ are the coordinates of the principal point of the camera in the x and y directions, and $X$, $Y$ and $Z$ are the coordinates of the real-world point in the camera coordinate system.
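The mapping from pixel coordinates to a three-dimensional position code can be sketched as follows, assuming a standard pinhole camera model. The names `pixel_to_ray` and `unit_position_code` are hypothetical. Whether "normalized" in the embodiment means normalized image coordinates (Z = 1) or a unit-length direction is not stated, so both variants are shown.

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to the normalized image plane Z = 1 (pinhole model)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)

def unit_position_code(u, v, fx, fy, cx, cy):
    """Same viewing ray scaled to unit length (one possible reading of 'normalized')."""
    x, y, z = pixel_to_ray(u, v, fx, fy, cx, cy)
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)

def position_code_map(width, height, fx, fy, cx, cy):
    """Three-channel position code for every pixel, indexed as [row][col]."""
    return [[unit_position_code(u, v, fx, fy, cx, cy) for u in range(width)]
            for v in range(height)]
```

The resulting three channels can be concatenated with the polarized images and the polarization maps to form the multichannel input tensor of step S2.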
A multichannel tensor consisting of the original polarized image, the degree of polarization, the angle of polarization and the position code is input into the trained attention-mechanism image segmentation network to obtain an estimate of the polarization normal vector. Considering the difficulty of extracting normal vectors of multiple objects in a complex scene, and referring to fig. 4, this embodiment adds attention gates to the residual connections of the image segmentation network UNet++ with dense skip connections (the triangles in fig. 4 are the added attention gates, the other structures are the same as in UNet++, and $X^{i,j}$ in each circle denotes a feature). In this way, feature information from shallower and deeper layers can be exchanged, the generalization of the network and the interaction between features are enhanced, the network can learn features of different depths by itself, the number of parameters is kept under control to a certain extent, and overfitting is prevented when training on small samples. The inputs to the attention gate are the upsampled feature of the extended node and the corresponding feature from the current encoder. The upsampled feature serves as a gating signal that suppresses regions irrelevant to the task. The newly designed network effectively reduces the angular error of normal vector estimation, improves its generalization across scenes, and provides an accurate prior constraint for the three-dimensional reconstruction task. Fig. 5 is a schematic diagram of the network structure of the attention gate in this embodiment. Assuming that the input current encoder feature is $x$ and the upsampled feature of the extended node is $g$, the functional expression of the attention gate is:
$$\hat{x} = \alpha \odot x^{l},$$
$$q_{att} = \psi^{T}\,\sigma_1\left(W_x^{T} x + W_g^{T} g + b_g\right) + b_\psi,$$
$$\alpha = \sigma_2\left(q_{att}(x, g; \Theta_{att})\right),$$
wherein $\hat{x}$ is the output feature of the attention gate, $x$ is the input feature of the attention gate, $q_{att}$ is the additive attention output, $\alpha$ is the output after nonlinear activation and resampling, $\psi$ is a linear transformation matrix, $\sigma_1$ is the ReLU activation function, $W_x$ is the weight coefficient of the input feature $x$, $W_g$ is the weight coefficient of the input gating signal $g$, $g$ is the input gating signal, $b_g$ and $b_\psi$ are offsets, $\sigma_2$ is the Sigmoid activation function, $x^{l}$ is the output of the previous layer, and $\Theta_{att}$ is the set of the parameters $W_x$, $W_g$ and $\psi$. Resampling means that the output of the previous node is multiplied element-wise by the weight matrix $\alpha$ to enhance the features.
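The additive attention gate can be illustrated at a single spatial location with the following minimal sketch. It assumes the widely used additive formulation (linear maps of the feature and the gating signal, ReLU, a linear projection to a scalar, then Sigmoid) and treats the gated feature and the input feature as the same vector, which is a simplification. All names, shapes and weights are hypothetical and are not the trained network of the embodiment.

```python
import math

def matvec_t(weights, vec):
    """Compute W^T v, with W stored as len(vec) rows of F columns."""
    cols = len(weights[0])
    return [sum(weights[i][j] * vec[i] for i in range(len(vec))) for j in range(cols)]

def attention_gate(x, g, w_x, w_g, b_g, psi, b_psi):
    """Return the gated feature and the attention coefficient for one location.

    x: encoder feature vector, g: gating (upsampled) feature vector.
    """
    hx = matvec_t(w_x, x)                      # W_x^T x
    hg = matvec_t(w_g, g)                      # W_g^T g
    hidden = [max(0.0, a + b + c) for a, b, c in zip(hx, hg, b_g)]  # ReLU
    q_att = sum(p * h for p, h in zip(psi, hidden)) + b_psi         # additive attention
    alpha = 1.0 / (1.0 + math.exp(-q_att))                          # Sigmoid
    return [alpha * xi for xi in x], alpha     # element-wise reweighting
```

In a full network the same computation is typically applied at every pixel with 1x1 convolutions, so that `alpha` becomes a spatial map that suppresses irrelevant regions.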
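This embodiment further includes, before step S2, a step of training the attention-mechanism image segmentation network with polarized image sequence samples carrying polarization normal vector labels, wherein the functional expression for computing the polarization normal vector label is:
$$\mathbf{n} = \left[-\frac{\partial h(x,y)}{\partial x},\ -\frac{\partial h(x,y)}{\partial y},\ 1\right]^{T} = \left[\tan\theta\cos\varphi,\ \tan\theta\sin\varphi,\ 1\right]^{T},$$
wherein $\mathbf{n}$ is the polarization normal vector, $h(x,y)$ is the height of the point $(x,y)$ on the surface of the three-dimensional Gaussian ellipsoid, $x$ and $y$ are the x-axis and y-axis coordinates of that point, $\theta$ is the incident angle, and $\varphi$ is the incident azimuth angle, with:
$$\rho = \frac{\left(\eta - \frac{1}{\eta}\right)^{2}\sin^{2}\theta}{2 + 2\eta^{2} - \left(\eta + \frac{1}{\eta}\right)^{2}\sin^{2}\theta + 4\cos\theta\sqrt{\eta^{2} - \sin^{2}\theta}},$$
$$\varphi = \frac{1}{2}\operatorname{arctan2}\left(S_2, S_1\right),$$
$$\left[S_0,\ S_1,\ S_2\right]^{T} = \left[\tfrac{1}{2}\left(I_0 + I_{45} + I_{90} + I_{135}\right),\ I_0 - I_{90},\ I_{45} - I_{135}\right]^{T},$$
wherein $\eta$ is the refractive index, $\rho$ is the degree of polarization, $\operatorname{arctan2}$ is the signed arctangent function, $S_0$, $S_1$ and $S_2$ are the components of the Stokes vector, $I_0$ is the polarized intensity in the 0 degree direction, $I_{45}$ is the polarized intensity in the 45 degree direction, $I_{90}$ is the polarized intensity in the 90 degree direction, and $I_{135}$ is the polarized intensity in the 135 degree direction.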
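A normal vector label can be sketched from the degree of polarization and the azimuth angle as follows. This sketch assumes the diffuse-reflection polarization model, a refractive index of 1.5, and a numerical inversion by bisection. These choices and all function names are illustrative assumptions and are not stated in the embodiment.

```python
import math

def diffuse_dolp(theta, eta=1.5):
    """Degree of polarization of diffuse reflection at incident (zenith) angle theta."""
    s2 = math.sin(theta) ** 2
    num = (eta - 1.0 / eta) ** 2 * s2
    den = (2.0 + 2.0 * eta ** 2 - (eta + 1.0 / eta) ** 2 * s2
           + 4.0 * math.cos(theta) * math.sqrt(eta ** 2 - s2))
    return num / den

def zenith_from_dolp(rho, eta=1.5, iters=60):
    """Invert diffuse_dolp by bisection; the model increases with theta on [0, pi/2]."""
    lo, hi = 0.0, math.pi / 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if diffuse_dolp(mid, eta) < rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def normal_from_angles(theta, phi):
    """Unit normal; proportional to (tan(theta)cos(phi), tan(theta)sin(phi), 1)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```

Because the azimuth obtained from the Stokes vector is only defined up to 180 degrees, `normal_from_angles(theta, phi)` and `normal_from_angles(theta, phi + math.pi)` are both consistent with the same measurement. This is the ambiguity that the learned network is intended to resolve.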
Referring to fig. 2, step S3 of this embodiment extracts a camera pose matrix from the RGB image sequence using the structure-from-motion (SfM) algorithm and establishes a sparse point cloud. The SfM algorithm is an existing algorithm (see Snavely, N., Seitz, S. M., & Szeliski, R. (2006). Photo Tourism: Exploring Photo Collections in 3D. ACM Transactions on Graphics, 25(3), 835-846) that comprises two key stages: feature extraction and matching, and incremental three-dimensional reconstruction. Given a group of input images, feature points are extracted from each image by the feature extraction and matching algorithm, and their descriptors are matched.
In step S4 of this embodiment, the normal vector of the original DN-Splatter algorithm (see Turkulainen, M., Ren, X., Melekhov, I., Seiskari, O., Rahtu, E., & Kannala, J. (2024). DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing. arXiv:2403.17822) is replaced with the estimate of the polarization normal vector to obtain the improved DN-Splatter algorithm, and high-fidelity radiance field reconstruction is realized from the established sparse point cloud using the improved DN-Splatter algorithm. Fig. 6 is a schematic diagram of high-fidelity radiance field reconstruction with the improved DN-Splatter algorithm in this embodiment. The estimate of the polarization normal vector replaces the original normal vector. According to the polarization normal vector of each pixel, the extent of the corresponding Gaussian ellipsoid along the normal direction is constrained so that the ellipsoid adheres to the actual surface in as flattened a form as possible. The optimization variables of a Gaussian ellipsoid mainly include its size, its color and the orientation of its three axes. During optimization, the normal vector constraint regularizes the Gaussian positions through an edge-aware loss and applies local smoothing and direction correction to the Gaussians that fit the surface.
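The role of the normal prior can be sketched as follows. The sketch assumes the common convention that the normal of a flattened Gaussian ellipsoid is its shortest axis, flipped to face the camera, and compares it with the polarization normal using an L1 difference. The function names and the choice of the L1 measure are assumptions for illustration, not the exact loss of the embodiment.

```python
def gaussian_normal(scales, rotation, view_dir):
    """Shortest ellipsoid axis, oriented against the viewing direction.

    scales: three axis lengths; rotation: 3x3 matrix (list of rows) whose columns
    are the ellipsoid axes; view_dir: direction from the camera to the Gaussian.
    """
    k = min(range(3), key=lambda i: scales[i])       # index of the flattest axis
    normal = [rotation[r][k] for r in range(3)]      # k-th column of the rotation
    if sum(a * b for a, b in zip(normal, view_dir)) > 0.0:
        normal = [-a for a in normal]                # make the normal face the camera
    return normal

def normal_l1(n_gauss, n_prior):
    """L1 difference between a Gaussian normal and the polarization normal prior."""
    return sum(abs(a - b) for a, b in zip(n_gauss, n_prior))
```

Minimizing such a term over the rotation and scale parameters encourages each ellipsoid to flatten along the direction given by the polarization normal.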
As an optional implementation, when this embodiment uses the improved DN-Splatter algorithm to realize high-fidelity radiance field reconstruction from the established sparse point cloud, the loss function adopted is a weighted sum of a photometric error loss and a Gaussian smoothness loss, and the functional expression of the photometric error loss is:
$$L_{pho} = (1 - \lambda)\left\| I - \hat{I} \right\|_{1} + \lambda\left(1 - \mathrm{SSIM}(I, \hat{I})\right),$$
wherein $L_{pho}$ is the photometric error loss, $\lambda$ is a coefficient, $I$ is the target frame, $\hat{I}$ is the reconstructed frame, and $\mathrm{SSIM}$ is the structural similarity measure, whose functional expression is:
$$\mathrm{SSIM}(x, y) = \frac{\left(2\mu_x\mu_y + c_1\right)\left(2\sigma_{xy} + c_2\right)}{\left(\mu_x^{2} + \mu_y^{2} + c_1\right)\left(\sigma_x^{2} + \sigma_y^{2} + c_2\right)},$$
wherein $\mathrm{SSIM}(x, y)$ is the structural similarity measure of $x$ and $y$, $\mu_x$ and $\mu_y$ are the means of $x$ and $y$ respectively, $\sigma_x^{2}$ and $\sigma_y^{2}$ are the variances of $x$ and $y$ respectively, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_1$ and $c_2$ are constants that avoid division by zero.
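A photometric loss that combines an L1 term with a structural similarity term can be sketched as follows. For brevity the sketch computes the structural similarity from global image statistics rather than over sliding windows, and the values of `lam`, `c1` and `c2` are assumed defaults for intensities in [0, 1]. None of these choices is specified by the embodiment.

```python
def _mean(values):
    return sum(values) / len(values)

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Structural similarity of two flattened images using global statistics."""
    mx, my = _mean(x), _mean(y)
    vx = _mean([(a - mx) ** 2 for a in x])                    # variance of x
    vy = _mean([(b - my) ** 2 for b in y])                    # variance of y
    cov = _mean([(a - mx) * (b - my) for a, b in zip(x, y)])  # covariance
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))

def photometric_loss(target, rendered, lam=0.2):
    """(1 - lam) * mean absolute error + lam * (1 - SSIM)."""
    l1 = _mean([abs(a - b) for a, b in zip(target, rendered)])
    return (1.0 - lam) * l1 + lam * (1.0 - ssim_global(target, rendered))
```

Practical implementations usually evaluate the structural similarity over local Gaussian windows and average the result. The global form above only keeps the example short.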
In this embodiment, the functional expression of the Gaussian smoothness loss is:
$$L_{smooth} = \sum_{p}\left|\nabla D(p)\right| \cdot \left(e^{-\left|\nabla I(p)\right|}\right)^{T},$$
wherein $L_{smooth}$ is the Gaussian smoothness loss, $p$ is a pixel, $\nabla D(p)$ is the gradient of the depth between pixel $p$ and its neighborhood, $T$ is the transpose operation, and $\nabla I(p)$ is the gradient of the intensity between pixel $p$ and its neighborhood.
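An edge-aware smoothness term of this kind, together with the weighted sum of the two losses, can be sketched as follows. The forward-difference gradients, the averaging over pixel pairs and the weights `w_photo` and `w_smooth` are assumptions chosen for illustration.

```python
import math

def edge_aware_smoothness(depth, intensity):
    """Mean of |depth gradient| * exp(-|intensity gradient|) over neighboring pixels."""
    height, width = len(depth), len(depth[0])
    total, count = 0.0, 0
    for r in range(height):
        for c in range(width):
            if c + 1 < width:    # horizontal forward difference
                dd = abs(depth[r][c + 1] - depth[r][c])
                di = abs(intensity[r][c + 1] - intensity[r][c])
                total += dd * math.exp(-di)
                count += 1
            if r + 1 < height:   # vertical forward difference
                dd = abs(depth[r + 1][c] - depth[r][c])
                di = abs(intensity[r + 1][c] - intensity[r][c])
                total += dd * math.exp(-di)
                count += 1
    return total / max(count, 1)

def total_loss(photometric, smoothness, w_photo=1.0, w_smooth=0.1):
    """Weighted sum of the photometric and smoothness losses (weights are assumed)."""
    return w_photo * photometric + w_smooth * smoothness
```

A depth discontinuity that coincides with an image edge is penalized less than one in a uniformly colored region, which lets the surface stay sharp at object boundaries while being smoothed elsewhere.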
In summary, the high-fidelity radiance field reconstruction method based on polarization normal estimation of this embodiment has the following characteristics. 1) The method of this embodiment is the first to propose guiding three-dimensional radiance field reconstruction with normal vectors estimated from polarization, which accelerates the convergence of radiance field optimization and greatly improves the fitting precision of the reconstructed three-dimensional surface. 2) The method improves the existing scene-level polarization normal vector estimation network by using attention gates in the skip connections of UNet++ to enhance learning of the target region, so that the attention connections between adjacent convolution layers better handle the relationship between local normal vector estimation and segmentation of the whole scene. 3) The method can estimate an accurate normal vector map from a polarized image of a single view alone, which eliminates the step, required with monocular normal estimation, of initializing the Gaussian orientations with normals estimated from the initial SfM point cloud. 4) The method makes the meshes generated by three-dimensional radiance field reconstruction smoother, so that they are better suited to downstream robot navigation tasks.
In addition, this embodiment also provides a high-fidelity radiance field reconstruction system based on polarization normal estimation, which comprises a microprocessor and a memory connected to each other, wherein the microprocessor is configured to execute the high-fidelity radiance field reconstruction method based on polarization normal estimation.
Furthermore, this embodiment provides a computer-readable storage medium storing a computer program that is configured to be executed by a processor to perform the high-fidelity radiance field reconstruction method based on polarization normal estimation.
Furthermore, this embodiment provides a computer program product comprising a computer program that is configured to be executed by a processor to perform the high-fidelity radiance field reconstruction method based on polarization normal estimation.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided in the form of a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.