CN114119916A - Multi-view stereoscopic vision reconstruction method based on deep learning - Google Patents

Multi-view stereoscopic vision reconstruction method based on deep learning

Info

Publication number
CN114119916A
Authority
CN
China
Prior art keywords
function
network
designing
obtaining
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111200048.3A
Other languages
Chinese (zh)
Other versions
CN114119916B (en)
Inventor
韩燮
王若蓝
李顺增
赵融
谌钟毓
任铭铭
杨恬恬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North University of China
Original Assignee
North University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North University of China
Priority to CN202111200048.3A
Publication of CN114119916A
Application granted
Publication of CN114119916B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention discloses a multi-view stereoscopic vision reconstruction method based on deep learning, belonging to the technical field of machine vision. The method constructs a network model better suited to three-dimensional reconstruction from multi-view images by changing the feature extraction network structure, designing a contribution algorithm, and acquiring hidden spatial information, thereby addressing problems such as insufficient reconstruction accuracy and completeness. The improved MVSNet method of the invention achieves an Acc of 0.473 and a Comp of 1.304 on the DTU data set, showing better accuracy and completeness; the method is suitable for complex, large-scale image three-dimensional reconstruction environments and can be applied in fields such as reverse engineering and the restoration of ancient cultural relics.

Description

Multi-view stereoscopic vision reconstruction method based on deep learning
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a deep learning-based multi-view stereoscopic vision reconstruction method.
Background
As computer technology has developed, three-dimensional models have been valued and applied in more and more information fields, such as digital city construction, Virtual Reality (VR) technology, reverse engineering, restoration of ancient cultural relics, and automatic driving systems, so research on and development of three-dimensional reconstruction technology have become a clear trend. Among the many three-dimensional reconstruction methods, reconstruction from multi-view image depth maps has become one of the most mainstream research directions because of its low equipment cost, high operational flexibility, and high reconstruction accuracy.
Three-dimensional reconstruction from multi-view image depth maps is generally divided into traditional computer-vision-based methods and deep-learning-based methods. The traditional methods compute the stereo-matching cost from hand-crafted features and then perform cost aggregation, disparity computation, and optimization to finally obtain depth values. They can achieve good reconstruction in ideal Lambertian scenes, but for regions with sparse texture or non-diffuse reflection the reconstruction results are often poor, because local features are difficult to extract and dense matching is difficult to perform. Deep-learning methods, by contrast, automatically learn high-level, global features of the input image through a deep convolutional neural network trained on a large amount of data; they can comprehensively learn the information in the image and abstract it into high-level semantic features, so compared with traditional methods the extracted features are more semantic and artificial errors are avoided. Although recent studies have shown that deep-learning-based image depth estimation can effectively improve reconstruction quality, there is still room for improvement in accuracy and completeness.
Disclosure of Invention
Aiming at the problem that existing mainstream deep-learning-based three-dimensional reconstruction from multi-view image depth maps often suffers from insufficient reconstruction accuracy and completeness, the invention provides a multi-view stereoscopic vision reconstruction method based on deep learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
a multi-view stereoscopic vision reconstruction method based on deep learning, the reconstruction method comprises the following steps:
step 1, processing a DTU data set;
step 2, designing a network;
step 3, reading the data set from step 1 into the network designed in step 2 for training to obtain the trained network weights;
step 4, reading the network weight trained in the step 3 into the network designed in the step 2, reading scene information to be predicted into the network for calculation, obtaining a depth map, a probability map and a mask map of each picture, and generating point cloud data of each predicted scene;
step 5, analyzing the accuracy and completeness of the point cloud data of each predicted scene to obtain the average accuracy and completeness.
Further, the specific method for processing the DTU data set in step 1 is as follows: the image data are normalized using the Python language, and the other data undergo data type conversion to obtain the processed data set.
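For illustration only, a minimal Python sketch of this preprocessing step is given below; the function name, the division by 255 and the mask threshold are assumptions rather than the patent's own code.

    # Minimal preprocessing sketch (illustrative only): the function name, the
    # division by 255 and the mask threshold are assumptions, not the patent's code.
    import numpy as np

    def preprocess_view(image_uint8, intrinsics, extrinsics, mask_uint8):
        """Normalize the image and cast the remaining data to float32."""
        image = image_uint8.astype(np.float32) / 255.0      # scale pixel values to [0, 1]
        intrinsics = intrinsics.astype(np.float32)          # 3x3 camera matrix
        extrinsics = extrinsics.astype(np.float32)          # 4x4 world-to-camera pose
        mask = (mask_uint8 > 10).astype(np.float32)         # binary valid-pixel mask
        return image, intrinsics, extrinsics, mask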
Further, the specific method for designing the network in step 2 comprises the following steps:
step 2.1, designing a special three-layer feature extraction pyramid network;
step 2.2, designing a contribution algorithm and calculating a contribution for each feature volume;
step 2.3, designing a Gaussian process regression algorithm and carrying out 3D regularization on the Cost volume.
Further, the specific method of designing the special three-layer feature extraction pyramid network in step 2.1 is as follows: the feature extraction network is a special three-layer pyramid model which differs from a common three-layer pyramid model in that a gradient image of the source image, at the same image resolution, is added after each layer. The F.interpolate() function is used to down-sample the input image, the np.gradient() and np.abs() functions are used to obtain the absolute gradient values along the third and fourth dimensions of the input image, and the final gradient image is obtained through a summation function;
the first layer superposes the gradient image of the source image, at the same image resolution, in the channel dimension to obtain an input with 6 channels, then uses two 3×3 convolutions with stride 1 and 8 channels; the output size is 512×640;
in the second layer, a 5×5 convolution with stride 2 and 16 channels is used first, then the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 19 channels, followed by two 3×3 convolutions with stride 1 and 16 channels; the output size is 256×320;
in the third layer, a 5×5 convolution with stride 2 and 32 channels is used first, then the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 35 channels, followed by two 3×3 convolutions with stride 1 and 32 channels; the output size is 128×160. Each convolution operation is followed by a batch normalization operation and a nonlinear activation function, and finally the feature map is obtained.
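The following Python sketch shows one way such a gradient-augmented pyramid could be assembled. The channel counts, strides and output sizes follow the description above, while the class and function names are assumptions; torch.gradient stands in for the np.gradient()/np.abs() calls, and BatchNorm2d/ReLU stand in for the batch and nonlinear activation operations.

    # Sketch of the gradient-augmented three-layer feature pyramid (assumptions:
    # class/function names; torch.gradient replaces the np.gradient()/np.abs()
    # calls; BatchNorm2d/ReLU stand for the batch and activation operations).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def gradient_image(img, scale):
        """Down-sample img (B,3,H,W) with F.interpolate, then sum the absolute
        gradients along the height and width dimensions (dims 2 and 3)."""
        small = img if scale == 1.0 else F.interpolate(
            img, scale_factor=scale, mode='bilinear', align_corners=False)
        gy, gx = torch.gradient(small, dim=(2, 3))
        return gy.abs() + gx.abs()                           # 3-channel gradient image

    def conv_bn_relu(cin, cout, k, stride):
        return nn.Sequential(
            nn.Conv2d(cin, cout, k, stride, padding=k // 2, bias=False),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True))

    class GradientFeaturePyramid(nn.Module):
        def __init__(self):
            super().__init__()
            # layer 1: 3 image + 3 gradient channels, two 3x3 stride-1 convs, 8 channels
            self.l1 = nn.Sequential(conv_bn_relu(6, 8, 3, 1), conv_bn_relu(8, 8, 3, 1))
            self.down2 = conv_bn_relu(8, 16, 5, 2)           # layer 2: 5x5 stride-2 conv
            self.l2 = nn.Sequential(conv_bn_relu(19, 16, 3, 1), conv_bn_relu(16, 16, 3, 1))
            self.down3 = conv_bn_relu(16, 32, 5, 2)          # layer 3: 5x5 stride-2 conv
            self.l3 = nn.Sequential(conv_bn_relu(35, 32, 3, 1), conv_bn_relu(32, 32, 3, 1))

        def forward(self, img):                              # img: (B,3,512,640)
            f1 = self.l1(torch.cat([img, gradient_image(img, 1.0)], 1))   # (B,8,512,640)
            f2 = self.down2(f1)
            f2 = self.l2(torch.cat([f2, gradient_image(img, 0.5)], 1))    # (B,16,256,320)
            f3 = self.down3(f2)
            f3 = self.l3(torch.cat([f3, gradient_image(img, 0.25)], 1))   # (B,32,128,160)
            return f3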
Further, the specific method of designing the contribution algorithm in step 2.2 and calculating a contribution for each feature volume is as follows: when the Cost volume is formed, each feature volume is obtained from a feature map through a differentiable homography transformation, so on the same depth plane the pixels corresponding to a given source-image pixel differ because the camera poses of the feature maps differ. Based on the observation that a three-dimensional point at the same coordinate has different visibility from different viewpoints, i.e. different views make unequal contributions to the matching cost, a contribution algorithm is designed and a contribution body is calculated for each feature volume. First, the absolute differences between the feature volumes of the source images and the feature volume of the reference image are obtained with the torch.abs() and torch.sub() functions; all feature-volume difference data are processed with a slice function and reshape(1, b*c*h*w) to obtain two-dimensional tensor data (1, b*c*h*w) for the patch set of all feature-volume differences on each depth plane; these are connected along the first dimension with the torch.cat() function; the negative of the tensor is taken with torch.neg() and a softmax() is applied; four-dimensional tensors (b, c, h, w) are then obtained through two unsqueeze(-1) calls and a reshape(b, c, h, w), forming the contribution body of each feature volume; each contribution body is multiplied by its corresponding feature volume to obtain the different contributions, which are finally fused into a new Cost volume in the form of a variance.
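A hedged sketch of one plausible reading of this contribution algorithm follows; the exact reshape and concatenation bookkeeping described above is simplified, and all names are illustrative.

    # One plausible reading of the contribution algorithm (tensor bookkeeping
    # simplified; names illustrative): softmax over the negated per-view absolute
    # differences gives a contribution weight per warped source volume, and the
    # weighted volumes are fused as a variance-style Cost volume.
    import torch

    def contribution_cost_volume(ref_volume, src_volumes):
        """ref_volume: (B,C,D,H,W); src_volumes: list of warped (B,C,D,H,W) volumes."""
        diffs = [torch.abs(torch.sub(v, ref_volume)) for v in src_volumes]  # |V_i - V_ref|
        weights = torch.softmax(torch.neg(torch.stack(diffs, 0)), dim=0)    # larger diff -> smaller weight
        volumes = torch.stack([ref_volume] + src_volumes, 0)                # (N+1,B,C,D,H,W)
        weighted = torch.cat([volumes[:1], volumes[1:] * weights], 0)       # apply contributions to source views
        mean = weighted.mean(0)
        return ((weighted - mean) ** 2).mean(0)                             # variance fusion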
Further, the specific method of designing the Gaussian process regression algorithm in step 2.3 and carrying out 3D regularization on the Cost volume is as follows: first, a distance function (pose1, pose2) is designed; the rotation matrix R and the translation vector t are obtained from each camera pose with a slice function, and different calculations are carried out on t and R with np.linalg.norm() and np.matrix.trace() respectively, finally giving the distance between two camera poses; by computing this for all read pictures, the camera-pose distance matrix D of dimension (n, n) is obtained. Second, a radial basis kernel function K is designed with three hyper-parameters defined; the camera-pose distance D is expanded to three dimensions (1, n, n) with np.expand_dims() to obtain K; the Cost volumes of all pictures are summed over the third dimension with torch.sum() to obtain a tensor of dimension (b, l, h, w); all Cost volumes are connected along the second dimension with torch.stack() to obtain Y of dimension (b, n, l, h, w); the view(b, l, -1) function is applied to Y so that it adapts to the required shape; a relation function between the camera-pose distance D and the Cost volumes is finally obtained, which is used to predict the hidden spatial information brought by different camera poses and to enrich the Cost volume; finally, 3D regularization is carried out on the Cost volume to obtain the depth map.
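The sketch below illustrates the spatial information module in this spirit: pairwise camera-pose distances, a radial basis kernel with three hyper-parameters, and a Gaussian-process posterior mean taken as the hidden information. The concrete distance formula, hyper-parameter values and function names are assumptions.

    # Sketch of the spatial information module (assumptions: the exact pose
    # distance formula, hyper-parameter values and function names).
    import numpy as np
    import torch

    def pose_distance(pose1, pose2):
        """pose: 4x4 extrinsic matrix; combine a translation norm with a
        trace-based rotation distance, one plausible choice for the R/t terms."""
        R1, t1 = pose1[:3, :3], pose1[:3, 3]
        R2, t2 = pose2[:3, :3], pose2[:3, 3]
        rot = np.arccos(np.clip((np.trace(R1 @ R2.T) - 1.0) / 2.0, -1.0, 1.0))
        return np.linalg.norm(t1 - t2) + rot

    def rbf_kernel(D, signal=1.0, length=1.0, noise=1e-3):
        """Radial basis kernel with three hyper-parameters: signal, length, noise."""
        return signal ** 2 * np.exp(-D ** 2 / (2.0 * length ** 2)) + noise * np.eye(D.shape[0])

    def gp_hidden_information(poses, cost_volumes, ref_index=0):
        """poses: list of n 4x4 matrices; cost_volumes: (n,B,C,D,H,W) tensor.
        Returns the GP posterior-mean volume for the reference pose."""
        n = len(poses)
        D = np.array([[pose_distance(p, q) for q in poses] for p in poses])  # (n, n) distances
        K = rbf_kernel(D)
        weights = np.linalg.solve(K, K[ref_index])                           # (K + noise*I)^-1 k_*
        Y = cost_volumes.reshape(n, -1)                                      # flatten volumes, cf. view(b, l, -1)
        hidden = torch.from_numpy(weights).float() @ Y                       # posterior mean, flattened
        return hidden.reshape(cost_volumes.shape[1:])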
Further, the specific method for generating the point cloud data of each predicted scene in step 4 is as follows: after the depth map and probability map of each image are obtained, the fused point cloud data are obtained through a simple depth-map filtering/fusion step.
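A minimal sketch of the probability-based filtering that precedes fusion is shown below; the threshold value and the function name are assumptions, and the geometric-consistency checks of the full filter/fusion step are omitted.

    # Minimal sketch of the probability-based filtering before fusion (the 0.8
    # threshold and the function name are assumptions; geometric-consistency
    # checks of the full filter/fusion step are omitted).
    import numpy as np

    def depth_to_points(depth, prob, intrinsics, prob_thresh=0.8):
        """Keep pixels whose probability exceeds the threshold and back-project
        them to 3D points in the camera frame."""
        ys, xs = np.nonzero(prob > prob_thresh)
        zs = depth[ys, xs]
        fx, fy = intrinsics[0, 0], intrinsics[1, 1]
        cx, cy = intrinsics[0, 2], intrinsics[1, 2]
        return np.stack([(xs - cx) * zs / fx, (ys - cy) * zs / fy, zs], axis=1)  # (N, 3)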
The method is suitable for three-dimensional reconstruction engineering of a large number of images, such as cultural relic restoration, reverse engineering and the like.
Compared with the prior art, the invention has the following advantages:
When the MVSNet network reconstructs an image, reconstruction accuracy and completeness are related to image quality, and for image scenes with strong illumination and large pose changes the reconstruction effect is poor, which shows up as insufficient accuracy and completeness. To improve the accuracy of depth estimation for multi-view images, the invention proposes an improved MVSNet algorithm based on deep learning: the single feature extraction network module is improved by adding the first-order gradient image of the input image so that features unaffected by illumination are extracted; a contribution algorithm is designed to obtain a novel Cost volume and improve the precision of depth estimation; in addition, a Gaussian process regression algorithm is adopted and a spatial information acquisition module is designed to predict the hidden spatial information brought by different camera poses, which improves the accuracy of the depth map and achieves a good reconstruction effect.
Drawings
FIG. 1 is a network layout of the present method;
FIG. 2 is a diagram of a feature extraction module architecture;
FIG. 3 is a flow diagram of a contribution score calculation module;
FIG. 4 is a flow diagram of a spatial information acquisition module;
FIG. 5 is an illustration of the reconstruction results of the present invention.
Detailed Description
Example 1
Step 1, data set processing: the public DTU data set is processed using the Python language; the images, camera parameters, mask pictures and so on are read in and converted into single-precision tensor data types so that the network can read them.
Step 2, reading the training set processed in step 1 into the designed network for training: 79 scenes, i.e. 27097 images, are taken as the training set and 18 scenes, i.e. 6174 images, as the test set; the batch size is 4, numdepth is 192, the learning rate is 0.001, and the weight decay rate is 0. Training stops after 16 iterations.
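A sketch of a training loop with this configuration (batch size 4, numdepth 192, learning rate 0.001, weight decay 0, 16 iterations) is given below; the choice of the Adam optimizer, the smooth L1 loss and the dataset/model interfaces are assumptions for illustration.

    # Training loop using the configuration above; the Adam optimizer, the loss
    # and the dataset/model interfaces are assumptions for illustration.
    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader

    def train(model, train_dataset, epochs=16):
        loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=4)
        optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0)
        for epoch in range(epochs):                          # iterate to 16 and stop
            for batch in loader:
                optimizer.zero_grad()
                depth = model(batch["images"], batch["cameras"], num_depth=192)  # hypothetical signature
                valid = batch["mask"] > 0.5
                loss = F.smooth_l1_loss(depth[valid], batch["depth_gt"][valid])
                loss.backward()
                optimizer.step()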
The network structure is designed as follows: first, a three-layer feature extraction pyramid network is designed. The F.interpolate() function is used to down-sample the input image, the np.gradient() and np.abs() functions are used to obtain the absolute gradient values along the third and fourth dimensions of the input image, and the final gradient image is obtained through a summation function. The three-layer feature extraction pyramid network is then built: the first layer superposes the gradient image of the source image, at the same image resolution, in the channel dimension to obtain an input with 6 channels and uses two 3×3 convolutions with stride 1 and 8 channels, with an output size of 512×640. In the second layer, a 5×5 convolution with stride 2 and 16 channels is used, the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 19 channels, and two 3×3 convolutions with stride 1 and 16 channels follow, with an output size of 256×320. In the third layer, a 5×5 convolution with stride 2 and 32 channels is used, the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 35 channels, and two 3×3 convolutions with stride 1 and 32 channels follow, with an output size of 128×160. Each convolution operation is followed by a batch normalization operation and a nonlinear activation function, and finally the feature map is obtained.
Secondly, a contribution algorithm is designed. When the Cost volume is formed, each feature volume is obtained from a feature map through a differentiable homography transformation, so on the same depth plane the pixels corresponding to a given source-image pixel differ because the camera poses of the feature maps differ. Based on the observation that a three-dimensional point at the same coordinate has different visibility from different viewpoints, i.e. different views make unequal contributions to the matching cost, a contribution algorithm is designed and a contribution body is calculated for each feature volume. First, the absolute differences between the feature volumes of the source images and the feature volume of the reference image are obtained with the torch.abs() and torch.sub() functions; all feature-volume difference data are processed with a slice function and reshape(1, b*c*h*w) to obtain two-dimensional tensor data (1, b*c*h*w) for the patch set of all feature-volume differences on each depth plane; these are connected along the first dimension with the torch.cat() function to obtain data of dimension (l, b*c*h*w); the negative of the tensor is taken with torch.neg() and a softmax() is applied; four-dimensional tensors (b, c, h, w) are then obtained through two unsqueeze(-1) calls and a reshape(b, c, h, w), forming the contribution body of each feature volume; each contribution body is multiplied by its corresponding feature volume to obtain the different contributions, which are finally fused into a new Cost volume in the form of a variance.
Then, just before the 3D regularization of the Cost volume, a spatial information acquisition module is designed. A Gaussian process regression algorithm is adopted, and hidden spatial information is obtained from the camera pose information and the Cost volume information. First, a distance function (pose1, pose2) is designed; the rotation matrix R and the translation vector t are obtained from each camera pose with a slice function, and different calculations are carried out on t and R with np.linalg.norm() and np.matrix.trace() respectively, finally giving the distance between two camera poses; by computing this for all read pictures, the camera-pose distance matrix D of dimension (n, n) is obtained. Second, a radial basis kernel function K is designed with three hyper-parameters defined; the camera-pose distance D is expanded to three dimensions (1, n, n) with np.expand_dims() to obtain K; the Cost volumes of all pictures are summed over the third dimension with torch.sum() to obtain a tensor of dimension (b, l, h, w); all Cost volumes are connected along the second dimension with torch.stack() to obtain Y of dimension (b, n, l, h, w); the view(b, l, -1) function is applied to Y so that it adapts to the required shape; a relation function between the camera-pose distance D and the Cost volumes is finally obtained, which predicts the hidden spatial information brought by different camera poses.
Step 3, inputting the images of the test pairs into the network for prediction: 22 scenes, i.e. 7564 images, are selected from the remaining scenes of the DTU data set for prediction, with a batch size of 2 and numdepth of 96, finally obtaining point cloud data files.
Step 4, analyzing the obtained point cloud data files to obtain the final average Acc and Comp: the Acc and Comp of the point cloud models of all scenes are calculated with the standard MATLAB evaluation code, finally obtaining the accuracy, completeness and overall score of the reconstructed point clouds of all predicted scenes so as to evaluate the accuracy of the depth maps.
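For reference, the sketch below reproduces the accuracy/completeness metrics of the standard MATLAB DTU evaluation in Python; the distance thresholds and outlier handling of the official script are omitted, and the names are illustrative.

    # Python sketch of the accuracy/completeness metrics computed by the standard
    # MATLAB DTU evaluation (thresholds and outlier handling of the official
    # script are omitted).
    import numpy as np
    from scipy.spatial import cKDTree

    def acc_comp(reconstruction, ground_truth):
        """reconstruction, ground_truth: (N,3) point arrays.
        Acc: mean distance from reconstructed points to the ground truth.
        Comp: mean distance from ground-truth points to the reconstruction."""
        acc = cKDTree(ground_truth).query(reconstruction)[0].mean()
        comp = cKDTree(reconstruction).query(ground_truth)[0].mean()
        return acc, comp, (acc + comp) / 2.0                 # Acc, Comp, Overall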
TABLE 1 Quantitative comparison of reconstruction quality on the DTU data set (numdepth = 96; lower is better)

    Method             Acc      Comp     Overall
    MVSNet             0.496    1.378    0.937
    R-MVSNet           0.478    1.341    0.910
    Point-MVSNet       0.462    1.326    0.894
    Improved (ours)    0.483    1.304    0.889
Table 1 shows the quantitative results of the 4 methods for reconstruction quality on the DTU data set (lower is better). It can be seen that, with numdepth set to 96, in terms of Acc (accuracy) the original MVSNet algorithm achieves 0.496, the R-MVSNet algorithm 0.478, the Point-MVSNet algorithm 0.462, and the improved method 0.483, so the improved method has better accuracy than the original MVSNet and R-MVSNet algorithms; in terms of Comp (completeness), the original MVSNet algorithm achieves 1.378, R-MVSNet 1.341, Point-MVSNet 1.326, and the improved method of the invention 1.304, the best of the 4 methods; in terms of Overall (overall score), the original MVSNet algorithm achieves 0.937, R-MVSNet 0.910, Point-MVSNet 0.894, and the improved method of the invention 0.889, again the best of the 4 methods.
The invention studies multi-view image reconstruction and the problem of insufficient reconstruction accuracy and completeness, proposes an improved MVSNet algorithm based on deep learning, and constructs a network model that improves the precision of the depth map by modifying the feature extraction module and designing a contribution calculation module and a spatial information acquisition module, thereby enhancing the depth estimation capability for images. The experimental results show that the proposed network model effectively alleviates the poor accuracy and completeness observed when reconstructing images and achieves a good reconstruction effect.
Those skilled in the art will appreciate that the invention may be practiced without these specific details. Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, the present invention is not limited to the scope of these embodiments; various changes apparent to those skilled in the art remain within the spirit and scope of the present invention as defined by the appended claims, and all uses of the inventive concept are protected.

Claims (7)

1. A multi-view stereo vision reconstruction method based on deep learning, characterized in that the reconstruction method comprises the following steps:
step 1, processing a DTU data set;
step 2, designing a network;
step 3, reading the data set from step 1 into the network designed in step 2 for training to obtain the trained network weights;
step 4, reading the network weight trained in the step 3 into the network designed in the step 2, reading scene information to be predicted into the network for calculation, obtaining a depth map, a probability map and a mask map of each picture, and generating point cloud data of each predicted scene;
step 5, analyzing the accuracy and completeness of the point cloud data of each predicted scene to obtain the average accuracy and completeness.
2. The method of claim 1, characterized in that the specific method for processing the DTU data set in step 1 is as follows: the image data are normalized using the Python language, and the other data undergo data type conversion to obtain the processed data set.
3. The method of claim 1, characterized in that the specific method for designing the network in step 2 comprises the following steps:
step 2.1, designing a special three-layer feature extraction pyramid network;
step 2.2, designing a contribution algorithm and calculating a contribution for each feature volume;
step 2.3, designing a Gaussian process regression algorithm and carrying out 3D regularization on the Cost volume.
4. The method of claim 3, characterized in that the specific method of designing the special three-layer feature extraction pyramid network in step 2.1 is as follows: the first layer superposes the gradient image of the source image, at the same image resolution, in the channel dimension to obtain an input with 6 channels and uses two 3×3 convolutions with stride 1 and 8 channels, with an output size of 512×640; in the second layer, a 5×5 convolution with stride 2 and 16 channels is used, the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 19 channels, and two 3×3 convolutions with stride 1 and 16 channels follow, with an output size of 256×320; in the third layer, a 5×5 convolution with stride 2 and 32 channels is used, the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 35 channels, and two 3×3 convolutions with stride 1 and 32 channels follow, with an output size of 128×160; each convolution operation is followed by a batch normalization operation and a nonlinear activation function, and finally the feature map is obtained.
5. The method of claim 3, characterized in that the specific method of designing the contribution algorithm in step 2.2 and calculating a contribution for each feature volume is as follows: first, the absolute differences between the feature volumes of the source images and the feature volume of the reference image are obtained with the torch.abs() and torch.sub() functions; all feature-volume difference data are processed with a slice function and reshape(1, b*c*h*w) to obtain two-dimensional tensor data (1, b*c*h*w) for the patch set of all feature-volume differences on each depth plane; these are connected along the first dimension with the torch.cat() function; the negative of the tensor is taken with torch.neg() and a softmax() is applied; four-dimensional tensors (b, c, h, w) are then obtained through two unsqueeze(-1) calls and a reshape(b, c, h, w), forming the contribution body of each feature volume; each contribution body is multiplied by its corresponding feature volume to obtain the different contributions, which are finally fused into a new Cost volume in the form of a variance.
6. The method of claim 3, characterized in that the specific method of designing the Gaussian process regression algorithm in step 2.3 and carrying out 3D regularization on the Cost volume is as follows: first, a distance function (pose1, pose2) is designed; the rotation matrix R and the translation vector t are obtained from each camera pose with a slice function, and different calculations are carried out on t and R with np.linalg.norm() and np.matrix.trace() respectively, finally giving the distance between two camera poses; by computing this for all read pictures, the camera-pose distance matrix D of dimension (n, n) is obtained; second, a radial basis kernel function K is designed with three hyper-parameters defined; the camera-pose distance D is expanded to three dimensions (1, n, n) with np.expand_dims() to obtain K; the Cost volumes of all pictures are summed over the third dimension with torch.sum() to obtain a tensor of dimension (b, l, h, w); all Cost volumes are connected along the second dimension with torch.stack() to obtain Y of dimension (b, n, l, h, w); the view(b, l, -1) function is applied to Y so that it adapts to the required shape; a relation function between the camera-pose distance D and the Cost volumes is finally obtained, which is used to predict the hidden spatial information brought by different camera poses and to enrich the Cost volume; finally, 3D regularization is carried out on the Cost volume to obtain the depth map.
7. The method of claim 1, characterized in that the specific method for generating the point cloud data of each predicted scene in step 4 is as follows: after the depth map and probability map of each image are obtained, the fused point cloud data are obtained through a simple depth-map filtering/fusion step.
CN202111200048.3A (priority and filing date 2021-10-14): A multi-view stereo vision reconstruction method based on deep learning; status Active; granted as CN114119916B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111200048.3A (CN114119916B) | 2021-10-14 | 2021-10-14 | A multi-view stereo vision reconstruction method based on deep learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111200048.3A (CN114119916B) | 2021-10-14 | 2021-10-14 | A multi-view stereo vision reconstruction method based on deep learning

Publications (2)

Publication Number | Publication Date
CN114119916A (en) | 2022-03-01
CN114119916B (en) | 2025-02-18

Family

ID=80375935

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202111200048.3A | A multi-view stereo vision reconstruction method based on deep learning | 2021-10-14 | 2021-10-14 | Active (granted as CN114119916B)

Country Status (1)

Country | Link
CN | CN114119916B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107203988A (en)* | 2016-03-18 | 2017-09-26 | 北京大学 | A kind of method and its application that three-dimensional volumetric image is rebuild by two dimensional x-ray image
CA3032983A1 (en)* | 2019-02-06 | 2020-08-06 | Thanh Phuoc Hong | Systems and methods for keypoint detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. Fu, J. Liang and Z. Wang: "Monocular Depth Estimation Based on Multi-Scale Graph Convolution Networks", IEEE Access, vol. 8, 3 January 2020, pages 997-1009, XP011764479, DOI: 10.1109/ACCESS.2019.2961606 *
范冰: "异构立体视觉系统的三维重建关键技术研究" (Research on key technologies of three-dimensional reconstruction for heterogeneous stereo vision systems), 中国博士学位论文全文数据库(电子期刊) 信息科技辑, no. 4, 15 April 2021 *
邢彩燕, 张志毅, 胡少军, 耿楠: "基于图像尖锐度的角点匹配算法" (Corner matching algorithm based on image sharpness), 计算机工程与科学 (Computer Engineering & Science), no. 04, 15 April 2019 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115588038A (en)* | 2022-10-19 | 2023-01-10 | 沈阳工业大学 | A Multi-View Depth Estimation Method
CN118691761A (en)* | 2024-07-12 | 2024-09-24 | 中北大学 | A hierarchical 3D reconstruction method based on automatic decoder

Also Published As

Publication number | Publication date
CN114119916B (en) | 2025-02-18

Similar Documents

Publication | Title
CN110443842B (en) | Depth map prediction method based on visual angle fusion
CN111339903B (en) | Multi-person human body posture estimation method
Ye et al. | DPNet: Detail-preserving network for high quality monocular depth estimation
CN112232134B (en) | Human body posture estimation method based on hourglass network and attention mechanism
CN108764250B (en) | A method of extracting essential images using convolutional neural network
CN116958453B (en) | Three-dimensional model reconstruction method, device and medium based on nerve radiation field
CN111582104A (en) | Semantic segmentation method and device for remote sensing image
CN117115359A (en) | Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN113962858A (en) | A multi-view depth acquisition method
CN116958420A (en) | A high-precision modeling method for the three-dimensional face of a digital human teacher
CN114972378A (en) | Brain tumor MRI image segmentation method based on mask attention mechanism
Lin et al. | Dyspn: Learning dynamic affinity for image-guided depth completion
CN112509021A (en) | Parallax optimization method based on attention mechanism
CN114820323A (en) | Multi-scale residual binocular image super-resolution method based on stereo attention mechanism
CN116310095A (en) | Multi-view three-dimensional reconstruction method based on deep learning
CN114119916A (en) | Multi-view stereoscopic vision reconstruction method based on deep learning
CN114663880A (en) | Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
Wang et al. | Stereo matching and 3D reconstruction with NeRF supervision for accurate weight estimation in free-swimming fish
Xiao et al. | Multi-dimensional graph interactional network for progressive point cloud completion
Jiang et al. | Contrastive learning of features between images and lidar
CN112669452A (en) | Object positioning method based on convolutional neural network multi-branch structure
CN119888086A (en) | Multi-view three-dimensional reconstruction method based on depth perception
CN119941807A (en) | A robust point cloud registration method and system based on global spatial perception and multi-level filtering
CN119919782A (en) | A remote sensing target detection method and system based on selective feature space fusion
CN114926734A (en) | Solid waste detection device and method based on feature aggregation and attention fusion

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
