CN114119916A - Multi-view stereoscopic vision reconstruction method based on deep learning - Google Patents

Multi-view stereoscopic vision reconstruction method based on deep learning

Info

Publication number
CN114119916A
Authority
CN
China
Prior art keywords
function
network
designing
obtaining
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111200048.3A
Other languages
Chinese (zh)
Other versions
CN114119916B (en)
Inventor
韩燮
王若蓝
李顺增
赵融
谌钟毓
任铭铭
杨恬恬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North University of China
Original Assignee
North University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North University of China
Priority to CN202111200048.3A
Publication of CN114119916A
Application granted
Publication of CN114119916B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention discloses a multi-view stereoscopic vision reconstruction method based on deep learning, belonging to the technical field of machine vision. The method constructs a network model better suited to three-dimensional reconstruction from multi-view images by changing the feature extraction network structure, designing a contribution algorithm, and acquiring hidden spatial information, thereby addressing problems such as insufficient reconstruction accuracy and completeness. The improved MVSNet method of the invention achieves an Acc of 0.473 and a Comp of 1.304 on the DTU data set, showing better accuracy and completeness; the method is suitable for complex, large-scale image three-dimensional reconstruction environments and can be applied in fields such as reverse engineering and the restoration of ancient cultural relics.

Description

Multi-view stereoscopic vision reconstruction method based on deep learning
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a deep learning-based multi-view stereoscopic vision reconstruction method.
Background
As computer technology has developed, three-dimensional models have been valued and applied in more and more information fields, such as digital city construction, Virtual Reality (VR) technology, reverse engineering, restoration of ancient cultural relics, and automatic driving systems, so research on and development of three-dimensional reconstruction technology have become a clear trend. Among the many three-dimensional reconstruction methods, reconstruction from multi-view image depth maps has become one of the most mainstream research directions because of its low equipment cost, high operational flexibility, and high reconstruction accuracy.
Three-dimensional reconstruction from multi-view image depth maps is generally divided into traditional computer-vision-based methods and deep-learning-based methods. The traditional methods compute the stereo-matching cost from hand-crafted features and then perform cost aggregation, disparity computation, and optimization to finally obtain depth values. They can achieve good reconstruction in ideal Lambertian scenes, but for regions with sparse texture or non-diffuse reflection the reconstruction results are often poor, because local features are difficult to extract and dense matching is difficult to perform. Deep-learning methods, by contrast, automatically learn high-level, global features of the input image through a deep convolutional neural network trained on a large amount of data; they can comprehensively learn the information in the image and abstract it into high-level semantic features, so compared with traditional methods the extracted features are more semantic and artificial errors are avoided. Although recent studies have shown that deep-learning-based image depth estimation can effectively improve reconstruction quality, there is still room for improvement in accuracy and completeness.
Disclosure of Invention
Aiming at the problem that existing mainstream deep-learning-based three-dimensional reconstruction from multi-view image depth maps often suffers from insufficient reconstruction accuracy and completeness, the invention provides a multi-view stereoscopic vision reconstruction method based on deep learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
a multi-view stereoscopic vision reconstruction method based on deep learning, the reconstruction method comprises the following steps:
step 1, processing a DTU data set;
step 2, designing a network;
step 3, reading the data set from step 1 into the network designed in step 2 for training to obtain the trained network weights;
step 4, reading the network weight trained in the step 3 into the network designed in the step 2, reading scene information to be predicted into the network for calculation, obtaining a depth map, a probability map and a mask map of each picture, and generating point cloud data of each predicted scene;
step 5, analyzing the accuracy and completeness of the point cloud data of each predicted scene to obtain the average accuracy and completeness.
Further, the specific method for processing the DTU data set in step 1 is as follows: the image data are normalized using the Python language, and the other data undergo data type conversion to obtain the processed data set.
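For illustration only, a minimal Python sketch of this preprocessing step is given below; the function name, the division by 255 and the mask threshold are assumptions rather than the patent's own code.

    # Minimal preprocessing sketch (illustrative only): the function name, the
    # division by 255 and the mask threshold are assumptions, not the patent's code.
    import numpy as np

    def preprocess_view(image_uint8, intrinsics, extrinsics, mask_uint8):
        """Normalize the image and cast the remaining data to float32."""
        image = image_uint8.astype(np.float32) / 255.0      # scale pixel values to [0, 1]
        intrinsics = intrinsics.astype(np.float32)          # 3x3 camera matrix
        extrinsics = extrinsics.astype(np.float32)          # 4x4 world-to-camera pose
        mask = (mask_uint8 > 10).astype(np.float32)         # binary valid-pixel mask
        return image, intrinsics, extrinsics, mask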
Further, the specific method for designing the network in step 2 comprises the following steps:
step 2.1, designing a special three-layer feature extraction pyramid network;
step 2.2, designing a contribution algorithm and calculating a contribution for each feature volume;
step 2.3, designing a Gaussian process regression algorithm and carrying out 3D regularization on the Cost volume.
Further, the specific method of designing the special three-layer feature extraction pyramid network in step 2.1 is as follows: the feature extraction network is a special three-layer pyramid model which differs from a common three-layer pyramid model in that a gradient image of the source image, at the same image resolution, is added after each layer. The F.interpolate() function is used to down-sample the input image, the np.gradient() and np.abs() functions are used to obtain the absolute gradient values along the third and fourth dimensions of the input image, and the final gradient image is obtained through a summation function;
the first layer superposes the gradient image of the source image, at the same image resolution, in the channel dimension to obtain an input with 6 channels, then uses two 3×3 convolutions with stride 1 and 8 channels; the output size is 512×640;
in the second layer, a 5×5 convolution with stride 2 and 16 channels is used first, then the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 19 channels, followed by two 3×3 convolutions with stride 1 and 16 channels; the output size is 256×320;
in the third layer, a 5×5 convolution with stride 2 and 32 channels is used first, then the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 35 channels, followed by two 3×3 convolutions with stride 1 and 32 channels; the output size is 128×160. Each convolution operation is followed by a batch normalization operation and a nonlinear activation function, and finally the feature map is obtained.
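The following Python sketch shows one way such a gradient-augmented pyramid could be assembled. The channel counts, strides and output sizes follow the description above, while the class and function names are assumptions; torch.gradient stands in for the np.gradient()/np.abs() calls, and BatchNorm2d/ReLU stand in for the batch and nonlinear activation operations.

    # Sketch of the gradient-augmented three-layer feature pyramid (assumptions:
    # class/function names; torch.gradient replaces the np.gradient()/np.abs()
    # calls; BatchNorm2d/ReLU stand for the batch and activation operations).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def gradient_image(img, scale):
        """Down-sample img (B,3,H,W) with F.interpolate, then sum the absolute
        gradients along the height and width dimensions (dims 2 and 3)."""
        small = img if scale == 1.0 else F.interpolate(
            img, scale_factor=scale, mode='bilinear', align_corners=False)
        gy, gx = torch.gradient(small, dim=(2, 3))
        return gy.abs() + gx.abs()                           # 3-channel gradient image

    def conv_bn_relu(cin, cout, k, stride):
        return nn.Sequential(
            nn.Conv2d(cin, cout, k, stride, padding=k // 2, bias=False),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True))

    class GradientFeaturePyramid(nn.Module):
        def __init__(self):
            super().__init__()
            # layer 1: 3 image + 3 gradient channels, two 3x3 stride-1 convs, 8 channels
            self.l1 = nn.Sequential(conv_bn_relu(6, 8, 3, 1), conv_bn_relu(8, 8, 3, 1))
            self.down2 = conv_bn_relu(8, 16, 5, 2)           # layer 2: 5x5 stride-2 conv
            self.l2 = nn.Sequential(conv_bn_relu(19, 16, 3, 1), conv_bn_relu(16, 16, 3, 1))
            self.down3 = conv_bn_relu(16, 32, 5, 2)          # layer 3: 5x5 stride-2 conv
            self.l3 = nn.Sequential(conv_bn_relu(35, 32, 3, 1), conv_bn_relu(32, 32, 3, 1))

        def forward(self, img):                              # img: (B,3,512,640)
            f1 = self.l1(torch.cat([img, gradient_image(img, 1.0)], 1))   # (B,8,512,640)
            f2 = self.down2(f1)
            f2 = self.l2(torch.cat([f2, gradient_image(img, 0.5)], 1))    # (B,16,256,320)
            f3 = self.down3(f2)
            f3 = self.l3(torch.cat([f3, gradient_image(img, 0.25)], 1))   # (B,32,128,160)
            return f3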
Further, the specific method of designing the contribution algorithm in step 2.2 and calculating a contribution for each feature volume is as follows: when the Cost volume is formed, each feature volume is obtained from a feature map through a differentiable homography transformation, so on the same depth plane the pixels corresponding to a given source-image pixel differ because the camera poses of the feature maps differ. Based on the observation that a three-dimensional point at the same coordinate has different visibility from different viewpoints, i.e. different views make unequal contributions to the matching cost, a contribution algorithm is designed and a contribution body is calculated for each feature volume. First, the absolute differences between the feature volumes of the source images and the feature volume of the reference image are obtained with the torch.abs() and torch.sub() functions; all feature-volume difference data are processed with a slice function and reshape(1, b*c*h*w) to obtain two-dimensional tensor data (1, b*c*h*w) for the patch set of all feature-volume differences on each depth plane; these are connected along the first dimension with the torch.cat() function; the negative of the tensor is taken with torch.neg() and a softmax() is applied; four-dimensional tensors (b, c, h, w) are then obtained through two unsqueeze(-1) calls and a reshape(b, c, h, w), forming the contribution body of each feature volume; each contribution body is multiplied by its corresponding feature volume to obtain the different contributions, which are finally fused into a new Cost volume in the form of a variance.
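A hedged sketch of one plausible reading of this contribution algorithm follows; the exact reshape and concatenation bookkeeping described above is simplified, and all names are illustrative.

    # One plausible reading of the contribution algorithm (tensor bookkeeping
    # simplified; names illustrative): softmax over the negated per-view absolute
    # differences gives a contribution weight per warped source volume, and the
    # weighted volumes are fused as a variance-style Cost volume.
    import torch

    def contribution_cost_volume(ref_volume, src_volumes):
        """ref_volume: (B,C,D,H,W); src_volumes: list of warped (B,C,D,H,W) volumes."""
        diffs = [torch.abs(torch.sub(v, ref_volume)) for v in src_volumes]  # |V_i - V_ref|
        weights = torch.softmax(torch.neg(torch.stack(diffs, 0)), dim=0)    # larger diff -> smaller weight
        volumes = torch.stack([ref_volume] + src_volumes, 0)                # (N+1,B,C,D,H,W)
        weighted = torch.cat([volumes[:1], volumes[1:] * weights], 0)       # apply contributions to source views
        mean = weighted.mean(0)
        return ((weighted - mean) ** 2).mean(0)                             # variance fusion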
Further, the specific method of designing the Gaussian process regression algorithm in step 2.3 and carrying out 3D regularization on the Cost volume is as follows: first, a distance function (pose1, pose2) is designed; the rotation matrix R and the translation vector t are obtained from each camera pose with a slice function, and different calculations are carried out on t and R with np.linalg.norm() and np.matrix.trace() respectively, finally giving the distance between two camera poses; by computing this for all read pictures, the camera-pose distance matrix D of dimension (n, n) is obtained. Second, a radial basis kernel function K is designed with three hyper-parameters defined; the camera-pose distance D is expanded to three dimensions (1, n, n) with np.expand_dims() to obtain K; the Cost volumes of all pictures are summed over the third dimension with torch.sum() to obtain a tensor of dimension (b, l, h, w); all Cost volumes are connected along the second dimension with torch.stack() to obtain Y of dimension (b, n, l, h, w); the view(b, l, -1) function is applied to Y so that it adapts to the required shape; a relation function between the camera-pose distance D and the Cost volumes is finally obtained, which is used to predict the hidden spatial information brought by different camera poses and to enrich the Cost volume; finally, 3D regularization is carried out on the Cost volume to obtain the depth map.
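The sketch below illustrates the spatial information module in this spirit: pairwise camera-pose distances, a radial basis kernel with three hyper-parameters, and a Gaussian-process posterior mean taken as the hidden information. The concrete distance formula, hyper-parameter values and function names are assumptions.

    # Sketch of the spatial information module (assumptions: the exact pose
    # distance formula, hyper-parameter values and function names).
    import numpy as np
    import torch

    def pose_distance(pose1, pose2):
        """pose: 4x4 extrinsic matrix; combine a translation norm with a
        trace-based rotation distance, one plausible choice for the R/t terms."""
        R1, t1 = pose1[:3, :3], pose1[:3, 3]
        R2, t2 = pose2[:3, :3], pose2[:3, 3]
        rot = np.arccos(np.clip((np.trace(R1 @ R2.T) - 1.0) / 2.0, -1.0, 1.0))
        return np.linalg.norm(t1 - t2) + rot

    def rbf_kernel(D, signal=1.0, length=1.0, noise=1e-3):
        """Radial basis kernel with three hyper-parameters: signal, length, noise."""
        return signal ** 2 * np.exp(-D ** 2 / (2.0 * length ** 2)) + noise * np.eye(D.shape[0])

    def gp_hidden_information(poses, cost_volumes, ref_index=0):
        """poses: list of n 4x4 matrices; cost_volumes: (n,B,C,D,H,W) tensor.
        Returns the GP posterior-mean volume for the reference pose."""
        n = len(poses)
        D = np.array([[pose_distance(p, q) for q in poses] for p in poses])  # (n, n) distances
        K = rbf_kernel(D)
        weights = np.linalg.solve(K, K[ref_index])                           # (K + noise*I)^-1 k_*
        Y = cost_volumes.reshape(n, -1)                                      # flatten volumes, cf. view(b, l, -1)
        hidden = torch.from_numpy(weights).float() @ Y                       # posterior mean, flattened
        return hidden.reshape(cost_volumes.shape[1:])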
Further, the specific method for generating the point cloud data of each predicted scene in step 4 is as follows: after the depth map and probability map of each image are obtained, the fused point cloud data are obtained through a simple depth-map filtering/fusion step.
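A minimal sketch of the probability-based filtering that precedes fusion is shown below; the threshold value and the function name are assumptions, and the geometric-consistency checks of the full filter/fusion step are omitted.

    # Minimal sketch of the probability-based filtering before fusion (the 0.8
    # threshold and the function name are assumptions; geometric-consistency
    # checks of the full filter/fusion step are omitted).
    import numpy as np

    def depth_to_points(depth, prob, intrinsics, prob_thresh=0.8):
        """Keep pixels whose probability exceeds the threshold and back-project
        them to 3D points in the camera frame."""
        ys, xs = np.nonzero(prob > prob_thresh)
        zs = depth[ys, xs]
        fx, fy = intrinsics[0, 0], intrinsics[1, 1]
        cx, cy = intrinsics[0, 2], intrinsics[1, 2]
        return np.stack([(xs - cx) * zs / fx, (ys - cy) * zs / fy, zs], axis=1)  # (N, 3)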
The method is suitable for three-dimensional reconstruction engineering of a large number of images, such as cultural relic restoration, reverse engineering and the like.
Compared with the prior art, the invention has the following advantages:
When the MVSNet network reconstructs an image, reconstruction accuracy and completeness are related to image quality, and for image scenes with strong illumination and large pose changes the reconstruction effect is poor, which shows up as insufficient accuracy and completeness. To improve the accuracy of depth estimation for multi-view images, the invention proposes an improved MVSNet algorithm based on deep learning: the single feature extraction network module is improved by adding the first-order gradient image of the input image so that features unaffected by illumination are extracted; a contribution algorithm is designed to obtain a novel Cost volume and improve the precision of depth estimation; in addition, a Gaussian process regression algorithm is adopted and a spatial information acquisition module is designed to predict the hidden spatial information brought by different camera poses, which improves the accuracy of the depth map and achieves a good reconstruction effect.
Drawings
FIG. 1 is a network layout of the present method;
FIG. 2 is a diagram of a feature extraction module architecture;
FIG. 3 is a flow diagram of a contribution score calculation module;
FIG. 4 is a flow diagram of a spatial information acquisition module;
FIG. 5 is an illustration of the reconstruction results of the present invention.
Detailed Description
Example 1
Step 1, data set processing: the public DTU data set is processed using the Python language; the images, camera parameters, mask pictures and so on are read in and converted into single-precision tensor data types so that the network can read them.
Step 2, reading the training set processed in step 1 into the designed network for training: 79 scenes, i.e. 27097 images, are taken as the training set and 18 scenes, i.e. 6174 images, as the test set; the batch size is 4, numdepth is 192, the learning rate is 0.001, and the weight decay rate is 0. Training stops after 16 iterations.
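A sketch of a training loop with this configuration (batch size 4, numdepth 192, learning rate 0.001, weight decay 0, 16 iterations) is given below; the choice of the Adam optimizer, the smooth L1 loss and the dataset/model interfaces are assumptions for illustration.

    # Training loop using the configuration above; the Adam optimizer, the loss
    # and the dataset/model interfaces are assumptions for illustration.
    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader

    def train(model, train_dataset, epochs=16):
        loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=4)
        optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0)
        for epoch in range(epochs):                          # iterate to 16 and stop
            for batch in loader:
                optimizer.zero_grad()
                depth = model(batch["images"], batch["cameras"], num_depth=192)  # hypothetical signature
                valid = batch["mask"] > 0.5
                loss = F.smooth_l1_loss(depth[valid], batch["depth_gt"][valid])
                loss.backward()
                optimizer.step()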
The network structure is designed as follows: first, a three-layer feature extraction pyramid network is designed. The F.interpolate() function is used to down-sample the input image, the np.gradient() and np.abs() functions are used to obtain the absolute gradient values along the third and fourth dimensions of the input image, and the final gradient image is obtained through a summation function. The three-layer feature extraction pyramid network is then built: the first layer superposes the gradient image of the source image, at the same image resolution, in the channel dimension to obtain an input with 6 channels and uses two 3×3 convolutions with stride 1 and 8 channels, with an output size of 512×640. In the second layer, a 5×5 convolution with stride 2 and 16 channels is used, the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 19 channels, and two 3×3 convolutions with stride 1 and 16 channels follow, with an output size of 256×320. In the third layer, a 5×5 convolution with stride 2 and 32 channels is used, the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 35 channels, and two 3×3 convolutions with stride 1 and 32 channels follow, with an output size of 128×160. Each convolution operation is followed by a batch normalization operation and a nonlinear activation function, and finally the feature map is obtained.
Secondly, a contribution algorithm is designed. When the Cost volume is formed, each feature volume is obtained from a feature map through a differentiable homography transformation, so on the same depth plane the pixels corresponding to a given source-image pixel differ because the camera poses of the feature maps differ. Based on the observation that a three-dimensional point at the same coordinate has different visibility from different viewpoints, i.e. different views make unequal contributions to the matching cost, a contribution algorithm is designed and a contribution body is calculated for each feature volume. First, the absolute differences between the feature volumes of the source images and the feature volume of the reference image are obtained with the torch.abs() and torch.sub() functions; all feature-volume difference data are processed with a slice function and reshape(1, b*c*h*w) to obtain two-dimensional tensor data (1, b*c*h*w) for the patch set of all feature-volume differences on each depth plane; these are connected along the first dimension with the torch.cat() function to obtain data of dimension (l, b*c*h*w); the negative of the tensor is taken with torch.neg() and a softmax() is applied; four-dimensional tensors (b, c, h, w) are then obtained through two unsqueeze(-1) calls and a reshape(b, c, h, w), forming the contribution body of each feature volume; each contribution body is multiplied by its corresponding feature volume to obtain the different contributions, which are finally fused into a new Cost volume in the form of a variance.
Then, just before the 3D regularization of the Cost volume, a spatial information acquisition module is designed. A Gaussian process regression algorithm is adopted, and hidden spatial information is obtained from the camera pose information and the Cost volume information. First, a distance function (pose1, pose2) is designed; the rotation matrix R and the translation vector t are obtained from each camera pose with a slice function, and different calculations are carried out on t and R with np.linalg.norm() and np.matrix.trace() respectively, finally giving the distance between two camera poses; by computing this for all read pictures, the camera-pose distance matrix D of dimension (n, n) is obtained. Second, a radial basis kernel function K is designed with three hyper-parameters defined; the camera-pose distance D is expanded to three dimensions (1, n, n) with np.expand_dims() to obtain K; the Cost volumes of all pictures are summed over the third dimension with torch.sum() to obtain a tensor of dimension (b, l, h, w); all Cost volumes are connected along the second dimension with torch.stack() to obtain Y of dimension (b, n, l, h, w); the view(b, l, -1) function is applied to Y so that it adapts to the required shape; a relation function between the camera-pose distance D and the Cost volumes is finally obtained, which predicts the hidden spatial information brought by different camera poses.
Step 3, inputting the images of the test pairs into the network for prediction: 22 scenes, i.e. 7564 images, are selected from the remaining scenes of the DTU data set for prediction, with a batch size of 2 and numdepth of 96, finally obtaining point cloud data files.
Step 4, analyzing the obtained point cloud data files to obtain the final average Acc and Comp: the Acc and Comp of the point cloud models of all scenes are calculated with the standard MATLAB evaluation code, finally obtaining the accuracy, completeness and overall score of the reconstructed point clouds of all predicted scenes so as to evaluate the accuracy of the depth maps.
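For reference, the sketch below reproduces the accuracy/completeness metrics of the standard MATLAB DTU evaluation in Python; the distance thresholds and outlier handling of the official script are omitted, and the names are illustrative.

    # Python sketch of the accuracy/completeness metrics computed by the standard
    # MATLAB DTU evaluation (thresholds and outlier handling of the official
    # script are omitted).
    import numpy as np
    from scipy.spatial import cKDTree

    def acc_comp(reconstruction, ground_truth):
        """reconstruction, ground_truth: (N,3) point arrays.
        Acc: mean distance from reconstructed points to the ground truth.
        Comp: mean distance from ground-truth points to the reconstruction."""
        acc = cKDTree(ground_truth).query(reconstruction)[0].mean()
        comp = cKDTree(reconstruction).query(ground_truth)[0].mean()
        return acc, comp, (acc + comp) / 2.0                 # Acc, Comp, Overall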
TABLE 1 Quantitative comparison of reconstruction quality on the DTU data set (numdepth = 96; lower is better)

    Method             Acc      Comp     Overall
    MVSNet             0.496    1.378    0.937
    R-MVSNet           0.478    1.341    0.910
    Point-MVSNet       0.462    1.326    0.894
    Improved (ours)    0.483    1.304    0.889
Table 1 shows the quantitative results of the 4 methods for reconstruction quality on the DTU data set (lower is better). It can be seen that, with numdepth set to 96, in terms of Acc (accuracy) the original MVSNet algorithm achieves 0.496, the R-MVSNet algorithm 0.478, the Point-MVSNet algorithm 0.462, and the improved method 0.483, so the improved method has better accuracy than the original MVSNet and R-MVSNet algorithms; in terms of Comp (completeness), the original MVSNet algorithm achieves 1.378, R-MVSNet 1.341, Point-MVSNet 1.326, and the improved method of the invention 1.304, the best of the 4 methods; in terms of Overall (overall score), the original MVSNet algorithm achieves 0.937, R-MVSNet 0.910, Point-MVSNet 0.894, and the improved method of the invention 0.889, again the best of the 4 methods.
The invention studies multi-view image reconstruction and the problem of insufficient reconstruction accuracy and completeness, proposes an improved MVSNet algorithm based on deep learning, and constructs a network model that improves the precision of the depth map by modifying the feature extraction module and designing a contribution calculation module and a spatial information acquisition module, thereby enhancing the depth estimation capability for images. The experimental results show that the proposed network model effectively alleviates the poor accuracy and completeness observed when reconstructing images and achieves a good reconstruction effect.
Those skilled in the art will appreciate that the invention may be practiced without these specific details. Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, the present invention is not limited to the scope of these embodiments; various changes apparent to those skilled in the art remain within the spirit and scope of the present invention as defined by the appended claims, and all uses of the inventive concept are protected.

Claims (7)

1. A multi-view stereo vision reconstruction method based on deep learning, characterized in that the reconstruction method comprises the following steps:
step 1, processing a DTU data set;
step 2, designing a network;
step 3, reading the data set from step 1 into the network designed in step 2 for training to obtain the trained network weights;
step 4, reading the network weight trained in the step 3 into the network designed in the step 2, reading scene information to be predicted into the network for calculation, obtaining a depth map, a probability map and a mask map of each picture, and generating point cloud data of each predicted scene;
step 5, analyzing the accuracy and completeness of the point cloud data of each predicted scene to obtain the average accuracy and completeness.
2. The method of claim 1, characterized in that the specific method for processing the DTU data set in step 1 is as follows: the image data are normalized using the Python language, and the other data undergo data type conversion to obtain the processed data set.
3. The method of claim 1, characterized in that the specific method for designing the network in step 2 comprises the following steps:
step 2.1, designing a special three-layer feature extraction pyramid network;
step 2.2, designing a contribution algorithm and calculating a contribution for each feature volume;
step 2.3, designing a Gaussian process regression algorithm and carrying out 3D regularization on the Cost volume.
4. The method of claim 3, characterized in that the specific method of designing the special three-layer feature extraction pyramid network in step 2.1 is as follows: the first layer superposes the gradient image of the source image, at the same image resolution, in the channel dimension to obtain an input with 6 channels and uses two 3×3 convolutions with stride 1 and 8 channels, with an output size of 512×640; in the second layer, a 5×5 convolution with stride 2 and 16 channels is used, the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 19 channels, and two 3×3 convolutions with stride 1 and 16 channels follow, with an output size of 256×320; in the third layer, a 5×5 convolution with stride 2 and 32 channels is used, the gradient image of the source image at the same image resolution is superposed in the channel dimension to obtain an input with 35 channels, and two 3×3 convolutions with stride 1 and 32 channels follow, with an output size of 128×160; each convolution operation is followed by a batch normalization operation and a nonlinear activation function, and finally the feature map is obtained.
5. The method of claim 3, characterized in that the specific method of designing the contribution algorithm in step 2.2 and calculating a contribution for each feature volume is as follows: first, the absolute differences between the feature volumes of the source images and the feature volume of the reference image are obtained with the torch.abs() and torch.sub() functions; all feature-volume difference data are processed with a slice function and reshape(1, b*c*h*w) to obtain two-dimensional tensor data (1, b*c*h*w) for the patch set of all feature-volume differences on each depth plane; these are connected along the first dimension with the torch.cat() function; the negative of the tensor is taken with torch.neg() and a softmax() is applied; four-dimensional tensors (b, c, h, w) are then obtained through two unsqueeze(-1) calls and a reshape(b, c, h, w), forming the contribution body of each feature volume; each contribution body is multiplied by its corresponding feature volume to obtain the different contributions, which are finally fused into a new Cost volume in the form of a variance.
6. The method of claim 3, characterized in that the specific method of designing the Gaussian process regression algorithm in step 2.3 and carrying out 3D regularization on the Cost volume is as follows: first, a distance function (pose1, pose2) is designed; the rotation matrix R and the translation vector t are obtained from each camera pose with a slice function, and different calculations are carried out on t and R with np.linalg.norm() and np.matrix.trace() respectively, finally giving the distance between two camera poses; by computing this for all read pictures, the camera-pose distance matrix D of dimension (n, n) is obtained; second, a radial basis kernel function K is designed with three hyper-parameters defined; the camera-pose distance D is expanded to three dimensions (1, n, n) with np.expand_dims() to obtain K; the Cost volumes of all pictures are summed over the third dimension with torch.sum() to obtain a tensor of dimension (b, l, h, w); all Cost volumes are connected along the second dimension with torch.stack() to obtain Y of dimension (b, n, l, h, w); the view(b, l, -1) function is applied to Y so that it adapts to the required shape; a relation function between the camera-pose distance D and the Cost volumes is finally obtained, which is used to predict the hidden spatial information brought by different camera poses and to enrich the Cost volume; finally, 3D regularization is carried out on the Cost volume to obtain the depth map.
7. The method of claim 1, characterized in that the specific method for generating the point cloud data of each predicted scene in step 4 is as follows: after the depth map and probability map of each image are obtained, the fused point cloud data are obtained through a simple depth-map filtering/fusion step.
CN202111200048.3A (priority and filing date 2021-10-14): A multi-view stereo vision reconstruction method based on deep learning; status Active; granted as CN114119916B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111200048.3A (CN114119916B) | 2021-10-14 | 2021-10-14 | A multi-view stereo vision reconstruction method based on deep learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111200048.3A (CN114119916B) | 2021-10-14 | 2021-10-14 | A multi-view stereo vision reconstruction method based on deep learning

Publications (2)

Publication Number | Publication Date
CN114119916A (en) | 2022-03-01
CN114119916B (en) | 2025-02-18

Family

ID=80375935

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202111200048.3A | A multi-view stereo vision reconstruction method based on deep learning | 2021-10-14 | 2021-10-14 | Active (granted as CN114119916B)

Country Status (1)

Country | Link
CN | CN114119916B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107203988A (en)* | 2016-03-18 | 2017-09-26 | 北京大学 | A kind of method and its application that three-dimensional volumetric image is rebuild by two dimensional x-ray image
CA3032983A1 (en)* | 2019-02-06 | 2020-08-06 | Thanh Phuoc Hong | Systems and methods for keypoint detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. Fu, J. Liang and Z. Wang: "Monocular Depth Estimation Based on Multi-Scale Graph Convolution Networks", IEEE Access, vol. 8, 3 January 2020, pages 997-1009, XP011764479, DOI: 10.1109/ACCESS.2019.2961606 *
范冰: "异构立体视觉系统的三维重建关键技术研究" (Research on key technologies of three-dimensional reconstruction for heterogeneous stereo vision systems), 中国博士学位论文全文数据库(电子期刊) 信息科技辑, no. 4, 15 April 2021 *
邢彩燕, 张志毅, 胡少军, 耿楠: "基于图像尖锐度的角点匹配算法" (Corner matching algorithm based on image sharpness), 计算机工程与科学 (Computer Engineering & Science), no. 04, 15 April 2019 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115588038A (en)* | 2022-10-19 | 2023-01-10 | 沈阳工业大学 | A Multi-View Depth Estimation Method
CN118691761A (en)* | 2024-07-12 | 2024-09-24 | 中北大学 | A hierarchical 3D reconstruction method based on automatic decoder

Also Published As

Publication number | Publication date
CN114119916B (en) | 2025-02-18

Similar Documents

Publication | Title
CN110443842B (en) | Depth map prediction method based on visual angle fusion
CN111339903B (en) | Multi-person human body posture estimation method
Ye et al. | DPNet: Detail-preserving network for high quality monocular depth estimation
CN112232134B (en) | Human body posture estimation method based on hourglass network and attention mechanism
CN108764250B (en) | A method of extracting essential images using convolutional neural network
CN116958453B (en) | Three-dimensional model reconstruction method, device and medium based on nerve radiation field
CN111582104A (en) | Semantic segmentation method and device for remote sensing image
CN117115359A (en) | Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN113962858A (en) | A multi-view depth acquisition method
CN116958420A (en) | A high-precision modeling method for the three-dimensional face of a digital human teacher
CN114972378A (en) | Brain tumor MRI image segmentation method based on mask attention mechanism
Lin et al. | Dyspn: Learning dynamic affinity for image-guided depth completion
CN112509021A (en) | Parallax optimization method based on attention mechanism
CN114820323A (en) | Multi-scale residual binocular image super-resolution method based on stereo attention mechanism
CN116310095A (en) | Multi-view three-dimensional reconstruction method based on deep learning
CN114119916A (en) | Multi-view stereoscopic vision reconstruction method based on deep learning
CN114663880A (en) | Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
Wang et al. | Stereo matching and 3D reconstruction with NeRF supervision for accurate weight estimation in free-swimming fish
Xiao et al. | Multi-dimensional graph interactional network for progressive point cloud completion
Jiang et al. | Contrastive learning of features between images and lidar
CN112669452A (en) | Object positioning method based on convolutional neural network multi-branch structure
CN119888086A (en) | Multi-view three-dimensional reconstruction method based on depth perception
CN119941807A (en) | A robust point cloud registration method and system based on global spatial perception and multi-level filtering
CN119919782A (en) | A remote sensing target detection method and system based on selective feature space fusion
CN114926734A (en) | Solid waste detection device and method based on feature aggregation and attention fusion

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
