CN115457101A - Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform - Google Patents

Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Info

Publication number
CN115457101A
CN115457101A · CN115457101B · Application CN202211408484.4A
Authority
CN
China
Prior art keywords
depth
view
depth map
map
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211408484.4A
Other languages
Chinese (zh)
Other versions
CN115457101B (en)
Inventor
陶文兵 (Tao Wenbing)
苏婉娟 (Su Wanjuan)
刘李漫 (Liu Liman)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Tuke Intelligent Information Technology Co., Ltd.
Original Assignee
Wuhan Tuke Intelligent Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Tuke Intelligent Technology Co., Ltd.
Priority to CN202211408484.4A
Publication of CN115457101A
Application granted
Publication of CN115457101B
Legal status: Active
Anticipated expiration


Abstract

The invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. The method comprises the following: a hierarchical edge-preserving residual learning module is proposed to correct the errors introduced by bilinear upsampling and to refine the depth maps estimated by a multi-scale depth estimation network, so that the network produces depth maps with preserved edge details; a cross-view photometric consistency loss is proposed to strengthen the gradient flow in detail regions during training, which further improves the accuracy of depth estimation; and a lightweight cascaded multi-view depth estimation network framework is designed that stacks stages at the same resolution, so that as many depth hypotheses as possible can be sampled without adding much extra GPU memory or runtime, allowing depth estimation to be performed efficiently.

Description

Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform
Technical Field
The invention relates to the technical field of computer vision, in particular to an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform.
Background
Multi-view depth estimation for an unmanned aerial vehicle (UAV) platform aims to establish dense correspondences across the multi-view images acquired by the UAV and thereby recover the depth of the image under the reference view. Autonomous UAV navigation requires the ability to perceive the surrounding environment and to localize; multi-view depth estimation for a UAV platform provides the UAV with three-dimensional scene perception and understanding, and supplies the technical basis for autonomous obstacle avoidance, ranging and UAV-based three-dimensional map reconstruction. In recent years, deep learning has greatly advanced multi-view depth estimation. Learning-based multi-view depth estimation methods usually regularize the cost volume with a 3D CNN (3D Convolutional Neural Network); however, because of the smoothing behaviour of the 3D CNN, the estimated depth maps suffer from over-smoothing at object edges.
In addition, coarse-to-fine architectures allow the depth map to be estimated more efficiently and are therefore widely used in learning-based multi-view depth estimation. In such an architecture, however, the discrete and sparse depth hypothesis sampling further aggravates the difficulty of recovering thin structures and object-edge depths. Moreover, existing learning-based multi-view depth estimation methods struggle to strike a good balance between performance and efficiency; constrained by the limited onboard hardware resources of a UAV, they are difficult to deploy in practice on a UAV platform. Therefore, how to accurately recover the depth of detail regions so as to support accurate UAV ranging, and how to achieve a good balance between performance and efficiency, remain key problems to be solved.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform, intended to solve the problems that existing methods have difficulty recovering the depth of thin structures and object-edge regions and difficulty achieving a good balance between performance and efficiency.
According to a first aspect of the present invention, there is provided an edge-preserving multi-view depth estimation method for a drone platform, comprising:
Step 1, given a reference image I_0 and its N−1 neighborhood images {I_i}, i = 1, …, N−1, extract the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the scale, the s-th scale feature has size W_s × H_s × C_s, C_s is the number of channels of the s-th scale feature, and W × H is the size of the original input image;
Step 2, determine the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network;
Step 3, based on the depth map D_1, determine the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network;
Step 4, optimize and upsample the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D_2';
Step 5, based on the depth map D_2' and the image depth features F_i^2 at the 2nd scale, perform the 3rd-stage and 4th-stage depth estimation in sequence to obtain the depth map D_4 estimated at the 4th stage;
Step 6, optimize and upsample the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D_4';
Step 7, based on the optimized depth map D_4' and the image depth features F_i^3 at the 3rd scale, perform the 5th-stage depth estimation to obtain the depth map D_5.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the multi-scale feature extraction network is a two-dimensional U-shaped network composed of an encoder and a decoder connected by skip connections; the encoder and the decoder are composed of several residual blocks.
Optionally, step 2 includes:
Step 201, uniformly sample M_1 depth hypothesis values within the whole scene depth range [d_min, d_max];
Step 202, under each depth hypothesis, project the depth features F_i of the i-th neighborhood view onto the reference view through a differentiable homography, and then construct the two-view cost volume V_i using the group-wise correlation metric;
Step 203, for the i-th two-view cost volume V_i, estimate a visibility map W_i with a shallow 3D CNN, and carry out a summation of all two-view cost volumes weighted by the visibility map W_i of each neighborhood view to obtain the final aggregated cost volume C_1;
Step 204, regularize the cost volume C_1 with a three-dimensional convolutional neural network, obtain a depth probability volume through a Softmax operation, and obtain the depth map D_1 from the depth probability volume using soft-argmax.
Optionally, step 3 includes:
Step 301, according to the depth map D_1, determine the depth hypothesis sampling range R_2 of the second stage and uniformly sample M_2 depth hypothesis values within that depth range;
Step 302, perform two-view cost volume construction and aggregation according to the method of steps 201 to 203, obtaining the aggregated cost volume C_2 from the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values;
Step 303, perform cost volume regularization and depth map prediction according to the method of step 204, obtaining the depth map D_2 from the cost volume C_2.
Optionally, the step 4 includes:
Step 401, extract the multi-scale context features F_c^s of the reference image with a context encoding network, where s = 1, 2, 3 denotes the scale and the s-th scale feature has size W_s × H_s;
Step 402, normalize the depth map D_2 and extract features from the normalized depth map with a shallow 2D CNN;
Step 403, concatenate the extracted depth-map features with the context features F_c^2 of the image and feed them into an edge-preserving residual learning network for residual learning, obtaining a residual map ΔD_2;
Step 404, add the normalized and upsampled depth map to the residual map ΔD_2, and de-normalize the result to obtain the optimized depth map D_2'.
Optionally, the context encoding network in step 401 is a two-dimensional U-shaped network comprising an encoder and a decoder connected by skip connections;
in step 402 the depth map D_2 is normalized as
D̄_2 = (D_2 − μ(D_2)) / σ(D_2)  (1)
where μ(·) and σ(·) denote the mean and variance computations, respectively;
the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections, the encoder and the decoder being composed of several residual blocks;
in step 404, the normalized depth map D̄_2 is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map D̄_2', i.e.
D̄_2' = Up_×2(D̄_2) + ΔD_2  (2)
where Up_×2(·) denotes upsampling to twice the original resolution by bilinear interpolation; the result is de-normalized with the mean and variance of the depth map D_2 to obtain the optimized depth map D_2':
D_2' = σ(D_2) · D̄_2' + μ(D_2)  (3).
Optionally, in the depth estimation of the 3rd, 4th and 5th stages in step 5 and step 7: the depth range is determined according to the method of step 301; the two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203; and cost volume regularization and depth map prediction are performed according to the method of step 204.
Optionally, the step 6 includes:
Step 601, extract the multi-scale context features F_c^s of the reference image with the context encoding network;
Step 602, normalize the depth map D_4 and extract features from the normalized depth map with a shallow 2D CNN;
Step 603, concatenate the extracted depth-map features with the context features F_c^3 of the image and feed them into the edge-preserving residual learning network for residual learning, obtaining a residual map ΔD_4;
Step 604, add the normalized and upsampled depth map to the residual map, and de-normalize the result to obtain the optimized depth map D_4'.
Optionally, the training process of the multi-scale depth feature extraction network includes:
Step 801, supervise the multi-scale depth estimation network with a cross-view photometric consistency loss together with an L1 loss. For a pixel p of the reference image I_0 with depth value d, its corresponding pixel p_i in the i-th source view is
p_i ≃ K_i · (R_{0,i} · (d · K_0^{-1} · p̃) + t_{0,i})  (4)
where K_0 and K_i are the camera parameters of the reference view and of the i-th neighborhood view respectively, R_{0,i} and t_{0,i} are the relative rotation and translation between the reference view and the i-th neighborhood view, and p̃ is the homogeneous coordinate of p. The image Î_i^D synthesized from the i-th neighborhood view onto the reference view based on the depth map D is obtained by differentiable bilinear interpolation, i.e.
Î_i^D(p) = I_i(p_i)  (5)
The binary mask M_i generated during this transformation identifies the invalid pixels of the synthesized image Î_i^D.
The cross-view photometric consistency loss is computed as
L_pc = Σ_{i=1}^{N−1} (1 / |Ω_i|) Σ_{p ∈ Ω_i} | Î_i^gt(p) − Î_i(p) |  (6)
where Î_i^gt and Î_i denote the views synthesized from the i-th neighborhood view using the true depth and the estimated depth respectively, N is the number of views, Ω_i denotes the pixels that are valid both in the synthesized image and in the generated GT depth map, and Ω_gt denotes the valid pixels of the GT depth map;
Step 802, combine the cross-view photometric consistency loss with the L1 loss to obtain the loss of the multi-scale depth estimation branch:
L_mvs = Σ_s λ_s (L_L1^s + L_pc^s)  (7)
where λ_s is the weight coefficient of the loss function at the s-th stage;
Step 803, supervise the hierarchical edge-preserving residual learning branch with an L1 loss; the total loss of the whole network is
L = L_mvs + Σ_s η_s L_refine^s  (8)
where η_s is the weight coefficient of the loss function at the s-th stage.
According to a second aspect of the invention, there is provided a ranging method for an unmanned aerial vehicle platform, comprising: the distance measurement is carried out based on the depth map obtained by the edge preserving multi-view depth estimation method facing the unmanned aerial vehicle platform.
The invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. To estimate detail regions accurately, a hierarchical edge-preserving residual learning module is proposed to correct the errors introduced by bilinear upsampling and to help improve the accuracy of the depth maps estimated by the multi-scale depth estimation network. In addition, to strengthen the gradient flow in detail regions during network training, a cross-view photometric consistency loss is proposed, which further improves the accuracy of the estimated depth. To achieve a better balance between performance and efficiency, a lightweight cascaded multi-view depth estimation network framework is designed and combined with the two strategies above, so that accurate depth estimation can be achieved efficiently, which favours practical application on a UAV platform.
Drawings
Fig. 1 is a schematic diagram of an overall architecture of an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform according to the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
In order to overcome the defects and problems in the background art, a hierarchical edge preserving residual error learning module is proposed to optimize a depth map estimated by a multi-scale depth estimation network, so that the network can perform edge-aware depth map upsampling. In addition, a cross-view photometric consistency loss is proposed to strengthen the gradient flow of the detail region during training, thereby realizing more refined depth estimation. Meanwhile, on the basis, a lightweight multi-view depth estimation cascade network framework is designed, and depth estimation can be efficiently carried out.
Therefore, the invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. Fig. 1 is a schematic diagram of the overall architecture of the edge-preserving multi-view depth estimation and ranging method for the unmanned aerial vehicle platform. As shown in Fig. 1, the edge-preserving multi-view depth estimation method includes:
Step 1, given a reference image I_0 and its N−1 neighborhood images {I_i}, i = 1, …, N−1, extract the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the scale, the s-th scale feature has size W_s × H_s × C_s, C_s is the number of channels of the s-th scale feature, and W × H is the size of the original input image.
Step 2, determine the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network.
Step 3, based on the depth map D_1, determine the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network.
Step 4, in order to perform edge-preserving upsampling, optimize and upsample the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D_2'.
Step 5, based on the depth map D_2' and the image depth features F_i^2 at the 2nd scale, perform the 3rd-stage and 4th-stage depth estimation in sequence to obtain the depth map D_4 estimated at the 4th stage.
Step 6, optimize and upsample the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D_4'.
Step 7, based on the optimized depth map D_4' and the image depth features F_i^3 at the 3rd scale, perform the 5th-stage depth estimation to obtain the final depth map D_5.
In summary, the whole multi-scale depth estimation network branch has five stages in total; the numbers of depth hypothesis samples of the stages are 32, 16, 8 and 8 respectively; the depth sampling range of the 2nd stage is attenuated to half of that of the previous stage, and the range of each remaining stage is attenuated to one quarter of that of the previous stage.
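For illustration only, the following minimal Python sketch shows the depth-range attenuation described above (halved at the 2nd stage, quartered at each later stage); the initial depth range used in the example call is a placeholder, not a value fixed by the patent:

```python
# Illustrative sketch of the per-stage depth-range attenuation described above.
# The concrete numbers in the example call are assumptions, not values from the patent.

def stage_depth_ranges(d_min, d_max, num_stages=5):
    """Return the width of the depth sampling range used at each stage."""
    ranges = [d_max - d_min]                   # stage 1 covers the whole scene depth range
    for s in range(2, num_stages + 1):
        factor = 0.5 if s == 2 else 0.25       # stage 2: half of previous, later stages: a quarter
        ranges.append(ranges[-1] * factor)
    return ranges

print(stage_depth_ranges(d_min=2.0, d_max=100.0))
# [98.0, 49.0, 12.25, 3.0625, 0.765625]
```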
The invention provides an efficient edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform, which aims to solve the technical problems that the depth of a thin structure and an object edge area is difficult to recover and good balance between performance and efficiency is difficult to realize in the conventional method.
Example 1
Embodiment 1 provided by the present invention is an embodiment of the edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform. As can be seen from Fig. 1, the embodiment of the edge-preserving multi-view depth estimation method includes:
Step 1, given a reference image I_0 and its N−1 neighborhood images {I_i}, i = 1, …, N−1, extract the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the scale, the s-th scale feature has size W_s × H_s × C_s, C_s is the number of channels of the s-th scale feature, and W × H is the size of the original input image.
In one possible embodiment, the multi-scale feature extraction network is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections. Furthermore, to enhance the feature representation capability, the encoder and the decoder are composed of several residual blocks.
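As a rough illustration of such an encoder-decoder, the sketch below builds a small residual-block U-Net in PyTorch that returns features at three scales; the channel widths, the downsampling factors and the mapping of the outputs to the scales F^1, F^2, F^3 are assumptions chosen for brevity rather than the patent's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an additive skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
    def forward(self, x):
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

class MultiScaleFeatureNet(nn.Module):
    """Toy 2D U-Net with residual blocks; returns features at 1/4, 1/2 and full resolution."""
    def __init__(self, base=8):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, base, 3, padding=1), ResidualBlock(base))
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), ResidualBlock(base * 2))
        self.enc3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1), ResidualBlock(base * 4))
        self.dec2 = nn.Sequential(nn.Conv2d(base * 4 + base * 2, base * 2, 3, padding=1), ResidualBlock(base * 2))
        self.dec1 = nn.Sequential(nn.Conv2d(base * 2 + base, base, 3, padding=1), ResidualBlock(base))
    def forward(self, img):
        e1 = self.enc1(img)                          # full resolution
        e2 = self.enc2(e1)                           # 1/2 resolution
        e3 = self.enc3(e2)                           # 1/4 resolution -> taken as coarsest feature F^1
        u2 = F.interpolate(e3, scale_factor=2, mode="bilinear", align_corners=False)
        d2 = self.dec2(torch.cat([u2, e2], dim=1))   # 1/2 resolution -> feature F^2 (skip connection)
        u1 = F.interpolate(d2, scale_factor=2, mode="bilinear", align_corners=False)
        d1 = self.dec1(torch.cat([u1, e1], dim=1))   # full resolution -> feature F^3 (skip connection)
        return e3, d2, d1

feats = MultiScaleFeatureNet()(torch.rand(1, 3, 64, 80))
print([f.shape for f in feats])   # [1, 32, 16, 20], [1, 16, 32, 40], [1, 8, 64, 80]
```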
Step 2, determine the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network.
In a possible embodiment, for the 1st stage, step 2 includes:
Step 201, uniformly sample M_1 depth hypothesis values within the whole scene depth range [d_min, d_max].
It can be understood that, for a depth hypothesis d, the depth features F_i of all neighborhood views are projected onto the reference view through a differentiable homography, yielding the warped features F̃_i(d). The differentiable homography is computed as in formula (1):
p' ≃ K_i · (R_{0,i} · (d · K_0^{-1} · p̃) + t_{0,i})  (1)
where K_0 and T_0 denote the camera intrinsic and extrinsic parameters of the reference view, K_i and T_i denote the camera intrinsic and extrinsic parameters of the i-th neighborhood view, [R_{0,i} | t_{0,i}] = T_i · T_0^{-1} is the relative pose between the two views, p̃ is the homogeneous coordinate of a reference-view pixel, and p' is the corresponding sampling location in the i-th view at depth d.
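A minimal sketch of this warping step is given below (PyTorch). It assumes 3 x 3 pinhole intrinsics and a relative pose [R | t] from the reference view to the i-th view, and it treats a single scalar depth hypothesis; the function and variable names are illustrative, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def warp_features(feat_i, K0, Ki, R, t, depth):
    """Warp neighborhood-view features feat_i [B, C, H, W] onto the reference view for one
    fronto-parallel depth hypothesis `depth` (scalar), following p' ~ Ki (R (d K0^-1 p~) + t).
    Illustrative only; assumes points stay in front of the camera."""
    B, C, H, W = feat_i.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(3, -1)    # homogeneous reference pixels [3, H*W]
    cam = torch.inverse(K0) @ pix * depth                      # back-project to the hypothesised depth
    proj = Ki @ (R @ cam + t.reshape(3, 1))                    # project into the i-th view
    uv = proj[:2] / proj[2:].clamp(min=1e-6)                   # dehomogenize
    u = 2.0 * uv[0] / (W - 1) - 1.0                            # normalize to [-1, 1] for grid_sample
    v = 2.0 * uv[1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(1, H, W, 2).expand(B, H, W, 2)
    return F.grid_sample(feat_i, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=True)
```

In a full cost-volume construction this routine would be applied once per depth hypothesis and the results stacked along a depth dimension.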
Step 202, under each depth hypothesis, project the depth features F_i of the i-th neighborhood view onto the reference view through the differentiable homography, and then construct the two-view cost volume V_i using the group-wise correlation metric.
It can be understood that the similarity between the warped depth features of each neighborhood view and the depth features of the reference view is computed with the group-wise correlation metric. Specifically, for the depth features F_0 of the reference image and the warped features F̃_i(d) of the i-th neighborhood view at depth value d, the features are evenly divided into G groups along the feature channel dimension. Then the similarity between the g-th groups of F_0 and F̃_i(d) is computed as:
S_i^g(d) = (G / C) · ⟨ F_0^g, F̃_i^g(d) ⟩  (2)
where S_i^g(d), F_0^g and F̃_i^g(d) denote, respectively, the g-th group similarity and the g-th groups of features of F_0 and F̃_i(d), C is the number of feature channels, and ⟨·,·⟩ is the inner product. After the similarities of all G groups between F_0 and F̃_i(d) have been computed, they form a feature similarity map with G channels. Because there are M_1 depth hypotheses, M_1 such feature similarity maps are obtained between the reference image and the i-th neighborhood view, and together they form the two-view cost volume V_i.
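The group-wise similarity of formula (2) can be sketched as follows (PyTorch); ref_feat and warped_feat are assumed to be [B, C, H, W] tensors with C divisible by the number of groups, and the names are illustrative:

```python
import torch

def group_correlation(ref_feat, warped_feat, groups=8):
    """Group-wise correlation between reference features and warped neighborhood features
    for one depth hypothesis: S^g = (G / C) * <F0^g, Fi^g>."""
    B, C, H, W = ref_feat.shape
    assert C % groups == 0
    f0 = ref_feat.view(B, groups, C // groups, H, W)
    fi = warped_feat.view(B, groups, C // groups, H, W)
    # mean over the C/G channels of each group equals (G / C) * inner product
    return (f0 * fi).mean(dim=2)            # [B, G, H, W]
```

Stacking this G-channel similarity map over all depth hypotheses yields the two-view cost volume V_i.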
Step 203, for the i-th two-view cost volume V_i, estimate a visibility map W_i with a shallow 3D CNN, and carry out a summation of all two-view cost volumes weighted by the visibility map W_i of each neighborhood view to obtain the final aggregated cost volume C_1.
It can be understood that, in order to obtain the visibility map W_i of the i-th neighborhood view under the reference view, visibility estimation is performed on each two-view cost volume with a shallow 3D CNN consisting of a 3D convolution layer, batch normalization, a ReLU activation function, another 3D convolution layer and a Sigmoid activation function. On this basis, the two-view cost volumes are summed, weighted by the visibility map W_i of each neighborhood view, to obtain the final aggregated cost volume C_1, i.e.
C_1 = Σ_{i=1}^{N−1} W_i ⊙ V_i  (3)
where ⊙ denotes element-wise weighting.
Step 204, regularize the cost volume C_1 with a three-dimensional convolutional neural network, obtain a depth probability volume through a Softmax operation, and obtain the depth map D_1 from the depth probability volume using soft-argmax.
It can be understood that the cost volume C_1 is regularized with a three-dimensional convolutional neural network built as a three-dimensional U-shaped network. A depth probability volume is then obtained with a Softmax operation, and the depth map is regressed with soft-argmax, i.e. the final depth map D_1 is obtained as the expectation of the depth hypotheses under the depth probability volume.
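The depth regression at the end of each stage can be sketched as follows (PyTorch); reg_cost stands for the regularized cost volume and depth_values for the sampled depth hypotheses, and both names are illustrative:

```python
import torch

def soft_argmax_depth(reg_cost, depth_values):
    """reg_cost: [B, M, H, W] regularized cost; depth_values: [B, M] or [B, M, H, W].
    Softmax over the depth dimension gives the depth probability volume; the expectation
    over the depth hypotheses is the regressed depth map."""
    prob = torch.softmax(reg_cost, dim=1)          # depth probability volume
    if depth_values.dim() == 2:                    # broadcast per-image hypotheses to all pixels
        depth_values = depth_values[:, :, None, None]
    return (prob * depth_values).sum(dim=1)        # [B, H, W] depth map
```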
Step 3, based on the depth map D_1, determine the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network.
In a possible embodiment, for the 2nd stage, step 3 includes:
Step 301, according to the depth map D_1, determine the depth hypothesis sampling range R_2 of the second stage and uniformly sample M_2 depth hypothesis values within that depth range.
It can be understood that the depth hypothesis sampling range R_2 of this stage is determined from the depth map D_1 estimated at the previous stage, and M_2 depth hypothesis values are uniformly sampled within that range, where the sampling range determined by R_2 is [D_1 − R_2/2, D_1 + R_2/2].
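For illustration, the resampling of depth hypotheses around the previous estimate can be sketched as follows (PyTorch); the names are illustrative, and centring the window on the previous estimate follows the range given above:

```python
import torch

def adaptive_depth_hypotheses(prev_depth, interval, num_samples, d_min, d_max):
    """prev_depth: [B, H, W] estimate from the previous stage; interval: width R of the
    sampling window.  Returns per-pixel hypotheses [B, num_samples, H, W] uniformly
    spaced in [prev - R/2, prev + R/2], clamped to the scene depth range."""
    low = (prev_depth - interval / 2).clamp(min=d_min)
    high = (prev_depth + interval / 2).clamp(max=d_max)
    steps = torch.linspace(0.0, 1.0, num_samples, device=prev_depth.device)
    return low.unsqueeze(1) + steps.view(1, -1, 1, 1) * (high - low).unsqueeze(1)
```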
Step 302, perform two-view cost volume construction and aggregation according to the method of steps 201 to 203, obtaining the aggregated cost volume C_2 from the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values.
It can be understood that, following the two-view cost volume construction and aggregation method of step 2, the aggregated cost volume C_2 is obtained from the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values.
Step 303, perform cost volume regularization and depth map prediction according to the method of step 204, obtaining the depth map D_2 from the cost volume C_2.
It can be understood that, following the cost volume regularization and depth map prediction method of step 2, the depth map D_2 is obtained from the cost volume C_2.
Step 4, optimize and upsample the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D_2'.
In one possible embodiment, step 4 includes:
Step 401, extract the multi-scale context features F_c^s of the reference image with a context encoding network, where s = 1, 2, 3 denotes the scale and the s-th scale feature has size W_s × H_s.
It can be understood that the context encoding network in step 401 has a structure similar to the multi-scale feature extraction network in step 1: it is also a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections.
Step 402, normalize the depth map D_2 and extract features from the normalized depth map D̄_2 with a shallow 2D CNN.
It can be understood that in step 402 the depth map D_2 is normalized as:
D̄_2 = (D_2 − μ(D_2)) / σ(D_2)  (4)
where μ(·) and σ(·) denote the mean and variance computations, respectively.
Step 403, concatenate the extracted depth-map features with the context features F_c^2 of the image and feed them into an edge-preserving residual learning network for residual learning, obtaining a residual map ΔD_2.
It can be understood that the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections; the encoder and the decoder are composed of several residual blocks to enhance the feature representation capability.
Step 404, add the normalized and upsampled depth map to the residual map ΔD_2, and de-normalize the result to obtain the optimized depth map D_2'.
It can be understood that, in step 404, the normalized depth map D̄_2 is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map D̄_2', i.e.
D̄_2' = Up_×2(D̄_2) + ΔD_2  (5)
where Up_×2(·) denotes upsampling to twice the original resolution by bilinear interpolation. On this basis, the result is de-normalized with the mean and variance of the depth map D_2 to obtain the optimized depth map D_2':
D_2' = σ(D_2) · D̄_2' + μ(D_2)  (6)
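Putting steps 401 to 404 together, a minimal sketch of this optimize-and-upsample path is given below (PyTorch). Here depth_encoder and residual_net stand in for the shallow 2D CNN and the 2D U-shaped residual learning network, which are not spelled out; the use of the standard deviation for normalization and the point at which the depth-map features are upsampled to the finer resolution before concatenation are assumptions of the sketch:

```python
import torch
import torch.nn.functional as F

def edge_preserving_refine(depth, context_feat, depth_encoder, residual_net):
    """depth: [B, 1, H, W] stage output; context_feat: [B, Cc, 2H, 2W] reference-image
    context features at the finer scale.  depth_encoder maps the normalized depth map to
    features; residual_net must return a single-channel residual at 2H x 2W.
    Returns the refined depth map at 2H x 2W."""
    mu = depth.mean(dim=(2, 3), keepdim=True)
    sigma = depth.std(dim=(2, 3), keepdim=True) + 1e-6
    d_norm = (depth - mu) / sigma                                    # normalize (eq. 4)
    d_feat = depth_encoder(d_norm)                                   # shallow 2D CNN features
    d_feat_up = F.interpolate(d_feat, scale_factor=2, mode="bilinear", align_corners=False)
    residual = residual_net(torch.cat([d_feat_up, context_feat], dim=1))   # edge-aware residual
    d_up = F.interpolate(d_norm, scale_factor=2, mode="bilinear", align_corners=False)
    d_refined_norm = d_up + residual                                 # upsample + residual (eq. 5)
    return d_refined_norm * sigma + mu                               # de-normalize (eq. 6)
```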
Step 5, based on the depth map D_2' and the image depth features F_i^2 at the 2nd scale, perform the 3rd-stage and 4th-stage depth estimation in sequence to obtain the depth map D_4 estimated at the 4th stage.
Step 6, optimize and upsample the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D_4'.
In a possible embodiment, the method of step 6 is similar to that of step 4, and may specifically include:
Step 601, extract the multi-scale context features F_c^s of the reference image with the context encoding network.
Step 602, normalize the depth map D_4 and extract features from the normalized depth map with a shallow 2D CNN.
Step 603, concatenate the extracted depth-map features with the context features F_c^3 of the image and feed them into the edge-preserving residual learning network for residual learning, obtaining a residual map ΔD_4.
Step 604, add the normalized and upsampled depth map to the residual map, and de-normalize the result to obtain the optimized depth map D_4'.
Step 7, based on the optimized depth map D_4' and the image depth features F_i^3 at the 3rd scale, perform the 5th-stage depth estimation to obtain the depth map D_5.
In a possible embodiment, in the depth estimation of the 3rd, 4th and 5th stages in steps 5 and 7: the depth range is determined according to the method of step 301; the two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203; and cost volume regularization and depth map prediction are performed according to the method of step 204.
In a possible embodiment, the training process of the multi-scale depth feature extraction network includes the following steps:
Step 801, supervise the multi-scale depth estimation network with a cross-view photometric consistency loss together with an L1 loss. The core idea of cross-view photometric consistency is to convert, via depth-based view synthesis, the difference between the true depth values and the predicted depth values into the difference between the image synthesized from the true depth and the image synthesized from the predicted depth, thereby enlarging the gradient flow in detail regions. For a pixel p of the reference image I_0 with depth value d, its corresponding pixel p_i in the i-th source view is:
p_i ≃ K_i · (R_{0,i} · (d · K_0^{-1} · p̃) + t_{0,i})  (7)
where K_0 and K_i are the camera parameters of the reference view and of the i-th neighborhood view respectively, R_{0,i} and t_{0,i} are the relative rotation and translation between the reference view and the i-th neighborhood view, and p̃ is the homogeneous coordinate of p. Through this transformation, the image Î_i^D synthesized from the i-th neighborhood view onto the reference view based on the depth map D is obtained by differentiable bilinear interpolation, i.e.
Î_i^D(p) = I_i(p_i)  (8)
During the transformation a binary mask M_i is generated, which identifies the invalid pixels of the synthesized image Î_i^D, i.e. the pixels projected outside the image area.
The cross-view photometric consistency loss is computed as:
L_pc = Σ_{i=1}^{N−1} (1 / |Ω_i|) Σ_{p ∈ Ω_i} | Î_i^gt(p) − Î_i(p) |  (9)
where Î_i^gt and Î_i denote the views synthesized from the i-th neighborhood view using the true depth and the estimated depth respectively, N is the number of views, Ω_i denotes the pixels that are valid both in the synthesized image and in the generated GT depth map, and Ω_gt denotes the valid pixels of the GT depth map.
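For illustration, once the images synthesized from the ground-truth depth and from the predicted depth, together with their validity masks, have been computed (for example with a warping routine like the one sketched for step 202), the loss itself reduces to a masked L1 difference; the sketch below (PyTorch) assumes exactly that, with illustrative names:

```python
import torch

def cross_view_photometric_loss(synth_gt, synth_pred, valid_masks):
    """synth_gt / synth_pred: lists of [B, 3, H, W] images synthesized from each neighborhood
    view using the ground-truth depth and the estimated depth; valid_masks: matching
    [B, 1, H, W] binary masks of pixels valid in both the synthesized image and the GT depth map."""
    loss = 0.0
    for gt, pred, mask in zip(synth_gt, synth_pred, valid_masks):
        diff = (gt - pred).abs() * mask
        # average the absolute difference over valid pixels (and color channels), then sum over views
        loss = loss + diff.sum() / (mask.sum() * gt.shape[1] + 1e-6)
    return loss
```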
Step 802, combine the cross-view photometric consistency loss with the L1 loss to obtain the loss of the multi-scale depth estimation branch:
L_mvs = Σ_s λ_s (L_L1^s + L_pc^s)  (10)
where λ_s is the weight coefficient of the loss function at the s-th stage; the weight coefficients of the loss functions of the 1st to 5th stages may be set to 0.5, 1, and 2, respectively.
Step 803, supervise the hierarchical edge-preserving residual learning branch with an L1 loss; the total loss of the whole network is:
L = L_mvs + Σ_s η_s L_refine^s  (11)
where η_s is the weight coefficient of the loss function at the s-th stage; the weight coefficients of the loss functions of the 2nd and 4th stages may be set to 1 and 2, respectively.
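A short sketch of the overall supervision is given below (plain Python); the per-stage weights are passed in as arguments rather than fixed, since the text quotes only some of them:

```python
def total_loss(stage_l1, stage_pc, refine_l1, stage_weights, refine_weights):
    """stage_l1 / stage_pc: per-stage L1 and photometric-consistency losses of the multi-scale
    depth estimation branch; refine_l1: L1 losses of the hierarchical edge-preserving residual
    learning branch (stages 2 and 4).  stage_weights and refine_weights correspond to the
    lambda_s and eta_s coefficients in the text."""
    l_depth = sum(w * (l1 + pc) for w, l1, pc in zip(stage_weights, stage_l1, stage_pc))
    l_refine = sum(w * l1 for w, l1 in zip(refine_weights, refine_l1))
    return l_depth + l_refine
```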
Example 2
Embodiment 2 provided by the present invention is an embodiment of the ranging method for an unmanned aerial vehicle platform provided by the present invention. As can be seen from Fig. 1, the embodiment of the ranging method includes: ranging is performed based on the depth map obtained by the edge-preserving multi-view depth estimation method for the unmanned aerial vehicle platform.
It can be understood that the ranging method for the unmanned aerial vehicle platform provided by the present invention corresponds to the edge-preserving multi-view depth estimation method for the unmanned aerial vehicle platform provided by the foregoing embodiments; for the relevant technical features of the ranging method, reference may be made to the relevant technical features of the edge-preserving multi-view depth estimation method, which are not repeated here.
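One common way to turn the estimated depth map into a metric distance for ranging is sketched below; this is a hedged illustration rather than a procedure spelled out in the patent, and it assumes a pinhole camera with known 3 x 3 intrinsics and a depth map that stores depth along the optical axis:

```python
import torch

def pixel_distance(depth_map, K, u, v):
    """Euclidean distance from the camera center to the 3D point seen at pixel (u, v),
    given depth along the optical axis.  depth_map: [H, W] tensor; K: 3x3 intrinsics."""
    ray = torch.inverse(K) @ torch.tensor([float(u), float(v), 1.0])   # back-projected pixel ray
    return (depth_map[v, u] * ray).norm()     # scale the ray by the depth and take its length
```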
The edge-preserving multi-view depth estimation and ranging method for the unmanned aerial vehicle platform brings clear gains in both depth estimation quality and efficiency, and these gains come mainly from three aspects: first, the hierarchical edge-preserving residual learning module corrects the errors introduced by bilinear upsampling and refines the depth maps estimated by the multi-scale depth estimation network, yielding depth maps with preserved edge details; second, the cross-view photometric consistency loss strengthens the gradient flow in detail regions during training, which further improves the accuracy of depth estimation; third, on this basis, a lightweight cascaded multi-view depth estimation network framework is designed in which stages are stacked at the same resolution, so that as many depth hypotheses as possible can be sampled without adding much extra GPU memory or runtime, allowing accurate depth estimation to be achieved efficiently and making the multi-view depth estimation network practical to deploy on a UAV platform.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An edge-preserving multi-view depth estimation method facing an unmanned aerial vehicle platform, characterized by comprising the following steps:
step 1, given a reference image I_0 and its N−1 neighborhood images {I_i}, i = 1, …, N−1, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, wherein s = 1, 2, 3 denotes the scale, the s-th scale feature has size W_s × H_s × C_s, C_s is the number of channels of the s-th scale feature, and W × H is the size of the original input image;
step 2, determining the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network;
step 3, based on the depth map D_1, determining the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network;
step 4, optimizing and upsampling the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D_2';
step 5, based on the depth map D_2' and the image depth features F_i^2 at the 2nd scale, sequentially performing the 3rd-stage and 4th-stage depth estimation to obtain the depth map D_4 estimated at the 4th stage;
step 6, optimizing and upsampling the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D_4';
step 7, based on the optimized depth map D_4' and the image depth features F_i^3 at the 3rd scale, performing the 5th-stage depth estimation to obtain the depth map D_5.
2. The edge-preserving multi-view depth estimation method of claim 1, wherein the multi-scale feature extraction network is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections; the encoder and the decoder are composed of several residual blocks.
3. The edge-preserving multi-view depth estimation method according to claim 1, wherein the step 2 comprises:
step 201, uniformly sampling M_1 depth hypothesis values within the whole scene depth range [d_min, d_max];
step 202, under each depth hypothesis, projecting the depth features F_i of the i-th neighborhood view onto the reference view through a differentiable homography, and then constructing the two-view cost volume V_i using the group-wise correlation metric;
step 203, for the i-th two-view cost volume V_i, estimating a visibility map W_i with a shallow 3D CNN, and carrying out a summation of all two-view cost volumes weighted by the visibility map W_i of each neighborhood view to obtain the final aggregated cost volume C_1;
step 204, regularizing the cost volume C_1 with a three-dimensional convolutional neural network, obtaining a depth probability volume through a Softmax operation, and obtaining the depth map D_1 from the depth probability volume using soft-argmax.
4. The edge-preserving multi-view depth estimation method according to claim 3, wherein the step 3 comprises:
step 301, according to the depth map D_1, determining the depth hypothesis sampling range R_2 of the second stage and uniformly sampling M_2 depth hypothesis values within that depth range;
step 302, performing two-view cost volume construction and aggregation according to the method of steps 201 to 203, and obtaining the aggregated cost volume C_2 from the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values;
step 303, performing cost volume regularization and depth map prediction according to the method of step 204, and obtaining the depth map D_2 from the cost volume C_2.
5. The edge-preserving multi-view depth estimation method according to claim 1, wherein the step 4 comprises:
step 401, extracting the multi-scale context features F_c^s of the reference image with a context encoding network, wherein s = 1, 2, 3 denotes the scale and the s-th scale feature has size W_s × H_s;
step 402, normalizing the depth map D_2 and extracting features from the normalized depth map D̄_2 with a shallow 2D CNN;
step 403, concatenating the extracted depth-map features with the context features F_c^2 of the image and feeding them into an edge-preserving residual learning network for residual learning to obtain a residual map ΔD_2;
step 404, adding the normalized and upsampled depth map to the residual map ΔD_2, and de-normalizing the result to obtain the optimized depth map D_2'.
6. The edge-preserving multi-view depth estimation method of claim 5, wherein
the context encoding network in step 401 is a two-dimensional U-shaped network comprising an encoder and a decoder connected by skip connections;
in step 402 the depth map D_2 is normalized as:
D̄_2 = (D_2 − μ(D_2)) / σ(D_2)  (1)
wherein μ(·) and σ(·) denote the mean and variance computations, respectively;
the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections; the encoder and the decoder are composed of several residual blocks;
in step 404, the normalized depth map D̄_2 is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map D̄_2', i.e.
D̄_2' = Up_×2(D̄_2) + ΔD_2  (2)
wherein Up_×2(·) denotes upsampling to twice the original resolution by bilinear interpolation;
the result is de-normalized with the mean and variance of the depth map D_2 to obtain the optimized depth map D_2':
D_2' = σ(D_2) · D̄_2' + μ(D_2)  (3).
7. The edge-preserving multi-view depth estimation method of claim 5, wherein, in the depth estimation of the 3rd, 4th and 5th stages in step 5 and step 7: the depth range is determined according to the method of step 301; the two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203; and cost volume regularization and depth map prediction are performed according to the method of step 204.
8. The edge-preserving multi-view depth estimation method according to claim 5, wherein the step 6 comprises:
step 601, extracting the multi-scale context features F_c^s of the reference image with the context encoding network;
step 602, normalizing the depth map D_4 and extracting features from the normalized depth map with a shallow 2D CNN;
step 603, concatenating the extracted depth-map features with the context features F_c^3 of the image and feeding them into the edge-preserving residual learning network for residual learning to obtain a residual map ΔD_4;
step 604, adding the normalized and upsampled depth map to the residual map, and de-normalizing the result to obtain the optimized depth map D_4'.
9. The edge-preserving multi-view depth estimation method according to claim 1, wherein the training process of the multi-scale depth feature extraction network comprises:
step 801, supervising the multi-scale depth estimation network with a cross-view photometric consistency loss together with an L1 loss, wherein, for a pixel p of the reference image I_0 with depth value d, its corresponding pixel p_i in the i-th source view is
p_i ≃ K_i · (R_{0,i} · (d · K_0^{-1} · p̃) + t_{0,i})  (4)
wherein K_0 and K_i are the camera parameters of the reference view and of the i-th neighborhood view respectively, R_{0,i} and t_{0,i} are the relative rotation and translation between the reference view and the i-th neighborhood view, and p̃ is the homogeneous coordinate of p; the image Î_i^D synthesized from the i-th neighborhood view onto the reference view based on the depth map D is obtained by differentiable bilinear interpolation, i.e.
Î_i^D(p) = I_i(p_i)  (5)
the binary mask M_i generated in the transformation is used to identify the invalid pixels of the synthesized image Î_i^D;
the cross-view photometric consistency loss is computed as:
L_pc = Σ_{i=1}^{N−1} (1 / |Ω_i|) Σ_{p ∈ Ω_i} | Î_i^gt(p) − Î_i(p) |  (6)
wherein Î_i^gt and Î_i denote the views synthesized from the i-th neighborhood view using the true depth and the estimated depth respectively, N denotes the number of views, Ω_i denotes the pixels that are valid both in the synthesized image and in the generated GT depth map, and Ω_gt denotes the valid pixels of the GT depth map;
step 802, combining the cross-view photometric consistency loss with the L1 loss to obtain the loss of the multi-scale depth estimation branch:
L_mvs = Σ_s λ_s (L_L1^s + L_pc^s)  (7)
wherein λ_s is the weight coefficient of the loss function at the s-th stage;
step 803, supervising the hierarchical edge-preserving residual learning branch with an L1 loss, the total loss of the whole network being:
L = L_mvs + Σ_s η_s L_refine^s  (8)
wherein η_s is the weight coefficient of the loss function at the s-th stage.
10. A distance measurement method facing an unmanned aerial vehicle platform, characterized by comprising the following steps: ranging is performed based on the depth map obtained by the edge-preserving multi-view depth estimation method facing an unmanned aerial vehicle platform of any one of claims 1-9.
CN202211408484.4A | Priority and filing date: 2022-11-10 | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform | Active | Granted as CN115457101B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211408484.4A (granted as CN115457101B) | 2022-11-10 | 2022-11-10 | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211408484.4A (granted as CN115457101B) | 2022-11-10 | 2022-11-10 | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Publications (2)

Publication Number | Publication Date
CN115457101A | 2022-12-09
CN115457101B (en) | 2023-03-24

Family

ID=84295585

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211408484.4A (Active, granted as CN115457101B) | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform | 2022-11-10 | 2022-11-10

Country Status (1)

Country | Link
CN (1) | CN115457101B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108765333A (en) * | 2018-05-24 | 2018-11-06 | South China University of Technology | A kind of depth map improving method based on depth convolutional neural networks
CN110310317A (en) * | 2019-06-28 | 2019-10-08 | Northwestern Polytechnical University | A method for depth estimation of monocular vision scene based on deep learning
WO2021098554A1 (en) * | 2019-11-20 | 2021-05-27 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Feature extraction method and apparatus, device, and storage medium
CN111462329A (en) * | 2020-03-24 | 2020-07-28 | Nanjing University of Aeronautics and Astronautics | A 3D reconstruction method of UAV aerial images based on deep learning
US20210319577A1 (en) * | 2020-04-14 | 2021-10-14 | Toyota Research Institute, Inc. | Depth estimation based on ego-motion estimation and residual flow estimation
CN112001960A (en) * | 2020-08-25 | 2020-11-27 | Unit 91550 of the Chinese People's Liberation Army | Monocular image depth estimation method based on multi-scale residual error pyramid attention network model
CN113962858A (en) * | 2021-10-22 | 2022-01-21 | Shenyang University of Technology | A multi-view depth acquisition method
CN115131418A (en) * | 2022-06-08 | 2022-09-30 | China University of Petroleum (East China) | A Transformer-based Monocular Depth Estimation Algorithm
CN114820755A (en) * | 2022-06-24 | 2022-07-29 | Wuhan Tuke Intelligent Technology Co., Ltd. | Depth map estimation method and system
CN115082540A (en) * | 2022-07-25 | 2022-09-20 | Wuhan Tuke Intelligent Technology Co., Ltd. | Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform
CN115272438A (en) * | 2022-08-19 | 2022-11-01 | China University of Mining and Technology | A high-precision monocular depth estimation system and method for 3D scene reconstruction
CN115170746A (en) * | 2022-09-07 | 2022-10-11 | Central South University | Multi-view three-dimensional reconstruction method, system and equipment based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MADHUANAND L et al.: "Deep learning for monocular depth estimation from UAV images", ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences *
WANJUAN SU et al.: "Uncertainty Guided Multi-View Stereo Network for Depth Estimation", IEEE Transactions on Circuits and Systems for Video Technology *

Also Published As

Publication number | Publication date
CN115457101B (en) | 2023-03-24

Similar Documents

Publication | Title
CN113160294B (en) | Image scene depth estimation method, device, terminal device and storage medium
CN111696148A (en) | End-to-end stereo matching method based on convolutional neural network
CN113221925B (en) | Target detection method and device based on multi-scale image
CN110782490A (en) | A video depth map estimation method and device with spatiotemporal consistency
CN110443883B (en) | Plane three-dimensional reconstruction method for single color picture based on droplock
CN113344869A (en) | Driving environment real-time stereo matching method and device based on candidate parallax
US8416989B2 (en) | Image processing apparatus, image capture apparatus, image processing method, and program
CN104166987B (en) | Parallax estimation method based on improved adaptive weighted summation and belief propagation
CN116486288A (en) | Aerial Target Counting and Detection Method Based on Lightweight Density Estimation Network
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning
CN114742875B (en) | Binocular stereo matching method based on multi-scale feature extraction and adaptive aggregation
CN117726747A (en) | Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
CN114596349A (en) | Depth estimation method, depth estimation device, electronic equipment and computer-readable storage medium
CN118037989A (en) | Multi-view nerve implicit surface reconstruction method based on priori driving
CN112115786A (en) | Monocular vision odometer method based on attention U-net
CN119006678A (en) | Three-dimensional Gaussian sputtering optimization method for pose-free input
CN114820755B (en) | Depth map estimation method and system
CN118485783A (en) | Multi-view 3D reconstruction method and system based on visual center and implicit attention
CN111179327B (en) | Depth map calculation method
WO2025138753A1 (en) | Three-dimensional modeling method and apparatus
CN115457101B (en) | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform
CN117456185A (en) | Remote sensing image segmentation method based on adaptive pattern matching and nested modeling
Zhu et al. | Hybrid Cost Volume Regularization for Memory-efficient Multi-view Stereo Networks
CN113066165A (en) | Three-dimensional reconstruction method and device for multi-stage unsupervised learning and electronic equipment
CN117115145B (en) | Detection method and device, electronic equipment and computer readable medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
CP03: Change of name, title or address
Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province
Patentee after: Hangzhou Tuke Intelligent Information Technology Co., Ltd.
Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone)
Patentee before: Wuhan Tuke Intelligent Technology Co., Ltd.
