CN115457101A - Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform - Google Patents

Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Info

Publication number
CN115457101A
CN115457101A · CN115457101B · Application CN202211408484.4A
Authority
CN
China
Prior art keywords
depth
view
depth map
map
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211408484.4A
Other languages
Chinese (zh)
Other versions
CN115457101B (en)
Inventor
陶文兵 (Tao Wenbing)
苏婉娟 (Su Wanjuan)
刘李漫 (Liu Liman)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Tuke Intelligent Information Technology Co., Ltd.
Original Assignee
Wuhan Tuke Intelligent Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Tuke Intelligent Technology Co., Ltd.
Priority to CN202211408484.4A
Publication of CN115457101A
Application granted
Publication of CN115457101B
Legal status: Active
Anticipated expiration


Abstract

The invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. The method comprises the following: a hierarchical edge-preserving residual learning module is proposed to correct the errors introduced by bilinear upsampling and to refine the depth maps estimated by a multi-scale depth estimation network, so that the network produces depth maps with preserved edge details; a cross-view photometric consistency loss is proposed to strengthen the gradient flow in detail regions during training, which further improves the accuracy of depth estimation; and a lightweight cascaded multi-view depth estimation network framework is designed that stacks stages at the same resolution, so that as many depth hypotheses as possible can be sampled without adding much extra GPU memory or runtime, allowing depth estimation to be performed efficiently.

Description

Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform
Technical Field
The invention relates to the technical field of computer vision, in particular to an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform.
Background
Multi-view depth estimation for an unmanned aerial vehicle (UAV) platform aims to establish dense correspondences across the multi-view images acquired by the UAV and thereby recover the depth of the image under the reference view. Autonomous UAV navigation requires the ability to perceive the surrounding environment and to localize; multi-view depth estimation for a UAV platform provides the UAV with three-dimensional scene perception and understanding, and supplies the technical basis for autonomous obstacle avoidance, ranging and UAV-based three-dimensional map reconstruction. In recent years, deep learning has greatly advanced multi-view depth estimation. Learning-based multi-view depth estimation methods usually regularize the cost volume with a 3D CNN (3D Convolutional Neural Network); however, because of the smoothing behaviour of the 3D CNN, the estimated depth maps suffer from over-smoothing at object edges.
In addition, coarse-to-fine architectures allow the depth map to be estimated more efficiently and are therefore widely used in learning-based multi-view depth estimation. In such an architecture, however, the discrete and sparse depth hypothesis sampling further aggravates the difficulty of recovering thin structures and object-edge depths. Moreover, existing learning-based multi-view depth estimation methods struggle to strike a good balance between performance and efficiency; constrained by the limited onboard hardware resources of a UAV, they are difficult to deploy in practice on a UAV platform. Therefore, how to accurately recover the depth of detail regions so as to support accurate UAV ranging, and how to achieve a good balance between performance and efficiency, remain key problems to be solved.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform, intended to solve the problems that existing methods have difficulty recovering the depth of thin structures and object-edge regions and difficulty achieving a good balance between performance and efficiency.
According to a first aspect of the present invention, there is provided an edge-preserving multi-view depth estimation method for a drone platform, comprising:
Step 1, given a reference image I_0 and its N−1 neighborhood images {I_i}, i = 1, …, N−1, extract the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the scale, the s-th scale feature has size W_s × H_s × C_s, C_s is the number of channels of the s-th scale feature, and W × H is the size of the original input image;
Step 2, determine the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network;
Step 3, based on the depth map D_1, determine the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network;
Step 4, optimize and upsample the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D_2';
Step 5, based on the depth map D_2' and the image depth features F_i^2 at the 2nd scale, perform the 3rd-stage and 4th-stage depth estimation in sequence to obtain the depth map D_4 estimated at the 4th stage;
Step 6, optimize and upsample the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D_4';
Step 7, based on the optimized depth map D_4' and the image depth features F_i^3 at the 3rd scale, perform the 5th-stage depth estimation to obtain the depth map D_5.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the multi-scale feature extraction network is a two-dimensional U-shaped network composed of an encoder and a decoder connected by skip connections; the encoder and the decoder are composed of several residual blocks.
Optionally, step 2 includes:
Step 201, uniformly sample M_1 depth hypothesis values within the whole scene depth range [d_min, d_max];
Step 202, under each depth hypothesis, project the depth features F_i of the i-th neighborhood view onto the reference view through a differentiable homography, and then construct the two-view cost volume V_i using the group-wise correlation metric;
Step 203, for the i-th two-view cost volume V_i, estimate a visibility map W_i with a shallow 3D CNN, and carry out a summation of all two-view cost volumes weighted by the visibility map W_i of each neighborhood view to obtain the final aggregated cost volume C_1;
Step 204, regularize the cost volume C_1 with a three-dimensional convolutional neural network, obtain a depth probability volume through a Softmax operation, and obtain the depth map D_1 from the depth probability volume using soft-argmax.
Optionally, step 3 includes:
Step 301, according to the depth map D_1, determine the depth hypothesis sampling range R_2 of the second stage and uniformly sample M_2 depth hypothesis values within that depth range;
Step 302, perform two-view cost volume construction and aggregation according to the method of steps 201 to 203, obtaining the aggregated cost volume C_2 from the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values;
Step 303, perform cost volume regularization and depth map prediction according to the method of step 204, obtaining the depth map D_2 from the cost volume C_2.
Optionally, the step 4 includes:
Step 401, extract the multi-scale context features F_c^s of the reference image with a context encoding network, where s = 1, 2, 3 denotes the scale and the s-th scale feature has size W_s × H_s;
Step 402, normalize the depth map D_2 and extract features from the normalized depth map with a shallow 2D CNN;
Step 403, concatenate the extracted depth-map features with the context features F_c^2 of the image and feed them into an edge-preserving residual learning network for residual learning, obtaining a residual map ΔD_2;
Step 404, add the normalized and upsampled depth map to the residual map ΔD_2, and de-normalize the result to obtain the optimized depth map D_2'.
Optionally, the context encoding network in step 401 is a two-dimensional U-shaped network comprising an encoder and a decoder connected by skip connections;
in step 402 the depth map D_2 is normalized as
D̄_2 = (D_2 − μ(D_2)) / σ(D_2)  (1)
where μ(·) and σ(·) denote the mean and variance computations, respectively;
the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections, the encoder and the decoder being composed of several residual blocks;
in step 404, the normalized depth map D̄_2 is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map D̄_2', i.e.
D̄_2' = Up_×2(D̄_2) + ΔD_2  (2)
where Up_×2(·) denotes upsampling to twice the original resolution by bilinear interpolation; the result is de-normalized with the mean and variance of the depth map D_2 to obtain the optimized depth map D_2':
D_2' = σ(D_2) · D̄_2' + μ(D_2)  (3).
Optionally, in the depth estimation of the 3rd, 4th and 5th stages in step 5 and step 7: the depth range is determined according to the method of step 301; the two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203; and cost volume regularization and depth map prediction are performed according to the method of step 204.
Optionally, the step 6 includes:
Step 601, extract the multi-scale context features F_c^s of the reference image with the context encoding network;
Step 602, normalize the depth map D_4 and extract features from the normalized depth map with a shallow 2D CNN;
Step 603, concatenate the extracted depth-map features with the context features F_c^3 of the image and feed them into the edge-preserving residual learning network for residual learning, obtaining a residual map ΔD_4;
Step 604, add the normalized and upsampled depth map to the residual map, and de-normalize the result to obtain the optimized depth map D_4'.
Optionally, the training process of the multi-scale depth feature extraction network includes:
Step 801, supervise the multi-scale depth estimation network with a cross-view photometric consistency loss together with an L1 loss. For a pixel p of the reference image I_0 with depth value d, its corresponding pixel p_i in the i-th source view is
p_i ≃ K_i · (R_{0,i} · (d · K_0^{-1} · p̃) + t_{0,i})  (4)
where K_0 and K_i are the camera parameters of the reference view and of the i-th neighborhood view respectively, R_{0,i} and t_{0,i} are the relative rotation and translation between the reference view and the i-th neighborhood view, and p̃ is the homogeneous coordinate of p. The image Î_i^D synthesized from the i-th neighborhood view onto the reference view based on the depth map D is obtained by differentiable bilinear interpolation, i.e.
Î_i^D(p) = I_i(p_i)  (5)
The binary mask M_i generated during this transformation identifies the invalid pixels of the synthesized image Î_i^D.
The cross-view photometric consistency loss is computed as
L_pc = Σ_{i=1}^{N−1} (1 / |Ω_i|) Σ_{p ∈ Ω_i} | Î_i^gt(p) − Î_i(p) |  (6)
where Î_i^gt and Î_i denote the views synthesized from the i-th neighborhood view using the true depth and the estimated depth respectively, N is the number of views, Ω_i denotes the pixels that are valid both in the synthesized image and in the generated GT depth map, and Ω_gt denotes the valid pixels of the GT depth map;
Step 802, combine the cross-view photometric consistency loss with the L1 loss to obtain the loss of the multi-scale depth estimation branch:
L_mvs = Σ_s λ_s (L_L1^s + L_pc^s)  (7)
where λ_s is the weight coefficient of the loss function at the s-th stage;
Step 803, supervise the hierarchical edge-preserving residual learning branch with an L1 loss; the total loss of the whole network is
L = L_mvs + Σ_s η_s L_refine^s  (8)
where η_s is the weight coefficient of the loss function at the s-th stage.
According to a second aspect of the invention, there is provided a ranging method for an unmanned aerial vehicle platform, comprising: the distance measurement is carried out based on the depth map obtained by the edge preserving multi-view depth estimation method facing the unmanned aerial vehicle platform.
The invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. To estimate detail regions accurately, a hierarchical edge-preserving residual learning module is proposed to correct the errors introduced by bilinear upsampling and to help improve the accuracy of the depth maps estimated by the multi-scale depth estimation network. In addition, to strengthen the gradient flow in detail regions during network training, a cross-view photometric consistency loss is proposed, which further improves the accuracy of the estimated depth. To achieve a better balance between performance and efficiency, a lightweight cascaded multi-view depth estimation network framework is designed and combined with the two strategies above, so that accurate depth estimation can be achieved efficiently, which favours practical application on a UAV platform.
Drawings
Fig. 1 is a schematic diagram of an overall architecture of an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform according to the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
In order to overcome the defects and problems in the background art, a hierarchical edge preserving residual error learning module is proposed to optimize a depth map estimated by a multi-scale depth estimation network, so that the network can perform edge-aware depth map upsampling. In addition, a cross-view photometric consistency loss is proposed to strengthen the gradient flow of the detail region during training, thereby realizing more refined depth estimation. Meanwhile, on the basis, a lightweight multi-view depth estimation cascade network framework is designed, and depth estimation can be efficiently carried out.
Therefore, the invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. Fig. 1 is a schematic diagram of the overall architecture of the edge-preserving multi-view depth estimation and ranging method for the unmanned aerial vehicle platform. As shown in Fig. 1, the edge-preserving multi-view depth estimation method includes:
Step 1, given a reference image I_0 and its N−1 neighborhood images {I_i}, i = 1, …, N−1, extract the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the scale, the s-th scale feature has size W_s × H_s × C_s, C_s is the number of channels of the s-th scale feature, and W × H is the size of the original input image.
Step 2, determine the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network.
Step 3, based on the depth map D_1, determine the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network.
Step 4, in order to perform edge-preserving upsampling, optimize and upsample the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D_2'.
Step 5, based on the depth map D_2' and the image depth features F_i^2 at the 2nd scale, perform the 3rd-stage and 4th-stage depth estimation in sequence to obtain the depth map D_4 estimated at the 4th stage.
Step 6, optimize and upsample the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D_4'.
Step 7, based on the optimized depth map D_4' and the image depth features F_i^3 at the 3rd scale, perform the 5th-stage depth estimation to obtain the final depth map D_5.
In summary, the whole multi-scale depth estimation network branch has five stages in total; the numbers of depth hypothesis samples of the stages are 32, 16, 8 and 8 respectively; the depth sampling range of the 2nd stage is attenuated to half of that of the previous stage, and the range of each remaining stage is attenuated to one quarter of that of the previous stage.
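For illustration only, the following minimal Python sketch shows the depth-range attenuation described above (halved at the 2nd stage, quartered at each later stage); the initial depth range used in the example call is a placeholder, not a value fixed by the patent:

```python
# Illustrative sketch of the per-stage depth-range attenuation described above.
# The concrete numbers in the example call are assumptions, not values from the patent.

def stage_depth_ranges(d_min, d_max, num_stages=5):
    """Return the width of the depth sampling range used at each stage."""
    ranges = [d_max - d_min]                   # stage 1 covers the whole scene depth range
    for s in range(2, num_stages + 1):
        factor = 0.5 if s == 2 else 0.25       # stage 2: half of previous, later stages: a quarter
        ranges.append(ranges[-1] * factor)
    return ranges

print(stage_depth_ranges(d_min=2.0, d_max=100.0))
# [98.0, 49.0, 12.25, 3.0625, 0.765625]
```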
The invention provides an efficient edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform, which aims to solve the technical problems that the depth of a thin structure and an object edge area is difficult to recover and good balance between performance and efficiency is difficult to realize in the conventional method.
Example 1
Embodiment 1 provided by the present invention is an embodiment of the edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform. As can be seen from Fig. 1, the embodiment of the edge-preserving multi-view depth estimation method includes:
Step 1, given a reference image I_0 and its N−1 neighborhood images {I_i}, i = 1, …, N−1, extract the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the scale, the s-th scale feature has size W_s × H_s × C_s, C_s is the number of channels of the s-th scale feature, and W × H is the size of the original input image.
In one possible embodiment, the multi-scale feature extraction network is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections. Furthermore, to enhance the feature representation capability, the encoder and the decoder are composed of several residual blocks.
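As a rough illustration of such an encoder-decoder, the sketch below builds a small residual-block U-Net in PyTorch that returns features at three scales; the channel widths, the downsampling factors and the mapping of the outputs to the scales F^1, F^2, F^3 are assumptions chosen for brevity rather than the patent's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an additive skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
    def forward(self, x):
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

class MultiScaleFeatureNet(nn.Module):
    """Toy 2D U-Net with residual blocks; returns features at 1/4, 1/2 and full resolution."""
    def __init__(self, base=8):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, base, 3, padding=1), ResidualBlock(base))
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), ResidualBlock(base * 2))
        self.enc3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1), ResidualBlock(base * 4))
        self.dec2 = nn.Sequential(nn.Conv2d(base * 4 + base * 2, base * 2, 3, padding=1), ResidualBlock(base * 2))
        self.dec1 = nn.Sequential(nn.Conv2d(base * 2 + base, base, 3, padding=1), ResidualBlock(base))
    def forward(self, img):
        e1 = self.enc1(img)                          # full resolution
        e2 = self.enc2(e1)                           # 1/2 resolution
        e3 = self.enc3(e2)                           # 1/4 resolution -> taken as coarsest feature F^1
        u2 = F.interpolate(e3, scale_factor=2, mode="bilinear", align_corners=False)
        d2 = self.dec2(torch.cat([u2, e2], dim=1))   # 1/2 resolution -> feature F^2 (skip connection)
        u1 = F.interpolate(d2, scale_factor=2, mode="bilinear", align_corners=False)
        d1 = self.dec1(torch.cat([u1, e1], dim=1))   # full resolution -> feature F^3 (skip connection)
        return e3, d2, d1

feats = MultiScaleFeatureNet()(torch.rand(1, 3, 64, 80))
print([f.shape for f in feats])   # [1, 32, 16, 20], [1, 16, 32, 40], [1, 8, 64, 80]
```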
Step 2, determine the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network.
In a possible embodiment, for the 1st stage, step 2 includes:
Step 201, uniformly sample M_1 depth hypothesis values within the whole scene depth range [d_min, d_max].
It can be understood that, for a depth hypothesis d, the depth features F_i of all neighborhood views are projected onto the reference view through a differentiable homography, yielding the warped features F̃_i(d). The differentiable homography is computed as in formula (1):
p' ≃ K_i · (R_{0,i} · (d · K_0^{-1} · p̃) + t_{0,i})  (1)
where K_0 and T_0 denote the camera intrinsic and extrinsic parameters of the reference view, K_i and T_i denote the camera intrinsic and extrinsic parameters of the i-th neighborhood view, [R_{0,i} | t_{0,i}] = T_i · T_0^{-1} is the relative pose between the two views, p̃ is the homogeneous coordinate of a reference-view pixel, and p' is the corresponding sampling location in the i-th view at depth d.
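A minimal sketch of this warping step is given below (PyTorch). It assumes 3 x 3 pinhole intrinsics and a relative pose [R | t] from the reference view to the i-th view, and it treats a single scalar depth hypothesis; the function and variable names are illustrative, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def warp_features(feat_i, K0, Ki, R, t, depth):
    """Warp neighborhood-view features feat_i [B, C, H, W] onto the reference view for one
    fronto-parallel depth hypothesis `depth` (scalar), following p' ~ Ki (R (d K0^-1 p~) + t).
    Illustrative only; assumes points stay in front of the camera."""
    B, C, H, W = feat_i.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(3, -1)    # homogeneous reference pixels [3, H*W]
    cam = torch.inverse(K0) @ pix * depth                      # back-project to the hypothesised depth
    proj = Ki @ (R @ cam + t.reshape(3, 1))                    # project into the i-th view
    uv = proj[:2] / proj[2:].clamp(min=1e-6)                   # dehomogenize
    u = 2.0 * uv[0] / (W - 1) - 1.0                            # normalize to [-1, 1] for grid_sample
    v = 2.0 * uv[1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(1, H, W, 2).expand(B, H, W, 2)
    return F.grid_sample(feat_i, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=True)
```

In a full cost-volume construction this routine would be applied once per depth hypothesis and the results stacked along a depth dimension.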
Step 202, under each depth hypothesis, project the depth features F_i of the i-th neighborhood view onto the reference view through the differentiable homography, and then construct the two-view cost volume V_i using the group-wise correlation metric.
It can be understood that the similarity between the warped depth features of each neighborhood view and the depth features of the reference view is computed with the group-wise correlation metric. Specifically, for the depth features F_0 of the reference image and the warped features F̃_i(d) of the i-th neighborhood view at depth value d, the features are evenly divided into G groups along the feature channel dimension. Then the similarity between the g-th groups of F_0 and F̃_i(d) is computed as:
S_i^g(d) = (G / C) · ⟨ F_0^g, F̃_i^g(d) ⟩  (2)
where S_i^g(d), F_0^g and F̃_i^g(d) denote, respectively, the g-th group similarity and the g-th groups of features of F_0 and F̃_i(d), C is the number of feature channels, and ⟨·,·⟩ is the inner product. After the similarities of all G groups between F_0 and F̃_i(d) have been computed, they form a feature similarity map with G channels. Because there are M_1 depth hypotheses, M_1 such feature similarity maps are obtained between the reference image and the i-th neighborhood view, and together they form the two-view cost volume V_i.
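The group-wise similarity of formula (2) can be sketched as follows (PyTorch); ref_feat and warped_feat are assumed to be [B, C, H, W] tensors with C divisible by the number of groups, and the names are illustrative:

```python
import torch

def group_correlation(ref_feat, warped_feat, groups=8):
    """Group-wise correlation between reference features and warped neighborhood features
    for one depth hypothesis: S^g = (G / C) * <F0^g, Fi^g>."""
    B, C, H, W = ref_feat.shape
    assert C % groups == 0
    f0 = ref_feat.view(B, groups, C // groups, H, W)
    fi = warped_feat.view(B, groups, C // groups, H, W)
    # mean over the C/G channels of each group equals (G / C) * inner product
    return (f0 * fi).mean(dim=2)            # [B, G, H, W]
```

Stacking this G-channel similarity map over all depth hypotheses yields the two-view cost volume V_i.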
Step 203, for the i-th two-view cost volume V_i, estimate a visibility map W_i with a shallow 3D CNN, and carry out a summation of all two-view cost volumes weighted by the visibility map W_i of each neighborhood view to obtain the final aggregated cost volume C_1.
It can be understood that, in order to obtain the visibility map W_i of the i-th neighborhood view under the reference view, visibility estimation is performed on each two-view cost volume with a shallow 3D CNN consisting of a 3D convolution layer, batch normalization, a ReLU activation function, another 3D convolution layer and a Sigmoid activation function. On this basis, the two-view cost volumes are summed, weighted by the visibility map W_i of each neighborhood view, to obtain the final aggregated cost volume C_1, i.e.
C_1 = Σ_{i=1}^{N−1} W_i ⊙ V_i  (3)
where ⊙ denotes element-wise weighting.
Step 204, regularize the cost volume C_1 with a three-dimensional convolutional neural network, obtain a depth probability volume through a Softmax operation, and obtain the depth map D_1 from the depth probability volume using soft-argmax.
It can be understood that the cost volume C_1 is regularized with a three-dimensional convolutional neural network built as a three-dimensional U-shaped network. A depth probability volume is then obtained with a Softmax operation, and the depth map is regressed with soft-argmax, i.e. the final depth map D_1 is obtained as the expectation of the depth hypotheses under the depth probability volume.
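The depth regression at the end of each stage can be sketched as follows (PyTorch); reg_cost stands for the regularized cost volume and depth_values for the sampled depth hypotheses, and both names are illustrative:

```python
import torch

def soft_argmax_depth(reg_cost, depth_values):
    """reg_cost: [B, M, H, W] regularized cost; depth_values: [B, M] or [B, M, H, W].
    Softmax over the depth dimension gives the depth probability volume; the expectation
    over the depth hypotheses is the regressed depth map."""
    prob = torch.softmax(reg_cost, dim=1)          # depth probability volume
    if depth_values.dim() == 2:                    # broadcast per-image hypotheses to all pixels
        depth_values = depth_values[:, :, None, None]
    return (prob * depth_values).sum(dim=1)        # [B, H, W] depth map
```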
Step 3, based on the depth map D_1, determine the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network.
In a possible embodiment, for the 2nd stage, step 3 includes:
Step 301, according to the depth map D_1, determine the depth hypothesis sampling range R_2 of the second stage and uniformly sample M_2 depth hypothesis values within that depth range.
It can be understood that the depth hypothesis sampling range R_2 of this stage is determined from the depth map D_1 estimated at the previous stage, and M_2 depth hypothesis values are uniformly sampled within that range, where the sampling range determined by R_2 is [D_1 − R_2/2, D_1 + R_2/2].
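For illustration, the resampling of depth hypotheses around the previous estimate can be sketched as follows (PyTorch); the names are illustrative, and centring the window on the previous estimate follows the range given above:

```python
import torch

def adaptive_depth_hypotheses(prev_depth, interval, num_samples, d_min, d_max):
    """prev_depth: [B, H, W] estimate from the previous stage; interval: width R of the
    sampling window.  Returns per-pixel hypotheses [B, num_samples, H, W] uniformly
    spaced in [prev - R/2, prev + R/2], clamped to the scene depth range."""
    low = (prev_depth - interval / 2).clamp(min=d_min)
    high = (prev_depth + interval / 2).clamp(max=d_max)
    steps = torch.linspace(0.0, 1.0, num_samples, device=prev_depth.device)
    return low.unsqueeze(1) + steps.view(1, -1, 1, 1) * (high - low).unsqueeze(1)
```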
Step 302, perform two-view cost volume construction and aggregation according to the method of steps 201 to 203, obtaining the aggregated cost volume C_2 from the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values.
It can be understood that, following the two-view cost volume construction and aggregation method of step 2, the aggregated cost volume C_2 is obtained from the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values.
Step 303, perform cost volume regularization and depth map prediction according to the method of step 204, obtaining the depth map D_2 from the cost volume C_2.
It can be understood that, following the cost volume regularization and depth map prediction method of step 2, the depth map D_2 is obtained from the cost volume C_2.
Step 4, optimize and upsample the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D_2'.
In one possible embodiment, step 4 includes:
Step 401, extract the multi-scale context features F_c^s of the reference image with a context encoding network, where s = 1, 2, 3 denotes the scale and the s-th scale feature has size W_s × H_s.
It can be understood that the context encoding network in step 401 has a structure similar to the multi-scale feature extraction network in step 1: it is also a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections.
Step 402, normalize the depth map D_2 and extract features from the normalized depth map D̄_2 with a shallow 2D CNN.
It can be understood that in step 402 the depth map D_2 is normalized as:
D̄_2 = (D_2 − μ(D_2)) / σ(D_2)  (4)
where μ(·) and σ(·) denote the mean and variance computations, respectively.
Step 403, concatenate the extracted depth-map features with the context features F_c^2 of the image and feed them into an edge-preserving residual learning network for residual learning, obtaining a residual map ΔD_2.
It can be understood that the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections; the encoder and the decoder are composed of several residual blocks to enhance the feature representation capability.
Step 404, add the normalized and upsampled depth map to the residual map ΔD_2, and de-normalize the result to obtain the optimized depth map D_2'.
It can be understood that, in step 404, the normalized depth map D̄_2 is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map D̄_2', i.e.
D̄_2' = Up_×2(D̄_2) + ΔD_2  (5)
where Up_×2(·) denotes upsampling to twice the original resolution by bilinear interpolation. On this basis, the result is de-normalized with the mean and variance of the depth map D_2 to obtain the optimized depth map D_2':
D_2' = σ(D_2) · D̄_2' + μ(D_2)  (6)
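Putting steps 401 to 404 together, a minimal sketch of this optimize-and-upsample path is given below (PyTorch). Here depth_encoder and residual_net stand in for the shallow 2D CNN and the 2D U-shaped residual learning network, which are not spelled out; the use of the standard deviation for normalization and the point at which the depth-map features are upsampled to the finer resolution before concatenation are assumptions of the sketch:

```python
import torch
import torch.nn.functional as F

def edge_preserving_refine(depth, context_feat, depth_encoder, residual_net):
    """depth: [B, 1, H, W] stage output; context_feat: [B, Cc, 2H, 2W] reference-image
    context features at the finer scale.  depth_encoder maps the normalized depth map to
    features; residual_net must return a single-channel residual at 2H x 2W.
    Returns the refined depth map at 2H x 2W."""
    mu = depth.mean(dim=(2, 3), keepdim=True)
    sigma = depth.std(dim=(2, 3), keepdim=True) + 1e-6
    d_norm = (depth - mu) / sigma                                    # normalize (eq. 4)
    d_feat = depth_encoder(d_norm)                                   # shallow 2D CNN features
    d_feat_up = F.interpolate(d_feat, scale_factor=2, mode="bilinear", align_corners=False)
    residual = residual_net(torch.cat([d_feat_up, context_feat], dim=1))   # edge-aware residual
    d_up = F.interpolate(d_norm, scale_factor=2, mode="bilinear", align_corners=False)
    d_refined_norm = d_up + residual                                 # upsample + residual (eq. 5)
    return d_refined_norm * sigma + mu                               # de-normalize (eq. 6)
```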
Step 5, based on the depth map D_2' and the image depth features F_i^2 at the 2nd scale, perform the 3rd-stage and 4th-stage depth estimation in sequence to obtain the depth map D_4 estimated at the 4th stage.
Step 6, optimize and upsample the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D_4'.
In a possible embodiment, the method of step 6 is similar to that of step 4, and may specifically include:
Step 601, extract the multi-scale context features F_c^s of the reference image with the context encoding network.
Step 602, normalize the depth map D_4 and extract features from the normalized depth map with a shallow 2D CNN.
Step 603, concatenate the extracted depth-map features with the context features F_c^3 of the image and feed them into the edge-preserving residual learning network for residual learning, obtaining a residual map ΔD_4.
Step 604, add the normalized and upsampled depth map to the residual map, and de-normalize the result to obtain the optimized depth map D_4'.
Step 7, based on the optimized depth map D_4' and the image depth features F_i^3 at the 3rd scale, perform the 5th-stage depth estimation to obtain the depth map D_5.
In a possible embodiment, in the depth estimation of the 3rd, 4th and 5th stages in steps 5 and 7: the depth range is determined according to the method of step 301; the two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203; and cost volume regularization and depth map prediction are performed according to the method of step 204.
In a possible embodiment, the training process of the multi-scale depth feature extraction network includes the following steps:
Step 801, supervise the multi-scale depth estimation network with a cross-view photometric consistency loss together with an L1 loss. The core idea of cross-view photometric consistency is to convert, via depth-based view synthesis, the difference between the true depth values and the predicted depth values into the difference between the image synthesized from the true depth and the image synthesized from the predicted depth, thereby enlarging the gradient flow in detail regions. For a pixel p of the reference image I_0 with depth value d, its corresponding pixel p_i in the i-th source view is:
p_i ≃ K_i · (R_{0,i} · (d · K_0^{-1} · p̃) + t_{0,i})  (7)
where K_0 and K_i are the camera parameters of the reference view and of the i-th neighborhood view respectively, R_{0,i} and t_{0,i} are the relative rotation and translation between the reference view and the i-th neighborhood view, and p̃ is the homogeneous coordinate of p. Through this transformation, the image Î_i^D synthesized from the i-th neighborhood view onto the reference view based on the depth map D is obtained by differentiable bilinear interpolation, i.e.
Î_i^D(p) = I_i(p_i)  (8)
During the transformation a binary mask M_i is generated, which identifies the invalid pixels of the synthesized image Î_i^D, i.e. the pixels projected outside the image area.
The cross-view photometric consistency loss is computed as:
L_pc = Σ_{i=1}^{N−1} (1 / |Ω_i|) Σ_{p ∈ Ω_i} | Î_i^gt(p) − Î_i(p) |  (9)
where Î_i^gt and Î_i denote the views synthesized from the i-th neighborhood view using the true depth and the estimated depth respectively, N is the number of views, Ω_i denotes the pixels that are valid both in the synthesized image and in the generated GT depth map, and Ω_gt denotes the valid pixels of the GT depth map.
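For illustration, once the images synthesized from the ground-truth depth and from the predicted depth, together with their validity masks, have been computed (for example with a warping routine like the one sketched for step 202), the loss itself reduces to a masked L1 difference; the sketch below (PyTorch) assumes exactly that, with illustrative names:

```python
import torch

def cross_view_photometric_loss(synth_gt, synth_pred, valid_masks):
    """synth_gt / synth_pred: lists of [B, 3, H, W] images synthesized from each neighborhood
    view using the ground-truth depth and the estimated depth; valid_masks: matching
    [B, 1, H, W] binary masks of pixels valid in both the synthesized image and the GT depth map."""
    loss = 0.0
    for gt, pred, mask in zip(synth_gt, synth_pred, valid_masks):
        diff = (gt - pred).abs() * mask
        # average the absolute difference over valid pixels (and color channels), then sum over views
        loss = loss + diff.sum() / (mask.sum() * gt.shape[1] + 1e-6)
    return loss
```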
Step 802, combine the cross-view photometric consistency loss with the L1 loss to obtain the loss of the multi-scale depth estimation branch:
L_mvs = Σ_s λ_s (L_L1^s + L_pc^s)  (10)
where λ_s is the weight coefficient of the loss function at the s-th stage; the weight coefficients of the loss functions of the 1st to 5th stages may be set to 0.5, 1, and 2, respectively.
Step 803, supervise the hierarchical edge-preserving residual learning branch with an L1 loss; the total loss of the whole network is:
L = L_mvs + Σ_s η_s L_refine^s  (11)
where η_s is the weight coefficient of the loss function at the s-th stage; the weight coefficients of the loss functions of the 2nd and 4th stages may be set to 1 and 2, respectively.
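A short sketch of the overall supervision is given below (plain Python); the per-stage weights are passed in as arguments rather than fixed, since the text quotes only some of them:

```python
def total_loss(stage_l1, stage_pc, refine_l1, stage_weights, refine_weights):
    """stage_l1 / stage_pc: per-stage L1 and photometric-consistency losses of the multi-scale
    depth estimation branch; refine_l1: L1 losses of the hierarchical edge-preserving residual
    learning branch (stages 2 and 4).  stage_weights and refine_weights correspond to the
    lambda_s and eta_s coefficients in the text."""
    l_depth = sum(w * (l1 + pc) for w, l1, pc in zip(stage_weights, stage_l1, stage_pc))
    l_refine = sum(w * l1 for w, l1 in zip(refine_weights, refine_l1))
    return l_depth + l_refine
```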
Example 2
Embodiment 2 provided by the present invention is an embodiment of the ranging method for an unmanned aerial vehicle platform provided by the present invention. As can be seen from Fig. 1, the embodiment of the ranging method includes: ranging is performed based on the depth map obtained by the edge-preserving multi-view depth estimation method for the unmanned aerial vehicle platform.
It can be understood that the ranging method for the unmanned aerial vehicle platform provided by the present invention corresponds to the edge-preserving multi-view depth estimation method for the unmanned aerial vehicle platform provided by the foregoing embodiments; for the relevant technical features of the ranging method, reference may be made to the relevant technical features of the edge-preserving multi-view depth estimation method, which are not repeated here.
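One common way to turn the estimated depth map into a metric distance for ranging is sketched below; this is a hedged illustration rather than a procedure spelled out in the patent, and it assumes a pinhole camera with known 3 x 3 intrinsics and a depth map that stores depth along the optical axis:

```python
import torch

def pixel_distance(depth_map, K, u, v):
    """Euclidean distance from the camera center to the 3D point seen at pixel (u, v),
    given depth along the optical axis.  depth_map: [H, W] tensor; K: 3x3 intrinsics."""
    ray = torch.inverse(K) @ torch.tensor([float(u), float(v), 1.0])   # back-projected pixel ray
    return (depth_map[v, u] * ray).norm()     # scale the ray by the depth and take its length
```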
The edge-preserving multi-view depth estimation and ranging method for the unmanned aerial vehicle platform brings clear gains in both depth estimation quality and efficiency, and these gains come mainly from three aspects: first, the hierarchical edge-preserving residual learning module corrects the errors introduced by bilinear upsampling and refines the depth maps estimated by the multi-scale depth estimation network, yielding depth maps with preserved edge details; second, the cross-view photometric consistency loss strengthens the gradient flow in detail regions during training, which further improves the accuracy of depth estimation; third, on this basis, a lightweight cascaded multi-view depth estimation network framework is designed in which stages are stacked at the same resolution, so that as many depth hypotheses as possible can be sampled without adding much extra GPU memory or runtime, allowing accurate depth estimation to be achieved efficiently and making the multi-view depth estimation network practical to deploy on a UAV platform.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An edge-preserving multi-view depth estimation method facing an unmanned aerial vehicle platform, characterized by comprising the following steps:
step 1, given a reference image I_0 and its N−1 neighborhood images {I_i}, i = 1, …, N−1, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, wherein s = 1, 2, 3 denotes the scale, the s-th scale feature has size W_s × H_s × C_s, C_s is the number of channels of the s-th scale feature, and W × H is the size of the original input image;
step 2, determining the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network;
step 3, based on the depth map D_1, determining the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network;
step 4, optimizing and upsampling the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D_2';
step 5, based on the depth map D_2' and the image depth features F_i^2 at the 2nd scale, sequentially performing the 3rd-stage and 4th-stage depth estimation to obtain the depth map D_4 estimated at the 4th stage;
step 6, optimizing and upsampling the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D_4';
step 7, based on the optimized depth map D_4' and the image depth features F_i^3 at the 3rd scale, performing the 5th-stage depth estimation to obtain the depth map D_5.
2. The edge-preserving multi-view depth estimation method of claim 1, wherein the multi-scale feature extraction network is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections; the encoder and the decoder are composed of several residual blocks.
3. The edge-preserving multi-view depth estimation method according to claim 1, wherein the step 2 comprises:
step 201, uniformly sampling M_1 depth hypothesis values within the whole scene depth range [d_min, d_max];
step 202, under each depth hypothesis, projecting the depth features F_i of the i-th neighborhood view onto the reference view through a differentiable homography, and then constructing the two-view cost volume V_i using the group-wise correlation metric;
step 203, for the i-th two-view cost volume V_i, estimating a visibility map W_i with a shallow 3D CNN, and carrying out a summation of all two-view cost volumes weighted by the visibility map W_i of each neighborhood view to obtain the final aggregated cost volume C_1;
step 204, regularizing the cost volume C_1 with a three-dimensional convolutional neural network, obtaining a depth probability volume through a Softmax operation, and obtaining the depth map D_1 from the depth probability volume using soft-argmax.
4. The edge-preserving multi-view depth estimation method according to claim 3, wherein the step 3 comprises:
step 301, according to the depth map D_1, determining the depth hypothesis sampling range R_2 of the second stage and uniformly sampling M_2 depth hypothesis values within that depth range;
step 302, performing two-view cost volume construction and aggregation according to the method of steps 201 to 203, and obtaining the aggregated cost volume C_2 from the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values;
step 303, performing cost volume regularization and depth map prediction according to the method of step 204, and obtaining the depth map D_2 from the cost volume C_2.
5. The edge-preserving multi-view depth estimation method according to claim 1, wherein the step 4 comprises:
step 401, extracting the multi-scale context features F_c^s of the reference image with a context encoding network, wherein s = 1, 2, 3 denotes the scale and the s-th scale feature has size W_s × H_s;
step 402, normalizing the depth map D_2 and extracting features from the normalized depth map D̄_2 with a shallow 2D CNN;
step 403, concatenating the extracted depth-map features with the context features F_c^2 of the image and feeding them into an edge-preserving residual learning network for residual learning to obtain a residual map ΔD_2;
step 404, adding the normalized and upsampled depth map to the residual map ΔD_2, and de-normalizing the result to obtain the optimized depth map D_2'.
6. The edge-preserving multi-view depth estimation method of claim 5, wherein
the context encoding network in step 401 is a two-dimensional U-shaped network comprising an encoder and a decoder connected by skip connections;
in step 402 the depth map D_2 is normalized as:
D̄_2 = (D_2 − μ(D_2)) / σ(D_2)  (1)
wherein μ(·) and σ(·) denote the mean and variance computations, respectively;
the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of an encoder and a decoder connected by skip connections; the encoder and the decoder are composed of several residual blocks;
in step 404, the normalized depth map D̄_2 is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map D̄_2', i.e.
D̄_2' = Up_×2(D̄_2) + ΔD_2  (2)
wherein Up_×2(·) denotes upsampling to twice the original resolution by bilinear interpolation;
the result is de-normalized with the mean and variance of the depth map D_2 to obtain the optimized depth map D_2':
D_2' = σ(D_2) · D̄_2' + μ(D_2)  (3).
7. The edge-preserving multi-view depth estimation method of claim 5, wherein, in the depth estimation of the 3rd, 4th and 5th stages in step 5 and step 7: the depth range is determined according to the method of step 301; the two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203; and cost volume regularization and depth map prediction are performed according to the method of step 204.
8. The edge-preserving multi-view depth estimation method according to claim 5, wherein the step 6 comprises:
step 601, extracting the multi-scale context features F_c^s of the reference image with the context encoding network;
step 602, normalizing the depth map D_4 and extracting features from the normalized depth map with a shallow 2D CNN;
step 603, concatenating the extracted depth-map features with the context features F_c^3 of the image and feeding them into the edge-preserving residual learning network for residual learning to obtain a residual map ΔD_4;
step 604, adding the normalized and upsampled depth map to the residual map, and de-normalizing the result to obtain the optimized depth map D_4'.
9. The edge-preserving multi-view depth estimation method according to claim 1, wherein the training process of the multi-scale depth feature extraction network comprises:
step 801, supervising the multi-scale depth estimation network with a cross-view photometric consistency loss together with an L1 loss, wherein, for a pixel p of the reference image I_0 with depth value d, its corresponding pixel p_i in the i-th source view is
p_i ≃ K_i · (R_{0,i} · (d · K_0^{-1} · p̃) + t_{0,i})  (4)
wherein K_0 and K_i are the camera parameters of the reference view and of the i-th neighborhood view respectively, R_{0,i} and t_{0,i} are the relative rotation and translation between the reference view and the i-th neighborhood view, and p̃ is the homogeneous coordinate of p; the image Î_i^D synthesized from the i-th neighborhood view onto the reference view based on the depth map D is obtained by differentiable bilinear interpolation, i.e.
Î_i^D(p) = I_i(p_i)  (5)
the binary mask M_i generated in the transformation is used to identify the invalid pixels of the synthesized image Î_i^D;
the cross-view photometric consistency loss is computed as:
L_pc = Σ_{i=1}^{N−1} (1 / |Ω_i|) Σ_{p ∈ Ω_i} | Î_i^gt(p) − Î_i(p) |  (6)
wherein Î_i^gt and Î_i denote the views synthesized from the i-th neighborhood view using the true depth and the estimated depth respectively, N denotes the number of views, Ω_i denotes the pixels that are valid both in the synthesized image and in the generated GT depth map, and Ω_gt denotes the valid pixels of the GT depth map;
step 802, combining the cross-view photometric consistency loss with the L1 loss to obtain the loss of the multi-scale depth estimation branch:
L_mvs = Σ_s λ_s (L_L1^s + L_pc^s)  (7)
wherein λ_s is the weight coefficient of the loss function at the s-th stage;
step 803, supervising the hierarchical edge-preserving residual learning branch with an L1 loss, the total loss of the whole network being:
L = L_mvs + Σ_s η_s L_refine^s  (8)
wherein η_s is the weight coefficient of the loss function at the s-th stage.
10. A distance measurement method facing an unmanned aerial vehicle platform, characterized by comprising the following steps: ranging is performed based on the depth map obtained by the edge-preserving multi-view depth estimation method facing an unmanned aerial vehicle platform of any one of claims 1-9.
CN202211408484.4A | Priority and filing date: 2022-11-10 | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform | Active | Granted as CN115457101B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211408484.4A (granted as CN115457101B) | 2022-11-10 | 2022-11-10 | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211408484.4A (granted as CN115457101B) | 2022-11-10 | 2022-11-10 | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Publications (2)

Publication Number | Publication Date
CN115457101A | 2022-12-09
CN115457101B (en) | 2023-03-24

Family

ID=84295585

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211408484.4A (Active, granted as CN115457101B) | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform | 2022-11-10 | 2022-11-10

Country Status (1)

Country | Link
CN (1) | CN115457101B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108765333A (en) * | 2018-05-24 | 2018-11-06 | South China University of Technology | A kind of depth map improving method based on depth convolutional neural networks
CN110310317A (en) * | 2019-06-28 | 2019-10-08 | Northwestern Polytechnical University | A method for depth estimation of monocular vision scene based on deep learning
WO2021098554A1 (en) * | 2019-11-20 | 2021-05-27 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Feature extraction method and apparatus, device, and storage medium
CN111462329A (en) * | 2020-03-24 | 2020-07-28 | Nanjing University of Aeronautics and Astronautics | A 3D reconstruction method of UAV aerial images based on deep learning
US20210319577A1 (en) * | 2020-04-14 | 2021-10-14 | Toyota Research Institute, Inc. | Depth estimation based on ego-motion estimation and residual flow estimation
CN112001960A (en) * | 2020-08-25 | 2020-11-27 | Unit 91550 of the Chinese People's Liberation Army | Monocular image depth estimation method based on multi-scale residual error pyramid attention network model
CN113962858A (en) * | 2021-10-22 | 2022-01-21 | Shenyang University of Technology | A multi-view depth acquisition method
CN115131418A (en) * | 2022-06-08 | 2022-09-30 | China University of Petroleum (East China) | A Transformer-based Monocular Depth Estimation Algorithm
CN114820755A (en) * | 2022-06-24 | 2022-07-29 | Wuhan Tuke Intelligent Technology Co., Ltd. | Depth map estimation method and system
CN115082540A (en) * | 2022-07-25 | 2022-09-20 | Wuhan Tuke Intelligent Technology Co., Ltd. | Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform
CN115272438A (en) * | 2022-08-19 | 2022-11-01 | China University of Mining and Technology | A high-precision monocular depth estimation system and method for 3D scene reconstruction
CN115170746A (en) * | 2022-09-07 | 2022-10-11 | Central South University | Multi-view three-dimensional reconstruction method, system and equipment based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MADHUANAND L et al.: "Deep learning for monocular depth estimation from UAV images", ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences *
WANJUAN SU et al.: "Uncertainty Guided Multi-View Stereo Network for Depth Estimation", IEEE Transactions on Circuits and Systems for Video Technology *

Also Published As

Publication number | Publication date
CN115457101B (en) | 2023-03-24

Similar Documents

Publication | Title
CN113160294B (en) | Image scene depth estimation method, device, terminal device and storage medium
CN111696148A (en) | End-to-end stereo matching method based on convolutional neural network
CN113221925B (en) | Target detection method and device based on multi-scale image
CN110782490A (en) | A video depth map estimation method and device with spatiotemporal consistency
CN110443883B (en) | Plane three-dimensional reconstruction method for single color picture based on droplock
CN113344869A (en) | Driving environment real-time stereo matching method and device based on candidate parallax
US8416989B2 (en) | Image processing apparatus, image capture apparatus, image processing method, and program
CN104166987B (en) | Parallax estimation method based on improved adaptive weighted summation and belief propagation
CN116486288A (en) | Aerial Target Counting and Detection Method Based on Lightweight Density Estimation Network
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning
CN114742875B (en) | Binocular stereo matching method based on multi-scale feature extraction and adaptive aggregation
CN117726747A (en) | Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
CN114596349A (en) | Depth estimation method, depth estimation device, electronic equipment and computer-readable storage medium
CN118037989A (en) | Multi-view nerve implicit surface reconstruction method based on priori driving
CN112115786A (en) | Monocular vision odometer method based on attention U-net
CN119006678A (en) | Three-dimensional Gaussian sputtering optimization method for pose-free input
CN114820755B (en) | Depth map estimation method and system
CN118485783A (en) | Multi-view 3D reconstruction method and system based on visual center and implicit attention
CN111179327B (en) | Depth map calculation method
WO2025138753A1 (en) | Three-dimensional modeling method and apparatus
CN115457101B (en) | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform
CN117456185A (en) | Remote sensing image segmentation method based on adaptive pattern matching and nested modeling
Zhu et al. | Hybrid Cost Volume Regularization for Memory-efficient Multi-view Stereo Networks
CN113066165A (en) | Three-dimensional reconstruction method and device for multi-stage unsupervised learning and electronic equipment
CN117115145B (en) | Detection method and device, electronic equipment and computer readable medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
CP03: Change of name, title or address
Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province
Patentee after: Hangzhou Tuke Intelligent Information Technology Co., Ltd.
Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone)
Patentee before: Wuhan Tuke Intelligent Technology Co., Ltd.
