Summary of the invention
The present invention provides a novel, efficient residual module that combines residual connections, depthwise separable convolution, and factorized convolution — the separable factorized residual module. This residual module allows a network to maintain a low parameter count and computational cost even at large depth, so that the network retains high computational efficiency. Based on this separable factorized residual module, the present invention also designs a semantic segmentation network structure that makes full use of the module's learning ability and learning efficiency, achieving fast, real-time semantic segmentation. The technical solution is as follows:
A separable factorized residual module design method for fast semantic segmentation, comprising the following steps:
(1) Combine residual connections with depthwise separable convolution to design a separable residual module: depthwise separable convolution factorizes a standard 3D convolution kernel into a per-channel 2D convolution kernel and a cross-channel 1 × 1 convolution kernel. Replacing the 3D convolutions in a residual module with depthwise separable convolutions yields the separable residual module.
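The parameter savings behind step (1) can be illustrated with a short sketch; the kernel size and channel counts below are example values for illustration, not figures taken from the patent:

```python
def params_standard(k, m, n):
    # standard convolution: one k x k kernel spanning all m input channels,
    # for each of the n output channels
    return k * k * m * n

def params_separable(k, m, n):
    # depthwise separable: one k x k kernel per input channel (per-channel part),
    # plus a cross-channel 1 x 1 kernel mapping m channels to n channels
    return k * k * m + m * n

# example: 3 x 3 kernel, 64 input channels, 64 output channels
print(params_standard(3, 64, 64))   # 36864
print(params_separable(3, 64, 64))  # 4672
```

The separable form needs roughly an order of magnitude fewer weights at these example sizes, which is the efficiency gain the module exploits.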
(2) On the basis of the separable residual module, introduce factorized convolution to design an even more efficient residual module — the separable factorized residual module: factorized convolution decomposes a 3D convolution kernel into two consecutive orthogonal one-dimensional kernels. The per-channel 3 × 3 convolutions in the separable residual module become a per-channel 3 × 1 convolution followed by a per-channel 1 × 3 convolution, yielding the separable factorized residual module.
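The further savings of step (2) — replacing each per-channel 3 × 3 kernel with a 3 × 1 kernel followed by a 1 × 3 kernel — can be sketched the same way (the channel count is again an example value):

```python
def depthwise_params_full(k, m):
    # per-channel k x k kernels: k*k weights for each of the m channels
    return k * k * m

def depthwise_params_factorized(k, m):
    # per-channel k x 1 followed by 1 x k: 2*k weights for each channel
    return 2 * k * m

# example: 3 x 3 per-channel kernels over 64 channels
print(depthwise_params_full(3, 64))        # 576
print(depthwise_params_factorized(3, 64))  # 384
```

The saving grows with kernel size k, since k*k weights per channel become 2*k.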
(3) Based on the designed separable factorized residual module, design an efficient semantic segmentation network structure that makes full use of the module's learning ability and learning efficiency: the network uses an encoder–decoder architecture composed of consecutive encoders and consecutive decoders. The encoder consists of sequentially connected separable factorized residual modules and downsampling modules; it is responsible for extracting features and generating downsampled feature maps. The decoder consists of sequentially connected separable factorized residual modules and upsampling modules; it is responsible for further extracting features and upsampling the features output by the encoder, finally generating a pixel-level class probability prediction of the same size as the input image.
(4) Determine the data sets used for network training and testing; input the training and test images, extract features with the encoder, generate segmentation results with the decoder, and evaluate the segmentation efficiency of the network by the processing time of a single image.
Compared with the residual modules widely used in current convolutional networks, the separable factorized residual module proposed by the present invention — combining residual connections, depthwise separable convolution, and factorized convolution — largely resolves the computational-efficiency limitation of the conventional residual module. Designing a semantic segmentation network with the proposed separable factorized residual module allows the network to maintain a low parameter count and computational cost while keeping a large depth. Compared with current accuracy-oriented semantic segmentation algorithms, the semantic segmentation network proposed by the present invention greatly improves computational efficiency while maintaining accuracy.
Specific embodiment
The technical problem to be solved by this patent is to design a novel, efficient residual module that overcomes the low computational efficiency of the conventional residual module, and to use this efficient residual module to design an efficient semantic segmentation network structure that achieves fast, real-time semantic segmentation. The key of this method is how to design an efficient residual module that guarantees high accuracy while keeping the parameter count and computational cost low.
Step 1: Combine residual connections with depthwise separable convolution to design an efficient residual module — the separable residual module. Residual connections allow a network to avoid gradient vanishing and degradation at large depth, but the computational cost of the conventional residual module is an important factor limiting network efficiency. Depthwise separable convolution is a common technique for network model compression: it factorizes a standard 3D convolution kernel into a per-channel 2D convolution kernel and a cross-channel 1 × 1 convolution kernel. Replacing the standard 3D convolutions in the conventional residual module with depthwise separable convolutions greatly reduces the parameter count and computational cost of the network and improves network efficiency.
Step 2: To further improve network efficiency, introduce factorized convolution on the basis of the separable residual module to design an even more efficient residual module — the separable factorized residual module. Factorized convolution is also a common technique for network model compression: it decomposes a standard 3D convolution kernel into two consecutive orthogonal one-dimensional kernels. Turning the per-channel 3 × 3 convolutions in the separable residual module into a per-channel 3 × 1 convolution followed by a per-channel 1 × 3 convolution further reduces the network's parameter count and computational cost and improves network efficiency.
Step 3: Based on the designed separable factorized residual module, design an efficient semantic segmentation network structure that makes full use of the module's learning ability and learning efficiency. The network uses an encoder–decoder architecture composed of consecutive encoders and consecutive decoders. The encoder consists of separable factorized residual modules and downsampling modules; it extracts features and generates downsampled feature maps. The decoder consists of separable factorized residual modules and upsampling modules; it further extracts features and upsamples the features output by the encoder, finally generating a pixel-level class probability prediction of the same size as the input image.
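The encoder–decoder symmetry can be sketched as a resolution trace. The assumptions here — three downsampling stages, each halving the resolution, mirrored by three doubling upsampling stages — are illustrative and not specified by the patent:

```python
def encode_decode_sizes(h, w, stages=3):
    # assumed: each downsampling module halves the spatial resolution and
    # each upsampling module doubles it (illustrative stage count)
    sizes = [(h, w)]
    for _ in range(stages):       # encoder: downsample
        h, w = h // 2, w // 2
        sizes.append((h, w))
    for _ in range(stages):       # decoder: upsample back
        h, w = h * 2, w * 2
        sizes.append((h, w))
    return sizes

sizes = encode_decode_sizes(1024, 512)
print(sizes[0], sizes[-1])  # (1024, 512) (1024, 512)
```

The trace confirms the defining property of the architecture: the decoder output matches the input resolution, so a per-pixel class prediction can be produced.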
Step 4: Determine the data sets used for network training and testing; this patent uses the Cityscapes road-scene data set. Input the training and test images, extract features with the encoder, and generate segmentation results with the decoder. Evaluate the segmentation efficiency of the network by the processing time of a single image.
The present invention is further described below with reference to embodiments.
There are two common designs for the conventional residual module: the non-bottleneck residual module shown in Fig. 1(a) and the bottleneck residual module shown in Fig. 1(b). When the network has few layers, the two designs are almost identical in parameter count and accuracy. As the number of layers increases, however, the bottleneck residual module requires less additional computation, while the non-bottleneck residual module gains more accuracy. To design a residual module with higher computational efficiency while maintaining accuracy, the parameter count and computational cost must be reduced while retaining the accuracy advantage of the non-bottleneck residual module. To this end, residual connections are combined with depthwise separable convolution to design a novel residual module, shown in Fig. 1(c), which to some extent solves the low computational efficiency of the conventional residual module. To further relieve this computational-efficiency limitation and improve the efficiency of the residual module, this patent additionally combines factorized convolution on top of residual connections and depthwise separable convolution, designing an even more efficient residual module, referred to as the separable factorized residual module, shown in Fig. 1(d).
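The parameter trade-off between the two conventional designs can be sketched numerically. The structures assumed below (two 3 × 3 convolutions for the non-bottleneck form; 1 × 1 reduce, 3 × 3, 1 × 1 expand with an assumed reduction factor of 4 for the bottleneck form) follow common practice and are not prescribed by the patent:

```python
def params_non_bottleneck(c):
    # two 3 x 3 convolutions at full width c (Fig. 1(a)-style, illustrative)
    return 2 * (3 * 3 * c * c)

def params_bottleneck(c, r=4):
    # 1 x 1 reduce to c/r, 3 x 3 at c/r, 1 x 1 expand back to c
    # (Fig. 1(b)-style; reduction factor r=4 is an assumed typical value)
    m = c // r
    return c * m + 3 * 3 * m * m + m * c

print(params_non_bottleneck(64))  # 73728
print(params_bottleneck(64))      # 4352
```

This is the tension the patent addresses: the bottleneck form is far cheaper per module, but the non-bottleneck form scales better in accuracy, so the proposed module reduces cost while keeping the non-bottleneck structure.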
Depthwise separable convolution and factorized convolution are both forms of network model compression; in essence they are sparse representations with less information redundancy. Replacing standard 3D convolution with depthwise separable convolution plus factorized convolution reduces the redundancy of the convolution kernels and greatly reduces the network's parameter count and computational cost, enabling the network to be deployed on mobile platforms. A depthwise separable convolution kernel factorizes a standard 3D convolution kernel into a per-channel 2D convolution kernel and a cross-channel 1 × 1 convolution kernel. A factorized convolution kernel decomposes a standard 3D convolution kernel into two consecutive orthogonal one-dimensional kernels. As shown in Fig. 2, Fig. 2(a) represents a standard 3D convolution kernel, Fig. 2(b) a depthwise separable convolution kernel, Fig. 2(c) a factorized convolution kernel, and Fig. 2(d) a separable factorized convolution kernel combining depthwise separable convolution and factorized convolution.
As shown in Fig. 2(a), the computational cost R1 of standard 3D convolution at a given layer is:
R1 = DK × DK × M × N × DF × DF (1),
where DK × DK is the convolution kernel size, M is the number of input channels, N is the number of output channels, and DF × DF is the input resolution.
As shown in Fig. 2(b), the computational cost of depthwise separable convolution at a given layer consists of two parts: first, the cost of the per-channel 2D convolution kernels, shown in the upper half of Fig. 2(b); second, the cost of the cross-channel 1 × 1 convolution kernels, shown in the lower half of Fig. 2(b). Each 2D kernel processes only one channel, and the number of such kernels equals the number of input channels. The cross-channel 1 × 1 kernels process the feature map across channels and set the number of output channels to the specified value. The computational cost R2 of depthwise separable convolution is:
R2 = DK × DK × M × DF × DF + M × N × DF × DF (2),
where DK × DK × M × DF × DF is the cost of the per-channel 2D convolution kernels and M × N × DF × DF is the cost of the cross-channel 1 × 1 convolution kernels.
As shown in Fig. 2(c), the computational cost R3 of factorized convolution at a given layer is:
R3 = (DK + DK) × M × N × DF × DF (3).
As shown in Fig. 2(d), the computational cost R4 at a given layer of the separable factorized convolution combining depthwise separable convolution and factorized convolution is:
R4 = (DK + DK) × M × DF × DF + M × N × DF × DF (4).
Therefore, relative to processing a feature map with standard 3D convolution, the ratio R of the computational cost of depthwise separable convolution plus factorized convolution is:
R = R4 / R1 = ((DK + DK) × M × DF × DF + M × N × DF × DF) / (DK × DK × M × N × DF × DF) = 2 / (DK × N) + 1 / (DK × DK) (5).
Suppose the input is a feature map of 64 channels and size 512 × 256, the output is a feature map of 128 channels and size 256 × 128, and the kernel size is 5 × 5. Then the computational cost of depthwise separable convolution plus factorized convolution is only 4.3% of that of standard 3D convolution. It can be seen that replacing standard 3D convolution with depthwise separable convolution plus factorized convolution substantially reduces the computational cost. Moreover, formula (5) shows that the larger the kernel size and the number of output channels, the greater the savings in parameter count and computational cost obtained by replacing standard 3D convolution with depthwise separable convolution plus factorized convolution.
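The 4.3% figure can be reproduced directly from the cost formulas. Since the formulas use a single resolution DF × DF, the sketch below applies the input resolution to both terms, which is an assumption of the illustration:

```python
def cost_standard(dk, m, n, df_h, df_w):
    # formula (1): standard 3D convolution cost
    return dk * dk * m * n * df_h * df_w

def cost_separable_factorized(dk, m, n, df_h, df_w):
    # formula (4): per-channel factorized part + cross-channel 1x1 part
    return (dk + dk) * m * df_h * df_w + m * n * df_h * df_w

dk, m, n = 5, 64, 128        # 5 x 5 kernel, 64 -> 128 channels
df_h, df_w = 512, 256        # input resolution from the example
ratio = (cost_separable_factorized(dk, m, n, df_h, df_w)
         / cost_standard(dk, m, n, df_h, df_w))
print(round(ratio, 3))  # 0.043
```

The ratio also equals the closed form 2/(DK × N) + 1/(DK × DK): the resolution and input-channel terms cancel, which is why the saving depends only on kernel size and output-channel count.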
The separable factorized residual module combines residual connections, depthwise separable convolution, and factorized convolution, replacing each 3 × 3 standard convolution with a per-channel 3 × 1 convolution and a per-channel 1 × 3 convolution, which keeps the parameter count low even when the network is deep. In addition, the separable factorized residual module inserts a cross-channel 1 × 1 convolution between the two one-dimensional per-channel convolutions to fuse information across channels, ensuring that more effective features can be extracted. Furthermore, a nonlinear activation function is added after the 1 × 1 cross-channel convolution, allowing the network to fit more complex functions. As a result, the module achieves improved computational efficiency while maintaining accuracy.
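A minimal NumPy sketch of one such block, under stated assumptions: the exact layer counts and ordering inside the patented module are not fully specified here, so this sketch uses one 3 × 1 per-channel convolution, one 1 × 1 cross-channel convolution with ReLU, one 1 × 3 per-channel convolution, and a residual connection; all weights are random and purely illustrative:

```python
import numpy as np

def dw_conv1d(x, w, axis):
    # per-channel 1-D convolution with 'same' zero padding along the given axis
    # x: (C, H, W), w: (C, k)
    C, H, W = x.shape
    k = w.shape[1]
    pad = k // 2
    out = np.zeros_like(x)
    xp = np.pad(x, [(0, 0)] + [(pad, pad) if a == axis else (0, 0) for a in (1, 2)])
    for c in range(C):
        for i in range(k):
            if axis == 1:
                out[c] += w[c, i] * xp[c, i:i + H, :]
            else:
                out[c] += w[c, i] * xp[c, :, i:i + W]
    return out

def separable_factorized_block(x, w31, w13, w11):
    # hypothetical sketch of the module described above
    y = dw_conv1d(x, w31, axis=1)                    # per-channel 3x1
    y = np.maximum(w11 @ y.reshape(y.shape[0], -1),  # cross-channel 1x1
                   0).reshape(x.shape)               # + ReLU nonlinearity
    y = dw_conv1d(y, w13, axis=2)                    # per-channel 1x3
    return x + y                                     # residual connection

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
x = rng.standard_normal((C, H, W))
out = separable_factorized_block(
    x, rng.standard_normal((C, 3)), rng.standard_normal((C, 3)),
    rng.standard_normal((C, C)) * 0.1)
print(out.shape)  # (4, 8, 8)
```

Note the residual structure: with all learned weights at zero the block reduces exactly to the identity, which is what lets such modules be stacked deeply without degradation.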
The main purpose of this patent is to solve the problem that current semantic segmentation algorithms cannot meet efficiency and real-time requirements. To this end, this patent designs, based on the separable factorized residual module, an efficient semantic segmentation network that makes full use of the module's learning ability and learning efficiency, as shown in Fig. 3. End-to-end network training is performed with an encoder–decoder architecture. The network consists of consecutive encoders and consecutive decoders. The encoder comprises 16 modules — separable factorized residual modules and downsampling modules — responsible for extracting features and generating downsampled feature maps. The decoder comprises 7 modules — separable factorized residual modules and upsampling modules — responsible for further extracting features and upsampling the features output by the encoder, finally generating a pixel-level class probability prediction of the same size as the input image. Fig. 3 shows the semantic segmentation network designed in this patent with a 3-channel color input image of size 1024 × 512 as an example. In the encoding stage, the input image passes through multiple separable factorized residual modules and downsampling modules to extract features; in the decoding stage, upsampling modules generate a segmentation result of the same size as the input image. The network achieves very good segmentation efficiency and realizes fast, real-time semantic segmentation. Although this patent targets the semantic segmentation task, the designed separable factorized residual module can be directly transferred to network architectures for any other task.
The specific implementation of this method comprises the training process and the test process of the network. The training process trains all parameters in the network to obtain the network parameters that minimize the loss function, and hence the optimal network structure. The test process uses the trained network to perform semantic segmentation on new input images and evaluates its segmentation performance.
Training process:
Step 1: Prepare the input images and corresponding labels used for semantic segmentation training; this patent uses the training set of the Cityscapes road-scene data set.
Step 2: Combine residual connections, depthwise separable convolution, and factorized convolution to design an efficient residual module — the separable factorized residual module — resolving the computational-efficiency limitation of the conventional residual module.
Step 3: Design an efficient real-time semantic segmentation network structure based on the designed separable factorized residual module.
Step 4: Forward computation. Initialize all parameters in the network, perform forward computation, and obtain initial prediction results.
Step 5: Error backpropagation. Analyze the difference between the prediction results and the ground-truth labels, compute the mean intersection-over-union loss, and backpropagate the loss according to the chain rule.
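The mean intersection-over-union quantity referenced in Step 5 can be sketched in its metric form (in practice, training usually optimizes a differentiable surrogate of it, which the patent does not detail; the arrays below are toy examples):

```python
import numpy as np

def mean_iou(pred, label, num_classes):
    # mean intersection-over-union, averaged over classes that appear
    # in either the prediction or the label
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, label == c).sum()
        union = np.logical_or(pred == c, label == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# toy 2 x 3 prediction vs. label over 3 classes
pred  = np.array([[0, 0, 1], [1, 1, 2]])
label = np.array([[0, 1, 1], [1, 1, 2]])
print(mean_iou(pred, label, 3))  # 0.75
```

A corresponding loss can then be taken as 1 minus this score, so that perfect overlap gives zero loss.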
Step 6: Weight update. Update the weights with gradient descent on the loss function so that the loss function gradually decreases.
Step 7: Iterative training. Repeat steps 4–6 until the network converges or the maximum number of iterations is reached, obtaining the final network structure.
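Steps 4–7 form a standard optimization loop; a minimal sketch with a one-parameter toy loss standing in for the segmentation loss (the learning rate, tolerance, and loss are illustrative assumptions, not values from the patent):

```python
def train(w0, lr=0.1, max_iters=100, tol=1e-6):
    # toy loss L(w) = (w - 3)^2 stands in for the segmentation loss
    w = w0
    loss = (w - 3.0) ** 2
    for _ in range(max_iters):
        loss = (w - 3.0) ** 2        # step 4: forward computation
        grad = 2.0 * (w - 3.0)       # step 5: backpropagation (one derivative)
        w -= lr * grad               # step 6: gradient-descent weight update
        if loss < tol:               # step 7: stop on convergence
            break
    return w, loss

w, loss = train(0.0)
print(round(w, 2))  # 3.0
```

The loop stops once the loss drops below the tolerance, mirroring the "converge or hit the maximum iteration count" criterion of Step 7.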
Test process:
Step 1: Select the input images and corresponding labels used for semantic segmentation testing; this patent uses the test set of the Cityscapes road-scene data set.
Step 2: Segment the selected test data set with the trained semantic segmentation network. Judge the segmentation accuracy of the network from the difference between the prediction results and the ground-truth labels, and judge the segmentation efficiency of the network from the single-image test time.
Addressing the low computational efficiency of the conventional residual module, this patent combines residual connections, depthwise separable convolution, and factorized convolution to design a novel, efficient residual module, referred to as the separable factorized residual module. The separable factorized residual module exploits the advantage of residual modules — building deep networks to improve accuracy — together with the advantage of depthwise separable convolution and factorized convolution — reducing the network's parameter count and computational cost; it resolves the computational-efficiency limitation of the conventional residual module and improves the computational efficiency of the network while maintaining its accuracy. This patent also designs a semantic segmentation network structure that makes full use of the learning ability and learning efficiency of this separable factorized residual module, achieving fast, real-time semantic segmentation.