Occlusion object segmentation solving method based on deep learning

Technical Field
The invention relates to the technical field of image processing, and in particular to a method for segmenting and solving an occluded object based on deep learning.
Background
Given a scene, human beings can understand it well: they can not only recognize the objects in the scene, but also perceive the relationships between those objects, including occlusion relationships. Occlusion often occurs in two-dimensional scenes, and the occlusion relationship reflects the depth order between objects, i.e., near objects occlude distant objects. Humans can easily judge the relationship between occluding objects and at the same time identify the occluded objects, because the human eye accumulates a large amount of prior knowledge through long-term observation of surrounding images.
Scene understanding is an extremely important fundamental task in the field of computer vision, whose aim is to make computers understand scenes as humans do. Current research on scene understanding is mainly divided into two types: methods based on neural network models and methods based on probabilistic graphical models. With the wide application of deep learning in recent years, particularly after the great success of the Convolutional Neural Network (CNN) in the image field, the various subtasks of scene understanding, such as scene recognition, object detection and scene segmentation, have achieved breakthrough progress. However, scene understanding based on neural network models has paid little attention to occluded objects: only the objects themselves are considered (recognition merely classifies the objects in the picture, and segmentation merely classifies the pixels), the relationships between objects are ignored, and the occlusion relationship therefore cannot be judged. Secondly, a CNN generally needs a large amount of supervised data, and must see samples occluded from various angles in order to identify an occluded object. In addition, the cognitive process of a neural network is based on the forward and backward passes of the CNN, with no feedback mechanism comparable to that of the human brain; the essential difference is that a feedforward network is a bottom-up process, while reasoning and feedback based on knowledge and experience is a top-down process.

Probabilistic graphical models have certain advantages in logical reasoning and in capturing contextual relationships, and some studies perform depth-order reasoning with models such as Bayesian inference and Markov models. However, a probabilistic graphical model is only a mathematical model, so its accuracy is lower than that of a neural network model; different models must be built for different scenes, its generality is poor, and it cannot model more complex scenes.
Disclosure of Invention
The invention aims to provide a method for segmenting and solving an occluded object based on deep learning, which acquires the area of the unoccluded part of the object, obtains standard frame full-view parameters and standard area full-view parameters through an image deep learning model, and sends them to an image construction model to obtain the full view of the object.
The technical scheme adopted by the invention is as follows: a method for segmenting and solving an occluded object based on deep learning, comprising the following steps:
step 1: when the object is occluded, acquiring an image of the unoccluded part of the object, performing initialization processing on the image to extract area parameters and frame parameters, and executing step 2;
step 2: sending the area parameters and the frame parameters as input to an image deep learning model, which outputs the corresponding frame full-view parameters and area full-view parameters; the parameters are continuously screened as required, and step 3 is executed;
step 3: screening out the standard frame full-view parameters and the standard area full-view parameters from the plurality of frame full-view parameters and area full-view parameters, and sending them to an image construction model to obtain the full view of the object.
Preferably, in step 1, the initialization processing includes dividing the acquired image of the unoccluded part of the object into a plurality of cells and counting the number and areas of the cells, wherein the cells are rectangles whose areas gradually decrease near the object edge and the occlusion boundary, until the image is fully covered.
Preferably, the acquired image is a planar image of the object, and planar images are acquired in at most three directions for a single object.
Preferably, in step 2, the building of the image deep learning model includes the following steps:
step 41: establishing a neural network structure comprising a fully connected layer, a convolutional layer, an activation function and a pooling layer, establishing forward propagation of the image parameters through the fully connected layer, and executing step 42;
step 42: extracting image features through the convolutional layer, training the convolution kernels and bias terms so that the number of channels of the output feature matrix equals the number of convolution kernels, and executing step 43;
step 43: since the convolution calculation process is linear, introducing a nonlinear factor through the activation function, and obtaining the formula for the matrix size after convolution: W_out = (W - F + 2P) / S + 1, where W is the input size, F is the convolution kernel size, P is the padding and S is the stride;
step 44: changing only the width W and height H of the feature matrix through the pooling layer without changing the number of depth channels, and making the probability sum of all processed nodes equal to 1 to obtain the learning model formula.
Preferably, in step 42, the number of channels of each convolution kernel is the same as that of the input feature layer; each channel of the convolution kernel is convolved with the input layer of the corresponding channel, and the results are summed to obtain a feature matrix, which serves as the output of the deep learning image and is propagated forward as one channel of the next layer's input features.
Preferably, the standard frame full-view parameters and the standard area full-view parameters are the average values output by the image deep learning model over multiple training runs.
Preferably, the error calculation of the image deep learning model adopts the Cross Entropy Loss (cross entropy) algorithm.
Preferably, in step 43, the Dropout random neuron inactivation operation is used, together with the ReLU activation function, in the first two fully connected layers to reduce overfitting.
Compared with the prior art, the invention has the beneficial effects that:
1. The full view of the object can be quickly obtained from the area of the unoccluded part of the object image alone; the method is fast and highly accurate.
Drawings
Fig. 1 is a schematic diagram of the method for segmenting and solving an occluded object based on deep learning.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to Fig. 1. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
Example 1:
a method for segmenting and solving an occluded object based on deep learning comprises the following steps:
step 1: when the object is occluded, acquiring an image of the unoccluded part of the object, performing initialization processing on the image to extract area parameters and frame parameters, and executing step 2;
step 2: sending the area parameters and the frame parameters as input to an image deep learning model, which outputs the corresponding frame full-view parameters and area full-view parameters; the parameters are continuously screened as required, and step 3 is executed;
step 3: screening out the standard frame full-view parameters and the standard area full-view parameters from the plurality of frame full-view parameters and area full-view parameters, and sending them to an image construction model to obtain the full view of the object.
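Since the patent does not specify data structures or interfaces, the following is only a hedged sketch of the three-step flow; the parameter representation, the helper mean_params and both model callables are invented here for illustration (the averaging reflects the later statement that the standard full-view parameters are averages over multiple model outputs):

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical flat representation of the area/frame parameters; the
# patent does not specify how they are encoded.
Params = List[float]
Candidate = Tuple[Params, Params]  # (frame full-view, area full-view)

def mean_params(vectors: Sequence[Params]) -> Params:
    # Element-wise average over candidate parameter vectors (the
    # "screening" step, read here as averaging -- an assumption).
    return [sum(column) / len(column) for column in zip(*vectors)]

def solve_occluded_object(
    area_params: Params,
    frame_params: Params,
    dl_model: Callable[[Params, Params], List[Candidate]],
    construction_model: Callable[[Params, Params], object],
) -> object:
    candidates = dl_model(area_params, frame_params)        # step 2
    frame_full = mean_params([c[0] for c in candidates])    # step 3: screen
    area_full = mean_params([c[1] for c in candidates])
    return construction_model(frame_full, area_full)        # build full view
```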
It should be noted that, in step 1, the initialization processing includes dividing the acquired image of the unoccluded part of the object into a plurality of cells and counting the number and areas of the cells, wherein the cells are rectangles whose areas gradually decrease near the object edge and the occlusion boundary, until the image is fully covered.
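The patent does not state how the cells are generated; one hedged reading consistent with "rectangular cells whose areas shrink near the boundaries until the image is fully covered" is a quadtree-style subdivision, sketched below (all names and the toy boundary test are assumptions):

```python
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height

def divide_into_cells(
    rect: Rect,
    near_boundary: Callable[[Rect], bool],
    min_size: int = 2,
) -> List[Rect]:
    # Rectangles touching an object edge or occlusion boundary are split
    # into four smaller rectangles, so cell areas shrink near boundaries
    # while the union of the cells always covers the full image.
    x, y, w, h = rect
    if not near_boundary(rect) or w <= min_size or h <= min_size:
        return [rect]
    hw, hh = w // 2, h // 2
    children = [
        (x, y, hw, hh), (x + hw, y, w - hw, hh),
        (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh),
    ]
    return [cell for child in children
            for cell in divide_into_cells(child, near_boundary, min_size)]

# Toy boundary test: cells whose left edge lies within 8 px of x = 0.
cells = divide_into_cells((0, 0, 64, 64), lambda r: r[0] < 8)
print(len(cells), sum(w * h for _, _, w, h in cells))  # count, total area
```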
It should be noted that the acquired image is a planar image of the object, and planar images are acquired in at most three directions for a single object.
It should be noted that, in step 2, the building of the image deep learning model includes the following steps:
step 41: establishing a neural network structure comprising a fully connected layer, a convolutional layer, an activation function and a pooling layer, establishing forward propagation of the image parameters through the fully connected layer, and executing step 42;
step 42: extracting image features through the convolutional layer and training the convolution kernels and bias terms. The number of channels of each convolution kernel is the same as that of the input feature layer (for example, a 5 × 5 RGB image has 3 input channels, and each convolution kernel likewise has three channels corresponding to them). Each channel of the convolution kernel is convolved with the input layer of the corresponding channel, and the results are summed to obtain a feature matrix, which serves as the output of the deep learning image and is propagated forward as one channel of the next layer's input features; the number of channels of the output feature matrix equals the number of convolution kernels. Executing step 43;
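Purely as an illustrative sketch of the channel behaviour described in step 42 (the tensor sizes and kernel count below are assumptions, not values from the patent):

```python
import torch
import torch.nn as nn

# A 5x5 RGB input: 3 channels, as in the example above (batch size 1).
x = torch.randn(1, 3, 5, 5)

# 16 convolution kernels; each kernel has 3 channels matching the input.
# Each kernel channel is convolved with the corresponding input channel
# and the results are summed into one output feature map, so the output
# has as many channels as there are kernels.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

y = conv(x)
print(y.shape)  # torch.Size([1, 16, 5, 5]) -> channels = number of kernels
```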
step 43: since the convolution calculation process is linear, a nonlinear factor is introduced through the activation function. With the sigmoid function, vanishing gradients occur easily when the network is deep; with ReLU, when a very large gradient passes through during back propagation, the updated weights may become centered below zero, the derivative is then always 0, back propagation can no longer update the weights, and the neuron remains in an inactive (dead) state. The matrix size after convolution is calculated as W_out = (W - F + 2P) / S + 1, where W is the input size, F is the convolution kernel size, P is the padding and S is the stride. Executing step 44;
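The size formula in step 43 can be checked with a minimal helper (the example numbers are assumptions chosen for illustration):

```python
def conv_output_size(w: int, f: int, p: int, s: int) -> int:
    """Output width/height after convolution: W_out = (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

# Example: a 224-pixel-wide input, 7x7 kernel, padding 3, stride 2 -> 112.
print(conv_output_size(224, 7, 3, 2))
```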
step 44: changing only the width W and height H of the feature matrix without changing the number of depth channels, and making the probability sum of all processed nodes equal to 1 through the SoftMax layer. The MaxPooling downsampling layer sparsifies the feature map and reduces the amount of data computation. The AveragePooling downsampling layer has the following characteristics: it has no training parameters; it changes only the W and H of the feature matrix, not the depth (number of channels); and the pool size and stride are generally the same.
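A minimal sketch of the pooling and SoftMax behaviour described in step 44, with assumed tensor sizes:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 8, 8)  # 16-channel feature matrix (sizes assumed)

# Pooling with pool size equal to stride (here 2): halves W and H,
# leaves the number of channels (depth) unchanged, has no trainable weights.
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
print(max_pool(x).shape)  # torch.Size([1, 16, 4, 4])
print(avg_pool(x).shape)  # torch.Size([1, 16, 4, 4])

# SoftMax turns the final node outputs into probabilities summing to 1.
logits = torch.randn(1, 10)
probs = torch.softmax(logits, dim=1)
print(probs.sum())  # tensor(1.0000)
```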
It should be noted that, in step 42, the number of channels of each convolution kernel is the same as that of the input feature layer; each channel of the convolution kernel is convolved with the input layer of the corresponding channel, and the results are summed to obtain a feature matrix, which serves as the output of the deep learning image and is propagated forward as one channel of the next layer's input features.
It should be noted that the standard frame full-view parameters and the standard area full-view parameters are the average values output by the image deep learning model over multiple training runs.
It is worth noting that the error calculation of the image deep learning model adopts the Cross Entropy Loss (cross entropy) algorithm: in the process of maximizing the likelihood function of the samples, the minimization of a certain function is derived, and that function is exactly the cross entropy function. (The goal of maximum likelihood estimation is to infer, from known sample results, the parameter values most likely to have produced those results.) In logistic regression (two-class classification), the cross entropy reflects the degree of deviation between two distributions (the ground-truth distribution and the output distribution), and the optimization goal is to bring the two distributions close to each other.
For the multi-class problem (the output belongs to exactly one class), the Cross Entropy Loss takes the form of function 1.
For the two-class problem (where the output may fall into multiple classes, i.e., one binary decision per class), the Cross Entropy Loss takes the form of function 2.
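Functions 1 and 2 are not reproduced in this text; the standard cross-entropy forms they presumably correspond to are, in the usual notation (y_i the ground-truth label, ŷ_i the model output, C the number of classes):

```latex
% Assumed form of function 1: multi-class (softmax) cross entropy,
% where the output belongs to exactly one class:
L = -\sum_{i=1}^{C} y_i \log \hat{y}_i

% Assumed form of function 2: binary (sigmoid) cross entropy,
% applied independently to each output:
L = -\sum_{i=1}^{C} \bigl[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \bigr]
```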
It is worth noting that, in step 43, the Dropout random neuron inactivation operation is used, together with the ReLU activation function, in the first two fully connected layers to reduce overfitting. The LRN (Local Response Normalization) layer creates a competition mechanism among the activities of local neurons, so that neurons with larger responses become relatively stronger while neurons with smaller feedback are suppressed, which enhances the generalization capability of the model.
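A minimal sketch of these two elements, assuming a hypothetical classifier head (the layer widths, dropout rate, 10-class output and the classic AlexNet-style LRN constants are all assumptions, not values from the patent):

```python
import torch.nn as nn

# Local Response Normalization as applied after a convolution stage;
# size/alpha/beta/k follow the classic AlexNet values (assumed here).
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

# Dropout (random neuron inactivation) in the first two fully connected
# layers, each followed by ReLU, to reduce overfitting; the final SoftMax
# makes the probability sum over all output nodes equal to 1.
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(4096, 2048),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(2048, 2048),
    nn.ReLU(inplace=True),
    nn.Linear(2048, 10),
    nn.Softmax(dim=1),
)
```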
In summary, by acquiring the area of the unoccluded part of the object, the standard frame full-view parameters and the standard area full-view parameters are obtained through the image deep learning model and sent to the image construction model, thereby obtaining the full view of the object.