Disclosure of Invention
The invention aims to solve the technical problem of providing a golden monkey body segmentation algorithm that can meet the natural-scene data processing requirements of a golden monkey individual re-identification task.
In order to realize the task, the invention adopts the following technical scheme:
a golden monkey body segmentation algorithm under a natural scene comprises the following steps:
constructing a semantic segmentation network to realize end-to-end image segmentation; training the semantic segmentation network, and storing the trained network model for segmentation detection of the image to be segmented;
the semantic segmentation network comprises a classification network, a fusion part and an output part, wherein:
the classification network sequentially comprises, from front to back, a first convolution layer, a first max pooling layer, a second convolution layer, a second max pooling layer, a third convolution layer, a third max pooling layer, a fourth convolution layer, a fourth max pooling layer, a fifth convolution layer, a fifth max pooling layer, a sixth convolution layer and a seventh convolution layer; each of the first to fifth convolution layers comprises two consecutive convolution calculations, the sixth convolution layer comprises one convolution calculation, and their function is to perform feature extraction on the input image to obtain a feature map; the seventh convolution layer comprises one convolution calculation and a classification activation function, and is used to perform feature extraction and pixel-level classification to obtain a confidence map; the role of the first to fifth max pooling layers is to reduce the data dimension without losing features;
the fusion part comprises a first feature fusion layer and a second feature fusion layer; the first feature fusion layer up-samples the output of the seventh convolution layer and then performs feature fusion with the output of the fifth max pooling layer; the second feature fusion layer performs feature fusion on the output of the first feature fusion layer and the features extracted by the fourth max pooling layer to obtain a fused confidence map;
the output part comprises an output layer consisting of an up-sampling layer and a classification layer; the input of the up-sampling layer is the output of the second feature fusion layer, and its function is to expand the fused confidence map to the size of the original input image; the classification layer performs classification prediction on each pixel, finally obtaining a high-resolution class heat map consistent with the size of the original input image.
Further, the convolution kernel size of each of the first to fifth convolution layers is 2 × 2 with a step size of 2; the convolution kernel size of the sixth convolution layer is 7 × 7 with a step size of 1; the convolution kernel size of the seventh convolution layer is 1 × 1 with a step size of 1.
Further, the pooling kernel size of each of the first to fifth max pooling layers is 2 × 2, and the step size is 2.
Further, the first feature fusion layer up-samples the output of the seventh convolution layer and then performs feature fusion with the output of the fifth max pooling layer, as follows:
the first feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the output of the seventh convolution layer, the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, obtaining a dimension-expanded confidence map A2; the input of the convolution layer is the output of the fifth max pooling layer, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, yielding a confidence map B of the fifth max pooling layer output; the final output of the first feature fusion layer is the sum of confidence map B and confidence map A2, denoted confidence map C.
Further, the second feature fusion layer performs feature fusion on the output of the first feature fusion layer and the features extracted by the fourth max pooling layer to obtain the fused confidence map, as follows:
the second feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is confidence map C, the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, obtaining a dimension-expanded confidence map C2; the input of the convolution layer is the output of the fourth max pooling layer, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, yielding a confidence map D of the fourth max pooling layer output; the final output of the second feature fusion layer is the sum of confidence map D and confidence map C2, denoted confidence map E, i.e. the fused confidence map.
Further, when the semantic segmentation network is trained, the loss function adopted is as follows:
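One written form of this loss, consistent with the symbol definitions below and with the derivation given in the detailed description (the binary cross-entropy structure and the averaging over height × width are reconstructed assumptions, not the verbatim formula of the original), is:

$$\mathrm{DWL} = -\frac{1}{height \times width}\sum_{i=1}^{height}\sum_{j=1}^{width} dw_{(i,j)}\Big[\, y_{(i,j)}\log \hat{y}_{(i,j)} + \big(1-y_{(i,j)}\big)\log\big(1-\hat{y}_{(i,j)}\big) \Big]$$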
where y(i,j) denotes the label value at pixel point (i, j) of the actually classified image corresponding to the input image, ŷ(i,j) denotes the predicted value at pixel point (i, j) of the output image obtained after the input image is processed by the semantic segmentation network, height and width denote the height and width of the image respectively, and dw(i,j) is the distance-constraint weight function, expressed as:
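In a form reconstructed from the values stated in the detailed description (weight α at the connected-domain center, α − 1 at the farthest ROI pixel when β is the maximum distance to the center, and weight 1 kept for the background), this can be written as:

$$dw_{(i,j)} = \begin{cases} \alpha - \dfrac{\mathrm{distance}\big(I_{(i,j)},\, center_{(i,j)}\big)}{\beta}, & I_{(i,j)}\ \text{inside the golden monkey ROI region} \\[6pt] 1, & I_{(i,j)}\ \text{in the environmental background region} \end{cases}$$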
where distance(I(i,j), center(i,j)) denotes the distance from the pixel point I(i,j) to the center center(i,j) of the connected domain in which it lies, and α and β are two constants.
The invention has the following technical characteristics:
1. The algorithm achieves golden monkey target detection by semantically segmenting the original image. It mainly separates the golden monkey from the natural environment with a fully convolutional network (FCN) in deep learning, and uses a distance-weight-based loss function, i.e. an improved cross-entropy loss function, to make the FCN model focus on the integrity of the golden monkey individual, thereby improving the final detection accuracy.
2. Experimental comparative analysis shows that the improved loss function better resolves the problem of a golden monkey body being split into several parts, and that the improved natural-scene golden monkey detection network better improves the image segmentation results affected by such problems.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings. Golden monkey body segmentation in this scheme refers to detecting and segmenting the image region where the golden monkey's body is located from an image containing golden monkeys.
Step 1, constructing a semantic segmentation network to realize end-to-end image segmentation
The semantic segmentation network comprises a classification network, a fusion part and an output part, wherein:
First part, the classification network
The classification network is composed of 7 convolutional layers and 5 pooling layers, arranged from front to back as follows: a first convolution layer conv1, a first max pooling layer pool1, a second convolution layer conv2, a second max pooling layer pool2, a third convolution layer conv3, a third max pooling layer pool3, a fourth convolution layer conv4, a fourth max pooling layer pool4, a fifth convolution layer conv5, a fifth max pooling layer pool5, a sixth convolution layer conv6 and a seventh convolution layer conv7. Wherein:
The first to fifth convolution layers conv1 to conv5 each comprise two consecutive convolution calculations and perform feature extraction on the input image to obtain a feature map. The sixth convolution layer conv6 comprises one convolution calculation, performs feature extraction on its input to obtain a feature map, and passes it on to the next layer. The seventh convolution layer conv7 comprises one convolution calculation and a classification activation function, and performs feature extraction and pixel-level classification to obtain a low-resolution class heat map, namely a confidence map. The convolution kernel size of each layer from conv1 to conv5 is 2 × 2 with a step size of 2; the convolution kernel size of conv6 is 7 × 7 with a step size of 1; the convolution kernel size of conv7 is 1 × 1 with a step size of 1.
The first to fifth max pooling layers pool1 to pool5 reduce the data dimension without losing features, yielding feature maps of reduced dimension; the pooling kernel size of pool1 to pool5 is 2 × 2 with a step size of 2.
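For illustration, a minimal PyTorch sketch of this classification network is given below. Only the layer ordering (two convolutions per block in conv1 to conv5, one convolution in conv6, a 1 × 1 convolution in conv7, and 2 × 2 stride-2 max pooling) follows the description; the 3 × 3 stride-1 padded kernels inside conv1 to conv5, the channel widths, and the deferral of the classification activation to the output layer are assumptions made so that each pooling stage halves the spatial resolution.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """n_convs consecutive convolution + ReLU operations.
    Kernel size 3x3, stride 1, padding 1 are illustrative assumptions."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class ClassificationNetwork(nn.Module):
    """conv1..conv5 (two convolutions each) with pool1..pool5, then conv6 and conv7."""
    def __init__(self, num_classes=2):
        super().__init__()
        widths = [64, 128, 256, 512, 512]              # illustrative channel widths
        blocks, in_ch = [], 3
        for w in widths:                               # conv1 .. conv5
            blocks.append(conv_block(in_ch, w, n_convs=2))
            in_ch = w
        self.blocks = nn.ModuleList(blocks)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)   # pool1..pool5: 2x2, stride 2
        self.conv6 = nn.Sequential(nn.Conv2d(512, 1024, kernel_size=7, stride=1, padding=3),
                                   nn.ReLU(inplace=True))
        self.conv7 = nn.Conv2d(1024, num_classes, kernel_size=1, stride=1)

    def forward(self, x):
        pooled = []                                    # keep pool outputs for the fusion part
        for block in self.blocks:
            x = self.pool(block(x))
            pooled.append(x)
        confidence = self.conv7(self.conv6(x))         # low-resolution confidence map
        return confidence, pooled[3], pooled[4]        # conv7, pool4 and pool5 outputs
```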
Second part, the fusion part
The fusion part comprises a first feature fusion layer and a second feature fusion layer. The first feature fusion layer up-samples the output of the seventh convolution layer and then performs feature fusion with the output of the fifth max pooling layer; the second feature fusion layer performs feature fusion on the output of the first feature fusion layer and the features extracted by the fourth max pooling layer.
The first fusion layer fuses the segmentation results of the bottom two layers of features and serves to fuse features output by multiple layers, thereby improving recognition accuracy; the second feature fusion layer fuses the segmentation results of the bottom three layers, realizing a cross-layer connection.
The feature fusion part is shown in figure 2:
The first feature fusion layer comprises an up-sampling layer and a convolution layer. The input of the up-sampling layer is the output of conv7, the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, obtaining a dimension-expanded confidence map A2. The input of the convolution layer is the output of pool5, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, yielding a confidence map B of the pool5 output. The final output of the first feature fusion layer is the sum of confidence map B and confidence map A2, denoted confidence map C, i.e. the segmentation result map fusing the features of the bottom two layers; it fuses features output by multiple layers and improves recognition accuracy.
The second feature fusion layer comprises an up-sampling layer and a convolution layer. The input of the up-sampling layer is confidence map C (the segmentation result map output by the first feature fusion layer), the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, obtaining a dimension-expanded confidence map C2. The input of the convolution layer is the output of pool4, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, yielding a confidence map D of the pool4 output. The final output of the second feature fusion layer is the sum of confidence map D and confidence map C2, denoted confidence map E, i.e. the segmentation result map fusing the bottom three layers; it realizes a cross-layer connection, fuses features output by multiple layers, and improves recognition accuracy.
Third part, the output part
The output part comprises an output layer, which consists of an up-sampling layer and a classification layer. The input of the up-sampling layer is the output of the fusion part, namely the second feature fusion layer; the up-sampling factor is 8, and its function is to enlarge the fused confidence map E to the size of the original input image (the image input to the classification network). The activation function of the classification layer is the SoftMax function, which performs classification prediction on each pixel, finally obtaining a high-resolution class heat map consistent with the size of the original input image.
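The fusion and output parts can be sketched as follows. The code mirrors the described operations (2x up-sampling of the incoming confidence map, a 1 × 1 stride-1 convolution with ReLU applied to the pooled features, elementwise addition, then up-sampling to the input size followed by per-pixel SoftMax); the use of bilinear interpolation, the channel counts, and the assumption that each skip feature map spatially matches the up-sampled confidence map are illustrative choices, not details from the original.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionLayer(nn.Module):
    """Up-sample the incoming confidence map by 2x, project the skip features
    with a 1x1 convolution + ReLU, and add the two maps elementwise."""
    def __init__(self, skip_channels, num_classes=2):
        super().__init__()
        self.proj = nn.Conv2d(skip_channels, num_classes, kernel_size=1, stride=1)

    def forward(self, confidence, skip):
        up = F.interpolate(confidence, scale_factor=2, mode="bilinear", align_corners=False)
        proj = F.relu(self.proj(skip))      # confidence map B (or D) of the pooled output
        return up + proj                    # confidence map C (or E)

class OutputLayer(nn.Module):
    """Enlarge the fused confidence map E to the original input size and apply
    per-pixel SoftMax to obtain the high-resolution class heat map."""
    def forward(self, fused, input_size):
        up = F.interpolate(fused, size=input_size, mode="bilinear", align_corners=False)
        return torch.softmax(up, dim=1)
```

In use, the first fusion layer would take the conv7 and pool5 outputs and yield confidence map C, the second would take confidence map C and the pool4 output and yield confidence map E, and the output layer would expand E to the original image size.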
Step 2, training the semantic segmentation network constructed in step 1
Creating a data set: in the data set, the pixels belonging to the golden monkey are uniformly marked as the target area. A golden monkey target detection data set under natural scenes, Nature golden monkey Segmentation, is then created according to the standard of the PASCAL VOC data set and named NGS. The NGS data set contains a total of 600 golden monkey images in natural scenes, covering 31 golden monkey individuals with a uniform distribution of gender and age groups and rich action types.
Training is carried out using the NGS data set; training data and test data are randomly split in a 4:1 ratio, the maximum number of iterations is set to 5000, and the network model is saved after training. The specific training process is as follows:
The final output of the semantic segmentation network, denoted ŷ, and the classified image of the input image, denoted y (namely the actual classification result of the original image), are input into the loss function and the loss is calculated; the result reflects the difference between the network's prediction and the actual result. The loss function is then differentiated with respect to the parameters in the network, and the network parameters are updated according to the derivatives; the learning rate is set to 0.00005 and kept constant throughout training.
In this scheme, a distance-weight-based loss function, distance-weight loss, denoted DWL, is adopted as the loss function. It is obtained by introducing a weight coefficient for each pixel point into the basic cross-entropy function and taking a distance-constraint weight function as that coefficient. The DWL loss function better measures the discrepancy between the predicted values and the true values of the samples, so that the model pays more attention to the central area of the golden monkey's body and learns the structural information of the golden monkey's body. A specific derivation of the DWL loss function is given below.
For the segmentation problem of the invention, the network model outputs, for each pixel, the predicted probability that it belongs to the golden monkey region. The cross-entropy function L of the network model can be derived from the basic classification cross-entropy function as follows:
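Written out for the two-class (golden monkey vs. background) case with the symbols defined below, this per-pixel cross-entropy can be expressed as (a reconstruction, averaging over all pixels):

$$L = -\frac{1}{height \times width}\sum_{i=1}^{height}\sum_{j=1}^{width}\Big[\, y_{(i,j)}\log \hat{y}_{(i,j)} + \big(1-y_{(i,j)}\big)\log\big(1-\hat{y}_{(i,j)}\big) \Big]$$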
where y(i,j) denotes the label value at pixel point (i, j) of the actually classified image corresponding to the input image, ŷ(i,j) denotes the predicted value at pixel point (i, j) of the output image obtained after the input image is processed by the semantic segmentation network, and height and width denote the height and width of the image respectively.
The invention makes two main improvements to the cross-entropy loss function: assigning different weights to the ROI region and the environmental background region, and introducing distance information from the center of the golden monkey to the edge positions within the ROI region.
First, a weight coefficient W(i,j) is introduced for each pixel point in the basic cross-entropy function; the new cross-entropy function WL is as follows:
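With the weight coefficient inserted into the same per-pixel cross-entropy (again a reconstructed form), WL reads:

$$\mathrm{WL} = -\frac{1}{height \times width}\sum_{i=1}^{height}\sum_{j=1}^{width} W_{(i,j)}\Big[\, y_{(i,j)}\log \hat{y}_{(i,j)} + \big(1-y_{(i,j)}\big)\log\big(1-\hat{y}_{(i,j)}\big) \Big]$$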
where W(i,j) denotes the loss weight at pixel point (i, j); the original cross-entropy loss function L can be understood as the case in which W(i,j) is constantly equal to 1, so that the weight can be ignored.
When calculating W(i,j), its value is assigned according to the label value y(i,j) of the pixel point and the position information of the pixel point (i, j): first, the weight of the environmental background region is kept unchanged while the weight coefficient of the golden monkey ROI region is increased, so that the model pays more attention to the golden monkey region; second, for the golden monkey ROI target region, in order to exploit the body structure information of the golden monkey, linearly decreasing weights are set from the central part of the golden monkey's body, namely the rectangular center of the ROI region, to the edge, strengthening the model's learning of the central body region while preserving as much as possible the important distance-constraint information between the body center and the hair edge. The distance-constraint weight function dw(i,j) is as follows:
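One expression consistent with the values stated below (dw equal to α at the connected-domain center, α − 1 at the farthest ROI pixel when β is the maximum distance to the center, and weight 1 kept for the background) is the following reconstruction:

$$dw_{(i,j)} = \begin{cases} \alpha - \dfrac{\mathrm{distance}\big(I_{(i,j)},\, center_{(i,j)}\big)}{\beta}, & I_{(i,j)}\ \text{inside the golden monkey ROI region} \\[6pt] 1, & I_{(i,j)}\ \text{in the environmental background region} \end{cases}$$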
where I(i,j) denotes the pixel at (i, j), center(i,j) denotes the center of the connected domain in which pixel point (i, j) lies, distance(I(i,j), center(i,j)) denotes the distance from pixel point I(i,j) to the center center(i,j) of its connected domain, and α and β are two constants controlling the weight value of pixel point I(i,j), so that the weights of different pixel points within the golden monkey ROI region fall in a bounded range determined by α and β. Let α be 2 and β be max(distance(I(i,j), center(i,j))); then when I(i,j) is the center of the connected domain, dw(i,j) takes the value 2, and when I(i,j) is farthest from the center of the connected domain, dw(i,j) takes the value 1, thereby achieving weights that decrease from the center to the edge of the connected domain within the golden monkey ROI region.
Taking the weight coefficient W(i,j) of the new cross-entropy function WL to be dw(i,j), the improved distance-weight-based loss function DWL is obtained as follows:
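In the same reconstructed form as above:

$$\mathrm{DWL} = -\frac{1}{height \times width}\sum_{i=1}^{height}\sum_{j=1}^{width} dw_{(i,j)}\Big[\, y_{(i,j)}\log \hat{y}_{(i,j)} + \big(1-y_{(i,j)}\big)\log\big(1-\hat{y}_{(i,j)}\big) \Big]$$

A minimal Python sketch of this loss is given below for illustration; the Euclidean distance to the connected-domain centroid, the handling of β as the per-component maximum distance, and the simple pixel average are assumptions consistent with the description above, not the exact implementation.

```python
import numpy as np
import torch
from scipy.ndimage import label as cc_label

def distance_weights(label_map, alpha=2.0):
    """Distance-constraint weights dw for one binary label map (H x W numpy array,
    1 = golden monkey ROI, 0 = background). Background pixels keep weight 1; ROI
    pixels decrease linearly from alpha at the connected-domain centre to
    alpha - 1 at its farthest pixel (beta = maximum distance to the centre)."""
    dw = np.ones(label_map.shape, dtype=np.float32)
    components, n = cc_label(label_map)
    for k in range(1, n + 1):
        ys, xs = np.nonzero(components == k)
        cy, cx = ys.mean(), xs.mean()                       # centre of the connected domain
        dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
        beta = dist.max() if dist.max() > 0 else 1.0
        dw[ys, xs] = alpha - dist / beta
    return dw

def dwl_loss(pred, target, weights):
    """Distance-weight loss: weighted binary cross-entropy averaged over all pixels.
    pred and target are torch tensors of shape (H, W); pred holds probabilities."""
    eps = 1e-7
    w = torch.as_tensor(weights, dtype=pred.dtype)
    ce = -(target * torch.log(pred + eps) + (1 - target) * torch.log(1 - pred + eps))
    return (w * ce).mean()
```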
Step 3, storing the trained network model for segmentation detection of the image to be segmented
After the semantic segmentation network is trained in step 2, the trained network model is saved. In practical application, an image to be segmented is input into the network model, and the output of the network model is the segmented high-resolution heat map.
Experimental comparative analysis shows that the improved loss function better resolves the problem of the golden monkey's body being split into several parts, and that the improved natural-scene golden monkey detection network better improves the image segmentation results affected by such problems.
The experimental comparative analysis procedure is as follows:
The invention performs experiments on the NGS data set, with training data and test data randomly split in a 4:1 ratio. In the natural-scene golden monkey detection algorithm, the learning rate is set to 0.00005 and kept constant during learning, and the maximum number of iterations is set to 5000. After the segmentation result of the original natural-scene image is obtained, a rectangular golden monkey individual detection result is generated from the edges of the segmented image; in the generation process, target pixel regions that are too small to be used normally are eliminated, and the golden monkey individual data are finally obtained from the image region within the rectangular frame.
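As an illustration of this post-processing step, a short sketch is given below; the connected-component analysis, the bounding-box convention and the minimum-area threshold are assumptions, not values from the original experiments.

```python
import numpy as np
from scipy.ndimage import label as cc_label, find_objects

def boxes_from_mask(mask, min_area=500):
    """Generate rectangular detection boxes from a binary segmentation mask,
    discarding target pixel regions too small to be used normally.
    min_area is an illustrative threshold."""
    components, _ = cc_label(mask)
    boxes = []
    for k, sl in enumerate(find_objects(components), start=1):
        if sl is None:
            continue
        if (components[sl] == k).sum() < min_area:      # drop overly small regions
            continue
        ys, xs = sl
        boxes.append((xs.start, ys.start, xs.stop, ys.stop))   # (x1, y1, x2, y2)
    return boxes
```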
The IoU criterion can be used to quantitatively measure the correlation between the true value and the predicted value, as shown in fig. 3. In the present invention, the segmentation results obtained by the FCN network before and after applying the DWL function are compared: IoU is calculated between each segmentation result and the ground truth, and the average IoU over all 100 images in the test set is taken as the evaluation index of network performance. The results are shown in the following table:
IoU comparison before and after DWL function improvement
| Loss function | IoU |
| Cross entropy function | 85.28% |
| DWL function | 86.14% |
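For reference, the IoU of a single image can be computed from the predicted and ground-truth binary masks as in the short sketch below (a minimal illustration, not the evaluation code used in the experiments); averaging it over the test images gives values like those in the table.

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over Union between a predicted binary mask and ground truth."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0                      # both masks empty
    return np.logical_and(pred, gt).sum() / union
```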
As can be seen from the table, the segmentation effect obtained by the original semantic segmentation network is already good on the basic data, which shows that the method is suitable for the segmentation task of the invention; after the loss function is improved, the result using the DWL function is 0.86 percentage points higher than that of the original network, which demonstrates that the improvement of the loss function raises the overall segmentation performance of the network model to a certain extent.
As shown in fig. 4, the rectangular frame in (a) shows the segmentation result of the original image, and the rectangular frame in (b) is the rectangular golden monkey individual detection result generated from the result image.
Fig. 5 shows in detail the influence of segmentation errors on the generation of rectangular detection results: (a) and (c) show two cases in which the lower edge of the golden monkey is not completely covered by the rectangular detection result, where the segmentation error leads to missing edge information of the golden monkey; (b) shows a case in which a single golden monkey is split into two detection boxes due to errors in the segmentation result, i.e. the segmentation error causes a single golden monkey to be divided into several rectangles. Clearly, the detection data shown in fig. 5 cannot be used for the golden monkey individual re-identification experiment.
With the method of the invention, the loss function of the original FCN network is improved by calculating the loss weights of different pixels based on distance, thereby introducing constraint information about the golden monkey's body structure; the segmentation results are shown in fig. 6 and exhibit a clear improvement. The figure revisits the two typical segmentation-error cases in which the rectangular detection results were unusable before the improvement: the segmentation results in both images are obviously improved and the area of erroneous results is significantly reduced. For the generated rectangular detection results, in (a) the lower edge of the golden monkey is completely included in the detection result, and in (b) the golden monkey originally split into two rectangular detection boxes is now correctly detected as a single complete golden monkey individual.
The distance-weight-based loss function DWL designed by the invention can greatly improve golden monkey segmentation results in natural scenes, thereby effectively improving the accuracy of the final rectangular detection results and yielding, from the rectangular region results, golden monkey individual image data that meet the requirements of the golden monkey individual re-identification experiment.