CN110287777B - Golden monkey body segmentation algorithm in natural scene - Google Patents

Golden monkey body segmentation algorithm in natural scene
Download PDF

Info

Publication number
CN110287777B
CN110287777B (application CN201910405596.6A; publication CN110287777A)
Authority
CN
China
Prior art keywords
layer
output
feature fusion
confidence map
maximum pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910405596.6A
Other languages
Chinese (zh)
Other versions
CN110287777A (en)
Inventor
许鹏飞
王妍
郭松涛
李朋喜
常晓军
郭凌
何刚
陈�峰
郭军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern University
Original Assignee
Northwestern University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern University
Priority to CN201910405596.6A (patent CN110287777B/en)
Publication of CN110287777A (patent CN110287777A/en)
Application granted
Publication of CN110287777B (patent CN110287777B/en)
Legal status: Expired - Fee Related
Anticipated expiration

Abstract

Translated from Chinese



The invention discloses a golden monkey body segmentation algorithm for natural scenes. The algorithm comprises constructing a semantic segmentation network to realize end-to-end image segmentation, training the semantic segmentation network, and saving the trained network model for segmentation detection of images to be segmented. The semantic segmentation network comprises a classification network, a fusion part and an output part. The classification network extracts features from the input image and performs pixel-level classification to obtain a confidence map. There are two feature fusion layers, which fuse the segmentation results of the bottom two and bottom three layers respectively, improving recognition accuracy and realizing cross-layer connection. The output part finally produces a high-resolution class heat map with the same size as the original image. The invention effectively improves detection accuracy and better solves the problem of the golden monkey's body being divided into multiple parts.


Description

Golden monkey body segmentation algorithm in natural scene
Technical Field
The invention relates to the technical field of image segmentation and image object identification and positioning, in particular to a golden monkey body segmentation algorithm in a natural scene.
Background
In the research and protection of wild golden monkeys, accurately distinguishing the golden monkey from the environmental background in natural images and videos is the basis of the subsequent individual re-identification task, providing data support for studies of golden monkey population size, behavior and so on. At present, many researchers at home and abroad have proposed various animal detection and recognition algorithms based on facial features.
Du proposed an automatic localization method based on the facial features of the rhesus monkey using image segmentation and mathematical morphology: traditional image processing is used to automatically locate organs such as the eyes and mouth of the macaque, and edge detection operators extract the detailed contours of some organs.
However, these methods do not yield very good results under image variation. Pengfei Xu proposed a golden monkey face detection algorithm based on regional color quantization and incremental adaptive curriculum learning, which can effectively detect monkey faces, but its segmentation threshold is limited by the species or habitat of the golden monkeys.
Disclosure of Invention
The invention aims to solve the technical problem of providing a golden monkey body segmentation algorithm which can meet the requirements of a golden monkey individual re-identification task on natural scene data processing.
In order to realize the task, the invention adopts the following technical scheme:
a golden monkey body segmentation algorithm under a natural scene comprises the following steps:
constructing a semantic segmentation network to realize end-to-end image segmentation; training the semantic segmentation network, and storing the trained network model for segmentation detection of the image to be segmented;
the semantic segmentation network comprises a classification network, a fusion part and an output part, wherein:
the semantic segmentation network consists, from front to back, of a first convolution layer, a first maximum pooling layer, a second convolution layer, a second maximum pooling layer, a third convolution layer, a third maximum pooling layer, a fourth convolution layer, a fourth maximum pooling layer, a fifth convolution layer, a fifth maximum pooling layer, a sixth convolution layer and a seventh convolution layer; each of the first to fifth convolution layers comprises two consecutive convolution calculations and the sixth convolution layer comprises one convolution calculation, whose function is to perform feature extraction on the input image to obtain a feature map; the seventh convolution layer comprises one convolution calculation and a classification activation function, and performs feature extraction and pixel-level classification to obtain a confidence map; the role of the first to fifth maximum pooling layers is to reduce the data dimension without losing features;
the fusion part comprises a first characteristic fusion layer and a second characteristic fusion layer, and the first characteristic fusion layer performs characteristic fusion with the output of the fifth maximum pooling layer after up-sampling the output of the seventh convolution layer; the second feature fusion layer performs feature fusion on the output result of the first feature fusion layer and the features extracted by the fourth maximum pooling layer to obtain a fused confidence map;
the output part comprises an output layer, the output layer comprises an up-sampling layer and a classification layer, wherein the input of the up-sampling layer is the output of the second feature fusion layer, and the function is to expand the fused confidence map to the size of the original input image; the classification layer is used for performing classification prediction on each pixel to finally obtain a high-resolution class thermodynamic diagram consistent with the size of the original input image.
Further, the convolution kernel size of each of the first to fifth convolution layers is 2 × 2 with a step size of 2; the convolution kernel size of the sixth convolution layer is 7 × 7 with a step size of 1; the convolution kernel size of the seventh convolution layer is 1 × 1 with a step size of 1.
Further, the pooling kernel size of each of the first to fifth maximum pooling layers is 2 × 2 with a step size of 2.
Further, the first feature fusion layer up-sampling the output of the seventh convolution layer and then performing feature fusion with the output of the fifth maximum pooling layer comprises:
the first feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the output of the seventh convolution layer, the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, giving a dimension-expanded confidence map A2; the input of the convolution layer is the output of the fifth maximum pooling layer, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, giving a confidence map B of the fifth maximum pooling layer's output; the final output of the first feature fusion layer is the sum of confidence map B and confidence map A2, denoted confidence map C.
Further, the second feature fusion layer performing feature fusion on the output result of the first feature fusion layer and the features extracted by the fourth maximum pooling layer to obtain a fused confidence map comprises:
the second feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the confidence map C, the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, giving a dimension-expanded confidence map C2; the input of the convolution layer is the output of the fourth maximum pooling layer, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, giving a confidence map D of the fourth maximum pooling layer's output; the final output of the second feature fusion layer is the sum of confidence map D and confidence map C2, denoted confidence map E, i.e. the fused confidence map.
Further, when the semantic segmentation network is trained, the loss function adopted is as follows:
DWL = −(1/(height · width)) · Σ_{i=1..height} Σ_{j=1..width} dw_{(i,j)} · [ y_{(i,j)} · log ŷ_{(i,j)} + (1 − y_{(i,j)}) · log(1 − ŷ_{(i,j)}) ]

wherein y_{(i,j)} represents the label value at pixel (i, j) of the actual classification image corresponding to the input image, ŷ_{(i,j)} represents the predicted value at pixel (i, j) of the output image after the input image is processed by the semantic segmentation network, height and width respectively represent the height and width of the image, and dw_{(i,j)} is the distance-constraint weight function, expressed as:

dw_{(i,j)} = α − distance(I_{(i,j)}, center_{(i,j)}) / β   if y_{(i,j)} = 1,   and dw_{(i,j)} = 1 otherwise,

wherein distance(I_{(i,j)}, center_{(i,j)}) represents the distance of pixel I_{(i,j)} from the center center_{(i,j)} of the connected domain in which it is located, and α and β are two constants.
The invention has the following technical characteristics:
1. The algorithm achieves golden monkey target detection by semantically segmenting the original image: a fully convolutional network (FCN) from deep learning separates the golden monkey from the natural environment, and a distance-weight-based loss function, i.e. an improved cross-entropy loss function, focuses the FCN model on the integrity of the golden monkey individual, improving the final detection accuracy.
2. Experimental comparative analysis shows that the improved loss function better solves the problem of the golden monkey's body being divided into multiple parts, and the improved natural-scene golden monkey detection network noticeably improves the image segmentation results for the problem cases above.
Drawings
FIG. 1 is a diagram of a semantic segmentation network architecture;
FIG. 2 is a feature fusion graph;
FIG. 3 illustrates the IoU calculation method;
FIG. 4 is an example of correct detection results generated by the semantic segmentation network, wherein (a) shows the segmentation result of the original image, and (b) shows the rectangular golden monkey individual detection result generated from the result image;
FIG. 5 is an example of erroneous detection results generated by the semantic segmentation network, wherein (a) and (c) show cases where the lower edge of the golden monkey is not completely covered by the rectangular detection result, and (b) shows a case where a single golden monkey is divided into two detection frames due to a segmentation error;
fig. 6 is a comparison of detection results of the semantic segmentation network after training by using the improved DWL function, wherein (a), (b), and (c) are respectively a diagram of detection results of three golden monkey individuals.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The golden monkey body segmentation in the scheme refers to detecting and segmenting an image part where a golden monkey body is located from an image containing the golden monkeys.
Step 1, constructing a semantic segmentation network to realize end-to-end image segmentation
The semantic segmentation network comprises a classification network, a fusion part and an output part, wherein:
first part, the classification network
The classification network is composed of 7 convolutional layers and 5 pooling layers, arranged from front to back as follows: first convolutional layer conv1, first maximum pooling layer pool1, second convolutional layer conv2, second maximum pooling layer pool2, third convolutional layer conv3, third maximum pooling layer pool3, fourth convolutional layer conv4, fourth maximum pooling layer pool4, fifth convolutional layer conv5, fifth maximum pooling layer pool5, sixth convolutional layer conv6 and seventh convolutional layer conv7. Wherein:
The first to fifth convolutional layers conv1 to conv5 each comprise two consecutive convolution calculations, performing feature extraction on the input image to obtain a feature map. The sixth convolutional layer conv6 comprises one convolution calculation, performing feature extraction on its input to obtain a feature map that is passed to the next layer. The seventh convolutional layer conv7 comprises one convolution calculation and a classification activation function, performing feature extraction and pixel-level classification to obtain a low-resolution class heat map, i.e. a confidence map. The convolution kernel sizes of conv1 to conv5 are all 2 × 2 with a step size of 2; the convolution kernel size of conv6 is 7 × 7 with a step size of 1; the convolution kernel size of conv7 is 1 × 1 with a step size of 1.
The first to fifth maximum pooling layers pool1 to pool5 reduce the data dimension without losing features, giving feature maps of reduced dimension; the pooling kernels of pool1 to pool5 are all 2 × 2 with a step size of 2.
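As an illustration of the pooling stage described above, 2 × 2 max pooling with a step size of 2 can be sketched in NumPy (a minimal sketch; the patent does not specify an implementation framework, and the single-channel feature map here is illustrative):

```python
import numpy as np

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """2 x 2 max pooling with stride 2: halves each spatial dimension
    while keeping the strongest activation in each window."""
    h, w = feature_map.shape
    assert h % 2 == 0 and w % 2 == 0, "spatial dims must be even"
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A 4x4 feature map shrinks to 2x2; each output keeps its window's maximum,
# reducing the data dimension without losing the salient feature response.
fm = np.array([[1., 2., 0., 0.],
               [3., 4., 0., 5.],
               [0., 0., 6., 0.],
               [0., 7., 0., 8.]])
pooled = max_pool_2x2(fm)
```

Applied five times (pool1 to pool5), this halving is what the fusion part later undoes with its 2×, 2× and 8× up-sampling steps.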
Second part, the fusion part
The fusion part comprises a first characteristic fusion layer and a second characteristic fusion layer, and the first characteristic fusion layer performs characteristic fusion with the output of the fifth maximum pooling layer after up-sampling the output of the seventh convolution layer; and the second feature fusion layer performs feature fusion on the output result of the first feature fusion layer and the features extracted by the fourth maximum pooling layer.
The first fusion layer fuses the segmentation results of the characteristics of the bottom two layers and is used for fusing the characteristics of multi-layer output, so that the identification accuracy is improved; the second feature fusion layer fuses the segmentation results of the bottom three layers, and cross-layer connection is achieved.
The feature fusion part is shown in figure 2:
the first feature fusion layer comprises an up-sampling layer and a convolution layer, wherein the input of the up-sampling layer is the output of conv7, the up-sampling multiple is 2 times, and the function is to expand pixels of the confidence map so as to facilitate feature fusion and obtain a confidence map A2 with expanded dimensions; the input of the convolutional layer is the output of pool5, the size of the convolutional kernel is 1 multiplied by 1, the step length is 1, the activation function is the Relu function, and a confidence map B of the output of the pool5 layer is obtained; the final output of the first feature fusion layer is the sum of the confidence map B and the confidence map A2, and is marked as a confidence map C, namely a segmentation result map fused with the features of the bottom two layers, and the segmentation result map is used for fusing the features of the multi-layer output and improving the recognition accuracy.
The second feature fusion layer comprises an up-sampling layer and a convolution layer, the input of the up-sampling layer is a segmentation result graph C, the up-sampling multiple is 2 times, the function is to expand pixels of the confidence graph so as to facilitate feature fusion, and a dimension expanded confidence graph C2 is obtained; the input of the convolutional layer is the output of pool4, the size of the convolutional kernel is 1 multiplied by 1, the step length is 1, the activation function is a relu function, and a confidence map D of the output of the pool4 layer is obtained; the final output of the second feature fusion layer is the sum of the confidence map D and the confidence map C2, and is recorded as a confidence map E, namely, the segmentation result map of the bottom three layers is fused, so that cross-layer connection is realized, the features of multi-layer output are fused, and the recognition accuracy is improved.
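One fusion layer — 2× up-sampling of the coarser confidence map, a 1 × 1 convolution with ReLU on the pooled features, and an element-wise sum — can be sketched as follows. This is a minimal NumPy sketch: the channel counts and random weights are illustrative, and nearest-neighbour up-sampling stands in for whatever up-sampling the patent's layers use.

```python
import numpy as np

def upsample2(x):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1_relu(x, w):
    """1x1 convolution (a per-pixel linear map across channels) + ReLU.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    y = np.tensordot(w, x, axes=([1], [0]))
    return np.maximum(y, 0.0)

def fuse(coarse, pooled, w):
    """One fusion layer: up-sample the coarse confidence map by 2x,
    project the pooled features with a 1x1 conv, and add element-wise."""
    a2 = upsample2(coarse)        # dimension-expanded confidence map
    b = conv1x1_relu(pooled, w)   # confidence map from the pooling layer
    assert a2.shape == b.shape
    return a2 + b                 # fused confidence map

rng = np.random.default_rng(0)
conv7_out = rng.normal(size=(2, 4, 4))   # coarse map (e.g. conv7 output)
pool5_out = rng.normal(size=(8, 8, 8))   # features from the pooling layer
w = rng.normal(size=(2, 8))              # illustrative 1x1 conv weights
fused = fuse(conv7_out, pool5_out, w)
```

The same `fuse` step applied a second time, with the pool4 features, corresponds to the second feature fusion layer.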
Third, output section
The output part comprises an output layer consisting of an up-sampling layer and a classification layer. The input of the up-sampling layer is the output of the fusion part, i.e. the second feature fusion layer; the up-sampling factor is 8, and its function is to enlarge the fused confidence map E to the size of the original input image (the image input to the classification network). The activation function of the classification layer is the SoftMax function, which performs classification prediction on each pixel, finally giving a high-resolution class heat map consistent with the size of the original input image.
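The output layer's two operations — 8× up-sampling and per-pixel SoftMax — can be sketched as below (shapes and the nearest-neighbour interpolation are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbour up-sampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def softmax_per_pixel(x):
    """SoftMax over the class channel at every pixel: (C, H, W) -> (C, H, W)."""
    e = np.exp(x - x.max(axis=0, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=0, keepdims=True)

# A fused 2-class confidence map E at 1/8 resolution becomes a full-size
# class heat map whose per-pixel class scores sum to 1.
fused_e = np.random.default_rng(1).normal(size=(2, 8, 8))
heat = softmax_per_pixel(upsample(fused_e, 8))
```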
Step 2, training the semantic segmentation network constructed in step 1
Creating a data set: in the data set, pixels belonging to the golden monkey are uniformly marked as target areas. A golden monkey target detection set under natural scenes, Natural Golden monkey Segmentation (NGS), was then created according to the standard of the PASCAL VOC data set. The NGS data set contains 600 golden monkey images in natural scenes with 31 golden monkey individuals in total, with uniform gender and age distribution and rich action types.
Training is carried out using the NGS data set; training data and test data are randomly split in a 4:1 ratio, the maximum number of iterations is set to 5000, and the network model is saved after training. The specific training process is as follows:
The final output of the semantic segmentation network is denoted ŷ, and the classification image of the input image (i.e. the actual classification result of the original image) is denoted y. Both are input into the loss function; the calculated result reflects the difference between the network's prediction and the actual result. The loss function is differentiated with respect to the parameters in the network, and the network parameters are updated according to the derivatives; the learning rate is set to 0.00005 and remains constant throughout learning.
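The update rule above — differentiate the loss with respect to the parameters and step against the gradient with a fixed learning rate of 0.00005 — can be sketched for a single scalar parameter. The quadratic loss here is a toy stand-in for the network's loss surface, and numeric differentiation stands in for backpropagation:

```python
LEARNING_RATE = 5e-5  # fixed throughout training, as in the text

def numeric_grad(loss_fn, theta, eps=1e-6):
    """Central-difference derivative of the loss w.r.t. a scalar parameter."""
    return (loss_fn(theta + eps) - loss_fn(theta - eps)) / (2 * eps)

def sgd_step(loss_fn, theta):
    """One gradient-descent update: theta <- theta - lr * dL/dtheta."""
    return theta - LEARNING_RATE * numeric_grad(loss_fn, theta)

loss = lambda t: (t - 3.0) ** 2  # toy loss with its minimum at t = 3
theta = 0.0
for _ in range(1000):
    theta = sgd_step(loss, theta)
# theta has moved toward 3 and the loss has strictly decreased
```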
In this scheme, a distance-weight-based loss function, distance-weight loss, denoted DWL, is adopted. It is obtained by introducing a weight coefficient for each pixel into the basic cross-entropy function and taking that coefficient to be a distance-constraint weight function. The DWL loss function better measures the distance between the predicted value and the true value of a sample, so that the model focuses more on the central area of the golden monkey's body and learns the structural information of the body. The DWL loss function is:

DWL = −(1/(height · width)) · Σ_{i=1..height} Σ_{j=1..width} dw_{(i,j)} · [ y_{(i,j)} · log ŷ_{(i,j)} + (1 − y_{(i,j)}) · log(1 − ŷ_{(i,j)}) ]
a specific derivation of the DWL loss function is given below.
For the segmentation problem of the invention, the network model outputs, for each pixel, the predicted probability that it belongs to the golden monkey region. From the basic classification cross-entropy function, the cross-entropy function L of the network model can be derived as:

L = −(1/(height · width)) · Σ_{i=1..height} Σ_{j=1..width} [ y_{(i,j)} · log ŷ_{(i,j)} + (1 − y_{(i,j)}) · log(1 − ŷ_{(i,j)}) ]

wherein y_{(i,j)} represents the label value at pixel (i, j) of the actual classification image corresponding to the input image, ŷ_{(i,j)} represents the predicted value at pixel (i, j) of the output image after the input image is processed by the semantic segmentation network, and height and width respectively represent the height and width of the image.
The invention mainly makes two improvements to the cross-entropy loss function: assigning different weights to the ROI region and the environmental background region, and introducing distance information from the center of the golden monkey to the edge within the ROI region.
First, a weight coefficient W_{(i,j)} is introduced for each pixel in the basic cross-entropy function; the new cross-entropy function WL is:

WL = −(1/(height · width)) · Σ_{i=1..height} Σ_{j=1..width} W_{(i,j)} · [ y_{(i,j)} · log ŷ_{(i,j)} + (1 − y_{(i,j)}) · log(1 − ŷ_{(i,j)}) ]

wherein W_{(i,j)} represents the loss weight at pixel (i, j); the original cross-entropy loss function L can be understood as the case where W_{(i,j)} is constantly equal to 1, so the coefficient can be omitted.
When calculating W_{(i,j)}, values are assigned according to the label value y_{(i,j)} and the position of pixel (i, j). First, the weight of the environmental background region is kept unchanged while the weight coefficient of the golden monkey ROI region is increased, so that the model pays more attention to the golden monkey region. Second, within the golden monkey ROI region, to exploit the golden monkey's body structure information, linearly decreasing weights are set from the center of the body to the edge, strengthening the model's learning of the central body region while preserving, as far as possible, the important distance constraint information between the body center and the hair edge. The distance-constraint weight function dw_{(i,j)} is:

dw_{(i,j)} = α − distance(I_{(i,j)}, center_{(i,j)}) / β   if y_{(i,j)} = 1,   and dw_{(i,j)} = 1 otherwise,

wherein I_{(i,j)} is the pixel at (i, j), center_{(i,j)} is the center of the connected domain containing pixel (i, j), distance(I_{(i,j)}, center_{(i,j)}) is the distance from I_{(i,j)} to that center, and α and β are two constants controlling the weight value of pixel I_{(i,j)}, so that the weights of the pixels in the golden monkey ROI region fall in the range [α − 1, α]. Setting α = 2 and β = max(distance(I_{(i,j)}, center_{(i,j)})), when I_{(i,j)} is the center of the connected domain, dw_{(i,j)} takes the value 2; when I_{(i,j)} is farthest from the center of the connected domain, dw_{(i,j)} takes the value 1, thereby achieving weights that decrease from the center to the edge of the connected domain within the golden monkey ROI region.
Taking the weight coefficient W_{(i,j)} of the new cross-entropy function WL to be dw_{(i,j)}, the improved distance-weight-based loss function DWL is obtained:

DWL = −(1/(height · width)) · Σ_{i=1..height} Σ_{j=1..width} dw_{(i,j)} · [ y_{(i,j)} · log ŷ_{(i,j)} + (1 − y_{(i,j)}) · log(1 − ŷ_{(i,j)}) ]
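DWL is the per-pixel binary cross entropy scaled by the dw map, as a minimal sketch (the small epsilon guard against log(0) is an implementation detail, not from the patent):

```python
import numpy as np

def dwl_loss(y_true, y_pred, dw):
    """Distance-weight loss: per-pixel binary cross entropy scaled by dw."""
    eps = 1e-12
    ce = y_true * np.log(y_pred + eps) + (1 - y_true) * np.log(1 - y_pred + eps)
    return -(dw * ce).mean()

y_true = np.array([[1.0, 0.0], [1.0, 0.0]])
y_pred = np.array([[0.9, 0.1], [0.8, 0.2]])
plain = dwl_loss(y_true, y_pred, np.ones_like(y_true))      # dw = 1: plain CE
weighted = dwl_loss(y_true, y_pred, np.full_like(y_true, 2.0))
```

With dw identically 1 this reduces to the original cross-entropy function L, matching the remark above that L corresponds to W_{(i,j)} = 1.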
Step 3, storing the trained network model for segmentation detection of the image to be segmented
After the semantic segmentation network is trained in step 2, the trained network model is stored. In practical application, the image to be segmented is input into the network model, and the output of the network model is the segmented high-resolution heat map.
Experimental comparative analysis confirms that the improved loss function better solves the problem of the golden monkey's body being divided into multiple parts, and that the improved natural-scene golden monkey detection network improves the segmentation results for the problem cases above. The experimental comparative analysis procedure is as follows:
the invention performs experiments on the NGS dataset and randomly distributes training data and test data in a 4:1 ratio. In the natural scene golden monkey detection algorithm, the learning rate is set to be 0.00005, the learning rate is stable and unchanged in the learning process, and the maximum iteration number of the experiment is set to be 5000 times. After the segmentation result of the original image in the natural scene is obtained, generating a rectangular golden monkey individual detection result according to the edge of the segmentation image, in the generation process, eliminating a target pixel region which is too small to be normally utilized, and finally obtaining golden monkey individual data through the image region in the rectangular frame.
The IoU metric can be used to quantitatively measure the agreement between the true value and the predicted value, as shown in fig. 3. In the present invention, the segmentation results obtained by the FCN network before and after applying the DWL function are compared: the IoU between each segmentation result and the ground truth is computed for all 100 images in the test set, and the average IoU is used as the evaluation index of network performance. The results are shown in the following table:
IoU comparison before and after the DWL improvement:

Loss function            IoU
Cross-entropy function   85.28%
DWL function             86.14%
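IoU between a predicted mask and the ground truth is the standard intersection-over-union on binary masks, as a minimal sketch:

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True       # 4x4 ground-truth region (16 pixels)
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 3:7] = True     # prediction shifted by one pixel
score = iou(pred, gt)     # intersection 9, union 23
```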
As can be seen from the table, the segmentation effect obtained by the original semantic segmentation network is already good on the base data, showing that the method is suitable for the segmentation task of the invention. After the loss function improvement, the result with the DWL function is 0.86 percentage points higher than that of the original network, demonstrating that the improved loss function raises the overall segmentation performance of the network model to a certain extent.
As shown in fig. 4, (a) shows the segmentation result of the original image, and (b) shows the rectangular golden monkey individual detection result generated based on the result image.
Fig. 5 shows in detail the influence of segmentation errors on the generated rectangular detection results: (a) and (c) show two cases where the lower edge of the golden monkey is not completely covered by the rectangular detection result, the segmentation error causing loss of the golden monkey's edge information, and (b) shows a case where a single golden monkey is divided into two detection boxes due to a segmentation error, i.e. a single golden monkey split into multiple rectangles. Clearly, the test data shown in fig. 5 cannot be used for the golden monkey individual re-identification experiment.
By the above method, the loss function of the original FCN network is improved with a distance-based per-pixel loss weight, introducing constraint information about the golden monkey's body structure. The segmentation results are shown in fig. 6 and show a clear improvement. Compared with the two typical segmentation errors before the improvement that made the rectangular detection results unusable, the segmentation results in both images are obviously improved and the area of erroneous results is significantly reduced. For the generated rectangular detection results, in (a) the lower edge of the golden monkey is completely included in the detection result, and in (b) the golden monkey originally divided into two rectangular detection frames is accurately detected as one complete golden monkey individual.
The distance-weight-based loss function DWL designed by the invention greatly improves golden monkey segmentation results in natural scenes, thereby effectively improving the accuracy of the final rectangular detection results; from the rectangular region results, golden monkey individual image data meeting the requirements of the golden monkey individual re-identification experiment are obtained.

Claims (6)

Translated from Chinese
1. A golden monkey body segmentation algorithm in a natural scene, characterized by comprising the following steps: constructing a semantic segmentation network to realize end-to-end image segmentation; training the semantic segmentation network, and saving the trained network model for segmentation detection of images to be segmented; the semantic segmentation network comprises a classification network, a fusion part and an output part, wherein: from front to back the semantic segmentation network consists of a first convolution layer, a first maximum pooling layer, a second convolution layer, a second maximum pooling layer, a third convolution layer, a third maximum pooling layer, a fourth convolution layer, a fourth maximum pooling layer, a fifth convolution layer, a fifth maximum pooling layer, a sixth convolution layer and a seventh convolution layer; each of the first to fifth convolution layers comprises two consecutive convolution calculations and the sixth convolution layer comprises one convolution calculation, whose function is to perform feature extraction on the input image to obtain a feature map; the seventh convolution layer comprises one convolution calculation and a classification activation function, whose function is to perform feature extraction and pixel-level classification to obtain a confidence map; the function of the first to fifth maximum pooling layers is to reduce the data dimension without losing features; the fusion part comprises a first feature fusion layer and a second feature fusion layer, the first feature fusion layer up-samples the output of the seventh convolution layer and fuses it with the output of the fifth maximum pooling layer, and the second feature fusion layer fuses the output of the first feature fusion layer with the features extracted by the fourth maximum pooling layer to obtain a fused confidence map; the output part comprises an output layer consisting of an up-sampling layer and a classification layer, wherein the input of the up-sampling layer is the output of the second feature fusion layer, whose function is to enlarge the fused confidence map to the size of the original input image, and the classification layer performs classification prediction on each pixel, finally obtaining a high-resolution class heat map consistent with the size of the original input image.

2. The golden monkey body segmentation algorithm in a natural scene according to claim 1, wherein the convolution kernel size of each of the first to fifth convolution layers is 2 × 2 with a step size of 2; the convolution kernel size of the sixth convolution layer is 7 × 7 with a step size of 1; the convolution kernel size of the seventh convolution layer is 1 × 1 with a step size of 1.

3. The golden monkey body segmentation algorithm in a natural scene according to claim 1, wherein the pooling kernel size of each of the first to fifth maximum pooling layers is 2 × 2 with a step size of 2.

4. The golden monkey body segmentation algorithm in a natural scene according to claim 1, wherein the first feature fusion layer up-sampling the output of the seventh convolution layer and fusing it with the output of the fifth maximum pooling layer comprises: the first feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the output of the seventh convolution layer, the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, giving a dimension-expanded confidence map A2; the input of the convolution layer is the output of the fifth maximum pooling layer, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, giving a confidence map B of the fifth maximum pooling layer's output; the final output of the first feature fusion layer is the sum of confidence map B and confidence map A2, denoted confidence map C.

5. The golden monkey body segmentation algorithm in a natural scene according to claim 1, wherein the second feature fusion layer fusing the output of the first feature fusion layer with the features extracted by the fourth maximum pooling layer to obtain a fused confidence map comprises: the second feature fusion layer comprises an up-sampling layer and a convolution layer; the input of the up-sampling layer is the confidence map C, the up-sampling factor is 2, and its function is to expand the pixels of the confidence map to facilitate feature fusion, giving a dimension-expanded confidence map C2; the input of the convolution layer is the output of the fourth maximum pooling layer, the convolution kernel size is 1 × 1, the step size is 1, and the activation function is the ReLU function, giving a confidence map D of the fourth maximum pooling layer's output; the final output of the second feature fusion layer is the sum of confidence map D and confidence map C2, denoted confidence map E, i.e. the fused confidence map.

6. The golden monkey body segmentation algorithm in a natural scene according to claim 1, wherein, when training the semantic segmentation network, the loss function adopted is:

DWL = −(1/(height · width)) · Σ_{i=1..height} Σ_{j=1..width} dw_{(i,j)} · [ y_{(i,j)} · log ŷ_{(i,j)} + (1 − y_{(i,j)}) · log(1 − ŷ_{(i,j)}) ]
Figure FDA0003024407380000021
where y(i,j) denotes the label value at pixel (i,j) of the actual classified image corresponding to the input image,
Figure FDA0003024407380000022
denotes the predicted value at pixel (i,j) of the output image after the input image is processed by the semantic segmentation network, height and width denote the height and width of the image respectively, and dw(i,j) is the distance constraint weight function, expressed as:
Figure FDA0003024407380000023
where distance(I(i,j), center(i,j)) denotes the distance from pixel I(i,j) to the center center(i,j) of the connected domain in which it lies, and α and β are two constants.
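The claims describe an FCN-style network with two cross-layer fusion steps (claims 4 and 5: 2× upsampling of the coarse confidence map, a 1×1 ReLU-activated convolution on the skip branch, then an elementwise sum) and a distance-weighted pixelwise cross-entropy loss (claim 6). The NumPy sketch below illustrates these two mechanisms only; the array shapes, the 16-channel pool5 features, and the exponential form assumed for dw are illustrative assumptions (the patent gives the exact formulas only as images), and this is not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, w, b):
    """1x1 convolution with ReLU, as in claims 4 and 5:
    (Cin, H, W) projected by (Cout, Cin) -> (Cout, H, W)."""
    out = np.tensordot(w, x, axes=([1], [0])) + b[:, None, None]
    return np.maximum(out, 0.0)

# --- skip fusion as in claim 4 (names A2, B, C follow the claims) ---
n_classes = 2
conv7_out = rng.random((n_classes, 8, 8))   # coarse confidence map (assumed size)
pool5_out = rng.random((16, 16, 16))        # hypothetical pool5 features

A2 = upsample2x(conv7_out)                  # dimension-expanded confidence map A2
w = rng.random((n_classes, 16)) * 0.1       # illustrative 1x1-conv weights
b = np.zeros(n_classes)
B = conv1x1(pool5_out, w, b)                # confidence map B from the skip branch
C = A2 + B                                  # fused confidence map C (elementwise sum)

# --- distance-weighted pixelwise cross-entropy (claim 6) ---
def dw(dist, alpha=1.0, beta=10.0):
    """Assumed exponential form of the distance-constraint weight dw(i,j);
    the patent's exact formula is given only as an image."""
    return alpha * np.exp(-dist / beta)

def weighted_bce(y, y_hat, dist):
    """Mean cross-entropy over all height*width pixels, each term
    weighted by dw of the pixel's distance to its connected-domain center."""
    h, w_ = y.shape
    eps = 1e-7
    ce = -(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    return float((dw(dist) * ce).sum() / (h * w_))
```

Claim 5 repeats the same pattern one level up: C is upsampled to C2, pool4 features pass through another 1×1 ReLU convolution to give D, and the fused confidence map is E = D + C2.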
CN201910405596.6A, priority date 2019-05-16, filing date 2019-05-16: Golden monkey body segmentation algorithm in natural scene (CN110287777B, Expired - Fee Related)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910405596.6A | 2019-05-16 | 2019-05-16 | Golden monkey body segmentation algorithm in natural scene

Publications (2)

Publication Number | Publication Date
CN110287777A (en) | 2019-09-27
CN110287777B (en) | 2021-06-08

Family

ID=68002084

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910405596.6A (Expired - Fee Related, CN110287777B) | Golden monkey body segmentation algorithm in natural scene | 2019-05-16 | 2019-05-16

Country Status (1)

Country | Link
CN (1) | CN110287777B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110930383A (en) * | 2019-11-20 | 2020-03-27 | 佛山市南海区广工大数控装备协同创新研究院 | Injector defect detection method based on deep learning semantic segmentation and image classification
CN111179262B (en) * | 2020-01-02 | 2024-09-06 | 国家电网有限公司 | Electric power inspection image hardware fitting detection method combining shape attribute
CN111242929A (en) * | 2020-01-13 | 2020-06-05 | 中国科学技术大学 | A method, system, device and medium for measuring fetal skull shape parameters
CN113469892A (en) * | 2020-04-29 | 2021-10-01 | 海信集团有限公司 | Video frame processing method, device, equipment and medium
CN113744276A (en) * | 2020-05-13 | 2021-12-03 | Oppo广东移动通信有限公司 | Image processing method, image processing apparatus, electronic device, and readable storage medium
CN111626196B (en) * | 2020-05-27 | 2023-05-16 | 西南石油大学 | Knowledge-graph-based intelligent analysis method for body structure of typical bovine animal
CN112163449B (en) * | 2020-08-21 | 2022-12-16 | 同济大学 | A lightweight multi-branch feature cross-layer fusion image semantic segmentation method
EP4036792A1 (en) * | 2021-01-29 | 2022-08-03 | Aptiv Technologies Limited | Method and device for classifying pixels of an image
CN114399513B (en) * | 2021-12-10 | 2023-04-18 | 北京百度网讯科技有限公司 | Method and device for training image segmentation model and image segmentation
CN117351537B (en) * | 2023-09-11 | 2024-05-17 | 中国科学院昆明动物研究所 | Kiwi face intelligent recognition method and system based on deep learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109447990A (en) * | 2018-10-22 | 2019-03-08 | 北京旷视科技有限公司 | Image semantic segmentation method, apparatus, electronic device and computer-readable medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8494285B2 (en) * | 2010-12-09 | 2013-07-23 | The Hong Kong University Of Science And Technology | Joint semantic segmentation of images and scan data
CN105825168B (en) * | 2016-02-02 | 2019-07-02 | 西北大学 | A face detection and tracking method of golden snub-nosed monkey based on S-TLD
WO2017210690A1 (en) * | 2016-06-03 | 2017-12-07 | Lu Le | Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN106709568B (en) * | 2016-12-16 | 2019-03-22 | 北京工业大学 | Object detection and semantic segmentation of RGB-D images based on deep convolutional networks
CN109145939B (en) * | 2018-07-02 | 2021-11-02 | 南京师范大学 | A small-object-sensitive two-channel convolutional neural network semantic segmentation method
CN109389051A (en) * | 2018-09-20 | 2019-02-26 | 华南农业大学 | A building remote sensing image recognition method based on convolutional neural networks

Also Published As

Publication number | Publication date
CN110287777A (en) | 2019-09-27

Similar Documents

PublicationPublication DateTitle
CN110287777B (en)Golden monkey body segmentation algorithm in natural scene
CN110189334B (en)Medical image segmentation method of residual error type full convolution neural network based on attention mechanism
CN110532900B (en) Facial Expression Recognition Method Based on U-Net and LS-CNN
CN107506761B (en) Brain image segmentation method and system based on saliency learning convolutional neural network
CN109118467B (en)Infrared and visible light image fusion method based on generation countermeasure network
CN108509978B (en)Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN109345508B (en) A Bone Age Evaluation Method Based on Two-Stage Neural Network
CN108133188B (en)Behavior identification method based on motion history image and convolutional neural network
CN111738363B (en)Alzheimer disease classification method based on improved 3D CNN network
CN111898432B (en)Pedestrian detection system and method based on improved YOLOv3 algorithm
CN111062278B (en)Abnormal behavior identification method based on improved residual error network
CN114821052B (en)Three-dimensional brain tumor nuclear magnetic resonance image segmentation method based on self-adjustment strategy
CN110543906B (en)Automatic skin recognition method based on Mask R-CNN model
CN104077613A (en)Crowd density estimation method based on cascaded multilevel convolution neural network
CN112861718A (en)Lightweight feature fusion crowd counting method and system
CN109743642B (en)Video abstract generation method based on hierarchical recurrent neural network
CN111986125A (en) A method for instance segmentation for multi-objective tasks
CN107784288A (en)A kind of iteration positioning formula method for detecting human face based on deep neural network
CN114332133A (en) Method and system for segmentation of infected area in CT images of COVID-19 based on improved CE-Net
CN110533683A (en)A kind of image group analysis method merging traditional characteristic and depth characteristic
CN116844041A (en) A farmland extraction method based on bidirectional convolution temporal self-attention mechanism
CN116012337A (en)Hot rolled strip steel surface defect detection method based on improved YOLOv4
CN114360067A (en) A dynamic gesture recognition method based on deep learning
CN112329793B (en) Saliency detection method based on structure-adaptive and scale-adaptive receptive field
CN114462558A (en)Data-augmented supervised learning image defect classification method and system

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210608
