CN114022458B - Skeleton detection method, device, electronic equipment and computer readable storage medium - Google Patents

Skeleton detection method, device, electronic equipment and computer readable storage medium

Info

Publication number
CN114022458B
CN114022458B (Application CN202111318921.9A)
Authority
CN
China
Prior art keywords
skeleton
feature
convolution
network
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111318921.9A
Other languages
Chinese (zh)
Other versions
CN114022458A (en)
Inventor
项蕾
陈华华
林金曙
张芸菲
陈丽娟
刘亚洲
张奇明
童鲁虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hundsun Technologies Inc
Original Assignee
Hundsun Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hundsun Technologies Inc
Priority to CN202111318921.9A
Publication of CN114022458A
Application granted
Publication of CN114022458B
Legal status: Active
Anticipated expiration


Abstract


An embodiment of the present invention provides a skeleton detection method, device, electronic device, and computer-readable storage medium in the field of computer vision technology. The method obtains the skeleton detection result for an image to be detected by inputting the image into a trained skeleton detection model and processing it sequentially through a feature extraction network, a feature fusion network, a vector offset network, and an output convolution layer. Because the skeleton labels required for training the skeleton detection model are obtained by expanding the original skeleton labels, the difficulty of the supervision task is effectively adjusted and the network's ability to detect the region around the skeleton is improved; in addition, a vector offset network introduced into the model's network structure offsets the detected skeleton information, realizing the contraction of the output skeleton information. Through the expansion of the skeleton labels and the contraction of the output skeleton information, the skeleton detection accuracy of the entire model is greatly improved.

Description

Skeleton detection method, device, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a skeleton detection method, a skeleton detection device, an electronic device, and a computer readable storage medium.
Background
The skeleton, an image feature first proposed by Blum, is a one-dimensional simplified representation of an object; it contains the structural information of the object and the connection information between object components, and can accurately express the geometric and topological features of the object. The skeleton of an object, also called the medial axis, is a descriptor based on the object's structure. Because skeleton pixels occupy only a small proportion of the pixels in an image, skeletons are highly abstract. The skeleton is generally structural information extracted from an object, is invariant to non-rigid deformations, and is widely used in computer vision for modeling object skeletons.
Direct detection of object skeleton from natural images is a hotspot problem and also faces many challenges. Objects in natural images often take on various forms, colors, textures and shapes, and in addition, images obtained in natural scenes are inevitably affected by problems such as uneven illumination, low contrast, shielding and the like. Therefore, whether the skeleton can be accurately extracted will also affect the processing performance of other subsequent intelligent systems to a certain extent.
The existing skeleton detection technology mainly uses a plurality of skeleton filters, each with a zero-sum structure and a reflection-symmetric structure; noise distributions lacking the symmetric structure are filtered out and the effective skeleton is retained. In this approach, high-noise images undergo unsupervised skeleton extraction using multiple skeleton filters, but object skeleton detection accuracy drops when facing complex scenes.
Disclosure of Invention
In view of the above, the present invention is directed to a skeleton detecting method, device, electronic apparatus, and computer readable storage medium, so as to solve the problem of low accuracy of detecting an object skeleton in the existing skeleton detecting technology.
In order to achieve the above object, the technical scheme adopted by the embodiment of the invention is as follows:
in a first aspect, the present invention provides a skeleton detection method, the method comprising:
Inputting an image to be detected into a trained skeleton detection model, wherein a skeleton label adopted for training the skeleton detection model is obtained by expanding an original skeleton label;
Extracting features of the image to be detected by using a feature extraction network of the skeleton detection model to obtain a plurality of target feature images;
Inputting the target feature images into a feature fusion network of the skeleton detection model, and carrying out feature fusion on the target feature images by using the feature fusion network to obtain fused feature images;
Inputting the fused feature images into a vector offset network of the skeleton detection model, and performing offset processing on the fused feature images by using the vector offset network to obtain offset feature images;
Inputting the shifted feature images into an output convolution layer of the skeleton detection model, and carrying out convolution operation on the shifted feature images by utilizing the output convolution layer to obtain skeleton detection results corresponding to the images to be detected.
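The five steps above form a strictly sequential pipeline. A minimal sketch of how the stages compose (the stage functions are illustrative stand-ins, not the patent's actual implementations):

```python
# Hypothetical sketch of the five-step inference pipeline above. The stage
# functions are stand-ins; only the sequential composition is the point.

def feature_extraction(image):
    # Stand-in: would return several target feature maps at different scales.
    return [image, image]

def feature_fusion(feature_maps):
    # Stand-in: would fuse the target feature maps into a single map.
    return sum(feature_maps)

def vector_offset(fused):
    # Stand-in: would shift detected skeleton responses toward the true axis.
    return fused

def output_conv(shifted):
    # Stand-in: final convolution producing the skeleton detection result.
    return shifted

def detect_skeleton(image):
    target_maps = feature_extraction(image)
    fused = feature_fusion(target_maps)
    shifted = vector_offset(fused)
    return output_conv(shifted)

result = detect_skeleton(3)  # toy scalar "image" just to exercise the wiring
```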
In an alternative embodiment, the skeletal tag is generated by:
performing expansion operation on the original skeleton tag according to a preset radius threshold value to obtain an expanded skeleton tag;
calculating the distance information from each non-zero pixel point in the expanded skeleton label to the nearest zero point;
according to the distance information and a preset Gaussian distribution function, a Gaussian distribution expansion framework is obtained;
And normalizing the Gaussian distribution expansion skeleton according to the maximum value in the Gaussian distribution expansion skeleton to obtain the skeleton tag.
In an optional implementation, the feature extraction network includes a plurality of feature extraction layers and an atrous spatial pyramid pooling (ASPP) module connected sequentially in series, and using the feature extraction network of the skeleton detection model to perform feature extraction on the image to be detected to obtain a plurality of target feature maps includes:
and carrying out feature extraction on the input image to be detected through the plurality of feature extraction layers and the ASPP module which are sequentially connected in series, and obtaining a plurality of target feature images according to output results of the plurality of feature extraction layers and the ASPP module.
In an optional embodiment, the feature fusion network includes a plurality of first convolution layers, a data splicing layer and a second convolution layer, where the plurality of first convolution layers are in one-to-one correspondence with the plurality of target feature maps, and the inputting the plurality of target feature maps into the feature fusion network of the skeleton detection model, and performing feature fusion on the plurality of target feature maps by using the feature fusion network to obtain a fused feature map includes:
respectively inputting the target feature maps into corresponding first convolution layers;
Performing convolution operation on the input target feature images by using the plurality of first convolution layers to obtain a plurality of feature images to be fused;
Inputting the feature images to be fused into the data splicing layer, and splicing the feature images to be fused by using the data splicing layer to obtain a spliced feature image;
And inputting the spliced feature images into the second convolution layer, and carrying out convolution operation on the spliced feature images by using the second convolution layer to obtain the fused feature images.
In an alternative embodiment, the vector offset network comprises a third convolution layer, a plurality of fourth convolution layers, a plurality of fifth convolution layers and a feature fusion layer, wherein the fourth convolution layers and the fifth convolution layers are connected in one-to-one correspondence;
Inputting the fused feature map into a vector migration network of the skeleton detection model, and performing migration processing on the fused feature map by using the vector migration network to obtain a migrated feature map, wherein the method comprises the following steps of:
And inputting the fused feature images into the third convolution layer and each fourth convolution layer, respectively carrying out convolution operation on the fused feature images by utilizing the third convolution layer, the fourth convolution layers and the fifth convolution layers which are sequentially connected in series, and inputting the output result of the third convolution layer and the output result of each fifth convolution layer into the feature fusion layer to carry out feature fusion, so as to obtain the feature images after offset.
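The branch layout just described (one direct third-conv branch plus several serial fourth-to-fifth conv pairs, merged in a feature fusion layer) can be sketched as pure dataflow. All convolutions below are stand-in identity ops, and channel concatenation is an assumed choice for the fusion layer; only the wiring is illustrated:

```python
import numpy as np

def offset_network(fused, n_branches=3):
    """Wiring of the vector offset network: one third-conv branch plus
    n parallel fourth->fifth serial conv pairs, merged by the feature
    fusion layer. All convs are stand-in identity ops."""
    conv3 = lambda x: x                       # third convolution layer (stand-in)
    conv4 = lambda x: x                       # fourth convolution layers (stand-in)
    conv5 = lambda x: x                       # fifth convolution layers (stand-in)
    outputs = [conv3(fused)]
    for _ in range(n_branches):
        outputs.append(conv5(conv4(fused)))   # one serial fourth -> fifth pair
    return np.concatenate(outputs, axis=1)    # feature fusion layer (assumption)

shifted = offset_network(np.zeros((1, 4, 8, 8)))  # N x C x H x W stand-in input
```

With three branches plus the direct path, four 4-channel outputs merge into a 16-channel map.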
In a second aspect, the present invention provides a skeleton detecting apparatus, the apparatus comprising:
The image input module is used for inputting the image to be detected into a trained skeleton detection model, wherein a skeleton label adopted for training the skeleton detection model is obtained by expanding an original skeleton label;
the feature extraction module is used for carrying out feature extraction on the image to be detected by utilizing a feature extraction network of the skeleton detection model to obtain a plurality of target feature images;
the feature fusion module is used for inputting the multiple target feature images into a feature fusion network of the skeleton detection model, and carrying out feature fusion on the multiple target feature images by utilizing the feature fusion network to obtain fused feature images;
The migration processing module is used for inputting the fused feature images into a vector migration network of the skeleton detection model, and carrying out migration processing on the fused feature images by using the vector migration network to obtain the feature images after migration;
And the output module is used for inputting the shifted feature images into an output convolution layer of the skeleton detection model, and carrying out convolution operation on the shifted feature images by utilizing the output convolution layer to obtain skeleton detection results corresponding to the images to be detected.
In an alternative embodiment, the vector offset network comprises a third convolution layer, a plurality of fourth convolution layers, a plurality of fifth convolution layers and a feature fusion layer, wherein the fourth convolution layers and the fifth convolution layers are connected in one-to-one correspondence;
The offset processing module is configured to input the fused feature map into the third convolution layer and each fourth convolution layer, perform convolution operation on the fused feature map by using the third convolution layer, the fourth convolution layers and the fifth convolution layers that are sequentially connected in series, and input an output result of the third convolution layer and an output result of each fifth convolution layer into the feature fusion layer to perform feature fusion, so as to obtain an offset feature map.
In an alternative embodiment, the skeleton tag is generated by performing expansion operation on an original skeleton tag according to a preset radius threshold to obtain an expanded skeleton tag, calculating distance information from each non-zero pixel point in the expanded skeleton tag to a nearest zero point, obtaining a Gaussian distribution expansion skeleton according to the distance information and a preset Gaussian distribution function, and performing normalization processing on the Gaussian distribution expansion skeleton according to the maximum value in the Gaussian distribution expansion skeleton to obtain the skeleton tag.
In a third aspect, the present invention provides an electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program implementing the steps of the skeleton detection method of any one of the preceding embodiments when executed by the processor.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the skeleton detection method of any of the preceding embodiments.
The skeleton detection method, device, electronic device, and computer-readable storage medium provided by embodiments of the present invention obtain the skeleton labels required for training the skeleton detection model by expanding the original skeleton labels, which effectively adjusts the task difficulty of the supervision process and improves the network's ability to detect the region around the skeleton; a vector offset network introduced into the network structure of the skeleton detection model offsets the detected skeleton information, realizing the contraction of the output skeleton information. Thus, through the expansion of the skeleton labels and the contraction of the output skeleton information, the skeleton detection accuracy of the entire model is greatly improved.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a schematic representation of the generation of Gaussian distribution skeleton tags;
FIG. 2 shows a schematic structural diagram of a skeleton detection model;
FIG. 3 shows another structural schematic diagram of a skeleton detection model;
FIG. 4 shows a schematic diagram of a structure of a vector shift network;
FIG. 5 is a schematic diagram showing the effect of the vector shift network after the shift process;
fig. 6 is a schematic flow chart of a skeleton detection method according to an embodiment of the present invention;
FIG. 7 is a functional block diagram of a skeleton detecting device according to an embodiment of the present invention;
Fig. 8 shows a block schematic diagram of an electronic device according to an embodiment of the present invention.
The icons comprise a 700-skeleton detection device, 800-electronic equipment, 710-image input module, 720-feature extraction module, 730-feature fusion module, 740-offset processing module, 750-output module, 810-memory, 820-processor and 830-communication module.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a…" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises that element.
Vision is an important means by which humans perceive the world. With the continuous development and progress of computer technology worldwide, the degree of social informatization and intelligence keeps rising, and computer vision technology is therefore widely applied.
Skeleton detection, as a branch of computer vision, has attracted the attention of many experts and scholars. Directly detecting an object skeleton from natural images is a hotspot problem and also faces many challenges: objects in natural images take on various forms, colors, textures, and shapes, and images captured in natural scenes are inevitably affected by uneven illumination, low contrast, occlusion, and similar problems.
The existing skeleton detection technology mainly uses a plurality of skeleton filters, each with a zero-sum structure and a reflection-symmetric structure, so that noise distributions lacking the symmetric structure are filtered out and the effective skeleton is retained. In this approach, high-noise images undergo unsupervised skeleton extraction using multiple skeleton filters, but object skeleton detection accuracy drops when facing complex scenes.
Based on the above, an embodiment of the present invention provides a skeleton detection method aimed at object skeleton detection in natural scenes. Skeleton detection is divided into two processes, training and inference. In the training process of the skeleton detection model, the original skeleton label is expanded by introducing a Gaussian probability distribution, which adjusts the task difficulty of the supervision process and improves the network's ability to detect the region around the skeleton. A vector offset network is introduced into the network structure of the skeleton detection model; it can be inserted into the network as an independent module for training, offsetting the detected skeleton information toward the center to obtain a contracted skeleton output, so that overall network accuracy is improved through the expansion of the labels and the contraction of the output skeleton information. In addition, the skeleton detection method provided by this embodiment is end-to-end: once the skeleton detection model is trained, given an object picture, the corresponding skeleton binary image can be obtained through the model's processing pipeline, so no further post-processing of the obtained skeleton is needed, part of the time overhead is saved, and optimal results are achieved in time performance.
Next, a training process of the skeleton detection model will be described in detail.
Because the training process of the skeleton detection model is supervised learning, paired real images and skeleton labels are required to form a data set for training the skeleton detection model. Before training the skeleton detection model, performing expansion operation on the original skeleton tag in the data set, namely expanding skeleton detection labeling data information, and introducing Gaussian distribution mask labeling information.
The Gaussian distribution of skeleton information provided by this embodiment differs from a common segmentation-task mask: the Gaussian distribution mask encodes weight information by position, so positions closer to the original skeleton label have larger weights. Simply expanding the skeleton pixels symmetrically would only widen the original skeleton label pixel-wise, without accounting for the true information weight of the pixels around it, which is not conducive to the detection accuracy of the whole model.
The Gaussian distribution model is widely used for the distribution of continuous random variables, and its distribution law is introduced here into the skeleton supervision tag. P(x) denotes the basic unary Gaussian distribution mathematical model (i.e., the Gaussian distribution function):

P(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
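The unary Gaussian density referenced above is the standard normal-distribution formula; a minimal sketch, with μ and σ as the distribution parameters:

```python
import math

def gaussian(x, mu=0.0, sigma=1.0):
    """Unary Gaussian density: P(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
```

The density peaks at x = μ and decays symmetrically with distance, which is exactly the weighting behaviour the supervision tag needs.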
The original skeleton label is taken as the central axis of the Gaussian distribution: the closer a pixel is to the original skeleton label, the larger its weight (and the brighter it appears); as the distance from the original skeleton label increases, the weight shrinks and finally tends to 0.
The skeleton tag is generated by the following steps: (1) expanding the original skeleton label according to a preset radius threshold to obtain an expanded skeleton label; (2) calculating the distance information from each non-zero pixel point in the expanded skeleton label to the nearest zero-value point; (3) obtaining a Gaussian distribution expansion skeleton according to the distance information and a preset Gaussian distribution function; (4) normalizing the Gaussian distribution expansion skeleton by its maximum value to obtain the skeleton label.
The radius threshold r specifies, during expansion, the number of pixels by which the skeleton is widened on each side, centered on the original skeleton pixels. The specific value of r can be set according to actual needs; this embodiment places no limit on it.
In the binary image, the black region has the value 0 and the white region has the value 1, so computing the distance information from each non-zero pixel point to the nearest zero point means finding, for each white pixel, the distance to its nearest black pixel.
The normalization process works as follows: the obtained Gaussian expansion skeleton is a two-dimensional array with values in the range 0-255; the maximum value (for example, 235) is found and mapped to 1, and all other values are mapped proportionally (for example, 100 maps to 100/235). This normalizes the Gaussian expansion skeleton and yields the skeleton label used for training.
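The max-normalization described here is a single division per value; a sketch using the numbers from the text (maximum 235 maps to 1, 100 maps to 100/235):

```python
def normalize_by_max(grid):
    """Map the maximum value in the 2-D array to 1 and scale every other
    value proportionally (the equal-ratio mapping described above)."""
    peak = max(max(row) for row in grid)
    return [[value / peak for value in row] for row in grid]

# The numbers from the text: the maximum 235 maps to 1, 100 maps to 100/235.
normalized = normalize_by_max([[0, 100], [235, 50]])
```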
In one example, referring to fig. 1, (a) is a training image and (b) is the original skeleton tag corresponding to the training image. When the original skeleton tag is expanded, a copy of the original skeleton tag may be created, a radius threshold r (for example, r = 5) and the Gaussian function parameters (i.e., μ and σ in the Gaussian distribution function) are set, and an expansion operation is performed on the original skeleton tag (b) using the dilate function in opencv to obtain (c). The distance information (i.e., the parameter x in the Gaussian distribution function) from each non-zero pixel point in (c) to the nearest zero point is computed using the distance transform function distanceTransform in opencv, and the distance information is then substituted into the Gaussian distribution function to obtain the Gaussian distribution expanded skeleton shown in (d) of fig. 1. Finally, the Gaussian distribution expanded skeleton is normalized by the maximum value in (d) to obtain the skeleton tag used for training.
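The generation procedure just described can be sketched end to end on a tiny binary grid. The functions below are pure-Python stand-ins for the OpenCV `dilate`/`distanceTransform` calls (brute-force Euclidean distances, square structuring element), and placing the Gaussian peak μ on the central axis is an assumption made so the axis keeps weight 1 after normalization; the sketch only illustrates the flow, not the patent's exact parameterization:

```python
import math

def dilate(grid, r):
    """Square-structuring-element dilation: a pixel becomes 1 if any pixel
    within Chebyshev distance r of it is 1 (stand-in for cv2.dilate)."""
    h, w = len(grid), len(grid[0])
    return [[int(any(grid[j][i]
                     for j in range(max(0, y - r), min(h, y + r + 1))
                     for i in range(max(0, x - r), min(w, x + r + 1))))
             for x in range(w)] for y in range(h)]

def distance_to_nearest_zero(grid):
    """Brute-force Euclidean distance from each non-zero pixel to the nearest
    zero pixel (stand-in for cv2.distanceTransform)."""
    h, w = len(grid), len(grid[0])
    zeros = [(y, x) for y in range(h) for x in range(w) if grid[y][x] == 0]
    return [[0.0 if grid[y][x] == 0 else
             min(math.hypot(y - zy, x - zx) for zy, zx in zeros)
             for x in range(w)] for y in range(h)]

def gaussian_label(skeleton, r=1, mu=None, sigma=1.0):
    """Expand, weight by the Gaussian of the distance information, normalize.
    mu defaulting to the largest distance (the central axis) is an assumption."""
    dilated = dilate(skeleton, r)
    dist = distance_to_nearest_zero(dilated)
    if mu is None:
        mu = max(max(row) for row in dist)
    weights = [[math.exp(-((d - mu) ** 2) / (2 * sigma ** 2)) if d > 0 else 0.0
                for d in row] for row in dist]
    peak = max(max(row) for row in weights)
    return [[w / peak for w in row] for row in weights]

skel = [[0] * 5 for _ in range(5)]
skel[2][2] = 1                      # a one-pixel "skeleton" on a 5x5 grid
label = gaussian_label(skel, r=1, sigma=1.0)
```

On this grid the central pixel keeps weight 1 while the pixels of the expanded ring receive smaller weights, matching the widened, smoothly decaying band in fig. 1(d).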
As can be seen from fig. 1, the skeleton tag obtained after expansion uses the original skeleton tag given by the data set as a central axis, and widens towards two sides. Compared with the original skeleton tag, the skeleton tag obtained by the embodiment is smoother, and is also more beneficial to network learning of pixel information around the object skeleton.
Compared with the existing Hi-Fi method, the Gaussian distribution skeleton tag allows the network to exploit more supervision information from surrounding pixels during learning, while the weight distribution preserves the position information of the original skeleton tag.
After the pretreated skeleton tag (i.e., the gaussian distribution skeleton tag) is obtained, a pre-constructed skeleton detection model can be trained. As shown in fig. 2, the structure of the skeleton detection model includes a feature extraction network, a feature fusion network, a vector offset network and an output convolution layer, and the images of the natural object sequentially pass through the feature extraction network, the feature fusion network, the vector offset network and the output convolution layer in fig. 2, so that the corresponding binary skeleton images can be obtained, and then model parameters are trained by optimizing the loss of the output skeleton and the preprocessed skeleton label, so as to obtain the trained skeleton detection model.
The training process of the skeleton detection model mainly comprises the steps of obtaining a training data set, wherein the training data set comprises training images and skeleton labels corresponding to the training images. It is understood that the skeleton tag is a gaussian distribution skeleton tag obtained by expanding the original skeleton tag. Inputting the training image into a pre-constructed skeleton detection model, and extracting features of the training image by using a feature extraction network of the skeleton detection model to obtain a plurality of target feature images. Inputting the multiple target feature images into a feature fusion network of the skeleton detection model, and carrying out feature fusion on the multiple target feature images by using the feature fusion network to obtain fused feature images. And performing offset processing on the fused feature images by using a vector offset network of the skeleton detection model to obtain the offset feature images. And carrying out convolution operation on the shifted feature images by using an output convolution layer of the skeleton detection model to obtain skeleton predicted values corresponding to the training images. And updating parameters of the skeleton detection model according to the skeleton label corresponding to the training image and the skeleton predicted value corresponding to the training image to obtain a trained skeleton detection model.
Therefore, the framework labels required by training the framework detection model are obtained by expanding the original framework labels, the task difficulty in the supervision process is effectively improved, the detection capacity of the network on the periphery of the framework is improved, a vector offset network is introduced into the network structure of the framework detection model, the detected framework information is offset, and the shrinkage of the output framework information is realized. Therefore, the skeleton detection precision of the whole model is greatly improved through the expansion of the skeleton label and the contraction of the output skeleton information.
Alternatively, in one embodiment, the specific structure of the feature extraction network and the feature fusion network may refer to fig. 3. As shown in fig. 3, the feature extraction network includes a plurality of feature extraction layers connected sequentially in series, i.e., C1 (Conv1), C2 (Conv2), C3 (Conv3), C4 (Conv4), C5 (Conv5), together with an ASPP (Atrous Spatial Pyramid Pooling) module; the plurality of feature extraction layers constitute the backbone feature extraction network. It should be noted that the number of feature extraction layers in this embodiment is only an example; in practical applications, feature extraction layers may be added or removed as needed.
When the feature extraction network performs feature extraction on the training image, the feature extraction can be performed on the input training image through a plurality of feature extraction layers and ASPP modules which are sequentially connected in series, and then a plurality of target feature images are obtained according to the output results of the feature extraction layers and the ASPP modules.
For example, the output results of the feature extraction layers C3, C4, C5 and the ASPP module may each be used as a target feature map, yielding a plurality of target feature maps that are input to the feature fusion network.
In one example, the backbone feature extraction network may be obtained from VGG16 by discarding the last pooling layer and the subsequent fully-connected layers, giving the backbone represented by C1, C2, C3, C4, and C5. C1 and C2 each comprise two 3×3 convolution layers; C3, C4, and C5 each comprise three 3×3 convolution layers; and there is one max pooling layer between each pair of consecutive blocks among C1, C2, C3, C4, and C5.
In order to increase the network's multi-scale detection information, the ASPP module is added after the backbone feature extraction network; it enlarges the receptive field and increases the network's ability to extract large-scale image features. In one example, four parallel atrous convolution layers with 3×3 kernels but different sampling rates (2, 4, 8, 16) can be added as the ASPP module behind the backbone feature extraction network and then concatenated along the channel dimension. The features are typically 4-dimensional data N×C×W×H, and the aforementioned concatenation along the channel dimension is performed along the channel dimension C (i.e., feature fusion). Here, 4-dimensional data means, for example, that data of shape (10, 1, 28, 28) corresponds to 10 samples with 1 channel, height 28, and width 28.
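The channel-dimension concatenation described above is axis 1 of the N×C×H×W layout; a minimal numpy shape check reusing the text's (10, 1, 28, 28) example (the branch shapes are stand-ins, not the patent's actual channel counts):

```python
import numpy as np

# Four parallel atrous branches would each emit an N x C x H x W tensor; here
# stand-in arrays reuse the text's shape example N=10, C=1, H=W=28.
branches = [np.zeros((10, 1, 28, 28)) for _ in range(4)]

# Concatenation "along the channel dimension" is axis 1 of N x C x H x W:
# the four single-channel outputs stack into one 4-channel feature map.
aspp_out = np.concatenate(branches, axis=1)
```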
Still referring to fig. 3, the feature fusion network includes a plurality of first convolution layers, a data stitching layer, and a second convolution layer, where the first convolution layers are in one-to-one correspondence with the target feature maps. The target feature maps are input into their corresponding first convolution layers, and each first convolution layer performs a convolution operation on its input to obtain a feature map to be fused. The feature maps to be fused are input to the data stitching layer, which concatenates them into a stitched feature map. The stitched feature map is then input to the second convolution layer, which performs a convolution operation on it to obtain the fused feature map.
In this embodiment, in order to better combine high-dimensional semantic information with low-dimensional spatial information and improve feature extraction effectiveness, the outputs of the feature extraction layers C3, C4 and C5 and of the ASPP module are used as target feature maps, input to the corresponding first convolution layers for 1×1 convolution, and then concatenated (concat). Since the feature maps of the different layers have different spatial resolutions, they are all adjusted to the size of the Conv3 output by an upsampling operation before being concatenated; the stitched feature map is then input to the second convolution layer for a 1×1 convolution operation, and the resulting fused feature map is sent to the vector offset network. In fig. 3, "×2" and "×4" indicate the upsampling magnification after convolution: "×2" denotes magnification by a factor of 2 and "×4" by a factor of 4.
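The fusion procedure above can be sketched as follows (PyTorch assumed). The input channel counts (256 for C3, 512 for C4/C5 and the ASPP output) and the intermediate/output widths are illustrative assumptions, as is bilinear resampling; the source only specifies 1×1 reduction, upsampling to the Conv3 size, concatenation, and a second 1×1 convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Reduce each target feature map with a 1x1 convolution, resample
    all maps to the spatial size of the first (C3) map, concatenate on
    the channel dimension, then mix with a second 1x1 convolution."""
    def __init__(self, in_chs=(256, 512, 512, 512), mid_ch=64, out_ch=128):
        super().__init__()
        self.reduce = nn.ModuleList(
            [nn.Conv2d(c, mid_ch, kernel_size=1) for c in in_chs])
        self.mix = nn.Conv2d(mid_ch * len(in_chs), out_ch, kernel_size=1)

    def forward(self, feats):
        target_hw = feats[0].shape[2:]  # resolution of the C3 output
        ups = [F.interpolate(r(f), size=target_hw, mode='bilinear',
                             align_corners=False)
               for r, f in zip(self.reduce, feats)]
        return self.mix(torch.cat(ups, dim=1))
```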
In this embodiment, the vector offset network performs a concentrating offset on the detected skeleton information. In one embodiment, please refer to fig. 4, which is a schematic diagram of the structure of the vector offset network. The vector offset network comprises a third convolution layer, a plurality of fourth convolution layers, a plurality of fifth convolution layers, and a feature fusion layer; the fourth convolution layers are connected to the fifth convolution layers in one-to-one correspondence, and the third convolution layer and the fifth convolution layers are connected to the feature fusion layer. The third convolution layer and the feature fusion layer form one convolution path, and each fourth convolution layer together with its connected fifth convolution layer and the feature fusion layer forms another convolution path; the vector offset network is thus actually composed of a plurality of different convolution paths. It should be noted that the vector offset network in fig. 4 only shows five convolution paths; in practical applications, convolution paths may be added or removed as needed, which is not limited in this embodiment.
The fused feature map output by the feature fusion network is input into the third convolution layer and into each fourth convolution layer; convolution operations are performed along each path (by the third convolution layer alone, and by each serially connected fourth and fifth convolution layer pair), and the output of the third convolution layer together with the outputs of the fifth convolution layers are input into the feature fusion layer for feature fusion, yielding the shifted feature map. In this way, the vector offset network makes better use of surrounding related features and performs a concentrating offset of the skeleton features extracted by the earlier part of the skeleton detection model.
In one example, the third convolution layer and the fifth convolution layers can adopt 1×1 convolutions, and the fourth convolution layers can use convolutions of four different scales, 5×1, 1×5, 11×1 and 1×11, respectively; the third convolution layer then acts as a skip connection playing the role of a residual connection, finally combining deep and shallow semantic information. Of course, in practical applications, the fourth convolution layers may also use other kernel sizes, for example combinations such as 3×1, 1×3, 7×1 and 1×7. In addition, the third, fourth and fifth convolution layers each comprise an activation function (ReLU) layer besides the convolution layer Conv: the result of the convolution is input into the corresponding activation layer to obtain the final output. Using an activation function adds a nonlinear factor and improves the expressive capacity of the model.
Because the vector offset network is designed with convolution layers of different scales (such as 5×1, 1×5, 11×1 and 1×11), information in both the horizontal and vertical directions can be utilized during the convolution operation, realizing the concentrating offset of skeleton information.
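The vector offset network of fig. 4 can be sketched as follows (PyTorch assumed). The per-path structure (a 1×1 skip path plus four asymmetric-kernel paths, each followed by a 1×1 convolution, every convolution followed by ReLU) follows the description above; combining the paths by element-wise summation is an assumption, since the text only says the paths feed a feature fusion layer.

```python
import torch
import torch.nn as nn

class VectorOffsetNet(nn.Module):
    """1x1 skip path (third conv) plus four paths with asymmetric
    kernels 5x1, 1x5, 11x1, 1x11 (fourth convs), each followed by a
    1x1 conv (fifth convs); outputs fused by an assumed sum."""
    def __init__(self, ch=128):
        super().__init__()
        def conv(k):
            # 'same' padding for an asymmetric kernel (kh, kw), plus ReLU
            return nn.Sequential(
                nn.Conv2d(ch, ch, kernel_size=k,
                          padding=(k[0] // 2, k[1] // 2)),
                nn.ReLU(inplace=True))
        self.skip = conv((1, 1))                  # third convolution layer
        self.paths = nn.ModuleList([
            nn.Sequential(conv(k), conv((1, 1)))  # fourth + fifth layers
            for k in [(5, 1), (1, 5), (11, 1), (1, 11)]])

    def forward(self, x):
        out = self.skip(x)
        for p in self.paths:
            out = out + p(x)  # assumed element-wise feature fusion
        return out
```

The tall (11×1) and wide (1×11) kernels are what give the network access to context along the vertical and horizontal directions, as the paragraph above notes.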
Therefore, the vector offset network is added to the skeleton detection model to realize a concentrating offset of the predicted skeleton information, so that the skeleton extraction precision in the final inference process is higher and the detection precision of the whole model is improved. As shown in fig. 5(a), supervising with skeleton labels obtained by Gaussian distribution leads to large probability values at the skeleton edges, while skeleton detection accuracy is computed by comparison against a fixed pixel width; a vector offset network is therefore inserted into the constructed skeleton detection model so that, through training, the predicted skeleton values are concentrated by the offset, finally yielding the object skeleton output shown in fig. 5(b).
Typically, the skeleton features produced by a skeleton detection network structure are combined by 1×1 convolutions, which also serve as the final dimension-reducing transformation. Across the multiple channels, the pixel values at the same spatial position are combined with learned weights to output the final skeleton feature prediction. Assuming C channels, the output of each 1×1 convolution is O_output:
O_output = w_1·x_1 + w_2·x_2 + … + w_C·x_C + b, where w_c is the combination weight learned for channel c, b is a bias, and x_c is the feature value at the corresponding position on channel c.
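The per-pixel channel combination expressed by this formula can be checked numerically. The NumPy sketch below treats the 1×1 convolution as a weighted sum over the channel axis plus a bias; the sizes and random values are purely illustrative.

```python
import numpy as np

# A 1x1 convolution combines, at each pixel, the C channel values with
# learned weights w and a bias b -- exactly the formula above.
rng = np.random.default_rng(0)
C, H, W = 4, 3, 3
x = rng.normal(size=(C, H, W))   # feature map with C channels
w = rng.normal(size=C)           # one combination weight per channel
b = 0.5

# "1x1 convolution": weighted sum over the channel axis at every pixel
out = np.tensordot(w, x, axes=(0, 0)) + b     # shape (H, W)

# The same value computed explicitly at one position (i, j)
i, j = 1, 2
manual = sum(w[c] * x[c, i, j] for c in range(C)) + b
assert np.isclose(out[i, j], manual)
```

This makes concrete why the structure uses only cross-channel information: the sum never looks at neighbouring pixel positions.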
Skeleton prediction depends heavily on location: a shift of one or two pixels may cause a drop in the final accuracy. However, this 1×1 convolution structure only computes information across channels and does not make good use of the surrounding spatial feature information. Therefore, the present embodiment inserts the vector offset network of fig. 4 into the skeleton detection model for joint training, possibly multiplexing it (i.e., one or more vector offset networks can be added to the model), so that the best result is achieved on the corresponding data set.
After the shifted feature map output by the vector offset network is obtained, the output convolution layer performs a 1×1 convolution operation on it and the result is upsampled to the size of the input image. Bilinear interpolation may be used for the upsampling, and the final output is a single-channel skeleton binary map.
In one embodiment, the skeleton detection model may be trained and tested using five data sets: SK-LARGE, SK506, WH-SYMMAX, SYM-PASCAL, and SYMMAX300. The data set SK-LARGE is a reference data set for object skeleton detection and comprises 1491 images, 746 for training and 745 for testing.
Dataset SK506, also known as SK-SMALL, is an early version of SK-LARGE, containing 300 training images and 206 test images.
The dataset WH-SYMMAX contained 328 cropped images from the Weizmann Horse dataset with skeletal annotations, which were divided into 228 training images and 100 test images.
The dataset SYM-PASCAL was derived from the PASCAL-VOC-2011 segmentation dataset and targets symmetry detection of objects in the wild, consisting of 648 training images and 787 test images.
Dataset SYMMAX300 was built on top of Berkeley Segmentation Dataset (BSDS 300), which contained 200 training images and 100 test images, with both foreground and background symmetry being considered.
For the training data set, the original skeleton labels in the data set are expanded using the Gaussian-distribution skeleton label generation method described above to obtain the final skeleton labels used in training. The input training image is denoted I_input, the original skeleton label S_GT, and the expanded skeleton label S_GT_G; the training dataset {I_input, S_GT_G} is denoted SK-G hereinafter.
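A minimal sketch of such a label expansion is given below. It assumes the expansion takes, at each pixel, the maximum of the Gaussian-weighted neighborhood, so skeleton pixels keep the value 1 and nearby pixels receive smoothly decaying soft targets; the sigma and radius values are illustrative, not taken from the source.

```python
import numpy as np

def dilate_skeleton_label(s_gt, sigma=1.0, radius=3):
    """Expand a binary skeleton label S_GT into a soft label S_GT_G.
    Each output pixel is the maximum Gaussian-weighted response of the
    skeleton pixels within `radius`; ridge pixels stay at exactly 1."""
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-ax**2 / (2 * sigma**2))
    kernel = np.outer(g, g)
    kernel /= kernel.max()          # center weight becomes exactly 1
    h, w = s_gt.shape
    padded = np.pad(s_gt.astype(float), radius)
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2*radius + 1, j:j + 2*radius + 1]
            out[i, j] = np.max(patch * kernel)  # grey dilation by kernel
    return out
```

Supervising with these soft labels is what widens the target region around the skeleton, easing the supervision task as described above.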
The skeleton detection model finally outputs a single-channel skeleton picture S_Output, and the constructed skeleton detection model is trained by minimizing the loss function L(S_GT_G, S_Output).
In particular, when training the skeleton detection model, the mean squared error (MSE) may be used as the loss function to constrain the model output against the true values. The training images are resized to 3 different ratios (0.8, 1, 1.2), then rotated to 4 angles (0°, 90°, 180°, 270°), and finally flipped about different axes (up-down, left-right, no flip). The proposed backbone feature extraction network is initialized from a VGG16 model pre-trained on ImageNet, and the network is optimized using the ADAM solver. The learning rate is set to 10^-4 for the first 100k iterations and then reduced to 10^-5 for the remaining 40k iterations. All training and evaluation procedures were performed on NVIDIA GeForce GTX Ti graphics cards.
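The training configuration above can be sketched as follows (PyTorch assumed). The one-layer `model` is a placeholder standing in for the assembled skeleton detection model, and the augmentation pipeline is omitted; only the MSE loss, ADAM optimizer, and two-stage learning-rate schedule from the text are shown.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, kernel_size=1)   # placeholder for the real model
criterion = nn.MSELoss()                 # constrains S_Output vs S_GT_G
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def learning_rate(iteration, switch=100_000):
    # 1e-4 for the first 100k iterations, 1e-5 for the remaining 40k
    return 1e-4 if iteration < switch else 1e-5

# one illustrative training step on dummy data
image = torch.rand(1, 3, 64, 64)         # stands in for I_input
label = torch.rand(1, 1, 64, 64)         # stands in for S_GT_G
for group in optimizer.param_groups:
    group['lr'] = learning_rate(0)
loss = criterion(model(image), label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```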
The output of the trained skeleton detection model is a probability map and can be used directly as the skeleton detection result. Compared with DeepFlux, the time overhead of post-processing (e.g., non-maximum suppression) is saved, so skeleton prediction can be completed more quickly.
Referring to table 1, the method for training the skeleton detection model given in this example (Ours) achieves the highest F-measure on the five skeleton detection benchmark data sets in comparison with the other methods (LMSDS, SRN, OD-SRN, LSN, Hi-Fi, DeepFlux).
TABLE 1
Table 2 shows the effect of testing the generation of Gaussian distribution skeleton tags and the introduction of a vector-shifted network on F-measure based on the SK-LARGE dataset.
TABLE 2
Method          Gaussian expansion    Vector offset    F-measure
DeepFlux                                               0.732
Intermediate           ✓                               0.743
Ours                   ✓                   ✓           0.748
Therefore, in the embodiment, the skeleton detection effect of the model can be greatly improved through the expansion of the skeleton tag and the deviation of the skeleton information.
After training a pre-constructed skeleton detection model according to the training process, giving an object picture, and sequentially passing through a trunk feature extraction network, an ASPP module, a feature fusion network, a vector offset network and an output convolution layer of the skeleton detection model to obtain a skeleton binary image (namely a skeleton detection result) corresponding to the input picture.
Next, a process of skeleton detection (i.e., an inference process) is described in detail based on the trained skeleton detection model.
Fig. 6 is a schematic flow chart of a skeleton detection method according to an embodiment of the invention. It should be noted that, the skeleton detecting method according to the embodiment of the present invention is not limited by fig. 6 and the following specific sequence, and it should be understood that, in other embodiments, the sequence of part of the steps in the skeleton detecting method according to the embodiment of the present invention may be interchanged according to actual needs, or part of the steps may be omitted or deleted. The specific flow shown in fig. 6 will be described in detail.
And step S601, inputting the image to be detected into a trained skeleton detection model, wherein skeleton labels adopted for training the skeleton detection model are obtained by expanding original skeleton labels.
In this embodiment, the image to be detected may be an object image obtained in a natural scene. The skeleton detection model comprises a feature extraction network, a feature fusion network, a vector offset network and an output convolution layer; the image to be detected input into the skeleton detection model is processed in turn by the feature extraction network, the feature fusion network, the vector offset network and the output convolution layer, finally yielding the skeleton detection result of the image to be detected.
And step S602, extracting features of the image to be detected by using a feature extraction network of the skeleton detection model to obtain a plurality of target feature images.
In this embodiment, after an image to be detected is input into a trained skeleton detection model, feature extraction is performed on the image to be detected through a feature extraction network in the skeleton detection model, and a plurality of target feature images are output.
Step S603, inputting the multiple target feature images into a feature fusion network of the skeleton detection model, and carrying out feature fusion on the multiple target feature images by using the feature fusion network to obtain fused feature images.
In this embodiment, after feature fusion is performed on a plurality of target feature graphs output by a feature extraction network through a feature fusion network, the obtained fused feature graphs are used as input of a vector offset network.
Step S604, inputting the fused feature images into a vector offset network of the skeleton detection model, and performing offset processing on the fused feature images by using the vector offset network to obtain offset feature images.
In this embodiment, after the fused feature map output by the feature fusion network is input into the vector offset network, offset processing is performed through the vector offset network, and the obtained offset feature map is used as the input of the output convolution layer.
Step S605, inputting the shifted feature map into an output convolution layer of the skeleton detection model, and carrying out convolution operation on the shifted feature map by using the output convolution layer to obtain a skeleton detection result corresponding to the image to be detected.
In this embodiment, the output convolution layer performs a 1×1 convolution operation on the shifted feature map, the result is upsampled to the size of the input image (i.e., the image to be detected), and a single-channel skeleton binary map (i.e., the skeleton detection result) is finally output.
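The output stage just described can be sketched as follows (PyTorch assumed). The bilinear upsampling matches the earlier description of the output layer; the sigmoid mapping logits to probabilities is an assumption, since the text only states that a single-channel probability map is produced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def output_head(shifted_feat, out_conv, input_hw):
    """1x1 convolution reduces the shifted feature map to one channel,
    then bilinear interpolation restores the input image size."""
    logits = out_conv(shifted_feat)               # 1x1 conv: C -> 1 channel
    logits = F.interpolate(logits, size=input_hw,
                           mode='bilinear', align_corners=False)
    return torch.sigmoid(logits)                  # assumed probability map
```

Usage: with a 128-channel shifted feature map, `out_conv = nn.Conv2d(128, 1, kernel_size=1)` and `input_hw` set to the original image size yields the single-channel skeleton map.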
Therefore, according to the skeleton detection method provided by the embodiment of the invention, after the skeleton detection model is trained, the image to be detected is input into the trained model and processed in turn by the feature extraction network, the feature fusion network, the vector offset network and the output convolution layer to obtain its skeleton detection result. Because the skeleton labels required for training the skeleton detection model are obtained by expanding the original skeleton labels, the task difficulty in the supervision process is effectively moderated and the network's detection capability around the skeleton is improved; moreover, a vector offset network is introduced into the network structure of the skeleton detection model to offset the detected skeleton information, realizing the contraction of the output skeleton information. Thus, through the expansion of the skeleton labels and the contraction of the output skeleton information, the skeleton detection precision of the whole model is greatly improved. The method is an end-to-end skeleton detection method: it outputs skeleton detection results directly, requires no further post-processing of the obtained skeleton, saves part of the time overhead, and achieves the best result in runtime performance.
In one embodiment, the feature extraction network includes a plurality of feature extraction layers and ASPP modules sequentially connected in series, and the step S602 may include performing feature extraction on the input image to be detected through the plurality of feature extraction layers and ASPP modules sequentially connected in series, and obtaining a plurality of target feature maps according to output results of the plurality of feature extraction layers and ASPP modules.
In this embodiment, the plurality of feature extraction layers connected in series in sequence are used as a main feature extraction network, and can be obtained based on VGG16, and an ASPP module is added after the main feature extraction network, so that the receptive field can be enlarged, and the extraction capability of the network on large-scale features of the image can be increased, thereby increasing the scale detection information of the network.
In one embodiment, the feature fusion network comprises a plurality of first convolution layers, a data stitching layer and a second convolution layer, where the first convolution layers are in one-to-one correspondence with the target feature maps. Step S603 may comprise: inputting the target feature maps into their corresponding first convolution layers and performing a convolution operation on each to obtain a plurality of feature maps to be fused; inputting the feature maps to be fused into the data stitching layer and concatenating them to obtain a stitched feature map; and inputting the stitched feature map into the second convolution layer and performing a convolution operation on it to obtain the fused feature map.
In this embodiment, after a plurality of target feature graphs are input into a first convolution layer to perform 1×1 convolution, a data splicing layer is input to perform concat splicing, and a spliced feature graph is input into a second convolution layer to perform 1×1 convolution operation, so as to obtain a fused feature graph. Therefore, the high-dimensional semantic information and the low-dimensional spatial information can be combined better, and the feature extraction effectiveness is improved.
In one embodiment, the vector offset network comprises a third convolution layer, a plurality of fourth convolution layers, a plurality of fifth convolution layers and a feature fusion layer, where the fourth and fifth convolution layers are connected in one-to-one correspondence and the third convolution layer and the fifth convolution layers are connected to the feature fusion layer. Step S604 may comprise: inputting the fused feature map into the third convolution layer and into each fourth convolution layer; performing convolution operations on it with the third convolution layer and with the serially connected fourth and fifth convolution layers; and performing feature fusion on the output of the third convolution layer and the outputs of the fifth convolution layers to obtain the shifted feature map.
In this embodiment, the vector offset network is formed by combining a plurality of convolution paths, each convolution path includes convolution layers with different scales, and when in convolution operation, information in two directions of a transverse direction and a longitudinal direction can be utilized, so that concentrated offset of skeleton information is realized, skeleton extraction precision is improved, and detection precision of the whole model is further improved.
In order to perform the corresponding steps in the above embodiments and in each possible way, an implementation of the skeleton detecting device is given below. Referring to fig. 7, a functional block diagram of a skeleton detecting device 700 according to an embodiment of the invention is shown. It should be noted that, the basic principle and the technical effects of the skeleton detecting device 700 provided in this embodiment are the same as those of the above embodiment, and for brevity, reference should be made to the corresponding contents of the above embodiment. The skeleton detecting apparatus 700 includes an image input module 710, a feature extraction module 720, a feature fusion module 730, an offset processing module 740, and an output module 750.
The image input module 710 is configured to input an image to be detected into a trained skeleton detection model, where skeleton labels used for training the skeleton detection model are obtained by expanding original skeleton labels.
It is understood that the image input module 710 may perform the above step S601.
The feature extraction module 720 is configured to perform feature extraction on an image to be detected by using a feature extraction network of the skeleton detection model, so as to obtain a plurality of target feature graphs.
It is understood that the feature extraction module 720 may perform the step S602 described above.
The feature fusion module 730 is configured to input the plurality of target feature graphs into a feature fusion network of the skeleton detection model, and perform feature fusion on the plurality of target feature graphs by using the feature fusion network to obtain a fused feature graph.
It is understood that the feature fusion module 730 may perform the step S603 described above.
The offset processing module 740 is configured to input the fused feature map into a vector offset network of the skeleton detection model, and perform offset processing on the fused feature map by using the vector offset network to obtain an offset feature map.
It is understood that the offset processing module 740 may perform the step S604 described above.
The output module 750 is configured to input the shifted feature map into an output convolution layer of the skeleton detection model, and perform convolution operation on the shifted feature map by using the output convolution layer to obtain a skeleton detection result corresponding to the image to be detected.
It is understood that the output module 750 may perform the above step S605.
Optionally, the feature extraction network includes a plurality of feature extraction layers and a hole space pyramid pooling ASPP module sequentially connected in series, and the feature extraction module 720 is configured to perform feature extraction on an input image to be detected through the plurality of feature extraction layers and ASPP modules sequentially connected in series, and obtain a plurality of target feature graphs according to output results of the plurality of feature extraction layers and ASPP modules.
The feature fusion network comprises a plurality of first convolution layers, a data stitching layer and a second convolution layer, where the first convolution layers are in one-to-one correspondence with the target feature maps. The feature fusion module 730 is configured to: input the target feature maps into their corresponding first convolution layers and perform a convolution operation on each to obtain a plurality of feature maps to be fused; input the feature maps to be fused into the data stitching layer and concatenate them to obtain a stitched feature map; and input the stitched feature map into the second convolution layer and perform a convolution operation on it to obtain the fused feature map.
Optionally, the vector offset network includes a third convolution layer, a plurality of fourth convolution layers, a plurality of fifth convolution layers, and a feature fusion layer, where the plurality of fourth convolution layers and the plurality of fifth convolution layers are connected in one-to-one correspondence, and the third convolution layer and the plurality of fifth convolution layers are connected to the feature fusion layer. The offset processing module 740 is configured to input the fused feature map into a third convolution layer and each fourth convolution layer, perform convolution operation on the fused feature map by using the third convolution layer, the fourth convolution layers and the fifth convolution layers that are sequentially connected in series, and input the output result of the third convolution layer and the output result of each fifth convolution layer into the feature fusion layer to perform feature fusion, so as to obtain the feature map after offset.
According to the skeleton detection device 700 provided by the embodiment of the invention, the image input module 710 inputs the image to be detected into the trained skeleton detection model, and the feature extraction module 720 performs feature extraction on it using the feature extraction network of the skeleton detection model to obtain a plurality of target feature maps. The feature fusion module 730 inputs the target feature maps into the feature fusion network of the skeleton detection model and fuses them to obtain a fused feature map. The offset processing module 740 inputs the fused feature map into the vector offset network of the skeleton detection model and performs offset processing on it to obtain the shifted feature map. The output module 750 inputs the shifted feature map into the output convolution layer of the skeleton detection model and performs a convolution operation on it to obtain the skeleton detection result corresponding to the image to be detected. Because the skeleton labels required for training the skeleton detection model are obtained by expanding the original skeleton labels, the task difficulty in the supervision process is effectively moderated and the network's detection capability around the skeleton is improved; moreover, a vector offset network is introduced into the network structure of the skeleton detection model to offset the detected skeleton information, realizing the contraction of the output skeleton information. Thus, through the expansion of the skeleton labels and the contraction of the output skeleton information, the skeleton detection precision of the whole model is greatly improved.
Referring to fig. 8, a block diagram of an electronic device 800 according to an embodiment of the invention is shown. The electronic device 800 includes a memory 810, a processor 820, and a communication module 830. The memory 810, the processor 820, and the communication module 830 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
Wherein the memory 810 is used to store programs or data. The memory 810 may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc.
The processor 820 is used to read/write data or programs stored in the memory 810 and perform corresponding functions. For example, the skeleton detecting method disclosed in the above embodiments may be implemented when a computer program stored in the memory 810 is executed by the processor 820.
The communication module 830 is used for establishing a communication connection between the electronic device 800 and other communication terminals through a network, and for transceiving data through the network.
It should be understood that the architecture shown in fig. 8 is merely a schematic diagram of a server, and that the server may also include more or fewer components than shown in fig. 8, or have a different configuration than shown in fig. 8. The components shown in fig. 8 may be implemented in hardware, software, or a combination thereof.
Embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by the processor 820, implements the skeleton detection method disclosed in the above embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, of the flowcharts and block diagrams in the figures that illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and variations to it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within its protection scope.

Claims (10)

CN202111318921.9A | 2021-11-09 | 2021-11-09 | Skeleton detection method, device, electronic equipment and computer readable storage medium | Active | CN114022458B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111318921.9A (CN114022458B, en) | 2021-11-09 | 2021-11-09 | Skeleton detection method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111318921.9A (CN114022458B, en) | 2021-11-09 | 2021-11-09 | Skeleton detection method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number | Publication Date
CN114022458A (en) | 2022-02-08
CN114022458B (en) | 2024-11-29

Family

ID=80062998

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111318921.9A | Active | CN114022458B (en) | 2021-11-09 | 2021-11-09 | Skeleton detection method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country | Link
CN (1) | CN114022458B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN114756739A (en) * | 2022-03-21 | 2022-07-15 | 国网辽宁省电力有限公司信息通信分公司 | A knowledge recommendation method for related content in the power field
CN114723737B (en) * | 2022-05-06 | 2024-06-07 | 福州大学 | A method for detecting the spacing between scaffoldings on construction sites based on computer vision

Citations (2)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN109299274A (en) * | 2018-11-07 | 2019-02-01 | 南京大学 | A natural scene text detection method based on fully convolutional neural network
CN112651333A (en) * | 2020-12-24 | 2021-04-13 | 世纪龙信息网络有限责任公司 | Silence living body detection method and device, terminal equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
WO2020024585A1 (en) * | 2018-08-03 | 2020-02-06 | 华为技术有限公司 | Method and apparatus for training object detection model, and device
CN109858461B (en) * | 2019-02-21 | 2023-06-16 | 苏州大学 | Method, device, equipment and storage medium for counting dense population
CN110298266B (en) * | 2019-06-10 | 2023-06-06 | 天津大学 | Object detection method based on deep neural network based on multi-scale receptive field feature fusion
CN113408455B (en) * | 2021-06-29 | 2022-11-29 | 山东大学 | An action recognition method, system and storage medium based on multi-stream information enhanced graph convolutional network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN109299274A (en) * | 2018-11-07 | 2019-02-01 | 南京大学 | A natural scene text detection method based on fully convolutional neural network
CN112651333A (en) * | 2020-12-24 | 2021-04-13 | 世纪龙信息网络有限责任公司 | Silence living body detection method and device, terminal equipment and storage medium

Also Published As

Publication Number | Publication Date
CN114022458A (en) | 2022-02-08

Similar Documents

Publication | Title
Singh | Practical machine learning and image processing: for facial recognition, object detection, and pattern recognition using Python
Zuo et al. | Convolutional recurrent neural networks: Learning spatial dependencies for image representation
CN113706526A (en) | Training method and device for endoscope image feature learning model and classification model
CN111104941B (en) | Image direction correction method and device and electronic equipment
CN110084773A (en) | A kind of image interfusion method based on depth convolution autoencoder network
CN114022458B (en) | Skeleton detection method, device, electronic equipment and computer readable storage medium
CN112861960B (en) | Image tampering detection method, system and storage medium
CN115761258A (en) | Image direction prediction method based on multi-scale fusion and attention mechanism
CN118470714B (en) | Camouflage object semantic segmentation method, system, medium and electronic equipment based on decision-level feature fusion modeling
CN116863194A (en) | Foot ulcer image classification method, system, equipment and medium
US20230410465A1 | Real time salient object detection in images and videos
CN116050498A (en) | Network training method, device, electronic equipment and storage medium
CN111178363B (en) | Character recognition method, character recognition device, electronic equipment and readable storage medium
Chacon-Murguia et al. | Moving object detection in video sequences based on a two-frame temporal information CNN
CN111488888A (en) | Image feature extraction method and human face feature generation device
CN113066094B (en) | Geographic grid intelligent local desensitization method based on generation countermeasure network
JP7571800B2 | Learning device, learning method, and program
CN117953526A (en) | A method for detecting subgraphs of academic papers based on multi-scale features
Wang et al. | Face super-resolution via hierarchical multi-scale residual fusion network
Lee et al. | Multi-scale foreground-background separation for light field depth estimation with deep convolutional networks
CN114240994A (en) | Target tracking method, device, electronic device and storage medium
CN112183650A (en) | Digital detection and identification method under camera out-of-focus condition
CN120510169B (en) | Robust polyp segmentation method based on improved SAM-Med2D
Li et al. | Infrared Small Target Detection with Feature Refinement and Context Enhancement
CN119762813B (en) | Image matching method and device based on complementary descriptors

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
