Technical Field
The present invention relates to the technical field of computer vision, and in particular to a human body posture analysis method and system based on dense key points.
Background Art
With the advancement of science and technology, the field of video surveillance has evolved from traditional preview and playback to the intelligent extraction of useful target information from video, known as video structuring technology. Video structuring technology uses intelligent analysis algorithms to automatically detect moving targets such as people and vehicles in a scene from raw video files, and to analyze and extract attribute information of those targets, such as gender, vehicle color and license plate number. Video structuring integrates cutting-edge artificial intelligence techniques such as machine vision, image processing, pattern recognition and machine learning; with the development of deep learning and improvements in hardware, deep learning applications have become relatively mature.
Object detection is an important part of video structuring technology. The object detection process extracts foreground objects from the video and then determines whether each foreground object is a valid target; techniques such as moving-object detection, face detection and vehicle detection are the main ones applied in this process. Object detection and recognition, as a cornerstone of computer vision, is receiving more and more attention and is increasingly used in practice, for example in target tracking, video surveillance, information security, autonomous driving, image retrieval, medical image analysis, web data mining, drone navigation, remote sensing image analysis and defense systems. Through the joint efforts of scholars worldwide, object detection and recognition technology has developed rapidly; the best algorithms have made great strides on public data sets, and their performance is steadily approaching human ability.
Today, research on object detection and recognition falls mainly into two categories: methods based on traditional image processing and machine learning, and methods based on deep learning. Traditional methods can be summarized as target feature extraction, target recognition and target localization. The features used are hand-crafted, such as SIFT (Scale Invariant Feature Transform), HOG (Histogram of Oriented Gradients) and SURF (Speeded Up Robust Features). Targets are recognized from these features and then localized with a corresponding strategy. Deep-learning-based detection and recognition has now become the mainstream approach and can be summarized as deep feature extraction from the image followed by recognition and localization with a deep neural network, the main model used being the convolutional neural network (CNN). Existing deep-learning-based algorithms can be roughly divided into three categories: region-proposal-based algorithms such as R-CNN, Fast R-CNN and Faster R-CNN; regression-based algorithms such as YOLO and SSD; and search-based algorithms such as AttentionNet, which is based on visual attention, and reinforcement-learning-based algorithms.
Faster R-CNN is a classic deep-learning-based object detection algorithm. It can detect and identify targets effectively, but it cannot detect accurately when pedestrians are small or intra-class variation is large, and it suffers from high time complexity and redundant windows. HOG+SVM is a traditional algorithm that has been widely used in recent years; it obtains the texture features of an image by computing directional gradient information and can identify pedestrian models effectively, but it performs poorly on truncated object models. Although this gradient-based feature extraction works well for objects with clear edges, it tends to misjudge objects whose edge features are not obvious.
Summary of the Invention
In order to solve the technical problems in the prior art that detection is inaccurate when pedestrians are small or intra-class variation is large, that the time complexity is high and redundant windows exist, that recognition of truncated object models is poor, and that objects with unclear edge features are easily misjudged, the present invention proposes a human posture analysis method and system based on dense key points, which solves the above technical problems.
According to one aspect of the present invention, a human body posture analysis method based on dense key points is proposed, comprising:
S1: converting a picture containing a human body into feature vectors of two-dimensional pixel coordinates and three-dimensional color, constructing local regions based on color and distance clustering, and inputting them into a convolutional network to extract local features;
S2: extracting global features of the picture with a multi-layer perceptron, fusing the global features and local features in a fully connected network to obtain a label for each pixel and segment out body parts, and obtaining the dense key points of each part;
S3: obtaining the coordinates of the best auxiliary key pixels around each dense key point, constructing a pixel set including the dense key point and the best auxiliary key pixels, and superimposing the features of every point in the pixel set to obtain the feature of the key point, thereby obtaining the key point feature set of each part; and
S4: inputting the key point feature set of each part into a fully connected network to regress the human posture category.
In some specific embodiments, step S1 specifically includes: uniformly distributing m cluster centers {C1, C2, ..., Cm} for initialization; clustering with the color difference and the spatial distance between pixels j and c as the metric, iterating level by level, and finally obtaining m clustering results; and using the m clustering results to construct local regions that are input into the convolutional network to extract local features, where Rj, Gj, Bj and Rc, Gc, Bc denote the RGB color values of j and c respectively, Xj, Yj and Xc, Yc denote the two-dimensional coordinates of j and c respectively, and j and c denote two-dimensional pixels in the image.
In some specific embodiments, each part is input into a feature pyramid network to obtain the dense key points of that part.
In some specific embodiments, obtaining the coordinates of the best auxiliary key pixels in step S3 specifically includes: taking each dense key point as the center origin, computing the azimuth θ and radius r of every point on the xy plane to construct a polar coordinate system, obtaining, for each part, the ratio k of the number of pixels occupied by the part to the number of key points of the part, and using the nearest-neighbor algorithm to obtain the coordinates of the k best auxiliary key pixels around the center origin.
In some specific embodiments, a pixel set {x1, x2, ..., xk} including the key point and the best auxiliary key pixels is constructed, a convolution operation is performed on each pixel in the pixel set to obtain the feature f of that pixel, and finally a feature set {f1, f2, ..., fk} is obtained.
In some specific embodiments, a linear function is used to calculate the linear correlation between each auxiliary key pixel and the key point pixel, where xcenter is the center origin and σ is the influence distance of the center origin, that is, the radius r.
In some specific embodiments, a kernel function is used to perform a feature transformation on the pixels in the pixel set to obtain a new feature sequence, and the features of every point are superimposed as the feature of the key point.
According to a second aspect of the present invention, a computer-readable storage medium is proposed, on which one or more computer programs are stored, and when the one or more computer programs are executed by a computer processor, any one of the above methods is implemented.
According to a third aspect of the present invention, a human body posture analysis system based on dense key points is proposed, the system comprising:
a feature extraction unit configured to convert a picture containing a human body into feature vectors of two-dimensional pixel coordinates and three-dimensional color, construct local regions based on color and distance clustering, and input them into a convolutional network to extract local features;
a dense key point acquisition unit configured to extract global features of the picture with a multi-layer perceptron, fuse the global features and local features in a fully connected network to obtain a label for each pixel and segment out body parts, and input each part into a feature pyramid network to obtain the dense key points of that part;
a key point feature set acquisition unit configured to obtain the coordinates of the best auxiliary key pixels around each dense key point, construct a pixel set including the dense key point and the best auxiliary key pixels, and superimpose the features of every point in the pixel set to obtain the feature of the key point, thereby obtaining the key point feature set of each part; and
a posture analysis unit configured to input the key point feature set of each part into a fully connected network to regress the human posture category.
In some specific embodiments, the key point feature set acquisition unit is specifically configured to: take each dense key point as the center origin, compute the azimuth θ and radius r of every point on the xy plane to construct a polar coordinate system, obtain, for each part, the ratio k of the number of pixels occupied by the part to the number of key points of the part, and use the nearest-neighbor algorithm to obtain the coordinates of the k best auxiliary key pixels around the center origin; construct a pixel set {x1, x2, ..., xk} including the key point and the best auxiliary key pixels, perform a convolution operation on each pixel in the pixel set to obtain the feature f of that pixel, and finally obtain the feature set {f1, f2, ..., fk}; use a linear function to calculate the linear correlation between each auxiliary key pixel and the key point pixel, where xcenter is the center origin and σ is the influence distance of the center origin, that is, the radius r; and use a kernel function to perform a feature transformation on the pixels in the set to obtain a new feature sequence, superimposing the features of every point as the feature of the key point.
The human posture analysis method and system based on dense key points of the present invention extract the features of the key parts of the human body in an image well and can effectively crop a partial region around each key point, making up for the shortcomings of HOG and Faster R-CNN and combining well with the HOG and Faster R-CNN algorithms. At the same time, the present invention preserves the details and geometric features of the object well, without distortion.
Brief Description of the Drawings
The accompanying drawings are included to provide a further understanding of the embodiments; they are incorporated into and constitute a part of this specification. The drawings illustrate the embodiments and, together with the description, serve to explain the principles of the invention. Other embodiments and many of the intended advantages of the embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent from the detailed description of the non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a flow chart of a human posture analysis method based on dense key points according to an embodiment of the present application;
FIG. 2 is a flow chart of a human posture analysis method according to a specific embodiment of the present application;
FIG. 3 is a framework diagram of a human posture analysis system based on dense key points according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a computer system of an electronic device suitable for implementing an embodiment of the present application.
Detailed Description of the Embodiments
The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows a flow chart of a human posture analysis method based on dense key points according to an embodiment of the present application. As shown in FIG. 1, the method includes:
S101: converting a picture containing a human body into feature vectors of two-dimensional pixel coordinates and three-dimensional color, constructing local regions based on color and distance clustering, and inputting them into a convolutional network to extract local features.
In a specific embodiment, m cluster centers {C1, C2, ..., Cm} are uniformly distributed for initialization, clustering is performed with the color difference and the spatial distance between pixels j and c as the metric, iterating level by level to finally obtain m clustering results, and the m clustering results are used to construct local regions that are input into the convolutional network to extract local features, where Rj, Gj, Bj and Rc, Gc, Bc denote the RGB color values of j and c respectively, Xj, Yj and Xc, Yc denote the two-dimensional coordinates of j and c respectively, and j and c denote two-dimensional pixels in the image.
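The following sketch illustrates how this step could be implemented. It is not the exact procedure of the invention: the original distance formulas and iteration scheme are not reproduced in the text, so the Euclidean color and spatial distances with equal weights, the brute-force assignment, and the helper names build_feature_vectors and cluster_pixels are all illustrative assumptions.

```python
import numpy as np

def build_feature_vectors(image):
    """Stack (x, y, R, G, B) into one 5-D feature vector per pixel; image is an H x W x 3 array."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs, ys], axis=-1).astype(np.float32)
    return np.concatenate([coords, image.astype(np.float32)], axis=-1).reshape(-1, 5)

def cluster_pixels(feats, m=64, iters=10, w_color=1.0, w_dist=1.0):
    """Assign every pixel to one of m centers using a combined color-difference / spatial-distance metric."""
    idx = np.linspace(0, len(feats) - 1, m).astype(int)      # centers spread uniformly over the pixels
    centers = feats[idx].copy()
    for _ in range(iters):
        d_xy = np.linalg.norm(feats[:, None, :2] - centers[None, :, :2], axis=-1)   # spatial distance
        d_rgb = np.linalg.norm(feats[:, None, 2:] - centers[None, :, 2:], axis=-1)  # color difference
        labels = np.argmin(w_color * d_rgb + w_dist * d_xy, axis=1)
        for c in range(m):
            members = feats[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels  # each label indexes one local region that is then fed to the convolutional network
```

The labels returned by cluster_pixels partition the pixels into m local regions, each of which would then be passed to the convolutional network that extracts the local features.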
S102: extracting the global features of the picture with a multi-layer perceptron, fusing the global features and local features in a fully connected network to obtain a label for each pixel and segment out body parts, and obtaining the dense key points of each part.
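As a rough sketch of this fusion, assuming 5-D pixel vectors, a 64-dimensional local feature, a 128-dimensional pooled global feature and 15 body-part classes (all of these dimensions are assumptions rather than values stated in the patent), the per-pixel labeling head could look like the following.

```python
import torch
import torch.nn as nn

class PixelLabelHead(nn.Module):
    """Fuse per-pixel local features with an image-level global feature and predict a part label per pixel."""
    def __init__(self, local_dim=64, global_dim=128, num_parts=15):
        super().__init__()
        self.global_mlp = nn.Sequential(nn.Linear(5, 256), nn.ReLU(), nn.Linear(256, global_dim))
        self.classifier = nn.Sequential(
            nn.Linear(local_dim + global_dim, 256), nn.ReLU(), nn.Linear(256, num_parts))

    def forward(self, local_feats, pixel_vectors):
        # local_feats: (N, local_dim) from the convolutional network over the local regions
        # pixel_vectors: (N, 5) raw (x, y, R, G, B) vectors; max-pooling yields one global feature
        global_feat = self.global_mlp(pixel_vectors).max(dim=0, keepdim=True).values
        fused = torch.cat([local_feats, global_feat.expand(len(local_feats), -1)], dim=-1)
        return self.classifier(fused)  # (N, num_parts) part logits; argmax gives each pixel its label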
In a specific embodiment, each part is input into a feature pyramid network to obtain its dense key points. A Feature Pyramid Network (FPN) is a feature extractor designed around the pyramid concept with both accuracy and speed in mind. It replaces the feature extractor of detection models such as Faster R-CNN and generates multiple levels of feature maps (multi-scale feature maps) whose information quality is better than that of an ordinary feature pyramid used for feature detection.
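For illustration only, a minimal top-down FPN with lateral connections is sketched below; the channel counts are assumptions, and the dense-keypoint head that would run on the resulting multi-scale maps is omitted.

```python
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Top-down pathway with 1x1 lateral connections over three backbone stages (c3 is the finest)."""
    def __init__(self, in_channels=(64, 128, 256), out_channels=64):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels])

    def forward(self, c3, c4, c5):
        p5 = self.lateral[2](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        # a dense-keypoint head (e.g. per-pixel heatmaps) would be applied to these multi-scale maps
        return [conv(p) for conv, p in zip(self.smooth, (p3, p4, p5))]
```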
S103: obtaining the coordinates of the best auxiliary key pixels around each dense key point, constructing a pixel set including the dense key point and the best auxiliary key pixels, and superimposing the features of every point in the pixel set to obtain the feature of the key point, thereby obtaining the key point feature set of each part.
In a specific embodiment, each dense key point is taken as the center origin, the azimuth θ and radius r of every point on the xy plane are computed to construct a polar coordinate system, the ratio k of the number of pixels occupied by each part to the number of key points of the part is obtained, and the nearest-neighbor algorithm is used to obtain the coordinates of the k best auxiliary key pixels around the center origin. A pixel set {x1, x2, ..., xk} including the key point and the best auxiliary key pixels is constructed, a convolution operation is performed on each pixel in the set to obtain the feature f of that pixel, and finally the feature set {f1, f2, ..., fk} is obtained. A linear function is used to calculate the linear correlation between each auxiliary key pixel and the key point pixel, where xcenter is the center origin and σ is the influence distance of the center origin, that is, the radius r. A kernel function is then used to perform a feature transformation on the pixels in the set to obtain a new feature sequence, and the features of every point are superimposed as the feature of the key point.
S104: inputting the key point feature set of each part into a fully connected network to regress the human posture category.
The key point of the human posture analysis method based on dense key points of the present invention is that, according to the distribution characteristics of human body pixels, the Cartesian coordinate system is converted into a coordinate system based on the key points, a pixel set including the key point and the best auxiliary key pixels is constructed, the linear correlation between each auxiliary key pixel and the key point pixel is calculated, the pixels are feature-transformed by a kernel function, and finally the features of every point are superimposed as the key point feature; by analogy, the key point feature set is constructed and the human posture category is then regressed.
Continuing to refer to FIG. 2, FIG. 2 shows a flow chart of human posture analysis according to a specific embodiment of the present application. As shown in FIG. 2, the flow includes:
S201: converting the input image into 5-dimensional feature vectors consisting of the 2-dimensional pixel coordinates and the 3-dimensional color.
S202: uniformly distributing m cluster centers {C1, C2, ..., Cm} for initialization, clustering with the color difference and the spatial distance between pixels as the metric, and iterating level by level to finally obtain m clustering results.
S203: constructing local regions from the clustering results and inputting them into the convolutional network to extract the local features Flocal.
S204: inputting the entire picture into a multi-layer perceptron layer to extract the global feature Fwhole.
S205: fusing the global feature with the local features and inputting them into a fully connected network to obtain a label for each pixel, thereby segmenting out the parts.
S206: inputting each part into the feature pyramid network to obtain the dense key points of that part.
S207: taking each dense key point as the center origin Xcenter; instead of processing the points in the Cartesian coordinate system, the azimuth θ and radius r of every point on the XY plane are computed.
S208: since the points are densely concentrated in the middle grid cells centered on the key point and the peripheral grid cells contain almost no human body pixels, a polar coordinate system is constructed from the azimuth θ and radius r around each center point.
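A sketch of this coordinate conversion is given below; it simply treats the key point as the origin and uses the standard atan2/hypot definitions of azimuth and radius, which the patent text does not spell out explicitly.

```python
import numpy as np

def to_polar(pixels_xy, center_xy):
    """Express pixel coordinates relative to a key point as (theta, r) in a polar frame."""
    delta = pixels_xy - center_xy                  # shift the origin to the key point X_center
    theta = np.arctan2(delta[:, 1], delta[:, 0])   # azimuth angle on the XY plane
    r = np.hypot(delta[:, 0], delta[:, 1])         # radius from the key point
    return theta, r
```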
S209: dividing the number of pixels S occupied by each part by the number of key points N of the part to obtain k = S/N, which determines the value of k for each part.
S210: querying the coordinates {x1, x2, ..., xk} of the k best auxiliary key pixels around Xcenter with the KNN algorithm, thereby reducing pixel redundancy and the amount of subsequent feature computation.
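Steps S209 and S210 can be sketched as follows, with scikit-learn's NearestNeighbors standing in for the KNN query; the library choice and the helper name auxiliary_pixels are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def auxiliary_pixels(part_pixels_xy, num_keypoints, keypoint_xy):
    """Pick the k nearest part pixels around one key point, with k = S / N (pixels per key point)."""
    k = max(1, len(part_pixels_xy) // num_keypoints)          # k = S / N for this body part
    knn = NearestNeighbors(n_neighbors=k).fit(part_pixels_xy)
    _, idx = knn.kneighbors(np.asarray(keypoint_xy).reshape(1, -1))
    return part_pixels_xy[idx[0]]                             # the {x1, ..., xk} auxiliary key pixels
```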
S211: constructing the pixel set {x1, x2, ..., xk} including the key point and the best auxiliary key pixels, performing a convolution operation on each pixel in the set to obtain the feature f of that pixel, and finally obtaining the feature set {f1, f2, ..., fk}.
S212: calculating the linear correlation between each auxiliary key pixel and the key point pixel with a linear function, where σ is the influence distance of the center point, that is, the radius r.
S213: performing a feature transformation on the pixels in the pixel set with a kernel function.
S214: for each point in the pixel set, obtaining a new feature sequence by the method of step S213.
S215: superimposing the features of every point as the feature of the key point.
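Since the patent text does not reproduce the exact linear function or kernel, the sketch below only illustrates the shape of steps S212 to S215: it assumes a linear weight that falls off with distance inside the influence radius σ, a fixed random feature map standing in for the kernel, and 64-dimensional pixel features; none of these specific choices are taken from the invention.

```python
import numpy as np

def keypoint_feature(pixels_xy, feats, center_xy, sigma, kernel):
    """Weight each pixel by its (assumed) linear correlation with the key point, kernel-transform its
    feature, and superimpose the results into a single key point feature."""
    d = np.linalg.norm(pixels_xy - center_xy, axis=1)
    w = np.clip(1.0 - d / sigma, 0.0, 1.0)                 # assumed linear falloff within the radius sigma
    transformed = np.stack([kernel(f) for f in feats])     # new feature sequence (steps S213 and S214)
    return (w[:, None] * transformed).sum(axis=0)          # superimposed key point feature (step S215)

# An assumed stand-in kernel: a fixed random feature map followed by tanh (64-D in, 32-D out).
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))
kernel = lambda f: np.tanh(W @ f)
```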
S216: performing steps S207 to S215 for each key point, thereby obtaining the key point feature set of each part.
S217: inputting the key point feature set of each part into the fully connected network to regress the human posture category.
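The final regression can be sketched as a small fully connected head over the flattened key point feature sets; the number of key points, the feature dimension and the number of posture categories below are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class PostureHead(nn.Module):
    """Map the concatenated key point features of all parts to posture-category logits."""
    def __init__(self, num_keypoints=17, feat_dim=32, num_postures=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_postures))

    def forward(self, keypoint_feats):       # keypoint_feats: (batch, num_keypoints, feat_dim)
        return self.net(keypoint_feats.flatten(1))
```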
This human posture estimation method converts the Cartesian coordinate system into a key-point-based coordinate system according to the distribution characteristics of human body pixels, divides the number of pixels S occupied by each part by the number of key points N of the part to obtain k = S/N, determines the coordinates of the k best auxiliary key pixels, and calculates the linear correlation between each auxiliary key pixel and the key point pixel. The pixels are feature-transformed by a kernel function, and the features of every point in the pixel set are superimposed as the key point feature, effectively using the attributes of human body parts to improve the accuracy of posture analysis.
Continuing to refer to FIG. 3, FIG. 3 shows a framework diagram of a human posture analysis system based on dense key points according to an embodiment of the present application. The system specifically includes a feature extraction unit 301, a dense key point acquisition unit 302, a key point feature set acquisition unit 303 and a posture analysis unit 304.
In a specific embodiment, the feature extraction unit 301 is configured to convert a picture containing a human body into feature vectors of two-dimensional pixel coordinates and three-dimensional color, construct local regions based on color and distance clustering, and input them into a convolutional network to extract local features; the dense key point acquisition unit 302 is configured to extract the global features of the picture with a multi-layer perceptron, fuse the global features and local features in a fully connected network to obtain a label for each pixel and segment out parts, and input each part into a feature pyramid network to obtain the dense key points of that part; the key point feature set acquisition unit 303 is configured to obtain the coordinates of the best auxiliary key pixels around each dense key point, construct a pixel set including the dense key point and the best auxiliary key pixels, and superimpose the features of every point in the pixel set to obtain the feature of the key point, thereby obtaining the key point feature set of each part; and the posture analysis unit 304 is configured to input the key point feature set of each part into a fully connected network to regress the human posture category.
In a specific embodiment, the key point feature set acquisition unit 303 is specifically configured to: take each dense key point as the center origin, compute the azimuth θ and radius r of every point on the xy plane to construct a polar coordinate system, obtain, for each part, the ratio k of the number of pixels occupied by the part to the number of key points of the part, and use the nearest-neighbor algorithm to obtain the coordinates of the k best auxiliary key pixels around the center origin; construct a pixel set {x1, x2, ..., xk} including the key point and the best auxiliary key pixels, perform a convolution operation on each pixel in the set to obtain the feature f of that pixel, and finally obtain the feature set {f1, f2, ..., fk}; use a linear function to calculate the linear correlation between each auxiliary key pixel and the key point pixel, where xcenter is the center origin and σ is the influence distance of the center origin, that is, the radius r; and use a kernel function to perform a feature transformation on the pixels in the set to obtain a new feature sequence, superimposing the features of every point as the feature of the key point.
Referring now to FIG. 4, a schematic structural diagram of a computer system 400 suitable for implementing the electronic device of an embodiment of the present application is shown. The electronic device shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 4, the computer system 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage portion 408 into a random access memory (RAM) 403. Various programs and data required for the operation of the system 400 are also stored in the RAM 403. The CPU 401, the ROM 402 and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input portion 406 including a keyboard, a mouse and the like; an output portion 407 including a liquid crystal display (LCD), a speaker and the like; a storage portion 408 including a hard disk and the like; and a communication portion 409 including a network interface card such as a LAN card or a modem. The communication portion 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage portion 408 as needed.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flow chart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 409 and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable storage medium of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by or in combination with an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable storage medium may be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flow charts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of the systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flow chart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flow charts, and combinations of blocks in the block diagrams and/or flow charts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present application may be implemented by software or by hardware.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist alone without being assembled into the electronic device. The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: convert a picture containing a human body into feature vectors of two-dimensional pixel coordinates and three-dimensional color, construct local regions based on color and distance clustering, and input them into a convolutional network to extract local features; extract the global features of the picture with a multi-layer perceptron, fuse the global features and local features in a fully connected network to obtain a label for each pixel and segment out parts, and obtain the dense key points of each part; obtain the coordinates of the best auxiliary key pixels around each dense key point, construct a pixel set including the dense key point and the best auxiliary key pixels, and superimpose the features of every point in the pixel set to obtain the feature of the key point, thereby obtaining the key point feature set of each part; and input the key point feature set of each part into a fully connected network to regress the human posture category.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example technical solutions formed by replacing the above features with technical features of similar functions disclosed in (but not limited to) the present application.
| Application Number | Priority Date | Filing Date | Publication Number | Publication Date | Grant Number | Grant Date | Status | Title |
|---|---|---|---|---|---|---|---|---|
| CN202110243581.1A | 2021-03-05 | 2021-03-05 | CN112861776A | 2021-05-28 | CN112861776B | 2024-08-02 | Active | Human body posture analysis method and system based on dense key points |