CN112084913B - End-to-end human body detection and attribute identification method - Google Patents


Info

Publication number: CN112084913B
Authority: CN (China)
Prior art keywords: attribute, constraint, human body, attributes, correlation
Legal status: Active
Application number: CN202010889969.4A
Other languages: Chinese (zh)
Other versions: CN112084913A (en)
Inventors: 陈爱国, 赵太银, 朱大勇, 罗光春, 谷俊霖, 杨栋栋
Current assignee: University of Electronic Science and Technology of China
Original assignee: University of Electronic Science and Technology of China
Priority date: 2020-08-15
Filing date: 2020-08-28
Publication date: 2022-07-29

Events:
Application filed by University of Electronic Science and Technology of China
Publication of CN112084913A
Application granted
Publication of CN112084913B
Status: Active
Anticipated expiration


Abstract

The invention provides an end-to-end human body detection and attribute identification method based on deep learning. Its aim is to improve the network's runtime efficiency and generalization performance. The network consists of two modules: target detection and human attribute identification. The target detection module identifies and locates human objects. The human attribute identification module is a multi-output network that predicts multiple human attributes. The model can accurately detect multiple people in a real scene together with their attributes. In addition, the method builds on the characteristics of the model by using attribute correlations as prior knowledge to guide network training.

Description

Translated from Chinese
An end-to-end method for human detection and attribute recognition

Technical field

The invention relates to object detection and human attribute recognition, in particular to human attribute recognition in real-world scenes.

Background

Human attribute recognition means determining attributes such as the gender, age, hairstyle and clothing of people in real-world scenes. These attributes have many applications in pedestrian recognition and retrieval. For example, they can be used to verify a pedestrian's identity when video quality is poor. In criminal investigations, they can be used to retrieve people resembling a suspect from surveillance video based on the suspect's visible attributes.

Existing human attribute recognition methods mostly treat object detection and attribute recognition as two independent tasks. A separate deep convolutional neural network is built and trained for each task, and the two networks are then chained in series. This approach is simple to implement. However, because detection and recognition run as two separate stages, much of the computation is redundant.

SUMMARY OF THE INVENTION

Existing human attribute recognition methods suffer from low model training efficiency and make poor use of prior (empirical) knowledge. To address these problems, the present invention proposes an end-to-end human detection and attribute recognition method based on multi-task learning. A single neural network performs both human detection and attribute recognition. As a result, people and their attributes can be recognized quickly in real-world scenes, with better runtime efficiency and generalization performance.

The end-to-end human detection and attribute recognition method of the present invention comprises the following steps:

Build and train a multi-task network model for human detection and attribute recognition:

The network structure of the multi-task network model includes:

A feature extractor, built from a convolutional neural network, that extracts the feature map of the input image;

A human detection module consisting of a classifier and a regressor, whose input is the feature map produced by the feature extractor. The classifier decides whether an object is a human body, and the regressor predicts the position of the human body;

An attribute recognition module consisting of multiple attribute recognition branches, one for each attribute to be recognized. The human body position predicted by the detection module's regressor is mapped proportionally onto the feature map. The feature block at the mapped position is then cropped from the feature map and fed to each attribute recognition branch;

That is, the image features extracted by the feature extractor of the present invention serve as input to both the human detection module and the attribute recognition module;

A training dataset for the multi-task network model is set up and preprocessed. The network model is then trained on it, and a multi-task network model that meets the training requirements is saved;

The loss function used during training includes:

For the feature extractor: a batch normalization regularization term, i.e. the regularization term of the convolutional neural network;

For the human detection module: a classification loss and a regression loss;

For the attribute recognition module: a multi-task loss and constraint functions for the different types of attribute relationships;

The image to be processed is input into the saved multi-task network model. The human attribute recognition result is obtained from the network output of the attribute recognition module.

Further, the dataset preprocessing includes:

Filtering out samples in the human detection dataset that contain no human objects;

Assigning a preset value to missing attributes in the attribute recognition dataset.

The training and inference modes of the multi-task network model of the present invention are as follows:

During training, each task's branch is trained on its own dataset, and both branches back-propagate into the shared backbone convolutional network. The result is a single feature extractor whose features serve both the human detection task and the attribute recognition task;

During inference, an information channel connects human detection to attribute recognition, so that detection results are passed to the attribute branches. The present invention avoids redundant convolution by cropping feature blocks directly from the feature map.

Further, the present invention determines the correlations between attributes from the confidence between them. It then establishes constraint domains and constraint functions according to these correlations:

A pair of thresholds α and β defines the relationships between attributes. α is the lower limit for positive correlation, and β is the upper limit for negative correlation. Adjusting α and β changes the intervals that qualify as a correlation between attributes. The correlations between attributes are classified into three types: positive correlation, one-way positive correlation and negative correlation;

A different constraint function is set for each type of correlation. Each function assigns a larger cost to results outside the constraint domain and a smaller cost to results inside it. Each constraint function includes a parameter λ that adjusts its constraint strength: the larger λ is, the stronger the constraint.

In summary, by adopting the above technical solution, the present invention achieves efficient recognition and good generalization performance.

Description of drawings

Fig. 1 is the overall framework diagram of the human attribute recognition system of the present invention;

Fig. 2 is the structure diagram of the multi-task network of the present invention;

Fig. 3 is a schematic flowchart of human attribute recognition during inference.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings.

The invention comprises four parts: a multi-task network for human detection and attribute recognition, its training method, its inference procedure, and an attribute correlation analysis model. The technical problem addressed is how to combine object detection and human attribute recognition organically. This means designing an efficient network structure that shares parameters between the two tasks and avoids repeated computation, thereby improving the model's time efficiency. Specifically, the method has three parts: construction of the neural network, attribute correlation analysis, and training and inference.

(1) Construction of the neural network.

The invention is an end-to-end human attribute recognition method based on multi-task learning, implemented as a deep neural network. The network contains three components: a feature extractor composed of a convolutional neural network, a human detection module M1, and an attribute recognition module M2. The detection module M1 has two parts, a classifier and a regressor. The classifier outputs a class label, i.e. it decides whether a detected object is a person. The regressor outputs position information, i.e. it regresses the precise location of the human object. The attribute recognition module M2 is also designed as a multi-task learning network: after obtaining features from the backbone convolutional layers, M2 connects them to multiple human attribute outputs.

The convolutional neural network extracts image features, which are shared by M1 and M2. Logically, M2 must wait until M1 has produced a bounding box. The box is then mapped onto the feature map, and the corresponding feature block is cropped out as the object's features.

(2) Attribute correlation analysis.

The attribute correlation analysis proceeds in three stages. First, the confidence matrix between the attributes in the attribute dataset is derived from the definition of confidence. Second, a set of rules defining attribute relationships is set up, and the correlations between human attributes are classified according to these rules. Finally, a different relationship constraint function is set according to the characteristics of each type of relationship. The specific steps are as follows.

1) Count how often each attribute occurs in the attribute recognition dataset, and compute the confidence between each pair of attributes according to formula (1):

$$\mathrm{Confidence}(X \rightarrow Y) = P(Y \mid X) \qquad (1)$$

where X and Y denote different human attributes. The confidence from X to Y is the conditional probability that attribute Y is present, given that attribute X is present.

2) Form the confidence matrix T from the values of Confidence(X → Y).

3) Based on T, define a set of rules for the relationships, as shown in Table 1. α is the preset lower limit for positive correlation, and β is the preset upper limit for negative correlation.

Table 1: Attribute correlation relationships and rule definitions

[Table 1 images: BDA0002656608990000031, BDA0002656608990000041]

where Cfds() denotes the confidence between two attributes, i.e. a shorthand for Confidence(X → Y).

4) Determine the constraint domain D according to the characteristics of each relationship.

5) Determine the constraint function F for each constraint domain; see Table 2. λ adjusts the constraint strength of the constraint function and is set empirically.

Table 2: Constraint domains and constraint functions for attribute relationships

[Table 2 image: BDA0002656608990000042]

where e is the base of the natural logarithm, and x and y are the network outputs for attributes X and Y, respectively.

In addition, to simplify computation further, the constraint function F can also take the following forms:

For positive correlation, the constraint function is $F(x, y) = (x - y)^2$;

For one-way positive correlation, the constraint function is $F(x, y) = x^2 + y^2 - xy - y$;

For negative correlation, the constraint function is $F(x, y) = x^2 + y^2 + xy - x - y$.

After the attribute pairs with significant correlation have been obtained, the attribute correlation constraint term can be expressed as:

[Formula image: BDA0002656608990000043]

where $N_{res}$ is the number of attribute pairs, $x_i$ and $y_i$ are the network outputs for the two attributes of the $i$-th pair, and $L(x_i, y_i)$ is the corresponding constraint function.

6) Add the constraint functions to the loss function of the attribute recognition module as a constraint term, and include it in training.
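Steps 1) to 3) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation:

- Labels are assumed to be 0/1 per attribute, with -1 for attributes a sample does not annotate (following the default value in Step 102).
- Table 1 survives only as an image, so the rules in `classify_relations` are a hypothetical reading of the text. Positive: both confidences are at least α. One-way positive X → Y: only Cfds(X → Y) is at least α. Negative: both confidences are at most β.
- The function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

MISSING = -1  # assumed placeholder for un-annotated attributes (see Step 102)


def confidence_matrix(labels: Sequence[Sequence[int]]) -> List[List[float]]:
    """T[x][y] = Confidence(X -> Y) = P(Y | X), estimated by counting.

    labels[k][m] is 1 (attribute m present in sample k), 0 (absent) or MISSING.
    """
    num_attrs = len(labels[0])
    t = [[0.0] * num_attrs for _ in range(num_attrs)]
    for x in range(num_attrs):
        for y in range(num_attrs):
            # Count only samples in which both attributes are annotated.
            known = [s for s in labels if s[x] != MISSING and s[y] != MISSING]
            n_x = sum(1 for s in known if s[x] == 1)
            n_xy = sum(1 for s in known if s[x] == 1 and s[y] == 1)
            t[x][y] = n_xy / n_x if n_x else 0.0
    return t


def classify_relations(t: List[List[float]], alpha: float,
                       beta: float) -> Dict[Tuple[int, int], str]:
    """Label attribute pairs with hypothetical rules standing in for Table 1."""
    if not beta < alpha:
        raise ValueError("expects beta < alpha")
    relations: Dict[Tuple[int, int], str] = {}
    n = len(t)
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            fwd, bwd = t[x][y], t[y][x]
            if fwd >= alpha and bwd >= alpha:
                if x < y:  # symmetric relation: record each pair once
                    relations[(x, y)] = "positive"
            elif fwd >= alpha:  # X implies Y, but Y does not imply X
                relations[(x, y)] = "one_way_positive"
            elif fwd <= beta and bwd <= beta:
                if x < y:  # symmetric relation: record each pair once
                    relations[(x, y)] = "negative"
    return relations
```

For example, take the four samples `[[1,1,0],[1,1,0],[1,0,1],[0,0,1]]` with α = 0.9 and β = 0.1. Attribute 1 implies attribute 0 (one-way positive), and attributes 1 and 2 never co-occur (negative).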

Note that the constraint domain is the set of outcomes that agree with the prior knowledge for a given relationship; outcomes outside it contradict that knowledge. Here 0 means an attribute is absent and 1 means it is present. For example, (1,1) means that both X and Y are present. A constraint function is designed for each constraint domain and applied to each pair of correlated attributes (x, y). Outcomes outside the domain incur a higher cost, while outcomes inside it are not penalized. These constraint functions are added to the overall loss function as a constraint term during training.

The aim is to use attribute correlation as prior knowledge to constrain the training of the model. Like a regularization term, this constraint guides training: while the loss is being minimized, the parameters tend to move along gradient directions that also satisfy the constraints. As a result, the final model tends to agree with the prior knowledge.
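The sketch below checks how the simplified constraint functions treat the four binary outcomes, and shows one way to combine them over the correlated pairs. It rests on two assumptions:

- `correlation_term` averages over the N_res pairs. The original formula is only available as an image, so the averaging is a guess.
- The relation labels `positive`, `one_way_positive` and `negative` are hypothetical names.

The exponential forms in Table 2, including λ, are not reproduced.

```python
from typing import Callable, Dict, Sequence, Tuple

# Simplified constraint functions from the text; x and y are network outputs
# in [0, 1] for attributes X and Y.
CONSTRAINTS: Dict[str, Callable[[float, float], float]] = {
    # 0 at (0,0) and (1,1); 1 at (0,1) and (1,0).
    "positive": lambda x, y: (x - y) ** 2,
    # X -> Y: 0 at (0,0), (0,1), (1,1); 1 at (1,0), i.e. X present but Y absent.
    "one_way_positive": lambda x, y: x * x + y * y - x * y - y,
    # 0 at (0,0), (0,1), (1,0); 1 at (1,1), i.e. both present.
    "negative": lambda x, y: x * x + y * y + x * y - x - y,
}


def correlation_term(outputs: Sequence[float],
                     relations: Dict[Tuple[int, int], str]) -> float:
    """Mean constraint cost over the N_res correlated pairs (averaging is assumed)."""
    if not relations:
        return 0.0
    total = sum(CONSTRAINTS[kind](outputs[x], outputs[y])
                for (x, y), kind in relations.items())
    return total / len(relations)
```

Each function is 0 on the outcomes inside its constraint domain and 1 on the excluded outcomes. For fractional outputs, the one-way and negative forms can dip slightly below zero. For instance, the one-way form gives -0.25 at (x, y) = (0, 0.5).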

(3) Training and inference.

None of the currently known object detection datasets contains attribute information about its targets. Two kinds of datasets are therefore used as training data: existing object detection datasets and attribute recognition datasets. As a result, the network's training process differs somewhat from its inference process.

During training, the detection module is trained on the detection dataset, and the attribute recognition module is trained on the attribute recognition dataset. The two modules are not synchronized: training the attribute recognition module does not depend on the detection module's results. Note how the attribute module's training data is obtained. Each bounding box annotated in the attribute dataset is mapped onto the feature map, and the resulting block is used as training data. The crop of that bounding box from the original image is not used.

During inference, detection and attribute recognition must be combined: the attribute recognition module depends on the output of the detection module. The steps are as follows:

1) Feed the entire image through the backbone convolutional layers to obtain the feature map;

2) Obtain the human bounding boxes from the detection module. For each box, take the block of the feature map that the box maps onto as that person's convolutional feature, and pass it to the attribute recognition module;

3) Feed each mapped block from the previous step, as the features of the corresponding person, into the recognizers for the various attributes.

As these steps show, the features of all the people in an image come from the same feature map. A single convolutional pass over the whole image therefore yields the features of every person in it. This greatly reduces the computational cost of the network and improves its runtime performance.
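The control flow of these three steps can be sketched as follows. The backbone, detector, crop function and attribute branches are passed in as plain callables; they are hypothetical stand-ins for the trained modules. `crop` is assumed to handle the mapping from image coordinates to the feature map. The point of the sketch is that `backbone` runs once per image, however many people are detected.

```python
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); coordinate convention is assumed


def infer(image: Any,
          backbone: Callable[[Any], Any],
          detector: Callable[[Any], List[Box]],
          crop: Callable[[Any, Box], Any],
          branches: Dict[str, Callable[[Any], Any]]) -> List[Dict[str, Any]]:
    """One backbone pass per image; every detected person reuses the same feature map."""
    feature_map = backbone(image)       # step 1: convolve the whole image once
    results = []
    for box in detector(feature_map):   # step 2: human boxes from the detection module
        block = crop(feature_map, box)  # step 2: block the box maps onto
        # step 3: each attribute branch receives the same block
        results.append({name: head(block) for name, head in branches.items()})
    return results
```

With a toy identity backbone and two detected boxes, the backbone is called once, and both people receive attribute predictions from the same feature map.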

Example

Referring to Fig. 1, the implementation of this embodiment proceeds as follows:

Step 101: Acquire detection datasets containing human bodies. From the detection datasets that contain human objects, delete samples without people and keep only samples containing people. Unify the annotation formats of the datasets and shuffle them randomly to obtain dataset DB1.

Step 102: Acquire human attribute recognition datasets and align their attributes. Take the union S of the attribute sets of all datasets as the attribute set of the merged dataset, and set a default value, such as -1, for attributes missing from a dataset. Unify the annotation formats of the datasets and shuffle them randomly to obtain dataset DB2.

Step 103: From the two datasets obtained in Steps 101 and 102, delete samples whose images are too large or too small, so that image sizes stay within a uniform range.

Steps 101 to 103 constitute dataset preprocessing. The processed datasets are used to train the network.
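The sketch below illustrates Steps 101 to 103 under a hypothetical annotation format. Each attribute sample is a dict holding only the attributes its source dataset annotates. Each detection sample records its image `width` and `height` and a list of `objects` with a `label` field; the `"person"` label name is assumed. The size bounds `min_side` and `max_side` are illustrative parameters, since the text does not specify values. Unifying the different annotation formats is dataset-specific and is not shown.

```python
import random
from typing import Any, Dict, List, Optional, Sequence, Tuple

MISSING = -1  # default for attributes a source dataset does not annotate (Step 102)


def align_attributes(datasets: Sequence[Sequence[Dict[str, int]]],
                     seed: Optional[int] = None) -> Tuple[List[str], List[List[int]]]:
    """Step 102: merge attribute datasets over the union S of their attribute names."""
    s = sorted({name for ds in datasets for sample in ds for name in sample})
    rows = [[sample.get(name, MISSING) for name in s]
            for ds in datasets for sample in ds]
    random.Random(seed).shuffle(rows)  # random arrangement of the merged set
    return s, rows


def filter_samples(samples: Sequence[Dict[str, Any]], min_side: int, max_side: int,
                   require_person: bool = True) -> List[Dict[str, Any]]:
    """Steps 101 and 103: drop images without people and images outside a size range."""
    kept = []
    for sample in samples:
        w, h = sample["width"], sample["height"]
        if not (min_side <= min(w, h) and max(w, h) <= max_side):
            continue  # Step 103: image too small or too large
        if require_person and not any(
                obj["label"] == "person" for obj in sample.get("objects", [])):
            continue  # Step 101: no human object in a detection sample
        kept.append(sample)
    return kept
```

The length of S returned by `align_attributes` would then fix the number of attribute sub-networks (Step 405).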

Step 201: Build the convolutional neural backbone network, which produces the feature map of an input image. The feature map serves as the input of the subsequent human detection module.

Step 202: Build a classifier that determines whether a detected object is a human; the classifier is a binary classification network.

Step 203: Build a regressor that predicts the coordinates of the human object.

Step 204: Use a multi-task cross-entropy loss that combines the classification loss of the classifier with the regression loss of the regressor.

Steps 201 to 204 constitute the construction of the human detection module of the present invention. Its structure is shown in Fig. 2.

Step 301: Compute the confidence matrix between human attributes from statistics of the attribute dataset.

Step 302: Define attribute-correlation rules based on the values in the confidence matrix.

Step 303: Derive the relationships between attributes according to the rules.

Step 304: Determine the constraint function for each attribute relationship according to the characteristics of that relationship.

Steps 301 to 304 constitute the attribute correlation analysis model, which establishes the correlations that exist between attributes.

Step 401: Share the convolutional neural network with Step 201 to obtain the feature map of the image. The feature map serves as the input of the attribute recognition module.

Step 402: During training, obtain the human position coordinates from the annotations of the attribute dataset. During inference, obtain the human position from the output of Step 203.

Step 403: Take the human position P1 on the original image, obtained in the previous step. Scale it to the position P2 on the feature map according to the ratio between the sizes of the original image and the feature map.

Step 404: Crop the feature map at the position P2 obtained in the previous step to get a feature block. The feature block is fed directly into the human attribute recognition module as the features of the corresponding person. The flowchart is shown in Fig. 3.

Step 405: Build the human attribute recognition sub-networks. Their number equals the number of elements in the set S from Step 102.

Step 406: Obtain the human attribute values. For inference, Steps 401 to 406 complete the whole human attribute recognition process; for training, Step 407 is also performed.

Step 407: Based on the correlation analysis of human attributes in Steps 303 and 304, add the constraint functions to the multi-task loss function as a constraint term during training.

Steps 401 to 407 constitute the attribute recognition module of the present invention. The module takes the output of the human detection module as input and produces the predicted human attributes.
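Steps 403 and 404 amount to a coordinate rescaling followed by a slice. The sketch below makes three assumptions:

- Boxes are in (x1, y1, x2, y2) pixel format.
- The feature map is single-channel and stored as a list of rows.
- The top-left corner is rounded down and the bottom-right corner up. This is one reasonable choice, not something the text specifies.

Real attribute branches usually need a fixed-size input, which RoI pooling or RoI Align would provide. That step is omitted here.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def map_box_to_feature_map(box: Box, image_size: Tuple[int, int],
                           feature_size: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Step 403: scale P1 (image pixels) to P2 (feature-map cells) by the size ratio."""
    img_w, img_h = image_size
    feat_w, feat_h = feature_size
    sx, sy = feat_w / img_w, feat_h / img_h
    x1, y1, x2, y2 = box
    # Floor the top-left corner and ceil the bottom-right corner so the block covers
    # the whole box, then clamp to the feature map and keep at least one cell.
    fx1 = max(0, min(feat_w - 1, math.floor(x1 * sx)))
    fy1 = max(0, min(feat_h - 1, math.floor(y1 * sy)))
    fx2 = max(fx1 + 1, min(feat_w, math.ceil(x2 * sx)))
    fy2 = max(fy1 + 1, min(feat_h, math.ceil(y2 * sy)))
    return fx1, fy1, fx2, fy2


def crop_block(feature_map: List[List[float]],
               cells: Tuple[int, int, int, int]) -> List[List[float]]:
    """Step 404: cut the block at P2 out of an H x W (single-channel) feature map."""
    fx1, fy1, fx2, fy2 = cells
    return [row[fx1:fx2] for row in feature_map[fy1:fy2]]
```

As an example, take a 64 × 32 image with a stride-8 feature map of 8 × 4 cells. The box (10, 5, 30, 20) maps to cells (1, 0, 4, 3), i.e. a 3 × 3 block.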

During attribute recognition, the labeled objects in an image of the dataset and their labels are denoted as:

[Formula image: BDA0002656608990000071]

where $x_i$ is the $i$-th labeled human object in the image ($i = 1, 2, \ldots, n$), [Formula image: BDA0002656608990000072] is the corresponding attribute label, $n$ is the number of labeled human objects, and [Formula image: BDA0002656608990000073] is defined for $m = 1, \ldots, M$, where $M$ is the number of attributes.

The attribute loss is a cross-entropy loss, so the loss for one human object can be expressed as:

[Formula image: BDA0002656608990000074]

where $y_i$ is the ground-truth attribute label, $y_i = [a_{i1}, a_{i2}, \ldots, a_{iM}]$.
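The exact formula survives only as an image, so the sketch below shows one common form of per-object attribute cross-entropy. It computes binary cross-entropy summed over the M attributes and skips attributes labeled -1 (missing from the source dataset, see Step 102). Both the summation and the masking are assumptions. `probs` are assumed to be sigmoid outputs in (0, 1).

```python
import math
from typing import Sequence


def attribute_loss(probs: Sequence[float], labels: Sequence[int],
                   eps: float = 1e-7) -> float:
    """Binary cross-entropy summed over attributes for one person; -1 labels are skipped."""
    total = 0.0
    for p, a in zip(probs, labels):
        if a == -1:  # attribute not annotated in the source dataset
            continue
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= a * math.log(p) + (1 - a) * math.log(1 - p)
    return total
```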

Let $L_{det}$ denote the object detection loss of the human detection module (the sum of the classification loss and the regression loss), and let $\mu$ be a balancing coefficient. The loss function of the multi-task network model for human detection and attribute recognition of the present invention can then be expressed as:

[Formula image: BDA0002656608990000075]

During training, the best predictions are obtained by minimizing $Loss_{joint}$.

The above are only specific embodiments of the present invention. Unless otherwise stated, any feature disclosed in this specification may be replaced by another equivalent or alternative feature serving a similar purpose. All disclosed features, and all steps of any method or process, may be combined in any way, except for mutually exclusive features and/or steps.

Claims (2)

Translated fromChinese
1. An end-to-end human body detection and attribute recognition method, characterized in that it comprises the following steps:
Building and training a multi-task network model for human body detection and attribute recognition, wherein the network structure of the multi-task network model comprises:
a feature extractor composed of a convolutional neural network, used to extract a feature map of the input image;
a human body detection module composed of a classifier and a regressor, which takes the feature map extracted by the feature extractor as input, wherein the classifier judges whether an object is a human body and the regressor predicts the position of the human body;
an attribute recognition module composed of a plurality of attribute recognition branches, wherein the number of branches equals the number of attributes to be recognized; the human body position predicted by the regressor of the human body detection module is mapped proportionally onto the feature map, and the feature block corresponding to the mapped position is extracted from the feature map and input to each attribute recognition branch;
Setting up a training dataset for training the multi-task network model, preprocessing the training dataset, training the network model, and saving the multi-task network model that meets the training requirements;
wherein the loss function used during training comprises:
for the feature extractor, a batch normalization regularization term, i.e., the regularization term of the convolutional neural network;
for the human body detection module, a classification loss and a regression loss;
for the attribute recognition module, a multi-task loss and constraint functions for the different types of attribute relation;
wherein the correlations between attributes are determined from the confidence between attributes, and constraint domains and constraint functions are established according to those correlations, as follows:
computing the confidence between every pair of attributes and, based on a pair of thresholds α and β, classifying each inter-attribute relation as one of three types: positive correlation, one-way positive correlation, or negative correlation, where α is the lower limit for positive correlation and β is the upper limit for negative correlation, so that adjusting α and β adjusts the intervals within which attributes are deemed correlated;
setting a different constraint function for each of the three correlation types, each constraint function assigning a larger cost to results outside the constraint domain and a smaller cost to results within it, and including a parameter λ that adjusts its constraint strength, a larger λ giving a stronger constraint;
Inputting an image to be processed into the saved multi-task network model, and obtaining the human attribute recognition result from the network output values of the attribute recognition module.
2. The method of claim 1, wherein the dataset preprocessing comprises:
filtering out samples that contain no human object from the human body detection dataset;
assigning preset values to missing (default) attributes in the attribute recognition dataset.
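Claim 1 maps the regressor's predicted human box "proportionally" onto the feature map and feeds the resulting feature block to one branch per attribute. The following is a minimal pure-Python sketch of that step under stated assumptions: boxes are (x1, y1, x2, y2) pixel coordinates, the feature map is a nested list indexed [channel][row][col], and `make_mean_branch` is a hypothetical stand-in for a learned attribute branch. None of these names come from the patent, and a production network would more likely use an RoI pooling or RoI Align layer to obtain fixed-size blocks; the claim does not say which is used.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels (assumed)
FeatureMap = List[List[List[float]]]     # indexed [channel][row][col] (assumed layout)


def map_box_to_feature(box: Box, image_hw: Tuple[int, int],
                       feat_hw: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Scale an image-space box onto the feature map by the per-axis size ratio."""
    (img_h, img_w), (feat_h, feat_w) = image_hw, feat_hw
    sx, sy = feat_w / img_w, feat_h / img_h
    x1, y1, x2, y2 = box
    # Floor the top-left corner and ceil the bottom-right corner so the block
    # covers the whole box; clamp so at least one cell is always selected.
    fx1 = min(feat_w - 1, max(0, math.floor(x1 * sx)))
    fy1 = min(feat_h - 1, max(0, math.floor(y1 * sy)))
    fx2 = min(feat_w, max(fx1 + 1, math.ceil(x2 * sx)))
    fy2 = min(feat_h, max(fy1 + 1, math.ceil(y2 * sy)))
    return fx1, fy1, fx2, fy2


def crop_feature_block(fmap: FeatureMap, fbox: Tuple[int, int, int, int]) -> FeatureMap:
    """Cut the mapped region out of every channel."""
    fx1, fy1, fx2, fy2 = fbox
    return [[row[fx1:fx2] for row in channel[fy1:fy2]] for channel in fmap]


def make_mean_branch(channel: int) -> Callable[[FeatureMap], float]:
    """Hypothetical attribute branch: mean-pool one channel, then apply a sigmoid."""
    def branch(block: FeatureMap) -> float:
        values = [v for row in block[channel] for v in row]
        return 1.0 / (1.0 + math.exp(-sum(values) / len(values)))
    return branch


def recognize_attributes(fmap: FeatureMap, box: Box, image_hw: Tuple[int, int],
                         branches: Sequence[Callable[[FeatureMap], float]]) -> List[float]:
    """One score per attribute branch, all computed from the same feature block."""
    feat_hw = (len(fmap[0]), len(fmap[0][0]))
    block = crop_feature_block(fmap, map_box_to_feature(box, image_hw, feat_hw))
    return [branch(block) for branch in branches]


if __name__ == "__main__":
    # 64x64 image, 2-channel 4x4 feature map (stride 16); channel 1 is all 2.0.
    fmap = [[[0.0] * 4 for _ in range(4)], [[2.0] * 4 for _ in range(4)]]
    box = (16.0, 16.0, 48.0, 48.0)
    print(map_box_to_feature(box, (64, 64), (4, 4)))  # (1, 1, 3, 3)
    print(recognize_attributes(fmap, box, (64, 64), [make_mean_branch(0), make_mean_branch(1)]))
```

Because every branch reads the same cropped block, the backbone features are computed once per image and shared by detection and all attribute outputs, which is what makes the model end-to-end.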
CN202010889969.4A | Priority date: 2020-08-15 | Filing date: 2020-08-28 | End-to-end human body detection and attribute identification method | Active | CN112084913B (en)
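Claim 1 does not define how the pairwise "confidence" is computed or exactly how α and β separate the three relation types. A minimal sketch, assuming confidence means association-rule confidence, conf(A→B) = P(B = 1 | A = 1), estimated from binary attribute labels, and assuming these decision rules: both directions at least α gives positive correlation, exactly one direction at least α gives one-way positive correlation (A implies B), and both directions at most β gives negative correlation; any other pair is left unconstrained. The function names, the relation labels and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

Labels = Sequence[Sequence[int]]  # one row per sample, 1 = attribute present (assumed)


def confidence(labels: Labels, a: int, b: int) -> Optional[float]:
    """conf(a -> b) = P(b = 1 | a = 1); None when attribute a never occurs."""
    rows_with_a = [row for row in labels if row[a] == 1]
    if not rows_with_a:
        return None
    return sum(row[b] for row in rows_with_a) / len(rows_with_a)


def classify_relations(labels: Labels, alpha: float,
                       beta: float) -> Dict[Tuple[int, int], str]:
    """Map attribute-index pairs to 'positive', 'one_way' or 'negative'.

    For 'one_way' the key is (premise, conclusion): the premise implies the conclusion.
    """
    relations: Dict[Tuple[int, int], str] = {}
    for a, b in combinations(range(len(labels[0])), 2):
        ab, ba = confidence(labels, a, b), confidence(labels, b, a)
        if ab is None or ba is None:
            continue  # no evidence for one attribute; leave the pair unconstrained
        if ab >= alpha and ba >= alpha:
            relations[(a, b)] = "positive"
        elif ab >= alpha:
            relations[(a, b)] = "one_way"
        elif ba >= alpha:
            relations[(b, a)] = "one_way"
        elif ab <= beta and ba <= beta:
            relations[(a, b)] = "negative"
    return relations


if __name__ == "__main__":
    # Toy attributes (not from the patent): 0 = long hair, 1 = female, 2 = skirt, 3 = male.
    labels = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0],
              [0, 0, 0, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    print(classify_relations(labels, alpha=0.7, beta=0.1))
    # {(0, 1): 'positive', (0, 3): 'negative', (2, 1): 'one_way', (1, 3): 'negative', (2, 3): 'negative'}
```

Raising α shrinks the set of pairs treated as (one-way) positive, and lowering β shrinks the set treated as negative, which is one way to read the claim's statement that adjusting α and β adjusts the correlation intervals.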

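Claim 1 fixes only the properties of the attribute constraint functions (higher cost outside the constraint domain, lower cost inside it, strength scaled by λ), not their form. One plausible choice, shown as a sketch: a squared difference for positive correlation, a one-sided hinge for one-way positive correlation, and the product of the two predicted probabilities for negative correlation, each multiplied by λ and added to a per-attribute binary cross-entropy that stands in for the multi-task loss. These formulas and all names are assumptions, not the patent's definitions; relations are passed as a hypothetical dictionary from attribute-index pairs to "positive", "one_way" or "negative", with one-way keys ordered (premise, conclusion).

```python
import math
from typing import Dict, Sequence, Tuple

Relations = Dict[Tuple[int, int], str]


def positive_penalty(p_a: float, p_b: float, lam: float) -> float:
    # Domain: the attributes agree (both present or both absent); cost grows with the gap.
    return lam * (p_a - p_b) ** 2


def one_way_penalty(p_a: float, p_b: float, lam: float) -> float:
    # Domain: a implies b, so only p_a exceeding p_b is penalized.
    return lam * max(0.0, p_a - p_b) ** 2


def negative_penalty(p_a: float, p_b: float, lam: float) -> float:
    # Domain: the attributes are not both present; cost grows when both are likely.
    return lam * p_a * p_b


PENALTIES = {"positive": positive_penalty, "one_way": one_way_penalty,
             "negative": negative_penalty}


def constraint_loss(probs: Sequence[float], relations: Relations, lam: float) -> float:
    """Sum the relation-specific penalties over all constrained attribute pairs."""
    return sum(PENALTIES[kind](probs[a], probs[b], lam)
               for (a, b), kind in relations.items())


def multitask_bce(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Binary cross-entropy summed over the attribute branches."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


def attribute_loss(probs: Sequence[float], targets: Sequence[int],
                   relations: Relations, lam: float) -> float:
    """Multi-task term plus the lambda-weighted correlation constraints."""
    return multitask_bce(probs, targets) + constraint_loss(probs, relations, lam)
```

Increasing `lam` scales every penalty linearly, matching the "larger λ, stronger constraint" behaviour the claim requires; at `lam = 0` the loss reduces to the plain multi-task term.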
Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
CN202010821765 | 2020-08-15
CN202010821765.7 | 2020-08-15

Publications (2)

Publication Number | Publication Date
CN112084913A (en) | 2020-12-15
CN112084913B (en) | 2022-07-29

Family

ID=73729330

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010889969.4A (Active, CN112084913B) | End-to-end human body detection and attribute identification method | 2020-08-15 | 2020-08-28

Country Status (1)

Country | Link
CN (1) | CN112084913B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
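Publication number | Priority date | Publication date | Assignee | Title
CN112926427A (en)* | 2021-02-18 | 2021-06-08 | 浙江智慧视频安防创新中心有限公司 (Zhejiang Smart Video Security Innovation Center Co., Ltd.) | Target user clothing attribute identification method and device
CN113344048B (en)* | 2021-05-25 | 2025-03-14 | 上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.) | Multi-task behavior recognition model training method, device, equipment and storage medium
CN114283449A (en)* | 2021-12-22 | 2022-04-05 | 北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.) | Security state detection method and device
CN115131825B (en)* | 2022-07-14 | 2024-12-03 | 北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.) | Human body attribute identification method and device, electronic equipment and storage medium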

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11087130B2 (en)* | 2017-12-29 | 2021-08-10 | RetailNext, Inc. | Simultaneous object localization and attribute classification using multitask deep neural networks
CN108510000B (en)* | 2018-03-30 | 2021-06-15 | 北京工商大学 (Beijing Technology and Business University) | Detection and recognition method of fine-grained attributes of pedestrians in complex scenes
CN111191526B (en)* | 2019-12-16 | 2023-10-10 | 汇纳科技股份有限公司 (Winner Technology) | Pedestrian attribute recognition network training method, system, medium and terminal
CN111178251B (en)* | 2019-12-27 | 2023-07-28 | 汇纳科技股份有限公司 (Winner Technology) | Pedestrian attribute identification method and system, storage medium and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
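Title
关联规则对监控下行人属性识别影响的研究 (Research on the influence of association rules on pedestrian attribute recognition in surveillance); 李雪 (Li Xue); 《计算机与现代化》 (Computer and Modernization); 2019-04-30; pp. 65-71 *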

Also Published As

Publication numberPublication date
CN112084913A (en)2020-12-15

Similar Documents

Publication | Title
CN112084913B (en) | End-to-end human body detection and attribute identification method
CN112818862B (en) | Face tampering detection method and system based on multi-source clues and mixed attention
CN112800903B (en) | Dynamic expression recognition method and system based on a spatio-temporal graph convolutional neural network
CN110516536B (en) | A weakly supervised video behavior detection method based on the complementarity of temporal class activation maps
CN111523462B (en) | Video sequence expression recognition system and method based on self-attention enhanced CNN
CN109118467A (en) | Infrared and visible light image fusion method based on a generative adversarial network
CN113076905B (en) | Emotion recognition method based on context interaction relation
CN110222560A (en) | A text-based person search method embedding a similarity loss function
CN106339719A (en) | Image identification method and image identification device
CN110929099B (en) | Short video frame semantic extraction method and system based on multi-task learning
CN113705596A (en) | Image recognition method and device, computer equipment and storage medium
CN115393944B (en) | A micro-expression recognition method based on multi-dimensional feature fusion
CN112348007B (en) | Optical character recognition method based on neural network
CN110705379A (en) | An expression recognition method based on a multi-label learning convolutional neural network
CN108985200A (en) | A non-formula liveness detection algorithm based on a terminal device
CN114842238A (en) | Embedded breast ultrasound image recognition method
CN115860152B (en) | Cross-modal joint learning method for character military knowledge discovery
CN110490189A (en) | A salient object detection method based on a bidirectional message-link convolutional network
CN111476174A (en) | Face image-based emotion recognition method and device
CN111401149A (en) | Lightweight video behavior identification method based on a long-short-term temporal modeling algorithm
CN110458132A (en) | An end-to-end recognition method for variable-length text
CN107169996A (en) | A dynamic face recognition method in video
CN113065520B (en) | A remote sensing image classification method for multimodal data
Li et al. | A multiscale dilated dense convolutional network for saliency prediction with instance-level attention competition
Singla et al. | Age and gender detection using deep learning

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
