CN108090406B - Face recognition method and system


Info

Publication number: CN108090406B
Application number: CN201611048348.3A
Authority: CN (China)
Prior art keywords: face, camera, features, feature, similarity
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN108090406A
Inventor: 葛主贝
Current Assignee: Zhejiang Uniview Technologies Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Zhejiang Uniview Technologies Co Ltd
Application filed by Zhejiang Uniview Technologies Co Ltd; priority to CN201611048348.3A; publication of CN108090406A; application granted; publication of CN108090406B


Abstract

The present application provides a face recognition method and system. The method includes: extracting, through a CNN network, the face features of face images captured by cameras in different scenes; calculating the similarity between each face feature and each preset watch-list face feature and sorting the similarities; if the maximum similarity is greater than a preset alarm threshold, recording the triplet feature corresponding to each camera; when the total number of triplet features corresponding to a camera reaches a preset number, inputting the triplet features corresponding to that camera into the camera's secondary fine-tuning network for self-training to obtain the fine-tuning model corresponding to that camera; the next time face feature extraction is performed on a face image captured by that camera, inputting the image into the CNN network and then into the camera's current fine-tuning model to obtain the face features of the image. The present application adapts to each camera environment and can continuously improve the recognition rate.

Description

Face recognition method and system

Technical field

The present application relates to the technical field of image processing, and in particular to a face recognition method and system.

Background

Face recognition systems are widely used in fields such as the Internet, surveillance, finance, public security, schools, and prisons. They mainly apply face detection, face alignment, feature extraction, and feature comparison to recognize faces, supporting functions such as watch-list surveillance and identity verification.

At present, feature extraction in face recognition systems generally relies on offline training with large sample sets. Taking a neural network as an example, a large number of face samples must first be prepared; a network model is then designed, the face samples are fed to it as input, and a feature extraction model is trained from the network model.

Since the feature model must be applied in many scenes, each with a different image acquisition environment (illumination, camera imaging quality, camera position and angle, etc.), face samples of every kind must be collected from these different scenes, typically on the order of hundreds of thousands to millions of images, which makes sample collection extremely difficult. To fit such a large sample set, the network model must grow larger and deeper and its parameters multiply, making feature extraction very time-consuming. Moreover, the trained feature extraction model still cannot be applied efficiently to so many real environments, and the overall face recognition rate of the system across all scenes is mediocre.

One robust face recognition method based on Gabor wavelets and model adaptation retrains a mapping matrix from face images collected in the real environment and combines it with the mapping matrix trained from the original data to update the face feature model, improving the system's robustness to the environment and raising the recognition rate. This method uses traditional Gabor wavelet extraction as the face descriptor, whose descriptive power is limited; it can only recognize faces it has been trained on (multiple images per person are required) and cannot support enrolling a single reference image and computing its similarity against captured images. When the model is adjusted, all newly collected samples are trained to obtain an additive mapping matrix; this adaptation only improves the recognition rate of the trained class and provides no transfer benefit to other classes.

Another face recognition method and apparatus with automatic reference-image updating replaces the worst-quality reference image in the recognition library with a newly collected face image, so that reference quality keeps improving and the next recognition improves accordingly. This method and apparatus require multiple face images per person to be enrolled, whereas current face systems generally provide only one ID photo, so the applicable fields are limited. Through long-term operation, the method keeps replacing the reference images of frequently recognized users, which only improves the recognition rate of those users and does nothing for other users.

Summary of the invention

In view of this, the present application provides a face recognition method and system to solve the problem that prior-art face recognition systems cannot adapt face recognition to the environment.

Specifically, the application is implemented through the following technical solutions:

According to a first aspect of the present application, a face recognition method is provided, applied to a face recognition system that includes a CNN network and a secondary fine-tuning network constructed separately for each camera. The method includes:

inputting face images captured by cameras located in different scenes into the CNN network for face feature extraction;

calculating the similarity between each face feature and each preset watch-list face feature, and sorting the similarities corresponding to each face feature;

determining whether the maximum similarity of each face feature is greater than a preset alarm threshold;

if so, recording the triplet feature corresponding to each camera, the triplet feature including the face feature captured by the camera whose maximum similarity is greater than the preset alarm threshold, the watch-list face feature most similar to that face feature, and the watch-list face feature second most similar to that face feature;

when the total number of triplet features recorded for a camera reaches a preset number, inputting the triplet features corresponding to that camera into the camera's secondary fine-tuning network for self-training, obtaining the fine-tuning model corresponding to that camera;

the next time face feature extraction is performed on a face image captured by that camera, inputting the image into the CNN network and then into the camera's current fine-tuning model to obtain the face features of the image.

Optionally, the self-training of the secondary fine-tuning network includes:

constructing a fully connected layer corresponding to each camera, and inputting the face features from the triplet features corresponding to each camera into that camera's fully connected layer;

inputting the features output by each camera's fully connected layer, together with the most similar and second most similar watch-list face features from the camera's triplet features, into a Triplet Loss layer for learning, and taking the learned parameters of each camera's fully connected layer as the fine-tuning model parameters of that camera.

Optionally, the training process of the CNN network includes:

designing a CNN network structure layer to perform preliminary feature extraction on multiple labeled face samples;

inputting the features extracted by the CNN network structure layer into a Softmax Loss layer for classification training, obtaining the parameters of the CNN network.

Optionally, before the face images captured by each camera are input into the CNN network for face feature extraction, face detection, face alignment, and image preprocessing are performed on them to obtain the face regions of the images captured by each camera.

Optionally, the processor performs face feature extraction, similarity comparison, and triplet-feature recording during the deployment period, and performs the self-training of each camera's secondary fine-tuning network outside the deployment period.

According to a second aspect of the present application, a face recognition system is provided. The face recognition system includes a CNN network and a secondary fine-tuning network constructed separately for each camera, and further includes:

a sample acquisition module, which inputs face images captured by cameras located in different scenes into the CNN network for face feature extraction;

a calculation and sorting module, which calculates the similarity between each face feature and each preset watch-list face feature and sorts the similarities corresponding to each face feature;

a judgment module, which determines whether the maximum similarity of each face feature is greater than the preset alarm threshold;

a recording module, which, when the maximum similarity of a face feature is greater than the preset alarm threshold, records the triplet feature corresponding to each camera, the triplet feature including the face feature captured by the camera whose maximum similarity is greater than the preset alarm threshold, the watch-list face feature most similar to that face feature, and the watch-list face feature second most similar to that face feature;

a secondary fine-tuning module, which, when the total number of triplet features recorded for a camera reaches the preset number, inputs the triplet features corresponding to that camera into the camera's secondary fine-tuning network for self-training, obtaining the fine-tuning model corresponding to that camera;

a face recognition module, which, the next time face feature extraction is performed on a face image captured by that camera, inputs the image into the CNN network and then into the camera's current fine-tuning model to obtain the face features of the image.

Optionally, the secondary fine-tuning network includes:

a fully connected layer corresponding to each camera, which receives the face features from the triplet features of the corresponding camera;

a Triplet Loss layer, which receives the features output by each camera's fully connected layer together with the most similar and second most similar watch-list face features from the camera's triplet features, learns from them, and outputs the parameters of each camera's fully connected layer as the fine-tuning model parameters of that camera.

Optionally, the offline training process of the CNN network includes:

preliminary feature extraction on multiple labeled face samples by the CNN network structure layer;

the Softmax Loss layer receiving the features extracted by the CNN network structure layer and performing classification training on them to obtain the parameters of the CNN network.

Optionally, the system further includes:

a face region acquisition module, which, before the face images captured by each camera are input into the CNN network for face feature extraction, performs face detection, face alignment, and image preprocessing on them to obtain the face regions of the images captured by each camera.

Optionally, the sample acquisition module, calculation and sorting module, judgment module, recording module, and face recognition module work during the deployment period, and the secondary fine-tuning module works outside the deployment period.

Beneficial effects of the present application: the offline-trained CNN network is used for face feature extraction, i.e., recognition of the face images captured by each camera starts from the pretrained base recognition rate; useful features are retained during recognition, and a corresponding secondary fine-tuning network is self-trained for each camera to obtain a per-camera fine-tuning model. Subsequent face feature extraction then uses the CNN network together with the currently self-trained per-camera fine-tuning models, continuously raising the recognition rate. Each camera has its own self-trained fine-tuning model, focused on the feature representation of that camera's data, so the system adapts to each camera environment and the recognition rate keeps improving. Moreover, the self-training of each camera's fine-tuning model not only raises the similarity between previously captured users and the reference library, it also transfers to other users in the same scene who have not yet been captured.

Description of drawings

Fig. 1 is a flowchart of offline training of the CNN network according to an exemplary embodiment of the present application;

Fig. 2 is a flowchart of a face recognition method according to an exemplary embodiment of the present application;

Fig. 3 is a flowchart of enrolling watch-list face images according to an exemplary embodiment of the present application;

Fig. 4 is a flowchart of obtaining the triplet features of face images according to an exemplary embodiment of the present application;

Fig. 5 is a schematic diagram of the feature comparison structure according to an exemplary embodiment of the present application;

Fig. 6 is a flowchart of self-training of the secondary fine-tuning network according to an exemplary embodiment of the present application;

Fig. 7 is a schematic diagram of the result of self-training of the secondary fine-tuning network according to an exemplary embodiment of the present application;

Fig. 8 is a schematic diagram of the network structure of the face recognition feature extraction module according to an exemplary embodiment of the present application;

Fig. 9 is a working flowchart of the processor according to an exemplary embodiment of the present application;

Fig. 10 is a schematic structural diagram of a face recognition system according to an exemplary embodiment of the present application;

Fig. 11 is a structural diagram of a specific face recognition system according to an exemplary embodiment of the present application.

Detailed description

Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the appended claims.

The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "said", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".

A face surveillance system usually contains multiple cameras set up in different environments, so the face images collected by the cameras differ in illumination, angle, scale, and image quality. Typically the cameras transmit the face image data to a server, which performs face comparison with a fixed feature extraction model. As a result, the face recognition rate differs across camera environments; constrained by the acquisition environment and the number of training samples, the recognition rate is low in camera scenes whose acquisition environment differs greatly from the training conditions.

To solve the above problems, the present application proposes a face recognition method and system that, starting from the system's initial base recognition rate, iteratively updates the feature extraction model according to the environment of each camera, thereby comprehensively improving the face recognition rate. The method and system can be applied effectively to face recognition in various fields, and through long-term operation the system continuously improves its own recognition rate.

Referring to Fig. 1, before face recognition (i.e., face feature extraction) is performed, a CNN (Convolutional Neural Network) must be obtained by offline training. The steps may include: first, designing a CNN network structure layer to perform preliminary feature extraction on multiple labeled face samples; then inputting the features extracted by the CNN network structure layer into a Softmax Loss (classification loss) layer for classification training to obtain the parameters of the CNN network.

The CNN network structure layer receives a large number of labeled watch-list person images as the training sample set and outputs features after processing; the Softmax Loss layer performs classification training on the features output by the CNN network structure layer to obtain the parameters of the CNN network.

In this embodiment, the CNN network structure layer can be configured as required; its design must satisfy two conditions: it converges normally, and it extracts face features with representational power.

The Softmax Loss layer uses the Softmax regression algorithm for classification training. The cost function J(θ) of the Softmax regression algorithm in this embodiment is:

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)} = j\}\,\log\frac{e^{\theta_j^{T}x^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^{T}x^{(i)}}}\right] \qquad (1)

In formula (1), θ denotes the parameters to be trained, e.g., the weights w and biases b of the convolutional layers, fully connected layers, and other layers in the CNN network structure layer;

m is the number of input samples, the input samples being the labeled face samples fed to the CNN network structure layer;

k is the total number of prediction classes;

x is an input sample (in step S101, the watch-list face image);

y is the class label of the input sample;

j and l are class-label indices taking values in [1, k];

i is the index of the input sample, taking values in [1, m];

θ = [θ_1, θ_2, ..., θ_k]^T collects all the model parameters θ_1, θ_2, ..., θ_k, where T denotes the matrix transposition operation; 1{·} is the indicator function, equal to 1 when its argument holds and 0 otherwise.

To facilitate the subsequent secondary fine-tuning self-training, an L2 normalization layer is placed between the CNN network structure layer and the Softmax Loss layer; it L2-normalizes the features output by the CNN network structure layer before they are input into the Softmax Loss layer for classification training.

It should be noted that the cost function J(θ) of the Softmax regression algorithm may also be set to other formulas according to the experience of those of ordinary skill in the art.
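For illustration only, here is a minimal NumPy sketch of the cost function of formula (1), together with the L2 normalization of formula (2) below; the array shapes, the zero-based class labels, and the max-subtraction for numerical stability are assumptions, not details given in the patent.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Formula (2): divide each feature vector by its L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax_cost(theta, X, y):
    """Formula (1): softmax regression cost J(theta).

    theta: (k, n) parameters to train, one row per class.
    X:     (m, n) L2-normalized features from the CNN structure layer.
    y:     (m,)   integer class labels in [0, k).
    """
    m = X.shape[0]
    logits = X @ theta.T                          # theta_j^T x^(i) for all i, j
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability only
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the indicator 1{y^(i) = j} selects the log-probability of the true class
    return -log_prob[np.arange(m), y].mean()
```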

In this embodiment, with the feature vector formed by the features output by the CNN network structure layer written as x = (x_1, x_2, ..., x_n), the formula for L2 normalization (norm normalization) is:

x_i' = \frac{x_i}{\sqrt{\sum_{j=1}^{n} x_j^{2}}} \qquad (2)

In formula (2), x_i' is the normalized value of the i-th feature in the feature vector x;

x_i is the value of the i-th feature (1 ≤ i ≤ n) in the feature vector x.

In this embodiment, the acquisition of the labeled face samples may include the following steps:

performing face detection on multiple face images to obtain the face region of each face image;

performing face alignment (e.g., facial landmark localization), image preprocessing (e.g., rotation, similarity transformation), and labeling on the face region of each face image to obtain labeled face samples.

In this embodiment, a conventional face detection algorithm may be chosen as required, e.g., LBP (Local Binary Patterns), Haar (wavelet) features, HOG (Histogram of Oriented Gradients), SURF (Speeded-Up Robust Features) with AdaBoost, SVM (Support Vector Machine), or neural network algorithms.

The face alignment algorithm may likewise be a conventional one, e.g., a neural network algorithm.

Labeling the face images means assigning a unique identifier to each class of face images: different face images of the same person share the same identifier, while face images of different people have different identifiers. All face images carrying unique identifiers are then input into the CNN network structure layer for preliminary feature extraction.
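As a rough sketch of this labeling and detection step (not the patent's own implementation), OpenCV's stock Haar cascade can stand in for the face detector, and a dictionary can assign each person the unique identifier described above; the one-folder-per-person layout and the 112x112 crop size are assumptions.

```python
import os
import cv2  # opencv-python

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def load_labeled_samples(root_dir):
    """Assumed layout: root_dir/<person>/<image>; same person -> same label."""
    samples, label_of = [], {}
    for person in sorted(os.listdir(root_dir)):
        label = label_of.setdefault(person, len(label_of))  # unique identifier
        for name in os.listdir(os.path.join(root_dir, person)):
            img = cv2.imread(os.path.join(root_dir, person, name))
            if img is None:
                continue  # image could not be read: skip, as in Fig. 3
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, scaleFactor=1.1,
                                              minNeighbors=5)
            if len(faces) == 0:
                continue  # no face region detected: enrollment fails
            x, y, w, h = faces[0]
            face = cv2.resize(img[y:y + h, x:x + w], (112, 112))  # preprocess
            samples.append((face, label))
    return samples
```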

Fig. 2 is a flowchart of a face recognition method provided by this embodiment. The method can be applied to a face recognition system that includes a CNN network and a secondary fine-tuning network constructed separately for each camera.

Before performing face recognition, the face recognition system must first extract the face features of the watch-list face images, as follows:

the watch-list face images are input into the CNN network, and the face features of each watch-list face image are extracted.

Furthermore, before these face features are extracted, the images of the watch-list persons must be enrolled into the library; the face recognition system performs face detection, face alignment, and image preprocessing to obtain the watch-list face images.

Specifically, referring to Fig. 3, the user imports the images of the watch-list persons, and the face recognition system checks whether all watch-list person images have been read; if not, it continues reading them. After a watch-list person image is read in successfully, the system performs face detection on it to obtain the face region, applies face alignment to that region to obtain a more accurate face region, and preprocesses the aligned face region to obtain the watch-list face image.

During this process, if the system determines that a watch-list person image was not read successfully, or face detection found no face region, enrollment has failed and the enrollment operation for that image must be repeated.

When the system determines that all watch-list person images have been read, enrollment is complete.

In this embodiment, the face detection algorithm is used to detect the face region, and the face alignment algorithm is used for facial landmark localization.

The face recognition system saves the face features of each watch-list face image, together with the corresponding watch-list face image, to a database for subsequent use.

Referring to Fig. 2, the face recognition method provided by this embodiment may include:

S101: Input face images captured by cameras located in different scenes into the CNN network for face feature extraction.

Referring to Fig. 4, cameras are set up in different scenes such as mall entrances, subway exits, and stations; each camera continuously captures face images and sends the captured images to the face recognition system.

The face recognition system performs face detection on each input face image to determine whether it contains a face; if so, it records the number of the camera that captured the image (the camera ID). The system then applies face detection, face alignment, image preprocessing, and the CNN network to obtain face features in one-to-one correspondence with the face images.

The face detection, face alignment, and image preprocessing here are the same as in the processing of face samples during the offline CNN training described above, and are not repeated.

S102: Calculate the similarity between each face feature and each preset watch-list face feature, and sort the similarities corresponding to each face feature.

Referring to Fig. 4, a similarity comparison algorithm computes the similarity between each face feature obtained in step S101 and each watch-list face feature, yielding the similarities corresponding to each face feature; the face recognition system then sorts the similarities of each face feature to obtain its ranking.

For example, with N watch-list faces, after the similarity calculation each face feature has N similarities, and these N similarities are sorted.

The similarity may be computed, for example, as the Euclidean distance or the cosine distance between the face feature and the watch-list face feature.
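A minimal sketch of step S102, assuming features are L2-normalized so that cosine similarity reduces to a dot product; the variable names and gallery layout are illustrative, not from the patent.

```python
import numpy as np

def rank_watchlist(probe, gallery):
    """probe:   (d,)   L2-normalized face feature from one capture.
    gallery: (N, d) L2-normalized watch-list face features.
    Returns the similarities sorted in descending order and their indices."""
    sims = gallery @ probe       # cosine similarity against each watch-list face
    order = np.argsort(-sims)    # descending sort, as required by step S102
    return sims[order], order
```

With this layout, `sims[0]` after sorting is the maximum similarity that step S103 compares against the alarm threshold T.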

S103: Determine whether the maximum similarity of each face feature is greater than a preset alarm threshold T.

The maximum similarity of a face feature is its similarity to the watch-list face feature it most resembles.

In one example, the preset alarm threshold is T = 80%. When the maximum similarity of a face feature exceeds 80%, the method proceeds to step S104 and outputs alarm information indicating that the face feature matches one of the watch-list face features, i.e., the face feature is judged to be a reliable sample and can serve as a self-training sample for the secondary fine-tuning network.

The alarm signal can be chosen as required; for example, a pop-up dialog box may remind the staff that this face matches a watch-list face.

S104: Record the triplet feature corresponding to each camera. The triplet feature consists of the face feature a (Anchor) captured by the camera whose maximum similarity is greater than the preset alarm threshold, the watch-list face feature p (Positive, the same-class sample of a) most similar to that face feature, and the watch-list face feature n (Negative, the closest among the non-same-class samples of a) second most similar to that face feature.

In this embodiment, the face feature a whose maximum similarity exceeds the preset alarm threshold, the most similar watch-list face feature p (as the same-class sample of a), and the second most similar watch-list face feature n (as the closest non-same-class sample of a) together form a triplet feature.

In one embodiment, to adapt to the environment of each camera, the number of the camera to which a triplet feature belongs is recorded along with the triplet feature, so that the secondary fine-tuning self-training of each camera's corresponding network is performed on the triplet features recorded for that camera.

In another embodiment, the face recognition system assigns a storage module to each camera for that camera's triplet features; when a triplet feature is recorded, the number of the storage device holding it is also recorded, so that the corresponding network of each camera can be fine-tuned separately. Referring to Fig. 5, in one example, face detection is performed on the video of a surveillance camera to obtain a face image captured by that camera; face alignment and image preprocessing are applied, and the result is input into the CNN network for feature extraction, yielding the face feature corresponding to the image. A feature comparison algorithm then computes the similarity between this face feature and the watch-list face features of the enrolled ID photos (the watch-list person images): 0.3 with the first enrolled ID photo, 0.9 with the second, and 0.6 with the third. Sorting these similarities gives a maximum of 0.9, corresponding to the second enrolled ID photo.

The face recognition system determines that the maximum similarity of the face feature, 0.9, exceeds 80%, so an alarm dialog pops up, indicating that the face image matches the second enrolled ID photo; the face feature, the watch-list face feature corresponding to the second enrolled ID photo, and the watch-list face feature corresponding to the third ID photo are recorded, forming the triplet feature corresponding to the camera.
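Continuing the sketch above, steps S103 and S104 reduce to a threshold test plus per-camera bookkeeping; the `records` dictionary and the tuple layout are assumptions made for illustration.

```python
ALARM_THRESHOLD = 0.8   # preset alarm threshold T from the example above
records = {}            # camera_id -> list of (anchor a, positive p, negative n)

def maybe_record_triplet(camera_id, probe, gallery):
    sims, order = rank_watchlist(probe, gallery)
    if sims[0] > ALARM_THRESHOLD:      # S103: judged a reliable sample
        a = probe                      # Anchor: captured face feature
        p = gallery[order[0]]          # Positive: most similar watch-list entry
        n = gallery[order[1]]          # Negative: second most similar entry
        records.setdefault(camera_id, []).append((a, p, n))
        return True                    # caller raises the alarm dialog here
    return False
```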

S105: When the total number of triplet features recorded for a camera reaches a preset number, input the triplet features corresponding to that camera into the camera's secondary fine-tuning network for self-training, obtaining the fine-tuning model corresponding to that camera.

The face recognition system checks whether the total number of reliable face triplet features captured by each camera is greater than the preset number N; if so, it starts the self-training of that camera's secondary fine-tuning network on this batch of samples; otherwise, the self-training of that camera's secondary fine-tuning network is skipped.

Optionally, the secondary fine-tuning includes:

constructing a fully connected layer corresponding to each camera, and inputting the face features from the triplet features corresponding to each camera into that camera's fully connected layer;

inputting the features output by each camera's fully connected layer, together with the most similar and second most similar watch-list face features from the camera's triplet features, into a Triplet Loss layer for learning, and taking the learned parameters of each camera's fully connected layer as the fine-tuning model parameters of that camera.

In this embodiment, the secondary fine-tuning network is trained with a Triplet Loss layer, whose input is the triplet features recorded above for each camera.

The loss function L of the Triplet Loss layer is computed as:

L = \sum_{i=1}^{N}\left[\,\left\|f(x_i^a) - f(x_i^p)\right\|_2^2 - \left\|f(x_i^a) - f(x_i^n)\right\|_2^2 + \alpha\,\right]_+ \qquad (3)

In formula (3), x_i^a is the i-th Anchor feature vector in the input set, i.e., the i-th feature output by the camera's fully connected layer; x_i^p is the i-th Positive feature vector in the input set, i.e., the i-th most-similar watch-list face feature among the camera's triplet features; x_i^n is the i-th Negative feature vector in the input set, i.e., the i-th second-most-similar watch-list face feature among the camera's triplet features; f is the fully connected (fc) layer function, α is a training hyperparameter, N is a natural number, and [·]_+ denotes max(·, 0).

When L no longer decreases within an iteration period, or when training reaches a preset number of iterations, the parameters of each camera's fully connected layer function are taken as the fine-tuning model parameters of that camera. That is, at that point the parameters of the fine-tuning model fc(i) output by the Triplet Loss layer are the result of the secondary fine-tuning self-training. The fine-tuning model fc(i) is used only by the camera whose camera number (or storage device number) is i.

It should be noted that the loss function L of the Triplet Loss layer may also be set to other formulas by those of ordinary skill in the art based on experience.
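Here is a hedged NumPy sketch of formula (3), with the per-camera fc layer reduced to a single square weight matrix applied to the anchors (the watch-list features p and n enter the loss directly, as described above); the batching and parameter names are assumptions.

```python
import numpy as np

def triplet_loss(W, anchors, positives, negatives, alpha=0.2):
    """Formula (3). W: (d, d) per-camera fc layer (input dim == output dim).
    anchors/positives/negatives: (N, d) features from the recorded triplets."""
    fa = l2_normalize(anchors @ W)        # fc output, L2-normalized as in the text
    fp = l2_normalize(positives)
    fn = l2_normalize(negatives)
    d_ap = ((fa - fp) ** 2).sum(axis=1)   # squared L2 distance anchor-positive
    d_an = ((fa - fn) ** 2).sum(axis=1)   # squared L2 distance anchor-negative
    return np.maximum(d_ap - d_an + alpha, 0.0).sum()
```

Minimizing this loss pulls each anchor toward its Positive and away from its Negative, which is exactly the effect shown in Fig. 7.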

Referring to Fig. 7, after passing through the Triplet Loss layer, the distance between the Anchor and Positive features is reduced while the distance between the Anchor and Negative features is increased. It can be seen that by training with the Triplet Loss layer and pulling the Anchor and Positive features closer, face matching becomes more accurate.

To learn the idiosyncratic parameters of each camera's scene environment, a fully connected (fc) layer is added for each camera during the secondary fine-tuning self-training. Referring to Fig. 6, the face features recorded under each camera number are input into that camera's fc layer; the input and output feature dimensions of the fc layer are identical.

The face recognition system inputs the feature a output by each camera's fc layer, together with the most similar watch-list face feature p and the second most similar watch-list face feature n, into the Triplet Loss layer for learning, obtaining the fine-tuning model fc(i) corresponding to each camera.

To simplify computation, an L2 normalization layer is also placed between the fc layer and the Triplet Loss layer: the features output by the fc layer are L2-normalized before entering the Triplet Loss layer. The L2 normalization is computed as in formula (2).

S106: The next time face feature extraction is performed on a face image captured by that camera, input the image into the CNN network and then into the camera's current fine-tuning model to obtain the face features of the image. Referring to Fig. 8, the face recognition feature extraction model of each camera is updated to the CNN network plus the camera's current fine-tuning model trained and learned for that camera in step S105.

Optionally, the fine-tuning model fc(i) of each camera is L2-normalized to obtain the camera's current fine-tuning model, simplifying computation.
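Putting step S106 together, a hedged sketch of the updated per-camera extraction path; `cnn_extract` stands for the offline-trained CNN and `fc_models[camera_id]` for the self-trained weight matrix, both placeholder names rather than identifiers from the patent.

```python
def extract_feature(camera_id, face_image):
    """S106: the CNN network first, then the camera's current fine-tuning model."""
    base = cnn_extract(face_image)    # feature from the offline-trained CNN
    W = fc_models.get(camera_id)      # fine-tuning model fc(i), if trained yet
    if W is None:
        return l2_normalize(base)     # before the first self-training pass
    return l2_normalize(base @ W)     # fine-tuned, L2-normalized feature
```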

In this embodiment, after the face features of the face images captured by each camera are obtained in step S106, the method returns to step S102 and continues, iterating each camera's fine-tuning model and thereby continuously raising the face recognition rate for each camera.

It should also be noted that the parameters of the CNN network do not need to change. Since the computing power of the processor in a face recognition system is often limited, the present application proposes performing recognition and comparison during the deployment period and performing model fine-tuning outside the deployment period, making full use of the system's computing resources.

The deployment period is chosen as required to be the working hours of the camera's scene, e.g., 8:00 a.m. to 10:00 p.m.; the non-deployment period is the scene's non-working hours.

Referring to Fig. 9, during the deployment period the face recognition system performs face detection, face alignment, image preprocessing, face feature extraction, similarity calculation and sorting, alarm handling, and per-camera triplet-feature recording. Outside the deployment period, the system performs the secondary fine-tuning network self-training to obtain each camera's fine-tuning model, and uses the obtained per-camera fine-tuning models together with the CNN network as the face extraction model of the corresponding camera during the next deployment period.

The face recognition system of this embodiment processes different tasks on a time-shared basis: during the deployment period it performs the surveillance task using the CNN network and the per-camera fine-tuning models updated by the most recent self-training; outside the deployment period it performs the self-training of each camera's secondary fine-tuning network to obtain a new fine-tuning model for that camera, making full use of the system's computing resources.
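The time-sharing policy can be expressed as a simple dispatch loop; the 8:00-22:00 window follows the example above, while the batch threshold value and the function names are placeholders.

```python
import datetime

PRESET_N = 256  # preset number of triplets required before self-training (assumed)

def scheduler_step(now=None):
    now = now or datetime.datetime.now()
    if 8 <= now.hour < 22:                       # deployment period
        run_recognition_and_recording()          # S101-S104 on incoming captures
    else:                                        # non-deployment period
        for camera_id, triplets in records.items():
            if len(triplets) >= PRESET_N:        # S105: enough reliable samples
                fc_models[camera_id] = self_train(triplets)
                triplets.clear()
```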

Fig. 10 is a structural block diagram of the face recognition system provided by the present application. It corresponds to the face recognition method above, and its content can be understood or explained with reference to the embodiments of that method.

Referring to Fig. 10, this embodiment provides a face recognition system that includes a CNN network and a secondary fine-tuning network constructed separately for each camera, and further includes a sample acquisition module 101, a calculation and sorting module 102, a judgment module 103, a recording module 104, a secondary fine-tuning module 105, and a face recognition module 106.

The sample acquisition module 101 inputs face images captured by cameras located in different scenes into the CNN network for face feature extraction.

The calculation and sorting module 102 calculates the similarity between each face feature and each preset watch-list face feature and sorts the similarities corresponding to each face feature.

The judgment module 103 determines whether the maximum similarity of each face feature is greater than the preset alarm threshold.

When the maximum similarity of a face feature is greater than the preset alarm threshold, the recording module 104 records the triplet feature corresponding to each camera, the triplet feature including the face feature captured by the camera whose maximum similarity is greater than the preset alarm threshold, the watch-list face feature most similar to that face feature, and the watch-list face feature second most similar to that face feature.

When the total number of triplet features recorded for a camera reaches the preset number, the secondary fine-tuning module 105 inputs the triplet features corresponding to that camera into the camera's secondary fine-tuning network for self-training, obtaining the fine-tuning model corresponding to that camera.

In this embodiment, the secondary fine-tuning network includes:

a fully connected layer corresponding to each camera, which receives the face features from the triplet features of the corresponding camera;

a Triplet Loss layer, which receives the features output by each camera's fully connected layer together with the most similar and second most similar watch-list face features from the camera's triplet features, learns from them, and outputs the parameters of each camera's fully connected layer as the fine-tuning model parameters of that camera.

Optionally, the features output by the fully connected layer are L2-normalized before being input into the Triplet Loss layer.

The loss function L of the Triplet Loss layer is computed as:

L = \sum_{i=1}^{N}\left[\,\left\|f(x_i^a) - f(x_i^p)\right\|_2^2 - \left\|f(x_i^a) - f(x_i^n)\right\|_2^2 + \alpha\,\right]_+

where x_i^a is the i-th feature output by the camera's fully connected layer, x_i^p is the i-th most-similar watch-list face feature among the camera's triplet features, x_i^n is the i-th second-most-similar watch-list face feature among the camera's triplet features, f is the fully connected layer function, and α is a training hyperparameter;

when L no longer decreases within an iteration period, or when training reaches a preset number of iterations, the parameters of each camera's fully connected layer function are taken as the fine-tuning model parameters of that camera.

It should be noted that the loss function L of the Triplet Loss layer may also be set to other formulas according to the experience of those of ordinary skill in the art.

人脸识别模块106,在下一次对该相机所拍摄的人脸图像进行人脸特征提取时,将该相机拍摄的人脸图像依次输入CNN网络、该相机当前的微调模型,获得所述人脸图像的人脸特征。Theface recognition module 106, when performing facial feature extraction on the face image captured by the camera next time, sequentially input the face image captured by the camera into the CNN network and the current fine-tuning model of the camera to obtain the face image facial features.

本实施例的人脸识别系统还包括参数提取模块(图中未显示),用于将待布控人脸图像输入CNN网络进行待布控人脸特征提取。The face recognition system of this embodiment further includes a parameter extraction module (not shown in the figure) for inputting the face image to be deployed into the CNN network for feature extraction of the face to be deployed.

本实施例中,离线训练获得所述CNN网络的过程可以包括:In this embodiment, the process of obtaining the CNN network by offline training may include:

由CNN网络结构层对多个带有标签的人脸样本进行特征的初步提取;Preliminary extraction of features from multiple labeled face samples by the CNN network structure layer;

接着,Softmax Loss层接收CNN网络结构层提取的特征并对CNN网络结构层提取的特征进行分类训练,获得CNN网络的参数。Next, the Softmax Loss layer receives the features extracted by the CNN network structure layer and classifies and trains the features extracted by the CNN network structure layer to obtain the parameters of the CNN network.

The classification features extracted by the CNN network structure layers are L2-normalized before being input to the Softmax Loss layer.

The cost function $J(\theta)$ of the Softmax Loss layer is computed as:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{ y^{(i)} = j \} \log \frac{e^{\theta_j^{\top} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\top} x^{(i)}}}$$

where $\theta$ denotes the parameters to be trained, $m$ is the number of input samples, $k$ is the total number of prediction classes, $x^{(i)}$ is the $i$-th input sample, $y^{(i)}$ is its class label, and $1\{\cdot\}$ is the indicator function.

It should be noted that the cost function $J(\theta)$ of the Softmax regression algorithm may also be set to other formulas, according to the experience of those of ordinary skill in the art.
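As an illustration of this offline training stage, the sketch below uses PyTorch's CrossEntropyLoss, which combines the Softmax with the log-loss $J(\theta)$ internally; the feature dimension, optimizer, and epoch count are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pretrain_cnn(cnn, loader, num_classes, feat_dim=256, epochs=10):
    """Offline pre-training: L2-normalized CNN features are classified
    over labeled face samples; the trained CNN parameters are kept and
    the classification head is discarded afterwards."""
    head = nn.Linear(feat_dim, num_classes)        # Softmax classification layer
    criterion = nn.CrossEntropyLoss()              # Softmax + log-loss J(theta)
    params = list(cnn.parameters()) + list(head.parameters())
    opt = torch.optim.SGD(params, lr=0.01)
    for _ in range(epochs):
        for x, y in loader:                        # labeled face samples
            feat = F.normalize(cnn(x), p=2, dim=1) # L2-normalized features
            loss = criterion(head(feat), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return cnn
```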

In this embodiment, the system further includes:

a face region acquisition module (not shown in the figure), which, before the to-be-deployed face images or the face images captured by each camera are input into the CNN network for face feature extraction, performs face detection, face alignment, and image preprocessing on those images to obtain their face regions.
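A minimal sketch of such a face region acquisition step is given below, using OpenCV's Haar cascade as a stand-in detector; the embodiment does not prescribe a particular detection or alignment algorithm, and the 112×112 crop size is an assumption:

```python
import cv2
import numpy as np

def get_face_region(img, size=(112, 112)):
    """Detect a face, crop it, and preprocess it for the CNN.
    Returns None when no face is found."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    det = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = det.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                        # first detection
    face = cv2.resize(img[y:y + h, x:x + w], size)
    return face.astype(np.float32) / 255.0       # simple normalization
```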

In one example, the sample acquisition module 101, calculation and sorting module 102, judgment module 103, recording module 104, and face recognition module 106 work during deployment (surveillance) time, while the secondary fine-tuning module 105 works during non-deployment time, so as to make full use of the computing resources of the face recognition system.
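One simple way to realize this split is a time-window gate, as in the sketch below; the 08:00-20:00 deployment window is purely illustrative:

```python
from datetime import datetime

def in_deployment_window(now=None, start_hour=8, end_hour=20):
    """Recognition modules run inside the deployment window; the
    secondary fine-tuning module runs outside it."""
    hour = (now or datetime.now()).hour
    return start_hour <= hour < end_hour
```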

In another embodiment, referring to FIG. 11, the face recognition system includes a processor, a storage device, an input device (for example, a mouse, keyboard, or microphone), an output device (for example, a screen, speaker, or warning light), and a sequence of face cameras. The storage device, input device, output device, and face camera sequence are each communicatively connected to the processor.

The processor is a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The face cameras are placed in different camera environments and capture images during deployment time, returning them to the processor, which performs face recognition, similarity comparison, and triplet feature extraction. The storage device stores the to-be-deployed face images of the base library together with the corresponding to-be-deployed face features, and also stores each camera's triplet features along with the camera number or memory number.
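The per-camera triplet records kept on the storage device could take a form like the sketch below; the field names are illustrative assumptions:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TripletRecord:
    """One recorded triplet for a camera, as kept on the storage device."""
    camera_id: str        # camera number or memory number
    captured: np.ndarray  # captured face feature above the alarm threshold
    positive: np.ndarray  # greatest-similarity to-be-deployed face feature
    negative: np.ndarray  # second-greatest-similarity to-be-deployed face feature
```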

The output device may be a warning light or the like, used for alarms.

In summary, the face recognition method and system of the present application use an offline-trained CNN network for face feature extraction; that is, the face images captured by each camera are first recognized at the pre-trained network's base recognition rate. During recognition, the triplet features useful to each camera are retained, and each camera's secondary fine-tuning network is self-trained on them to obtain a per-camera fine-tuning model. In subsequent face feature extraction, the CNN network and the camera's current self-trained fine-tuning model are used together, so the recognition rate improves continuously. Each camera corresponds to one self-trained fine-tuning model, and each such model focuses on the feature representation of that camera's data, so the system adapts to each camera environment. Moreover, the self-training of each camera's fine-tuning model not only raises the similarity between previously captured users and the base library, but also transfers to, and improves recognition of, other users in the same scene who have not yet been captured.

The above descriptions are only preferred embodiments of the present application and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (8)

1. A face recognition method applied to a face recognition system, wherein the face recognition system comprises a CNN network and a secondary fine-tuning network constructed separately for each camera, the method comprising:

inputting face images captured by cameras located in different scenes into the CNN network for face feature extraction;

calculating the similarity between each face feature and each preset to-be-deployed face feature, and sorting the similarities corresponding to each face feature;

judging whether the maximum similarity of each face feature is greater than a preset alarm threshold;

if so, recording the triplet features corresponding to each camera, the triplet features comprising the face feature captured by that camera whose maximum similarity is greater than the preset alarm threshold, the to-be-deployed face feature with the greatest similarity to that face feature, and the to-be-deployed face feature with the second-greatest similarity to that face feature;

when the total number of triplet features recorded for a camera reaches a preset number, inputting that camera's triplet features into its corresponding secondary fine-tuning network for self-training to obtain that camera's fine-tuning model; and

the next time face features are extracted from a face image captured by that camera, inputting the image sequentially into the CNN network and then into the camera's current fine-tuning model to obtain the face features of the image;

wherein the secondary fine-tuning self-training network comprises:

constructing a fully connected layer corresponding to each camera, and inputting the face features in each camera's triplet features into that camera's fully connected layer; and

inputting the features output by each camera's fully connected layer, together with the greatest-similarity and second-greatest-similarity to-be-deployed face features from that camera's triplet features, into a Triplet Loss layer for learning, and obtaining the parameters of each camera's fully connected layer as that camera's fine-tuning model parameters.

2. The face recognition method of claim 1, wherein the training process of the CNN network comprises:

designing CNN network structure layers to perform preliminary feature extraction on multiple labeled face samples; and

inputting the features extracted by the CNN network structure layers into a Softmax Loss layer for classification training to obtain the parameters of the CNN network.

3. The face recognition method of claim 1, wherein, before the face images captured by each camera are input into the CNN network for face feature extraction, face detection, face alignment, and image preprocessing are performed on them to obtain the face regions of the face images captured by each camera.

4. The face recognition method of claim 1, wherein a processor of the face recognition system performs the extraction of face features, the similarity comparison, and the recording of triplet features during deployment time, and performs the self-training of each camera's secondary fine-tuning network during non-deployment time.

5. A face recognition system, comprising a CNN network and a secondary fine-tuning network constructed separately for each camera, the face recognition system further comprising:

a sample acquisition module, which inputs face images captured by cameras located in different scenes into the CNN network for face feature extraction;

a calculation and sorting module, which calculates the similarity between each face feature and each preset to-be-deployed face feature and sorts the similarities corresponding to each face feature;

a judgment module, which judges whether the maximum similarity of each face feature is greater than a preset alarm threshold;

a recording module, which, when the maximum similarity of a face feature is greater than the preset alarm threshold, records the triplet features corresponding to each camera, the triplet features comprising the face feature captured by that camera whose maximum similarity is greater than the preset alarm threshold, the to-be-deployed face feature with the greatest similarity to that face feature, and the to-be-deployed face feature with the second-greatest similarity to that face feature;

a secondary fine-tuning module, which, when the total number of triplet features recorded for a camera reaches a preset number, inputs that camera's triplet features into its corresponding secondary fine-tuning network for self-training to obtain that camera's fine-tuning model; and

a face recognition module, which, the next time face features are extracted from a face image captured by that camera, inputs the image sequentially into the CNN network and then into the camera's current fine-tuning model to obtain the face features of the image;

wherein the secondary fine-tuning network comprises:

a fully connected layer corresponding to each camera, which receives the face features in that camera's triplet features; and

a Triplet Loss layer, which receives the features output by each camera's fully connected layer together with the greatest-similarity and second-greatest-similarity to-be-deployed face features from that camera's triplet features, learns from them, and obtains the parameters of each camera's fully connected layer as that camera's fine-tuning model parameters.

6. The face recognition system of claim 5, wherein the offline training process of the CNN network comprises:

preliminary extraction of features from multiple labeled face samples by CNN network structure layers; and

a Softmax Loss layer receiving the features extracted by the CNN network structure layers and performing classification training on them to obtain the parameters of the CNN network.

7. The face recognition system of claim 5, further comprising:

a face region acquisition module, which, before the face images captured by each camera are input into the CNN network for face feature extraction, performs face detection, face alignment, and image preprocessing on them to obtain the face regions of the face images captured by each camera.

8. The face recognition system of claim 5, wherein the sample acquisition module, calculation and sorting module, judgment module, recording module, and face recognition module work during deployment time, and the secondary fine-tuning module works during non-deployment time.
Priority Applications (1)

Application Number: CN201611048348.3A
Priority Date: 2016-11-23
Filing Date: 2016-11-23
Title: Face recognition method and system
Status: Active; granted as CN108090406B


Publications (2)

CN108090406A: published 2018-05-29
CN108090406B: granted 2022-03-11






Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
