Technical Field
The present application relates to the field of biometric detection technology, and more specifically, to a face detection method, apparatus, and electronic device.
Background
Face detection is a biometric recognition technology that identifies a person based on facial feature information. A camera is used to capture images or video streams containing faces, faces in the images are automatically detected and tracked, and a series of related techniques such as facial image preprocessing, image feature extraction, matching, and recognition are then applied to the detected faces. It is also commonly called face recognition, portrait recognition, or facial recognition. With the rapid development of computer and network technology, face detection has been widely applied in many industries and fields such as smart access control, mobile terminals, public security, entertainment, and the military.
Most existing face detection techniques first find the approximate position of a candidate box in an image and determine that the image content in the candidate box is not background, and only then precisely locate the candidate box and identify whether it contains a face. This makes the face detection process complicated and detection efficiency low, and no liveness detection information is provided. In addition, existing face detection techniques cannot perform face detection in dark or low-light environments.
Therefore, how to perform face detection and liveness detection even in dark or low-light environments while improving the accuracy and efficiency of face detection, thereby comprehensively improving the performance of face detection apparatuses, is a technical problem that urgently needs to be solved.
Summary of the Invention
Embodiments of the present application provide a face detection method, apparatus, and electronic device that can perform face detection and liveness detection even in dark or low-light environments and improve the efficiency and accuracy of face detection, thereby comprehensively improving the performance of face detection technology.
In a first aspect, a face detection method is provided, including: obtaining a depth map of a target to be detected, and performing feature extraction on the depth map to obtain a first feature map; performing face detection on the first feature map to obtain a face region feature map; performing feature extraction on the face region feature map to obtain a second feature map; performing liveness detection on the second feature map to obtain a liveness detection result of the face; and outputting, according to the liveness detection result of the face, a face region box containing a live face in the depth map.
Based on the solution of the embodiments of the present application, the position of the face in the target to be detected can be output together with the liveness detection result of the face, improving the accuracy of face detection. At the same time, since the input image of the embodiments is a depth map of the target to be detected, the influence of ambient light on face detection can be avoided, and face detection can still be performed effectively under low-light, no-light, or backlit conditions, improving detection efficiency and comprehensively improving the performance of face detection technology.
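To make the five steps of the first aspect concrete, the sketch below arranges them as a single forward pass. It is a minimal, hypothetical PyTorch rendering: the module names and interfaces (extractor1, detector, extractor2, focus) and the 0.5 liveness threshold are assumptions for illustration, not the actual implementation of the application.

```python
import torch.nn as nn

class FaceDetectionPipeline(nn.Module):
    """Hypothetical sketch of the five-step method of the first aspect."""
    def __init__(self, extractor1, detector, extractor2, focus):
        super().__init__()
        self.extractor1 = extractor1  # first face feature extraction module
        self.detector = detector      # face detection module (region/center maps)
        self.extractor2 = extractor2  # second face feature extraction module
        self.focus = focus            # focus (attention) module for liveness

    def forward(self, depth_map):
        feat1 = self.extractor1(depth_map)       # step 1: first feature map
        face_feat, boxes = self.detector(feat1)  # step 2: face region feature map + boxes
        feat2 = self.extractor2(face_feat)       # step 3: second feature map
        scores = self.focus(feat2)               # step 4: per-face liveness scores
        # step 5: keep only the face region boxes judged to contain a live face
        return [box for box, s in zip(boxes, scores) if s > 0.5]
```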
In some possible implementations, performing feature extraction on the depth map to obtain the first feature map includes: performing feature extraction on the depth map using a first face feature extraction module to obtain the first feature map, where the first feature map includes edge line features of the depth map.
In some possible implementations, the number of convolutional layers in the first face feature extraction module is no more than four.
In some possible implementations, performing face detection on the first feature map to obtain a face region feature map includes: performing face detection on the first feature map using a face detection module to obtain the face region feature map.
In some possible implementations, the face detection module includes a convolutional layer network, a face range convolutional layer, and a face center convolutional layer, and performing face detection on the first feature map using the face detection module to obtain the face region feature map includes: performing convolution on the first feature map using the convolutional layer network to obtain a first intermediate feature map; performing convolution on the first intermediate feature map using the face range convolutional layer and the face center convolutional layer, respectively, to obtain a face region prediction map and a face center prediction map; and obtaining the face region feature map in the first feature map according to the face region prediction map and the face center prediction map.
In some possible implementations, the face detection module further includes a face feature focus layer, which is used to apply a weight distribution to the pixel values of the intermediate feature map so as to highlight the facial-organ features in the intermediate feature map.
With the solution of the embodiments of the present application, adding the face feature focus layer to the face detection module enables the convolutional network to produce feature maps that better highlight the facial-organ features, increasing the accuracy of subsequent face position detection and liveness detection.
In some possible implementations, the face feature focus layer is a spatial attention module.
In some possible implementations, the method further includes: performing neural network training on the face detection module to obtain the parameters of the face detection module.
In some possible implementations, the face detection module further includes a center adjustment convolutional layer, and performing neural network training on the face detection module to obtain its parameters includes: obtaining a sample image annotated with a face region ground truth and a face center ground truth; performing convolution on the sample image using the convolutional layer network to obtain a first sample feature map; performing convolution on the first sample feature map using the face range convolutional layer, the face center convolutional layer, and the center adjustment convolutional layer, respectively, to obtain a face region prediction, a face center prediction, and a face center offset prediction; and computing a loss function from the face region prediction, the face center prediction, the face center offset prediction, the face region ground truth, and the face center ground truth to obtain the parameters of the face detection module.
With the solution of the embodiments of the present application, during training the coordinates of the predicted face center position are adjusted by adding a center adjustment convolutional layer and the like, increasing the robustness and accuracy of the center position prediction; during actual face detection, only the face range convolutional layer and the face center convolutional layer are used to obtain the face region, which improves the efficiency of the detection process and speeds up face detection.
In some possible implementations, the face range convolutional layer and the face center convolutional layer are two 1×1 convolutional layers, and the face center prediction map is a face center heatmap.
In some possible implementations, performing feature extraction on the face region feature map to obtain the second feature map includes: performing feature extraction on the face region feature map using a second face feature extraction module to obtain the second feature map, where the second feature map includes detailed features of the face.
In some possible implementations, the number of convolutional layers in the second face feature extraction module is no more than four.
In some possible implementations, the second feature map includes the facial-organ features (such as the eyes, nose, and mouth) of the face.
In some possible implementations, performing liveness detection on the second feature map to obtain the liveness detection result of the face includes: performing liveness detection on the second feature map using a focus module to obtain the liveness detection result of the face, where the focus module is an attention mechanism module combining spatial and channel attention.
With the solution of the embodiments of the present application, a lightweight focus module is used, which obtains the target feature map more simply and effectively than an attention module that focuses only on channels or only on space.
In some possible implementations, the focus module includes multiple convolutional layers, a channel attention module, and a spatial attention module, and performing liveness detection on the second feature map using the focus module to obtain the liveness detection result of the face includes: performing convolution on the second feature map using a first convolutional layer to obtain a first intermediate feature map; processing the first intermediate feature map using the channel attention module to obtain a channel attention feature map; performing convolution on the channel attention feature map and the first intermediate feature map using a second convolutional layer to obtain a second intermediate feature map; processing the second intermediate feature map using the spatial attention module to obtain a spatial attention feature map; performing convolution on the spatial attention feature map and the second intermediate feature map using a third convolutional layer to obtain a target feature map; and obtaining the liveness detection result of the face based on the target feature map, where the target feature map includes liveness features of the face.
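The interleaving of convolutional layers with channel and spatial attention described above resembles a CBAM-style block. The sketch below is a minimal rendering under that assumption; the layer widths, the squeeze ratio, and the choice to fuse each attention map with the preceding intermediate feature map by addition are illustrative assumptions, not details taken from the application.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Illustrative squeeze-and-excite style channel attention."""
    def __init__(self, channels, ratio=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // ratio), nn.ReLU(),
            nn.Linear(channels // ratio, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.mlp(x.mean(dim=(2, 3)))     # global average pool -> channel weights
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    """Illustrative spatial attention over pooled channel statistics."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        stats = torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.conv(stats))

class FocusModule(nn.Module):
    """Conv -> channel attention -> conv -> spatial attention -> conv, as claimed."""
    def __init__(self, c_in, c_mid):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_mid, 3, padding=1)
        self.ca = ChannelAttention(c_mid)
        self.conv2 = nn.Conv2d(c_mid, c_mid, 3, padding=1)
        self.sa = SpatialAttention()
        self.conv3 = nn.Conv2d(c_mid, c_mid, 3, padding=1)

    def forward(self, feat2):
        f1 = self.conv1(feat2)                 # first intermediate feature map
        f2 = self.conv2(self.ca(f1) + f1)      # fuse channel attention map with f1
        target = self.conv3(self.sa(f2) + f2)  # fuse spatial attention map with f2
        return target                          # target feature map with liveness features
```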
With the solution of the embodiments of the present application, the modules implementing the steps form a lightweight neural-network-like architecture that can readily run on edge-computing-friendly devices, so that the face detection method can be applied in more scenarios.
In some possible implementations, the method runs on an edge computing device.
In a second aspect, a face detection apparatus is provided, including: an acquisition unit configured to acquire a depth map of a target to be detected; a first face feature extraction module configured to perform feature extraction on the depth map to obtain a first feature map; a face detection module configured to perform face detection on the first feature map to obtain a face region feature map; a second face feature extraction module configured to perform feature extraction on the face region feature map to obtain a second feature map; a focus module configured to perform liveness detection on the second feature map to obtain a liveness detection result of the face; and an output module configured to output, according to the liveness detection result of the face, a face region box containing a live face in the depth map.
In some possible implementations, the first feature map includes edge line features of the depth map.
In some possible implementations, the number of convolutional layers in the first face feature extraction module is no more than four.
In some possible implementations, the face detection module includes a convolutional layer network, a face range convolutional layer, and a face center convolutional layer.
The convolutional layer network is configured to perform convolution on the first feature map to obtain an intermediate feature map; the face range convolutional layer and the face center convolutional layer are configured to perform convolution on the intermediate feature map, respectively, to obtain a face region prediction map and a face center prediction map; and the detection results of the face region prediction map and the face center prediction map are mapped onto the first feature map to obtain the face region feature map.
In some possible implementations, the face detection module further includes a face feature focus layer; the convolutional layer network and the face feature focus layer are configured to perform convolution on the first feature map to obtain the intermediate feature map; and the face feature focus layer is configured to apply a weight distribution to the pixel values of the intermediate feature map so as to highlight the facial-organ features in the intermediate feature map.
In some possible implementations, the face feature focus layer is a spatial attention module.
In some possible implementations, the parameters of the face detection module are obtained through neural network training.
In some possible implementations, the face detection module further includes a center adjustment convolutional layer. The convolutional layer network is further configured to perform convolution on a sample image to obtain a first sample feature map, where the sample image is annotated with a face region ground truth and a face center ground truth; the face range convolutional layer, the face center convolutional layer, and the center adjustment convolutional layer are configured to perform convolution on the first sample feature map, respectively, to obtain a face region prediction, a face center prediction, and a face center offset prediction; and the face region prediction, the face center prediction, the face center offset prediction, the face region ground truth, and the face center ground truth are used to compute a loss function to obtain the parameters of the face detection module.
In some possible implementations, the face range convolutional layer and the face center convolutional layer are two 1×1 convolutional layers, and the face center prediction map is a face center heatmap.
In some possible implementations, the second feature map includes detailed features of the face.
In some possible implementations, the number of convolutional layers in the second face feature extraction module is no more than four.
In some possible implementations, the second feature map includes the facial-organ features of the face.
In some possible implementations, the focus module is an attention mechanism module combining spatial and channel attention.
In some possible implementations, the focus module includes multiple convolutional layers, a channel attention module, and a spatial attention module. A first convolutional layer among the multiple convolutional layers performs convolution on the second feature map to obtain a first intermediate feature map; the channel attention module processes the first intermediate feature map to obtain a channel attention feature map; a second convolutional layer performs convolution on the channel attention feature map and the first intermediate feature map to obtain a second intermediate feature map; the spatial attention module processes the second intermediate feature map to obtain a spatial attention feature map; a third convolutional layer performs convolution on the spatial attention feature map and the second intermediate feature map to obtain a target feature map; and the liveness detection result of the face is obtained based on the target feature map, where the target feature map includes liveness features of the face.
In some possible implementations, the apparatus is an edge computing apparatus.
In a third aspect, an electronic device is provided, including the face detection apparatus of the second aspect or any possible implementation thereof.
In some possible implementations, the electronic device further includes a depth map acquisition apparatus.
In a fourth aspect, a computer-readable storage medium is provided for storing program instructions which, when run by a computer, cause the computer to perform the face detection method of the first aspect or any possible implementation thereof.
In a fifth aspect, a computer program product containing instructions is provided which, when executed by a computer, cause the computer to perform the face detection method of the first aspect or any possible implementation thereof.
Specifically, the computer program product can run on the electronic device of the third aspect.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the system architecture provided by the present application.
FIG. 2 is a schematic diagram of the basic framework of a Faster RCNN according to an embodiment of the present application.
FIG. 3 is a schematic diagram of the target detection process of a Faster RCNN according to an embodiment of the present application.
FIG. 4 is a schematic flowchart of a face detection method according to an embodiment of the present application.
FIG. 5 is a schematic structural diagram of a neural-network-like architecture according to an embodiment of the present application.
FIG. 6 is a schematic architectural diagram of a face detection module according to an embodiment of the present application.
FIG. 7 is a schematic diagram of a face region box in a first feature map according to an embodiment of the present application.
FIG. 8 is a schematic structural diagram of another face detection module according to an embodiment of the present application.
FIG. 9 is a schematic structural diagram of a focus module according to an embodiment of the present application.
FIG. 10 is a schematic structural block diagram of a face detection apparatus according to an embodiment of the present application.
FIG. 11 is a schematic structural block diagram of a processing unit according to an embodiment of the present application.
FIG. 12 is a schematic diagram of the hardware structure of a face detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
Embodiments of the present application are applicable to face detection systems, including but not limited to products based on optical face imaging. The face detection system can be applied to various electronic devices having an image acquisition apparatus (such as a camera), including personal computers, computer workstations, smartphones, tablets, smart cameras, media consumption devices, wearable devices, set-top boxes, game consoles, augmented reality (AR)/virtual reality (VR) devices, vehicle-mounted terminals, and the like; the embodiments disclosed in the present application place no limitation on this.
It should be understood that the specific examples herein are only intended to help those skilled in the art better understand the embodiments of the present application, not to limit their scope.
It should also be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
It should also be understood that the various implementations described in this specification can be implemented individually or in combination, and the embodiments of the present application are not limited in this respect.
Unless otherwise stated, all technical and scientific terms used in the embodiments of the present application have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the present application are only for the purpose of describing specific embodiments and are not intended to limit its scope. The term "and/or" as used in the present application includes any and all combinations of one or more of the associated listed items.
To better understand the solutions of the embodiments of the present application, possible application scenarios of the embodiments are briefly introduced below with reference to FIG. 1.
As shown in FIG. 1, an embodiment of the present application provides a system architecture 100. In FIG. 1, a data acquisition device 160 is used to acquire training data. For the face detection method of the embodiments of the present application, the training data may include training images or training videos.
After the training data are collected, the data acquisition device 160 stores them in a database 130, and a training device 120 trains a target model/rule 101 based on the training data maintained in the database 130.
The target model/rule 101 can be used to implement the face detection method of the embodiments of the present application. The target model/rule 101 in the embodiments may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 130 do not necessarily all come from the data acquisition device 160; they may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 may be a terminal, such as a mobile phone, a tablet, or a laptop, or may be a server or a cloud. In FIG. 1, the execution device 110 is provided with an input/output (I/O) interface 112 for data interaction with external devices. A user can input data to the I/O interface 112 through a client device 140; in the embodiments of the present application, the input data may include a video or image to be processed input by the client device 140.
In some implementations, the client device 140 and the execution device 110 may be the same device; for example, both may be terminal devices.
In other implementations, the client device 140 and the execution device 110 may be different devices; for example, the client device 140 is a terminal device while the execution device 110 is a cloud, a server, or the like. The client device 140 may interact with the execution device 110 through a communication network of any communication mechanism or standard, such as a wide area network, a local area network, a point-to-point connection, or any combination thereof.
The computation module 111 of the execution device 110 processes the input data (such as an image to be processed) received through the I/O interface 112. While the computation module 111 of the execution device 110 performs computation and other related processing, the execution device 110 may call data, code, and the like in a data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained from that processing into the data storage system 150.
Finally, the I/O interface 112 returns the processing result, such as the face detection result obtained above, to the client device 140 so as to provide it to the user.
It is worth noting that the training device 120 can generate corresponding target models/rules 101 for different goals, or different tasks, based on different training data; the corresponding target models/rules 101 can then be used to achieve those goals or complete those tasks, thereby providing the user with the desired results.
In the case shown in FIG. 1, the user can manually specify the input data through the interface provided by the I/O interface 112. Alternatively, the client device 140 can automatically send input data to the I/O interface 112; if automatic sending requires the user's authorization, the user can set the corresponding permission in the client device 140. The user can view the results output by the execution device 110 on the client device 140, presented, for example, as display, sound, or action. The client device 140 can also serve as a data acquisition terminal, collecting the input data fed to the I/O interface 112 and the output results of the I/O interface 112 shown in the figure as new sample data and storing them in the database 130. Of course, instead of collecting through the client device 140, the I/O interface 112 may directly store the input data fed to it and its output results shown in the figure into the database 130 as new sample data.
It is worth noting that FIG. 1 is merely a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, and modules shown in the figure constitute no limitation. For example, in FIG. 1 the data storage system 150 is external memory relative to the execution device 110; in other cases, the data storage system 150 may instead be placed inside the execution device 110.
As shown in FIG. 1, the target model/rule 101 is obtained by training with the training device 120. In the embodiments of the present application, the target model/rule 101 may be a neural network; specifically, it may be a convolutional neural network (CNN), a region-based convolutional neural network (RCNN), a Faster RCNN, or another type of neural network, which is not specifically limited in the present application.
At present, face detection systems usually use a two-stage neural network architecture, such as the Faster RCNN mentioned above.
For ease of understanding, the Faster RCNN neural network is briefly introduced below with reference to FIG. 2 and FIG. 3.
FIG. 2 shows a schematic diagram of the basic framework of Faster RCNN, and FIG. 3 shows a schematic diagram of its target detection process.
As shown in FIG. 2 and FIG. 3, Faster RCNN can be divided into several major parts: a region proposal network (RPN), a convolutional neural network (CNN), region of interest pooling (ROI pooling), and a classifier.
The CNN performs feature convolution on the input image to extract its feature map.
The RPN extracts region proposals from the convolutional features of the feature map. In some implementations, multiple anchors are set in the feature map, a softmax function then classifies each anchor as one containing a detection target (positive anchor) or one not containing a detection target (negative anchor), and bounding box regression corrects the anchors to obtain accurate proposals.
ROI pooling receives the feature map and the proposals, combines this information, and extracts proposal feature maps for classification and recognition by the subsequent classifier.
The classifier uses the proposal feature maps to compute the category of each proposal, and bounding box regression is applied once more to obtain the final precise position of each detection box.
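Putting the parts together, the two-stage flow just described can be summarized by the following sketch; the function names are placeholders standing in for the four parts above, not a real API.

```python
def faster_rcnn_detect(image, cnn, rpn, roi_pool, classifier):
    """Illustrative two-stage detection flow of Faster RCNN."""
    feature_map = cnn(image)                  # shared convolutional features
    # Stage 1: classify anchors as positive/negative and regress rough boxes.
    proposals = rpn(feature_map)              # region proposals (candidate boxes)
    # Stage 2: pool features per proposal, then classify and refine each box.
    proposal_features = roi_pool(feature_map, proposals)
    classes, refined_boxes = classifier(proposal_features)
    return classes, refined_boxes
```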
As can be seen from the above description, when the Faster RCNN network is used for face detection, after the feature map of the input image is obtained through the CNN, the first stage uses the anchors in the RPN to roughly find the positions of the face candidate boxes and determine that they are not background, and the second stage then uses subsequent processing to identify whether each candidate box contains a face and to locate the candidate box more precisely.
With the traditional face detection method described above, the detection process is relatively complicated and requires a dedicated or large server to execute it, which hinders its deployment on edge computing devices; the complicated detection process also seriously affects detection efficiency.
In addition, in dark or low-light environments the input image cannot be captured, or the captured input image deteriorates in quality, making face detection in such environments difficult; the above face detection method also provides no liveness detection information, resulting in inaccurate face detection results and affecting the overall performance of face detection.
On this basis, the present application proposes an edge-computing-friendly neural-network-like architecture for edge-computing-friendly devices. Face detection based on this architecture can be executed efficiently, allowing it to run on edge computing devices and to provide liveness detection results during face detection. Moreover, the face detection method of the present application is robust to light: it is unaffected by changes in the light signal and can still perform face detection under low-light or no-light conditions.
Below, the neural-network-like architecture for face detection and the flow of the face detection method provided by the embodiments of the present application are first described with reference to FIG. 4 to FIG. 9.
FIG. 4 shows a schematic flowchart of a face detection method 200. Optionally, the face detection method 200 may be executed by the execution device 110 in FIG. 1 above.
As shown in FIG. 4, the face detection method 200 may include the following steps.
S210: Obtain a depth map of a target to be detected, and perform feature extraction on the depth map to obtain a first feature map.
S220: Perform face detection on the first feature map to obtain a face region feature map.
S230: Perform further feature extraction on the face region feature map to obtain a second feature map.
S240: Perform liveness detection on the second feature map to obtain a liveness detection result of the face.
S250: Output, according to the liveness detection result of the face, at least one face region box containing a live face in the depth map.
In the embodiments of the present application, the target to be detected includes but is not limited to any object such as a face, a photo, a video, or a three-dimensional model. For example, the target to be detected may be the face of the target user, the face of another user, a photo of a user, a curved-surface model with a photo attached, and so on.
As an example, in some implementations, after a depth map acquisition apparatus captures the depth map of the target to be detected, it sends the depth map to a processing unit in the execution device for subsequent image processing. Optionally, the depth map acquisition apparatus may be integrated into the execution device or provided separately from it.
The depth image in the embodiments of the present application is a depth map of the target to be detected, also called a range image; each pixel value in the depth map represents the distance between a point on the surface of the target and a common reference point or reference plane.
For example, the depth map acquisition apparatus is used to obtain a depth map of a face, in which the pixel values represent the distances from points on the face surface to the image acquisition module. When the depth map is a grayscale image, the variation in pixel values also appears as a variation in the image's gray level; therefore, the gray-level variation of the depth map corresponds to the depth variation of the face, directly reflecting the geometry and depth information of the face surface.
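To illustrate this correspondence, raw per-pixel distances can be linearly rescaled into an 8-bit grayscale image; the normalization below is a common convention assumed for the sketch, not a step mandated by the application.

```python
import numpy as np

def depth_to_gray(depth_mm: np.ndarray) -> np.ndarray:
    """Map per-pixel distances (e.g., in millimetres) to an 8-bit grayscale image.

    Nearer surface points are rendered brighter, so the gray-level variation
    of the result mirrors the depth variation of the face surface.
    """
    d_min, d_max = float(depth_mm.min()), float(depth_mm.max())
    norm = (depth_mm - d_min) / max(d_max - d_min, 1e-6)  # 0 (near) .. 1 (far)
    return ((1.0 - norm) * 255).astype(np.uint8)          # invert: near = bright
```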
In some possible implementations, a structured light projection module projects structured light onto the target to be detected; the depth map acquisition apparatus receives the structured light signal reflected by the target and converts the reflected structured light signal into a depth map.
Optionally, the structured light above includes but is not limited to light signals carrying structural patterns, such as speckle patterns and dot-matrix light; the structured light projection module may be any apparatus structure that projects structured light, including but not limited to light-emitting devices such as a dot-matrix light projector using a vertical cavity surface emitting laser (VCSEL) light source or a speckle structured light projector.
It should be understood that the embodiments of the present application may also use other image acquisition modules capable of obtaining depth information of the target to be detected, such as a time of flight (TOF) optical module, to acquire the depth map, which is then transmitted to the processing unit. The embodiments of the present application place no specific limitation on the type of image acquisition module or the way the depth map is acquired.
It should also be understood that, in the above step, point cloud data of the target to be detected may also be acquired and converted into a depth map. For the specific techniques of acquiring point cloud data of the target to be detected and converting point cloud data into a depth map, reference may be made to methods in the related art, which are likewise not specifically limited in the embodiments of the present application.
After the depth map of the target to be detected is acquired, the present application proposes a neural-network-like architecture that performs subsequent processing on the depth map to obtain the face region box of a live face in the depth map.
FIG. 5 shows a schematic structural diagram of a neural-network-like architecture 20 according to an embodiment of the present application.
As shown in FIG. 5, the neural-network-like architecture 20 includes a first face feature extraction module 21, a face detection module 22, a second face feature extraction module 23, and a focus module 24.
Specifically, the first face feature extraction module 21 is used to execute step S210 above, performing feature extraction on the depth map of the target to be detected to obtain at least one first feature map.
In some implementations, the first face feature extraction module 21 may include at least one convolutional layer that performs convolution on the depth map of the target to be detected, thereby extracting the edge line features, or in other words the high-frequency features, of the depth map. If the target to be detected is a face, what the first face feature extraction module 21 extracts are facial line features such as the lines of the facial organs and the contour lines of the face.
In some implementations, each of the at least one convolutional layer includes one or more convolution kernels, also called filters or feature detectors. The matrix obtained by sliding a kernel over the image and computing dot products is called a convolved feature, an activation map, or a feature map. For the same input image, kernels with different values generate different feature maps; thus, with one or more kernels, one or more first feature maps including line features can be obtained. By modifying the values of the kernels, different first feature maps can be detected from the depth map.
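As a concrete example, the snippet below slides two classic edge kernels over a depth map and obtains two different feature maps from the same input; the Sobel-style kernel values are an illustrative choice, not values prescribed by the application.

```python
import numpy as np
from scipy.signal import convolve2d

depth = np.random.rand(64, 64)  # stand-in for a face depth map

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # responds to vertical edges
sobel_y = sobel_x.T                                       # responds to horizontal edges

# Two kernels with different values -> two different feature maps,
# each highlighting a different orientation of edge lines.
feat_x = convolve2d(depth, sobel_x, mode="same")
feat_y = convolve2d(depth, sobel_y, mode="same")
```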
It should be understood that the convolution kernels above may be 3×3 matrices, 5×5 matrices, or matrices of other sizes, which is not limited in the embodiments of the present application.
It should also be understood that, in the embodiments of the present application, the number of convolutional layers in the first face feature extraction module 21 may be between one and four, or may exceed four; the sizes of the multiple kernels in each convolutional layer may be the same or different, and the convolution strides of the multiple kernels may be the same or different, which is not limited in the embodiments of the present application.
It should also be understood that, in the embodiments of the present application, the types of the at least one convolutional layer in the first face feature extraction module 21 may be the same or different, including but not limited to two-dimensional convolution, three-dimensional convolution, pointwise convolution, depthwise convolution, separable convolution, deconvolution, and/or dilated convolution.
Optionally, in the first face feature extraction module 21, an activation layer may follow the at least one convolutional layer. The activation layer contains an activation function that applies a nonlinearity to each pixel value of the convolved feature map. Optionally, the activation function includes but is not limited to the rectified linear unit (ReLU) function, the exponential linear unit (ELU) function, and several variants of the ReLU function, such as the leaky ReLU (LReLU), the parametric ReLU (PReLU), and the randomized ReLU (RReLU). In a feature map processed by the activation function, the pixel values are sparse; the activation layer thus yields a sparsified neural network structure that can better mine relevant features and fit the training data.
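A shallow extraction module of this kind, with no more than four convolutional layers each followed by an activation, might be sketched as follows; the channel widths and the choice of leaky ReLU are assumptions for illustration.

```python
import torch.nn as nn

# Hypothetical first face feature extraction module: at most four convolutional
# layers, each followed by an activation that sparsifies the feature map.
first_extractor = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.LeakyReLU(0.1),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.LeakyReLU(0.1),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.LeakyReLU(0.1),
)
```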
Since the first face feature extraction module 21 is located at the front end of the neural-network-like architecture, its multi-layer structure may also be called the shallow layers of the network, performing preliminary processing on the depth map.
Optionally, after the first face feature extraction module 21 executes step S210, the face detection module 22 in the neural-network-like model continues with step S220, performing face detection on the first feature map to obtain the face region feature map.
As an example, FIG. 6 shows a schematic architectural diagram of a face detection module 22.
As shown in FIG. 6, the face detection module 22 may include a convolutional layer network 221, a face range convolutional layer 222, and a face center convolutional layer 223.
Specifically, the convolutional layer network 221 includes at least two convolutional layers that further convolve the first feature map output by the first face feature extraction module 21 to extract more facial features from the first feature map.
It is understandable that, within the convolutional layer network 221, the kernel size, kernel type, and kernel values of each convolutional layer may differ, so as to extract feature information of different dimensions from the first feature map and combine it into feature maps that include facial features. At the same time, the convolutional layer network 221 can control process parameters such as the number of channels and the size of the feature maps, facilitating unified processing by the later layers of the neural-network-like architecture.
Optionally, as shown in FIG. 6, the face detection module 22 may further include a face feature focus layer 224, an attention mechanism module for the convolutional blocks. It may be a spatial attention mechanism module, a channel attention mechanism module, or an attention mechanism module combining space and channel.
As an example, the face feature focus layer 224 is a spatial attention mechanism module that focuses on the spatial features of each feature map. In particular, since the depth values at facial organs such as the eyes and nose are inconsistent with the depth values of other planes, the face feature focus layer 224 focuses on the positions of the facial organs, applying a weight distribution to the pixel values of each feature map to highlight the facial organs in the feature map. By adding the face feature focus layer 224 to the face detection module 22, the convolutional network 221 can produce, through convolution, feature maps that better highlight the facial-organ features, increasing the accuracy of subsequent face position detection and liveness detection.
Optionally, if the face feature focus layer 224 is a spatial attention mechanism module, it includes but is not limited to a spatial transformer network (STN) model; it may also be any spatial attention mechanism module in the related art intended to extract the planar features of the feature maps output by each convolutional layer of the convolutional layer network 221.
Further, after the focused convolution processing of the convolutional layer network 221 and the face feature focus layer 224, an intermediate feature map is obtained. After the intermediate feature map is convolved by the face range convolutional layer 222 and the face center convolutional layer 223, respectively, a face region prediction map (scale map) and a face center prediction map (center map) are obtained, where the face region prediction map characterizes the size of the face region in the first feature map and the face center prediction map characterizes the center position of the face in the first feature map.
Optionally, the face range convolutional layer 222 and the face center convolutional layer 223 are two 1×1 convolutional layers, and the face center prediction map obtained by the face center convolutional layer 223 is a face center heatmap.
As shown in FIG. 7, by mapping the detection results of the face region prediction map and the face center heatmap onto the first feature map, the face region box in the first feature map, i.e., the face region feature map, can be obtained.
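This anchor-free decoding is reminiscent of CenterNet-style detection. The sketch below shows one hypothetical way to turn the center heatmap and the scale map into face region boxes; the peak-picking threshold and the assumption of square boxes are illustrative, while the log-size encoding follows the training description later in this section.

```python
import torch
import torch.nn.functional as F

def decode_face_boxes(center_heatmap, scale_map, threshold=0.5):
    """Hypothetical decoding of a center heatmap and a scale map
    (each of shape [1, 1, H, W]) into face boxes (cx, cy, w, h) on the grid."""
    heat = center_heatmap.sigmoid()
    # Keep only local maxima so each face yields a single center point.
    peaks = (heat == F.max_pool2d(heat, 3, stride=1, padding=1)) & (heat > threshold)
    ys, xs = torch.nonzero(peaks[0, 0], as_tuple=True)
    boxes = []
    for y, x in zip(ys.tolist(), xs.tolist()):
        size = scale_map[0, 0, y, x].exp()  # scale map stores the log of region size
        boxes.append((x, y, size, size))    # square face box centered at (x, y)
    return boxes
```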
The face detection module 22 shown in FIG. 6 is the neural network structure used during actual face detection, in which the kernel parameters of each convolutional layer and the other related parameters must be obtained through neural network training. To improve the accuracy and robustness of the actual face detection process, FIG. 8 shows a schematic structural diagram of the face detection module 22 in the training stage.
As shown in FIG. 8, compared with the face detection module 22 in FIG. 6, in the training stage the face detection module 22 further includes a center adjustment convolutional layer 225 for adjusting the position coordinates of the face center obtained by the face center convolutional layer 223. Optionally, in the embodiments of the present application, the output of the center adjustment convolutional layer 225 contains two channels, responsible for the offsets in the horizontal and vertical directions, respectively.
In the training stage, face image samples from different scenes and different angles are collected, and the face boxes in the samples (i.e., the ground truth) are converted into ground-truth center point and range annotations. For example, for the face center point, the position where the target center falls is assigned the value 1 (i.e., a positive sample) and all other positions are assigned 0 (i.e., negative samples); for the face region, the position where the target center falls is assigned the log of the region size and all other positions are assigned 0.
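A minimal sketch of this ground-truth encoding, assuming square face boxes aligned to an h × w feature grid:

```python
import numpy as np

def encode_ground_truth(boxes, h, w):
    """Encode face boxes [(cx, cy, size), ...] into center and scale annotations.

    Center map: 1 at each target center (positive sample), 0 elsewhere (negative).
    Scale map:  log of the region size at each target center, 0 elsewhere.
    """
    center_gt = np.zeros((h, w), dtype=np.float32)
    scale_gt = np.zeros((h, w), dtype=np.float32)
    for cx, cy, size in boxes:
        center_gt[cy, cx] = 1.0
        scale_gt[cy, cx] = np.log(size)
    return center_gt, scale_gt
```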
The loss function comprises three parts: the loss of the predicted center point position, the loss of the predicted face frame size, and the loss of the predicted center point offset. After each of the three parts is weighted, their sum forms the final loss function of the face detection module 22, which is used to train the face detection module 22 so as to obtain the parameters of each layer in the module.
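The following is a hedged sketch of this supervision scheme. The three-part weighted structure and the target encoding come from the text above; the specific choices below (binary cross-entropy for the center heatmap, L1 for the size and offset terms, and the weight values) are illustrative assumptions only.

```python
import math
import torch
import torch.nn.functional as F

def encode_targets(boxes, fmap_h, fmap_w):
    """Encode ground-truth face boxes (x, y, w, h, in feature-map cells)
    into a center map (1 at the center cell, 0 elsewhere), a scale map
    (log of the region size at the center cell, 0 elsewhere), and a
    two-channel offset map (horizontal / vertical sub-cell offsets)."""
    center_gt = torch.zeros(fmap_h, fmap_w)
    scale_gt = torch.zeros(fmap_h, fmap_w)
    offset_gt = torch.zeros(2, fmap_h, fmap_w)
    for x, y, w, h in boxes:
        cx, cy = int(x + w / 2), int(y + h / 2)
        center_gt[cy, cx] = 1.0                 # positive sample
        scale_gt[cy, cx] = math.log(w * h)      # log of region size
        offset_gt[0, cy, cx] = (x + w / 2) - cx # horizontal offset
        offset_gt[1, cy, cx] = (y + h / 2) - cy # vertical offset
    return center_gt, scale_gt, offset_gt

def detection_loss(center_pred, scale_pred, offset_pred,
                   center_gt, scale_gt, offset_gt,
                   w_center=1.0, w_scale=0.1, w_offset=1.0):
    """Weighted sum of the three loss parts; center_pred is assumed to be
    already squashed to (0, 1), e.g. by a sigmoid."""
    loss_center = F.binary_cross_entropy(center_pred, center_gt)
    loss_scale = F.l1_loss(scale_pred, scale_gt)
    loss_offset = F.l1_loss(offset_pred, offset_gt)
    return w_center * loss_center + w_scale * loss_scale + w_offset * loss_offset

# Usage with one dummy box on an 80x80 feature map:
c_gt, s_gt, o_gt = encode_targets([(10.0, 12.0, 20.0, 24.0)], 80, 80)
loss = detection_loss(torch.rand(80, 80), torch.randn(80, 80),
                      torch.randn(2, 80, 80), c_gt, s_gt, o_gt)
print(float(loss))
```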
通过本申请实施例的方案,在训练过程中,通过设置中心调整卷积层225等来调整预测的人脸中心位置的坐标,以增加中心位置预测的鲁棒性与准确性,而在实际的人脸检测过程中,只用人脸范围卷积层222和人脸中心卷积层223获取人脸区域,能够提高检测过程的效率,加快人脸检测的速度。Through the solution of the embodiment of the present application, during the training process, the coordinates of the predicted face center position are adjusted by setting the center adjustment convolution layer 225, etc., so as to increase the robustness and accuracy of the center position prediction. In the actual face detection process, only the face range convolution layer 222 and the face center convolution layer 223 are used to obtain the face area, which can improve the efficiency of the detection process and speed up the face detection.
进一步地,通过上述步骤S220的人脸检测得到人脸区域特征图之后,执行步骤S230,对该人脸区域特征图进行进一步的特征提取,得到至少一张第二特征图。Furthermore, after obtaining the face region feature map through face detection in step S220, step S230 is executed to further extract features from the face region feature map to obtain at least one second feature map.
具体地,该步骤S230可以通过类神经网络20中的第二人脸特征提取模块23进行特征提取。Specifically, step S230 can perform feature extraction through the second face feature extraction module 23 in the neural network 20.
Optionally, the second facial feature extraction module 23 may include at least one convolutional layer, which performs convolution calculation on the face region feature map so as to extract liveness features from it, for example facial texture features and detail features of the facial organs, which serve to distinguish live face regions from non-live face regions. In other words, after processing by the second facial feature extraction module 23, the second feature map of a live face region and the second feature map of a non-live face differ markedly in their features.
与上文介绍的第一人脸特征学习模块类似,该第二人脸特征提取模块中的至少一层卷积层中的卷积核可以为3*3矩阵,5*5矩阵或者其它大小的矩阵,本申请实施例对此不做限定。Similar to the first facial feature learning module introduced above, the convolution kernel in at least one convolution layer in the second facial feature extraction module can be a 3*3 matrix, a 5*5 matrix or a matrix of other sizes, which is not limited in the embodiments of the present application.
可选地,第二人脸特征提取模块中的卷积层的层数可以为一层至四层之间,或者还可以为四层以上的卷积层,每层卷积层中的多个卷积核的大小可以相同或者不同,多个卷积核的卷积步长可以相同或者不同,本申请实施例对此不做限定。Optionally, the number of convolutional layers in the second facial feature extraction module can be between one and four layers, or can be more than four convolutional layers. The sizes of multiple convolutional kernels in each convolutional layer can be the same or different, and the convolution step sizes of multiple convolutional kernels can be the same or different. The embodiment of the present application is not limited to this.
Optionally, the types of the at least one convolutional layer may be the same or different, and include, but are not limited to, two-dimensional convolution, three-dimensional convolution, pointwise convolution, depthwise convolution, separable convolution, deconvolution, and/or dilated convolution.
可选地,在至少一层卷积层之后,还可以包括激励层,该激励层中包含激励函数,用于对卷积得到的特征图中的每个像素值进行非线性化处理。Optionally, after at least one convolution layer, an excitation layer may be further included, wherein the excitation layer includes an excitation function for performing nonlinear processing on each pixel value in the feature map obtained by convolution.
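As a non-authoritative sketch, such a module could look as follows in PyTorch; the layer count (three), channel width, and ReLU activation are illustrative choices within the ranges described above.

```python
import torch
import torch.nn as nn

class LivenessFeatureExtractor(nn.Module):
    """Sketch of the second facial feature extraction module: a small stack
    of 3x3 convolutional layers, each followed by an excitation (activation)
    layer that nonlinearizes every pixel of the convolved feature map."""

    def __init__(self, in_channels: int = 64, width: int = 64, depth: int = 3):
        super().__init__()
        layers = []
        for i in range(depth):
            layers.append(nn.Conv2d(in_channels if i == 0 else width, width,
                                    kernel_size=3, padding=1))
            layers.append(nn.ReLU(inplace=True))  # excitation layer
        self.body = nn.Sequential(*layers)

    def forward(self, face_region_fmap: torch.Tensor) -> torch.Tensor:
        # Output: second feature map carrying liveness cues (e.g. texture).
        return self.body(face_region_fmap)

extractor = LivenessFeatureExtractor()
second_fmap = extractor(torch.randn(1, 64, 32, 32))
print(second_fmap.shape)  # torch.Size([1, 64, 32, 32])
```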
可以理解的是,在训练阶段,可以通过多个活体人脸区域图像样本与非活体人脸区域图像样本,对该第二人脸特征提取模块中的相关参数进行训练,以优化得到本申请实施例中的第二人脸特征提取模块的模型,从而在实际的人脸检测过程中,能够通过该第二人脸特征提取模块处理得到能够区分活体与非活体的特征图。It can be understood that during the training stage, the relevant parameters in the second face feature extraction module can be trained through multiple living face area image samples and non-living face area image samples to optimize the model of the second face feature extraction module in the embodiment of the present application, so that in the actual face detection process, the second face feature extraction module can be used to process and obtain a feature map that can distinguish between living and non-living objects.
更进一步地,在上述步骤S230之后,执行步骤S240,对该第二特征图进行活体检测,判断该第二特征图中的人脸是否为活体人脸,得到人脸的活体检测结果。Furthermore, after the above step S230, step S240 is executed to perform liveness detection on the second feature map to determine whether the face in the second feature map is a live face, and obtain a liveness detection result of the face.
具体地,在本申请实施例中,可以通过类神经网络20中的专注模块24执行步骤S240。Specifically, in the embodiment of the present application, step S240 can be performed by the focus module 24 in the neural network 20.
可以理解的是,该专注模块24为专注于第二特征图中的活体特征的注意力模块。It can be understood that the focusing module 24 is an attention module that focuses on the living features in the second feature map.
可选地,在本申请实施例中,该专注模块24可以为基于空间的注意力机制模块,也可以为基于通道的注意力机制模块,或者还可以为结合空间和通道的注意力机制模块。Optionally, in an embodiment of the present application, the focus module 24 may be a space-based attention mechanism module, a channel-based attention mechanism module, or an attention mechanism module combining space and channel.
As an example, the focus module 24 is an attention mechanism module combining space and channels, including but not limited to a convolutional block attention module (CBAM) model; it may also be any other combined spatial-channel attention mechanism module in the related art, intended to extract the planar features of multiple feature maps as well as the key features in different channels.
图9示出了一种专注模块24的结构示意图。FIG9 shows a schematic structural diagram of a focusing module 24 .
具体地,该专注模块24包括多层卷积层,每层卷积层之后加入注意力机制模块,以生成优化的特征图。Specifically, the focus module 24 includes multiple convolutional layers, and an attention mechanism module is added after each convolutional layer to generate an optimized feature map.
如图9所示,第一卷积层241之后,加入通道注意力模块244,在第二卷积层242之后,加入空间注意力模块245,然后通过第三卷积层243,输出目标特征图,基于该目标特征图进行分类判断,可以确定该目标特征图是否为活体人脸。As shown in Figure 9, after the first convolution layer 241, a channel attention module 244 is added, and after the second convolution layer 242, a spatial attention module 245 is added. Then, through the third convolution layer 243, the target feature map is output, and classification judgment is performed based on the target feature map to determine whether the target feature map is a living face.
作为示例,下文结合图9中的专注模块24的结构,说明第二特征图至目标特征图的处理过程。As an example, the following describes the processing process from the second feature map to the target feature map in combination with the structure of the focus module 24 in FIG. 9 .
第二特征图经过第一卷积层241后得到N张(N通道)第一中间特征图,该N张第一中间特征图作为输入特征图输入至通道注意力模块244中。After the second feature map passes through the first convolution layer 241, N (N channels) first intermediate feature maps are obtained, and the N first intermediate feature maps are input into the channel attention module 244 as input feature maps.
In the channel attention module 244, the N first intermediate feature maps are compressed in the spatial dimension by max pooling and average pooling respectively, yielding two 1×1×N first intermediate vectors. The two 1×1×N first intermediate vectors are each processed by a shared multi-layer perceptron (MLP) to obtain two 1×1×N second intermediate vectors. The second intermediate vectors output by the MLP undergo element-wise addition and are then passed through an activation function, for example a sigmoid function, to generate a channel attention feature map. The channel attention feature map and the first intermediate feature maps undergo element-wise multiplication, producing the N (N-channel) second intermediate feature maps input to the second convolutional layer 242.
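A minimal PyTorch sketch of this channel attention step (the reduction ratio and MLP shape below are assumptions; CBAM itself is only the cited reference point):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of the channel attention described above: spatial max pooling
    and average pooling compress the N channels to two 1x1xN vectors, a
    shared MLP processes both, their element-wise sum passes through a
    sigmoid, and the result rescales the input channels."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, H, W)
        b, n, _, _ = x.shape
        max_vec = self.mlp(torch.amax(x, dim=(2, 3)))  # (B, N) from max pooling
        avg_vec = self.mlp(torch.mean(x, dim=(2, 3)))  # (B, N) from average pooling
        attn = torch.sigmoid(max_vec + avg_vec)        # element-wise addition
        return x * attn.view(b, n, 1, 1)               # element-wise multiplication

ca = ChannelAttention(channels=64)
out = ca(torch.randn(1, 64, 32, 32))  # same shape, channel-rescaled
```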
可选地,在一些实施方式中,第二卷积层242对该N张第二中间特征图进行卷积操作,将卷积后的特征图作为输入特征图输入至空间注意力模块245中,或者,在另一些实施方式中,省略该第二卷积层242的结构,直接将N张第二中间特征图作为输入特征图输入至空间注意力模块245中。Optionally, in some embodiments, the second convolutional layer 242 performs a convolution operation on the N second intermediate feature maps, and inputs the convolved feature maps as input feature maps into the spatial attention module 245. Alternatively, in other embodiments, the structure of the second convolutional layer 242 is omitted, and the N second intermediate feature maps are directly input into the spatial attention module 245 as input feature maps.
In the spatial attention module 245, the multiple input feature maps, for example the N (N-channel) second intermediate feature maps, are compressed in the channel dimension by max pooling and average pooling, yielding two W×H×1 third intermediate feature maps, where W and H are the width and height of the second intermediate feature maps. The two third intermediate feature maps are then concatenated (concat) along the channel dimension and reduced, through a convolution operation, to one W×H×1 fourth intermediate feature map. The fourth intermediate feature map is processed by an activation function, for example a sigmoid activation function, to generate a spatial attention feature map, and the spatial attention feature map and the second intermediate feature maps undergo element-wise multiplication, producing the N (N-channel) fifth intermediate feature maps input to the third convolutional layer 243.
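Correspondingly, a sketch of the spatial attention step (the 7×7 kernel of the reducing convolution is an assumption borrowed from common CBAM practice):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of the spatial attention described above: per-pixel max and
    mean over the channel dimension give two WxHx1 maps, which are
    concatenated, reduced to one map by a convolution, passed through a
    sigmoid, and used to rescale the input feature maps element-wise."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, H, W)
        max_map = torch.amax(x, dim=1, keepdim=True)  # (B, 1, H, W) max pooling
        avg_map = torch.mean(x, dim=1, keepdim=True)  # (B, 1, H, W) average pooling
        attn = torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return x * attn                               # element-wise multiplication

sa = SpatialAttention()
out = sa(torch.randn(1, 64, 32, 32))  # same shape, spatially rescaled
```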
可选地,进一步的通过第三卷积层243对第五中间特征图进行卷积操作,最终输出得到目标特征图,该目标特征图集合了空间特征以及通道特征,通过该目标特征图进行活体检测,具有较高的可靠性以及鲁棒性。Optionally, the fifth intermediate feature map is further convolved through the third convolutional layer 243, and a target feature map is finally outputted. The target feature map combines spatial features and channel features. Liveness detection is performed through the target feature map with high reliability and robustness.
In the embodiment of the present application, a lightweight CBAM module is adopted; compared with an attention module that focuses only on channels or only on space, it can obtain the target feature map more simply and effectively.
In the embodiment of the present application, some squeeze-and-excitation blocks (SE blocks) are removed from the above neural-network-like architecture comprising convolutional layers and excitation layers, which yields a lightweight neural network architecture and also allows this lightweight neural-network-like model to run on edge-computing-friendly devices.
与此同时,通过加入本申请实施方式中的专注模块,有利于活体检测专注于人脸特征上,从而提升活体检测结果的准确性。At the same time, by adding the focus module in the implementation mode of the present application, it is helpful for liveness detection to focus on facial features, thereby improving the accuracy of liveness detection results.
Based on the solution of the embodiment of the present application, through the neural-network-like model, the face position in the target to be detected can be output together with the liveness detection result of the face. At the same time, the neural-network-like model of the embodiment of the present application runs efficiently and can run on edge computing devices; and since the input image is a depth map of the target to be detected, the influence of ambient light on face detection can be avoided, so that face detection can still be performed effectively under low-light, no-light, or backlit conditions.
上文结合图4至图9,详细描述了本申请中的人脸检测的方法实施例,下文结合图10至图12,详细描述本申请的人脸检测的装置实施例,应理解,装置实施例与方法实施例相互对应,类似的描述可以参照方法实施例。The above, in combination with Figures 4 to 9, describes in detail an embodiment of the method for face detection in the present application. The following, in combination with Figures 10 to 12, describes in detail an embodiment of the device for face detection in the present application. It should be understood that the device embodiment and the method embodiment correspond to each other, and similar descriptions can refer to the method embodiment.
图10是根据本申请实施例的人脸检测装置20的示意性结构框图,该人脸检测装置20对应于上述人脸检测方法200。FIG. 10 is a schematic structural block diagram of a face detection device 20 according to an embodiment of the present application. The face detection device 20 corresponds to the above-mentioned face detection method 200 .
如图10所示,人脸检测装置20包括:As shown in FIG10 , the face detection device 20 includes:
获取单元210,用于获取待检测目标的深度图;An acquisition unit 210 is used to acquire a depth map of a target to be detected;
处理单元220,用于对上述深度图进行图像处理,得到深度图中的包括活体人脸的人脸区域框。The processing unit 220 is used to perform image processing on the depth map to obtain a face region frame including a living face in the depth map.
An output unit 230, used to output the face region frame including a live face in the depth map.
在本申请实施例中,进行人脸检测的图像为待检测目标的深度图,因而可以避免环境光对于人脸检测的影响,在低光照、无光照或者逆光照等情况下仍能够有效进行人脸检测。In an embodiment of the present application, the image for face detection is a depth map of the target to be detected, thereby avoiding the influence of ambient light on face detection, and effectively performing face detection in low light, no light or backlighting conditions.
具体地,如图11所示,上述处理单元220可以包括:Specifically, as shown in FIG11 , the processing unit 220 may include:
第一人脸特征提取模块21,用于对该深度图进行特征提取,得到第一特征图;A first face feature extraction module 21, used to extract features from the depth map to obtain a first feature map;
人脸检测模块22,用于对该第一特征图进行人脸检测,得到人脸区域特征图;A face detection module 22, configured to perform face detection on the first feature map to obtain a face region feature map;
A second face feature extraction module 23, used to perform feature extraction on the face region feature map to obtain a second feature map;
A focus module 24, used to perform liveness detection on the second feature map to obtain a liveness detection result of the face.
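For illustration only, the wiring of these four sub-modules can be sketched as below; every stage is a simplified stub standing in for the trained module of the same name, the single-channel depth-map input is an assumption, and the crop of the detected face region between stages is elided.

```python
import torch
import torch.nn as nn

class ProcessingUnitSketch(nn.Module):
    """Illustrative wiring of the four sub-modules of processing unit 220."""

    def __init__(self):
        super().__init__()
        self.first_extractor = nn.Conv2d(1, 64, 3, padding=1)    # module 21 stub
        self.detector = nn.Conv2d(64, 2, 1)                      # module 22 stub: scale + center maps
        self.second_extractor = nn.Conv2d(64, 64, 3, padding=1)  # module 23 stub
        self.focus_classifier = nn.Sequential(                   # module 24 stub
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2))

    def forward(self, depth_map: torch.Tensor):
        f1 = self.first_extractor(depth_map)  # first feature map
        det_maps = self.detector(f1)          # scale map + center heatmap
        # The face-region feature map would be cropped from f1 using det_maps;
        # here f1 itself stands in for that crop.
        f2 = self.second_extractor(f1)        # second feature map
        logits = self.focus_classifier(f2)    # live / non-live scores
        return det_maps, logits

unit = ProcessingUnitSketch()
det_maps, logits = unit(torch.randn(1, 1, 112, 112))
print(det_maps.shape, logits.shape)  # torch.Size([1, 2, 112, 112]) torch.Size([1, 2])
```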
可选地,该处理单元220可以包括上述方法实施例中的类神经网络20,以对待检测目标的深度图像进行处理。Optionally, the processing unit 220 may include the neural network 20 in the above method embodiment to process the depth image of the target to be detected.
在本申请实施例中,处理单元220中的类神经网络20为一个轻量的神经网络架构,运行效率高,且在进行人脸检测的同时还能够进行活体检测,能够一次性输出得到包括活体人脸的人脸区域框,从而提高人脸检测的准确性。In the embodiment of the present application, the neural network 20 in the processing unit 220 is a lightweight neural network architecture with high operating efficiency. It can perform liveness detection while performing face detection, and can output a face area frame including a live face at one time, thereby improving the accuracy of face detection.
具体地,该处理单元220中的第一人脸特征提取模块21、人脸检测模块22、第二人脸特征提取模块23、以及专注模块24,分别对应于上述类神经网络20中的第一人脸特征提取模块21、人脸检测模块22、第二人脸特征提取模块23、以及专注模块24。Specifically, the first face feature extraction module 21, the face detection module 22, the second face feature extraction module 23, and the focus module 24 in the processing unit 220 respectively correspond to the first face feature extraction module 21, the face detection module 22, the second face feature extraction module 23, and the focus module 24 in the above-mentioned neural network 20.
可以理解的是,在本申请实施例中,第一人脸特征提取模块21、人脸检测模块22、第二人脸特征提取模块23、以及专注模块24的相关技术方案可以参见上文中的相关描述,此处不再赘述。It can be understood that in the embodiment of the present application, the relevant technical solutions of the first face feature extraction module 21, the face detection module 22, the second face feature extraction module 23, and the focus module 24 can be found in the relevant description above and will not be repeated here.
在一些可能的实施方式中,第一人脸特征提取模块21对待检测目标的深度图进行特征提取后,得到的第一特征图中包括该深度图中的边缘线条特征。In some possible implementations, after the first facial feature extraction module 21 performs feature extraction on the depth map of the target to be detected, the first feature map obtained includes edge line features in the depth map.
In some possible implementations, the first facial feature extraction module 21 may include no more than 4 convolutional layers, which improves the operating speed of the module while ensuring the extraction performance.
参见上文中图6以及相关描述,在一些可能的实施方式中,该人脸检测模块22可以包括:卷积层网络221、人脸范围卷积层222和人脸中心卷积层223。Referring to FIG. 6 and related descriptions above, in some possible implementations, the face detection module 22 may include: a convolutional layer network 221, a face range convolutional layer 222, and a face center convolutional layer 223.
可选地,该卷积层网络221用于对第一特征图进行卷积计算,得到中间特征图;Optionally, the convolutional layer network 221 is used to perform convolution calculation on the first feature map to obtain an intermediate feature map;
该人脸范围卷积层222和该人脸中心卷积层223分别用于对该中间特征图进行卷积计算,得到人脸区域预测图和人脸中心预测图;The face range convolution layer 222 and the face center convolution layer 223 are used to perform convolution calculations on the intermediate feature map to obtain a face region prediction map and a face center prediction map respectively;
该人脸区域预测图和该人脸中心预测图用于将其检测结果映射至该第一特征图中,得到人脸区域特征图。The face area prediction map and the face center prediction map are used to map their detection results to the first feature map to obtain a face area feature map.
可选地,参见图6所示,上述人脸检测模块22还包括:人脸特征专注层224;Optionally, as shown in FIG6 , the above-mentioned face detection module 22 further includes: a face feature focus layer 224;
卷积层网络221和人脸特征专注层224用于对第一特征图进行卷积计算,得到中间特征图;The convolutional layer network 221 and the face feature focus layer 224 are used to perform convolution calculation on the first feature map to obtain an intermediate feature map;
人脸特征专注层224用于对该中间特征图的像素值进行权重分布,以突出该中间特征图中的人脸五官特征。The facial feature focus layer 224 is used to perform weight distribution on the pixel values of the intermediate feature map to highlight the facial features in the intermediate feature map.
作为示例,其中,人脸特征专注层224为基于空间的注意力模块。As an example, the facial feature focus layer 224 is a space-based attention module.
此外,参见上文图8以及相关描述,在一些可能的实施方式中,人脸检测模块22还可以包括中心调整卷积层225,该人脸检测模块22的参数是通过神经网络训练得到的。In addition, referring to FIG. 8 and related descriptions above, in some possible implementations, the face detection module 22 may further include a center adjustment convolution layer 225, and the parameters of the face detection module 22 are obtained through neural network training.
在神经网络训练阶段,在人脸检测模块22中,卷积层网络221还用于:对样本图像进行卷积计算,得到第一样本特征图,其中,该样本图像中标注有人脸区域真实值和人脸中心真实值;In the neural network training stage, in the face detection module 22, the convolution layer network 221 is also used to: perform convolution calculation on the sample image to obtain a first sample feature map, wherein the sample image is annotated with a true value of the face area and a true value of the face center;
人脸范围卷积层222、人脸中心卷积层223和中心调整卷积层225用于:分别对该第一样本特征图进行卷积计算,得到人脸区域预测值,人脸中心预测值以及人脸中心偏移预测值;The face range convolution layer 222, the face center convolution layer 223 and the center adjustment convolution layer 225 are used to: perform convolution calculations on the first sample feature map respectively to obtain a face area prediction value, a face center prediction value and a face center offset prediction value;
该人脸区域预测值,该人脸中心预测值、该人脸中心偏移预测值,以及该人脸区域真实值和该人脸中心真实值用于计算损失函数得到该人脸检测模块22的参数。The face area prediction value, the face center prediction value, the face center offset prediction value, and the face area true value and the face center true value are used to calculate the loss function to obtain the parameters of the face detection module 22.
作为示例,上述人脸范围卷积层222和人脸中心卷积层223为两个1×1卷积层,其中,该人脸中心预测图为人脸中心热图。As an example, the face range convolution layer 222 and the face center convolution layer 223 are two 1×1 convolution layers, wherein the face center prediction image is a face center heat map.
进一步地,通过第二人脸特征提取模块23对上述人脸区域特征图进行特征提取,得到的第二特征图中包括人脸的细节特征。Furthermore, the second face feature extraction module 23 performs feature extraction on the face region feature map, and the obtained second feature map includes detailed features of the face.
作为示例,该第二特征图中包括人脸的五官特征。As an example, the second feature map includes facial features of a person.
In some possible implementations, the second facial feature extraction module 23 includes no more than 4 convolutional layers, which improves the operating speed of the module while ensuring the extraction performance.
参见上文中图9以及相关描述,在一些可能的实施方式中,在处理单元220中,专注模块24为结合空间和通道的注意力机制模块。Referring to FIG. 9 and the related description above, in some possible implementations, in the processing unit 220, the focus module 24 is an attention mechanism module that combines space and channels.
作为示例,该专注模块24包括:多层卷积层(第一卷积层241、第二卷积层242以及第三卷积层243)、通道注意力模块244以及空间注意力模块245;As an example, the focus module 24 includes: multiple convolutional layers (a first convolutional layer 241, a second convolutional layer 242, and a third convolutional layer 243), a channel attention module 244, and a spatial attention module 245;
该多层卷积层中的第一卷积层241对第二特征图进行卷积计算,得到第一中间特征图;The first convolution layer 241 in the multi-layer convolution layer performs convolution calculation on the second feature map to obtain a first intermediate feature map;
采用通道注意力模块244对该第一中间特征图进行处理得到通道注意力特征图;The first intermediate feature map is processed by the channel attention module 244 to obtain a channel attention feature map;
采用第二卷积层242对通道注意力特征图和第一中间特征图进行卷积计算,得到第二中间特征图;The second convolution layer 242 is used to perform convolution calculation on the channel attention feature map and the first intermediate feature map to obtain a second intermediate feature map;
采用空间注意力模块245对该第二中间特征图进行处理得到空间注意力特征图;The second intermediate feature map is processed by the spatial attention module 245 to obtain a spatial attention feature map;
采用第三卷积层243对空间注意力特征图和第二中间特征图进行卷积计算,得到目标特征图;The third convolutional layer 243 is used to perform convolution calculation on the spatial attention feature map and the second intermediate feature map to obtain a target feature map;
基于该目标特征图得到人脸的活体检测结果,其中,该目标特征图中包括人脸的活体特征。A liveness detection result of the human face is obtained based on the target feature map, wherein the target feature map includes liveness features of the human face.
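Putting the steps of this list together, a compact, hedged sketch of the focus module follows; the channel widths, reduction ratio, kernel sizes, and the two-way classifier head are assumptions, with the attention blocks written inline in condensed form.

```python
import torch
import torch.nn as nn

class FocusModuleSketch(nn.Module):
    """Assembly of the listed steps: first conv layer 241, channel attention
    244, second conv layer 242, spatial attention 245, third conv layer 243,
    then a classifier head for the live / non-live decision."""

    def __init__(self, in_ch: int = 64, width: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, width, 3, padding=1)
        self.ca_mlp = nn.Sequential(nn.Linear(width, width // 8),
                                    nn.ReLU(inplace=True),
                                    nn.Linear(width // 8, width))
        self.conv2 = nn.Conv2d(width, width, 3, padding=1)
        self.sa_conv = nn.Conv2d(2, 1, 7, padding=3)
        self.conv3 = nn.Conv2d(width, width, 3, padding=1)
        self.classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(width, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv1(x)                                  # first intermediate maps
        b, n, _, _ = x.shape
        ca = torch.sigmoid(self.ca_mlp(torch.amax(x, dim=(2, 3)))
                           + self.ca_mlp(torch.mean(x, dim=(2, 3))))
        x = self.conv2(x * ca.view(b, n, 1, 1))            # second intermediate maps
        sa = torch.sigmoid(self.sa_conv(torch.cat(
            [torch.amax(x, 1, keepdim=True), torch.mean(x, 1, keepdim=True)], 1)))
        target = self.conv3(x * sa)                        # target feature map
        return self.classifier(target)                     # live / non-live logits

module = FocusModuleSketch()
logits = module(torch.randn(1, 64, 32, 32))
print(logits.shape)  # torch.Size([1, 2])
```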
由于在本申请实施例中,处理单元220中采用轻量的类神经网络进行人脸检测运算,在一些实施方式中,上述人脸检测装置20可以为边缘运算装置。Since in the embodiment of the present application, a lightweight neural network is used in the processing unit 220 to perform face detection operations, in some implementations, the face detection device 20 may be an edge computing device.
图12是本申请实施例的人脸检测装置的硬件结构示意图。图12所示的人脸检测装置30(该人脸检测装置30具体可以是一种计算机设备)包括存储器310、处理器320、通信接口330以及总线340。其中,存储器310、处理器320、通信接口330通过总线340实现彼此之间的通信连接。FIG12 is a schematic diagram of the hardware structure of the face detection device of the embodiment of the present application. The face detection device 30 shown in FIG12 (the face detection device 30 may be a computer device) includes a memory 310, a processor 320, a communication interface 330, and a bus 340. The memory 310, the processor 320, and the communication interface 330 are connected to each other through the bus 340.
存储器310可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器310可以存储程序,当存储器310中存储的程序被处理器320执行时,处理器320和通信接口330用于执行本申请实施例的人脸检测的方法的各个步骤。The memory 310 may be a read only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM). The memory 310 may store a program. When the program stored in the memory 310 is executed by the processor 320, the processor 320 and the communication interface 330 are used to execute the various steps of the face detection method of the embodiment of the present application.
处理器320可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的人脸检测装置中的模块所需执行的功能,或者执行本申请方法实施例的人脸检测的方法。The processor 320 can adopt a general central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU) or one or more integrated circuits to execute relevant programs to implement the functions required to be performed by the modules in the face detection device of the embodiment of the present application, or to execute the face detection method of the method embodiment of the present application.
The processor 320 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the face detection method of the present application may be completed by an integrated logic circuit of hardware in the processor 320 or by instructions in the form of software. The above processor 320 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 310; the processor 320 reads the information in the memory 310 and, in combination with its hardware, completes the functions required to be performed by the modules included in the face detection device of the embodiment of the present application, or executes the face detection method of the method embodiment of the present application.
通信接口330使用例如但不限于收发器一类的收发装置,来实现装置30与其他设备或通信网络之间的通信。例如,可以通过通信接口330获取输入数据。The communication interface 330 uses a transceiver device such as, but not limited to, a transceiver to implement communication between the device 30 and other devices or a communication network. For example, input data can be obtained through the communication interface 330.
总线340可包括在装置30各个部件(例如,存储器310、处理器320、通信接口330)之间传送信息的通路。Bus 340 may include a pathway for transmitting information between various components of device 30 (eg, memory 310 , processor 320 , communication interface 330 ).
It should be noted that although the device 30 shown in FIG. 12 only shows the memory 310, the processor 320, the communication interface 330, and the bus 340, in a specific implementation process those skilled in the art should understand that the device 30 also includes other components necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the device 30 may also include hardware components implementing other additional functions. In addition, those skilled in the art should understand that the device 30 may also include only the components necessary for implementing the embodiments of the present application, and need not include all the components shown in FIG. 12.
应理解,人脸检测装置30可以与上述图10中的人脸检测装置20相对应,人脸检测装置20中的处理单元220的功能可以由处理器320实现,获取单元210和输出单元230的功能可以由通信接口330实现。为避免重复,此处适当省略详细描述。It should be understood that the face detection device 30 may correspond to the face detection device 20 in FIG. 10 , the function of the processing unit 220 in the face detection device 20 may be implemented by the processor 320, and the functions of the acquisition unit 210 and the output unit 230 may be implemented by the communication interface 330. To avoid repetition, detailed description is appropriately omitted here.
本申请实施例还提供了一种处理装置,包括处理器和接口;所述处理器,用于执行上述任一方法实施例中的人脸检测的方法。An embodiment of the present application also provides a processing device, including a processor and an interface; the processor is used to execute the face detection method in any of the above method embodiments.
It should be understood that the above processing device may be a chip. For example, the processing device may be a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or another integrated chip.
本申请实施例还提供一种平台系统,其包括前述的人脸检测装置。An embodiment of the present application also provides a platform system, which includes the aforementioned face detection device.
本申请实施例还提供了一种计算机可读介质,其上存储有计算机程序,该计算机程序被计算机执行时实现上述任一方法实施例的方法。An embodiment of the present application further provides a computer-readable medium having a computer program stored thereon, and when the computer program is executed by a computer, the method of any of the above method embodiments is implemented.
本申请实施例还提供了一种计算机程序产品,该计算机程序产品被计算机执行时实现上述任一方法实施例的方法。The embodiment of the present application also provides a computer program product, which implements the method of any of the above method embodiments when executed by a computer.
本申请实施例还提供了一种电子设备,该电子设备可以包括上述申请实施例的人脸识别装置。An embodiment of the present application also provides an electronic device, which may include the face recognition device of the above-mentioned embodiment of the application.
例如,电子设备为智能门锁、手机、电脑、门禁系统等等需要应用人脸识别的设备。所述人脸识别装置包括电子设备中用于人脸识别的软件以及硬件装置。For example, the electronic device is a device that needs to apply face recognition, such as a smart door lock, a mobile phone, a computer, an access control system, etc. The face recognition device includes software and hardware devices for face recognition in the electronic device.
可选地,该电子设备中还可以包括深度图采集装置。Optionally, the electronic device may also include a depth map acquisition device.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
在本说明书中使用的术语“单元”、“模块”、“系统”等用于表示计算机相关的实体、硬件、固件、硬件和软件的组合、软件、或执行中的软件。例如,部件可以是但不限于,在处理器上运行的进程、处理器、对象、可执行文件、执行线程、程序和/或计算机。通过图示,在计算设备上运行的应用和计算设备都可以是部件。一个或多个部件可驻留在进程和/或执行线程中,部件可位于一个计算机上和/或分布在2个或更多个计算机之间。此外,这些部件可从在上面存储有各种数据结构的各种计算机可读介质执行。部件可例如根据具有一个或多个数据分组(例如来自与本地系统、分布式系统和/或网络间的另一部件交互的二个部件的数据,例如通过信号与其它系统交互的互联网)的信号通过本地和/或远程进程来通信。The terms "unit", "module", "system", etc. used in this specification are used to represent computer-related entities, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component can be, but is not limited to, a process running on a processor, a processor, an object, an executable file, an execution thread, a program and/or a computer. By way of illustration, both applications running on a computing device and a computing device can be components. One or more components may reside in a process and/or an execution thread, and a component may be located on a computer and/or distributed between two or more computers. In addition, these components may be executed from various computer-readable media having various data structures stored thereon. Components may, for example, communicate through local and/or remote processes according to signals having one or more data packets (e.g., data from two components interacting with another component between a local system, a distributed system and/or a network, such as the Internet interacting with other systems through signals).
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices and units described above can refer to the corresponding processes in the aforementioned method embodiments and will not be repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of the units is only a division by logical function, and there may be other division manners in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者所述技术方案的部分可以以软件产品的形式体现出来,所述计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application can be essentially or partly embodied in the form of a software product that contributes to the prior art, and the computer software product is stored in a storage medium, including several instructions for a computer device (which can be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (ROM), random access memory (RAM), disk or optical disk, etc., and other media that can store program codes.
The above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present application, which should all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.