CN112232204A - Living body detection method based on infrared image - Google Patents

Living body detection method based on infrared image

Info

Publication number
CN112232204A
Authority
CN
China
Prior art keywords
face
detector
living body
preset
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011106811.1A
Other languages
Chinese (zh)
Other versions
CN112232204B (en)
Inventor
严安
周治尹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dianze Intelligent Technology Co ltd
Zhongke Zhiyun Technology Co ltd
Original Assignee
Shanghai Dianze Intelligent Technology Co ltd
Zhongke Zhiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dianze Intelligent Technology Co ltd, Zhongke Zhiyun Technology Co ltd
Priority to CN202011106811.1A
Publication of CN112232204A
Application granted
Publication of CN112232204B
Status: Active
Anticipated expiration

Abstract

Translated from Chinese


The invention belongs to the technical field of face recognition, and in particular relates to a real-time multifunctional face detection method. The living body detection method based on the infrared image includes: collecting an infrared picture and performing preprocessing operations; putting the picture into a detector for prediction to obtain a face frame prediction value, face key points, and a mask recognition result; performing a decoding operation on the face frame prediction value and the face key points; eliminating overlapping detection frames with a non-maximum suppression algorithm using a threshold of 0.4 to obtain the final face detection frame, face key points, and mask recognition result; extracting the two eye coordinates x and y from the face key points, and extending x and y by preset pixels in four directions to obtain an eye image; and using a living body recognition neural network to judge whether the eye image is of a living body, obtaining a judgment result. The invention can achieve a real-time detection effect when the mobile terminal has only a CPU, and accurately detects the eye position.


Description

Living body detection method based on infrared image
Technical Field
The invention belongs to the technical field of face recognition, and particularly relates to a real-time multifunctional face detection method.
Background
The face recognition system takes face recognition technology as its core. It is an emerging biometric technology and a current focus of the international scientific and technological field. It is widely applied to regional feature analysis: it integrates computer image processing with biostatistical principles, using computer image processing to extract portrait feature points from video and biostatistics to analyze and establish a mathematical model, and it has broad development prospects. Face detection is a key link in automatic face recognition systems. However, the human face has quite complicated variations in detail: different appearances such as face shape and skin color, different expressions such as open or closed eyes and mouths, mask occlusion, and so on. The variation of these intrinsic and extrinsic factors makes face detection a complex and challenging pattern detection problem in face recognition systems.
Although face detection algorithms based on convolutional neural networks have been studied extensively, existing algorithms on mobile devices cannot achieve real-time detection, especially under the condition of having only a CPU.
In addition, existing face detection performs only a single function and cannot accurately detect the eye position; dynamic living body detection involves many steps, is easily influenced by external environments such as natural illumination, and lacks robustness.
Disclosure of Invention
The invention aims to solve the technical problems that existing face detection cannot accurately detect the eye position and that dynamic living body detection involves many steps, and provides a living body detection method based on an infrared image.
The living body detection method based on the infrared image comprises the following steps:
acquiring an infrared picture and preprocessing the picture;
the picture is put into a preset detector for prediction, features obtained through four different convolution layers in a backbone network of the detector are combined with anchor points with multiple sizes, face detection, face key point detection and mask recognition are carried out, and a face frame prediction value, a face key point and a mask recognition result are obtained;
decoding the predicted value of the face frame, converting the predicted value into the real position of a boundary frame, and decoding the key points of the face to convert the key points into the real position of the key points;
eliminating overlapped detection frames by adopting a non-maximum suppression algorithm with a threshold value of 0.4 to obtain a final face detection frame, face key points and mask recognition results, wherein the final face detection frame, face key points and mask recognition results comprise information of a left upper corner coordinate, a right lower corner coordinate, two eye coordinates, a nose coordinate, a pair of mouth corner coordinates and confidence coefficient of wearing a mask;
extracting coordinates x and y of two eyes according to the key points of the face, and extending the x and y to preset pixels in four directions respectively to obtain an eye image;
and judging whether the eye image is a living body by adopting a preset living body recognition neural network to obtain a judgment result.
Optionally, before the picture is placed in a preset detector for prediction, the method further includes:
loading preset pre-training network parameters to the detector, and generating a default anchor point according to the size and length-width ratio of the preset anchor point;
training the detector through a preset data set to obtain a trained detector;
the detector includes a backbone network, a prediction layer, and a multi-tasking loss layer.
Optionally, the training the detector through a preset data set to obtain a trained detector includes:
acquiring unoccluded data and occluded data serving as a data set, converting a BGR picture in the data set into a YUV format, only storing data of a Y channel, and then performing data enhancement to obtain an enhanced data set;
performing network training by using a stochastic optimization algorithm with momentum of 0.9 and weight decay factor of 0.0005, wherein the stochastic optimization algorithm reduces the imbalance between positive and negative samples by means of hard example mining, the initial learning rate is set to $10^{-3}$ for the first 100 rounds of training and is decreased by a factor of 10 after a further 50 and 100 rounds, and during training each predictor is first matched to the anchor point with the best Jaccard overlap, after which anchor points are matched to any face whose Jaccard overlap exceeds a threshold of 0.35.
Optionally, the non-occlusion data are face pictures taken without a mask, the occlusion data are face pictures taken with a mask worn, and the amount of occlusion data exceeds the amount of non-occlusion data.
Optionally, the performing data enhancement includes:
adding data to prevent model overfitting by applying at least one or a combination of color distortion, brightness-contrast adjustment, random cropping, horizontal flipping, and channel transformation to pictures in the data set.
Optionally, the putting of the picture into a preset detector for prediction, combining features obtained from four different convolution layers in the detector's backbone network with anchor points of multiple sizes, and performing face detection, face key point detection, and mask recognition to obtain a face frame prediction value, face key points, and a mask recognition result, includes:
the pictures are put into the trained detector for prediction, and the features of the 8th, 11th, 13th, and 15th convolutional layers in the backbone network are respectively input into each prediction layer for face frame, face key-point positioning, and mask recognition operations during prediction;
for each anchor point, representing by using 4 offsets from its coordinates and N scores for classification, where N = 2; for each anchor point during detector training, minimizing the multitask loss function:

$$L = L_{obj}(p_i, p_i^*) + \lambda_1 p_i^* L_{box}(t_i, t_i^*) + \lambda_2 p_i^* L_{landmark}(l_i, l_i^*)$$

wherein $L_{obj}$ is a cross-entropy loss function detecting whether an anchor point contains a target, $p_i$ is the probability of an anchor having a target, with $p_i^* = 1$ if the anchor contains a target and $p_i^* = 0$ otherwise; $L_{box}$ adopts the smooth-L1 loss function for locating the face anchor, $t_i = \{t_x, t_y, t_w, t_h\}_i$ is the coordinate offset of the prediction box, and $t_i^*$ is the coordinate offset of the positive-sample anchor; $L_{landmark}$ adopts the smooth-L1 loss function for positioning face key points, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}, \ldots, l_{x5}, l_{y5}\}_i$ is the predicted key-point offset, and $l_i^*$ is the coordinate offset of the positive-sample key points; if the sample wears a mask, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}\}_i$ and $l_i^* = \{l_{x1}^*, l_{y1}^*, l_{x2}^*, l_{y2}^*\}_i$, wherein $l_{x1}, l_{y1}$ and $l_{x1}^*, l_{y1}^*$ respectively represent the left-eye predicted key-point coordinate offset and positive-sample key-point offset, and $l_{x2}, l_{y2}$ and $l_{x2}^*, l_{y2}^*$ represent the corresponding offsets for the right eye; $\lambda_1$ and $\lambda_2$ are respectively the weight coefficients of the face-frame and key-point loss functions.
Optionally, anchor points of 10 to 256 pixels are used to match the minimum size of the corresponding effective receptive field, with the anchor sizes at the four detection feature layers set to (10, 16, 24), (32, 48), (64, 96), and (128, 192, 256), respectively.
Optionally, the decoding operation converting the face frame prediction value into the real position of a bounding box, and the decoding operation converting the face key points into the real key-point positions, include:

performing a decoding operation on the face frame prediction value $l = (l_{cx}, l_{cy}, l_w, l_h)$ obtained by the detector, converting it into the real bounding-box position $b = (b_{cx}, b_{cy}, b_w, b_h)$:

$$b_{cx} = l_{cx} d_w + d_{cx}, \qquad b_{cy} = l_{cy} d_h + d_{cy},$$
$$b_w = d_w \exp(l_w), \qquad b_h = d_h \exp(l_h);$$

and converting the face key-point prediction values $(l_{x1}, l_{y1}), \ldots, (l_{x5}, l_{y5})$ obtained by the detector into the true key-point positions:

$$b_{xk} = l_{xk} d_w + d_{cx}, \qquad b_{yk} = l_{yk} d_h + d_{cy}, \qquad k = 1, \ldots, 5,$$

wherein $d = (d_{cx}, d_{cy}, d_w, d_h)$ represents a generated default anchor point.
Optionally, the extracting coordinates x and y of the two eyes according to the key points of the face, and extending the x and y to preset pixels in four directions respectively to obtain the eye image includes:
and extracting coordinates x and y of two eyes according to the key points of the human face, and extending the x and the y to four directions by 32 pixels respectively to obtain a 64 x 64 eye image.
Optionally, the determining, by using a preset living body recognition neural network, whether the eye image is a living body, to obtain a determination result, includes:
the living body recognition neural network adopts a mobilenet lightweight neural network to extract living body characteristics, and the living body recognition neural network uses a cross entropy loss function as a loss function.
The beneficial effects of the invention are as follows. The living body detection method based on the infrared image has these remarkable advantages:
1. a real-time detection effect is achieved even when the mobile terminal has only a CPU;
2. living body detection accuracy is improved by finely detecting the bright-pupil effect;
3. the eye position is detected accurately;
4. robustness is strong, and the influence of the external environment is small.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a network architecture of the detector of the present invention;
FIG. 3 is a diagram of the attack image results of the present invention;
FIG. 4 is a diagram of a human image result of the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to the drawings.
Referring to fig. 1, a living body detecting method based on an infrared image includes:
and S1, inputting the picture, collecting the infrared picture through the infrared camera, and carrying out preprocessing operation on the picture.
In this step, the infrared picture can be acquired directly from the infrared camera, or it can be input through an input interface. The preprocessing operation on the picture comprises image resizing and standardization.
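As an illustration only, a minimal preprocessing sketch in Python follows; the patent does not specify the network input resolution or the standardization constants, so the target size and the zero-mean/unit-variance scheme used here are assumptions.

```python
import cv2
import numpy as np

def preprocess(ir_frame: np.ndarray, size=(320, 240)) -> np.ndarray:
    """Resize and standardize an infrared frame.

    `size` and the zero-mean/unit-variance standardization are assumed values;
    the patent only states that preprocessing comprises resizing and standardization.
    """
    img = cv2.resize(ir_frame, size).astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-6)
```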
S2, predicting by the detector: the picture is put into a preset detector for prediction, features obtained through four different convolution layers in a backbone network of the detector are combined with anchor points of multiple sizes, face detection, face key point detection and mask recognition are carried out, and a face frame prediction value, a face key point and a mask recognition result are obtained.
Before the step of placing the picture into a preset detector for prediction, the method further comprises the following steps:
loading preset pre-training network parameters into the detector, and generating default anchor points according to the preset anchor sizes and aspect ratios, where each default anchor point is $d = (d_{cx}, d_{cy}, d_w, d_h)$.
Referring to fig. 2, the detector includes a backbone network, prediction layers, and a multi-task loss layer: the backbone network comprises 15 convolutional layers, followed by 4 prediction layers and 1 multi-task loss layer. The 15 convolutional layers comprise one convolution module 1, thirteen convolution modules 2, and one convolution module 3. Convolution module 1 consists of convolution, normalization, and activation layers. Convolution module 2 consists of two groups, each composed of convolution, normalization, and activation layers. Convolution module 3 consists of two groups, the first composed of convolution, normalization, and activation layers and the second containing only a convolution. In this step, the features of the 8th, 11th, 13th, and 15th convolutional layers in the backbone network are respectively input into each prediction layer to carry out face frame, face key-point positioning, and mask recognition operations, and each prediction layer feeds the multi-task loss layer to fit the multiple detection results.
The detector is trained through a preset data set to obtain the trained detector. The detector algorithm is preferably implemented using the PyTorch open-source deep learning library. Training includes the following processes:
s201, data acquisition: the acquisition includes unoccluded data and occlusion data as datasets.
The non-occlusion data are face pictures taken without a mask, and the occlusion data are face pictures taken with a mask worn; there are more occlusion data than non-occlusion data, and mask-wearing samples preferably make up the majority of the data set. During data acquisition, manually processed WiderFace unoccluded data and MAFA occluded data can be adopted.
S202, data processing and enhancement: the BGR pictures in the data set are converted into YUV format and only the Y-channel data is stored, after which data enhancement is performed to obtain an enhanced data set.
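A minimal sketch of this conversion step, assuming OpenCV's BGR-to-YUV conversion; keeping index 0 retains the Y (luma) channel as described above.

```python
import cv2
import numpy as np

def to_y_channel(bgr: np.ndarray) -> np.ndarray:
    """Convert a BGR picture to YUV and keep only the Y-channel data."""
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)[:, :, 0]
```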
The data enhancement includes adding data to prevent model overfitting by applying at least one or a combination of color distortion, brightness-contrast adjustment, random cropping, horizontal flipping, and channel transformation to pictures in the data set.
Training the data in a single channel reduces the model's parameter count and improves detection speed. Training directly on single-channel Y-format pictures also spares the mobile terminal from picture format conversion, saving time so that the model can reach a super-real-time detection effect when the mobile terminal has only a CPU.
The strategy adopted for brightness-contrast enhancement is to reduce the brightness inside the target frame and increase the brightness outside it. These data enhancements can be combined in various ways, making the model more robust to illumination conditions.
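The brightness-contrast strategy above can be sketched as follows; the offset `delta` and the clipping range are hypothetical values, since the patent does not quantify the adjustment.

```python
import numpy as np

def brightness_contrast_aug(y: np.ndarray, box, delta: int = 30) -> np.ndarray:
    """Darken the region inside the target frame and brighten the region outside it."""
    x0, y0, x1, y1 = box
    out = y.astype(np.int16) + delta   # brighten the whole picture first
    out[y0:y1, x0:x1] -= 2 * delta     # net effect inside the target frame: -delta
    return np.clip(out, 0, 255).astype(np.uint8)
```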
S203, training: network training is performed using a stochastic optimization algorithm with momentum of 0.9 and weight decay factor of 0.0005; the algorithm reduces the imbalance between positive and negative samples by means of hard example mining. The initial learning rate is set to $10^{-3}$ for the first 100 rounds of training and is decreased by a factor of 10 after a further 50 and 100 rounds. During training, each predictor is first matched to the anchor point with the best Jaccard overlap, after which anchor points are matched to any face whose Jaccard overlap exceeds a threshold of 0.35.
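In PyTorch, which the description names as the preferred implementation library, these settings might be configured as below; this is a sketch under assumptions, and the milestone epochs (150, 200) follow one reading of "a further 50 and 100 rounds after the first 100".

```python
import torch

def make_optimizer(detector: torch.nn.Module):
    """SGD with momentum 0.9 and weight decay 5e-4, per the text; the
    milestone epochs and the factor-of-10 drops are an assumed schedule."""
    optimizer = torch.optim.SGD(detector.parameters(), lr=1e-3,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                     milestones=[150, 200],
                                                     gamma=0.1)
    return optimizer, scheduler
```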
Once trained in this way, the detector can make predictions on pictures.
During prediction, the features of the 8th, 11th, 13th, and 15th convolutional layers in the backbone network are respectively input into each prediction layer to carry out face frame, face key-point positioning, and mask recognition operations.
For each anchor point, the representation uses 4 offsets from its coordinates and N scores for classification, where N = 2. For each anchor point during detector training, the following multitask loss function is minimized:

$$L = L_{obj}(p_i, p_i^*) + \lambda_1 p_i^* L_{box}(t_i, t_i^*) + \lambda_2 p_i^* L_{landmark}(l_i, l_i^*)$$

where $L_{obj}$ is a cross-entropy loss function detecting whether an anchor point contains a target, $p_i$ is the predicted probability that an anchor contains a target, and $p_i^* = 1$ if the anchor contains a target, otherwise $p_i^* = 0$; $L_{box}$ adopts the smooth-L1 loss function for locating the face anchor, with $t_i = \{t_x, t_y, t_w, t_h\}_i$ the coordinate offset of the prediction box and $t_i^*$ the coordinate offset of the positive-sample anchor; $L_{landmark}$ adopts the smooth-L1 loss function for positioning the face key points, with $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}, \ldots, l_{x5}, l_{y5}\}_i$ the predicted key-point offsets and $l_i^*$ the coordinate offsets of the positive-sample key points. If the sample wears a mask, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}\}_i$ and $l_i^* = \{l_{x1}^*, l_{y1}^*, l_{x2}^*, l_{y2}^*\}_i$, where $l_{x1}, l_{y1}$ and $l_{x1}^*, l_{y1}^*$ denote the left-eye predicted key-point coordinate offset and positive-sample key-point offset respectively, and $l_{x2}, l_{y2}$ and $l_{x2}^*, l_{y2}^*$ denote the corresponding offsets for the right eye; $\lambda_1$ and $\lambda_2$ are the weight coefficients of the face-frame and key-point loss functions, respectively.
Anchor points of 10 to 256 pixels are employed to match the minimum size of the corresponding effective receptive field; the anchor sizes at the four detection feature layers are set to (10, 16, 24), (32, 48), (64, 96), and (128, 192, 256), respectively.
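Default anchor generation over the four prediction layers could look like the sketch below; the feature-map strides are assumptions, since the patent gives the anchor sizes but not the strides of the 8th, 11th, 13th, and 15th convolutional layers.

```python
from itertools import product

ANCHOR_SIZES = [(10, 16, 24), (32, 48), (64, 96), (128, 192, 256)]
STRIDES = [8, 16, 32, 64]  # assumed strides of the four prediction layers

def default_anchors(img_w: int, img_h: int):
    """Generate d = (d_cx, d_cy, d_w, d_h) tuples, normalized to [0, 1]."""
    anchors = []
    for sizes, stride in zip(ANCHOR_SIZES, STRIDES):
        fh, fw = img_h // stride, img_w // stride
        for i, j in product(range(fh), range(fw)):
            cx, cy = (j + 0.5) * stride / img_w, (i + 0.5) * stride / img_h
            anchors += [(cx, cy, s / img_w, s / img_h) for s in sizes]
    return anchors
```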
With this design, end-to-end mask recognition is achieved: no additional classifier needs to be added to separately identify whether a mask is worn, and operations such as picture rotation and matting are avoided when the mobile terminal has only a CPU, saving time. In addition, the invention optimizes key-point detection for mask-wearing faces: when a mask is worn, only the visible eye feature loss is optimized during training.
S3, decoding according to the generated anchor points: the face frame prediction value is decoded and converted into the real position of the bounding box, and the face key points are decoded and converted into the real key-point positions.
The specific decoding process is as follows:
The face frame prediction value $l = (l_{cx}, l_{cy}, l_w, l_h)$ obtained by the detector is decoded and converted into the real bounding-box position $b = (b_{cx}, b_{cy}, b_w, b_h)$:

$$b_{cx} = l_{cx} d_w + d_{cx}, \qquad b_{cy} = l_{cy} d_h + d_{cy},$$
$$b_w = d_w \exp(l_w), \qquad b_h = d_h \exp(l_h);$$

and the face key-point prediction values $(l_{x1}, l_{y1}), \ldots, (l_{x5}, l_{y5})$ obtained by the detector are converted into the true key-point positions:

$$b_{xk} = l_{xk} d_w + d_{cx}, \qquad b_{yk} = l_{yk} d_h + d_{cy}, \qquad k = 1, \ldots, 5,$$

where $d = (d_{cx}, d_{cy}, d_w, d_h)$ denotes the default anchor point generated in step S2.
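The decode step translates directly into vectorized code; below is a sketch assuming tensors of shape (N, 4) for box offsets and (N, 10) for the five key points.

```python
import torch

def decode(loc: torch.Tensor, lmk: torch.Tensor, d: torch.Tensor):
    """Apply the decoding formulas above.

    loc: (N, 4) predicted (l_cx, l_cy, l_w, l_h); lmk: (N, 10) key-point offsets;
    d:   (N, 4) default anchors (d_cx, d_cy, d_w, d_h).
    """
    b_cxcy = loc[:, :2] * d[:, 2:] + d[:, :2]   # b_cx = l_cx*d_w + d_cx, ...
    b_wh = d[:, 2:] * torch.exp(loc[:, 2:])     # b_w = d_w*exp(l_w), ...
    boxes = torch.cat([b_cxcy, b_wh], dim=1)
    # per-key-point decode: b_xk = l_xk*d_w + d_cx, b_yk = l_yk*d_h + d_cy
    points = lmk.view(-1, 5, 2) * d[:, None, 2:] + d[:, None, :2]
    return boxes, points.view(-1, 10)
```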
S4, non-maximum suppression: overlapping detection frames are eliminated using a non-maximum suppression algorithm with a threshold of 0.4 to obtain the final face detection frame, face key points, and mask recognition result, which include the upper-left corner coordinate, the lower-right corner coordinate, two eye coordinates, a nose coordinate, a pair of mouth-corner coordinates, and the confidence of wearing a mask.
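With boxes converted to corner form, the overlap elimination can rely on torchvision's NMS routine; a sketch assuming `boxes` in (x1, y1, x2, y2) format with one confidence score each:

```python
import torch
from torchvision.ops import nms

def suppress(boxes: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """Return indices of detections that survive NMS at the patent's 0.4 threshold."""
    return nms(boxes, scores, iou_threshold=0.4)
```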
The picture shown in fig. 3 is preprocessed to resize and standardize the image. The standardized picture is converted into YUV format, only the Y-channel data is kept, the data is enhanced, and the result is input into the trained detector for prediction. The network model at prediction time is shown in fig. 2; in the multitask loss function, the anchor point contains the target, i.e. $p_i^* = 1$. Finally, a face detection frame is detected and marked with a red frame, and each face detection frame includes, and is marked with, two eye coordinates, a nose coordinate, and a pair of mouth-corner coordinates. The obtained detection results are the face detection frame, face key points, and mask recognition result; used in a face recognition scenario, they can serve as accurate data for subsequent recognition processes. In particular, the invention extracts the two eye coordinates from the face key points of the detection result as accurate data which, after data processing, provide an important basis for judging whether the subject is a living body.
The picture shown in fig. 4 is likewise preprocessed to resize and standardize the image. The standardized picture is converted into YUV format, only the Y-channel data is kept, the data is enhanced, and the result is input into the trained detector for prediction. The network model at prediction time is shown in fig. 2; in the multitask loss function, the anchor point contains the target, i.e. $p_i^* = 1$. Finally, a face detection frame is detected and marked with a red frame, and each face detection frame includes, and is marked with, two eye coordinates, a nose coordinate, and a pair of mouth-corner coordinates.
And S5, intercepting the eye image: extracting coordinates x and y of two eyes according to the key points of the face, and extending the x and y to preset pixels in four directions respectively to obtain an eye image.
Specifically, two eye coordinates x and y are extracted according to the key points of the face, and the x and the y are extended for 32 pixels in four directions respectively to obtain 64 × 64 eye images.
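A sketch of this cropping step follows; the border clamping is an added safeguard, not described in the patent, for key points lying within 32 pixels of the image edge.

```python
import numpy as np

def crop_eye(y_img: np.ndarray, x: int, y: int, r: int = 32) -> np.ndarray:
    """Extend the eye key point r pixels in four directions -> (2r)x(2r) patch."""
    h, w = y_img.shape[:2]
    patch = np.zeros((2 * r, 2 * r), dtype=y_img.dtype)
    x0, x1 = max(x - r, 0), min(x + r, w)   # clamp the crop window to the image
    y0, y1 = max(y - r, 0), min(y + r, h)
    patch[y0 - (y - r):y1 - (y - r), x0 - (x - r):x1 - (x - r)] = y_img[y0:y1, x0:x1]
    return patch
```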
S6, living body recognition neural network: a preset living body recognition neural network is used to judge whether the eye image is of a living body, obtaining a judgment result.
Specifically, the living body recognition neural network extracts living body features using a mobilenet lightweight neural network and, with a cross-entropy loss function as its loss function, judges whether an eye image is of a living body.
The living body recognition neural network in this step is a trained living body recognition neural network. During training, the data set uses collected samples: the positive samples are real-person pictures shot under the infrared camera, and the attack samples are one or more combinations of a mobile phone screen face, an ipad face, a printed color face, or a gray face shot under infrared imaging.
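A sketch of such a classifier using torchvision's MobileNetV2 is given below; the patent says only "mobilenet", so the version, the single-channel input stem, and the two-class head are assumptions.

```python
import torch.nn as nn
from torchvision.models import mobilenet_v2

def build_liveness_net() -> nn.Module:
    net = mobilenet_v2()  # lightweight feature extractor (assumed MobileNetV2)
    # Single-channel infrared eye patch instead of RGB input (assumed adaptation).
    net.features[0][0] = nn.Conv2d(1, 32, kernel_size=3, stride=2,
                                   padding=1, bias=False)
    # Two classes: living body vs. attack.
    net.classifier[1] = nn.Linear(net.last_channel, 2)
    return net

criterion = nn.CrossEntropyLoss()  # cross-entropy loss, as stated in the text
```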
The picture shown in fig. 3 is an attack sample. After the processing of S4, the two eye coordinates x and y are extracted according to the face key points, and x and y are extended by 32 pixels in four directions to obtain 64×64 eye images; after the eye images are judged by the living body recognition neural network of this step, the judgment result is "fake", i.e., not a living body.
The picture shown in fig. 4 is a real-person picture. After the processing of S4, the two eye coordinates x and y are extracted according to the face key points, and x and y are extended by 32 pixels in four directions to obtain 64×64 eye images; after the eye images are judged by the living body recognition neural network of this step, the judgment result is "real", i.e., a living body.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and description merely illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A living body detection method based on infrared images is characterized by comprising the following steps:
acquiring an infrared picture and preprocessing the picture;
the picture is put into a preset detector for prediction, features obtained through four different convolution layers in a backbone network of the detector are combined with anchor points with multiple sizes, face detection, face key point detection and mask recognition are carried out, and a face frame prediction value, a face key point and a mask recognition result are obtained;
decoding the predicted value of the face frame, converting the predicted value into the real position of a boundary frame, and decoding the key points of the face to convert the key points into the real position of the key points;
eliminating overlapped detection frames by adopting a non-maximum suppression algorithm with a threshold value of 0.4 to obtain a final face detection frame, face key points and mask recognition results, wherein the final face detection frame, face key points and mask recognition results comprise information of a left upper corner coordinate, a right lower corner coordinate, two eye coordinates, a nose coordinate, a pair of mouth corner coordinates and confidence coefficient of wearing a mask;
extracting coordinates x and y of two eyes according to the key points of the face, and extending the x and y to preset pixels in four directions respectively to obtain an eye image;
and judging whether the eye image is a living body by adopting a preset living body recognition neural network to obtain a judgment result.
2. The infrared image-based in-vivo detection method as set forth in claim 1, wherein before the picture is placed in a preset detector for prediction, the method further comprises:
loading preset pre-training network parameters to the detector, and generating a default anchor point according to the size and length-width ratio of the preset anchor point;
training the detector through a preset data set to obtain a trained detector;
the detector includes a backbone network, a prediction layer, and a multi-tasking loss layer.
3. The infrared image-based in-vivo detection method as set forth in claim 2, wherein the training of the detector through a preset data set to obtain a trained detector comprises:
acquiring unoccluded data and occluded data serving as a data set, converting a BGR picture in the data set into a YUV format, only storing data of a Y channel, and then performing data enhancement to obtain an enhanced data set;
performing network training by using a stochastic optimization algorithm with momentum of 0.9 and weight decay factor of 0.0005, wherein the stochastic optimization algorithm reduces the imbalance between positive and negative samples by means of hard example mining, the initial learning rate is set to $10^{-3}$ for the first 100 rounds of training and is decreased by a factor of 10 after a further 50 and 100 rounds, and during training each predictor is first matched to the anchor point with the best Jaccard overlap, after which anchor points are matched to any face whose Jaccard overlap exceeds a threshold of 0.35.
4. The infrared image-based living body detection method according to claim 3, wherein the non-occlusion data are face pictures taken without a mask, the occlusion data are face pictures taken with a mask worn, and the amount of occlusion data exceeds the amount of non-occlusion data.
5. The infrared image-based liveness detection method of claim 3 wherein said performing data enhancement comprises:
adding data to prevent model overfitting by applying at least one or a combination of color distortion, brightness-contrast adjustment, random cropping, horizontal flipping, and channel transformation to pictures in the data set.
6. The method for detecting living bodies based on infrared images according to claim 2, wherein the steps of putting the pictures into a preset detector for prediction, combining features obtained by four different convolution layers in a backbone network of the detector with anchor points with a plurality of sizes, and performing face detection, face key point detection and mask recognition to obtain a face frame prediction value, a face key point and a mask recognition result comprise:
the pictures are put into the trained detector for prediction, and the features of the 8th, 11th, 13th, and 15th convolutional layers in the backbone network are respectively input into each prediction layer for face frame, face key-point positioning, and mask recognition operations during prediction;
for each anchor point, representing by using 4 offsets from its coordinates and N scores for classification, where N = 2; for each anchor point during detector training, minimizing the multitask loss function:

$$L = L_{obj}(p_i, p_i^*) + \lambda_1 p_i^* L_{box}(t_i, t_i^*) + \lambda_2 p_i^* L_{landmark}(l_i, l_i^*)$$

wherein $L_{obj}$ is a cross-entropy loss function detecting whether an anchor point contains a target, $p_i$ is the probability of an anchor having a target, with $p_i^* = 1$ if the anchor contains a target and $p_i^* = 0$ otherwise; $L_{box}$ adopts the smooth-L1 loss function for locating the face anchor, $t_i = \{t_x, t_y, t_w, t_h\}_i$ is the coordinate offset of the prediction box, and $t_i^*$ is the coordinate offset of the positive-sample anchor; $L_{landmark}$ adopts the smooth-L1 loss function for positioning face key points, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}, \ldots, l_{x5}, l_{y5}\}_i$ is the predicted key-point offset, and $l_i^*$ is the coordinate offset of the positive-sample key points; if the sample wears a mask, $l_i = \{l_{x1}, l_{y1}, l_{x2}, l_{y2}\}_i$ and $l_i^* = \{l_{x1}^*, l_{y1}^*, l_{x2}^*, l_{y2}^*\}_i$, wherein $l_{x1}, l_{y1}$ and $l_{x1}^*, l_{y1}^*$ respectively represent the left-eye predicted key-point coordinate offset and positive-sample key-point offset, and $l_{x2}, l_{y2}$ and $l_{x2}^*, l_{y2}^*$ represent the corresponding offsets for the right eye; and $\lambda_1$ and $\lambda_2$ are respectively the weight coefficients of the face-frame and key-point loss functions.
7. The infrared image-based living body detecting method as set forth in claim 6, wherein anchor points of 10 to 256 pixels are employed to match the minimum size of the corresponding effective receptive field, and the size of each anchor point for detecting the feature is set to (10, 16, 24), (32, 48), (64, 96) and (128, 192, 256), respectively.
8. The infrared image-based living body detection method as claimed in claim 1, wherein the decoding operation of the face frame prediction value is performed to convert the face frame prediction value into the real position of the bounding box, and the decoding operation of the face key point is performed to convert the face key point into the real position of the key point, comprising:
performing a decoding operation on the face frame prediction value $l = (l_{cx}, l_{cy}, l_w, l_h)$ obtained by the detector, converting it into the real bounding-box position $b = (b_{cx}, b_{cy}, b_w, b_h)$:

$$b_{cx} = l_{cx} d_w + d_{cx}, \qquad b_{cy} = l_{cy} d_h + d_{cy},$$
$$b_w = d_w \exp(l_w), \qquad b_h = d_h \exp(l_h);$$

and converting the face key-point prediction values $(l_{x1}, l_{y1}), \ldots, (l_{x5}, l_{y5})$ obtained by the detector into the true key-point positions:

$$b_{xk} = l_{xk} d_w + d_{cx}, \qquad b_{yk} = l_{yk} d_h + d_{cy}, \qquad k = 1, \ldots, 5,$$

wherein $d = (d_{cx}, d_{cy}, d_w, d_h)$ represents a generated default anchor point.
9. The infrared image-based in-vivo detection method as claimed in claim 1, wherein the extracting coordinates x and y of two eyes according to the key points of the human face, and extending the x and y to four directions respectively by preset pixels to obtain the eye image comprises:
and extracting coordinates x and y of two eyes according to the key points of the human face, and extending the x and the y to four directions by 32 pixels respectively to obtain a 64 x 64 eye image.
10. The method for detecting living bodies based on infrared images as claimed in claim 1, wherein the determining whether the eye images are living bodies by using a preset living body recognition neural network to obtain a determination result comprises:
the living body recognition neural network adopts a mobilenet lightweight neural network to extract living body characteristics, and the living body recognition neural network uses a cross entropy loss function as a loss function.
CN202011106811.1A | 2020-10-16 | 2020-10-16 | Infrared image-based living detection method | Active | CN112232204B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011106811.1A | 2020-10-16 | 2020-10-16 | Infrared image-based living detection method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202011106811.1A | 2020-10-16 | 2020-10-16 | Infrared image-based living detection method

Publications (2)

Publication Number | Publication Date
CN112232204A (en) | 2021-01-15
CN112232204B (en) | 2022-07-19

Family

ID=74118035

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202011106811.1A | Active | CN112232204B (en) | 2020-10-16 | 2020-10-16 | Infrared image-based living detection method

Country Status (1)

Country | Link
CN | CN112232204B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party

Publication Number | Priority Date | Publication Date | Assignee | Title
CN107748858A (en)* | 2017-06-15 | 2018-03-02 | 华南理工大学 | A kind of multi-pose eye locating method based on concatenated convolutional neutral net
WO2020151489A1 (en)* | 2019-01-25 | 2020-07-30 | 杭州海康威视数字技术股份有限公司 | Living body detection method based on facial recognition, and electronic device and storage medium
CN109919097A (en)* | 2019-03-08 | 2019-06-21 | 中国科学院自动化研究所 | Joint detection system and method of face and key points based on multi-task learning
CN110119676A (en)* | 2019-03-28 | 2019-08-13 | 广东工业大学 | A kind of Driver Fatigue Detection neural network based
CN110647817A (en)* | 2019-08-27 | 2020-01-03 | 江南大学 | Real-time face detection method based on MobileNet V3
CN110866490A (en)* | 2019-11-13 | 2020-03-06 | 复旦大学 | Face detection method and device based on multitask learning
CN111680588A (en)* | 2020-05-26 | 2020-09-18 | 广州多益网络股份有限公司 | Human face gate living body detection method based on visible light and infrared light

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party

Title
Tanvi B. Patel et al.: "Occlusion detection and recognizing human face using neural network", 2017 International Conference on Intelligent Computing and Control (I2C2)*
刘淇缘 et al.: "Research progress on occluded face detection methods" (遮挡人脸检测方法研究进展), Computer Engineering and Applications (《计算机工程与应用》)*

Cited By (6)

* Cited by examiner, † Cited by third party

Publication Number | Priority Date | Publication Date | Assignee | Title
WO2021238125A1 (en)* | 2020-05-27 | 2021-12-02 | 嘉楠明芯(北京)科技有限公司 | Face occlusion detection method and face occlusion detection apparatus
CN112801038A (en)* | 2021-03-02 | 2021-05-14 | 重庆邮电大学 | Multi-view face living body detection method and system
CN113033374A (en)* | 2021-03-22 | 2021-06-25 | 开放智能机器(上海)有限公司 | Artificial intelligence dangerous behavior identification method and device, electronic equipment and storage medium
CN113298008A (en)* | 2021-06-04 | 2021-08-24 | 杭州鸿泉物联网技术股份有限公司 | Living body detection-based driver face identification qualification authentication method and device
CN114973372A (en)* | 2022-05-27 | 2022-08-30 | 图灵视讯(深圳)有限公司 | Baby expression classification detection method
CN114926458A (en)* | 2022-06-17 | 2022-08-19 | 珠海格力电器股份有限公司 | Method, device and face recognition system for generating face image of infrared mask

Also Published As

Publication Number | Publication Date
CN112232204B (en) | 2022-07-19

Similar Documents

Publication | Title
CN112232204A (en) | Living body detection method based on infrared image
CN112232205B (en) | Mobile terminal CPU real-time multifunctional face detection method
CN114783024A (en) | Face recognition system of gauze mask is worn in public place based on YOLOv5
CN111368666B (en) | Living body detection method based on novel pooling and attention mechanism double-flow network
WO2018188453A1 (en) | Method for determining human face area, storage medium, and computer device
CN109886153B (en) | A real-time face detection method based on deep convolutional neural network
CN108764058A (en) | A kind of dual camera human face in-vivo detection method based on thermal imaging effect
CN108717524A (en) | It is a kind of based on double gesture recognition systems and method for taking the photograph mobile phone and artificial intelligence system
CN111652082A (en) | Face liveness detection method and device
CN114550268A (en) | Depth-forged video detection method utilizing space-time characteristics
CN114842397A (en) | Real-time old man falling detection method based on anomaly detection
CN112818722A (en) | Modular dynamically configurable living body face recognition system
CN111832464B (en) | Living body detection method and device based on near infrared camera
CN109325472B (en) | A face detection method based on depth information
CN108446690A (en) | A kind of human face in-vivo detection method based on various visual angles behavioral characteristics
CN117623031B (en) | Elevator sensorless control system and method
CN111274851A (en) | A kind of living body detection method and device
Peng et al. | Presentation attack detection based on two-stream vision transformers with self-attention fusion
CN119600670B (en) | A face mask detection method based on YOLOv8 to resist occlusion and counterfeit attacks
CN111797694A (en) | License plate detection method and device
CN115546683A (en) | An improved pornographic video detection method and system based on key frames
CN114155590B (en) | A face recognition method
CN113014914B (en) | Neural network-based single face-changing short video identification method and system
CN117496019B (en) | Image animation processing method and system for driving static image
CN112200008A (en) | Face attribute recognition method in community monitoring scene

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
PE01 | Entry into force of the registration of the contract for pledge of patent right
    Denomination of invention: Live detection method based on infrared images
    Granted publication date: 20220719
    Pledgee: Bohai Bank Co.,Ltd. Shanghai Branch
    Pledgor: Shanghai Dianze Intelligent Technology Co.,Ltd.; Zhongke Zhiyun Technology Co.,Ltd.
    Registration number: Y2024310001360
PC01 | Cancellation of the registration of the contract for pledge of patent right
    Granted publication date: 20220719
    Pledgee: Bohai Bank Co.,Ltd. Shanghai Branch
    Pledgor: Shanghai Dianze Intelligent Technology Co., Ltd.; Zhongke Zhiyun Technology Co., Ltd.
    Registration number: Y2024310001360
