CN114360041B - Fatigue state detection method and system based on key point detection and head posture - Google Patents

Fatigue state detection method and system based on key point detection and head posture

Info

Publication number
CN114360041B
CN114360041B (application CN202210013760.0A; earlier publication CN114360041A)
Authority
CN
China
Prior art keywords
head
image
fatigue state
face
mouth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210013760.0A
Other languages
Chinese (zh)
Other versions
CN114360041A (en)
Inventor
唐贤伦
张艺琼
李洁
刘庆
邹密
邓武权
徐梓辉
王会明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202210013760.0A
Publication of CN114360041A
Application granted
Publication of CN114360041B
Legal status: Active
Anticipated expiration

Abstract

Translated from Chinese


The present invention discloses a fatigue state detection method and system based on key point detection and head posture. An MMC multi-task prediction model whose backbone network adopts a depthwise separable convolutional network is constructed and trained. Several frames of face images are acquired within a unit time; an MTCNN network detects the face position in each image and crops out the head image. The head image is input into the trained MMC multi-task prediction model to obtain the head posture angle and the position information of the face key points. A double-threshold method is used to judge the fatigue states of the head, eyes and mouth respectively, and correlation coefficients are set to comprehensively judge the person's fatigue state. By exploiting the correlation between face key point detection and head posture estimation, the MMC multi-task prediction model runs the two tasks simultaneously in the same network, which greatly reduces the required number of parameters and amount of computation, thereby improving the detection speed of the model and achieving a real-time effect.

Description

Fatigue state detection method and system based on key point detection and head posture
Technical Field
The invention belongs to the technical field of automatic fatigue detection, and particularly relates to a fatigue state detection method and system based on key point detection and head posture.
Background
Face detection and head pose estimation in computer vision refer to detecting all faces in an image and estimating, for each face, three orientation angles: yaw, pitch and roll. Judging a person's motivation and intent from head pose has a wide range of applications, such as human behavior analysis and gaze estimation. Although face detection and pose estimation have each made tremendous progress, it remains difficult to implement a multi-task framework with good real-time performance and robustness in complex environments. At present, convolutional neural networks (CNNs) are generally adopted for face detection and head pose estimation, and have achieved remarkable success in a series of complex computer vision tasks such as image classification, face recognition and object detection. To handle face detection and head pose estimation together, a common approach first detects all faces with a face detection network and then estimates the pose of each face with a separate head pose estimation network. Because the two networks are separate, inaccuracy in the face detection network affects the head pose estimation results, and the different data sets used by the two networks compound the problem. Such a framework also fails to exploit the inherent relatedness of the two tasks, which could otherwise improve each other's performance. Meanwhile, common convolutional neural network models are large, making a real-time effect difficult to achieve at prediction time.
Disclosure of Invention
The invention aims to solve the technical problem of improving the real-time performance of automatic face fatigue state detection, and to this end provides a fatigue state detection method and system based on key point detection and head gesture. In the optimization of a network trained against a learning target that combines the two tasks of face key point detection and head gesture estimation, the correlation between the two tasks allows them to share one network. The invention therefore adopts an MMC multi-task prediction model whose backbone network is a depthwise separable convolutional network and runs the two tasks simultaneously in the same network, which greatly reduces the required number of parameters and amount of computation, thereby improving the detection speed of the model and achieving a real-time effect.
The invention is realized by the following technical scheme:
In one aspect, the invention provides a fatigue state detection method based on key point detection and head gesture, comprising the following steps:
constructing and training an MMC multi-task prediction model whose backbone network adopts a depthwise separable convolutional network, to obtain a trained MMC multi-task prediction model;
acquiring several frames of face images within a unit time, taking each frame as an image, detecting the face position of each image with an MTCNN network and cropping out the head image;
inputting the head image into a trained MMC multitask prediction model to obtain the head attitude angle and the position information of the key points of the human face;
judging the fatigue state of the head across the several images by a double-threshold method according to the head posture angle of each image, and judging the fatigue states of the eyes and mouth across the several images by the double-threshold method according to the position information of the eyes and mouth among the face key points of each image;
comprehensively judging the fatigue state of the person according to the fatigue states of the head, eyes and mouth; a minimal end-to-end sketch of these steps follows.
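To make these steps concrete, the following is a minimal pipeline sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: `mmc_model` (the trained multi-task network) and `preprocess` (shown in the preprocessing sketch further below) are hypothetical names, the MTCNN detector is taken from the third-party facenet-pytorch package, and the model's output format (68 key points plus yaw, pitch, roll, with pitch as the second angle) is assumed for illustration.

```python
# Minimal pipeline sketch; `mmc_model` and `preprocess` are illustrative names.
import cv2
import torch
from facenet_pytorch import MTCNN  # third-party MTCNN face detector

mtcnn = MTCNN(keep_all=False)      # detect the most prominent face per frame
pitches, frame_landmarks = [], []  # per-frame measurements for one window

cap = cv2.VideoCapture(0)
for _ in range(30 * 30):           # e.g. a 30 s unit-time window at 30 fps
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    boxes, _ = mtcnn.detect(rgb)   # face position in this image
    if boxes is None:
        continue
    x1, y1, x2, y2 = boxes[0].astype(int)
    head = rgb[max(y1, 0):y2, max(x1, 0):x2]               # cropped head image
    inp = torch.from_numpy(preprocess(head))[None, None]   # 1x1x224x224 tensor
    with torch.no_grad():
        landmarks, pose = mmc_model(inp)  # 68 key points and (yaw, pitch, roll)
    pitches.append(float(pose[0, 1]))     # pitch angle for the head judgement
    frame_landmarks.append(landmarks[0])  # feeds the EAR/MAR judgements below
cap.release()
```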
When detecting a person's fatigue state, the judgment is generally based on the state of facial parts. However, the features of the eyes, mouth and head are complicated, and common deep convolutional neural network models struggle to achieve real-time performance while paying enough attention to image spatial information; once the eye, mouth and head features are detected, a suitable algorithm must still decide whether a fatigue state exists, so the resulting convolutional model tends to be large and slow at prediction time. The invention observes that face detection and head posture estimation both concern the face and depend on the same latent facial features, so face key point detection and head posture estimation can be carried out simultaneously by one network during both training and prediction. When the same network is used, the head posture information improves the accuracy of locating the face key points, and the locations of the face key points in turn reflect the head posture; this strong correlation benefits both tasks. Concretely, an MTCNN network first detects the face position and crops the head image; the preprocessed head image is input into the MMC multi-task prediction model to obtain the head posture angle and the position information of face key points such as the eyes and mouth. In prediction, the fatigue state of each part is determined by a double-threshold method: the regressed head posture angle is used to judge the head state, and the regressed coordinates of the eye and mouth feature points among the face key points are used to judge the eye and mouth states respectively; finally the person's fatigue state is judged comprehensively. Since the backbone of the model adopts a depthwise separable convolutional architecture, the two tasks are realized simultaneously in one network, the number of parameters and the amount of computation are greatly reduced, the detection speed of the model is improved, and a real-time effect can be achieved.
Further, the MMC multi-task prediction model is trained on the 300W_LP dataset, which provides face key point coordinates and head pose angle labels. Before training the MMC multi-task prediction model with the 300W_LP dataset, the images in the dataset are preprocessed as follows:
the redundant background in each image is cropped according to the face key point coordinates in the dataset, the cropped images are resized to a uniform 224x224, and the resized images are grayscaled and normalized.
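A minimal sketch of this preprocessing in Python with OpenCV; the crop margin around the labelled key points is an illustrative assumption, since the patent does not specify one, and an RGB input image is assumed.

```python
import cv2
import numpy as np

def preprocess(image, keypoints=None, margin=0.1):
    """Crop around the key points (if given), resize to 224x224,
    convert to grayscale and normalize pixel values to [0, 1]."""
    if keypoints is not None:                    # keypoints: (68, 2) array
        xs, ys = keypoints[:, 0], keypoints[:, 1]
        h, w = image.shape[:2]
        dx = margin * (xs.max() - xs.min())      # illustrative crop margin
        dy = margin * (ys.max() - ys.min())
        x1, x2 = int(max(xs.min() - dx, 0)), int(min(xs.max() + dx, w))
        y1, y2 = int(max(ys.min() - dy, 0)), int(min(ys.max() + dy, h))
        image = image[y1:y2, x1:x2]              # drop redundant background
    image = cv2.resize(image, (224, 224))        # unify the image size
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)  # graying treatment
    return gray.astype(np.float32) / 255.0       # normalization
```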
Further, the process of training the MMC multitasking prediction model includes two tasks:
The face key point detection task locates the positions of the facial feature points according to the face key point coordinates in the image; the L2 loss function lossa measures the difference between the predicted and real coordinate values of the feature points, and the position information of the face key points is obtained by regression.
The head pose estimation task predicts the angles of the head in the image in the three directions yaw, pitch and roll according to the head posture angle labels in the image, with the loss function (an L2 loss over the three angles):
lossb = (x̂1 − x1)² + (x̂2 − x2)² + (x̂3 − x3)²
where (x̂1, x̂2, x̂3) is the estimation result of the head posture angles in the yaw, pitch and roll directions, and (x1, x2, x3) are the head posture angle labels in the three directions;
Training takes the total loss of the MMC multi-task prediction model as the learning target, the total loss being the sum of the losses of the face key point detection task and the head pose estimation task:
loss=lossa+ηlossb
where η is the task-allocation weight, set to 1.
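A sketch of this joint objective in PyTorch, assuming both tasks are trained with mean-squared (L2) losses as described; the tensor shapes (68 two-dimensional key points and three pose angles per image) are for illustration.

```python
import torch.nn.functional as F

def total_loss(pred_kpts, true_kpts, pred_pose, true_pose, eta=1.0):
    """loss = lossa + eta * lossb, both terms being L2 regression losses.

    pred_kpts, true_kpts: (B, 68, 2) key point coordinates
    pred_pose, true_pose: (B, 3) yaw, pitch and roll angles
    """
    loss_a = F.mse_loss(pred_kpts, true_kpts)  # face key point detection task
    loss_b = F.mse_loss(pred_pose, true_pose)  # head pose estimation task
    return loss_a + eta * loss_b               # eta = 1: equal task weights
```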
Further, the process of obtaining the head attitude angle and the position information of the key points of the human face is as follows:
the backbone network of the MMC multi-task prediction model extracts and fuses features of the input head image to obtain a feature map; the backbone network adopts an improved lightweight MobileNet-V2 convolutional structure;
meanwhile, the CA attention module embedded in the backbone network pools the feature map along the horizontal and vertical directions respectively to obtain the position information of the feature map;
according to the position information of the feature map, the two fully-connected layers after the backbone network respectively regress the head posture angle and the position information of the face key points.
Further, the improved lightweight MobileNet-V2 structure performs feature extraction on the input head image with 1x1, 3x3 and 5x5 convolution kernels, with the convolution stride set to 1 and the pad corresponding to each kernel set to 0, 1 and 2 respectively; the size of the improved lightweight MobileNet-V2 network is 4M.
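A sketch of such a multi-scale first layer in PyTorch. The patent fixes only the kernel sizes, stride and padding; the channel counts here are illustrative assumptions. With stride 1 and pads 0, 1 and 2, the three branches keep the same spatial size, so their outputs can be concatenated directly.

```python
import torch
import torch.nn as nn

class MultiScaleStem(nn.Module):
    """Parallel 1x1, 3x3 and 5x5 convolutions with stride 1 and pads 0/1/2,
    so all branches share spatial dimensions and can be spliced together."""
    def __init__(self, in_ch=1, branch_ch=8):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1, stride=1, padding=0)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, stride=1, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, stride=1, padding=2)

    def forward(self, x):  # x: (B, 1, 224, 224) grayscale head image
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

stem = MultiScaleStem()
out = stem(torch.zeros(1, 1, 224, 224))
print(out.shape)  # torch.Size([1, 24, 224, 224]): fused multi-scale features
```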
Further, for the several continuous face images acquired within the unit time, the fatigue state of each part is judged by the double-threshold method as follows:
judging the fatigue state of the head in each image:
according to the head posture angle of each image, judging whether the pitch angle with the head lowered is greater than 30 degrees; if so, the head in that image is judged to be in a fatigue state; if the proportion of images with the head in a fatigue state exceeds 30% of all images, the head is judged to be in a fatigue state;
judging the fatigue state of the eyes in each image:
calculating the eye aspect ratio according to the position information of the eye key points of each image and judging whether it is smaller than 0.2; if so, the eyes in that image are judged to be in a fatigue state; if the proportion of images with the eyes in a fatigue state exceeds 40% of all images, the eyes are judged to be in a fatigue state;
judging the fatigue state of the mouth in each image:
calculating the mouth aspect ratio according to the position information of the mouth key points of each image and judging whether it is greater than 0.3; if so, the mouth in that image is judged to be in a fatigue state; if the proportion of images with the mouth in a fatigue state exceeds 40% of all images, the mouth is judged to be in a fatigue state.
Further, according to the influence weights of the head, eye and mouth fatigue states on the person's overall fatigue state, a correlation coefficient is set for the fatigue state of each part, and the person's fatigue state Z is judged comprehensively:
Z=αZeye+βZmouth+λZhead
where Zeye represents the eye fatigue state, Zmouth the mouth fatigue state and Zhead the head fatigue state; the correlation coefficients α, β and λ are set to 0.2, 0.3 and 0.5 respectively.
When Z is greater than or equal to 0.5, the person is judged to be in a fatigue state.
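A sketch of the window-level decisions and this weighted fusion, assuming per-frame boolean flags have already been computed with the first thresholds above (pitch greater than 30 degrees, eye aspect ratio below 0.2, mouth aspect ratio above 0.3):

```python
def part_state(flags, ratio_threshold):
    """Second threshold: a part is fatigued (state 1) if the fraction of
    flagged frames in the unit-time window exceeds the given ratio."""
    return int(sum(flags) / len(flags) > ratio_threshold)

def fatigue_score(eye_flags, mouth_flags, head_flags,
                  alpha=0.2, beta=0.3, lam=0.5):
    z_eye = part_state(eye_flags, 0.40)      # frames with EAR < 0.2
    z_mouth = part_state(mouth_flags, 0.40)  # frames with MAR > 0.3
    z_head = part_state(head_flags, 0.30)    # frames with pitch > 30 degrees
    z = alpha * z_eye + beta * z_mouth + lam * z_head
    return z, z >= 0.5                       # overall fatigue decision
```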
In another aspect, the present invention provides a fatigue state detection system based on keypoint detection and head pose, comprising:
The model training module is used for constructing and training an MMC multi-task prediction model to obtain a trained MMC multi-task prediction model;
The face position detection module is used for taking each frame of the several face images acquired within a unit time as an image, detecting the face position of each image with an MTCNN network and cropping out the head image;
the parallel prediction module is used for inputting the head image into the trained MMC multitask prediction model to obtain the head attitude angle and the position information of the key points of the human face;
The local state detection module is used for judging the fatigue states of the heads in the images by utilizing a double-threshold method according to the head posture angle of each image, and judging the fatigue states of the eyes and the mouths in the images by utilizing the double-threshold method according to the position information of the eyes and the mouths in the key points of the faces of the images;
And the comprehensive fatigue state detection module is used for comprehensively judging the fatigue state of the person according to the fatigue states of the head, eyes and mouth.
Further, the MMC multi-task prediction model comprises a backbone network and two fully-connected layers used respectively to regress the head posture angle and the position information of the face key points; the backbone network adopts an improved lightweight MobileNet-V2 convolutional structure with a CA attention module embedded in it.
The backbone network extracts and fuses features of the input head image to obtain a feature map.
The CA attention module pools the feature map along the horizontal and vertical directions respectively to obtain the position information of the feature map.
Further, the improved lightweight MobileNet-V2 structure performs feature extraction on the input head image with 1x1, 3x3 and 5x5 convolution kernels, with the convolution stride set to 1 and the pad corresponding to each kernel set to 0, 1 and 2 respectively; the size of the improved lightweight MobileNet-V2 network is 4M.
Compared with the prior art, the invention has the following advantages and beneficial effects:
According to the invention, a depthwise separable convolutional network is used as the backbone of the MMC multi-task prediction model, and owing to the task correlation between face key point detection and head posture estimation, the same network is used to train and predict both tasks, which greatly reduces the required number of parameters and amount of computation, improves the detection speed of the model, and achieves a real-time effect;
According to the invention, an improved lightweight MobileNet-V2 structure is used as the backbone network. On one hand, the first layer of the backbone extracts and fuses picture features with convolutions of different scales; several convolution kernels of different sizes yield different receptive fields, and since the relative positions of the eyes, nose, mouth and other facial parts influence the attitude angle, a larger receptive field describes this correlation better, so the model is fast and highly real-time when training and predicting the two tasks. On the other hand, a CA attention module embedded in the backbone can capture accurate position information in space, so the model attends to image spatial information while remaining real-time.
Drawings
In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, the drawings that are needed in the examples will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and that other related drawings may be obtained from these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a flowchart of a fatigue state detection method based on face key points and head gestures in embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the network structure of the MMC multi-task prediction model in an embodiment of the present invention;
FIG. 3 is the face key point annotation diagram of the 300W_LP dataset in an embodiment of the present invention;
fig. 4 is a block diagram showing the system configuration in embodiment 2 of the present invention.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
Example 1
As shown in fig. 1, this embodiment 1 provides a fatigue state detection method based on key point detection and head pose, including the steps of:
S1, constructing and training an MMC multi-task prediction model whose backbone network adopts a depthwise separable convolutional network, to obtain a trained MMC multi-task prediction model;
The MMC multi-task prediction model is trained on the 300W_LP dataset. The 300W_LP dataset is widely used for facial feature recognition and head pose analysis; it is a commonly used in-the-wild 2D landmark dataset consisting of 61,225 head pose images, expanded to 122,450 images by flipping, and it carries face key point coordinates and head pose angle labels. Before training the MMC multi-task prediction model with the 300W_LP dataset, the images in the dataset are preprocessed, including:
redundant background in the images is cropped according to the face key point coordinates in the dataset to improve the training effect, the cropped images are resized to a uniform 224x224, and the resized images are grayscaled and normalized.
Specifically, the backbone network adopts an improved lightweight MobileNet-V2 structure; the overall network structure is shown in fig. 2. The first layer of the backbone extracts and fuses image features with convolutions of different scales, obtaining different receptive fields through kernels of different sizes: 1x1, 3x3 and 5x5 convolution kernels replace the original single 3x3 kernel for feature extraction on the input head image, the convolution stride is set to 1, and the pad corresponding to each kernel is set to 0, 1 and 2 respectively, so that the convolved outputs share the same spatial dimensions and can be spliced directly together. Widening the network this way increases its performance; to limit the amount of computation, the 1x1 convolution reduces the number of parameters while preserving network performance, introduces more nonlinearity and improves generalization, and the 5x5 kernel provides a larger receptive field. Considering the influence of the relative positions of the eyes, nose, mouth and other facial parts on the attitude angle, a larger receptive field describes this correlation better. Meanwhile, a CA attention module is embedded in the backbone network. The CA module can capture accurate position information in space: it decomposes the two-dimensional global pooling operation into two one-dimensional encoding processes, i.e., pooling operations along the horizontal and vertical directions of the input feature map respectively, thereby obtaining position information along the x and y axes of the input feature map. After the improved MobileNet-V2, two fully-connected layers are added: the FC1 layer performs face key point detection and regresses the coordinates of 68 feature points, and the FC2 layer performs head posture estimation and regresses the attitude angles in three directions. The lightweight MobileNet-V2 structure improved in this way has a size of 4M. The model stays small while keeping prediction accuracy, so real-time performance can be achieved at prediction time.
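As an illustration of the embedded CA module, the following is a condensed coordinate-attention sketch in PyTorch along the lines of the published Coordinate Attention design; the reduction ratio and the plain ReLU activation are simplifying assumptions rather than the patent's exact configuration.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Coordinate attention: factorizes 2D global pooling into two 1D
    poolings (along H and along W) to retain positional information."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)    # (B, C, H, 1): pool along W
        pool_w = x.mean(dim=2, keepdim=True)    # (B, C, 1, W): pool along H
        y = torch.cat([pool_h, pool_w.transpose(2, 3)], dim=2)  # (B, C, H+W, 1)
        y = self.act(self.conv1(y))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))   # (B, C, 1, W)
        return x * a_h * a_w                    # reweight by x- and y-attention
```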
More specifically, the process of training the MMC multi-task prediction model includes two tasks:
the face key point detection task locates the positions of the facial feature points according to the face key point coordinates in the image; the L2 loss function lossa measures the difference between the predicted and real coordinate values of the feature points, and the position information of the face key points is obtained by regression;
The head pose estimation task predicts the angles of the head in the image in the three directions yaw, pitch and roll according to the head posture angle labels in the image; the learning target is to regress the three angles yaw, pitch and roll that describe the head orientation, with the loss function (an L2 loss over the three angles):
lossb = (x̂1 − x1)² + (x̂2 − x2)² + (x̂3 − x3)² (1)
where (x̂1, x̂2, x̂3) is the estimation result of the head posture angles in the yaw, pitch and roll directions, and (x1, x2, x3) are the head posture angle labels in the three directions;
Training takes the total loss of the MMC multi-task prediction model as the learning target, the total loss being the sum of the losses of the face key point detection task and the head pose estimation task:
loss=lossa+ηlossb (2)
Since both face key point detection and head pose estimation are regression tasks, the task-allocation weight η is set to 1.
S2, acquiring several frames of face images within a unit time, taking each frame as an image, detecting the face position of each image with an MTCNN network and cropping out the head image;
s3, inputting the head image into a trained MMC multitask prediction model to obtain the head attitude angle and the position information of the key points of the human face;
specifically, the process of obtaining the head attitude angle and the position information of the key points of the human face is as follows:
the backbone network of the MMC multi-task prediction model extracts and fuses features of the input head image to obtain a feature map; the backbone network adopts the improved lightweight MobileNet-V2 convolutional structure;
meanwhile, the CA attention module embedded in the backbone network pools the feature map along the horizontal and vertical directions respectively to obtain the position information of the feature map;
according to the position information of the feature map, the two fully-connected layers after the backbone network respectively regress the head posture angle and the position information of the face key points.
S4, judging the fatigue state of the head across the several images by the double-threshold method according to the head posture angle of each image, and judging the fatigue states of the eyes and mouth by the double-threshold method according to the position information of the eyes and mouth among the face key points of each image. The fatigue state of each part is determined by the double-threshold method as follows: the unit time is set to 30 seconds, and continuous multi-frame images within those 30 seconds are acquired, each frame serving as one image. Whether each part is in a fatigue state is determined from the proportion of frames in which that part exceeds its first threshold relative to the total number of frames in the unit time. For example, fatigue can be determined when the number of frames with closed eyes or a yawning mouth exceeds 40% of the frames in the unit time, and when the number of frames whose pitch attitude angle exceeds 30 degrees with the head lowered exceeds 30% of the frames in the unit time. The specific determination is as follows:
1. judging the fatigue state of each image head:
The three Euler angles (pitch, yaw, roll) of the head are obtained by direct regression from the MMC multi-task prediction model. Because the pitch angle changes most when a person is fatigued, attention can focus on the pitch angle to reduce computation, and a threshold is set accordingly: according to the head posture angle of each image, judge whether the pitch angle with the head lowered is greater than 30 degrees; if so, the head in that image is judged to be in a fatigue state, and if the proportion of images with the head in a fatigue state exceeds 30% of all images, the head is judged to be in a fatigue state;
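A minimal sketch of this dual-threshold head judgement over one unit-time window of frames, assuming the pitch angle is reported in degrees and grows as the head lowers:

```python
def head_fatigued(pitch_angles, angle_thresh=30.0, ratio_thresh=0.30):
    """Dual-threshold head judgement: per-frame angle threshold, then a
    window-level ratio threshold over all frames in the unit time."""
    flags = [p > angle_thresh for p in pitch_angles]  # first threshold
    return sum(flags) / len(flags) > ratio_thresh     # second threshold
```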
2. judging the fatigue state of eyes of each image:
The position coordinates of the eye key points reflect the degree of eye opening; when the eyes close too frequently within a certain time, they are judged to be in a fatigue state. The eye-opening state is judged by calculating the eye aspect ratio EAR: if the EAR is smaller than 0.2, the eyes in that image are judged to be in a fatigue state, and if the proportion of such images exceeds 40% of all images, the eyes are judged to be in a fatigue state. With the face key points labelled as shown in fig. 3, the EAR of one eye with landmarks p1 to p6 is calculated as:
EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2‖p1 − p4‖)
The left-eye aspect ratio EARl and the right-eye aspect ratio EARr are calculated from the coordinates of the left and right eyes respectively and combined into the final eye aspect ratio EAR. The eye width and height are computed with the Euclidean distance formula; when the eye is closed the EAR value approaches 0. A preliminary threshold of 0.2 is set: below this value the eye is considered closed, above it open. Since different eyes open to different degrees, different thresholds can be set for specific situations.
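A sketch of the EAR computation, assuming the standard 0-based 68-point indexing (right eye at indices 36-41, left eye at 42-47) and averaging the two eyes into the final EAR:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks p1..p6 (corner, two top, corner, two bottom).
    EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|)."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])   # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal corner-to-corner span
    return (v1 + v2) / (2.0 * h)

def mean_ear(landmarks):                   # landmarks: (68, 2) array
    ear_r = eye_aspect_ratio(landmarks[36:42])   # right eye, EARr
    ear_l = eye_aspect_ratio(landmarks[42:48])   # left eye, EARl
    return (ear_l + ear_r) / 2.0           # combined EAR for the frame
```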
3. Judging the fatigue state of each image mouth:
Similarly, the mouth state is distinguished by the mouth aspect ratio MAR according to the position information of the mouth key points of each image; whether the current state is a yawn is judged from the distance between the upper and lower lips and the duration of the mouth opening. The MAR value is calculated from the coordinates of the highest and lowest points of the inner lips and of the mouth corners; with the face key points labelled as shown in fig. 3, it is the ratio of the inner-lip separation to the mouth-corner distance:
MAR = ‖ptop − pbottom‖ / ‖pleft − pright‖
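A sketch of the MAR computation matching the ratio above; the 0-based 68-point indices (inner mouth corners at 60 and 64, inner top lip at 62, inner bottom lip at 66) are assumed from the usual annotation of fig. 3.

```python
import numpy as np

def mouth_aspect_ratio(landmarks):         # landmarks: (68, 2) array
    """MAR = inner-lip separation / mouth-corner distance."""
    height = np.linalg.norm(landmarks[62] - landmarks[66])  # top vs bottom lip
    width = np.linalg.norm(landmarks[60] - landmarks[64])   # corner to corner
    return height / width
```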
The MAR threshold is set to 0.3: judge whether the mouth aspect ratio is greater than 0.3; if so, the mouth in that image is judged to be in a fatigue state, and if the proportion of images with the mouth in a fatigue state exceeds 40% of all images, the mouth is judged to be in a fatigue state.
S5, comprehensively judging the fatigue state of the person according to the fatigue states of the head, eyes and mouth.
Specifically, considering that a person who keeps the head lowered for a long time is readily considered fatigued, and that the eyes often close while yawning, a correlation coefficient is set for the fatigue state of each part according to the influence weight of the head, eye and mouth fatigue states on the person's fatigue state, and the person's fatigue state Z is judged comprehensively:
Z=αZeye+βZmouth+λZhead (7)
where Zeye represents the eye fatigue state, Zmouth the mouth fatigue state and Zhead the head fatigue state; the correlation coefficients α, β and λ are set to 0.2, 0.3 and 0.5 respectively, and when Z is greater than or equal to 0.5, the person is judged to be in a fatigue state.
Example 2
As shown in fig. 4, the present embodiment provides a fatigue state detection system based on key point detection and head pose, including:
The model training module is used for constructing and training an MMC multi-task prediction model to obtain a trained MMC multi-task prediction model;
The MMC multi-task prediction model is trained on the 300W_LP dataset. The 300W_LP dataset is widely used for facial feature recognition and head pose analysis; it is a commonly used in-the-wild 2D landmark dataset consisting of 61,225 head pose images, expanded to 122,450 images by flipping, and it carries face key point coordinates and head pose angle labels. Before training the MMC multi-task prediction model with the 300W_LP dataset, the images in the dataset are preprocessed, including:
redundant background in the images is cropped according to the face key point coordinates in the dataset to improve the training effect, the cropped images are resized to a uniform 224x224, and the resized images are grayscaled and normalized.
The MMC multi-task prediction model comprises a backbone network and two fully-connected layers that respectively regress the head attitude angle and the position information of the face key points: the FC1 layer detects the face key points and regresses the coordinates of 68 feature points, and the FC2 layer estimates the head posture and regresses the attitude angles in three directions. The backbone adopts the improved lightweight MobileNet-V2 structure; its first layer extracts and fuses image features with convolutions of different scales, obtaining different receptive fields through kernels of different sizes: 1x1, 3x3 and 5x5 convolution kernels replace the original single 3x3 kernel for feature extraction on the input head image, the convolution stride is set to 1, and the pad corresponding to each kernel is set to 0, 1 and 2 respectively, so that the convolved outputs share the same spatial dimensions and can be spliced directly together. Widening the network this way increases its performance; to limit the amount of computation, the 1x1 convolution kernel performs dimension reduction, reducing the number of parameters while preserving network performance, introducing more nonlinearity and improving generalization, and the 5x5 kernel provides a larger receptive field. Considering the influence of the relative positions of the eyes, nose, mouth and other facial parts on the attitude angle, a larger receptive field describes this correlation better. Meanwhile, a CA attention module is embedded in the backbone network; it captures accurate position information in space by decomposing the two-dimensional global pooling operation into two one-dimensional encoding processes, i.e., pooling along the horizontal and vertical directions of the input feature map respectively, thereby obtaining position information along the x and y axes. The lightweight MobileNet-V2 structure improved in this way has a size of 4M. The model stays small while keeping prediction accuracy, so real-time performance can be achieved at prediction time.
The face position detection module is used for taking each frame of the several face images acquired within a unit time as an image, detecting the face position of each image with an MTCNN network and cropping out the head image;
the parallel prediction module is used for inputting the head image into the trained MMC multitask prediction model to obtain the head attitude angle and the position information of the key points of the human face;
The local state detection module is used for judging the fatigue states of the heads in the images by utilizing a double-threshold method according to the head posture angle of each image, and judging the fatigue states of the eyes and the mouths in the images by utilizing the double-threshold method according to the position information of the eyes and the mouths in the key points of the faces of the images;
The fatigue state of each part is determined by the double-threshold method: the unit time is set to 30 seconds, continuous multi-frame images within those 30 seconds are acquired, each frame serving as one image, and whether each part is in a fatigue state is determined from the proportion of frames in which that part exceeds its first threshold relative to the total number of frames in the unit time. For example, fatigue can be determined when the number of frames with closed eyes or a yawning mouth exceeds 40% of the frames in the unit time, and when the number of frames whose pitch attitude angle exceeds 30 degrees with the head lowered exceeds 30% of the frames in the unit time.
And the comprehensive fatigue state detection module is used for comprehensively judging the fatigue state of the person according to the fatigue states of the head, eyes and mouth. Considering that a person whose head stays lowered for a long time is readily considered fatigued, and that the eyes often close while yawning, a correlation coefficient is set for the fatigue state of each part according to the influence weight of the head, eye and mouth fatigue states on the person's fatigue state, and the person's fatigue state Z is judged comprehensively with formula (7).
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those of ordinary skill in the art will appreciate that all or part of the above methods may be implemented by a program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, performs the corresponding method steps; the storage medium may be a ROM/RAM, a magnetic disk, an optical disk, etc.
The foregoing embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (8)

Translated from Chinese
1. A fatigue state detection method based on key point detection and head posture, characterized by comprising the following steps:
constructing and training an MMC multi-task prediction model whose backbone network adopts a depthwise separable convolutional network, to obtain a trained MMC multi-task prediction model;
acquiring several frames of face images within a unit time, taking each frame as an image, detecting the face position of each image with an MTCNN network and cropping out the head image;
inputting the head image into the trained MMC multi-task prediction model to obtain the head posture angle and the position information of the face key points;
judging the fatigue state of the head across the several images by a double-threshold method according to the head posture angle of each image, and at the same time judging the fatigue states of the eyes and mouth across the several images by the double-threshold method according to the position information of the eyes and mouth among the face key points of each image;
comprehensively judging the person's fatigue state according to the fatigue states of the head, eyes and mouth;
wherein the head posture angle and the position information of the face key points are obtained as follows:
the backbone network of the MMC multi-task prediction model extracts and fuses features of the input head image to obtain a feature map, the backbone network adopting an improved lightweight MobileNet-V2 convolutional structure;
meanwhile, the CA attention module embedded in the backbone network pools the feature map along the horizontal and vertical directions respectively to obtain the position information of the feature map;
according to the position information of the feature map, the two fully-connected layers after the backbone network respectively regress the head posture angle and the position information of the face key points;
and wherein the improved lightweight MobileNet-V2 structure performs feature extraction on the input head image with 1x1, 3x3 and 5x5 convolution kernels, the convolution stride is set to 1, the pad corresponding to each kernel is set to 0, 1 and 2 respectively, and the size of the improved lightweight MobileNet-V2 structure is 4M.
2. The fatigue state detection method based on key point detection and head posture according to claim 1, characterized in that the MMC multi-task prediction model is trained on the 300W_LP dataset, which has face key point coordinates and head posture angle labels; before the MMC multi-task prediction model is trained with the 300W_LP dataset, the images in the dataset are preprocessed, including:
cropping the redundant background in each image according to the face key point coordinates in the dataset, resizing the cropped images to a uniform 224x224, and grayscaling and normalizing the resized images.
3. The fatigue state detection method based on key point detection and head posture according to claim 2, characterized in that the process of training the MMC multi-task prediction model comprises two tasks:
a face key point detection task, which locates the positions of the facial feature points according to the face key point coordinates in the image, measures the difference between the predicted and real coordinate values of the feature points with the L2 loss function lossa, and obtains the position information of the face key points by regression;
a head pose estimation task, which predicts the angles of the head in the image in the three directions yaw, pitch and roll according to the head posture angle labels in the image, with the loss function
lossb = (x̂1 − x1)² + (x̂2 − x2)² + (x̂3 − x3)²
where (x̂1, x̂2, x̂3) is the estimation result of the head posture angles in the yaw, pitch and roll directions and (x1, x2, x3) are the head posture angle labels in the three directions;
training takes the total loss of the MMC multi-task prediction model as the learning target, the total loss being the sum of the losses of the face key point detection task and the head pose estimation task:
loss = lossa + ηlossb
where η is the task-allocation weight, set to 1.
4. The fatigue state detection method based on key point detection and head posture according to claim 1, characterized in that, for the several continuous frames of face images acquired within the unit time, the fatigue state of each part is judged by the double-threshold method as follows:
judging the fatigue state of the head in each image: according to the head posture angle of each image, judging whether the pitch attitude angle is greater than 30°; if so, the head in that image is judged to be in a fatigue state; if the proportion of images with the head in a fatigue state exceeds 30% of all images, the head is judged to be in a fatigue state;
judging the fatigue state of the eyes in each image: calculating the eye aspect ratio according to the position information of the eye key points of each image and judging whether it is less than 0.2; if so, the eyes in that image are judged to be in a fatigue state; if the proportion of images with the eyes in a fatigue state exceeds 40% of all images, the eyes are judged to be in a fatigue state;
judging the fatigue state of the mouth in each image: calculating the mouth aspect ratio according to the position information of the mouth key points of each image and judging whether it is less than 0.3; if so, the mouth in that image is judged to be in a fatigue state; if the proportion of images with the mouth in a fatigue state exceeds 40% of all images, the mouth is judged to be in a fatigue state.
5. The fatigue state detection method based on key point detection and head posture according to claim 1, characterized in that, according to the influence weights of the head, eye and mouth fatigue states on the person's fatigue state, a correlation coefficient is set for the fatigue state of each part to comprehensively judge the person's fatigue state Z:
Z=αZeye+βZmouth+λZhead
where Zeye represents the eye fatigue state, Zmouth the mouth fatigue state and Zhead the head fatigue state, and the correlation coefficients α, β and λ are set to 0.2, 0.3 and 0.5 respectively;
when Z is greater than or equal to 0.5, the person is judged to be in a fatigue state.
6. A fatigue state detection system based on key point detection and head posture, characterized in that, based on the method of any one of claims 1-5, it comprises:
a model training module, used for constructing and training an MMC multi-task prediction model to obtain a trained MMC multi-task prediction model;
a face position detection module, used for taking each frame of the several face images acquired within a unit time as an image, detecting the face position of each image with an MTCNN network and cropping out the head image;
a parallel prediction module, used for inputting the head image into the trained MMC multi-task prediction model to obtain the head posture angle and the position information of the face key points;
a local state detection module, used for judging the fatigue state of the head across the several images by a double-threshold method according to the head posture angle of each image, and judging the fatigue states of the eyes and mouth across the several images by the double-threshold method according to the position information of the eyes and mouth among the face key points of each image;
a comprehensive fatigue state detection module, used for comprehensively judging the person's fatigue state according to the fatigue states of the head, eyes and mouth.
7. The fatigue state detection system based on key point detection and head posture according to claim 6, characterized in that the MMC multi-task prediction model comprises a backbone network and two fully-connected layers used respectively to regress the head posture angle and the position information of the face key points, the backbone network adopts an improved lightweight MobileNet-V2 convolutional structure, and a CA attention module is embedded in the backbone network;
the backbone network extracts and fuses features of the input head image to obtain a feature map;
the CA attention module pools the feature map along the horizontal and vertical directions respectively to obtain the position information of the feature map.
8. The fatigue state detection system based on key point detection and head posture according to claim 7, characterized in that the improved lightweight MobileNet-V2 structure performs feature extraction on the input head image with 1x1, 3x3 and 5x5 convolution kernels, the convolution stride is set to 1, the pad corresponding to each kernel is set to 0, 1 and 2 respectively, and the size of the improved lightweight MobileNet-V2 structure is 4M.
Application CN202210013760.0A, filed 2022-01-06 (priority 2022-01-06): Fatigue state detection method and system based on key point detection and head posture. Status: Active. Granted publication: CN114360041B (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210013760.0A | 2022-01-06 | 2022-01-06 | Fatigue state detection method and system based on key point detection and head posture (CN114360041B (en))

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210013760.0A | 2022-01-06 | 2022-01-06 | Fatigue state detection method and system based on key point detection and head posture (CN114360041B (en))

Publications (2)

Publication Number | Publication Date
CN114360041A (en) | 2022-04-15
CN114360041B (en) | 2025-04-15

Family

ID=81107211

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210013760.0A | Fatigue state detection method and system based on key point detection and head posture (CN114360041B, Active) | 2022-01-06 | 2022-01-06

Country Status (1)

Country | Link
CN (1) | CN114360041B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114898444A (en)* | 2022-06-07 | 2022-08-12 | 嘉兴锐明智能交通科技有限公司 | Fatigue driving monitoring method, system and equipment based on face key point detection
CN115393830A (en)* | 2022-08-26 | 2022-11-25 | 南通大学 | A fatigue driving detection method based on deep learning and facial features
CN116206350A (en)* | 2022-12-30 | 2023-06-02 | 上海富瀚微电子股份有限公司 | Method for face key point detection, face detection method and device
CN116468884A (en)* | 2023-03-31 | 2023-07-21 | 深圳大学 | A head movement-based visual stimulus response monitoring method and intelligent terminal
CN116434204B (en)* | 2023-04-17 | 2025-08-29 | 南京邮电大学 | A driver fatigue detection method, device and storage medium based on improved PIPNet network
CN117079183A (en)* | 2023-08-18 | 2023-11-17 | 华戎技术有限公司 | A method, device and storage medium for guard duty supervision
CN117218154A (en)* | 2023-08-23 | 2023-12-12 | 湖南创星科技股份有限公司 | Patient attention degree judging method based on information entropy in meta-diagnosis room scene
CN117612142B (en)* | 2023-11-14 | 2024-07-12 | 中国矿业大学 | Head posture and fatigue state detection method based on multi-task joint model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112949345A (en)* | 2019-11-26 | 2021-06-11 | 北京四维图新科技股份有限公司 | Fatigue monitoring method and system, automobile data recorder and intelligent cabin
CN112163470B (en)* | 2020-09-11 | 2024-12-10 | 高新兴科技集团股份有限公司 | Fatigue state recognition method, system, and storage medium based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Head Pose Estimation Methods Based on Deep Learning; 张艺琼; 《信息科技辑》; 2024-09-16; full text *

Also Published As

Publication number | Publication date
CN114360041A (en) | 2022-04-15

Similar Documents

Publication | Title
CN114360041B (en) | Fatigue state detection method and system based on key point detection and head posture
US12087077B2 (en) | Determining associations between objects and persons using machine learning models
CN110909651B (en) | Method, device and equipment for identifying video main body characters and readable storage medium
WO2021068323A1 (en) | Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium
CN112560741A (en) | Safety wearing detection method based on human body key points
CN111062239A (en) | Human target detection method, device, computer equipment and storage medium
CN114332214A (en) | Object pose estimation method, device, electronic device and storage medium
CN111680550B (en) | Emotion information identification method and device, storage medium and computer equipment
CN113326778B (en) | Human pose detection method, device and storage medium based on image recognition
CN113487610B (en) | Herpes image recognition method and device, computer equipment and storage medium
CN110598647B (en) | Head posture recognition method based on image recognition
CN111881731A (en) | Behavior recognition method, system, device and medium based on human skeleton
CN113780145A (en) | Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN116189269A (en) | A multi-task face detection method, device, electronic equipment and storage medium
CN114220138B (en) | A face alignment method, training method, device and storage medium
CN113191195B (en) | Face detection method and system based on deep learning
CN113963202A (en) | Skeleton point action recognition method and device, electronic equipment and storage medium
CN119649467B (en) | Theft behavior identification method and system based on computer vision
CN114332456A (en) | Target detection and identification method and device for large-resolution image
CN114792437A (en) | Method and system for analyzing safe driving behavior based on facial features
JP4011426B2 (en) | Face detection device, face detection method, and face detection program
Ren et al. | Gaze estimation based on attention mechanism combined with temporal network
CN117115854A (en) | Facial action recognition method, device, electronic equipment and storage medium
CN116524572A (en) | Accurate real-time face positioning method based on adaptive Hope-Net
Xiao et al. | A novel laser stripe key point tracker based on self-supervised learning and improved KCF for robotic welding seam tracking

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
