Disclosure of Invention
The embodiment of the application provides an attention detection method based on facial orientation, facial expression and pupil tracking, which is used for solving the problem of low accuracy of attention detection in the prior art.
In order to solve the above technical problem, the embodiments of the present application are implemented as follows:
The embodiment of the application provides a method for detecting attention based on facial orientation, facial expression and pupil tracking, which comprises the following steps:
Acquiring a target image frame containing a head portrait of an evaluated person;
extracting a face image of the person to be evaluated from the target image frame;
carrying out normalization processing on the pixel values of the face image;
performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained face key point detection model to obtain a plurality of key point coordinates in the face image, wherein the plurality of key point coordinates comprise eye contour coordinates;
determining the face orientation angle of the person to be evaluated according to the plurality of key point coordinates, the preset camera internal parameter matrix and the preset camera distortion parameters;
extracting an eye region image of the person to be evaluated from the face image based on the eye contour coordinates;
determining the eye pupil position of the person to be evaluated according to the eye region image;
determining a pupil deflection parameter of the subject based on the eye region image and the eye pupil position;
performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the person to be evaluated, wherein the first expression parameter characterizes the positive/negative degree of the person to be evaluated, and the second expression parameter characterizes the wakefulness/drowsiness degree of the person to be evaluated;
and determining an attention parameter corresponding to the person to be evaluated based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter, wherein the attention parameter characterizes the degree of attention concentration of the person to be evaluated.
Optionally, the format of the target image frame is an RGB format, and the extracting the face image of the person under evaluation from the target image frame includes:
converting the channel order of the target image frame into RGB;
scaling the target image frame whose channel order has been converted into RGB to a first specified size;
normalizing the target image frame scaled to the first specified size;
performing computation by taking a matrix formed by the pixel values of the pixel points of the normalized target image frame as the input of a pre-trained face detection model to obtain face boundary point coordinates;
and extracting the face image from the target image frame based on the face boundary point coordinates.
Optionally, the face boundary point coordinates include first boundary point coordinates and second boundary point coordinates, and the extracting the face image from the target image frame based on the face boundary point coordinates includes:
and taking the first boundary point coordinates and the second boundary point coordinates as diagonally opposite corners, extracting the rectangular face image from the target image frame.
Optionally, the method further comprises:
scaling the face image to a second specified size;
the normalizing the pixel value of the face image includes:
and normalizing the pixel values of the face image scaled to the second specified size.
Optionally, the determining the face orientation angle of the person under evaluation according to the coordinates of the plurality of key points, the preset camera internal parameter matrix and the preset camera distortion parameter includes:
determining a plurality of head three-dimensional key point reference coordinates corresponding to the plurality of key point coordinates one by one according to the plurality of key point coordinates, the preset camera internal parameter matrix and the preset camera distortion parameters;
determining a rotation vector and a translation vector of the camera according to the preset camera internal parameter matrix, the preset camera distortion parameters, target key point coordinates and target head three-dimensional key point reference coordinates;
converting the rotation vector into a rotation matrix;
splicing the rotation matrix with the translation vector to obtain a pose matrix;
decomposing the pose matrix to obtain the face orientation angle of the person to be evaluated;
wherein the target key point coordinates are one of the plurality of key point coordinates, and the target head three-dimensional key point reference coordinates are the head three-dimensional key point reference coordinates corresponding to the target key point coordinates.
Optionally, the determining the position of the pupil of the eye of the person under evaluation according to the eye region image includes:
determining a first face image with pupil ratio in a preset ratio range in the face image corresponding to the target image frame, wherein the pupil ratio is the ratio of the pupil area to the eye area;
when the number of the first face images reaches a preset number, calculating the average value of the pupil ratio of each first face image;
finding out a target pupil ratio closest to the average value among the pupil ratios of the first face images;
selecting a first face image corresponding to the target pupil ratio as a target face image;
and taking the center of the pupil area of the eyes in the target face image as the pupil position.
Optionally, the determining the first face image with the pupil ratio in the range of the preset ratio in the face image corresponding to the target image frame includes:
obtaining a circumscribed rectangular region based on the eye contour coordinates in the face image corresponding to the target image frame;
expanding the circumscribed rectangular region outwards by a specified number of pixels;
performing bilateral filtering on the expanded circumscribed rectangular region and then performing an erosion operation to obtain an eroded image;
performing binarization processing on the eroded image to obtain a binarized image;
shrinking the binarized image inwards by the specified number of pixels to obtain a shrunk image;
calculating the proportion of non-zero pixel values in the shrunk image to obtain the pupil ratio of the face image;
and taking the face image with the pupil ratio within the preset ratio range as the first face image.
The above technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:
the face orientation angle of the person under evaluation, the pupil deflection parameter of the person under evaluation, the first expression parameter characterizing the positive/negative degree of the person under evaluation and the second expression parameter characterizing the wakefulness/drowsiness degree of the person under evaluation are determined, and the attention parameter corresponding to the person under evaluation is then determined based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter. In this way, attention detection is performed by comprehensively considering attention-related parameters from different dimensions, so that the degree of attention concentration of the person under evaluation can be accurately evaluated, with high robustness.
Detailed Description
In order to make the purposes, technical solutions and advantages of this document clearer, the technical solutions of this document will be clearly and completely described below with reference to specific embodiments of this document and the corresponding drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of this document. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this document without creative effort shall fall within the scope of protection of this document.
In order to ensure the accuracy of attention assessment, the embodiment of the application provides an attention detection method based on facial orientation, facial expression and pupil tracking, which can accurately assess the attention concentration degree and has high robustness.
The following describes in detail the attention detection method based on the face orientation, the facial expression and the pupil tracking provided in the embodiment of the present application.
The attention detection method based on facial orientation, facial expression and pupil tracking provided by the embodiment of the application can be applied to a user terminal, wherein the user terminal can be, but is not limited to, a personal computer, a smart phone, a tablet computer, a laptop portable computer, a personal digital assistant and the like.
It is understood that the execution bodies do not constitute limitations on the embodiments of the present application.
Optionally, the flow of the method for detecting the attention based on the face orientation, the facial expression and the pupil tracking is shown in fig. 1, and may include the following steps:
step S101, a target image frame including a head portrait of the subject to be evaluated is acquired.
The target image frame may be a video acquired by a camera, or may be a picture shot by a camera, and the target image frame may include one or more frames of images, which is not specifically limited in the embodiment of the present application.
Step S102, extracting face images of the evaluated person from the target image frames.
If the target image frame is a frame of image, the face image of the person to be evaluated can be directly extracted according to the target image frame. If the target image frame includes multiple frames of images, when the face image of the person to be evaluated is extracted from the target image frame, each frame in the multiple frames of images may be extracted to obtain multiple face images corresponding to the multiple frames of images one by one, or one frame in the multiple frames of images may be extracted to obtain one face image, or one frame may be selected from the multiple frames of images at intervals of a certain frame number to be extracted to obtain multiple face images.
The target image frame is not limited in format, and taking the target image frame as an image in an RGB format as an example, the step of extracting the face image of the subject to be evaluated from the target image frame may include the steps of:
step S1021, sequentially converts the image channels of the target image frame into RGB.
In step S1022, the target image frame after the image channel is sequentially converted into RGB is scaled to the first specified size.
The first specified size may be determined according to a face detection model that is used to determine coordinates of a boundary point of a face, for example, an input of the face detection model is a 300×300 matrix, and the first specified size may be 300×300 pixels.
Step S1023, the target image frame scaled to the first specified size is normalized.
Specifically, 127.5 may be subtracted from the pixel value of each pixel in the target image frame scaled to the first specified size, and the result may then be divided by 127.5, so that the pixel values are distributed within the interval [-1, 1] for use in subsequent operations.
Step S1024, the matrix formed by the pixel values of the pixel points of the normalized target image frame is used as the input of the pre-trained face detection model for computation, and the face boundary point coordinates are obtained.
In the embodiment of the present application, a face detection model for face detection is established in advance. After the normalization processing, a matrix can be constructed by taking the pixel value of each pixel point in the normalized target image frame as one element of the matrix. The matrix is then used as the input of the pre-trained face detection model for computation to obtain the face boundary point coordinates.
For example, if the target image frame of the normalization process is an image frame with a size of 300×300 pixels, the constructed matrix is a 300×300 matrix.
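By way of a non-limiting illustration, steps S1021 to S1024 may be sketched in Python with OpenCV and NumPy as follows; the input is assumed to be a BGR camera frame, and face_detector in the commented usage stands for the pre-trained face detection model, whose interface is assumed rather than prescribed.

```python
import cv2
import numpy as np

def preprocess_for_face_detection(frame_bgr, size=300):
    """Steps S1021 to S1023: convert the channel order to RGB, scale to the
    first specified size and normalize the pixel values to [-1, 1]."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (size, size))
    return (resized.astype(np.float32) - 127.5) / 127.5

# Step S1024 (hypothetical usage): 'face_detector' stands for the pre-trained
# face detection model, assumed to accept the normalized 300x300 matrix and
# return the face boundary point coordinates.
# boundary_points = face_detector(preprocess_for_face_detection(frame_bgr))
```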
In step S1025, a face image is extracted from the target image frame based on the face boundary point coordinates.
In this embodiment of the present application, the face boundary point coordinates include a first boundary point coordinate and a second boundary point coordinate. When the face image is extracted, the first boundary point coordinate and the second boundary point coordinate may be taken as diagonally opposite corners, and a rectangular face image may be extracted from the target image frame.
For example, if the first boundary point coordinates are (x_min, y_min) and the second boundary point coordinates are (x_max, y_max), the rectangular region of the target image frame whose horizontal coordinates range from x_min to x_max and whose vertical coordinates range from y_min to y_max can be used as the face image.
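For illustration only, assuming the boundary point coordinates are integer pixel indices into the target image frame (a NumPy array), step S1025 may look like:

```python
def crop_face(frame, x_min, y_min, x_max, y_max):
    """Step S1025: extract the rectangular face image whose diagonally opposite
    corners are the first boundary point (x_min, y_min) and the second
    boundary point (x_max, y_max)."""
    return frame[y_min:y_max, x_min:x_max]
```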
Step S103, carrying out normalization processing on the pixel values of the face image.
Specifically, the pixel value of each pixel point in the face image may be divided by 256, so that the pixel values of the normalized face image are distributed within the interval [0, 1], which facilitates the subsequent calculation of the key point coordinates in the face image.
Furthermore, before the normalization processing is performed on the pixel values of the face image, the face image may be scaled to a second specified size, where the second specified size may be determined according to a face key point detection model that is used to calculate key point coordinates in the face image. For example, the input requirement of the face key point detection model is a 112×112 matrix, and the second specified size may be 112×112 pixel size.
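A minimal sketch of this preprocessing, assuming OpenCV/NumPy and the 112×112 input size mentioned above:

```python
import cv2
import numpy as np

def preprocess_for_keypoint_model(face_image, size=112):
    """Scale the face image to the second specified size and normalize its
    pixel values to [0, 1] by dividing by 256, as described above."""
    resized = cv2.resize(face_image, (size, size))
    return resized.astype(np.float32) / 256.0
```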
Step S104, performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained face key point detection model to obtain a plurality of key point coordinates in the face image.
In the embodiment of the application, a face key point detection model for calculating the coordinates of the face key points is trained in advance, wherein the face key points can be, but are not limited to, eyes, ears, nose and other areas on the face.
When the coordinates of the face key points are calculated, a matrix can be constructed by taking the pixel value of each pixel point of the normalized face image as one element of the matrix. The matrix is then used as the input of the pre-trained face key point detection model for computation to obtain the plurality of key point coordinates in the face image.
The plurality of key point coordinates comprise eye contour coordinates.
Step S105, the face orientation angle of the person to be evaluated is determined according to the plurality of key point coordinates, the preset camera internal parameter matrix and the preset camera distortion parameters.
The preset camera internal parameter matrix refers to the internal parameter matrix of the camera that acquires the target image frame, and the preset camera distortion parameters refer to the distortion parameters of that camera. Both are preset and may be set differently for cameras from different manufacturers.
In the embodiment of the present application, determining the face orientation angle of the evaluated person may include the following steps:
step S1051, determining a plurality of head three-dimensional key point reference coordinates corresponding to the plurality of key point coordinates one by one according to the plurality of key point coordinates, the preset camera internal parameter matrix and the preset camera distortion parameters.
In the embodiment of the present application, the head three-dimensional key point reference coordinates can be determined through OpenCV, which is prior art and is not described in detail in the embodiments of the present application.
Step S1052, determining a rotation vector and a translation vector of the camera according to the preset camera internal parameter matrix, the preset camera distortion parameters, the target key point coordinates and the target head three-dimensional key point reference coordinates.
The target key point coordinates are one of the plurality of key point coordinates, and the target head three-dimensional key point reference coordinates are the head three-dimensional key point reference coordinates corresponding to the target key point coordinates.
Specifically, the function solvePnP may take the target key point coordinates, the target head three-dimensional key point reference coordinates, the preset camera internal parameter matrix and the preset camera distortion parameters as inputs, and solve for the rotation vector and the translation vector of the camera.
In step S1053, the rotation vector is converted into a rotation matrix.
During the conversion, the rotation vector can be converted into a rotation matrix by using the function Rodrigues, which is not described in detail in the embodiments of the present application.
Step S1054, the rotation matrix and the translation vector are spliced to obtain a pose matrix.
For example, if the rotation matrix is a 3×3 matrix and the translation vector is a 3-dimensional column vector, the spliced pose matrix is a 3×4 matrix.
In step S1055, the pose matrix is decomposed to obtain the face orientation angle of the subject.
Wherein the face orientation angle includes a pitch angle, a yaw angle, and a roll angle.
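One possible realization of steps S1052 to S1055 with the OpenCV functions mentioned above is sketched below; the argument names are illustrative, with object_points standing for the target head three-dimensional key point reference coordinates and image_points for the corresponding target key point coordinates.

```python
import cv2
import numpy as np

def face_orientation_angles(object_points, image_points, camera_matrix, dist_coeffs):
    """Estimate pitch, yaw and roll (in degrees) from matched 3D head reference
    points and 2D face key points, following steps S1052 to S1055."""
    # Step S1052: solve for the rotation and translation vectors of the camera.
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
    # Step S1053: convert the rotation vector into a 3x3 rotation matrix.
    rotation_matrix, _ = cv2.Rodrigues(rvec)
    # Step S1054: splice the rotation matrix and the translation vector into a
    # 3x4 pose matrix.
    pose_matrix = np.hstack((rotation_matrix, tvec))
    # Step S1055: decompose the pose matrix; the last return value holds the
    # Euler angles (pitch, yaw, roll) in degrees.
    euler_angles = cv2.decomposeProjectionMatrix(pose_matrix)[-1]
    pitch, yaw, roll = euler_angles.flatten()
    return pitch, yaw, roll
```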
Step S106, extracting an eye region image of the person to be evaluated from the face image based on the eye contour coordinates.
In the embodiment of the present application, a coordinate index can be set for the coordinates corresponding to each key point in the face image. After the eye contour coordinates are obtained, rough cropping can be performed according to the coordinate indices of the eye contour coordinates to obtain the eye region image of the person to be evaluated. The coordinate indices of the eye contour coordinates include the coordinate indices of the left eye contour coordinates and the coordinate indices of the right eye contour coordinates, and the obtained eye region image accordingly includes a left eye region image and a right eye region image.
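By way of illustration (which contour points belong to the left eye and which to the right eye depends on the key point model used), the rough cropping may be sketched as:

```python
def crop_eye_region(face_image, eye_contour_points, margin=0):
    """Roughly crop an eye region image from the face image using the bounding
    box of the given eye contour coordinates (left or right eye)."""
    xs = [int(x) for x, _ in eye_contour_points]
    ys = [int(y) for _, y in eye_contour_points]
    x0, x1 = max(min(xs) - margin, 0), max(xs) + margin
    y0, y1 = max(min(ys) - margin, 0), max(ys) + margin
    return face_image[y0:y1, x0:x1]
```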
Step S107, determining the pupil position of the eyes of the evaluated person according to the eye area image.
In an embodiment of the present application, determining the pupil position of the eye of the subject may include the following steps:
step S1071, determining a first face image with a pupil ratio in a preset ratio range in the face images corresponding to the target image frames.
Wherein, the pupil ratio is the ratio of the pupil area to the eye area.
Specifically, a face image corresponding to one frame of image may be selected from the target image frames at an interval of a certain number of frames, and the corresponding circumscribed rectangular region may be obtained according to the eye contour coordinates.
For example, if the interval is 5 frames, the 5th frame image in the target image frames may be selected first, and the corresponding circumscribed rectangular region may be obtained according to the eye contour coordinates in the face image corresponding to the 5th frame image.
The determined circumscribed rectangular region is then expanded outwards by a specified number of pixels, for example 5 pixels. Bilateral filtering is performed on the expanded circumscribed rectangular region, followed by an erosion operation, to obtain an eroded image. Binarization is then performed on the eroded image to obtain a binarized image. The binarized image is shrunk inwards by the specified number of pixels to obtain a shrunk image, where the number of shrunk pixels is the same as the number of outwardly expanded pixels. The proportion of non-zero pixel values in the shrunk image is calculated to obtain the pupil ratio of the face image. If the pupil ratio of the face image is within the preset ratio range, the face image is taken as the first face image; otherwise, another frame of image is selected from the target image frames at an interval of a certain number of frames (such as 5 frames) to calculate the pupil ratio of its face image.
The preset ratio range may be determined according to the ratio of the pupil to the eye region of an ordinary person. For example, if the ratio of the pupil of an ordinary person to the eye region is 0.46 to 0.50, the preset ratio range may be 0.46 to 0.50.
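A minimal sketch of the pupil ratio computation for a single eye is given below, assuming OpenCV/NumPy, an 8-bit eye image and an illustrative binarization threshold (the embodiment does not specify the threshold or its direction; a dark pupil is assumed here):

```python
import cv2
import numpy as np

def pupil_ratio(face_image, eye_rect, expand=5, threshold=60, ratio_range=(0.46, 0.50)):
    """Compute the pupil ratio of one eye and report whether it lies within the
    preset ratio range. 'eye_rect' is the circumscribed rectangle (x, y, w, h)
    of the eye contour; 'expand' and 'threshold' are illustrative values."""
    x, y, w, h = eye_rect
    # Expand the circumscribed rectangular region outwards by 'expand' pixels.
    roi = face_image[max(y - expand, 0):y + h + expand, max(x - expand, 0):x + w + expand]
    gray = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY) if roi.ndim == 3 else roi
    # Bilateral filtering followed by an erosion operation.
    filtered = cv2.bilateralFilter(gray, 9, 75, 75)
    eroded = cv2.erode(filtered, np.ones((3, 3), np.uint8), iterations=1)
    # Binarization: dark pupil pixels become non-zero (assumed dark-pupil case).
    _, binary = cv2.threshold(eroded, threshold, 255, cv2.THRESH_BINARY_INV)
    # Shrink inwards by the same number of pixels as the expansion.
    shrunk = binary[expand:-expand, expand:-expand]
    ratio = np.count_nonzero(shrunk) / shrunk.size
    return ratio, ratio_range[0] <= ratio <= ratio_range[1]
```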
Step S1072, when the number of the first face images reaches the preset number, the average value of the pupil ratios of the first face images is calculated.
After each first face image is determined, it can be judged whether the number of first face images has reached the preset number. If the preset number has not been reached, another frame of image is selected from the target image frames at an interval to calculate the pupil ratio of its face image, until the number of first face images reaches the preset number. The preset number can be set according to actual conditions.
When the number of the first face images reaches the preset number, the average value of the pupil ratio of each first face image is calculated.
Step S1073, finding out the target pupil ratio closest to the mean value in the pupil ratios of the first face images.
After the average value of the pupil ratios of the first face images is obtained, the pupil ratio closest to the average value can be found among the pupil ratios corresponding to the first face images and used as the target pupil ratio.
Step S1074, selecting a first face image corresponding to the target pupil ratio as a target face image.
Step S1075, the center of the pupil area of the eye in the target face image is set as the pupil position.
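Steps S1072 to S1075 may be sketched as follows; the function and argument names are illustrative, and the centroid-based pupil position in pupil_center is one possible way of taking the center of the pupil area, not the only one.

```python
import cv2

def select_target_face(first_face_images, pupil_ratios):
    """Steps S1072 to S1074: once the preset number of first face images has
    been collected, pick the one whose pupil ratio is closest to the mean."""
    mean_ratio = sum(pupil_ratios) / len(pupil_ratios)
    target_index = min(range(len(pupil_ratios)),
                       key=lambda i: abs(pupil_ratios[i] - mean_ratio))
    return first_face_images[target_index]

def pupil_center(binary_pupil_image):
    """Step S1075, one possible realization: take the centroid of the non-zero
    pupil pixels in the binarized eye image as the pupil position."""
    m = cv2.moments(binary_pupil_image, binaryImage=True)
    if m["m00"] == 0:
        return None
    return m["m10"] / m["m00"], m["m01"] / m["m00"]
```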
And step S108, determining pupil deflection parameters of the evaluated person based on the eye region image and the eye pupil position.
The pupil deflection parameter may be a distance of left/right deflection of the pupil, or may be a proportion of left/right deflection of the pupil, which is not specifically limited in the embodiment of the present application.
Specifically, the distances from the pupil to the two eye corners can be calculated according to the key point coordinates of the eye region and the coordinates corresponding to the pupil position, and the pupil deflection parameter of the person to be evaluated can then be determined according to these two distances.
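For illustration, one simple choice of pupil deflection parameter is the ratio of the two corner-to-pupil distances, which is close to 1 when the pupil is centred and is consistent with the scoring formula ELS = 100 - abs(1 - LR) × 50 described later; the function below is a sketch under that assumption.

```python
import math

def pupil_deflection_ratio(inner_corner, outer_corner, pupil_position):
    """Illustrative pupil deflection parameter: the ratio of the distances from
    the pupil to the two eye corners; a value near 1 means the pupil sits
    roughly midway between the corners."""
    d_inner = math.dist(pupil_position, inner_corner)
    d_outer = math.dist(pupil_position, outer_corner)
    return d_inner / d_outer if d_outer else float("inf")
```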
Step S109, performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the person to be evaluated.
In this embodiment of the present application, a facial emotion recognition model for facial emotion recognition is trained in advance. After the pixel values of the face image are normalized, a matrix may be constructed by taking the pixel value of each pixel point of the normalized face image as one element of the matrix, and the matrix is then used as the input of the pre-trained facial emotion recognition model for computation to obtain the first expression parameter and the second expression parameter corresponding to the person to be evaluated. The first expression parameter characterizes the positive/negative degree of the person to be evaluated, and the second expression parameter characterizes the wakefulness/drowsiness degree of the person to be evaluated.
Step S110, determining the attention parameter corresponding to the person to be evaluated based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter.
Wherein the attention parameter characterizes a degree of concentration of the subject.
The face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter can each reflect, to a certain extent, whether the person under evaluation is concentrating. Therefore, when determining the attention parameter corresponding to the person under evaluation, a score may be assigned to each of the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter. A final score corresponding to the person under evaluation is then determined from the sum of these scores, and the degree of attention concentration of the person under evaluation is obtained from the final score.
Specifically, the total score of the face orientation angle may be set to 100, and the score corresponding to the face orientation angle may be expressed as SS = 100 - pitch × a - yaw × b - roll × c, where pitch represents the degree of the pitch angle in the face orientation angle, yaw represents the degree of the yaw angle, and roll represents the degree of the roll angle. When the absolute value of pitch is less than or equal to 15, the value of a is 0.8; when the absolute value of pitch is greater than 15, the value of a is 1.5. When the absolute value of yaw is less than or equal to 15, the value of b is 0.8; when the absolute value of yaw is greater than 15, the value of b is 1.5. When the absolute value of roll is less than or equal to 15, the value of c is 0.8; when the absolute value of roll is greater than 15, the value of c is 1.5.
The score corresponding to the pupil deflection parameter may be the minimum of the score corresponding to the pupil deflection parameter of the left eye and the score corresponding to the pupil deflection parameter of the right eye. The score corresponding to the pupil deflection parameter of the left eye may be expressed as ELS = 100 - abs(1 - LR) × 50, where LR represents the pupil deflection parameter of the left eye. The score corresponding to the pupil deflection parameter of the right eye may be expressed as ERS = 100 - abs(1 - RR) × 50, where RR represents the pupil deflection parameter of the right eye. The score corresponding to the pupil deflection parameter is denoted as EMS, and EMS is the minimum of ELS and ERS.
The first expression parameter characterizes the positive/negative degree of the person under evaluation, where a value greater than 0 indicates a positive state and a value less than 0 indicates a negative state; it is denoted as V. The second expression parameter characterizes the wakefulness/drowsiness degree of the person under evaluation, where a value greater than 0 indicates a wakeful state and a value less than 0 indicates a drowsy state; it is denoted as A. The score corresponding to the first expression parameter may be expressed as SV = 10 × (1 + V), and the score corresponding to the second expression parameter may be expressed as SA = 10 × (1 + A).
The final score of the person under evaluation is FS = (SS + EMS + SA + SV) / 200. The final score FS characterizes the degree of attention concentration of the person under evaluation, where a larger final score FS indicates a higher degree of attention concentration.
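Putting the example scoring scheme together, a sketch of the final score computation might look like the following; the parameter names are illustrative, and since the text does not state whether the orientation angles enter the SS formula as signed or absolute values, they are used here as written.

```python
def attention_score(pitch, yaw, roll, lr, rr, v, a):
    """Combine the face orientation angle, the pupil deflection parameters and
    the two expression parameters into the final score FS, following the
    example scoring scheme described above."""
    def weight(angle):
        # Weight 0.8 while the angle stays within 15 degrees, 1.5 beyond that.
        return 0.8 if abs(angle) <= 15 else 1.5

    # SS: face orientation score (angles used as written; absolute values may be intended).
    ss = 100 - pitch * weight(pitch) - yaw * weight(yaw) - roll * weight(roll)
    # EMS: pupil deflection score, the smaller of the left-eye and right-eye scores.
    els = 100 - abs(1 - lr) * 50
    ers = 100 - abs(1 - rr) * 50
    ems = min(els, ers)
    # SV / SA: expression scores from valence V and arousal A.
    sv = 10 * (1 + v)
    sa = 10 * (1 + a)
    # FS: final attention score; larger means more concentrated.
    return (ss + ems + sa + sv) / 200
```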
According to the attention detection method based on facial orientation, facial expression and pupil tracking provided by the embodiments of the present application, the face orientation angle of the person to be evaluated, the pupil deflection parameter of the person to be evaluated, the first expression parameter characterizing the positive/negative degree of the person to be evaluated and the second expression parameter characterizing the wakefulness/drowsiness degree of the person to be evaluated are determined, and the attention parameter corresponding to the person to be evaluated is then determined based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter. In this way, attention detection is performed by comprehensively considering attention-related parameters from different dimensions, so that the degree of attention concentration of the person to be evaluated can be accurately evaluated, with high robustness.
Fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 2, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface and a memory. The memory may include a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required by other services.
The processor, the network interface and the memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, among others. The buses may be classified into address buses, data buses, control buses and the like. For ease of illustration, only one bi-directional arrow is shown in fig. 2, but this does not mean that there is only one bus or only one type of bus.
The memory is used for storing a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include an internal memory and a non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the attention detection device based on the face orientation, the facial expression and the pupil tracking on a logic level. The processor is used for executing the programs stored in the memory and is specifically used for executing the following operations:
acquiring a target image frame containing a head portrait of an evaluated person;
Extracting a face image of the person to be evaluated from the target image frame;
carrying out normalization processing on the pixel values of the face image;
performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained face key point detection model to obtain a plurality of key point coordinates in the face image, wherein the plurality of key point coordinates comprise eye contour coordinates;
determining the face orientation angle of the person to be evaluated according to the plurality of key point coordinates, the preset camera internal parameter matrix and the preset camera distortion parameters;
extracting an eye region image of the person to be evaluated from the face image based on the eye contour coordinates;
determining the eye pupil position of the person to be evaluated according to the eye region image;
determining a pupil deflection parameter of the subject based on the eye region image and the eye pupil position;
performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the person to be evaluated, wherein the first expression parameter characterizes the positive/negative degree of the person to be evaluated, and the second expression parameter characterizes the wakefulness/drowsiness degree of the person to be evaluated;
and determining an attention parameter corresponding to the person to be evaluated based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter, wherein the attention parameter characterizes the degree of attention concentration of the person to be evaluated.
The method performed by the attention detection device based on facial orientation, facial expression and pupil tracking as disclosed in the embodiment shown in fig. 2 of the present application may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The disclosed methods, steps, and logic blocks in one or more embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with one or more embodiments of the present application may be embodied directly in a hardware decoding processor or in a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The electronic device may further execute the method of the embodiment shown in fig. 1, and implement the functions of the attention detection device of the embodiment shown in fig. 1 based on facial orientation, facial expression and pupil tracking, which are not described herein.
Of course, other implementations, such as a logic device or a combination of hardware and software, are not excluded from the electronic device of the present application, that is, the execution subject of the following processing flow is not limited to each logic unit, but may be hardware or a logic device.
The embodiments of the present application also provide a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment shown in fig. 1, and in particular to perform the following operations:
acquiring a target image frame containing a head portrait of an evaluated person;
extracting a face image of the person to be evaluated from the target image frame;
carrying out normalization processing on the pixel values of the face image;
performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained face key point detection model to obtain a plurality of key point coordinates in the face image, wherein the plurality of key point coordinates comprise eye contour coordinates;
Determining the face orientation angle of the person to be evaluated according to the plurality of key point coordinates, the preset camera internal parameter matrix and the preset camera distortion parameters;
extracting an eye region image of the person to be evaluated from the face image based on the eye contour coordinates;
determining the eye pupil position of the person to be evaluated according to the eye region image;
determining a pupil deflection parameter of the subject based on the eye region image and the eye pupil position;
performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the person to be evaluated, wherein the first expression parameter characterizes the positive/negative degree of the person to be evaluated, and the second expression parameter characterizes the wakefulness/drowsiness degree of the person to be evaluated;
and determining an attention parameter corresponding to the person to be evaluated based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter, wherein the attention parameter characterizes the degree of attention concentration of the person to be evaluated.
Fig. 3 is a schematic structural diagram of an attention detection device based on facial orientation, facial expression and pupil tracking according to an embodiment of the present application. Referring to fig. 3, in a software embodiment, the provided attention detection device based on facial orientation, facial expression and pupil tracking may include:
an acquisition module for acquiring a target image frame containing a head portrait of an evaluated person;
a first extraction module, configured to extract a face image of the person under evaluation from the target image frame;
the normalization module is used for carrying out normalization processing on the pixel values of the face image;
the first operation module is used for performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained face key point detection model to obtain a plurality of key point coordinates in the face image, wherein the plurality of key point coordinates comprise eye contour coordinates;
the first determining module is used for determining the face orientation angle of the person to be evaluated according to the plurality of key point coordinates, the preset camera internal parameter matrix and the preset camera distortion parameters;
the second extraction module is used for extracting an eye region image of the person to be evaluated from the face image based on the eye contour coordinates;
The second determining module is used for determining the eye pupil position of the evaluated person according to the eye area image;
a third determining module for determining a pupil deflection parameter of the subject based on the eye region image and the eye pupil position;
the second operation module is used for performing computation by taking the pixel values of the pixel points of the normalized face image as the input of a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the person to be evaluated, wherein the first expression parameter characterizes the positive/negative degree of the person to be evaluated, and the second expression parameter characterizes the wakefulness/drowsiness degree of the person to be evaluated;
and a fourth determining module, configured to determine an attention parameter corresponding to the person under evaluation based on the face orientation angle, the pupil deflection parameter, the first expression parameter, and the second expression parameter, where the attention parameter characterizes the attention concentration of the person under evaluation.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In summary, the foregoing description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The embodiments in the present application are described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant parts, reference may be made to the description of the method embodiments.