CN110059624B

Movatterモバイル変換

Info

Publication number: CN110059624B
Application number: CN201910312194.1A
Authority: CN
Inventors: 王旭
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Douyin Vision Co Ltd; Douyin Vision Beijing Co Ltd
Priority date: 2019-04-18
Filing date: 2019-04-18
Publication date: 2021-10-08
Anticipated expiration: 2039-04-18
Also published as: CN110059624A

Abstract

Embodiments of the present disclosure disclose methods and devices for detecting a living body. One embodiment of the method comprises: extracting adjacent video frames from a video frame sequence corresponding to a target face video to serve as a first face image and a second face image; determining face key points from the first face image as first face key points; determining face key points corresponding to the first face key points from the second face image as second face key points; and generating a detection result for indicating whether the face corresponding to the target face video is a living face or not based on the distance between the first face key point and the second face key point. This embodiment can improve the efficiency of in vivo detection; and moreover, the living body detection is carried out based on the key points of the human face, so that the consumption of a CPU (central processing unit) in the living body detection process is reduced.

Description

Method and apparatus for detecting living body

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for detecting a living body.

Background

With the development of face recognition technology, people can realize account login, payment, unlocking and the like through face brushing. This brings convenience to people's lives, but at the same time there are risks. The machine can recognize a face, but of course, also a face image. Therefore, there may be a risk that an illegal person disguises the identity using the face image of another person.

Currently, in order to reduce the above risk, the living body detection is generally performed on the target human face object.

Disclosure of Invention

Embodiments of the present disclosure propose methods and apparatuses for detecting a living body.

In a first aspect, embodiments of the present disclosure provide a method for detecting a living body, the method including: extracting adjacent video frames from a video frame sequence corresponding to a target face video to serve as a first face image and a second face image; determining face key points from the first face image as first face key points; determining face key points corresponding to the first face key points from the second face image as second face key points; and generating a detection result for indicating whether the face corresponding to the target face video is a living face or not based on the distance between the first face key point and the second face key point.

In some embodiments, determining face keypoints from the first face image as first face keypoints comprises: and inputting the first face image into a pre-trained face key point recognition model to obtain a face key point serving as the first face key point.

In some embodiments, determining face keypoints corresponding to the first face keypoints from the second face image as second face keypoints comprises: and inputting the second face image into the face key point recognition model to obtain face key points serving as second face key points.

In some embodiments, generating a detection result indicating whether the face corresponding to the target face video is a live face based on the distance between the first face key point and the second face key point includes: determining whether the distance between the first face key point and the second face key point is greater than or equal to a preset threshold value; and generating a detection result for indicating that the face corresponding to the target face video is the living body face in response to the fact that the face is determined to be larger than or equal to the preset threshold value.

In some embodiments, generating a detection result indicating whether the face corresponding to the target face video is a live face based on the distance between the first face key point and the second face key point further includes: and generating a detection result for indicating that the face corresponding to the target face video is a non-living body face in response to the fact that the distance between the first face key point and the second face key point is smaller than a preset threshold value.

In some embodiments, the method further comprises: acquiring an initial result generated in advance aiming at a target face video, wherein the initial result is used for indicating whether a face corresponding to the target face video is a living face or not; and generating a final result for indicating whether the face corresponding to the target face video is the living body face or not based on the initial result and the detection result.

In some embodiments, the method further comprises: and sending the final result to the electronic equipment in communication connection, and controlling the electronic equipment to present the final result.

In some embodiments, the method further comprises: and in response to the fact that the detection result indicates that the face corresponding to the target face video is a living body face, detecting a video frame sequence corresponding to the target face video based on an optical flow method, and generating a final result for indicating whether the face corresponding to the target face video is the living body face.

In a second aspect, embodiments of the present disclosure provide an apparatus for detecting a living body, the apparatus including: an extraction unit configured to extract adjacent video frames from a video frame sequence corresponding to a target face video as a first face image and a second face image; a first determination unit configured to determine face key points from a first face image as first face key points; a second determination unit configured to determine, as second face key points, face key points corresponding to the first face key points from the second face image; and the first generation unit is configured to generate a detection result used for indicating whether the face corresponding to the target face video is the living face or not based on the distance between the first face key point and the second face key point.

In some embodiments, the first determination unit is further configured to: and inputting the first face image into a pre-trained face key point recognition model to obtain a face key point serving as the first face key point.

In some embodiments, the second determination unit is further configured to: and inputting the second face image into the face key point recognition model to obtain face key points serving as second face key points.

In some embodiments, the first generation unit includes: a determining module configured to determine whether a distance between the first face key point and the second face key point is greater than or equal to a preset threshold; and the first generation module is configured to generate a detection result for indicating that the face corresponding to the target face video is the living body face in response to the determination that the face is greater than or equal to the preset threshold value.

In some embodiments, the first generation unit further comprises: and the second generation module is configured to generate a detection result for indicating that the face corresponding to the target face video is a non-living face in response to determining that the distance between the first face key point and the second face key point is smaller than a preset threshold value.

In some embodiments, the apparatus further comprises: the acquisition unit is configured to acquire an initial result generated in advance aiming at the target face video, wherein the initial result is used for indicating whether a face corresponding to the target face video is a living body face or not; and a second generating unit configured to generate a final result indicating whether the face corresponding to the target face video is a live face based on the initial result and the detection result.

In some embodiments, the apparatus further comprises: a sending unit configured to send the final result to the communicatively connected electronic device and to control the electronic device to present the final result.

In some embodiments, the apparatus further comprises: and a second generation unit configured to detect the video frame sequence corresponding to the target face video based on an optical flow method in response to determining that the detection result indicates that the face corresponding to the target face video is a live face, and generate a final result indicating whether the face corresponding to the target face video is a live face.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method of any of the embodiments of the method for detecting a face of a living subject described above.

In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium, on which a computer program is stored, which when executed by a processor, implements the method of any of the above-described methods for detecting a face of a living subject.

The method and the device for detecting the living body provided by the embodiments of the present disclosure extract adjacent video frames from a video frame sequence corresponding to a target face video as a first face image and a second face image, then determine face key points from the first face image as first face key points, then determine face key points corresponding to the first face key points from the second face image as second face key points, and finally generate a detection result for indicating whether a face corresponding to the target face video is a living body face based on distances between the first face key points and the second face key points, so as to determine whether the face corresponding to the face video is a living body face based on distances between the corresponding face key points in the adjacent face images corresponding to the face video, it can be understood that when a distance or a distance generated between the corresponding face key points is greater than or equal to a preset threshold value, the method can determine that the face corresponding to the face video executes the action, and further determine that the face corresponding to the face video is a living body face, so that more convenient living body detection can be realized, and the efficiency of the living body detection is improved; in addition, the living body detection is carried out based on the face key points of the face image, so that the detection complexity can be reduced, and the consumption of a CPU (central processing unit) in the living body detection process can be reduced; in addition, the method provided by the disclosure can be used for verifying the initial results generated by other in-vivo detection methods in advance, so that the accuracy of in-vivo detection is improved; in addition, the method can be used as a preprocessing step of the living body detection in the prior art, so that the human face video to be subjected to the living body detection in the prior art can be screened, and the method is favorable for improving the efficiency of the living body detection.

Drawings

Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for detecting a face of a living subject according to the present disclosure;

FIG. 3 is a schematic diagram of an application scenario of a method for detecting a live face according to an embodiment of the present disclosure;

FIG. 4 is a flow diagram of yet another embodiment of a method for detecting a face of a living subject according to the present disclosure;

FIG. 5 is a schematic diagram of an embodiment of an apparatus for detecting a face of a living subject according to the present disclosure;

FIG. 6 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 shows anexemplary system architecture 100 to which embodiments of the method for detecting a living body or the apparatus for detecting a living body of the present disclosure may be applied.

As shown in fig. 1, thesystem architecture 100 may include

terminal devices

101, 102, 103, anetwork 104, and aserver 105. Thenetwork 104 serves as a medium for providing communication links between the

terminal devices

101, 102, 103 and theserver 105.Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the

terminal devices

101, 102, 103 to interact with theserver 105 via thenetwork 104 to receive or send messages or the like. Various communication client applications, such as payment-type software, shopping-type application, video processing-type application, search-type application, instant messaging tool, mailbox client, social platform software, etc., may be installed on the

101, 102, and 103 may be hardware or software. When the

terminal devices

101, 102, and 103 are hardware, they may be various electronic devices with cameras, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, mpeg compression standard Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, mpeg compression standard Audio Layer 4), laptop portable computers, desktop computers, and the like. When the

terminal apparatuses

101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.

Theserver 105 may be a server that provides various services, such as a video processing server that processes a target face video captured by the

terminal apparatuses

101, 102, 103. The video processing server may perform processing such as analysis on the received data of the target face video and obtain a processing result (e.g., a detection result indicating whether a face corresponding to the target face video is a living face).

It should be noted that the method for detecting a living body provided by the embodiment of the present disclosure may be executed by the

terminal devices

101, 102, 103, or may be executed by theserver 105, and accordingly, the apparatus for detecting a living body may be disposed in the

terminal devices

101, 102, 103, or may be disposed in theserver 105.

The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where data used in generating the detection result does not need to be acquired from a remote place, the system architecture described above may not include a network but only a terminal device or a server.

With continued reference to FIG. 2, aflow 200 of one embodiment of a method for detecting a living subject according to the present disclosure is shown. The method for detecting a living body includes the steps of:

step 201, extracting adjacent video frames from a video frame sequence corresponding to a target face video as a first face image and a second face image.

In the present embodiment, an execution subject (for example, theserver 105 shown in fig. 1) of the method for detecting a living body may acquire a target face video from a remote location or a local location by a wired connection manner or a wireless connection manner, and extract adjacent video frames from a video frame sequence corresponding to the target face video as a first face image and a second face image. The target face video may be a face video for performing living body detection. Specifically, the target face video may be a video obtained by shooting a face. The face shot here may be a real face (i.e., a living face) or a virtual face (i.e., a non-living face, such as a face sculpture, a face image, etc.).

In practice, the sequence of video frames is arranged in the chronological order of the playing times. Because the target face video is obtained by shooting a face, the video frames in the video frame sequence comprise face image areas corresponding to the shot face.

Specifically, the execution subject may extract two adjacent video frames from a video frame sequence corresponding to the target face video, and determine the two extracted video frames as the first face image and the second face image, respectively. Here, the determination manner of the first face image and the second face image may be arbitrary. For example, if a video frame a and a video frame B are extracted from the target face video, the execution subject may determine the video frame a as a first face image and determine the video frame B as a second face image; alternatively, the video frame B may be determined as the first face image, and the video frame a may be determined as the second face image.

Specifically, the execution subject may further extract at least two groups of video frames from the video frame sequence corresponding to the target face video, with two adjacent video frames as a group, and for each of the extracted at least two groups of video frames, the execution subject may determine two adjacent video frames in the group of video frames as the first face image and the second face image, respectively. Similarly, the determination method of the first face image and the second face image corresponding to each group of video frames may be any.

Step 202, determining face key points from the first face image as first face key points.

In this embodiment, based on the first face image obtained instep 201, the executing subject may determine a face key point from the first face image as the first face key point.

In practice, the face key points may be key points in the face (virtual face or real face), and specifically, may be points that affect the face contour or the shape of five sense organs. As an example, the key points of the face may be points corresponding to the tip of the nose, eyes, corners of the mouth, and the like.

In this embodiment, the first face keypoints may be characterized in various forms, for example, by points marked in the first face image; alternatively, it may be characterized in terms of coordinates, which may be used to indicate the location of the first face keypoints in the first face image.

Specifically, the executing entity may determine the face key points from the first face image as the first face key points by using various methods. As an example, the execution subject may display a first face image, and further obtain a face key point selected by a user from the first face image as the first face key point.

In some optional implementation manners of this embodiment, the executing body may input the first face image into a pre-trained face key point recognition model, and obtain the face key point as the first face key point.

In this implementation manner, the face key point recognition model may be used to represent the correspondence between the face image and the face key point corresponding to the face image. Specifically, as an example, the face key point recognition model may be a correspondence table in which a plurality of face images and corresponding face key points are stored, the correspondence table being pre-made by a technician in advance based on statistics of a large number of face images and the face key points corresponding to the face images; the model may be a model obtained by training an initial model (e.g., a neural network) by a machine learning method based on a preset training sample.

Step 203, determining a face key point corresponding to the first face key point from the second face image as a second face key point.

In this embodiment, based on the second face image obtained instep 201, the execution subject may determine, from the second face image, a face key point corresponding to the first face key point as the second face key point. The face key point corresponding to the first face key point is a face key point whose corresponding face part is the same as the face part corresponding to the first face key point, for example, the face part corresponding to the first face key point is a mouth corner, and the face part corresponding to the face key point corresponding to the first face key point is also a mouth corner.

Specifically, the executing entity may determine the second face key points corresponding to the first face key points from the second face image by using various methods. As an example, the executing entity may track the first face key points by using an existing optical flow method, and further determine the second face key points corresponding to the first face key points from the second face image.

In some optional implementation manners of this embodiment, the execution subject may input a second face image into the face key point recognition model, and obtain a face key point as the second face key point.

Here, it should be noted that the model input by the second face image is a face key point recognition model for obtaining the first face key points, and the face key points that can be recognized by the face key point recognition model are determined by the training samples in the training of obtaining the face key point recognition model and are predetermined (for example, points corresponding to eyes can be recognized), and further, the second face image is input to the face key point recognition model for obtaining the first face key points, so that the second face key points corresponding to the first face key points can be obtained.

And 204, generating a detection result for indicating whether the face corresponding to the target face video is the living body face or not based on the distance between the first face key point and the second face key point.

In this embodiment, based on the first face key points obtained instep 202 and the second face key points obtained instep 203, the execution subject may determine distances between the first face key points and the second face key points, and generate a detection result indicating whether the face corresponding to the target face video is a live face based on the determined distances. Wherein, the detection result may include but is not limited to at least one of the following: numbers, words, symbols, images, audio.

In this embodiment, the distance between the first face key point and the second face key point refers to a distance between the first face key point and the second face key point in the same coordinate system. Specifically, since the shape and size of the first face image and the second face image are respectively the same, the execution main body may establish a coordinate system based on the first face image or the second face image, and then map the second face key points or the first face key points to the established coordinate system, thereby determining the distance between the first face key points and the second face key points.

It should be noted that the executing entity may establish a coordinate system by various methods based on the face image (the first face image or the second face image), for example, may establish a rectangular coordinate system by using the face key points (the first face key points or the second face key points) in the face image as an origin and using any two coordinate axes perpendicular to each other as an x axis and a y axis.

Here, for the first face key point and the second face key point in the same coordinate system, the executing entity may determine the distance therebetween by using various methods. For example, a first face key point and a second face key point may be connected to obtain a line segment, and the length of the obtained line segment, that is, the distance between the first face key point and the corresponding second face key point, is determined; or the coordinates of the first face key points and the corresponding second face key points may be determined, and finally, based on the determined coordinates, the distance between the first face key points and the corresponding second face key points is determined by using a distance formula.

Specifically, based on the distance between the first face key point and the second face key point, the execution subject may generate a detection result indicating whether the face corresponding to the target face video is a live face by using various methods. For example, the execution subject may determine whether a distance between the first face key point and the second face key point is 0; if not, a detection result (for example, "1") indicating that the face corresponding to the target face video is a living face may be generated; if so, a detection result (e.g., -1 ") indicating that the face corresponding to the target face video is a non-living face may be generated.

In some optional implementations of the embodiment, based on the distance between the first face key point and the second face key point, the executing subject may generate a detection result indicating whether the face corresponding to the target face video is a live face by: first, the execution subject may determine whether a distance between the first face key point and the second face key point is greater than or equal to a preset threshold. Further, the execution subject may generate a detection result indicating that the face corresponding to the target face video is a live face in response to determining that the face is greater than or equal to the preset threshold. The preset threshold may be a minimum distance value preset by a technician.

In some optional implementation manners of this embodiment, the executing body may further generate a detection result indicating that the face corresponding to the target face video is a non-living face in response to determining that the distance between the first face key point and the second face key point is smaller than the preset threshold.

It should be particularly noted that, when at least two groups of video frames are extracted based onstep 201, for each group of video frames in the at least two groups of video frames, the executing entity may determine the first face key point and the second face key point corresponding to the group of video frames throughstep 202 and step 203, and further may generate the detection result corresponding to the group of video frames based on the distance between the first face key point and the second face key point corresponding to the group of video frames. Furthermore, the execution body may generate at least two detection results for at least two sets of video frames.

In some optional implementation manners of this embodiment, after obtaining the detection result, the executing entity may further detect, based on an optical flow method, a video frame sequence corresponding to the target face video in response to determining that the detection result indicates that the face corresponding to the target face video is a live face, and generate a final result for indicating whether the face corresponding to the target face video is a live face. Wherein the final result can be used for presentation, which can include but is not limited to at least one of the following: numbers, words, symbols, images, audio.

In practice, according to the optical flow method, the "motion" of a pixel point can be determined by using the time domain variation and the correlation of the pixel intensity data of a video frame in a video frame sequence, and then a gaussian difference filter or a support vector machine and the like are adopted to perform data statistical analysis on the motion information, so that whether the face corresponding to the video frame sequence is a living face or not can be determined.

In this implementation manner, when the corresponding detection result indicates that the face video is a living face, the detection is performed based on the optical flow method, and when the detection result indicates that the face video is a non-living face, the detection may not be performed based on the optical flow method, so that the face video to be detected based on the optical flow method may be screened, which is beneficial to improving the efficiency of the living body detection.

With continued reference to fig. 3, fig. 3 is a schematic view of an application scenario of the method for detecting a living body according to the present embodiment. In the application scenario of fig. 3, theserver 301 may first extract two adjacent video frames from the video frame sequence corresponding to thetarget face video 302 as afirst face image 3031 and asecond face image 3032. Then, theserver 301 may determine face keypoints from thefirst face image 3031 asfirst face keypoints 3041, and determine face keypoints corresponding to the first face keypoints 3041 from thesecond face image 3032 assecond face keypoints 3042. Finally, theserver 301 may determine adistance 305 between the first facekey point 3041 and the second facekey point 3042, and generate adetection result 306 indicating whether the face corresponding to thetarget face video 302 is a live face based on thedistance 305.

At present, in-vivo detection in the prior art generally needs to analyze and operate each pixel point of a video frame in a video, and the method has high detection cost and low efficiency.

The method provided by the embodiment of the disclosure can determine whether the face corresponding to the face video is a living body face or not based on the distance between the corresponding face key points in the adjacent face images corresponding to the face video, and it can be understood that when the distance generated between the corresponding face key points or the distance is greater than or equal to the preset threshold value, the face corresponding to the face video can be determined to execute an action, and further the face corresponding to the face video can be determined to be a living body face, so that more convenient living body detection can be realized, and the efficiency of the living body detection can be improved; in addition, the living body detection is carried out based on the face key points of the face image, so that the detection complexity can be reduced, and the consumption of a CPU (central processing unit) in the living body detection process can be reduced; in addition, the method provided by the disclosure can be used for verifying the initial results generated by other in-vivo detection methods in advance, so that the accuracy of in-vivo detection is improved; in addition, the method can be used as a preprocessing step of the living body detection in the prior art, so that the human face video to be subjected to the living body detection in the prior art can be screened, and the method is favorable for improving the efficiency of the living body detection.

With further reference to FIG. 4, aflow 400 of yet another embodiment of a method for detecting a living subject is shown. Theprocess 400 of the method for detecting a living subject includes the steps of:

step 401, extracting adjacent video frames from a video frame sequence corresponding to a target face video as a first face image and a second face image.

In the present embodiment, an executing subject (for example, theserver 105 shown in fig. 1) of the method for detecting a living body may acquire a target face video through a wired connection manner or a wireless connection manner, and extract adjacent video frames from a video frame sequence corresponding to the target face video as a first face image and a second face image. The target face video may be a face video for performing living body detection. Specifically, the target face video may be a video obtained by shooting a face.

Step 402, determining face key points from the first face image as first face key points.

In this embodiment, based on the first face image obtained instep 401, the executing entity may determine a face key point from the first face image as the first face key point.

And 403, determining the face key points corresponding to the first face key points from the second face image as second face key points.

In this embodiment, based on the second face image obtained instep 401, the execution subject may determine, from the second face image, a face key point corresponding to the first face key point as the second face key point. And the face key points corresponding to the first face key points are the face key points of which the corresponding face parts are the same as the face parts corresponding to the first face key points.

And step 404, generating a detection result for indicating whether the face corresponding to the target face video is a living face or not based on the distance between the first face key point and the second face key point.

In this embodiment, based on the first face key points obtained instep 402 and the second face key points obtained instep 403, the executing entity may determine distances between the first face key points and the second face key points, and generate a detection result indicating whether the face corresponding to the target face video is a live face based on the determined distances. Wherein, the detection result may include but is not limited to at least one of the following: numbers, words, symbols, images, audio.

In this embodiment, the distance between the first face key point and the second face key point refers to the distance between the first face key point and the second face key point when the first face key point and the second face key point are located on the same image.

The

steps

401, 402, 403 and 404 are respectively consistent with the

steps

201, 202, 203 and 204 in the foregoing embodiment, and the above description for the

steps

201, 202, 203 and 204 also applies to the

steps

401, 402, 403 and 404, which is not described herein again.

Step 405, obtaining an initial result generated in advance for the target face video.

In this embodiment, the execution subject may obtain the initial result generated in advance for the target face video from a remote location or a local location through a wired connection manner or a wireless connection manner. The initial result is a result generated in advance by using an existing live body detection method and used for indicating whether a face corresponding to the target face video is a live body face, and may include, but is not limited to, at least one of the following: numbers, words, symbols, images, audio.

And 406, generating a final result for indicating whether the face corresponding to the target face video is the living body face or not based on the initial result and the detection result.

In this embodiment, based on the initial result obtained instep 405 and the detection result obtained instep 404, the execution subject may generate a final result indicating whether the face corresponding to the target face video is a live face. Wherein the final result can be used for presentation, which can include but is not limited to at least one of the following: numbers, words, symbols, images, audio.

In some optional implementation manners of this embodiment, after obtaining the final result, the execution main body may further send the final result to the electronic device connected in communication, and control the electronic device to present the final result.

Here, the electronic device may be a terminal or a server. Specifically, the execution main body may send a control signal to the electronic device, so as to control the electronic device to present a final result. Here, the form of presentation may be determined according to the form of the final result, for example, if the form of the final result is audio, the form of presentation may be play; the final result is in the form of an image or text, and the presentation may be in the form of a display.

In the implementation manner, since the final result is a result generated based on the initial result and the detection result, compared with a scheme of presenting the initial result in the prior art, the implementation manner can control the electronic device to present a more accurate result, and improve the accuracy of the in-vivo detection.

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, theflow 400 of the method for detecting a living body in the present embodiment highlights the steps of acquiring the initial result generated in advance for the target face video, and generating the target face video based on the initial result and the detection result. Therefore, the scheme described in this embodiment can verify whether the face generated in advance and used for representing the target face video corresponds to the living body face initial result by using the obtained detection result, so that the accuracy of the final result corresponding to the target face video can be improved, and more accurate living body detection results can be displayed.

With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides one embodiment of an apparatus for detecting a living body, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.

As shown in fig. 5, theapparatus 500 for detecting a living body of the present embodiment includes: anextraction unit 501, afirst determination unit 502, asecond determination unit 503, and afirst generation unit 504. Wherein the extractingunit 501 is configured to extract adjacent video frames from a video frame sequence corresponding to a target face video as a first face image and a second face image; thefirst determination unit 502 is configured to determine face key points from the first face image as first face key points; the second determiningunit 503 is configured to determine, as second face key points, face key points corresponding to the first face key points from the second face image; thefirst generating unit 504 is configured to generate a detection result indicating whether the face corresponding to the target face video is a live face based on a distance between the first face key point and the second face key point.

In this embodiment, theextraction unit 501 of theapparatus 500 for detecting a living body may acquire the target face video from a remote location or a local location by a wired connection manner or a wireless connection manner, and extract adjacent video frames as the first face image and the second face image from a video frame sequence corresponding to the target face video. The target face video may be a face video for performing living body detection. Specifically, the target face video may be a video obtained by shooting a face.

In this embodiment, based on the first face image obtained by theextraction unit 501, thefirst determination unit 502 may determine face key points as first face key points from the first face image.

In this embodiment, based on the second face image obtained by theextraction unit 501, thesecond determination unit 502 may determine, as the second face keypoints, face keypoints corresponding to the first face keypoints from the second face image. The face key point corresponding to the first face key point is a face key point whose corresponding face part is the same as the face part corresponding to the first face key point, for example, the face part corresponding to the first face key point is a mouth corner, and the face part corresponding to the face key point corresponding to the first face key point is also a mouth corner.

In this embodiment, based on the first face keypoints obtained by thefirst determination unit 502 and the second face keypoints obtained by thesecond determination unit 503, thefirst generation unit 504 may determine distances between the first face keypoints and the second face keypoints, and generate a detection result indicating whether the face corresponding to the target face video is a live face based on the determined distances. Wherein, the detection result may include but is not limited to at least one of the following: numbers, words, symbols, images, audio.

In some optional implementations of this embodiment, the first determiningunit 502 may be further configured to: and inputting the first face image into a pre-trained face key point recognition model to obtain a face key point serving as the first face key point.

In some optional implementations of this embodiment, the second determiningunit 503 may be further configured to: and inputting the second face image into the face key point recognition model to obtain face key points serving as second face key points.

In some optional implementations of the present embodiment, thefirst generating unit 504 may include: a determining module (not shown in the figures) configured to determine whether a distance between the first face key point and the second face key point is greater than or equal to a preset threshold; and a first generation module (not shown in the figure) configured to generate a detection result indicating that the face corresponding to the target face video is a live face in response to determining that the face is greater than or equal to a preset threshold.

In some optional implementations of the present embodiment, thefirst generating unit 504 may further include: and a second generating module (not shown in the figure) configured to generate a detection result indicating that the face corresponding to the target face video is a non-living face in response to determining that the distance between the first face key point and the second face key point is smaller than a preset threshold.

In some optional implementations of this embodiment, theapparatus 500 may further include: an acquiring unit (not shown in the figure) configured to acquire an initial result generated in advance for the target face video, wherein the initial result is used for indicating whether a face corresponding to the target face video is a living face; and a second generating unit (not shown in the figure) configured to generate a final result indicating whether the face corresponding to the target face video is a live face based on the initial result and the detection result.

In some optional implementations of this embodiment, theapparatus 500 may further include: a sending unit (not shown in the figure) configured to send the final result to the communicatively connected electronic device and to control the electronic device to present the final result.

In some optional implementations of this embodiment, theapparatus 500 may further include: and a third generating unit (not shown in the figure) configured to, in response to determining that the detection result indicates that the face corresponding to the target face video is a live face, detect the sequence of video frames corresponding to the target face video based on an optical flow method, and generate a final result indicating whether the face corresponding to the target face video is a live face.

It will be understood that the elements described in theapparatus 500 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to theapparatus 500 and the units included therein, and are not described herein again.

The apparatus 500 provided in the foregoing embodiment of the present disclosure may determine whether a face corresponding to a face video is a living body face based on distances between corresponding face key points in adjacent face images corresponding to the face video, and it can be understood that when a distance or a distance generated between corresponding face key points is greater than or equal to a preset threshold, it may be determined that the face corresponding to the face video performs an action, and further, it may be determined that the face corresponding to the face video is a living body face, so that a simpler and more convenient living body detection may be implemented, which is beneficial to improving the efficiency of living body detection; in addition, the living body detection is carried out based on the face key points of the face image, so that the detection complexity can be reduced, and the consumption of a CPU (central processing unit) in the living body detection process can be reduced; in addition, the method provided by the disclosure can be used for verifying the initial results generated by other in-vivo detection methods in advance, so that the accuracy of in-vivo detection is improved; in addition, the method can be used as a preprocessing step of the living body detection in the prior art, so that the human face video to be subjected to the living body detection in the prior art can be screened, and the method is favorable for improving the efficiency of the living body detection.

Referring now to fig. 6, a schematic diagram of an electronic device (e.g.,

terminal devices

101, 102, 103 orserver 105 of fig. 1) 600 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 6,electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM603, various programs and data necessary for the operation of theelectronic apparatus 600 are also stored. Theprocessing device 601, theROM 602, and the RAM603 are connected to each other via abus 604. An input/output (I/O)interface 605 is also connected tobus 604.

Generally, the following devices may be connected to the I/O interface 605:input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.;output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like;storage 608 including, for example, tape, hard disk, etc.; and acommunication device 609. The communication means 609 may allow theelectronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates anelectronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from theROM 602. The computer program, when executed by theprocessing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: extracting adjacent video frames from a video frame sequence corresponding to a target face video to serve as a first face image and a second face image; determining face key points from the first face image as first face key points; determining face key points corresponding to the first face key points from the second face image as second face key points; and generating a detection result for indicating whether the face corresponding to the target face video is a living face or not based on the distance between the first face key point and the second face key point.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. Here, the name of a unit does not constitute a limitation of the unit itself in some cases, and for example, the first generation unit may also be described as a "unit that generates a detection result".

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Claims

1. A method for detecting a living body, comprising:

extracting adjacent video frames from a video frame sequence corresponding to a target face video to serve as a first face image and a second face image;

determining face key points from the first face image as first face key points;

tracking the first face key points by using an optical flow method to determine face key points corresponding to the first face key points from the second face image as second face key points;

generating a detection result for indicating whether the face corresponding to the target face video is a living face or not based on the distance between the first face key point and the second face key point in the same coordinate system;

in response to determining that the detection result indicates that the face corresponding to the target face video is a living body face, detecting a video frame sequence corresponding to the target face video based on an optical flow method, and generating a final result for indicating whether the face corresponding to the target face video is a living body face;

and generating a final result for indicating that the face corresponding to the target face video is the non-living face in response to determining that the detection result indicates that the face corresponding to the target face video is the non-living face.

2. The method of claim 1, wherein said determining face keypoints from the first face image as first face keypoints comprises:

and inputting the first face image into a pre-trained face key point recognition model to obtain a face key point serving as a first face key point.

3. The method of claim 2, wherein said determining face keypoints from the second face image that correspond to first face keypoints as second face keypoints comprises:

and inputting the second face image into the face key point recognition model to obtain face key points serving as second face key points.

4. The method of claim 1, wherein the generating a detection result indicating whether the face corresponding to the target face video is a live face based on the distance between the first face key point and the second face key point comprises:

determining whether the distance between the first face key point and the second face key point is greater than or equal to a preset threshold value;

and generating a detection result for indicating that the face corresponding to the target face video is a living body face in response to the fact that the face is determined to be larger than or equal to the preset threshold value.

5. The method of claim 4, wherein the generating a detection result indicating whether the face corresponding to the target face video is a live face based on the distance between the first face key point and the second face key point further comprises:

and generating a detection result for indicating that the face corresponding to the target face video is a non-living face in response to determining that the distance between the first face key point and the second face key point is smaller than the preset threshold value.

6. The method according to one of claims 1-5, wherein the method further comprises:

acquiring an initial result generated in advance aiming at the target face video, wherein the initial result is used for indicating whether the face corresponding to the target face video is a living body face;

and generating a final result for indicating whether the face corresponding to the target face video is a living face or not based on the initial result and the detection result.

7. The method of claim 6, wherein the method further comprises:

and sending the final result to the electronic equipment in communication connection, and controlling the electronic equipment to present the final result.

8. An apparatus for detecting a living body, comprising:

an extraction unit configured to extract adjacent video frames from a video frame sequence corresponding to a target face video as a first face image and a second face image;

a first determination unit configured to determine face key points from the first face image as first face key points;

a second determination unit configured to track the first face key points by an optical flow method to determine face key points corresponding to the first face key points from the second face image as second face key points;

a first generating unit configured to generate a detection result indicating whether the face corresponding to the target face video is a live face based on a distance between the first face key point and the second face key point in the same coordinate system;

a third generating unit configured to detect, in response to determining that the detection result indicates that the face corresponding to the target face video is a living body face, a sequence of video frames corresponding to the target face video based on an optical flow method, and generate a final result indicating whether the face corresponding to the target face video is a living body face; and generating a final result for indicating that the face corresponding to the target face video is the non-living face in response to determining that the detection result indicates that the face corresponding to the target face video is the non-living face.

9. The apparatus of claim 8, wherein the first determining unit is further configured to:

10. The apparatus of claim 9, wherein the second determining unit is further configured to:

11. The apparatus of claim 8, wherein the first generating unit comprises:

a determining module configured to determine whether a distance between the first face key point and the second face key point is greater than or equal to a preset threshold;

a first generation module configured to generate a detection result indicating that the face corresponding to the target face video is a live face in response to determining that the face is greater than or equal to the preset threshold.

12. The apparatus of claim 11, wherein the first generating unit further comprises:

a second generation module configured to generate a detection result indicating that the face corresponding to the target face video is a non-living face in response to determining that the distance between the first face key point and the second face key point is smaller than the preset threshold.

13. The apparatus according to one of claims 8-12, wherein the apparatus further comprises:

the acquisition unit is configured to acquire an initial result generated in advance for the target face video, wherein the initial result is used for indicating whether a face corresponding to the target face video is a living body face;

a second generating unit configured to generate a final result indicating whether the face corresponding to the target face video is a live face based on the initial result and the detection result.

14. The apparatus of claim 13, wherein the apparatus further comprises:

a sending unit configured to send the final result to a communicatively connected electronic device and control the electronic device to present the final result.

15. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

16. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.