Summary of the invention
To overcome the above-described problems of existing biometric anti-counterfeiting technologies, the present invention provides a biometric liveness detection method and system based on the combination of human-computer interaction and pattern recognition, to reject prosthetic attacks such as face photographs, face video recordings, and voice recordings, and to improve the security of biometric recognition.
The biometric liveness detection method and system provided by the present application are described in detail as follows.
According to a first aspect of the embodiments of the present application, a biometric liveness detection method is provided, including:
generating a random action sequence instruction;
converting the code of the random action sequence instruction into text, visual and/or auditory coding, and presenting it visually, audibly, or in a combination of the two;
collecting a response image sequence of the user;
synchronously presenting the response image sequence and the random action sequence instruction visually;
analyzing the response action sequence of the user in the response image sequence;
judging whether the response action sequence matches the action sequence corresponding to the random action sequence instruction; if it matches, judging that the response action sequence comes from a live person.
The biometric liveness detection method may further include:
generating a random voice sequence instruction;
converting the code of the random voice sequence instruction into text, visual and/or auditory coding, and presenting it visually, audibly, or in a combination of the two;
collecting a response voice sequence of the user;
analyzing the response voice of the user in the response voice sequence;
judging whether the response voice matches the voice sequence corresponding to the random voice sequence instruction; if it matches, judging that the response voice sequence comes from a live person.
In the biometric liveness detection method, a timestamp may be specified for each action in the random action sequence instruction; the timestamp identifies the action time of each action, or the start time and end time of each action, and the timestamp is randomly generated.
Analyzing the response action sequence of the user in the response image sequence may include:
detecting the face in each image of the response image sequence;
locating key points on each face;
calculating the head pose rotation angle from the located face key points;
calculating the facial expression type from the located face key points;
obtaining the response action sequence of the user from the head pose rotation angles and facial expression types;
comparing the response action sequence with the action sequence corresponding to the random action sequence instruction, and calculating an action type conformity;
comparing the action type conformity with a first preset threshold; if the action type conformity is greater than the first preset threshold, judging that the response action sequence comes from a live person, otherwise judging that it does not come from a live person.
Analyzing the response action sequence of the user in the response image sequence may further include:
for each action in the response action sequence, calculating the action time of that action;
comparing the calculated action time of each action with the timestamp of that action, and calculating an action time conformity;
calculating an overall action conformity = action type conformity + w × action time conformity, where w is a weight;
comparing the overall action conformity with a second preset threshold; if the overall action conformity is greater than the second preset threshold, judging that the response action sequence comes from a live person, otherwise judging that it does not come from a live person.
The biometric liveness detection method may further include:
recognizing the content of the response voice sequence;
calculating a voice content conformity of the response voice sequence;
calculating an overall conformity = action type conformity + w1 × action time conformity + w2 × voice content conformity, where w1 and w2 are weights;
comparing the overall conformity with a third preset threshold; if the overall conformity is greater than the third preset threshold, judging that the response action sequence comes from a live person, otherwise judging that it does not come from a live person.
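As a minimal sketch, the weighted fusion and threshold decision described above could look like the following. The function names, weight values, and threshold value are illustrative assumptions for this sketch, not values fixed by the application:

```python
def overall_conformity(action_type_conf, action_time_conf, voice_conf,
                       w1=0.5, w2=0.5):
    # Overall conformity = action type conformity
    #   + w1 x action time conformity + w2 x voice content conformity
    return action_type_conf + w1 * action_time_conf + w2 * voice_conf

def is_live_person(action_type_conf, action_time_conf, voice_conf,
                   third_threshold=1.5):
    # Judge "live person" when the overall conformity exceeds the
    # third preset threshold (1.5 here is only an example value).
    score = overall_conformity(action_type_conf, action_time_conf, voice_conf)
    return score > third_threshold
```

In practice the weights w1, w2 and the threshold would be tuned per security level, as the application notes.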
The complexity of the random action sequence instruction and the sizes of the first, second, and third preset thresholds may be set according to a security level.
Corresponding to the first aspect of the embodiments of the present application, according to a second aspect of the embodiments of the present application, a biometric liveness detection system is provided, including:
an action sequence instruction generating unit, configured to generate a random action sequence instruction;
an action instruction presenting unit, including a display and a speaker, configured to first convert the code of the random action sequence instruction into text, visual and/or auditory coding, and present it visually, audibly, or in a combination of the two;
wherein the display is configured to show the text and/or visually coded pictures of the random action sequence instruction, and the speaker is configured to play the text and/or auditorily coded sound of the random action sequence instruction;
an image collecting unit, configured to collect a response face image sequence of the user;
a response action presenting unit, configured to synchronously present the response image sequence and the random action sequence instruction visually;
an action analyzing unit, configured to analyze the response action sequence of the user in the response face image sequence;
an action conformity judging unit, configured to judge whether the response action sequence matches the action sequence corresponding to the random action sequence instruction, and if it matches, to judge that the response action sequence comes from a live person.
The biometric liveness detection system may further include:
a voice instruction generating unit, configured to generate a random voice instruction;
a voice instruction presenting unit, including a display and a speaker, configured to first convert the code of the random voice sequence instruction into text, visual and/or auditory coding, and present it visually, audibly, or in a combination of the two;
a voice collecting unit, configured to collect a response voice sequence of the user;
a voice analyzing unit, configured to analyze the response voice of the user in the response voice sequence;
a voice conformity judging unit, configured to judge whether the response voice matches the voice sequence corresponding to the random voice sequence instruction, and if it matches, to judge that the response voice sequence comes from a live person.
The random action sequence instruction generating unit may specify a timestamp for each action in the random action sequence instruction; the timestamp identifies the action time of each action, or the start time and end time of each action, and the timestamp is randomly generated.
The action analyzing unit may include:
a face detecting subunit, configured to detect the face in each image of the response image sequence;
a key point locating subunit, configured to locate key points on each face;
a head pose rotation angle calculating subunit, configured to calculate the head pose rotation angle from the located face key points;
a facial expression type calculating subunit, configured to calculate the facial expression type from the located face key points;
an action sequence recognizing subunit, configured to obtain the response action sequence from the head pose rotation angles and facial expression types.
The action conformity judging unit may include:
an action type conformity calculating subunit, configured to compare the response action sequence with the action sequence corresponding to the action sequence instruction and calculate the action type conformity;
a first judging subunit, configured to compare the action type conformity with the first preset threshold; if the action type conformity is greater than the first preset threshold, the response action types of the person in the response action sequence match the random action sequence instruction, and the response action sequence is judged to come from a live person; otherwise it is judged not to come from a live person.
The action analyzing unit may further include:
an action time calculating subunit, configured to calculate, for each action in the response action sequence, the action time of that action;
an action time conformity calculating subunit, configured to compare the calculated action time of each action with the timestamp of that action and calculate the action time conformity;
an overall action conformity calculating subunit, configured to calculate the overall action conformity, where overall action conformity = action type conformity + w × action time conformity, and w is a weight;
a second judging subunit, configured to compare the overall action conformity with the second preset threshold; if it is greater than the second preset threshold, the response actions of the person in the response action sequence match the action sequence corresponding to the random action sequence instruction, and the response action sequence is judged to come from a live person; otherwise it is judged not to come from a live person.
The biometric liveness detection system may further include:
a voice analyzing unit, configured to recognize the content of the response voice sequence;
a voice conformity calculating unit, configured to calculate the voice content conformity of the response voice sequence;
an overall conformity calculating unit, configured to calculate the overall conformity, where overall conformity = action type conformity + w1 × action time conformity + w2 × voice content conformity, and w1 and w2 are weights;
a third judging unit, configured to compare the overall conformity with the third preset threshold; if it is greater than the third preset threshold, the response action sequence is judged to come from a live person; otherwise it is judged not to come from a live person.
The complexity of the random action sequence instruction and the sizes of the first, second, and third preset thresholds may be set according to a security level.
The technical solutions provided by the embodiments of the present application can have the following beneficial effects: the action and voice instructions are randomly generated, making it difficult to attack with prepared face photographs, videos, or voice material; the random action sequence instruction is presented audio-visually, which effectively helps the user understand the instruction; and the user's actions and voice are synchronously fed back and presented, effectively guiding the user to make the corresponding actions and utterances, thereby improving the security of identity authentication and improving the usability and user experience of the product.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the application.
Detailed description of the invention
Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the application; on the contrary, they are only examples of apparatuses and methods consistent with some aspects of the application, as detailed in the appended claims.
To provide a thorough understanding of the application, numerous specific details are set forth in the following detailed description, but those skilled in the art should understand that the application can be implemented without these details. In other embodiments, well-known methods, processes, components, and circuits are not described in detail, so as not to unnecessarily obscure the embodiments.
Fig. 1 is a schematic flowchart of a biometric liveness detection method shown in an exemplary embodiment of the application. As shown in Fig. 1, the method includes:
Step S101: generate a random action sequence instruction.
The random action sequence instruction indicates to the user what actions to do. It is composed of action type descriptions and may also specify a timestamp for each action, the timestamp identifying the action time of each action, or the start time and end time of each action. When the timestamp identifies the action time of each action, it is a relative timestamp, characterizing the action time length of each action type in the random action sequence instruction. When the timestamp identifies the start time and end time of each action, it is an absolute timestamp, from which the action time of each action can be calculated, i.e. the end time of each action minus its start time. The relative and absolute timestamps can be randomly generated, so that the degree of randomness of the random action sequence instruction is higher; when absolute timestamps are randomly generated, the start time of the absolute timestamp of a later action is greater than the end time of the absolute timestamp of the previous action. The action sequence instruction can be a single action instruction or a combination of multiple action instructions. Generating the random action sequence instruction may include:
(a1) Randomly determine an action number N, for example N=4.
(a2) Randomly select N action types from a candidate action type set and combine them; the order of the N action types in the combination is random, for example 4 randomly generated action types (head turns left 30 degrees → turns right 10 degrees → turns left 20 degrees → turns right 40 degrees), or (head turns left 30 degrees → straightens to 0 degrees → open mouth → close mouth).
(a3) Randomly specify the action time and/or action count of each action type, the action time being the duration of that action type. The action time can be added to the action sequence instruction in the form of a relative timestamp or an absolute timestamp, i.e. each action type description carries a timestamp identifying the action time. Adding it in timestamp form makes it convenient to associate a time with each action type and to separate the action types; consistency is good and errors are less likely. When an absolute timestamp is specified for each action type, the start time of a later action equals the end time of the previous action.
An action count can be specified for each action type; the default action count is 1. If the user performs the action several times within the specified action time, subsequent recognition only checks whether the action was done within the specified action time, without checking how many times. The action count can be specified without specifying the action time; both the action time and the action count can be specified, and they can be the same or different for each action type; or the action time can be specified for some action types and not for others, or the action count can be specified for some action types and not for others, and so on. For example, the action times and action counts of the action types are specified as: shake head from left to right 2 times, action time 2 seconds; shake head from top to bottom 3 times, action time 3 seconds; open mouth 2 times, action time 1 second; close eyes 3 times, action time 2 seconds. The corresponding action sequence instruction is: (shake head from left to right 2 times, action time 2 seconds) → (shake head from top to bottom 3 times, action time 3 seconds) → (open mouth 2 times, action time 1 second) → (close eyes 3 times, action time 2 seconds), in which the action times are specified by relative timestamps. When the action times are specified by absolute timestamps, the corresponding action sequence instruction can be: (from second 0 to second 2, shake head from left to right 2 times) → (from second 2 to second 5, shake head from top to bottom 3 times) → (from second 5 to second 6, open mouth 2 times) → (from second 6 to second 8, close eyes 3 times).
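A minimal sketch of steps (a1)–(a3) with absolute timestamps might look like this. The candidate action set, duration range, and data layout are illustrative assumptions, not prescribed by the application:

```python
import random

# Illustrative candidate action type set
CANDIDATE_ACTIONS = ["shake head left-right", "shake head top-bottom",
                     "open mouth", "close eyes"]

def generate_action_sequence(max_actions=4, seed=None):
    rng = random.Random(seed)
    n = rng.randint(2, max_actions)  # (a1) randomly determine action number N
    # (a2) randomly select N action types; their order is random
    types = [rng.choice(CANDIDATE_ACTIONS) for _ in range(n)]
    sequence, t = [], 0
    for action in types:
        # (a3) randomly specify action time and action count
        duration = rng.randint(1, 3)   # seconds
        count = rng.randint(1, 3)
        # absolute timestamps: each action starts where the previous one ended
        sequence.append({"action": action, "count": count,
                         "start": t, "end": t + duration})
        t += duration
    return sequence
```

The invariant that a later action's start time equals the previous action's end time, stated in (a3), is maintained by accumulating `t`.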
The random action sequence instruction can be given different complexities according to different security levels; for example, when the security level is high, the action number N in the action sequence is increased.
Step S102: convert the code of the random action sequence instruction into text, visual and/or auditory coding, and present it visually, audibly, or in a combination of the two.
The code of the random action sequence instruction is converted into text and/or visual coding, rendered as text, images, or animations of actions such as "open mouth", "close mouth", "head turn left", "bow head", and "blink", and then presented visually to the user on a display. Alternatively, the code of the random action sequence instruction is converted into text and/or auditory coding, i.e. converted into text and then into speech by a TTS (Text To Speech) engine, for example broadcasting the sounds "open mouth", "close mouth", "head turn left", "bow head", "blink" through a speaker. Or it is presented to the user in a combination of vision and audition. Prompting through audio-visual presentation helps the user understand the instruction, so that the corresponding action can be made promptly.
Step S103: collect the user's response image sequence.
A camera or other image/video capture device can be used to shoot the user's actions, thereby collecting a response face image sequence, in which each image is a captured video frame.
Step S104: synchronously present the response image sequence and the random action sequence instruction visually.
Synchronous visual presentation means displaying the captured response image sequence on the screen together with the random action sequence instruction, feeding back to the user in time, so that the user can adjust their actions to be consistent with the random action sequence instruction.
Step S105: analyze the response action sequence of the user in the response image sequence.
Step S105 includes:
(a1) detecting the face in each image of the response image sequence;
(a2) locating key points on each face;
(a3) calculating the head pose rotation angle from the located face key points;
(a4) calculating the facial expression type from the located face key points;
(a5) calculating the response action sequence from the head poses and expression types;
(a6) comparing the calculated response action sequence with the action sequence corresponding to the random action sequence instruction, and calculating the action type conformity.
Face detection is performed on each image in the response image sequence; a face detector based on local features and Adaboost learning can be used, or a face detector obtained by neural network training. If a face is detected, the following steps continue; if no face is detected, the image is skipped; and if no face is detected in any image, the whole process ends, at which point the user can be prompted visually or audibly to start again.
After a face is detected in an image, key point location is performed on the face, i.e. for each face image, a corresponding set of preset key points is chosen, for example 68 key points; the detailed contour of the face can be sketched from the key point coordinates. The pose and expression class of the face are calculated on the basis of the key points.
In another possible embodiment, a feature estimation method can be used to obtain the head pose rotation angle and facial expression type. The feature estimation method collects in advance a large amount of face image data under different poses and expressions, extracts appearance features from the face image data, and trains a pose estimation classifier using SVM, regression, or similar methods; the trained pose estimation classifier is then used to estimate the pose and expression of a face image. For example, Gabor feature extraction or LBP (Local Binary Patterns) feature extraction can be performed on a face image, and SVM (Support Vector Machine) training can be used to obtain pose and expression classifiers for pose estimation and expression classification of face images.
After the head pose rotation angle and facial expression type corresponding to each face image are obtained, the response face image sequence is segmented according to the head pose rotation angles and facial expression types, so as to separate and recognize the human action corresponding to each action instruction and obtain the response action sequence. The segmentation can be done according to the timestamps of the action sequence instruction, or according to the action types in the action sequence instruction.
Segmentation according to the timestamps of the action sequence instruction divides the collected response face image sequence by the action times obtained from the timestamps. When the timestamps are relative timestamps, i.e. the size of the relative timestamp is the size of the action time, segmentation can be done directly by the relative timestamps; when the timestamps are absolute timestamps, segmentation is done by the start and end times of the actions. For example, if the action sequence instruction is (shake head from left to right 2 times, action time 2 seconds) → (shake head from top to bottom 3 times, action time 3 seconds) → (open mouth 2 times, action time 1 second) → (close eyes 3 times, action time 2 seconds), the times identified by the timestamps are 2 seconds, 2 seconds, 1 second, and 2 seconds respectively, so the response face image sequence is segmented by 2 seconds, 2 seconds, 1 second, and 2 seconds. If the action sequence instruction is (from second 0 to second 2, shake head from left to right 2 times) → (from second 2 to second 5, shake head from top to bottom 3 times) → (from second 5 to second 6, open mouth 2 times) → (from second 6 to second 8, close eyes 3 times), the start and end times identified by the timestamps are second 0 to second 2, second 2 to second 5, second 5 to second 6, and second 6 to second 8 respectively, so the response face image sequence is segmented according to those start and end times. For each segment of the response face image sequence obtained by segmentation, the corresponding human action is recognized in combination with the head pose rotation angle and facial expression type detected for each image. Taking head shaking as an example, the head pose rotation angles of the face images obtained during shaking differ; the head pose angle data of each segment of the response face image sequence are combined, the motion features of each segment are extracted, and a conventional human action recognition algorithm is used to obtain the action corresponding to each segment, while the number of times the action is performed can also be recognized. The actions corresponding to the segments and their counts are combined in their original temporal order to obtain the response action sequence.
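Segmentation by relative timestamps can be sketched as follows, assuming the frames were captured at a fixed frame rate (the data layout and frame rate are assumptions for illustration only):

```python
def segment_by_relative_timestamps(frames, durations, fps=30):
    """Split a response face image sequence into per-action segments.

    frames    -- list of captured video frames, at a fixed frame rate
    durations -- relative timestamps (action time of each action, in seconds)
    """
    segments, start = [], 0
    for d in durations:
        n = int(d * fps)                      # frames belonging to this action
        segments.append(frames[start:start + n])
        start += n
    return segments
```

With the example instruction above (action times 2 s, 2 s, 1 s, 2 s), a 7-second capture at 30 fps would be split into segments of 60, 60, 30, and 60 frames, each then passed to the action recognizer.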
Segmentation according to the action types in the action sequence instruction works from the head pose rotation angle and facial expression type of each face image and, following the action types in the action sequence instruction with their corresponding counts and order, performs action recognition on the whole response face image sequence in turn. For example, if the first action type in the action sequence instruction is head shaking with a count of 2, the method checks whether head shaking occurs in the whole response face image sequence and how many times. If the shaking action is recognized, the response face images corresponding to all the shaking actions are cut out of the whole response face image sequence, the position of the cut-out segment within the whole sequence is kept (for example, whether the cut-out face images lie at the front of the whole response face image sequence), and the count of the action corresponding to the cut-out segment, e.g. the recognized number of head shakes, is recorded at the same time. Then, action recognition is performed on the remaining response face image sequence according to the second action type in the action sequence instruction. For example, if the second action type is nodding, 3 times, the method checks whether nodding occurs in the remaining response face image sequence and how many times; if the nodding action is recognized, the response face images corresponding to all the nodding actions are cut out of the remaining response face image sequence, the temporal position of the cut-out segment within the whole set of face images is kept, as is its temporal relation to the segment cut out the first time — for example at the front, middle, or tail of the whole response face image sequence, and before or after the first cut-out segment — and the count of the action corresponding to this cut-out segment is recorded at the same time. This continues by analogy until segmentation according to the last action type in the action sequence instruction is complete. After segmentation, the human actions corresponding to the cut-out segments and their counts are combined in their original temporal order to obtain the response action sequence. For the action recognition on the response face image sequence, a conventional action recognition algorithm can be used based on the head pose rotation angle and facial expression type of each image. When the response face image sequence is segmented according to action types, action recognition is already contained in the segmentation; human action recognition can be performed again on each segment after segmentation to ensure correctness, or the recognition can be omitted and the response action sequence obtained directly from the segmentation result.
When the response face image sequence is segmented according to the timestamps or action types of the action sequence instruction, a relative or absolute timestamp can be added to each segment obtained, to identify the duration of the corresponding action or its start time and end time. When segmenting according to timestamps, since each segment already has a definite time length or start and end times, there is no need to add a timestamp again to identify the time length or start and end times of the corresponding action. When segmenting according to action types, for each cut-out segment, the times of the first and last face images in temporal order are taken as the start time and end time of the action corresponding to that segment, and a timestamp is added to the corresponding human action in the response action sequence according to those start and end times. Adding timestamps to the response action sequence helps separate the action types, helps select the remaining response face image sequence for further human action recognition during segmentation, and helps calculate the action time of each action type.
For the segmentation of the response face image sequence, segmenting according to timestamps is simple, but it requires the user to act strictly on time; since people find it hard to time their actions precisely and can only roughly meet the required time length — e.g. when asked to shake the head for 2 s, the actual shaking may last 2.2 s — segmenting by timestamps may leave the human action in a segment incomplete, or leave residual images of other actions in it, causing errors in human action recognition. Segmenting according to action types is more complex, but the segments it yields allow complete human actions to be recognized accurately.
In one possible embodiment, when the response face image sequence is segmented according to action types, if the first action type in the action sequence instruction cannot be recognized anywhere in the whole response face image sequence, or if an action of the same type as the first action type is recognized but the corresponding response face images are not at the front (first part) of the whole response face image sequence, recognition can be declared failed and the subsequent steps need not be carried out. In this way, when the user's first action is grossly wrong, or a forged feature cannot perform the action, the current user is judged to be a non-live person and the subsequent flow is terminated, which prevents attacks by potentially dangerous features more concisely and quickly.
The computed response action sequence is compared with the action sequence corresponding to the action-sequence instruction to calculate an action-type goodness of fit: each segment of the response action sequence is compared with the corresponding action command, the comparison covering both the action type and the number of repetitions, and a weight is assigned to each segment according to the result. For example, if the first action type in the action-sequence instruction is head shaking, three times, then when the first action type in the response action sequence is head shaking three times, its weight S1 may be set to 1; when it is head shaking but only twice, S1 may be set to 0.7; and so on by analogy. The weights of all segments are summed to obtain the action-type goodness of fit.
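The action-type comparison above can be sketched as follows. The dictionary representation and the partial-score rule (count ratio) are assumptions; the text only fixes the endpoints 1 for a full match, 0 for a type mismatch, and a reduced weight (0.7 in its example) for a partial count:

```python
def type_goodness_of_fit(instruction, response):
    """Action-type goodness of fit: one weight per segment, summed.

    A segment scores 1.0 when both the action type and the repetition
    count match; a type match with fewer repetitions scores the count
    ratio (so 2 of 3 head shakes scores ~0.67, close to the 0.7 of the
    example in the text); a type mismatch scores 0.0.
    """
    total = 0.0
    for exp, obs in zip(instruction, response):
        if exp["type"] == obs["type"]:
            total += min(obs["count"] / exp["count"], 1.0)
    return total
```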
When the random action-sequence instruction is in the mode of action type plus relative timestamp, step S105 may further include:
(b1) calculating, for each action in the computed response action sequence, the duration of that action;
(b2) comparing the duration of each action with the relative timestamp of that action, and calculating a timing goodness of fit;
(b3) calculating the overall action goodness of fit = action-type goodness of fit + w × timing goodness of fit, where w is a weight.
The duration of each segment in the obtained response action sequence is compared with the corresponding relative timestamp. According to the result of the comparison, a different time weight is assigned to each segment of the response action sequence, and the time weights of all segments are summed to obtain the timing goodness of fit. The time weight of a segment may be set to (1 − timing error) or to (1 / timing error). Where the relative timestamp in the random action-sequence instruction is t1 and the duration of the corresponding segment in the response action sequence is t2, the timing error of that segment may be taken as |t2 − t1| / t1.
The overall action goodness of fit is then the action-type goodness of fit + w × timing goodness of fit, where w is a weight.
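Steps (b1)–(b3) can be sketched as follows, using the (1 − timing error) variant of the time weight with the error taken as the relative deviation from the commanded timestamp; the clamping at zero is an added assumption to keep the weight non-negative:

```python
def time_weight(stamp, duration):
    """Per-action time weight: 1 - |duration - stamp| / stamp,
    clamped at zero so a wildly wrong duration scores nothing."""
    return max(0.0, 1.0 - abs(duration - stamp) / stamp)

def overall_action_fit(type_fit, stamps, durations, w):
    """Step (b3): overall = type fit + w * summed per-action timing fit."""
    timing_fit = sum(time_weight(s, d) for s, d in zip(stamps, durations))
    return type_fit + w * timing_fit
```

With the 2 s / 2.2 s head-shake example of the text, the time weight comes out at 0.9.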
When the random action-sequence instruction is in the mode of action type plus absolute timestamp, step S105 may further include:
(c1) calculating, for each action in the computed response action sequence, the duration of that action;
(c2) comparing the duration of each action with the absolute timestamp of that action, and calculating a timing goodness of fit;
(c3) calculating the overall action goodness of fit = action-type goodness of fit + w × timing goodness of fit, where w is a weight.
The duration of each segment in the obtained response action sequence is compared with the corresponding absolute timestamp. According to the result of the comparison, a different time weight is assigned to each segment of the response action sequence, and the time weights of all segments are summed to obtain the timing goodness of fit. The time weight of a segment may be set to (1 − timing error) or to (1 / timing error). Where the absolute timestamp in the random action-sequence instruction runs from second t1 to second t2, and the duration of the corresponding segment in the response action sequence is T, the timing error of that segment may be taken as |T − (t2 − t1)| / (t2 − t1).
The overall action goodness of fit is then the action-type goodness of fit + w × timing goodness of fit, where w is a weight.
In addition, when the random action-sequence instruction is in the mode of action type plus absolute timestamp, another scheme may include:
analyzing whether the user has performed the specified instruction action at each specified timestamp. For example, the action-type instruction sequence may be (seconds 0–2: turn the head left to 30 degrees → seconds 2–4: turn right to 10 degrees → seconds 4–5: turn left to 20 degrees → seconds 5–7: turn right to 40 degrees); the start times of the absolute timestamps are seconds 0, 2, 4 and 5, and the end times are seconds 2, 4, 5 and 7. The system tests whether the head is deflected 30 degrees to the left at second 2, 10 degrees to the right at second 4, 20 degrees to the left at second 5, and 40 degrees to the right at second 7. If all four corresponding head actions match, the response action sequence is judged to come from a live person; otherwise it is not.
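The checkpoint scheme above can be sketched as follows. The `yaw_at` callable, the sign convention (negative = left) and the tolerance parameter are assumptions standing in for the system's actual pose measurement:

```python
def check_pose_schedule(yaw_at, schedule, tolerance_deg=5.0):
    """Verify that the user reached each commanded head angle at the
    specified absolute timestamp.

    yaw_at    -- callable returning the measured head yaw in degrees
                 (negative = left) at a given second
    schedule  -- list of (end_second, expected_yaw_deg) checkpoints
    Returns True (live person) only if every checkpoint matches within
    the tolerance.
    """
    return all(abs(yaw_at(t) - yaw) <= tolerance_deg for t, yaw in schedule)
```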
Step S106: judging whether the response action sequence matches the action sequence corresponding to the random action-sequence instruction; if it matches, the response action sequence is judged to come from a live person.
The action-type goodness of fit is compared with a first preset threshold. If it exceeds the first preset threshold, the user's response action types in the response action sequence match the action-sequence instruction, and the response action sequence is judged to come from a live person; otherwise it is not. When the random action-sequence instruction is in the mode of action type plus timestamp, the overall action goodness of fit may also be compared with a second preset threshold: if it exceeds the second preset threshold, the user's response actions match the action sequence corresponding to the action-sequence instruction, and the response action sequence is judged to come from a live person; otherwise it is not.
The first and second preset thresholds may be set according to the required level of security; for a high security level, larger values are chosen.
The application is further illustrated below with an application case in a mobile-payment liveness-verification environment, so that those skilled in the art may better understand its principles and use.
During mobile payment, to prevent forged features from causing false acceptance during identity verification, it must be determined whether the current user is a real person. For clarity, the main steps of the application are described by example. Assume that during mobile payment the candidate instruction set of the liveness-recognition system contains three common actions: {shake head, open mouth, blink}.
(1a) When liveness recognition starts, the system randomly generates an action-sequence instruction, for example "shake the head from left to right 3 times, duration 6 seconds; open the mouth 2 times, duration 1 second; blink 4 times, duration 2 seconds", generates an animated illustration of the action command, and presents it to the user.
(2a) Following the illustration, the user faces the camera; shooting begins and the user performs the requested actions while the system collects the response face-image sequence. When the user has completed all the actions, shooting ends and the system stops collecting the response face-image sequence.
(3a) Gabor feature extraction and SVM training are used to obtain a pose-estimation classifier, which estimates, frame by frame, the pose of each image in the collected response face-image sequence, including the states of the head, eyes, nose and mouth.
(4a) The response face-image sequence is cut, according to the timestamps of the action command, into three segments with time spans of 6, 1 and 2 seconds. From the pose of each face image, the human action corresponding to each segment is recognized, yielding the response action sequence.
If the human action recognized from the first segment of the response face-image sequence is shaking the head from left to right 3 times, the weight S1 of the corresponding first action in the response action sequence (action type: shake head from left to right; repetitions: 3) is set to 1; if the corresponding action is shaking the head from left to right only 2 times, S1 is set to a smaller value; only 1 time, smaller still; and if no left-to-right head shaking is recognized in the first segment, S1 is set to 0.
If the human action recognized from the second segment is opening the mouth 2 times, the weight S2 of the corresponding second action is set to 1; if the mouth is opened only once, S2 is set to a smaller value; and if no mouth-opening action is recognized in the second segment, S2 is set to 0.
If the human action recognized from the third segment is blinking 4 times, the weight S3 of the corresponding third action is set to 1; for 3, 2 or 1 blinks, S3 is set to progressively smaller values; and if no blinking action is recognized in the third segment, S3 is set to 0.
(5a) The action-type goodness of fit between the response action sequence and the random action-sequence instruction is calculated as Sa = S1 + S2 + S3 and compared with the first preset threshold. If, for example, Sa = 1.75 and the preset threshold is 2, then since the action-type goodness of fit is below the first preset threshold, recognition fails and the current user is judged non-live; accordingly, identity verification cannot pass.
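The decision in step (5a) can be sketched as follows. The weights 0.7, 0.55 and 0.5 in the usage check are illustrative values chosen only to reproduce the Sa = 1.75 of the example; the text does not specify them:

```python
def liveness_by_type_fit(segment_weights, threshold):
    """Step (5a): sum the per-segment weights S1..S3 and compare the
    result Sa with the first preset threshold; below the threshold,
    recognition fails and the user is judged non-live."""
    sa = sum(segment_weights)
    return sa, sa >= threshold
```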
If the response action sequence is instead segmented according to action type, step (4a) may be replaced by:
(4b) recognizing left-to-right head shaking over the full response face-image sequence. If the action is recognized and the corresponding response face images are located at the front of the full sequence, but the number of repetitions is fewer than three, the weight S1 of the first action in the response action sequence is set to a reduced value (different values may be assigned for different repetition counts); if three repetitions are reached, S1 is set to 1. The acquisition start time t0 of the response face-image sequence and the acquisition time t1 of the last face image of the final left-to-right head shake are recorded, and t1 − t0 is taken as the duration of the first action in the response action sequence.
The response face-image sequence after time t1 is then analyzed for mouth opening. If fewer than 2 mouth openings are recognized, the weight S2 of the second action is set to a reduced value; if 2 are reached, S2 is set to 1. The acquisition time t2 of the last face image of the final mouth opening is recorded, and t2 − t1 is taken as the duration of the second action.
The response face-image sequence after time t2 is then analyzed for blinking. If fewer than 4 blinks are recognized, the weight S3 of the third action is set to a reduced value; if 4 are reached, S3 is set to 1. The acquisition time t3 of the last face image of the final blink is recorded, and t3 − t2 is taken as the duration of the third action.
At the same time, step (5a) may be replaced by:
(5b) calculating the overall goodness of fit between the response action sequence and the random action-sequence instruction:
overall goodness of fit = Sa + η × St, where Sa = S1 + S2 + S3 is the action-type goodness of fit, St = Σi (1 − |ti − Ti| / Ti) is the timing goodness of fit computed from the action durations ti obtained in step (4b), T1, T2 and T3 are respectively the timestamps corresponding to head shaking, mouth opening and blinking in the random action-sequence instruction, and η is a weight coefficient.
A second preset threshold θ is set empirically and according to the security level: when the overall goodness of fit exceeds θ, the user is judged to be a live person; otherwise a non-live person.
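Step (5b) and the threshold test can be sketched as follows. The timing term mirrors the reconstructed formula above, with clamping at zero added as an assumption:

```python
def overall_fit_5b(type_weights, durations, stamps, eta):
    """Step (5b): Sa + eta * St, where Sa is the summed type weights and
    St sums the per-action timing terms 1 - |t_i - T_i| / T_i, each
    clamped at zero."""
    sa = sum(type_weights)
    st = sum(max(0.0, 1.0 - abs(t - T) / T) for t, T in zip(durations, stamps))
    return sa + eta * st

def is_live(overall, theta):
    """Live-person verdict when the overall fit exceeds the second
    preset threshold theta."""
    return overall > theta
```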
In a possible embodiment, the biometric liveness detection method provided by the embodiments of the application further includes:
(d1) generating a random voice-sequence instruction;
(d2) converting the code of the random voice-sequence instruction into text, visual and/or auditory coding, and presenting it as a visual picture, an audible sound, or a combination of the two;
(d3) collecting the user's response voice sequence;
(d4) analyzing the user's response voice in the response voice sequence;
(d5) judging whether the response voice matches the voice sequence corresponding to the random voice-sequence instruction; if it matches, judging that the response voice sequence comes from a live person.
The random voice-sequence instruction may be a string of words or a string of speech fragments; its content is generated randomly, or several sound templates are drawn at random from a sound-template library and combined into a voice-instruction sequence. The generated voice-instruction sequence may be shown to the user on a display in the form of text or images, spoken to the user through a loudspeaker as played speech, or indicated simultaneously by display and loudspeaker as a combination of text, image and played speech. On receiving the instruction, the user speaks according to it, producing the response voice, and the user's response voice sequence is collected by a recording device. When the voice-instruction sequence is a string of text, or is composed of sound templates, audio analysis and recognition (conventional audio content analysis and recognition) may be performed on the collected response voice sequence and the recognition result compared with the voice-instruction sequence; if the percentage of identical parts exceeds a preset threshold, for example 90%, the user's response voice sequence is judged to come from a live person. When the random voice instruction is a string of text, the recognition result may be converted into text and the converted text compared with the voice-instruction sequence; if the agreement between the text converted from the voice content and the text of the voice-instruction sequence exceeds a preset threshold, for example 90%, the user's response voice sequence is judged to come from a live person. When the random voice-instruction sequence is a string of speech fragments, waveform-matching analysis may be performed between the collected response voice sequence and the voice-instruction sequence; if the waveform matching degree between the response voice sequence and the random voice-instruction sequence exceeds a preset threshold, the user's response voice sequence is judged to come from a live person.
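The text-comparison variant above can be sketched as follows. Position-wise character overlap is an assumed stand-in for the comparison of the recognition result with the voice-instruction sequence; a real system would likely use an edit-distance or alignment measure:

```python
def content_match_ratio(recognized, instructed):
    """Fraction of instructed characters that the recognized transcript
    reproduces at the same position."""
    hits = sum(1 for a, b in zip(recognized, instructed) if a == b)
    return hits / max(len(instructed), 1)

def voice_liveness(recognized, instructed, threshold=0.9):
    """Live-person verdict when the overlap with the instruction meets
    the preset threshold (90% in the example of the text)."""
    return content_match_ratio(recognized, instructed) >= threshold
```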
In a possible embodiment, image analysis and speech analysis are combined, and both the user's actions and voice are analyzed to judge whether the user is a live person. The biometric liveness detection method may then include:
(e1) generating a random action-sequence instruction;
(e2) converting the code of the random action-sequence instruction into text, visual and/or auditory coding, and presenting it as a visual picture, an audible sound, or a combination of the two;
(e3) collecting the user's response image sequence;
(e4) analyzing the user's response action sequence in the response image sequence;
(e5) judging whether the response action sequence matches the action sequence corresponding to the random action-sequence instruction, and calculating the overall action goodness of fit;
(e6) generating a random voice instruction;
(e7) converting the code of the random voice-sequence instruction into text, visual and/or auditory coding, and presenting it as a visual picture, an audible sound, or a combination of the two;
(e8) collecting the user's response voice sequence;
(e9) analyzing the user's response voice in the response voice sequence;
(e10) calculating the voice-content goodness of fit of the response voice sequence: the content of the response voice sequence is compared with the random voice instruction, a weight is assigned to each voice element according to the result of the comparison, and the weights are summed to obtain the voice-content goodness of fit;
(e11) calculating the overall goodness of fit = overall action goodness of fit + w2 × voice-content goodness of fit = action-type goodness of fit + w1 × timing goodness of fit + w2 × voice-content goodness of fit, where w1 and w2 are weights;
(e12) comparing the overall goodness of fit with a third preset threshold; if it exceeds the third preset threshold, judging that the response action sequence comes from a live person, and otherwise not.
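Steps (e11) and (e12) reduce to a weighted sum and a threshold test, which can be sketched as:

```python
def combined_goodness(type_fit, timing_fit, voice_fit, w1, w2):
    """Step (e11): overall = type fit + w1 * timing fit + w2 * voice fit."""
    return type_fit + w1 * timing_fit + w2 * voice_fit

def passes_third_threshold(overall, threshold3):
    """Step (e12): live-person verdict only when the overall fit exceeds
    the third preset threshold."""
    return overall > threshold3
```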
From the description of the method embodiments above, those skilled in the art will clearly understand that the application can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the preferred embodiment. On this understanding, the technical solution of the application — or the part of it that contributes beyond the prior art — can be embodied in the form of a software product stored on a storage medium and containing instructions that cause a smart device to perform all or part of the steps of the methods described in the embodiments of the application. The aforementioned storage medium includes various media capable of storing data and program code, such as read-only memory (ROM), random-access memory (RAM), magnetic disks and optical discs.
Fig. 2 is a schematic structural diagram of a biometric liveness detection system shown in an exemplary embodiment of the application. As shown in Fig. 2, the system includes:
an action-sequence instruction generating unit U201, for generating a random action-sequence instruction;
an action-command display unit U202, including a display and a loudspeaker, for first converting the code of the random action-sequence instruction into text, visual and/or auditory coding, and then presenting it as a visual picture, an audible sound, or a combination of the two;
an image acquisition unit U203, for collecting the user's response face-image sequence;
a response-action display unit U204, for visually presenting the response image sequence in synchrony with the random action-sequence instruction;
a motion analysis unit U205, for analyzing the user's response action sequence in the response face-image sequence;
an action goodness-of-fit judging unit U206, for judging whether the response action sequence matches the action sequence corresponding to the random action-sequence instruction; if it matches, judging that the response action sequence comes from a live person.
The biometric liveness detection system may further include:
a voice-instruction generating unit, for generating a random voice instruction;
a voice-instruction display unit, including a display and a loudspeaker, for first converting the code of the random voice-sequence instruction into text, visual and/or auditory coding, and then presenting it as a visual picture, an audible sound, or a combination of the two;
a voice acquisition unit, for collecting the user's response voice sequence;
a voice analysis unit, for analyzing the user's response voice in the response voice sequence;
a voice goodness-of-fit judging unit, for judging whether the response voice sequence matches the voice sequence corresponding to the voice instruction; if it matches, judging that the response voice sequence comes from a live person.
In a possible embodiment, the action-sequence instruction generating unit assigns a timestamp to each action in the random action-sequence instruction; the timestamp identifies the duration of each action, or its start and end times, and is generated randomly.
The action-command display unit converts the code of the random action-sequence instruction into text and/or visual coding — for example the text captions, images or animations of actions such as "open mouth", "close mouth", "turn head left", "bow head" and "blink" — and presents it visually to the user through the display; or converts the code into text and/or auditory coding, converting the text into speech through a TTS (Text To Speech) engine so that sounds such as "open mouth", "close mouth", "turn head left", "bow head" and "blink" are broadcast through the loudspeaker; or presents both vision and audition combined to the user. Audio-visual prompting helps the user understand the instruction and make the corresponding action in time.
The image acquisition unit is a camera or other image/video capture device that films the user's actions, thereby collecting the response face-image sequence, each image of which is a captured video frame.
The response-action display unit visually presents the response image sequence on the screen in synchrony with the random action-sequence instruction, feeding it back to the user in time so that users can adjust their own actions to match the random action-sequence instruction.
The motion analysis unit may include:
a face-detection subunit, for detecting the face in each image of the response action sequence;
a key-point locating subunit, for locating key points on each face;
a head-pose angle calculating subunit, for calculating the head-pose angle from the located face key points;
a facial-expression type calculating subunit, for calculating the facial-expression type from the located face key points;
an action-sequence recognition subunit, for calculating the response action sequence from the head-pose angle and the facial-expression type.
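The subunit pipeline above can be sketched as a single per-frame pass. The four callables are placeholders for real models (for example the Gabor+SVM classifier of the application case); their names and signatures are assumptions:

```python
class MotionAnalysisUnit:
    """Sketch of the U205 pipeline: face detection -> key-point location
    -> head-pose angle and expression type per frame."""

    def __init__(self, detect_face, locate_keypoints, pose_angle, expression):
        self.detect_face = detect_face          # frame -> face region or None
        self.locate_keypoints = locate_keypoints  # face -> key points
        self.pose_angle = pose_angle            # key points -> head-pose angle
        self.expression = expression            # key points -> expression type

    def analyze(self, frames):
        """Return one (pose_angle, expression) pair per frame in which a
        face was detected; frames without a face are skipped."""
        results = []
        for frame in frames:
            face = self.detect_face(frame)
            if face is None:
                continue
            kps = self.locate_keypoints(face)
            results.append((self.pose_angle(kps), self.expression(kps)))
        return results
```

The per-frame (pose, expression) pairs then feed the action-sequence recognition subunit, which groups them into the response action sequence.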
The action goodness-of-fit judging unit may include:
an action-type goodness-of-fit calculating subunit, for comparing the computed response action sequence with the action sequence corresponding to the random action-sequence instruction and calculating the action-type goodness of fit;
a first judging subunit, for comparing the action-type goodness of fit with the first preset threshold: if it exceeds the first preset threshold, the user's response action types in the response action sequence match the random action-sequence instruction, and the response action sequence is judged to come from a live person; otherwise it is not.
The motion analysis unit may further include:
a duration calculating subunit, for calculating, for each action in the computed response action sequence, the duration of that action;
a timing goodness-of-fit calculating subunit, for comparing the duration of each action with the randomly generated timestamp and calculating the timing goodness of fit;
an overall action goodness-of-fit calculating subunit, for calculating the overall action goodness of fit = action-type goodness of fit + w × timing goodness of fit, where w is a weight;
a second judging subunit, for comparing the overall action goodness of fit with the second preset threshold: if it exceeds the second preset threshold, the user's response actions in the response action sequence match the action sequence corresponding to the action-sequence instruction, and the response action sequence is judged to come from a live person; otherwise it is not.
In a possible embodiment, the biometric liveness detection system provided by the embodiments of the application may further include:
a voice analysis unit, for recognizing the content of the response voice sequence;
a voice goodness-of-fit calculating unit, for calculating the voice-content goodness of fit of the response voice sequence;
an overall goodness-of-fit calculating unit, for calculating the overall goodness of fit = action-type goodness of fit + w1 × timing goodness of fit + w2 × voice-content goodness of fit, where w1 and w2 are weights;
a third judging unit, for comparing the overall goodness of fit with the third preset threshold: if it exceeds the third preset threshold, the response action sequence is judged to come from a live person; otherwise it is not.
The complexity of the random action-sequence instruction and the sizes of the first, second and third preset thresholds are set according to the security level.
Fig. 3 is a schematic diagram of the visual presentation of the random action-sequence instruction and of the random voice-sequence instruction of the biometric liveness detection system. Items (1), (2) and (3) represent the random action-sequence instruction (turn left 45 degrees → face front → turn right 45 degrees), presented simultaneously in text and images; item (4) represents the random voice-sequence instruction, presented in written form — in the example, reading a short sentence aloud, though a string of random digits may also be read.
Fig. 4 is a schematic diagram of the synchronized visual presentation of the random action-sequence instruction and the user's response image sequence when the display of the biometric liveness detection system is in portrait orientation. To better guide the photographed subject to make the action sequence that matches the random action-sequence instruction, the instruction and the collected response image sequence are presented visually on the display at the same time. In portrait orientation, the random action-sequence instruction is presented in the upper-right corner of the collected response image sequence, guiding the user in real time to make the corresponding response action sequence. In Fig. 4, items (1) to (4) represent the presentation of the random action-sequence instruction front face → side face → front face → open mouth and the corresponding response image sequence.
Fig. 5 is the corresponding schematic diagram when the display is in landscape orientation; items (1) to (4) in Fig. 5 likewise represent the presentation of the random action-sequence instruction front face → side face → front face → open mouth and the corresponding response image sequence.
Fig. 6 is a schematic diagram of the simultaneous visual presentation of text and the response image sequence: the upper part shows each random action-sequence instruction one by one, and the lower part shows the collected user response image sequence.
Fig. 7 is a schematic diagram of the random voice-sequence instruction and the random action-sequence instruction displayed in synchrony with the collected user response image sequence.
The embodiments in this specification are described progressively; identical or similar parts of the embodiments may be referred to mutually, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments, being substantially similar to the method embodiments, are described more simply, and the relevant parts refer to the description of the method embodiments. The device and system embodiments described above are merely schematic: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units — they may be located in one place, or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment's solution, which those of ordinary skill in the art can understand and implement without creative effort.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relation or order between those entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, system or device. Without further limitation, an element defined by the statement "including a …" does not exclude the presence of other identical elements in the process, method, article or device that includes it.
The above are only specific embodiments of the application, enabling those skilled in the art to understand or implement it. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be realized in other embodiments without departing from the spirit or scope of the application. The application is therefore not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.