Summary of the invention
To solve the technical problems described above, embodiments of the present invention provide a voice-based authentication method and device that address the security risk in existing voiceprint recognition technology. The technical solution is as follows:
The present invention provides a voice-based authentication method, the method comprising:
generating text authentication information;
presenting the text authentication information to the user to be authenticated and prompting the user to respond;
receiving the voice information of the user's response, performing voiceprint authentication and speech recognition on the voice information, and determining from the results whether the user passes authentication.
In one embodiment of the invention, performing voiceprint authentication and speech recognition on the voice information and determining from the results whether the user passes authentication comprises:
performing voiceprint authentication on the voice information using a pre-trained voiceprint model;
if voiceprint authentication passes, further performing speech recognition on the voice signal;
matching the speech recognition result against the text authentication information, and determining from the matching result whether the user passes authentication.
In one embodiment of the invention, the method further comprises:
if the user passes authentication, using the voice signal to update the voiceprint model.
In one embodiment of the invention, generating the text authentication information comprises:
generating the text authentication information character by character from a preset character set.
In one embodiment of the invention, prompting the user to respond comprises: prompting the user to read the text authentication information aloud character by character.
In one embodiment of the invention, the speech recognition process uses the character as its basic recognition unit.
In one embodiment of the invention, matching the speech recognition result against the generated text authentication information and determining from the matching result whether the user passes authentication comprises:
according to predefined dissimilarities between similar-sounding characters in the character set, computing the character dissimilarity between the speech recognition result and the text authentication information, and, if the computed result is less than a preset threshold, determining that the user passes authentication.
In one embodiment of the invention, the preset character set contains no characters with similar pronunciations.
In one embodiment of the invention,
generating the text authentication information comprises: generating information in the form of a question and determining the answer to the question;
presenting the text authentication information to the user to be authenticated and prompting the user to respond comprises: presenting the question to the user to be authenticated and prompting the user to answer the question in the text authentication information.
In one embodiment of the invention, matching the speech recognition result against the generated text authentication information comprises:
matching the speech recognition result against the answer to the question.
The present invention also provides a voice-based authentication device, the device comprising:
an authentication information generation module, configured to generate text authentication information;
a display module, configured to present the text authentication information to the user to be authenticated and prompt the user to respond;
an authentication module, configured to receive the voice information of the user's response, perform voiceprint authentication and speech recognition on the voice information, and determine from the results whether the user passes authentication.
In one embodiment of the invention, the authentication module comprises:
a voiceprint authentication submodule, configured to receive the voice information of the user's response and perform voiceprint authentication on it using a pre-trained voiceprint model;
a speech recognition submodule, configured to further perform speech recognition on the voice signal if voiceprint authentication passes;
an authentication submodule, configured to match the speech recognition result against the text authentication information and determine from the matching result whether the user passes authentication.
In one embodiment of the invention, the device further comprises:
a correction module, configured to use the voice signal to update the voiceprint model if the user passes authentication.
In one embodiment of the invention, the authentication information generation module is specifically configured to:
generate the text authentication information character by character from a preset character set.
In one embodiment of the invention, the display module is specifically configured to: prompt the user to read the text authentication information aloud character by character.
In one embodiment of the invention, the speech recognition submodule uses the character as its basic recognition unit.
In one embodiment of the invention, the authentication submodule is specifically configured to:
according to predefined dissimilarities between similar-sounding characters in the character set, compute the character dissimilarity between the speech recognition result and the text authentication information, and, if the computed result is less than a preset threshold, determine that the user passes authentication.
In one embodiment of the invention, the preset character set contains no characters with similar pronunciations.
In one embodiment of the invention,
the authentication information generation module is specifically configured to: generate information in the form of a question and determine the answer to the question;
the display module is specifically configured to: present the question to the user to be authenticated and prompt the user to answer the question in the text authentication information.
In one embodiment of the invention, the authentication submodule is specifically configured to: match the speech recognition result against the answer to the question.
The technical solution provided by the embodiments of the present invention combines voiceprint authentication with speech recognition. On the one hand, the user's identity is confirmed by voiceprint authentication. On the other hand, the user responds to authentication information generated on the spot, and speech recognition is used to judge whether the content of the spoken response matches the generated authentication content. Recording-based spoofing can thereby be effectively prevented, resolving the security risk of relying on voiceprint recognition alone.
Detailed description of the embodiments
Human speech is produced by a complex physiological and physical process between the speech centers of the brain and the vocal organs. The organs used in speaking, such as the tongue, teeth, larynx, lungs, and nasal cavity, differ widely in size and shape from person to person, so the voiceprint spectrograms of any two people differ. In general, therefore, people can tell different speakers' voices apart or judge whether two voices belong to the same person.
Applications of voiceprints fall into two types: one is voiceprint recognition, i.e. speaker identification; the other is voiceprint authentication, i.e. speaker verification. The former determines which of several people spoke a given segment of speech and is a "one-of-many" problem; the latter confirms whether a given segment of speech was spoken by a specified person and is a "one-to-one" problem.
Different application scenarios call for different voiceprint technologies: criminal investigation may require identification, whereas scenarios such as access control require authentication. Compared with voiceprint recognition, voiceprint authentication does not demand higher precision. To a machine, however, what is received during authentication is just a segment of audio, and it is hard to tell whether a live person is speaking or a recording is being played. Without other supplementary means (for example, human supervision), authenticating a user by voiceprint alone is therefore easy to defeat with a recording and poses a considerable security risk.
To address the above problem, an embodiment of the present invention provides a voice-based authentication method. Referring to Figure 1, the method may comprise the following steps:
S101: generate text authentication information;
S102: present the text authentication information to the user to be authenticated and prompt the user to respond;
S103: receive the voice information of the user's response, perform voiceprint authentication and speech recognition on the voice information, and determine from the results whether the user passes authentication.
Speech recognition is a technology that converts a speech signal into the corresponding text or command, and it is now used in many fields. The technical solution provided by the embodiments of the present invention combines voiceprint authentication with speech recognition. On the one hand, the user's identity is confirmed by voiceprint authentication. On the other hand, the user responds to authentication information generated on the spot, and speech recognition is used to judge whether the content of the spoken response matches the generated authentication content. Recording-based spoofing can thereby be effectively prevented, resolving the security risk of relying on voiceprint recognition alone.
To enable those skilled in the art to better understand the technical solution of the present invention, the technical solution in the embodiments is described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments herein shall fall within the scope of protection of the present invention.
It will be understood that voiceprint authentication and speech recognition can be two relatively independent processes. In theory they can be carried out separately, with a combined decision made from the results of both. For example, the user is required to speak one passage (which may be fixed) for voiceprint authentication, and also to read a specified text or answer a given question for speech recognition; if both voiceprint authentication and speech recognition succeed, the user passes authentication.
In practice, speech recognition is more complex to implement than voiceprint authentication. In the foregoing scheme, user identity is still confirmed mainly by voiceprint authentication, while speech recognition plays an auxiliary role whose purpose is to prevent recording-based spoofing. Moreover, both voiceprint authentication and speech recognition operate on voice information. Therefore, to reduce system complexity and improve authentication efficiency, the present solution requires the user to input only one segment of voice authentication information, and speech recognition is performed only after that segment has passed voiceprint authentication. Figure 2 is a flowchart of one embodiment of the present invention, comprising the following steps:
S201: generate text authentication information;
S202: present the text authentication information to the user to be authenticated and prompt the user to respond;
S203: receive the voice information of the user's response and perform voiceprint authentication on it using a pre-trained voiceprint model;
S204: if voiceprint authentication passes, further perform speech recognition on the voice signal;
S205: match the speech recognition result against the text authentication information, and determine from the matching result whether the user passes authentication.
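The ordering of steps S203 to S205 can be sketched as follows. This is only an illustrative sketch: the function name `authenticate` and the two callables `verify_voiceprint` and `recognize_speech` are hypothetical stand-ins for whatever voiceprint and recognition engines are actually used, and exact string equality is assumed here as the simplest possible matching rule.

```python
from typing import Callable


def authenticate(
    challenge: str,
    audio: bytes,
    verify_voiceprint: Callable[[bytes], bool],   # hypothetical voiceprint engine
    recognize_speech: Callable[[bytes], str],     # hypothetical recognition engine
) -> bool:
    """Return True only if the speaker check and the content check both pass."""
    # S203: voiceprint authentication against the pre-trained model.
    if not verify_voiceprint(audio):
        # Speech recognition is skipped entirely when the voiceprint fails.
        return False
    # S204: speech recognition on the same utterance.
    recognized = recognize_speech(audio)
    # S205: match the recognized text against the generated challenge
    # (exact equality is an assumption made for this sketch).
    return recognized == challenge
```

Running the cheaper speaker check first means the recognizer is invoked only for utterances that already match the enrolled voice.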
Unlike features such as fingerprints and irises, human voice characteristics drift over time. To track this drift, in a preferred embodiment of the present invention, after each correct authentication the voice information from that authentication can be used to train the existing voiceprint model and update its parameters, thereby maintaining the recognition performance of the system.
Because portable devices such as mobile phones and tablet computers are all equipped with microphones, the authentication scheme provided by the present invention can be applied to unlocking portable devices. However, the structures and algorithms of practical speech recognition systems are currently extremely complex and sometimes must be implemented with the help of a cloud computing platform, while the system resources of current mobile devices are limited. For this situation, the present invention proposes using small-vocabulary isolated word recognition in the speech recognition process, so that speech recognition can be performed on a mobile terminal with scarce computing resources, without accessing a cloud computing server and without depending on network access when unlocking. In addition, with a small-vocabulary, text-dependent algorithm, the recognition rate readily meets the needs of practical applications, and the algorithm is simple, effective, and easy to implement on an embedded system.
In one embodiment of the present invention, the recognition set is chosen to be a small set of isolated words, which may include, but is not limited to, the following schemes:
a) the 10 Arabic numeral characters "0~9";
b) the 26 English letter characters "a~z";
c) "0~9" plus "a~z", 36 characters in total;
d) the subset remaining after similar-sounding characters (for example "1" and "7") are removed from any of the three schemes above.
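Generating a challenge string from one of these character sets can be sketched as follows. The names `reduced_charset` and `generate_challenge`, the default length of 6, and the choice of which characters to drop are assumptions made for illustration; the text only gives "1" and "7" as an example of a similar-sounding pair.

```python
import secrets
import string

DIGITS = string.digits                 # scheme a): "0~9"
LETTERS = string.ascii_lowercase       # scheme b): "a~z"
ALNUM = DIGITS + LETTERS               # scheme c): 36 characters


def reduced_charset(charset: str, confusable: str) -> str:
    """Scheme d): drop characters considered easy to confuse by ear."""
    return "".join(c for c in charset if c not in confusable)


def generate_challenge(charset: str, length: int = 6) -> str:
    """Build a random challenge string, one character at a time."""
    return "".join(secrets.choice(charset) for _ in range(length))
```

For example, `generate_challenge(reduced_charset(DIGITS, "7"))` draws six digits from a set in which "7" has been removed so that it cannot be confused with "1".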
The solution of the present invention can be implemented in two stages, a setup stage and an unlock stage, each described in detail below:
1) Setup stage. This stage is mainly used to train the speaker verification model and may comprise the following steps:
S301: the system checks whether the voice input device (microphone) is available; if so, go to step S302; if not, prompt the user that the voice input device is unavailable and exit the setup program;
S302: the system turns on voice input and displays a training string on the screen for the user to read aloud; to ensure that training succeeds, the training string must be long enough, for example requiring more than 30 seconds of speech; if necessary, the user may also be asked to provide input several times to increase the number of template samples;
S303: the system automatically checks whether the length of the voice input meets the requirement; if so, training begins; otherwise return to step S302 and ask the user to input again;
S304: if training succeeds, prompt the user that setup succeeded and go to step S305; if training fails, prompt the user that setup failed;
S305: after successful training, one test is required. The test is essentially the same as the unlock procedure: the system generates a random string long enough for its pronunciation to meet the recognition requirement, and the input speech data is then recognized and adjudicated;
S306: check whether this unlock succeeded; if so, prompt the user that training succeeded; if not, return to step S305 and retry; after 3 consecutive failed retries, prompt the user that training failed and exit the setup program.
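The control flow of steps S301 to S306 can be sketched as follows. All names here are hypothetical: the four callables stand in for the device's microphone check, recorder, trainer, and test unlock, and `max_recordings` is an added assumption that bounds the re-prompt loop, which the text itself leaves unbounded. The sketch also assumes that "3 consecutive failures" means three failed test attempts in total.

```python
from typing import Callable


def run_setup(
    microphone_available: Callable[[], bool],
    record_seconds: Callable[[], float],   # returns the length of one recording
    train_model: Callable[[], bool],
    test_unlock: Callable[[], bool],
    min_seconds: float = 30.0,             # example value from the text
    max_test_failures: int = 3,
    max_recordings: int = 5,               # assumption: bound on re-prompts
) -> str:
    # S301: abort if no voice input device is available.
    if not microphone_available():
        return "microphone unavailable"
    # S302/S303: re-prompt until a recording is long enough.
    for _ in range(max_recordings):
        if record_seconds() >= min_seconds:
            break
    else:
        return "input too short"
    # S304: train the speaker verification model.
    if not train_model():
        return "training failed"
    # S305/S306: test unlock, giving up after consecutive failures.
    for _ in range(max_test_failures):
        if test_unlock():
            return "setup succeeded"
    return "test failed"
```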
2) Unlock stage. The user attempts to unlock the device.
S401: enter the unlock program;
S402: the system turns on voice input, generates a random string, displays it on the screen, and asks the user to read it aloud; alternatively, the system generates authentication information in the form of a question (and at the same time determines the answer), for example a simple sum, displays the question part to the user, and asks the user to answer it;
S403: after the user inputs speech as prompted, the system automatically checks whether the speech length meets the requirement; if so, the system begins recognition; otherwise step S402 is executed again;
S404: the voiceprint authentication module authenticates the user's input; if authentication passes, continue to S405; otherwise prompt that authentication failed;
S405: the speech recognition module recognizes the user's input and compares it with the random string generated in S402 (or with the answer to the question); if they are identical, or their similarity reaches a certain threshold, the user is judged legitimate and the device is unlocked; if the user is judged illegitimate, the device remains locked and step S402 is executed again;
S406: after a correct unlock, the speech data from this correct recognition is used to train the speaker verification model; this step effectively tracks the drift of speaker characteristics over time and improves recognition accuracy.
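The question-form authentication information mentioned in S402 can be sketched as follows, using the simple sum given as an example in the text. The function name `generate_sum_question`, the English wording of the question, and the operand range are assumptions made for this sketch.

```python
import secrets
from typing import Tuple


def generate_sum_question(max_operand: int = 9) -> Tuple[str, str]:
    """Return (question shown to the user, expected spoken answer)."""
    a = secrets.randbelow(max_operand + 1)   # uniform in 0..max_operand
    b = secrets.randbelow(max_operand + 1)
    question = f"What is {a} + {b}?"
    answer = str(a + b)                      # kept by the system, never displayed
    return question, answer
```

Only the question is displayed; the recognized response is then compared with the stored answer in S405 instead of with a displayed string.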
The concrete implementation of the algorithms involved in the present solution is described below:
1) Voiceprint authentication algorithm:
The present invention uses a voiceprint authentication algorithm to confirm user identity. Each person's speech carries characteristics specific to the speaker, and by discriminating between these characteristics it is possible to tell whether two utterances come from the same person. Voiceprint authentication mainly comprises feature extraction and feature matching; existing techniques can be used directly for the concrete implementation, and the present invention places no limitation on this.
Because human voice characteristics generally drift over time, after each correct recognition the speech data is used to train the model and update its parameters in order to track this drift, which can improve the recognition accuracy of the system.
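One simple way to picture such a model update is an exponential moving average over a feature vector. This is a swapped-in illustration, not a method stated in the text: the representation of the voiceprint model as a single mean vector, the function name `adapt_model`, and the parameter `rate` are all assumptions.

```python
from typing import List, Sequence


def adapt_model(
    model_mean: Sequence[float],
    new_features: Sequence[float],
    rate: float = 0.1,           # assumed adaptation rate, 0 < rate <= 1
) -> List[float]:
    """Nudge the stored mean toward features from a correctly verified utterance."""
    if len(model_mean) != len(new_features):
        raise ValueError("feature dimensions must match")
    return [(1.0 - rate) * m + rate * x for m, x in zip(model_mean, new_features)]
```

A small `rate` lets the model follow slow drift in the speaker's voice while limiting the influence of any single utterance.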
2) Speech recognition algorithm:
Speech recognition is a method of identifying what was said from the input speech signal, and it can be divided into isolated word recognition and continuous speech recognition. Isolated word recognition algorithms are now very mature and their recognition rates meet application requirements; examples include dynamic time warping (DTW), hidden Markov models (HMM), and vector quantization (VQ). Isolated word recognition is not only accurate but also of low complexity, making it easy to implement on mobile devices.
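As an illustration of the first algorithm named above, a minimal dynamic time warping sketch for template-based isolated word recognition follows. It assumes one-dimensional feature sequences and an absolute-difference local cost purely for brevity; a practical recognizer would compare multi-dimensional feature frames. The names `dtw_distance`, `recognize`, and `templates` are hypothetical.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW: minimum cumulative cost of aligning sequence a with b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the stored template closest to the input under DTW."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

Because the warping path may repeat frames, an utterance spoken more slowly than its template can still align with zero cost, which is why DTW suits small isolated-word vocabularies.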
3) Decision algorithm:
Two thresholds, T1 and T2, are set for the decision.
T1 is the voiceprint authentication threshold. The output value Tspeaker of the voiceprint authentication module is a matching score between the input speech and the voiceprint model that characterizes the similarity between the input features and the enrolled features. Tspeaker is compared with T1:
If Tspeaker >= T1, the speaker is judged legitimate;
If Tspeaker < T1, the speaker is judged illegitimate.
T2 is the threshold on the text dissimilarity from speech recognition, used to judge whether the recognized text matches the generated text. In this system speech recognition serves as an auxiliary check on whether the input is a counterfeit recording, so when the string is long enough it is not necessary for every character to be identical; the input can be considered legitimate as long as the dissimilarity does not exceed the preset threshold T2.
Many characters have similar pronunciations in Chinese, so when computing the text dissimilarity Ttext, misrecognition by the speech recognition system caused by similar-sounding characters can be taken into account. The present invention proposes a decision algorithm based on pronunciation similarity, detailed as follows:
Suppose the preset character set is "0~9" plus "a~z", 36 characters in total. A table is built for the similar-sounding characters in the set, assigning a dissimilarity to each pair. Pairs that do not appear in the table default to a dissimilarity of 1, and identical characters have a dissimilarity of 0. Table 1 lists the dissimilarity of the Chinese pronunciations of some digits and English letters; the values can be obtained by training on speech data.
| Character A | Character B | Dissimilarity |
| “1” | “7” | 0.7 |
| “e” | “1” | 0.3 |
| “k” | “a” | 0.5 |
| “q” | “9” | 0.6 |
| “t” | “1” | 0.7 |
| “u” | “6” | 0.8 |
Table 1
To judge whether two strings are similar, either their total dissimilarity or their mean dissimilarity can be computed. The mean character dissimilarity is computed as:
Ttext = mean dissimilarity = total dissimilarity / string length
If the two strings differ in length, the dissimilarity between a character and a blank character is taken to be 1, and the string length is the larger of the two lengths.
As an example, suppose the string in the text authentication information is "124k5t":
If, after the user's input has been processed by speech recognition, the recognized string is "124k5t", the total dissimilarity is 0 and the mean dissimilarity is 0;
If the recognized string is "724k51", then according to Table 1:
the dissimilarity between "7" and "1" is 0.7,
the dissimilarity between "t" and "1" is 0.7,
so the total dissimilarity is 1.4 and the mean dissimilarity is 1.4/6 = 0.233;
If the recognized string is "164385", the total dissimilarity is 4 and the mean dissimilarity is 4/6 = 0.667;
If the recognized string is "12", only the first 2 characters of the original string are present and the rest are blank characters, so the total dissimilarity is 4, the string length is 6, and the mean dissimilarity is 4/6 = 0.667.
It can be seen that the above algorithm is in effect a pronunciation-weighted average over the two strings. The smaller the dissimilarity, the closer the overall pronunciation of the strings; the larger the dissimilarity, the more their pronunciation differs. A dissimilarity of 0 means the strings are identical, and 1 means they are completely different.
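The mean dissimilarity calculation can be sketched as follows, using the values from Table 1. The names `TABLE_1`, `char_dissimilarity`, and `mean_dissimilarity` are illustrative, and treating each table entry as symmetric (so that the pair "7", "1" scores the same as "1", "7") is an assumption of this sketch.

```python
from itertools import zip_longest
from typing import Optional

# Dissimilarity values copied from Table 1; lookups are treated as symmetric.
TABLE_1 = {
    ("1", "7"): 0.7,
    ("e", "1"): 0.3,
    ("k", "a"): 0.5,
    ("q", "9"): 0.6,
    ("t", "1"): 0.7,
    ("u", "6"): 0.8,
}


def char_dissimilarity(a: Optional[str], b: Optional[str]) -> float:
    """Dissimilarity of one character pair; None stands for a blank character."""
    if a is None or b is None:
        return 1.0                      # character versus blank
    if a == b:
        return 0.0                      # identical characters
    return TABLE_1.get((a, b), TABLE_1.get((b, a), 1.0))  # default is 1


def mean_dissimilarity(expected: str, recognized: str) -> float:
    """Ttext = total dissimilarity / length of the longer string."""
    length = max(len(expected), len(recognized))
    if length == 0:
        return 0.0
    total = sum(char_dissimilarity(a, b) for a, b in zip_longest(expected, recognized))
    return total / length
```

With these definitions, `mean_dissimilarity("124k5t", "724k51")` evaluates to 1.4/6, about 0.233, and both `"164385"` and `"12"` give 4/6, about 0.667, in agreement with the worked examples.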
The decision is made against the preset T2:
If Ttext <= T2: the input is judged to be read correctly, and the system accepts it;
If Ttext > T2: the input is judged not to be read correctly, and the system rejects it.
In the example above, suppose T2 = 0.3. The mean dissimilarities of "124k5t" and "724k51" are less than T2, so they are judged to be read correctly; the mean dissimilarities of "164385" and "12" are greater than T2, so they are judged to be read incorrectly.
Finally, based on the outputs of voiceprint authentication and speech recognition, the condition for judging the user legitimate is:
Tspeaker >= T1 and Ttext <= T2
Otherwise the user is judged illegitimate.
The thresholds T1 and T2 can be set empirically so that the system achieves its best performance.
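The combined decision rule can be written directly as a predicate. The function name `is_legitimate_user` is hypothetical, and any concrete threshold values passed to it are assumptions; the text only uses T2 = 0.3 in its example and gives no value for T1.

```python
def is_legitimate_user(t_speaker: float, t_text: float, t1: float, t2: float) -> bool:
    """Accept only if the voiceprint score is high enough AND the text is close enough."""
    return t_speaker >= t1 and t_text <= t2
```

Both conditions must hold: a replayed recording of the genuine speaker would pass the first test but, unless it happens to match the freshly generated challenge, fail the second.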
Corresponding to the method embodiments above, the present invention also provides a voice-based authentication device. Referring to Figure 3, the device may comprise:
an authentication information generation module 510, configured to generate text authentication information;
a display module 520, configured to present the text authentication information to the user to be authenticated and prompt the user to respond;
an authentication module 530, configured to receive the voice information of the user's response, perform voiceprint authentication and speech recognition on the voice information, and determine from the results whether the user passes authentication.
The device provided by the embodiments of the present invention combines voiceprint authentication with speech recognition. On the one hand, the user's identity is confirmed by voiceprint authentication. On the other hand, the user responds to authentication information generated on the spot, and speech recognition is used to judge whether the content of the spoken response matches the generated authentication content. Recording-based spoofing can thereby be effectively prevented, resolving the security risk of relying on voiceprint recognition alone.
In practice, speech recognition is more complex to implement than voiceprint authentication. In the foregoing scheme, user identity is still confirmed mainly by voiceprint authentication, while speech recognition plays an auxiliary role whose purpose is to prevent recording-based spoofing. Moreover, both voiceprint authentication and speech recognition operate on voice information. Therefore, to reduce system complexity and improve authentication efficiency, the present solution requires the user to input only one segment of voice authentication information, and speech recognition is performed only after that segment has passed voiceprint authentication. Under this scheme, the authentication module 530 of the present invention may specifically comprise:
a voiceprint authentication submodule, configured to receive the voice information of the user's response and perform voiceprint authentication on it using a pre-trained voiceprint model;
a speech recognition submodule, configured to further perform speech recognition on the voice signal if voiceprint authentication passes;
an authentication submodule, configured to match the speech recognition result against the text authentication information and determine from the matching result whether the user passes authentication.
Unlike features such as fingerprints and irises, human voice characteristics drift over time. To track this drift, in a preferred embodiment of the present invention, after each correct authentication the voice information from that authentication can be used to train the existing voiceprint model and update its parameters, thereby maintaining the recognition performance of the system.
Referring to Figure 4, in one embodiment of the invention, the authentication device may further comprise:
a correction module 540, configured to use the voice signal to update the voiceprint model if the user passes authentication.
Because portable devices such as mobile phones and tablet computers are all equipped with microphones, the authentication scheme provided by the present invention can be applied to unlocking portable devices. However, the structures and algorithms of practical speech recognition systems are currently complex and sometimes must be implemented with the help of a cloud computing platform, while the system resources of current mobile devices are limited. For this situation, the present invention proposes using small-vocabulary isolated word recognition in the speech recognition process, so that speech recognition can be performed on a mobile terminal with scarce computing resources, without accessing a cloud computing server and without depending on network access when unlocking. In addition, with a small-vocabulary, text-dependent algorithm, the recognition rate readily meets the needs of practical applications, and the algorithm is simple, effective, and easy to implement on an embedded system.
In one embodiment of the invention, the authentication information generation module 510 is specifically configured to:
generate the text authentication information character by character from a preset character set.
The display module 520 is specifically configured to: prompt the user to read the text authentication information aloud character by character.
The speech recognition submodule uses the character as its basic recognition unit.
The authentication submodule is specifically configured to: according to predefined dissimilarities between similar-sounding characters in the character set, compute the character dissimilarity between the speech recognition result and the text authentication information, and, if the computed result is less than a preset threshold, determine that the user passes authentication.
In one implementation of the present invention, similar-sounding characters may also be removed from the preset character set in advance.
In another implementation of the present invention, the authentication information generation module 510 may also be specifically configured to generate information in the form of a question and determine the answer to the question;
the display module is specifically configured to: present the question to the user to be authenticated and prompt the user to answer the question in the text authentication information;
the authentication submodule is specifically configured to: match the speech recognition result against the answer to the question.
For convenience of description, the device above is described in terms of separate units divided by function. Of course, when implementing the present invention, the functions of the units may be realized in one or more pieces of software and/or hardware.
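As one illustration of realizing the units in a single piece of software, the three modules can be composed as plain callables. This is a hypothetical sketch: the class name `AuthenticationDevice`, its field names, and the `capture_audio` parameter are not taken from the text, and the numbers in the comments only point back to modules 510, 520, and 530.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuthenticationDevice:
    generate: Callable[[], str]                   # role of generation module 510
    display: Callable[[str], None]                # role of display module 520
    authenticate: Callable[[str, bytes], bool]    # role of authentication module 530

    def run(self, capture_audio: Callable[[], bytes]) -> bool:
        """One authentication attempt: generate, prompt, capture, decide."""
        challenge = self.generate()
        self.display(challenge)
        return self.authenticate(challenge, capture_audio())
```

Because each module is just a callable, the same composition works whether the modules live in one process or are thin wrappers around separate hardware or services.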
From the description of the embodiments above, those skilled in the art can clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware platform. On this understanding, the essence of the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner; for parts that are the same or similar, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device or system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts may be understood by referring to the corresponding description of the method embodiments. The device and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above is only a specific implementation of the present invention. It should be noted that those of ordinary skill in the art may make further improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.