CN107886957A


Info

Publication number: CN107886957A
Application number: CN201711145883.5A
Authority: CN
Inventors: 陈东鹏
Original assignee: Speakin Technologies Co ltd
Current assignee: Speakin Technologies Co ltd
Priority date: 2017-11-17
Filing date: 2017-11-17
Publication date: 2018-04-06

Abstract

An embodiment of the invention discloses a voice wake-up method and device combined with voiceprint recognition. The method first uses the MFCC features of the speech to be verified to judge whether the content of the speech is a preset wake-up word. If so, an i-vector is extracted through a preset deep neural network model, and voiceprint recognition is performed on the i-vector to confirm the speaker's identity and obtain the authority value of the speech to be verified. The speaker's authority value is then compared with the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if the speaker has sufficient authority, the operation corresponding to that preset wake-up word is executed.

Description

A voice wake-up method and device combined with voiceprint recognition

Technical field

The present invention relates to the field of voiceprint applications, and in particular to a voice wake-up method and device combined with voiceprint recognition.

Background art

Voice wake-up means that a user speaks a preset wake-up word to bring an electronic device from a standby state back to its normal working state. With voice wake-up, a user who cannot conveniently tap the screen of an electronic device can still operate the device by voice.

However, current electronic products with a voice wake-up function cannot identify the speaker. Because the speaker's identity cannot be determined, no further permissions can be granted, and only simple device operations that require no user permission can be performed.

As a result, the voice wake-up function of current electronic products lacks user authentication, and therefore cannot support more complex device operations that require user permission.

Summary of the invention

The invention provides a voice wake-up method and device combined with voiceprint recognition, which solve the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.

The invention provides a voice wake-up method combined with voiceprint recognition, comprising:

S1: receiving speech to be verified and performing feature extraction to obtain the MFCC features of the speech to be verified;

S2: caching the MFCC features of the speech to be verified within a preset period;

S3: judging, according to the cached MFCC features of the speech to be verified, whether the content of the speech to be verified is a preset wake-up word, and if so, performing step S4;

S4: inputting the cached MFCC features of the speech to be verified into a preset deep neural network model to obtain the i-vector of the speech to be verified;

S5: comparing the i-vector of the speech to be verified with the preset i-vectors, obtaining the authority value of the speech to be verified according to the matching score produced by the comparison, and judging whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, performing the operation corresponding to that preset wake-up word.

Preferably, step S4 specifically comprises:

S41: cascading the cached MFCC features of the speech to be verified;

S42: inputting the cascaded MFCC features into the preset deep neural network model to obtain the i-vector of the speech to be verified, and updating the preset deep neural network model with the cascaded MFCC features to obtain a new preset deep neural network model.

Preferably, step S5 specifically comprises:

S51: length-normalizing the i-vector of the speech to be verified and the preset i-vectors, comparing the normalized i-vector of the speech to be verified with the normalized preset i-vectors through a probabilistic linear discriminant analysis (PLDA) model, and obtaining the matching score produced by the comparison;

S52: adding an offset compensation score to the matching score to obtain a new matching score;

S53: obtaining the authority value of the speech to be verified according to the new matching score, and judging whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, performing step S54;

S54: performing the operation corresponding to the preset wake-up word that corresponds to the speech to be verified.

Preferably, step S5 further comprises step S55;

Step S53 specifically comprises: obtaining the authority value of the speech to be verified according to the new matching score, and judging whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, performing step S54, and if not, performing step S55;

S55: issuing a prompt that the authority is insufficient.

The invention provides a voice wake-up device combined with voiceprint recognition, comprising:

a feature unit, configured to receive speech to be verified and perform feature extraction to obtain the MFCC features of the speech to be verified;

a cache unit, configured to cache the MFCC features of the speech to be verified within a preset period;

a wake-up unit, configured to judge, according to the cached MFCC features of the speech to be verified, whether the content of the speech to be verified is a preset wake-up word, and if so, to trigger the vector unit;

a vector unit, configured to input the cached MFCC features of the speech to be verified into a preset deep neural network model to obtain the i-vector of the speech to be verified;

a comparing unit, configured to compare the i-vector of the speech to be verified with the preset i-vectors, obtain the authority value of the speech to be verified according to the matching score produced by the comparison, and judge whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, to perform the operation corresponding to that preset wake-up word.

Preferably, the vector unit specifically comprises:

a cascade subunit, configured to cascade the cached MFCC features of the speech to be verified;

an obtaining subunit, configured to input the cascaded MFCC features into the preset deep neural network model to obtain the i-vector of the speech to be verified, and to update the preset deep neural network model with the cascaded MFCC features to obtain a new preset deep neural network model.

Preferably, the comparing unit specifically comprises:

a matching subunit, configured to length-normalize the i-vector of the speech to be verified and the preset i-vectors, compare the normalized i-vector of the speech to be verified with the normalized preset i-vectors through a probabilistic linear discriminant analysis (PLDA) model, and obtain the matching score produced by the comparison;

a compensation subunit, configured to add an offset compensation score to the matching score to obtain a new matching score;

a judgment subunit, configured to obtain the authority value of the speech to be verified according to the new matching score, and to judge whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, to trigger the execution subunit;

an execution subunit, configured to perform the operation corresponding to the preset wake-up word that corresponds to the speech to be verified.

Preferably, the comparing unit further comprises a prompt subunit;

the judgment subunit is specifically configured to obtain the authority value of the speech to be verified according to the new matching score, and to judge whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, to trigger the execution subunit, and if not, to trigger the prompt subunit;

the prompt subunit is configured to issue a prompt that the authority is insufficient.

As can be seen from the above technical solutions, the present invention has the following advantages:

The invention provides a voice wake-up method combined with voiceprint recognition, comprising: S1: receiving speech to be verified and performing feature extraction to obtain the MFCC features of the speech to be verified; S2: caching the MFCC features of the speech to be verified within a preset period; S3: judging, according to the cached MFCC features, whether the content of the speech to be verified is a preset wake-up word, and if so, performing step S4; S4: inputting the cached MFCC features into a preset deep neural network model to obtain the i-vector of the speech to be verified; S5: comparing the i-vector of the speech to be verified with the preset i-vectors, obtaining the authority value of the speech to be verified according to the matching score produced by the comparison, and judging whether that authority value is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, performing the operation corresponding to that preset wake-up word.

In the present invention, the MFCC features of the speech to be verified are first used to judge whether the content of the speech is a preset wake-up word. If it is, an i-vector is extracted through the preset deep neural network model, and voiceprint recognition is performed on the i-vector to confirm the speaker's identity and obtain the authority value of the speech to be verified. Whether the speaker has sufficient authority is then judged by comparing the speaker's authority value with the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified. If so, the operation corresponding to that preset wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of one embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of another embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of one embodiment of the voice wake-up device combined with voiceprint recognition provided by an embodiment of the present invention.

Detailed description of the embodiments

The embodiments of the present invention provide a voice wake-up method and device combined with voiceprint recognition, which solve the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.

To make the objects, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Referring to Fig. 1, an embodiment of the present invention provides one embodiment of a voice wake-up method combined with voiceprint recognition, comprising:

Step 101: receiving speech to be verified and performing feature extraction to obtain the MFCC features of the speech to be verified;

It should be noted that, to wake up the device and perform voiceprint recognition, the speech to be verified must first be received and its features extracted to obtain its MFCC features.

Step 102: caching the MFCC features of the speech to be verified within a preset period;

It should be noted that, to save storage space on the electronic device, only the MFCC features of the speech to be verified within the preset period are stored. The preset period may be set to the most recent three seconds, in which case only the MFCC features of the speech received in the last three seconds are stored.

Step 103: judging, according to the cached MFCC features of the speech to be verified, whether the content of the speech to be verified is a preset wake-up word, and if so, performing step 104;

It should be noted that the subsequent steps are performed only when the speech to be verified is a preset wake-up word; if it is not, the electronic device remains in the standby state.

Step 104: inputting the cached MFCC features of the speech to be verified into a preset deep neural network model to obtain the i-vector of the speech to be verified;

It should be noted that, when the speech to be verified is a preset wake-up word, the i-vector of the speech to be verified is extracted from its MFCC features;

A voiceprint recognition system automatically identifies a speaker's identity from the characteristics of the speaker's voice. Voiceprint recognition is a form of biometric authentication that verifies the speaker's identity by voice, and it offers good convenience, stability, measurability, and security. As a contactless acquisition and recognition technology, voiceprint recognition has a low acquisition cost, is convenient to collect, and is simple to use, and it has great application prospects in fields such as banking, social security, public security, smart home, and mobile payment.

Step 105: comparing the i-vector of the speech to be verified with the preset i-vectors, obtaining the authority value of the speech to be verified according to the matching score produced by the comparison, and judging whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, performing step 106;

It should be noted that comparing the i-vector of the speech to be verified with the preset i-vectors yields a matching score, and the speaker's identity can be confirmed from the matching score. Once the speaker's identity is confirmed, the speaker's authority value, that is, the authority value of the speech to be verified, can be obtained. Whether to perform the next step is decided by checking whether this authority value is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified.

Step 106: performing the operation corresponding to the preset wake-up word that corresponds to the speech to be verified.

It should be noted that this operation is performed when the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified.

In this embodiment, the MFCC features of the speech to be verified are first used to judge whether the content of the speech is a preset wake-up word. If it is, an i-vector is extracted through the preset deep neural network model, and voiceprint recognition is performed on the i-vector to confirm the speaker's identity and obtain the authority value of the speech to be verified. Whether the speaker has sufficient authority is then judged by comparing the speaker's authority value with the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified. If so, the operation corresponding to that preset wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.

The above is one embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the present invention; the following is another embodiment of the method.

Referring to Fig. 2, an embodiment of the present invention provides another embodiment of a voice wake-up method combined with voiceprint recognition, comprising:

Step 201: receiving speech to be verified and performing feature extraction to obtain the MFCC features of the speech to be verified;

It should be noted that, if higher-dimensional MFCC features are needed, the discrete signal can be windowed, a Fourier transform applied to each window, and the number of filters in the filter bank increased; computing the mel cepstral coefficients then yields higher-dimensional MFCC coefficients.
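As an illustrative sketch only, and not the implementation disclosed here, the windowing, Fourier transform, mel filter bank, and cepstrum steps described in the note can be written for a single frame as follows. The function names, the 16 kHz sample rate, the 512-point transform, the Hamming window, and the default filter and coefficient counts are all assumptions chosen for illustration. The sketch shows that a larger filter bank permits more cepstral coefficients per frame.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * hz / sample_rate) for hz in edges]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc_frame(samples, sample_rate=16000, n_fft=512, num_filters=26, num_ceps=13):
    """MFCCs of one frame: window, DFT power spectrum, mel energies, log, DCT-II."""
    if num_ceps > num_filters:
        raise ValueError("num_ceps cannot exceed num_filters")
    n = len(samples)
    if not 2 <= n <= n_fft:
        raise ValueError("frame length must be between 2 and n_fft samples")
    # Hamming window, then zero-pad to the transform length.
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    windowed += [0.0] * (n_fft - n)
    # Direct DFT (slow but dependency-free); keep the non-negative frequencies.
    power = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n_fft)
                  for i, x in enumerate(windowed))
        power.append(abs(acc) ** 2 / n_fft)
    bank = mel_filterbank(num_filters, n_fft, sample_rate)
    # Floor the energies so the logarithm is always defined.
    log_energy = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
                  for row in bank]
    # DCT-II of the log mel energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energy))
            for c in range(num_ceps)]


# A 25 ms frame of a 440 Hz tone at 16 kHz (synthetic data for illustration).
frame = [math.sin(2.0 * math.pi * 440.0 * t / 16000) for t in range(400)]
base = mfcc_frame(frame)                                # 13 coefficients
rich = mfcc_frame(frame, num_filters=40, num_ceps=20)   # 20 coefficients
```

A production system would normally use an FFT rather than the direct transform shown here; the direct form is used only to keep the sketch self-contained.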

Step 202: caching the MFCC features of the speech to be verified within a preset period;

It should be noted that, to save storage space, the electronic device listens continuously but stores only the MFCC features of the speech to be verified within the preset period. The preset period may be set to the most recent three seconds, in which case only the MFCC features of the speech received in the last three seconds are stored.
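One way to keep only the most recent period of features while listening continuously is a fixed-size ring buffer. The following is a minimal sketch under stated assumptions: the class name `MfccCache` and the 10 ms frame shift are hypothetical and do not come from this disclosure; only the three-second period is taken from the text.

```python
from collections import deque


class MfccCache:
    """Keeps only the MFCC frames that fall inside the most recent period."""

    def __init__(self, period_seconds=3.0, frame_shift_seconds=0.01):
        # Number of frames that span the period (the frame shift is an assumption).
        self.max_frames = int(round(period_seconds / frame_shift_seconds))
        # A deque with maxlen silently drops the oldest frame when it is full.
        self._frames = deque(maxlen=self.max_frames)

    def push(self, mfcc_frame):
        self._frames.append(list(mfcc_frame))

    def snapshot(self):
        """Return the cached frames, oldest first."""
        return list(self._frames)


cache = MfccCache()
for t in range(500):              # five seconds of frames at a 10 ms shift
    cache.push([float(t)] * 13)   # synthetic 13-dimensional frames
frames = cache.snapshot()         # only the last three seconds remain
```

With this layout the memory used by the cache is bounded by the period length, regardless of how long the device has been listening.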

Step 203: judging, according to the cached MFCC features of the speech to be verified, whether the content of the speech to be verified is a preset wake-up word, and if so, performing step 204;

It should be noted that, when the speech to be verified is a preset wake-up word, the device is woken up, switches from the standby state to the normal working state, and performs the subsequent steps; if the speech to be verified is not a preset wake-up word, the electronic device remains in the standby state.

Step 204: cascading the cached MFCC features of the speech to be verified;

It should be noted that cascading the cached MFCC features means splicing the coefficient vectors of temporally adjacent speech frames into one longer vector.
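The splicing of adjacent frames described above can be sketched as follows. The function name `splice_frames`, the symmetric context width, and the choice to repeat the first and last frames at the edges are assumptions made for illustration; the text specifies only that temporally adjacent frame vectors are joined into a longer vector.

```python
def splice_frames(frames, context=2):
    """Join each frame with `context` neighbours on each side into one vector.

    At the start and end of the sequence the nearest available frame is
    repeated, so every output vector has the same length.
    """
    if not frames:
        return []
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        vector = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), last)   # clamp to the valid range
            vector.extend(frames[idx])
        spliced.append(vector)
    return spliced


frames = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]   # three synthetic 2-dimensional frames
spliced = splice_frames(frames, context=1)      # each output has 3 * 2 = 6 values
```

For 13-dimensional MFCC frames and a context of 2, each spliced vector would have 5 × 13 = 65 dimensions.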

Step 205: inputting the cascaded MFCC features into the preset deep neural network model to obtain the i-vector of the speech to be verified, and updating the preset deep neural network model with the cascaded MFCC features to obtain a new preset deep neural network model;

It should be noted that traditional i-vectors are extracted with a Gaussian mixture model (GMM). A large amount of speaker-independent and channel-independent speech data is preprocessed and its MFCC features are extracted to train the universal background model (UBM) and the total variability matrix T; the algorithm used to train the UBM is expectation maximization (EM);

After training, the universal background model and the total variability matrix are saved. In the enrollment and test phases, the i-vector corresponding to each speech segment of a speaker is extracted using formula (1):

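$M_s = m_u + T\omega_s \qquad (1)$

where $M_s$ is the speaker- and channel-dependent Gaussian mean supervector of the speech segment; $m_u$ is the Gaussian mean supervector of the UBM; $T$ is the low-rank total variability matrix, which characterizes the total variability space; and $\omega_s$ is the total variability factor, that is, the i-vector.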

This embodiment uses an i-vector extraction method based on a deep neural network (DNN). A DNN built for speech recognition is used: the cascaded MFCC features are input, under a triphone (tri-phone) phoneme model, into the previously trained DNN, and the short-time speech frames are classified according to their posterior probabilities. Each frame, together with its posterior probabilities, can then be used to train a new UBM. In this way the UBM is trained by supervised learning, replacing the unsupervised EM algorithm used in traditional UBM training;
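The idea of training the UBM from DNN posteriors can be illustrated with a small sketch. It assumes that some trained DNN has already produced, for every frame, a posterior probability for each phonetic class; the function name `estimate_ubm` and the data layout are hypothetical. Each class plays the role of one UBM component, whose weight and mean are accumulated directly from the posteriors instead of being found by EM iterations.

```python
def estimate_ubm(frames, posteriors):
    """Estimate component weights and means from per-frame class posteriors.

    frames:     list of feature vectors, one per frame
    posteriors: list of per-frame posterior lists, one value per class
                (assumed to come from a previously trained DNN)
    """
    num_classes = len(posteriors[0])
    dim = len(frames[0])
    zeroth = [0.0] * num_classes                        # soft frame counts
    first = [[0.0] * dim for _ in range(num_classes)]   # posterior-weighted sums
    for x, post in zip(frames, posteriors):
        for c, p in enumerate(post):
            zeroth[c] += p
            for d in range(dim):
                first[c][d] += p * x[d]
    total = sum(zeroth)
    weights = [n / total for n in zeroth]
    means = [[v / zeroth[c] if zeroth[c] > 0.0 else 0.0 for v in first[c]]
             for c in range(num_classes)]
    return weights, means


# Synthetic example: two classes, three 2-dimensional frames.
frames = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]]
posteriors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights, means = estimate_ubm(frames, posteriors)
```

Covariances would be accumulated in the same way from posterior-weighted second-order statistics; they are omitted to keep the sketch short.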

Extracting i-vectors with a deep neural network offers high accuracy, robustness to noise and channel variation, and adaptability to a variety of texts.

Step 206: length-normalizing the i-vector of the speech to be verified and the preset i-vectors, comparing the normalized i-vector of the speech to be verified with the normalized preset i-vectors through a probabilistic linear discriminant analysis (PLDA) model, and obtaining the matching score produced by the comparison;

It should be noted that length normalization means scaling the i-vector of the speech to be verified and the preset i-vectors to the same length;

There may be multiple preset i-vectors, each representing a different user. For example, an electronic device belongs to A, but A stores the i-vectors of A, B, C, D, and E in the device and assigns A, B, C, D, and E different authority values;

Comparing the i-vector of the speech to be verified with each preset i-vector yields a different matching score for each.
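The following sketch illustrates length normalization and per-user scoring. Note the substitution: a full PLDA model needs trained parameters that are not given here, so cosine similarity is used as a simple stand-in scoring function. The function names and the example vectors are assumptions made for illustration.

```python
import math


def length_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


def cosine_score(test_ivector, enrolled_ivector):
    """Stand-in for PLDA scoring: dot product of the length-normalized vectors."""
    a = length_normalize(test_ivector)
    b = length_normalize(enrolled_ivector)
    return sum(x * y for x, y in zip(a, b))


def score_against_enrolled(test_ivector, enrolled):
    """Return one matching score per enrolled user."""
    return {user: cosine_score(test_ivector, ivector)
            for user, ivector in enrolled.items()}


# Synthetic 3-dimensional i-vectors for two enrolled users.
enrolled = {"A": [1.0, 0.0, 0.0], "E": [0.0, 3.0, 4.0]}
scores = score_against_enrolled([0.0, 6.0, 8.0], enrolled)
```

Because both vectors are normalized first, the test vector scores highest against E even though its raw length differs from that of E's enrolled vector.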

Step 207: adding an offset compensation score to the matching score to obtain a new matching score;

It should be noted that external factors such as noise can shift the matching score, so a preset offset compensation score, chosen according to the external conditions, is added to compensate, yielding the new matching score.

Step 208: obtaining the authority value of the speech to be verified according to the new matching score, and judging whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, performing step 209, and if not, performing step 210;

It should be noted that the speaker's identity can be confirmed from the new matching scores. For example, if the matching score between the i-vector of the speech to be verified and E's preset i-vector is the highest and exceeds a preset threshold, the speaker is judged to be E. E's authority value is then obtained and compared with the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified. Suppose the wake-up word spoken by E is "pay" and the authority value required for payment is set to 5, but E's authority value is only 3; the step to be performed is then determined by this result;

If the speaker is F, a matching score can still be obtained, but it is below the preset threshold, so F has no authority value.

Step 209: performing the operation corresponding to the preset wake-up word that corresponds to the speech to be verified;

It should be noted that, if E's authority value is 5, it equals the authority value required for payment, and the electronic device performs the payment operation corresponding to the wake-up word "pay".

Step 210: issuing a prompt that the authority is insufficient.

It should be noted that, if E's authority value is 3, it is less than the authority value required for payment, and a prompt that the authority is insufficient is issued by means such as text, light, or a loudspeaker.
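Steps 207 to 210 can be summarized in a short sketch. The function name `decide`, the return values, and the numeric scores, offset, and threshold are hypothetical; the authority values 5 and 3 and the speakers E and F follow the example in the text.

```python
def decide(scores, authority_of, required_authority, offset=0.0, threshold=0.5):
    """Return (action, speaker) for a set of per-user matching scores.

    action is "execute" (step 209) or "insufficient" (step 210);
    speaker is None when no enrolled user exceeds the threshold.
    """
    if not scores:
        return "insufficient", None
    # Step 207: add the preset offset compensation score to every matching score.
    compensated = {user: s + offset for user, s in scores.items()}
    # Step 208: the best-scoring enrolled user is the candidate speaker.
    speaker, best = max(compensated.items(), key=lambda item: item[1])
    if best <= threshold:
        return "insufficient", None      # unknown speaker, no authority value
    if authority_of[speaker] >= required_authority:
        return "execute", speaker        # step 209
    return "insufficient", speaker       # step 210


authority_of = {"A": 5, "E": 3}
pay_requires = 5  # authority value required by the wake-up word "pay"

as_e = decide({"A": 0.2, "E": 0.8}, authority_of, pay_requires)
as_a = decide({"A": 0.9, "E": 0.1}, authority_of, pay_requires)
as_f = decide({"A": 0.1, "E": 0.2}, authority_of, pay_requires)
```

Here `as_e` corresponds to E being recognized but lacking authority, `as_a` to a sufficiently authorized speaker, and `as_f` to an unenrolled speaker such as F whose best score stays below the threshold.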

Voiceprint recognition is a remote, contactless identity verification technology. Combined with cross-media interactive communication and application service platforms, it has great application prospects in fields such as banking, social security, public security, smart home, and mobile payment. Combining voiceprint recognition with the voice wake-up function improves the user's voice interaction experience and achieves genuinely contact-free identity authentication;

Replacing the traditional Gaussian mixture model with a deep neural network in voiceprint recognition provides high accuracy, robustness to noise and channel variation, and adaptability to a variety of texts. It also supports cross-platform and cross-channel deployment, and offers fast computation, low power consumption, and low use of system resources, so it can be deployed easily on mobile electronic devices and has broad application prospects;

In this embodiment, the MFCC features of the speech to be verified are first used to judge whether the content of the speech is a preset wake-up word. If it is, an i-vector is extracted through the preset deep neural network model, and voiceprint recognition is performed on the i-vector to confirm the speaker's identity and obtain the authority value of the speech to be verified. Whether the speaker has sufficient authority is then judged by comparing the speaker's authority value with the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified. If so, the operation corresponding to that preset wake-up word is performed. This solves the technical problem that the voice wake-up function of current electronic products lacks user authentication and therefore cannot support more complex device operations that require user permission.

The above is another embodiment of the voice wake-up method combined with voiceprint recognition provided by an embodiment of the present invention; the following is one embodiment of the voice wake-up device combined with voiceprint recognition provided by an embodiment of the present invention.

Referring to Fig. 3, an embodiment of the present invention provides one embodiment of a voice wake-up device combined with voiceprint recognition, comprising:

a feature unit 301, configured to receive speech to be verified and perform feature extraction to obtain the MFCC features of the speech to be verified;

a cache unit 302, configured to cache the MFCC features of the speech to be verified within a preset period;

a wake-up unit 303, configured to judge, according to the cached MFCC features of the speech to be verified, whether the content of the speech to be verified is a preset wake-up word, and if so, to trigger the vector unit 304;

a vector unit 304, configured to input the cached MFCC features of the speech to be verified into a preset deep neural network model to obtain the i-vector of the speech to be verified;

a comparing unit 305, configured to compare the i-vector of the speech to be verified with the preset i-vectors, obtain the authority value of the speech to be verified according to the matching score produced by the comparison, and judge whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, to perform the operation corresponding to that preset wake-up word.

Preferably, the vector unit 304 specifically comprises:

a cascade subunit 3041, configured to cascade the cached MFCC features of the speech to be verified;

an obtaining subunit 3042, configured to input the cascaded MFCC features into the preset deep neural network model to obtain the i-vector of the speech to be verified, and to update the preset deep neural network model with the cascaded MFCC features to obtain a new preset deep neural network model.

Preferably, the comparing unit 305 specifically comprises:

a matching subunit 3051, configured to length-normalize the i-vector of the speech to be verified and the preset i-vectors, compare the normalized i-vector of the speech to be verified with the normalized preset i-vectors through a probabilistic linear discriminant analysis (PLDA) model, and obtain the matching score produced by the comparison;

a compensation subunit 3052, configured to add an offset compensation score to the matching score to obtain a new matching score;

a judgment subunit 3053, configured to obtain the authority value of the speech to be verified according to the new matching score, and to judge whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, to trigger the execution subunit 3054;

an execution subunit 3054, configured to perform the operation corresponding to the preset wake-up word that corresponds to the speech to be verified.

Preferably, the comparing unit 305 further comprises a prompt subunit 3055;

the judgment subunit 3053 is specifically configured to obtain the authority value of the speech to be verified according to the new matching score, and to judge whether the authority value of the speech to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the speech to be verified; if so, to trigger the execution subunit 3054, and if not, to trigger the prompt subunit 3055;

the prompt subunit 3055 is configured to issue a prompt that the authority is insufficient.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

As stated above, the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A voice wake-up method combined with voiceprint recognition, characterized by comprising:
S1: receiving a voice to be verified and performing feature extraction to obtain the MFCC features of the voice to be verified;
S2: caching the MFCC features of the voice to be verified within a predetermined period;
S3: judging, according to the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, performing step S4;
S4: inputting the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain the i-vector of the voice to be verified;
S5: comparing the i-vector of the voice to be verified with a preset i-vector, obtaining the authority value of the voice to be verified according to the matching score produced by the comparison, judging whether the authority value of the voice to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the voice to be verified, and if so, performing the operation corresponding to that preset wake-up word.
2. The voice wake-up method combined with voiceprint recognition according to claim 1, characterized in that step S4 specifically comprises:
S41: cascading the cached MFCC features of the voice to be verified;
S42: inputting the cascaded MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and updating the preset deep neural network model with the cascaded MFCC features to yield a new preset deep neural network model.
3. The voice wake-up method combined with voiceprint recognition according to claim 1, characterized in that step S5 specifically comprises:
S51: normalizing the i-vector of the voice to be verified and the preset i-vector, comparing the normalized i-vector of the voice to be verified with the normalized preset i-vector through a probabilistic linear discriminant analysis (PLDA) model, and obtaining the matching score produced by the comparison;
S52: adding an offset score to the matching score to obtain a new matching score;
S53: obtaining the authority value of the voice to be verified according to the new matching score, judging whether the authority value of the voice to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the voice to be verified, and if so, performing step S54;
S54: performing the operation corresponding to the preset wake-up word that corresponds to the voice to be verified.
4. The voice wake-up method combined with voiceprint recognition according to claim 3, characterized in that step S5 further comprises step S55;
Step S53 specifically comprises: obtaining the authority value of the voice to be verified according to the new matching score, judging whether the authority value of the voice to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the voice to be verified, and if so, performing step S54, and if not, performing step S55;
S55: sending a prompt indicating insufficient authority.
5. A voice wake-up device combined with voiceprint recognition, characterized by comprising:
a feature unit, configured to receive a voice to be verified and perform feature extraction to obtain the MFCC features of the voice to be verified;
a cache unit, configured to cache the MFCC features of the voice to be verified within a predetermined period;
a wake-up unit, configured to judge, according to the cached MFCC features of the voice to be verified, whether the content of the voice to be verified is a preset wake-up word, and if so, to perform step S4;
a vector unit, configured to input the cached MFCC features of the voice to be verified into a preset deep neural network model to obtain the i-vector of the voice to be verified;
a comparing unit, configured to compare the i-vector of the voice to be verified with a preset i-vector, obtain the authority value of the voice to be verified according to the matching score produced by the comparison, judge whether the authority value of the voice to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the voice to be verified, and if so, perform the operation corresponding to that preset wake-up word.
6. The voice wake-up device combined with voiceprint recognition according to claim 5, characterized in that the vector unit specifically comprises:
a cascading subunit, configured to cascade the cached MFCC features of the voice to be verified;
an obtaining subunit, configured to input the cascaded MFCC features into the preset deep neural network model to obtain the i-vector of the voice to be verified, and to update the preset deep neural network model with the cascaded MFCC features to yield a new preset deep neural network model.
7. The voice wake-up device combined with voiceprint recognition according to claim 5, characterized in that the comparing unit specifically comprises:
a matching subunit, configured to normalize the i-vector of the voice to be verified and the preset i-vector, compare the normalized i-vector of the voice to be verified with the normalized preset i-vector through a probabilistic linear discriminant analysis (PLDA) model, and obtain the matching score produced by the comparison;
a compensation subunit, configured to add an offset score to the matching score to obtain a new matching score;
a judgment subunit, configured to obtain the authority value of the voice to be verified according to the new matching score, and to judge whether the authority value of the voice to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the voice to be verified, and if so, to trigger the execution subunit;
an execution subunit, configured to perform the operation corresponding to the preset wake-up word that corresponds to the voice to be verified.
8. The voice wake-up device combined with voiceprint recognition according to claim 7, characterized in that the comparing unit further comprises a prompting subunit;
the judgment subunit is specifically configured to obtain the authority value of the voice to be verified according to the new matching score, and to judge whether the authority value of the voice to be verified is greater than or equal to the authority value corresponding to the preset wake-up word that corresponds to the voice to be verified, and if so, to trigger the execution subunit, and if not, to trigger the prompting subunit;
the prompting subunit is configured to send a prompt indicating insufficient authority.