CN108766441A

Info

Publication number: CN108766441A
Application number: CN201810533494.8A
Authority: CN
Inventors: 卢敬光; 刘海模; 吴晓东; 刘雄; 肖虎; 马鸿飞
Original assignee: Guangdong Sheng General Technology Co Ltd
Current assignee: ZHUHAI RONGTAI ELECTRONICS Co.,Ltd.
Priority date: 2018-05-29
Filing date: 2018-05-29
Publication date: 2018-11-06
Anticipated expiration: 2038-05-29
Also published as: CN108766441B

Abstract

A voice control method based on offline voiceprint recognition and speech recognition, comprising the following steps: receiving wake-word speech, and extracting a first speech feature and a first voiceprint feature of the wake-word speech; checking whether the extracted first speech feature and first voiceprint feature match a wake-word speech template and a voiceprint template, respectively, and obtaining a first voiceprint code corresponding to the first voiceprint feature; receiving command-word speech, and extracting a second voiceprint feature of the command-word speech; checking whether the second voiceprint feature matches the voiceprint template, and obtaining a second voiceprint code corresponding to the second voiceprint feature; checking whether the first voiceprint code and the second voiceprint code are identical, and extracting a second speech feature of the command word; and checking whether the extracted second speech feature matches a command-word speech template, obtaining the speech code of the second speech feature, and generating a corresponding control instruction based on the speech code.

Description

A voice control method and device based on offline voiceprint recognition and speech recognition

Technical field

This application relates to the technical field of speaker verification, and in particular to a voice control method based on offline voiceprint recognition and speech recognition and to a device implementing the method.

Background

In recent years, as speech recognition technology has matured and become widespread, the ability to issue control instructions to electronic devices by voice has been successfully applied in many consumer electronics products (for example, the Siri function of the iPhone). Such voice-based device control involves speaker verification, a branch of speech recognition technology: it confirms whether an utterance was spoken by a designated user (for example, the owner of a mobile phone or a person with access rights to the electronic device) and determines the control instruction corresponding to the content of the utterance.

Compared with traditional electronic device control technology, voice-based control offers users a friendlier and more convenient way to interact with a device (for example, the user need not enter a password manually to verify access rights). In the prior art, however, speech is easily affected by other conditions (such as background noise and the speaker's own vocal condition), which makes the determination of speech content unstable. In addition, converting natural language into a computer language that the electronic device can accept often requires the device to be connected online to an external database for semantic conversion. Both problems raise the cost of using voice-based electronic device control.

Summary of the invention

The purpose of this application is to remedy the deficiencies of the prior art by providing a voice control method and device based on offline voiceprint recognition and speech recognition, which achieve voice-based control of an electronic device offline and reduce the influence of external conditions on speech recognition as far as possible.

To achieve the above purpose, this application first proposes a voice control method based on offline voiceprint recognition and speech recognition, comprising the following steps: receiving wake-word speech, and extracting a first speech feature and a first voiceprint feature of the wake-word speech; checking whether the extracted first speech feature and first voiceprint feature match a wake-word speech template and a voiceprint template, respectively, terminating if they do not match, and otherwise obtaining a first voiceprint code corresponding to the first voiceprint feature; receiving command-word speech, and extracting a second voiceprint feature of the command-word speech; checking whether the second voiceprint feature matches the voiceprint template, terminating if it does not match, and otherwise obtaining a second voiceprint code corresponding to the second voiceprint feature; checking whether the first voiceprint code and the second voiceprint code are identical, terminating if they differ, and otherwise extracting a second speech feature of the command word; and checking whether the extracted second speech feature matches a command-word speech template, terminating if it does not match, and otherwise obtaining the speech code of the second speech feature and generating a corresponding control instruction based on that speech code. The wake-word speech template, the command-word speech template and the voiceprint template are stored locally.

In a preferred embodiment of the above method, the wake-word speech template and the command-word speech template are generated by training on speech collected in advance.

In a preferred embodiment of the above method, the voiceprint template is generated by training on speech of at least one user collected in advance.

In a preferred embodiment of the above method, the correspondence between speech and speech codes is user-defined.

Further, in the above preferred embodiment, the correspondence between speech and speech codes is stored locally.

In a preferred embodiment of the above method, the wake-word speech template and the command-word speech template are trained on collected speech that is dynamically updated.

In a preferred embodiment of the above method, the voiceprint template is trained on dynamically updated speech of at least one designated person.

Secondly, this application also proposes a voice control device based on offline voiceprint recognition and speech recognition, comprising the following modules: a first receiving module, configured to receive wake-word speech and to extract a first speech feature and a first voiceprint feature of the wake-word speech; a first checking module, configured to check whether the extracted first speech feature and first voiceprint feature match a wake-word speech template and a voiceprint template, respectively, to terminate if they do not match, and otherwise to obtain a first voiceprint code corresponding to the first voiceprint feature; a second receiving module, configured to receive command-word speech and to extract a second voiceprint feature of the command-word speech; a second checking module, configured to check whether the second voiceprint feature matches the voiceprint template, to terminate if it does not match, and otherwise to obtain a second voiceprint code corresponding to the second voiceprint feature; a third checking module, configured to check whether the first voiceprint code and the second voiceprint code are identical, to terminate if they differ, and otherwise to extract a second speech feature of the command word; and an instruction generation module, configured to check whether the extracted second speech feature matches a command-word speech template, to terminate if it does not match, and otherwise to obtain the speech code of the second speech feature and generate a corresponding control instruction based on that speech code. The wake-word speech template, the command-word speech template and the voiceprint template are stored locally.

In a preferred embodiment of the above device, the wake-word speech template and the command-word speech template are generated by training on speech collected in advance.

In a preferred embodiment of the above device, the voiceprint template is generated by training on speech of at least one user collected in advance.

In a preferred embodiment of the above device, the correspondence between speech and speech codes is user-defined.

In a preferred embodiment of the above device, the wake-word speech template and the command-word speech template are trained on collected speech that is dynamically updated.

In a preferred embodiment of the above device, the voiceprint template is trained on dynamically updated speech of at least one designated person.

Finally, this application also discloses a computer-readable storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the steps of any of the methods described above.

The beneficial effect of this application is as follows: by means of locally stored speech templates and voiceprint templates, the identity of the speaker and the content of the speech can be confirmed conveniently, which improves the ease of use of voice-controlled electronic devices.

Brief description of the drawings

Fig. 1 is a flowchart of one embodiment of the voice control method based on offline voiceprint recognition and speech recognition;

Fig. 2 is a schematic diagram of the configuration of the relevant device, based on the embodiment in Fig. 1;

Fig. 3 is a schematic diagram of a user-defined correspondence between speech and speech codes;

Fig. 4 is a module structure diagram of one embodiment of the voice control device based on offline voiceprint recognition and speech recognition.

Detailed description

The concept, specific structure and technical effects of this application are described clearly and completely below with reference to the embodiments and the accompanying drawings, so that the purpose, solutions and effects of this application can be fully understood. It should be noted that, provided they do not conflict, the embodiments of this application and the features in those embodiments may be combined with one another. The same reference numerals used throughout the drawings denote the same or similar parts.

Herein, unless expressly stated otherwise, wake-word speech refers to speech uttered by a user who has access rights to the electronic device, used to verify the user's identity and to start the electronic device control flow. Only when the wake-word speech satisfies certain conditions will the relevant device go on to accept other voice instructions. Correspondingly, command-word speech refers to the voice instruction that the user issues after the wake-word speech has been confirmed, that is, speech that conveys an instruction with a specific practical meaning to the electronic device.

Fig. 1 is a flowchart of one embodiment of the voice control method based on offline voiceprint recognition and speech recognition. The method comprises the following steps: receiving wake-word speech, and extracting a first speech feature and a first voiceprint feature of the wake-word speech; checking whether the extracted first speech feature and first voiceprint feature match a wake-word speech template and a voiceprint template, respectively, terminating if they do not match, and otherwise obtaining a first voiceprint code corresponding to the first voiceprint feature; receiving command-word speech, and extracting a second voiceprint feature of the command-word speech; checking whether the second voiceprint feature matches the voiceprint template, terminating if it does not match, and otherwise obtaining a second voiceprint code corresponding to the second voiceprint feature; checking whether the first voiceprint code and the second voiceprint code are identical, terminating if they differ, and otherwise extracting a second speech feature of the command word; and checking whether the extracted second speech feature matches a command-word speech template, terminating if it does not match, and otherwise obtaining the speech code of the second speech feature and generating a corresponding control instruction based on that speech code. As shown in the schematic diagram in Fig. 2, the wake-word speech template, the command-word speech template and the voiceprint template are stored locally. Whenever a mismatch occurs (whether a speech feature or a voiceprint feature fails to match the corresponding locally stored template), the flow is forcibly terminated and returns to the stage of waiting for the user to input the wake-word speech again.
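By way of illustration only, the sequence of checks described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of this application: the function and variable names (`run_voice_control`, `speech_of`, `voiceprint_of` and the sample tables) are hypothetical, features are represented as plain strings, and matching is reduced to an exact lookup in place of a real matching algorithm.

```python
from typing import Callable, Dict, Optional, Set


def run_voice_control(
    wake_audio: dict,
    command_audio: dict,
    extract_speech: Callable[[dict], str],
    extract_voiceprint: Callable[[dict], str],
    wake_template: Set[str],
    voiceprint_template: Dict[str, str],
    command_template: Dict[str, str],
    instructions: Dict[str, str],
) -> Optional[str]:
    """Return a control instruction, or None as soon as any check fails."""
    # Wake word: both the speech feature and the voiceprint feature must match.
    first_speech = extract_speech(wake_audio)
    first_voiceprint = extract_voiceprint(wake_audio)
    if first_speech not in wake_template:
        return None
    first_code = voiceprint_template.get(first_voiceprint)
    if first_code is None:
        return None

    # Command word: the voiceprint is checked before the speech content.
    second_voiceprint = extract_voiceprint(command_audio)
    second_code = voiceprint_template.get(second_voiceprint)
    if second_code is None:
        return None

    # The wake word and the command word must come from the same speaker.
    if first_code != second_code:
        return None

    # Only now is the speech feature of the command word extracted.
    second_speech = extract_speech(command_audio)
    speech_code = command_template.get(second_speech)
    if speech_code is None:
        return None
    return instructions.get(speech_code)


# Hypothetical locally stored templates (toy stand-ins for trained templates).
WAKE_TEMPLATE = {"hello device"}
VOICEPRINT_TEMPLATE = {"alice-voice": "VP01", "bob-voice": "VP02"}
COMMAND_TEMPLATE = {"light on": "C01"}
INSTRUCTIONS = {"C01": "TURN_LIGHT_ON"}


def speech_of(audio: dict) -> str:
    """Toy speech-feature extractor: the audio already carries its words."""
    return audio["words"]


def voiceprint_of(audio: dict) -> str:
    """Toy voiceprint extractor: the audio already carries a speaker label."""
    return audio["voice"]


def demo(wake_audio: dict, command_audio: dict) -> Optional[str]:
    return run_voice_control(
        wake_audio, command_audio, speech_of, voiceprint_of,
        WAKE_TEMPLATE, VOICEPRINT_TEMPLATE, COMMAND_TEMPLATE, INSTRUCTIONS,
    )
```

In this sketch a `None` result corresponds to the forced termination described above, after which the caller would wait for the wake-word speech again.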

Both the first voiceprint feature and the second voiceprint feature rely on the stability of a person's voice: they are speech-spectrum feature parameters formed from physical quantities of the collected speech (such as timbre, duration, intensity and pitch). Further, in one embodiment of this application, the voiceprint template is formed by extracting the voiceprint features of multiple users who have access rights to the electronic device, and grouping and numbering the voiceprint features of those users according to their access rights. The voiceprint features may be obtained by analysing the user's voice with conventional algorithms in this technical field, which this application does not limit.
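As a hedged illustration of such a grouped and numbered voiceprint template, the sketch below stores one feature vector per voiceprint code together with an access-rights group. The four-element vectors, the group names, the codes, the cosine-similarity measure and the 0.95 threshold are all assumptions chosen for the example; this application leaves the feature extraction and matching algorithms open.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical voiceprint template: voiceprint code -> access group and vector.
VOICEPRINT_TEMPLATE: Dict[str, dict] = {
    "A01": {"group": "admin", "vector": [0.9, 0.1, 0.3, 0.5]},
    "G01": {"group": "guest", "vector": [0.2, 0.8, 0.6, 0.1]},
}


def match_voiceprint(
    feature: List[float],
    template: Dict[str, dict],
    threshold: float = 0.95,
) -> Optional[str]:
    """Return the code of the best entry at or above the threshold, else None."""
    best_code: Optional[str] = None
    best_score = threshold
    for code, entry in template.items():
        score = cosine_similarity(feature, entry["vector"])
        if score >= best_score:
            best_code, best_score = code, score
    return best_code
```

Comparing the codes returned for the wake word and for the command word then amounts to a simple equality test on two strings.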

To reduce the system's computational load, in one embodiment of this application only the first speech feature is extracted after the wake-word speech is received. The first voiceprint feature of the wake word is extracted only when the first speech feature matches one of the groups of feature parameters recorded in the wake-word speech template; if the first speech feature matches none of the feature parameters in the wake-word speech template, the user is prompted to speak the wake word again so that matching can be retried. The matching decisions involved (such as matching the first speech feature against the wake-word speech template, the first voiceprint feature against the voiceprint template, and the second speech feature against the command-word speech template) may be implemented with conventional matching algorithms in this field, which this application does not limit.
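The staged extraction described above can be sketched as follows. The function name `process_wake_word` and its parameters are hypothetical, and the extractor and matcher callables stand in for whatever conventional algorithms an implementation chooses.

```python
from typing import Callable, Iterable, Optional


def process_wake_word(
    audio: object,
    extract_speech: Callable[[object], object],
    extract_voiceprint: Callable[[object], object],
    wake_template: Iterable[object],
    matches: Callable[[object, object], bool],
) -> Optional[object]:
    """Extract the voiceprint feature only after the wake word itself matches.

    Returns the first voiceprint feature, or None when the speech feature
    matches no group of feature parameters in the wake-word template (the
    caller would then prompt the user to speak the wake word again).
    """
    speech_feature = extract_speech(audio)
    if not any(matches(speech_feature, group) for group in wake_template):
        return None  # the voiceprint extractor is never invoked on this path
    return extract_voiceprint(audio)
```

The saving comes from skipping voiceprint extraction for every utterance that is not the wake word at all, which in normal use is most of what the device hears.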

In one embodiment of this application, the wake-word speech template and the command-word speech template are generated by training on speech collected in advance. Specifically, the user may input the wake-word speech and the command-word speech several times in advance, and the wake-word speech template and the command-word speech template are refined through training, thereby improving the accuracy of speech recognition.
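This application does not specify the training procedure. Purely as an assumed example, a template could be formed as the element-wise mean of the feature vectors extracted from the user's repeated utterances; the function name `train_template` is hypothetical.

```python
from typing import List, Sequence


def train_template(feature_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average several feature vectors of the same word into one template.

    Element-wise averaging is an assumption made for this sketch, not a
    procedure stated in this application.
    """
    if not feature_vectors:
        raise ValueError("at least one utterance is required")
    length = len(feature_vectors[0])
    if any(len(vector) != length for vector in feature_vectors):
        raise ValueError("all feature vectors must have the same length")
    count = len(feature_vectors)
    return [sum(vector[i] for vector in feature_vectors) / count
            for i in range(length)]
```

Each additional utterance pulls the template toward the speaker's typical pronunciation, which is one simple way in which repeated input can raise recognition accuracy.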

Referring to the schematic diagram in Fig. 3 of a user-defined correspondence between speech and speech codes, in one embodiment of this application the correspondence between speech and speech codes can be set freely according to the actual electronic device and the language the user speaks. Because the user can customize the correspondence between speech and speech codes, the specific instruction sent to the electronic device is independent of the language in which the user utters the command-word speech. For example, by modifying the feature parameters of the speech labelled with a specific meaning in the command-word speech template, English or Chinese speech can be associated with a specified speech code, so that command-word speech uttered in English or Chinese is received and converted into the corresponding control instruction.

Further, in the above embodiment of this application, the correspondence between speech and speech codes is also stored locally, so that voice-based control of the electronic device can be achieved without connecting to a network.
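A minimal sketch of a locally stored, user-defined correspondence is given below. It assumes, for illustration, that each matched command-word entry is identified by a text label and that the correspondence is kept in a JSON file on the device; the function names, the labels and the codes are hypothetical.

```python
import json
from typing import Dict, Optional


def save_mapping(path: str, mapping: Dict[str, str]) -> None:
    """Write the speech-label to speech-code correspondence to a local file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle, ensure_ascii=False, indent=2)


def load_mapping(path: str) -> Dict[str, str]:
    """Read the correspondence back without any network access."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def to_instruction(
    label: str,
    mapping: Dict[str, str],
    instructions: Dict[str, str],
) -> Optional[str]:
    """Translate a matched command label into a control instruction."""
    speech_code = mapping.get(label)
    if speech_code is None:
        return None
    return instructions.get(speech_code)
```

Because an English label and a Chinese label can point to the same speech code, the instruction issued to the device does not depend on the language spoken.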

In one embodiment of this application, the wake-word speech template and the command-word speech template are trained on collected speech that is dynamically updated. By regularly updating the wake-word speech and the command-word speech, the user can improve the security of the electronic device and prevent persons without access rights from misusing it.

Fig. 4 is a module structure diagram of one embodiment of the voice control device based on offline voiceprint recognition and speech recognition. The device comprises the following modules: a first receiving module, configured to receive wake-word speech and to extract a first speech feature and a first voiceprint feature of the wake-word speech; a first checking module, configured to check whether the extracted first speech feature and first voiceprint feature match a wake-word speech template and a voiceprint template, respectively, to terminate if they do not match, and otherwise to obtain a first voiceprint code corresponding to the first voiceprint feature; a second receiving module, configured to receive command-word speech and to extract a second voiceprint feature of the command-word speech; a second checking module, configured to check whether the second voiceprint feature matches the voiceprint template, to terminate if it does not match, and otherwise to obtain a second voiceprint code corresponding to the second voiceprint feature; a third checking module, configured to check whether the first voiceprint code and the second voiceprint code are identical, to terminate if they differ, and otherwise to extract a second speech feature of the command word; and an instruction generation module, configured to check whether the extracted second speech feature matches a command-word speech template, to terminate if it does not match, and otherwise to obtain the speech code of the second speech feature and generate a corresponding control instruction based on that speech code. As shown in the schematic diagram in Fig. 2, the wake-word speech template, the command-word speech template and the voiceprint template are stored locally. Whenever a mismatch occurs (whether a speech feature or a voiceprint feature fails to match the corresponding locally stored template), the device returns to the state of waiting for the user to input the wake-word speech again.
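The device's return to the waiting state can be illustrated with a small two-state sketch. The class, state and callback names are hypothetical, the two callbacks stand in for the checking modules, and returning to the waiting state after a successful command is an assumption made here, since the text above specifies only the behaviour on a mismatch.

```python
from enum import Enum, auto
from typing import Callable, Optional


class State(Enum):
    WAITING_FOR_WAKE_WORD = auto()
    WAITING_FOR_COMMAND = auto()


class VoiceControlDevice:
    """Event-driven sketch: each received utterance is passed to on_audio()."""

    def __init__(
        self,
        check_wake: Callable[[object], Optional[str]],
        check_command: Callable[[object, str], Optional[str]],
    ) -> None:
        # check_wake: audio -> first voiceprint code, or None on any mismatch.
        # check_command: (audio, first voiceprint code) -> instruction or None.
        self.check_wake = check_wake
        self.check_command = check_command
        self.state = State.WAITING_FOR_WAKE_WORD
        self.first_code: Optional[str] = None

    def on_audio(self, audio: object) -> Optional[str]:
        if self.state is State.WAITING_FOR_WAKE_WORD:
            code = self.check_wake(audio)
            if code is not None:
                self.first_code = code
                self.state = State.WAITING_FOR_COMMAND
            return None
        instruction = self.check_command(audio, self.first_code)
        # On a mismatch (and, by assumption, also after success) the device
        # goes back to waiting for the wake word.
        self.state = State.WAITING_FOR_WAKE_WORD
        self.first_code = None
        return instruction
```

Keeping the first voiceprint code in the device state is what allows the later comparison between the speaker of the wake word and the speaker of the command word.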

To reduce the system's computational load, in one embodiment of this application the first receiving module extracts only the first speech feature after the wake-word speech is received. The first receiving module extracts the first voiceprint feature of the wake word only when the first checking module determines that the first speech feature matches one of the groups of feature parameters recorded in the wake-word speech template; if the first checking module determines that the first speech feature matches none of the feature parameters in the wake-word speech template, the first receiving module prompts the user to speak the wake word again so that matching can be retried. The matching decisions involved (such as matching the first speech feature against the wake-word speech template, the first voiceprint feature against the voiceprint template, and the second speech feature against the command-word speech template) may be implemented with conventional matching algorithms in this field, which this application does not limit.

Referring to the schematic diagram in Fig. 3 of a user-defined correspondence between speech and speech codes, in one embodiment of this application the instruction generation module can set the correspondence between speech and speech codes according to the actual electronic device and the language the user speaks. Because the user can customize the correspondence between speech and speech codes, the specific instruction sent to the electronic device is independent of the language in which the user utters the command-word speech. For example, by modifying the feature parameters of the speech labelled with a specific meaning in the command-word speech template, English or Chinese speech can be associated with a specified speech code, so that command-word speech uttered in English or Chinese is received and converted into the corresponding control instruction.

Although this application has been described in considerable detail and with particular reference to several embodiments, it is not intended to be limited to any of these details or embodiments or to any particular embodiment. Rather, it should be construed with reference to the appended claims so as to give those claims the broadest possible interpretation in view of the prior art, thereby effectively covering the intended scope of this application. Furthermore, this application has been described above in terms of embodiments foreseeable by the inventors for the purpose of providing a useful description; insubstantial modifications of this application that are not presently foreseen may nonetheless represent equivalents of it.

Claims

1. A voice control method based on offline voiceprint recognition and speech recognition, characterized by comprising the following steps:

receiving wake-word speech, and extracting a first speech feature and a first voiceprint feature of the wake-word speech;

checking whether the extracted first speech feature and first voiceprint feature match a wake-word speech template and a voiceprint template, respectively, terminating if they do not match, and otherwise obtaining a first voiceprint code corresponding to the first voiceprint feature;

receiving command-word speech, and extracting a second voiceprint feature of the command-word speech;

checking whether the second voiceprint feature matches the voiceprint template, terminating if it does not match, and otherwise obtaining a second voiceprint code corresponding to the second voiceprint feature;

checking whether the first voiceprint code and the second voiceprint code are identical, terminating if they differ, and otherwise extracting a second speech feature of the command word;

checking whether the extracted second speech feature matches a command-word speech template, terminating if it does not match, and otherwise obtaining the speech code of the second speech feature and generating a corresponding control instruction based on the speech code;

wherein the wake-word speech template, the command-word speech template and the voiceprint template are stored locally.

2. The method according to claim 1, characterized in that the wake-word speech template and the command-word speech template are generated by training on speech collected in advance.

3. The method according to claim 1, characterized in that the voiceprint template is generated by training on speech of at least one user collected in advance.

4. The method according to claim 1, characterized in that the correspondence between speech and speech codes is user-defined.

5. The method according to claim 4, characterized in that the correspondence between speech and speech codes is stored locally.

6. The method according to claim 1, characterized in that the wake-word speech template and the command-word speech template are trained on collected speech that is dynamically updated.

7. The method according to claim 1, characterized in that the voiceprint template is trained on dynamically updated speech of at least one designated person.

8. A voice control device based on offline voiceprint recognition and speech recognition, characterized by comprising the following modules:

a first receiving module, configured to receive wake-word speech and to extract a first speech feature and a first voiceprint feature of the wake-word speech;

a first checking module, configured to check whether the extracted first speech feature and first voiceprint feature match a wake-word speech template and a voiceprint template, respectively, to terminate if they do not match, and otherwise to obtain a first voiceprint code corresponding to the first voiceprint feature;

a second receiving module, configured to receive command-word speech and to extract a second voiceprint feature of the command-word speech;

a second checking module, configured to check whether the second voiceprint feature matches the voiceprint template, to terminate if it does not match, and otherwise to obtain a second voiceprint code corresponding to the second voiceprint feature;

a third checking module, configured to check whether the first voiceprint code and the second voiceprint code are identical, to terminate if they differ, and otherwise to extract a second speech feature of the command word;

an instruction generation module, configured to check whether the extracted second speech feature matches a command-word speech template, to terminate if it does not match, and otherwise to obtain the speech code of the second speech feature and generate a corresponding control instruction based on the speech code;

9. A computer-readable storage medium on which computer instructions are stored, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.