CN108257604A - Audio recognition method, terminal device and computer readable storage medium - Google Patents

Audio recognition method, terminal device and computer readable storage medium

Info

Publication number
CN108257604A
CN108257604A (application CN201711293919.4A)
Authority
CN
China
Prior art keywords
voice
input
segmentation
input voice
matching rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711293919.4A
Other languages
Chinese (zh)
Other versions
CN108257604B (en)
Inventor
梁承飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN201711293919.4A
Publication of CN108257604A
Application granted
Publication of CN108257604B
Status: Active
Anticipated expiration

Abstract

The present invention applies to the technical field of information processing and provides a speech recognition method, a terminal device, and a computer-readable storage medium. In the speech recognition method, the comparison result between a received first input voice and a pre-stored first reference voice is monitored; when the comparison result is a match, a voice splicing tool is called to splice the first input voice, which carries a marker stamp, with the first reference voice, obtaining a second reference voice. When the preset operation is detected again, the second reference voice is divided into a first segment voice and a second segment voice; the first segment voice is compared with a second input voice, and when they do not match, a voiceprint-feature comparison is performed between the second input voice and the second segment voice. The reference voice is thereby updated during speech recognition, avoiding inaccurate recognition results caused by natural changes in a person's voice.

Description

Audio recognition method, terminal device and computer readable storage medium
Technical field
The invention belongs to the technical field of information processing, and in particular relates to a speech recognition method, a terminal device, and a computer-readable storage medium.
Background technology
Biometric identification technology is widely used in identity verification services. Existing biometric technologies include face recognition, fingerprint recognition, iris recognition, speech recognition, and the like.

Existing speech recognition schemes record a reference voice in advance, then acoustically compare the voice collected from the user in real time against that reference voice, completing speech recognition according to the comparison result. Because a person's voice changes with age and with natural physiological change, if the previously recorded reference voice is still used as the reference after the person's voice has naturally drifted, the speech recognition result may become inaccurate.
Summary of the invention
In view of this, embodiments of the present invention provide a speech recognition method, a terminal device, and a computer-readable storage medium, to avoid inaccurate speech recognition caused by changes in a person's voice.
A first aspect of the embodiments of the present invention provides a speech recognition method, including:

if a preset operation for performing speech recognition is detected, monitoring the comparison result between a first input voice received in the preset operation and a pre-stored first reference voice;

if the comparison result is that the first input voice matches the first reference voice, setting a marker stamp on the first input voice;

calling a voice splicing tool to splice the first input voice carrying the marker stamp with the first reference voice, obtaining a second reference voice;

when the preset operation is detected again, dividing the second reference voice into a first segment voice and a second segment voice, where the first segment voice corresponds to the first reference voice and the second segment voice corresponds to the first input voice;

performing a voiceprint-feature comparison between a second input voice received in the newly detected preset operation and the first segment voice;

if a first matching rate obtained by comparing the second input voice with the first segment voice is lower than a first preset matching rate, performing a voiceprint-feature comparison between the second input voice and the second segment voice;

if a second matching rate obtained by comparing the second input voice with the second segment voice is equal to or greater than a second preset matching rate, determining that the second input voice matches the second reference voice.
A second aspect of the embodiments of the present invention provides a speech recognition apparatus, including units for performing the method described in the first aspect.

A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the steps of the method of the first aspect when executing the computer program.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, where the computer program implements the steps of the method of the first aspect when executed by a processor.

In the embodiments of the present invention, when a preset operation for performing speech recognition is detected, the comparison result between the first input voice received in the preset operation and the pre-stored first reference voice is monitored; when the comparison result is a match, a marker stamp is set on the first input voice, and a voice splicing tool is called to splice the first input voice carrying the marker stamp with the first reference voice, obtaining a second reference voice. When the preset operation is detected again, the second reference voice is divided into a first segment voice and a second segment voice; a voiceprint-feature comparison between the newly detected second input voice and the first segment voice yields a first matching rate, and the result of comparing this first matching rate with the first preset matching rate decides whether a voiceprint-feature comparison is then performed between the second input voice and the second segment voice. The reference voice is thus updated during speech recognition, allowing it to track the natural drift of the same identified person's voice and avoiding inaccurate speech recognition caused by changes in a person's voice.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for the embodiments or the prior-art description are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a speech recognition method provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a speech recognition method provided by another embodiment of the present invention;

Fig. 3 is a structural block diagram of a speech recognition apparatus provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of a terminal device provided by an embodiment of the present invention.
Detailed description of the embodiments
In the following description, specific details such as particular system structures and techniques are set forth for illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the invention.

To illustrate the technical solutions of the present invention, specific embodiments are described below.

Referring to Fig. 1, Fig. 1 is a flowchart of a speech recognition method provided by an embodiment of the present invention. As shown in Fig. 1, the speech recognition method may include:
S11: If a preset operation for performing speech recognition is detected, monitor the comparison result between the first input voice received in the preset operation and the pre-stored first reference voice.

In step S11, the preset operation for performing speech recognition may be a trigger operation that performs speech recognition when a preset application is opened on the terminal, a trigger operation in which the user manually initiates voice-password input while using the preset application, or a trigger operation in which the current interface jumps to a speech recognition interface during a permission request. The trigger operation may be realized by a single click, a double click, or a long press on a speech recognition button.

It should be noted that the comparison result between the first input voice and the preset first reference voice reflects whether the source of the first input voice targeted by this recognition is the same as the source of the first reference voice.

In this embodiment, if the source of the first input voice is the same as the source of the first reference voice, the first reference voice may be updated according to the first input voice; if the sources differ, the first reference voice may not be updated according to the first input voice. By monitoring the comparison result between the first input voice received in the preset operation and the pre-stored first reference voice, the display interface content after speech recognition can be obtained and checked against the interface content corresponding to the preset operation, thereby determining the comparison result between the first input voice and the pre-stored first reference voice.

Take, as an example, a user paying for resources through speech recognition in a preset application, where the interface content corresponding to the preset operation is a prompt that the payment succeeded. If the obtained post-recognition display interface prompts that the payment was not completed or failed, it is determined that the comparison result between the first input voice and the pre-stored first reference voice is a mismatch. If the obtained post-recognition display interface prompts that the payment succeeded, the comparison result is determined to be a match.

In other embodiments, the comparison result between the first input voice received in the preset operation and the pre-stored first reference voice may also be determined by checking whether there is a newly added task or process, or by obtaining the content of a newly added task or process.

Speech recognition for account login is taken as an example below.

For example, the preset operation is recording the first input voice for login verification. When the comparison result between the first input voice and the pre-stored first reference voice is a match, the login interface corresponding to successful recognition is loaded and displayed; when the result is a mismatch, no operation is performed. Therefore, by checking whether there is a newly added task or process that loads and displays the login interface, the comparison result between the first input voice and the pre-stored first reference voice can be determined.
S12: If the comparison result is that the first input voice matches the first reference voice, set a marker stamp on the first input voice.

In step S12, the marker stamp marks the first input voice as having a legitimate source, that is, the source of the first input voice is the same as the source of the first reference voice.

It should be noted that the first input voice and the first reference voice each include a corresponding data-header protocol and voice data content, where the data-header protocol can at least describe the file size, voice content duration, and audio format of the voice.

In this embodiment, setting the marker stamp on the first input voice may be setting a marker symbol in the data-header protocol corresponding to the first input voice, or setting a marker keyword in the file name of the first input voice.

As one possible implementation, step S12 may include: if the comparison result is that the first input voice matches the first reference voice and the file format of the first input voice is consistent with that of the first reference voice, setting the marker stamp on the first input voice; if the comparison result is a match but the file formats are inconsistent, calling an audio-format conversion tool to convert the first input voice into a target input voice whose file format is consistent with that of the first reference voice, and setting the marker stamp on the target input voice.

It can be understood that the audio-format conversion tool may be an existing audio file format converter. Taking the first input voice in MP3 format and the first reference voice in WAV format as an example, the audio file format converter is called to change the suffix of the first input voice from ".MP3" to ".WAV", so that the first input voice and the first reference voice can be spliced and played.
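As an illustrative sketch only, the file-name stamping option of step S12 could look like the following; the "_verified" keyword and the helper name are assumptions for illustration, not details given in the patent:

```python
def stamp_filename(name: str, keyword: str = "_verified") -> str:
    """Append a marker keyword to the file name of the first input voice,
    preserving the extension. The keyword is illustrative, not from the patent."""
    stem, dot, ext = name.rpartition(".")
    # With an extension: insert the keyword before the final dot.
    # Without one: simply append the keyword.
    return f"{stem}{keyword}{dot}{ext}" if dot else name + keyword
```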
S13: Call the voice splicing tool to splice the first input voice carrying the marker stamp with the first reference voice, obtaining a second reference voice.

In step S13, splicing the first input voice carrying the marker stamp with the first reference voice specifically means splicing the voice data of the stamped first input voice with the voice data of the first reference voice, then packaging the spliced voice data together with a new data-header protocol, thereby obtaining the second reference voice.

It should be noted that the voice data corresponding to the second reference voice includes at least the voice data of the first input voice and the voice data of the first reference voice.

In this embodiment, the voice splicing tool is a script file for splicing the stamped first input voice with the first reference voice, where the script operates on the voice data of the stamped first input voice and the voice data of the first reference voice.

It should be noted that voice splicing differs from voice synthesis: voice splicing joins the voice data of at least two voice files, either end to end or by segment interception. When splicing end to end, the start timestamp, splice-point timestamp, and end timestamp of the spliced voice data are determined from the voice data of the at least two voice files. When splicing by segment interception, the voice data in the at least two voice files is divided into multiple segments to be spliced, and the segments are joined into complete voice data according to a pre-agreed splicing strategy.

It can be understood that, in the process of calling the voice splicing tool to obtain the second reference voice, the script file corresponding to the tool can be written in any existing logical language; that is, in practical applications, the voice splicing tool may also operate on the splicing process itself.
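A minimal sketch of the end-to-end splicing described above, using Python's standard `wave` module; the helper names are illustrative, and the patent leaves the splicing script's language unspecified:

```python
import io
import wave

def make_wav(samples: bytes) -> bytes:
    """Build a minimal mono 16-bit 8 kHz WAV file in memory (for testing)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(samples)
    return buf.getvalue()

def splice_wavs(first_reference: bytes, first_input: bytes) -> bytes:
    """Concatenate the audio payloads end to end and repackage them
    under a fresh header, yielding the second reference voice (step S13)."""
    with wave.open(io.BytesIO(first_reference), "rb") as a, \
         wave.open(io.BytesIO(first_input), "rb") as b:
        params = a.getparams()
        data = a.readframes(a.getnframes()) + b.readframes(b.getnframes())
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setparams(params)
        w.writeframes(data)
    return out.getvalue()
```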
S14: When the preset operation is detected again, divide the second reference voice into a first segment voice and a second segment voice.

In step S14, the first segment voice corresponds to the first reference voice, and the second segment voice corresponds to the first input voice.

In this embodiment, "the first segment voice corresponds to the first reference voice" means that the voice data of the first segment voice is the voice data corresponding to the first reference voice, i.e., the voice content of the first segment voice is identical to the content of the first reference voice. Likewise, the second segment voice corresponds to the first input voice, i.e., the voice content of the second segment voice is identical to the content of the first input voice.

It should be noted that dividing the second reference voice into the first segment voice and the second segment voice may be done by setting, in the second reference voice, a mark point that distinguishes the two segments: a corresponding mark position is set according to the lengths of the voice data corresponding to the first reference voice and the first input voice, thereby dividing the second reference voice into the first segment voice and the second segment voice.
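The mark-position division described above can be sketched as follows; using the byte length of the first reference voice's data as the mark is one plausible reading of the step, not a detail fixed by the patent:

```python
def split_second_reference(second_ref_data: bytes, mark: int):
    """Split the second reference voice's payload at the mark position.
    The bytes before the mark are the first segment voice (the old first
    reference voice); the bytes after it are the second segment voice
    (the stamped first input voice)."""
    first_segment = second_ref_data[:mark]
    second_segment = second_ref_data[mark:]
    return first_segment, second_segment
```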
S15: Perform a voiceprint-feature comparison between the second input voice received in the newly detected preset operation and the first segment voice.

In step S15, comparing the second input voice with the first segment voice by voiceprint features means drawing a target voiceprint plot corresponding to the second input voice and a first voiceprint plot corresponding to the first segment voice, extracting the voiceprint features from the target plot, and comparing them against the first plot as the reference.

It should be noted that the voiceprint plot may be at least one of a wideband plot, a narrowband plot, an amplitude plot, a contour plot, a time-spectrum plot, and a sectional plot, where sectional plots include sectional wideband and sectional narrowband plots. Wideband and narrowband plots reflect how the frequency and intensity of speech vary over time; amplitude, contour, and time-spectrum plots reflect how speech intensity or sound pressure varies over time; sectional plots reflect the sound intensity and frequency characteristics at a particular moment.

In all embodiments of the present invention, when voiceprint plots are compared between voices, the two plots being compared are of the same type.

In this embodiment, comparing the second input voice with the first segment voice by voiceprint features may specifically be comparing like features in the voiceprints of the same words or characters in the two voices. For example, the frequency values of the resonance peaks in the voiceprint plots of the second input voice and the first segment voice are selected and compared, so as to find the similarities and differences between the two.

It can be understood that, in practical applications, when voiceprint features are compared between the second input voice and the first segment voice, the feature points compared may differ depending on the plot type used. Specific voiceprint-comparison schemes exist in the prior art, so they are not described again here.
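As a rough, hedged stand-in for the voiceprint-feature comparison above (real systems compare spectrogram features such as resonance-peak frequencies; the per-frame zero-crossing count below is only a toy substitute used to make the matching-rate idea concrete):

```python
import struct

def frame_features(pcm: bytes, frame: int = 160):
    """Toy per-frame feature: zero-crossing count over 16-bit PCM frames,
    a crude stand-in for the frequency features read off a voiceprint plot."""
    n = len(pcm) // 2
    samples = struct.unpack("<%dh" % n, pcm[: n * 2])
    feats = []
    for i in range(0, n - frame + 1, frame):
        seg = samples[i:i + frame]
        zc = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
        feats.append(zc)
    return feats

def matching_rate(pcm_a: bytes, pcm_b: bytes, tol: int = 2) -> float:
    """Fraction of aligned frames whose features agree within `tol`."""
    pairs = list(zip(frame_features(pcm_a), frame_features(pcm_b)))
    if not pairs:
        return 0.0
    hits = sum(1 for x, y in pairs if abs(x - y) <= tol)
    return hits / len(pairs)
```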
S16: If the first matching rate obtained by comparing the second input voice with the first segment voice is lower than the first preset matching rate, perform a voiceprint-feature comparison between the second input voice and the second segment voice.

In step S16, the first matching rate reflects the result of comparing the second input voice with the first segment voice. The first preset matching rate is the minimum matching rate at which the comparison between the second input voice and the first segment voice counts as a match.

It should be noted that, in all embodiments of the present invention, the matching rate describes the degree of similarity between the two compared voices: the higher the matching rate, the more similar the two voices whose voiceprint features are compared, and the more likely they come from the same source.

In this embodiment, the voiceprint-feature comparison between the second input voice and the second segment voice is implemented similarly to step S15 and is not described again here.

It can be understood that, in other embodiments of the present invention, the speech recognition method further includes a first parallel step alongside step S16: if the first matching rate obtained by comparing the second input voice with the first segment voice is equal to or greater than the first preset matching rate, determining that the second input voice matches the second reference voice.

It should be noted that step S16 and the first parallel step have no fixed order of execution: once step S16 is performed, the first parallel step is no longer performed, and once the first parallel step is performed, step S16 is no longer performed.
S17: If the second matching rate obtained by comparing the second input voice with the second segment voice is equal to or greater than the second preset matching rate, determine that the second input voice matches the second reference voice.

In step S17, the second matching rate reflects the result of comparing the second input voice with the second segment voice. The second preset matching rate is the minimum matching rate at which the comparison between the second input voice and the second segment voice counts as a match.

In this embodiment, the voiceprint-feature comparison between the second input voice and the second segment voice is implemented similarly to step S15 and is not described again here.

The speech recognition method further includes a second parallel step alongside step S17: if the second matching rate obtained by comparing the second input voice with the second segment voice is lower than the second preset matching rate, determining that the second input voice does not match the second reference voice, where the first preset matching rate is equal to the second preset matching rate.

It should be noted that step S17 and the second parallel step have no fixed order of execution: once step S17 is performed, the second parallel step is no longer performed, and once the second parallel step is performed, step S17 is no longer performed.
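The two-stage decision of steps S15 to S17, including both parallel steps, can be sketched as follows; the 0.8 thresholds and function names are placeholders, since the patent does not fix the preset matching rates:

```python
def matches_reference(second_input, first_segment, second_segment,
                      compare, first_preset=0.8, second_preset=0.8):
    """Two-stage decision: compare against the first segment voice first,
    and fall back to the second segment voice only when the first matching
    rate is below the first preset matching rate. `compare(a, b)` is any
    function returning a matching rate in [0, 1]."""
    first_rate = compare(second_input, first_segment)
    if first_rate >= first_preset:
        return True  # first parallel step: match on the old reference segment
    # Step S16: first rate too low, compare against the newer segment.
    second_rate = compare(second_input, second_segment)
    # Step S17 / second parallel step: match iff the second rate clears it.
    return second_rate >= second_preset
```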
As can be seen from the above, in the embodiment of the present invention, when a preset operation for performing speech recognition is detected, the comparison result between the first input voice received in the preset operation and the pre-stored first reference voice is monitored; when the comparison result is a match, a marker stamp is set on the first input voice, and a voice splicing tool is called to splice the stamped first input voice with the first reference voice, obtaining a second reference voice. When the preset operation is detected again, the second reference voice is divided into a first segment voice and a second segment voice; the voiceprint-feature comparison between the newly detected second input voice and the first segment voice yields a first matching rate, and the result of comparing this first matching rate with the first preset matching rate decides whether the second input voice is then compared with the second segment voice by voiceprint features. The reference voice is thus updated during speech recognition, allowing it to track the natural drift of the same identified person's voice and avoiding inaccurate speech recognition caused by changes in a person's voice.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a speech recognition method provided by another embodiment of the present invention. As shown in Fig. 2, the speech recognition method provided by this embodiment may include:

S21: If a preset operation for performing speech recognition is detected, monitor the comparison result between the first input voice received in the preset operation and the pre-stored first reference voice.

In step S21, the preset operation for performing speech recognition may be a trigger operation that performs speech recognition when a preset application is opened on the terminal, a trigger operation in which the user manually initiates voice-password input while using the preset application, or a trigger operation in which the current interface jumps to a speech recognition interface during a permission request; the trigger operation may be realized by a single click, a double click, or a long press on a speech recognition button.

It can be understood that, in this embodiment, the specific implementation of step S21 is identical to that of step S11 in the previous embodiment; refer to the description of step S11, which is not repeated here.

S22: If the comparison result is that the first input voice matches the first reference voice, set a marker stamp on the first input voice.

In step S22, the marker stamp marks the first input voice as having a legitimate source, that is, the source of the first input voice is the same as the source of the first reference voice.

It can be understood that, in this embodiment, the specific implementation of step S22 is identical to that of step S12 in the previous embodiment; refer to the description of step S12, which is not repeated here.
S23:Voice joint tool is called by the first input voice equipped with the label stamp and first reference voiceSpliced, obtain the second reference voice.
In step S23, voice joint tool includes:Data head protocol tool and data content splicing tool;First is defeatedEnter voice and the first reference voice and include data head agreement and voice data content.
As a kind of mode in the cards of the present embodiment, step S23 can specifically include:Call the data head agreementTool respectively splits the described first input voice with first reference voice, obtains the first input voice and corresponds toThe first data head agreement and the first voice data content and the second corresponding second data head agreement of input voice andSecond speech data content;It is assisted according to the new data head of the first data head agreement and the second data head protocol generationView;The data content splicing tool is called to spell the first voice data content and the second speech data contentIt connects, obtains new voice data content;The new data head agreement and the new voice data content are packaged, obtainedTo second reference voice.
In the present embodiment, data head protocol tool can be pre-set WavHeader.h scripts, by performing footIn this first data head agreement and the first voice data content and the second data are obtained for parsing the content of data head agreementHead agreement and second speech data content.
In the WavHeader.h scripts, to numerical digit and voice number where the parameters in voice data head agreementIt is defined and distinguishes according to digit shared by content, by running the WavHeader.h scripts, and then by the first input voice and theOne reference voice is split, and obtains the corresponding first data head agreement of the first input voice and the first voice data content, withAnd the second corresponding second data head agreement of input voice and second speech data content.
In the present embodiment, the first data head agreement and the second data head agreement be respectively used to the first input voice of description withThe contents such as the voice duration of the first reference voice, voice size.It is given birth to according to the first data head agreement and the second data head agreementInto new data head agreement described voice when a length of first input the sum of voice duration and the first reference voice duration, newlyThe described voice size of data head agreement for first input the sum of voice size and the first reference voice size.
The data content splicing tool may include a voice data reading tool DataRead and a voice data writing tool DataWriter.
It should be noted that the voice data reading tool DataRead and the voice data writing tool DataWriter package and read the voice data via corresponding binary data streams.
In the present embodiment, the new data head protocol and the new voice data content are packaged to obtain the second reference voice, where the voice data parameters in the new data head protocol correspond to the new voice data content; that is, the voice duration information and voice size information in the new data head protocol are consistent with the duration and size of the new voice data content.
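The split-generate-splice-package sequence above can be sketched concretely. The patent only names its tools (the WavHeader.h script, DataRead and DataWriter) without giving code, so the following is a minimal Python sketch under the assumption of canonical 44-byte-header WAV data; the function names (`make_wav`, `split_wav`, `splice_wav`) are illustrative, not the patent's own:

```python
import struct

def make_wav(data: bytes) -> bytes:
    """Package a data head (header) with voice data content (PCM, 8 kHz mono 16-bit)."""
    head = b"RIFF" + struct.pack("<I", 36 + len(data)) + b"WAVE"
    head += b"fmt " + struct.pack("<IHHIIHH", 16, 1, 1, 8000, 16000, 2, 16)
    head += b"data" + struct.pack("<I", len(data))
    return head + data

def split_wav(raw: bytes):
    """Split a canonical WAV blob into its 44-byte data head and its voice data content."""
    assert raw[:4] == b"RIFF" and raw[8:12] == b"WAVE"
    return raw[:44], raw[44:]

def splice_wav(first: bytes, second: bytes) -> bytes:
    """Splice two same-format WAVs: concatenate the data contents, then
    synthesize a new data head whose size (hence duration) fields are the sums."""
    head1, data1 = split_wav(first)
    head2, data2 = split_wav(second)
    assert head1[12:36] == head2[12:36]          # identical fmt chunks required
    data = data1 + data2
    new_head = bytearray(head1)
    struct.pack_into("<I", new_head, 4, 36 + len(data))  # RIFF chunk size
    struct.pack_into("<I", new_head, 40, len(data))      # data chunk size
    return bytes(new_head) + data
```

For fixed-format PCM, summing the byte sizes is equivalent to summing the durations, which is why the new data head can be generated from the two old data heads alone.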
S24: When the predetermined operation is detected again, the second reference voice is divided into a first segmentation voice and a second segmentation voice, the first segmentation voice corresponding to the first reference voice and the second segmentation voice corresponding to the first input voice.
S25: The second input voice received in the predetermined operation detected again is compared with the first segmentation voice by voiceprint features.
S26: If the first matching rate obtained by comparing the second input voice with the first segmentation voice is less than the first preset matching rate, the second input voice is compared with the second segmentation voice by voiceprint features.
S27: If the second matching rate obtained by comparing the second input voice with the second segmentation voice is equal to or greater than the second preset matching rate, it is determined that the second input voice matches the second reference voice.
It should be noted that the specific implementations of steps S24 to S27 in the present embodiment correspond one-to-one with steps S14 to S17 in the previous embodiment; refer to the description of steps S14 to S17 for details, which are not repeated here.
It can be understood that, in the present embodiment, step S27 is performed only when step S26 is performed.
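The two-stage comparison of steps S24 to S27 reduces to a short decision routine. The patent does not specify the voiceprint feature or the matching-rate computation, so the sketch below stands in cosine similarity over feature vectors and placeholder thresholds; only the control flow mirrors the steps:

```python
from math import sqrt

def cosine(a, b):
    """Stand-in matching rate: cosine similarity of two voiceprint feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recognize(input_vec, seg1_vec, seg2_vec, rate1=0.8, rate2=0.8):
    """Control flow of S25-S27 (rate1/rate2 are illustrative preset matching rates)."""
    m1 = cosine(input_vec, seg1_vec)   # S25: compare against the first segmentation voice
    if m1 >= rate1:                    # meets the first preset matching rate: match
        return True
    m2 = cosine(input_vec, seg2_vec)   # S26: fall back to the second segmentation voice
    return m2 >= rate2                 # S27: match iff the second preset rate is met
```

A mismatch with the first segmentation voice (the older reference material) thus does not reject the speaker outright; the newer segment gets a second vote, which is what lets the reference tolerate gradual voice drift.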
In the present embodiment, steps S28 and S29 are further performed after step S27.
Step S28: Set the count value in a counter to I_n, where I_n ≥ 0 and I_n = I_{n-1} + 1; when I_n equals a preset matching threshold N, set a label stamp for the second input voice.
In step S28, I_n, n and N are integers, with n ≥ 1 and N > 1.
In the present embodiment, each speech recognition process yields at most one result in which the second input voice matches the second reference voice. When N speech recognitions have been performed and the result of each is that the second input voice matches the second reference voice, the count value in the counter is I_n. That is, whenever it is determined that the second input voice matches the second reference voice, the count value in the counter is set to I_n, where I_n ≥ 0 and I_n = I_{n-1} + 1. When I_n equals the preset matching threshold N, the match between the second input voice and the second reference voice can be regarded as a necessary rather than an accidental event, i.e. the possibility that the match was accidental is excluded.
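The counting rule I_n = I_{n-1} + 1 with threshold N can be sketched as a small counter object. The patent does not state whether a failed recognition resets the count, so this sketch simply increments on each match:

```python
class MatchCounter:
    """Tracks matches of the second input voice against the second reference
    voice; signals when the match can be treated as necessary, not accidental."""

    def __init__(self, preset_threshold: int):
        assert preset_threshold > 1      # N > 1 per step S28
        self.threshold = preset_threshold
        self.count = 0                   # I_0 = 0

    def record_match(self) -> bool:
        self.count += 1                  # I_n = I_{n-1} + 1
        # True exactly when I_n equals N: set the label stamp on the input voice.
        return self.count == self.threshold
```

After the label stamp is set and the reference voice is rebuilt, the counter would presumably be reset for the new reference; the patent leaves that housekeeping unspecified.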
In practical applications, the preset matching threshold may be determined according to the period over which the voice changes, according to the number of times the second reference voice has been used as the comparison standard, or according to the duration for which the second reference voice has been in use.
It should be noted that the label stamp set for the second input voice both marks the second input voice and reflects that its source is legitimate, i.e. that the source of the second input voice is the same as the source of the second reference voice.
S29: Call the voice joint tool to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, obtaining a third reference voice, where the target speech segment is the voice segment corresponding to the first input voice.
In step S29, the second reference voice includes the voice data content corresponding to the first input voice and the voice data content corresponding to the first reference voice. The target speech segment is the voice segment in the second reference voice that corresponds to the first input voice.
It should be noted that, in order to prevent the content of the reference voice from growing continuously as the number of speech recognitions rises, when the voice joint tool is called to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, only the voice segment corresponding to the first input voice is retained from the second reference voice.
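The bounded update of step S29 can be sketched over raw byte segments. The split offset of the target segment inside the second reference voice, and the assumption that it precedes the older reference part, are illustrative choices, not details fixed by the patent:

```python
def update_reference(second_ref: bytes, split_at: int, labeled_input: bytes) -> bytes:
    """Build the third reference voice: keep only the target speech segment of
    the second reference voice (the part that came from the first input voice,
    assumed here to lie before split_at) and splice the newly labeled second
    input voice onto it, so the reference stays two segments long instead of
    growing with every update."""
    target_segment = second_ref[:split_at]   # segment from the first input voice
    return target_segment + labeled_input    # old first-reference part is dropped
```

Each update therefore discards the oldest third of the material while keeping the most recent two voices, which is what keeps the reference both current and bounded in size.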
In the present embodiment, a preset matching threshold N is set; when it is determined that the second input voice matches the second reference voice, the count value in the counter is set to I_n, where I_n ≥ 0 and I_n = I_{n-1} + 1. When I_n equals the preset matching threshold N, a label stamp is set for the second input voice, and the voice joint tool is called to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, obtaining the third reference voice. The reference voice used for speech recognition can thus be continuously updated, ensuring that it changes along with the user's voice while also avoiding the phenomenon of the matching rate gradually decreasing as the reference voice is updated.
From the above it can be seen that, in the embodiment of the present invention, when the predetermined operation for speech recognition is detected, the comparison result between the first input voice received in the predetermined operation and the pre-stored first reference voice is monitored; when the comparison result is a match, a label stamp is set for the first input voice, and the voice joint tool is called to splice the first input voice equipped with the label stamp with the first reference voice, obtaining the second reference voice. When the predetermined operation is detected again, the second reference voice is divided into the first segmentation voice and the second segmentation voice, the second input voice detected again is compared with the first segmentation voice by voiceprint features to obtain the first matching rate, and whether to compare the second input voice with the second segmentation voice by voiceprint features is decided according to the comparison between the first matching rate and the first preset matching rate. The reference voice is thereby updated during speech recognition, enabling it to change along with the natural change of the identified person's voice and avoiding inaccurate speech recognition caused by changes in a person's voice.
By setting the preset matching threshold N and, upon determining that the second input voice matches the second reference voice, setting the count value in the counter to I_n (where I_n ≥ 0 and I_n = I_{n-1} + 1), a label stamp is set for the second input voice when I_n equals the preset matching threshold N, and the voice joint tool is called to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, obtaining the third reference voice. The reference voice used for speech recognition can therefore be continuously updated, changing along with the user's voice while avoiding the phenomenon of the matching rate gradually decreasing as the reference voice is updated.
Referring to Fig. 3, Fig. 3 is a schematic block diagram of a speech recognition device provided in an embodiment of the present invention. The speech recognition device 3 of the present embodiment includes: a monitoring unit 31, a first indexing unit 32, a first concatenation unit 33, a segmenting unit 34, a first comparing unit 35, a second comparing unit 36 and a determination unit 37. Specifically:
The monitoring unit 31 is configured to, if the predetermined operation for speech recognition is detected, monitor the comparison result between the first input voice received in the predetermined operation and the pre-stored first reference voice.
For example, if the monitoring unit 31 detects the predetermined operation for speech recognition, it monitors the comparison result between the first input voice received in the predetermined operation and the pre-stored first reference voice.
The first indexing unit 32 is configured to set a label stamp for the first input voice if the comparison result is that the first input voice matches the first reference voice.
For example, if the comparison result obtained by the first indexing unit 32 is that the first input voice matches the first reference voice, it sets a label stamp for the first input voice.
Further, the voice joint tool includes a data head protocol tool and a data content splicing tool; the first input voice and the first reference voice each include a data head protocol and voice data content.
The first indexing unit 32 is specifically configured to: call the data head protocol tool to split the first input voice and the first reference voice respectively, obtaining the first data head protocol and the first voice data content corresponding to the first input voice, and the second data head protocol and the second voice data content corresponding to the first reference voice; generate a new data head protocol according to the first data head protocol and the second data head protocol; call the data content splicing tool to splice the first voice data content with the second voice data content, obtaining new voice data content; and package the new data head protocol with the new voice data content, obtaining the second reference voice.
For example, the first indexing unit 32 calls the data head protocol tool to split the first input voice and the first reference voice respectively, generates the new data head protocol from the first and second data head protocols, calls the data content splicing tool to splice the first voice data content with the second voice data content into new voice data content, and packages the new data head protocol with the new voice data content to obtain the second reference voice.
The first concatenation unit 33 is configured to call the voice joint tool to splice the first input voice equipped with the label stamp with the first reference voice, obtaining the second reference voice.
For example, the first concatenation unit 33 calls the voice joint tool to splice the first input voice equipped with the label stamp with the first reference voice, obtaining the second reference voice.
The segmenting unit 34 is configured to, when the predetermined operation is detected again, divide the second reference voice into the first segmentation voice and the second segmentation voice, the first segmentation voice corresponding to the first reference voice and the second segmentation voice corresponding to the first input voice.
For example, when the predetermined operation is detected again, the segmenting unit 34 divides the second reference voice into the first segmentation voice and the second segmentation voice, the first segmentation voice corresponding to the first reference voice and the second segmentation voice corresponding to the first input voice.
The first comparing unit 35 is configured to compare the second input voice received in the predetermined operation detected again with the first segmentation voice by voiceprint features.
For example, the first comparing unit 35 compares the second input voice received in the predetermined operation detected again with the first segmentation voice by voiceprint features.
The second comparing unit 36 is configured to compare the second input voice with the second segmentation voice by voiceprint features if the first matching rate obtained by comparing the second input voice with the first segmentation voice is less than the first preset matching rate.
For example, if the first matching rate obtained by the second comparing unit 36 when comparing the second input voice with the first segmentation voice is less than the first preset matching rate, it compares the second input voice with the second segmentation voice by voiceprint features.
The determination unit 37 is configured to determine that the second input voice matches the second reference voice if the second matching rate obtained by comparing the second input voice with the second segmentation voice is equal to or greater than the second preset matching rate.
For example, if the second matching rate obtained by comparing the second input voice with the second segmentation voice is equal to or greater than the second preset matching rate, the determination unit 37 determines that the second input voice matches the second reference voice.
Optionally, the speech recognition device 3 may further include: a second indexing unit 38 and a second concatenation unit 39. Specifically:
The second indexing unit 38 is configured to set the count value in the counter to I_n, where I_n ≥ 0 and I_n = I_{n-1} + 1, and to set a label stamp for the second input voice when I_n equals the preset matching threshold N, where n ≥ 1 and N > 1.
For example, the second indexing unit 38 sets the count value in the counter to I_n, where I_n ≥ 0 and I_n = I_{n-1} + 1; when I_n equals the preset matching threshold N, it sets a label stamp for the second input voice, where n ≥ 1 and N > 1.
The second concatenation unit 39 is configured to call the voice joint tool to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, obtaining the third reference voice, where the target speech segment is the voice segment corresponding to the first input voice.
For example, the second concatenation unit 39 calls the voice joint tool to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, obtaining the third reference voice, where the target speech segment is the voice segment corresponding to the first input voice.
From the above it can be seen that, in the embodiment of the present invention, when the predetermined operation for speech recognition is detected, the comparison result between the first input voice received in the predetermined operation and the pre-stored first reference voice is monitored; when the comparison result is a match, a label stamp is set for the first input voice, and the voice joint tool is called to splice the first input voice equipped with the label stamp with the first reference voice, obtaining the second reference voice. When the predetermined operation is detected again, the second reference voice is divided into the first segmentation voice and the second segmentation voice, the second input voice detected again is compared with the first segmentation voice by voiceprint features to obtain the first matching rate, and whether to compare the second input voice with the second segmentation voice by voiceprint features is decided according to the comparison between the first matching rate and the first preset matching rate. The reference voice is thereby updated during speech recognition, enabling it to change along with the natural change of the identified person's voice and avoiding inaccurate speech recognition caused by changes in a person's voice.
By setting the preset matching threshold N and, upon determining that the second input voice matches the second reference voice, setting the count value in the counter to I_n (where I_n ≥ 0 and I_n = I_{n-1} + 1), a label stamp is set for the second input voice when I_n equals the preset matching threshold N, and the voice joint tool is called to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, obtaining the third reference voice. The reference voice used for speech recognition can therefore be continuously updated, changing along with the user's voice while avoiding the phenomenon of the matching rate gradually decreasing as the reference voice is updated.
Referring to Fig. 4, Fig. 4 is a schematic block diagram of a terminal provided by another embodiment of the present invention. As depicted, the terminal device 400 in the present embodiment may include: one or more processors 401, one or more input devices 402, one or more output devices 403 and a memory 404. The processor 401, input devices 402, output devices 403 and memory 404 are connected through a bus 405. The memory 404 stores a computer program comprising instructions, and the processor 401 performs the following operations by calling the computer program stored in the memory 404:
The processor 401 is configured to: if the predetermined operation for speech recognition is detected, monitor the comparison result between the first input voice received in the predetermined operation and the pre-stored first reference voice.
The processor 401 is configured to: if the comparison result is that the first input voice matches the first reference voice, set a label stamp for the first input voice.
The processor 401 is configured to: call the voice joint tool to splice the first input voice equipped with the label stamp with the first reference voice, obtaining the second reference voice.
The processor 401 is configured to: when the predetermined operation is detected again, divide the second reference voice into the first segmentation voice and the second segmentation voice, the first segmentation voice corresponding to the first reference voice and the second segmentation voice corresponding to the first input voice.
The processor 401 is configured to: compare the second input voice received in the predetermined operation detected again with the first segmentation voice by voiceprint features.
The processor 401 is configured to: if the first matching rate obtained by comparing the second input voice with the first segmentation voice is less than the first preset matching rate, compare the second input voice with the second segmentation voice by voiceprint features.
The processor 401 is configured to: if the first matching rate obtained by comparing the second input voice with the first segmentation voice is equal to or greater than the first preset matching rate, determine that the second input voice matches the second reference voice.
The processor 401 is further configured to: if the second matching rate obtained by comparing the second input voice with the second segmentation voice is equal to or greater than the second preset matching rate, determine that the second input voice matches the second reference voice.
The processor 401 is further configured to: if the second matching rate obtained by comparing the second input voice with the second segmentation voice is less than the second preset matching rate, determine that the second input voice does not match the second reference voice; wherein the first preset matching rate is equal to the second preset matching rate.
The processor 401 is further configured to: set the count value in the counter to I_n, where I_n ≥ 0 and I_n = I_{n-1} + 1; when I_n equals the preset matching threshold N, set a label stamp for the second input voice, where n ≥ 1 and N > 1.
The processor 401 is further configured to: call the voice joint tool to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, obtaining the third reference voice, where the target speech segment is the voice segment corresponding to the first input voice.
The processor 401 is specifically configured to carry out the step of calling the voice joint tool to splice the first input voice equipped with the label stamp with the first reference voice to obtain the second reference voice as follows:
calling the data head protocol tool to split the first input voice and the first reference voice respectively, obtaining the first data head protocol and the first voice data content corresponding to the first input voice, and the second data head protocol and the second voice data content corresponding to the first reference voice;
generating a new data head protocol according to the first data head protocol and the second data head protocol;
calling the data content splicing tool to splice the first voice data content with the second voice data content, obtaining new voice data content;
packaging the new data head protocol with the new voice data content, obtaining the second reference voice.
It should be appreciated that, in embodiments of the present invention, the processor 401 may be a central processing unit (CPU); the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input devices 402 may include a trackpad, a fingerprint acquisition sensor (for acquiring a user's fingerprint information and fingerprint direction information), a microphone, etc.; the output devices 403 may include a display (LCD, etc.), a loudspeaker, etc.
The memory 404 may include a read-only memory and a random access memory, and provides instructions and data to the processor 401. A part of the memory 404 may also include a non-volatile random access memory. For example, the memory 404 may also store information on the device type.
In specific implementations, the processor 401, input devices 402 and output devices 403 described in the embodiment of the present invention may carry out the implementations described in the first and second embodiments of the speech recognition method provided by the embodiments of the present invention, and may also carry out the implementation of the device described in the embodiments of the present invention, which is not repeated here.
Another embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements:
if the predetermined operation for speech recognition is detected, monitoring the comparison result between the first input voice received in the predetermined operation and the pre-stored first reference voice;
if the comparison result is that the first input voice matches the first reference voice, setting a label stamp for the first input voice;
calling the voice joint tool to splice the first input voice equipped with the label stamp with the first reference voice, obtaining the second reference voice;
when the predetermined operation is detected again, dividing the second reference voice into the first segmentation voice and the second segmentation voice, the first segmentation voice corresponding to the first reference voice and the second segmentation voice corresponding to the first input voice;
comparing the second input voice received in the predetermined operation detected again with the first segmentation voice by voiceprint features;
if the first matching rate obtained by comparing the second input voice with the first segmentation voice is less than the first preset matching rate, comparing the second input voice with the second segmentation voice by voiceprint features;
if the second matching rate obtained by comparing the second input voice with the second segmentation voice is equal to or greater than the second preset matching rate, determining that the second input voice matches the second reference voice.
When executed by the processor, the computer program further implements:
calling the data head protocol tool to split the first input voice and the first reference voice respectively, obtaining the first data head protocol and the first voice data content corresponding to the first input voice, and the second data head protocol and the second voice data content corresponding to the first reference voice;
generating a new data head protocol according to the first data head protocol and the second data head protocol;
calling the data content splicing tool to splice the first voice data content with the second voice data content, obtaining new voice data content;
packaging the new data head protocol with the new voice data content, obtaining the second reference voice.
When executed by the processor, the computer program further implements:
if the first matching rate obtained by comparing the second input voice with the first segmentation voice is equal to or greater than the first preset matching rate, determining that the second input voice matches the second reference voice.
When executed by the processor, the computer program further implements:
if the second matching rate obtained by comparing the second input voice with the second segmentation voice is less than the second preset matching rate, determining that the second input voice does not match the second reference voice; wherein the first preset matching rate is equal to the second preset matching rate.
When executed by the processor, the computer program further implements:
setting the count value in the counter to I_n, where I_n ≥ 0 and I_n = I_{n-1} + 1, and when I_n equals the preset matching threshold N, setting a label stamp for the second input voice, where n ≥ 1 and N > 1;
calling the voice joint tool to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, obtaining the third reference voice, where the target speech segment is the voice segment corresponding to the first input voice.
From the above it can be seen that, in the embodiment of the present invention, when the predetermined operation for speech recognition is detected, the comparison result between the first input voice received in the predetermined operation and the pre-stored first reference voice is monitored; when the comparison result is a match, a label stamp is set for the first input voice, and the voice joint tool is called to splice the first input voice equipped with the label stamp with the first reference voice, obtaining the second reference voice. When the predetermined operation is detected again, the second reference voice is divided into the first segmentation voice and the second segmentation voice, the second input voice detected again is compared with the first segmentation voice by voiceprint features to obtain the first matching rate, and whether to compare the second input voice with the second segmentation voice by voiceprint features is decided according to the comparison between the first matching rate and the first preset matching rate. The reference voice is thereby updated during speech recognition, enabling it to change along with the natural change of the identified person's voice and avoiding inaccurate speech recognition caused by changes in a person's voice.
By setting the preset matching threshold N and, upon determining that the second input voice matches the second reference voice, setting the count value in the counter to I_n (where I_n ≥ 0 and I_n = I_{n-1} + 1), a label stamp is set for the second input voice when I_n equals the preset matching threshold N, and the voice joint tool is called to splice the second input voice equipped with the label stamp with the target speech segment in the second reference voice, obtaining the third reference voice. The reference voice used for speech recognition can therefore be continuously updated, changing along with the user's voice while avoiding the phenomenon of the matching rate gradually decreasing as the reference voice is updated.
The computer-readable storage medium may be an internal storage unit of the device described in any of the foregoing embodiments, such as a hard disk or memory of a computer. The computer-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the device. Further, the computer-readable storage medium may include both an internal storage unit of the device and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, computer software or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed by hardware or by software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
It is apparent to those skilled in the art that, for convenience and conciseness of description, the specific working processes of the device and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of the units is only a division by logical function, and there may be other ways of division in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and these modifications or substitutions shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

CN201711293919.4A | 2017-12-08 | 2017-12-08 | Speech recognition method, terminal device and computer-readable storage medium | Active | CN108257604B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201711293919.4A (CN108257604B) | 2017-12-08 | 2017-12-08 | Speech recognition method, terminal device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201711293919.4A (CN108257604B) | 2017-12-08 | 2017-12-08 | Speech recognition method, terminal device and computer-readable storage medium

Publications (2)

Publication Number | Publication Date
CN108257604A (en) | 2018-07-06
CN108257604B (en) | 2021-01-08

Family

ID=62720987

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201711293919.4A (Active; granted as CN108257604B) | Speech recognition method, terminal device and computer-readable storage medium | 2017-12-08 | 2017-12-08

Country Status (1)

Country | Link
CN (1) | CN108257604B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109495636A (en)* | 2018-10-23 | 2019-03-19 | Zhonghua Ci | Information interaction method and device
CN109584887A (en)* | 2018-12-24 | 2019-04-05 | iFLYTEK Co., Ltd. | Method and apparatus for voiceprint extraction model generation and voiceprint extraction
CN110896352A (en)* | 2018-09-12 | 2020-03-20 | Alibaba Group Holding Ltd. | Identity recognition method, device and system
CN111933183A (en)* | 2020-08-17 | 2020-11-13 | 深圳一块互动网络技术有限公司 | Audio identification method of Bluetooth equipment for merchants
CN112786015A (en)* | 2019-11-06 | 2021-05-11 | Alibaba Group Holding Ltd. | Data processing method and device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20040162726A1 (en)* | 2003-02-13 | 2004-08-19 | Chang Hisao M. | Bio-phonetic multi-phrase speaker identity verification
CN101311953A (en)* | 2007-05-25 | 2008-11-26 | 上海电虹软件有限公司 | Network payment method and system based on voiceprint authentication
US20110196676A1 (en)* | 2010-02-09 | 2011-08-11 | International Business Machines Corporation | Adaptive voice print for conversational biometric engine
CN102402985A (en)* | 2010-09-14 | 2012-04-04 | 盛乐信息技术(上海)有限公司 | Voiceprint authentication system for improving voiceprint identification security and implementation method thereof
CN102760434A (en)* | 2012-07-09 | 2012-10-31 | Huawei Device Co., Ltd. | Method for updating voiceprint feature model and terminal
CN104239456A (en)* | 2014-09-02 | 2014-12-24 | Baidu Online Network Technology (Beijing) Co., Ltd. | User characteristic data extraction method and user characteristic data extraction device
CN104616655A (en)* | 2015-02-05 | 2015-05-13 | Tsinghua University | Automatic voiceprint model reconstruction method and device
CN105049882A (en)* | 2015-08-28 | 2015-11-11 | Beijing QIYI Century Science & Technology Co., Ltd. | Method and device for video recommendation
CN105575391A (en)* | 2014-10-10 | 2016-05-11 | Alibaba Group Holding Ltd. | Voiceprint information management method, voiceprint information management device, identity authentication method, and identity authentication system
CN105991593A (en)* | 2015-02-15 | 2016-10-05 | Alibaba Group Holding Ltd. | Method and device for identifying risk of user
CN106156583A (en)* | 2016-06-03 | 2016-11-23 | Shenzhen Gionee Communication Equipment Co., Ltd. | Voice unlocking method and terminal
CN106782564A (en)* | 2016-11-18 | 2017-05-31 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for processing speech data


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG, Fang: "Voiceprint recognition technology and its application status", Information Security Research*

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110896352A (en)* | 2018-09-12 | 2020-03-20 | Alibaba Group Holding Ltd. | Identity recognition method, device and system
CN110896352B (en)* | 2018-09-12 | 2022-07-08 | Alibaba Group Holding Ltd. | Identity recognition method, device and system
CN109495636A (en)* | 2018-10-23 | 2019-03-19 | Zhonghua Ci | Information interaction method and device
US11315562B2 (en) | 2018-10-23 | 2022-04-26 | Zhonghua Ci | Method and device for information interaction
CN109584887A (en)* | 2018-12-24 | 2019-04-05 | iFLYTEK Co., Ltd. | Method and apparatus for voiceprint extraction model generation and voiceprint extraction
CN112786015A (en)* | 2019-11-06 | 2021-05-11 | Alibaba Group Holding Ltd. | Data processing method and device
CN111933183A (en)* | 2020-08-17 | 2020-11-13 | 深圳一块互动网络技术有限公司 | Audio identification method of Bluetooth equipment for merchants
CN111933183B (en)* | 2020-08-17 | 2023-11-24 | 深圳一块互动网络技术有限公司 | Audio identification method of Bluetooth equipment for merchants

Also Published As

Publication number | Publication date
CN108257604B (en) | 2021-01-08

Similar Documents

Publication | Title
CN108257604A (en) | Audio recognition method, terminal device and computer readable storage medium
CN110085251B (en) | Human voice extraction method, human voice extraction device and related products
US11294995B2 (en) | Method and apparatus for identity authentication, and computer readable storage medium
US20190272825A1 (en) | Preventing unwanted activation of a hands-free device
CN110047481B (en) | Method and apparatus for speech recognition
CN104509065B (en) | Using the ability to speak as a human interaction proof
JP2021505032A (en) | Automatic blocking of sensitive data contained in audio streams
US8285539B2 (en) | Extracting tokens in a natural language understanding application
JP2010079235A (en) | Method of retaining media stream without its private (audio) content
CN109462482B (en) | Voiceprint recognition method, voiceprint recognition device, electronic equipment and computer readable storage medium
CN103730120A (en) | Voice control method and system for electronic device
CN111694926A (en) | Interactive processing method and device based on scene dynamic configuration, and computer equipment
CN111816166A (en) | Voice recognition method, apparatus, and computer-readable storage medium storing instructions
US20230091272A1 (en) | Audio content recognition method and apparatus, and device and computer-readable medium
CN112712793A (en) | ASR error correction method based on a pre-trained model in voice interaction, and related equipment
CN110400567A (en) | Registered voiceprint dynamic update method and computer storage medium
US20180342235A1 (en) | System and method for segmenting audio files for transcription
CN112669850A (en) | Voice quality detection method and device, computer equipment and storage medium
CN113987149A (en) | Intelligent conversation method, system and storage medium for a task-based robot
CN113035188A (en) | Call text generation method, device, equipment and storage medium
CN113112992B (en) | Voice recognition method and device, storage medium and server
CN113051902B (en) | Voice data desensitization method, electronic device and computer-readable storage medium
CN108231074A (en) | Data processing method, voice assistant device and computer readable storage medium
CN109493868B (en) | Policy entry method and related device based on voice recognition
CN112885371B (en) | Method, apparatus, electronic device and readable storage medium for audio desensitization

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
