Summary of the invention
Embodiments of the invention provide a speech recognition method, apparatus and terminal, to solve the prior-art problems that speech recognition results fail to satisfy user demand and that the recognition result obtained when a user inputs speech is inconsistent with the user's intention, which impairs the usability of speech recognition products.
An embodiment of the invention provides a speech recognition method applied to a terminal. The method comprises:
receiving input speech information;
determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold;
determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result;
obtaining a target file corresponding to the target speech recognition result; and
displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, obtaining the target file corresponding to the target speech recognition result comprises:
performing semantic recognition on the target speech recognition result to determine a business type corresponding to the target speech recognition result; and
searching a resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, performing semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result comprises:
performing word segmentation on the target speech recognition result according to a preset dictionary, performing semantic recognition on each token in the target speech recognition result, and determining the business type corresponding to each token; and
determining the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface comprises:
determining a priority of each speech recognition result;
displaying the speech recognition results on the display interface of the terminal arranged according to their priorities; and
displaying the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface further comprises:
obtaining a switching instruction of a user for the target speech recognition result;
determining, according to the switching instruction, the target file corresponding to the changed target speech recognition result; and
displaying the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In a possible implementation, determining, according to the pre-trained voice matching model, the speech recognition results of the speech information that satisfy the first matching threshold comprises:
inputting the speech information into the voice matching model, identifying the pinyin sequence in the speech information, and forming all possible candidate characters;
determining, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by means of syntax rules and statistical methods; and
taking the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
An embodiment of the invention provides a speech recognition apparatus, the apparatus comprising:
a transceiver unit, configured to receive input speech information;
a processing unit, configured to determine, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determine, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; and obtain a target file corresponding to the target speech recognition result; and
a display unit, configured to display each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, the processing unit is specifically configured to:
perform semantic recognition on the target speech recognition result to determine a business type corresponding to the target speech recognition result; and search a resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processing unit is specifically configured to:
perform word segmentation on the target speech recognition result according to a preset dictionary, perform semantic recognition on each token in the target speech recognition result, and determine the business type corresponding to each token; and determine the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, the processing unit is specifically configured to: determine a priority of each speech recognition result; and display the speech recognition results on the display interface of the terminal arranged according to their priorities;
and the display unit is specifically configured to: display the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, the transceiver unit is further configured to: obtain a switching instruction of the user for the target speech recognition result;
the processing unit is further configured to: determine, according to the switching instruction, the target file corresponding to the changed target speech recognition result; and
the display unit is further configured to: display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In a possible implementation, the processing unit is specifically configured to:
input the speech information into the voice matching model, identify the pinyin sequence in the speech information, and form all possible candidate characters; determine, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
An embodiment of the invention provides a terminal, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a computer program is stored in the memory, and when the program is executed by the processor, the processor is caused to perform the steps of any of the above methods applied to a terminal.
An embodiment of the invention provides a computer-readable storage medium storing a computer program executable by a terminal; when the program runs on the terminal, the terminal is caused to perform the steps of any of the above methods applied to a terminal.
Embodiments of the invention provide a speech recognition method, apparatus and terminal. The method comprises: receiving input speech information; determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; obtaining a target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode. Displaying the speech recognition result with the highest matching degree in the first display mode makes it quickly visible and improves convenience for the user. Performing semantic recognition on each of the at least one speech recognition result yields more of the user's possible search intentions, and displaying the other speech recognition results in the second display mode on the display interface of the terminal effectively provides the user with more search results, improves the coverage of the speech recognition results over the user's intention, raises the success rate of voice search, and improves the usability of speech recognition products.
Specific embodiment
The present invention is described below in further detail with reference to the accompanying drawings. The described embodiments are clearly only a part of the embodiments of the invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
Speech recognition allows a machine to receive, recognize and understand a speech signal and convert it into a corresponding digital signal. Although speech recognition has produced a large number of applications in many industries, much work remains before truly natural human-machine interaction is achieved; for example, larger improvements are needed in adaptivity so that recognition is not affected by accent, dialect or a particular speaker. Voice types in reality are diverse: by acoustic characteristics they can be divided into male, female and child voices, and in addition the pronunciation of many people differs greatly from standard pronunciation, which requires handling of accents and dialects. These factors cause the recognition result obtained when a user inputs speech to be inconsistent with the user's intention.
After a user performs voice input, many of the recognized words are obtained by conversion toward everyday expressions, so the recognized text is easily substituted with a homophone of what the user actually said, for example "四大名助" versus "四大名著" (the Four Great Classical Novels), "陆垚知马俐" versus the proverb "路遥知马力", or "天气预报" (weather forecast) versus the homophonous film title "天气预爆". When a terminal with a speech recognition function performs recognition, polyphonic characters, accents, dialects, speaker-specific pronunciation and meaningless modal particles in continuous speech all influence recognition, so the factors affecting recognition are diverse and the result the user wants may not be recognized. As a result, the intention of some users cannot be fulfilled, misrecognition occurs, recognition speed and efficiency suffer, and the user experience is degraded.
To solve the problems in the prior art, taking a television set as the terminal scenario as an example, all the schemes provided by the embodiments of the invention may be executed by the terminal or by a server, which may be configured as needed and is not limited here. As shown in Fig. 1, the method comprises:
Step 101: receiving input speech information;
The terminal may obtain the speech information input by the user through its own voice device or through an external voice device. Specifically, the terminal is provided with a speech recognition module that can recognize and collect speech information.
In addition, the terminal is provided with a communication module, such as a WIFI wireless communication module, which enables the terminal to connect to a server and send the collected speech information to the server. Of course, everything may also be executed by the terminal, or only the part of the speech information that needs server-side processing may be transmitted, which is not limited here.
Step 102: determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold;
In a specific implementation, the voice matching model may be set in the terminal or on the server, which is not limited here. If it is set on the server, the server determines the at least one speech recognition result of the speech information that satisfies the first matching threshold and then sends the at least one speech recognition result to the terminal.
Step 103: determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as the target speech recognition result;
In a specific implementation, the terminal may determine the target speech recognition result with the highest matching degree according to the scores of the at least one speech recognition result, or the server determines the target speech recognition result with the highest matching degree according to the scores of the at least one speech recognition result and then sends the target speech recognition result to the terminal.
Step 104: obtaining the target file corresponding to the target speech recognition result;
In a specific implementation, the terminal may search a local or network resource system for the target file corresponding to the target speech recognition result; or the server searches a network resource library for the target file corresponding to the target speech recognition result and, after determining the target file, sends the target file or its identification information to the terminal, so that the terminal determines the target file corresponding to the target speech recognition result.
Step 105: displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a specific implementation, the second display mode may be a display mode opposite to the first display mode. For example, the first display mode may be highlighted display or display with a check box, and the second display mode may be non-highlighted display or display without a check box. For example, the target result shown in Fig. 7 uses the first display mode of a highlighted check box, and the second display mode is display without a highlighted check box. The specific display modes are not limited here.
In the speech recognition method provided by an embodiment of the invention, displaying the speech recognition result with the highest matching degree in the first display mode makes it quickly visible and improves convenience for the user; performing semantic recognition on each of the at least one speech recognition result yields more of the user's possible search intentions; and displaying the other speech recognition results in the second display mode on the display interface of the terminal effectively provides the user with more search results, improves the coverage of the speech recognition results over the user's intention, raises the success rate of voice search, and improves the usability of speech recognition products.
In an embodiment of the invention, as shown in Fig. 2, a method by which the speech recognition model determines speech recognition results is provided, comprising:
Step 1: obtaining the speech information input by the user, and determining the acoustic probability of the features of the speech information from its acoustic features;
Specifically, acoustic feature extraction extracts acoustic feature information from the speech information. To guarantee recognition accuracy, the extraction stage should discriminate well between the modeling units of the acoustic model. The acoustic features in the embodiment of the invention may include Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), perceptual linear prediction coefficients (PLP), and so on.
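As an illustration only, the MFCC features mentioned above can be extracted roughly as follows; this is a minimal sketch assuming the librosa library and a hypothetical input file, not the actual feature front end of the embodiment, and the sampling rate and coefficient count are illustrative.

```python
import librosa

# Load a hypothetical speech recording and extract 13 MFCCs per analysis frame.
y, sr = librosa.load("sample.wav", sr=16000)        # waveform of the input speech
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, number_of_frames)
print(mfcc.shape)
```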
Step 2: inputting the speech information after acoustic feature extraction into the voice matching model, where the voice matching model includes a language model and an acoustic model;
For example, in an embodiment of the invention, the training process of the voice matching model may include:
Step 1: obtaining sample speech information, where each item of sample speech information carries annotation information of the speech it belongs to;
Step 2: inputting each item of sample speech information into the voice matching model;
Step 3: training the voice matching model according to each item of sample speech information and the output of the voice matching model.
To facilitate training of the voice matching model, a large amount of sample speech information may be collected; the sample speech information may be collected by the terminal or obtained through other channels, and each sample is annotated.
The sample speech information is input into the voice matching model and the model is trained. The model may be one based on dynamic time warping, a hidden Markov model, an artificial neural network, a support vector machine, or the like. Each item of sample speech information is input into the model, and the model is trained according to the annotation information of each sample and the output of the voice matching model.
In an embodiment of the invention, the voice matching model is obtained by training on a large number of speech samples, and the trained model can then perform speech recognition on collected speech information.
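Purely to make the training loop concrete, the following sketch fits a support vector machine (one of the model families listed above) on synthetic labelled features; the data, feature dimensions and class count are hypothetical stand-ins for the annotated sample speech information.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic "feature vectors" with labels play the role of the annotated samples;
# a real system would use acoustic features such as the MFCCs sketched earlier.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 13))      # 200 samples, 13 features each
y_train = rng.integers(0, 3, size=200)    # annotation labels for three classes

model = SVC(probability=True)             # train the matching model
model.fit(X_train, y_train)

X_new = rng.normal(size=(1, 13))          # features of newly collected speech
print(model.predict(X_new), model.predict_proba(X_new))
```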
The acoustic model is built by acoustic modeling using the training speech features and their corresponding annotation information. The acoustic model constructs the mapping between the observed features in the speech signal and the pronunciation modeling units, and on this basis classifies phonemes or phoneme states. In an embodiment of the invention, the acoustic model may use an HMM as its basic modeling unit.
The language model may be an N-gram statistical language model under a speech recognition framework based on statistical learning. It contains a Markov chain representing the generation of a word sequence, i.e. the probability p(W) of generating a word sequence W is expressed as
p(W) = ∏_k p(w_k | w_{k-n+1}, …, w_{k-1}),
where w_k denotes the k-th word in the word sequence; from the formula it can be seen that the probability of generating the current word is related only to the preceding n-1 words.
In an embodiment of the invention, the training and evaluation metric of the language model may be the language model perplexity (PP), defined as the inverse of the geometric mean of the word-sequence generation probability, that is
PP(W) = p(w_1, w_2, …, w_K)^{-1/K},
where K is the length of the word sequence. It can be seen from the formula that the smaller the expected perplexity of the language model over the generated word sequences, the higher the accuracy with which the language model predicts the current word given the history word sequence; therefore the training objective of the language model is to minimize the perplexity over the training corpus.
During training, the probabilities of the individual words and related word combinations occurring in the training corpus are counted first, and the relevant parameters of the language model are estimated on this basis.
However, the number of related word combinations grows geometrically with the possible vocabulary size, so counting all possible combinations is infeasible. In practice the training data are usually sparse: the probability of some word combinations is very small, or they do not appear at all. These problems can be addressed by methods such as discounting and backing-off, and the language model can be further optimized by modeling it with a recurrent neural network (RNN).
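A minimal sketch of an n-gram language model and its perplexity, matching the formulas above: add-one smoothing stands in for the discounting and backing-off methods mentioned, and the tiny corpus is invented for illustration.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a tokenised training corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    # Add-one (Laplace) smoothing in place of discounting / backing-off.
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def perplexity(sent, unigrams, bigrams, vocab_size):
    tokens = ["<s>"] + sent + ["</s>"]
    log_p = sum(math.log(bigram_prob(a, b, unigrams, bigrams, vocab_size))
                for a, b in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return math.exp(-log_p / n)  # inverse geometric mean of the word probabilities

corpus = [["郑恺", "的", "视频"], ["天气", "预报"], ["正楷", "的", "视频"]]
uni, bi = train_bigram(corpus)
print(perplexity(["郑恺", "的", "视频"], uni, bi, len(uni)))
```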
Step 3: inputting the results obtained by the speech models into a decoder for decoding, to obtain the possible text information of the speech.
The decoder combines the acoustic probabilities of the speech features calculated by the acoustic model with the probabilities calculated by the language model and, using a relevant search algorithm, works out the most likely word sequence W', so as to output the possible text information of the speech information.
In step 102, determining, according to the pre-trained voice matching model, the speech recognition results of the speech information that satisfy the first matching threshold comprises:
Step 1: inputting the speech information into the voice matching model, identifying the pinyin sequence in the speech information, and forming all possible candidate characters;
In order to assign the correct character to each syllable, all possible character hypotheses, or word hypotheses of single or multiple syllables, are first formed from the input pinyin sequence. For example, taking the input "郑恺的视频" (the video of Zheng Kai) as an example, the corresponding pinyin sequence is [zheng4, kai3, de1, shi4, pin2]. As shown in Fig. 3, each path is a possible recognition result.
Step 2: determining, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods;
Specifically, Chinese character sequences and their scores are obtained from the multiple candidate characters of each sound to be recognized using grammar rules and statistical principles, and some pinyin recognition errors are corrected. A probabilistic statistical language model is applied to search the character or word sequences for the likely correct path. The decoder in the embodiment of the invention may use the Viterbi algorithm based on dynamic programming, and may perform fast synchronous probability calculation and search-space pruning through algorithms such as Gaussian selection and language model look-ahead, thereby reducing computational complexity and memory overhead and improving the efficiency of the search algorithm.
Step 3: taking the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
Specifically, the Chinese character sequences matched by the language model are ranked by score. Judging from the results of template matching, a recognition result with a high matching degree is more likely to be correct. There are certainly cases where, because some corpus is missing from the model, the matching score is high but the result is not correct. Therefore, the speech recognition results whose matching scores satisfy the first threshold can be selected for semantic recognition.
Specifically, the score needs to satisfy the first threshold, i.e. the speech recognition results whose scores are greater than the first threshold ρ are taken as the possibly correct recognition results. For example, the recognition results are as follows:
| Recognition result | Matching score |
| 郑恺的视频 (the video of the actor Zheng Kai) | 0.641 |
| 郑凯的视频 (a homophonous name written with different characters) | 0.629 |
| 正楷的视频 (a video about regular-script calligraphy) | 0.457 |
| 正凯德食品 (another homophone-like candidate) | 0.231 |
As shown above, the first threshold may be 0.4; the speech recognition results at this time are then: "郑恺的视频, 郑凯的视频, 正楷的视频".
In a possible implementation, when the model matching scores of all recognition results are smaller than the first threshold ρ, the recognition result with the highest score is taken and step 104 is executed to perform semantic processing.
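To make steps 1 to 3 concrete, here is a minimal sketch under invented data: the pinyin-to-character table, the bigram scores and the 0.4 threshold are all hypothetical, and brute-force enumeration with a geometric-mean bigram score stands in for the Viterbi search and pruning described above.

```python
from itertools import product

PINYIN2CHARS = {          # hypothetical pinyin-to-character lexicon
    "zheng4": ["郑", "正"],
    "kai3":   ["恺", "凯", "楷"],
    "de1":    ["的"],
    "shi4":   ["视"],
    "pin2":   ["频"],
}
BIGRAM = {("郑", "恺"): 0.9, ("郑", "凯"): 0.8, ("正", "楷"): 0.7,
          ("恺", "的"): 0.9, ("凯", "的"): 0.9, ("楷", "的"): 0.6,
          ("的", "视"): 0.9, ("视", "频"): 0.95}

def candidates(pinyin_seq):
    """Enumerate every character sequence the pinyin sequence could spell."""
    return ["".join(chars) for chars in product(*(PINYIN2CHARS[p] for p in pinyin_seq))]

def score(seq):
    """Geometric-mean bigram score of a candidate character sequence."""
    pairs = list(zip(seq, seq[1:]))
    s = 1.0
    for pair in pairs:
        s *= BIGRAM.get(pair, 0.1)   # small floor for unseen character pairs
    return s ** (1 / len(pairs))

def recognise(pinyin_seq, threshold=0.4):
    scored = sorted(((score(c), c) for c in candidates(pinyin_seq)), reverse=True)
    above = [(s, c) for s, c in scored if s >= threshold]
    # Fallback: if nothing clears the first threshold, keep the top-scoring candidate.
    return above if above else scored[:1]

print(recognise(["zheng4", "kai3", "de1", "shi4", "pin2"]))
```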
To improve the user experience and show the search process effectively, before step 103 the target speech recognition result may also be output to the interface of the terminal and displayed. The interface of the terminal may be the display interface of the voice assistant client that collects the speech information, or another interface of the terminal, which is not limited here. For example, as shown in Fig. 4a, the target speech recognition result is "郑恺的视频" (the video of Zheng Kai).
As shown in Fig. 4, the process of displaying the recognition result comprises the following steps (a rough sketch follows the steps):
Step 1: creating the layout file of the interface;
wherein the layout file includes a text control for displaying the speech recognition result.
Step 2: creating the interface, loading the layout file, and initializing the text control.
Step 3: displaying the speech recognition result, that is, the recognized text information, on the display interface of the terminal.
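As a loose analogue of these three steps (not the terminal's actual UI framework), the following sketch builds a window with a text control and shows the recognized text; the widget choices and font are illustrative.

```python
import tkinter as tk

# Build the interface, initialise a text control, then show the recognised text on it.
root = tk.Tk()
root.title("Speech recognition")
result_label = tk.Label(root, text="", font=("Helvetica", 16))  # the text control
result_label.pack(padx=20, pady=20)
result_label.config(text="郑恺的视频")  # the target speech recognition result
root.mainloop()
```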
In order to effectively improve the accuracy and coverage of recognition, a preset dictionary is stored in the server. The dictionary contains a large amount of corpus data and has a semantic parsing function. After the cloud server judges the received speech information, it performs semantic parsing on the speech recognition result using its own semantic parsing function. Specifically, a semantic recognition model is stored in the server; the semantic recognition model can identify the tokens of the speech information, determine the tokens in the speech information, recognize the semantics of the tokens, and determine the target file corresponding to each semantic meaning. Of course, if the dictionary to be retrieved is small, semantic recognition can be completed at the terminal to improve the parsing rate, which is not limited here.
Step 104 comprises:
Step 1: performing semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result;
If the terminal executes semantic recognition, the semantic recognition model in the terminal outputs the tokens of the speech recognition result, parses the semantics of the tokens and the corresponding annotation results, and searches the annotation results for a relevant business type.
If the server executes semantic recognition, after receiving the speech recognition result sent by the terminal, the server outputs the tokens of the speech recognition result according to the semantic recognition model on the server, parses the semantics of the tokens and the corresponding annotation results, and searches the annotation results for a relevant business type.
Step 2: searching the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
To further improve the accuracy of semantic recognition, in an embodiment of the invention the recognition process of the semantic recognition model may include the following (see the sketch after these steps):
Step 1: performing word segmentation on the target speech recognition result according to the preset dictionary, performing semantic recognition on each token in the target speech recognition result, and determining the business type corresponding to each token;
wherein the preset dictionary may obtain corpus data by methods such as web crawling, so as to update the tokens and the annotations of the corresponding business types.
Step 2: determining the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
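A minimal sketch of the two steps above, under an invented preset dictionary: each entry maps a token to a hypothetical business type and weight, greedy longest match stands in for the real segmenter, and the winning business type is the one with the largest accumulated weight.

```python
PRESET_DICT = {               # hypothetical dictionary: token -> (business type, weight)
    "郑恺": ("video", 0.6),
    "视频": ("video", 0.9),
    "正楷": ("education", 0.6),
    "天气": ("weather", 0.8),
    "预报": ("weather", 0.7),
}

def segment(text):
    """Greedy longest-match segmentation against the preset dictionary."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in PRESET_DICT:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])   # unknown character: keep as a single token
            i += 1
    return tokens

def business_type(text):
    """Pick the business type whose accumulated token weight is largest."""
    totals = {}
    for tok in segment(text):
        if tok in PRESET_DICT:
            btype, weight = PRESET_DICT[tok]
            totals[btype] = totals.get(btype, 0.0) + weight
    return max(totals, key=totals.get) if totals else None

print(segment("郑恺的视频"), business_type("郑恺的视频"))   # ['郑恺', '的', '视频'] video
```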
To further improve retrieval efficiency, for the other speech recognition results exceeding the first threshold apart from the target speech recognition result, the above operations may be performed at the same time as for the target speech recognition result. Of course, the above operations may instead be executed only after a switching instruction from the user is received, which is not limited here.
As shown in Fig. 5, this may specifically comprise:
Step 1: performing semantic recognition on each speech recognition result of the at least one speech recognition result, and determining the business type corresponding to the speech recognition result;
Specifically, speech recognition result 1 is input into the semantic recognition model; if the output of the semantic recognition model contains business type 1, it is considered that speech recognition result 1 contains business type 1, and subsequent processing needs to be executed in the application corresponding to business type 1.
For example, speech recognition result 1 is "郑恺的视频" (the video of Zheng Kai), and the segmentation output by the semantic recognition model is: 郑恺 (Zheng Kai), 视频 (video). The business type of "视频" is the video type, so the business type of the speech recognition result is the video type.
In a possible implementation, the business type may also be determined according to the attributes of the tokens. For example, speech recognition result 2 is "天气预报" (weather forecast), and the segmentation determined by the semantic recognition model is: 天气 (weather), 预报 (forecast); "天气" has a weather attribute (weatherKeys), so the business type is determined to be the weather-query type.
To further improve the accuracy of semantic recognition, in an embodiment of the invention the recognition process of the semantic recognition model may include:
Step 1: performing word segmentation on the speech recognition result according to the preset dictionary, performing semantic recognition on each token in the speech recognition result, and determining the business type corresponding to each token;
wherein the preset dictionary may obtain corpus data by methods such as web crawling, so as to update the tokens and the annotations of the corresponding business types.
Step 2: determining the business type corresponding to the speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, the weight of a business type is determined according to at least one of: the priority of the business type in the terminal, the priority of the data bank in the preset dictionary from which the token originates, or the user preference of the user of the terminal.
For example, speech recognition result 3 is "正楷的视频" (a video of regular-script calligraphy), and the segmentation determined by the semantic recognition model is: 正楷, 视频. The business type of "视频" is the video type, and the business type of "正楷" is the education type. If the weight of the video type corresponding to "视频" is determined to be greater than the weight of the education type corresponding to "正楷", the business type of speech recognition result 3 is determined to be the video type. If the weight of the video type corresponding to "视频" is the same as the weight of the education type corresponding to "正楷", the business types corresponding to speech recognition result 3 may also be determined to be both the education type and the video type.
For another example, speech recognition result 4 is "天气预爆" (a film title homophonous with "weather forecast"), and the segmentation determined by the semantic recognition model is: 天气预爆. According to the preset dictionary it is determined that 天气预爆 is a film, and the corresponding business types include the video type, the song type and so on; the business type of speech recognition result 4 is then determined according to the weight of the video type corresponding to "天气预爆" and the weight of the song type corresponding to "天气预爆".
In step 2, the resource library is searched, within the business type corresponding to the at least one speech recognition result, for the target file corresponding to the at least one speech recognition result.
With reference to the above examples, for speech recognition result 1, the target file of Zheng Kai can be searched for from the video type in the resource library; for speech recognition result 2, the target file of the weather forecast can be searched for from the weather-query service in the resource library; for speech recognition result 3, the target file of regular-script calligraphy can be searched for from the video type, the education type or the education-video type in the resource library; for speech recognition result 4, the target file of 天气预爆 can be searched for from the video type or the song type in the resource library.
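Purely as an illustration of this search step, the sketch below queries an in-memory stand-in for the resource library by business type; all titles and keywords are invented, and a real implementation would query a local or network resource service instead.

```python
RESOURCE_LIBRARY = {                       # hypothetical resource library, keyed by business type
    "video": [
        {"title": "郑恺的视频合集", "keywords": ["郑恺", "视频"]},
        {"title": "天气预爆",       "keywords": ["天气预爆", "电影"]},
    ],
    "weather": [
        {"title": "天气预报",       "keywords": ["天气", "预报"]},
    ],
}

def search_targets(business_type, query_tokens):
    """Return resources of the given business type that match any query token."""
    hits = []
    for item in RESOURCE_LIBRARY.get(business_type, []):
        if any(tok in item["keywords"] or tok in item["title"] for tok in query_tokens):
            hits.append(item)
    return hits

print(search_targets("video", ["郑恺", "视频"]))
```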
Step 105 specifically comprises:
Step 1: determining the priority of each speech recognition result of the at least one speech recognition result;
Specifically, in combination with semantic analysis, the UI shows the search results in the form of TABs, and the results are ordered into TABs mainly according to hot-search ranking.
Step 2: displaying the speech recognition results on the display interface of the terminal arranged according to their priorities;
The priority may be determined based on user big-data analysis, scores, user preference and the like, which is not limited here.
Step 3: displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface.
In a specific implementation, as shown in Fig. 6, this comprises the following (a sketch of this hand-off follows):
the semantic recognition module converts the TAB data and target files corresponding to the speech recognition results into JSON data and sends it to the display module of the terminal;
after the display module of the terminal obtains the JSON data, it parses out the corresponding speech recognition results and target files; and
each speech recognition result and its corresponding target file are displayed according to the parsing results.
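A minimal sketch of the hand-off shown in Fig. 6, with invented TAB entries and target file titles: the semantic-recognition side serializes the data to JSON, and the display side parses it and renders the TABs in priority order, marking the target result.

```python
import json

tabs = [                                           # hypothetical TAB data
    {"result": "郑恺", "priority": 1, "targets": ["郑恺的视频合集"]},
    {"result": "郑凯", "priority": 2, "targets": ["郑凯个人视频"]},
    {"result": "正楷", "priority": 3, "targets": ["正楷书法教学视频"]},
]
payload = json.dumps({"tabs": tabs}, ensure_ascii=False)  # semantic module -> display module

parsed = json.loads(payload)                              # display module side
for tab in sorted(parsed["tabs"], key=lambda t: t["priority"]):
    marker = "[selected]" if tab["priority"] == 1 else "          "
    print(marker, tab["result"], "->", ", ".join(tab["targets"]))
```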
With reference to the above examples, if the ranking result is determined to be 郑恺 > 郑凯 > 正楷, the displayed result may be as shown in Fig. 7.
In a possible implementation, for a speech recognition result whose business type cannot be determined by semantic analysis or whose corresponding target file cannot be determined, the speech recognition result is not displayed on the terminal. For example, if the semantics of "正楷的视频" cannot be understood, or no calligraphy-related content for "正楷" can be found in the resource library, the speech recognition result is not displayed on the terminal.
With reference to the above examples, if the ranking result is determined to be 天气预报 > 天气预爆, the displayed results may be as shown in Fig. 8 and Fig. 9.
Further, if the user wants to switch the target speech recognition result, the switching of the speech recognition result can be carried out, which specifically comprises:
obtaining a switching instruction of the user for the target speech recognition result;
determining, according to the switching instruction, the target file corresponding to the changed target speech recognition result; and
displaying the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
Specifically, determining the target file corresponding to the changed target speech recognition result may refer to the above embodiments, and details are not repeated here.
In order to further improve the accuracy of speech recognition, in an embodiment of the invention the method further comprises:
obtaining an operating instruction of the user for the speech recognition result or the target file; and
increasing the matching degree of the speech recognition result or target file corresponding to the operating instruction, so as to update the user preference.
For example, if the user selects "天气预爆" on the display interface, "天气预爆" is recorded in the user preference of the user, and the matching degree of "天气预爆" is increased.
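A minimal sketch of this preference update, with hypothetical stored weights: each time the user selects a result, its matching degree is raised so that it ranks higher next time.

```python
user_preference = {"天气预报": 1.0, "天气预爆": 1.0}   # hypothetical stored matching degrees

def record_selection(result, boost=0.1):
    """Record the user's selection and raise that result's matching degree."""
    user_preference[result] = user_preference.get(result, 1.0) + boost

record_selection("天气预爆")
print(user_preference)   # 天气预爆 now outweighs 天气预报 for this user
```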
In order to further improve the accuracy of speech recognition, an embodiment of the invention also provides a possible implementation, comprising:
judging whether the speech information includes a first control instruction for controlling the terminal; and
if the user speech information is a first control instruction for controlling the terminal, executing the first control instruction on the terminal.
In a possible implementation, if the speech information also contains an action-type token, it indicates that the terminal must perform a corresponding operation according to the speech information; at this time an instruction for processing according to the speech information can be sent directly to the terminal. Examples of action-type tokens are open, watch and play.
In a possible implementation, it is judged whether the semantics of the speech information contain a target control instruction set for the terminal, and if so, the first control instruction is executed on the terminal.
For example, if the recognized speech recognition result is "打开郑恺的视频" (open the video of Zheng Kai), the first control instruction can be determined to be "open".
In a possible implementation, if it is determined that the target file of "郑恺的视频" is unique, the target file of "郑恺的视频" can be opened directly.
In a possible implementation, if it is determined that there are multiple target files for "郑恺的视频", the multiple target files can be displayed first, and the open control instruction is executed after the user's operating instruction is obtained.
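A minimal sketch of the control-instruction branch described above, with an invented action-word table: if an action token such as 打开 (open) is present and the target file is unique, the action is taken directly; with multiple target files, the candidates are shown first.

```python
ACTION_WORDS = {"打开": "open", "播放": "play", "观看": "watch"}  # hypothetical action tokens

def handle(recognised_text, target_files):
    """Decide how to act on the recognised text and its candidate target files."""
    action = next((a for a in ACTION_WORDS if a in recognised_text), None)
    if action is None:
        return "show search results"                         # no control instruction present
    if len(target_files) == 1:
        return f"{ACTION_WORDS[action]} {target_files[0]}"   # unique target: execute directly
    return "show candidate files, wait for the user's choice"

print(handle("打开郑恺的视频", ["郑恺的视频合集"]))
print(handle("打开郑恺的视频", ["合集一", "合集二"]))
```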
In the embodiment of the invention, the recognition result with the highest matching score obtained by recognizing the speech information with the voice matching model is displayed to the user, and at the same time semantic recognition is performed on each of the at least one speech recognition result that satisfies the first matching threshold. With the semantic processing results, the user's intention can be understood more comprehensively by interactively displaying the search results of the different services to the user on the UI. Compared with speech recognition methods in the prior art, through multiple semantic analysis requests, searching and displaying services for homophonous names is realized, and the user can select the desired result according to his or her intention.
Based on the same technical idea, an embodiment of the invention provides a speech recognition apparatus 1000, as shown in Fig. 10, comprising:
a transceiver unit 1001, configured to receive input speech information;
a processing unit 1002, configured to determine, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determine, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; and obtain a target file corresponding to the target speech recognition result; and
a display unit 1003, configured to display each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, the processing unit 1002 is specifically configured to: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result; and search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processing unit 1002 is specifically configured to: determine a priority of each speech recognition result; and display the speech recognition results on the display interface of the terminal arranged according to their priorities;
and the display unit 1003 is specifically configured to: display the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, the transceiver unit 1001 is further configured to: obtain a switching instruction of the user for the target speech recognition result;
the processing unit 1002 is further configured to: determine, according to the switching instruction, the target file corresponding to the changed target speech recognition result; and
the display unit 1003 is further configured to: display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In a possible implementation, the processing unit 1002 is specifically configured to:
input the speech information into the voice matching model, identify the pinyin sequence in the speech information, and form all possible candidate characters; determine, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
On the basis of the above embodiments, an embodiment of the invention also provides a server 1100, as shown in Fig. 11, comprising: a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, wherein the processor 1101, the communication interface 1102 and the memory 1103 communicate with each other through the communication bus 1104;
a computer program is stored in the memory 1103, and when the program is executed by the processor 1101, the processor 1101 is caused to perform the following steps:
determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; obtaining a target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface of the terminal, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, the processor 1101 is specifically configured to: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result; and search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processor 1101 is specifically configured to:
perform word segmentation on the target speech recognition result according to the preset dictionary, perform semantic recognition on each token in the target speech recognition result, and determine the business type corresponding to each token; and determine the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, the processor 1101 is specifically configured to: determine a priority of each speech recognition result; and display the speech recognition results on the display interface of the terminal arranged according to their priorities.
In a possible implementation, the processor 1101 is further configured to: determine, according to a switching instruction, the target file corresponding to the changed target speech recognition result; display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode; and at the same time display the target file corresponding to the changed target speech recognition result. The switching instruction is the user's switching instruction for the target speech recognition result obtained through the communication interface 1102.
In a possible implementation, the processor 1101 is specifically configured to:
input the speech information into the voice matching model, identify the pinyin sequence in the speech information, and form all possible candidate characters; determine, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
The communication bus mentioned for the above server may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 1102 is used for communication between the above server and other devices.
The memory may include a random access memory (RAM) and may also include a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
On the basis of the above embodiments, an embodiment of the invention also provides a computer-readable storage medium storing a computer program executable by a server; when the program runs on the server, the server is caused to implement any of the methods in the above embodiments.
The above computer-readable storage medium may be any usable medium or data storage device that the processor in the server can access, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes and magneto-optical disks (MO), optical memories such as CD, DVD, BD and HVD, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH) and solid-state drives (SSD).
On the basis of the above embodiments, an embodiment of the invention also provides a terminal 1200, as shown in Fig. 12, comprising: a processor 1201, a communication interface 1202, a memory 1203 and a communication bus 1204, wherein the processor 1201, the communication interface 1202 and the memory 1203 communicate with each other through the communication bus 1204;
a computer program is stored in the memory 1203, and when the program is executed by the processor 1201, the processor 1201 is caused to perform the following steps:
determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; obtaining a target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, the processor 1201 is specifically configured to: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result; and search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processor 1201 is specifically configured to:
perform word segmentation on the target speech recognition result according to the preset dictionary, perform semantic recognition on each token in the target speech recognition result, and determine the business type corresponding to each token; and determine the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, the processor 1201 is specifically configured to: determine a priority of each speech recognition result; display the speech recognition results on the display interface of the terminal arranged according to their priorities; and display the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, the processor 1201 is further configured to: determine, according to a switching instruction, the target file corresponding to the changed target speech recognition result; display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode; and at the same time display the target file corresponding to the changed target speech recognition result. The switching instruction is the user's switching instruction for the target speech recognition result obtained through the communication interface 1202.
In a possible implementation, the processor 1201 is specifically configured to:
input the speech information into the voice matching model, identify the pinyin sequence in the speech information, and form all possible candidate characters; determine, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 1202 is used for communication between the above terminal and other devices.
The memory may include a random access memory (RAM) and may also include a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
On the basis of the above embodiments, an embodiment of the invention also provides a computer-readable storage medium storing a computer program executable by a terminal; when the program runs on the terminal, the terminal is caused to implement any of the methods in the above embodiments.
The above computer-readable storage medium may be any usable medium or data storage device that the processor in the terminal can access, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes and magneto-optical disks (MO), optical memories such as CD, DVD, BD and HVD, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH) and solid-state drives (SSD).
For the system/apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and reference may be made to the description of the method embodiments for the relevant parts.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memories, CD-ROMs, optical memories and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction apparatus which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, once a person skilled in the art knows the basic inventive concept, additional changes and modifications may be made to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.