Summary of the invention
Embodiments of the invention provide a speech recognition method, apparatus and terminal, to solve the prior-art problems that speech recognition results fail to satisfy user demand and that the recognition result obtained when a user inputs speech is inconsistent with the user's intention, which impairs the usability of speech recognition products.
An embodiment of the invention provides a speech recognition method applied to a terminal. The method comprises:
receiving input speech information;
determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold;
determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result;
obtaining a target file corresponding to the target speech recognition result; and
displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, obtaining the target file corresponding to the target speech recognition result comprises:
performing semantic recognition on the target speech recognition result to determine a business type corresponding to the target speech recognition result; and
searching a resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, performing semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result comprises:
performing word segmentation on the target speech recognition result according to a preset dictionary, performing semantic recognition on each token in the target speech recognition result, and determining the business type corresponding to each token; and
determining the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface comprises:
determining a priority of each speech recognition result;
displaying the speech recognition results on the display interface of the terminal arranged according to their priorities; and
displaying the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface further comprises:
obtaining a switching instruction of a user for the target speech recognition result;
determining, according to the switching instruction, the target file corresponding to the changed target speech recognition result; and
displaying the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In a possible implementation, determining, according to the pre-trained voice matching model, the speech recognition results of the speech information that satisfy the first matching threshold comprises:
inputting the speech information into the voice matching model, identifying the pinyin sequence in the speech information, and forming all possible candidate characters;
determining, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by means of syntax rules and statistical methods; and
taking the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
An embodiment of the invention provides a speech recognition apparatus, the apparatus comprising:
a transceiver unit, configured to receive input speech information;
a processing unit, configured to determine, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determine, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; and obtain a target file corresponding to the target speech recognition result; and
a display unit, configured to display each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, the processing unit is specifically configured to:
perform semantic recognition on the target speech recognition result to determine a business type corresponding to the target speech recognition result; and search a resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processing unit is specifically configured to:
perform word segmentation on the target speech recognition result according to a preset dictionary, perform semantic recognition on each token in the target speech recognition result, and determine the business type corresponding to each token; and determine the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, the processing unit is specifically configured to: determine a priority of each speech recognition result; and display the speech recognition results on the display interface of the terminal arranged according to their priorities;
and the display unit is specifically configured to: display the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, the transceiver unit is further configured to: obtain a switching instruction of the user for the target speech recognition result;
the processing unit is further configured to: determine, according to the switching instruction, the target file corresponding to the changed target speech recognition result; and
the display unit is further configured to: display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In a possible implementation, the processing unit is specifically configured to:
input the speech information into the voice matching model, identify the pinyin sequence in the speech information, and form all possible candidate characters; determine, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
An embodiment of the invention provides a terminal, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a computer program is stored in the memory, and when the program is executed by the processor, the processor is caused to perform the steps of any of the above methods applied to a terminal.
An embodiment of the invention provides a computer-readable storage medium storing a computer program executable by a terminal; when the program runs on the terminal, the terminal is caused to perform the steps of any of the above methods applied to a terminal.
Embodiments of the invention provide a speech recognition method, apparatus and terminal. The method comprises: receiving input speech information; determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; obtaining a target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode. Displaying the speech recognition result with the highest matching degree in the first display mode makes it quickly visible and improves convenience for the user. Performing semantic recognition on each of the at least one speech recognition result yields more of the user's possible search intentions, and displaying the other speech recognition results in the second display mode on the display interface of the terminal effectively provides the user with more search results, improves the coverage of the speech recognition results over the user's intention, raises the success rate of voice search, and improves the usability of speech recognition products.
Specific embodiment
The present invention is described below in further detail with reference to the accompanying drawings. The described embodiments are clearly only a part of the embodiments of the invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
Speech recognition allows a machine to receive, recognize and understand a speech signal and convert it into a corresponding digital signal. Although speech recognition has produced a large number of applications in many industries, much work remains before truly natural human-machine interaction is achieved; for example, larger improvements are needed in adaptivity so that recognition is not affected by accent, dialect or a particular speaker. Voice types in reality are diverse: by acoustic characteristics they can be divided into male, female and child voices, and in addition the pronunciation of many people differs greatly from standard pronunciation, which requires handling of accents and dialects. These factors cause the recognition result obtained when a user inputs speech to be inconsistent with the user's intention.
After a user performs voice input, many of the recognized words are obtained by conversion toward everyday expressions, so the recognized text is easily substituted with a homophone of what the user actually said, for example "四大名助" versus "四大名著" (the Four Great Classical Novels), "陆垚知马俐" versus the proverb "路遥知马力", or "天气预报" (weather forecast) versus the homophonous film title "天气预爆". When a terminal with a speech recognition function performs recognition, polyphonic characters, accents, dialects, speaker-specific pronunciation and meaningless modal particles in continuous speech all influence recognition, so the factors affecting recognition are diverse and the result the user wants may not be recognized. As a result, the intention of some users cannot be fulfilled, misrecognition occurs, recognition speed and efficiency suffer, and the user experience is degraded.
To solve the problems in the prior art, taking a television set as the terminal scenario as an example, all the schemes provided by the embodiments of the invention may be executed by the terminal or by a server, which may be configured as needed and is not limited here. As shown in Fig. 1, the method comprises:
Step 101: receiving input speech information;
The terminal may obtain the speech information input by the user through its own voice device or through an external voice device. Specifically, the terminal is provided with a speech recognition module that can recognize and collect speech information.
In addition, the terminal is provided with a communication module, such as a WIFI wireless communication module, which enables the terminal to connect to a server and send the collected speech information to the server. Of course, everything may also be executed by the terminal, or only the part of the speech information that needs server-side processing may be transmitted, which is not limited here.
Step 102: determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold;
In a specific implementation, the voice matching model may be set in the terminal or on the server, which is not limited here. If it is set on the server, the server determines the at least one speech recognition result of the speech information that satisfies the first matching threshold and then sends the at least one speech recognition result to the terminal.
Step 103: determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as the target speech recognition result;
In a specific implementation, the terminal may determine the target speech recognition result with the highest matching degree according to the scores of the at least one speech recognition result, or the server determines the target speech recognition result with the highest matching degree according to the scores of the at least one speech recognition result and then sends the target speech recognition result to the terminal.
Step 104: obtaining the target file corresponding to the target speech recognition result;
In a specific implementation, the terminal may search a local or network resource system for the target file corresponding to the target speech recognition result; or the server searches a network resource library for the target file corresponding to the target speech recognition result and, after determining the target file, sends the target file or its identification information to the terminal, so that the terminal determines the target file corresponding to the target speech recognition result.
Step 105: displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a specific implementation, the second display mode may be a display mode opposite to the first display mode. For example, the first display mode may be highlighted display or display with a check box, and the second display mode may be non-highlighted display or display without a check box. For example, the target result shown in Fig. 7 uses the first display mode of a highlighted check box, and the second display mode is display without a highlighted check box. The specific display modes are not limited here.
In the speech recognition method provided by an embodiment of the invention, displaying the speech recognition result with the highest matching degree in the first display mode makes it quickly visible and improves convenience for the user; performing semantic recognition on each of the at least one speech recognition result yields more of the user's possible search intentions; and displaying the other speech recognition results in the second display mode on the display interface of the terminal effectively provides the user with more search results, improves the coverage of the speech recognition results over the user's intention, raises the success rate of voice search, and improves the usability of speech recognition products.
In an embodiment of the invention, as shown in Fig. 2, a method by which the speech recognition model determines speech recognition results is provided, comprising:
Step 1: obtaining the speech information input by the user, and determining the acoustic probability of the features of the speech information from its acoustic features;
Specifically, acoustic feature extraction extracts acoustic feature information from the speech information. To guarantee recognition accuracy, the extraction stage should discriminate well between the modeling units of the acoustic model. The acoustic features in the embodiment of the invention may include Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), perceptual linear prediction coefficients (PLP), and so on.
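As an illustration only, the MFCC features mentioned above can be extracted roughly as follows; this is a minimal sketch assuming the librosa library and a hypothetical input file, not the actual feature front end of the embodiment, and the sampling rate and coefficient count are illustrative.

```python
import librosa

# Load a hypothetical speech recording and extract 13 MFCCs per analysis frame.
y, sr = librosa.load("sample.wav", sr=16000)        # waveform of the input speech
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, number_of_frames)
print(mfcc.shape)
```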
Step 2: inputting the speech information after acoustic feature extraction into the voice matching model, where the voice matching model includes a language model and an acoustic model;
For example, in an embodiment of the invention, the training process of the voice matching model may include:
Step 1: obtaining sample speech information, where each item of sample speech information carries annotation information of the speech it belongs to;
Step 2: inputting each item of sample speech information into the voice matching model;
Step 3: training the voice matching model according to each item of sample speech information and the output of the voice matching model.
To facilitate training of the voice matching model, a large amount of sample speech information may be collected; the sample speech information may be collected by the terminal or obtained through other channels, and each sample is annotated.
The sample speech information is input into the voice matching model and the model is trained. The model may be one based on dynamic time warping, a hidden Markov model, an artificial neural network, a support vector machine, or the like. Each item of sample speech information is input into the model, and the model is trained according to the annotation information of each sample and the output of the voice matching model.
In an embodiment of the invention, the voice matching model is obtained by training on a large number of speech samples, and the trained model can then perform speech recognition on collected speech information.
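Purely to make the training loop concrete, the following sketch fits a support vector machine (one of the model families listed above) on synthetic labelled features; the data, feature dimensions and class count are hypothetical stand-ins for the annotated sample speech information.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic "feature vectors" with labels play the role of the annotated samples;
# a real system would use acoustic features such as the MFCCs sketched earlier.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 13))      # 200 samples, 13 features each
y_train = rng.integers(0, 3, size=200)    # annotation labels for three classes

model = SVC(probability=True)             # train the matching model
model.fit(X_train, y_train)

X_new = rng.normal(size=(1, 13))          # features of newly collected speech
print(model.predict(X_new), model.predict_proba(X_new))
```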
The acoustic model is built by acoustic modeling using the training speech features and their corresponding annotation information. The acoustic model constructs the mapping between the observed features in the speech signal and the pronunciation modeling units, and on this basis classifies phonemes or phoneme states. In an embodiment of the invention, the acoustic model may use an HMM as its basic modeling unit.
The language model may be an N-gram statistical language model under a speech recognition framework based on statistical learning. It contains a Markov chain representing the generation of a word sequence, i.e. the probability p(W) of generating a word sequence W is expressed as
p(W) = ∏_k p(w_k | w_{k-n+1}, …, w_{k-1}),
where w_k denotes the k-th word in the word sequence; from the formula it can be seen that the probability of generating the current word is related only to the preceding n-1 words.
In an embodiment of the invention, the training and evaluation metric of the language model may be the language model perplexity (PP), defined as the inverse of the geometric mean of the word-sequence generation probability, that is
PP(W) = p(w_1, w_2, …, w_K)^{-1/K},
where K is the length of the word sequence. It can be seen from the formula that the smaller the expected perplexity of the language model over the generated word sequences, the higher the accuracy with which the language model predicts the current word given the history word sequence; therefore the training objective of the language model is to minimize the perplexity over the training corpus.
During training, the probabilities of the individual words and related word combinations occurring in the training corpus are counted first, and the relevant parameters of the language model are estimated on this basis.
However, the number of related word combinations grows geometrically with the possible vocabulary size, so counting all possible combinations is infeasible. In practice the training data are usually sparse: the probability of some word combinations is very small, or they do not appear at all. These problems can be addressed by methods such as discounting and backing-off, and the language model can be further optimized by modeling it with a recurrent neural network (RNN).
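A minimal sketch of an n-gram language model and its perplexity, matching the formulas above: add-one smoothing stands in for the discounting and backing-off methods mentioned, and the tiny corpus is invented for illustration.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a tokenised training corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    # Add-one (Laplace) smoothing in place of discounting / backing-off.
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def perplexity(sent, unigrams, bigrams, vocab_size):
    tokens = ["<s>"] + sent + ["</s>"]
    log_p = sum(math.log(bigram_prob(a, b, unigrams, bigrams, vocab_size))
                for a, b in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return math.exp(-log_p / n)  # inverse geometric mean of the word probabilities

corpus = [["郑恺", "的", "视频"], ["天气", "预报"], ["正楷", "的", "视频"]]
uni, bi = train_bigram(corpus)
print(perplexity(["郑恺", "的", "视频"], uni, bi, len(uni)))
```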
Step 3: inputting the results obtained by the speech models into a decoder for decoding, to obtain the possible text information of the speech.
The decoder combines the acoustic probabilities of the speech features calculated by the acoustic model with the probabilities calculated by the language model and, using a relevant search algorithm, works out the most likely word sequence W', so as to output the possible text information of the speech information.
In step 102, determining, according to the pre-trained voice matching model, the speech recognition results of the speech information that satisfy the first matching threshold comprises:
Step 1: inputting the speech information into the voice matching model, identifying the pinyin sequence in the speech information, and forming all possible candidate characters;
In order to assign the correct character to each syllable, all possible character hypotheses, or word hypotheses of single or multiple syllables, are first formed from the input pinyin sequence. For example, taking the input "郑恺的视频" (the video of Zheng Kai) as an example, the corresponding pinyin sequence is [zheng4, kai3, de1, shi4, pin2]. As shown in Fig. 3, each path is a possible recognition result.
Step 2: determining, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods;
Specifically, Chinese character sequences and their scores are obtained from the multiple candidate characters of each sound to be recognized using grammar rules and statistical principles, and some pinyin recognition errors are corrected. A probabilistic statistical language model is applied to search the character or word sequences for the likely correct path. The decoder in the embodiment of the invention may use the Viterbi algorithm based on dynamic programming, and may perform fast synchronous probability calculation and search-space pruning through algorithms such as Gaussian selection and language model look-ahead, thereby reducing computational complexity and memory overhead and improving the efficiency of the search algorithm.
Step 3: taking the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
Specifically, the Chinese character sequences matched by the language model are ranked by score. Judging from the results of template matching, a recognition result with a high matching degree is more likely to be correct. There are certainly cases where, because some corpus is missing from the model, the matching score is high but the result is not correct. Therefore, the speech recognition results whose matching scores satisfy the first threshold can be selected for semantic recognition.
Specifically, the score needs to satisfy the first threshold, i.e. the speech recognition results whose scores are greater than the first threshold ρ are taken as the possibly correct recognition results. For example, the recognition results are as follows:
| Recognition result | Matching score |
| 郑恺的视频 (the video of the actor Zheng Kai) | 0.641 |
| 郑凯的视频 (a homophonous name written with different characters) | 0.629 |
| 正楷的视频 (a video about regular-script calligraphy) | 0.457 |
| 正凯德食品 (another homophone-like candidate) | 0.231 |
As shown above, the first threshold may be 0.4; the speech recognition results at this time are then: "郑恺的视频, 郑凯的视频, 正楷的视频".
In a possible implementation, when the model matching scores of all recognition results are smaller than the first threshold ρ, the recognition result with the highest score is taken and step 104 is executed to perform semantic processing.
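To make steps 1 to 3 concrete, here is a minimal sketch under invented data: the pinyin-to-character table, the bigram scores and the 0.4 threshold are all hypothetical, and brute-force enumeration with a geometric-mean bigram score stands in for the Viterbi search and pruning described above.

```python
from itertools import product

PINYIN2CHARS = {          # hypothetical pinyin-to-character lexicon
    "zheng4": ["郑", "正"],
    "kai3":   ["恺", "凯", "楷"],
    "de1":    ["的"],
    "shi4":   ["视"],
    "pin2":   ["频"],
}
BIGRAM = {("郑", "恺"): 0.9, ("郑", "凯"): 0.8, ("正", "楷"): 0.7,
          ("恺", "的"): 0.9, ("凯", "的"): 0.9, ("楷", "的"): 0.6,
          ("的", "视"): 0.9, ("视", "频"): 0.95}

def candidates(pinyin_seq):
    """Enumerate every character sequence the pinyin sequence could spell."""
    return ["".join(chars) for chars in product(*(PINYIN2CHARS[p] for p in pinyin_seq))]

def score(seq):
    """Geometric-mean bigram score of a candidate character sequence."""
    pairs = list(zip(seq, seq[1:]))
    s = 1.0
    for pair in pairs:
        s *= BIGRAM.get(pair, 0.1)   # small floor for unseen character pairs
    return s ** (1 / len(pairs))

def recognise(pinyin_seq, threshold=0.4):
    scored = sorted(((score(c), c) for c in candidates(pinyin_seq)), reverse=True)
    above = [(s, c) for s, c in scored if s >= threshold]
    # Fallback: if nothing clears the first threshold, keep the top-scoring candidate.
    return above if above else scored[:1]

print(recognise(["zheng4", "kai3", "de1", "shi4", "pin2"]))
```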
To improve the user experience and show the search process effectively, before step 103 the target speech recognition result may also be output to the interface of the terminal and displayed. The interface of the terminal may be the display interface of the voice assistant client that collects the speech information, or another interface of the terminal, which is not limited here. For example, as shown in Fig. 4a, the target speech recognition result is "郑恺的视频" (the video of Zheng Kai).
As shown in Fig. 4, the process of displaying the recognition result comprises the following steps (a rough sketch follows the steps):
Step 1: creating the layout file of the interface;
wherein the layout file includes a text control for displaying the speech recognition result.
Step 2: creating the interface, loading the layout file, and initializing the text control.
Step 3: displaying the speech recognition result, that is, the recognized text information, on the display interface of the terminal.
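As a loose analogue of these three steps (not the terminal's actual UI framework), the following sketch builds a window with a text control and shows the recognized text; the widget choices and font are illustrative.

```python
import tkinter as tk

# Build the interface, initialise a text control, then show the recognised text on it.
root = tk.Tk()
root.title("Speech recognition")
result_label = tk.Label(root, text="", font=("Helvetica", 16))  # the text control
result_label.pack(padx=20, pady=20)
result_label.config(text="郑恺的视频")  # the target speech recognition result
root.mainloop()
```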
In order to effectively improve the accuracy and coverage of recognition, a preset dictionary is stored in the server. The dictionary contains a large amount of corpus data and has a semantic parsing function. After the cloud server judges the received speech information, it performs semantic parsing on the speech recognition result using its own semantic parsing function. Specifically, a semantic recognition model is stored in the server; the semantic recognition model can identify the tokens of the speech information, determine the tokens in the speech information, recognize the semantics of the tokens, and determine the target file corresponding to each semantic meaning. Of course, if the dictionary to be retrieved is small, semantic recognition can be completed at the terminal to improve the parsing rate, which is not limited here.
Step 104 comprises:
Step 1: performing semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result;
If the terminal executes semantic recognition, the semantic recognition model in the terminal outputs the tokens of the speech recognition result, parses the semantics of the tokens and the corresponding annotation results, and searches the annotation results for a relevant business type.
If the server executes semantic recognition, after receiving the speech recognition result sent by the terminal, the server outputs the tokens of the speech recognition result according to the semantic recognition model on the server, parses the semantics of the tokens and the corresponding annotation results, and searches the annotation results for a relevant business type.
Step 2: searching the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
To further improve the accuracy of semantic recognition, in an embodiment of the invention the recognition process of the semantic recognition model may include the following (see the sketch after these steps):
Step 1: performing word segmentation on the target speech recognition result according to the preset dictionary, performing semantic recognition on each token in the target speech recognition result, and determining the business type corresponding to each token;
wherein the preset dictionary may obtain corpus data by methods such as web crawling, so as to update the tokens and the annotations of the corresponding business types.
Step 2: determining the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
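A minimal sketch of the two steps above, under an invented preset dictionary: each entry maps a token to a hypothetical business type and weight, greedy longest match stands in for the real segmenter, and the winning business type is the one with the largest accumulated weight.

```python
PRESET_DICT = {               # hypothetical dictionary: token -> (business type, weight)
    "郑恺": ("video", 0.6),
    "视频": ("video", 0.9),
    "正楷": ("education", 0.6),
    "天气": ("weather", 0.8),
    "预报": ("weather", 0.7),
}

def segment(text):
    """Greedy longest-match segmentation against the preset dictionary."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in PRESET_DICT:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])   # unknown character: keep as a single token
            i += 1
    return tokens

def business_type(text):
    """Pick the business type whose accumulated token weight is largest."""
    totals = {}
    for tok in segment(text):
        if tok in PRESET_DICT:
            btype, weight = PRESET_DICT[tok]
            totals[btype] = totals.get(btype, 0.0) + weight
    return max(totals, key=totals.get) if totals else None

print(segment("郑恺的视频"), business_type("郑恺的视频"))   # ['郑恺', '的', '视频'] video
```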
To further improve retrieval efficiency, for the other speech recognition results exceeding the first threshold apart from the target speech recognition result, the above operations may be performed at the same time as for the target speech recognition result. Of course, the above operations may instead be executed only after a switching instruction from the user is received, which is not limited here.
As shown in Fig. 5, this may specifically comprise:
Step 1: performing semantic recognition on each speech recognition result of the at least one speech recognition result, and determining the business type corresponding to the speech recognition result;
Specifically, speech recognition result 1 is input into the semantic recognition model; if the output of the semantic recognition model contains business type 1, it is considered that speech recognition result 1 contains business type 1, and subsequent processing needs to be executed in the application corresponding to business type 1.
For example, speech recognition result 1 is "郑恺的视频" (the video of Zheng Kai), and the segmentation output by the semantic recognition model is: 郑恺 (Zheng Kai), 视频 (video). The business type of "视频" is the video type, so the business type of the speech recognition result is the video type.
In a possible implementation, the business type may also be determined according to the attributes of the tokens. For example, speech recognition result 2 is "天气预报" (weather forecast), and the segmentation determined by the semantic recognition model is: 天气 (weather), 预报 (forecast); "天气" has a weather attribute (weatherKeys), so the business type is determined to be the weather-query type.
To further improve the accuracy of semantic recognition, in an embodiment of the invention the recognition process of the semantic recognition model may include:
Step 1: performing word segmentation on the speech recognition result according to the preset dictionary, performing semantic recognition on each token in the speech recognition result, and determining the business type corresponding to each token;
wherein the preset dictionary may obtain corpus data by methods such as web crawling, so as to update the tokens and the annotations of the corresponding business types.
Step 2: determining the business type corresponding to the speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, the weight of a business type is determined according to at least one of: the priority of the business type in the terminal, the priority of the data bank in the preset dictionary from which the token originates, or the user preference of the user of the terminal.
For example, speech recognition result 3 is "正楷的视频" (a video of regular-script calligraphy), and the segmentation determined by the semantic recognition model is: 正楷, 视频. The business type of "视频" is the video type, and the business type of "正楷" is the education type. If the weight of the video type corresponding to "视频" is determined to be greater than the weight of the education type corresponding to "正楷", the business type of speech recognition result 3 is determined to be the video type. If the weight of the video type corresponding to "视频" is the same as the weight of the education type corresponding to "正楷", the business types corresponding to speech recognition result 3 may also be determined to be both the education type and the video type.
For another example, speech recognition result 4 is "天气预爆" (a film title homophonous with "weather forecast"), and the segmentation determined by the semantic recognition model is: 天气预爆. According to the preset dictionary it is determined that 天气预爆 is a film, and the corresponding business types include the video type, the song type and so on; the business type of speech recognition result 4 is then determined according to the weight of the video type corresponding to "天气预爆" and the weight of the song type corresponding to "天气预爆".
In step 2, the resource library is searched, within the business type corresponding to the at least one speech recognition result, for the target file corresponding to the at least one speech recognition result.
With reference to the above examples, for speech recognition result 1, the target file of Zheng Kai can be searched for from the video type in the resource library; for speech recognition result 2, the target file of the weather forecast can be searched for from the weather-query service in the resource library; for speech recognition result 3, the target file of regular-script calligraphy can be searched for from the video type, the education type or the education-video type in the resource library; for speech recognition result 4, the target file of 天气预爆 can be searched for from the video type or the song type in the resource library.
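Purely as an illustration of this search step, the sketch below queries an in-memory stand-in for the resource library by business type; all titles and keywords are invented, and a real implementation would query a local or network resource service instead.

```python
RESOURCE_LIBRARY = {                       # hypothetical resource library, keyed by business type
    "video": [
        {"title": "郑恺的视频合集", "keywords": ["郑恺", "视频"]},
        {"title": "天气预爆",       "keywords": ["天气预爆", "电影"]},
    ],
    "weather": [
        {"title": "天气预报",       "keywords": ["天气", "预报"]},
    ],
}

def search_targets(business_type, query_tokens):
    """Return resources of the given business type that match any query token."""
    hits = []
    for item in RESOURCE_LIBRARY.get(business_type, []):
        if any(tok in item["keywords"] or tok in item["title"] for tok in query_tokens):
            hits.append(item)
    return hits

print(search_targets("video", ["郑恺", "视频"]))
```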
Step 105 specifically comprises:
Step 1: determining the priority of each speech recognition result of the at least one speech recognition result;
Specifically, in combination with semantic analysis, the UI shows the search results in the form of TABs, and the results are ordered into TABs mainly according to hot-search ranking.
Step 2: displaying the speech recognition results on the display interface of the terminal arranged according to their priorities;
The priority may be determined based on user big-data analysis, scores, user preference and the like, which is not limited here.
Step 3: displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface.
In a specific implementation, as shown in Fig. 6, this comprises the following (a sketch of this hand-off follows):
the semantic recognition module converts the TAB data and target files corresponding to the speech recognition results into JSON data and sends it to the display module of the terminal;
after the display module of the terminal obtains the JSON data, it parses out the corresponding speech recognition results and target files; and
each speech recognition result and its corresponding target file are displayed according to the parsing results.
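A minimal sketch of the hand-off shown in Fig. 6, with invented TAB entries and target file titles: the semantic-recognition side serializes the data to JSON, and the display side parses it and renders the TABs in priority order, marking the target result.

```python
import json

tabs = [                                           # hypothetical TAB data
    {"result": "郑恺", "priority": 1, "targets": ["郑恺的视频合集"]},
    {"result": "郑凯", "priority": 2, "targets": ["郑凯个人视频"]},
    {"result": "正楷", "priority": 3, "targets": ["正楷书法教学视频"]},
]
payload = json.dumps({"tabs": tabs}, ensure_ascii=False)  # semantic module -> display module

parsed = json.loads(payload)                              # display module side
for tab in sorted(parsed["tabs"], key=lambda t: t["priority"]):
    marker = "[selected]" if tab["priority"] == 1 else "          "
    print(marker, tab["result"], "->", ", ".join(tab["targets"]))
```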
With reference to the above examples, if the ranking result is determined to be 郑恺 > 郑凯 > 正楷, the displayed result may be as shown in Fig. 7.
In a possible implementation, for a speech recognition result whose business type cannot be determined by semantic analysis or whose corresponding target file cannot be determined, the speech recognition result is not displayed on the terminal. For example, if the semantics of "正楷的视频" cannot be understood, or no calligraphy-related content for "正楷" can be found in the resource library, the speech recognition result is not displayed on the terminal.
With reference to the above examples, if the ranking result is determined to be 天气预报 > 天气预爆, the displayed results may be as shown in Fig. 8 and Fig. 9.
Further, if the user wants to switch the target speech recognition result, the switching of the speech recognition result can be carried out, which specifically comprises:
obtaining a switching instruction of the user for the target speech recognition result;
determining, according to the switching instruction, the target file corresponding to the changed target speech recognition result; and
displaying the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
Specifically, determining the target file corresponding to the changed target speech recognition result may refer to the above embodiments, and details are not repeated here.
In order to further improve the accuracy of speech recognition, in an embodiment of the invention the method further comprises:
obtaining an operating instruction of the user for the speech recognition result or the target file; and
increasing the matching degree of the speech recognition result or target file corresponding to the operating instruction, so as to update the user preference.
For example, if the user selects "天气预爆" on the display interface, "天气预爆" is recorded in the user preference of the user, and the matching degree of "天气预爆" is increased.
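A minimal sketch of this preference update, with hypothetical stored weights: each time the user selects a result, its matching degree is raised so that it ranks higher next time.

```python
user_preference = {"天气预报": 1.0, "天气预爆": 1.0}   # hypothetical stored matching degrees

def record_selection(result, boost=0.1):
    """Record the user's selection and raise that result's matching degree."""
    user_preference[result] = user_preference.get(result, 1.0) + boost

record_selection("天气预爆")
print(user_preference)   # 天气预爆 now outweighs 天气预报 for this user
```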
In order to further improve the accuracy of speech recognition, an embodiment of the invention also provides a possible implementation, comprising:
judging whether the speech information includes a first control instruction for controlling the terminal; and
if the user speech information is a first control instruction for controlling the terminal, executing the first control instruction on the terminal.
In a possible implementation, if the speech information also contains an action-type token, it indicates that the terminal must perform a corresponding operation according to the speech information; at this time an instruction for processing according to the speech information can be sent directly to the terminal. Examples of action-type tokens are open, watch and play.
In a possible implementation, it is judged whether the semantics of the speech information contain a target control instruction set for the terminal, and if so, the first control instruction is executed on the terminal.
For example, if the recognized speech recognition result is "打开郑恺的视频" (open the video of Zheng Kai), the first control instruction can be determined to be "open".
In a possible implementation, if it is determined that the target file of "郑恺的视频" is unique, the target file of "郑恺的视频" can be opened directly.
In a possible implementation, if it is determined that there are multiple target files for "郑恺的视频", the multiple target files can be displayed first, and the open control instruction is executed after the user's operating instruction is obtained.
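A minimal sketch of the control-instruction branch described above, with an invented action-word table: if an action token such as 打开 (open) is present and the target file is unique, the action is taken directly; with multiple target files, the candidates are shown first.

```python
ACTION_WORDS = {"打开": "open", "播放": "play", "观看": "watch"}  # hypothetical action tokens

def handle(recognised_text, target_files):
    """Decide how to act on the recognised text and its candidate target files."""
    action = next((a for a in ACTION_WORDS if a in recognised_text), None)
    if action is None:
        return "show search results"                         # no control instruction present
    if len(target_files) == 1:
        return f"{ACTION_WORDS[action]} {target_files[0]}"   # unique target: execute directly
    return "show candidate files, wait for the user's choice"

print(handle("打开郑恺的视频", ["郑恺的视频合集"]))
print(handle("打开郑恺的视频", ["合集一", "合集二"]))
```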
In the embodiment of the invention, the recognition result with the highest matching score obtained by recognizing the speech information with the voice matching model is displayed to the user, and at the same time semantic recognition is performed on each of the at least one speech recognition result that satisfies the first matching threshold. With the semantic processing results, the user's intention can be understood more comprehensively by interactively displaying the search results of the different services to the user on the UI. Compared with speech recognition methods in the prior art, through multiple semantic analysis requests, searching and displaying services for homophonous names is realized, and the user can select the desired result according to his or her intention.
Based on the same technical idea, an embodiment of the invention provides a speech recognition apparatus 1000, as shown in Fig. 10, comprising:
a transceiver unit 1001, configured to receive input speech information;
a processing unit 1002, configured to determine, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determine, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; and obtain a target file corresponding to the target speech recognition result; and
a display unit 1003, configured to display each speech recognition result and the target file corresponding to the target speech recognition result on a display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, the processing unit 1002 is specifically configured to: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result; and search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processing unit 1002 is specifically configured to: determine a priority of each speech recognition result; and display the speech recognition results on the display interface of the terminal arranged according to their priorities;
and the display unit 1003 is specifically configured to: display the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, the transceiver unit 1001 is further configured to: obtain a switching instruction of the user for the target speech recognition result;
the processing unit 1002 is further configured to: determine, according to the switching instruction, the target file corresponding to the changed target speech recognition result; and
the display unit 1003 is further configured to: display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode, while displaying the target file corresponding to the changed target speech recognition result.
In a possible implementation, the processing unit 1002 is specifically configured to:
input the speech information into the voice matching model, identify the pinyin sequence in the speech information, and form all possible candidate characters; determine, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
On the basis of the above embodiments, an embodiment of the invention also provides a server 1100, as shown in Fig. 11, comprising: a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, wherein the processor 1101, the communication interface 1102 and the memory 1103 communicate with each other through the communication bus 1104;
a computer program is stored in the memory 1103, and when the program is executed by the processor 1101, the processor 1101 is caused to perform the following steps:
determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; obtaining a target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface of the terminal, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, the processor 1101 is specifically configured to: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result; and search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processor 1101 is specifically configured to:
perform word segmentation on the target speech recognition result according to the preset dictionary, perform semantic recognition on each token in the target speech recognition result, and determine the business type corresponding to each token; and determine the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, the processor 1101 is specifically configured to: determine a priority of each speech recognition result; and display the speech recognition results on the display interface of the terminal arranged according to their priorities.
In a possible implementation, the processor 1101 is further configured to: determine, according to a switching instruction, the target file corresponding to the changed target speech recognition result; display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode; and at the same time display the target file corresponding to the changed target speech recognition result. The switching instruction is the user's switching instruction for the target speech recognition result obtained through the communication interface 1102.
In a possible implementation, the processor 1101 is specifically configured to:
input the speech information into the voice matching model, identify the pinyin sequence in the speech information, and form all possible candidate characters; determine, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
The communication bus mentioned for the above server may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 1102 is used for communication between the above server and other devices.
The memory may include a random access memory (RAM) and may also include a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
On the basis of the above embodiments, an embodiment of the invention also provides a computer-readable storage medium storing a computer program executable by a server; when the program runs on the server, the server is caused to implement any of the methods in the above embodiments.
The above computer-readable storage medium may be any usable medium or data storage device that the processor in the server can access, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes and magneto-optical disks (MO), optical memories such as CD, DVD, BD and HVD, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH) and solid-state drives (SSD).
On the basis of the above embodiments, an embodiment of the invention also provides a terminal 1200, as shown in Fig. 12, comprising: a processor 1201, a communication interface 1202, a memory 1203 and a communication bus 1204, wherein the processor 1201, the communication interface 1202 and the memory 1203 communicate with each other through the communication bus 1204;
a computer program is stored in the memory 1203, and when the program is executed by the processor 1201, the processor 1201 is caused to perform the following steps:
determining, according to a pre-trained voice matching model, at least one speech recognition result of the speech information that satisfies a first matching threshold; determining, among the at least one speech recognition result, the speech recognition result with the highest matching degree as a target speech recognition result; obtaining a target file corresponding to the target speech recognition result; and displaying each speech recognition result and the target file corresponding to the target speech recognition result on the display interface, wherein the target speech recognition result is displayed in a first display mode and the other speech recognition results are displayed in a second display mode.
In a possible implementation, the processor 1201 is specifically configured to: perform semantic recognition on the target speech recognition result to determine the business type corresponding to the target speech recognition result; and search the resource library, within the business type corresponding to the target speech recognition result, for the target file corresponding to the target speech recognition result.
In a possible implementation, the processor 1201 is specifically configured to:
perform word segmentation on the target speech recognition result according to the preset dictionary, perform semantic recognition on each token in the target speech recognition result, and determine the business type corresponding to each token; and determine the business type corresponding to the target speech recognition result according to the weights of the business types corresponding to the tokens.
In a possible implementation, the processor 1201 is specifically configured to: determine a priority of each speech recognition result; display the speech recognition results on the display interface of the terminal arranged according to their priorities; and display the target file corresponding to the target speech recognition result on the display interface of the terminal.
In a possible implementation, the processor 1201 is further configured to: determine, according to a switching instruction, the target file corresponding to the changed target speech recognition result; display the changed target speech recognition result in the first display mode and the other speech recognition results in the second display mode; and at the same time display the target file corresponding to the changed target speech recognition result. The switching instruction is the user's switching instruction for the target speech recognition result obtained through the communication interface 1202.
In a possible implementation, the processor 1201 is specifically configured to:
input the speech information into the voice matching model, identify the pinyin sequence in the speech information, and form all possible candidate characters; determine, for each possible candidate character, the possible Chinese character sequences and the scores of the Chinese character sequences by syntax rules and statistical methods; and take the Chinese character sequences whose scores satisfy the first matching threshold as the speech recognition results.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 1202 is used for communication between the above terminal and other devices.
The memory may include a random access memory (RAM) and may also include a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
On the basis of the above embodiments, an embodiment of the invention also provides a computer-readable storage medium storing a computer program executable by a terminal; when the program runs on the terminal, the terminal is caused to implement any of the methods in the above embodiments.
The above computer-readable storage medium may be any usable medium or data storage device that the processor in the terminal can access, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes and magneto-optical disks (MO), optical memories such as CD, DVD, BD and HVD, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH) and solid-state drives (SSD).
For the system/apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and reference may be made to the description of the method embodiments for the relevant parts.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memories, CD-ROMs, optical memories and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction apparatus which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, once a person skilled in the art knows the basic inventive concept, additional changes and modifications may be made to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.