Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.
To solve the problem that keyword-based speech recognition has a low success rate, the embodiments of the present invention provide a natural speech recognition method and device.
As shown in Fig. 1, the natural speech recognition method provided by the embodiment of the present invention comprises:
Step 101: obtain the pinyin corresponding to the voice information input by a user.
To widen the applicability of the natural speech recognition method provided by the embodiment of the present invention, so that the voice information of users from different regions and with different accents can be recognized, in this embodiment step 101 may use speaker-independent speech recognition technology to recognize and parse the voice information input by the user, thereby obtaining the pinyin corresponding to the voice information.
Step 102: perform word segmentation on the pinyin obtained in step 101 by using a preset dictionary, to obtain the segmented word pinyin strings.
Here, the dictionary is used to store the target words for speech recognition and the pinyin corresponding to each target word.
In this embodiment, the target words stored in the dictionary may be words of a broad scope. Specifically, the target words may be taken from information encountered in daily life and work to form the dictionary; for example, words may be extracted from each day's news reports. The target words stored in the dictionary may also be words of a narrow scope. Specifically, the target words may be taken from the information stored in a target information database to form the dictionary, where the target information database is used to store the information to be recognized; for example, if the natural speech recognition method provided by the embodiment of the present invention is applied to vehicle navigation, the target information database stores geographic location information and/or destination name information, and the like. It should be noted that, whether the words are of the broad scope or the narrow scope, every target word in the dictionary is unique and no target word is repeated.
Because speech recognition technology is generally used in specific fields, such as navigation, song requests or contact lookup, the embodiment of the present invention preferably sets the target words in the dictionary to narrow-scope words derived from the target information database, so as to reduce redundancy among the target words in the dictionary, save storage space and increase the speed of speech recognition. The invention is not limited to this arrangement, however. As is well known to those skilled in the art, for each industry in which the recognition technology is applied, practitioners in that industry can set up the target information database appropriately according to the characteristics of the industry.
In this embodiment, step 102 may specifically search the dictionary according to the pinyin obtained in step 101, matching the pinyin, in order of appearance, against the pinyin of the target words contained in the dictionary. When a word pinyin string that fully matches the pinyin of a target word is found, that word pinyin string is split off from the pinyin, and the search is repeated on the remainder until the end of the pinyin is reached, thereby segmenting the pinyin.
Step 103: look up in the dictionary the words to be recognized that correspond to the word pinyin strings obtained in step 102.
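Steps 102 and 103 can be illustrated with a minimal sketch. The dictionary contents, the function names and the longest-match-first strategy are assumptions made for illustration; the description above only requires that dictionary pinyin be matched in order of appearance and split off when fully matched.

```python
# Hypothetical dictionary: word pinyin string -> target words sharing that pinyin.
DICTIONARY = {
    "beijing": ["Beijing"],
    "shijingshanqu": ["Shijingshan District"],
    "bajiao": ["Bajiao"],
    "donglu": ["East Road"],
    "xiaofeiyang": ["Little Fat Sheep", "Little Boiling Sheep"],
    "huoguodian": ["hot pot restaurant"],
}


def segment_pinyin(pinyin, dictionary):
    """Step 102: split a continuous pinyin string into word pinyin strings."""
    max_len = max(len(key) for key in dictionary)
    segments = []
    pos = 0
    while pos < len(pinyin):
        # Try the longest candidate first (an assumption of this sketch).
        for end in range(min(len(pinyin), pos + max_len), pos, -1):
            if pinyin[pos:end] in dictionary:
                segments.append(pinyin[pos:end])
                pos = end
                break
        else:
            # No dictionary entry starts here; skip one letter (also an assumption).
            pos += 1
    return segments


def lookup_words(segments, dictionary):
    """Step 103: map each word pinyin string to its candidate words."""
    return [dictionary[segment] for segment in segments]


segments = segment_pinyin(
    "beijingshijingshanqubajiaodongluxiaofeiyanghuoguodian", DICTIONARY)
words = lookup_words(segments, DICTIONARY)
```

Because one pinyin string may correspond to several homophonous target words, the lookup returns a list of candidates for each segment rather than a single word.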
Step 104: search the target information database according to the words to be recognized, and obtain from the target information database the target information that best matches the words to be recognized.
In this embodiment, step 104 may obtain the target information that best matches the words to be recognized from the target information database by either of two methods, which are described in turn below.
1. Weight coefficient judgment method
In this embodiment, the dictionary may also be used to store the weight level n corresponding to each target word and the weight level range N, where n and N are integers, N ≥ 2 and n ∈ [1, N], and a target word of level n is more important in the text information than a target word of level n+1. The relationship between importance and the weight level n may of course be reversed, and those skilled in the art can define it as required; this embodiment uses the former convention as its example. In that case, before step 104 the method further comprises a step of obtaining, from the dictionary, the weight level corresponding to each word to be recognized.
Specifically, the weight level range N of the words in the dictionary and the weight level n of each word may be set in advance. For example, the weight level range of the target words in the dictionary may be set to 3, where weight level 1 is the highest and weight level 3 is the lowest, and the weight level of each target word is then set according to how exclusive and how well known the target word is. For instance, when the target word is a place name, its weight level is set to 3; when the target word is a proper noun that does not denote a geographic location (such as Little Fat Sheep), its weight level is set to 1. Those skilled in the art may of course grade the target words by other rules, which are not enumerated here. After step 102 has divided the text information into words, the weight level attribute of each word is obtained from the dictionary.
In this case, as shown in Fig. 2, step 104 may comprise:
Step 1041: search the target information database according to the words to be recognized, and obtain from the target information database the information set formed by the pieces of information that match any one or more of the words to be recognized.
Step 1042: according to the weight levels corresponding to the words to be recognized, process each piece of information in the information set obtained in step 1041 to obtain the weight coefficient of each piece of information.
In this embodiment, step 1042 may use a weighted average algorithm to obtain the weight coefficient of each piece of information; other algorithms may of course be used instead, and they are not enumerated here.
Step 1043: select, from the information set obtained in step 1041, the piece of information with the highest weight coefficient as the target information.
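Steps 1041 to 1043 can be sketched as follows. The description does not fix the weighted average formula, so this sketch assumes one: level n is converted to the numeric weight N - n + 1, and the weight coefficient of an entry is the weighted average of 0/1 match indicators over the words to be recognized. The database entries, the level assigned to "hot pot restaurant" and all function names are hypothetical.

```python
def level_weight(level, level_range):
    """Convert weight level n (1 = most important) to a numeric weight.

    The linear mapping N - n + 1 is an assumption of this sketch.
    """
    return level_range - level + 1


def weight_coefficient(entry, words, levels, level_range):
    """Step 1042: weighted average of per-word match indicators (1 or 0)."""
    weights = [level_weight(levels[w], level_range) for w in words]
    hits = [1 if w in entry else 0 for w in words]
    return sum(wt * hit for wt, hit in zip(weights, hits)) / sum(weights)


def select_target(database, words, levels, level_range):
    # Step 1041: keep every entry that matches at least one word.
    candidates = [e for e in database if any(w in e for w in words)]
    if not candidates:
        return None
    # Steps 1042 and 1043: score each candidate and keep the highest.
    return max(candidates,
               key=lambda e: weight_coefficient(e, words, levels, level_range))


WORDS = ["Beijing", "Little Fat Sheep", "hot pot restaurant"]
LEVELS = {"Beijing": 3, "Little Fat Sheep": 1, "hot pot restaurant": 2}
DATABASE = [
    "Beijing Donglaishun hot pot restaurant",
    "Beijing Little Fat Sheep supermarket",
    "Shijingshan District Little Fat Sheep hot pot restaurant",
]

best = select_target(DATABASE, WORDS, LEVELS, level_range=3)
```

Under these assumed weights, an entry that matches the level-1 proper noun outranks one that matches only the place name, which reflects the intent of giving proper nouns the highest level.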
It should be noted that, to ensure the accuracy of the target information obtained in step 104 and to improve the quality of speech recognition, in this embodiment the words to be recognized obtained in step 103 should include at least one word whose weight level is 1. If none of the words to be recognized has weight level 1, the method further comprises, before step 104: performing word segmentation again on the pinyin obtained in step 101, so as to obtain at least one word whose weight level is 1. Step 104 is then replaced by: searching the target information database according to the words to be recognized obtained from the repeated segmentation, and obtaining from the target information database the target information that best matches the words to be recognized.
Further, the natural speech recognition method provided by the embodiment of the present invention may also comprise: adding to the dictionary the at least one word of the highest weight level obtained from the repeated segmentation, together with the pinyin string corresponding to that word.
It should be noted that the ordering of weight levels used in the embodiment of the present invention is only an example; in practice the ordering may be defined by other rules. For example, when the weight level range is 3, weight level 3 may be set as the highest and weight level 1 as the lowest. Such variations are readily conceivable by those skilled in the art without creative effort, and are not enumerated here.
2. Nested search method
As shown in Fig. 3, step 104 may comprise:
Step 1044: sort the words to be recognized obtained in step 103.
In this embodiment, step 1044 may sort the words in the order in which they appear in the text information. Preferably, to increase search speed, step 1044 may first obtain the keywords among the words contained in the text information, and then sort the words contained in the text information in the order of keywords, rear auxiliary words and front auxiliary words.
Here, a keyword is a word with a proper, specific reference; a rear auxiliary word is a word located after the keyword in the text information; and a front auxiliary word is a word located before the keyword in the text information.
In this embodiment, a keyword table may be set in advance according to the information stored in the target information database. After the words to be recognized have been obtained, the technical solution provided by the embodiment of the present invention looks up each of the words to be recognized in the keyword table, and the words that match keywords stored in the keyword table are taken as the keywords contained in the text information.
It should be noted that if the lookup shows that the words to be recognized contain no keyword, step 1044 sorts the words in the order in which they appear. If the lookup shows that the words to be recognized contain two or more keywords, the rear auxiliary words are the non-keywords that follow the first keyword among the words to be recognized, and step 1044 still sorts in the order of keywords, rear auxiliary words and front auxiliary words.
It should be noted that if, in step 103, two or more words to be recognized are found in the dictionary for the same word pinyin string, step 1044 sorts those words as a single set.
By sorting the words contained in the text information in the order of keywords, rear auxiliary words and front auxiliary words, the embodiment of the present invention ensures that, when the subsequent steps search and match in that order, the key information comes first. This can significantly shorten the time needed to search for and match the words, and increases the speed of speech recognition.
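The ordering in step 1044 can be sketched as follows. Each item is a tuple of candidate words that share one word pinyin string, so that homophones are sorted as a single set. The function name, the keyword table contents and the choice to place several keywords in their order of appearance are assumptions of this sketch.

```python
def order_words(groups, keyword_table):
    """Step 1044: order as keywords, rear auxiliary words, front auxiliary words.

    `groups` is a list of tuples; each tuple holds the candidate words found
    for one word pinyin string and is treated as a single unit.
    """
    def is_keyword(group):
        return any(word in keyword_table for word in group)

    keyword_positions = [i for i, g in enumerate(groups) if is_keyword(g)]
    if not keyword_positions:
        # No keyword: keep the order of appearance.
        return list(groups)

    first = keyword_positions[0]
    keywords = [groups[i] for i in keyword_positions]
    # Rear auxiliary words: non-keywords after the first keyword.
    rear = [g for g in groups[first + 1:] if not is_keyword(g)]
    # Front auxiliary words: non-keywords before the first keyword.
    front = [g for g in groups[:first] if not is_keyword(g)]
    return keywords + rear + front


KEYWORD_TABLE = {"Little Fat Sheep", "Little Boiling Sheep"}  # hypothetical
GROUPS = [
    ("Beijing",),
    ("Shijingshan District",),
    ("Bajiao",),
    ("East Road",),
    ("Little Fat Sheep", "Little Boiling Sheep"),
    ("hot pot restaurant",),
]

ordered = order_words(GROUPS, KEYWORD_TABLE)
```

Putting the keyword first means the first lookup against the target information database is the most selective one, so each later lookup runs over a smaller candidate set.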
Step 1045: according to the sorting result of step 1044, take the first word from the words to be recognized, and obtain from the target information database the information that matches the first word.
Step 1046: take the second word from the words to be recognized, and obtain the information that matches the second word from the information set formed by the information matching the first word.
And so on, until step 1047: take the last word from the words to be recognized, and obtain the target information that matches the last word from the information set formed by the information matching the immediately preceding word.
It should be noted that, in steps 1045 to 1047 above, if no information matching the current word is found, the information matching the current word may be set to the information matching the immediately preceding word; if the current word is the first word, the information matching the first word is all the information contained in the target information database.
To give those skilled in the art a deeper understanding of the nested search method described above, a specific example of its implementation is given below.
For example, when the voice information input by the user is "Beijing Shijingshan District Bajiao East Road Little Fat Sheep hot pot restaurant", the pinyin corresponding to the voice information is obtained: beijingshijingshanqubajiaodongluxiaofeiyanghuoguodian. The pinyin is segmented according to the dictionary to obtain the word pinyin strings: beijing, shijingshanqu, bajiao, donglu, xiaofeiyang, huoguodian. The dictionary is searched according to the word pinyin strings to obtain the words to be recognized: Beijing, Shijingshan District, Bajiao, East Road, (Little Fat Sheep, Little Boiling Sheep), hot pot restaurant. If the words to be recognized corresponding to xiaofeiyang (Little Fat Sheep and Little Boiling Sheep) are keywords, sorting in the order of keywords, rear auxiliary words and front auxiliary words gives: (Little Fat Sheep, Little Boiling Sheep), hot pot restaurant, Beijing, Shijingshan District, Bajiao, East Road.
Suppose the target information database contains information such as: Beijing Little Fat Sheep supermarket; Beijing Little Boiling Sheep hot pot restaurant; Shanghai Little Boiling Sheep catering company; Beijing Shijingshan District Little Boiling Sheep barbecue shop; Shijingshan District Gucheng Road Little Fat Sheep hot pot restaurant; Beijing Donglaishun hot pot restaurant; Beijing Bajiao North Road Donglaishun hot pot restaurant; and Beijing Bajiao Little Fat Sheep hot pot restaurant. The nested search method then proceeds as follows.
First, the information matching the keyword set formed by "Little Fat Sheep" and "Little Boiling Sheep" is obtained from the target information database to form a first information base, which comprises: Beijing Little Fat Sheep supermarket; Beijing Little Boiling Sheep hot pot restaurant; Shanghai Little Boiling Sheep catering company; Beijing Shijingshan District Little Boiling Sheep barbecue shop; Shijingshan District Gucheng Road Little Fat Sheep hot pot restaurant; and Beijing Bajiao Little Fat Sheep hot pot restaurant.
Second, the information matching "hot pot restaurant" is obtained from the first information base to form a second information base, which comprises: Beijing Little Boiling Sheep hot pot restaurant; Shijingshan District Gucheng Road Little Fat Sheep hot pot restaurant; and Beijing Bajiao Little Fat Sheep hot pot restaurant.
Third, the information matching "Beijing" is obtained from the second information base to form a third information base, which comprises: Beijing Little Boiling Sheep hot pot restaurant; and Beijing Bajiao Little Fat Sheep hot pot restaurant.
Fourth, the information matching "Bajiao" is obtained from the third information base to form a fourth information base, which comprises: Beijing Bajiao Little Fat Sheep hot pot restaurant.
Fifth, the target information matching "East Road" is sought in the fourth information base. Because the fourth information base contains no information matching "East Road", the target information is the information contained in the fourth information base, namely Beijing Bajiao Little Fat Sheep hot pot restaurant.
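The worked example can be reproduced with a short sketch of steps 1045 to 1047. The function name and the use of plain substring matching are assumptions of this sketch, and the entry names are English renderings of the example's database entries.

```python
def nested_search(database, ordered_groups):
    """Narrow the candidate set one word group at a time (steps 1045-1047).

    If a group matches nothing in the current set, the current set is kept,
    which is the fallback rule described for steps 1045 to 1047.
    """
    current = list(database)
    for group in ordered_groups:
        matched = [entry for entry in current
                   if any(word in entry for word in group)]
        if matched:
            current = matched
    return current


DATABASE = [
    "Beijing Little Fat Sheep supermarket",
    "Beijing Little Boiling Sheep hot pot restaurant",
    "Shanghai Little Boiling Sheep catering company",
    "Beijing Shijingshan District Little Boiling Sheep barbecue shop",
    "Shijingshan District Gucheng Road Little Fat Sheep hot pot restaurant",
    "Beijing Donglaishun hot pot restaurant",
    "Beijing Bajiao North Road Donglaishun hot pot restaurant",
    "Beijing Bajiao Little Fat Sheep hot pot restaurant",
]

# Words to be recognized, already sorted as keyword, rear, front auxiliary words.
ORDERED_GROUPS = [
    ("Little Fat Sheep", "Little Boiling Sheep"),
    ("hot pot restaurant",),
    ("Beijing",),
    ("Shijingshan District",),
    ("Bajiao",),
    ("East Road",),
]

result = nested_search(DATABASE, ORDERED_GROUPS)
```

In this sketch "Shijingshan District" matches neither of the two entries left after the "Beijing" step, so the fallback rule carries that set forward unchanged; the same happens for "East Road" at the end, leaving the single entry for Bajiao.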
The weight coefficient judgment method and the nested search method described above can accurately find the target information that best matches the words contained in the text information, thereby recognizing the voice information input by the user. In practice, other methods may of course be used to obtain the target information that best matches the words contained in the text information; they are not enumerated here.
Further, if two or more pieces of target information are selected in step 104, then to improve the accuracy of speech recognition, as shown in Fig. 4, the method may further comprise, after step 104:
Step 105: obtain a target information selection indication sent by the user, or the user's speech recognition statistics.
Specifically, after step 104 the embodiment of the present invention may display the two or more selected pieces of target information to the user, and step 105 receives the target information selection indication that the user sends by voice, key press, text input or the like.
Alternatively, the natural speech recognition method provided by the embodiment of the present invention may keep statistics on each speech recognition performed by users. The statistics may be kept for a specific individual user or for a specific group of users. Further, the speech recognition statistics may be the number of times, or the frequency with which, a user has performed speech recognition for one or more pieces of target information, or may be statistics on the target information most recently recognized for a plurality of users; they may of course also be other statistics related to speech recognition, which are not enumerated here.
Step 106: select the preferred target information from the two or more pieces of target information according to the target information selection indication or the speech recognition statistics.
For example, suppose the speech recognition statistics are counts of the number of times a user has performed speech recognition for each of several pieces of target information. If the pinyin corresponding to the voice information input by the user is xiaofeiyanghuoguodian and step 104 obtains four pieces of target information, namely "Haidian District Little Fat Sheep hot pot restaurant", "Haidian District Zhongguancun Little Fat Sheep hot pot restaurant", "Shijingshan Bajiao East Road Little Fat Sheep hot pot restaurant" and "Xizhimen Jiamao Little Boiling Sheep hot pot restaurant", step 105 may obtain the speech recognition statistics for these four pieces of target information. For instance, "Haidian District Little Fat Sheep hot pot restaurant" has been recognized 3 times, "Haidian District Zhongguancun Little Fat Sheep hot pot restaurant" 5 times, "Shijingshan Bajiao East Road Little Fat Sheep hot pot restaurant" 40 times, and "Xizhimen Jiamao Little Boiling Sheep hot pot restaurant" once. Step 106 may then, according to the statistics, select "Shijingshan Bajiao East Road Little Fat Sheep hot pot restaurant" from the four pieces of target information as the preferred target information.
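The statistics-based choice in step 106 can be sketched as follows, using the counts from the example above. The function name and the dictionary layout of the statistics are assumptions of this sketch; candidates without any recorded count are treated as having a count of zero.

```python
def choose_preferred(candidates, recognition_counts):
    """Step 106: pick the candidate recognized most often in the past."""
    return max(candidates, key=lambda c: recognition_counts.get(c, 0))


# Hypothetical per-user statistics: target information -> recognition count.
RECOGNITION_COUNTS = {
    "Haidian District Little Fat Sheep hot pot restaurant": 3,
    "Haidian District Zhongguancun Little Fat Sheep hot pot restaurant": 5,
    "Shijingshan Bajiao East Road Little Fat Sheep hot pot restaurant": 40,
    "Xizhimen Jiamao Little Boiling Sheep hot pot restaurant": 1,
}

preferred = choose_preferred(list(RECOGNITION_COUNTS), RECOGNITION_COUNTS)
```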
Alternatively, in order further to shorten the time of speech recognition, improve speech recognition speed, in the present embodiment, before thestep 104, can also comprise according to word to be identified and search spoken dictionary, according to lookup result, the step of deletion spoken word from word to be identified, wherein, spoken dictionary is used for the storage spoken word, does not comprise the Word message that has substantive implication in the voice messaging that relates to user's input in this spoken word.
In the present embodiment, can adopt the method for statistics to set in advance spoken dictionary, can comprise people's spoken word used in everyday in this spoken language dictionary, for example: " I think ", " I want ", " may I ask ", " being ", " right ", " can " and " how " etc., the spoken word that comprises in the spoken word storehouse is not given unnecessary details one by one herein.
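The spoken word filtering step can be sketched as follows. The spoken word dictionary below holds only the examples listed above, and the function name and the sample input are hypothetical.

```python
# Spoken word dictionary holding the examples from the description.
SPOKEN_WORDS = {"I think", "I want", "may I ask", "is", "right", "can", "how"}


def remove_spoken_words(words, spoken_dictionary):
    """Drop words found in the spoken word dictionary, keeping the order."""
    return [word for word in words if word not in spoken_dictionary]


filtered = remove_spoken_words(
    ["I want", "Beijing", "Little Fat Sheep", "hot pot restaurant"],
    SPOKEN_WORDS,
)
```

Removing these words before step 104 means fewer lookups against the target information database, which is where the time saving comes from.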
Further, so that the natural speech recognition method provided by the embodiment of the present invention can be applied to users whose pronunciation is unclear or nonstandard, and to improve the success rate and accuracy of speech recognition, on the basis of the technical solutions shown in Figs. 1 to 4 above, the method may further comprise a step of performing fuzzy pinyin matching on the pinyin obtained in step 101 to obtain fuzzy-matched pinyin. Step 102 then specifically becomes: perform word segmentation on the fuzzy-matched pinyin by using the preset dictionary, to obtain the segmented word pinyin strings.
Specifically, a pinyin fuzzy matching table may be set in advance, in which matching rules are defined, for example: z=zh, c=ch, s=sh, l=n, f=h, r=l, an=ang, en=eng, in=ing, ian=iang, uan=uang, iong=ing, and so on (not enumerated here). Fuzzy pinyin matching is performed on the pinyin obtained in step 101 according to these rules.
Fuzzy matching of the pinyin solves problems such as recognition failures or recognition errors caused by a user's unclear speech or inaccurate pronunciation, and thereby improves the recognition success rate and accuracy of the natural speech recognition method provided by the embodiment of the present invention.
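One way to apply the matching rules is to reduce every syllable to a canonical form and compare the canonical forms. This is a minimal sketch under several assumptions: the pinyin is already split into syllables, the rules are treated as transitive (so n, l and r fall into one class, as do in, ing and iong), and all names are hypothetical.

```python
# Equivalence classes derived from the rules z=zh, c=ch, s=sh, l=n, f=h, r=l.
INITIAL_CLASSES = {"zh": "z", "ch": "c", "sh": "s", "n": "l", "r": "l", "h": "f"}
# Equivalence classes derived from an=ang, en=eng, in=ing, ian=iang,
# uan=uang, iong=ing.
FINAL_CLASSES = {"ang": "an", "eng": "en", "ing": "in", "iong": "in",
                 "iang": "ian", "uang": "uan"}
# Two-letter initials come first so that "zh" is not read as "z" + "h...".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g",
            "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split one pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # syllable with no initial, such as "an"


def canonical(syllable):
    """Map a syllable to the representative of its fuzzy class."""
    initial, final = split_syllable(syllable)
    return INITIAL_CLASSES.get(initial, initial) + FINAL_CLASSES.get(final, final)


def fuzzy_equal(syllables_a, syllables_b):
    """True if two syllable sequences are equal under the fuzzy rules."""
    return ([canonical(s) for s in syllables_a]
            == [canonical(s) for s in syllables_b])
```

With this approach, the dictionary pinyin can be canonicalized once in advance, so that the segmentation in step 102 compares canonical forms on both sides.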
The natural speech recognition method provided by the embodiment of the present invention performs information matching according to the words to be recognized that correspond to the word pinyin strings, and takes the information in the target information database that best matches the words to be recognized as the result of recognizing the voice information. Target information can thus be obtained without an exact match of the voice information, which improves the success rate of speech recognition and solves the problem in the prior art that recognition by exact matching of the voice information fails when the form of expression differs, resulting in a low success rate. Because the technical solution provided by the embodiment of the present invention performs speech recognition by word matching, it only needs to store the target words in the dictionary and the standard information in the target information database; it does not need to store a large number of differently worded texts for the same thing. The dictionary and the target information database are therefore smaller and easier to search, which increases the speed of speech recognition and solves the problem in the prior art that a large number of differently expressed texts for the same thing must be stored in the vocabulary, making the vocabulary large, hard to search and slow for speech recognition. The technical solution provided by the embodiment of the present invention differs from English speech recognition technology: it addresses the characteristics of Chinese, namely its large number of characters and the fact that the words in a sentence run together without pauses, by segmenting the words of a sentence according to pinyin and performing speech recognition by searching with the segmented words to be recognized, which gives a higher success rate and recognition speed for Chinese speech recognition.
As shown in Fig. 5, an embodiment of the present invention further provides a natural speech recognition device, comprising:
a first acquiring unit 501, configured to obtain the pinyin corresponding to the voice information input by a user;
a word segmentation processing unit 502, configured to perform word segmentation on the pinyin obtained by the first acquiring unit 501 by using a preset dictionary, to obtain the segmented word pinyin strings;
a second acquiring unit 503, configured to look up in the dictionary the words to be recognized that correspond to the word pinyin strings obtained by the word segmentation processing unit 502; and
a searching unit 504, configured to search the target information database according to the words to be recognized obtained by the second acquiring unit 503, and obtain from the target information database the target information that best matches the words to be recognized;
wherein the dictionary is used to store the target words for speech recognition and the pinyin corresponding to each target word.
Further, as shown in Fig. 6, the natural speech recognition device provided by the embodiment of the present invention may further comprise:
a third acquiring unit 505, configured to, if the dictionary is also used to store the weight level n corresponding to each target word and the weight level range N, obtain from the dictionary the weight levels corresponding to the words to be recognized obtained by the second acquiring unit 503, where n and N are integers, N ≥ 2 and n ∈ [1, N], and a target word of level n is more important in the text information than a target word of level n+1. The relationship between importance and the weight level n may of course be reversed, and those skilled in the art can define it as required; this embodiment uses the former convention as its example.
In that case, the searching unit 504 may comprise:
a searching subunit 5041, configured to search the target information database according to the words to be recognized obtained by the second acquiring unit 503, and obtain from the target information database the information set formed by the pieces of information that match any one or more of the words to be recognized;
a first obtaining subunit 5042, configured to process, according to the weight levels corresponding to the words to be recognized obtained by the third acquiring unit 505, each piece of information in the information set obtained by the searching subunit 5041, to obtain the weight coefficient of each piece of information; and
a second obtaining subunit 5043, configured to select, from the information set obtained by the searching subunit 5041, the piece of information with the highest weight coefficient obtained by the first obtaining subunit 5042 as the target information.
Further, as shown in Fig. 7, the natural speech recognition device provided by the embodiment of the present invention may further comprise:
a re-segmentation unit 506, configured to, if none of the words to be recognized obtained by the second acquiring unit 503 has weight level 1, perform word segmentation again on the pinyin obtained by the first acquiring unit 501, so as to obtain at least one word whose weight level is 1;
the searching unit 504 may further be configured to search the target information database according to the words to be recognized obtained from the repeated segmentation by the re-segmentation unit 506, and obtain from the target information database the target information that best matches the words to be recognized.
Further, as shown in Fig. 8, the natural speech recognition device provided by the embodiment of the present invention may further comprise:
an updating unit 507, configured to add to the dictionary the at least one word of weight level 1 obtained by the re-segmentation unit 506, together with the pinyin string corresponding to that word.
Further, as shown in Fig. 9, the searching unit 504 may further comprise:
a sorting subunit 5044, configured to sort the words to be recognized;
a third obtaining subunit 5045, configured to take, according to the sorting result of the sorting subunit 5044, the first word from the words to be recognized, and obtain from the target information database the information that matches the first word;
a fourth obtaining subunit 5046, configured to take the second word from the words to be recognized, and obtain the information that matches the second word from the information set formed by the information matching the first word;
and so on, up to a fifth obtaining subunit 5047, configured to take the last word from the words to be recognized, and obtain the target information that matches the last word from the information set formed by the information matching the immediately preceding word.
Further, as shown in Fig. 10, the natural speech recognition device provided by the embodiment of the present invention may further comprise:
a deleting unit 508, configured to look up the words to be recognized obtained by the second acquiring unit 503 in a spoken word dictionary and, according to the lookup result, delete spoken words from the words to be recognized, where the spoken word dictionary is used to store spoken words, that is, words that carry no substantive meaning in the voice information input by the user.
Further, as shown in Fig. 11, the natural speech recognition device provided by the embodiment of the present invention may further comprise:
a fourth acquiring unit 509, configured to, if the searching unit 504 finds two or more pieces of target information, obtain a target information selection indication sent by the user, or the user's speech recognition statistics; and
a selecting unit 5010, configured to select the preferred target information from the two or more pieces of target information found by the searching unit 504, according to the target information selection indication or the speech recognition statistics obtained by the fourth acquiring unit 509.
Further, as shown in Fig. 12, the natural speech recognition device provided by the embodiment of the present invention may further comprise:
a fuzzy processing unit 5011, configured to perform fuzzy pinyin matching on the pinyin obtained by the first acquiring unit 501, to obtain fuzzy-matched pinyin;
the word segmentation processing unit 502 may further be configured to perform word segmentation, by using the preset dictionary, on the fuzzy-matched pinyin obtained by the fuzzy processing unit 5011, to obtain the segmented word pinyin strings.
For the specific implementation of the natural speech recognition device provided by the embodiment of the present invention, refer to the description of the natural speech recognition method provided by the embodiment of the present invention; details are not repeated here.
The natural speech recognition device provided by the embodiment of the present invention performs information matching according to the words to be recognized that correspond to the word pinyin strings, and takes the information in the target information database that best matches the words to be recognized as the result of recognizing the voice information. Target information can thus be obtained without an exact match of the voice information, which improves the success rate of speech recognition and solves the problem in the prior art that recognition by exact matching of the voice information fails when the form of expression differs, resulting in a low success rate. Because the technical solution provided by the embodiment of the present invention performs speech recognition by word matching, it only needs to store the target words in the dictionary and the standard information in the target information database; it does not need to store a large number of differently worded texts for the same thing. The dictionary and the target information database are therefore smaller and easier to search, which increases the speed of speech recognition and solves the problem in the prior art that a large number of differently expressed texts for the same thing must be stored in the vocabulary, making the vocabulary large, hard to search and slow for speech recognition. The technical solution provided by the embodiment of the present invention differs from English speech recognition technology: it addresses the characteristics of Chinese, namely its large number of characters and the fact that the words in a sentence run together without pauses, by segmenting the words of a sentence according to pinyin and performing speech recognition by searching with the segmented words to be recognized, which gives a higher success rate and recognition speed for Chinese speech recognition.
The natural speech recognition method and device provided by the embodiments of the present invention can be applied in information service systems such as navigation, song requests and contact lookup.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.