CN102867512A - Method and device for recognizing natural speech - Google Patents

Method and device for recognizing natural speech

Info

Publication number
CN102867512A
Authority
CN
China
Prior art keywords
word
identified
information
target
target information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011101847596A
Other languages
Chinese (zh)
Inventor
余喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN2011101847596A
Publication of CN102867512A
Legal status: Pending


Abstract

The invention discloses a natural speech recognition method and device, relating to speech recognition technology, which address the low success rate of keyword-based speech recognition. The method comprises: obtaining the pinyin corresponding to voice information input by a user; performing word segmentation on the pinyin using a preset dictionary to obtain segmented word pinyin strings; looking up, in the dictionary, the words to be recognized that correspond to the word pinyin strings; and searching a target information database according to the words to be recognized to obtain the target information with the highest degree of match to those words, wherein the dictionary stores the target words on which speech recognition is performed and the pinyin corresponding to each target word. The technical solution provided by the embodiments of the invention can be applied to information service systems for navigation, song requests, contact lookup and the like.

Description

Natural speech recognition method and device
Technical field
The present invention relates to speech recognition technology, and in particular to a natural speech recognition method and device.
Background art
In the field of speech recognition, the technology differs from language to language. In English, for example, every word in a sentence to be recognized is composed of letters from the 26-letter alphabet, so a speech recognition system only needs to recognize the letters in the sentence to obtain the text corresponding to the voice information.
The biggest difference between Chinese and English is the number of Chinese characters. The total number of Chinese characters now exceeds 80,000, of which about 3,500 are in common use. Faced with such a large character set, traditional speech recognition technology is keyword-based: the speech recognition system must match the voice content sent by the user, word by word from beginning to end, against the text content pre-stored in a vocabulary. Only when the voice content exactly matches some text entry stored in the vocabulary can the system determine the meaning of the voice content and recognize it successfully; otherwise, recognition fails.
In real life, however, language takes many forms, and different people, or the same person at different times, describe the same thing differently. For example, the concept "mother" can be expressed by many different words (mother, old mother, mommy, and so on). To improve the success rate and accuracy of speech recognition, as many expressions of the same thing as possible must be stored in the vocabulary of the speech recognition system. This makes the vocabulary very large and inconvenient to maintain, and the large vocabulary in turn slows recognition. In addition, because expressions vary widely and keep changing over time, the vocabulary cannot exhaustively cover every expression of the same thing, so the success rate of keyword-based speech recognition is low.
Technical solutions related to speech recognition are also disclosed in Chinese patents such as application Nos. CN00130067.9, CN03123123.3 and CN03138149.9. However, those solutions can only perform speech synthesis or convert speech into text; they cannot recognize the text information obtained from the speech. Moreover, they were designed for English speech recognition. As the analysis above shows, English and Chinese differ greatly in the number of characters and in linguistic features, so even if those solutions were applied to Chinese speech recognition they could not recognize speech effectively, and the success rate would be low. The Chinese patent with application No. CN99813093.1 discloses an interactive user interface that uses speech recognition and natural language processing. Although it can recognize the text information obtained from speech, it was also designed for English and must take factors such as grammar into account during recognition, so it still cannot be applied effectively to Chinese speech recognition.
Summary of the invention
To solve the above technical problems, embodiments of the present invention provide a natural speech recognition method and device that can improve both the speed and the success rate of Chinese speech recognition.
A natural speech recognition method comprises: obtaining the pinyin corresponding to voice information input by a user; performing word segmentation on the pinyin using a preset dictionary to obtain segmented word pinyin strings; looking up, in the dictionary, the words to be recognized that correspond to the word pinyin strings; and searching a target information database according to the words to be recognized to obtain, from the target information database, the target information with the highest degree of match to the words to be recognized; wherein the dictionary stores the target words on which speech recognition is performed and the pinyin corresponding to each target word.
A natural speech recognition device comprises:
a first acquiring unit, configured to obtain the pinyin corresponding to voice information input by a user;
a word segmentation unit, configured to perform word segmentation on the pinyin obtained by the first acquiring unit using a preset dictionary, to obtain segmented word pinyin strings;
a second acquiring unit, configured to look up, in the dictionary, the words to be recognized that correspond to the word pinyin strings obtained by the word segmentation unit;
a search unit, configured to search a target information database according to the words to be recognized obtained by the second acquiring unit, and to obtain from the target information database the target information with the highest degree of match to the words to be recognized;
wherein the dictionary stores the target words on which speech recognition is performed and the pinyin corresponding to each target word.
With the natural speech recognition method and device provided by the embodiments of the present invention, information is matched according to the words to be recognized that correspond to the word pinyin strings, and the information in the target information database that best matches those words is taken as the recognition result for the voice information. Target information can be obtained without an exact match of the voice information, which raises the success rate of speech recognition and solves the prior-art problem that exact matching fails whenever the wording differs. Because the technical solution recognizes speech by word matching, only the target words need to be stored in the dictionary and only standard information in the target information database; there is no need to store many differently worded texts for the same thing. The dictionary and the target information database are therefore small and easy to search, which raises recognition speed and solves the prior-art problem that a vocabulary holding many different expressions of the same thing is large, hard to search, and slow to recognize against. The technical solution also differs from English speech recognition technology: addressing the large number of Chinese characters and the fact that the words in a Chinese sentence run together without pauses, it segments the sentence into words according to pinyin and recognizes speech by looking up the segmented words to be recognized, which gives a higher success rate and recognition speed for Chinese speech recognition.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a first flowchart of the natural speech recognition method provided by an embodiment of the present invention;
Fig. 2 is a first flowchart of step 104 of the natural speech recognition method shown in Fig. 1;
Fig. 3 is a second flowchart of step 104 of the natural speech recognition method shown in Fig. 1;
Fig. 4 is a second flowchart of the natural speech recognition method provided by an embodiment of the present invention;
Fig. 5 is a first schematic structural diagram of the natural speech recognition device provided by an embodiment of the present invention;
Fig. 6 is a second schematic structural diagram of the natural speech recognition device provided by an embodiment of the present invention;
Fig. 7 is a third schematic structural diagram of the natural speech recognition device provided by an embodiment of the present invention;
Fig. 8 is a fourth schematic structural diagram of the natural speech recognition device provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of the search unit in the natural speech recognition device shown in Fig. 5;
Fig. 10 is a fifth schematic structural diagram of the natural speech recognition device provided by an embodiment of the present invention;
Fig. 11 is a sixth schematic structural diagram of the natural speech recognition device provided by an embodiment of the present invention;
Fig. 12 is a seventh schematic structural diagram of the natural speech recognition device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
To solve the problem that keyword-based speech recognition has a low success rate, embodiments of the present invention provide a natural speech recognition method and device.
As shown in Fig. 1, the natural speech recognition method provided by an embodiment of the present invention comprises:
Step 101: obtain the pinyin corresponding to the voice information input by the user.
To widen the applicability of the method so that it can recognize voice information from users of different regions and accents, step 101 in this embodiment may use speaker-independent speech recognition technology to recognize and parse the voice information input by the user and obtain the corresponding pinyin.
Step 102: perform word segmentation on the pinyin obtained in step 101 using a preset dictionary, to obtain segmented word pinyin strings.
The dictionary stores the target words on which speech recognition is performed and the pinyin corresponding to each target word.
In this embodiment, the target words stored in the dictionary may be broad-scope words; specifically, target words may be collected from information encountered in daily life and work to form the dictionary, for example by extracting words from each day's news reports. The target words may instead be narrow-scope words; specifically, target words may be obtained from the information stored in the target information database to form the dictionary. The target information database stores the information to be recognized; for example, if the method is applied to vehicle navigation, the target information database stores geographic location information and/or destination name information. Note that, whether broad-scope or narrow-scope, every target word in the dictionary is unique; no target word is repeated.
Because speech recognition is generally used in a specific field, such as navigation, song requests or contact lookup, the embodiment of the present invention preferably sets the target words in the dictionary to narrow-scope words derived from the target information database. This reduces redundancy among the target words, saves storage space, and speeds up recognition. The dictionary is not limited to this arrangement, however. As those skilled in the art will appreciate, for each industry in which this recognition technology is applied, practitioners in that industry can set up the target information database appropriately for its characteristics.
In this embodiment, step 102 may search the dictionary according to the pinyin obtained in step 101, matching the pinyin, in order of appearance, against the pinyin of the target words in the dictionary. When a word pinyin string that exactly matches the pinyin of a target word is found, that string is split off from the pinyin, and the search is repeated until the pinyin is exhausted, thereby segmenting the pinyin into words.
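The segmentation loop of step 102, together with the dictionary lookup that follows it, might be sketched as below. This is a minimal sketch under stated assumptions, not the patent's implementation: the text only says that pinyin is matched in order of appearance, so the greedy longest-match strategy, the skip-one-character handling of unmatched input, the function names `segment_pinyin` and `lookup_words`, and the sample dictionary contents (with English renderings of the words chosen for the sketch) are all hypothetical.

```python
from typing import Dict, List

# Hypothetical dictionary: word pinyin string -> target words sharing that pinyin.
DICTIONARY: Dict[str, List[str]] = {
    "beijing": ["Beijing"],
    "shijingshanqu": ["Shijingshan District"],
    "bajiao": ["Bajiao"],
    "donglu": ["East Road"],
    "xiaofeiyang": ["Little Fat Sheep", "Little Boiling Sheep"],  # homophones
    "huoguodian": ["Hot Pot Restaurant"],
}


def segment_pinyin(pinyin: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Split an unbroken pinyin string into word pinyin strings."""
    max_len = max((len(key) for key in dictionary), default=0)
    result: List[str] = []
    pos = 0
    while pos < len(pinyin):
        # Assumption: try the longest dictionary entry starting at pos first.
        for length in range(min(max_len, len(pinyin) - pos), 0, -1):
            candidate = pinyin[pos:pos + length]
            if candidate in dictionary:
                result.append(candidate)
                pos += length
                break
        else:
            # Assumption: drop one character when nothing matches here.
            pos += 1
    return result


def lookup_words(strings: List[str], dictionary: Dict[str, List[str]]) -> List[List[str]]:
    """Return, for each word pinyin string, the words stored under it."""
    return [dictionary[s] for s in strings]


if __name__ == "__main__":
    raw = "beijingshijingshanqubajiaodongluxiaofeiyanghuoguodian"
    parts = segment_pinyin(raw, DICTIONARY)
    print(parts)
    print(lookup_words(parts, DICTIONARY))
```

Keeping a list of words per pinyin string lets homophones such as the two words under `xiaofeiyang` travel together through the later matching steps.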
Step 103: look up, in the dictionary, the words to be recognized that correspond to the word pinyin strings obtained in step 102.
Step 104: search the target information database according to the words to be recognized, and obtain from it the target information with the highest degree of match to the words to be recognized.
In this embodiment, step 104 may obtain the target information with the highest degree of match to the words to be recognized by either of two methods, which are described in turn below:
1. Weight coefficient judgment method
In this embodiment, the dictionary may also store a weight level n for each target word and a weight level range N, where n and N are integers, N ≥ 2 and n ∈ [1, N]. A target word of level n is more important in the text information than a target word of level n+1. The relation between importance and weight level n may of course be reversed, and those skilled in the art can define it as needed; this embodiment uses the former convention as its example. In this case, before step 104 the method further comprises a step of obtaining, from the dictionary, the weight level corresponding to each word to be recognized.
Specifically, the weight level range N of the words in the dictionary and the weight level n of each word may be preset. For example, the weight level range of the target words in the dictionary may be set to 3, with level 1 the highest and level 3 the lowest, and the weight level of each target word is then set according to how exclusive and how widely known the word is. For instance, when the target word is a place name, its weight level is set to 3; when the target word is a proper noun that does not denote a geographic location (such as the restaurant name Little Fat Sheep, pinyin xiaofeiyang), its weight level is set to 1. Those skilled in the art may of course classify the target words into weight levels by other rules, which are not enumerated here. After step 102 divides the text information into words, the weight level attribute of each word is obtained from the dictionary.
In this case, as shown in Fig. 2, step 104 may comprise:
Step 1041: search the target information database according to the words to be recognized, and obtain from it the information set formed by the entries that match any one or more of the words to be recognized.
Step 1042: according to the weight levels of the words to be recognized, process each entry in the information set obtained in step 1041 to obtain a weight coefficient for each entry.
In this embodiment, step 1042 may use a weighted average algorithm to obtain the weight coefficient of each entry; other algorithms may of course be used, and are not enumerated here.
Step 1043: select, from the information set obtained in step 1041, the entry with the highest weight coefficient as the target information.
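Steps 1041 to 1043 might be sketched as follows. The text names a weighted average algorithm without giving a formula, so the formula here is an assumption: a word of level n out of N levels is given weight N - n + 1, and an entry's coefficient is the matched weight divided by the total weight. The function names, the substring test used for "matching", the sample entries (English renderings chosen for the sketch) and the level assigned to "Hot Pot Restaurant" are likewise hypothetical.

```python
from typing import Dict, List, Optional


def weight_coefficient(entry: str, word_levels: Dict[str, int], n_levels: int) -> float:
    """Weighted share of the words to be recognized that occur in the entry."""
    # Assumption: level 1 is the most important, so it gets the largest weight.
    weights = {word: n_levels - level + 1 for word, level in word_levels.items()}
    total = sum(weights.values())
    matched = sum(weight for word, weight in weights.items() if word in entry)
    return matched / total if total else 0.0


def best_by_weight(database: List[str], word_levels: Dict[str, int],
                   n_levels: int = 3) -> Optional[str]:
    # Step 1041: keep entries that match at least one word to be recognized.
    candidates = [e for e in database if any(w in e for w in word_levels)]
    if not candidates:
        return None
    # Steps 1042 and 1043: score every candidate and keep the highest score.
    return max(candidates, key=lambda e: weight_coefficient(e, word_levels, n_levels))


if __name__ == "__main__":
    database = [
        "Beijing Little Fat Sheep Supermarket",
        "Shijingshan District Gucheng Road Little Fat Sheep Hot Pot Restaurant",
        "Beijing Bajiao North Road Donglaishun Hot Pot Restaurant",
        "Beijing Bajiao Little Fat Sheep Hot Pot Restaurant",
    ]
    # Place names at level 3 and the proper noun at level 1 follow the text;
    # level 2 for "Hot Pot Restaurant" is an illustrative guess.
    levels = {"Beijing": 3, "Bajiao": 3, "Little Fat Sheep": 1, "Hot Pot Restaurant": 2}
    print(best_by_weight(database, levels))
```

Under this weighting, an entry that contains the level-1 proper noun outranks one that only shares place names with the query, which is the behavior the level scheme is meant to produce.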
Note that, to ensure the accuracy of the target information obtained in step 104 and improve recognition quality, in this embodiment the words to be recognized obtained in step 103 should include at least one word of weight level 1. If none of the words to be recognized has weight level 1, then before step 104 the method further comprises: segmenting the pinyin obtained in step 101 again to obtain at least one word of weight level 1. Step 104 is then replaced by: searching the target information database according to the words to be recognized obtained from the re-segmentation, and obtaining from the target information database the target information whose degree of match to the words to be recognized is 1.
Further, the natural speech recognition method provided by the embodiment of the present invention may also comprise: adding to the dictionary the at least one word of the highest weight level obtained from the re-segmentation, together with its corresponding pinyin string.
Note that the ordering of weight levels given above is only an example. In actual use the ordering may follow other rules; for example, when the weight level range is 3, level 3 may be set as the highest and level 1 as the lowest. Such variations can readily be conceived by those skilled in the art without creative effort and are not enumerated here.
2. Nested lookup method
As shown in Fig. 3, step 104 may comprise:
Step 1044: sort the words to be recognized obtained in step 103.
In this embodiment, step 1044 may sort the words in the order in which they appear in the text information. Preferably, to speed up the search, step 1044 may first identify the keywords among the words contained in the text information, and then sort the words in the order: keywords, post-auxiliary words, pre-auxiliary words.
A keyword is a word with a proprietary (proper-noun) meaning; a post-auxiliary word is a word that follows the keyword in the text information; and a pre-auxiliary word is a word that precedes the keyword in the text information.
In this embodiment, a keyword table may be preset according to the information stored in the target information database. After the words to be recognized are obtained, each of them is looked up in the keyword table, and the words that match keywords stored in the table are taken as the keywords contained in the text information.
Note that if the lookup shows that there is no keyword among the words to be recognized, step 1044 sorts the words in their order of appearance. If the lookup shows that the words to be recognized include two or more keywords, the post-auxiliary words are the non-keyword words after the first keyword, and step 1044 still sorts in the order keywords, post-auxiliary words, pre-auxiliary words.
Note also that if, in step 103, the same word pinyin string yields two or more words in the dictionary, step 1044 sorts those words together as a single set.
By sorting the words contained in the text information in the order keywords, post-auxiliary words, pre-auxiliary words, the embodiment lets the subsequent steps search and match in an order that puts the key information first, which can significantly shorten the matching time and speed up speech recognition.
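The ordering rule of step 1044 might be sketched as follows. This is only an illustration under assumptions: each word to be recognized is represented as a tuple of homophone alternatives so that such a set sorts as one unit, a group counts as a keyword if any alternative is in the keyword table, and the function name `sort_words` and the sample words (English renderings chosen for the sketch) are hypothetical.

```python
from typing import List, Set, Tuple

WordGroup = Tuple[str, ...]  # one or more homophone alternatives


def sort_words(groups: List[WordGroup], keyword_table: Set[str]) -> List[WordGroup]:
    """Order groups as: keywords, post-auxiliary words, pre-auxiliary words."""
    def is_keyword(group: WordGroup) -> bool:
        return any(word in keyword_table for word in group)

    keyword_positions = [i for i, g in enumerate(groups) if is_keyword(g)]
    if not keyword_positions:
        # No keyword found: keep the order of appearance.
        return list(groups)

    first = keyword_positions[0]
    keywords = [groups[i] for i in keyword_positions]
    # Post-auxiliary words: non-keywords after the first keyword.
    post = [g for i, g in enumerate(groups) if i > first and not is_keyword(g)]
    # Pre-auxiliary words: non-keywords before the first keyword.
    pre = [g for i, g in enumerate(groups) if i < first and not is_keyword(g)]
    return keywords + post + pre


if __name__ == "__main__":
    words = [
        ("Beijing",), ("Shijingshan District",), ("Bajiao",), ("East Road",),
        ("Little Fat Sheep", "Little Boiling Sheep"), ("Hot Pot Restaurant",),
    ]
    table = {"Little Fat Sheep", "Little Boiling Sheep"}
    for group in sort_words(words, table):
        print(group)
```

Placing the keyword first means the very first database pass is made with the most selective word, so every later pass works on a smaller candidate set.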
Step 1045: according to the sorting result of step 1044, take the first word from the words to be recognized and obtain from the target information database the information that matches the first word.
Step 1046: take the second word from the words to be recognized and obtain, from the information set formed by the information matching the first word, the information that matches the second word.
And so on, until step 1047: take the last word from the words to be recognized and obtain, from the information set formed by the information matching the preceding word, the target information that matches the last word.
Note that in steps 1045-1047, if no information matching the current word is found, the match information of the current word may be set to the information matched by the preceding word. If the current word is the first word, the information matched by the first word is all the information contained in the target information database.
To give those skilled in the art a fuller understanding of the nested lookup method described above, a concrete example of its implementation follows:
For example, suppose the voice information input by the user is "Little Fat Sheep Hot Pot Restaurant, Bajiao East Road, Shijingshan District, Beijing". The pinyin corresponding to this voice information is obtained: beijingshijingshanqubajiaodongluxiaofeiyanghuoguodian. Segmenting this pinyin with the dictionary gives the word pinyin strings beijing, shijingshanqu, bajiao, donglu, xiaofeiyang, huoguodian. Looking these strings up in the dictionary gives the words to be recognized: Beijing, Shijingshan District, Bajiao, East Road, (Little Fat Sheep, Little Boiling Sheep), Hot Pot Restaurant. If the words corresponding to xiaofeiyang (the homophones Little Fat Sheep and Little Boiling Sheep) are keywords, sorting by keywords, post-auxiliary words and pre-auxiliary words gives: (Little Fat Sheep, Little Boiling Sheep), Hot Pot Restaurant, Beijing, Shijingshan District, Bajiao, East Road.
Suppose the target information database contains entries such as: Beijing Little Fat Sheep Supermarket; Beijing Little Boiling Sheep Hot Pot Restaurant; Shanghai Little Boiling Sheep Catering Company; Beijing Shijingshan District Little Boiling Sheep Barbecue Restaurant; Shijingshan District Gucheng Road Little Fat Sheep Hot Pot Restaurant; Beijing Donglaishun Hot Pot Restaurant; Beijing Bajiao North Road Donglaishun Hot Pot Restaurant; and Beijing Bajiao Little Fat Sheep Hot Pot Restaurant. The nested lookup method then proceeds as follows.
First, the entries that match the keyword set formed by "Little Fat Sheep" and "Little Boiling Sheep" are obtained from the target information database to form the first information bank, which contains: Beijing Little Fat Sheep Supermarket; Beijing Little Boiling Sheep Hot Pot Restaurant; Shanghai Little Boiling Sheep Catering Company; Beijing Shijingshan District Little Boiling Sheep Barbecue Restaurant; Shijingshan District Gucheng Road Little Fat Sheep Hot Pot Restaurant; and Beijing Bajiao Little Fat Sheep Hot Pot Restaurant.
Second, the entries that match "Hot Pot Restaurant" are obtained from the first information bank to form the second information bank, which contains: Beijing Little Boiling Sheep Hot Pot Restaurant; Shijingshan District Gucheng Road Little Fat Sheep Hot Pot Restaurant; and Beijing Bajiao Little Fat Sheep Hot Pot Restaurant.
Third, the entries that match "Beijing" are obtained from the second information bank to form the third information bank, which contains: Beijing Little Boiling Sheep Hot Pot Restaurant; and Beijing Bajiao Little Fat Sheep Hot Pot Restaurant.
Fourth, the entries that match "Bajiao" are obtained from the third information bank to form the fourth information bank, which contains: Beijing Bajiao Little Fat Sheep Hot Pot Restaurant.
Fifth, the target information that matches "East Road" is sought in the fourth information bank. Because no entry in the fourth information bank matches "East Road", the target information is the information contained in the fourth information bank, namely Beijing Bajiao Little Fat Sheep Hot Pot Restaurant.
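The narrowing performed in steps 1045 to 1047, including the fallback to the previous result set when a word matches nothing, might be sketched as follows using the data of the example above. This is a sketch under assumptions: matching is modeled as a plain substring test, the function name `nested_lookup` is hypothetical, and the English entry names are renderings chosen for the sketch.

```python
from typing import List, Tuple

WordGroup = Tuple[str, ...]  # one or more homophone alternatives

DATABASE: List[str] = [
    "Beijing Little Fat Sheep Supermarket",
    "Beijing Little Boiling Sheep Hot Pot Restaurant",
    "Shanghai Little Boiling Sheep Catering Company",
    "Beijing Shijingshan District Little Boiling Sheep Barbecue Restaurant",
    "Shijingshan District Gucheng Road Little Fat Sheep Hot Pot Restaurant",
    "Beijing Donglaishun Hot Pot Restaurant",
    "Beijing Bajiao North Road Donglaishun Hot Pot Restaurant",
    "Beijing Bajiao Little Fat Sheep Hot Pot Restaurant",
]

# Words to be recognized, already sorted: keywords, post-auxiliary, pre-auxiliary.
SORTED_WORDS: List[WordGroup] = [
    ("Little Fat Sheep", "Little Boiling Sheep"),
    ("Hot Pot Restaurant",),
    ("Beijing",),
    ("Shijingshan District",),
    ("Bajiao",),
    ("East Road",),
]


def nested_lookup(database: List[str], sorted_words: List[WordGroup]) -> List[str]:
    """Narrow the candidate set word by word, in sorted order."""
    current = list(database)  # before any match, every entry is a candidate
    for group in sorted_words:
        narrowed = [e for e in current if any(word in e for word in group)]
        if narrowed:
            current = narrowed
        # Otherwise keep the set matched by the preceding word (fallback rule).
    return current


if __name__ == "__main__":
    print(nested_lookup(DATABASE, SORTED_WORDS))
```

In this sketch "Shijingshan District" matches neither entry left after the "Beijing" pass, so by the fallback rule it leaves the set unchanged, and the final result is the same single entry as in the worked example.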
The weight coefficient judgment method and the nested lookup method described above can both accurately find the target information that best matches the words contained in the text information, thereby recognizing the voice information input by the user. In actual use, other methods may of course be used to obtain that target information; they are not enumerated here.
Further, if two or more pieces of target information are selected in step 104, then to improve the accuracy of speech recognition, as shown in Fig. 4, the method may further comprise, after step 104:
Step 105: obtain a target information selection instruction sent by the user, or the user's speech recognition statistics.
Specifically, after step 104 the embodiment may display the two or more selected pieces of target information to the user, and in step 105 receive the target information selection instruction that the user sends by voice, key press, text input or the like.
Alternatively, the natural speech recognition method provided by the embodiment may keep statistics on the information each time the user performs speech recognition. The statistics may cover an individual user or a group of users. Further, the speech recognition statistics may be the number of times or the frequency with which a user has had one or more pieces of target information recognized, or the target information most recently recognized for each of a number of users; other statistics related to speech recognition may of course also be used, and are not enumerated here.
Step 106: select the preferred target information from the two or more pieces of target information according to the target information selection instruction or the speech recognition statistics.
For example, suppose the speech recognition statistics are counts of how many times the user has had each piece of target information recognized. If the pinyin corresponding to the user's voice input is xiaofeiyanghuoguodian and step 104 obtains four pieces of target information, namely "Haidian District Little Fat Sheep Hot Pot Restaurant", "Haidian District Zhongguancun Little Fat Sheep Hot Pot Restaurant", "Shijingshan Bajiao East Road Little Fat Sheep Hot Pot Restaurant" and "Xizhimen Jiamao Little Boiling Sheep Hot Pot Restaurant", then step 105 may obtain the statistics for those four: "Haidian District Little Fat Sheep Hot Pot Restaurant" has been recognized 3 times, "Haidian District Zhongguancun Little Fat Sheep Hot Pot Restaurant" 5 times, "Shijingshan Bajiao East Road Little Fat Sheep Hot Pot Restaurant" 40 times, and "Xizhimen Jiamao Little Boiling Sheep Hot Pot Restaurant" once. Based on these statistics, step 106 can select "Shijingshan Bajiao East Road Little Fat Sheep Hot Pot Restaurant" from the four as the preferred target information.
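The count-based choice of step 106 in this example might be sketched as follows. The function name `pick_preferred`, the dictionary layout of the statistics, and the treatment of candidates with no recorded count as zero are assumptions made for illustration; the entry names are English renderings chosen for the sketch.

```python
from typing import Dict, List


def pick_preferred(candidates: List[str], recognition_counts: Dict[str, int]) -> str:
    """Choose the candidate that has been recognized most often."""
    # Assumption: a candidate with no recorded history counts as zero.
    return max(candidates, key=lambda name: recognition_counts.get(name, 0))


if __name__ == "__main__":
    counts = {
        "Haidian District Little Fat Sheep Hot Pot Restaurant": 3,
        "Haidian District Zhongguancun Little Fat Sheep Hot Pot Restaurant": 5,
        "Shijingshan Bajiao East Road Little Fat Sheep Hot Pot Restaurant": 40,
        "Xizhimen Jiamao Little Boiling Sheep Hot Pot Restaurant": 1,
    }
    print(pick_preferred(list(counts), counts))
```

When two candidates have the same count, `max` returns the first one in list order, so a real system would need its own tie-breaking rule, which the text does not specify.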
Alternatively, in order further to shorten the time of speech recognition, improve speech recognition speed, in the present embodiment, before thestep 104, can also comprise according to word to be identified and search spoken dictionary, according to lookup result, the step of deletion spoken word from word to be identified, wherein, spoken dictionary is used for the storage spoken word, does not comprise the Word message that has substantive implication in the voice messaging that relates to user's input in this spoken word.
In the present embodiment, can adopt the method for statistics to set in advance spoken dictionary, can comprise people's spoken word used in everyday in this spoken language dictionary, for example: " I think ", " I want ", " may I ask ", " being ", " right ", " can " and " how " etc., the spoken word that comprises in the spoken word storehouse is not given unnecessary details one by one herein.
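The deletion step can be illustrated with a short sketch. The set below holds English glosses of some of the examples listed above; the function name `remove_spoken_words` and the sample word list are assumptions made only for this illustration.

```python
# English glosses of spoken words named in the text; a real spoken-word
# dictionary would hold the Chinese words and be built statistically.
SPOKEN_WORDS = {"I think", "I want", "may I ask", "right", "can", "how"}


def remove_spoken_words(words, spoken_dictionary=SPOKEN_WORDS):
    """Return the words to be identified with spoken words deleted, order kept."""
    return [word for word in words if word not in spoken_dictionary]


# Hypothetical words to be identified obtained from the dictionary lookup.
words = ["may I ask", "Zhongguancun", "Little Fat Sheep", "how"]
filtered = remove_spoken_words(words)
print(filtered)
```

Only the words that carry substantive meaning remain, so the later database search has fewer words to process.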
Further, so that the natural speech recognition method provided by the embodiment of the invention can be applied to users whose pronunciation is unclear or non-standard, and to improve the success rate and accuracy of speech recognition, the method may also comprise, on the basis of the technical solutions shown in Figures 1 to 4 above, a step of performing fuzzy-sound matching on the pinyin obtained in step 101 to obtain the pinyin after fuzzy matching. In that case step 102 is specifically: perform word segmentation on the pinyin after fuzzy matching using the preset dictionary, to obtain the segmented word pinyin strings.
Specifically, a pinyin fuzzy matching table may be set up in advance, and matching rules defined in it, for example: z=zh, c=ch, s=sh, l=n, f=h, r=l, an=ang, en=eng, in=ing, ian=iang, uan=uang, iong=ing, and so on, which are not enumerated here. Fuzzy-sound matching is performed on the pinyin obtained in step 101 according to these rules.
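One possible way to apply such a table is to map every syllable to a canonical form, so that syllables the rules treat as equal compare equal. The sketch below takes that approach. It is an assumption rather than the method fixed by the embodiment, it expects the pinyin already split into syllables, and it would require the dictionary keys to be canonicalized in the same way. All names in it are hypothetical.

```python
# Canonical choice for each rule in the table: z=zh, c=ch, s=sh, l=n, r=l, f=h.
INITIAL_RULES = {"zh": "z", "ch": "c", "sh": "s", "n": "l", "r": "l", "h": "f"}
# an=ang, en=eng, in=ing, ian=iang, uan=uang, iong=ing (hence iong -> in).
FINAL_RULES = {"ang": "an", "eng": "en", "ing": "in",
               "iang": "ian", "uang": "uan", "iong": "in"}
CONSONANTS = "bpmfdtnlgkhjqxrzcsyw"


def split_syllable(syllable):
    """Split one pinyin syllable into (initial, final)."""
    for two_letter in ("zh", "ch", "sh"):
        if syllable.startswith(two_letter):
            return two_letter, syllable[2:]
    if syllable and syllable[0] in CONSONANTS:
        return syllable[0], syllable[1:]
    return "", syllable  # syllable with no initial, e.g. "an"


def canonical_syllable(syllable):
    """Map a syllable to its canonical fuzzy form."""
    initial, final = split_syllable(syllable)
    return INITIAL_RULES.get(initial, initial) + FINAL_RULES.get(final, final)


def canonical_pinyin(syllables):
    """Canonicalize a list of syllables."""
    return [canonical_syllable(s) for s in syllables]


print(canonical_pinyin(["zhang", "niu", "xiong"]))
```

The canonical forms need not be valid pinyin; they only serve as comparison keys, so "zhang" and "zan", or "niu" and "liu", end up identical.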
Performing fuzzy matching on the pinyin solves problems such as recognition failures or recognition errors caused by the user's unclear or inaccurate pronunciation, and thereby improves the recognition success rate and accuracy of the natural speech recognition method provided by the embodiment of the invention.
The natural speech recognition method provided by the embodiment of the invention performs information matching according to the words to be identified that correspond to the word pinyin strings, and takes the information in the target information database with the highest matching degree to those words as the recognition result for the voice information. Target information can therefore be obtained without matching the voice information in full, which improves the success rate of speech recognition and solves the prior-art problem that recognition by full matching of the voice information fails when the wording differs, giving a low success rate. Because the technical solution provided by the embodiment of the invention performs speech recognition by word matching, only the target words need to be stored in the dictionary and only standard information in the target information database; there is no need to store a large number of differently worded text entries for the same thing. The dictionary and the target information database are therefore smaller and easier to search, which improves recognition speed and solves the prior-art problem that the vocabulary must store a large number of differently worded text entries for the same thing, which makes the vocabulary large, hard to search, and slow for speech recognition. The technical solution provided by the embodiment of the invention differs from English speech recognition technology: to address the large character set of Chinese and the fact that the words in a sentence run together without pauses, it segments the words in a sentence according to pinyin and performs speech recognition by looking up the segmented words to be identified, which gives a higher success rate and recognition speed for Chinese speech recognition.
As shown in Figure 5, the embodiment of the invention also provides a natural speech recognition device, comprising:
A first acquiring unit 501, configured to obtain the pinyin corresponding to the voice information input by the user;
A word segmentation processing unit 502, configured to perform word segmentation on the pinyin obtained by the first acquiring unit 501 using a preset dictionary, to obtain the segmented word pinyin strings;
A second acquiring unit 503, configured to look up in the dictionary the words to be identified that correspond to the word pinyin strings obtained by the word segmentation processing unit 502;
A searching unit 504, configured to search the target information database according to the words to be identified obtained by the second acquiring unit 503, and to obtain from the target information database the target information with the highest matching degree to the words to be identified;
Wherein the dictionary stores the target words for speech recognition and the pinyin corresponding to the target words.
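The work of units 502 and 503 can be sketched as a dictionary-driven segmentation followed by a lookup. The passage above does not state which segmentation algorithm is used, so forward maximum matching is substituted here purely as an assumption. The dictionary entries, their English glosses, and the function names are likewise hypothetical.

```python
# Hypothetical dictionary: pinyin of a target word -> the target word
# (shown as an English gloss instead of the Chinese word).
DICTIONARY = {
    "xiaofeiyang": "Little Fat Sheep",
    "huoguodian": "hot pot restaurant",
    "haidianqu": "Haidian District",
}


def segment_pinyin(pinyin, dictionary):
    """Forward maximum matching: at each position take the longest entry."""
    max_len = max(map(len, dictionary), default=0)
    pieces = []
    i = 0
    while i < len(pinyin):
        for j in range(min(len(pinyin), i + max_len), i, -1):
            if pinyin[i:j] in dictionary:
                pieces.append(pinyin[i:j])
                i = j
                break
        else:
            i += 1  # no entry starts here; skip one letter and retry
    return pieces


def words_to_identify(pieces, dictionary):
    """Map each word pinyin string to its target word."""
    return [dictionary[piece] for piece in pieces]


pieces = segment_pinyin("woyaoquxiaofeiyanghuoguodian", DICTIONARY)
print(pieces, words_to_identify(pieces, DICTIONARY))
```

Letters that belong to no dictionary entry are skipped, so only target words survive segmentation; this is one simple way to tolerate extra wording around them.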
Further, as shown in Figure 6, the natural speech recognition device provided by the embodiment of the invention may also comprise:
A third acquiring unit 505, configured to, if the dictionary also stores the weight grade n and the weight grade range N corresponding to each target word, obtain from the dictionary the weight grades corresponding to the words to be identified obtained by the second acquiring unit 503, where n and N are integers, N ≥ 2, n ∈ [1, N], and a target word of grade n is more important in the text information than a target word of grade n+1. The relation between importance and weight grade n may of course be reversed; those skilled in the art can define it as needed, and this embodiment uses the former as its example;
In that case, the searching unit 504 may comprise:
A searching subunit 5041, configured to search the target information database according to the words to be identified obtained by the second acquiring unit 503, and to obtain from the target information database an information set formed by the information that matches any one or more of the words to be identified;
A first obtaining subunit 5042, configured to process each piece of information in the information set obtained by the searching subunit 5041 according to the weight grades corresponding to the words to be identified obtained by the third acquiring unit 505, to obtain a weight coefficient for each piece of information;
A second obtaining subunit 5043, configured to select, from the information set obtained by the searching subunit 5041, the information with the highest weight coefficient obtained by the first obtaining subunit 5042 as the target information.
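The three subunits can be sketched as below. The passage above does not give the formula for the weight coefficient, so the sketch assumes one: each matched word of grade n contributes N - n + 1, and the contributions are summed. Matching is assumed to be substring containment. The sample database, the grades, and the function names are hypothetical.

```python
def weight_coefficient(info, graded_words, grade_range):
    """Assumed formula: sum of (N - n + 1) over the words found in info."""
    return sum(grade_range - grade + 1
               for word, grade in graded_words.items() if word in info)


def best_match(database, graded_words, grade_range):
    """Collect entries matching any word, then pick the highest coefficient."""
    information_set = [info for info in database
                       if any(word in info for word in graded_words)]
    if not information_set:
        return None
    return max(information_set,
               key=lambda info: weight_coefficient(info, graded_words, grade_range))


DATABASE = [
    "Haidian District Little Fat Sheep hot pot restaurant",
    "Zhongguancun, Haidian District Little Fat Sheep hot pot restaurant",
    "Zhongguancun bookstore",  # invented entry for contrast
]
# Word -> weight grade n, with N = 2; grade 1 is the more important grade.
WORDS = {"Little Fat Sheep": 1, "Zhongguancun": 2}

print(best_match(DATABASE, WORDS, 2))
```

Under this assumed formula the entry containing both words scores 3, ahead of the entries that contain only one of them.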
Further, as shown in Figure 7, the natural speech recognition device provided by the embodiment of the invention may also comprise:
A re-segmentation unit 506, configured to, if none of the words to be identified obtained by the second acquiring unit 503 has weight grade 1, perform word segmentation again on the pinyin obtained by the first acquiring unit 501, so as to obtain at least one word of weight grade 1;
The searching unit 504 may also be configured to search the target information database according to the words to be identified produced by the re-segmentation unit 506, and to obtain from the target information database the target information with the highest matching degree to those words.
Further, as shown in Figure 8, the natural speech recognition device provided by the embodiment of the invention may also comprise:
An updating unit 507, configured to add to the dictionary the at least one word of weight grade 1 obtained by the re-segmentation unit 506, together with the pinyin string corresponding to that word.
Further, as shown in Figure 9, the searching unit 504 may also comprise:
A sorting subunit 5044, configured to sort the words to be identified;
A third obtaining subunit 5045, configured to take the first word from the words to be identified according to the order produced by the sorting subunit 5044, and to obtain from the target information database the information matching the first word;
A fourth obtaining subunit 5046, configured to take the second word from the words to be identified, and to obtain the information matching the second word from the information set formed by the information matching the first word;
And so on, up to a fifth obtaining subunit 5047, configured to take the last word from the words to be identified, and to obtain the target information matching the last word from the information set formed by the information matching the word immediately before the last word.
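The successive narrowing performed by these subunits can be sketched as a loop that filters the previous result set with each word in turn. The passage above does not state the sort criterion, so the sketch accepts any sort key and, in the example, assumes sorting by weight grade. Matching is assumed to be substring containment, and all names and data are hypothetical.

```python
def narrow_search(database, words, sort_key=None):
    """Filter the database word by word, each time within the previous result."""
    ordered = sorted(words, key=sort_key)
    information_set = list(database)
    for word in ordered:
        # Search only inside the set matched by the preceding words.
        information_set = [info for info in information_set if word in info]
    return information_set


DATABASE = [
    "Haidian District Little Fat Sheep hot pot restaurant",
    "Zhongguancun, Haidian District Little Fat Sheep hot pot restaurant",
    "Bajiao East Road, Shijingshan Little Fat Sheep hot pot restaurant",
]
# Assumed weight grades; the word with the smaller grade is searched first.
GRADES = {"Little Fat Sheep": 1, "Zhongguancun": 2}

result = narrow_search(DATABASE, list(GRADES), sort_key=GRADES.get)
print(result)
```

Each pass scans a smaller set than the one before it. In this sketch a word that matches nothing empties the result; how the embodiment handles that case is not stated in the passage above.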
Further, as shown in Figure 10, the natural speech recognition device provided by the embodiment of the invention may also comprise:
A deleting unit 508, configured to look up the spoken-word dictionary according to the words to be identified obtained by the second acquiring unit 503 and, according to the lookup result, delete the spoken words from the words to be identified, wherein the spoken-word dictionary stores spoken words, that is, colloquial words that carry none of the substantive text information in the voice information input by the user.
Further, as shown in Figure 11, the natural speech recognition device provided by the embodiment of the invention may also comprise:
A fourth acquiring unit 509, configured to, if the searching unit 504 finds two or more pieces of target information, obtain the target information selection indication sent by the user or the user's speech recognition statistics;
A choosing unit 5010, configured to select the preferred target information from the two or more pieces of target information found by the searching unit 504, according to the target information selection indication or the speech recognition statistics obtained by the fourth acquiring unit 509.
Further, as shown in Figure 12, the natural speech recognition device provided by the embodiment of the invention may also comprise:
A fuzzy processing unit 5011, configured to perform fuzzy-sound matching on the pinyin obtained by the first acquiring unit 501, to obtain the pinyin after fuzzy matching;
The word segmentation processing unit 502 may also be configured to perform word segmentation, using the preset dictionary, on the pinyin after fuzzy matching obtained by the fuzzy processing unit 5011, to obtain the segmented word pinyin strings.
For the specific implementation of the natural speech recognition device provided by the embodiment of the invention, refer to the description of the natural speech recognition method provided by the embodiment of the invention; it is not repeated here.
The natural speech recognition device provided by the embodiment of the invention performs information matching according to the words to be identified that correspond to the word pinyin strings, and takes the information in the target information database with the highest matching degree to those words as the recognition result for the voice information. Target information can therefore be obtained without matching the voice information in full, which improves the success rate of speech recognition and solves the prior-art problem that recognition by full matching of the voice information fails when the wording differs, giving a low success rate. Because the technical solution provided by the embodiment of the invention performs speech recognition by word matching, only the target words need to be stored in the dictionary and only standard information in the target information database; there is no need to store a large number of differently worded text entries for the same thing. The dictionary and the target information database are therefore smaller and easier to search, which improves recognition speed and solves the prior-art problem that the vocabulary must store a large number of differently worded text entries for the same thing, which makes the vocabulary large, hard to search, and slow for speech recognition. The technical solution provided by the embodiment of the invention differs from English speech recognition technology: to address the large character set of Chinese and the fact that the words in a sentence run together without pauses, it segments the words in a sentence according to pinyin and performs speech recognition by looking up the segmented words to be identified, which gives a higher success rate and recognition speed for Chinese speech recognition.
The natural speech recognition method and device provided by the embodiment of the invention can be applied in information service systems such as navigation, song requesting, and contact lookup.
The above is merely a specific embodiment of the invention, but the protection scope of the invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. The protection scope of the invention shall therefore be determined by the protection scope of the claims.

Claims (18)

CN2011101847596A | 2011-07-04 | 2011-07-04 | Method and device for recognizing natural speech | Pending | CN102867512A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN2011101847596A CN102867512A (en) | 2011-07-04 | 2011-07-04 | Method and device for recognizing natural speech

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN2011101847596A CN102867512A (en) | 2011-07-04 | 2011-07-04 | Method and device for recognizing natural speech

Publications (1)

Publication Number | Publication Date
CN102867512A (en) | 2013-01-09

Family

ID=47446336

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN2011101847596A | Pending | CN102867512A (en)

Country Status (1)

Country | Link
CN (1) | CN102867512A (en)

Cited By (35)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN103383699A (en)*2013-06-282013-11-06安徽科大讯飞信息科技股份有限公司Character string retrieval method and system
CN103928024A (en)*2013-01-142014-07-16联想(北京)有限公司Voice query method and electronic equipment
CN105206272A (en)*2015-09-062015-12-30上海智臻智能网络科技股份有限公司Voice transmission control method and system
CN105489220A (en)*2015-11-262016-04-13小米科技有限责任公司Method and device for recognizing speech
CN106297799A (en)*2016-08-092017-01-04乐视控股(北京)有限公司Voice recognition processing method and device
CN107239547A (en)*2017-06-052017-10-10北京智能管家科技有限公司 Voice error correction method, terminal and storage medium for voice song ordering
CN107301866A (en)*2017-06-232017-10-27北京百度网讯科技有限公司Data inputting method
CN107451119A (en)*2017-07-262017-12-08上海智臻智能网络科技股份有限公司Method for recognizing semantics and device, storage medium, computer equipment based on interactive voice
CN108040185A (en)*2017-12-062018-05-15福建天晴数码有限公司A kind of method and apparatus for identifying harassing call
CN108257593A (en)*2017-12-292018-07-06深圳和而泰数据资源与云技术有限公司A kind of audio recognition method, device, electronic equipment and storage medium
CN108363735A (en)*2018-01-182018-08-03福建网龙计算机网络信息技术有限公司A kind of advertisement telephone knows method for distinguishing and terminal
CN108509416A (en)*2018-03-202018-09-07京东方科技集团股份有限公司Sentence realizes other method and device, equipment and storage medium
CN108630200A (en)*2017-03-172018-10-09株式会社东芝Voice keyword detection device and voice keyword detection method
CN109074821A (en)*2016-04-222018-12-21索尼移动通讯有限公司Speech is to Text enhancement media editing
CN109147760A (en)*2017-06-282019-01-04阿里巴巴集团控股有限公司Synthesize method, apparatus, system and the equipment of voice
CN109213994A (en)*2018-07-262019-01-15深圳市元征科技股份有限公司Information matching method and device
CN109213970A (en)*2017-06-302019-01-15北京国双科技有限公司Put down generation method and device
CN109559753A (en)*2017-09-272019-04-02北京国双科技有限公司Audio recognition method and device
CN109559752A (en)*2017-09-272019-04-02北京国双科技有限公司Audio recognition method and device
CN109741741A (en)*2018-12-292019-05-10深圳Tcl新技术有限公司Control method, intelligent terminal and the computer readable storage medium of intelligent terminal
CN109918502A (en)*2019-01-252019-06-21深圳壹账通智能科技有限公司 Documentation teaching method, apparatus, computer apparatus, and computer-readable storage medium
CN109976702A (en)*2019-03-202019-07-05青岛海信电器股份有限公司A kind of audio recognition method, device and terminal
CN110008471A (en)*2019-03-262019-07-12北京博瑞彤芸文化传播股份有限公司A kind of intelligent semantic matching process based on phonetic conversion
CN110162780A (en)*2019-04-082019-08-23深圳市金微蓝技术有限公司The recognition methods and device that user is intended to
CN110600005A (en)*2018-06-132019-12-20蔚来汽车有限公司Speech recognition error correction method and apparatus, computer device and recording medium
CN110728137A (en)*2019-10-102020-01-24京东数字科技控股有限公司Method and device for word segmentation
CN110956859A (en)*2019-11-052020-04-03合肥成方信息技术有限公司VR intelligent voice interaction English method based on deep learning
CN111350249A (en)*2020-04-132020-06-30于巧宇Intelligent closestool device based on speech recognition
CN111611349A (en)*2020-05-262020-09-01深圳壹账通智能科技有限公司 Voice query method, device, computer equipment and storage medium
CN111737541A (en)*2020-06-302020-10-02湖北亿咖通科技有限公司Semantic recognition and evaluation method supporting multiple languages
CN111755026A (en)*2019-05-222020-10-09广东小天才科技有限公司 A kind of speech recognition method and system
CN112331207A (en)*2020-09-302021-02-05音数汇元(上海)智能科技有限公司Service content monitoring method and device, electronic equipment and storage medium
CN112634858A (en)*2020-12-162021-04-09平安科技(深圳)有限公司Speech synthesis method, speech synthesis device, computer equipment and storage medium
CN112786024A (en)*2020-12-282021-05-11华南理工大学Voice command recognition method under condition of no professional voice data in water treatment field
CN120089132A (en)*2025-01-222025-06-03广州中医药大学(广州中医药研究院) A real-time enhancement method and system for speech recognition based on words in a finite vocabulary

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6999932B1 (en) * | 2000-10-10 | 2006-02-14 | Intel Corporation | Language independent voice-based search system
CN101145289A (en) * | 2007-09-13 | 2008-03-19 | 上海交通大学 | Voice Answering System in Distance Education Environment Based on Agent Technology
CN101505328A (en) * | 2008-02-04 | 2009-08-12 | 台达电子工业股份有限公司 | Network data retrieval method and system applying voice recognition
CN101996195A (en) * | 2009-08-28 | 2011-03-30 | 中国移动通信集团公司 | Searching method and device of voice information in audio files and equipment

Cited By (52)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN103928024A (en)*2013-01-142014-07-16联想(北京)有限公司Voice query method and electronic equipment
CN103383699B (en)*2013-06-282016-11-09科大讯飞股份有限公司Character string retrieving method and system
CN103383699A (en)*2013-06-282013-11-06安徽科大讯飞信息科技股份有限公司Character string retrieval method and system
CN105206272A (en)*2015-09-062015-12-30上海智臻智能网络科技股份有限公司Voice transmission control method and system
CN105489220A (en)*2015-11-262016-04-13小米科技有限责任公司Method and device for recognizing speech
CN105489220B (en)*2015-11-262020-06-19北京小米移动软件有限公司 Speech recognition method and device
CN109074821B (en)*2016-04-222023-07-28索尼移动通讯有限公司Method and electronic device for editing media content
CN109074821A (en)*2016-04-222018-12-21索尼移动通讯有限公司Speech is to Text enhancement media editing
CN106297799A (en)*2016-08-092017-01-04乐视控股(北京)有限公司Voice recognition processing method and device
CN108630200A (en)*2017-03-172018-10-09株式会社东芝Voice keyword detection device and voice keyword detection method
CN108630200B (en)*2017-03-172022-01-07株式会社东芝Voice keyword detection device and voice keyword detection method
CN107239547B (en)*2017-06-052019-05-28北京儒博科技有限公司 Voice error correction method, terminal and storage medium for voice song request
CN107239547A (en)*2017-06-052017-10-10北京智能管家科技有限公司 Voice error correction method, terminal and storage medium for voice song ordering
CN107301866A (en)*2017-06-232017-10-27北京百度网讯科技有限公司Data inputting method
CN107301866B (en)*2017-06-232021-01-05北京百度网讯科技有限公司Information input method
CN109147760A (en)*2017-06-282019-01-04阿里巴巴集团控股有限公司Synthesize method, apparatus, system and the equipment of voice
CN109213970B (en)*2017-06-302022-07-29北京国双科技有限公司Method and device for generating notes
CN109213970A (en)*2017-06-302019-01-15北京国双科技有限公司Put down generation method and device
CN107451119A (en)*2017-07-262017-12-08上海智臻智能网络科技股份有限公司Method for recognizing semantics and device, storage medium, computer equipment based on interactive voice
CN109559753B (en)*2017-09-272022-04-12北京国双科技有限公司Speech recognition method and device
CN109559753A (en)*2017-09-272019-04-02北京国双科技有限公司Audio recognition method and device
CN109559752A (en)*2017-09-272019-04-02北京国双科技有限公司Audio recognition method and device
CN109559752B (en)*2017-09-272022-04-26北京国双科技有限公司Speech recognition method and device
CN108040185B (en)*2017-12-062019-11-19福建天晴数码有限公司A kind of method and apparatus identifying harassing call
CN108040185A (en)*2017-12-062018-05-15福建天晴数码有限公司A kind of method and apparatus for identifying harassing call
CN108257593A (en)*2017-12-292018-07-06深圳和而泰数据资源与云技术有限公司A kind of audio recognition method, device, electronic equipment and storage medium
CN108257593B (en)*2017-12-292020-11-13深圳和而泰数据资源与云技术有限公司Voice recognition method and device, electronic equipment and storage medium
CN108363735B (en)*2018-01-182021-10-01福建网龙计算机网络信息技术有限公司Method and terminal for identifying advertisement telephone
CN108363735A (en)*2018-01-182018-08-03福建网龙计算机网络信息技术有限公司A kind of advertisement telephone knows method for distinguishing and terminal
CN108509416A (en)*2018-03-202018-09-07京东方科技集团股份有限公司Sentence realizes other method and device, equipment and storage medium
CN110600005A (en)*2018-06-132019-12-20蔚来汽车有限公司Speech recognition error correction method and apparatus, computer device and recording medium
CN110600005B (en)*2018-06-132023-09-19蔚来(安徽)控股有限公司 Speech recognition error correction method and device, computer equipment and recording medium
CN109213994A (en)*2018-07-262019-01-15深圳市元征科技股份有限公司Information matching method and device
CN109741741A (en)*2018-12-292019-05-10深圳Tcl新技术有限公司Control method, intelligent terminal and the computer readable storage medium of intelligent terminal
CN109918502A (en)*2019-01-252019-06-21深圳壹账通智能科技有限公司 Documentation teaching method, apparatus, computer apparatus, and computer-readable storage medium
CN109976702A (en)*2019-03-202019-07-05青岛海信电器股份有限公司A kind of audio recognition method, device and terminal
CN110008471A (en)*2019-03-262019-07-12北京博瑞彤芸文化传播股份有限公司A kind of intelligent semantic matching process based on phonetic conversion
CN110162780A (en)*2019-04-082019-08-23深圳市金微蓝技术有限公司The recognition methods and device that user is intended to
CN110162780B (en)*2019-04-082023-05-09深圳市金微蓝技术有限公司User intention recognition method and device
CN111755026A (en)*2019-05-222020-10-09广东小天才科技有限公司 A kind of speech recognition method and system
CN110728137A (en)*2019-10-102020-01-24京东数字科技控股有限公司Method and device for word segmentation
CN110956859A (en)*2019-11-052020-04-03合肥成方信息技术有限公司VR intelligent voice interaction English method based on deep learning
CN111350249A (en)*2020-04-132020-06-30于巧宇Intelligent closestool device based on speech recognition
CN111611349A (en)*2020-05-262020-09-01深圳壹账通智能科技有限公司 Voice query method, device, computer equipment and storage medium
CN111737541B (en)*2020-06-302021-10-15湖北亿咖通科技有限公司Semantic recognition and evaluation method supporting multiple languages
CN111737541A (en)*2020-06-302020-10-02湖北亿咖通科技有限公司Semantic recognition and evaluation method supporting multiple languages
CN112331207A (en)*2020-09-302021-02-05音数汇元(上海)智能科技有限公司Service content monitoring method and device, electronic equipment and storage medium
CN112634858A (en)*2020-12-162021-04-09平安科技(深圳)有限公司Speech synthesis method, speech synthesis device, computer equipment and storage medium
CN112634858B (en)*2020-12-162024-01-23平安科技(深圳)有限公司Speech synthesis method, device, computer equipment and storage medium
CN112786024A (en)*2020-12-282021-05-11华南理工大学Voice command recognition method under condition of no professional voice data in water treatment field
CN112786024B (en)*2020-12-282022-05-24华南理工大学Voice command recognition method in water treatment field under condition of no professional voice data
CN120089132A (en)*2025-01-222025-06-03广州中医药大学(广州中医药研究院) A real-time enhancement method and system for speech recognition based on words in a finite vocabulary

Similar Documents

PublicationPublication DateTitle
CN102867512A (en)Method and device for recognizing natural speech
CN102867511A (en)Method and device for recognizing natural speech
CN102254557B (en)Navigation method and system based on natural voice identification
KR102417045B1 (en)Method and system for robust tagging of named entities
US10997370B2 (en)Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
CN106326303B (en)A kind of spoken semantic analysis system and method
CN110210029A (en)Speech text error correction method, system, equipment and medium based on vertical field
CN109492077A (en)The petrochemical field answering method and system of knowledge based map
CN104011712A (en)Evaluating query translations for cross-language query suggestion
CN102750949A (en)Voice recognition method and device
CN108287843A (en)A kind of method and apparatus and navigation equipment of interest point information retrieval
CN103365925A (en)Method for acquiring polyphone spelling, method for retrieving based on spelling, and corresponding devices
CN102479191A (en)Method and device for providing multi-granularity word segmentation result
CN107665217A (en)A kind of vocabulary processing method and system for searching service
CN105096944B (en)Audio recognition method and device
CN105574173A (en)Commodity searching method and commodity searching device based on voice recognition
CN110909116B (en)Entity set expansion method and system for social media
CN108038099B (en) A low-frequency keyword recognition method based on word clustering
CN114238595A (en) A method and system for question answering of metallurgical knowledge based on knowledge graph
CN102322866A (en)Navigation method and system based on natural speech recognition
CN109165331A (en)A kind of index establishing method and its querying method and device of English place name
CN102347026B (en)Audio/video on demand method and system based on natural voice recognition
CN113806483A (en) Data processing method, apparatus, electronic device and computer program product
CN102385597B (en)The fault-tolerant searching method of a kind of POI
CN105808737B (en)Information retrieval method and server

Legal Events

Date | Code | Title | Description
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C02 | Deemed withdrawal of patent application after publication (patent law 2001)
WD01 | Invention patent application deemed withdrawn after publication

Application publication date:20130109

