CN1979638A - Method for correcting error of voice identification result - Google Patents

Method for correcting error of voice identification result

Info

Publication number
CN1979638A
Authority
CN
China
Prior art keywords
error correction
template
recognition result
degree
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2005101274476A
Other languages
Chinese (zh)
Inventor
王晓瑞
江杰
王士进
丁鹏
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CNA2005101274476A
Publication of CN1979638A
Legal status: Pending

Abstract

The invention relates to the field of speech recognition, and in particular to a method for correcting speech recognition results using an error-correction knowledge base. The basic features of the method are: 1. continuous language fragments in a corpus serve as error-correction templates, and the corpus is used to build an error-correction template library; 2. the template library is indexed, and search techniques are used to find templates quickly; 3. according to the error-correction pattern, confidence is used to split the recognition result into short recognition fragments, and the reliable part of each fragment is submitted to the error-correction template system for fast lookup, yielding template candidates that are highly relevant to the fragment; and 4. an acoustic confusion matrix is used to select, from the candidates, a template whose acoustic characteristics are close to those of the recognition fragment, and that template replaces the fragment.

Description

Translated from Chinese
A method for correcting errors in speech recognition results

Technical field

The invention relates to the field of speech recognition, and in particular to a method for correcting errors in speech recognition results.

Background

At present, most speech recognition systems use an N-gram language model. This model rests on an imperfect independence assumption, namely that the current word depends only on the N-1 words preceding it. Because the model reasons under uncertainty from those N-1 preceding words alone, recognition results often contain meaningless sentences or fragments.
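Written out, the independence assumption described above is the standard N-gram factorization. The symbols $w_t$ (the t-th word) and $T$ (the sentence length) are notation introduced here for illustration and do not appear in the original text.

```latex
P(w_1, \dots, w_T) \approx \prod_{t=1}^{T} P\left(w_t \mid w_{t-N+1}, \dots, w_{t-1}\right)
```

Any dependency that reaches back further than N-1 words is invisible to such a model, which is how locally plausible but globally meaningless word sequences can survive decoding.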

Summary of the invention

The invention proposes a method for correcting errors in speech recognition results that uses variable-length error-correction templates and corrects the recognition result according to confidence and acoustic confusion. The invention can be used in large-vocabulary continuous speech recognition systems. Its main features are as follows. First, continuous language fragments in a corpus serve as error-correction templates, and the corpus is used to build an error-correction template library. Second, the template library is indexed, and fast search techniques are used to look up templates quickly. Third, according to the error-correction pattern, confidence is used to split the recognition result into short recognition fragments, and the reliable part of each fragment is submitted to the error-correction template system for fast lookup, yielding template candidates that are highly relevant to the fragment. Fourth, an acoustic confusion matrix is used to select, from the candidates, a template whose acoustic characteristics are close to those of the recognition fragment, and that template replaces the fragment.

Technical solution

A method for correcting errors in speech recognition results comprises the following steps:

1) The recognition system performs recognition and confidence calculation on the input speech, obtaining a recognition result with confidence scores;

2) According to the error-correction pattern, the recognition result is split into small recognition fragments according to confidence;

3) The resulting recognition fragments are input to the error-correction template retrieval system, which returns a candidate list of templates highly relevant to each fragment;

4) The acoustic confusion degree between the recognition fragment and each template in the candidate list is calculated, and the template with the highest acoustic similarity is selected; when the similarity between the fragment and that template exceeds a reliable threshold, the template replaces the fragment;

5) The corrected fragments are merged to obtain the corrected recognition result.

The method further comprises the step of calculating confidence while performing recognition on the input speech, obtaining a recognition result with confidence scores.

The method further comprises: when splitting the recognition result into small fragments according to confidence, first setting the confidence threshold CM-threshold and the maximum template length in characters, max-var-length. A recognition result whose confidence is above CM-threshold is considered reliable, and the number of reliable characters in a fragment must not exceed max-var-length.

The method further comprises the step of dividing the recognition result into blocks, where consecutive characters whose confidence is all above, or all below, CM-threshold form one block. The recognition result is thus expressed as one or more (A, x, B) patterns, where A and B are blocks with confidence above CM-threshold, x is a block with confidence below CM-threshold, and at most one of A and B is empty.

The method further comprises the step of, for every low-confidence block x in the recognition result: if the length of A is greater than or equal to max-var-length, taking the part of A adjacent to x, of length max-var-length, as sub-A and forming the fragment (sub-A, x) with x. sub-A is used to search the template library; the length of sub-A is not fixed.

The method further comprises the step of, for every low-confidence block x in the recognition result: if the length of A is less than max-var-length, taking the part of B adjacent to x as sub-B and forming the fragment (A, x, sub-B) with A and x. A and sub-B are used to search the template library. At most one of A and sub-B may be empty; the combined length of A and sub-B must not exceed max-var-length, and the lengths of A and sub-B are not fixed.

The method further comprises the step of, after the recognition result has been split into fragments, quickly looking up the reliable part of each fragment in the template library to obtain one or more templates highly relevant to the fragment.

In the method, the error-correction template retrieval system has two parts: the first builds the template index, and the second searches the templates.

The basic principle of the first part is to take every continuous language fragment of 6 to 12 characters in the corpus as an error-correction template. All templates are first extracted from the corpus, and the template library is then indexed using an inverted file as the index structure. To reduce its size, the inverted file is compressed.

The basic principle of the second part is that, at query time, the reliable part is first converted into a Boolean query, which is then searched quickly in the index. Because speech recognition results are sequential and local, the Boolean query must also carry an ordering requirement on the reliable part and a locality requirement between words.

The method further comprises the step of, for all results returned by the template search, selecting the best template by the acoustic confusion degree between template and recognition fragment. For the recognition fragment A and each template $T_i$ in the candidate list, the confusion degree $C(A, T_i)$ is calculated. When the maximum, $\max C(A, T_i)$, exceeds a reliable threshold, that template replaces the recognition fragment; if $\max C(A, T_i)$ is below the threshold, the fragment is kept.

In the method, the calculation of the acoustic confusion degree between template and recognition fragment has three parts: the first gathers statistics on recognition confusions among Chinese initials and finals; the second computes posterior probabilities for those confusions; the third is a fuzzy whole-fragment match between the recognition fragment and the template.

The basic principle of the first part is to run recognition on a speech database and obtain the confusions among all initials and among all finals as follows. Assume that initials and finals are never confused with each other. If the recognition result for a sample is the pinyin string $C'_1 V'_1 C'_2 V'_2 \dots C'_m V'_m$, it is dynamically aligned with the correct string $C_1 V_1 C_2 V_2 \dots C_n V_n$ so that the number of matching pinyin units is maximized. This yields a large number of pinyin pairs, namely $(C'_1, C_1), (V'_1, V_1), \dots, (C'_m, C_n), (V'_m, V_n)$. Counting the occurrences of these pairs gives, for each initial, its total number of samples and the number of times it was recognized as each other initial, and likewise, for each final, its total number of samples and the number of times it was recognized as each other final.

The basic principle of the second part is to use the statistics from the first part to compute the probability that each initial is recognized as another initial. The confusion probability of $C_i$ being recognized as $C_j$ is

$$P(C_j \mid C_i) = \frac{\Sigma(C_i, C_j)}{|C_i|}$$

where $\Sigma(C_i, C_j)$ is the number of times $C_i$ was recognized as $C_j$ and $|C_i|$ is the total number of samples of $C_i$.

When the recognition result is $C_i$, the posterior probability that the correct result is $C_j$ is

$$\tilde{P}(C_j \mid C_i) = \frac{P(C_i \mid C_j)\,P(C_j)}{\sum_{k} P(C_i \mid C_k)\,P(C_k)}$$

where $P(C_j) = |C_j| / \sum_{i} |C_i|$ and $\sum_{i} |C_i|$ is the total number of samples of all initials. Finals are handled in the same way as initials.

The basic principle of the third part is as follows. Let the pinyin string of the recognition fragment A be $C'_1 V'_1 C'_2 V'_2 \dots C'_m V'_m$, and let the pinyin string of the i-th template $T_i$ in the candidate list be $C_1 V_1 C_2 V_2 \dots C_n V_n$. The acoustic confusion degree $C(A, T_i)$ between A and $T_i$ is defined by finding an alignment $(1, i_1), (2, i_2), \dots, (k, i_k), \dots, (m, i_m)$ such that

$$\tilde{P}(T_i \mid A) = \prod_{k} \tilde{P}(C_k \mid C_{i_k})\,\tilde{P}(V_k \mid V_{i_k})$$

attains its maximum; this maximum is defined as the acoustic confusion degree between A and $T_i$.

In practice, the posterior probabilities are first converted to logarithms, and the problem becomes one of finding the alignment such that

$$\log \tilde{P}(T_i \mid A) = \sum_{k} \log \tilde{P}(C_k \mid C_{i_k}) + \sum_{k} \log \tilde{P}(V_k \mid V_{i_k})$$

attains its maximum; this maximum is then used as the logarithmic acoustic confusion degree between A and $T_i$.

Detailed description

The invention has three main modules: splitting the recognition result by confidence, obtaining the candidate list of error-correction templates, and calculating the acoustic confusion degree between recognition fragment and template. Each is described in detail below.

Splitting the recognition result by confidence. First set the confidence threshold CM-threshold and the maximum template length in characters, max-var-length. A recognition result whose confidence is above CM-threshold is considered reliable. The recognition result is then split as follows:

1. Divide the recognition result into blocks, where consecutive characters whose confidence is all above, or all below, CM-threshold form one block. The recognition result is thus expressed as one or more (A, x, B) structures, where A and B are blocks with confidence above CM-threshold, x is a block with confidence below CM-threshold, and at most one of A and B is empty.

2. For every low-confidence block x in the recognition result:

a) If the length of A is greater than or equal to max-var-length, take the part of A adjacent to x, of length max-var-length, as sub-A and form the fragment (sub-A, x) with x. sub-A is used to search the template library; the length of sub-A is not fixed.

b) If the length of A is less than max-var-length, take the part of B adjacent to x as sub-B and form the fragment (A, x, sub-B) with A and x. A and sub-B are used to search the template library. At most one of A and sub-B may be empty; the combined length of A and sub-B must not exceed max-var-length, and the lengths of A and sub-B are not fixed.
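The two splitting steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `split_fragments` and its arguments are hypothetical, a character counts as reliable only when its confidence is strictly above the threshold, sub-A is taken as the last `max_var_length` characters of A, and sub-B is taken as long as the length budget allows.

```python
from itertools import groupby


def split_fragments(chars, confs, cm_threshold, max_var_length):
    """Return one (left, x, right) fragment per low-confidence block x.

    chars: recognized text (a string); confs: one confidence per character.
    All names here are illustrative, not taken from the patent.
    """
    # Step 1: group consecutive characters on the same side of the threshold.
    blocks, pos = [], 0
    for reliable, run in groupby(c > cm_threshold for c in confs):
        n = len(list(run))
        blocks.append((reliable, chars[pos:pos + n]))
        pos += n

    # Step 2: attach reliable context to every low-confidence block.
    fragments = []
    for i, (reliable, x) in enumerate(blocks):
        if reliable:
            continue
        a = blocks[i - 1][1] if i > 0 else ""                # block A (may be empty)
        b = blocks[i + 1][1] if i + 1 < len(blocks) else ""  # block B (may be empty)
        if len(a) >= max_var_length:
            # Case a): A alone fills the budget, so use its tail as sub-A.
            fragments.append((a[-max_var_length:], x, ""))
        else:
            # Case b): keep all of A and spend the remaining budget on sub-B.
            fragments.append((a, x, b[:max_var_length - len(a)]))
    return fragments
```

For example, with the text `"abcdefgh"`, confidences `[0.9, 0.9, 0.9, 0.2, 0.2, 0.9, 0.9, 0.9]` and a threshold of 0.5, a `max_var_length` of 2 gives the fragment `("bc", "de", "")`, while a `max_var_length` of 5 gives `("abc", "de", "fg")`.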

Obtaining the candidate list of error-correction templates. After the recognition result has been split into fragments, the reliable part of each fragment is submitted to the error-correction template retrieval system, which returns template candidates highly relevant to the fragment. The retrieval system has two parts: the first builds the template index, and the second performs fast search over the template library.

The basic principle of the first part is to take every continuous language fragment of 6 to 12 characters in the corpus as an error-correction template. All templates are first extracted from the corpus, and the template library is then indexed using an inverted file as the index structure. To reduce its size, the inverted file is compressed.
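A small Python sketch of template extraction and indexing is given below. The text does not say which unit is indexed or which compression scheme is used, so indexing single characters and gap-encoding the posting lists are assumptions made for illustration; the function names are hypothetical.

```python
from collections import defaultdict


def extract_templates(sentences, min_len=6, max_len=12):
    """Collect every contiguous substring of min_len..max_len characters."""
    templates = set()
    for s in sentences:
        for length in range(min_len, max_len + 1):
            for start in range(len(s) - length + 1):
                templates.add(s[start:start + length])
    return sorted(templates)  # the position in this list is the template id


def build_inverted_index(templates):
    """Map each character to the ascending list of template ids containing it."""
    postings = defaultdict(list)
    for tid, template in enumerate(templates):
        for ch in set(template):
            postings[ch].append(tid)  # ids arrive in ascending order
    return dict(postings)


def gap_encode(posting):
    """Store a non-empty ascending posting list as its first id plus the gaps.

    Small gaps need fewer bits than raw ids; this stands in for the
    unspecified compression step and is an assumption of this sketch.
    """
    return [posting[0]] + [b - a for a, b in zip(posting, posting[1:])]
```

On the single sentence `"abcdefg"` this yields the templates `abcdef`, `abcdefg` and `bcdefg`, and the posting list for `a` is `[0, 1]`.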

The basic principle of the second part is to first convert the reliable part of the fragment into a Boolean query and then retrieve quickly from the index. Because speech recognition results are sequential and local, the Boolean query must also carry an ordering requirement on the reliable part of the fragment and a locality requirement between words.
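One way to read this query step is sketched below in Python: a Boolean AND over the posting lists narrows the candidates, and a second pass enforces order and locality. The `max_gap` parameter, the character-level index, and both function names are assumptions of this sketch; the text does not define how the ordering and locality requirements are expressed.

```python
def occurs_in_order(template, query, max_gap):
    """True if the query characters occur in the template in the same order,
    with consecutive matches at most max_gap positions apart."""
    # Positions where the match could currently end.
    reachable = {i for i, ch in enumerate(template) if ch == query[0]}
    for ch in query[1:]:
        reachable = {
            i for i, c in enumerate(template)
            if c == ch and any(0 < i - j <= max_gap for j in reachable)
        }
        if not reachable:
            return False
    return bool(reachable)


def search(templates, index, query, max_gap=3):
    """Boolean AND over posting lists, then the order and locality filter."""
    if not query:
        return []
    postings = [set(index.get(ch, ())) for ch in query]
    candidates = set.intersection(*postings)
    return [templates[t] for t in sorted(candidates)
            if occurs_in_order(templates[t], query, max_gap)]


# Tiny hand-built example: character -> ids of the templates containing it.
templates = ["abcdef", "fedcba", "axbxcx"]
index = {}
for tid, t in enumerate(templates):
    for ch in set(t):
        index.setdefault(ch, []).append(tid)
```

With this toy index, the query `"abc"` matches only `abcdef` when `max_gap` is 1, and also `axbxcx` when `max_gap` is 2; `fedcba` contains all three characters but in the wrong order, so the Boolean AND admits it and the ordering filter rejects it.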

Calculating the acoustic confusion degree. The template search usually returns one or more candidates, and the best template is then selected by the acoustic confusion degree between template and recognition fragment. For the recognition fragment A and each template $T_i$ in the candidate list, the confusion degree $C(A, T_i)$ is calculated. When the maximum, $\max C(A, T_i)$, exceeds a reliable threshold, that template replaces the recognition fragment; if $\max C(A, T_i)$ is below the threshold, the fragment is kept.
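The decision rule can be written in a few lines of Python. The scoring function is passed in as a parameter because its definition comes later in the text; `choose_replacement` and the toy `overlap` score are hypothetical names used only for this sketch, and a score exactly equal to the threshold is treated as "keep the fragment", a case the text leaves open.

```python
def choose_replacement(fragment, candidates, confusion, threshold):
    """Return the best-scoring template if it clears the threshold,
    otherwise return the original fragment unchanged."""
    scored = [(confusion(fragment, t), t) for t in candidates]
    if not scored:
        return fragment
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score > threshold else fragment


def overlap(a, b):
    """Toy stand-in for C(A, T_i): share of positions with equal characters."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))
```

Returning the untouched fragment when no candidate clears the threshold keeps the correction step conservative: a weak match never overwrites what the recognizer produced.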

The calculation of the confusion degree has three parts: the first gathers statistics on recognition confusions among Chinese initials and finals; the second computes posterior probabilities for those confusions; the third is a fuzzy whole-fragment match between the recognition fragment and the template.

The basic principle of the first part is to run recognition on a speech database and obtain the confusions among all initials and among all finals as follows. Assume that initials and finals are never confused with each other. If the recognition result for a sample is the pinyin string $C'_1 V'_1 C'_2 V'_2 \dots C'_m V'_m$, it is dynamically aligned with the correct string $C_1 V_1 C_2 V_2 \dots C_n V_n$ so that the number of matching pinyin units is maximized. This yields a large number of pinyin pairs, namely $(C'_1, C_1), (V'_1, V_1), \dots, (C'_m, C_n), (V'_m, V_n)$. Counting the occurrences of these pairs gives, for each initial, its total number of samples and the number of times it was recognized as each other initial, and likewise, for each final, its total number of samples and the number of times it was recognized as each other final.
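Once the aligned pairs are available, the counting step is straightforward. The Python sketch below assumes the alignment has already been done and takes the pairs as input in the order used in the text, (recognized, correct); the function name and the dictionary layout are illustrative choices, not part of the original.

```python
from collections import Counter


def confusion_statistics(pairs):
    """Count confusions from aligned (recognized, correct) unit pairs.

    Returns (probs, totals): probs[(correct, recognized)] is the share of
    samples of `correct` that were recognized as `recognized`, and
    totals[correct] is the number of samples of `correct`.
    """
    pair_counts, totals = Counter(), Counter()
    for recognized, correct in pairs:
        pair_counts[(correct, recognized)] += 1
        totals[correct] += 1
    probs = {key: n / totals[key[0]] for key, n in pair_counts.items()}
    return probs, totals
```

The same function serves for initials and for finals, since the text assumes the two classes are never confused with each other; each class is simply counted from its own list of pairs.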

The basic principle of the second part is to use the statistics from the first part to compute the probability that each initial is recognized as another initial. The confusion probability of $C_i$ being recognized as $C_j$ is

$$P(C_j \mid C_i) = \frac{\Sigma(C_i, C_j)}{|C_i|}$$

where $\Sigma(C_i, C_j)$ is the number of times $C_i$ was recognized as $C_j$ and $|C_i|$ is the total number of samples of $C_i$.

When the recognition result is $C_i$, the posterior probability that the correct result is $C_j$ is

$$\tilde{P}(C_j \mid C_i) = \frac{P(C_i \mid C_j)\,P(C_j)}{\sum_{k} P(C_i \mid C_k)\,P(C_k)}$$

where $P(C_j) = |C_j| / \sum_{i} |C_i|$ and $\sum_{i} |C_i|$ is the total number of samples of all initials.
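The posterior is an application of Bayes' rule to the confusion statistics, and can be sketched in Python as below. The inputs are assumed to be a dictionary `confusion[(correct, recognized)]` holding the confusion probabilities and a dictionary `totals[correct]` holding the sample counts; these names and layouts are illustrative assumptions.

```python
def posteriors(confusion, totals):
    """Posterior that `correct` was spoken given that `recognized` was output.

    confusion[(correct, recognized)]: probability that `correct` is
    recognized as `recognized`; totals[correct]: number of samples.
    Returns post[(correct, recognized)].
    """
    n = sum(totals.values())
    prior = {unit: count / n for unit, count in totals.items()}

    # Denominator: total probability of observing each recognized unit.
    evidence = {}
    for (correct, recognized), p in confusion.items():
        evidence[recognized] = evidence.get(recognized, 0.0) + p * prior[correct]

    return {
        (correct, recognized): p * prior[correct] / evidence[recognized]
        for (correct, recognized), p in confusion.items()
    }
```

As a sanity check, suppose `zh` has 3 samples, of which 1 is recognized as `z`, and `z` has 1 sample that is recognized correctly. Then an output of `z` is equally likely to have come from either unit, so both posteriors are 0.5.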

The basic principle of the third part is as follows. Let the pinyin string of the recognition fragment A be $C'_1 V'_1 C'_2 V'_2 \dots C'_m V'_m$, and let the pinyin string of the i-th template $T_i$ in the candidate list be $C_1 V_1 C_2 V_2 \dots C_n V_n$. The acoustic confusion degree $C(A, T_i)$ between A and $T_i$ is defined by finding an alignment $(1, i_1), (2, i_2), \dots, (k, i_k), \dots, (m, i_m)$ such that

$$\tilde{P}(T_i \mid A) = \prod_{k} \tilde{P}(C_k \mid C_{i_k})\,\tilde{P}(V_k \mid V_{i_k}) \tag{1}$$

attains its maximum; this maximum is defined as the acoustic confusion degree between A and $T_i$.

In practice, the posterior probabilities are first converted to logarithms, and the problem becomes one of finding the alignment such that

$$\log \tilde{P}(T_i \mid A) = \sum_{k} \log \tilde{P}(C_k \mid C_{i_k}) + \sum_{k} \log \tilde{P}(V_k \mid V_{i_k})$$

attains its maximum; this maximum is then used as the logarithmic acoustic confusion degree between A and $T_i$.
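The maximization over alignments lends itself to dynamic programming. The Python sketch below rests on several assumptions that the text does not state: the alignment is strictly increasing (each fragment syllable maps to a distinct, later template syllable), each posterior is read as "template unit is correct given that the fragment unit was recognized", and pairs missing from the tables receive a fixed floor log-probability. All names are hypothetical.

```python
import math


def log_confusion(frag, tmpl, log_post_c, log_post_v, floor=math.log(1e-6)):
    """Best log score over strictly increasing alignments of frag into tmpl.

    frag, tmpl: lists of (initial, final) syllables.
    log_post_c / log_post_v: dicts keyed by (template_unit, fragment_unit)
    holding log posteriors for initials and finals respectively.
    """
    m, n = len(frag), len(tmpl)
    neg_inf = float("-inf")
    if m > n:
        return neg_inf  # no strictly increasing alignment exists

    # best[k][j]: best score for the first k fragment syllables
    # aligned within the first j template syllables.
    best = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        best[0][j] = 0.0
    for k in range(1, m + 1):
        c_rec, v_rec = frag[k - 1]
        for j in range(k, n + 1):
            c_tpl, v_tpl = tmpl[j - 1]
            pair = (log_post_c.get((c_tpl, c_rec), floor)
                    + log_post_v.get((v_tpl, v_rec), floor))
            use = best[k - 1][j - 1] + pair  # align syllable k with syllable j
            skip = best[k][j - 1]            # leave template syllable j unused
            best[k][j] = max(use, skip)
    return best[m][n]
```

Working in the log domain turns the product in equation (1) into a sum, which avoids numerical underflow for long fragments and lets the dynamic program simply add terms as it extends an alignment.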

Claims (16)

13. The method for correcting errors in speech recognition results according to claim 1 or 12, characterized in that the basic principle of the first part is to run recognition on a speech database and obtain the confusions among all initials and among all finals as follows: assuming that initials and finals are never confused with each other, if the recognition result of a sample is the pinyin string $C'_1 V'_1 C'_2 V'_2 \dots C'_m V'_m$, it is dynamically aligned with the correct string $C_1 V_1 C_2 V_2 \dots C_n V_n$ so that the number of matching pinyin units is maximized, yielding a large number of pinyin pairs, namely $(C'_1, C_1), (V'_1, V_1), \dots, (C'_m, C_n), (V'_m, V_n)$; the occurrences of these pairs are counted to obtain, for each initial, its total number of samples and the number of times it was recognized as each other initial, and, for each final, its total number of samples and the number of times it was recognized as each other final.
CNA2005101274476A | 2005-12-02 | 2005-12-02 | Method for correcting error of voice identification result | Pending | CN1979638A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CNA2005101274476A, CN1979638A (en) | 2005-12-02 | 2005-12-02 | Method for correcting error of voice identification result

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CNA2005101274476A, CN1979638A (en) | 2005-12-02 | 2005-12-02 | Method for correcting error of voice identification result

Publications (1)

Publication Number | Publication Date
CN1979638A, CN1979638A (en) | 2007-06-13

Family

ID=38130774

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CNA2005101274476A, Pending, CN1979638A (en) | Method for correcting error of voice identification result | 2005-12-02 | 2005-12-02

Country Status (1)

Country | Link
CN (1) | CN1979638A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101894549A (en) * | 2010-06-24 | 2010-11-24 | 中国科学院声学研究所 | Method for fast calculating confidence level in speech recognition application field
CN101452701B (en) * | 2007-12-05 | 2011-09-07 | 株式会社东芝 | Confidence degree estimation method and device based on inverse model
CN102831177A (en) * | 2012-07-31 | 2012-12-19 | 聚熵信息技术(上海)有限公司 | Statement error correction method and system
CN103000176A (en) * | 2012-12-28 | 2013-03-27 | 安徽科大讯飞信息科技股份有限公司 | Speech recognition method and system
CN103021412A (en) * | 2012-12-28 | 2013-04-03 | 安徽科大讯飞信息科技股份有限公司 | Voice recognition method and system
CN105244029A (en) * | 2015-08-28 | 2016-01-13 | 科大讯飞股份有限公司 | Voice recognition post-processing method and system
CN105632499A (en) * | 2014-10-31 | 2016-06-01 | 株式会社东芝 | Method and device for optimizing voice recognition result
CN106340293A (en) * | 2015-07-06 | 2017-01-18 | 无锡天脉聚源传媒科技有限公司 | Method and device for adjusting audio data recognition result
CN106534548A (en) * | 2016-11-17 | 2017-03-22 | 科大讯飞股份有限公司 | Voice error correction method and device
CN106683677A (en) * | 2015-11-06 | 2017-05-17 | 阿里巴巴集团控股有限公司 | Method and device for recognizing voice
CN106710592A (en) * | 2016-12-29 | 2017-05-24 | 北京奇虎科技有限公司 | Speech recognition error correction method and speech recognition error correction device used for intelligent hardware equipment
CN106875943A (en) * | 2017-01-22 | 2017-06-20 | 上海云信留客信息科技有限公司 | A kind of speech recognition system for big data analysis
CN107045496A (en) * | 2017-04-19 | 2017-08-15 | 畅捷通信息技术股份有限公司 | The error correction method and error correction device of text after speech recognition
CN108052498A (en) * | 2010-01-05 | 2018-05-18 | 谷歌有限责任公司 | The words grade of phonetic entry is corrected
CN109922371A (en) * | 2019-03-11 | 2019-06-21 | 青岛海信电器股份有限公司 | Natural language processing method, equipment and storage medium
CN109920432A (en) * | 2019-03-05 | 2019-06-21 | 百度在线网络技术(北京)有限公司 | A kind of audio recognition method, device, equipment and storage medium
CN110428822A (en) * | 2019-08-05 | 2019-11-08 | 重庆电子工程职业学院 | A kind of speech recognition error correction method and interactive system
CN110797026A (en) * | 2019-09-17 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Voice recognition method, device and storage medium
CN111192586A (en) * | 2020-01-08 | 2020-05-22 | 北京松果电子有限公司 | Voice recognition method and device, electronic equipment and storage medium
CN112784581A (en) * | 2020-11-20 | 2021-05-11 | 网易(杭州)网络有限公司 | Text error correction method, device, medium and electronic equipment
CN113407694A (en) * | 2018-07-19 | 2021-09-17 | 深圳追一科技有限公司 | Customer service robot knowledge base ambiguity detection method, device and related equipment
CN114530145A (en) * | 2020-11-23 | 2022-05-24 | 中移互联网有限公司 | Speech recognition result error correction method and device, and computer readable storage medium

Cited By (37)

* Cited by examiner, † Cited by third party
| Publication number | Priority date | Publication date | Assignee | Title |
| CN101452701B (en)* | 2007-12-05 | 2011-09-07 | 株式会社东芝 | Confidence degree estimation method and device based on inverse model |
| CN108052498A (en)* | 2010-01-05 | 2018-05-18 | 谷歌有限责任公司 | The words grade of phonetic entry is corrected |
| CN101894549A (en)* | 2010-06-24 | 2010-11-24 | 中国科学院声学研究所 | Method for fast calculating confidence level in speech recognition application field |
| CN102831177B (en)* | 2012-07-31 | 2015-09-02 | 聚熵信息技术(上海)有限公司 | Statement error correction and system thereof |
| CN102831177A (en)* | 2012-07-31 | 2012-12-19 | 聚熵信息技术(上海)有限公司 | Statement error correction method and system |
| CN103021412B (en)* | 2012-12-28 | 2014-12-10 | 安徽科大讯飞信息科技股份有限公司 | Voice recognition method and system |
| CN103021412A (en)* | 2012-12-28 | 2013-04-03 | 安徽科大讯飞信息科技股份有限公司 | Voice recognition method and system |
| CN103000176B (en)* | 2012-12-28 | 2014-12-10 | 安徽科大讯飞信息科技股份有限公司 | Speech recognition method and system |
| CN103000176A (en)* | 2012-12-28 | 2013-03-27 | 安徽科大讯飞信息科技股份有限公司 | Speech recognition method and system |
| CN105632499B (en)* | 2014-10-31 | 2019-12-10 | 株式会社东芝 | Method and apparatus for optimizing speech recognition results |
| CN105632499A (en)* | 2014-10-31 | 2016-06-01 | 株式会社东芝 | Method and device for optimizing voice recognition result |
| CN106340293B (en)* | 2015-07-06 | 2019-11-29 | 无锡天脉聚源传媒科技有限公司 | A kind of method of adjustment and device of audio data recognition result |
| CN106340293A (en)* | 2015-07-06 | 2017-01-18 | 无锡天脉聚源传媒科技有限公司 | Method and device for adjusting audio data recognition result |
| CN105244029A (en)* | 2015-08-28 | 2016-01-13 | 科大讯飞股份有限公司 | Voice recognition post-processing method and system |
| CN105244029B (en)* | 2015-08-28 | 2019-02-26 | 安徽科大讯飞医疗信息技术有限公司 | Voice recognition post-processing method and system |
| CN106683677A (en)* | 2015-11-06 | 2017-05-17 | 阿里巴巴集团控股有限公司 | Method and device for recognizing voice |
| US11664020B2 (en) | 2015-11-06 | 2023-05-30 | Alibaba Group Holding Limited | Speech recognition method and apparatus |
| CN106534548A (en)* | 2016-11-17 | 2017-03-22 | 科大讯飞股份有限公司 | Voice error correction method and device |
| CN106534548B (en)* | 2016-11-17 | 2020-06-12 | 科大讯飞股份有限公司 | Voice error correction method and device |
| CN106710592A (en)* | 2016-12-29 | 2017-05-24 | 北京奇虎科技有限公司 | Speech recognition error correction method and speech recognition error correction device used for intelligent hardware equipment |
| CN106875943A (en)* | 2017-01-22 | 2017-06-20 | 上海云信留客信息科技有限公司 | A kind of speech recognition system for big data analysis |
| CN107045496A (en)* | 2017-04-19 | 2017-08-15 | 畅捷通信息技术股份有限公司 | The error correction method and error correction device of text after speech recognition |
| CN107045496B (en)* | 2017-04-19 | 2021-01-05 | 畅捷通信息技术股份有限公司 | Error correction method and error correction device for text after voice recognition |
| CN113407694A (en)* | 2018-07-19 | 2021-09-17 | 深圳追一科技有限公司 | Customer service robot knowledge base ambiguity detection method, device and related equipment |
| US11264034B2 (en) | 2019-03-05 | 2022-03-01 | Baidu Online Network Technology (Beijing) Co., Ltd | Voice identification method, device, apparatus, and storage medium |
| CN109920432A (en)* | 2019-03-05 | 2019-06-21 | 百度在线网络技术(北京)有限公司 | A kind of audio recognition method, device, equipment and storage medium |
| CN109922371A (en)* | 2019-03-11 | 2019-06-21 | 青岛海信电器股份有限公司 | Natural language processing method, equipment and storage medium |
| CN109922371B (en)* | 2019-03-11 | 2021-07-09 | 海信视像科技股份有限公司 | Natural language processing method, apparatus and storage medium |
| CN110428822B (en)* | 2019-08-05 | 2022-05-03 | 重庆电子工程职业学院 | Voice recognition error correction method and man-machine conversation system |
| CN110428822A (en)* | 2019-08-05 | 2019-11-08 | 重庆电子工程职业学院 | A kind of speech recognition error correction method and interactive system |
| CN110797026A (en)* | 2019-09-17 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Voice recognition method, device and storage medium |
| CN110797026B (en)* | 2019-09-17 | 2024-11-26 | 腾讯科技(深圳)有限公司 | A speech recognition method, device and storage medium |
| CN111192586A (en)* | 2020-01-08 | 2020-05-22 | 北京松果电子有限公司 | Voice recognition method and device, electronic equipment and storage medium |
| CN112784581A (en)* | 2020-11-20 | 2021-05-11 | 网易(杭州)网络有限公司 | Text error correction method, device, medium and electronic equipment |
| CN112784581B (en)* | 2020-11-20 | 2024-02-13 | 网易(杭州)网络有限公司 | Text error correction method, device, medium and electronic equipment |
| CN114530145A (en)* | 2020-11-23 | 2022-05-24 | 中移互联网有限公司 | Speech recognition result error correction method and device, and computer readable storage medium |
| CN114530145B (en)* | 2020-11-23 | 2023-08-15 | 中移互联网有限公司 | Speech recognition result error correction method and device and computer readable storage medium |

Similar Documents

| Publication | Publication Date | Title |
| CN1979638A (en) | | Method for correcting error of voice identification result |
| CN101510222B (en) | | Multilayer index voice document searching method |
| US6877001B2 (en) | | Method and system for retrieving documents with spoken queries |
| CN100440150C (en) | | Machine translation system based on examples |
| Xu et al. | | Minimum bayes risk decoding and system combination based on a recursion for edit distance |
| JP5449521B2 (en) | | Search device and search program |
| US20070179784A1 (en) | | Dynamic match lattice spotting for indexing speech content |
| CN105404621B (en) | | A kind of method and system that Chinese character is read for blind person |
| CN112395385B (en) | | Text generation method and device based on artificial intelligence, computer equipment and medium |
| US20030204399A1 (en) | | Key word and key phrase based speech recognizer for information retrieval systems |
| CN105957518A (en) | | Mongolian large vocabulary continuous speech recognition method |
| CN87106964A (en) | | language translation system |
| WO2007056029A1 (en) | | Speech index pruning |
| JP2002063199A (en) | | Index method and apparatus |
| CN109948144A (en) | | A method of intelligent processing of teachers' discourse based on classroom teaching situation |
| CN112232055A (en) | | A Text Detection and Correction Method Based on Pinyin Similarity and Language Model |
| CN111160014A (en) | | Intelligent word segmentation method |
| WO2010044123A1 (en) | | Search device, search index creating device, and search system |
| US7603272B1 (en) | | System and method of word graph matrix decomposition |
| CN102999533A (en) | | Textspeak identification method and system |
| Fusayasu et al. | | Word-Error Correction of Continuous Speech Recognition Based on Normalized Relevance Distance. |
| Wang et al. | | Improving handwritten Chinese text recognition by unsupervised language model adaptation |
| Palmer et al. | | Improving out-of-vocabulary name resolution |
| CN111583915B (en) | | Optimization method, optimization device, optimization computer device and optimization storage medium for n-gram language model |
| Liang et al. | | An efficient error correction interface for speech recognition on mobile touchscreen devices |

Legal Events

| Date | Code | Title | Description |
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C12 | Rejection of a patent application after its publication | |
| | RJ01 | Rejection of invention patent application after publication | |

Open date:20070613

