CN1979638A

Info

Publication number: CN1979638A
Application number: CNA2005101274476A
Authority: CN
Inventors: 王晓瑞; 江杰; 王士进; 丁鹏; 徐波
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2005-12-02
Filing date: 2005-12-02
Publication date: 2007-06-13

Abstract

The invention relates to the technical field of speech recognition, and in particular to a method for correcting speech recognition results using an error correction knowledge base. The basic features of the method are: (1) continuous language fragments in a corpus are used as error correction templates, and the corpus is used to build an error correction template library; (2) the template library is indexed, and search techniques are used to find error correction templates quickly; (3) according to the error correction mode, confidence is used to cut the recognition result into short recognition fragments, and the reliable part of each fragment is submitted to the error correction template system for fast lookup, yielding template candidates highly related to the fragment; and (4) an acoustic confusion matrix is used to select, from the candidates, a template acoustically close to the recognition fragment, which then replaces the fragment.

Description

Translated from Chinese

A method for correcting errors in speech recognition results

Technical field

The invention relates to the technical field of speech recognition, and in particular to a method for correcting errors in speech recognition results.

Background

At present, most speech recognition systems use an N-gram language model. This model rests on an imperfect independence assumption, namely that the current word depends only on the N-1 words preceding it. Its limitation is that it performs only uncertain inference from the preceding N-1 words, so the recognition results often contain meaningless sentences or fragments.

Summary of the invention

The invention proposes a method for correcting errors in speech recognition results that uses variable-length error correction templates and corrects the recognition result according to confidence and acoustic confusion. The invention can be used in large-vocabulary continuous speech recognition systems. Its main features are as follows. First, continuous language fragments in a corpus serve as error correction templates, and the corpus is used to build an error correction template library. Second, the template library is indexed, and fast search techniques are used to look up templates quickly. Third, according to the error correction mode, confidence is used to cut the recognition result into short recognition fragments, and the reliable part of each fragment is submitted to the error correction template system for fast lookup, yielding template candidates highly correlated with the fragment. Fourth, an acoustic confusion matrix is used to select, from the candidates, a template whose acoustic characteristics are close to those of the recognition fragment, and that template replaces the fragment.

Technical solution

A method for correcting errors in speech recognition results, comprising the following steps:

1) The recognition system performs recognition and confidence calculation on the input speech to obtain a recognition result with confidence scores;

2) According to the error correction mode, the recognition result is cut into small recognition fragments according to confidence;

3) The recognition fragments are input into the error correction template retrieval system to obtain a candidate list of error correction templates highly correlated with each fragment;

4) The acoustic confusion degree between the recognition fragment and each error correction template in the candidate list is calculated, and the template with the highest acoustic similarity is selected; when the similarity between the recognition fragment and that template exceeds a reliable threshold, the template replaces the fragment;

5) The corrected fragments are merged to obtain the corrected recognition result.

The method further comprises the step of calculating confidence while performing recognition on the input speech, to obtain a recognition result with confidence scores.

The method further comprises: when cutting the recognition result into small recognition fragments according to confidence, first setting the confidence threshold CM-threshold and the maximum character count of the system's error correction templates, max-var-length. A recognition result whose confidence is higher than CM-threshold is considered reliable, and the number of reliable characters in a fragment after cutting must not exceed max-var-length.

The method further comprises the step of dividing the recognition result into blocks, where consecutive characters whose confidence is all above, or all below, CM-threshold form one block. The recognition result is thereby expressed as one or more (A, x, B) patterns, where A and B are blocks with confidence above CM-threshold, x is a block with confidence below CM-threshold, and at most one of A and B is empty.

The method further comprises the step of, for every low-confidence block x in the recognition result, if the length of A is greater than or equal to max-var-length, taking the part of A adjacent to x with length max-var-length as sub-A and forming the fragment (sub-A, x) with x. sub-A is used to search the error correction template library, and the length of sub-A is not fixed.

The method further comprises the step of, for every low-confidence block x in the recognition result, if the length of A is less than max-var-length, taking a part of B adjacent to x as sub-B and forming the fragment (A, x, sub-B) with A and x. A and sub-B are used to search the error correction template library, and at most one of A and sub-B may be empty. The combined length of A and sub-B must not exceed max-var-length, and the lengths of A and sub-B are not fixed.

The method further comprises the step of, after the recognition result has been cut into fragments, quickly looking up the reliable part of each fragment in the error correction template library to obtain one or more error correction templates highly correlated with the fragment.

The method further comprises: the error correction template retrieval system consists of two parts, the first being the construction of the error correction template index and the second being the search for error correction templates.

The method further comprises: the basic principle of the first part is that every continuous language fragment of 6 to 12 characters in the corpus is taken as an error correction template. All error correction templates are first extracted from the corpus, and the template library is then indexed using an inverted file as the index structure. To reduce its size, the inverted file is compressed.

The method further comprises: the basic principle of the second part is that, at query time, the reliable part is first converted into a Boolean query and a fast search is run against the index. Because speech recognition results are sequential and local in nature, the conversion to a Boolean query must add an ordering requirement on the reliable part and a locality requirement between words.

The method further comprises the step of choosing the best template among all results returned by the template search, using the acoustic confusion degree between each error correction template and the recognition fragment. For recognition fragment A and each template T_i in the candidate list, the confusion degree C(A, T_i) between A and T_i is computed. When the maximum value maxC(A, T_i) exceeds a reliable threshold, that error correction template replaces the recognition fragment; if maxC(A, T_i) is below the threshold, the recognition fragment is kept.

The method further comprises: the calculation of the acoustic confusion degree between an error correction template and a recognition fragment consists of three parts. The first part is the collection of statistics on recognition confusions among Chinese initials and finals, the second part is the calculation of posterior probabilities of confusion for Chinese initials and finals, and the third part is the fuzzy whole-fragment matching between the recognition fragment and the error correction template.

The method further comprises: the basic principle of the first part is to run recognition on a speech database and obtain the confusions among all initials and among all finals as follows. It is assumed that initials and finals are never confused with each other. If the recognition result for a sample is the pinyin string C₁′V₁′C₂′V₂′…C_m′V_m′, it is dynamically aligned with the correct string C₁V₁C₂V₂…C_nV_n so that the number of matching pinyin units is maximized. This yields a large number of pinyin pairs, namely (C₁′, C₁), (V₁′, V₁)…(C_m′, C_n)(V_m′, V_n). Counting the occurrences of these pairs gives, for each initial, its total number of samples and the number of times it was recognized as each other initial, and likewise, for each final, its total number of samples and the number of times it was recognized as each other final.

The method further comprises: the basic principle of the second part is that, from the statistics of the first part, the probability that each initial is recognized as another initial is computed first. The degree to which C_i is confused as C_j is computed as:

$P (C_{j} | C_{i}) = \frac{Σ (C_{i}, C_{j})}{| C_{i} |}$

where ∑(C_i, C_j) is the total number of times C_i is recognized as C_j, and |C_i| is the total number of samples of C_i.

The posterior probability that the correct result is C_j when the recognition result is C_i is:

$\tilde{P} (C_{j} | C_{i}) = \frac{P (C_{i} | C_{j}) P (C_{j})}{\underset{k}{Σ} P (C_{i} | C_{k}) P (C_{k})}$

where $P (C_{j}) = \frac{| C_{j} |}{Σ | C_{i} |}$ and ∑|C_i| is the total number of samples of all initials. Finals are handled in the same way as initials.

The method further comprises: the basic principle of the third part is as follows. Let the pinyin string of recognition fragment A be C₁′V₁′C₂′V₂′…C_m′V_m′, and let the pinyin string of the i-th template T_i in the candidate list be C₁V₁C₂V₂…C_nV_n. The acoustic confusion degree C(A, T_i) between A and T_i is then defined by finding an alignment (1, i₁), (2, i₂)…(k, i_k)…(m, i_m) that maximizes

$\tilde{P} (T_{i} | A) = Π \tilde{P} (C_{k} | C_{i_{k}}) \tilde{P} (V_{k} | V_{i_{k}})$

and taking that maximum as the acoustic confusion degree between A and T_i.

The method further comprises: in practice, the logarithm of the posterior probabilities is taken first, which converts the problem into making

$\log \tilde{P} (T_{i} | A) = Σ \log \tilde{P} (C_{k} | C_{i_{k}}) + Σ \log \tilde{P} (V_{k} | V_{i_{k}})$

attain its maximum; this maximum is then used as the logarithmic acoustic confusion degree between A and T_i.

Detailed description

The invention has three main modules: the first segments the recognition result by confidence, the second obtains the candidate list of error correction templates, and the third computes the acoustic confusion degree between a recognition fragment and an error correction template. Each is described in detail below.

Segmenting the recognition result by confidence. First, the confidence threshold CM-threshold and the maximum character count of the system's error correction templates, max-var-length, are set. A recognition result whose confidence is higher than CM-threshold is considered reliable. The recognition result is then segmented as follows:

1. Divide the recognition result into blocks: consecutive characters whose confidence is all above, or all below, CM-threshold form one block. The recognition result then consists of one or more (A, x, B) structures, where A and B are blocks with confidence above CM-threshold, x is a block with confidence below CM-threshold, and at most one of A and B is empty.

2. For every low-confidence block x in the recognition result:

a) If the length of A is greater than or equal to max-var-length, take the part of A adjacent to x with length max-var-length as sub-A and form the fragment (sub-A, x). sub-A is used to search the error correction template library; the length of sub-A is not fixed.

b) If the length of A is less than max-var-length, take a part of B adjacent to x as sub-B and form the fragment (A, x, sub-B) with A and x. A and sub-B are used to search the error correction template library, and at most one of A and sub-B may be empty. The combined length of A and sub-B must not exceed max-var-length, and the lengths of A and sub-B are not fixed.
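The segmentation steps above can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function names `split_blocks` and `make_segments` are hypothetical, a confidence exactly equal to the threshold is treated as low confidence, and sub-B is simply taken as long as the length budget allows (the text only says its length is not fixed). Latin letters stand in for recognized characters.

```python
from itertools import groupby

def split_blocks(chars, confs, cm_threshold):
    """Group consecutive characters that fall on the same side of the threshold.

    Returns a list of (is_reliable, text) blocks. Assumption: a confidence
    equal to cm_threshold counts as low confidence.
    """
    blocks = []
    for reliable, grp in groupby(zip(chars, confs), key=lambda p: p[1] > cm_threshold):
        blocks.append((reliable, "".join(ch for ch, _ in grp)))
    return blocks

def make_segments(blocks, max_var_length):
    """Build one (left_context, x, right_context) fragment per low-confidence block x."""
    segments = []
    for i, (reliable, x) in enumerate(blocks):
        if reliable:
            continue
        # Blocks alternate, so the neighbours of x (if any) are reliable blocks.
        a = blocks[i - 1][1] if i > 0 else ""
        b = blocks[i + 1][1] if i + 1 < len(blocks) else ""
        if len(a) >= max_var_length:
            # Case a): keep only the part of A adjacent to x -> (sub-A, x).
            segments.append((a[-max_var_length:], x, ""))
        else:
            # Case b): spend the remaining budget on the part of B adjacent to x.
            sub_b = b[: max_var_length - len(a)]
            segments.append((a, x, sub_b))
    return segments
```

In both cases the reliable context attached to x never exceeds `max_var_length` characters, which matches the length constraint stated in the steps.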

Obtaining the candidate list of error correction templates. After the recognition result has been cut into fragments, the reliable part of each fragment is submitted to the error correction template retrieval system, which returns template candidates highly correlated with the fragment. The retrieval system consists of two parts: the first is the construction of the error correction template index, and the second is the fast search of the error correction template library.

The basic principle of the first part is that every continuous language fragment of 6 to 12 characters in the corpus is taken as an error correction template. All error correction templates are first extracted from the corpus, and the template library is then indexed using an inverted file as the index structure. To reduce its size, the inverted file is compressed.
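A minimal sketch of template extraction and inverted-file indexing, under stated assumptions: the names `extract_templates`, `build_inverted_index` and `gap_encode` are hypothetical, the index maps each character to the sorted ids of the templates containing it, and gap (delta) encoding is shown only as one common way to shrink posting lists. The text does not say which compression scheme is used.

```python
from collections import defaultdict

def extract_templates(sentence, min_len=6, max_len=12):
    """Return every contiguous fragment of min_len..max_len characters."""
    templates = []
    n = len(sentence)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            if start + length > n:
                break
            templates.append(sentence[start:start + length])
    return templates

def build_inverted_index(templates):
    """Map each character to the ascending list of template ids containing it."""
    index = defaultdict(list)
    for tid, template in enumerate(templates):
        for ch in set(template):
            index[ch].append(tid)  # ids are appended in ascending order
    return dict(index)

def gap_encode(postings):
    """Store differences between successive ids (assumed compression step)."""
    gaps, prev = [], 0
    for tid in postings:
        gaps.append(tid - prev)
        prev = tid
    return gaps
```

Small gaps can then be stored with a variable-length integer code, which is why gap encoding is a frequent first step when compressing inverted files.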

The basic principle of the second part is that the reliable part of the fragment is first converted into a Boolean query, and a fast search is run against the index. Because speech recognition results are sequential and local in nature, the conversion to a Boolean query must add an ordering requirement on the reliable part of the fragment and a locality requirement between words.
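One way to read the query step is sketched below. The exact form of the ordering and locality constraints is not given in the text, so everything here is an assumption: `candidates` performs a plain Boolean AND over per-character posting lists, and `satisfies_order_and_locality` requires the left context `a` to appear before the right context `b` with at most `max_gap` characters between them. Both function names and the `max_gap` parameter are hypothetical.

```python
def candidates(index, reliable_text):
    """Boolean AND: ids of templates that contain every reliable character."""
    postings = [set(index.get(ch, ())) for ch in set(reliable_text)]
    return set.intersection(*postings) if postings else set()

def satisfies_order_and_locality(template, a, b, max_gap):
    """Check that `a` precedes `b` in `template`, separated by at most max_gap characters.

    If one context is empty, only the presence of the other is required.
    """
    if not a or not b:
        return (a or b) in template
    start = template.find(a)
    while start != -1:
        after_a = start + len(a)
        pos_b = template.find(b, after_a)  # nearest occurrence of b after this a
        if pos_b != -1 and pos_b - after_a <= max_gap:
            return True
        start = template.find(a, start + 1)  # try a later occurrence of a
    return False
```

A retrieval pass would first call `candidates` on the concatenated reliable characters and then keep only the templates that pass the ordering and locality check.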

Calculating the acoustic confusion degree. Template retrieval usually returns one or more candidates, and the best template is then chosen using the acoustic confusion degree between each error correction template and the recognition fragment. For recognition fragment A and each template T_i in the candidate list, the confusion degree C(A, T_i) between A and T_i is computed. When the maximum value maxC(A, T_i) exceeds a reliable threshold, that error correction template replaces the recognition fragment; if maxC(A, T_i) is below the threshold, the recognition fragment is kept.
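The selection rule can be sketched as below. The function name `choose_replacement` is hypothetical, and the scoring function is passed in as a parameter because the confusion degree C(A, T_i) is defined separately. The text does not say what happens when the maximum equals the threshold; this sketch assumes the fragment is kept in that case.

```python
def choose_replacement(fragment, templates, confusion, threshold):
    """Return the best-scoring template if its score exceeds threshold, else the fragment.

    `confusion(fragment, template)` is any callable returning C(A, T_i);
    a higher value means the two are acoustically closer.
    """
    if not templates:
        return fragment
    best = max(templates, key=lambda t: confusion(fragment, t))
    # Assumption: a score exactly equal to the threshold does not trigger replacement.
    return best if confusion(fragment, best) > threshold else fragment
```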

The calculation of the confusion degree consists of three parts. The first part is the collection of statistics on recognition confusions among Chinese initials and finals, the second part is the calculation of posterior probabilities of confusion for Chinese initials and finals, and the third part is the fuzzy whole-fragment matching between the recognition fragment and the error correction template.

The basic principle of the first part is to run recognition on a speech database and obtain the confusions among all initials and among all finals as follows. It is assumed that initials and finals are never confused with each other. If the recognition result for a sample is the pinyin string C₁′V₁′C₂′V₂′…C_m′V_m′, it is dynamically aligned with the correct string C₁V₁C₂V₂…C_nV_n so that the number of matching pinyin units is maximized. This yields a large number of pinyin pairs, namely (C₁′, C₁), (V₁′, V₁)…(C_m′, C_n)(V_m′, V_n). Counting the occurrences of these pairs gives, for each initial, its total number of samples and the number of times it was recognized as each other initial, and likewise, for each final, its total number of samples and the number of times it was recognized as each other final.
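A rough sketch of how such statistics might be gathered, under stated assumptions: each syllable is an (initial, final) tuple, the dynamic alignment is a standard edit-distance alignment at the syllable level with hypothetical costs (one per mismatched unit, two per inserted or deleted syllable), and only aligned syllable pairs contribute counts. The names `align` and `count_confusions` are illustrative; the text does not specify the alignment costs.

```python
from collections import Counter

def align(rec, ref):
    """Align recognized and reference syllable lists; return aligned (rec, ref) pairs."""
    m, n = len(rec), len(ref)
    gap = 2  # assumed cost of an inserted or deleted syllable

    def sub(a, b):
        # One unit of cost for each mismatching initial or final.
        return (a[0] != b[0]) + (a[1] != b[1])

    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = gap * i
    for j in range(1, n + 1):
        cost[0][j] = gap * j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + sub(rec[i - 1], ref[j - 1]),
                             cost[i - 1][j] + gap,
                             cost[i][j - 1] + gap)
    # Trace back through the table to recover the aligned pairs.
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + sub(rec[i - 1], ref[j - 1]):
            pairs.append((rec[i - 1], ref[j - 1]))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs

def count_confusions(samples):
    """Count (true_unit, recognized_unit) pairs separately for initials and finals."""
    initial_counts, final_counts = Counter(), Counter()
    for rec, ref in samples:
        for (rec_c, rec_v), (ref_c, ref_v) in align(rec, ref):
            initial_counts[(ref_c, rec_c)] += 1
            final_counts[(ref_v, rec_v)] += 1
    return initial_counts, final_counts
```

Keeping initials and finals in separate counters reflects the stated assumption that an initial is never confused with a final.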

The basic principle of the second part is that, from the statistics of the first part, the probability that each initial is recognized as another initial is computed first. The degree to which C_i is confused as C_j is computed as:

$P (C_{j} | C_{i}) = \frac{Σ (C_{i}, C_{j})}{| C_{i} |}$

where ∑(C_i, C_j) is the total number of times C_i is recognized as C_j, and |C_i| is the total number of samples of C_i.

The posterior probability that the correct result is C_j when the recognition result is C_i is:

$\tilde{P} (C_{j} | C_{i}) = \frac{P (C_{i} | C_{j}) P (C_{j})}{\underset{k}{Σ} P (C_{i} | C_{k}) P (C_{k})}$

where $P (C_{j}) = \frac{| C_{j} |}{Σ | C_{i} |}$ and ∑|C_i| is the total number of samples of all initials.
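The two probabilities above can be computed from a table of confusion counts as in the following sketch. It assumes the counts are stored as a dictionary keyed by (true_unit, recognized_unit), and that the sample total of a unit is the sum of its counts; the function name `posteriors` is hypothetical. The same function applies to initials and to finals.

```python
from collections import Counter, defaultdict

def posteriors(counts):
    """Return post[(true, rec)] = posterior that `true` is correct given `rec` was recognized.

    counts[(true, rec)] is the number of times `true` was recognized as `rec`.
    """
    totals = Counter()  # |C_i|: samples per true unit (assumed to be the sum of its counts)
    for (true, _rec), n in counts.items():
        totals[true] += n
    grand_total = sum(totals.values())

    # P(rec | true): confusion probability; P(true): prior.
    likelihood = {(t, r): n / totals[t] for (t, r), n in counts.items()}
    prior = {t: totals[t] / grand_total for t in totals}

    # Denominator of Bayes' rule: sum over all true units k of P(rec | k) P(k).
    evidence = defaultdict(float)
    for (t, r), p in likelihood.items():
        evidence[r] += p * prior[t]

    return {(t, r): p * prior[t] / evidence[r] for (t, r), p in likelihood.items()}
```

For each recognized unit, the posteriors over all true units sum to one, which is a quick sanity check on the counts.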

The basic principle of the third part is as follows. Let the pinyin string of recognition fragment A be C₁′V₁′C₂′V₂′…C_m′V_m′, and let the pinyin string of the i-th template T_i in the candidate list be C₁V₁C₂V₂…C_nV_n. The acoustic confusion degree C(A, T_i) between A and T_i is then defined by finding an alignment (1, i₁), (2, i₂)…(k, i_k)…(m, i_m) that maximizes

$\tilde{P} (T_{i} | A) = Π \tilde{P} (C_{k} | C_{i_{k}}) \tilde{P} (V_{k} | V_{i_{k}}) \qquad (1)$

and taking that maximum as the acoustic confusion degree between A and T_i.
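The maximization can be sketched with dynamic programming. Because the logarithm is monotonic, maximizing the sum of log posteriors selects the same alignment as maximizing the product, and it avoids numerical underflow for long fragments. The sketch rests on several assumptions that the text does not state: the alignment is strictly increasing (i₁ < i₂ < … < i_m), template syllables may be skipped at no cost, unseen confusion pairs receive a small floor probability, and the posterior tables are keyed by (template_unit, recognized_unit). The name `log_confusion` and the `floor` parameter are hypothetical.

```python
import math

def log_confusion(rec, tpl, post_initial, post_final, floor=1e-6):
    """Best log score over strictly increasing alignments of rec syllables to tpl syllables.

    rec, tpl: lists of (initial, final) tuples.
    post_*[(template_unit, recognized_unit)]: posterior that the template unit is
    correct given the recognized unit.
    """
    m, n = len(rec), len(tpl)
    if m > n:
        return -math.inf  # assumed: every recognized syllable needs its own template syllable

    def score(r, t):
        pc = post_initial.get((t[0], r[0]), floor)
        pv = post_final.get((t[1], r[1]), floor)
        return math.log(pc) + math.log(pv)

    # best[k][j]: best score aligning the first k recognized syllables
    # within the first j template syllables.
    best = [[-math.inf] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        best[0][j] = 0.0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            skip = best[k][j - 1]                                   # leave tpl[j-1] unused
            use = best[k - 1][j - 1] + score(rec[k - 1], tpl[j - 1])  # align rec[k-1] to tpl[j-1]
            best[k][j] = max(skip, use)
    return best[m][n]
```

Applying `math.exp` to the result recovers the product form of the confusion degree when that scale is needed for thresholding.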

In practice, the logarithm of the posterior probabilities is taken first, which converts the problem into making

$\log \tilde{P} (T_{i} | A) = Σ \log \tilde{P} (C_{k} | C_{i_{k}}) + Σ \log \tilde{P} (V_{k} | V_{i_{k}})$