CN105404621B - A method and system for blind people to read Chinese characters - Google Patents


Info

Publication number: CN105404621B
Authority: CN (China)
Legal status: Active
Application number: CN201510623525.5A
Other languages: Chinese (zh)
Other versions: CN105404621A
Inventors: 王向东, 杨阳, 钱跃良, 刘宏, 张金超, 姜文斌
Current assignee: Institute of Computing Technology of CAS
Original assignee: Institute of Computing Technology of CAS

Application filed by Institute of Computing Technology of CAS
Priority to CN201510623525.5A
Publication of CN105404621A
Application granted
Publication of CN105404621B


Abstract


The present invention proposes a method and system for blind people to read Chinese characters, relating to the technical fields of natural language processing and human-computer interaction for people with disabilities. The method includes: obtaining Chinese text and performing word segmentation on it to generate a Chinese character string; using a pronunciation dictionary, a polyphonic character dictionary, and word frequency information, with reference to the part-of-speech tags produced by segmentation, converting each word in the Chinese character string into its pinyin and concatenating the results into a pinyin string; converting the pinyin string into a Braille string by looking up a pinyin-to-Braille dictionary; performing Braille word segmentation on the Braille string with a segmentation model to generate an initial Braille segmentation; fusing the Chinese character string with the initial Braille segmentation to generate a new Braille segmentation, and adjusting the new Braille segmentation according to the Braille word segmentation and joining rules; and marking tones on the adjusted new Braille segmentation to generate the final Braille segmentation, which is then displayed.

Description

A method and system for blind people to read Chinese characters

Technical Field

The present invention relates to the technical fields of natural language processing and human-computer interaction for people with disabilities, and in particular to a method and system for blind people to read Chinese characters.

Background

In today's information society, the level of informatization keeps rising: information technology is widely used in work, study, and daily life, and the Internet, which provides a wealth of information resources in a convenient way, has become an important part of everyday life. In China, most digital and online text resources are stored as Chinese text, which the country's 12 million blind people find difficult to use. This prevents blind people from enjoying these resources as sighted people do, widens the information gap between the two groups, and further limits blind people's ability to live and develop in the information society. Although speech synthesis is increasingly mature, and large amounts of online text can be converted into audio files so that blind people can obtain the information by listening, audio takes considerable storage space, is inconvenient to carry and search, and conveys information inefficiently. For blind people, therefore, reading text remains the most important way to obtain information.

Blind people in China read and write in Chinese Braille, which is based on the Braille system. Each Braille cell consists of six dot positions arranged in two columns; each dot is either raised or flat, giving 64 combinations that can represent 64 different characters. In Chinese Braille, each character represents an initial, a final, or a tone of Hanyu Pinyin, and characters are combined into valid syllables according to pinyin rules to represent Chinese characters; Chinese Braille is therefore essentially a phonetic (pinyin-based) script. Braille is usually printed or written on special thick Braille paper, into which raised dots are embossed for blind readers to read by touch. To let blind people read Braille on a computer, refreshable Braille displays have been designed and produced. Such a device connects to a computer, receives Braille strings from it, and renders them as raised dots on its panel; when a new Braille string arrives, the old dots are cleared and the new ones are displayed.
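To make the two-column cell structure concrete: six dots that are each raised or flat give 2^6 = 64 patterns, one of which is the all-flat blank cell. The sketch below uses the Unicode Braille Patterns block (starting at U+2800), in which dot i sets bit i-1 of the offset. This encoding is only a convenient way to show cells on screen; the patent does not specify it, and the function names are illustrative.

```python
# Dots are numbered 1-3 down the left column and 4-6 down the right column.
UNICODE_BRAILLE_BASE = 0x2800  # U+2800 is the blank cell; dot i adds 2**(i-1)


def cell_to_char(dots):
    """Render a set of raised dots (numbers 1-6) as one Unicode Braille character."""
    mask = 0
    for d in dots:
        if not 1 <= d <= 6:
            raise ValueError(f"dot {d} is outside a 6-dot cell")
        mask |= 1 << (d - 1)
    return chr(UNICODE_BRAILLE_BASE + mask)


def all_cells():
    """Every raised/flat combination of the six dots, blank cell included."""
    return [chr(UNICODE_BRAILLE_BASE + mask) for mask in range(2 ** 6)]


print(len(all_cells()))      # 64
print(cell_to_char({1}))     # ⠁
print(cell_to_char({1, 2}))  # ⠃
```

The blank cell (mask 0) is the "empty square" that Braille uses to separate words.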

Even with a Braille display, it is still hard for blind people to read Chinese text on a computer, because the text must first be converted into Braille. Chinese is full of homophones (one sound, many characters) and polyphones (one character, several readings), so converting Chinese to Braille is not a simple rule-based mapping; it requires taking grammar and semantics into account. More importantly, Braille has word segmentation and joining rules, which require words or phrases carrying a certain meaning to be separated by a blank cell (an "empty square") so that blind readers can understand them. Existing methods generally adjust the result of Chinese word segmentation according to these rules to obtain segmented Braille. However, because the rules are largely semantic and somewhat subjective, segmentation performed automatically by computer has low accuracy, and text converted with these methods still needs a great deal of manual correction. This makes conversion inefficient and makes Braille text resources slow and costly to produce. Improving the accuracy of Chinese-to-Braille conversion, reducing manual correction, and speeding up conversion are therefore of great practical significance for making Chinese information resources more widely available to blind people and helping them integrate better into mainstream society.

Summary of the Invention

To address the shortcomings of the prior art, the present invention proposes a method and system for blind people to read Chinese characters.

The method proposed by the present invention includes:

Step 1: Obtain Chinese text and perform word segmentation on it to generate a Chinese character string. Using a pronunciation dictionary, a polyphonic character dictionary, and word frequency information, and referring to the part-of-speech tags produced by segmentation, convert each word in the Chinese character string into its pinyin and concatenate the results into a pinyin string;

Step 2: Convert the pinyin string into a Braille string by looking up a pinyin-to-Braille dictionary; perform Braille word segmentation on the Braille string with a segmentation model to generate an initial Braille segmentation; fuse the Chinese character string with the initial Braille segmentation to generate a new Braille segmentation; and adjust the new Braille segmentation according to the Braille word segmentation and joining rules;

Step 3: Mark tones in Braille on the new Braille segmentation adjusted according to the joining rules, generate the final Braille segmentation, and display it.

In the method, the specific steps for converting the Chinese character string into a pinyin string in Step 1 are:

Step 2.1: For each word in the Chinese character string, determine whether it is a multi-character word. If it is, and its pinyin can be found in the pronunciation dictionary, return that pinyin directly; otherwise, go to step 2.2;

Step 2.2: Split the word into a sequence of Chinese characters and take each character in turn, performing steps 2.3 to 2.4 for each;

Step 2.3: For the current character, look it up in the polyphonic character dictionary to determine whether it is a polyphone. If it is not, look up its pinyin in the pronunciation dictionary and return it; otherwise, go to step 2.4;

Step 2.4: If the character is a polyphone, perform the following steps:

Step 2.4.1: If the current polyphone comes from a single-character word, go directly to step 2.4.2; if it comes from a multi-character word, perform the following steps:

For a polyphone $w_k$ in a multi-character word: a) combine it with the following $n$ characters into an $(n+1)$-character phrase $W_{k,k+n} = w_k w_{k+1} \cdots w_{k+n}$ and look it up in the polyphonic phrase dictionary; if it is found, return the pronunciation that the phrase gives to $w_k$ as the reading of $w_k$. If it is not found, b) combine $w_k$ with the preceding $n$ characters into an $(n+1)$-character phrase $W_{k-n,k} = w_{k-n} w_{k-n+1} \cdots w_k$ and look it up in the polyphonic phrase dictionary; if it is found, return the pronunciation that this phrase gives to $w_k$. If neither is found, form the $n$-character phrases $W_{k,k+n-1}$ and $W_{k-n+1,k}$ with the following and preceding $n-1$ characters respectively, and repeat steps a) and b) until the pronunciation of $w_k$ is determined;

Step 2.4.2: Suppose the polyphone has $n$ readings $tone_1, \ldots, tone_n$. Let $P_{pos}$ be the part-of-speech probability from word segmentation, with weight $\lambda_1$; $P_{lm}$ the language model probability, with weight $\lambda_2$; and $P_{freq}$ the word frequency probability, with weight $\lambda_3$. The system computes a score for each reading of the polyphone,

$$Score_i = \lambda_1 P_{pos}(tone_i) + \lambda_2 P_{lm}(tone_i) + \lambda_3 P_{freq}(tone_i),$$

and returns the reading with the highest score as the final pinyin of the polyphone.
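A minimal Python sketch of steps 2.1 to 2.4.1, assuming toy in-memory dictionaries: `PRONUNCIATION_DICT`, `POLYPHONES`, and `POLYPHONE_PHRASES` are hypothetical names with illustrative contents. The phrase window starts at (word length - 1) and stays inside the current word; the text fixes neither choice. Step 2.4.2 is passed in as a callable, and the placeholder used here simply picks the first candidate reading.

```python
from typing import Callable, Optional

# Toy resources (hypothetical contents, for illustration only).
PRONUNCIATION_DICT = {"银行": ["yin2", "hang2"], "银": ["yin2"]}
POLYPHONES = {"行": ["xing2", "hang2"], "长": ["chang2", "zhang3"]}
POLYPHONE_PHRASES = {"银行": ["yin2", "hang2"], "行长": ["hang2", "zhang3"]}


def phrase_lookup(word: str, k: int) -> Optional[str]:
    """Step 2.4.1: shrink a window around word[k] until a known phrase is found."""
    for n in range(len(word) - 1, 0, -1):
        # a) forward phrase w_k ... w_{k+n}; w_k is its first character
        if k + n < len(word):
            hit = POLYPHONE_PHRASES.get(word[k:k + n + 1])
            if hit:
                return hit[0]
        # b) backward phrase w_{k-n} ... w_k; w_k is its last character
        if k - n >= 0:
            hit = POLYPHONE_PHRASES.get(word[k - n:k + 1])
            if hit:
                return hit[n]
    return None  # nothing matched, even at n = 1


def word_to_pinyin(word: str, score_readings: Callable[[str, list], str]) -> list:
    # Step 2.1: look up a multi-character word as a whole
    if len(word) > 1 and word in PRONUNCIATION_DICT:
        return list(PRONUNCIATION_DICT[word])
    result = []
    for k, ch in enumerate(word):  # Step 2.2: character by character
        if ch not in POLYPHONES:   # Step 2.3: unambiguous character
            result.append(PRONUNCIATION_DICT[ch][0])
            continue
        # Step 2.4.1 applies only to polyphones inside multi-character words
        reading = phrase_lookup(word, k) if len(word) > 1 else None
        if reading is None:        # Step 2.4.2: fall back to scoring
            reading = score_readings(ch, POLYPHONES[ch])
        result.append(reading)
    return result


def first_reading(ch: str, readings: list) -> str:
    """Placeholder for step 2.4.2: just take the first candidate."""
    return readings[0]


print(word_to_pinyin("银行行长", first_reading))  # ['yin2', 'hang2', 'hang2', 'zhang3']
print(word_to_pinyin("行", first_reading))        # ['xing2']
```

In the first example, each 行 is resolved through a phrase (银行 backward, 行长 forward) and 长 through the backward phrase 行长, so the placeholder scorer is never consulted.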

In the method, the fusion in Step 2 proceeds as follows. Let the Chinese character string be $C = c_1 c_2 \cdots c_m$ and the initial Braille segmentation be $B = b_1 b_2 \cdots b_n$, where $c_i$ and $b_j$ denote one word of the Chinese character string and of the initial Braille segmentation, respectively. The initial Braille segmentation $B$ is mapped onto the corresponding Chinese character string, giving $B' = b'_1 b'_2 \cdots b'_n$, where $b'_j$ is the word obtained by mapping the Braille word $b_j$ back to Chinese.
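The text defines only the mapping from B to B'. A sketch of that mapping, assuming each Chinese character contributes exactly one pinyin syllable and each Braille word is represented by the list of syllables it covers (both assumptions, not stated in the patent): the Braille word boundaries are transferred onto the character sequence of C by counting syllables. `map_braille_to_chinese` is a hypothetical name.

```python
def map_braille_to_chinese(chinese_words, braille_words):
    """Map B = b_1 ... b_n onto the characters of C, giving B' = b'_1 ... b'_n.

    Assumes one syllable per Chinese character; each Braille word is given
    as the list of syllables it covers.
    """
    chars = "".join(chinese_words)  # C with its own word boundaries removed
    total = sum(len(b) for b in braille_words)
    if total != len(chars):
        raise ValueError("syllable count does not match character count")
    mapped, pos = [], 0
    for b in braille_words:
        mapped.append(chars[pos:pos + len(b)])  # b'_j spans the same characters as b_j
        pos += len(b)
    return mapped


C = ["我们", "是", "学生"]
B = [["wo3", "men5"], ["shi4", "xue2", "sheng1"]]
print(map_braille_to_chinese(C, B))  # ['我们', '是学生']
```

Once B' is available, both segmentations cover the same characters and can be compared span by span; the text does not spell out how conflicting boundaries are then resolved, so the merge itself is not shown.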

In the method, the Braille word segmentation and joining rules used in Step 2 have the following format:

Joining rule: $POS_k : [m,n] : POS_{k-m} + \cdots + POS_k + \cdots + POS_{k+n} \rightarrow POS_{k-m} \cdots POS_{k+n}$

Here $POS_k$ is the activation condition, and $m$ and $n$ mean that the $m$ words before and the $n$ words after the current word of the new Braille segmentation must be examined. If both $m$ and $n$ are 0, the rule is a segmentation (splitting) rule. The part after the second colon is the part-of-speech combination of the segmented words; if the combination is satisfied, the operation after the right arrow is performed (the right-hand side drops the "+" separators, i.e. the matched words are written together).
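One possible reading of the joining-rule format, sketched in Python: a rule fires when the current word's tag equals POS_k and the tags from k-m to k+n equal the rule's pattern, in which case those words are written together. The `JoinRule` class, the choice to give the merged word the trigger tag, and the example tags `v`/`u` are assumptions. Splitting rules (m = n = 0) are skipped because the text does not say how they split.

```python
from dataclasses import dataclass


@dataclass
class JoinRule:
    trigger: str    # POS_k, the activation condition
    m: int          # number of words to inspect before position k
    n: int          # number of words to inspect after position k
    pattern: tuple  # POS_{k-m}, ..., POS_{k+n}


def apply_join_rules(words, tags, rules):
    """Join runs of words whose tag sequence matches a rule; returns new lists."""
    words, tags = list(words), list(tags)
    k = 0
    while k < len(words):
        for r in rules:
            if r.m == 0 and r.n == 0:
                continue  # splitting rules are not modelled in this sketch
            lo, hi = k - r.m, k + r.n
            if tags[k] != r.trigger or lo < 0 or hi >= len(words):
                continue
            if tuple(tags[lo:hi + 1]) == r.pattern:
                words[lo:hi + 1] = ["".join(words[lo:hi + 1])]
                tags[lo:hi + 1] = [r.trigger]  # assumption: merged word keeps POS_k
                k = lo                          # re-check from the merged word
                break
        else:
            k += 1  # no rule fired at this position
    return words, tags


rules = [JoinRule("v", 0, 1, ("v", "u"))]  # hypothetical: verb + aspect particle
print(apply_join_rules(["看", "了", "书"], ["v", "u", "n"], rules))
# (['看了', '书'], ['v', 'n'])
```

Every merge shortens the word list by m + n ≥ 1, so the loop always terminates even though it re-examines the merged word.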

In the method, Braille tone marking in Step 3 proceeds as follows:

Examine in turn the pinyin of the characters of each adjusted new Braille word and compare it with the rules in the Braille tone-marking rule set; if a rule's condition is satisfied, mark the tone of the current new Braille word. Rules in the tone-marking rule set have the following format:

Tone-marking rule: $tone_k : [n] : tone_k \cdots tone_{k+n}$

Here $tone_k$ is the pinyin of the current new Braille word, $n$ is the number of following new Braille words whose pinyin must be examined, and $tone_k \cdots tone_{k+n}$ is the tone-marking condition. If the pinyin sequence satisfies the condition, the tone of $tone_k$ is marked.
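A sketch of the tone-marking check under stated assumptions: each word's pinyin is a plain string, the condition tone_k ... tone_{k+n} is a tuple of such strings, and `"*"` is a hypothetical wildcard that matches any pinyin. `ToneRule` and `words_to_mark` are illustrative names; the function returns the indices of words whose tone should be marked.

```python
from dataclasses import dataclass


@dataclass
class ToneRule:
    current: str      # tone_k: pinyin the current word must have
    n: int            # number of following words to inspect
    condition: tuple  # tone_k ... tone_{k+n}; "*" is a hypothetical wildcard


def _matches(pattern, pinyin):
    return pattern == "*" or pattern == pinyin


def words_to_mark(pinyins, rules):
    """Return indices of words whose tone should be marked in Braille."""
    marked = []
    for k, py in enumerate(pinyins):
        for r in rules:
            window = pinyins[k:k + r.n + 1]
            if py != r.current or len(window) != r.n + 1:
                continue  # rule not activated, or too few following words
            if all(_matches(p, w) for p, w in zip(r.condition, window)):
                marked.append(k)
                break
    return marked


rules = [ToneRule("shi4", 1, ("shi4", "*"))]
print(words_to_mark(["ta1", "shi4", "xue2sheng1"], rules))  # [1]
print(words_to_mark(["ta1", "shi4"], rules))                # []
```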

The present invention also proposes a system for blind people to read Chinese characters, including:

A pinyin string acquisition module, configured to obtain Chinese text, perform word segmentation on it to generate a Chinese character string, and, using a pronunciation dictionary, a polyphonic character dictionary, and word frequency information and referring to the part-of-speech tags produced by segmentation, convert each word in the Chinese character string into its pinyin and concatenate the results into a pinyin string;

A new Braille segmentation acquisition and adjustment module, configured to convert the pinyin string into a Braille string by looking up a pinyin-to-Braille dictionary, perform Braille word segmentation on the Braille string with a segmentation model to generate an initial Braille segmentation, fuse the Chinese character string with the initial Braille segmentation to generate a new Braille segmentation, and adjust the new Braille segmentation according to the Braille word segmentation and joining rules;

A Braille display module, configured to mark tones on the new Braille segmentation adjusted according to the joining rules, generate the final Braille segmentation, and display it.

In the system, the pinyin string acquisition module converts the Chinese character string into a pinyin string by the same steps 2.1 to 2.4.2 as the method described above: whole-word lookup in the pronunciation dictionary, character-by-character lookup for words not found there, phrase-dictionary lookup around each polyphone in a multi-character word, and weighted scoring of the candidate readings.

The new Braille segmentation acquisition and adjustment module performs fusion exactly as in the method (mapping the initial Braille segmentation $B$ onto the Chinese character string to obtain $B'$) and applies joining rules of the same format, $POS_k : [m,n] : POS_{k-m} + \cdots + POS_k + \cdots + POS_{k+n} \rightarrow POS_{k-m} \cdots POS_{k+n}$. The Braille display module performs tone marking with rules of the same format as in the method, $tone_k : [n] : tone_k \cdots tone_{k+n}$.

As can be seen from the above, the advantages of the present invention are as follows:

Existing Chinese-to-Braille conversion techniques first perform Chinese word segmentation on the Chinese character string and then post-process the result with a series of complex word segmentation and joining rules. In contrast, the present invention uses a Braille segmentation model built with statistical machine learning to segment the Braille string directly in a single step. The segmentation result largely conforms to the Braille joining rules and needs only minor adjustment before being output as Braille. Compared with the prior art, this avoids the low accuracy caused by having a computer process complex, semantics-dependent joining rules, and considerably improves both segmentation accuracy and overall Chinese-to-Braille conversion accuracy.

Description of Drawings

Fig. 1 is a flowchart of the method for blind people to read Chinese characters;

Fig. 2 is a flowchart of converting the segmented Chinese character string into a pinyin string.

Detailed Description of Embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the method for blind people to read Chinese characters is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

The main flow of the method is shown in Fig. 1. Its input is a Chinese sentence, i.e. a Chinese character string, and its output is the corresponding Braille, which is shown on a refreshable Braille display.

Step 1. Chinese word segmentation. A Chinese word segmentation system splits the input Chinese character string into a sequence of Chinese words, yielding the segmented Chinese character string, and tags each word with its part of speech. Any existing segmentation method or system can be used, such as dictionary-based maximum or minimum matching, methods based on hidden Markov models (HMM), or methods based on maximum entropy models;
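Of the segmentation options listed, dictionary-based forward maximum matching is the simplest to show. The sketch below uses a hypothetical toy lexicon and a maximum word length of 4; it greedily takes the longest lexicon word at each position and falls back to a single character. Part-of-speech tagging, which Step 1 also requires, is not shown.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: take the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # single characters are always accepted
                words.append(piece)
                i += size
                break
    return words


lexicon = {"我们", "学生", "研究", "研究生"}
print(forward_max_match("我们是研究生", lexicon))  # ['我们', '是', '研究生']
```

Because it prefers the longest match, "研究生" wins over "研究"; the minimum-matching variant mentioned in the text would instead prefer the shortest candidate.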

Step 2. Convert the segmented Chinese character string into a pinyin string: using the pronunciation dictionary, the polyphonic character dictionary, and word frequency information, and referring to the part-of-speech tags produced by segmentation, convert each word of the segmented string into its pinyin and concatenate the results into a pinyin string. The pronunciation dictionary is a mapping table from Chinese words (both single-character and multi-character words) to pinyin; in one embodiment, it contains about 70,000 words. The polyphonic character dictionary lists all polyphones together with the readings of each. The word frequency information gives the frequency with which each Chinese character occurs in Chinese text and is computed in advance from a large amount of Chinese text; in one embodiment, it covers about 7,000 characters.

The specific steps of this conversion are as follows, as shown in Fig. 2:

Step 2.1: For each word of the segmented Chinese character string, determine whether it is a multi-character word (containing two or more Chinese characters). If it is, and its pinyin can be found in the pronunciation dictionary, return that pinyin directly; otherwise, go to step 2.2;

Step 2.2: Split the input word (single-character or multi-character) into a sequence of Chinese characters and take each character in turn, performing steps 2.3 to 2.4 for each;

Step 2.3: For the current character, look it up in the polyphonic character dictionary to determine whether it is a polyphone. If it is not, look up its pinyin in the pronunciation dictionary and return it; otherwise, go to step 2.4;

Step 2.4: For a polyphone, several sources of information are combined to determine its pinyin. The specific steps are:

Step 2.4.1: If the current polyphone comes from a single-character word, go directly to step 2.4.2; otherwise, first perform the following steps:

For a polyphone $w_k$ in a multi-character word: a) combine it with the following $n$ characters into an $(n+1)$-character phrase $W_{k,k+n} = w_k w_{k+1} \cdots w_{k+n}$ and look it up in the polyphonic phrase dictionary; if it is found, return the pronunciation of this character in the phrase as the reading of the polyphone. If it is not found, b) combine $w_k$ with the preceding $n$ characters into an $(n+1)$-character phrase $W_{k-n,k} = w_{k-n} w_{k-n+1} \cdots w_k$ and look it up in the polyphonic phrase dictionary; if it is found, return the pronunciation of this character in the phrase as the reading of the polyphone. If neither is found, form the $n$-character phrases $W_{k,k+n-1}$ and $W_{k-n+1,k}$ with the following and preceding $n-1$ characters respectively, and repeat steps a) and b) until the pronunciation of the polyphone is determined. If, at $n = 1$, neither $W_{k,k+1}$ nor $W_{k-1,k}$ yields a reading from the polyphonic phrase dictionary, return empty;

Step 2.4.2: Suppose the polyphone has $n$ readings $tone_1, \ldots, tone_n$. Let $P_{pos}$ be the part-of-speech probability from word segmentation, with weight $\lambda_1$; $P_{lm}$ the language model probability, with weight $\lambda_2$; and $P_{freq}$ the word frequency probability, with weight $\lambda_3$. The system computes a score for each reading of the polyphone,

$$Score_i = \lambda_1 P_{pos}(tone_i) + \lambda_2 P_{lm}(tone_i) + \lambda_3 P_{freq}(tone_i),$$

and returns the reading with the highest score as the final pinyin of the polyphone. Note that, for each type of information (part of speech, word frequency, and language model), the probabilities of the readings must be normalized, and the weight of each type can be set empirically.
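A sketch of step 2.4.2 including the normalization just described. Each evidence type is given as raw, non-negative values per candidate reading and rescaled to sum to 1; the uniform fallback for all-zero evidence, the weights (0.4, 0.4, 0.2), and the input numbers are made up for illustration, since the text says only that the weights are set empirically.

```python
def normalize(raw):
    """Rescale one evidence type so that it sums to 1 over the candidate readings."""
    total = sum(raw.values())
    if total == 0:
        return {r: 1.0 / len(raw) for r in raw}  # assumption: uniform if no evidence
    return {r: v / total for r, v in raw.items()}


def best_reading(p_pos, p_lm, p_freq, weights=(0.4, 0.4, 0.2)):
    """Score_i = l1*P_pos(tone_i) + l2*P_lm(tone_i) + l3*P_freq(tone_i); return the argmax."""
    l1, l2, l3 = weights
    pos, lm, freq = normalize(p_pos), normalize(p_lm), normalize(p_freq)
    scores = {r: l1 * pos[r] + l2 * lm[r] + l3 * freq[r] for r in p_pos}
    return max(scores, key=scores.get), scores


reading, scores = best_reading(
    p_pos={"xing2": 3, "hang2": 1},     # e.g. counts of each reading for this POS
    p_lm={"xing2": 0.2, "hang2": 0.6},  # e.g. language-model likelihoods
    p_freq={"xing2": 70, "hang2": 30},  # e.g. corpus frequencies
)
print(reading)  # xing2
```

Here the language model favours hang2, but the part-of-speech and frequency evidence outweigh it (scores 0.54 vs 0.46). Because the weights and each normalized distribution sum to 1, the scores also sum to 1 across readings, which makes them easy to inspect.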

Step 3. Convert the pinyin string into a Braille string. The pinyin string obtained in Step 2 is converted into a Braille string by looking up a pinyin-to-Braille dictionary; at this point the Braille string is not yet segmented into words. The pinyin-to-Braille dictionary is a mapping table from each pinyin syllable to its corresponding Braille symbols.
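Step 3 is a plain table lookup. In the Python sketch below, the table holds only syllables whose Braille ASCII forms can be read off the worked example in this description (“T9” for ta, “M0” for men, and so on); it is a toy excerpt for illustration, not the full pinyin-to-Braille table.

```python
# Toy excerpt of a pinyin-to-Braille table (Braille ASCII), taken from the
# worked example; a real table covers every syllable of the Braille standard.
PINYIN_TO_BRAILLE = {"ta": "T9", "men": "M0", "de": "D", "mu": "MU", "di": "DI"}


def pinyin_to_braille(syllables, table=PINYIN_TO_BRAILLE):
    """Concatenate the Braille symbols of each syllable; no word spaces yet."""
    missing = [s for s in syllables if s not in table]
    if missing:
        raise KeyError(f"no Braille entry for: {missing}")
    return "".join(table[s] for s in syllables)
```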

Step 4. Segment the Braille string with a word-segmentation model trained in advance by statistical machine learning, producing the initial Braille segmentation. A perceptron model, commonly used in this field, is adopted. It is trained on a Braille corpus that has already been segmented into words, using unigram, bigram, and attribute features. During segmentation, features are extracted at every position of the Braille string where a split is possible, the trained model computes a probability for that position, and the probability determines whether a word boundary is placed there.
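The text does not list its feature templates. A common choice for character-level segmentation, shown here purely as an assumption, is unigram features over a ±2 window plus bigram features over adjacent pairs. The hypothetical `boundary_features` helper treats the Braille symbols of one syllable as a single unit and omits the unspecified attribute features.

```python
def boundary_features(units, i):
    """Feature strings for unit i of a Braille sequence (assumed templates).

    Unigrams at offsets -2..+2 and the two bigrams touching unit i;
    positions outside the sentence are padded with <s> / </s>.
    """
    def at(j):
        if j < 0:
            return "<s>"
        if j >= len(units):
            return "</s>"
        return units[j]

    feats = [f"U{d}={at(i + d)}" for d in range(-2, 3)]
    feats += [f"B{d}={at(i + d)}|{at(i + d + 1)}" for d in (-1, 0)]
    return feats
```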

The model is trained with the perceptron algorithm, which learns a discriminative mapping from input to output: the input is a sentence from the training corpus and the output is its corresponding label sequence.
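The weight update at the heart of perceptron training can be sketched as follows, assuming weights are stored in a dict keyed by (feature, label) and that a decoder has already produced the predicted labels `pred_tags`. Transition features and weight averaging, which practical segmenters often add, are left out of this sketch.

```python
def perceptron_update(weights, feats_per_unit, gold_tags, pred_tags, lr=1.0):
    """One perceptron update on a single training sentence.

    weights: dict mapping (feature, label) -> float, updated in place.
    Features of the gold label are rewarded and those of the predicted
    label penalised wherever the two labels differ.
    """
    for feats, gold, pred in zip(feats_per_unit, gold_tags, pred_tags):
        if gold == pred:
            continue  # identical labels: reward and penalty would cancel
        for f in feats:
            weights[(f, gold)] = weights.get((f, gold), 0.0) + lr
            weights[(f, pred)] = weights.get((f, pred), 0.0) - lr
    return weights
```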

Braille sentences are segmented with a character-classification model. Given a sentence of n characters, segmentation divides it into m (m ≤ n) blocks, each of which is a meaningful word. Each character is assigned a class label representing its position within its word, which turns word segmentation into a character-classification problem. The boundary labels b, m, e, and s are used: b, m, and e mean that the character is at the beginning, in the middle, or at the end of a word, respectively, and s means that the character forms a single-character word by itself. Decoding searches for the label sequence y that maximizes the score function f(x).
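The b/m/e/s encoding can be sketched as a pair of Python helpers that convert between a word list and its label sequence. The decoder is lenient about malformed label sequences (an unfinished word is closed when a new `b` or `s` appears); that choice is ours, not something the text specifies.

```python
def words_to_tags(words):
    """Encode a segmentation as one b/m/e/s label per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("s")
        else:
            tags += ["b"] + ["m"] * (len(w) - 2) + ["e"]
    return tags


def tags_to_words(chars, tags):
    """Decode b/m/e/s labels back into words."""
    words, current = [], ""
    for c, t in zip(chars, tags):
        if t in ("b", "s") and current:
            words.append(current)  # a new word starts: close the unfinished one
            current = ""
        current += c
        if t in ("e", "s"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```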

Here f(x) accumulates the scores of every (character, label) pair in y:

$$f(x) = \Phi(x, y) \cdot \vec{w} = \sum_{(i,t) \in y} \Phi(x, i, t) \cdot \vec{w}, \qquad 1 \le i \le n,\ t \in \{b, m, e, s\},$$

where Φ is the feature-extraction function and $\vec{w}$ is the parameter vector. Segmentation uses the Viterbi decoding algorithm.
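A minimal Viterbi decoder over the b/m/e/s labels, in Python. It assumes the per-position label scores have already been computed (for example from the features and weights of the trained model) and are passed in as `emission`. The hard transition constraints (for instance, `m` may only follow `b` or `m`) are the usual ones for this labeling scheme and are our addition, since the text does not list them.

```python
import math

TAGS = ("b", "m", "e", "s")
# Which labels may follow each label in a well-formed b/m/e/s sequence.
ALLOWED = {"b": {"m", "e"}, "m": {"m", "e"}, "e": {"b", "s"}, "s": {"b", "s"}}


def viterbi(emission):
    """emission[i][t]: score of label t at character i.

    Returns the highest-scoring label sequence that starts with b/s,
    ends with e/s and uses only allowed transitions.
    """
    n = len(emission)
    if n == 0:
        return []
    best = [{t: (emission[0][t] if t in ("b", "s") else -math.inf) for t in TAGS}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for t in TAGS:
            # best predecessor among labels that may precede t
            score, prev = max((best[i - 1][p], p) for p in TAGS if t in ALLOWED[p])
            best[i][t] = score + emission[i][t]
            back[i][t] = prev
    tags = [max(("e", "s"), key=lambda t: best[n - 1][t])]
    for i in range(n - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return tags[::-1]
```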

Step 5. Fuse the Chinese segmentation with the initial Braille segmentation, that is, use the Chinese word-segmentation result to fine-tune the Braille word-segmentation result and further improve segmentation accuracy.

Let the Chinese segmentation be $C = c_1 c_2 \ldots c_m$ and the Braille segmentation be $B = b_1 b_2 \ldots b_n$, where $c_i$ and $b_j$ each denote one word of the Chinese and the Braille segmentation, respectively. The Braille segmentation B can be mapped to the corresponding Chinese text $B' = b'_1 b'_2 \ldots b'_n$, where $b'_j$ is the word obtained by mapping the Braille word $b_j$ back to Chinese. Aligning C with B' by edit distance reveals the fragments in which they differ, and the fusion rules described above decide, for each differing fragment, whether the Chinese or the Braille segmentation is adopted. Denote the differing fragments of C and B' by $CH = ch_1 ch_2 \ldots ch_m$ and $BR = br_1 br_2 \ldots br_n$, respectively (here m and n count the words within the fragments). The steps are as follows:
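One way to find the differing fragments is sketched below in Python. It swaps in the standard library's `difflib.SequenceMatcher` (a longest-matching-block matcher) for the edit-distance alignment the text describes; on inputs like the worked example it isolates the same fragments, but it is not guaranteed to produce a minimum-edit alignment.

```python
from difflib import SequenceMatcher


def differing_fragments(chinese_words, braille_as_chinese):
    """Pair up the runs of words on which the two segmentations disagree."""
    matcher = SequenceMatcher(None, chinese_words, braille_as_chinese, autojunk=False)
    return [
        (chinese_words[i1:i2], braille_as_chinese[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

For the worked example, `differing_fragments(["北京", "是", "她们", "的", "目的", "地"], ["北京是", "她们", "的", "目的地"])` returns the two fragments `(["北京", "是"], ["北京是"])` and `(["目的", "地"], ["目的地"])`.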

Step 5.1: Let $ch_i$ be the i-th word of CH and $br_j$ the j-th word of BR; initialize $i = j = 1$.

Step 5.2: Compare $ch_i$ with $br_j$. If $ch_i \subset br_j$, the Braille word contains the Chinese word at this position, so the Braille result $br_j$ is adopted for it; conversely, if $br_j \subset ch_i$, the Chinese result $ch_i$ is adopted.

Step 5.3: Initially set $k = 1$.

5.3.1 In the case $ch_i \subset br_j$, define $ch_{i,i+k} = ch_i \ldots ch_{i+k}$ and compare $ch_{i,i+k}$ with $br_j$:

a) If $ch_{i,i+k} = br_j$, set $i = i+k+1$ and $j = j+1$; if $i > m$ or $j > n$, go to Step 5.4; otherwise, go to Step 5.2.

b) If $ch_{i,i+k}$ is still strictly contained in $br_j$, set $k = k+1$ and go to 5.3.1.

c) If $ch_{i,i+k}$ extends beyond the end of $br_j$, then $ch_{i+k}$ contains the last character of $br_j$. Let pos be the position of that character within $ch_{i+k}$, and split $ch_{i+k}$ at pos into $ch_{i+k,\mathrm{pos}}$ and $ch_{i+k,\mathrm{after\_pos}}$, so that $ch_{i+k} = ch_{i+k,\mathrm{pos}}\, ch_{i+k,\mathrm{after\_pos}}$, where $ch_{i+k,\mathrm{pos}}$ consists of the 1st through pos-th characters of $ch_{i+k}$ and $ch_{i+k,\mathrm{after\_pos}}$ consists of the (pos+1)-th through last characters. Replace the (i+k)-th word of the Chinese segmentation with $ch_{i+k,\mathrm{after\_pos}}$, i.e., update $CH = ch_1 \ldots ch_{i+k-1}\, ch_{i+k,\mathrm{after\_pos}}\, ch_{i+k+1} \ldots ch_m$; set $i = i+k$ and $j = j+1$, and go to Step 5.2.

5.3.2 In the case $br_j \subset ch_i$, define $br_{j,j+k} = br_j \ldots br_{j+k}$ and compare $br_{j,j+k}$ with $ch_i$:

a) If $br_{j,j+k} = ch_i$, set $i = i+1$ and $j = j+k+1$; if $i > m$ or $j > n$, go to Step 5.4; otherwise, go to Step 5.2.

b) If $br_{j,j+k}$ is still strictly contained in $ch_i$, set $k = k+1$ and go to 5.3.2.

c) If $br_{j,j+k}$ extends beyond the end of $ch_i$, then $br_{j+k}$ contains the last character of $ch_i$. Let pos be the position of that character within $br_{j+k}$, and split $br_{j+k}$ at pos into $br_{j+k,\mathrm{pos}}$ and $br_{j+k,\mathrm{after\_pos}}$, so that $br_{j+k} = br_{j+k,\mathrm{pos}}\, br_{j+k,\mathrm{after\_pos}}$, where $br_{j+k,\mathrm{pos}}$ consists of the 1st through pos-th characters of $br_{j+k}$ and $br_{j+k,\mathrm{after\_pos}}$ consists of the (pos+1)-th through last characters. Replace the (j+k)-th word of the Braille segmentation with $br_{j+k,\mathrm{after\_pos}}$, i.e., update $BR = br_1 \ldots br_{j+k-1}\, br_{j+k,\mathrm{after\_pos}}\, br_{j+k+1} \ldots br_n$; set $i = i+1$ and $j = j+k$, and go to Step 5.2.

Step 5.4: End the fusion algorithm.
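Because $ch_i$ and $br_j$ always start at the same character, one of them is a prefix of the other. Under our reading, Steps 5.1 to 5.4 therefore reduce to this: at each aligned position, keep the longer word, and let the straddling word of the other segmentation resume where the kept word ends. The Python sketch below follows that interpretation (it is not code from the source) and works on character offsets instead of rewriting CH and BR in place.

```python
def fuse_segmentations(chinese_words, braille_words):
    """Merge two segmentations of the same text, keeping the longer word at
    each aligned position and truncating the word that straddles its end."""
    text = "".join(chinese_words)
    if text != "".join(braille_words):
        raise ValueError("both segmentations must cover the same characters")

    def word_ends(words):
        ends, pos = [], 0
        for w in words:
            pos += len(w)
            ends.append(pos)
        return ends

    c_ends, b_ends = word_ends(chinese_words), word_ends(braille_words)
    fused, start = [], 0
    while start < len(text):
        # end of the word that begins (or resumes) at `start` in each segmentation
        nc = next(e for e in c_ends if e > start)
        nb = next(e for e in b_ends if e > start)
        end = max(nc, nb)  # adopt whichever word is longer
        fused.append(text[start:end])
        start = end
    return fused
```

On the worked example it returns `["北京是", "她们", "的", "目的地"]`, the same fused result the example arrives at before rule-based adjustment.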

Step 6. Adjust the segmentation according to the Braille word-division and joining rules. The part of speech of each word is examined in turn and compared with the activation conditions in the Braille joining rule set; if a rule matches, its conditions are applied to split or join the words. The rule set has the following format:

Joining rule: $\mathrm{POS}_k : [m,n] : \mathrm{POS}_{k-m} + \dots + \mathrm{POS}_k + \dots + \mathrm{POS}_{k+n} \rightarrow \mathrm{POS}_{k-m} \dots \mathrm{POS}_{k+n}$

In each rule, the part of speech $\mathrm{POS}_k$ before the first colon is the activation condition. It is followed by square brackets whose values m and n mean that the m words before and the n words after the current word must be examined; if m and n are both 0, the rule is a splitting rule. After the second colon comes the part-of-speech combination of the words; if it is satisfied, the operation to the right of the arrow is performed.
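A minimal Python sketch of how joining rules in this format might be applied. The ASCII rule syntax accepted by `parse_rule` (e.g. `NR:[0,1]:NR+NN->NRNN`) and the example rule in the usage note are hypothetical illustrations, not rules from the Braille standard. Splitting rules (m = n = 0) are skipped because the text does not spell out how they are executed, and conditions that are not part-of-speech based are not modeled.

```python
from typing import NamedTuple


class JoinRule(NamedTuple):
    trigger: str    # POS_k: activation condition (POS of the current word)
    before: int     # m: number of preceding words to inspect
    after: int      # n: number of following words to inspect
    pattern: tuple  # POS_{k-m} ... POS_{k+n} that must match


def parse_rule(text):
    """Parse 'POS:[m,n]:POS+...+POS->...' (hypothetical ASCII form of a rule)."""
    trigger, window, combo = text.split(":", 2)
    m, n = (int(v) for v in window.strip("[]").split(","))
    lhs = combo.split("->")[0]
    return JoinRule(trigger, m, n, tuple(lhs.split("+")))


def apply_join_rules(words, tags, rules):
    """Join words whose POS window matches a rule; returns new lists."""
    words, tags = list(words), list(tags)
    k = 0
    while k < len(words):
        for r in rules:
            if r.before == 0 and r.after == 0:
                continue  # splitting rule: not modelled in this sketch
            lo, hi = k - r.before, k + r.after + 1
            if tags[k] != r.trigger or lo < 0 or hi > len(words):
                continue
            if tuple(tags[lo:hi]) == r.pattern:
                words[lo:hi] = ["".join(words[lo:hi])]
                tags[lo:hi] = ["+".join(tags[lo:hi])]
                k = lo  # re-examine from the merged word
                break
        else:
            k += 1
    return words, tags
```

For example, the hypothetical rule `NR:[0,1]:NR+NN->NRNN` joins `["北京", "市", "是"]` tagged `NR NN VC` into `["北京市", "是"]`, while `北京/NR 是/VC` is left unchanged.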

Step 7. Braille tone marking. The pinyin of the character corresponding to each word is examined in turn and compared with the rules in the Braille tone-marking rule set; if the conditions are met, the tone of the current character is marked. The tone-marking rule set has the following format:

Tone-marking rule: $\mathrm{tone}_k : [n] : \mathrm{tone}_k \dots \mathrm{tone}_{k+n}$

Here $\mathrm{tone}_k$ is the pinyin of the current character, n in the square brackets means that the pinyin of the n characters following the current one must be examined, and $\mathrm{tone}_k \dots \mathrm{tone}_{k+n}$ is the tone-marking condition: if the pinyin sequence satisfies it, the tone of $\mathrm{tone}_k$ is marked.
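A minimal Python sketch of applying tone-marking rules of this form. The rule in the usage note (“ta” followed by “men”) is a hypothetical example chosen to reproduce the “T9AM0” result of the worked example; only the first-tone cell “A” comes from the text, and a complete tone-cell table would have to come from the Chinese Braille standard.

```python
from typing import NamedTuple


class ToneRule(NamedTuple):
    syllable: str    # tone_k: pinyin of the current character
    lookahead: int   # n: number of following syllables to inspect
    condition: tuple # required pinyin sequence tone_k ... tone_{k+n}


def mark_tones(pinyin, cells, tones, rules, tone_cells):
    """Append a tone cell to each syllable whose pinyin context matches a rule.

    pinyin: toneless syllables; cells: their Braille ASCII symbols;
    tones: tone numbers; tone_cells: tone number -> Braille symbol.
    """
    out = []
    for k, (syl, cell) in enumerate(zip(pinyin, cells)):
        hit = any(
            syl == r.syllable
            and k + r.lookahead < len(pinyin)
            and tuple(pinyin[k:k + r.lookahead + 1]) == r.condition
            for r in rules
        )
        out.append(cell + tone_cells[tones[k]] if hit else cell)
    return out
```

With `ToneRule("ta", 1, ("ta", "men"))`, `mark_tones(["ta", "men"], ["T9", "M0"], [1, 5], rules, {1: "A"})` returns `["T9A", "M0"]`, i.e. “T9AM0” once joined.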

Step 8. Braille display: the Braille is output to a refreshable Braille display for the blind. Any existing Braille display product can be used by calling its corresponding output interface.

The present invention also proposes a system for blind people to read Chinese characters, comprising:

a pinyin-string module, configured to obtain Chinese text, segment the Chinese text into words to generate a Chinese character string, and, using a pronunciation dictionary, a polyphonic-character dictionary, word-frequency information, and the part-of-speech tags obtained from segmentation, convert each word of the Chinese character string into its pinyin and concatenate the results into a pinyin string;

a Braille segmentation and adjustment module, configured to convert the pinyin string into a Braille string by looking up a pinyin-to-Braille dictionary, segment the Braille string with a word-segmentation model to generate an initial Braille segmentation, fuse the Chinese character string with the initial Braille segmentation to generate a new Braille segmentation, and adjust the new Braille segmentation according to the Braille word-joining rules;

a Braille display module, configured to mark tones on the new Braille segmentation adjusted according to the Braille word-joining rules, generate the final Braille segmentation, and display the final Braille segmentation.

In the pinyin-string module, the Chinese character string is converted into a pinyin string as follows:

Step 2.1: For each word in the Chinese character string, determine whether it is a multi-character word. If it is, and its pinyin can be found in the pronunciation dictionary, return that pinyin directly; otherwise, perform Step 2.2;

Step 2.2: Split the multi-character word into a sequence of Chinese characters and take each of its characters in turn; for each character, perform Steps 2.3 to 2.4;

Step 2.3: For the current character, look it up in the polyphonic-character dictionary to determine whether it is polyphonic. If it is not, look up its pinyin in the pronunciation dictionary and return that pinyin; otherwise, perform Step 2.4;
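Steps 2.1 to 2.4 amount to a dictionary lookup with a per-character fallback. A minimal Python sketch, assuming a hypothetical `pron_dict` that maps words and single characters to lists of pinyin syllables, a set `polyphones` standing in for the polyphonic-character dictionary, and a caller-supplied `resolve_polyphone` callback for Step 2.4:

```python
def word_to_pinyin(word, pron_dict, polyphones, resolve_polyphone):
    """Convert one segmented word to pinyin syllables (Steps 2.1-2.4).

    pron_dict: word or character -> list of pinyin syllables (hypothetical format)
    polyphones: characters listed in the polyphonic-character dictionary
    resolve_polyphone(word, index): reading of a polyphonic character (Step 2.4)
    """
    # Step 2.1: a multi-character word found as a whole is returned directly
    if len(word) > 1 and word in pron_dict:
        return list(pron_dict[word])
    syllables = []
    # Step 2.2: otherwise fall back to the word's characters, one by one
    for idx, ch in enumerate(word):
        if ch in polyphones:  # Step 2.4
            syllables.append(resolve_polyphone(word, idx))
        else:                 # Step 2.3
            syllables.extend(pron_dict[ch])
    return syllables
```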

Step 2.4: If the character is polyphonic, perform the following steps:

Step 2.4.1: If the current polyphonic character comes from a single-character word, go directly to Step 2.4.2; if it comes from a multi-character word, perform the following steps:

For a polyphonic character $w_k$ in a multi-character word: in step a), form the $(n+1)$-character word $W_{k,k+n} = w_k w_{k+1} \ldots w_{k+n}$ with the $n$ following characters and look it up in the polyphonic-phrase dictionary; if it is found, return the pronunciation of the matched character in $W_{k,k+n}$ as the reading of $w_k$. If it is not found, perform step b): form the $(n+1)$-character word $W_{k-n,k} = w_{k-n} w_{k-n+1} \ldots w_k$ with the $n$ preceding characters and look it up in the polyphonic-phrase dictionary; if it is found, return the pronunciation of the matched character in $W_{k-n,k}$ as the reading of the polyphonic character. If it is not found, form the $n$-character words $W_{k,k+n-1}$ and $W_{k-n+1,k}$ with the following and preceding $n-1$ characters, respectively, and perform steps a) and b) on them, until the pronunciation of the polyphonic character $w_k$ is determined;

Step 2.4.2: Suppose the polyphonic character has $n$ readings $\mathrm{tone}_1, \ldots, \mathrm{tone}_n$. Let $P_{\mathrm{pos}}$ be the part-of-speech probability from word segmentation, with weight $\lambda_1$; $P_{\mathrm{lm}}$ the language-model probability, with weight $\lambda_2$; and $P_{\mathrm{freq}}$ the word-frequency probability, with weight $\lambda_3$. The system computes a score for each reading of the polyphonic character,

$$\mathrm{Score}_i = \lambda_1 P_{\mathrm{pos}}(\mathrm{tone}_i) + \lambda_2 P_{\mathrm{lm}}(\mathrm{tone}_i) + \lambda_3 P_{\mathrm{freq}}(\mathrm{tone}_i),$$

and returns the highest-scoring reading as the final pinyin of the polyphonic character.

In the Braille segmentation and adjustment module, fusion is performed as follows: for the Chinese character string $C = c_1 c_2 \ldots c_m$ and the initial Braille segmentation $B = b_1 b_2 \ldots b_n$, where $c_i$ and $b_j$ each denote one word of the Chinese character string and of the initial Braille segmentation, respectively, the initial Braille segmentation B is mapped to the corresponding Chinese character string $B' = b'_1 b'_2 \ldots b'_n$, where $b'_j$ is the word obtained by mapping the initial Braille word $b_j$ back to Chinese.

In the Braille segmentation and adjustment module, the Braille word-joining rules are as follows:

Joining rule: $\mathrm{POS}_k : [m,n] : \mathrm{POS}_{k-m} + \dots + \mathrm{POS}_k + \dots + \mathrm{POS}_{k+n} \rightarrow \mathrm{POS}_{k-m} \dots \mathrm{POS}_{k+n}$

$\mathrm{POS}_k$ is the activation condition; m and n mean that the m words before and the n words after the current word of the new Braille segmentation must be examined; if m and n are both 0, the rule is a splitting rule. After the second colon comes the part-of-speech combination of the words; if it is satisfied, the operation to the right of the arrow is performed.

In the Braille display module, Braille tone marking is performed as follows:

The pinyin of the characters of each adjusted word of the new Braille segmentation is examined in turn and compared with the rules in the Braille tone-marking rule set; if the conditions are met, the tone of the current word is marked. The tone-marking rule set has the following format:

Tone-marking rule: $\mathrm{tone}_k : [n] : \mathrm{tone}_k \dots \mathrm{tone}_{k+n}$

Here $\mathrm{tone}_k$ is the pinyin of the current word of the new Braille segmentation, n is the number of following words whose pinyin must be examined, and $\mathrm{tone}_k \dots \mathrm{tone}_{k+n}$ is the tone-marking condition; if the pinyin sequence satisfies it, the tone of $\mathrm{tone}_k$ is marked.

The implementation of the method and system of the present invention is described in detail below, using the conversion of one Chinese sentence into Braille and its display as an example. The example is for illustration only and is not intended to limit the scope of the invention.

Suppose the Chinese sentence to be converted into Braille is “北京是她们的目的地” (“Beijing is their destination”). The Chinese word-segmentation module segments it and tags parts of speech, giving “北京/NR 是/VC 她们/PN 的/DEG 目的/NN 地/NN”.

The Chinese-string-to-pinyin-string module then converts the segmentation result into a pinyin string. For the four words “北京”, “是”, “她们”, and “目的”, the reading is confirmed directly by looking them up in the pronunciation dictionary; “的” and “地” are both polyphonic characters, so the algorithm must be invoked to determine their readings.

Taking “的” as an example: part-of-speech tagging shows that “的” is tagged “DEG”, which determines that its reading is “de”. Since the part of speech uniquely determines the reading of “的”:

$$P_{\mathrm{pos}}(de) = 1, \qquad P_{\mathrm{pos}}(di) = 0$$

Given that the preceding word is “她们” (tamen), the language model gives a probability of 0.45 for the reading “de” and 0.05 for “di”:

$$P_{\mathrm{lm}}(de) = P(de \mid tamen) = 0.45, \qquad P_{\mathrm{lm}}(di) = P(di \mid tamen) = 0.05$$

After normalization, $P_{\mathrm{lm}}(de) = 0.45/(0.45+0.05) = 0.9$ and $P_{\mathrm{lm}}(di) = 0.05/(0.45+0.05) = 0.1$.

Looking up the single-character frequency of “的” in the word-frequency dictionary gives 185 occurrences read “de” and 75 read “di”, so $P_{\mathrm{freq}}(de) = 185/260 \approx 0.71$ and $P_{\mathrm{freq}}(di) = 75/260 \approx 0.29$.

With the weights of the part-of-speech, language-model, and word-frequency probabilities all set empirically to 1/3:

$$\mathrm{Score}(de) = \tfrac{1}{3}(1 + 0.9 + 0.71) = 0.87, \qquad \mathrm{Score}(di) = \tfrac{1}{3}(0 + 0.1 + 0.29) = 0.13$$

Comparing the scores, the final reading of the polyphonic character “的” is determined to be “de”.

Similarly, the reading of “地” is determined to be “di”. The pinyin string of the sentence is therefore “beijing shi ta men de mu di di”.

The pinyin-to-Braille module then gives the Braille string “B!G*:T9 M0 D MUDI DI” for this pinyin string. (Braille in this specification is written as the ASCII code of each Braille symbol rather than as its dot pattern; the same applies below.)

The Braille word-segmentation module segments this string, giving the segmented Braille string “B!G*:|T9 M0|D|MU DI DI”.

The module for fusing the Chinese and Braille segmentation results is then called. Mapping the segmented Braille string back to Chinese gives the Braille-based segmentation “北京是/她们/的/目的地”; aligning it with the Chinese segmentation by edit distance gives Table 1:

Table 1: Comparison of the Chinese and Braille word segmentations

Comparing the Chinese and Braille segmentations in Table 1 reveals two differing fragments: fragment 1, “北京是”, and fragment 2, “目的地”.

Fragment 1 is processed first. Its Chinese segmentation is “北京/是” and its Braille segmentation is “北京是”. The first Chinese word “北京” is compared with the first Braille word “北京是”; since the Braille word contains the Chinese word, the next Chinese word “是” is examined and combined with “北京” to form “北京是”, which is compared with the first Braille word “北京是”. The two are identical and fragment 1 contains no further unprocessed words, so, by the rule of choosing the word with more characters as the final segmentation, the segmentation of fragment 1 is “北京是”.

Similarly, the segmentation of fragment 2 is determined to be “目的地”. The fused segmentation is therefore “北京是/她们/的/目的地”.

The segmentation-adjustment module is then called. According to the Chinese part-of-speech tags, “北京” (Beijing) is tagged “NR”, a proper noun. The Braille standard joins a proper noun to the following word only when that word is a monosyllabic common noun. In the example, “北京” is followed by “是”, tagged “VC” (copula), which does not meet this condition, so the two must not be joined and the fused word “北京是” is split into “北京/是”. After adjustment, the segmentation is “北京/是/她们/的/目的地”, whose Braille form is “B!G*:T9M0 D MUDIDI”.

The Braille tone-marking module then marks tones on the segmentation. The Braille standard requires special representations for “他”, “她”, and “字”, and “她” must always be tone-marked. The Braille of “她” is “T9” and its tone is the first tone, written “A” in Braille; after tone marking, the Braille string is “B!G*:T9AM0 D MUDIDI”.

Finally, the Braille display module displays the Braille string on the refreshable Braille display.

Claims (10)

  1. For the polyphonic character $w_k$ in a multi-character word: in step a), form the $(n+1)$-character word $W_{k,k+n} = w_k w_{k+1} \ldots w_{k+n}$ with the $n$ following characters and look it up in the polyphonic-phrase dictionary; if it is found, return the pronunciation of the matched character in $W_{k,k+n}$ as the reading of $w_k$. If it is not found, perform step b): form the $(n+1)$-character word $W_{k-n,k} = w_{k-n} w_{k-n+1} \ldots w_k$ with the $n$ preceding characters and look it up in the polyphonic-phrase dictionary; if it is found, return the pronunciation of the matched character in $W_{k-n,k}$ as the reading of the polyphonic character. If it is not found, form the $n$-character words $W_{k,k+n-1}$ and $W_{k-n+1,k}$ with the following and preceding $n-1$ characters, respectively, and perform steps a) and b) on them, until the pronunciation of the polyphonic character $w_k$ is determined;

Priority Applications (1)

CN201510623525.5A (CN105404621B), priority date 2015-09-25, filing date 2015-09-25: A kind of method and system that Chinese character is read for blind person


Publications (2)

CN105404621A, published 2016-03-16
CN105404621B, published 2018-07-10

Family

ID=55470115




Citations (5)

* Cited by examiner, † Cited by third party

CN1323003A* (priority 2001-06-22, published 2001-11-21), Tsinghua University: Intelligent Chinese computer system for the blind
CN1323004A* (priority 2001-06-08, published 2001-11-21), Tsinghua University: Automatic conversion method from Chinese braille to Chinese character
WO2002006916A3* (priority 2000-07-18, published 2003-10-30), Yishay Langenthal: Reading aid for the blind
CN1591414A* (priority 2004-06-03, published 2005-03-09), Huajian Electronics Co., Ltd.: Automatic translating converting method for Chinese language to braille
CN102184172A* (priority 2011-05-10, published 2011-09-14), Institute of Computing Technology, Chinese Academy of Sciences: Chinese character reading system and method for blind people


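Non-Patent Citations (4)

* Cited by examiner, † Cited by third party

Zhu Xiaoyan, Bao Ta. EasyBraille: an automatic Chinese-to-Chinese-Braille conversion system. Natural Language Understanding and Machine Translation: Proceedings of the 6th National Joint Conference on Computational Linguistics, 2001-08-01, pp. 326-331. *
Yang Chao, Che Lei. Design of a Chinese character-to-Braille conversion system. Journal of Beijing Institute of Graphic Communication, 2011-12-31, vol. 19, no. 6, Section 4, Fig. 4. *
Li Hongqiao et al. Research and implementation of a Chinese-to-Braille machine translation system. Journal of Computer Applications, 2002-11-10, vol. 22, no. 11, Sections 2.3, 3.2, and 3.4. *
Su Chen. Research on domain adaptation methods for statistical machine translation. China Master's Theses Full-text Database, Information Science and Technology, 2015-08-15, I138-765, main text p. 22, Section 3.3. *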



Legal Events

C06: Publication
PB01: Publication
C10: Entry into substantive examination
SE01: Entry into force of request for substantive examination
GR01: Patent grant
