CN105404621B


Info

Publication number: CN105404621B
Application number: CN201510623525.5A
Authority: CN
Inventors: 王向东; 杨阳; 钱跃良; 刘宏; 张金超; 姜文斌
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2015-09-25
Filing date: 2015-09-25
Publication date: 2018-07-10
Anticipated expiration: 2035-09-25
Also published as: CN105404621A

Abstract

The present invention proposes a method and system for blind people to read Chinese characters, relating to the technical fields of natural language processing and human-computer interaction for people with disabilities. The method comprises: obtaining Chinese text and performing word segmentation on it to generate a Chinese character string; converting each word in the Chinese character string into its pinyin, using a pronunciation dictionary, a polyphonic character dictionary, and word frequency information together with the part-of-speech tags produced by segmentation, and concatenating the results into a pinyin string; converting the pinyin string into a Braille cell string by looking up a pinyin-to-Braille dictionary; performing Braille word segmentation on the Braille cell string with a word segmentation model to generate an initial Braille segmentation; fusing the Chinese character string with the initial Braille segmentation to generate a new Braille segmentation, and adjusting it according to the Braille segmentation and joined-writing rules; marking tones on the adjusted new Braille segmentation to generate the final Braille segmentation; and displaying the final Braille segmentation.

Description


A method and system for blind people to read Chinese characters

Technical Field

The present invention relates to the technical fields of natural language processing and human-computer interaction for people with disabilities, and in particular to a method and system for blind people to read Chinese characters.

Background Art

In today's information society, the level of informatization keeps rising, information technology is widely used in people's work, study, and daily life, and the Internet has become an important part of everyday life, conveniently providing a wealth of information resources. In China, most digital and online text resources are stored as Chinese text, which the country's 12 million blind people find difficult to use. This prevents blind people from enjoying these massive information resources as sighted people do, widens the information gap between blind and sighted people, and further limits blind people's ability to live and develop in an information society. Speech synthesis technology is increasingly mature, and large amounts of online text can be converted into audio files so that blind people can obtain the information by listening. However, audio takes considerable storage space and is inconvenient to carry and search, and the speech channel conveys information relatively slowly. Reading text therefore remains the most important way for blind people to obtain information.

Blind people in China read and write in Chinese Braille, which is based on the Braille system. Each Braille cell has a basic structure of six dots arranged in two columns; each dot is either raised or flat, giving 64 combinations, so a cell can represent 64 different characters. In Chinese Braille, each cell represents an initial, a final, or a tone of Hanyu Pinyin, and cells combine into legal syllables according to pinyin rules to represent Chinese characters; Chinese Braille is therefore essentially a phonetic (pinyin) script. Braille is generally printed or written on special thick Braille paper, into which raised dots are embossed for blind readers to read by touch. To let blind people read Braille on a computer, refreshable Braille displays have been designed and manufactured. Such a device connects to a computer, receives a string of Braille cells from it, and renders the string as raised dots on its panel; when a new string arrives, the panel clears the previous dots and displays the new ones.
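The 64 combinations follow from treating the six dots as a 6-bit mask (2^6 = 64). The minimal sketch below assumes the Unicode Braille Patterns block (U+2800 to U+28FF), in which raised dot d sets bit d-1; this encoding is a common display convenience rather than something the patent specifies, and the function names are hypothetical.

```python
BRAILLE_BASE = 0x2800  # U+2800 is the blank cell (no dots raised)


def cell_from_dots(dots):
    """Return the Unicode Braille character whose raised dots are `dots` (1-6)."""
    mask = 0
    for d in dots:
        if not 1 <= d <= 6:
            raise ValueError(f"six-dot Braille uses dots 1-6, got {d}")
        mask |= 1 << (d - 1)  # dot d corresponds to bit d-1
    return chr(BRAILLE_BASE + mask)


def dots_from_cell(cell):
    """Inverse of cell_from_dots: list the raised dots of a six-dot cell."""
    mask = ord(cell) - BRAILLE_BASE
    return [d for d in range(1, 7) if mask & (1 << (d - 1))]


# Every subset of the six dots (raised or flat) is a distinct cell: 2**6 = 64.
ALL_CELLS = {
    cell_from_dots([d for d in range(1, 7) if m & (1 << (d - 1))]) for m in range(64)
}
print(len(ALL_CELLS))  # 64
```

The blank cell U+2800 (no dots raised) is the cell that appears as the "empty square" between Braille words.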

Even with Braille displays, blind people still find it difficult to read Chinese text on a computer, because the text must first be converted into Braille. Because homophones (one sound, many characters) and polyphones (one character, many sounds) are pervasive in Chinese, converting Chinese to Braille is not a simple rule-based correspondence; it requires taking syntax and semantics into account. More importantly, Braille has segmentation and joined-writing rules, which require words or phrases carrying a unit of meaning to be separated by a blank cell (an "empty square") to ease comprehension. Existing methods generally adjust the output of Chinese word segmentation according to these rules to obtain segmented Braille. Because the rules are largely semantic and somewhat subjective, however, segmentation accuracy is low when the process is fully automatic, so the converted text still needs extensive manual correction. This is inefficient and makes producing Braille text resources slow and costly. Improving the accuracy of Chinese-to-Braille conversion, reducing manual correction, and speeding up the conversion are therefore of real practical significance for making Chinese information resources more accessible to blind people and helping them integrate into mainstream society.

Summary of the Invention

To address the shortcomings of the prior art, the present invention proposes a method and system for blind people to read Chinese characters.

The present invention proposes a method for blind people to read Chinese characters, comprising:

Step 1: obtain Chinese text and perform word segmentation on it to generate a Chinese character string; using a pronunciation dictionary, a polyphonic character dictionary, and word frequency information, and referring to the part-of-speech tags produced by segmentation, convert each word in the Chinese character string into its pinyin and concatenate the results into a pinyin string;

Step 2: convert the pinyin string into a Braille cell string by looking up a pinyin-to-Braille dictionary; perform Braille word segmentation on the Braille cell string with a word segmentation model to generate an initial Braille segmentation; fuse the Chinese character string with the initial Braille segmentation to generate a new Braille segmentation; and adjust the new Braille segmentation according to the Braille segmentation and joined-writing rules;

Step 3: mark tones on the new Braille segmentation adjusted according to the joined-writing rules to generate the final Braille segmentation, and display the final Braille segmentation.

In the method, converting the Chinese character string into a pinyin string in Step 1 specifically comprises:

Step 2.1: for each word in the Chinese character string, determine whether it is a multi-character word; if it is, and its pinyin can be found in the pronunciation dictionary, return that pinyin directly; otherwise, perform Step 2.2;

Step 2.2: split the word into its sequence of Chinese characters and take each character in turn, performing Steps 2.3 to 2.4 for each;

Step 2.3: for the current character, look it up in the polyphonic character dictionary to determine whether it is a polyphone; if it is not, look up its pinyin in the pronunciation dictionary and return it; otherwise, perform Step 2.4;

Step 2.4: if the character is a polyphone, perform the following steps:

Step 2.4.1: if the current polyphone comes from a single-character word, go directly to Step 2.4.2; if it comes from a multi-character word, perform the following steps:

For a polyphone $w_k$ in a multi-character word: a) combine $w_k$ with the following $n$ characters into the $(n+1)$-character word $W_{k,k+n} = w_k w_{k+1} \cdots w_{k+n}$ and look it up in the polyphonic phrase dictionary; if it is found, return the pronunciation that this character has in $W_{k,k+n}$ as the reading of $w_k$. If it is not found, b) combine $w_k$ with the preceding $n$ characters into the $(n+1)$-character word $W_{k-n,k} = w_{k-n} w_{k-n+1} \cdots w_k$ and look it up in the polyphonic phrase dictionary; if it is found, return the pronunciation that this character has in $W_{k-n,k}$ as the reading of $w_k$. If neither is found, form the $n$-character words $W_{k,k+n-1}$ and $W_{k-n+1,k}$ with the following and preceding $n-1$ characters, respectively, and repeat steps a) and b) until the pronunciation of $w_k$ is determined;

Step 2.4.2: suppose the polyphone has $n$ readings $tone_1, \ldots, tone_n$. Let $P_{pos}$ be the part-of-speech probability from segmentation, with weight $\lambda_1$; $P_{lm}$ the language model probability, with weight $\lambda_2$; and $P_{freq}$ the word frequency probability, with weight $\lambda_3$. The system computes a score for each reading, $\mathrm{Score}_i = \lambda_1 \cdot P_{pos}(tone_i) + \lambda_2 \cdot P_{lm}(tone_i) + \lambda_3 \cdot P_{freq}(tone_i)$, and returns the highest-scoring reading as the final pinyin of the polyphone.
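Steps 2.1 to 2.4.1 can be sketched as dictionary lookups plus a back-off search for polyphones. Assumptions, none of them from the patent: all dictionaries are plain Python dicts with toy entries, pinyin uses tone digits, the phrase dictionary maps each phrase to its per-character syllables, the back-off window stays inside the current word (the text does not say whether neighbouring words are used), and the starting window size `max_n = 3` is our own choice because the text leaves it open. `score_readings` is a hypothetical hook standing in for Step 2.4.2, used when Step 2.4.1 returns empty or the polyphone is a single-character word.

```python
def phrase_lookup(word, k, phrase_dict, max_n=3):
    """Step 2.4.1: back off from (max_n+1)- to 2-character windows around w_k."""
    for n in range(max_n, 0, -1):
        if k + n < len(word):              # a) W_{k,k+n}: w_k is the first character
            syllables = phrase_dict.get(word[k:k + n + 1])
            if syllables:
                return syllables[0]
        if k - n >= 0:                     # b) W_{k-n,k}: w_k is the last character
            syllables = phrase_dict.get(word[k - n:k + 1])
            if syllables:
                return syllables[n]
    return None                            # nothing found even at n = 1


def word_to_pinyin(word, pron_dict, polyphones, phrase_dict, score_readings):
    """Steps 2.1 to 2.4 for one segmented word; returns space-separated syllables."""
    if len(word) > 1 and word in pron_dict:  # Step 2.1: whole-word entry
        return pron_dict[word]
    syllables = []
    for k, ch in enumerate(word):            # Step 2.2: character by character
        if ch not in polyphones:             # Step 2.3: only one reading
            syllables.append(pron_dict[ch])
            continue
        reading = phrase_lookup(word, k, phrase_dict) if len(word) > 1 else None
        # Step 2.4.2 (hypothetical hook) when Step 2.4.1 found nothing
        syllables.append(reading or score_readings(word, k, polyphones[ch]))
    return " ".join(syllables)


# Toy data for illustration only.
pron_dict = {"银行": "yin2 hang2", "银": "yin2", "行": "xing2", "走": "zou3", "卡": "ka3"}
polyphones = {"行": ["xing2", "hang2"]}
phrase_dict = {"银行": ["yin2", "hang2"]}


def first_reading(word, k, readings):
    """Placeholder for Step 2.4.2: just take the first listed reading."""
    return readings[0]


words = ["银行卡", "行走"]
print(" ".join(word_to_pinyin(w, pron_dict, polyphones, phrase_dict, first_reading) for w in words))
# yin2 hang2 ka3 xing2 zou3
```

In the example, 行 in 银行卡 is resolved by the two-character window 银行 (step b), while 行 in 行走 finds no phrase entry and falls through to the placeholder scorer.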

In the method, the fusion in Step 2 is performed as follows. Let the Chinese character string be $C = c_1 c_2 \cdots c_m$ and the initial Braille segmentation be $B = b_1 b_2 \cdots b_n$, where $c_i$ and $b_j$ each denote one word of the Chinese string and of the initial Braille segmentation, respectively. The initial Braille segmentation $B$ is mapped to the corresponding Chinese string $B' = b'_1 b'_2 \cdots b'_n$, where $b'_j$ is the word obtained by mapping the Braille word $b_j$ into Chinese.
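The text does not say how $B$ is mapped to $B'$. One simple way, assuming each Chinese character yields exactly one syllable, that the pinyin-to-Braille conversion records how many cells each syllable produced, and that no Braille word boundary falls inside a syllable, is to map by cumulative cell counts. The function name and the placeholder letters standing in for cells are hypothetical.

```python
from itertools import accumulate


def map_to_chinese(braille_words, syllable_cells, hanzi):
    """Map Braille words b_1..b_n to Chinese words b'_1..b'_n.

    Assumes one syllable per character and that no Braille word splits a syllable.
    """
    # Cell offset at which each syllable (and hence each character) ends.
    syllable_ends = list(accumulate(len(cells) for cells in syllable_cells))
    mapped, start, cell_pos = [], 0, 0
    for word in braille_words:
        cell_pos += len(word)
        end = syllable_ends.index(cell_pos) + 1  # ValueError if a syllable is split
        mapped.append(hanzi[start:end])
        start = end
    return mapped


# Placeholder letters stand in for Braille cells: 中 -> "AB", 国 -> "CD", 人 -> "EF".
print(map_to_chinese(["ABCD", "EF"], ["AB", "CD", "EF"], "中国人"))  # ['中国', '人']
```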

In the method, the Braille segmentation and joined-writing rules in Step 2 take the following form:

Joined-writing rule: $POS_k : [m, n] : POS_{k-m} + \cdots + POS_k + \cdots + POS_{k+n} \rightarrow POS_{k-m} \cdots POS_{k+n}$

$POS_k$ is the activation condition; $m$ and $n$ indicate that the $m$ words before and the $n$ words after the current word of the new Braille segmentation must be examined. If both $m$ and $n$ are 0, the rule is a segmentation rule. The part after the second colon is the part-of-speech combination of the words; if it is matched, the operation after the right arrow is performed.
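A minimal sketch of how a joining rule of this form might be applied, under assumptions of our own: words carry part-of-speech tags (the tag names "r", "u", "n" and the example rule are hypothetical), a rule fires when the tags of words $k-m$ to $k+n$ equal its pattern, and the right-hand side without "+" is read as writing those words together with no blank cell between them. The $m = n = 0$ segmentation case is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinRule:
    """POS_k:[m,n]:POS_{k-m}+...+POS_{k+n} -> POS_{k-m}...POS_{k+n}."""
    trigger: str    # POS_k, the activation condition
    before: int     # m: preceding words to examine
    after: int      # n: following words to examine
    pattern: tuple  # POS_{k-m}, ..., POS_{k+n}, which must match exactly


def apply_join_rules(words, tags, rules):
    """Join each POS-matching window of words into one written unit."""
    words, tags = list(words), list(tags)
    k = 0
    while k < len(words):
        for r in rules:
            lo, hi = k - r.before, k + r.after
            if (r.before + r.after > 0          # m = n = 0 rules are not modelled
                    and tags[k] == r.trigger
                    and lo >= 0 and hi < len(words)
                    and tuple(tags[lo:hi + 1]) == r.pattern):
                words[lo:hi + 1] = ["".join(words[lo:hi + 1])]
                tags[lo:hi + 1] = ["+".join(r.pattern)]
                k = lo  # re-check from the merged unit
                break
        else:
            k += 1  # no rule fired at this position
    return words, tags


rules = [JoinRule(trigger="u", before=1, after=0, pattern=("r", "u"))]
print(apply_join_rules(["我", "的", "书"], ["r", "u", "n"], rules))
# (['我的', '书'], ['r+u', 'n'])
```

Re-checking from the merged unit lets several rules chain; the loop terminates because every merge shortens the word list.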

In the method, the Braille tone marking in Step 3 specifically comprises:

Examine, in turn, the pinyin of the characters of each adjusted word of the new Braille segmentation and compare it with the rules in the Braille tone-marking rule set; if a rule's condition is met, mark the tone of the current word. Rules in the tone-marking set have the following format:

Tone-marking rule: $tone_k : [n] : tone_k \cdots tone_{k+n}$

Here $tone_k$ is the pinyin of the current word of the new Braille segmentation, $n$ is the number of following words whose pinyin must be examined, and $tone_k \cdots tone_{k+n}$ is the tone-marking condition; if the pinyin sequence satisfies the condition, the tone of $tone_k$ is marked.
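The tone-marking rules can be sketched the same way. This assumes pinyin is written with tone digits (e.g. `lao3shi1`), that "marking" a word means keeping its tone (a later lookup would turn it into a tone cell), and that each rule is stored as its full condition sequence; the `ToneRule` type and the example rules are hypothetical, not the actual Chinese Braille tone conventions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToneRule:
    """tone_k:[n]:tone_k...tone_{k+n}, stored as its condition sequence."""
    lookahead: int    # n: following words whose pinyin is examined
    condition: tuple  # required pinyin of words k..k+n (condition[0] is tone_k)


def drop_tone(pinyin):
    """Remove tone digits, e.g. 'lao3shi1' -> 'laoshi'."""
    return re.sub(r"[1-5]", "", pinyin)


def mark_tones(word_pinyins, rules):
    """Keep the tone of word k only if some rule's condition matches at k."""
    out = []
    for k, pinyin in enumerate(word_pinyins):
        marked = any(
            tuple(word_pinyins[k:k + r.lookahead + 1]) == r.condition for r in rules
        )
        out.append(pinyin if marked else drop_tone(pinyin))
    return out


print(mark_tones(["ta1", "shi4", "lao3shi1"], [ToneRule(0, ("shi4",))]))
# ['ta', 'shi4', 'laoshi']
```

A window that runs past the end of the sentence yields a shorter tuple, so it can never match a longer condition.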

The present invention also proposes a system for blind people to read Chinese characters, comprising:

a pinyin string acquisition module, configured to obtain Chinese text, perform word segmentation on it to generate a Chinese character string, and, using a pronunciation dictionary, a polyphonic character dictionary, and word frequency information and referring to the part-of-speech tags produced by segmentation, convert each word in the Chinese character string into its pinyin and concatenate the results into a pinyin string;

a new Braille segmentation acquisition and adjustment module, configured to convert the pinyin string into a Braille cell string by looking up a pinyin-to-Braille dictionary, perform Braille word segmentation on the Braille cell string with a word segmentation model to generate an initial Braille segmentation, fuse the Chinese character string with the initial Braille segmentation to generate a new Braille segmentation, and adjust the new Braille segmentation according to the Braille segmentation and joined-writing rules;

a Braille display module, configured to mark tones on the new Braille segmentation adjusted according to the joined-writing rules, generate the final Braille segmentation, and display it.

In the system, converting the Chinese character string into a pinyin string in the pinyin string acquisition module specifically comprises:

In the system, the fusion in the new Braille segmentation acquisition and adjustment module is performed as follows. Let the Chinese character string be $C = c_1 c_2 \cdots c_m$ and the initial Braille segmentation be $B = b_1 b_2 \cdots b_n$, where $c_i$ and $b_j$ each denote one word of the Chinese string and of the initial Braille segmentation, respectively. The initial Braille segmentation $B$ is mapped to the corresponding Chinese string $B' = b'_1 b'_2 \cdots b'_n$, where $b'_j$ is the word obtained by mapping the Braille word $b_j$ into Chinese.

In the system, the Braille segmentation and joined-writing rules in the new Braille segmentation acquisition and adjustment module are as follows:

In the system, the Braille tone marking in the Braille display module specifically comprises:

Here $tone_k$ is the pinyin of the current word of the new Braille segmentation, $n$ is the number of following words whose pinyin must be examined, and $tone_k \cdots tone_{k+n}$ is the tone-marking condition; if the pinyin sequence satisfies the condition, the tone of $tone_k$ is marked.

As can be seen from the above solution, the advantages of the present invention are as follows:

Existing Chinese-to-Braille conversion techniques first perform Chinese word segmentation on the character string and then post-process the result with a series of complex segmentation and joined-writing rules. The present invention instead uses a Braille word segmentation model built with statistical machine learning to segment the Braille cell string directly, in one step. The result largely conforms to the Braille segmentation and joined-writing rules and needs only minor adjustment before being output as Braille. Compared with the prior art, this avoids the low accuracy that results from having a computer apply complex, semantics-dependent joined-writing rules, and substantially improves both segmentation accuracy and overall Chinese-to-Braille conversion accuracy.

Brief Description of the Drawings

Fig. 1 is a flow chart of the method for blind people to read Chinese characters;

Fig. 2 is a flow chart of converting the segmented Chinese character string into a pinyin string.

Detailed Description of Embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the method for blind people to read Chinese characters is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.

The main flow of the method is shown in Fig. 1. Its input is a Chinese sentence, i.e., a Chinese character string, and its output is the corresponding Braille, which is displayed on a Braille display.

Step 1. Chinese word segmentation. A Chinese word segmentation system splits the input Chinese character string into a sequence of Chinese words, producing the segmented Chinese character string, and tags each word with its part of speech. Any existing segmentation method or system can be used, such as dictionary-based maximum or minimum matching, methods based on hidden Markov models (HMMs), or methods based on maximum entropy models;
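Of the segmentation options listed, dictionary-based forward maximum matching is the simplest to illustrate. The sketch below uses a toy lexicon and a hypothetical `max_len` of 4 characters; it omits the part-of-speech tagging that Step 1 also performs.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: always take the longest lexicon word."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


lexicon = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", lexicon))  # ['研究生', '命', '起源']
```

The output shows the well-known weakness of greedy matching: the intended reading is 研究/生命/起源 ("study the origin of life"), which is one reason statistical approaches such as HMM or maximum entropy models are also listed.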

Step 2. Convert the segmented Chinese character string into a pinyin string: using the pronunciation dictionary, the polyphonic character dictionary, and word frequency information, and referring to the part-of-speech tags produced by segmentation, convert each word of the segmented string into its pinyin and concatenate the results into a pinyin string. The pronunciation dictionary is a mapping table from Chinese words (both single-character and multi-character) to pinyin; in one embodiment it contains about 70,000 words. The polyphonic character dictionary lists every polyphonic character together with all of its pinyin readings. The word frequency information is the frequency with which each Chinese character occurs in Chinese text, obtained in advance from statistics over a large Chinese corpus; in one embodiment it covers about 7,000 characters.

The specific steps of this conversion, shown in Fig. 2, are as follows:

Step 2.1: for each word of the segmented Chinese character string, determine whether it is a multi-character word (containing two or more Chinese characters); if it is, and its pinyin can be found in the pronunciation dictionary, return that pinyin directly; otherwise, perform Step 2.2;

Step 2.2: split the input word (single-character or multi-character) into its sequence of Chinese characters and take each character in turn, performing Steps 2.3 to 2.4 for each;

Step 2.3: for the current character, look it up in the polyphonic character dictionary to determine whether it is a polyphone; if it is not, look up its pinyin in the pronunciation dictionary and return it; otherwise, perform Step 2.4;

Step 2.4: for a polyphone, several sources of information are combined to determine its pinyin, as follows:

Step 2.4.1: if the current polyphone comes from a single-character word, go directly to Step 2.4.2; otherwise, first perform the following steps:

For a polyphone $w_k$ in a multi-character word: a) combine $w_k$ with the following $n$ characters into the $(n+1)$-character word $W_{k,k+n} = w_k w_{k+1} \cdots w_{k+n}$ and look it up in the polyphonic phrase dictionary; if it is found, return the pronunciation that this character has in the phrase as the reading of the polyphone. If it is not found, b) combine $w_k$ with the preceding $n$ characters into the $(n+1)$-character word $W_{k-n,k} = w_{k-n} w_{k-n+1} \cdots w_k$ and look it up in the polyphonic phrase dictionary; if it is found, return the pronunciation that this character has in the phrase as the reading of the polyphone. If neither is found, form the $n$-character words $W_{k,k+n-1}$ and $W_{k-n+1,k}$ with the following and preceding $n-1$ characters, respectively, and repeat steps a) and b) until the pronunciation of the polyphone is determined. If, at $n = 1$, no reading can be found for $W_{k,k+1}$ or $W_{k-1,k}$ in the polyphonic phrase dictionary, return empty;

Step 2.4.2: suppose the polyphone has $n$ readings $tone_1, \ldots, tone_n$. Let $P_{pos}$ be the part-of-speech probability from segmentation, with weight $\lambda_1$; $P_{lm}$ the language model probability, with weight $\lambda_2$; and $P_{freq}$ the word frequency probability, with weight $\lambda_3$. The system computes a score for each reading, $\mathrm{Score}_i = \lambda_1 \cdot P_{pos}(tone_i) + \lambda_2 \cdot P_{lm}(tone_i) + \lambda_3 \cdot P_{freq}(tone_i)$, and returns the highest-scoring reading as the final pinyin of the polyphone. Note that, for each of the part-of-speech, word frequency, and language model sources, the probabilities of the different readings must be normalized, and the weight of each source can be set empirically.
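Step 2.4.2 with the normalization described above might look like the sketch below. Each information source is normalized over the candidate readings by dividing by its sum (one reasonable choice; the text does not fix the method), and the weights 0.4/0.4/0.2 are placeholders for the empirically set $\lambda$ values. The input numbers in the example are invented for illustration.

```python
def normalize(raw):
    """Scale one source's values over the candidate readings so they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        return {r: 1.0 / len(raw) for r in raw}  # uninformative source: uniform
    return {r: v / total for r, v in raw.items()}


def pick_reading(readings, p_pos, p_lm, p_freq, weights=(0.4, 0.4, 0.2)):
    """Score_i = l1*P_pos(tone_i) + l2*P_lm(tone_i) + l3*P_freq(tone_i); return the argmax."""
    l1, l2, l3 = weights  # hypothetical stand-ins for lambda_1..lambda_3
    pos, lm, freq = (
        normalize({r: source.get(r, 0.0) for r in readings})
        for source in (p_pos, p_lm, p_freq)
    )
    scores = {r: l1 * pos[r] + l2 * lm[r] + l3 * freq[r] for r in readings}
    return max(readings, key=scores.get), scores


best, scores = pick_reading(
    ["xing2", "hang2"],
    p_pos={"xing2": 0.3, "hang2": 0.7},
    p_lm={"xing2": 2.0, "hang2": 6.0},   # unnormalized values
    p_freq={"xing2": 80, "hang2": 20},   # raw counts
)
print(best)  # hang2
```

Normalizing first keeps a source with large raw values (such as counts) from dominating the weighted sum regardless of its $\lambda$.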

Step 3. Convert the pinyin string into a Braille cell string. The pinyin string obtained in Step 2 is converted into a Braille cell string by looking up a pinyin-to-Braille dictionary, i.e., a mapping table from pinyin to the corresponding Braille cells. At this point the Braille cell string is not yet segmented.
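The table lookup in Step 3 can be sketched as splitting each syllable into an initial and a final and looking each part up. The table below holds arbitrary placeholder cells, not the official Chinese Braille assignments, and the initial list and splitting rule are simplified assumptions; tone digits are dropped because tones are added later by the tone-marking step.

```python
# Hypothetical placeholder table: these cells are NOT the official Chinese Braille values.
TOY_TABLE = {"zh": "\u2801", "ong": "\u2802", "g": "\u2803", "uo": "\u2804"}

# Two-letter initials come first so that "zh" is tried before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final); initial may be ''."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def pinyin_to_braille(pinyin, table):
    """Look up each initial and final; tone digits are dropped for now."""
    cells = []
    for syllable in pinyin.split():
        initial, final = split_syllable(syllable.rstrip("12345"))
        if initial:
            cells.append(table[initial])
        cells.append(table[final])
    return "".join(cells)  # unsegmented Braille cell string


print(pinyin_to_braille("zhong1 guo2", TOY_TABLE))
```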

Step 4. Perform Braille word segmentation with a segmentation model trained in advance by statistical machine learning, generating the initial Braille segmentation. The model is the perceptron model commonly used in this field; it is trained on an already-segmented Braille corpus, using unigram features, bigram features, and attribute features. During segmentation, features are extracted at every position in the Braille cell string where a split is possible, the trained model computes a probability, and that probability determines whether the string is split at that position.

The model is trained with the perceptron algorithm, which learns a discriminative mapping from inputs to outputs: the input is a sentence from the training corpus, and the output is its annotation.

Braille sentences are segmented with a character classification model. Given a sentence of $n$ characters, segmentation divides it into $m$ ($m \le n$) blocks, each a meaningful word. Each character is assigned a class label representing its position within its word, turning segmentation into a character classification problem. The boundary labels are b, m, e, and s: b, m, and e indicate that the character is at the beginning, middle, or end of a word, respectively, and s indicates a single-character word. Decoding searches for the label sequence $y$ that maximizes the score function $f(x)$.

Here $f(x)$ accumulates the scores of all character-label pairs $(i, t) \in y$ (subject to $1 \le i \le n$, $t \in \{b, m, e, s\}$), where $\Phi(x, y)$ is the feature extraction function and the model weights form the parameter vector. Segmentation uses the Viterbi decoding algorithm.
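The sketch below puts Step 4 together under stated assumptions: letters stand in for Braille cells, the unigram and bigram feature templates are hypothetical (attribute features are omitted), and the parameter vector is a dict keyed by (feature, label). Because $f$ sums independent per-position scores, Viterbi decoding here only has to enforce legal b/m/e/s transitions. Training uses the plain (non-averaged) structured perceptron update, $w \leftarrow w + \Phi(x, y_{gold}) - \Phi(x, y_{pred})$; this is a generic illustration of the technique, not the authors' implementation.

```python
TAGS_AFTER = {"b": "me", "m": "me", "e": "bs", "s": "bs"}  # legal next labels


def words_to_tags(words):
    """Gold segmentation -> one b/m/e/s label per character (or Braille cell)."""
    tags = []
    for w in words:
        tags += ["s"] if len(w) == 1 else ["b"] + ["m"] * (len(w) - 2) + ["e"]
    return tags


def tags_to_words(chars, tags):
    """Cut the sequence after every e or s label."""
    words, current = [], ""
    for ch, t in zip(chars, tags):
        current += ch
        if t in "es":
            words.append(current)
            current = ""
    return words + [current] if current else words


def features(chars, i):
    """Hypothetical unigram and bigram templates around position i."""
    p = ["<s>"] + list(chars) + ["</s>"]
    j = i + 1
    return [f"U0={p[j]}", f"U-1={p[j-1]}", f"U+1={p[j+1]}",
            f"B-1={p[j-1]}{p[j]}", f"B+1={p[j]}{p[j+1]}"]


def local_score(w, chars, i, t):
    """Score of one (position, label) pair: sum of its active feature weights."""
    return sum(w.get((f, t), 0.0) for f in features(chars, i))


def viterbi(w, chars):
    """Best legal b/m/e/s label sequence for a non-empty input."""
    best = [{t: (local_score(w, chars, 0, t), None) for t in "bs"}]
    for i in range(1, len(chars)):
        column = {}
        for prev, (score, _) in best[-1].items():
            for t in TAGS_AFTER[prev]:
                cand = score + local_score(w, chars, i, t)
                if t not in column or cand > column[t][0]:
                    column[t] = (cand, prev)
        best.append(column)
    last = max((t for t in "es" if t in best[-1]), key=lambda t: best[-1][t][0])
    tags = [last]
    for i in range(len(chars) - 1, 0, -1):  # follow back-pointers
        tags.append(best[i][tags[-1]][1])
    return tags[::-1]


def perceptron_update(w, chars, gold):
    """Structured perceptron step: w += Phi(x, gold) - Phi(x, predicted)."""
    pred = viterbi(w, chars)
    for i, (g, p) in enumerate(zip(gold, pred)):
        if g != p:  # positions where both agree cancel out
            for f in features(chars, i):
                w[(f, g)] = w.get((f, g), 0.0) + 1.0
                w[(f, p)] = w.get((f, p), 0.0) - 1.0
    return pred


weights = {}
chars = "abc"                       # stand-ins for three Braille cells
gold = words_to_tags(["a", "bc"])   # ['s', 'b', 'e']
perceptron_update(weights, chars, gold)
print(tags_to_words(chars, viterbi(weights, chars)))  # ['a', 'bc']
```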

Step 5. Fuse the Chinese segmentation with the initial Braille segmentation, i.e., use the Chinese word segmentation result to fine-tune the Braille segmentation result and further improve segmentation accuracy.
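The word-by-word procedure of Steps 5.1 to 5.3 below can be restated compactly in terms of character offsets: at each position where both segmentations are aligned, keep the word that contains the other (the longer one, since both start there), and treat a word of the other segmentation that straddles the new position as split there, as Steps 5.3.1 c) and 5.3.2 c) do. This is our own reformulation, sketched under the assumption that the Braille words have already been mapped to Chinese ($B'$); the function names are hypothetical. It runs over the whole sentence instead of only the differing fragments found by edit-distance alignment; because no word of either segmentation can cross a boundary the two share, words outside the differing fragments come out unchanged.

```python
def word_spans(words):
    """(start, end) character offsets of each word."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans


def end_of_word_at(spans, p):
    """End offset of the word covering p, treating it as if it started at p.

    A word that straddles p is thereby split at p (cf. steps 5.3.1 c and 5.3.2 c).
    """
    for start, end in spans:
        if start <= p < end:
            return end
    raise ValueError(f"offset {p} not covered")


def fuse(chinese_words, braille_words_zh):
    """Greedy fusion: at each aligned position keep the word that contains the other."""
    text = "".join(chinese_words)
    if text != "".join(braille_words_zh):
        raise ValueError("both segmentations must cover the same characters")
    ch, br = word_spans(chinese_words), word_spans(braille_words_zh)
    fused, p = [], 0
    while p < len(text):
        end = max(end_of_word_at(ch, p), end_of_word_at(br, p))  # step 5.2
        fused.append(text[p:end])
        p = end
    return fused


print(fuse(["他", "走", "了"], ["他", "走了"]))  # ['他', '走了']
```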

For the Chinese segmentation $C = c_1 c_2 \cdots c_m$ and the Braille segmentation $B = b_1 b_2 \cdots b_n$, where $c_i$ and $b_j$ each denote one word of the Chinese and Braille segmentations, respectively, $B$ can be mapped to the corresponding Chinese segmentation $B' = b'_1 b'_2 \cdots b'_n$, where $b'_j$ is the word obtained by mapping the Braille word $b_j$ into Chinese. Aligning $C$ and $B'$ by edit distance yields the fragments on which they differ, and the following fusion rules decide, for each differing fragment, whether the final result takes the Chinese or the Braille segmentation. Let the differing fragments of $C$ and $B'$ be $CH = ch_1 ch_2 \cdots ch_m$ and $BR = br_1 br_2 \cdots br_n$, respectively. The steps are as follows:

Step 5.1: let $ch_i$ be the $i$-th word of $CH$ and $br_j$ the $j$-th word of $BR$, and initialize $i = j = 1$;

Step 5.2: compare $ch_i$ and $br_j$. If the Braille word contains the Chinese word (that is, $ch_i$ is a proper prefix of $br_j$), adopt the Braille result $br_j$ for this word; conversely, if the Chinese word contains the Braille word ($br_j$ is a proper prefix of $ch_i$), adopt the Chinese result $ch_i$;

步骤5.3初始设置k＝1Step 5.3 Initially set k=1

5.3.1 For the case ch_i ⊂ br_j, define ch_{i,i+k} = ch_i…ch_{i+k} and compare ch_{i,i+k} with br_j:

a) If ch_{i,i+k} = br_j, set i = i+k+1 and j = j+1. If i > m or j > n, go to Step 5.4; otherwise go to Step 5.2.

b) If ch_{i,i+k} ⊂ br_j, the combined Chinese words still lie strictly inside br_j. Set k = k+1 and go to 5.3.1.

c) If ch_{i,i+k} extends beyond br_j, then ch_{i+k} contains the last character of br_j. Let pos be the position of that character within ch_{i+k}, and split ch_{i+k} at pos into ch_{i+k,pos} and ch_{i+k,after_pos}, so that ch_{i+k} = ch_{i+k,pos} ch_{i+k,after_pos}. Here ch_{i+k,pos} consists of characters 1 through pos of ch_{i+k}, and ch_{i+k,after_pos} consists of characters pos+1 through the last. Replace the (i+k)-th Chinese word with ch_{i+k,after_pos}, i.e., update CH = ch₁…ch_{i+k-1} ch_{i+k,after_pos} ch_{i+k+1}…ch_m. Then set i = i+k and j = j+1, and go to Step 5.2.

5.3.2 For the case br_j ⊂ ch_i, define br_{j,j+k} = br_j…br_{j+k} and compare br_{j,j+k} with ch_i:

a) If br_{j,j+k} = ch_i, set i = i+1 and j = j+k+1, and go to Step 5.2.

b) If br_{j,j+k} ⊂ ch_i, the combined Braille words still lie strictly inside ch_i. Set k = k+1 and go to 5.3.2.

c) If br_{j,j+k} extends beyond ch_i, then br_{j+k} contains the last character of ch_i. Let pos be the position of that character within br_{j+k}, and split br_{j+k} at pos into br_{j+k,pos} and br_{j+k,after_pos}, so that br_{j+k} = br_{j+k,pos} br_{j+k,after_pos}. Here br_{j+k,pos} consists of characters 1 through pos of br_{j+k}, and br_{j+k,after_pos} consists of characters pos+1 through the last. Replace the (j+k)-th Braille word with br_{j+k,after_pos}, i.e., update BR = br₁…br_{j+k-1} br_{j+k,after_pos} br_{j+k+1}…br_n. Then set i = i+1 and j = j+k, and go to Step 5.2.

Step 5.4: End the fusion algorithm.
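The merging loop of Steps 5.1 to 5.4 can be sketched in Python as follows. This minimal illustration rests on two assumptions. First, both fragments are lists of Chinese-character strings covering the same text. Second, steps 5.3.1 a) and 5.3.2 a) are taken for general k, so all k+1 combined words are consumed. At each step the longer of the two aligned words is kept. A word that straddles the end of the chosen word is trimmed to its remainder, as in steps 5.3.1 c) and 5.3.2 c). The function names are hypothetical.

```python
def _consume(words, k, target):
    """Advance over words[k:] until `target` is covered; trim a word that overshoots it.

    Returns the index of the next unconsumed word. `words` is modified in place
    when its last covering word straddles the end of `target`.
    """
    covered = ""
    while True:
        covered += words[k]
        if covered == target:
            return k + 1
        if len(covered) > len(target):
            # words[k] contains the last character of target: keep only what follows it.
            overshoot = len(covered) - len(target)
            words[k] = words[k][-overshoot:]
            return k
        k += 1


def fuse_fragment(ch, br):
    """Fuse two segmentations of the same text, preferring the longer aligned word."""
    ch, br = list(ch), list(br)  # working copies; trimmed remainders are written back
    assert "".join(ch) == "".join(br), "both segmentations must cover the same text"
    fused, i, j = [], 0, 0
    while i < len(ch) and j < len(br):
        a, b = ch[i], br[j]
        if a == b:
            fused.append(a)
            i, j = i + 1, j + 1
        elif b.startswith(a):
            # ch_i ⊂ br_j: keep the Braille word and absorb the Chinese words it covers.
            fused.append(b)
            i = _consume(ch, i, b)
            j += 1
        else:
            # br_j ⊂ ch_i: keep the Chinese word and absorb the Braille words it covers.
            fused.append(a)
            j = _consume(br, j, a)
            i += 1
    return fused
```

Because both word pointers always start at the same character offset, two unequal aligned words must be in a prefix relation. That is why a single `startswith` test is enough to tell the two cases apart. For the worked example below, `fuse_fragment(["北京", "是"], ["北京是"])` keeps "北京是".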

Step 6: Adjust the segmentation result according to the Braille word segmentation and joining rules. The part of speech of each word is checked in turn and compared with the activation conditions in the rule set. If a rule matches, its conditions are used to split or join words in the result. Joining rules in the rule set have the following format:

POS_k:[m,n]:POS_{k-m}+…+POS_k+…+POS_{k+n} → POS_{k-m}…POS_{k+n}

In each rule, the part of speech POS_k before the first colon is the activation condition. It is followed by square brackets whose values m and n mean that the m words before and the n words after the current word must be checked. If m and n are both 0, the rule is a word segmentation (splitting) rule. The part-of-speech combination after the second colon is the condition; if the words match it, the operation after the right arrow is performed.
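A minimal Python sketch of how joining rules in this format might be matched is given below. Each rule is represented as a tuple (POS_k, m, n, pattern), where pattern is the part-of-speech sequence POS_{k-m}…POS_{k+n}. When the pattern matches around a word tagged POS_k, the matched words are written together. Several details are assumptions for illustration: the tuple representation, the greedy left-to-right matching, and the example rule NR:[0,1]:NR+NN. The sketch does not model splitting rules (m = n = 0) or non-POS conditions such as the monosyllable requirement in the worked example.

```python
def find_join_spans(tags, rules):
    """Return non-overlapping (start, end) word spans matched by joining rules."""
    spans, last_end = [], 0
    for k, tag in enumerate(tags):
        for pos_k, m, n, pattern in rules:
            if m == 0 and n == 0:
                continue  # m = n = 0 marks a splitting rule; not handled here
            lo, hi = k - m, k + n + 1
            # The window must not reach back into an earlier joined span.
            if (tag == pos_k and lo >= last_end and hi <= len(tags)
                    and tuple(tags[lo:hi]) == tuple(pattern)):
                spans.append((lo, hi))
                last_end = hi
                break
    return spans


def apply_join_rules(words, tags, rules):
    """Write the words of every matched span together as one Braille word."""
    out, i = [], 0
    for lo, hi in find_join_spans(tags, rules):
        out.extend(words[i:lo])
        out.append("".join(words[lo:hi]))
        i = hi
    out.extend(words[i:])
    return out


# Hypothetical rule: a proper noun followed by a common noun is written together.
RULES = [("NR", 0, 1, ("NR", "NN"))]
```

With this rule, "北京/NR 市/NN" would be joined into "北京市". "北京/NR 是/VC" is left apart because VC does not match the pattern.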

Step 7: Braille tone marking. The pinyin of the character corresponding to each word is checked in turn and compared with the rules in the Braille tone-marking set. If a rule's condition is met, the tone of the current character is marked. Rules in the tone-marking set have the following format:

tone_k:[n]:tone_k…tone_{k+n}

Here tone_k is the pinyin of the current character, and n in the square brackets means that the pinyin of the n characters following the current one must be checked. The sequence tone_k…tone_{k+n} is the tone-marking condition; if the pinyin sequence satisfies it, the tone of tone_k is marked.
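Tone-marking rules of this form can be matched with a short window check. In the Python sketch below, a rule is a tuple (tone_k, n, condition), pinyin syllables carry a trailing tone digit, and the tone cell is appended after the syllable's Braille cell. Three details are illustrative assumptions and not the patent's rule set: the digit convention, the example rule, and the mapping of tone 1 to "A". The tone-1 cell is taken from the worked example for "她".

```python
def mark_tones(syllables, cells, rules, tone_cells):
    """Append a tone cell to each Braille cell whose pinyin window matches a rule."""
    marked = list(cells)
    for k, syllable in enumerate(syllables):
        for tone_k, n, condition in rules:
            # The condition covers the current syllable and the n syllables after it.
            window = tuple(syllables[k:k + n + 1])
            if syllable == tone_k and window == tuple(condition):
                tone = syllable[-1]  # assumption: the tone digit is the last character
                marked[k] = cells[k] + tone_cells[tone]
                break
    return marked


# Hypothetical rule: mark "ta1" when it is followed by "men5".
RULES = [("ta1", 1, ("ta1", "men5"))]
TONE_CELLS = {"1": "A"}  # tone 1 -> "A", as in the worked example
```

Applied to the syllables "ta1 men5 de5" with cells "T9", "M0", "D", the sketch yields "T9A", "M0", "D".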

Step 8: Braille display, i.e., outputting the Braille to a refreshable Braille display. Any existing Braille display product can be used by calling its corresponding output interface.

In the pinyin string acquisition module, the specific steps for converting the Chinese character string into a pinyin string are:

In the new Braille segmentation acquisition and adjustment module, fusion proceeds as follows. Let the Chinese character string be C = c₁c₂…c_m and the initial Braille segmentation be B = b₁b₂…b_n, where c_i and b_j each denote a word in the Chinese character string and in the initial Braille segmentation respectively. The initial Braille segmentation B is mapped to the corresponding Chinese character string B′ = b′₁b′₂…b′_n, where b′_j is the word obtained by mapping the initial Braille word b_j to Chinese.

The Braille word segmentation and joining rules used in the new Braille segmentation acquisition and adjustment module are as follows:

The specific steps of Braille tone marking in the Braille display module are:

The implementation of the method and system for blind people to read Chinese characters is described in detail below. The example converts one Chinese sentence into Braille and displays it. This example is for illustration only and is not intended to limit the scope of the invention.

Suppose the Chinese sentence to be converted into Braille is "北京是她们的目的地" ("Beijing is their destination"). The Chinese word segmentation module segments it and performs part-of-speech tagging, giving "北京/NR 是/VC 她们/PN 的/DEG 目的/NN 地/NN".

The Chinese-character-string-to-pinyin-string conversion module then converts the segmentation result into a pinyin string. For the four words "北京", "是", "她们" and "目的", the pronunciation can be confirmed directly by looking up the pronunciation dictionary. "的" and "地" are both polyphonic characters, so the polyphone algorithm must be called to determine their pronunciations.

Take "的" as an example. Part-of-speech tagging gives "的" the tag "DEG", which confirms its pronunciation as "de". Since the part of speech uniquely determines the pronunciation of "的":

P_pos(de) = 1,

P_pos(di) = 0

Given that the preceding word is "她们" (tamen), the language model gives a probability of 0.45 for the pronunciation "de" and 0.05 for "di":

P_lm(de) = P(de|tamen) = 0.45

P_lm(di) = P(di|tamen) = 0.05

After normalization: P_lm(de) = 0.45/(0.45 + 0.05) = 0.9 and P_lm(di) = 0.1.

The word frequency dictionary records 185 occurrences of the single character "的" pronounced "de" and 75 pronounced "di". The frequency probabilities are therefore P_freq(de) = 185/260 ≈ 0.71 and P_freq(di) = 75/260 ≈ 0.29.

Based on empirical values, the weights of the part-of-speech, language model and word frequency probabilities are all set to 1/3, so that:

Score(de) = (1 + 0.9 + 0.71)/3 ≈ 0.87
Score(di) = (0 + 0.1 + 0.29)/3 ≈ 0.13

Comparing the scores, the final pronunciation of the polyphonic character "的" is "de".
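The computation above can be reproduced with a short Python sketch of the weighted score in claim 2, Score_i = λ₁·P_pos(tone_i) + λ₂·P_lm(tone_i) + λ₃·P_freq(tone_i). The inputs are the values of the worked example; the function names are hypothetical.

```python
def normalize(counts):
    """Rescale non-negative values so that they sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def polyphone_scores(p_pos, p_lm, p_freq, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the part-of-speech, language model and frequency probabilities."""
    l1, l2, l3 = weights
    return {tone: l1 * p_pos[tone] + l2 * p_lm[tone] + l3 * p_freq[tone]
            for tone in p_pos}


# Values for "的" from the worked example.
p_pos = {"de": 1.0, "di": 0.0}
p_lm = normalize({"de": 0.45, "di": 0.05})  # P(de|tamen), P(di|tamen)
p_freq = normalize({"de": 185, "di": 75})   # single-character frequency counts
scores = polyphone_scores(p_pos, p_lm, p_freq)
best = max(scores, key=scores.get)          # "de" (about 0.87 vs 0.13)
```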

Similarly, the pronunciation of "地" is determined to be "di". The pinyin string for the sentence is finally "beijing shi ta men de mu di di".

The pinyin-string-to-Braille-string conversion module then produces the Braille string "B!G*:T9 M0 D MUDI DI" for this pinyin string. (In this specification, Braille is written as the ASCII encoding of the Braille cells rather than as dot patterns. The same applies below.)
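The lookup in this step can be sketched as a dictionary from pinyin syllables to Braille ASCII cells. The toy table below is inferred from the example string above, read as bei → B!, jing → G*, shi → :, ta → T9, men → M0, de → D, mu → MU and di → DI. It is not a complete Chinese Braille table. The sketch assumes the pinyin has already been split into syllables, and the function name is hypothetical.

```python
# Toy pinyin-to-Braille table inferred from the worked example only.
PINYIN_TO_BRAILLE = {
    "bei": "B!", "jing": "G*", "shi": ":", "ta": "T9",
    "men": "M0", "de": "D", "mu": "MU", "di": "DI",
}


def pinyin_to_cells(syllables, table=PINYIN_TO_BRAILLE):
    """Map each pinyin syllable to its Braille ASCII cells; unknown syllables raise KeyError."""
    return [table[s] for s in syllables]
```

Spacing between the resulting cells is left to the later segmentation steps.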

The Braille word segmentation module then segments the Braille string, giving "B!G*:|T9 M0|D|MU DI DI".

The Chinese and Braille segmentation fusion module then fuses the Chinese and Braille segmentation results. Mapping the segmented Braille string back to Chinese gives the Chinese string under Braille segmentation, "北京是/她们/的/目的地". Aligning this string with the Chinese segmentation by edit distance gives Table 1:

Table 1: Comparison of Chinese and Braille word segmentation

Comparing the Chinese and Braille segmentations in Table 1, there are two differing fragments: fragment 1, "北京是", and fragment 2, "目的地".
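The patent finds the differing fragments by edit-distance alignment of C and B′. The Python sketch below substitutes a simpler method for a full edit-distance alignment, relying on the fact that both segmentations cover the same character sequence. It cuts the sentence at every word boundary the two segmentations share, and any span in which the word lists differ is a differing fragment. Function names are hypothetical. Applied to the example, the sketch recovers the two fragments named above.

```python
def boundaries(words):
    """Character offsets at which words start or end."""
    offsets, pos = {0}, 0
    for word in words:
        pos += len(word)
        offsets.add(pos)
    return offsets


def words_between(words, lo, hi):
    """Words lying entirely inside the character span [lo, hi)."""
    out, pos = [], 0
    for word in words:
        if lo <= pos and pos + len(word) <= hi:
            out.append(word)
        pos += len(word)
    return out


def differing_fragments(c_words, b_words):
    """Pairs (CH, BR) of fragments where two segmentations of one text disagree."""
    assert "".join(c_words) == "".join(b_words), "segmentations must cover the same text"
    shared = sorted(boundaries(c_words) & boundaries(b_words))
    fragments = []
    for lo, hi in zip(shared, shared[1:]):
        ch, br = words_between(c_words, lo, hi), words_between(b_words, lo, hi)
        if ch != br:
            fragments.append((ch, br))
    return fragments


C = ["北京", "是", "她们", "的", "目的", "地"]
B_prime = ["北京是", "她们", "的", "目的地"]
```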

Fragment 1 is processed first. Its Chinese segmentation is "北京/是" and its Braille segmentation is "北京是". The first Chinese word, "北京", is compared with the first Braille word, "北京是". Since the Braille word "北京是" contains the Chinese word "北京", the next Chinese word "是" is examined and combined with "北京" to form "北京是". This combined string is then compared with the first Braille word "北京是". The two are identical, and fragment 1 contains no further unprocessed words. Following the rule of choosing the word with more characters as the final segmentation, the segmentation of fragment 1 is "北京是".

Similarly, the segmentation of fragment 2 is "目的地". The fused segmentation result is therefore "北京是/她们/的/目的地".

The segmentation adjustment module is then called. According to the Chinese tagging result, the part of speech of "北京" is "NR" (proper noun). Under the Braille standard, a proper noun is written together with the following word only when that word is a monosyllabic common noun. In the example, "北京" is followed by "是", whose part of speech is "VC" (copula). This does not meet the condition of the Braille standard, so the two words must not be joined, and the fused word "北京是" is split into "北京/是". After adjustment the segmentation result is "北京/是/她们/的/目的地", whose Braille segmentation form is "B!G*:T9M0 D MUDIDI".

The Braille tone-marking module then marks tones in the segmentation result. The Braille standard specifies special representations for "他", "她" and "它", and "她" must be tone-marked. The Braille for "她" is "T9", and its tone is the first tone, written in Braille as "A". After tone marking, the Braille string is "B!G*:T9AM0 D MUDIDI".

Finally, the Braille display module displays the Braille string on the refreshable Braille display.

Claims

1. A method for blind people to read Chinese characters, characterized by comprising:
Step 1: obtaining Chinese text; performing word segmentation on the Chinese text to generate a Chinese character string; and, using a pronunciation dictionary, a polyphonic-character dictionary and word frequency information, with reference to the part-of-speech tags obtained by segmentation, converting each word in the Chinese character string into its corresponding pinyin and concatenating the results into a pinyin string;
Step 2: converting the pinyin string into an unsegmented Braille character string by looking up a pinyin-to-Braille dictionary; performing Braille word segmentation on the Braille character string with a segmentation model trained in advance by a statistical machine learning method, to generate an initial Braille segmentation; fusing the Chinese character string with the initial Braille segmentation to generate a new Braille segmentation; and adjusting the new Braille segmentation according to the Braille word segmentation and joining rules;
Step 3: performing Braille tone marking on the new Braille segmentation adjusted according to the Braille word segmentation and joining rules, to generate a final Braille segmentation, and displaying the final Braille segmentation.
2. The method for blind people to read Chinese characters according to claim 1, characterized in that the specific steps of converting the Chinese character string into a pinyin string in Step 1 are:
Step 2.1: for each word in the Chinese character string, judging whether it is a multi-character word; if it is a multi-character word and its pinyin can be found in the pronunciation dictionary, directly returning that pinyin; otherwise performing Step 2.2;
Step 2.2: splitting the multi-character word into a sequence of Chinese characters, taking each character of the word in turn, and performing Steps 2.3 to 2.4 for each character;
Step 2.3: for the current Chinese character, looking up the polyphonic-character dictionary to judge whether it is a polyphonic character; if it is not, looking up its pinyin in the pronunciation dictionary and returning that pinyin; otherwise performing Step 2.4;
Step 2.4: if the character is polyphonic, performing the following steps:
Step 2.4.1: if the current polyphonic character comes from a single-character word, directly performing Step 2.4.2; if it comes from a multi-character word, performing the following steps:
for a polyphonic character w_k in the multi-character word: a) forming the (n+1)-character word W_{k,n} = w_k w_{k+1} … w_{k+n} from w_k and the n characters that follow it, and looking up W_{k,n} in the polyphonic-phrase dictionary; if it is found, taking the pronunciation of that character in W_{k,n} as the pronunciation of w_k and returning it; if it is not found, performing b): forming the (n+1)-character word W_{k-n,k} = w_{k-n} w_{k-n+1} … w_k from w_k and the n characters that precede it, and looking up W_{k-n,k} in the polyphonic-phrase dictionary; if it is found, taking the pronunciation of that character in W_{k-n,k} as the pronunciation of w_k and returning it; if it is not found, forming the n-character words W_{k,n-1} and W_{k-n+1,k} with the following and the preceding n-1 characters respectively and performing steps a) and b) on them, until the pronunciation of w_k is determined;
Step 2.4.2: assuming the polyphonic character has n pronunciations tone_1, …, tone_n, defining the part-of-speech probability from segmentation as P_pos with weight λ₁, the language model probability as P_lm with weight λ₂, and the word frequency probability as P_freq with weight λ₃; computing a score Score_i for each pronunciation of the polyphonic character, where Score_i = λ₁·P_pos(tone_i) + λ₂·P_lm(tone_i) + λ₃·P_freq(tone_i); and taking the highest-scoring pronunciation as the final pinyin of the polyphonic character and returning it.
3. The method for blind people to read Chinese characters according to claim 1, characterized in that the fusion in Step 2 comprises: for the Chinese character string C = c₁c₂…c_m and the initial Braille segmentation B = b₁b₂…b_n, where c_i and b_j each denote a word in the Chinese character string and in the initial Braille segmentation respectively, mapping the initial Braille segmentation B to the corresponding Chinese character string B′ = b′₁b′₂…b′_n, where b′_j is the word obtained by mapping the initial Braille word b_j to Chinese.
4. The method for blind people to read Chinese characters according to claim 1, characterized in that the Braille word segmentation and joining rules in Step 2 are as follows:
Joining rule: POS_k:[m,n]:POS_{k-m}+…+POS_k+…+POS_{k+n} → POS_{k-m}…POS_{k+n}
Word segmentation rule:
where POS_k is the activation condition; m and n mean that the m words before and the n words after the current new Braille word must be checked; if m and n are both 0, the rule is a word segmentation rule; the part-of-speech combination of the words is given after the second colon, and if the combination is satisfied, the operation after the right arrow is performed.
5. The method for blind people to read Chinese characters according to claim 1, characterized in that the specific steps of the Braille tone marking in Step 3 are:
checking in turn the pinyin of the characters corresponding to each adjusted new Braille word and comparing it with the rules in the Braille tone-marking set; if the conditions are met, marking the tone of the current new Braille word; the Braille tone-marking set has the following form:
Tone-marking rule: tone_k:[n]:tone_k…tone_{k+n}
where tone_k is the pinyin of the current new Braille word, n means that the pinyin of the n new Braille words following the current one must be checked, and tone_k…tone_{k+n} is the tone-marking condition; if the pinyin sequence satisfies the condition, the tone of tone_k is marked.
6. A system for blind people to read Chinese characters, characterized by comprising:
a pinyin string acquisition module, configured to obtain Chinese text, perform word segmentation on the Chinese text to generate a Chinese character string, and, using a pronunciation dictionary, a polyphonic-character dictionary and word frequency information, with reference to the part-of-speech tags obtained by segmentation, convert each word in the Chinese character string into its corresponding pinyin and concatenate the results into a pinyin string;
a new Braille segmentation acquisition and adjustment module, configured to convert the pinyin string into an unsegmented Braille character string by looking up a pinyin-to-Braille dictionary, perform Braille word segmentation on the Braille character string with a segmentation model trained in advance by a statistical machine learning method to generate an initial Braille segmentation, fuse the Chinese character string with the initial Braille segmentation to generate a new Braille segmentation, and adjust the new Braille segmentation according to the Braille word segmentation and joining rules;
a Braille display module, configured to perform Braille tone marking on the new Braille segmentation adjusted according to the Braille word segmentation and joining rules, generate a final Braille segmentation, and display the final Braille segmentation.
7. The system for blind people to read Chinese characters according to claim 6, characterized in that the specific steps by which the pinyin string acquisition module converts the Chinese character string into a pinyin string are:
Step 2.1: for each word in the Chinese character string, judging whether it is a multi-character word; if it is a multi-character word and its pinyin can be found in the pronunciation dictionary, directly returning that pinyin; otherwise performing Step 2.2;
Step 2.2: splitting the multi-character word into a sequence of Chinese characters, taking each character of the word in turn, and performing Steps 2.3 to 2.4 for each character;
Step 2.3: for the current Chinese character, looking up the polyphonic-character dictionary to judge whether it is a polyphonic character; if it is not, looking up its pinyin in the pronunciation dictionary and returning that pinyin; otherwise performing Step 2.4;
Step 2.4: if the character is polyphonic, performing the following steps:
Step 2.4.1: if the current polyphonic character comes from a single-character word, directly performing Step 2.4.2; if it comes from a multi-character word, performing the following steps:
for a polyphonic character w_k in the multi-character word: a) forming the (n+1)-character word W_{k,n} = w_k w_{k+1} … w_{k+n} from w_k and the n characters that follow it, and looking up W_{k,n} in the polyphonic-phrase dictionary; if it is found, taking the pronunciation of that character in W_{k,n} as the pronunciation of w_k and returning it; if it is not found, performing b): forming the (n+1)-character word W_{k-n,k} = w_{k-n} w_{k-n+1} … w_k from w_k and the n characters that precede it, and looking up W_{k-n,k} in the polyphonic-phrase dictionary; if it is found, taking the pronunciation of that character in W_{k-n,k} as the pronunciation of w_k and returning it; if it is not found, forming the n-character words W_{k,n-1} and W_{k-n+1,k} with the following and the preceding n-1 characters respectively and performing steps a) and b) on them, until the pronunciation of w_k is determined;
Step 2.4.2: assuming the polyphonic character has n pronunciations tone_1, …, tone_n, defining the part-of-speech probability from segmentation as P_pos with weight λ₁, the language model probability as P_lm with weight λ₂, and the word frequency probability as P_freq with weight λ₃; computing a score Score_i for each pronunciation of the polyphonic character, where Score_i = λ₁·P_pos(tone_i) + λ₂·P_lm(tone_i) + λ₃·P_freq(tone_i); and taking the highest-scoring pronunciation as the final pinyin of the polyphonic character and returning it.
8. The system for blind people to read Chinese characters according to claim 6, characterized in that the fusion in the new Braille segmentation acquisition and adjustment module comprises: for the Chinese character string C = c₁c₂…c_m and the initial Braille segmentation B = b₁b₂…b_n, where c_i and b_j each denote a word in the Chinese character string and in the initial Braille segmentation respectively, mapping the initial Braille segmentation B to the corresponding Chinese character string B′ = b′₁b′₂…b′_n, where b′_j is the word obtained by mapping the initial Braille word b_j to Chinese.
9. The system for blind people to read Chinese characters according to claim 6, characterized in that the Braille word segmentation and joining rules in the new Braille segmentation acquisition and adjustment module are as follows:
Joining rule: POS_k:[m,n]:POS_{k-m}+…+POS_k+…+POS_{k+n} → POS_{k-m}…POS_{k+n}
Word segmentation rule:
where POS_k is the activation condition; m and n mean that the m words before and the n words after the current new Braille word must be checked; if m and n are both 0, the rule is a word segmentation rule; the part-of-speech combination of the words is given after the second colon, and if the combination is satisfied, the operation after the right arrow is performed.
10. The system for blind people to read Chinese characters according to claim 6, characterized in that the specific steps of the Braille tone marking in the Braille display module are:
checking in turn the pinyin of the characters corresponding to each adjusted new Braille word and comparing it with the rules in the Braille tone-marking set; if the conditions are met, marking the tone of the current new Braille word; the Braille tone-marking set has the following form:
Tone-marking rule: tone_k:[n]:tone_k…tone_{k+n}
where tone_k is the pinyin of the current new Braille word, n means that the pinyin of the n new Braille words following the current one must be checked, and tone_k…tone_{k+n} is the tone-marking condition; if the pinyin sequence satisfies the condition, the tone of tone_k is marked.
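The window search of Step 2.4.1 (claims 2 and 7) can be sketched in Python as below. The claims leave several details open, so the sketch makes these assumptions:
- The polyphonic-phrase dictionary maps each phrase to a list of syllables with tone digits.
- The search starts from the largest window that fits inside the word.
- None is returned when no phrase matches, in which case the Step 2.4.2 score would be used instead.

The dictionary entry is a toy example and the function name is hypothetical.

```python
def phrase_pronunciation(word, k, phrase_dict):
    """Pronunciation of the polyphone word[k] via phrase-dictionary windows, or None."""
    for n in range(len(word) - 1, 0, -1):
        # a) W_{k,n}: w_k followed by the next n characters; w_k is the first syllable.
        if k + n < len(word):
            syllables = phrase_dict.get(word[k:k + n + 1])
            if syllables is not None:
                return syllables[0]
        # b) W_{k-n,k}: the previous n characters followed by w_k; w_k is the last syllable.
        if k - n >= 0:
            syllables = phrase_dict.get(word[k - n:k + 1])
            if syllables is not None:
                return syllables[-1]
    return None  # no phrase matched: fall back to the Step 2.4.2 score


PHRASES = {"目的": ["mu4", "di4"]}  # toy polyphonic-phrase dictionary entry
```

For "的" inside "目的地" (k = 1), the forward window "的地" is not in the toy dictionary. The backward window "目的" is found, giving "di4".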