Technical Field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition conversion method and system.
Background
With the continuous development of science and technology, speech recognition technology has been integrated into all aspects of people's lives. For example, when it is inconvenient to input text manually, a user can input voice data into an electronic device, and the electronic device automatically converts the voice data into text data.
At present, however, traditional speech recognition technology requires the language for speech conversion to be set manually, and cannot convert voice data into text data in the same language as the voice data. There is therefore an urgent need for a speech recognition conversion method and system.
Summary of the Invention
To solve the above technical problem, the present invention provides a speech recognition conversion method and system for automatically recognizing the language of voice data and converting the voice data into text data in the same language as the voice data.
An embodiment of the present invention provides a speech recognition conversion method, which includes the following steps:
S101: acquiring voice data to be recognized;
S102: identifying the language family corresponding to the voice data according to a plurality of language family databases;
S103: acquiring, according to the language family, the language family database corresponding to the voice data from the plurality of language family databases, the language family database including a plurality of language data sub-databases;
S104: acquiring the language corresponding to the voice data from the plurality of language data sub-databases;
S105: converting the voice data into text data in the corresponding language according to a text conversion database;
S106: extracting keyword data from the text data;
S107: acquiring keyword voice data corresponding to the keyword data from the voice data, and storing the keyword data and the keyword voice data in the text conversion database.
In one embodiment, the plurality of language family databases include an Indo-European database, a Hamito-Semitic database, an Altaic database, a Uralic database, a Caucasian database, a Sino-Tibetan database, and a Dravidian database.
In one embodiment, after step S101 of acquiring the voice data to be recognized, the method further includes preprocessing the voice data, which specifically includes:
detecting silent intervals in the voice data; and
filtering the voice data according to the silent intervals to obtain filtered voice data.
In one embodiment, step S102 of identifying the language family corresponding to the voice data according to the plurality of language family databases specifically includes:
obtaining language family data of the voice data, which specifically includes:
dividing the voice data into two sub-segments of equal duration, extracting the audio features of each sub-segment to form two audio feature matrices, and obtaining the language family data by the following formula (1):
where F is the language family data, (Y1, Y2, ..., Yn) is the audio feature matrix of the first segment, and (y1, y2, ..., yn) is the audio feature matrix of the second segment;
and comparing the language family data with language family threshold data preset in the plurality of language family databases to obtain the language family corresponding to the voice data;
the language family threshold data including Indo-European threshold data corresponding to the Indo-European database, Hamito-Semitic threshold data corresponding to the Hamito-Semitic database, Altaic threshold data corresponding to the Altaic database, Uralic threshold data corresponding to the Uralic database, Caucasian threshold data corresponding to the Caucasian database, Sino-Tibetan threshold data corresponding to the Sino-Tibetan database, and Dravidian threshold data corresponding to the Dravidian database.
In one embodiment, after step S102, the method further includes:
determining whether the language family recognition of the voice data is successful;
if the recognition is successful, executing step S103;
if the recognition fails, calculating, according to the language family data and the language family threshold data, the inter-class distances between the language family data of the voice data and each set of language family threshold data;
obtaining the minimum value among the inter-class distances and taking the language family corresponding to the minimum value as the language family of the voice data;
the inter-class distances including the Indo-European inter-class distance between the language family data and the Indo-European threshold data, the Hamito-Semitic inter-class distance between the language family data and the Hamito-Semitic threshold data, the Altaic inter-class distance between the language family data and the Altaic threshold data, the Uralic inter-class distance between the language family data and the Uralic threshold data, the Caucasian inter-class distance between the language family data and the Caucasian threshold data, the Sino-Tibetan inter-class distance between the language family data and the Sino-Tibetan threshold data, and the Dravidian inter-class distance between the language family data and the Dravidian threshold data.
In one embodiment, step S106 of extracting the keyword data of the text data specifically includes:
performing word segmentation on the text data to obtain a plurality of phrases, which specifically includes the following steps:
building a word segmentation model through steps S201 to S203:
S201: labeling the first character of the text data as B;
S202: extracting the character following the character labeled B and labeling it C, extracting all distinct characters that precede the character corresponding to C anywhere in the text to form a set D, and determining, by formula (2), whether the character labeled B is the final character of a word;
where P1 and P2 are intermediate functions, length(D) is the number of characters in the set D, P(B) is the probability of occurrence of the character labeled B, P(C) is the probability of occurrence of the character labeled C, length(all) is the total length of the text, and P(BC) is the probability that the character labeled B and the character labeled C occur together; if the result is B = B, the label B is kept unchanged, and if the result is B = E, the label B is changed to E;
S203: determining whether C is the last character; if so, changing the label C to E and ending the word segmentation; if not, changing the label C to B and repeating steps S202 and S203;
the text data is then segmented as follows:
adding a cut line at the beginning of the text data and after every character labeled E, so that the text between any two adjacent cut lines forms a phrase; extracting all phrases to form a phrase vector F1, and removing duplicate values from the phrase vector F1 to form a corresponding phrase set F2, the phrases in the set F2 being the phrases obtained by word segmentation, where F2 contains N phrases;
extracting the keyword data from the phrases, which specifically includes:
first calculating a key score for each phrase in the set F2 by formula (3),
where Qi is the score of the i-th phrase in F2, e is the natural constant, length(F2i) is the length of the i-th phrase in F2, P(F2i) is the number of times the i-th phrase in F2 appears in the vector F1, and i = 1, 2, 3, ..., N;
and then determining the keyword data by formula (4):
gjc = find(max(Q1, Q2, Q3, ..., QN))    (4)
where gjc is the resulting keyword, find(A) returns the keyword corresponding to the value A, and max() takes the maximum value; the phrase corresponding to gjc is the determined keyword data.
A speech recognition conversion system includes an acquisition module, a language family recognition module, a database selection module, a language recognition module, a text conversion module, a keyword extraction module, and a database update module, wherein the acquisition module is configured to acquire voice data to be recognized;
the language family recognition module is configured to identify the language family corresponding to the voice data according to a plurality of language family databases;
the database selection module is configured to acquire, according to the language family, the language family database corresponding to the voice data from the plurality of language family databases, the language family database including a plurality of language data sub-databases;
the language recognition module is configured to acquire the language corresponding to the voice data from the plurality of language data sub-databases;
the text conversion module is configured to convert the voice data into text data in the corresponding language according to a text conversion database;
the keyword extraction module is configured to extract keyword data from the text data; and
the database update module is configured to acquire keyword voice data corresponding to the keyword data from the voice data, and to store the keyword data and the keyword voice data in the text conversion database.
In one embodiment, the text conversion database includes an information category identification unit, a first storage area, and a second storage area;
the information category identification unit is configured to transmit the keyword voice data to the first storage area and to transmit the keyword data to the second storage area; the first storage area is configured to store the keyword voice data after processing by a first encryption algorithm; the second storage area is configured to store the keyword data after processing by a second encryption algorithm; the first storage area also stores the storage address of the keyword data corresponding to the keyword voice data;
the first encryption algorithm or the second encryption algorithm includes one or more of an equivalent encryption algorithm and a symmetric encryption algorithm.
Other features and advantages of the present invention will be set forth in the description that follows and will in part become apparent from the description or be learned by practice of the invention. The objects and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a speech recognition conversion method provided by the present invention;
FIG. 2 is a schematic structural diagram of a speech recognition conversion system provided by the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein are intended only to illustrate and explain the present invention and are not intended to limit it.
An embodiment of the present invention provides a speech recognition conversion method. As shown in FIG. 1, the method includes the following steps:
S101: acquiring voice data to be recognized;
S102: identifying the language family corresponding to the voice data according to a plurality of language family databases;
S103: acquiring, according to the language family, the language family database corresponding to the voice data from the plurality of language family databases, the language family database including a plurality of language data sub-databases;
S104: acquiring the language corresponding to the voice data from the plurality of language data sub-databases;
S105: converting the voice data into text data in the corresponding language according to a text conversion database;
S106: extracting keyword data from the text data;
S107: acquiring keyword voice data corresponding to the keyword data from the voice data, and storing the keyword data and the keyword voice data in the text conversion database.
The above method works as follows: the language family corresponding to the voice data to be recognized is obtained through the plurality of language family databases; according to the language family, the language family database corresponding to the voice data is selected, the language family database storing a plurality of language data sub-databases; the language of the voice data to be recognized is obtained through the plurality of language data sub-databases; and the voice data is converted into text data in that language according to the text conversion database;
the keyword data in the text data is then extracted, and the keyword voice data corresponding to the keyword data is obtained from the voice data and transmitted, together with the keyword data, to the text conversion database for storage.
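For concreteness, this flow can be outlined in code. The sketch below is a minimal, hypothetical Python skeleton of steps S101 to S107; every callable it receives (language family recognition, database selection, conversion, keyword handling) is an assumed placeholder introduced purely for illustration and is not defined by this disclosure.

```python
def speech_to_text(voice_data, identify_family, select_family_db, identify_language,
                   convert, extract_keyword, locate_keyword_audio, store):
    """Hypothetical end-to-end flow of steps S101-S107; every dependency is injected."""
    family = identify_family(voice_data)                        # S102: language family recognition
    family_db = select_family_db(family)                        # S103: pick the matching family database
    language = identify_language(voice_data, family_db)         # S104: language from the sub-databases
    text = convert(voice_data, language)                        # S105: speech-to-text in that language
    keyword = extract_keyword(text)                             # S106: keyword data from the text
    keyword_audio = locate_keyword_audio(voice_data, keyword)   # S107: matching keyword voice data
    store(keyword, keyword_audio)                               # S107: update the text conversion database
    return text
```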
The above method has the following beneficial effects: the plurality of language family databases make it possible to obtain the language family of the voice data; the plurality of language data sub-databases in the language family database make it possible to obtain the language of the voice data; and the text conversion database makes it possible to convert the voice data into text data in that language, thereby implementing speech recognition conversion. Through language recognition, the method converts the acquired voice data into text data in the same language as the voice data, and the multiple language family databases together with the multiple language data sub-databases within them allow voice data in different languages to be converted. In addition, the keyword data in the generated text data is extracted, the keyword voice data corresponding to the keyword data is obtained from the voice data, and the keyword voice data and keyword data are transmitted to the text conversion database for storage, so that the text conversion database is updated and the efficiency of subsequent speech recognition conversion is further improved. This removes the inconvenience of having to set the conversion language manually in traditional speech conversion, and enables automatic recognition of the language of the voice data and its conversion into text data in the same language.
In one embodiment, the plurality of language family databases include an Indo-European database, a Hamito-Semitic database, an Altaic database, a Uralic database, a Caucasian database, a Sino-Tibetan database, and a Dravidian database. In this technical solution, seven language family databases are set up according to the seven major language families of the world, so that the language family of the voice data can be identified.
In one embodiment, after step S101 of acquiring the voice data to be recognized, the method further includes preprocessing the voice data, which specifically includes:
detecting silent intervals in the voice data; and
filtering the voice data according to the silent intervals to obtain filtered voice data. In this technical solution, the silent portions of the voice data are filtered out by detecting the silent intervals, which reduces the time required by the subsequent steps and improves efficiency.
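The text does not specify how the silent intervals are detected; a common choice is a short-time energy threshold. The sketch below is one possible, hypothetical implementation of this preprocessing step, with the frame length and energy threshold chosen arbitrarily for illustration.

```python
import numpy as np

def filter_silence(samples, frame_len=400, energy_threshold=1e-4):
    """Drop frames whose mean energy falls below the threshold (assumed silence rule)."""
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if np.mean(frame.astype(np.float64) ** 2) >= energy_threshold:
            kept.append(frame)  # keep only non-silent frames
    return np.concatenate(kept) if kept else np.array([], dtype=samples.dtype)
```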
In one embodiment, step S102 of identifying the language family corresponding to the voice data according to the plurality of language family databases specifically includes:
obtaining language family data of the voice data, which specifically includes: dividing the voice data into two sub-segments of equal duration, extracting the audio features of each sub-segment to form two audio feature matrices, and obtaining the language family data by the following formula (1):
where F is the language family data, (Y1, Y2, ..., Yn) is the audio feature matrix of the first segment, and (y1, y2, ..., yn) is the audio feature matrix of the second segment;
and comparing the language family data with the language family threshold data preset in the plurality of language family databases to obtain the language family corresponding to the voice data;
the language family threshold data including Indo-European threshold data corresponding to the Indo-European database, Hamito-Semitic threshold data corresponding to the Hamito-Semitic database, Altaic threshold data corresponding to the Altaic database, Uralic threshold data corresponding to the Uralic database, Caucasian threshold data corresponding to the Caucasian database, Sino-Tibetan threshold data corresponding to the Sino-Tibetan database, and Dravidian threshold data corresponding to the Dravidian database. In this technical solution, the language family data of the voice data is obtained and compared with the language family threshold data corresponding to the preset language family databases; when the language family data falls within the threshold range of a given language family database, the voice data is determined to belong to the language family corresponding to that database, so that the language family of the voice data is identified.
For example, suppose the language family data of the acquired voice data is 3.45, and the threshold ranges are: 1-2 for the Indo-European database, 3-4 for the Hamito-Semitic database, 5-6 for the Altaic database, 7-8 for the Uralic database, 9-10 for the Caucasian database, 11-12 for the Sino-Tibetan database, and 13-14 for the Dravidian database. The language family of the voice data is then determined to be Hamito-Semitic.
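Using the example ranges above, the comparison against the preset threshold intervals can be sketched as follows. The interval values are taken from the example; the names and data structures are illustrative assumptions only.

```python
# Example threshold ranges from the embodiment above (purely illustrative values).
FAMILY_THRESHOLDS = {
    "Indo-European": (1, 2),
    "Hamito-Semitic": (3, 4),
    "Altaic": (5, 6),
    "Uralic": (7, 8),
    "Caucasian": (9, 10),
    "Sino-Tibetan": (11, 12),
    "Dravidian": (13, 14),
}

def match_family(family_value):
    """Return the family whose threshold interval contains the value, or None if recognition fails."""
    for family, (lo, hi) in FAMILY_THRESHOLDS.items():
        if lo <= family_value <= hi:
            return family
    return None

print(match_family(3.45))  # -> "Hamito-Semitic", matching the example above
```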
In one embodiment, after step S102, the method further includes:
determining whether the language family recognition of the voice data is successful;
if the recognition is successful, executing step S103;
if the recognition fails, calculating, according to the language family data and the language family threshold data, the inter-class distances between the language family data of the voice data and each set of language family threshold data;
obtaining the minimum value among the inter-class distances and taking the language family corresponding to the minimum value as the language family of the voice data;
the inter-class distances including the Indo-European inter-class distance between the language family data and the Indo-European threshold data, the Hamito-Semitic inter-class distance between the language family data and the Hamito-Semitic threshold data, the Altaic inter-class distance between the language family data and the Altaic threshold data, the Uralic inter-class distance between the language family data and the Uralic threshold data, the Caucasian inter-class distance between the language family data and the Caucasian threshold data, the Sino-Tibetan inter-class distance between the language family data and the Sino-Tibetan threshold data, and the Dravidian inter-class distance between the language family data and the Dravidian threshold data. In this technical solution, it is determined whether the language family recognition of the voice data is successful; if it is successful, the subsequent steps are executed; if it fails, the inter-class distances between the language family data and the sets of language family threshold data are calculated, and the language family corresponding to the minimum inter-class distance is taken as the language family of the voice data, so that the language family of all voice data can be identified accurately.
For example, suppose the language family data of the acquired voice data is 4.65, and the threshold ranges are 1-2 for Indo-European, 3-4 for Hamito-Semitic, 5-6 for Altaic, 7-8 for Uralic, 9-10 for Caucasian, 11-12 for Sino-Tibetan, and 13-14 for Dravidian. The language family data 4.65 does not fall within any threshold range, so the recognition fails.
The inter-class distances are then calculated: the Indo-European inter-class distance between the language family data 4.65 and the Indo-European threshold data 1-2 is 2.65; the Hamito-Semitic inter-class distance to the threshold data 3-4 is 0.65; the Altaic inter-class distance to the threshold data 5-6 is 0.35; the Uralic inter-class distance to the threshold data 7-8 is 2.35; the Caucasian inter-class distance to the threshold data 9-10 is 4.35; the Sino-Tibetan inter-class distance to the threshold data 11-12 is 6.35; and the Dravidian inter-class distance to the threshold data 13-14 is 8.35. The minimum value among the inter-class distances is the Altaic inter-class distance 0.35, so the language family of the voice data is determined to be Altaic.
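The fallback described above can be sketched as below. The inter-class distance is assumed to be the gap between the value and the nearest bound of each threshold interval, which reproduces the example numbers; the function reuses the FAMILY_THRESHOLDS dictionary from the previous sketch.

```python
def family_by_min_distance(family_value, thresholds):
    """Assumed inter-class distance: gap between the value and the closest interval bound."""
    def interval_distance(value, lo, hi):
        if value < lo:
            return lo - value
        if value > hi:
            return value - hi
        return 0.0
    distances = {family: interval_distance(family_value, lo, hi)
                 for family, (lo, hi) in thresholds.items()}
    return min(distances, key=distances.get)  # family with the smallest inter-class distance

print(family_by_min_distance(4.65, FAMILY_THRESHOLDS))  # -> "Altaic" (distance 0.35), as in the example
```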
In one embodiment, step S106 of extracting the keyword data of the text data specifically includes:
performing word segmentation on the text data to obtain a plurality of phrases, which specifically includes the following steps:
building a word segmentation model through steps S201 to S203:
S201: labeling the first character of the text data as B;
S202: extracting the character following the character labeled B and labeling it C, extracting all distinct characters that precede the character corresponding to C anywhere in the text to form a set D, and determining, by formula (2), whether the character labeled B is the final character of a word;
where P1 and P2 are intermediate functions, length(D) is the number of characters in the set D, P(B) is the probability of occurrence of the character labeled B, P(C) is the probability of occurrence of the character labeled C, length(all) is the total length of the text, and P(BC) is the probability that the character labeled B and the character labeled C occur together; if the result is B = B, the label B is kept unchanged, and if the result is B = E, the label B is changed to E. With formula (2), the text data can be segmented without the help of an additional sample database, and when the j-th character is considered only the (j+1)-th character needs to be examined, which greatly reduces the amount of computation.
S203: determining whether C is the last character; if so, changing the label C to E and ending the word segmentation; if not, changing the label C to B and repeating steps S202 and S203;
the text data is then segmented as follows:
adding a cut line at the beginning of the text data and after every character labeled E, so that the text between any two adjacent cut lines forms a phrase; extracting all phrases to form a phrase vector F1, and removing duplicate values from the phrase vector F1 to form a corresponding phrase set F2, the phrases in the set F2 being the phrases obtained by word segmentation, where F2 contains N phrases.
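A rough sketch of this labeling-and-cutting procedure is given below. Because formula (2) itself is not reproduced in this text, the decision of whether the current character ends a word is delegated to a placeholder `is_word_end` predicate; a real implementation would evaluate P1, P2, length(D), P(B), P(C), P(BC), and length(all) as described above.

```python
def segment(text, is_word_end):
    """Label each character B or E following S201-S203, then cut after every E (assumed sketch)."""
    labels = []
    for j in range(len(text)):
        if j + 1 < len(text):
            b, c = text[j], text[j + 1]
            # Set D: distinct characters that precede occurrences of c anywhere in the text.
            d = {text[k - 1] for k in range(1, len(text)) if text[k] == c}
            labels.append("E" if is_word_end(b, c, d, text) else "B")
        else:
            labels.append("E")  # S203: the last character always closes a word
    # Cut after every character labeled E; the pieces form the phrase vector F1.
    phrases, start = [], 0
    for j, label in enumerate(labels):
        if label == "E":
            phrases.append(text[start:j + 1])
            start = j + 1
    f1 = phrases
    f2 = list(dict.fromkeys(f1))  # deduplicate while keeping order -> phrase set F2
    return f1, f2
```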
extracting the keyword data from the phrases, which specifically includes:
first calculating a key score for each phrase in the set F2 by formula (3),
where Qi is the score of the i-th phrase in F2, e is the natural constant, length(F2i) is the length of the i-th phrase in F2, P(F2i) is the number of times the i-th phrase in F2 appears in the vector F1, and i = 1, 2, 3, ..., N. When formula (3) is used to determine the keyword data, the decision is not based only on which phrase appears most often; the length of each phrase is also fully taken into account, which prevents isolated modal particles from being selected as keyword data.
and then determining the keyword data by formula (4):
gjc = find(max(Q1, Q2, Q3, ..., QN))    (4)
where gjc is the resulting keyword, find(A) returns the keyword corresponding to the value A, and max() takes the maximum value; the phrase corresponding to gjc is the determined keyword data. With the keyword data determined in this way, the keyword data of the text data can be obtained with a small amount of computation and without relying on any external sample database, which effectively improves the efficiency of keyword extraction. Through formulas (2), (3), and (4), the keyword data in the text data is obtained, and in step S107 the keyword data and the keyword voice data are transmitted to the text conversion database, so that the text conversion database is updated automatically and the text conversion efficiency of step S105 is further improved.
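The scoring and selection of formulas (3) and (4) can be sketched as below. Formula (3) is not reproduced in this text, so the score used here is only a stand-in that combines phrase length and occurrence count with the natural constant e in the spirit of the description; it should not be read as the actual formula. The F1 and F2 arguments are the lists produced by the segmentation sketch above.

```python
import math

def extract_keyword(f1, f2):
    """Score each phrase in F2 (placeholder for formula (3)) and pick the maximum (formula (4))."""
    scores = []
    for phrase in f2:
        count = f1.count(phrase)                # occurrences of the phrase in the vector F1
        score = len(phrase) * math.exp(count)   # assumed stand-in for Q_i, NOT the patent's formula (3)
        scores.append(score)
    best_index = max(range(len(f2)), key=lambda i: scores[i])   # gjc = find(max(Q_1 .. Q_N))
    return f2[best_index]
```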
A speech recognition conversion system, as shown in FIG. 2, includes an acquisition module 21, a language family recognition module 22, a database selection module 23, a language recognition module 24, a text conversion module 25, a keyword extraction module 26, and a database update module 27, wherein:
the acquisition module 21 is configured to acquire voice data to be recognized;
the language family recognition module 22 is configured to identify the language family corresponding to the voice data according to a plurality of language family databases;
the database selection module 23 is configured to acquire, according to the language family, the language family database corresponding to the voice data from the plurality of language family databases, the language family database including a plurality of language data sub-databases;
the language recognition module 24 is configured to acquire the language corresponding to the voice data from the plurality of language data sub-databases;
the text conversion module 25 is configured to convert the voice data into text data in the corresponding language according to a text conversion database;
the keyword extraction module 26 is configured to extract keyword data from the text data; and
the database update module 27 is configured to acquire keyword voice data corresponding to the keyword data from the voice data, and to store the keyword data and the keyword voice data in the text conversion database.
The above system works as follows: the acquisition module 21 transmits the voice data to the language family recognition module 22; the language family recognition module 22 obtains the language family corresponding to the voice data according to the plurality of language family databases and transmits it to the database selection module 23; the database selection module 23 acquires, according to the language family, the language family database corresponding to the voice data from the plurality of language family databases; the language recognition module 24 obtains the language corresponding to the voice data according to the plurality of language data sub-databases in the language family database; the text conversion module 25 converts the voice data into text data in the acquired language according to the text conversion database;
the keyword extraction module 26 extracts the keyword data from the text data; and the database update module 27 obtains, according to the keyword data, the keyword voice data corresponding to the keyword data from the voice data, and transmits the keyword data and the keyword voice data to the text conversion database for storage.
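To make the module interactions of FIG. 2 concrete, the sketch below wires the modules together as plain Python classes. The names mirror the modules, the internal logic is delegated to injected objects, and the interfaces are assumptions rather than a prescribed API.

```python
class SpeechRecognitionConversionSystem:
    """Hypothetical composition of modules 21-27; each dependency is injected."""
    def __init__(self, family_recognizer, db_selector, language_recognizer,
                 text_converter, keyword_extractor, db_updater):
        self.family_recognizer = family_recognizer      # module 22
        self.db_selector = db_selector                  # module 23
        self.language_recognizer = language_recognizer  # module 24
        self.text_converter = text_converter            # module 25
        self.keyword_extractor = keyword_extractor      # module 26
        self.db_updater = db_updater                    # module 27

    def process(self, voice_data):                      # module 21 supplies voice_data
        family = self.family_recognizer.identify(voice_data)
        family_db = self.db_selector.select(family)
        language = self.language_recognizer.identify(voice_data, family_db)
        text = self.text_converter.convert(voice_data, language)
        keyword = self.keyword_extractor.extract(text)
        self.db_updater.update(voice_data, keyword)
        return text
```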
The above system has the following beneficial effects: the language family recognition module makes it possible to obtain the language family of the voice data; the database selection module and the language recognition module make it possible to obtain the language of the voice data; and the text conversion module, using the text conversion database, converts the voice data into text data in that language, thereby implementing speech recognition conversion. Through language recognition, the system converts the acquired voice data into text data in the same language as the voice data, and the multiple language family databases together with the multiple language data sub-databases within them allow voice data in different languages to be converted. The keyword extraction module extracts the keyword data from the generated text data; the database update module obtains the keyword voice data corresponding to the keyword data from the voice data and transmits the keyword voice data and keyword data to the text conversion database for storage, so that the text conversion database is updated and the efficiency of the system's speech recognition conversion is further improved. This removes the inconvenience of having to set the conversion language manually in traditional speech conversion, and enables the system to automatically recognize the language of the voice data and convert it into text data in the same language.
In one embodiment, the text conversion database includes an information category identification unit, a first storage area, and a second storage area;
the information category identification unit is configured to transmit the keyword voice data to the first storage area and to transmit the keyword data to the second storage area; the first storage area is configured to store the keyword voice data after processing by a first encryption algorithm; the second storage area is configured to store the keyword data after processing by a second encryption algorithm; the first storage area also stores the storage address of the keyword data corresponding to the keyword voice data;
the first encryption algorithm or the second encryption algorithm includes one or more of an equivalent encryption algorithm and a symmetric encryption algorithm. In this technical solution, the information category identification unit transmits the keyword voice data and the keyword data to the first storage area and the second storage area respectively for storage, and the first and second storage areas encrypt the stored data with the first and second encryption algorithms respectively, which effectively improves the security of the data stored in the text conversion database.
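One way to realize the two encrypted storage areas is sketched below, using a symmetric cipher (Fernet from the `cryptography` package) for both areas. The text only requires that each area apply its own algorithm, so the concrete cipher, the key handling, the in-memory dictionaries, and the address format here are illustrative assumptions.

```python
from cryptography.fernet import Fernet

class TextConversionDatabase:
    """Assumed sketch of the information category identification unit and the two storage areas."""
    def __init__(self):
        self._cipher_area1 = Fernet(Fernet.generate_key())   # stands in for the first encryption algorithm
        self._cipher_area2 = Fernet(Fernet.generate_key())   # stands in for the second encryption algorithm
        self._area1 = {}   # keyword voice data plus the address of the matching keyword data
        self._area2 = {}   # keyword data

    def store(self, keyword_text, keyword_audio_bytes):
        # Information category identification: keyword data goes to area 2, keyword voice data to area 1.
        address = f"area2/{len(self._area2)}"                 # illustrative storage address
        self._area2[address] = self._cipher_area2.encrypt(keyword_text.encode("utf-8"))
        self._area1[keyword_text] = {
            "audio": self._cipher_area1.encrypt(keyword_audio_bytes),
            "keyword_address": address,                       # cross-reference kept in area 1
        }
        return address
```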
Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.