CN109446537B - A translation evaluation method and device for machine translation - Google Patents

A translation evaluation method and device for machine translation

Info

Publication number
CN109446537B
Authority
CN
China
Prior art keywords
corpus
translation
model
word
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811306229.2A
Other languages
Chinese (zh)
Other versions
CN109446537A (en)
Inventor
詹文法
邵志伟
陶鹏程
张振林
刘德阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Fengling Technology Co ltd
Original Assignee
Anqing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-11-05
Filing date
2018-11-05
Publication date
2022-11-25
Application filed by Anqing Normal University
Priority to CN201811306229.2A
Publication of CN109446537A
Application granted
Publication of CN109446537B
Status: Active
Anticipated expiration

Abstract

Translated from Chinese

The invention discloses a translation evaluation method and device for machine translation. The method includes: acquiring several corpus entries from a corpus and concatenating the context word vectors contained in each entry to obtain a concatenation result; initializing the word vectors of the words of different parts of speech contained in the entries; using the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model; obtaining the target word of each entry and translating it with the trained CBOW model; and obtaining the translation produced for the target word by the model to be evaluated, and evaluating the accuracy of that translation according to the similarity between it and the translation produced by the trained CBOW model. By applying embodiments of the invention, the accuracy of translation results can be evaluated automatically.

Description

Translated from Chinese
A translation evaluation method and device for machine translation

Technical Field

The present invention relates to a translation evaluation method and device, and more particularly to a translation evaluation method and device for machine translation.

Background

With the development of modern society, the demand for conversion between languages keeps growing. In practice, traditional machine translation is rule-based: it relies on grammatical and semantic theory and derives the translation by analyzing the grammatical collocations in the context. However, because rules cannot cover every sentence, traditional machine translation mostly amounts to literal syntactic translation or sentence-pattern conversion.

With the continuous development of artificial intelligence, representation learning based on neural networks has come to prominence in many fields. In particular, on a number of tasks, chiefly image recognition and speech recognition, representation-learning methods outperform traditional methods based mainly on statistical learning. Modern machine translation is based on a bilingual corpus containing many sentence patterns: at translation time, example sentences similar to the input sentence are retrieved from the corpus according to sentence pattern, and the source language is then converted into the target language with reference to the bilingual sentence patterns.

Natural language is an abstract expression of human intelligence and is difficult to represent with existing data structures. In natural language processing, the basic unit of data is the character or the word. A word such as "苹果" (apple) can denote either a fruit or Apple Inc. "麦克风" and "话筒" (both meaning "microphone") denote the same object, yet no connection between them can be established from their written forms. Most current translation systems therefore convey the general meaning of a sentence correctly, but because word and sentence usage differs significantly between languages, the output often contains word-order errors, confused words, and misused words. For long sentences in particular, machine translation does not reach good accuracy, so in the prior art translation results still require manual evaluation.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a translation evaluation method and device for machine translation, so as to solve the prior-art problem that translation results still require manual evaluation.

The present invention solves the above technical problem through the following technical solutions.

An embodiment of the present invention provides a translation evaluation method for machine translation, the method comprising:

acquiring several corpus entries from a corpus, and concatenating the context word vectors contained in each corpus entry to obtain a concatenation result; and initializing the word vectors of the words of different parts of speech contained in the corpus entries;

using the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;

obtaining the target word of each corpus entry and translating it using the trained CBOW model;

obtaining the translation produced for the target word by the model to be evaluated, and evaluating the accuracy of that translation according to the similarity between it and the translation produced by the trained CBOW model.

Optionally, initializing the word vectors of the words of different parts of speech contained in the corpus entries comprises:

initializing the word vectors of words of different parts of speech using mutually non-overlapping value ranges, one range per part of speech.

Optionally, before using the concatenation result and the word vectors as the input of the CBOW model to obtain the trained CBOW model, the method further comprises:

removing from each corpus entry all punctuation marks other than designated punctuation marks, where the designated punctuation marks include one or a combination of: punctuation marks that express the tone of the entry, and punctuation marks that end the entry.

Optionally, obtaining the target word of each corpus entry comprises:

obtaining the target word of each corpus entry using the formula shown in image BDA0001853681880000031, where

P(w|c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with base e; x is the input layer of the CBOW model; ∑ is the summation function; v is the corpus; and ()T denotes the matrix transpose.

Optionally, each corpus entry is a single sentence.

An embodiment of the present invention provides a translation evaluation device for machine translation, the device comprising:

an acquisition module, configured to acquire several corpus entries from a corpus, and concatenate the context word vectors contained in each corpus entry to obtain a concatenation result; and initialize the word vectors of the words of different parts of speech contained in the corpus entries;

use the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;

obtain the target word of each corpus entry and translate it using the trained CBOW model;

obtain the translation produced for the target word by the model to be evaluated, and evaluate the accuracy of that translation according to the similarity between it and the translation produced by the trained CBOW model.

Optionally, the acquisition module is configured to:

initialize the word vectors of words of different parts of speech using mutually non-overlapping value ranges, one range per part of speech.

Optionally, the device further comprises a removal module, configured to remove from each corpus entry all punctuation marks other than designated punctuation marks, where the designated punctuation marks include one or a combination of: punctuation marks that express the tone of the entry, and punctuation marks that end the entry.

Optionally, the acquisition module is configured to:

obtain the target word of each corpus entry using the formula shown in image BDA0001853681880000041, where

P(w|c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with base e; x is the input layer of the CBOW model; ∑ is the summation function; v is the corpus; and ()T denotes the matrix transpose.

Optionally, each corpus entry is a single sentence.

Compared with the prior art, the present invention has the following advantages:

Because contextual word order plays an important role in translation, concatenating the context word vectors contained in each corpus entry yields a more accurate translation model. The model trained according to the embodiments of the present invention can then be used to check the translation results of prior-art models. Whereas the prior art requires manual evaluation, the embodiments of the present invention can evaluate the accuracy of translation results automatically.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a translation evaluation method for machine translation provided by an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a CBOW model provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a translation evaluation device for machine translation provided by an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described in detail below. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the scope of protection of the present invention is not limited to the following embodiments.

Embodiments of the present invention provide a translation evaluation method and device for machine translation. The method is introduced first.

FIG. 1 is a schematic flowchart of a translation evaluation method for machine translation provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:

S101: Acquire several corpus entries from a corpus, and concatenate the context word vectors contained in each corpus entry to obtain a concatenation result; and initialize the word vectors of the words of different parts of speech contained in the corpus entries.

Specifically, mutually non-overlapping value ranges may be used, one per part of speech, to initialize the word vectors of the words of different parts of speech contained in the corpus entries. Each corpus entry is a single sentence.

For example, the language model can be learned from a large-scale corpus. Because the quality of the language model directly affects the judgment of whether a sentence is correct, selecting a suitable corpus is important. For Chinese, Chinese Wikipedia entries can be used for modeling.
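The initialization and concatenation described in S101 can be illustrated with a minimal sketch. The part-of-speech labels, the numeric ranges, the vector dimension, and the function names are all assumptions made for illustration; the text only requires that the ranges for different parts of speech do not overlap and that context vectors are concatenated rather than averaged.

```python
import random

# Hypothetical, non-overlapping initialization ranges, one per part of speech.
# Only the disjointness comes from the text; the bounds are illustrative.
POS_RANGES = {
    "noun": (0.0, 0.1),
    "verb": (0.2, 0.3),
    "adj": (0.4, 0.5),
    "other": (0.6, 0.7),
}


def init_word_vector(pos, dim, rng):
    """Draw a dim-dimensional vector uniformly from the range assigned to pos."""
    low, high = POS_RANGES.get(pos, POS_RANGES["other"])
    return [rng.uniform(low, high) for _ in range(dim)]


def concat_context(context_words, embeddings):
    """Concatenate context word vectors in sentence order, preserving word order."""
    joined = []
    for word in context_words:
        joined.extend(embeddings[word])
    return joined


rng = random.Random(0)
tagged = [("cat", "noun"), ("sits", "verb"), ("quiet", "adj"), ("on", "other")]
embeddings = {word: init_word_vector(pos, 4, rng) for word, pos in tagged}
x = concat_context(["cat", "sits", "on"], embeddings)
```

Here `x` has length 3 × 4 = 12, and swapping two context words changes `x`, which is how concatenation retains word-order information that averaging would discard.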

S102: Use the concatenation result and the word vectors as the input of the CBOW model to obtain the trained CBOW model.

FIG. 2 is a schematic structural diagram of a CBOW model provided by an embodiment of the present invention. As shown in FIG. 2, the CBOW (Continuous Bag of Words) model includes an input layer x and an output layer y. The input layer receives different phrases, which are translated and then output by the output layer.
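As a rough illustration of the two-layer structure in FIG. 2, the sketch below maps an input layer x to an output layer y by scoring every vocabulary word against x and normalizing with a softmax. This is the usual CBOW forward pass and is assumed here rather than taken from the figure; the `output_vectors` layout and the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(x, output_vectors):
    """Map the input layer x to a probability distribution y over the vocabulary.

    output_vectors: dict mapping each word to an output-side vector with the
    same length as x (a hypothetical parameter layout).
    """
    words = list(output_vectors)
    scores = [sum(a * b for a, b in zip(output_vectors[w], x)) for w in words]
    return dict(zip(words, softmax(scores)))


x = [0.5, -0.2, 0.1, 0.3]
output_vectors = {
    "mat": [1.0, 0.0, 0.5, 0.2],
    "dog": [-0.3, 0.4, 0.0, 0.1],
    "runs": [0.0, 0.0, 0.0, 0.0],
}
y = cbow_forward(x, output_vectors)
best = max(y, key=y.get)
```

The word with the largest dot product against x receives the highest probability in y; training adjusts both the input word vectors and the output-side vectors.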

S103: Obtain the target word of each corpus entry and translate it using the trained CBOW model.

Specifically, the target word of each corpus entry can be obtained using the formula shown in image BDA0001853681880000051, where

P(w|c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with base e; x is the input layer of the CBOW model; ∑ is the summation function; v is the corpus; and ()T denotes the matrix transpose.
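The formula itself survives only as an image reference. Given the symbols defined above (exp, x, ∑, v, and a transpose), it is most likely the standard CBOW softmax; the reconstruction below is an assumption, not the patent's verified formula. The output-side vector e'(w) is a symbol introduced here for illustration, V corresponds to what the text calls v, and x is taken to be the concatenation of the context word vectors from S101.

```latex
P(w \mid c) = \frac{\exp\!\left(e'(w)^{T} x\right)}{\sum_{w' \in V} \exp\!\left(e'(w')^{T} x\right)}
```

Under this reading, the target word is the w that maximizes P(w|c) for the given context c.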

(w, c) is an n-gram phrase w_{i-(n-1)/2}, ..., w_{i+(n-1)/2} selected from the corpus, in which w_i is the target word w and the remaining words form the context c. n is generally chosen to be odd, which ensures that the number of context words on each side of the target word is the same.
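The selection of (w, c) pairs can be sketched as follows. The function name and the choice to skip positions without a full window are assumptions; the text only specifies that the window has n words with the target in the middle and n odd.

```python
def ngram_windows(tokens, n=5):
    """Yield (target, context) pairs for every full n-word window; n must be odd."""
    if n % 2 == 0:
        raise ValueError("n must be odd so both sides have equally many context words")
    half = (n - 1) // 2
    for i in range(half, len(tokens) - half):
        # Context is the (n-1)/2 words before and after the target, in order.
        context = tokens[i - half:i] + tokens[i + 1:i + half + 1]
        yield tokens[i], context


pairs = list(ngram_windows(["the", "cat", "sat", "on", "the", "mat", "."], n=5))
```

For this seven-token sentence and n = 5, three windows fit, centered on "sat", "on", and the second "the".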

The optimization objective of the model can be the one shown in image BDA0001853681880000061, where

D is the corpus.
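The objective is also available only as an image reference. The usual CBOW training objective, which is assumed here, maximizes the log-likelihood of each target word given its context over all (w, c) pairs drawn from D:

```latex
\max \sum_{(w, c) \in D} \log P(w \mid c)
```

This is a sketch consistent with the definition of P(w|c) and D in the surrounding text, not a verified transcription of the original formula.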

S104: Obtain the translation produced for the target word by the model to be evaluated, and evaluate the accuracy of that translation according to the similarity between it and the translation produced by the trained CBOW model.

In practice, a translated sentence is judged multiple times using a sliding window. For example, with a window size of 5, the 1st, 2nd, ... words of the translation are taken in turn as the middle word of the window. Each judgment yields a similarity value, and the average of these values is then computed. The resulting average similarity is the score for the translated sentence: the higher the score, the more likely the translation is correct.
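The sliding-window scoring can be sketched as below. Several details are assumptions because the text does not fix them: similarity is taken to be cosine similarity between word vectors, the reference model is represented by a hypothetical `predict(context)` callable that returns the word it expects in the middle of the window, and the context is truncated at sentence boundaries.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_translation(tokens, predict, embeddings, window=5):
    """Average, over all positions, the similarity between the word the candidate
    system produced and the word the reference model predicts from its context."""
    half = (window - 1) // 2
    sims = []
    for i, actual in enumerate(tokens):
        # Context is truncated at sentence boundaries (an assumption; padding also works).
        context = tokens[max(0, i - half):i] + tokens[i + 1:i + half + 1]
        expected = predict(context)
        sims.append(cosine(embeddings[actual], embeddings[expected]))
    return sum(sims) / len(sims) if sims else 0.0


# Toy data: two orthogonal word vectors and two stub predictors.
toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
agree = score_translation(["a", "a", "a"], lambda ctx: "a", toy_embeddings)
disagree = score_translation(["a", "a", "a"], lambda ctx: "b", toy_embeddings)
```

When the reference model always agrees with the candidate the score is at its maximum, and when it always predicts an unrelated word the score drops toward zero.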

By applying the embodiment shown in FIG. 1, because contextual word order plays an important role in translation, concatenating the context word vectors contained in each corpus entry yields a more accurate translation model. The model trained according to the embodiment can then be used to check the translation results of prior-art models. Whereas the prior art requires manual evaluation, the embodiment of the present invention can evaluate the accuracy of translation results automatically.

In a specific implementation of the embodiment of the present invention, before step S102, the method further includes:

removing from each corpus entry all punctuation marks other than designated punctuation marks, where the designated punctuation marks include one or a combination of: punctuation marks that express the tone of the entry, and punctuation marks that end the entry.

When the corpus is processed before training, special symbols are removed and only the punctuation marks that are useful to the model are retained, for example periods, exclamation marks, and question marks.
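The punctuation filtering can be sketched with the standard library. The exact set of retained marks is an assumption based on the examples in the text (period, exclamation mark, question mark, in both Chinese and ASCII forms), and treating Unicode categories P (punctuation) and S (symbol) as removable is likewise an illustrative choice.

```python
import unicodedata

# Marks that express tone or end a sentence are kept; the exact set is an assumption.
KEPT_PUNCTUATION = set("。！？.!?")


def strip_punctuation(sentence):
    """Remove punctuation and symbol characters except those in KEPT_PUNCTUATION."""
    kept = []
    for ch in sentence:
        is_punct_or_symbol = unicodedata.category(ch)[0] in ("P", "S")
        if ch in KEPT_PUNCTUATION or not is_punct_or_symbol:
            kept.append(ch)
    return "".join(kept)


english = strip_punctuation("Hello, world! (really?)")
chinese = strip_punctuation("你好，世界！“真的”吗？")
```

Commas, brackets, and quotation marks are dropped, while sentence-final and tone-bearing marks remain available to the model as tokens.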

By improving the language model, the present invention adds sentence information such as word order, part of speech, and punctuation, which increases the representational capacity of the language model so that it can represent more complex sentences. With the improved language model combined with machine translation, the correctness of machine-translated text can be judged and the accuracy of machine translation improved.

Corresponding to the embodiment shown in FIG. 1, an embodiment of the present invention further provides a translation evaluation device for machine translation.

FIG. 3 is a schematic structural diagram of a translation evaluation device for machine translation provided by an embodiment of the present invention. As shown in FIG. 3, the device includes:

an acquisition module 301, configured to acquire several corpus entries from a corpus, and concatenate the context word vectors contained in each corpus entry to obtain a concatenation result; and initialize the word vectors of the words of different parts of speech contained in the corpus entries;

use the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;

obtain the target word of each corpus entry and translate it using the trained CBOW model;

obtain the translation produced for the target word by the model to be evaluated, and evaluate the accuracy of that translation according to the similarity between it and the translation produced by the trained CBOW model.

By applying the embodiment shown in FIG. 1, because contextual word order plays an important role in translation, concatenating the context word vectors contained in each corpus entry yields a more accurate translation model. The model trained according to the embodiment can then be used to check the translation results of prior-art models. Whereas the prior art requires manual evaluation, the embodiment of the present invention can evaluate the accuracy of translation results automatically.

In a specific implementation of the embodiment of the present invention, the acquisition module 301 is configured to:

initialize the word vectors of words of different parts of speech using mutually non-overlapping value ranges, one range per part of speech.

In a specific implementation of the embodiment of the present invention, the device further includes a removal module, configured to remove from each corpus entry all punctuation marks other than designated punctuation marks, where the designated punctuation marks include one or a combination of: punctuation marks that express the tone of the entry, and punctuation marks that end the entry.

In a specific implementation of the embodiment of the present invention, the acquisition module 301 is configured to obtain the target word of each corpus entry using the formula shown in image BDA0001853681880000081, where

P(w|c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with base e; x is the input layer of the CBOW model; ∑ is the summation function; v is the corpus; and ()T denotes the matrix transpose.

In a specific implementation of the embodiment of the present invention, each corpus entry is a single sentence.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

Translated from Chinese

1. A translation evaluation method for machine translation, characterized in that the method comprises:
acquiring several corpus entries from a corpus, and concatenating the context word vectors contained in each corpus entry to obtain a concatenation result; and initializing the word vectors of the words of different parts of speech contained in the corpus entries;
using the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;
obtaining the target word of each corpus entry and translating it using the trained CBOW model;
obtaining the translation produced for the target word by the model to be evaluated, and evaluating the accuracy of that translation according to the similarity between it and the translation produced by the trained CBOW model.

2. The translation evaluation method for machine translation according to claim 1, characterized in that initializing the word vectors of the words of different parts of speech contained in the corpus entries comprises:
initializing the word vectors of words of different parts of speech using mutually non-overlapping value ranges, one range per part of speech.

3. The translation evaluation method for machine translation according to claim 1, characterized in that, before using the concatenation result and the word vectors as the input of the CBOW model to obtain the trained CBOW model, the method further comprises:
removing from each corpus entry all punctuation marks other than designated punctuation marks, where the designated punctuation marks include one or a combination of: punctuation marks that express the tone of the entry, and punctuation marks that end the entry.

4. The translation evaluation method for machine translation according to claim 1, characterized in that obtaining the target word of each corpus entry comprises:
obtaining the target word of each corpus entry using the formula shown in image FDA0001853681870000011, where
P(w|c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with base e; x is the input layer of the CBOW model; ∑ is the summation function; v is the corpus; and ()T denotes the matrix transpose.

5. The translation evaluation method for machine translation according to claim 1, characterized in that each corpus entry is a single sentence.

6. A translation evaluation device for machine translation, characterized in that the device comprises:
an acquisition module, configured to acquire several corpus entries from a corpus, and concatenate the context word vectors contained in each corpus entry to obtain a concatenation result; and initialize the word vectors of the words of different parts of speech contained in the corpus entries;
use the concatenation result and the word vectors as the input of a CBOW model to obtain a trained CBOW model;
obtain the target word of each corpus entry and translate it using the trained CBOW model;
obtain the translation produced for the target word by the model to be evaluated, and evaluate the accuracy of that translation according to the similarity between it and the translation produced by the trained CBOW model.

7. The translation evaluation device for machine translation according to claim 6, characterized in that the acquisition module is configured to:
initialize the word vectors of words of different parts of speech using mutually non-overlapping value ranges, one range per part of speech.

8. The translation evaluation device for machine translation according to claim 6, characterized in that the device further comprises a removal module, configured to remove from each corpus entry all punctuation marks other than designated punctuation marks, where the designated punctuation marks include one or a combination of: punctuation marks that express the tone of the entry, and punctuation marks that end the entry.

9. The translation evaluation device for machine translation according to claim 6, characterized in that the acquisition module is configured to:
obtain the target word of each corpus entry using the formula shown in image FDA0001853681870000031, where
P(w|c) is the probability of the target word; w is the target word; c is the context of the target word; exp() is the exponential function with base e; x is the input layer of the CBOW model; ∑ is the summation function; v is the corpus; and ()T denotes the matrix transpose.

10. The translation evaluation device for machine translation according to claim 6, characterized in that each corpus entry is a single sentence.
CN201811306229.2A | 2018-11-05 | 2018-11-05 | A translation evaluation method and device for machine translation | Active | CN109446537B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811306229.2A, CN109446537B (en) | 2018-11-05 | 2018-11-05 | A translation evaluation method and device for machine translation

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811306229.2A, CN109446537B (en) | 2018-11-05 | 2018-11-05 | A translation evaluation method and device for machine translation

Publications (2)

Publication Number | Publication Date
CN109446537A (en) | 2019-03-08
CN109446537B (en) | 2022-11-25

Family

Family ID: 65550840

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811306229.2A (Active), CN109446537B (en) | A translation evaluation method and device for machine translation | 2018-11-05 | 2018-11-05

Country Status (1)

Country | Link
CN (1) | CN109446537B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN111274827B (en)*2020-01-202021-05-28南京新一代人工智能研究院有限公司Suffix translation method based on multi-target learning of word bag

Citations (1)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN105808530A (en)*2016-03-232016-07-27苏州大学Translation method and device in statistical machine translation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8818790B2 (en)* | 2010-04-06 | 2014-08-26 | Samsung Electronics Co., Ltd. | Syntactic analysis and hierarchical phrase model based machine translation system and method
US9779085B2 (en)* | 2015-05-29 | 2017-10-03 | Oracle International Corporation | Multilingual embeddings for natural language processing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
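Research on domain adaptation of translation models based on semantic distribution similarity; Yao Liang et al.; Journal of Shandong University (Natural Science); 2016-05-31 (No. 07); full text *
Mongolian-Chinese neural machine translation model incorporating prior information; Fan Wenting et al.; Journal of Chinese Information Processing; 2018-06-15 (No. 06); full text *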

Also Published As

Publication number | Publication date
CN109446537A (en) | 2019-03-08

Similar Documents

Publication | Title
CN109359294B (en) | Ancient Chinese translation method based on neural machine translation
Orosz et al. | PurePos 2.0: a hybrid tool for morphological disambiguation
CN109840331B (en) | A Neural Machine Translation Method Based on User Dictionary
CN109960804B (en) | Method and device for generating topic text sentence vector
Patel et al. | ES2ISL: an advancement in speech to sign language translation using 3D avatar animator
CN112765345A (en) | Text abstract automatic generation method and system fusing pre-training model
WO2021139108A1 (en) | Intelligent emotion recognition method and apparatus, electronic device, and storage medium
CN109325229B (en) | Method for calculating text similarity by utilizing semantic information
JP7335300B2 (en) | Knowledge pre-trained model training method, apparatus and electronic equipment
CN108549637A (en) | Method for recognizing semantics, device based on phonetic and interactive system
CN115858758A (en) | Intelligent customer service knowledge graph system with multiple unstructured data identification
CN103678285A (en) | Machine translation method and machine translation system
CN110096705B (en) | An unsupervised automatic simplification algorithm for English sentences
CN108363704A (en) | A kind of neural network machine translation corpus expansion method based on statistics phrase table
Jayaweera et al. | Hidden markov model based part of speech tagger for sinhala language
Dien et al. | POS-tagger for English-Vietnamese bilingual corpus
Satapathy et al. | Phonsenticnet: A cognitive approach to microtext normalization for concept-level sentiment analysis
CN115329784A (en) | Sentence rephrasing generation system based on pre-training model
CN103678270B (en) | Semantic primitive abstracting method and semantic primitive extracting device
Almansor et al. | Transferring informal text in Arabic as low resource languages: state-of-the-art and future research directions
CN109446537B (en) | A translation evaluation method and device for machine translation
Moore et al. | Incremental dependency parsing and disfluency detection in spoken learner English
Bhat et al. | Disco: A large scale human annotated corpus for disfluency correction in indo-european languages
CN111090720B (en) | Hot word adding method and device
CN117149987B (en) | Training method and device for multilingual dialogue state tracking model

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
TR01 | Transfer of patent right

Effective date of registration:20250120

Address after:201306 building C, No. 888, Huanhu West 2nd Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after:Shanghai Xuncha Technology Co.,Ltd.

Country or region after:China

Address before:246133 1318 Jixian North Road, Anqing, Anhui

Patentee before:ANQING NORMAL University

Country or region before:China

TR01 | Transfer of patent right

Effective date of registration:20250126

Address after:Unit 1215, 12th Floor, Building 2, No. 87 West Road, Building Materials City, Changping District, Beijing 102200

Patentee after:Beijing Fengling Technology Co.,Ltd.

Country or region after:China

Address before:201306 building C, No. 888, Huanhu West 2nd Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee before:Shanghai Xuncha Technology Co.,Ltd.

Country or region before:China

