CN109492232A - A Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information - Google Patents

A Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information

Info

Publication number
CN109492232A
CN109492232A (application CN201811231017.2A)
Authority
CN
China
Prior art keywords
layer
similarity
semantic
mongolian
sublayer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811231017.2A
Other languages
Chinese (zh)
Inventor
苏依拉
张振
高芬
王宇飞
孙晓骞
牛向华
赵亚平
卞乐乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology
Priority to CN201811231017.2A
Publication of CN109492232A
Legal status: Pending

Abstract

Translated from Chinese

This paper proposes a Mongolian-Chinese machine translation method with enhanced semantic feature information based on the Transformer model. First, starting from the linguistic characteristics of Mongolian, the present invention identifies the features of stems, affixes and additional case components and incorporates these linguistic features into the training of the model. Second, taking the distributed representation that measures the degree of similarity between two words as the research background, the present invention comprehensively analyzes the influence of depth, density and semantic coincidence on concept semantic similarity. During translation, the present invention adopts a Transformer model, which is a multi-layer encoder-decoder architecture that uses trigonometric functions for positional encoding and is built on an enhanced multi-head attention mechanism, so that it relies entirely on the attention mechanism to draw the global dependencies between input and output, eliminating recurrence and convolution.

Description

A Mongolian-Chinese machine translation method with enhanced semantic feature information based on Transformer
Technical field
The invention belongs to the field of machine translation technology, and in particular relates to a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information.
Background technique
Mongolian is an agglutinative language belonging to the Altaic family. Written Mongolian has a traditional script and a Cyrillic script; the "Mongolian" in the Mongolian-Chinese translation system studied here refers to translation from traditional Mongolian into Chinese. Traditional Mongolian is also an alphabetic script, and its letter forms are not unique: the form of a letter varies with its position in the word, the positions being standing alone, word-initial, word-medial and word-final. Mongolian words are formed as root + suffix, and suffixes fall into two classes: one class attaches after the root and gives the original word a new meaning; these are called derivational suffixes, and a root followed by one or more derivational suffixes forms a stem. The other class attaches after the stem to express grammatical meaning. Mongolian nouns and verbs undergo many changes of tense, number and case, all realized by attaching suffixes, so Mongolian morphology is extremely complex. In addition, Mongolian word order differs greatly from Chinese: the Mongolian verb follows the subject and object and stands at the end of the sentence, whereas in Chinese the verb stands between subject and object.
A one-hot representation differs in only a single dimension of the vector, whereas a distributed representation of words uses low-dimensional dense real-valued vectors. In such a low-dimensional vector space, the degree of similarity between two words can conveniently be measured by distance, angle and similar metrics. On the technical side, against the background of research on statistical language models, Google released Word2vec in 2013, a software tool for training word vectors. Given a corpus, Word2vec can, through an optimized training model, quickly and effectively express a word in vector form, providing a new tool for applied research in natural language processing. Word2vec relies on skip-grams or continuous bag of words (CBOW) to build neural word embeddings. At present, however, Word2vec has certain limitations when computing semantic relatedness. On the one hand, it uses only the local context of the translation to be generated as the basis for predicting the translation and does not use global context information, so contextual information is under-exploited and there is room for improvement in semantic feature extraction. On the other hand, the structure of the framework itself limits parallel computation, so computational efficiency remains to be improved.
Most traditional machine translation systems are based on recurrent neural networks (RNN), long short-term memory (LSTM) or gated recurrent units (GRU). Over the past few years these methods have become the state-of-the-art approaches to sequence modeling and transduction problems such as machine translation. Recurrent models, however, typically factor computation along the symbol positions of the input and output sequences. Aligning the positions with steps in computation time, they generate a sequence of hidden states h_t as a function of the input at position t and the previous hidden state h_{t-1}. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths because memory constraints limit batching across examples. Recent work has achieved significant improvements in computational efficiency through factorization tricks and conditional computation, while also improving model performance in the latter case. The fundamental constraint of sequential computation, however, remains.
The current encoder-decoder framework is the main model for solving sequence-to-sequence problems. The model uses an encoder to compress the source-language sentence into a representation and a decoder to generate the target-language sentence from that compressed representation. The benefit of this structure is that it models the mapping between the two sentences end to end: all parameter variables of the model are trained under a single objective function, and model performance is good. Fig. 1 illustrates the structure of the encoder-decoder model, a bottom-up machine translation process.
The encoder and decoder can use neural networks of different structures, such as an RNN or a CNN. An RNN compresses the sequence step by step along time. When an RNN is used, a bidirectional structure is generally adopted: one RNN compresses the elements of the sequence from left to right and another compresses them from right to left, and the two representations are concatenated as the final distributed representation of the sequence. In this structure, because the elements of the sequence are processed in order, the interaction distance between two words can be regarded as their relative distance. As sentences grow longer and relative distances increase, there is a clear theoretical upper limit on how well information can be processed.
When a CNN structure is used, multiple layers are generally stacked to realize the process of moving from local representations of the sequence to a global representation. Modeling a sentence with an RNN treats it as a time series, whereas modeling it with a CNN treats it as a structure. Sequence-to-sequence models with an RNN structure mainly include RNNSearch, GNMT and the like; sequence-to-sequence models with a CNN structure mainly include ConvS2S and the like, which embody a local-to-global feature extraction process in which the interaction distance between words is proportional to their relative distance. Words that are far apart can only meet at higher CNN nodes before interacting, and this process may lose more information.
Summary of the invention
In order to overcome the disadvantages of the above prior art, the purpose of the present invention is to provide a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information. The system is based entirely on the attention mechanism and completely eliminates recurrence and convolution. Experiments show that the system is superior in quality, is easier to parallelize, and needs less time to train; it reaches 45.4 BLEU on a translation task over a 1.2-million-sentence-pair Mongolian-Chinese parallel corpus, achieving higher translation quality.
To achieve the above goals, the technical solution adopted by the present invention is: a Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information, characterized in that a Transformer model is used in the translation process, the Transformer model being a multi-layer encoder-decoder architecture that uses trigonometric functions for positional encoding and is built on an enhanced multi-head attention mechanism, so that it relies entirely on the attention mechanism to draw the global dependencies between input and output, eliminating recurrence and convolution.
Before translation, to make it easier for the deep learning neural network to extract features, the data are first preprocessed. Preprocessing the data means segmenting and separating the stems, affixes and additional case components in the Mongolian corpus to reduce data sparsity, while performing character-level segmentation on the Chinese side; the linguistic features of Mongolian stems, affixes and additional case components are identified and incorporated into training.
The segmentation and separation include small-granularity affix segmentation, large-granularity stem segmentation and small-scale segmentation of the additional case components.
After the data are preprocessed, the influence of depth, density and semantic coincidence on concept semantic similarity is considered comprehensively, and similarity algorithms based on semantic distance and information content are integrated to build a similarity matrix; principal component analysis is then performed, the similarity matrix is converted into a principal component transformation matrix, the contribution rate of each principal component is calculated and used as a weight, and the weighted result gives the final concept semantic similarity.
The similarity matrix is expressed as
X_sim = (x_i1, x_i2, x_i3, x_i4, x_i5)^T, i = 1, 2, 3, ..., n
The final concept semantic similarity is calculated as
δ_sim = r_1·y_sim1 + r_2·y_sim2 + r_3·y_sim3 + r_4·y_sim4 + r_5·y_sim5
where X_sim denotes the similarity matrix; x_i1 denotes Ds, x_i2 denotes Ks, x_i3 denotes Zs, x_i4 denotes Ss, and x_i5 denotes Is; n is the number of concept-word pairs in the compared concept-pair set; x_i = (Ds_i, Ks_i, Zs_i, Ss_i, Is_i) is a vector in the principal-component input sample set, in which each dimension represents the result of one part of the comprehensive similarity computation: Ds_i denotes the relationship between the semantic distance and similarity of the i-th element, Ks_i the semantic similarity in terms of depth of the i-th element, Zs_i the density impact factor of the concept word c of the i-th element, Ss_i the similarity in terms of semantic coincidence of the i-th element, and Is_i the similarity in terms of information content of the i-th element; δ_sim denotes the concept semantic similarity; y_sim1, y_sim2, y_sim3, y_sim4, y_sim5 are the principal components extracted by principal component analysis of the similarity matrix X_sim; and r_1, r_2, r_3, r_4, r_5 denote the contribution rates of the principal components.
The multi-head attention mechanism is described as mapping a query and a set of key-value pairs to an output, where the query, keys, values and output are all vectors; the output is computed as a weighted sum of the values, and the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.
The encoder is composed of N identical layers, each with two sublayers: the first sublayer is a multi-head attention layer and the second is a feed-forward sublayer. The input and output of each sublayer are joined by a residual connection, and each sublayer is followed by a regularization step to accelerate model convergence;
The decoder is composed of N identical layers, each with three sublayers. The first sublayer is a multi-head attention sublayer controlled by a mask matrix, used to model the target-side sentence generated so far; during training, a mask matrix ensures that each multi-head attention computation only attends to the first t-1 words. The second sublayer is a multi-head attention sublayer forming the attention mechanism between the encoder and the decoder, i.e., it looks for relevant semantic information in the source language. The third sublayer is a feed-forward sublayer, identical to the feed-forward sublayer in the encoder. The input and output of each sublayer are joined by residual connections and followed by a regularization step to accelerate model convergence.
The multi-layer encoder-decoder architecture is constructed as follows:
In the encoder, the output of each sublayer is LayerNorm(x + Sublayer(x)), where LayerNorm() denotes the layer-normalization function, Sublayer() is the function implemented by the sublayer itself with a residual connection based on the multi-head attention mechanism, and x denotes the vector to be input to the current layer. The Mongolian sentence is turned into corresponding vectors using the word2vec vector technique and then used as the input of the first encoder layer. To facilitate the residual connections, all sublayers and the embedding layer produce outputs of dimension d_model = 512.
The feed-forward sublayer of the encoder involves two linear transformations and one ReLU nonlinear activation; the specific calculation formula is as follows:
FFN(x) = γ(0, xW_1 + b_1)W_2 + b_2
where x denotes the encoder input information, W_1 the weight corresponding to the input vector, b_1 the bias factor of the multi-head attention mechanism, (0, xW_1 + b_1) the input-layer information of the feed-forward sublayer, W_2 the weight corresponding to the input vector, b_2 the bias factor of the feed-forward function, and γ the nonlinear activation function of the encoder layer.
Positional encoding with trigonometric functions uses the absolute position as the variable of the trigonometric functions; the formulas are as follows:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos is the position and i is the dimension, i.e., each dimension of the positional encoding corresponds to a sinusoid, and the wavelengths form a geometric progression from 2π to 10000·2π; d_model is the dimension of the embedding layer after positional encoding, and 2i takes values from a minimum of 0 to a maximum of d_model.
Compared with the prior art, the advantages of the present invention are as follows:
1. The present invention uses a Transformer-based sequence modeling method. The sequence-to-sequence model still follows the classical encoder-decoder structure, but instead of an RNN or CNN as the sequence modeling mechanism it uses the multi-head attention mechanism, making it easier to capture "long-distance dependency information".
2. The present invention segments the stems, affixes and additional case components in the Mongolian corpus. The additional case component is a special affix in Mongolian; its difference from ordinary affixes lies first of all in that it only expresses grammatical meaning and carries no meaning at the semantic level. The present invention segments and separates the additional case components in the corpus, which on the one hand reduces data sparsity and on the other hand better preserves Mongolian stem information.
3. Aiming at the serious data sparsity caused by Mongolian word-formation characteristics, the present invention proposes three word segmentation schemes of different granularity: small-granularity affix segmentation, large-granularity stem segmentation and small-scale segmentation of the additional case components. Experiments show that combining stem segmentation with segmentation of the additional case components improves translation quality the most.
4. Taking the distributed representation that measures the degree of similarity between two words as the research background, the present invention comprehensively analyzes the influence of depth, density and semantic coincidence on concept semantic similarity, integrates them with traditional semantic-distance and information-content similarity algorithms to build a similarity matrix, performs principal component analysis on it to convert the original similarity matrix into a new principal component transformation matrix, calculates the principal component contribution rates, and uses them as weights to obtain the final concept semantic similarity.
Description of drawings
Fig. 1 is the Transformer-based Mongolian-Chinese machine translation framework diagram of the present invention.
Fig. 2 is the diagram of sequence modeling based on the multi-head attention mechanism of the present invention.
Fig. 3 is the "soft" attention model diagram of the present invention.
Fig. 4 is the multi-head attention model diagram of the present invention.
Fig. 5 is the morpheme segmentation flow chart of the present invention.
Fig. 6 is the computation model of weights by the multi-head attention mechanism of the present invention.
Fig. 7 is a schematic diagram of modeling a sequence with a bidirectional RNN.
Fig. 8 is a schematic diagram of modeling a sequence with a multi-layer CNN.
Fig. 9 is the similarity distribution of 65 randomly selected word pairs under the distributed algorithm of comprehensive concept semantic similarity computation of the present invention.
Specific embodiment
Embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples.
In the Transformer-based Mongolian-Chinese machine translation method of the present invention, the Mongolian corpus is first preprocessed. Then, taking the word2vec word-vector models as the research background, a similarity matrix is built by combining the influence of depth, density and semantic coincidence on concept semantic similarity with semantic-distance and information-content similarity algorithms; principal component analysis is then performed, the similarity matrix is converted into a principal component transformation matrix, the principal component contribution rates are calculated and used as weights, and the final concept semantic similarity is obtained. Finally, a Transformer model is used in the translation process, relying entirely on the attention mechanism to draw the global dependencies between input and output and eliminating recurrence and convolution, where the Transformer model is a multi-layer encoder-decoder architecture that uses trigonometric functions for positional encoding and is built on an enhanced multi-head attention mechanism.
Mongolian corpus preprocessing: dictionary-based morpheme segmentation. To perform segmentation, the word-frequency statistics tool OpenNMT.dict is first used to generate a dictionary of the Mongolian corpus. After the dictionary is generated, stems are searched for in the dictionary and collected into a stem table; the part outside the stem table is the corresponding affix part. Based on the stem table and the affix table, a reverse maximum matching algorithm is used to segment each Mongolian word form into morphemes; the segmentation process is shown in Fig. 5. Each Mongolian word to be processed is matched against all dictionary entries one by one; if the word contains a certain entry, it is segmented so that the additional case component is split off, and the Mongolian word is finally separated into two parts: the additional case component as one part and the remainder as the other.
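The following Python sketch illustrates the dictionary-based reverse maximum matching idea described above; the affix table, the romanized tokens and the function name are illustrative assumptions and do not come from the patent's corpus or tooling.

```python
# Minimal sketch of reverse maximum matching against an affix table,
# assuming stem and affix tables have already been built from the corpus
# dictionary. Tokens and table entries are romanized placeholders.

def split_affix(word, affix_table, max_affix_len=6):
    """Split off the longest suffix found in the affix table, if any."""
    upper = min(max_affix_len, len(word) - 1)
    for length in range(upper, 0, -1):        # try the longest suffix first
        suffix = word[-length:]
        if suffix in affix_table:
            return word[:-length], suffix      # (stem part, case/affix part)
    return word, None                          # no match: keep the word whole

affix_table = {"iin", "aas", "dur"}            # hypothetical case components
for token in ["surguuliin", "nomaas", "ger"]:
    stem, affix = split_affix(token, affix_table)
    print(token, "->", stem, "+", str(affix))
```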
After the Mongolian-Chinese bilingual corpus is given a unified encoding, a bilingual dictionary is constructed on this basis. The modeling of the Transformer model comprises building a multi-layer encoder-decoder structure, positional encoding with trigonometric functions and model construction based on the enhanced multi-head attention mechanism, and the training optimization method and regularization strategy of the model are improved.
Regarding the algorithm based on information content, by analyzing the distributed representation of words the present invention finds that the more sub-concepts a concept contains, the less information content the concept carries, and gives a model for computing I for the distributed representation of words, in which h(c) denotes the number of all sub-concept nodes of the concept-word node c, and max_wn is a constant denoting the number of all concept nodes in the semantic classification tree.
The present invention proposes a comprehensive weighting method based on principal component analysis: principal component analysis is introduced into the weight calculation, and the contribution rate of each principal component is used as a weight in the comprehensive weighted similarity calculation, which alleviates the curse of dimensionality and the gradient explosion problem and facilitates fast convergence of the model.
The comprehensive weighting algorithm based on principal component analysis in the present invention consists of three parts: multi-angle similarity computation, similarity matrix extraction, and weight computation based on principal component analysis.
Part I: multi-angle similarity computation
Semantic similarity is analyzed from the aspects of semantic distance, depth and density, semantic coincidence and information content, and the calculation formula of each part of the semantic similarity is given.
(1) Semantic distance
The relationship between semantic distance and similarity is expressed in terms of the following quantities: c_1 and c_2 are the distributed vectors of the two concepts to be compared; a is an adjustable parameter, taken here as the average semantic distance over the compared concept-pair set; and D(c_1, c_2) is the semantic distance between the two concepts, i.e., the shortest path between c_1 and c_2.
(2) Depth and density
The higher the level a node occupies in the semantic tree, the more abstract the concept word it represents; the lower the level, the more specific the concept word. Let the maximum depths of the semantic tree at the nodes of the compared concept words c_1 and c_2 be K_max(c_1) and K_max(c_2), and let the node depths of c_1 and c_2 be K(c_1) and K(c_2); the semantic similarity in terms of depth is computed from these quantities.
In the semantic hierarchy tree, the greater the density of a local region, the more finely that region divides concepts, and the relatively larger the semantic similarity between concept words in the region. The density impact factor of a concept word c is defined using n(c), the number of direct descendants of the concept-word node c taken as a root node, and n(O), the maximum number of direct descendants over the nodes of the sub-semantic-tree O in which node c lies; from these, the calculation formula for the semantic similarity in terms of density between the compared concept words c_1 and c_2 is obtained.
(3) Semantic coincidence
Let the root node of the semantic hierarchy tree be R, and let c_1 and c_2 be any two concept-word nodes. S(c_1) is the number of nodes in the set of nodes passed through from c_1 to the root node R, and S(c_2) is the number of nodes passed through from c_2 to R; S(c_1)∩S(c_2) denotes the set of nodes passed through jointly from c_1 and c_2 to R, and S(c_1)∪S(c_2) denotes the union of the node sets passed through from c_1 to R and from c_2 to R. The similarity in terms of semantic coincidence is then expressed in terms of these two sets.
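As a rough illustration of this component, the Python sketch below scores two concepts by the ratio of their shared ancestor-path nodes to the union of their ancestor-path nodes; this Jaccard-style ratio is an assumption for illustration, built from the sets S(c_1)∩S(c_2) and S(c_1)∪S(c_2) defined above rather than from an exact formula stated in this text.

```python
# Illustrative semantic-coincidence score over a concept hierarchy.
# The ratio |shared ancestors| / |union of ancestors| is an assumed
# Jaccard-style instantiation built from S(c1) and S(c2).

def ancestors(node, parent):
    """Collect the path of nodes from `node` up to the root, inclusive."""
    path = set()
    while node is not None:
        path.add(node)
        node = parent.get(node)
    return path

def coincidence_similarity(c1, c2, parent):
    s1, s2 = ancestors(c1, parent), ancestors(c2, parent)
    return len(s1 & s2) / len(s1 | s2)

# toy hierarchy: R -> animal -> {dog, cat}; R -> plant -> flower
parent = {"animal": "R", "plant": "R", "dog": "animal",
          "cat": "animal", "flower": "plant", "R": None}
print(coincidence_similarity("dog", "cat", parent))     # share "animal" and "R"
print(coincidence_similarity("dog", "flower", parent))  # share only "R"
```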
(4) Information content
In order to define the similarity in terms of information content, an algorithm is proposed to calculate the I value, in which c_1 and c_2 denote the distributed vectors of the two concepts to be compared, I(c_1) denotes the sum of the vector dimensions of all child nodes whose parent node is the concept vector c_1, and I(c_2) denotes the sum of the vector dimensions of all child nodes whose parent node is c_2.
Part II: similarity matrix extraction
Suppose there are n pairs of concept words in the compared concept-pair set, and let x_i = (Ds_i, Ks_i, Zs_i, Ss_i, Is_i) be a vector in the principal-component input sample set, where each dimension represents the result of one part of the comprehensive similarity computation: Ds_i denotes the relationship between the semantic distance and similarity of the i-th element, Ks_i the semantic similarity in terms of depth of the i-th element, Zs_i the density impact factor of the concept word c of the i-th element, Ss_i the similarity in terms of semantic coincidence of the i-th element, and Is_i the similarity in terms of information content of the i-th element.
The similarity matrix is then expressed as
X_sim = (x_i1, x_i2, x_i3, x_i4, x_i5)^T, i = 1, 2, 3, ..., n
Part III: weight computation based on principal component analysis
Principal component analysis is a multivariate statistical method that converts multiple indicators into a few comprehensive indicators while losing little information. The comprehensive indicators obtained are called principal components; each principal component is a linear combination of the original variables, and the principal components are mutually uncorrelated, which gives them certain advantages over the original variables. In the principal component analysis algorithm, the weight of each principal component is assigned according to its contribution rate rather than determined artificially, which overcomes the defect of artificially determined weights in multivariate analysis and makes the result objective and reasonable.
Principal component analysis is performed on the constructed similarity matrix X_sim, and the extracted principal components are
Y = (y_sim1, y_sim2, y_sim3, y_sim4, y_sim5)
With the contribution rates of the principal components (r_1, r_2, r_3, r_4, r_5), the final concept semantic similarity is calculated as
δ_sim = r_1·y_sim1 + r_2·y_sim2 + r_3·y_sim3 + r_4·y_sim4 + r_5·y_sim5
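A compact NumPy sketch of this contribution-rate weighting follows; the toy 65×5 similarity matrix is randomly generated for illustration and is not data from the patent.

```python
# Sketch of contribution-rate-weighted similarity via PCA, assuming an
# n-by-5 similarity matrix whose columns are (Ds, Ks, Zs, Ss, Is).
import numpy as np

def pca_weighted_similarity(X_sim):
    Xc = X_sim - X_sim.mean(axis=0)            # center each similarity column
    cov = np.cov(Xc, rowvar=False)             # 5x5 covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]           # sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    r = eigval / eigval.sum()                  # contribution rates r_1..r_5
    Y = Xc @ eigvec                            # principal component scores y_sim
    return Y @ r                               # delta_sim = sum_k r_k * y_sim_k

X_sim = np.random.default_rng(0).random((65, 5))   # 65 word pairs, 5 similarity views
print(pca_weighted_similarity(X_sim)[:5])
```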
The multi-layer encoder-decoder structure is constructed as follows:
In the encoder, the output of each sublayer is LayerNorm(x + Sublayer(x)), where LayerNorm() denotes the layer-normalization function, Sublayer() is the function implemented by the sublayer itself with a residual connection based on the multi-head attention mechanism, and x denotes the vector to be input to the current layer. The Mongolian sentence is turned into corresponding vectors using the word2vec vector technique and then used as the input of the first encoder layer. To facilitate the residual connections, all sublayers and the embedding layer produce outputs of dimension d_model = 512.
Fig. 1 illustrates one encoder layer and one decoder layer of the Transformer structure.
Referring to Fig. 1, the Nx on the left represents one encoder layer, which contains two sublayers: the first is a multi-head attention sublayer and the second is a feed-forward sublayer. The input and output of each sublayer are joined by a residual connection, which in theory allows gradients to flow back well. Each sublayer is followed by a regularization step, and the use of regularization accelerates the convergence of the model. The computation of the multi-head attention sublayer is discussed in detail in the model construction based on the enhanced multi-head attention mechanism. The feed-forward sublayer involves two linear transformations and one ReLU nonlinear activation; the specific calculation formula is as follows:
FFN(x) = γ(0, xW_1 + b_1)W_2 + b_2
where x denotes the encoder input information, W_1 the weight corresponding to the input vector, b_1 the bias factor of the multi-head attention mechanism, (0, xW_1 + b_1) the input-layer information of the feed-forward sublayer, W_2 the weight corresponding to the input vector, b_2 the bias factor of the feed-forward function, and γ the nonlinear activation function of the encoder layer. The encoder input information is the vector obtained after adding the positional-encoding information to the embedding-layer information. The input to the feed-forward sublayer is the output of the first sublayer of the encoder.
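A minimal NumPy sketch of this feed-forward sublayer is given below, reading γ(0, ·) as the ReLU max(0, ·); the inner dimension 2048 is an assumed typical setting, not a value stated in the patent.

```python
# Position-wise feed-forward sublayer: two linear maps with a ReLU between,
# reading the patent's gamma(0, x W1 + b1) as ReLU = max(0, x W1 + b1).
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    hidden = np.maximum(0.0, x @ W1 + b1)   # first linear map + ReLU
    return hidden @ W2 + b2                 # second linear map

d_model, d_ff = 512, 2048                   # d_ff is an assumed default
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.02, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.02, np.zeros(d_model)
x = rng.normal(size=(10, d_model))          # 10 token positions
print(feed_forward(x, W1, b1, W2, b2).shape)  # (10, 512)
```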
Referring to Fig. 1, the Nx on the right represents one layer of the decoder, which has three sublayer structures. The first sublayer is a multi-head attention sublayer controlled by a mask matrix, used to model the target-side sentence generated so far; during training a mask matrix is needed so that each multi-head attention computation only attends to the first t-1 words. The second sublayer is a multi-head attention sublayer forming the attention mechanism between the encoder and the decoder, i.e., it looks for relevant semantic information in the source language; this part is computed in the same way as attention in other sequence-to-sequence models, and Transformer uses the dot-product form. The third sublayer is a feed-forward sublayer, identical to the feed-forward sublayer in the encoder. Each sublayer likewise has residual connections and a regularization operation to accelerate model convergence.
The method by which the present invention performs positional encoding with trigonometric functions is as follows:
The way the multi-head attention mechanism models a sequence has neither the sequential character of an RNN nor the structural character of a CNN, but rather the character of a bag of words. Put further, the mechanism treats a sequence as a flat structure: however far apart two words are, their distance in the multi-head attention mechanism appears to be 1. Such a modeling approach actually loses the relative distance relationships between words. For example, the three sentences "the ox ate the grass", "the grass ate the ox" and "ate the grass ox" would be modeled with identical representations for each word.
To alleviate this problem, in the Transformer of the present invention the position of each word in the sentence is mapped to a vector and added into its embedding layer. This idea is not proposed here for the first time; CNN models likewise have the defect of being unable to model relative position (timing information) easily, and Facebook proposed a positional encoding method for them. One direct way is to model the absolute position information directly into the embedding layer, i.e., to map the index i of word W_i to a vector and add it to its embedding layer; the disadvantage of this approach is that it can only model sequences of finite length.
The present invention uses a new way of modeling timing information, namely using the periodicity of trigonometric functions to model the relative positional relationship between words. Specifically, the absolute position is used as the variable of the trigonometric functions, with the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
Here pos is the position and i is the dimension; that is, each dimension of the positional encoding corresponds to a sinusoid, and the wavelengths form a geometric progression from 2π to 10000·2π. The present invention chose this function because it allows the model to learn relative positions easily: for any constant offset k, PE_{pos+k} can be expressed as a linear function of PE_{pos}. d_model is the dimension of the embedding layer after positional encoding, and 2i takes values from a minimum of 0 to a maximum of d_model.
Trigonometric functions have good periodicity: after every certain distance the value of the dependent variable repeats, and this property can be used to model relative distance. On the other hand, the range of trigonometric functions is [-1, 1], which provides suitable values for the embedding-layer elements.
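A short NumPy sketch of this sinusoidal positional encoding follows; the sequence length of 50 is an arbitrary illustrative choice.

```python
# Sinusoidal positional encoding: even dimensions use sin, odd dimensions cos,
# with wavelengths forming a geometric progression from 2*pi to 10000*2*pi.
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]             # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)      # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                       # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                       # PE(pos, 2i+1)
    return pe

pe = positional_encoding(seq_len=50, d_model=512)
print(pe.shape)   # (50, 512): added to the word embeddings
```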
The method of model construction based on the enhanced multi-head attention mechanism is as follows:
Fig. 2 illustrates the sequence modeling method based on the multi-head attention mechanism. Note that, to keep the figure clear, some connecting lines are omitted: every word in the "source-language sentence vector" layer (i.e., the source-language morpheme vectors) is fully connected to the nodes in the first multi-head attention layer, and the nodes between the first and second multi-head attention layers are likewise fully connected. It can be seen that in this modeling method the interaction distance between any two words is 1, regardless of the relative distance between the words. In this way, the semantics of each word are determined by considering its relationship with all the words in the whole sentence, and the multi-head attention mechanism makes this global interaction more elaborate and able to capture more information.
In summary, when modeling sequence problems the multi-head attention mechanism can capture long-distance dependency knowledge and has a better theoretical basis.
The mathematical formulation of the multi-head attention mechanism is described below, starting from the attention mechanism itself.
1. Attention mechanism (model)
When a neural network processes a large amount of input information, it can, borrowing from the attention mechanism of the human brain, select only some key pieces of information for processing in order to improve its efficiency. In current neural network models, max pooling and gating mechanisms can be viewed approximately as bottom-up, saliency-based attention mechanisms. Besides these, top-down focused attention is also an effective information selection method. Take reading comprehension as an example: given a long article, a question is then asked about its content. The question is related to only one or two sentences in a paragraph, and the rest is irrelevant. To reduce the computational burden of the neural network, only the relevant passages need to be picked out and handed to the subsequent network, instead of feeding the entire article into it.
Use x_{1:N} = [x_1, ..., x_N] to denote N pieces of input information. To save computing resources, not all N inputs need to be fed into the neural network for computation; only some task-relevant information needs to be selected from x_{1:N}. Given a task-related query vector q, we use an attention variable z ∈ [1, N] to denote the index position of the selected information, i.e., z = i means the i-th piece of input information is selected. For ease of computation, a "soft" information selection mechanism is adopted: first, given q and x_{1:N}, the probability α_i of selecting the i-th piece of input information is computed as
α_i = softmax(s(x_i, q)) = exp(s(x_i, q)) / Σ_{j=1}^{N} exp(s(x_j, q))
where s(x_i, q) is the scoring function, which can be computed in any of the following three ways:
Additive model: s(x_i, q) = v^T tanh(W x_i + U q)
Dot-product model: s(x_i, q) = x_i^T q
Multiplicative model: s(x_i, q) = x_i^T W q
where W, U and v are learnable network parameters and T denotes matrix transposition.
The attention distribution α_i can be interpreted as the degree to which the i-th piece of information is attended to given the context query q. With this "soft" information selection mechanism, the input information is encoded as the weighted sum att(x_{1:N}, q) = Σ_{i=1}^{N} α_i x_i.
Fig. 3 gives an example of the "soft" attention mechanism.
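A small NumPy sketch of this soft attention selection, using the dot-product scoring function, might look as follows; the toy input sizes are illustrative.

```python
# "Soft" attention over N input vectors: score each input against the query,
# normalize the scores with softmax, and return the weighted sum of inputs.
import numpy as np

def soft_attention(X, q):
    scores = X @ q                              # dot-product scoring s(x_i, q)
    alpha = np.exp(scores - scores.max())       # numerically stable softmax
    alpha = alpha / alpha.sum()                 # attention distribution alpha_i
    return alpha @ X, alpha                     # encoded vector and weights

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                     # N=6 inputs of dimension 8
q = rng.normal(size=8)                          # task-related query vector
encoded, alpha = soft_attention(X, q)
print(alpha.round(3), encoded.shape)
```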
2. Variants of the attention mechanism
2.1 Key-value attention
More generally, the input information can be represented in key-value pair format, where the "key" K is used to compute the attention distribution α_i and the "value" V is used to generate the selected information. With (k, v)_{1:N} = [(k_1, v_1), ..., (k_N, v_N)] denoting the N pieces of input information and q the task-related query vector, the attention function weights each value by the normalized score of its key, att((K, V), q) = Σ_{i=1}^{N} α_i v_i with α_i = softmax(s(k_i, q)),
where s(k_i, q) denotes the scoring function.
Fig. 4 gives an example of the key-value attention mechanism. If k_i = v_i in the key-value mode, it is equivalent to the ordinary attention mechanism.
2.2 Scaled dot-product attention
The scaled dot-product attention algorithm is described in terms of the key-value pairs K-V and the query vector q, which is rather abstract; here we assume that the "key" K and the "value" V in the key-value pairs correspond to the same vectors, i.e., K = V, as shown in Fig. 6, and the query vector q corresponds to the word vectors of the target sentence.
The specific operation has three steps:
1. Each query vector q makes a dot product with the "keys" K.
2. Softmax is then used to normalize them, keeping the probability values within the interval [0, 1].
3. The result is finally multiplied by the "values" V to give the attention vector.
The mathematical expression is as follows:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
where √d_k is the scaling factor and T denotes matrix transposition.
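The following NumPy sketch implements the three steps above; the toy matrix sizes are illustrative.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # step 1: scaled dot products
    scores = scores - scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # step 2: softmax in [0, 1]
    return weights @ V                                       # step 3: weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 64))     # 4 target positions (queries)
K = rng.normal(size=(6, 64))     # 6 source positions (keys)
V = rng.normal(size=(6, 64))     # values aligned with the keys
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 64)
```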
2.3 Multi-head attention
Multi-head attention uses multiple queries q_{1:M} = {q_1, ..., q_M} to select multiple pieces of information from the input in parallel, with each head attending to a different part of the input information.
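A compact sketch of the multi-head computation follows: the model dimension is split across several heads, scaled dot-product attention is run per head, and the head outputs are concatenated. The head count, the random stand-in projection matrices and the omission of the final output projection are simplifying assumptions for illustration.

```python
# Multi-head attention sketch: project Q, K, V into h heads, apply scaled
# dot-product attention in each head, and concatenate the head outputs.
# The projection matrices below are random stand-ins for learned parameters.
import numpy as np

def multi_head_attention(Q, K, V, h=8):
    d_model = Q.shape[-1]
    d_k = d_model // h
    rng = np.random.default_rng(0)
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) * 0.02 for _ in range(3))
        q, k, v = Q @ Wq, K @ Wk, V @ Wv               # per-head projections
        scores = q @ k.T / np.sqrt(d_k)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)
        heads.append(w @ v)                            # one head's output
    return np.concatenate(heads, axis=-1)              # concatenate back to d_model

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 512))                          # 5 positions, d_model = 512
print(multi_head_attention(x, x, x).shape)             # (5, 512) self-attention
```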
The method by which the present invention improves the training optimization method and regularization strategy of the model is as follows:
The model is trained with the Adam method, and the present invention adopts a warm-up learning rate adjustment method, as shown in the formula:
lrate = d_model^(-0.5) · min(step_num^(-0.5), step_num · warmup_steps^(-1.5))
The formula means that training requires a hyperparameter warmup_steps to be preset.
A. When the number of training steps step_num is smaller than this value, the learning rate is determined by the second term inside the brackets, which is a linear function of the variable step_num with positive slope.
B. When step_num is greater than warmup_steps, the learning rate is determined by the first term inside the brackets, which is a power function with a negative exponent.
So on the whole the learning rate first rises and then declines, which facilitates the fast convergence of the model.
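A small sketch of this schedule, assuming the standard form reconstructed above and a warm-up of 4000 steps (an assumed typical value):

```python
# Warm-up learning rate schedule: linear growth for the first warmup_steps,
# then decay proportional to the inverse square root of the step number.
def warmup_lr(step_num, d_model=512, warmup_steps=4000):
    step_num = max(step_num, 1)
    return d_model ** -0.5 * min(step_num ** -0.5,
                                 step_num * warmup_steps ** -1.5)

for s in (100, 4000, 20000):
    print(s, round(warmup_lr(s), 6))   # rises, peaks near warmup_steps, then decays
```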
Two important regularization methods are also used in the model. One is the common dropout method, applied after each sublayer and in the attention computation. The other is label smoothing: during training, when the cross entropy is computed, the target is no longer the one-hot standard answer; instead a small non-zero value is also placed at every position that would otherwise be 0. This enhances the robustness of the model and raises its BLEU value.
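A minimal sketch of such a label-smoothed cross entropy is shown below; the smoothing value 0.1 and the tiny vocabulary are illustrative assumptions.

```python
# Label smoothing: replace the one-hot target with (1 - eps) on the correct
# class and eps / (V - 1) on every other class before the cross entropy.
import numpy as np

def smoothed_cross_entropy(logits, target_idx, eps=0.1):
    vocab = logits.shape[-1]
    log_probs = logits - logits.max() - np.log(np.exp(logits - logits.max()).sum())
    target = np.full(vocab, eps / (vocab - 1))   # small mass on wrong classes
    target[target_idx] = 1.0 - eps               # most mass on the true class
    return -(target * log_probs).sum()

logits = np.array([2.0, 0.5, -1.0, 0.1])
print(smoothed_cross_entropy(logits, target_idx=0))
```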
In summary, the Transformer-based sequence modeling method of the present invention still follows the classical encoder-decoder structure of sequence-to-sequence models; the difference is that it uses the multi-head attention mechanism instead of an RNN or CNN as the sequence modeling mechanism. The theoretical advantage of the multi-head attention mechanism is that it captures "long-distance dependency information" more easily. So-called "long-distance dependency information" can be understood as follows: 1) a word is in fact a symbol that can express diverse semantic information (the ambiguity problem); 2) the meaning of a word is determined by the context in which it appears (disambiguation by context); 3) some words may need only a small context window to determine their meaning (short-distance dependency), while others may need a large context window (long-distance dependency).
For example, consider the following two sentences:
"There are many 杜鹃 on the mountain; when spring comes, they bloom all over the hills and are very beautiful."
"There are many 杜鹃 on the mountain; when spring comes, their calls echo all over the hills, very melodious."
In these two sentences, "杜鹃" refers to a flower (azalea) and a bird (cuckoo) respectively. In machine translation, without looking at words far away from it, it is difficult to translate the word "杜鹃" correctly. This example makes the long-distance dependency between words obvious. Of course, the meaning of most words can be determined within a small contextual window, and cases like the example above account for a relatively small proportion of language. What we hope is that the model can learn short-distance dependency knowledge well and can also learn long-distance dependency knowledge.
The multi-head attention mechanism in the Transformer of the present invention can theoretically better capture both long- and short-distance dependency knowledge. Below, the three sequence modeling methods based on RNN, CNN and Transformer are compared in terms of the interaction distance between any two words.
Fig. 7 shows the method of modeling a sequence with a bidirectional RNN. Since the elements of the sequence are processed in order, the interaction distance between two words can be regarded as the relative distance between them; the interaction distance between W1 and Wn is n-1. In theory an RNN model with a gating mechanism can selectively store and forget historical information and performs better than a plain RNN structure, but with a fixed number of gating parameters this ability is limited. As sentences grow longer and relative distances increase, there is a clear theoretical upper limit.
Fig. 8 illustrates the method of modeling a sequence with a multi-layer CNN. The semantic context covered by the CNN units of the first layer is small; the context covered by the second layer becomes larger, and so on: the deeper the CNN unit, the larger the context it covers. A word first interacts with nearby words at the bottom CNN units and then interacts with somewhat more distant words at higher CNN units. So the multi-layer CNN structure embodies a local-to-global feature extraction process in which the interaction distance between words is proportional to their relative distance. Words that are far apart can only meet at higher CNN nodes before interacting, and this process may lose more information.
The sequence modeling method based on the multi-head attention mechanism of the present invention, shown in Fig. 2, is clearly superior to these two approaches and can capture more information.
Below is a specific Mongolian-Chinese translation example.
Experiments were conducted on a 1.2-million-sentence-pair Mongolian-Chinese parallel corpus as the dataset to verify the effect of the present invention.
Aiming at the serious data sparsity that appears in the Mongolian corpus, three processing methods were applied: affix segmentation, stem segmentation and segmentation of the additional case components, where the granularity of affix segmentation is smaller, the granularity of stem segmentation is larger, and the segmentation of the additional case components is similar to stem segmentation but with an even larger granularity.
The present invention tests these three segmentation methods on the corpus separately; the experimental results are shown in Table 1.
Table 1
The experimental results in the table show that all segmentation methods improve translation quality. Stem segmentation raises the BLEU value by 1.02; although the improvement from segmenting the additional case components alone is not obvious, when it acts together with stem segmentation the BLEU gain reaches 1.14. The reason why affix segmentation does not perform as well as stem segmentation is believed to be mainly that affix segmentation is too fine-grained, so that sentence length increases considerably after segmentation, and since neural machine translation is weaker at handling long sentences the effect suffers. After the distributed-representation comprehensive concept semantic similarity computation is added, BLEU improves by 5.88. We then randomly selected 65 word pairs and established a coordinate system with the word pair and the similarity value as coordinates to analyze the distribution of the similarity values computed by the distributed comprehensive concept semantic similarity algorithm; as can be seen from Fig. 9, the continuity obtained by this algorithm is relatively good, which shows that the similarity values calculated by the algorithm here correlate well with manually assigned similarity scores. Finally, after the two preprocessing steps above, the preprocessed data were divided in certain proportions into a training set, a validation set and a test set and fed into the Transformer model for training; the BLEU value improves by 10.16, and the training effect is clearly better than that of an RNN.

Claims (10)

Translated fromChinese
1.一种基于Transformer的增强语义特征信息的蒙汉机器翻译方法,其特征在于,在翻译过程中采用Transformer模型,所述Transformer模型为利用三角函数进行位置编码并基于增强型多头注意力机制构建的多层编码器-解码器架构,从而完全依赖于注意力机制来绘制输入和输出之间的全局依赖关系,消除递归和卷积。1. a Mongolian-Chinese machine translation method based on the enhanced semantic feature information of Transformer, is characterized in that, adopts Transformer model in translation process, and described Transformer model is to utilize trigonometric function to carry out position coding and build based on enhanced multi-head attention mechanism The multi-layer encoder-decoder architecture, which completely relies on the attention mechanism to map the global dependencies between input and output, eliminates recursion and convolution.2.根据权利要求1所述基于Transformer的增强语义特征信息的蒙汉机器翻译方法,其特征在于,在翻译之前,先对数据进行预处理,所述对数据进行预处理是对蒙文语料中的词干、词缀和格的附加成分进行切割分离,以降低数据的稀疏性,同时找出蒙文在词干、词缀以及格的附加成分的语言特征,并将这些语言特征融入到训练之中。2. the Mongolian-Chinese machine translation method based on the enhanced semantic feature information of Transformer according to claim 1, is characterized in that, before translating, data is preprocessed earlier, and described data is preprocessed to be in the Mongolian language corpus. The stems, affixes and additional components of the case are cut and separated to reduce the sparsity of the data, and at the same time, the language features of the Mongolian language in the stems, affixes and additional components of the case are found, and these language features are integrated into the training. .3.根据权利要求2所述基于Transformer的增强语义特征信息的蒙汉机器翻译方法,其特征在于,所述切割分离包括小粒度的词缀切分、大粒度的词干切分以及小规模的格的附加成分切分。3. the Mongolian-Chinese machine translation method based on the enhanced semantic feature information of Transformer according to claim 2, is characterized in that, described cutting and separation comprises the affix segmentation of small granularity, the stem segmentation of large granularity and the lattice of small scale of additional ingredients.4.根据权利要求1所述基于Transformer的增强语义特征信息的蒙汉机器翻译方法,其特征在于,对数据进行预处理后,综合深度、密度、语义重合度对概念语义相似度的影响,集成语义距离与信息内容的相似度算法建立相似度矩阵,然后进行主成分分析,将相似度矩阵转换成主成分变换矩阵,计算主成分贡献率,并将其作为权值进行加权处理,得到最终的概念语义相似度。4. the Mongolian-Chinese machine translation method based on the enhanced semantic feature information of Transformer according to claim 1, it is characterized in that, after data is preprocessed, the impact of comprehensive depth, density, semantic coincidence degree on concept semantic similarity, integrated The similarity algorithm between semantic distance and information content establishes a similarity matrix, and then performs principal component analysis, converts the similarity matrix into a principal component transformation matrix, calculates the principal component contribution rate, and uses it as a weight for weighting to obtain the final Concept semantic similarity.5.根据权利要求4所述基于Transformer的增强语义特征信息的蒙汉机器翻译方法,其特征在于,所述相似度矩阵的公式表示为5. 
the Mongolian-Chinese machine translation method based on the enhanced semantic feature information of Transformer according to claim 4, is characterized in that, the formula of described similarity matrix is expressed asXsim=(xi1,xi2,xi3,xi4,xi5)T,i=1,2,3,…,nXsim =(xi1 ,xi2 ,xi3 ,xi4 ,xi5 )T ,i=1,2,3,...,n所述最终的概念语义相似度计算表示公式为The final concept semantic similarity calculation formula is as followsδsim=r1ysim1+r2ysim2+r3ysim3+r4ysim4+r5ysim5δsim =r1 ysim1 +r2 ysim2 +r3 ysim3 +r4 ysim4 +r5 ysim5其中,Xsim表示相似度矩阵,xi1表示Dsxi2表示Ks,xi3表示Zsxi4表示Ssxi5表示Isn是被比较概念对集合中的概念词的对数,xi=(Dsi,Ksi,Zsi,Ssi,Isi),为主成分输入样本集合中的一个向量,其中每一维变量分别代表综合相似度计算模块中各部分语义相似度计算的结果,Dsi表示向量中第i维元素的语义距离与相似度之间的关系,Ksi表示向量中第i维元素的深度方面的语义相似度,Zsi表示向量中第i维元素的概念词c的密度影响因子,Ssi表示向量中第i维元素的语义重合度方面的相似度,Isi表示向量中第i维元素的信息内容方面的相似度;δsim表示概念语义相似度,ysim1,ysim2,ysim3,ysim4,ysim5为对相似度矩阵Xsim进行主成分分析所提取出的主成分,r1,r2,r3,r4,r5表示各主成分贡献率。Among them, Xsim represents the similarity matrix, xi1 represents Ds , xi2 represents Ks , xi3 represents Zs , xi4 represents Ss , xi5 means Is , n is the logarithm of the concept words in the set of compared concept pairs,xi = (Dsi , Ksi , Zsi , Ssi , Isi ), a vector in the principal component input sample set, where each dimension The variables respectively represent the results of the semantic similarity calculation of each part in the comprehensive similarity calculation module, Dsi represents the relationship between the semantic distance and similarity of the i-th dimension element in the vector, and Ksi represents the depth aspect of the i-th dimension element in the vector The semantic similarity of , Zsi represents the density influence factor of the concept word c of the ith dimension element in the vector, Ssi represents the similarity in terms of semantic coincidence of the ith dimension element in the vector, Isi represents the ith dimension element in the vector The similarity in terms of information content; δsim represents the conceptual semantic similarity, ysim1 , ysim2 , ysim3 , ysim4 , ysim5 are the principal components extracted from the principal component analysis of the similarity matrix Xsim , r1 , r2 , r3 , r4 , and r5 represent the contribution rate of each principal component.6.根据权利要求1所述基于Transformer的增强语义特征信息的蒙汉机器翻译方法,其特征在于,所述多头注意力机制描述为查询和一组键值对映射到输出,其中查询、键、值和输出都是向量,输出被计算为值的加权和,分配给每个值的权重由查询与相应密钥的兼容性函数计算得到。6. the Mongolian-Chinese machine translation method based on the enhanced semantic feature information of Transformer according to claim 1, is characterized in that, described multi-head attention mechanism is described as query and a group of key-value pairs are mapped to output, wherein query, key, Both the value and the output are vectors, the output is computed as a weighted sum of the values, and the weight assigned to each value is computed by the query's compatibility function with the corresponding key.7.根据权利要求1所述基于Transformer的增强语义特征信息的蒙汉机器翻译方法,其特征在于,7. 
the Mongolian-Chinese machine translation method based on the enhanced semantic feature information of Transformer according to claim 1, is characterized in that,所述编码器由N个相同的层组成,每层有两个子层,第一个子层是多头注意力子层,第二个子层是前向传播子层,每个子层的输入和输出都存在着残差连接,每个子层的后面跟着一步正则化操作,以加快模型收敛;The encoder consists of N identical layers, each with two sub-layers, the first sub-layer is a multi-head attention sub-layer, the second is a forward-propagation sub-layer, and the input and output of each sub-layer are There are residual connections, and each sublayer is followed by a step of regularization to speed up model convergence;所述解码器由N个相同的层组成,每层有三个子层,第一个子层是mask矩阵控制的多头注意力子层,用来建模已经生成的目标端句子,在训练的过程中,以一个mask矩阵控制每次多头注意力计算时只计算到前t-1个词;第二个子层是多头注意力子层,是编码器和解码器之间的注意力机制,即去源语言中找相关的语义信息;第三个子层是前向传播子层,与编码器中的前向传播子层完全一致,每个子层的输入和输出都存在着残差连接,并后跟一步正则化操作,以加快模型收敛。The decoder consists of N identical layers, each of which has three sub-layers. The first sub-layer is a multi-head attention sub-layer controlled by the mask matrix, which is used to model the generated target-side sentences. During the training process , use a mask matrix to control each multi-head attention calculation to only calculate the first t-1 words; the second sub-layer is the multi-head attention sub-layer, which is the attention mechanism between the encoder and the decoder, that is, the source Find relevant semantic information in the language; the third sublayer is the forward propagation sublayer, which is exactly the same as the forward propagation sublayer in the encoder. The input and output of each sublayer have residual connections, followed by a step of regularization. operation to speed up model convergence.8.根据权利要求1或7所述基于Transformer的增强语义特征信息的蒙汉机器翻译方法,其特征在于,通过如下方法构建多层编码器-解码器架构:8. the Mongolian-Chinese machine translation method based on the enhanced semantic feature information of Transformer according to claim 1 or 7, is characterized in that, builds multi-layer encoder-decoder architecture by the following method:编码器中,每个子层的输出是LayerNorm(x+Sublayer(x)),其中LayerNorm()表示层归一化函数,Sublayer()使用基于多头注意力机制的残差连接的子层本身实现的函数,x表示当前层要输入的向量,将蒙语句子利用word2vec向量技术生成相对应的向量,然后作为第一层编码器的输入,即Sublayer(x)是由基于多头注意力机制的子层本身实现的功能,为了促进残差连接,所有子层以及嵌入层产生维度dmodel=512的输出。In the encoder, the output of each sublayer is LayerNorm(x+Sublayer(x)), where LayerNorm() represents the layer normalization function, and Sublayer() is implemented by the sublayer itself using the residual connection based on the multi-head attention mechanism Function, x represents the vector to be input by the current layer, and the corresponding vector is generated by using the word2vec vector technology for the Mongolian sentence, and then used as the input of the first layer encoder, that is, Sublayer(x) is composed of a sublayer based on a multi-head attention mechanism. A function implemented by itself, in order to facilitate residual connections, all sub-layers as well as the embedding layer produce outputs of dimensiondmodel = 512.9.根据权利要求1或7所述基于Transformer的增强语义特征信息的蒙汉机器翻译方法,其特征在于,9. the Mongolian-Chinese machine translation method based on the enhanced semantic feature information of Transformer according to claim 1 or 7, is characterized in that,所述编码器的前向传播子层实现中有两次线性变换,一次Relu非线性激活,具体计算公式如下:There are two linear transformations and one Relu nonlinear activation in the forward propagation sublayer implementation of the encoder. 
9. The Mongolian-Chinese machine translation method based on enhanced semantic feature information of Transformer according to claim 1 or 7, characterized in that the forward-propagation sub-layer of the encoder performs two linear transformations with one ReLU non-linear activation, computed as

FFN(x) = γ(0, xW_1 + b_1)W_2 + b_2

where x is the encoder input, W_1 is the weight applied to the input vector, b_1 is the bias factor of the multi-head attention mechanism, (0, xW_1 + b_1) is the input to the forward-propagation sub-layer, W_2 is the weight applied to the corresponding input vector, b_2 is the bias factor of the forward-propagation function, and γ is the non-linear activation function of the encoder layer.

10. The Mongolian-Chinese machine translation method based on enhanced semantic feature information of Transformer according to claim 1 or 7, characterized in that positional encoding with trigonometric functions takes the absolute position as the variable of the trigonometric functions, according to the formulas

PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

where pos is the position and i is the dimension, i.e. each dimension of the positional encoding corresponds to a sinusoid, with wavelengths forming a geometric progression from 2π to 10000·2π; d_model is the dimension of the embedding layer after positional encoding, and 2i ranges from a minimum of 0 to a maximum of d_model.
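A small NumPy sketch of the two formulas above, for illustration only; the activation γ is assumed to be ReLU, the positional encoding is assumed to follow the standard sinusoidal form of the cited Transformer paper, and the weights are random placeholders rather than trained parameters:

```python
# Minimal sketch of the forward-propagation sub-layer (claim 9) and the
# sinusoidal positional encoding (claim 10). All names are illustrative.
import numpy as np

def ffn(x, W1, b1, W2, b2):
    # FFN(x) = γ(0, x·W1 + b1)·W2 + b2, with γ realised as ReLU = max(0, ·)
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def positional_encoding(max_len, d_model=512):
    pos = np.arange(max_len)[:, None]              # positions 0 .. max_len-1
    i = np.arange(0, d_model, 2)[None, :]          # even dimensions 2i
    angle = pos / np.power(10000.0, i / d_model)   # wavelengths from 2π to 10000·2π
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                    # PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    pe[:, 1::2] = np.cos(angle)                    # PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    return pe

d_model, d_ff = 512, 2048
rng = np.random.default_rng(0)
x = rng.standard_normal((10, d_model))             # 10 token vectors
W1, b1 = 0.02 * rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
W2, b2 = 0.02 * rng.standard_normal((d_ff, d_model)), np.zeros(d_model)
print(ffn(x, W1, b1, W2, b2).shape)                # (10, 512)
print(positional_encoding(50).shape)               # (50, 512)
```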
CN201811231017.2A | 2018-10-22 | 2018-10-22 | A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer | Pending | CN109492232A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811231017.2A (CN109492232A) | 2018-10-22 | 2018-10-22 | A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811231017.2A (CN109492232A) | 2018-10-22 | 2018-10-22 | A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer

Publications (1)

Publication Number | Publication Date
CN109492232A | 2019-03-19

Family

ID=65692441

Family Applications (1)

Application Number | Status | Publication | Title
CN201811231017.2A | Pending | CN109492232A (en) | A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer

Country Status (1)

Country | Link
CN (1) | CN109492232A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105957518A (en)* | 2016-06-16 | 2016-09-21 | 内蒙古大学 | Mongolian large vocabulary continuous speech recognition method
CN107967262A (en)* | 2017-11-02 | 2018-04-27 | 内蒙古工业大学 | A kind of neutral net covers Chinese machine translation method
CN108681539A (en)* | 2018-05-07 | 2018-10-19 | 内蒙古工业大学 | A kind of illiteracy Chinese nerve interpretation method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ashish Vaswani et al., "Attention Is All You Need", 31st Conference on Neural Information Processing Systems (NIPS 2017) *
Wang Tong et al., "A comprehensive concept semantic similarity calculation method in WordNet" (WordNet中的综合概念语义相似度计算方法), Journal of Beijing University of Posts and Telecommunications (北京邮电大学学报) *

Legal Events

Date | Code | Title | Description
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 2019-03-19
