Summary of the invention
In view of this, one of the objects of the present invention is to provide a sentence-level bilingual alignment method and device, and a computer-readable storage medium, that help improve the efficiency of sentence alignment.
To achieve the above object, the technical solution of the present invention provides a sentence-level bilingual alignment method, comprising:
Step S1: obtaining Z trained convolution kernels, where Z is an integer greater than or equal to 1, and each trained convolution kernel is obtained through steps S11 to S15;
Step S11: performing sentence segmentation on each of two training texts, and establishing the text similarity matrix B of the two training texts:
$$B = \begin{pmatrix} K_{11} & K_{12} & \cdots & K_{1m} \\ K_{21} & K_{22} & \cdots & K_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ K_{n1} & K_{n2} & \cdots & K_{nm} \end{pmatrix}$$
where n is the number of sentences obtained by segmenting one of the two training texts, m is the number of sentences obtained by segmenting the other training text, and the element K_ij of the text similarity matrix B is the text similarity between the i-th sentence of the one training text and the j-th sentence of the other training text;
Step S12: initializing a convolution kernel;
Step S13: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and calculating a loss value, loss; if loss meets a preset requirement, executing step S14, otherwise executing step S16;
where L_ij, the target value used in calculating the loss, is 1 if the i-th sentence of the one training text matches the j-th sentence of the other training text, and 0 otherwise;
Step S14: verifying the current convolution kernel with a validation set and judging whether the verification result meets a preset requirement; if so, executing step S15, otherwise executing step S16;
Step S15: taking the current convolution kernel as a trained convolution kernel;
Step S16: adjusting the weights of the current convolution kernel according to loss, and judging whether the current number of training iterations has reached a preset number; if so, executing step S15, otherwise repeating step S13;
Step S2: performing sentence segmentation on each of two texts to be aligned, and establishing the text similarity matrix U of the two texts to be aligned:
$$U = \begin{pmatrix} K_{11} & K_{12} & \cdots & K_{1b} \\ K_{21} & K_{22} & \cdots & K_{2b} \\ \vdots & \vdots & \ddots & \vdots \\ K_{a1} & K_{a2} & \cdots & K_{ab} \end{pmatrix}$$
where a is the number of sentences obtained by segmenting one of the two texts to be aligned, b is the number of sentences obtained by segmenting the other text to be aligned, and the element K_ij of the text similarity matrix U is the text similarity between the i-th sentence of the one text to be aligned and the j-th sentence of the other text to be aligned;
Step S3: convolving the text similarity matrix U with each of the Z trained convolution kernels, respectively, to obtain Z optimized text similarity matrices;
Step S4: obtaining the sentence alignment result of the two texts to be aligned from the Z optimized text similarity matrices.
Further, Z is an integer greater than or equal to 2, and different trained convolution kernels differ in size and weights.
Further, step S4 comprises:
Step S41: calculating a text matching degree matrix T from the Z optimized text similarity matrices, where the element Y_ij of T is the text matching degree between the i-th sentence of the one text to be aligned and the j-th sentence of the other text to be aligned, and the value of each element of T is the average of the elements at the same position in the Z optimized text similarity matrices;
Step S42: traversing the rows of T in turn, selecting the element with the largest value in each row, and pairing the two sentences corresponding to the selected element.
Further, after step S42, the method further comprises:
Step S43: judging whether any of the b sentences obtained by segmenting the other text to be aligned is unpaired; if so, looking up in T the sentence with the highest text matching degree to it, and pairing the found sentence with it.
Further, after step S4, the method further comprises:
Step S5: checking the sentence alignment result according to the positional order, within the other text to be aligned, of the b sentences obtained by segmenting that text, and the positional order, within the one text to be aligned, of the a sentences obtained by segmenting that text.
Further, step S5 comprises:
Step S51: sorting the a sentences according to the positional order of the b sentences in the other text to be aligned and the sentence alignment result;
Step S52: if, among the a sentences, there are two sentences whose positional order after sorting is opposite to their positional order in the one text to be aligned, determining that an error exists.
Further, one of the two training texts, and likewise one of the two texts to be aligned, is an English text and the other is a non-English text, and the text similarity K between each sentence obtained by segmenting the English text and each sentence obtained by segmenting the non-English text is calculated as follows:
translating the sentences obtained by segmenting the non-English text into English to obtain the corresponding English text;
for two sentences whose text similarity is to be calculated, comparing the number of words in the sentence obtained by segmenting the English text with the number of words in the English text obtained by translating the sentence from the non-English text;
calculating
$$K = \frac{1}{E}\sum_{v=1}^{E} N_v$$
where E is the number of words in whichever of the two compared sentences has more words, and N_v is the value for the v-th word of that sentence: N_v is 1 if the sentence with fewer words contains a word with the same root as the v-th word, and 0 otherwise.
To achieve the above object, the technical solution of the present invention further provides a sentence-level bilingual alignment device, comprising:
an acquisition module for obtaining Z trained convolution kernels, where Z is an integer greater than or equal to 1, and each trained convolution kernel is obtained through steps S11 to S15;
Step S11: performing sentence segmentation on each of two training texts, and establishing the text similarity matrix B of the two training texts:
$$B = \begin{pmatrix} K_{11} & K_{12} & \cdots & K_{1m} \\ K_{21} & K_{22} & \cdots & K_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ K_{n1} & K_{n2} & \cdots & K_{nm} \end{pmatrix}$$
where n is the number of sentences obtained by segmenting one of the two training texts, m is the number of sentences obtained by segmenting the other training text, and the element K_ij of the text similarity matrix B is the text similarity between the i-th sentence of the one training text and the j-th sentence of the other training text;
Step S12: initializing a convolution kernel;
Step S13: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and calculating a loss value, loss; if loss meets a preset requirement, executing step S14, otherwise executing step S16;
where L_ij, the target value used in calculating the loss, is 1 if the i-th sentence of the one training text matches the j-th sentence of the other training text, and 0 otherwise;
Step S14: verifying the current convolution kernel with a validation set and judging whether the verification result meets a preset requirement; if so, executing step S15, otherwise executing step S16;
Step S15: taking the current convolution kernel as a trained convolution kernel;
Step S16: adjusting the weights of the current convolution kernel according to loss, and judging whether the current number of training iterations has reached a preset number; if so, executing step S15, otherwise repeating step S13;
a first processing module for performing sentence segmentation on each of two texts to be aligned and establishing the text similarity matrix U of the two texts to be aligned:
$$U = \begin{pmatrix} K_{11} & K_{12} & \cdots & K_{1b} \\ K_{21} & K_{22} & \cdots & K_{2b} \\ \vdots & \vdots & \ddots & \vdots \\ K_{a1} & K_{a2} & \cdots & K_{ab} \end{pmatrix}$$
where a is the number of sentences obtained by segmenting one of the two texts to be aligned, b is the number of sentences obtained by segmenting the other text to be aligned, and the element K_ij of the text similarity matrix U is the text similarity between the i-th sentence of the one text to be aligned and the j-th sentence of the other text to be aligned;
a second processing module for convolving the text similarity matrix U with each of the Z trained convolution kernels, respectively, to obtain Z optimized text similarity matrices;
a third processing module for obtaining the sentence alignment result of the two texts to be aligned from the Z optimized text similarity matrices.
To achieve the above object, the technical solution of the present invention further provides a sentence-level bilingual alignment device, comprising a processor and a memory coupled to the processor, where the processor is configured to execute instructions in the memory to implement the above sentence-level bilingual alignment method.
To achieve the above object, the technical solution of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above sentence-level bilingual alignment method.
In the sentence-level bilingual alignment method provided by the present invention, the text similarity matrix of two texts to be aligned is convolved with trained convolution kernels, and the two texts are sentence-aligned according to the convolution result. This reduces manual involvement and enables automatic sentence alignment, and it can also improve alignment accuracy, which helps improve the efficiency of sentence alignment between texts.
Specific embodiment
The present invention is described below on the basis of embodiments, but it is not limited to these embodiments. Certain specific details are described at length in the following detailed description; to avoid obscuring the essence of the invention, well-known methods, procedures, flows and elements are not described in detail.
In addition, those skilled in the art should understand that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, words such as "include" and "comprise" throughout the specification and claims should be construed in an inclusive sense rather than an exclusive or exhaustive sense; that is, in the sense of "including but not limited to".
In the description of the present invention, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and shall not be construed as indicating or implying relative importance. In addition, unless otherwise stated, "multiple" means two or more.
Referring to Fig. 1, which is a flowchart of a sentence-level bilingual alignment method provided by an embodiment of the present invention, the method comprises:
Step S1: obtaining Z trained convolution kernels, where Z is an integer greater than or equal to 1, and each trained convolution kernel is obtained through steps S11 to S15;
Step S11: performing sentence segmentation on each of two training texts, and establishing the text similarity matrix B of the two training texts:
$$B = \begin{pmatrix} K_{11} & K_{12} & \cdots & K_{1m} \\ K_{21} & K_{22} & \cdots & K_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ K_{n1} & K_{n2} & \cdots & K_{nm} \end{pmatrix}$$
where n is the number of sentences obtained by segmenting one of the two training texts, m is the number of sentences obtained by segmenting the other training text, and the element K_ij of the text similarity matrix B is the text similarity between the i-th sentence of the one training text and the j-th sentence of the other training text;
Step S12: initializing a convolution kernel;
Step S13: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and calculating a loss value, loss; if loss meets a preset requirement, executing step S14, otherwise executing step S16;
where L_ij, the target value used in calculating the loss, is 1 if the i-th sentence of the one training text matches the j-th sentence of the other training text, and 0 otherwise;
Step S14: verifying the current convolution kernel with a validation set and judging whether the verification result meets a preset requirement; if so, executing step S15, otherwise executing step S16;
Step S15: taking the current convolution kernel as a trained convolution kernel;
Step S16: adjusting the weights of the current convolution kernel according to loss, and judging whether the current number of training iterations has reached a preset number; if so, executing step S15, otherwise repeating step S13;
Step S2: performing sentence segmentation on each of two texts to be aligned, and establishing the text similarity matrix U of the two texts to be aligned:
$$U = \begin{pmatrix} K_{11} & K_{12} & \cdots & K_{1b} \\ K_{21} & K_{22} & \cdots & K_{2b} \\ \vdots & \vdots & \ddots & \vdots \\ K_{a1} & K_{a2} & \cdots & K_{ab} \end{pmatrix}$$
where a is the number of sentences obtained by segmenting one of the two texts to be aligned, b is the number of sentences obtained by segmenting the other text to be aligned, and the element K_ij of the text similarity matrix U is the text similarity between the i-th sentence of the one text to be aligned and the j-th sentence of the other text to be aligned;
Step S3: convolving the text similarity matrix U with each of the Z trained convolution kernels, respectively, to obtain Z optimized text similarity matrices;
Step S4: obtaining the sentence alignment result of the two texts to be aligned from the Z optimized text similarity matrices.
In the sentence-level bilingual alignment method provided by this embodiment of the present invention, the text similarity matrix of two texts to be aligned is convolved with trained convolution kernels, and the two texts are sentence-aligned according to the convolution result. This reduces manual involvement and enables automatic sentence alignment, and it can also improve alignment accuracy, which helps improve the efficiency of sentence alignment between texts.
Each trained convolution kernel in the embodiments of the present invention can be obtained by training a convolutional neural network. As shown in Fig. 2, the text similarity matrix B of two training texts whose sentence alignment result is known is used as the training-set input, together with a target matrix. The target matrix (i.e., the reference answer) is compared with the matrix returned by the neural network so that the network's output approaches the target matrix as closely as possible, yielding the required convolution kernel. The detailed process is as follows:
Step A1: obtaining two training texts from the training set, for example an English text (source) and a Chinese text (translation), the sentence alignment result of the two training texts being known;
Step A2: performing sentence segmentation on each of the two training texts;
For example, sentence segmentation can use the punctuation marks that delimit sentences in the text. Taking Chinese-English alignment as an example, Chinese sentences end with "。" or "！" and English sentences end with ".", and the text is split wherever one of these marks occurs. Segmentation yields two lists: an English (source) sentence list containing n English sentences and a Chinese (translation) sentence list containing m Chinese sentences. Each sentence in the English list is an individual sentence of the source text, and each sentence in the Chinese list is an individual sentence of the translation. In addition, for ease of processing, the sentences in each list can be numbered in text order (i.e., by the sentence's position in the text) to serve as sentence indices: in the English sentence list, the first sentence of the English text is numbered 1, ..., and the last sentence is numbered n; in the Chinese sentence list, the first sentence of the Chinese text is numbered 1, ..., and the last sentence is numbered m;
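As an illustration of step A2, the following minimal Python sketch splits a text at the sentence-ending marks named above and numbers the resulting sentences 1, 2, ... in text order. The delimiter sets and the function names `segment` and `number_sentences` are assumptions for illustration only; a naive split like this will also break at abbreviations and decimal points, which a production segmenter would need to handle.

```python
import re

# Assumed sentence-ending marks, following the example in the text:
# Chinese sentences end with "。" or "！", English sentences with ".".
ZH_ENDERS = "。！"
EN_ENDERS = "."


def segment(text: str, enders: str) -> list[str]:
    """Split text into sentences, keeping each sentence's terminating mark."""
    marks = re.escape(enders)
    # A run of non-terminator characters, optionally followed by one terminator.
    pattern = f"[^{marks}]+[{marks}]?"
    sentences = (s.strip() for s in re.findall(pattern, text))
    return [s for s in sentences if s]


def number_sentences(sentences: list[str]) -> dict[int, str]:
    """Number sentences 1..N in text order; the number is the sentence index."""
    return dict(enumerate(sentences, start=1))


en_list = number_sentences(segment("The cat sat. The dog ran.", EN_ENDERS))
zh_list = number_sentences(segment("猫坐着。狗跑了！", ZH_ENDERS))
print(en_list)  # {1: 'The cat sat.', 2: 'The dog ran.'}
```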
Step A3: establishing the text similarity matrix B of the two training texts, i.e., comparing each of the m sentences in the Chinese list with each of the n sentences in the English list for similarity. The detailed process is as follows:
First, a translation tool is used to translate the Chinese into the same language as the source (English) text, i.e., each sentence in the Chinese sentence list is translated to obtain its corresponding English text;
For two sentences (one Chinese and one English) whose text similarity is to be calculated, the number of words in the English sentence is compared with the number of words in the English translation of the Chinese sentence;
Then calculate
$$K = \frac{1}{E}\sum_{v=1}^{E} N_v$$
where E is the number of words in whichever of the two compared sentences has more words, and N_v is the value for the v-th word of that sentence: N_v is 1 if the sentence with fewer words contains a word with the same root as the v-th word, and 0 otherwise;
It should be noted that if the comparison shows that the two sentences have the same number of words, either one may be taken as the sentence with more words and the other as the sentence with fewer words;
That is, words in the two sentences are matched by exact root match, and the text similarity between the two sentences is calculated with the above formula: each word whose root matches adds 1 to the match count, the total number of matches is the numerator, and the sentence length (i.e., the number of words in the sentence) is the denominator; if the lengths differ, the word count of the longer sentence is used as the denominator;
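The root-matching similarity K can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the Chinese sentence is assumed to have already been machine-translated into English, words are separated by whitespace, and `crude_root` is a hypothetical stand-in for a real stemmer or lemmatizer (the text does not specify how roots are obtained).

```python
import string


def crude_root(word: str) -> str:
    """Hypothetical root extractor: lowercase, drop surrounding punctuation,
    and strip one common English suffix. Stands in for a real stemmer."""
    w = word.lower().strip(string.punctuation)
    for suffix in ("ing", "ed", "es", "s"):
        if len(w) > len(suffix) + 2 and w.endswith(suffix):
            return w[: -len(suffix)]
    return w


def text_similarity(english: str, translated: str) -> float:
    """K = (1/E) * sum of N_v over the v = 1..E words of the longer sentence."""
    a, b = english.split(), translated.split()
    # The sentence with more words supplies E; on a tie either one may be used.
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    if not longer:
        return 0.0
    shorter_roots = {crude_root(w) for w in shorter}
    # N_v = 1 when the shorter sentence has a word sharing the v-th word's root.
    matched = sum(1 for w in longer if crude_root(w) in shorter_roots)
    return matched / len(longer)
```

For example, "The cats sat on the mat." compared with "the cat sat" matches 4 of the 6 words of the longer sentence ("The", "cats", "sat", "the"), so K = 4/6.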
In this way, m × n text similarities are obtained and represented by a matrix with one element per sentence pair (n rows for the English sentences and m columns for the Chinese sentences), which serves as the text similarity matrix B;
where the element K_ij of B is the text similarity between the i-th sentence (i.e., the sentence numbered i) in the English sentence list and the j-th sentence (i.e., the sentence numbered j) in the Chinese sentence list;
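Building B is then a double loop over the two sentence lists. The sketch below assumes NumPy and takes the similarity function as a parameter (for instance, a root-matching function like the one described above); the name `similarity_matrix` is hypothetical. The array is 0-based, so K_ij is stored at `B[i - 1, j - 1]`.

```python
from typing import Callable, Sequence

import numpy as np


def similarity_matrix(english: Sequence[str], translated: Sequence[str],
                      sim: Callable[[str, str], float]) -> np.ndarray:
    """Return the n x m matrix B, where B[i-1, j-1] = K_ij for English
    sentence i and the English translation of Chinese sentence j."""
    n, m = len(english), len(translated)
    B = np.zeros((n, m))
    for i, e in enumerate(english):
        for j, t in enumerate(translated):
            B[i, j] = sim(e, t)
    return B
```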
For example, Fig. 3 shows the text similarity matrix obtained after processing two training texts. The larger values (i.e., the higher text similarities) cluster along the diagonal running from the upper-left corner to the lower-right corner, because the Chinese and English texts present their sentences in the same order;
Step A4: initializing a convolution kernel, taking the initialized kernel as the current convolution kernel, and executing step A5;
Step A5: establishing a target matrix J according to the known sentence alignment result of the two training texts;
where the element L_ij of the target matrix J corresponds to the i-th sentence in the English sentence list and the j-th sentence in the Chinese sentence list, and its value is determined by the known sentence alignment result: L_ij is 1 if the i-th sentence in the English sentence list matches the j-th sentence in the Chinese sentence list, and 0 otherwise;
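A minimal NumPy sketch of step A5, assuming the known alignment is available as a list of 1-based (English number, Chinese number) pairs; `target_matrix` is a hypothetical name.

```python
import numpy as np


def target_matrix(matches, n, m):
    """Build the n x m target matrix J from known matches given as 1-based
    (english_number, chinese_number) pairs: L_ij = 1 for a match, else 0."""
    J = np.zeros((n, m))
    for i, j in matches:
        J[i - 1, j - 1] = 1.0
    return J


# Example: English sentences 2 and 3 both correspond to Chinese sentence 2.
J = target_matrix([(1, 1), (2, 2), (3, 2)], n=3, m=2)
```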
For example, Fig. 4 shows the target matrix J established from the two training texts;
Step A6: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and calculating the loss value, loss, using the established target matrix J; if loss meets a preset requirement (e.g., it is less than a threshold), executing step A7, otherwise executing step A9;
Step A7: verifying the current convolution kernel with a validation set and judging whether the verification result meets a preset requirement; if so, executing step A8, otherwise executing step A9;
where the validation set comprises several validation text pairs, each consisting of an English text (source) and a Chinese text (translation);
The verification process is substantially similar to the training process and is not repeated here. When, in the verification result, the loss on the validation set is below a certain threshold and the accuracy on the validation set is above a certain threshold, the verification result is determined to meet the preset requirement;
Step A8: taking the current convolution kernel as a trained convolution kernel;
Step A9: adjusting the weights of the current convolution kernel according to loss, and judging whether the current number of training iterations has reached a preset number; if so, executing step A8, otherwise repeating step A6.
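The loop of steps A4 to A9 can be sketched with NumPy as below. The text does not specify the convolution padding, the loss function or the update rule, so this sketch assumes a zero-padded "same" 2-D cross-correlation (so that P has the same shape as B and J), mean squared error between P and J as the loss, and plain gradient descent on the kernel weights. `validate` is a hypothetical callback standing in for the validation-set check of step A7; the learning rate, threshold and iteration limit are illustrative values only.

```python
import numpy as np


def conv_same(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Zero-padded 2-D cross-correlation whose output has X's shape
    (centred when the kernel has odd height and width)."""
    kh, kw = W.shape
    Xp = np.pad(X, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(X.shape)
    for u in range(kh):
        for v in range(kw):
            out += W[u, v] * Xp[u:u + X.shape[0], v:v + X.shape[1]]
    return out


def train_kernel(B, J, size=3, lr=0.5, loss_threshold=1e-3,
                 max_iters=500, validate=None, seed=0):
    """Train one size x size kernel so that conv_same(B, W) approaches J."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(size, size))           # step A4: initialize
    Bp = np.pad(B, ((size // 2, size // 2), (size // 2, size // 2)))
    n, m = B.shape
    for _ in range(max_iters):
        diff = conv_same(B, W) - J                         # step A6: P - J
        loss = float(np.mean(diff ** 2))                   # assumed MSE loss
        if loss < loss_threshold and (validate is None or validate(W)):
            return W, loss                                 # steps A7 -> A8
        # Step A9: dLoss/dW[u, v] = 2 * mean((P - J) * shifted padded B).
        grad = np.array([[2 * np.mean(diff * Bp[u:u + n, v:v + m])
                          for v in range(size)] for u in range(size)])
        W = W - lr * grad
    # Preset iteration count reached -> step A8; report the final kernel's loss.
    loss = float(np.mean((conv_same(B, W) - J) ** 2))
    return W, loss
```

With B and J both set to a 4 × 4 identity matrix, for example, a kernel holding a single 1 at its centre gives zero loss, and the sketch converges toward that kernel.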
Preferably, in one embodiment, Z is an integer greater than or equal to 2, and different trained convolution kernels differ in size and weights; for example, Z may be 3, 5 or 6;
To obtain multiple trained convolution kernels, multiple convolution kernels can be initialized separately (the initialized kernels differ in size and weights). Each kernel is then used to convolve the text similarity matrix B of the two training texts, producing multiple matrices with altered values. Each resulting matrix is compared with the target matrix to obtain the loss of the corresponding convolutional neural network. A larger loss indicates a poorer network that needs larger parameter adjustments, and a smaller loss indicates a better network that needs smaller adjustments. The different losses are therefore back-propagated to their respective convolutional neural networks, and each network adjusts its parameters layer by layer in reverse according to its own loss, i.e., adjusts the weights of its convolution kernel. The weights adjusted in each back-propagation differ from kernel to kernel, and adjustment continues until the loss meets the expected requirement.
It should be noted that the trained convolution kernels obtained in this way can be stored in memory and read directly from memory when needed.
For example, in one embodiment, of the two texts to be aligned, one is an English text (source) and the other is a Chinese text (translation). The text similarity matrix U of the two texts to be aligned is established in the same way as the text similarity matrix B of the two training texts (i.e., steps A1, A2 and A3 above), which is not repeated here;
In step S3 above, convolving the text similarity matrix U of the two texts to be aligned with the trained convolution kernels optimizes and corrects U, yielding optimized text similarity matrices;
For example, in one embodiment, step S4 above comprises:
Step S41: calculating a text matching degree matrix T from the Z optimized text similarity matrices, where the element Y_ij of T is the text matching degree between the i-th sentence of the one text to be aligned and the j-th sentence of the other text to be aligned, and the value of each element of T is the average of the elements at the same position in the Z optimized text similarity matrices;
The Z optimized text similarity matrices are added element-wise, and the sum at each position is averaged to obtain the text matching degree matrix T;
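Steps S3 and S41 reduce to convolving U with each kernel and taking an element-wise mean. The sketch below repeats an assumed zero-padded "same" convolution so that it is self-contained; kernels of different odd sizes are supported, in line with the embodiment in which the Z kernels differ in size and weights. The function names are hypothetical.

```python
import numpy as np


def conv_same(X, W):
    """Zero-padded 2-D cross-correlation; the output has the same shape as X."""
    kh, kw = W.shape
    Xp = np.pad(X, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(X.shape)
    for u in range(kh):
        for v in range(kw):
            out += W[u, v] * Xp[u:u + X.shape[0], v:v + X.shape[1]]
    return out


def matching_degree(U, kernels):
    """Steps S3 and S41: convolve U with each of the Z kernels, then average."""
    optimized = [conv_same(U, W) for W in kernels]   # Z optimized matrices
    T = np.mean(optimized, axis=0)                   # element-wise mean; Z = 1 gives that matrix
    return T, optimized
```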
It should be noted that if Z is 1, the single optimized text similarity matrix can be used directly as the text matching degree matrix;
For example, referring to Fig. 5, the text similarity matrix U of the two texts to be aligned is convolved with 3 trained convolution kernels to obtain 3 optimized text similarity matrices, from which the text matching degree matrix is then calculated;
Step S42: traversing the rows of T in turn, selecting the element with the largest value in each row, and pairing the two sentences corresponding to the selected element;
For example, for the text matching degree matrix obtained in Fig. 5, the largest element is selected from each row and the two sentences corresponding to it are paired, giving three pairings: row 1 (the 1st sentence of the one text to be aligned) is paired with column 1 (the 1st sentence of the other text to be aligned), row 2 (the 2nd sentence of the one text to be aligned) is paired with column 3 (the 3rd sentence of the other text to be aligned), and row 3 (the 3rd sentence of the one text to be aligned) is paired with column 3 (the 3rd sentence of the other text to be aligned);
In this step, if a row of T contains several elements that share the maximum value, the maximum value of that row is first determined and taken as the current lookup value. The elements at the same positions as these maximum elements are then looked up in the Z optimized text similarity matrices, the position at which the current lookup value occurs most often is determined, and the two sentences corresponding to that position are paired. For example, the first row of the text matching degree matrix in Fig. 5 is [0.7, 0.7, 0.3], so the elements at row 1, column 1 and at row 1, column 2 both have the maximum value 0.7. The elements at row 1, column 1 and row 1, column 2 of the 3 optimized text similarity matrices are therefore looked up. Since the first rows of the 3 optimized text similarity matrices are [0.7, 0.6, 0.3], [0.7, 0.6, 0.2] and [0.7, 0.9, 0.4], the value 0.7 occurs most often at row 1, column 1, so row 1 (the 1st sentence of the one text to be aligned) is paired with column 1 (the 1st sentence of the other text to be aligned). Alternatively, if a row of T contains several elements that share the maximum value, one of them can be selected at random as the maximum element;
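The row-wise selection of step S42, including the tie-breaking rule just described, might look like this in Python. Assumptions: values are compared with a small absolute tolerance because averaged floating-point values rarely tie exactly, and if the occurrence counts themselves tie, the first position is kept (the random choice mentioned above is another option). The first rows of the three example matrices follow the Fig. 5 example; rows 2 and 3 are made-up values for illustration.

```python
import numpy as np


def pair_rows(T, optimized, tol=1e-9):
    """Step S42: pair each row with the column holding its largest value.
    Ties are broken by how many optimized matrices hold the lookup value at
    each tied position. Returns 1-based (row, column) pairs."""
    pairs = []
    for i, row in enumerate(T):
        best = row.max()                                   # current lookup value
        ties = np.flatnonzero(np.isclose(row, best, rtol=0.0, atol=tol))
        if len(ties) == 1:
            j = int(ties[0])
        else:
            counts = [sum(bool(np.isclose(M[i, j], best, rtol=0.0, atol=tol))
                          for M in optimized) for j in ties]
            j = int(ties[int(np.argmax(counts))])          # first position wins a draw
        pairs.append((i + 1, j + 1))
    return pairs


# First rows as in the Fig. 5 example; rows 2 and 3 are hypothetical.
M1 = np.array([[0.7, 0.6, 0.3], [0.1, 0.2, 0.8], [0.1, 0.2, 0.9]])
M2 = np.array([[0.7, 0.6, 0.2], [0.2, 0.1, 0.7], [0.0, 0.3, 0.8]])
M3 = np.array([[0.7, 0.9, 0.4], [0.0, 0.3, 0.9], [0.2, 0.1, 0.7]])
T = np.mean([M1, M2, M3], axis=0)   # first row is [0.7, 0.7, 0.3]
print(pair_rows(T, [M1, M2, M3]))   # [(1, 1), (2, 3), (3, 3)]
```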
Step S42 above pairs every sentence of the one text to be aligned, but one or more sentences of the other text to be aligned may remain unpaired. Preferably, therefore, step S4 further comprises, after step S42:
Step S43: judging whether any of the b sentences obtained by segmenting the other text to be aligned is unpaired; if so, looking up in T the sentence with the highest text matching degree to it, and pairing the found sentence with it, thereby checking the columns of the matrix for omissions;
For example, for the text matching degree matrix obtained in Fig. 5, after pairing in step S42, column 2 is still unmatched (i.e., the 2nd sentence of the other text to be aligned is unpaired). The element with the largest value in column 2 of T is then looked up; it is located at row 1, column 2, so row 1 (the 1st sentence of the one text to be aligned) is paired with column 2 (the 2nd sentence of the other text to be aligned). Through steps S42 and S43, the pairing result obtained for the text matching degree matrix in Fig. 5 is: row 1 with column 1, row 1 with column 2, row 2 with column 3, and row 3 with column 3;
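Step S43 can be sketched as a pass over the columns of T that no pair from step S42 has used. The sketch assumes 1-based (row, column) pairs; the example T has the first row from Fig. 5 and hypothetical values in rows 2 and 3, and `fill_unpaired_columns` is a hypothetical name.

```python
import numpy as np


def fill_unpaired_columns(T, pairs):
    """Step S43: give every still-unpaired column its best-matching row.
    `pairs` holds 1-based (row, column) pairs produced by step S42."""
    paired = {j for _, j in pairs}
    result = list(pairs)
    for j in range(1, T.shape[1] + 1):
        if j not in paired:
            i = int(np.argmax(T[:, j - 1])) + 1   # row with the highest matching degree
            result.append((i, j))
    return sorted(result)


# Row 1 as in Fig. 5; rows 2 and 3 are hypothetical.
T = np.array([[0.7, 0.7, 0.3], [0.1, 0.2, 0.8], [0.1, 0.2, 0.8]])
print(fill_unpaired_columns(T, [(1, 1), (2, 3), (3, 3)]))
# [(1, 1), (1, 2), (2, 3), (3, 3)]
```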
Preferably, in one embodiment, the method further comprises, after step S4:
Step S5: checking the sentence alignment result according to the positional order, within the other text to be aligned, of the b sentences obtained by segmenting that text, and the positional order, within the one text to be aligned, of the a sentences obtained by segmenting that text;
For example, step S5 may specifically comprise:
Step S51: sorting the a sentences according to the positional order of the b sentences in the other text to be aligned and the sentence alignment result;
Step S52: if, among the a sentences, there are two sentences whose positional order after sorting is opposite to their positional order in the one text to be aligned, determining that an error exists. Here, "opposite positional order" means the following: for two sentences of the one text to be aligned, if the positional order obtained by the sorting in step S51 places one sentence before the other, but in the one text to be aligned the first sentence is located after the other, the positional order is determined to be opposite.
For example, the one text to be aligned is an English text and the other is a Chinese text. Sentence alignment of these two texts usually yields matching pairs of the form [Chinese 20, English 25]. To further improve pairing accuracy, the matching result can be checked. Specifically, the matching pairs are first sorted in ascending order of the Chinese sentence number (i.e., the position, within the Chinese text, of each sentence obtained by segmenting the Chinese text), which in turn orders all the English sentences obtained by segmenting the English text. The sequence of English sentence numbers (i.e., the positions, within the English text, of the English sentences) in the sorted result is then examined to determine whether it increases monotonically. A sequence is monotonically increasing if each number at a later position is greater than the number at an earlier position. Matching pairs that break the monotonic increase can be marked so as to alert the user to a possible error.
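The monotonicity check can be sketched as follows. The text does not say exactly which pairs to mark, so this sketch assumes one reasonable rule: after sorting by Chinese number, a pair is flagged when its English number is smaller than an English number that appeared earlier. Repeated English numbers, which can arise when step S43 pairs one sentence with several others, are not treated as reversals, consistent with the "opposite positional order" test of step S52. The function name is hypothetical.

```python
def find_order_violations(pairs):
    """Step S5 sketch. `pairs` are (chinese_number, english_number) matches.
    Sort by Chinese number and flag each pair whose English number falls
    below the largest English number seen so far (a reversal)."""
    flagged = []
    highest = 0
    for zh, en in sorted(pairs):
        if en < highest:
            flagged.append((zh, en))
        highest = max(highest, en)
    return flagged


print(find_order_violations([(1, 1), (2, 2), (3, 5), (4, 3), (5, 6)]))  # [(4, 3)]
```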
The sentence-level bilingual alignment method provided by the embodiments of the present invention takes into account that, during sentence alignment, varied text structures and differing author writing habits give rise to complex and diverse sentence-pairing situations. By convolving the text similarity matrix of the two texts to be aligned with multiple trained convolution kernels, the method optimizes and corrects the text similarity matrix so that the optimized matrix reflects the order (that is, the position order) in which sentences appear in the text. This not only avoids interference from identical sentences during sentence matching, but also avoids interference from complex and diverse sentence-pairing situations, thereby ensuring sentence-matching accuracy and substantially improving the robustness of the algorithm.
An embodiment of the present invention further provides a sentence-level bilingual alignment device, comprising:
an acquisition module, configured to obtain Z trained convolution kernels, where Z is an integer greater than or equal to 1 and each trained convolution kernel is obtained through Steps S11 to S15;
Step S11: perform sentence segmentation on each of two training texts, and build the text similarity matrix B of the two training texts:
$$B = \begin{pmatrix} K_{11} & K_{12} & \cdots & K_{1m} \\ K_{21} & K_{22} & \cdots & K_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ K_{n1} & K_{n2} & \cdots & K_{nm} \end{pmatrix}$$
where n is the number of sentences obtained by segmenting one of the two training texts, m is the number of sentences obtained by segmenting the other training text, and the element $K_{ij}$ of B is the text similarity between the i-th sentence obtained by segmenting the one training text and the j-th sentence obtained by segmenting the other training text;
Step S12: initialize a convolution kernel;
Step S13: convolve the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and calculate the loss value loss; if loss meets a preset requirement, execute Step S14; otherwise, execute Step S16;
where $L_{ij}$ is 1 if the i-th sentence obtained by segmenting the one training text matches the j-th sentence obtained by segmenting the other training text, and 0 otherwise;
Step S14: verify the current convolution kernel using a validation set and judge whether the verification result meets a preset requirement; if so, execute Step S15; if not, execute Step S16;
Step S15: take the current convolution kernel as a trained convolution kernel;
Step S16: adjust the weights of the current convolution kernel according to loss, and judge whether the current number of training iterations has reached a preset number; if so, execute Step S15; if not, repeat Step S13;
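The loop of Steps S12 to S16 can be sketched in Python with NumPy as below. The loss formula is not reproduced in this text, so the sketch assumes a mean squared error between P and the label matrix L (with $L_{ij}$ as defined above) and plain gradient descent; it also assumes a "same"-size, zero-padded convolution with an odd kernel size so that P has the same n × m shape as B. The `validate` callback stands in for the validation-set check of Step S14 and is hypothetical. This is an illustrative sketch, not the patented training procedure.

```python
import numpy as np


def conv_same(M, W):
    """Zero-padded 2-D cross-correlation whose output has the same shape as M.

    Assumes an odd-sized kernel so the kernel centre aligns with each element.
    """
    kh, kw = W.shape
    Mp = np.pad(M.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    n, m = M.shape
    out = np.zeros((n, m))
    for a in range(kh):
        for b in range(kw):
            out += W[a, b] * Mp[a:a + n, b:b + m]
    return out


def mse_loss(B, L, W):
    # Assumed loss: mean squared error between P = conv(B, W) and labels L.
    return float(np.mean((conv_same(B, W) - L) ** 2))


def train_kernel(B, L, size=3, lr=0.1, tol=1e-3, max_iters=500,
                 validate=lambda W: True, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(size, size))      # Step S12: initialize
    n, m = B.shape
    p = size // 2
    Bp = np.pad(B.astype(float), ((p, p), (p, p)))
    for _ in range(max_iters):
        R = conv_same(B, W) - L                       # Step S13: P - L
        loss = np.mean(R ** 2)
        if loss <= tol and validate(W):               # Step S14: validation
            return W                                  # Step S15
        # Step S16: gradient of the MSE with respect to each kernel weight
        grad = np.array([[2.0 * np.sum(R * Bp[a:a + n, b:b + m]) / (n * m)
                          for b in range(size)] for a in range(size)])
        W = W - lr * grad
    return W                                          # iteration cap -> Step S15
```

With `max_iters=0` the function simply returns the initial kernel, which is convenient for comparing the loss before and after training on a toy pair of matrices.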
a first processing module, configured to perform sentence segmentation on each of two texts to be aligned, and build the text similarity matrix U of the two texts to be aligned:
$$U = \begin{pmatrix} K_{11} & K_{12} & \cdots & K_{1b} \\ K_{21} & K_{22} & \cdots & K_{2b} \\ \vdots & \vdots & \ddots & \vdots \\ K_{a1} & K_{a2} & \cdots & K_{ab} \end{pmatrix}$$
where a is the number of sentences obtained by segmenting one of the two texts to be aligned, b is the number of sentences obtained by segmenting the other text to be aligned, and the element $K_{ij}$ of U is the text similarity between the i-th sentence obtained by segmenting the one text to be aligned and the j-th sentence obtained by segmenting the other text to be aligned;
a second processing module, configured to convolve the text similarity matrix U with each of the Z trained convolution kernels, obtaining Z optimized text similarity matrices;
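The second processing module can be sketched as one convolution per trained kernel. The sketch assumes zero padding and odd kernel sizes so that every optimized matrix keeps the a × b shape of U, which the later element-wise averaging requires; the text does not state the padding scheme, so this is an assumption. Kernels of different sizes, as in the Z ≥ 2 embodiment, are handled naturally.

```python
import numpy as np


def conv_same(U, W):
    # Zero-padded 2-D cross-correlation that keeps U's a x b shape
    # (assumes odd kernel height and width).
    kh, kw = W.shape
    Up = np.pad(U.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    a, b = U.shape
    out = np.zeros((a, b))
    for r in range(kh):
        for c in range(kw):
            out += W[r, c] * Up[r:r + a, c:c + b]
    return out


def optimize_similarity(U, kernels):
    # One optimized text similarity matrix per trained kernel (Z in total).
    return [conv_same(U, W) for W in kernels]
```

For example, a 1 × 1 kernel `[[1.0]]` and a 3 × 3 kernel with a single 1 at its centre both return U unchanged, while a 3 × 3 kernel of ones sums each element with its neighbours.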
a third processing module, configured to obtain the sentence alignment result of the two texts to be aligned using the Z optimized text similarity matrices.
In one embodiment, Z is an integer greater than or equal to 2, and different trained convolution kernels differ in size and weights.
In one embodiment, the third processing module comprises:
a computing unit, configured to compute a text matching degree matrix T from the Z optimized text similarity matrices, where the element $Y_{ij}$ of T is the text matching degree between the i-th sentence obtained by segmenting the one text to be aligned and the j-th sentence obtained by segmenting the other text to be aligned, and each element of T is the average of the elements at the same position in the Z optimized text similarity matrices;
a first pairing unit, configured to traverse the rows of T in turn, select the element with the largest value in each row, and pair the two sentences corresponding to the selected element.
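The computing unit and the first pairing unit can be sketched as follows, assuming the Z optimized matrices are NumPy arrays of identical a × b shape and that sentences are identified by zero-based row and column indices. How ties within a row are broken is not specified in the text; the sketch simply takes the first maximum, as `np.argmax` does.

```python
import numpy as np


def matching_degree(optimized):
    # T: element-wise mean of the Z optimized a x b matrices (Y_ij in the text).
    return np.mean(np.stack(optimized), axis=0)


def pair_by_rows(T):
    # For every row i (sentence i of the one text), pick the column j
    # (sentence j of the other text) with the largest matching degree.
    return [(i, int(np.argmax(T[i]))) for i in range(T.shape[0])]
```

Averaging over kernels of different sizes lets each kernel contribute its own view of the neighbourhood around a candidate pair before the row-wise maximum is taken.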
In one embodiment, the third processing module further comprises:
a second pairing unit, configured to judge whether any of the b sentences obtained by segmenting the other text to be aligned remains unpaired; if so, search the text matching degree matrix T for the sentence having the highest text matching degree with the unpaired sentence, and pair the found sentence with it.
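A sketch of the second pairing unit, assuming the pairs produced by the first pairing unit are `(i, j)` index tuples and that the columns of T correspond to the b sentences of the other text. Each column that does not yet appear in any pair is matched to the row with the largest value in that column, so one sentence of the one text may end up paired with several sentences of the other text.

```python
import numpy as np


def pair_unmatched_columns(T, pairs):
    # Columns already used by the row-wise pairing.
    matched = {j for _, j in pairs}
    # Pair every unused column j with the row of highest matching degree.
    extra = [(int(np.argmax(T[:, j])), j)
             for j in range(T.shape[1]) if j not in matched]
    return pairs + extra
```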
In one embodiment, the sentence-level bilingual alignment device further comprises:
a result detection module, configured to check the sentence alignment result according to the position order, within the other text to be aligned, of the b sentences obtained by segmenting the other text to be aligned, and the position order, within the one text to be aligned, of the a sentences obtained by segmenting the one text to be aligned.
In one embodiment, the result detection module comprises:
a sorting unit, configured to sort the a sentences according to the position order of the b sentences in the other text to be aligned and the sentence alignment result;
a detection unit, configured to determine that an error exists if, among the a sentences, there are two sentences whose position order obtained by the sorting is opposite to their position order in the one text to be aligned.
In one embodiment, the two training texts and the two texts to be aligned each consist of one English text and one non-English text, and the text similarity K between each sentence obtained by segmenting the English text and each sentence obtained by segmenting the non-English text is calculated as follows:
translate each sentence obtained by segmenting the non-English text into English;
for the two sentences whose text similarity is to be calculated, compare the number of words in the sentence from the English text with the number of words in the English translation of the sentence from the non-English text;
calculate K;
where E is the number of words in whichever of the two compared sentences contains more words, and $N_v$ is the value of the v-th word of that longer sentence: $N_v$ is 1 if the sentence with fewer words contains a word with the same root as the v-th word, and 0 otherwise.
An embodiment of the present invention further provides a sentence-level bilingual alignment device comprising a processor and a memory coupled to the processor, wherein the processor is configured to execute instructions in the memory to implement the above sentence-level bilingual alignment method.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above sentence-level bilingual alignment method.
Those skilled in the art will readily recognize that the above preferred embodiments can be freely combined and superimposed provided they do not conflict.
It should be understood that the above embodiments are merely exemplary and not restrictive. Various obvious or equivalent modifications or substitutions that those skilled in the art may make to the above details without departing from the basic principles of the invention all fall within the scope of the claims of the present invention.