Summary of the Invention
In view of the above problems, embodiments of the present invention are proposed to provide a processing method, a processing apparatus, and a device for processing that overcome the above problems or at least partly solve them. The embodiments of the present invention can improve the translation quality of the segmentation result corresponding to a pending text.
To solve the above problems, the invention discloses a processing method, including:
Obtaining a pending text;
Obtaining, according to split points derived from the preset punctuation marks contained in the pending text, the optimal segmentation result corresponding to the pending text; wherein the overall translation quality of the optimal segmentation result is optimal, the optimal segmentation result includes at least one sentence, and the overall translation quality is the combination of the translation qualities of all sentences included in a segmentation result;
Outputting the optimal segmentation result corresponding to the pending text.
Optionally, obtaining the optimal segmentation result corresponding to the pending text according to the split points derived from the preset punctuation marks contained in the pending text includes:
Using a dynamic programming algorithm to obtain, according to the split points derived from the preset punctuation marks contained in the pending text, the optimal segmentation result corresponding to the pending text.
Optionally, using the dynamic programming algorithm to obtain the optimal segmentation result corresponding to the pending text according to the split points derived from the preset punctuation marks contained in the pending text includes:
Determining, according to the preset punctuation marks contained in the pending text, the clause sequence set corresponding to the pending text;
Determining by recursion, in order of the subsets of the clause sequence set from smallest to largest, the backtracking split point of the optimal subset segmentation result corresponding to each subset, wherein the overall translation quality of the optimal subset segmentation result is optimal;
Obtaining the optimal segmentation result corresponding to the pending text according to the backtracking split points of the optimal subset segmentation results corresponding to the subsets of the clause sequence set.
Optionally, the subsets of the clause sequence set include the first i clauses of the pending text, and the optimal subset overall translation quality score corresponding to the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses in the pending text. Determining by recursion, in order of the subsets of the clause sequence set from smallest to largest, the backtracking split point of the optimal subset segmentation result corresponding to each subset then includes:
Segmenting the first i clauses at a split point k, to obtain the optimal subset overall translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit that correspond to the first i clauses and the split point k; wherein the first semantic unit includes the clauses, among the first i clauses, located before the split point k, the second semantic unit includes the clauses, among the first i clauses, located after the split point k, and 0 ≤ k < i;
Combining F(k) with the translation quality score of the second semantic unit, to obtain the overall translation quality score corresponding to the first i clauses and the split point k;
Obtaining, according to the overall translation quality scores corresponding to the first i clauses and each split point k, the target split point corresponding to the optimal overall translation quality score from among the at least one split point k corresponding to the first i clauses;
Taking the target split point as the backtracking split point of the optimal subset segmentation result corresponding to the first i clauses, and taking the overall translation quality score corresponding to the target split point as the optimal subset overall translation quality score F(i) corresponding to the first i clauses.
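The recursion above can be written compactly. As a sketch only, assume Q(k, i) denotes the translation quality score of the second semantic unit (clauses k+1 through i taken as one sentence), ⊕ denotes the combining operation, which the text leaves unspecified (a sum is one possible choice), and a higher score means better quality. The base value F(0) for the empty prefix and the symbol P(i) for the backtracking split point are likewise notational assumptions.

```latex
F(0) = 0, \qquad
F(i) = \max_{0 \le k < i} \bigl[\, F(k) \oplus Q(k, i) \,\bigr], \qquad
P(i) = \operatorname*{arg\,max}_{0 \le k < i} \bigl[\, F(k) \oplus Q(k, i) \,\bigr],
\qquad 1 \le i \le M
```

Under this reading, F(M) is the overall translation quality of the optimal segmentation result for the whole pending text.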
Optionally, obtaining the optimal segmentation result corresponding to the pending text according to the backtracking split points of the optimal subset segmentation results corresponding to the subsets of the clause sequence set includes:
Backtracking through the backtracking split points of the optimal subset segmentation results corresponding to the subsets of the clause sequence set, to obtain the backtracking split points of the optimal subset segmentation result corresponding to the largest subset of the clause sequence set;
Segmenting the pending text according to the backtracking split points of the optimal subset segmentation result corresponding to the largest subset of the clause sequence set, to obtain the optimal segmentation result corresponding to the pending text.
Optionally, backtracking through the backtracking split points of the optimal subset segmentation results corresponding to the subsets of the clause sequence set includes:
Obtaining the first backtracking split point P1 corresponding to the first i clauses;
Obtaining the second backtracking split point P2 corresponding to the clauses of the pending text located before the first backtracking split point P1.
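The dynamic programming and backtracking steps can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the use of a sum to combine scores, and `toy_score` (a stand-in for a real translation quality model) are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple


def best_segmentation(
    clauses: Sequence[str],
    score: Callable[[Sequence[str]], float],
) -> Tuple[List[List[str]], float]:
    """Return the best grouping of clauses into sentences and its total score.

    Assumption: scores are combined by summation and higher is better.
    """
    m = len(clauses)
    F = [0.0] * (m + 1)   # F[i]: best overall score for the first i clauses
    back = [0] * (m + 1)  # back[i]: backtracking split point for the first i clauses

    # Recursion over subsets from smallest (i = 1) to largest (i = m).
    for i in range(1, m + 1):
        best_k, best = 0, float("-inf")
        for k in range(i):  # 0 <= k < i
            # clauses[:k] is the first semantic unit (already scored as F[k]);
            # clauses[k:i] is the second semantic unit, scored as one sentence.
            candidate = F[k] + score(clauses[k:i])
            if candidate > best:
                best_k, best = k, candidate
        F[i], back[i] = best, best_k

    # Backtracking: P1 = back[m], P2 = back[P1], and so on until 0.
    sentences: List[List[str]] = []
    i = m
    while i > 0:
        k = back[i]
        sentences.append(list(clauses[k:i]))
        i = k
    sentences.reverse()
    return sentences, F[m]


def toy_score(sentence: Sequence[str]) -> float:
    """Hypothetical quality score: prefers sentences made of two clauses."""
    return -abs(len(sentence) - 2)


if __name__ == "__main__":
    result, total = best_segmentation(["A", "B", "C", "D"], toy_score)
    print(result, total)
```

The double loop evaluates on the order of M² candidate sentences, whereas enumerating every combination of split points grows exponentially with the number of preset punctuation marks.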
Optionally, obtaining the optimal segmentation result corresponding to the pending text according to the split points derived from the preset punctuation marks contained in the pending text includes:
Performing segmentation on the pending text according to the split points derived from the preset punctuation marks contained in the pending text, to obtain multiple segmentation results corresponding to the pending text;
Determining the overall translation quality corresponding to each segmentation result;
Selecting, from the multiple segmentation results corresponding to the pending text, the segmentation result with the optimal overall translation quality as the optimal segmentation result corresponding to the pending text.
Optionally, the preset punctuation marks include: a comma and/or a semicolon.
In another aspect, the invention discloses a processing apparatus, including:
A pending text acquisition module, configured to obtain a pending text;
An optimal segmentation result acquisition module, configured to obtain, according to split points derived from the preset punctuation marks contained in the pending text, the optimal segmentation result corresponding to the pending text; wherein the overall translation quality of the optimal segmentation result is optimal, the optimal segmentation result includes at least one sentence, and the overall translation quality is the combination of the translation qualities of all sentences included in a segmentation result; and
An optimal segmentation result output module, configured to output the optimal segmentation result corresponding to the pending text.
Optionally, the optimal segmentation result acquisition module includes:
A dynamic programming acquisition submodule, configured to use a dynamic programming algorithm to obtain, according to the split points derived from the preset punctuation marks contained in the pending text, the optimal segmentation result corresponding to the pending text.
Optionally, the dynamic programming acquisition submodule includes:
A clause sequence set determination unit, configured to determine, according to the preset punctuation marks contained in the pending text, the clause sequence set corresponding to the pending text;
A recursion unit, configured to determine by recursion, in order of the subsets of the clause sequence set from smallest to largest, the backtracking split point of the optimal subset segmentation result corresponding to each subset; and
An optimal segmentation result acquisition unit, configured to obtain the optimal segmentation result corresponding to the pending text according to the backtracking split points of the optimal subset segmentation results corresponding to the subsets of the clause sequence set.
Optionally, the subsets of the clause sequence set include the first i clauses of the pending text, and the optimal subset overall translation quality score corresponding to the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses in the pending text. The recursion unit then includes:
A subset segmentation subunit, configured to segment the first i clauses at a split point k, to obtain the optimal subset overall translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit that correspond to the first i clauses and the split point k; wherein the first semantic unit includes the clauses, among the first i clauses, located before the split point k, the second semantic unit includes the clauses, among the first i clauses, located after the split point k, and 0 ≤ k < i;
A quality combination subunit, configured to combine F(k) with the translation quality score of the second semantic unit, to obtain the overall translation quality score corresponding to the first i clauses and the split point k;
A target split point acquisition subunit, configured to obtain, according to the overall translation quality scores corresponding to the first i clauses and each split point k, the target split point corresponding to the optimal overall translation quality score from among the at least one split point k corresponding to the first i clauses;
A backtracking split point acquisition subunit, configured to take the target split point as the backtracking split point of the optimal subset segmentation result corresponding to the first i clauses, and to take the overall translation quality score corresponding to the target split point as the optimal subset overall translation quality score F(i) corresponding to the first i clauses.
Optionally, the optimal segmentation result acquisition unit includes:
A backtracking subunit, configured to backtrack through the backtracking split points of the optimal subset segmentation results corresponding to the subsets of the clause sequence set, to obtain the backtracking split points of the optimal subset segmentation result corresponding to the largest subset of the clause sequence set;
A backtracking segmentation subunit, configured to segment the pending text according to the backtracking split points of the optimal subset segmentation result corresponding to the largest subset of the clause sequence set, to obtain the optimal segmentation result corresponding to the pending text.
Optionally, the backtracking subunit includes:
A first backtracking unit, configured to obtain the first backtracking split point P1 corresponding to the first i clauses;
A second backtracking unit, configured to obtain the second backtracking split point P2 corresponding to the clauses of the pending text located before the first backtracking split point P1.
Optionally, the optimal segmentation result acquisition module includes:
An exhaustive enumeration submodule, configured to perform segmentation on the pending text according to the split points derived from the preset punctuation marks contained in the pending text, to obtain multiple segmentation results corresponding to the pending text;
An overall quality determination submodule, configured to determine the overall translation quality corresponding to each segmentation result;
A result selection submodule, configured to select, from the multiple segmentation results corresponding to the pending text, the segmentation result with the optimal overall translation quality as the optimal segmentation result corresponding to the pending text.
Optionally, the preset punctuation marks include: a comma and/or a semicolon.
In yet another aspect, the invention discloses a device for processing, including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
Obtaining a pending text;
Obtaining, according to split points derived from the preset punctuation marks contained in the pending text, the optimal segmentation result corresponding to the pending text; wherein the overall translation quality of the optimal segmentation result is optimal, the optimal segmentation result includes at least one sentence, and the overall translation quality is the combination of the translation qualities of all sentences included in a segmentation result;
Outputting the optimal segmentation result corresponding to the pending text.
The embodiments of the present invention include the following advantages:
The embodiments of the present invention obtain the optimal segmentation result corresponding to the pending text according to split points derived from the preset punctuation marks contained in the pending text. The overall translation quality of this optimal segmentation result is optimal; the optimal segmentation result may include at least one sentence, and the overall translation quality may be the combination of the translation qualities of all sentences included in a segmentation result. The optimal segmentation result of the embodiments of the present invention can therefore achieve a global optimum of overall translation quality, and can thus improve the translation quality of the segmentation result corresponding to the pending text.
Detailed Description
To make the foregoing objectives, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
An embodiment of the present invention provides a processing scheme that can obtain the optimal segmentation result corresponding to the pending text according to split points derived from the preset punctuation marks contained in the pending text. The overall translation quality of this optimal segmentation result is optimal; the optimal segmentation result may include at least one sentence, and the overall translation quality may be the combination of the translation qualities of all sentences included in a segmentation result. The optimal segmentation result of the embodiments of the present invention can therefore achieve a global optimum of overall translation quality, where "global" refers to the optimal segmentation result of the pending text taken as a whole. The optimal segmentation result can thus improve the translation quality of the segmentation result corresponding to the pending text.
The embodiments of the present invention can be applied to any scenario that requires sentence segmentation and machine translation, such as machine translation, speech recognition, and information services. It can be understood that the embodiments of the present invention do not limit the specific application scenario.
For example, referring to Fig. 1, a schematic diagram of an exemplary structure of a processing system according to an embodiment of the present invention is shown, which may specifically include: a processing apparatus 101, a machine translation apparatus 102, and a translation result output apparatus 103. The processing apparatus 101, the machine translation apparatus 102, and the translation result output apparatus 103 may each be deployed as a separate server, or may be deployed together in the same server; that is, the embodiments of the present invention do not limit the specific locations of the processing apparatus 101, the machine translation apparatus 102, and the translation result output apparatus 103.
The processing apparatus 101 may obtain a pending text; perform segmentation on the pending text according to split points derived from the preset punctuation marks contained in the pending text, to obtain the optimal segmentation result corresponding to the pending text; and output the optimal segmentation result corresponding to the pending text to the machine translation apparatus 102.
Optionally, the processing apparatus 101 may obtain the pending text from the voice signal of a speaker. In this case, the processing apparatus 101 may convert the speaker's voice signal into text information and obtain the pending text from that text information. In practical applications, speakers may include a user who speaks and produces a voice signal in a simultaneous interpretation scenario, and/or a user who produces a voice signal through a terminal, and so on; the speaker's voice signal may be received through a microphone or another voice collection device.
Optionally, the processing apparatus 101 may use speech recognition technology to convert the speaker's voice signal into text information. If the speaker's voice signal is denoted S, a series of processing steps applied to S yields the corresponding speech feature sequence O, denoted O = {O_1, O_2, ..., O_i, ..., O_T}, where O_i is the i-th speech feature and T is the total number of speech features. The sentence corresponding to the voice signal S can be regarded as a word string composed of many words, denoted W = {w_1, w_2, ..., w_n}. The speech recognition process is to find the most probable word string W given the known speech feature sequence O.
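Finding the most probable word string given the feature sequence is commonly written as follows. The Bayes decomposition into an acoustic term P(O | W) and a language-model term P(W) is the standard textbook formulation and is added here as an illustration; the text itself does not state it.

```latex
W^{*} = \operatorname*{arg\,max}_{W} P(W \mid O)
      = \operatorname*{arg\,max}_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
      = \operatorname*{arg\,max}_{W} P(O \mid W)\, P(W)
```

The denominator P(O) can be dropped because it does not depend on W.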
Specifically, speech recognition is a model-matching process. In this process, a speech model may first be built according to the characteristics of human speech; the input voice signal is analyzed and the required features are extracted to build the templates needed for speech recognition. Recognizing the speech input by a user is the process of comparing the features of that speech with the templates and finally determining the template that best matches the speech, thereby obtaining the speech recognition result. The specific speech recognition algorithm may be a training and recognition algorithm based on statistical hidden Markov models, a training and recognition algorithm based on neural networks, a recognition algorithm based on dynamic time warping matching, or another algorithm; the embodiments of the present invention do not limit the specific speech recognition process.
Alternatively, the processing apparatus 101 may obtain the pending text from text input by a user. For example, text that a user inputs in scenarios such as instant messaging or office documents can serve as a source of the pending text.
In practical applications, the processing apparatus 101 may obtain the pending text from the text corresponding to a voice signal or from text input by a user, according to actual application requirements. Optionally, the pending text may be obtained from the text corresponding to the voice signal S according to the pause intervals in the voice signal S. For example, when a pause interval in the voice signal S exceeds a time threshold, a corresponding first boundary point may be determined from that time point; the text corresponding to the voice signal S before the first boundary point is taken as the pending text, and the text corresponding to the voice signal S after the first boundary point is processed further to continue obtaining pending texts from it. Optionally, the pending text may be obtained from the text corresponding to the voice signal, or from the text input by the user, according to the number of words that text contains. For example, when the number of words in the text corresponding to the voice signal or in the text input by the user exceeds a word-count threshold, a corresponding second boundary point may be determined from the word-count threshold; the text before the second boundary point is taken as the pending text, and the text after the second boundary point is processed further to continue obtaining pending texts from it.
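The two boundary rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the recognizer is assumed to supply each word with start and end times in seconds, and the function names and threshold parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical recognizer output: (word, start_time_s, end_time_s).
TimedWord = Tuple[str, float, float]


def split_by_pause(words: Sequence[TimedWord], time_threshold: float) -> List[List[str]]:
    """Start a new pending text wherever the silence between two words
    exceeds time_threshold (the first kind of boundary point)."""
    chunks: List[List[str]] = []
    current: List[str] = []
    prev_end = None
    for text, start, end in words:
        if prev_end is not None and start - prev_end > time_threshold and current:
            chunks.append(current)
            current = []
        current.append(text)
        prev_end = end
    if current:
        chunks.append(current)
    return chunks


def split_by_length(words: Sequence[str], count_threshold: int) -> List[List[str]]:
    """Cut the text every count_threshold words (the second kind of boundary point)."""
    if count_threshold <= 0:
        raise ValueError("count_threshold must be positive")
    return [list(words[i:i + count_threshold])
            for i in range(0, len(words), count_threshold)]


if __name__ == "__main__":
    timed = [("a", 0.0, 0.5), ("b", 0.6, 1.0), ("c", 2.5, 3.0)]
    print(split_by_pause(timed, time_threshold=1.0))
    print(split_by_length(list("abcde"), count_threshold=2))
```

Each returned chunk plays the role of one pending text; the remainder after a boundary is simply carried into the next chunk.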
In the embodiments of the present invention, a sentence is a grammatical unit that is composed of words or phrases according to certain grammatical rules, expresses a relatively complete meaning, and has a definite tone and intonation. Optionally, sentences may include simple sentences and/or complex sentences. A simple sentence is a sentence composed of a phrase or a single word that independently expresses a relatively complete meaning and has a definite tone and intonation, such as "The students have all returned to school" or "He is in good health". Each relatively independent unit within a complex sentence that has the form of a simple sentence is called a clause. There is generally a pause between clauses, indicated in writing by a comma or a semicolon. Clauses are related to one another in meaning and are often joined by connectives (conjunctions, or adverbs or phrases that serve a connecting function), as in "For China to become prosperous and strong, this is the hope of more than a billion Chinese people".
Optionally, the processing apparatus 101 may insert corresponding preset punctuation marks into the text information corresponding to the speaker's voice signal according to the pause intervals of the voice signal S and a language model. Optionally, the inserted preset punctuation marks may be used to mark the pauses between clauses within a sentence, and may include but are not limited to: commas, enumeration commas (、), semicolons, and the like.
The processing apparatus 101 obtains the optimal segmentation result corresponding to the pending text according to split points derived from the preset punctuation marks contained in the pending text. Specifically, in the embodiments of the present invention, each preset punctuation mark contained in the pending text may or may not serve as a split point for segmentation; that is, the pending text can be segmented according to whether each preset punctuation mark it contains is or is not used as a split point. A pending text therefore corresponds to multiple segmentation schemes and their segmentation results, and what the embodiments of the present invention finally obtain is the segmentation result with the optimal overall translation quality.
In an application example of the present invention, assume that each of the two commas contained in the pending text [A, B, C] may or may not serve as a split point for segmentation, so that the corresponding segmentation results may include: {(A, B, C)}, {(A), (B, C)}, {(A), (B), (C)}, and {(A, B), (C)}. The embodiments of the present invention can then obtain the segmentation result with the optimal overall translation quality. Here, [ ] denotes a pending text, ( ) denotes a sentence obtained by segmentation, and { } denotes a segmentation result.
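The four segmentation results in this example follow from letting each comma independently be a split point or not. A small sketch of that enumeration is shown below; the function name and the tuple representation of sentences are assumptions made for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def enumerate_segmentations(clauses: Sequence[str]) -> List[List[Tuple[str, ...]]]:
    """List every segmentation result obtained by using or not using each
    of the len(clauses) - 1 preset punctuation marks as a split point."""
    if not clauses:
        return []
    results = []
    for flags in product([False, True], repeat=len(clauses) - 1):
        sentences: List[Tuple[str, ...]] = []
        current = [clauses[0]]
        for clause, split_here in zip(clauses[1:], flags):
            if split_here:
                # The punctuation mark before this clause is a split point.
                sentences.append(tuple(current))
                current = [clause]
            else:
                current.append(clause)
        sentences.append(tuple(current))
        results.append(sentences)
    return results


if __name__ == "__main__":
    for result in enumerate_segmentations(["A", "B", "C"]):
        print(result)
```

With n clauses there are 2^(n-1) segmentation results, which gives the four results listed for [A, B, C].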
The machine translation apparatus 102 may receive the optimal segmentation result corresponding to the pending text from the processing apparatus 101 and translate it into text in the target language. The machine translation apparatus 102 may use machine translation technology to translate the optimal segmentation result; machine translation is the process of using a computer to convert a target clause in one natural language (the source language) into text in another natural language (the target language). For example, the source language and the target language may be Chinese and English respectively, or English and Chinese respectively; the embodiments of the present invention do not limit the specific source language, target language, or machine translation technology. Optionally, the type of the machine translation apparatus 102 may include a statistical type and/or a neural network type, and so on; it can be understood that the embodiments of the present invention do not limit the specific type of the machine translation apparatus 102.
The translation result output apparatus 103 may receive the text in the target language from the machine translation apparatus 102 and output it; the output mode may include a voice mode and/or an interface mode, and so on. For example, in a simultaneous interpretation scenario, the text in the target language may be converted into speech in the target language and output. Optionally, a text-to-speech conversion technology (such as speech synthesis) may be used to convert the text in the target language into speech in the target language, which is then output through a playback device such as earphones or a loudspeaker. It can be understood that the embodiments of the present invention do not limit the specific process of converting the text in the target language into speech in the target language and outputting it. As another example, in an information service scenario (such as a translation website or a translation APP), the text in the target language obtained by the machine translation apparatus 102 may be output directly, for example by displaying it on a display device such as a screen for the user to view.
It can be understood that the processing system shown in Fig. 1 is merely an optional example. In fact, the processing apparatus 101 may output the optimal segmentation result corresponding to the pending text to apparatuses other than the machine translation apparatus 102; the embodiments of the present invention do not limit the specific processing system.
Method Embodiment
Referring to Fig. 2, a flowchart of an embodiment of a processing method of the present invention is shown, which may specifically include the following steps:
Step 201: obtaining a pending text;
Step 202: obtaining, according to split points derived from the preset punctuation marks contained in the pending text, the optimal segmentation result corresponding to the pending text; wherein the overall translation quality of the optimal segmentation result is optimal, the optimal segmentation result may include at least one sentence, and the overall translation quality may be the combination of the translation qualities of all sentences included in a segmentation result;
Step 203: outputting the optimal segmentation result corresponding to the pending text.
The processing method provided by the embodiments of the present invention can be applied in the application environment of a computing device such as a terminal or a server. Optionally, the terminal may include but is not limited to: a smartphone, a tablet computer, a laptop computer, an in-vehicle computer, a desktop computer, a smart television, a wearable device, and so on. The server may be a cloud server or an ordinary server, and is used to provide a pending text processing service to clients.
The processing method provided by the embodiments of the present invention is applicable to the processing of languages such as Chinese, Japanese, and Korean, and is used to improve the translation quality of the segmentation result corresponding to the pending text. It can be understood that any language that requires sentence segmentation falls within the scope of application of the processing method of the embodiments of the present invention.
In the embodiments of the present invention, the pending text denotes the text that needs to be processed. The pending text may originate from text or speech that a user inputs through the computing device, or may come from other computing devices. It should be noted that the pending text may contain one language or more than one language; for example, the pending text may contain Chinese only, or may contain Chinese mixed with another language such as English. The embodiments of the present invention do not limit the specific pending text.
In practical applications, the computing device of the embodiments of the present invention may execute the processing flow of the embodiments of the present invention through a client application (APP). The client application may run on the computing device; for example, the client application may be any APP running on a terminal, in which case the client application may obtain the pending text from other applications on the computing device. Alternatively, the computing device of the embodiments of the present invention may execute the processing flow through a functional module of a client application, in which case the functional module may obtain the pending text from other functional modules. Alternatively, the computing device of the embodiments of the present invention may execute the processing method of the embodiments of the present invention as a server.
In an optional embodiment of the present invention, the method may further include: writing at least one pending text obtained in step 201 into a buffer. Step 202 may then first read a pending text from the buffer and obtain, according to split points derived from the preset punctuation marks contained in the pending text that was read, the optimal segmentation result corresponding to that pending text. Optionally, a data structure such as a queue, an array, or a linked list may be established in the memory area of the computing device to serve as the buffer; the embodiments of the present invention do not limit the specific buffer. Storing pending texts in a buffer in this way can improve the efficiency of processing them. It can be understood that storing pending texts on disk is also feasible, and the embodiments of the present invention do not limit the specific way in which pending texts are stored.
In the embodiments of the present invention, each preset punctuation mark contained in the pending text may or may not serve as a split point for segmentation; that is, the pending text can be segmented according to whether each preset punctuation mark it contains is or is not used as a split point. A pending text therefore corresponds to multiple segmentation schemes and their segmentation results, and what the embodiments of the present invention finally obtain is the segmentation result with the optimal overall translation quality.
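As one possible reading of the buffer described above, the sketch below uses an in-memory first-in, first-out queue. The class name and its methods are hypothetical; an array or a linked list would serve equally well.

```python
from collections import deque
from typing import Deque, Optional


class PendingTextBuffer:
    """In-memory FIFO buffer between step 201 (write) and step 202 (read)."""

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()

    def write(self, pending_text: str) -> None:
        """Called after step 201 obtains a pending text."""
        self._queue.append(pending_text)

    def read(self) -> Optional[str]:
        """Called by step 202; returns None when the buffer is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


if __name__ == "__main__":
    buffer = PendingTextBuffer()
    buffer.write("A, B, C")
    buffer.write("D, E")
    print(buffer.read(), "|", buffer.read(), "|", buffer.read())
```

A queue keeps the pending texts in arrival order, which matters when they come from a continuous voice signal.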
The embodiment of the present invention can provide point obtained according to the preset punctuation mark for including based on the pending textCutpoint, the following optimal result for obtaining the corresponding optimal punctuate result of the pending text obtain scheme:
Optimal result acquisition scheme 1
According to the cut-points obtained based on the preset punctuation marks included in the pending text, punctuate processing is performed on the pending text to obtain the multiple punctuate results corresponding to the pending text; the synthesized translation quality corresponding to each punctuate result is determined; and the punctuate result with the optimal synthesized translation quality is selected from the multiple punctuate results as the optimal punctuate result corresponding to the pending text.
In practical applications, a path planning algorithm may be used to perform punctuate processing on the pending text according to the cut-points obtained based on the preset punctuation marks included in the pending text, so as to obtain the multiple paths corresponding to the pending text and the punctuate result corresponding to each path. The principle of a path planning algorithm is to find, in an environment with obstacles and according to a certain evaluation criterion, a collision-free path from an initial state to a target state. In the embodiment of the present invention, the obstacles may represent the cut-points of the pending text, and the initial state and the target state may represent the first clause and the last clause of the pending text, respectively.
Referring to Fig. 3, a schematic diagram of path planning for a pending text according to the embodiment of the present invention is shown, where the pending text is [A, B, C]. It is assumed that each of the 2 commas included in the pending text [A, B, C] may or may not serve as a cut-point for punctuate processing. In Fig. 3, clauses A, B, and C are each represented by a rectangle and each comma is represented by a circle; when a comma serves as a cut-point, a hexagon is drawn around the corresponding circle. The punctuate results of [A, B, C] may then include: {(A, B, C)}, corresponding to 0 cut-points; {(A), (B, C)}, corresponding to the 1st comma serving as a cut-point; {(A, B), (C)}, corresponding to the 2nd comma serving as a cut-point; and {(A), (B), (C)}, corresponding to both the 1st and the 2nd commas serving as cut-points.
It will be understood that the path planning algorithm is only an optional embodiment of the present invention. Those skilled in the art may, according to practical application requirements, use other algorithms to obtain the multiple punctuate results corresponding to the pending text; the embodiment of the present invention does not limit the specific algorithm used to obtain them.
In an optional embodiment of the present invention, determining the synthesized translation quality corresponding to the punctuate results may include: for each sentence included in each punctuate result, determining a corresponding translation quality score; and merging the translation quality scores of all sentences included in each punctuate result to obtain a corresponding synthesized translation quality score. The punctuate result with the highest synthesized translation quality score may then be obtained from all punctuate results as the optimal punctuate result corresponding to the pending text.
Optionally, determining the translation quality score of the sentences included in each punctuate result may include: determining the translation quality score of a sentence using a machine translation evaluation method. The machine translation evaluation method may include an automatic evaluation method and/or a manual evaluation method. An automatic evaluation method may obtain an evaluation set in advance (including source-language input sentences and reference translations), and may then calculate the translation quality score of a sentence according to the n-grams that the machine translation result of the sentence shares with the reference translation (for example, the two-word sequence "love / hometown" is a bigram, and the three-word sequence "like / eat / apples" is a trigram). It will be understood that any machine translation evaluation method is feasible, and the embodiment of the present invention does not limit the specific process of determining the translation quality score of the sentences included in each punctuate result.
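The n-gram overlap idea can be illustrated with a deliberately simplified score. The sketch below computes a clipped n-gram precision against a single reference translation; it is not the evaluation method of the embodiment (which leaves the method open), and the function names `ngrams` and `ngram_overlap` as well as the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_overlap(candidate, reference, n=2):
    """Fraction of candidate n-grams that also occur in the reference (clipped counts)."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:  # candidate shorter than n tokens
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

# One of the four candidate bigrams ("he likes") appears in the reference.
score = ngram_overlap("he likes to eat apples", "he likes eating apples", n=2)
print(score)  # 0.25
```

Practical automatic metrics typically combine several n-gram orders and add a length penalty; this sketch keeps only the overlap counting.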
Optionally, merging the translation quality scores of all sentences included in each punctuate result may include: summing, multiplying, or taking a weighted average of the translation quality scores of all sentences included in each punctuate result, and so on. It will be understood that the embodiment of the present invention does not limit the specific process of merging the translation quality scores of all sentences included in each punctuate result.
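Scheme 1 can be sketched as an exhaustive search, under the assumption that scores are merged by summation. The function names and the score table below are hypothetical; the score values are illustrative and any per-sentence scoring function could be substituted.

```python
from itertools import product

def enumerate_punctuate_results(clauses):
    """Yield every way of grouping consecutive clauses into sentences."""
    m = len(clauses)
    # Each of the m-1 preset punctuation marks either is a cut-point (True) or is not.
    for cuts in product([False, True], repeat=max(m - 1, 0)):
        sentences, start = [], 0
        for pos, is_cut in enumerate(cuts, start=1):
            if is_cut:
                sentences.append(tuple(clauses[start:pos]))
                start = pos
        sentences.append(tuple(clauses[start:]))
        yield sentences

def best_by_exhaustion(clauses, sentence_score):
    """Score every punctuate result by summation and keep the highest-scoring one."""
    return max(
        enumerate_punctuate_results(clauses),
        key=lambda result: sum(sentence_score(s) for s in result),
    )

# Illustrative per-sentence scores (higher is better).
scores = {("A",): -10, ("B",): -15, ("C",): -20,
          ("A", "B"): -2, ("B", "C"): -5, ("A", "B", "C"): -30}
best = best_by_exhaustion(["A", "B", "C"], scores.__getitem__)
print(best)  # [('A',), ('B', 'C')]
```

With M clauses there are 2^(M-1) punctuate results to enumerate, which is why this approach becomes expensive as the number of preset punctuation marks grows.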
Optimal result acquisition scheme 2
Using a dynamic programming algorithm, the optimal punctuate result corresponding to the pending text is obtained according to the cut-points obtained based on the preset punctuation marks included in the pending text.
The principle of the dynamic programming algorithm is to split the problem and define problem states and the relations between states, so that the problem can be solved recursively (in other words, by divide and conquer). In the embodiment of the present invention, the problem may be the optimal synthesized translation quality of the punctuate result corresponding to the pending text, and a state may be the optimal synthesized translation quality of the punctuate result corresponding to each subset of the clause sequence set of the pending text. Compared with optimal result acquisition scheme 1, which enumerates the multiple punctuate results corresponding to the pending text and determines the synthesized translation quality of each, the dynamic programming algorithm used in optimal result acquisition scheme 2 can reduce the amount of computation, and the reduction grows as the number of preset punctuation marks included in the pending text increases.
Optionally, using the dynamic programming algorithm to obtain the optimal punctuate result corresponding to the pending text according to the cut-points obtained based on the preset punctuation marks included in the pending text may specifically include: determining the clause sequence set corresponding to the pending text according to the preset punctuation marks included in the pending text; determining recursively, in order of the subsets of the clause sequence set from small to large, the backtracking cut-point of the optimal subset punctuate result corresponding to each subset; and obtaining the optimal punctuate result corresponding to the pending text according to the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the clause sequence set.
The clause sequence set may denote the set of sequences formed by consecutive clauses included in the pending text. Optionally, each clause sequence included in the clause sequence set may consist of the first i consecutive clauses of the pending text. For example, the clause sequence set corresponding to the pending text [C1C2…CM] may include {C1, C1C2, C1C2C3, …, C1C2…CM}, and the subsets it contains, ordered from small to large by sequence length (that is, by the number of clauses in the sequence), may be expressed as {C1}, {C1C2}, {C1C2C3}, …, {C1C2…CM}, where adjacent clauses in the clause sequence of a subset may be connected by a preset punctuation mark. Optionally, a subset in the embodiment of the present invention may include one clause sequence. Here Ci denotes the i-th clause included in the pending text, i is a positive integer with 1 ≤ i ≤ M, M denotes the number of clauses of the pending text, and M is a positive integer.
For each subset of the clause sequence set, the corresponding subset punctuate results also have a synthesized translation quality, so the embodiment of the present invention can determine the backtracking cut-point of the optimal subset punctuate result corresponding to each subset. The backtracking cut-point of the optimal subset punctuate result indicates at which preset punctuation mark the subset is split when its subset punctuate result is optimal. Suppose the optimal subset punctuate result corresponding to the subset {C1C2C3} is {(C1), (C2C3)}; this indicates that the subset {C1C2C3} is split after "C1", and the corresponding backtracking cut-point may be expressed as the number 1 of "C1". It will be understood that the embodiment of the present invention does not limit the specific representation of the backtracking cut-point.
The embodiment of the present invention may determine recursively, in order of the subsets of the clause sequence set from small to large, the backtracking cut-point of the optimal subset punctuate result corresponding to each subset. Suppose the subsets of the clause sequence set, in order from small to large, are denoted G1, G2, G3, …, Gu, where u is a positive integer; the backtracking cut-points of the optimal subset punctuate results corresponding to G1, G2, G3, …, Gu may then be obtained in turn. For Go (1 ≤ o ≤ u), the optimal subset punctuate results of the subsets preceding Go (such as Go-1, Go-2, and so on) are needed to determine the backtracking cut-point of the optimal subset punctuate result corresponding to Go.
In an optional embodiment of the present invention, a subset of the clause sequence set may include the first i clauses of the pending text, and the optimal subset synthesized translation quality score corresponding to the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses of the pending text. Determining recursively, in order of the subsets of the clause sequence set from small to large, the backtracking cut-point of the optimal subset punctuate result corresponding to each subset may then specifically include:
Splitting the first i clauses at cut-point k, to obtain the optimal subset synthesized translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit corresponding to the first i clauses and the cut-point k; where the first semantic unit may include the clauses of the first i clauses that are located before cut-point k, the second semantic unit may include the clauses of the first i clauses that are located after cut-point k, and 0 ≤ k < i;
Combining F(k) and the translation quality score of the second semantic unit, to obtain the synthesized translation quality score corresponding to the first i clauses and the cut-point k;
According to the synthesized translation quality scores corresponding to the first i clauses and the cut-points k, obtaining, from the at least one cut-point k corresponding to the first i clauses, the target cut-point k' corresponding to the optimal synthesized translation quality score. In practical applications, there may be one or more cut-points k and one or more target cut-points k', but the set of target cut-points k' is a subset of the set of cut-points k. For example, if the set of cut-points k is {0, 1, 2, …, i-1}, the set of target cut-points k' may be {0, 1}, and so on;
Taking the target cut-point k' as the backtracking cut-point of the optimal subset punctuate result corresponding to the first i clauses, and taking the synthesized translation quality score corresponding to the target cut-point k' as the optimal subset synthesized translation quality score F(i) corresponding to the first i clauses.
In the embodiment of the present invention, a semantic unit denotes a unit that expresses one meaning. The first semantic unit and the second semantic unit denote the two semantic units obtained by splitting the first i clauses at cut-point k: the first semantic unit consists of the clauses of the first i clauses located before cut-point k, and the second semantic unit consists of the clauses of the first i clauses located after cut-point k. It will be understood that the embodiment of the present invention does not limit the number of clauses included in the first semantic unit or the second semantic unit; for example, each of them may include one or more clauses.
F(k) denotes the optimal synthesized translation quality score corresponding to the first k clauses. In practical applications, an initial value may be preset for F(k); for example, the initial value of F[0], corresponding to k=0, is 0, and the initial value of F[k] for k greater than 0 is -INF (negative infinity), and so on. It will be understood that the embodiment of the present invention does not limit the initial value of F(k). As can be seen, the value of F(0) is obtained by presetting; when k is greater than 0, the initial value of F(k) is obtained by presetting and the final value of F(k) is obtained by iteration, for example by the following formula (1).
Suppose the optimal subset synthesized translation quality score corresponding to the first semantic unit is F(k) and the translation quality score of the second semantic unit is NMT_score(k, i). Combining F(k) and the translation quality score of the second semantic unit may then include: summing, multiplying, or taking a weighted average of F(k) and NMT_score(k, i), and so on. It will be understood that the embodiment of the present invention does not limit the specific process of combining F(k) and the translation quality score of the second semantic unit.
In practical applications, for the first i clauses, the corresponding cut-point k may be located at any position among the first i clauses that satisfies 0 ≤ k < i; for example, for the subset {C1C2C3}, the numbers of the corresponding cut-points k may be 0, 1, and 2. Accordingly, the target cut-point corresponding to the optimal synthesized translation quality score may be obtained from the at least one cut-point k corresponding to the first i clauses according to the synthesized translation quality score F(i, k) corresponding to the first i clauses and the cut-point k.
In the embodiment of the present invention, the optimal synthesized translation quality score may be judged by the magnitude of the synthesized translation quality score. Assuming F(i, k) = F[k] + NMT_score(k, i), the optimal synthesized translation quality score corresponding to the first i clauses and the target cut-point corresponding to that score may be expressed as follows, where the maximum is taken over all cut-points k with 0 ≤ k < i:
F[i] = max (F[k] + NMT_score(k, i))    (1)
Index[i] = argmax (F[k] + NMT_score(k, i))    (2)
Index[i] denotes the value of k that maximizes (F[k] + NMT_score(k, i)). In practical applications, the optimal subset synthesized translation quality score F(i) corresponding to the first i clauses and the corresponding backtracking cut-point may be solved recursively, in order of i from small to large.
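The recursion of formulas (1) and (2) might be sketched as below, assuming that scores are combined by summation and that F[0] = 0 and F[i] = -INF for i > 0 as described above. The name `forward_pass` and the callable parameter `nmt_score(k, i)` are hypothetical.

```python
import math

def forward_pass(m, nmt_score):
    """Compute F[i] and Index[i] for i = 0..m following formulas (1) and (2)."""
    F = [0.0] + [-math.inf] * m   # F[0] = 0; F[i] = -INF for i > 0 initially
    index = [None] * (m + 1)      # index[0] is unused
    for i in range(1, m + 1):     # i from small to large
        for k in range(i):        # cut-point k with 0 <= k < i
            candidate = F[k] + nmt_score(k, i)
            if candidate > F[i]:
                F[i], index[i] = candidate, k
    return F, index
```

For M clauses this evaluates `nmt_score` M(M+1)/2 times, compared with the 2^(M-1) punctuate results that an exhaustive search would have to score.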
Optionally, the method for the embodiment of the present invention can also include:Each subset of the subordinate sentence arrangement set is corresponded to mostThe backtracking cut-point of excellent subset punctuate result is recorded;Alternatively, information to each subset of the subordinate sentence arrangement set and itsMapping relations between the backtracking cut-point of corresponding optimal subset punctuate result are recorded, to obtain corresponding record content.Wherein, the information of the subset of above-mentioned subordinate sentence arrangement set may include:The number information of the corresponding end subordinate sentence of subset, and/or,Corresponding number information of subset etc..For example, for preceding i subordinate sentence, corresponding number information can be i, correspond to end pointThe information etc. of sentence namely i-th of subordinate sentence.It is appreciated that the embodiment of the present invention does not limit the specifying information of subset.
In an optional embodiment of the present invention, obtaining the optimal punctuate result corresponding to the pending text according to the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the clause sequence set may specifically include:
Backtracking through the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the clause sequence set, to obtain the backtracking cut-points of the optimal subset punctuate result corresponding to the largest subset of the clause sequence set;
Splitting the pending text according to the backtracking cut-points of the optimal subset punctuate result corresponding to the largest subset of the clause sequence set, to obtain the optimal punctuate result corresponding to the pending text.
Optionally, backtracking through the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the clause sequence set may specifically include:
Obtaining the first backtracking cut-point P1 corresponding to the first i clauses;
Obtaining the second backtracking cut-point P2 corresponding to the clauses of the pending text located before the first backtracking cut-point P1.
In practical applications, the backtracking cut-points may be traced back in order of i from large to small. Taking the acquisition of the backtracking cut-points corresponding to the first M clauses as an example, the first backtracking cut-point P1 corresponding to the first M clauses may be determined first, for example by looking it up in the aforementioned record content; the first backtracking cut-point P1 yields the optimal subset punctuate result corresponding to the first M clauses. Then the second backtracking cut-point P2 corresponding to the first P1 clauses may be obtained, for example by looking it up in the aforementioned record content; the second backtracking cut-point P2 yields the optimal subset punctuate result corresponding to the first P1 clauses. Backtracking may end once the most recently obtained backtracking cut-point (P1 or P2) equals 0; otherwise backtracking may continue.
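The backtracking just described might be sketched as follows, assuming the recorded backtracking cut-points are held in a list `index` where `index[i]` is the backtracking cut-point of the first i clauses. The function names `backtrack` and `split_by_cuts` are hypothetical.

```python
def backtrack(index, m):
    """Follow the recorded cut-points from i = m down to 0; return them in ascending order."""
    cuts = []
    i = m
    while i > 0:
        i = index[i]      # P1 = index[M], P2 = index[P1], ...
        cuts.append(i)
    cuts.reverse()
    return cuts

def split_by_cuts(clauses, cuts):
    """Group clauses into sentences that start at each cut-point."""
    bounds = cuts + [len(clauses)]
    return [clauses[a:b] for a, b in zip(bounds, bounds[1:])]

# Example record content for three clauses: index[3] = 1, index[1] = 0.
cuts = backtrack([None, 0, 0, 1], 3)
print(cuts)                                  # [0, 1]
print(split_by_cuts(["A", "B", "C"], cuts))  # [['A'], ['B', 'C']]
```

The loop stops as soon as a cut-point equal to 0 is reached, which matches the termination condition stated above.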
To help those skilled in the art better understand the splitting process of the embodiment of the present invention, the process is illustrated here with an example in which the pending text [A, B, C] is processed. The corresponding process may specifically include the following steps:
Step S1, obtaining the clause sequence set {[A], [A, B], [A, B, C]} corresponding to the pending text [A, B, C];
Suppose S(i, j) denotes the clause sequence from the i-th preset punctuation mark to the j-th preset punctuation mark (position 0 being the start of the text); then S(0, 1) = A, S(1, 2) = B, S(2, 3) = C, S(0, 2) = A, B, S(1, 3) = B, C, and S(0, 3) = A, B, C.
It is further assumed that the translation quality scores of the sentences corresponding to S(i, j) are respectively:
NMT_score(0, 1) = -10
NMT_score(1, 2) = -15
NMT_score(2, 3) = -20
NMT_score(0, 2) = -2
NMT_score(1, 3) = -5
NMT_score(0, 3) = -30
Step S2, F(i) is used to denote the optimal subset synthesized translation quality score corresponding to the first i clauses; the initial value of F[0] is 0, and the initial value of F[i] for i greater than 0 is -INF (negative infinity);
Step S3, when i=0, the optimal subset synthesized translation quality score corresponding to the first 0 consecutive clauses is F(0) = 0;
Step S4, when i=1, the corresponding cut-point is k=0, then
F[1] = max(F[0] + NMT_score(0, 1)) = 0 + (-10) = -10
Index[1] = 0;
Step S5, when i=2, the corresponding cut-points are k=0, 1, then
F[2] = max(F[0] + NMT_score(0, 2), F[1] + NMT_score(1, 2)) = max(-2, -25) = F[0] + NMT_score(0, 2) = -2
Index[2] = 0;
Step S6, when i=3, the corresponding cut-points are k=0, 1, 2, then
F[3] = max(F[0] + NMT_score(0, 3), F[1] + NMT_score(1, 3), F[2] + NMT_score(2, 3)) = max(-30, -15, -22) = F[1] + NMT_score(1, 3) = -15
Index[3] = 1;
Step S7, backtracking through the backtracking cut-points corresponding to F(3);
The backtracking cut-point P1 = 1 corresponding to F(3) may be obtained first, and then the backtracking cut-point P2 = 0 corresponding to F(1). That is, the pending text [A, B, C] can be split into 2 sentences whose backtracking cut-points are P = 0 and P = 1 respectively; in other words, the 2 sentences obtained by splitting begin after the 0th clause and after the 1st clause respectively, so the corresponding optimal punctuate result is "A" and "B, C".
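The worked example can be checked end to end with a short sketch that combines the recursion of steps S2 to S6 with the backtracking of step S7. The function name `optimal_punctuate` and the dictionary `NMT_SCORE` are hypothetical; summation is assumed as the way of combining scores, as in the example.

```python
import math

# Scores from the example: NMT_score(k, i) for the clauses between positions k and i.
NMT_SCORE = {(0, 1): -10, (1, 2): -15, (2, 3): -20,
             (0, 2): -2, (1, 3): -5, (0, 3): -30}

def optimal_punctuate(clauses, nmt_score):
    m = len(clauses)
    F = [0] + [-math.inf] * m          # step S2: initial values
    index = [None] * (m + 1)
    for i in range(1, m + 1):          # steps S4 to S6
        for k in range(i):
            candidate = F[k] + nmt_score(k, i)
            if candidate > F[i]:
                F[i], index[i] = candidate, k
    sentences, i = [], m               # step S7: backtracking
    while i > 0:
        k = index[i]
        sentences.append(", ".join(clauses[k:i]))
        i = k
    sentences.reverse()
    return F, index, sentences

F, index, sentences = optimal_punctuate(["A", "B", "C"], lambda k, i: NMT_SCORE[(k, i)])
print(F)          # [0, -10, -2, -15]
print(index)      # [None, 0, 0, 1]
print(sentences)  # ['A', 'B, C']
```

For comparison, the four possible punctuate results score -30 for {(A, B, C)}, -15 for {(A), (B, C)}, -22 for {(A, B), (C)}, and -45 for {(A), (B), (C)}, so the result of the recursion agrees with exhaustive enumeration.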
It will be understood that the pending text [A, B, C] above is only an optional example; those skilled in the art may process any pending text according to practical application requirements to obtain the corresponding optimal punctuate result. For example, for the pending text [A, B, C, D, E, F], "Saunders said that Donald Trump once promised during the general election that, after taking office, he would not abolish the social security system, the medical insurance system for the elderly, or Medicaid; however, the people he is now appointing are exactly the group of people who advocate abolishing the above systems", the corresponding punctuate result may include "A, B, C, D" and "E, F".
In summary, the processing method of the embodiment of the present invention obtains the optimal punctuate result corresponding to the pending text according to the cut-points obtained based on the preset punctuation marks included in the pending text. The synthesized translation quality of the optimal punctuate result is optimal; the optimal punctuate result may include at least one sentence, and the synthesized translation quality may be the combination of the translation qualities of all sentences included in a punctuate result. The optimal punctuate result of the embodiment of the present invention can therefore achieve a global optimum of the synthesized translation quality, and can thus improve the translation quality of the punctuate result corresponding to the pending text.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that the embodiment of the present invention is not limited by the described order of actions, because according to the embodiment of the present invention certain steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiment of the present invention.
Device embodiment
Referring to Fig. 4, a structural block diagram of an embodiment of a processing apparatus of the present invention is shown, which may specifically include:
Pending text acquisition module 401, for obtaining pending text;
Optimal punctuate result acquisition module 402, for obtaining the optimal punctuate result corresponding to the pending text according to the cut-points obtained based on the preset punctuation marks included in the pending text; where the synthesized translation quality of the optimal punctuate result is optimal, the optimal punctuate result may include at least one sentence, and the synthesized translation quality is the combination of the translation qualities of all sentences included in a punctuate result; and
Optimal punctuate result output module 403, for outputting the optimal punctuate result corresponding to the pending text.
Optionally, the optimal punctuate result acquisition module 402 may include:
Dynamic programming acquisition submodule, for using a dynamic programming algorithm to obtain the optimal punctuate result corresponding to the pending text according to the cut-points obtained based on the preset punctuation marks included in the pending text.
Optionally, the dynamic programming acquisition submodule may include:
Clause sequence set determination unit, for determining the clause sequence set corresponding to the pending text according to the preset punctuation marks included in the pending text;
Recursion unit, for determining recursively, in order of the subsets of the clause sequence set from small to large, the backtracking cut-point of the optimal subset punctuate result corresponding to each subset; and
Optimal punctuate result acquiring unit, for obtaining the optimal punctuate result corresponding to the pending text according to the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the clause sequence set.
Optionally, a subset of the clause sequence set may include the first i clauses of the pending text, and the optimal subset synthesized translation quality score corresponding to the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses of the pending text; the recursion unit may then include:
Subset punctuate subunit, for splitting the first i clauses at cut-point k, to obtain the optimal subset synthesized translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit corresponding to the first i clauses and the cut-point k; where the first semantic unit may include the clauses of the first i clauses that are located before cut-point k, the second semantic unit may include the clauses of the first i clauses that are located after cut-point k, and 0 ≤ k < i;
Quality combination subunit, for combining F(k) and the translation quality score of the second semantic unit, to obtain the synthesized translation quality score corresponding to the first i clauses and the cut-point k;
Target cut-point obtaining subunit, for obtaining, according to the synthesized translation quality scores corresponding to the first i clauses and the cut-points k, the target cut-point corresponding to the optimal synthesized translation quality score from the at least one cut-point k corresponding to the first i clauses;
Backtracking cut-point obtaining subunit, for taking the target cut-point as the backtracking cut-point of the optimal subset punctuate result corresponding to the first i clauses, and taking the synthesized translation quality score corresponding to the target cut-point as the optimal subset synthesized translation quality score F(i) corresponding to the first i clauses.
Optionally, the optimal punctuate result acquiring unit may include:
Backtracking subunit, for backtracking through the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the clause sequence set, to obtain the backtracking cut-points of the optimal subset punctuate result corresponding to the largest subset of the clause sequence set;
Backtracking punctuate subunit, for splitting the pending text according to the backtracking cut-points of the optimal subset punctuate result corresponding to the largest subset of the clause sequence set, to obtain the optimal punctuate result corresponding to the pending text.
Optionally, the backtracking subunit may include:
First backtracking unit, for obtaining the first backtracking cut-point P1 corresponding to the first i clauses;
Second backtracking unit, for obtaining the second backtracking cut-point P2 corresponding to the clauses of the pending text located before the first backtracking cut-point P1.
Optionally, the optimal punctuate result acquisition module 402 may include:
Exhaustive submodule, for performing punctuate processing on the pending text according to the cut-points obtained based on the preset punctuation marks included in the pending text, to obtain the multiple punctuate results corresponding to the pending text;
Synthesized quality determination submodule, for determining the synthesized translation quality corresponding to each punctuate result;
Result selection submodule, for selecting, from the multiple punctuate results corresponding to the pending text, the punctuate result with the optimal synthesized translation quality as the optimal punctuate result corresponding to the pending text.
Optionally, the preset punctuation marks may include: a comma and/or a semicolon.
Since the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
With regard to the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
Fig. 5 is a block diagram of a device for processing, shown as a terminal, according to an exemplary embodiment. For example, the terminal 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, and so on.
With reference to Fig. 5, the terminal 900 may include one or more of the following components: a processing component 902, a memory 904, a power supply component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls the overall operation of the terminal 900, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 902 may include one or more processors 920 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 902 may include one or more modules that facilitate interaction between the processing component 902 and other components. For example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation of the terminal 900. Examples of such data include instructions for any application or method operated on the terminal 900, contact data, phonebook data, messages, pictures, videos, and so on. The memory 904 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 906 provides power to the various components of the terminal 900. The power supply component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 900.
The multimedia component 908 includes a screen that provides an output interface between the terminal 900 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the terminal 900 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC), which is configured to receive external audio signals when the terminal 900 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 904 or sent via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and so on. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the terminal 900. For example, the sensor component 914 may detect the open/closed state of the terminal 900 and the relative positioning of components, for example the display and keypad of the terminal 900. The sensor component 914 may also detect a change in position of the terminal 900 or of a component of the terminal 900, the presence or absence of user contact with the terminal 900, the orientation or acceleration/deceleration of the terminal 900, and a change in temperature of the terminal 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the terminal 900 and other devices. The terminal 900 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 904 including instructions, where the instructions are executable by the processor 920 of the terminal 900 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so on.
Fig. 6 is a block diagram of a device for processing, shown as a server, according to an exemplary embodiment. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and so on.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 1932 including instructions, where the instructions are executable by the processor 1922 of the server 1900 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so on.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of a device (a server or a terminal), the device is enabled to perform a processing method, the method including: obtaining a pending text; obtaining the optimal punctuate result corresponding to the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text, wherein the synthesized translation quality of the optimal punctuate result is optimal, the optimal punctuate result includes at least one sentence, and the synthesized translation quality is a synthesis of the translation qualities of all sentences included in the optimal punctuate result; and outputting the optimal punctuate result corresponding to the pending text.
Optionally, obtaining the optimal punctuate result corresponding to the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text includes: using a dynamic programming algorithm to obtain the optimal punctuate result corresponding to the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text.
Optionally, using the dynamic programming algorithm to obtain the optimal punctuate result corresponding to the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text includes:
Determining the subordinate sentence arrangement set corresponding to the pending text according to the preset punctuation marks included in the pending text;
Determining, by recursion and in order of the subsets of the subordinate sentence arrangement set from small to large, the backtracking cut-point of the optimal subset punctuate result corresponding to each subset, where the synthesized translation quality corresponding to the optimal subset punctuate result is optimal;
Obtaining the optimal punctuate result corresponding to the pending text according to the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the subordinate sentence arrangement set.
Optionally, the subsets of the subordinate sentence arrangement set include the first i subordinate sentences of the pending text, where the optimal subset synthesized translation quality score corresponding to the first i subordinate sentences is expressed as F(i), 0 ≤ i ≤ M, and M is the number of subordinate sentences of the pending text; in that case, determining, by recursion and in order of the subsets of the subordinate sentence arrangement set from small to large, the backtracking cut-point of the optimal subset punctuate result corresponding to each subset includes:
Punctuating the first i subordinate sentences using a cut-point k, so as to obtain the optimal subset synthesized translation quality score F(k) of a first semantic unit and the translation quality score of a second semantic unit, both corresponding to the first i subordinate sentences and the cut-point k; wherein the first semantic unit includes the subordinate sentences, among the first i subordinate sentences, located before the cut-point k, the second semantic unit includes the subordinate sentences, among the first i subordinate sentences, located after the cut-point k, and 0 ≤ k < i;
Integrating F(k) and the translation quality score of the second semantic unit, so as to obtain the synthesized translation quality score corresponding to the first i subordinate sentences and the cut-point k;
According to the synthesized translation quality scores corresponding to the first i subordinate sentences and the cut-points k, obtaining, from the at least one cut-point k corresponding to the first i subordinate sentences, the target cut-point corresponding to the optimal synthesized translation quality score;
Taking the target cut-point as the backtracking cut-point of the optimal subset punctuate result corresponding to the first i subordinate sentences, and taking the synthesized translation quality score corresponding to the target cut-point as the optimal subset synthesized translation quality score F(i) corresponding to the first i subordinate sentences.
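The recursion described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the definitive implementation: the text does not specify how F(k) and the score of the second semantic unit are integrated, so simple addition is assumed; F(0) = 0 is assumed for the empty prefix; a higher score is assumed to be better; and `toy_score` is a hypothetical stand-in for a real translation quality estimate. The names `best_cut_points`, `score_sentence`, and `back` are likewise chosen for this example only.

```python
def toy_score(sentence):
    """Hypothetical quality score: prefers sentences of about three words."""
    return -abs(len(sentence.split()) - 3)


def best_cut_points(clauses, score_sentence):
    """Compute F(i) and the backtracking cut-point for every prefix.

    F[i] is the best synthesized score for the first i subordinate
    sentences; back[i] is the cut-point k (0 <= k < i) that achieves it.
    """
    m = len(clauses)
    F = [0.0] * (m + 1)   # assumption: F(0) = 0 for the empty prefix
    back = [0] * (m + 1)
    for i in range(1, m + 1):          # subsets from small to large
        best_total, best_k = None, 0
        for k in range(i):             # candidate cut-points, 0 <= k < i
            # Second semantic unit: subordinate sentences after cut-point k.
            second_unit = " ".join(clauses[k:i])
            # Assumption: "integrating" the two scores means adding them.
            total = F[k] + score_sentence(second_unit)
            if best_total is None or total > best_total:
                best_total, best_k = total, k
        F[i], back[i] = best_total, best_k   # target cut-point for prefix i
    return F, back
```

Because every F(k) with k < i is already available when prefix i is processed, each prefix needs only one pass over its candidate cut-points, and the whole table is filled with a number of scoring calls quadratic in M.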
Optionally, obtaining the optimal punctuate result corresponding to the pending text according to the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the subordinate sentence arrangement set includes: backtracking through the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the subordinate sentence arrangement set, so as to obtain the backtracking cut-points of the optimal subset punctuate result corresponding to the largest subset of the subordinate sentence arrangement set; and punctuating the pending text according to the backtracking cut-points of the optimal subset punctuate result corresponding to the largest subset of the subordinate sentence arrangement set, so as to obtain the optimal punctuate result corresponding to the pending text.
Optionally, backtracking through the backtracking cut-points of the optimal subset punctuate results corresponding to the subsets of the subordinate sentence arrangement set includes: obtaining the first backtracking cut-point P1 corresponding to the first i subordinate sentences; and obtaining the second backtracking cut-point P2 corresponding to the subordinate sentences of the pending text that are located before the first backtracking cut-point P1.
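The backtracking step can be illustrated with a short sketch. It assumes that the backtracking cut-points have already been stored in a list `back`, where `back[i]` is the cut-point recorded for the first i subordinate sentences and always satisfies 0 <= back[i] < i; the function name and this storage layout are assumptions for illustration.

```python
def backtrack_sentences(clauses, back):
    """Rebuild sentences by following backtracking cut-points from M to 0."""
    spans = []
    i = len(clauses)
    while i > 0:
        p = back[i]   # P1 when i == M, then P2 when i == P1, and so on
        if not 0 <= p < i:
            raise ValueError("back[i] must satisfy 0 <= back[i] < i")
        spans.append((p, i))   # one sentence covers clauses p .. i-1
        i = p
    spans.reverse()            # restore left-to-right order
    return [" ".join(clauses[start:end]) for start, end in spans]
```

For example, with four subordinate sentences and `back = [0, 0, 0, 2, 2]`, the first lookup yields P1 = 2 and the second yields P2 = 0, so the text is punctuated into two sentences of two subordinate sentences each.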
Optionally, obtaining the optimal punctuate result corresponding to the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text includes: performing punctuate processing on the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text, so as to obtain multiple punctuate results corresponding to the pending text; determining the synthesized translation quality corresponding to each punctuate result; and selecting, from the multiple punctuate results corresponding to the pending text, the punctuate result with the optimal synthesized translation quality as the optimal punctuate result corresponding to the pending text.
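The exhaustive alternative can be sketched as below. The sketch assumes that the synthesized translation quality of a punctuate result is the sum of per-sentence scores and that a higher score is better; `score_sentence` is a caller-supplied, hypothetical scoring function, and the function name `exhaustive_best` is chosen for this example.

```python
from itertools import product


def exhaustive_best(clauses, score_sentence):
    """Try every combination of interior cut-points and keep the best one."""
    m = len(clauses)
    if m == 0:
        return [], 0.0
    best_result, best_total = None, None
    # Each of the m - 1 interior cut-points is either used or skipped.
    for flags in product([False, True], repeat=m - 1):
        sentences, start = [], 0
        for pos, cut in enumerate(flags, start=1):
            if cut:
                sentences.append(" ".join(clauses[start:pos]))
                start = pos
        sentences.append(" ".join(clauses[start:]))
        # Assumption: the synthesized quality is the sum of sentence scores.
        total = sum(score_sentence(s) for s in sentences)
        if best_total is None or total > best_total:
            best_result, best_total = sentences, total
    return best_result, best_total
```

This enumeration examines 2^(M-1) punctuate results for M subordinate sentences, so it is practical only for short texts.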
Optionally, the preset punctuation marks include: a comma and/or a semicolon.
Those skilled in the art will readily conceive of other embodiments of the present invention after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
A processing method, a processing unit, and a device for processing provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.