- how many positive (e.g., “thumbs-up” or ratings above a fixed amount) feedbacks are given to its existing alignment;
- how many negative (e.g., “thumbs-down” or ratings below a fixed amount) feedbacks are given to its existing alignment.

Then, at 406, the following records are calculated for each word in the designated first (“L1”) sentence of a sentence pair: the link given to that word by the existing alignment model and the link (i.e., realignment) given to that word by each different user.

At 408, realignment by users is considered. This may be implemented by, for example, the realignment factorizer 136. When users perform realignments rather than just rating the existing alignment, a “corrected” group of realignments is produced. That factorization may be calculated in this manner:

- 1) Suppose the L1 sentence contains m words (i.e., it is of the form “w1(1) w1(2) . . . w1(m)”) and suppose the L2 sentence contains n words (i.e., it is of the form “w2(1) w2(2) . . . w2(n)”).
- 2) Suppose there are x users (known as u₁, u₂, . . . , u_x) who have given feedback to that sentence pair (containing L1 and L2). Then there are x+1 voters. The first vote is by the existing word-alignment model (known as ‘user 0’ or u₀) and the others are by the human users.
- 3) For each L1 word w1(i), 1≦i≦m, there are n+1 candidates to be voted for, viz. {w2(0), w2(1), w2(2), . . . , w2(n)}. The candidate ‘w2(0)’ means that w1(i) does not align to any word in the L2 sentence, and each of the other candidates w2(j), 1≦j≦n, means that w1(i) aligns to a particular word w2(j) in the L2 sentence.
- 4) If a user does not give any correction to the link of w1(i), then it is assumed that she agrees with the existing link and therefore her vote is the same as that by the existing word-alignment model.
- 5) The vote from user u_k (1≦k≦x) is assigned a weight, W_k. W_k is determined by combining a list of factors. These factors are divided into two groups:
  - (i) The first group of factors concerns confidence in the user. One factor is the user's credibility based on her previous records; another factor is the time that the user took before making her correction.
  - (ii) The second group of factors concerns confidence in the link that the user votes for. These factors may be (a) whether the link is supported by a dictionary, (b) whether the link is supported by statistical analysis of a bilingual textual dataset, and (c) whether the link looks reasonable given its context.
- 6) For each w1(i), each candidate w2(j) is assigned a score:

$$\sum_{k=1}^{x} W_k \cdot \delta(j, k) \qquad \text{Equation (1)}$$

- where δ(j, k) is defined as 1 if user k votes for candidate w2(j) and 0 otherwise. The candidate with the highest score is taken to be the new link for w1(i). The new alignment for the entire L1 sentence can thus be obtained.
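By way of illustration only, the voting of steps 1) through 6) might be sketched as follows for a single L1 word. The function name `revote_link`, the use of `None` to mark "no correction," and the tie-breaking rule are assumptions made for this sketch. The weights W_k are taken as given numbers, since the manner of combining the factors of step 5) is not specified above.

```python
from collections import defaultdict

NULL_LINK = 0  # candidate w2(0): the L1 word aligns to no L2 word


def revote_link(model_link, user_links, user_weights):
    """Pick the new link for one L1 word w1(i) using Equation (1).

    model_link   -- L2 index (0..n) chosen by the existing model (user 0)
    user_links   -- one entry per human user u_1..u_x: an L2 index, or
                    None when the user made no correction (step 4)
    user_weights -- W_k for each human user, in the same order
    """
    scores = defaultdict(float)
    for link, weight in zip(user_links, user_weights):
        # Step 4: an uncorrected link counts as a vote for the model's link.
        vote = model_link if link is None else link
        # Adds W_k * delta(j, k) to the score of candidate j == vote.
        scores[vote] += weight
    if not scores:
        return model_link  # no human voters: keep the existing link
    # Highest score wins. Ties favor the existing link, then the lower
    # index (an assumption; no tie-breaking rule is stated above).
    return max(scores, key=lambda j: (scores[j], j == model_link, -j))


# Two users relink the word to w2(3); one user leaves the model's link w2(2).
print(revote_link(2, [3, None, 3], [0.5, 0.4, 0.3]))
```

Repeating the call for each i, 1≦i≦m, would yield the new alignment for the entire L1 sentence.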

At 410, the ratings of existing word-alignments and the proposed new word realignments are processed and inserted into a new training dataset. This may be implemented by, for example, the training data updater 138.

For the sentence pairs given user-feedback ratings, sentence pairs are divided into two groups: a retaining group and a to-be-revised group. The retaining group will be part of the new training dataset and includes sentence pairs that have been given many positive feedbacks to their existing word alignments. The to-be-revised group will not be part of the new training dataset and includes sentence pairs that have been given many negative feedbacks to their existing word alignments. The to-be-revised group may be later examined by a set of human language experts. Of course, other statistical calculations and thresholds may be employed in other implementations.

For the realigned sentence pairs, the corrected word realignment is calculated based upon the results from factoring user realignment at 408. The sentence pairs with corrected word realignments are included as part of the new training dataset.

The data gathered from the users' feedback is used to produce the new labeled dataset for training. Based on this new training dataset, a new word-alignment model can be trained. The unlabeled dataset is all the sentence pairs in the repertoire of examples of the multilingual textual dataset. The new word-alignment model is applied to the unlabeled dataset to produce new alignment links.

At 412, the learning algorithm/approach is run on the updated training dataset to produce a new and presumably improved word-alignment model. This may be implemented by, for example, the machine-translation learner 140.

At 414, the improved word-alignment model is consumed and deployed for use by users. That means the new model is then applied to the existing dataset or a new multilingual textual dataset. If applied to the existing dataset, the existing sentence pairs are realigned according to that improved word-alignment model. The updated or new multilingual textual dataset is now made available for use by the frontend subsystem 150 for exposure to users.

Finally, the cycle returns to 402, where the updated or new multilingual textual dataset, based upon the improved word-alignment model, is exposed via a UI to users so that they can, for example, learn a language or provide feedback and improve the current word-alignment.

Exemplary Processes

FIGS. 5-8 are flow diagrams illustrating exemplary processes 500, 600, 700, and 800 that implement the techniques described herein for word-alignment depiction and/or improvement. The UIs shown in FIGS. 2A-C and 4A-D are generated by and/or utilized by exemplary processes 500, 600, and 700.

Each of these processes is illustrated as a collection of blocks in a logical flow graph, which represents a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer instructions stored on one or more computer-readable storage media that, when executed by one or more processors of such a computer, perform the recited operations. Note that the order in which the process is described is not intended to be construed as a limitation, and any number of the described process blocks can be combined in any order to implement the process, or an alternate process. Additionally, individual blocks may be deleted from the process without departing from the spirit and scope of the subject matter described herein.

FIG. 5 illustrates the process 500 for word-alignment depiction. The process is performed at least in part by a word-alignment computing system. That computing system includes one or more computing devices that are configured to depict, expose, display, present, and/or improve word-alignments for one or more bilingual sentence pairs. The word-alignment computing system includes, for example, the computing device 102, the language-translation-and-learning system 120, or some combination thereof. The word-alignment computing system that is configured as described here qualifies as a particular machine or apparatus.

As shown here, the process 500 begins with operation 502, where the word-alignment computing system obtains at least one bilingual sentence pair.

At operation 504, the word-alignment computing system concurrently displays each of the sentences of the bilingual sentence pair via a UI on an output display (like display screen 104 shown in FIG. 1). One or more word-aligned words or phrases in each sentence may be emphasized. An example of this is seen in FIG. 2A where the word “shining” 212 in the sentence 202 is emphasized along with its word-aligned counterpart word or phrase, which is word 214 in the sentence 204. That emphasis conveys a particular meaning; for example, words 212 and 214 in FIG. 2A are query words. One of these two words may have been used to find the particular sentence pair shown in FIG. 2A.

At operation 506, the word-alignment computing system waits for the user to produce an input event that indicates the user has chosen an of-interest word or phrase in one of the sentences. That input event may be, for example, a mouse cursor hovering over or near (i.e., proximate to) a word or phrase. The of-interest word can be in either sentence regardless of language or order of the sentences. For the sake of clarity, the sentence with the of-interest word is called the “first” sentence herein.

At operation 508, once the of-interest word is chosen, the system determines if there is a corresponding word in the other (i.e., second) sentence that is aligned with the of-interest word. If not, then the process returns to operation 506 to wait for another of-interest word to be chosen. If so, then the process proceeds to the next operation.

At operation 510, the system locates the particular corresponding word in the other (i.e., second) sentence that is aligned with the of-interest word. The particular corresponding word in the other sentence is called the linked word.

At operation 512, the system concurrently highlights both words. Said another way, the system simultaneously highlights both the of-interest word and the linked word on the screen.

Alternatively, some or all of the operations 506-512 may be described as including a determination about whether the user-directable position indicator is proximate to the of-interest word or phrase of one of the sentences of the bilingual sentence pair. When the user-directable position indicator is proximate to the of-interest word or phrase, the system finds the linked word or phrase in the other sentence of the pair that corresponds to the of-interest word or phrase. The linked word or phrase is found based upon predetermined word-alignments between the sentences of the bilingual sentence pair. Then, the system concurrently highlights both the of-interest word or phrase and the linked word or phrase via the UI on the display, and does so while still concurrently displaying each sentence of the bilingual sentence pair via the UI.
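As a non-limiting sketch, the lookup of operations 508 and 510 might be expressed as follows. The representation of the predetermined word-alignments as a list of index pairs, the function name `find_linked_words`, and the side labels "a" and "b" are all assumptions made for illustration.

```python
def find_linked_words(links, side, index):
    """Return the word positions in the other sentence linked to a word.

    links -- predetermined word-alignments as (a_index, b_index) pairs,
             where a_index is a word position in sentence "a" and
             b_index is a word position in sentence "b"
    side  -- "a" or "b": the sentence holding the of-interest word
    index -- position of the of-interest word within that sentence

    An empty list means no aligned word exists, which corresponds to
    returning to operation 506 to wait for another selection.
    """
    if side == "a":
        return sorted(b for a, b in links if a == index)
    if side == "b":
        return sorted(a for a, b in links if b == index)
    raise ValueError("side must be 'a' or 'b'")


# Hypothetical alignment for a four-word sentence pair.
links = [(0, 0), (1, 2), (2, 1), (2, 3)]
print(find_linked_words(links, "a", 2))
print(find_linked_words(links, "b", 2))
```

Because the lookup works in either direction, the of-interest word may be in either sentence, consistent with the "first" sentence being whichever one the user points at. The UI would then highlight the of-interest position together with every returned position.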

At operation 514, the system receives input from the user indicating the user's rating or opinion regarding the quality of the existing word-alignments of the concurrently displayed bilingual sentence pair.

FIG. 6 illustrates another process 600 for depiction of word alignments. The process is performed at least in part by the word-alignment computing system. Process 600 employs the word-alignment computing system like that of process 500. Also, process 600 may be employed concurrently with process 500, or separately.

As shown here, the process 600 begins with operation 602, wherein the word-alignment computing system obtains at least one bilingual sentence pair. At operation 604, the word-alignment computing system concurrently displays each of the sentences of the bilingual sentence pair via a UI on an output display (like display screen 104 shown in FIG. 1).

At operation 606, the word-alignment computing system waits for the user to produce an input event that indicates the user has chosen an of-interest word or phrase in one of the sentences. That input event may be, for example, a mouse cursor hovering over or near (i.e., proximate to) a word or phrase. The input event may also be the result of a more active selection by the user, such as a click (left- or right-click), a hot-key, or a text selection by dragging the cursor over the desired text. Dragging here includes moving the cursor while pressing a button (typically on the mouse itself).

At operation 608, once the of-interest word is chosen, the system performs a query based upon the of-interest word or phrase. The query may be via an online search engine like BING™ brand search engine by the Microsoft Corporation. Alternatively, the query may be to a dictionary, multilingual dictionary, or translator. Alternatively still, the query may be made to a database where the meaning of the word is described or elaborated upon.

Instead of just searching based upon the of-interest word, other implementations that combine with process 500 may query on the linked word or phrase as well as, or instead of, the of-interest word or phrase. So in this instance the user may get, in response to selecting an English word, an English definition of its word-aligned Russian word in the Russian sentence of a bilingual sentence pair.

At operation 610, the system presents the results of the query.

FIG. 7 illustrates a process 700 for helping improve word alignments. The process is performed at least in part by the word-alignment computing system. Process 700 employs the word-alignment computing system like that of processes 500 and 600. Also, process 700 may be employed along with that of other processes described herein or separately.

As shown here, the process 700 begins with operation 702, where the word-alignment computing system obtains at least one bilingual sentence pair. At operation 704, the word-alignment computing system concurrently displays each of the sentences of the bilingual sentence pair via a UI on an output display (like display screen 104 shown in FIG. 1).

At operation 706, the word-alignment computing system waits for the user to produce an input event that indicates the user has chosen a first word or phrase in one of the sentences. The first word or phrase can be in either sentence regardless of language or order of the sentences. For the sake of clarity, the sentence with the first word or phrase is called the “first” sentence herein.

At operation 708, once the first word or phrase is chosen, the system highlights that first word or phrase.

Next, at operation 710, the system waits for the user to produce another input event that indicates the user has chosen a second word or phrase in the other of the two sentences. For the sake of clarity, the sentence with the second word or phrase is called the “second” sentence herein.

Once the user has selected both the first and the second words or phrases, she has indicated that these two words or phrases should be aligned. This is called word-realignment or user-feedback word realignment herein.

At operation 712, once the second word or phrase is chosen, the system highlights that second word or phrase.

At operation 714, the system stores the user-feedback word realignment.

In addition, at operation 716, the system stores other properties associated with the user who performed this user-feedback word realignment. For example, other properties may include a measure of the time between word selections, which may be an indicator of whether the user seriously contemplated the contextual meaning of the words.
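Purely as an illustrative sketch, operations 706 through 716 might capture a realignment together with the time between the two selections as shown below. The class names `RealignmentFeedback` and `RealignmentSession`, the field names, and the injectable `clock` parameter are assumptions and do not appear in the description above.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RealignmentFeedback:
    """One stored user-feedback word realignment (operations 714 and 716)."""
    user_id: str
    sentence_pair_id: str
    first_index: int    # chosen word position in the "first" sentence
    second_index: int   # chosen word position in the "second" sentence
    seconds_between_selections: float


class RealignmentSession:
    """Tracks the two selections that make up one realignment."""

    def __init__(self, user_id, sentence_pair_id, clock=time.monotonic):
        self._user_id = user_id
        self._pair_id = sentence_pair_id
        self._clock = clock  # injectable so that timing can be tested
        self._first = None

    def select_first(self, index):
        # Operations 706/708: remember the first word and when it was chosen.
        self._first = (index, self._clock())

    def select_second(self, index):
        # Operations 710-716: pair the words and record the elapsed time.
        if self._first is None:
            raise RuntimeError("select_first must be called first")
        first_index, started = self._first
        self._first = None
        return RealignmentFeedback(
            self._user_id, self._pair_id, first_index, index,
            self._clock() - started,
        )
```

The elapsed time stored in each record is one possible input to a user-confidence factor, such as the time a user took before making a correction.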

FIG. 8 illustrates a process 800 for helping improve word alignments. The process is performed at least in part by the word-alignment computing system. Process 800 employs the word-alignment computing system like that of the other processes already described. Also, process 800 may be employed along with that of other processes described herein or separately.

As shown here, the process 800 begins with operation 802, where the system obtains user-feedback ratings regarding the user-perceived quality of existing word-alignments for a dataset of bilingual sentence pairs.

At operation 804, the system selects a retained group of sentence pairs based upon the user-feedback ratings. This also can be described as the system dividing the dataset of bilingual sentence pairs into at least two groups. The retained group includes sentence pairs that meet or exceed a defined quality standard based upon the obtained user-feedback ratings. A to-be-revised group includes sentence pairs having an existing word-alignment that falls below a defined quality standard based upon the obtained user-feedback ratings.

The defined quality threshold may be set automatically (via a statistical analysis) or manually by a human operator. Once set, each sentence pair has an overall rating that meets or exceeds a threshold or, of course, falls below that threshold. The overall rating of each sentence pair may be calculated based upon a number of factors, such as median or mean of user-feedback ratings of the word-alignment of a particular pair. Other statistical factors may be used as well, including quantity of specific ratings (e.g., at least 500 “up” ratings) or weighting based upon confidence associated with particular users.
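One possible realization of the grouping of operation 804 is sketched below, using the mean of the user-feedback ratings as the overall rating. The function name `split_by_rating`, the numeric rating encoding, and the handling of unrated pairs are assumptions; a median, rating counts, or user-confidence weighting could be substituted as noted above.

```python
from statistics import mean


def split_by_rating(ratings_by_pair, threshold):
    """Divide sentence pairs into retained and to-be-revised groups.

    ratings_by_pair -- maps a sentence-pair id to its list of numeric
                       user-feedback ratings (e.g., 1 for "up", 0 for "down")
    threshold       -- the defined quality threshold for the overall rating
    """
    retained, to_be_revised = [], []
    for pair_id, ratings in ratings_by_pair.items():
        if not ratings:
            continue  # unrated pairs join neither group (an assumption)
        overall = mean(ratings)
        if overall >= threshold:
            retained.append(pair_id)       # meets or exceeds the threshold
        else:
            to_be_revised.append(pair_id)  # falls below the threshold
    return retained, to_be_revised


ratings = {"p1": [1, 1, 0], "p2": [0, 0, 1], "p3": []}
print(split_by_rating(ratings, threshold=0.5))
```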

At operation 806, the word-alignment computing system obtains user-feedback word-realignment data regarding many of the sentence pairs in the dataset.

This user-feedback word-realignment data and the user-feedback ratings may be acquired from a multitude of multilingual users over the Internet. Because of the potentially global scale, the feedback may be received from thousands, hundreds of thousands, and perhaps even millions of users.

At operation 808, based upon the obtained user-feedback word-realignment of the realigned sentence pairs, the system calculates a corrected word-realignment of the realigned sentence pairs. These pairs may be called the “corrected” group or “realigned” group.

As part of this operation, the system may calculate a user-specific confidence value based, at least, upon factors associated with the user. The system then repeats that calculation for each user of the group being considered. Each link is weighted based upon the calculated user-specific confidence value for each user of the many users. Then the system selects the corrected word-realignment of the realigned sentence pairs based upon the weighted links.

Next, at operation 810, the system generates a new and presumably improved word-alignment model based upon the retained group of sentence pairs and the realigned sentence pairs.

At operation 812, the system applies the new word-alignment model to the same or another multilingual textual dataset. The result is the improved multilingual textual dataset.

At operation 814, the system exposes the pairs of sentences from the improved multilingual textual dataset. This would be done much as discussed herein with regard to processes 500, 600, and 700.

CONCLUSION

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claims.