Parallel text

From Wikipedia, the free encyclopedia

Text placed alongside its translation or translations

Not to be confused with Parallel novel.

The Rosetta Stone, a stele engraved with the same decree in both of the Ancient Egyptian scripts as well as Ancient Greek. Its discovery was key to deciphering the Ancient Egyptian language.

A parallel text is a text placed alongside its translation or translations.^[1]^[2] Parallel text alignment is the identification of the corresponding sentences in both halves of the parallel text. The Loeb Classical Library and the Clay Sanskrit Library are two examples of dual-language series of texts. Reference Bibles may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; Origen's Hexapla (Greek for "sixfold") placed six versions of the Old Testament side by side. A famous example is the Rosetta Stone, whose discovery made it possible to begin deciphering the Ancient Egyptian language.

Large collections of parallel texts are called parallel corpora (see text corpus). Alignment of parallel corpora at the sentence level is a prerequisite for many areas of linguistic research. During translation, sentences can be split, merged, deleted, inserted or reordered by the translator, which makes alignment a non-trivial task.
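To illustrate why splits, merges, insertions and deletions make alignment non-trivial, the following is a minimal sketch of a length-based sentence aligner using dynamic programming. It is a simplification for illustration, not a method described in this article: the cost function (difference in character length) and the penalty values in `PENALTY` are assumptions chosen for readability, and reordering is not handled at all.

```python
from typing import List, Tuple

# Allowed alignment "beads": (source sentences consumed, target sentences consumed).
# 1-1 is a plain match, 1-0 a deletion, 0-1 an insertion, 2-1 a merge, 1-2 a split.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

# Hypothetical penalties (assumed values) that make 1-1 matches the default choice.
PENALTY = {(1, 1): 0, (1, 0): 50, (0, 1): 50, (2, 1): 10, (1, 2): 10}


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Align two sentence lists by minimising total character-length mismatch."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0

    # Forward pass: extend every reachable state by each allowed bead.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(s_len - t_len) + PENALTY[(di, dj)]
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)

    # Backward pass: recover the cheapest sequence of beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


if __name__ == "__main__":
    source = ["aaaa", "bbbbbbbbbb"]
    target = ["AAAA", "BBBBB", "BBBBB"]  # the second source sentence was split in two
    for s, t in align(source, target):
        print(s, "<->", t)
```

Sentence length alone is a weak signal; practical aligners typically combine it with lexical evidence, which this sketch leaves out.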

Parallel texts may be used in language education.^[3]

Types of parallel corpora

Parallel corpora can be classified into four main categories: [citation needed]

A parallel corpus contains translations of the same document in two or more languages, aligned at least at the sentence level. These tend to be rarer than less-comparable corpora. [citation needed]
A noisy parallel corpus contains bilingual sentences that are not perfectly aligned or have poor quality translations. Nevertheless, most of its contents are bilingual translations of a specific document.
A comparable corpus is built from non-sentence-aligned and untranslated bilingual documents, but the documents are topic-aligned.
A quasi-comparable corpus includes very heterogeneous and non-parallel bilingual documents that may or may not be topic-aligned.

Noise in corpora

Large corpora used as training sets for machine translation algorithms are usually extracted from large bodies of similar sources, such as databases of news articles written in the first and second languages describing similar events.

However, extracted fragments may be noisy, with extra elements inserted in each corpus. Extraction techniques can differentiate between bilingual elements represented in both corpora and monolingual elements represented in only one corpus in order to extract cleaner parallel fragments of bilingual elements. Comparable corpora are used to directly obtain knowledge for translation purposes. High-quality parallel data is difficult to obtain, however, especially for under-resourced languages.^[4]
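One simple way to picture this kind of filtering is a lexicon-coverage check: a candidate pair is kept only if enough source words have a known translation in the target sentence. The sketch below is an assumed heuristic for illustration, not the method of the cited work; the function names, the toy `lexicon` and the `threshold` value are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def coverage(src: str, tgt: str, lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source words that have at least one known translation in tgt."""
    src_words = src.lower().split()
    tgt_words = set(tgt.lower().split())
    if not src_words:
        return 0.0
    hits = sum(1 for w in src_words if lexicon.get(w, set()) & tgt_words)
    return hits / len(src_words)


def keep_parallel(
    pairs: List[Tuple[str, str]],
    lexicon: Dict[str, Set[str]],
    threshold: float = 0.5,  # assumed cut-off; would need tuning on real data
) -> List[Tuple[str, str]]:
    """Keep only candidate pairs whose lexicon coverage reaches the threshold."""
    return [(s, t) for s, t in pairs if coverage(s, t, lexicon) >= threshold]


if __name__ == "__main__":
    # Toy English-to-French lexicon (illustrative only).
    lexicon = {"the": {"le", "la"}, "black": {"noir"}, "cat": {"chat"}}
    candidates = [
        ("the black cat", "le chat noir"),          # bilingual content
        ("the black cat", "il pleut aujourd'hui"),  # unrelated target sentence
    ]
    print(keep_parallel(candidates, lexicon))
```

A whitespace tokenizer and exact word matching are deliberate simplifications; inflected forms and punctuation would need proper handling in practice.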

Bitext

Main article: Bitext word alignment

In the field of translation studies, a bitext is a merged document composed of both source- and target-language versions of a given text.

Bitexts are generated by a piece of software called an alignment tool, or a bitext tool, which automatically aligns the original and translated versions of the same text. The tool generally matches these two texts sentence by sentence. A collection of bitexts is called a bitext database or a bilingual corpus, and can be consulted with a search tool.
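As a rough illustration of consulting a bitext with a search tool, the sketch below stores a bitext as an ordered list of sentence pairs and looks up every pair whose source side contains a query string. The data structure and the `concordance` function are hypothetical names chosen for this example, not the interface of any particular tool.

```python
from typing import List, Tuple

# A bitext kept as an ordered list of (source, target) segment pairs.
Bitext = List[Tuple[str, str]]


def concordance(bitext: Bitext, query: str) -> List[Tuple[int, str, str]]:
    """Return (position, source, target) for each segment whose source contains query."""
    needle = query.casefold()
    return [
        (pos, src, tgt)
        for pos, (src, tgt) in enumerate(bitext)
        if needle in src.casefold()
    ]


bitext = [
    ("The contract was signed on Monday.", "Le contrat a été signé lundi."),
    ("Both parties kept a copy.", "Les deux parties en ont conservé une copie."),
    ("The contract takes effect in June.", "Le contrat prend effet en juin."),
]

if __name__ == "__main__":
    for pos, src, tgt in concordance(bitext, "contract"):
        print(pos, src, "|", tgt)
```

Returning the position alongside each hit lets a reader go back to the neighbouring segments, since the list keeps the original sentence order.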

Bitexts and translation memories

Main article: Translation memory

Bitexts have some similarities with translation memories. The most salient difference is that a translation memory loses the original context, while a bitext retains the original sentence order. That said, some translation memory formats, such as Translation Memory eXchange (TMX), a standard XML format for exchanging translation memories between computer-assisted translation (CAT) programs, allow the original order of sentences to be preserved.
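The sketch below reads a small TMX document with Python's standard `xml.etree.ElementTree` module and returns the sentence pairs in file order. The sample file is a minimal, hand-written illustration: its header attribute values are placeholders, and the assumption that translation units are stored in the original sentence order is specific to this example. `read_tmx` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Minimal sample; header values are placeholders for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you.</seg></tuv>
      <tuv xml:lang="fr"><seg>Merci.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(text: str, src_lang: str, tgt_lang: str) -> List[Tuple[str, str]]:
    """Return (source, target) segment pairs in the order they appear in the file."""
    root = ET.fromstring(text)
    pairs = []
    for tu in root.iter("tu"):  # iter() walks the tree in document order
        segs = {
            tuv.get(XML_LANG): tuv.findtext("seg", default="")
            for tuv in tu.findall("tuv")
        }
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


if __name__ == "__main__":
    for src, tgt in read_tmx(SAMPLE_TMX, "en", "fr"):
        print(src, "|", tgt)
```

Segments with inline markup inside `<seg>` would need more careful text extraction than `findtext` provides.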

Bitexts are designed to be consulted by a human translator, not by a machine. As such, small alignment errors or minor discrepancies that would cause a translation memory to fail are of no importance.

In his original 1988 article on the concept, Harris also posited that a bitext represents how translators hold their source and target texts together in their mental working memories as they progress. However, this hypothesis has not been followed up.^[5]

Online bitexts and translation memories may also be called online bilingual concordances. Several are available on the public Web, including Linguee, Reverso, and Tradooit.^[6]^[7]^[8]

References

1. Chan, Sin-Wai (2015). Routledge Encyclopedia of Translation Technology. London: Routledge. ISBN 978-1-315-74912-9.
2. Williams, Philip; Sennrich, Rico; Post, Matt; Koehn, Philipp (2016). Syntax-based Statistical Machine Translation. Morgan & Claypool. ISBN 978-1-62705-502-4.
3. Abdallah, A. (2021). "Impact of using parallel text strategy on teaching reading to intermediate II level students". International Journal on Social and Education Sciences (IJonSES). 3 (1): 95–108. https://doi.org/10.46328/ijonses.48
4. Wołk, Krzysztof (2015). "Noisy-Parallel and Comparable Corpora Filtering Methodology for the Extraction of Bi-Lingual Equivalent Data at Sentence Level". Computer Science. 16 (2): 169–184. arXiv:1510.04500. Bibcode:2015arXiv151004500W. doi:10.7494/csci.2015.16.2.169. S2CID 12860633.
5. Harris, B. (March 1988). "Bi-Text, A New Concept in Translation Theory" (PDF). Language Monthly. 54: 8–10. Archived from the original (PDF) on 2018-03-02.
6. Genette, Marie (2016). How Reliable Are Online Bilingual Concordancers? An investigation of Linguee, TradooIT, WeBiText and ReversoContext and Their Reliability Through a Contrastive Analysis of Complex Prepositions from French to English (M.A. thesis). Université catholique de Louvain & Universitetet i Oslo. hdl:10852/51577.
7. "TradooIT – Concordancier bilingue".
8. Désilets, Alain; Farley, Benoît; Stojanović, Marta; Patenaude, Geneviève (2008). WeBiText: Building Large Heterogeneous Translation Memories from Parallel Web Content. Proceedings of Translating and the Computer. Vol. 30. pp. 27–28. S2CID 14586900.

External links

Steinberger, Ralf; Pouliquen, Bruno; Widiger, Anna; Ignat, Camelia; Erjavec, Tomaž; Tufiş, Dan; Varga, Dániel (2006). The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006). Genoa, Italy, 24–26 May 2006.