Bidirectional text

From Wikipedia, the free encyclopedia
Text that contains both LTR and RTL text
Some web browsers may display the Hebrew text in this article in the reverse direction.

A bidirectional text contains two text directionalities, right-to-left (RTL) and left-to-right (LTR). It generally involves text containing different types of alphabets, but may also refer to boustrophedon, in which the text direction changes on each row.

An example is the RTL Hebrew name Sarah: שרה, spelled sin (ש) on the right, resh (ר) in the middle, and heh (ה) on the left. Many computer programs failed to display this correctly, because they were designed to display text in one direction only.

Some so-called right-to-left scripts, such as the Persian script and Arabic, are mostly, but not exclusively, right-to-left: mathematical expressions, numeric dates and numbers bearing units are embedded from left to right. The same happens when text from a left-to-right language such as English is embedded in them, and vice versa when Arabic is embedded in a left-to-right script such as English.

Bidirectional script support

Bidirectional script support is the capability of a computer system to correctly display bidirectional text. The term is often shortened to "BiDi" or "bidi".

Early computer installations were designed to support only a single writing system, typically a left-to-right script based on the Latin alphabet. Adding new character sets and character encodings enabled a number of other left-to-right scripts to be supported, but did not easily support right-to-left scripts such as Arabic or Hebrew, and mixing the two was not practical. Right-to-left scripts were introduced through encodings like ISO/IEC 8859-6 and ISO/IEC 8859-8, storing the letters (usually) in writing and reading order. It is possible to simply flip the left-to-right display order to a right-to-left display order, but doing this sacrifices the ability to correctly display left-to-right scripts. With bidirectional script support, it is possible to mix characters from different scripts on the same page, regardless of writing direction.

In particular, the Unicode standard provides foundations for complete BiDi support, with detailed rules as to how mixtures of left-to-right and right-to-left scripts are to be encoded and displayed.

Unicode bidi support

See also: Right-to-left mark, Left-to-right mark, and Arabic letter mark

The Unicode standard calls for characters to be ordered 'logically', i.e. in the sequence in which they are intended to be interpreted, as opposed to 'visually', the sequence in which they appear. This distinction is relevant for bidi support because at any bidi transition, the visual presentation ceases to be the 'logical' one. Thus, in order to offer bidi support, Unicode prescribes an algorithm for converting the logical sequence of characters into the correct visual presentation. For this purpose, the Unicode standard assigns each of its characters one of four types: 'strong', 'weak', 'neutral', and 'explicit formatting'.[1]
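The logical order and the four types can be inspected with Python's standard-library `unicodedata.bidirectional`, which returns a character's Bidi_Class as a short string such as 'L' or 'EN'. The sketch below is a minimal illustration, not part of any library: the helper name `bidi_group` is hypothetical, and the mapping of classes onto the four types follows the table of character types in this article.

```python
import unicodedata

# Bidi_Class values grouped into the four broad types (per the type table).
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}
EXPLICIT = {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def bidi_group(ch: str) -> str:
    """Hypothetical helper: return 'strong', 'weak', 'neutral' or 'explicit'."""
    cls = unicodedata.bidirectional(ch)
    for name, members in (("strong", STRONG), ("weak", WEAK),
                          ("neutral", NEUTRAL), ("explicit", EXPLICIT)):
        if cls in members:
            return name
    # Unassigned code points have an empty Bidi_Class in unicodedata.
    raise ValueError(f"no Bidi_Class for U+{ord(ch):04X}")


# Sarah is stored in logical order: sin (U+05E9) comes first in memory,
# even though it is displayed rightmost.
sarah = "\u05e9\u05e8\u05d4"  # שרה
print([f"U+{ord(c):04X}" for c in sarah])

for ch in ["a", "\u05e9", "7", "+", ",", " ", "\u2067"]:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), bidi_group(ch))
```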

Strong characters

Strong characters are those with a definite direction. Examples of this type of character include most alphabetic characters, syllabic characters, Han ideographs, non-European or non-Arabic digits, and punctuation characters that are specific to only those scripts.

Weak characters

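Weak characters are those with a weak direction: they have some directional behaviour of their own (for example, the digits of a number are laid out left to right), but they do not set the direction of the surrounding text. Examples of this type of character include European digits, Eastern Arabic-Indic digits, arithmetic symbols, and currency symbols.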

Neutral characters

Neutral characters have a direction that cannot be determined without context. Examples include paragraph separators, tabs, and most other whitespace characters. Punctuation symbols that are common to many scripts, such as the colon, comma, full stop, and no-break space, behave in much the same way outside numbers, although Unicode formally classes them as weak "common number separators" (CS).

Explicit formatting

Explicit formatting characters, also referred to as "directional formatting characters", are special Unicode characters that direct the algorithm to modify its default behavior. These characters are subdivided into "marks", "embeddings", "isolates", and "overrides". The effects of embeddings, isolates and overrides continue until either a paragraph separator or the matching "pop" character occurs.

Marks

See also: Right-to-left mark, Left-to-right mark, and Arabic letter mark

If a "weak" or "neutral" character is followed by another such character, the algorithm will look at the first neighbouring "strong" character. Sometimes this leads to unintentional display errors. These errors are corrected or prevented with "pseudo-strong" characters. Such Unicode control characters are called marks. A mark (U+200E LEFT-TO-RIGHT MARK (LRM) or U+200F RIGHT-TO-LEFT MARK (RLM)) is inserted at a location to make an enclosed weak or neutral character inherit the mark's writing direction.

For example, to correctly display the U+2122 TRADE MARK SIGN for an English brand name (LTR) in an Arabic (RTL) passage, an LRM is inserted after the trademark symbol if the symbol is not followed by LTR text (e.g. "قرأ Wikipedia™‎ طوال اليوم."). If the LRM is not added, the neutral character ™ will be neighbored by a strong LTR character and a strong RTL character. Hence, in an RTL context, it will be considered to be RTL, and displayed in an incorrect order (e.g. "قرأ Wikipedia™ طوال اليوم.").
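The trademark rule above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a standard routine: the helpers `next_strong` and `add_lrm_after` are hypothetical, "followed by LTR text" is approximated as "the next strong character is L", and only `unicodedata.bidirectional` comes from the standard library.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK (its own Bidi_Class is L)
TM = "\u2122"   # U+2122 TRADE MARK SIGN


def next_strong(text: str, start: int):
    """Return 'L', 'R' or 'AL' for the first strong character at or after
    start, or None if there is none."""
    for ch in text[start:]:
        cls = unicodedata.bidirectional(ch)
        if cls in ("L", "R", "AL"):
            return cls
    return None


def add_lrm_after(text: str, symbol: str = TM) -> str:
    """Hypothetical helper: insert an LRM after each `symbol` unless the
    next strong character is already left-to-right."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch == symbol and next_strong(text, i + 1) != "L":
            out.append(LRM)
    return "".join(out)


arabic = "قرأ Wikipedia\u2122 طوال اليوم."
fixed = add_lrm_after(arabic)
print(fixed.index(LRM) == fixed.index(TM) + 1)  # True
```

Because the LRM is itself a strong L character, running the helper a second time on its own output adds nothing.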

Embeddings

The "embedding" directional formatting characters are the classical Unicode method of explicit formatting, and as of Unicode 6.3, are being discouraged in favor of "isolates". An "embedding" signals that a piece of text is to be treated as directionally distinct. The text within the scope of the embedding formatting characters is not independent of the surrounding text. Also, characters within an embedding can affect the ordering of characters outside. Unicode 6.3 recognized that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use.

Isolates

The "isolate" directional formatting characters signal that a piece of text is to be treated as directionally isolated from its surroundings. As of Unicode 6.3, these are the formatting characters that are being encouraged in new documents – once target platforms are known to support them. These formatting characters were introduced after it became apparent that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use. Unlike the legacy 'embedding' directional formatting characters, 'isolate' characters have no effect on the ordering of the text outside their scope. Isolates can be nested, and may be placed within embeddings and overrides.
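A common use of isolates is inserting text of unknown direction, such as a user name, into a template. The minimal sketch below wraps a string in LRI, RLI or FSI plus PDI. The helper name `isolate` and its `direction` parameter are hypothetical. With the isolate, a number that follows a Hebrew name in an English sentence can no longer be pulled into the name's right-to-left run. The legacy equivalent would wrap the text in U+202B RLE (or U+202A LRE) and U+202C PDF instead, with the stronger side effects described under Embeddings.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text: str, direction: str = "auto") -> str:
    """Hypothetical helper: wrap text in a directional isolate.

    'auto' uses FSI, whose direction is taken from the first strong
    character inside the isolate."""
    return _OPENERS[direction] + text + PDI


# A user-supplied Hebrew name dropped into an English message template.
name = "\u05e9\u05e8\u05d4"  # שרה
message = f"{isolate(name)} - 3 reviews"
```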

Overrides

The "override" directional formatting characters allow for special cases, such as part numbers (e.g. to force a part number made of mixed English, digits and Hebrew letters to be written from right to left), and are recommended to be avoided wherever possible. As with the other directional formatting characters, "overrides" can be nested one inside another and placed within embeddings and isolates.

Using Unicode to override

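Using U+202D LEFT-TO-RIGHT OVERRIDE (LRO) forces the characters that follow to be displayed left-to-right, whatever their own directionality; for example, Hebrew letters inside an LRO scope are laid out left-to-right. Similarly, using U+202E RIGHT-TO-LEFT OVERRIDE (RLO) forces the characters that follow to be displayed right-to-left, so Latin letters inside an RLO scope appear in reverse order. An override lasts until the matching U+202C POP DIRECTIONAL FORMATTING or the end of the paragraph. Refer to the Unicode Bidirectional Algorithm.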
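To make the effect of RLO concrete, here is a deliberately simplified toy model. It is an assumption-laden sketch, not the Unicode Bidirectional Algorithm: it only handles RLO ... PDF spans inside an otherwise left-to-right line, reverses the characters in each span, and hides the control characters. The function name `toy_render_rlo` is hypothetical. The example shows one way a file name that really ends in ".exe" can be made to look like a ".txt" file.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def toy_render_rlo(line: str) -> str:
    """Toy model of an RLO ... PDF span inside a left-to-right line: the
    characters in the span are shown in reverse order and the control
    characters are invisible. Nesting, embeddings, isolates and everything
    else the real algorithm does are ignored."""
    out, span, inside = [], [], False
    for ch in line:
        if ch == RLO:
            inside = True  # nested RLOs are not tracked in this toy
        elif ch == PDF:
            if inside:
                out.extend(reversed(span))
                span, inside = [], False
            # a PDF with no open override is simply dropped
        elif inside:
            span.append(ch)
        else:
            out.append(ch)
    out.extend(reversed(span))  # an unterminated override runs to end of line
    return "".join(out)


print(toy_render_rlo("invoice_\u202etxt.exe\u202c"))  # invoice_exe.txt
```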

Pops

The "pop" directional formatting characters end a scope: U+202C POP DIRECTIONAL FORMATTING (PDF) terminates the scope of the most recent "embedding" or "override", and U+2069 POP DIRECTIONAL ISOLATE (PDI) terminates the scope of the most recent "isolate".
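The algorithm itself tolerates unmatched controls: open scopes end at the paragraph separator, and a pop with nothing to close is ignored. A linter may still want to flag them. The stack-based sketch below is a hypothetical helper, not part of any library. It pairs PDF with embeddings and overrides and PDI with isolates, and treats each line as a separate paragraph. It models a PDI closing an isolate also closing any embeddings still open inside it, and a PDF being unable to reach past an open isolate.

```python
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def unmatched_bidi_controls(text: str) -> list:
    """Hypothetical lint helper: return the indices of openers that are
    never popped and of pops that close nothing."""
    problems = []
    stack = []  # entries are (index, "embed") or (index, "isolate")
    for i, ch in enumerate(text):
        if ch == "\n":
            # A paragraph separator implicitly ends every open scope.
            problems.extend(idx for idx, _ in stack)
            stack.clear()
        elif ch in (LRE, RLE, LRO, RLO):
            stack.append((i, "embed"))
        elif ch in (LRI, RLI, FSI):
            stack.append((i, "isolate"))
        elif ch == PDF:
            # PDF closes only an embedding/override, never an open isolate.
            if stack and stack[-1][1] == "embed":
                stack.pop()
            else:
                problems.append(i)
        elif ch == PDI:
            if any(kind == "isolate" for _, kind in stack):
                # Close the innermost isolate and any embeddings inside it.
                while True:
                    idx, kind = stack.pop()
                    if kind == "isolate":
                        break
                    problems.append(idx)
            else:
                problems.append(i)
    problems.extend(idx for idx, _ in stack)
    return sorted(problems)
```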

Runs

In the algorithm, each sequence of concatenated strong characters is called a "run". A "weak" character that is located between two "strong" characters with the same orientation will inherit their orientation. A "weak" character that is located between two "strong" characters with a different writing direction will inherit the main context's writing direction (in an LTR document the character will become LTR, in an RTL document, it will become RTL).
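The simplified rule in the paragraph above can be written as a short function. This is a toy model of that description only. The real algorithm resolves weak types and neutral types with separate sets of rules (its W and N rules), and European digits inside right-to-left text are still laid out left to right. The names `strong_direction`, `resolve_simplified` and `runs` are hypothetical. Arabic letters (AL) are treated as R, and the start and end of the text count as the base direction.

```python
import unicodedata
from itertools import groupby


def strong_direction(ch: str):
    """'L' or 'R' for strong characters (AL counts as R), else None."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None


def resolve_simplified(text: str, base: str = "L") -> str:
    """Give each character a direction: a non-strong character takes the
    direction shared by the nearest strong characters on both sides, or
    the base direction if they differ."""
    dirs = [strong_direction(ch) for ch in text]
    out = []
    for i, d in enumerate(dirs):
        if d is None:
            before = next((x for x in reversed(dirs[:i]) if x), base)
            after = next((x for x in dirs[i + 1:] if x), base)
            d = before if before == after else base
        out.append(d)
    return "".join(out)


def runs(resolved: str):
    """Group consecutive equal directions into (direction, length) runs."""
    return [(d, len(list(group))) for d, group in groupby(resolved)]


print(resolve_simplified("a \u05e9", base="L"))  # LLR
print(resolve_simplified("a \u05e9", base="R"))  # LRR
```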

Table of possible BiDi character types

Bidirectional character type (Bidi_Class Unicode character property)[1]
Type[2] | Description | Strength | Directionality | General scope | Bidi_Control character[3]
L | Left-to-Right | Strong | L-to-R | Most alphabetic and syllabic characters, Chinese characters, non-European or non-Arabic digits, LRM character, ... | U+200E LEFT-TO-RIGHT MARK (LRM)
R | Right-to-Left | Strong | R-to-L | Adlam, Garay, Hebrew, Mandaic, Mende Kikakui, N'Ko, Samaritan, ancient scripts like Kharoshthi and Nabataean, RLM character, ... | U+200F RIGHT-TO-LEFT MARK (RLM)
AL | Arabic Letter | Strong | R-to-L | Arabic, Hanifi Rohingya, Sogdian, Syriac, and Thaana alphabets, and most punctuation specific to those scripts, ALM character, ... | U+061C ARABIC LETTER MARK (ALM)
EN | European Number | Weak | | European digits, Eastern Arabic-Indic digits, Coptic epact numbers, ... |
ES | European Separator | Weak | | plus sign, minus sign, ... |
ET | European Number Terminator | Weak | | degree sign, currency symbols, ... |
AN | Arabic Number | Weak | | Arabic-Indic digits, Arabic decimal and thousands separators, Rumi digits, Hanifi Rohingya digits, ... |
CS | Common Number Separator | Weak | | colon, comma, full stop, no-break space, ... |
NSM | Nonspacing Mark | Weak | | Characters in General Categories Mark, nonspacing, and Mark, enclosing (Mn, Me) |
BN | Boundary Neutral | Weak | | Default ignorables, non-characters, control characters other than those explicitly given other types |
B | Paragraph Separator | Neutral | | paragraph separator, appropriate Newline Functions, higher-level protocol paragraph determination |
S | Segment Separator | Neutral | | Tabs |
WS | Whitespace | Neutral | | space, figure space, line separator, form feed, General Punctuation block spaces (smaller set than the Unicode whitespace list) |
ON | Other Neutrals | Neutral | | All other characters, including object replacement character |
LRE | Left-to-Right Embedding | Explicit | L-to-R | LRE character only | U+202A LEFT-TO-RIGHT EMBEDDING (LRE)
LRO | Left-to-Right Override | Explicit | L-to-R | LRO character only | U+202D LEFT-TO-RIGHT OVERRIDE (LRO)
RLE | Right-to-Left Embedding | Explicit | R-to-L | RLE character only | U+202B RIGHT-TO-LEFT EMBEDDING (RLE)
RLO | Right-to-Left Override | Explicit | R-to-L | RLO character only | U+202E RIGHT-TO-LEFT OVERRIDE (RLO)
PDF | Pop Directional Format | Explicit | | PDF character only | U+202C POP DIRECTIONAL FORMATTING (PDF)
LRI | Left-to-Right Isolate | Explicit | L-to-R | LRI character only | U+2066 LEFT-TO-RIGHT ISOLATE (LRI)
RLI | Right-to-Left Isolate | Explicit | R-to-L | RLI character only | U+2067 RIGHT-TO-LEFT ISOLATE (RLI)
FSI | First Strong Isolate | Explicit | | FSI character only | U+2068 FIRST STRONG ISOLATE (FSI)
PDI | Pop Directional Isolate | Explicit | | PDI character only | U+2069 POP DIRECTIONAL ISOLATE (PDI)
Notes
1. Unicode Bidirectional Algorithm (UAX #9), as of Unicode version 16.0.
2. Possible bidirectional character types for the character property Bidi_Class (or "type").
3. Bidi_Control characters: twelve Bidi_Control formatting characters are defined. They are invisible and have no effect apart from directionality. Nine of them have a unique, overruling BiDi type that is used by the algorithm; their type is also their acronym (e.g. the character LRE has BiDi type LRE).
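Note 3 can be checked against the Unicode character database bundled with Python. The short script below is a sketch that assumes only the standard `unicodedata` module. It lists the twelve Bidi_Control characters with their Bidi_Class and name, then counts those whose class is one of the nine explicit types. ALM, LRM and RLM report the ordinary strong classes AL, L and R instead.

```python
import unicodedata

# The twelve Bidi_Control characters listed in note 3.
BIDI_CONTROLS = [
    "\u061c", "\u200e", "\u200f",                        # ALM, LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",    # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",              # LRI, RLI, FSI, PDI
]
EXPLICIT_TYPES = {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

for ch in BIDI_CONTROLS:
    print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):4}  {unicodedata.name(ch)}")

# Characters whose Bidi_Class is one of the nine explicit types.
own_type = [ch for ch in BIDI_CONTROLS
            if unicodedata.bidirectional(ch) in EXPLICIT_TYPES]
print(len(own_type))  # 9
```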

Security

Unicode bidirectional characters are used in the Trojan Source vulnerability.[2]

Visual Studio Code has highlighted BiDi control characters since version 1.62, released in October 2021.[3]

Visual Studio has highlighted BiDi control characters since version 17.0.3, released on December 14, 2021.[4]
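A simple way to guard against this class of attack is to scan source files for the twelve Bidi_Control characters and report where they occur, in the spirit of the editor warnings mentioned above. The sketch below is a hypothetical helper, not how either editor implements the feature. It yields 1-based line and column numbers together with each character's abbreviation.

```python
BIDI_CONTROL_NAMES = {
    "\u061c": "ALM", "\u200e": "LRM", "\u200f": "RLM",
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def find_bidi_controls(source: str):
    """Hypothetical helper: yield (line, column, name) for every
    Bidi_Control character, using 1-based line and column numbers."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            name = BIDI_CONTROL_NAMES.get(ch)
            if name:
                yield lineno, col, name


src = 'x = 1\naccess = "user\u202e \u2066// admin\u2069 \u2066"\n'
for hit in find_bidi_controls(src):
    print(hit)
```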

Scripts using bidirectional text

Egyptian hieroglyphs

Egyptian hieroglyphs were written bidirectionally: signs that had a distinct "head" or "tail" faced the beginning of the line.

Chinese characters and other CJK scripts

Chinese characters can be written in either direction as well as vertically (top to bottom then right to left), especially in signs (such as plaques), but the orientation of the individual characters does not change. This can often be seen on tour buses in China, where the company name customarily runs from the front of the vehicle to its rear — that is, from right to left on the right side of the bus, and from left to right on the left side of the bus. English texts on the right side of the vehicle are also quite commonly written in reverse order. (See pictures of tour bus and post vehicle below.)

Likewise, other CJK scripts made up of the same square characters, such as the Japanese writing system and Korean writing system, can also be written in any direction, although horizontally left-to-right, top-to-bottom and vertically top-to-bottom right-to-left are the two most common forms.

  • The right side (text runs from right to left, including the English text)
  • The left side (text runs from left to right)
  • On the right side of this Hainan Airlines aircraft, the text runs from right to left (空航南海).
  • The left side of this Hainan Airlines aircraft, however, shows the text running from left to right (海南航空).
  • A photo that shows text on both sides of a China Post vehicle. On the right door, china post appears as tsop anihc.

Boustrophedon

Boustrophedon is a writing style found in ancient Greek inscriptions, in Old Sabaic (an Old South Arabian language) and in Hungarian runes. This method of writing alternates direction, and usually reverses the individual characters, on each successive line.
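As a rough plain-text illustration of the alternating layout, the hypothetical helper below reverses the character order of every second line. It is only an approximation. Real boustrophedon also mirrors the letter shapes, which plain text cannot express. Reversing a Python string also reverses code points, so combining marks would end up attached to the wrong base letter.

```python
def boustrophedon(lines):
    """Hypothetical helper: keep even-indexed lines as they are and reverse
    the character order of odd-indexed lines."""
    return [line if i % 2 == 0 else line[::-1] for i, line in enumerate(lines)]


for row in boustrophedon(["THIS TEXT RUNS", "AND THEN TURNS", "BACK AGAIN"]):
    print(row)
```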

Moon type

Moon type is an embossed adaptation of the Latin alphabet invented as a tactile alphabet for the blind. Initially the text changed direction (but not character orientation) at the end of the lines. Special embossed lines connected the end of a line and the beginning of the next.[5] Around 1990, it changed to a left-to-right orientation.

References

  1. "UAX #9: Unicode Bi-directional Algorithm". Unicode.org. 2018-05-09. Retrieved 26 June 2018.
  2. "Trojan Source Attacks". trojansource.codes. Retrieved 17 January 2022.
  3. "Visual Studio Code October 2021". code.visualstudio.com. Retrieved 11 November 2021.
  4. "Visual Studio 2022 version 17.0 Release Notes". docs.microsoft.com. Retrieved 17 January 2022.
  5. Moon Type for the Blind. Ramseyer Bible Collection, Kathryn A. Martin Library, University of Minnesota Duluth.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Bidirectional_text&oldid=1297936433"