A Javascript implementation of the Unicode 9.0.0 Bidirectional Algorithm
bbc/unicode-bidirectional
This is an implementation of the Unicode Bidirectional Algorithm (UAX #9) that works in both Browser and Node.js environments. The implementation is conformant as per definition UAX#9-C1.
    npm install unicode-bidirectional --save

unicode-bidirectional is declared as a Universal Module (UMD), meaning it can be used with all conventional Javascript module systems:
1. ES6

       import { resolve, reorder } from 'unicode-bidirectional';

       const codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2];
       const levels = resolve(codepoints, 0);  // [0, 0, 0, 1, 1, 1]
       const reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]

2. CommonJS

       var UnicodeBidirectional = require('unicode-bidirectional/dist/unicode.bidirectional');
       var resolve = UnicodeBidirectional.resolve;
       var reorder = UnicodeBidirectional.reorder;

       var codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2];
       var levels = resolve(codepoints, 0);  // [0, 0, 0, 1, 1, 1]
       var reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]

3. RequireJS

       require(['UnicodeBidirectional'], function (UnicodeBidirectional) {
         var resolve = UnicodeBidirectional.resolve;
         var reorder = UnicodeBidirectional.reorder;

         var codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2];
         var levels = resolve(codepoints, 0);  // [0, 0, 0, 1, 1, 1]
         var reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]
       });

4. HTML5 <script> tag

       <script src="unicode.bidirectional.js"></script> <!-- exposes window.UnicodeBidirectional -->
       <script>
         var resolve = UnicodeBidirectional.resolve;
         var reorder = UnicodeBidirectional.reorder;

         var codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2];
         var levels = resolve(codepoints, 0);  // [0, 0, 0, 1, 1, 1]
         var reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]
       </script>
You can download unicode.bidirectional.js from Releases. Using this file with a <script> tag will expose UnicodeBidirectional as a global variable on the window object.
The resolve function returns the resolved level of each codepoint in codepoints.[1] This levels array determines: (i) the relative nesting of LTR and RTL characters, and hence (ii) how characters should be reversed when displayed on the screen.
The input codepoints are assumed to all be in one paragraph that has a base direction of paragraphLevel: a Number that is either 0 or 1 and represents whether the paragraph is left-to-right (0) or right-to-left (1). automaticLevel is an optional Boolean flag that, when present and set to true, causes this function to ignore the paragraphLevel argument and instead attempt to deduce the paragraph level from the codepoints.[2]
The input array is not mutated.
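To make the automaticLevel behaviour more concrete, here is a simplified sketch of how a paragraph level can be deduced in the spirit of rules P2 and P3: scan for the first strongly directional character and return 1 if it is right-to-left, otherwise 0. This is not the library's code. The helper names strongClass and deduceParagraphLevel are hypothetical, the classification only covers ASCII letters, Hebrew letters and Arabic letters, and the isolate handling that P2 requires is left out.

```javascript
// Hypothetical helper: a tiny stand-in for a real Bidi_Class lookup.
// Only three ranges are classified; everything else counts as "not strong".
function strongClass(cp) {
  if ((cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A)) return 'L'; // ASCII letters
  if (cp >= 0x05D0 && cp <= 0x05EA) return 'R';  // Hebrew letters
  if (cp >= 0x0620 && cp <= 0x064A) return 'AL'; // Arabic letters
  return null;
}

// Simplified P2/P3: the first strong character decides the paragraph level.
// Assumption: no isolate initiators are present, so nothing needs skipping.
function deduceParagraphLevel(codepoints) {
  for (const cp of codepoints) {
    const cls = strongClass(cp);
    if (cls === 'L') return 0;
    if (cls === 'R' || cls === 'AL') return 1;
  }
  return 0; // P3: no strong character found, default to left-to-right
}
```

With this sketch, a paragraph that opens with punctuation followed by Hebrew letters is treated as right-to-left, because punctuation is not strongly directional.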
The reorder function returns the codepoints in codepoints reordered (i.e. permuted) according to the levels array.[3]
Neither of the two input arrays is mutated.
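For readers unfamiliar with rule L2, the reordering can be sketched as follows: going from the highest level down to the lowest odd level, reverse every contiguous run of characters whose level is at that level or higher. The function name reorderSketch is hypothetical and this is an illustration rather than the library's implementation. It assumes every entry of levels is a number, so 'x' entries are not handled.

```javascript
// Sketch of UAX#9 rule L2. Assumes levels contains only numbers.
function reorderSketch(codepoints, levels) {
  const result = codepoints.slice(); // work on a copy, inputs stay untouched
  const oddLevels = levels.filter((level) => level % 2 === 1);
  if (oddLevels.length === 0) return result; // purely left-to-right, nothing to reverse

  const highest = Math.max(...levels);
  const lowestOdd = Math.min(...oddLevels);

  for (let level = highest; level >= lowestOdd; level--) {
    let i = 0;
    while (i < levels.length) {
      if (levels[i] < level) { i++; continue; }
      // Find the end of the run whose levels are all >= level.
      let end = i;
      while (end < levels.length && levels[end] >= level) end++;
      // Reverse result[i..end) in place.
      for (let a = i, b = end - 1; a < b; a++, b--) {
        [result[a], result[b]] = [result[b], result[a]];
      }
      i = end;
    }
  }
  return result;
}
```

Applied to the codepoints and levels from the usage examples above, the sketch reverses only the three Hebrew letters and leaves the leading punctuation in place.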
Returns the reordering that levels represents as a permutation array. When this array has an element at index i with value j, it denotes that the codepoint previously positioned at index i is now positioned at index j.[4]
The input array is not mutated. The IGNORE_INVISIBLE parameter controls whether invisible characters (characters with a level of 'x'[5]) are included in the permutation array. By default, they are included in the permutation (they are not ignored, hence IGNORE_INVISIBLE is false).
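The index convention is easy to get backwards, so here is a small sketch of applying such a permutation array to a list of items. The function name applyPermutation is hypothetical and the permutation in the usage note is written by hand, not produced by the library.

```javascript
// permutation[i] === j means: the item at index i moves to index j.
function applyPermutation(items, permutation) {
  const out = new Array(items.length);
  permutation.forEach((j, i) => {
    out[j] = items[i];
  });
  return out;
}
```

For example, applyPermutation(['a', 'b', 'c'], [2, 0, 1]) places 'a' at index 2, 'b' at index 0 and 'c' at index 1, giving ['b', 'c', 'a'].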
Replaces each codepoint in codepoints with its mirrored glyph according to rule L4 and the levels array.
Neither of the two input arrays is mutated.
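As an illustration of rule L4, a character is shown with its mirrored glyph only when its resolved direction is right-to-left (an odd level) and it has a mirrored counterpart. The sketch below assumes a small hand-written mirror table (SAMPLE_MIRRORS) instead of the full BidiMirroring.txt data, and the name mirrorSketch is hypothetical. It is not the library's implementation.

```javascript
// Assumption: a tiny sample of mirrored pairs, keyed by codepoint.
const SAMPLE_MIRRORS = new Map([
  [0x28, 0x29], [0x29, 0x28], // ( )
  [0x3C, 0x3E], [0x3E, 0x3C], // < >
  [0x5B, 0x5D], [0x5D, 0x5B], // [ ]
]);

function mirrorSketch(codepoints, levels) {
  // map() builds a new array, so neither input is mutated.
  return codepoints.map((cp, i) => {
    const level = levels[i];
    const isRtl = typeof level === 'number' && level % 2 === 1;
    return isRtl && SAMPLE_MIRRORS.has(cp) ? SAMPLE_MIRRORS.get(cp) : cp;
  });
}
```

Here a "(" at level 1 becomes ")", while the same character at level 0, or at a level of 'x', is left as it is.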
An object containing metadata used by the bidirectional algorithm. This object includes the following keys:
- mirrorMap: a map from a codepoint to its mirrored counterpart, e.g. looking up "<" gives ">". If a codepoint does not have a mirrored counterpart, then there is no key-value pair in the map and so a lookup will give undefined.[6]
- oppositeBracket: a map from a codepoint to its bracket pair counterpart, e.g. looking up "(" gives ")". If a codepoint does not have a bracket pair counterpart, then there is no key-value pair in the map and so a lookup will give undefined.[7]
- openingBrackets: a set containing all brackets that are opening brackets.[7]
- closingBrackets: a set containing all brackets that are closing brackets.[7]
Additional Notes:
For all the above functions, codepoints are represented by an Array of Numbers where each Number denotes the Unicode codepoint of the character, that is, an integer between 0x0 and 0x10FFFF inclusive. levels are represented by an Array of Numbers where each Number is an integer between 0 and 127 inclusive. One or more entries of levels may be the string 'x'. This denotes a character that does not have a level.[5]
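Since the functions work on arrays of codepoints, not on strings, callers usually need a conversion step. A minimal sketch using only standard ES2015 features is shown below. The helper names toCodepoints and fromCodepoints are hypothetical and are not part of this library.

```javascript
// Array.from iterates a string by code point, so characters outside the
// Basic Multilingual Plane produce a single Number, not two surrogates.
function toCodepoints(text) {
  return Array.from(text, (ch) => ch.codePointAt(0));
}

// Inverse direction: build a string from an array of codepoints.
function fromCodepoints(codepoints) {
  return String.fromCodePoint(...codepoints);
}
```

With these helpers, toCodepoints('()*') gives [0x28, 0x29, 0x2A], and fromCodepoints turns a reordered array back into a string for display.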
[1]: Codepoints are automatically converted to NFC normal form if they are not already in that form.
[2]: This function deduces the paragraph level according to: UAX#P1, UAX#P2 and UAX#P3.
[3]: This is an implementation of UAX#9-L2.
[4]: More formally known as the one-line notation for permutations. See Wikipedia.
[5]: Some characters have a level of x: the levels array has the string 'x' instead of a number. This is expected behaviour. The Unicode Bidirectional Algorithm (by rule X9) does not assign a level to certain invisible characters and control characters. They are essentially ignored by the algorithm. They are invisible and so have no impact on the visual RTL/LTR ordering of characters. Most of the invisible characters that fall into this category are in this list.
[6]: This is taken from BidiMirroring.txt.
[7]: This is taken from BidiBrackets.txt.
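Notes [1] and [5] affect how callers line up their own data with the returned levels. The sketch below shows two small helpers a caller might write: one converts a string to NFC before taking codepoints, the other drops every position whose level is 'x'. Both names (toNfcCodepoints, dropUnleveled) are hypothetical, and the levels in the usage note are written by hand, not produced by the library.

```javascript
// Normalize to NFC first, so the caller's array is already in NFC form.
function toNfcCodepoints(text) {
  return Array.from(text.normalize('NFC'), (ch) => ch.codePointAt(0));
}

// Keep only the positions that were assigned a numeric level.
// Returns new arrays; the inputs are not modified.
function dropUnleveled(codepoints, levels) {
  const keptCodepoints = [];
  const keptLevels = [];
  levels.forEach((level, i) => {
    if (level !== 'x') {
      keptCodepoints.push(codepoints[i]);
      keptLevels.push(level);
    }
  });
  return { codepoints: keptCodepoints, levels: keptLevels };
}
```

For instance, toNfcCodepoints('e\u0301') yields the single codepoint 0xE9, and dropUnleveled([0x61, 0x202A, 0x62], [0, 'x', 0]) keeps only 0x61 and 0x62 with levels [0, 0].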
unicode-bidirectional uses ECMAScript 2015 (ES6) features that are not fully supported by Internet Explorer and older versions of other browsers.
If you are targeting these browsers, you'll need to add one or more polyfill libraries to fill in these features (for example, es6-shim and unorm).
For other Javascript Unicode Implementations see:
- devongovett/grapheme-breaker – Unicode Grapheme Cluster Breaking Algorithm (UAX #29)
- devongovett/linebreak – Unicode Line Breaking Algorithm (UAX #14)
MIT.
Copyright (c) 2017 British Broadcasting Corporation