26. Unicode in ES6

This chapter explains the improved support for Unicode that ECMAScript 6 brings. For a general introduction to Unicode, read Chap. “Unicode and JavaScript” in “Speaking JavaScript”.

26.1. Unicode is better supported in ES6
26.2. Escape sequences in ES6
- 26.2.1. Where can escape sequences be used?
- 26.2.2. Escape sequences in the ES6 spec

26.1 Unicode is better supported in ES6

There are three areas in which ECMAScript 6 has improved support for Unicode:

Unicode escapes for code points beyond 16 bits: `\u{···}`
Can be used in identifiers, string literals, template literals and regular expression literals. They are explained in the next section.
Strings:
- Iteration honors Unicode code points.
- Read code point values via `String.prototype.codePointAt()`.
- Create a string from code point values via `String.fromCodePoint()`.
Regular expressions:
- New flag `/u` (plus boolean property `unicode`) improves handling of surrogate pairs.
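The three areas listed above can be tried out in a short sketch. The variable names and the sample code point U+1F600 are chosen purely for illustration; any code point beyond 16 bits would behave the same way.

```javascript
// U+1F600 lies outside the BMP, so it takes two UTF-16 code units.
const str = 'a\u{1F600}b';

// Iteration (here via the spread operator) yields code points, not code units.
const codePoints = [...str];
console.log(str.length);        // 4 (code units)
console.log(codePoints.length); // 3 (code points)

// Read a code point value. The index still counts code units.
const cp = str.codePointAt(1);
console.log(cp.toString(16)); // 1f600

// Create a string from a code point value.
const smiley = String.fromCodePoint(0x1F600);
console.log(smiley === '\u{1F600}'); // true

// With flag /u, the dot matches a whole surrogate pair.
const dotWithU = /^.$/u.test(smiley);
const dotWithoutU = /^.$/.test(smiley);
const unicodeProp = /^.$/u.unicode;
console.log(dotWithU, dotWithoutU, unicodeProp); // true false true
```

Note that `length` and the index passed to `codePointAt()` are still measured in UTF-16 code units; only iteration moves in steps of whole code points.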

Additionally, ES6 is based on Unicode version 5.1.0, whereas ES5 is based on Unicode version 3.0.

26.2 Escape sequences in ES6

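There are three parameterized escape sequences for representing characters in JavaScript:

Hex escape (exactly two hexadecimal digits): `\xHH`
4-digit Unicode escape (exactly four hexadecimal digits): `\uHHHH`
Unicode code point escape (one or more hexadecimal digits): `\u{···}`

26.2.1 Where can escape sequences be used?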

The escape sequences can be used in the following locations:

| | `\uHHHH` | `\u{···}` | `\xHH` |
|---|---|---|---|
| Identifiers | ✔ | ✔ | |
| String literals | ✔ | ✔ | ✔ |
| Template literals | ✔ | ✔ | ✔ |
| Regular expression literals | ✔ | Only with flag `/u` | ✔ |

Identifiers:

A 4-digit Unicode escape `\uHHHH` becomes a single code point.
A Unicode code point escape `\u{···}` becomes a single code point.

    > const hello = 123;
    > hell\u{6F}
    123

String literals:

Strings are internally stored as UTF-16 code units.
A hex escape `\xHH` contributes a UTF-16 code unit.
A 4-digit Unicode escape `\uHHHH` contributes a UTF-16 code unit.
A Unicode code point escape `\u{···}` contributes the UTF-16 encoding of its code point (one or two UTF-16 code units).
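A small sketch of these rules for string literals follows. The variable names are illustrative only, and U+1F600 is just one example of a code point that needs two code units.

```javascript
// Each of these escapes denotes the letter z (U+007A), one code unit.
const hexEscape = '\x7A';
const unicodeEscape = '\u007A';
const codePointEscape = '\u{7A}';
console.log(hexEscape === 'z', unicodeEscape === 'z', codePointEscape === 'z');
// true true true

// A code point beyond 16 bits contributes two code units (a surrogate pair).
const astral = '\u{1F600}';
console.log(astral.length);             // 2
console.log(astral === '\uD83D\uDE00'); // true
```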

Template literals:

In template literals, escape sequences are handled like in string literals.
In tagged templates, how escape sequences are interpreted depends on the tag function. It can choose between two interpretations:
- Cooked: escape sequences are handled like in string literals.
- Raw: escape sequences are handled as a sequence of characters.

    > `hell\u{6F}` // cooked
    'hello'
    > String.raw`hell\u{6F}` // raw
    'hell\\u{6F}'
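To see both interpretations side by side, a tag function can read the cooked strings from its first argument and the raw strings from that argument's `raw` property. The function `cookedAndRaw` below is a hypothetical helper written for this sketch, not part of any API.

```javascript
// Hypothetical tag function: returns both interpretations
// of the first template string.
function cookedAndRaw(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

const result = cookedAndRaw`hell\u{6F}`;
console.log(result.cooked);     // hello
console.log(result.raw);        // hell\u{6F}
console.log(result.raw.length); // 10 (the escape stays as six characters)
```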

Regular expressions:

Unicode code point escapes are only allowed if the flag `/u` is set, because otherwise `\u{3}` is interpreted as three times the character `u`:
```
  > /^\u{3}$/.test('uuu')
  true
```
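For comparison, here is a sketch that runs the same pattern with and without the flag `/u`. The variable names are illustrative only.

```javascript
// Without /u, \u{3} is read as the character u followed by the quantifier {3}.
const bmpPattern = /^\u{3}$/;
const bmpMatchesUuu = bmpPattern.test('uuu');
const bmpMatchesU0003 = bmpPattern.test('\u0003');
console.log(bmpMatchesUuu, bmpMatchesU0003); // true false

// With /u, \u{3} is a code point escape for U+0003.
const unicodePattern = /^\u{3}$/u;
const uniMatchesUuu = unicodePattern.test('uuu');
const uniMatchesU0003 = unicodePattern.test('\u0003');
console.log(uniMatchesUuu, uniMatchesU0003); // false true
```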

26.2.2 Escape sequences in the ES6 spec

Various information:

The spec treats source code as a sequence of Unicode code points: “Source Text”
Unicode escape sequences in identifiers: “Names and Keywords”
Strings are internally stored as sequences of UTF-16 code units: “String Literals”
Strings – how various escape sequences are translated to UTF-16 code units: “Static Semantics: SV”
Template literals – how various escape sequences are translated to UTF-16 code units: “Static Semantics: TV and TRV”

26.2.2.1 Regular expressions

The spec distinguishes between BMP patterns (flag `/u` not set) and Unicode patterns (flag `/u` set). Sect. “Pattern Semantics” explains that they are handled differently and how.

As a reminder, here is how grammar rules are parameterized in the spec:

If a grammar rule `R` has the subscript `[U]` then that means there are two versions of it: `R` and `R_U`.
Parts of the rule can pass on the subscript via `[?U]`.
If a part of a rule has the prefix `[+U]` it only exists if the subscript `[U]` is present.
If a part of a rule has the prefix `[~U]` it only exists if the subscript `[U]` is not present.

You can see this parameterization in action in Sect. “Patterns”, where the subscript `[U]` creates separate grammars for BMP patterns and Unicode patterns:

IdentityEscape: In BMP patterns, many characters can be prefixed with a backslash and are interpreted as themselves (for example: if `\u` is not followed by four hexadecimal digits, it is interpreted as `u`). In Unicode patterns that only works for the following characters (which frees up `\u` for Unicode code point escapes): `^ $ \ . * + ? ( ) [ ] { } |`
RegExpUnicodeEscapeSequence: `"\u{" HexDigits "}"` is only allowed in Unicode patterns. In those patterns, lead and trail surrogates are also grouped to help with UTF-16 decoding.
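The difference between the two grammars can be observed at runtime. The sketch below assumes a typical engine such as Node.js. The helper `compiles` is hypothetical and exists only so that an invalid pattern produces `false` instead of a syntax error when the file is parsed.

```javascript
// Hypothetical helper: does the pattern source compile with the given flags?
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false;
  }
}

// A lone \u is an identity escape in BMP patterns only.
console.log(compiles('\\u', ''));  // true
console.log(compiles('\\u', 'u')); // false
// Syntax characters may still be escaped in Unicode patterns.
console.log(compiles('\\$', 'u')); // true

// With /u, a lead and a trail surrogate inside a class form one code point.
const pair = '\uD83D\uDE00';
const groupedWithU = /^[\uD83D\uDE00]$/u.test(pair);
const groupedWithoutU = /^[\uD83D\uDE00]$/.test(pair);
console.log(groupedWithU, groupedWithoutU); // true false
```

Without `/u`, the class contains two separate code units and can only match one of them at a time, which is why the anchored pattern fails on the two-unit string.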

Sect. “CharacterEscape” explains how various escape sequences are translated to characters (roughly: either code units or code points).
