MDN Web Docs

String.prototype.normalize()


The normalize() method of String values returns the Unicode Normalization Form of this string.

Try it

const name1 = "\u0041\u006d\u00e9\u006c\u0069\u0065";
const name2 = "\u0041\u006d\u0065\u0301\u006c\u0069\u0065";

console.log(`${name1}, ${name2}`);
// Expected output: "Amélie, Amélie"
console.log(name1 === name2);
// Expected output: false
console.log(name1.length === name2.length);
// Expected output: false

const name1NFC = name1.normalize("NFC");
const name2NFC = name2.normalize("NFC");

console.log(`${name1NFC}, ${name2NFC}`);
// Expected output: "Amélie, Amélie"
console.log(name1NFC === name2NFC);
// Expected output: true
console.log(name1NFC.length === name2NFC.length);
// Expected output: true

Syntax

js
normalize()
normalize(form)

Parameters

form (optional)

One of "NFC", "NFD", "NFKC", or "NFKD", specifying the Unicode Normalization Form. If omitted or undefined, "NFC" is used.

These values have the following meanings:

"NFC"

Canonical Decomposition, followed by Canonical Composition.

"NFD"

Canonical Decomposition.

"NFKC"

Compatibility Decomposition, followed by Canonical Composition.

"NFKD"

Compatibility Decomposition.

Return value

A string containing the Unicode Normalization Form of the given string.

Exceptions

RangeError

Thrown if form isn't one of the values specified above.
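
As a minimal sketch of the behavior described above, the snippet below wraps the call in a hypothetical helper, tryNormalize (not part of any API), that returns null when the form is rejected. It assumes that form names are matched exactly, so a lowercase "nfc" is not one of the accepted values.

```js
// Hypothetical helper (illustration only): returns the normalized string,
// or null when `form` is not a value that normalize() accepts.
function tryNormalize(str, form) {
  try {
    return str.normalize(form);
  } catch (err) {
    if (err instanceof RangeError) {
      return null;
    }
    throw err; // Anything else is unexpected, so re-throw it.
  }
}

const decomposed = "n\u0303"; // "n" followed by a combining tilde

console.log(tryNormalize(decomposed, "NFC") === "\u00F1"); // true
console.log(tryNormalize(decomposed, undefined) === "\u00F1"); // true ("NFC" is the default)
console.log(tryNormalize(decomposed, "nfc")); // null (RangeError was thrown)
```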

Description

Unicode assigns a unique numerical value, called a code point, to each character. For example, the code point for "A" is given as U+0041. However, sometimes more than one code point, or sequence of code points, can represent the same abstract character. The character "ñ", for example, can be represented by either of:

  • The single code point U+00F1.
  • The code point for "n" (U+006E) followed by the code point for the combining tilde (U+0303).
js
const string1 = "\u00F1";
const string2 = "\u006E\u0303";

console.log(string1); // ñ
console.log(string2); // ñ

However, since the code points are different, string comparison will not treat them as equal. And since the number of code points in each version is different, they even have different lengths.

js
const string1 = "\u00F1"; // ñ
const string2 = "\u006E\u0303"; // ñ

console.log(string1 === string2); // false
console.log(string1.length); // 1
console.log(string2.length); // 2

The normalize() method helps solve this problem by converting a string into a normalized form common for all sequences of code points that represent the same characters. There are two main normalization forms, one based on canonical equivalence and the other based on compatibility.

Canonical equivalence normalization

In Unicode, two sequences of code points have canonical equivalence if they represent the same abstract characters, and should always have the same visual appearance and behavior (for example, they should always be sorted in the same way).

You can call normalize() with the "NFD" or "NFC" argument to produce a form of the string that will be the same for all canonically equivalent strings. In the example below we normalize two representations of the character "ñ":

js
let string1 = "\u00F1"; // ñ
let string2 = "\u006E\u0303"; // ñ

string1 = string1.normalize("NFD");
string2 = string2.normalize("NFD");

console.log(string1 === string2); // true
console.log(string1.length); // 2
console.log(string2.length); // 2
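
The same idea can be packaged as a comparison function. This is only a sketch: canonicallyEqual is a hypothetical helper name, and it normalizes both inputs on every call, which may be wasteful if the same strings are compared repeatedly.

```js
// Hypothetical helper (illustration only): true when both strings are
// canonically equivalent, that is, identical once normalized to NFD.
function canonicallyEqual(a, b) {
  return a.normalize("NFD") === b.normalize("NFD");
}

console.log(canonicallyEqual("\u00F1", "\u006E\u0303")); // true
console.log(canonicallyEqual("\u00F1", "n")); // false
```

Either "NFD" or "NFC" would work here, as long as both sides use the same form.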

Composed and decomposed forms

Note that the length of the normalized form under "NFD" is 2. That's because "NFD" gives you the decomposed version of the canonical form, in which single code points are split into multiple combining ones. The decomposed canonical form for "ñ" is "\u006E\u0303".

You can specify "NFC" to get the composed canonical form, in which multiple code points are replaced with single code points where possible. The composed canonical form for "ñ" is "\u00F1":

js
let string1 = "\u00F1"; // ñ
let string2 = "\u006E\u0303"; // ñ

string1 = string1.normalize("NFC");
string2 = string2.normalize("NFC");

console.log(string1 === string2); // true
console.log(string1.length); // 1
console.log(string2.length); // 1
console.log(string2.codePointAt(0).toString(16)); // f1
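
The "where possible" qualification matters: composition only happens when Unicode defines a single precomposed code point for the sequence. The sketch below contrasts "ñ" with "q" followed by a combining tilde, a sequence chosen for illustration because Unicode has no precomposed code point for it.

```js
const composable = "\u006E\u0303"; // n + combining tilde
const notComposable = "\u0071\u0303"; // q + combining tilde

console.log(composable.normalize("NFC").length); // 1
console.log(notComposable.normalize("NFC").length); // 2
console.log(notComposable.normalize("NFC") === notComposable); // true
```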

Compatibility normalization

In Unicode, two sequences of code points are compatible if they represent the same abstract characters, and should be treated alike in some (but not necessarily all) applications.

All canonically equivalent sequences are also compatible, but not vice versa.

For example:

  • the code point U+FB00 represents the ligature "ﬀ". It is compatible with two consecutive U+0066 code points ("ff").
  • the code point U+24B9 represents the symbol "Ⓓ". It is compatible with the U+0044 code point ("D").

In some respects (such as sorting) they should be treated as equivalent, and in some (such as visual appearance) they should not, so they are not canonically equivalent.

You can call normalize() with the "NFKD" or "NFKC" argument to produce a form of the string that will be the same for all compatible strings:

js
let string1 = "\uFB00";
let string2 = "\u0066\u0066";

console.log(string1); // ﬀ
console.log(string2); // ff
console.log(string1 === string2); // false
console.log(string1.length); // 1
console.log(string2.length); // 2

string1 = string1.normalize("NFKD");
string2 = string2.normalize("NFKD");

console.log(string1); // ff <- visual appearance changed
console.log(string2); // ff
console.log(string1 === string2); // true
console.log(string1.length); // 2
console.log(string2.length); // 2

When applying compatibility normalization it's important to consider what you intend to do with the strings, since the normalized form may not be appropriate for all applications. In the example above the normalization is appropriate for search, because it enables a user to find the string by searching for "f". But it may not be appropriate for display, because the visual representation is different.
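
One way to act on this, sketched below, is to normalize only at comparison time and leave the stored string untouched for display. The helper name compatIncludes and the sample word are assumptions made for illustration.

```js
// Hypothetical helper (illustration only): substring search that ignores
// compatibility differences by normalizing both sides to NFKD first.
function compatIncludes(text, query) {
  return text.normalize("NFKD").includes(query.normalize("NFKD"));
}

const stored = "o\uFB00ice"; // "office" written with the U+FB00 ligature

console.log(stored.includes("f")); // false
console.log(compatIncludes(stored, "f")); // true
console.log(compatIncludes(stored, "office")); // true
console.log(stored.length); // 5 (the stored string itself is unchanged)
```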

As with canonical normalization, you can ask for decomposed or composed compatible forms by passing "NFKD" or "NFKC", respectively.
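
A short sketch of the difference, using the characters mentioned earlier in this section. Note that the composition step in "NFKC" is canonical, so it recombines "n" and the combining tilde into "ñ" but does not turn "ff" back into the ligature.

```js
const circledD = "\u24B9"; // Ⓓ
const ligature = "\uFB00"; // ﬀ
const enye = "\u00F1"; // ñ

console.log(circledD.normalize("NFKC")); // D
console.log(ligature.normalize("NFKC").length); // 2 (two U+0066 code points)
console.log(enye.normalize("NFKD").length); // 2 (n + combining tilde)
console.log(enye.normalize("NFKC").length); // 1 (recomposed to U+00F1)
```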

Examples

Using normalize()

js
// Initial string

// U+1E9B: LATIN SMALL LETTER LONG S WITH DOT ABOVE
// U+0323: COMBINING DOT BELOW
const str = "\u1E9B\u0323";

// Canonically-composed form (NFC)

// U+1E9B: LATIN SMALL LETTER LONG S WITH DOT ABOVE
// U+0323: COMBINING DOT BELOW
str.normalize("NFC"); // '\u1E9B\u0323'
str.normalize(); // same as above

// Canonically-decomposed form (NFD)

// U+017F: LATIN SMALL LETTER LONG S
// U+0323: COMBINING DOT BELOW
// U+0307: COMBINING DOT ABOVE
str.normalize("NFD"); // '\u017F\u0323\u0307'

// Compatibly-composed (NFKC)

// U+1E69: LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE
str.normalize("NFKC"); // '\u1E69'

// Compatibly-decomposed (NFKD)

// U+0073: LATIN SMALL LETTER S
// U+0323: COMBINING DOT BELOW
// U+0307: COMBINING DOT ABOVE
str.normalize("NFKD"); // '\u0073\u0323\u0307'
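
Because the four results above can look alike when printed, it may help to inspect them as code points. The following sketch uses a hypothetical helper, toCodePoints, to list each normalized form in "U+XXXX" notation.

```js
// Hypothetical helper (illustration only): lists the code points of a
// string as "U+XXXX" labels.
function toCodePoints(s) {
  return Array.from(s, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    return "U+" + hex.padStart(4, "0");
  });
}

const sample = "\u1E9B\u0323";

for (const form of ["NFC", "NFD", "NFKC", "NFKD"]) {
  console.log(form, toCodePoints(sample.normalize(form)).join(" "));
}
// NFC U+1E9B U+0323
// NFD U+017F U+0323 U+0307
// NFKC U+1E69
// NFKD U+0073 U+0323 U+0307
```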

Specifications

Specification
ECMAScript® 2026 Language Specification
# sec-string.prototype.normalize


This page was last modified by MDN contributors.