MDN Web Docs

Lexical grammar

This page describes JavaScript's lexical grammar. JavaScript source text is just a sequence of characters; for the interpreter to understand it, the string has to be parsed into a more structured representation. The initial step of parsing is called lexical analysis, in which the text is scanned from left to right and converted into a sequence of individual, atomic input elements. Some input elements are insignificant to the interpreter and are stripped after this step; they include white space and comments. The others, including identifiers, keywords, literals, and punctuators (mostly operators), are used for further syntax analysis. Line terminators and multiline comments are also syntactically insignificant, but they guide the process of automatic semicolon insertion, which makes certain invalid token sequences valid.

Format-control characters

Format-control characters have no visual representation but are used to control the interpretation of the text.

Code point | Name | Abbreviation | Description
U+200C | Zero width non-joiner | <ZWNJ> | Placed between characters to prevent them from being connected into ligatures in certain languages (Wikipedia).
U+200D | Zero width joiner | <ZWJ> | Placed between characters that would not normally be connected in order to cause the characters to be rendered using their connected form in certain languages (Wikipedia).
U+FEFF | Byte order mark | <BOM> | Used at the start of the script to mark it as Unicode and to allow detection of the text's encoding and byte order (Wikipedia).

In JavaScript source text, <ZWNJ> and <ZWJ> are treated as identifier parts, while <BOM> (also called a zero-width no-break space <ZWNBSP> when not at the start of text) is treated as white space.
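As a minimal sketch of the rule above, the snippet below compiles small source strings with `new Function`, so the invisible characters can be written as escapes in the outer string. The variable names are made up for illustration.

```javascript
// ZWJ (U+200D) is allowed inside an identifier, so `a<ZWJ>b` is one name.
// The \u200D escape is resolved by the outer string literal, so the
// compiled source contains the raw ZWJ character.
const zwjResult = new Function("var a\u200Db = 41; return a\u200Db + 1;")();

// A ZWNBSP (U+FEFF) in the middle of the source acts as white space,
// so it separates `return` from the number like a normal space would.
const bomResult = new Function("return\uFEFF7;")();

console.log(zwjResult, bomResult); // 42 7
```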

White space

White space characters improve the readability of source text and separate tokens from each other. These characters are usually unnecessary for the functionality of the code. Minification tools are often used to remove white space in order to reduce the amount of data that needs to be transferred.

Code point | Name | Abbreviation | Description | Escape sequence
U+0009 | Character tabulation | <TAB> | Horizontal tabulation | \t
U+000B | Line tabulation | <VT> | Vertical tabulation | \v
U+000C | Form feed | <FF> | Page breaking control character (Wikipedia). | \f
U+0020 | Space | <SP> | Normal space |
U+00A0 | No-break space | <NBSP> | Normal space, but no point at which a line may break |
U+FEFF | Zero-width no-break space | <ZWNBSP> | When not at the start of a script, the BOM marker is a normal white space character. |
Others | Other Unicode space characters | <USP> | Characters in the "Space_Separator" general category |

Note: Of the characters that have the "White_Space" property but are not in the "Space_Separator" general category, U+0009, U+000B, and U+000C are still treated as white space in JavaScript; U+0085 NEXT LINE has no special role; the others form the set of line terminators.

Note: Changes to the Unicode standard used by the JavaScript engine may affect programs' behavior. For example, ES2016 upgraded the reference Unicode standard from 5.1 to 8.0.0, which caused U+180E MONGOLIAN VOWEL SEPARATOR to be moved from the "Space_Separator" category to the "Format (Cf)" category, making it a non-white-space character. Subsequently, the result of "\u180E".trim().length changed from 0 to 1.
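One rough way to probe which characters an engine treats as white space is `String.prototype.trim()`. The sketch below assumes an engine that follows Unicode 8.0.0 or later; the variable names are illustrative.

```javascript
// Tab, NBSP, ZWNBSP, space, and vertical tab are all white space,
// so trim() removes them from both ends.
const padded = "\t\u00A0\uFEFF value \u0020\u000B";
const trimmed = padded.trim();
console.log(trimmed === "value"); // true

// U+180E is no longer in Space_Separator, so it survives trim().
const mongolianLength = "\u180E".trim().length;
console.log(mongolianLength); // 1

// U+0085 NEXT LINE is neither white space nor a line terminator.
const nelLength = "\u0085".trim().length;
console.log(nelLength); // 1
```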

Line terminators

In addition to white space characters, line terminator characters are used to improve the readability of the source text. However, in some cases, line terminators can influence the execution of JavaScript code as there are a few places where they are forbidden. Line terminators also affect the process of automatic semicolon insertion.

Outside the context of lexical grammar, white space and line terminators are often conflated. For example, String.prototype.trim() removes all white space and line terminators from the beginning and end of a string. The \s character class escape in regular expressions matches all white space and line terminators.

Only the following Unicode code points are treated as line terminators in ECMAScript. Other line-breaking characters are not; for example, U+0085 NEXT LINE (NEL) is neither a line terminator nor white space.

Code point | Name | Abbreviation | Description | Escape sequence
U+000A | Line Feed | <LF> | New line character in UNIX systems. | \n
U+000D | Carriage Return | <CR> | New line character in Commodore and early Mac systems. | \r
U+2028 | Line Separator | <LS> | (Wikipedia) |
U+2029 | Paragraph Separator | <PS> | (Wikipedia) |
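The effect of the table above can be observed with a line comment, which ends at the first line terminator. This is only a sketch: the source strings are compiled with `new Function`, and the variable names are illustrative.

```javascript
// A line comment ends at any line terminator, including U+2028,
// so `return 1;` is real code in this function body.
const afterLS = new Function("// comment\u2028return 1;")();

// U+0085 (NEL) is not a line terminator, so `return 1;` stays inside
// the comment and the function returns undefined.
const afterNEL = new Function("// comment\u0085return 1;")();

console.log(afterLS, afterNEL); // 1 undefined

// Outside the lexical grammar, trim() and \s treat white space and
// line terminators alike.
const trimmedMixed = "\n\u2029 x \r\u2028".trim();
const allMatchWhitespace = /^\s+$/.test(" \t\n\r\u2028\u2029");
console.log(trimmedMixed === "x", allMatchWhitespace); // true true
```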

Comments

Comments are used to add hints, notes, suggestions, or warnings to JavaScript code. This can make it easier to read and understand. They can also be used to disable code to prevent it from being executed; this can be a valuable debugging tool.

JavaScript has two long-standing ways to add comments to code: line comments and block comments. In addition, there's a special hashbang comment syntax.

Line comments

The first way is the // comment; this makes all text following it on the same line into a comment. For example:

js
function comment() {
  // This is a one line JavaScript comment
  console.log("Hello world!");
}
comment();

Block comments

The second way is the /* */ style, which is much more flexible.

For example, you can use it on a single line:

js
function comment() {
  /* This is a one line JavaScript comment */
  console.log("Hello world!");
}
comment();

You can also make multiple-line comments, like this:

js
function comment() {
  /* This comment spans multiple lines. Notice
     that we don't need to end the comment until we're done. */
  console.log("Hello world!");
}
comment();

You can also use it in the middle of a line, if you wish, although this can make your code harder to read so it should be used with caution:

js
function comment(x) {
  console.log("Hello " + x /* insert the value of x */ + " !");
}
comment("world");

In addition, you can use it to disable code to prevent it from running, by wrapping code in a comment, like this:

js
function comment() {
  /* console.log("Hello world!"); */
}
comment();

In this case, the console.log() call is never issued, since it's inside a comment. Any number of lines of code can be disabled this way.

Block comments that contain at least one line terminator behave like line terminators in automatic semicolon insertion.
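A small sketch of that last point, using `return`, which does not allow a line terminator before its expression. The function bodies are compiled from strings so the line break is explicit; the names are illustrative.

```javascript
// The block comment contains a line break, so it counts as a line
// terminator after `return`, and a semicolon is inserted right there.
const withBreak = new Function("return /* line one\n line two */ 1;")();

// The same comment without a line break behaves like plain white space.
const withoutBreak = new Function("return /* one line */ 1;")();

console.log(withBreak, withoutBreak); // undefined 1
```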

Hashbang comments

There's a special third comment syntax, the hashbang comment. A hashbang comment behaves exactly like a single-line (//) comment, except that it begins with #! and is only valid at the absolute start of a script or module. Note also that no white space of any kind is permitted before the #!. The comment consists of all the characters after #! up to the end of the first line; only one such comment is permitted.

Hashbang comments in JavaScript resemble shebangs in Unix, which provide the path to a specific JavaScript interpreter that you want to use to execute the script. Before the hashbang comment became standardized, it had already been implemented de facto in non-browser hosts like Node.js, where it was stripped from the source text before being passed to the engine. An example is as follows:

js
#!/usr/bin/env node
console.log("Hello world");

The JavaScript interpreter will treat it as a normal comment — it only has semantic meaning to the shell if the script is directly run in a shell.

Warning: If you want scripts to be runnable directly in a shell environment, encode them in UTF-8 without a BOM. A BOM will not cause any problems for code running in a browser, because it's stripped during UTF-8 decoding before the source text is analyzed, but a Unix/Linux shell will not recognize the hashbang if it's preceded by a BOM character.

You must only use the #! comment style to specify a JavaScript interpreter. In all other cases just use a // comment (or multiline comment).

Identifiers

An identifier is used to link a value with a name. Identifiers can be used in various places:

js
const decl = 1; // Variable declaration (may also be `let` or `var`)

function fn() {} // Function declaration

const obj = { key: "value" }; // Object keys

// Class declaration
class C {
  #priv = "value"; // Private field
}

lbl: console.log(1); // Label

In JavaScript, identifiers are commonly made of alphanumeric characters, underscores (_), and dollar signs ($). Identifiers are not allowed to start with numbers. However, JavaScript identifiers are not limited to ASCII; many Unicode code points are allowed as well. Namely:

  • Start characters can be any character in the ID_Start category plus _ and $.
  • After the first character, you can use any character in the ID_Continue category plus U+200C (ZWNJ) and U+200D (ZWJ).

Note: If, for some reason, you need to parse some JavaScript source yourself, do not assume all identifiers follow the pattern /[A-Za-z_$][\w$]*/ (i.e., ASCII-only)! The range of identifiers can be described by the regex /[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*/u (excluding Unicode escape sequences).
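As a rough sketch, the pattern from the note can be anchored and wrapped in a helper. The helper name `looksLikeIdentifier` is hypothetical; it includes U+200C and U+200D in the continue class, as the list above allows, and it neither handles Unicode escape sequences nor rejects reserved words.

```javascript
// Anchored version of the identifier pattern. The `u` flag is required
// for the \p{...} Unicode property escapes.
const IDENTIFIER_SHAPE = /^[$_\p{ID_Start}][$\u200C\u200D\p{ID_Continue}]*$/u;

// Hypothetical helper: checks the shape only, not whether the name is reserved.
function looksLikeIdentifier(name) {
  return IDENTIFIER_SHAPE.test(name);
}

console.log(looksLikeIdentifier("你好")); // true
console.log(looksLikeIdentifier("$_private1")); // true
console.log(looksLikeIdentifier("1abc")); // false (starts with a digit)
console.log(looksLikeIdentifier("a-b")); // false (hyphen is not allowed)
```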

In addition, JavaScript allows using Unicode escape sequences in the form of \u0000 or \u{000000} in identifiers, which encode the same string value as the actual Unicode characters. For example, 你好 and \u4f60\u597d are the same identifiers:

js
const 你好 = "Hello";
console.log(\u4f60\u597d); // Hello

Not all places accept the full range of identifiers. Certain syntaxes, such as function declarations, function expressions, and variable declarations, require identifier names that are not reserved words.

js
function import() {} // Illegal: import is a reserved word.

Most notably, private elements and object properties allow reserved words.

js
const obj = { import: "value" }; // Legal despite `import` being reserved

class C {
  #import = "value";
}

Keywords

Keywords are tokens that look like identifiers but have special meanings in JavaScript. For example, the keyword async before a function declaration indicates that the function is asynchronous.

Some keywords are reserved, meaning that they cannot be used as identifiers for variable declarations, function declarations, etc. They are often called reserved words. A list of these reserved words is provided below. Not all keywords are reserved; for example, async can be used as an identifier anywhere. Some keywords are only contextually reserved; for example, await is only reserved within the body of an async function, and let is only reserved in strict mode code or in const and let declarations.

Identifiers are always compared by string value, so escape sequences are interpreted. For example, this is still a syntax error:

js
const els\u{65} = 1;
// `els\u{65}` encodes the same identifier as `else`
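These rules can be checked without crashing the surrounding script by compiling snippets with `new Function` and catching the SyntaxError. The helper `parses` below is hypothetical and only meant as a sketch; note that it compiles each snippet as a non-strict function body.

```javascript
// Hypothetical helper: returns true if `source` parses as a function body.
// `new Function` only compiles the code here; it is never called.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// String.raw keeps the backslash, so the parser sees the escape sequence.
const escapedElse = parses(String.raw`const els\u{65} = 1;`);
const escapedOther = parses(String.raw`const \u{65}lse2 = 1;`);
const reservedFunctionName = parses("function import() {}");
const reservedPropertyName = parses("const obj = { import: 1 }; obj.import;");
const asyncAsName = parses("const async = 1;");

console.log(escapedElse); // false: the name is still `else`
console.log(escapedOther); // true: `else2` is not reserved
console.log(reservedFunctionName); // false
console.log(reservedPropertyName); // true
console.log(asyncAsName); // true: `async` is not reserved
```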

Reserved words

These keywords cannot be used as identifiers for variables, functions, classes, etc. anywhere in JavaScript source.

  • break
  • case
  • catch
  • class
  • const
  • continue
  • debugger
  • default
  • delete
  • do
  • else
  • export
  • extends
  • false
  • finally
  • for
  • function
  • if
  • import
  • in
  • instanceof
  • new
  • null
  • return
  • super
  • switch
  • this
  • throw
  • true
  • try
  • typeof
  • var
  • void
  • while
  • with

The following are only reserved when they are found in strict mode code:

  • let (also reserved in const, let, and class declarations)
  • static
  • yield (also reserved in generator function bodies)

The following are only reserved when they are found in module code or async function bodies:

  • await

Future reserved words

The following are reserved as future keywords by the ECMAScript specification. They have no special functionality at present, but they might at some future time, so they cannot be used as identifiers.

These are always reserved:

  • enum

The following are only reserved when they are found in strict mode code:

  • implements
  • interface
  • package
  • private
  • protected
  • public
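The strict-mode-only entries in the lists above can be probed with a small sketch. The helper `parses` is hypothetical; it compiles a snippet as a function body, which is non-strict unless the snippet starts with a "use strict" directive.

```javascript
// Hypothetical helper: returns true if `source` parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const strictOnly = [
  "let", "static", "yield",
  "implements", "interface", "package", "private", "protected", "public",
];

// For each word, try to declare a variable with that name in both modes.
const report = strictOnly.map((word) => ({
  word,
  sloppy: parses(`var ${word} = 1;`),
  strict: parses(`"use strict"; var ${word} = 1;`),
}));

for (const { word, sloppy, strict } of report) {
  console.log(word, sloppy, strict); // each line: <word> true false
}

const enumAllowed = parses("var enum = 1;");
console.log(enumAllowed); // false: `enum` is always reserved
```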

Future reserved words in older standards

The following are reserved as future keywords by older ECMAScript specifications (ECMAScript 1 through 3).

  • abstract
  • boolean
  • byte
  • char
  • double
  • final
  • float
  • goto
  • int
  • long
  • native
  • short
  • synchronized
  • throws
  • transient
  • volatile

Identifiers with special meanings

A few identifiers have a special meaning in some contexts without being reserved words of any kind. They include:

  • arguments (not a keyword, but cannot be declared as an identifier in strict mode)
  • as (import * as ns from "mod")
  • async
  • eval (not a keyword, but cannot be declared as an identifier in strict mode)
  • from (import x from "mod")
  • get
  • of
  • set

Literals

Note: This section discusses literals that are atomic tokens. Object literals and array literals are expressions that consist of a series of tokens.

Null literal

See also null for more information.

js
null

Boolean literal

See also boolean type for more information.

js
true
false

Numeric literals

The Number and BigInt types use numeric literals.

Decimal

js
1234567890
42

Decimal literals can start with a zero (0) followed by another decimal digit, but if all digits after the leading 0 are smaller than 8, the number is interpreted as an octal number. This is considered a legacy syntax, and number literals prefixed with 0, whether interpreted as octal or decimal, cause a syntax error in strict mode; use the 0o prefix instead.

js
0888 // 888 parsed as decimal
0777 // parsed as octal, 511 in decimal

Exponential

The decimal exponential literal is specified by the following format: beN, where b is a base number (integer or floating), followed by an E or e character (which serves as separator or exponent indicator) and N, which is the exponent or power number, a signed integer.

js
0e-5   // 0
0e+5   // 0
5e1    // 50
175e-2 // 1.75
1e3    // 1000
1e-3   // 0.001
1E3    // 1000

Binary

Binary number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "B" (0b or 0B). Any character after the 0b that is not 0 or 1 will terminate the literal sequence.

js
0b10000000000000000000000000000000 // 2147483648
0b01111111100000000000000000000000 // 2139095040
0B00000000011111111111111111111111 // 8388607

Octal

Octal number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "O" (0o or 0O). Any character after the 0o that is outside the range (01234567) will terminate the literal sequence.

js
0O755 // 493
0o644 // 420

Hexadecimal

Hexadecimal number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "X" (0x or 0X). Any character after the 0x that is outside the range (0123456789ABCDEF) will terminate the literal sequence.

js
0xFFFFFFFFFFFFF // 4503599627370495
0xabcdef123456  // 188900967593046
0XA             // 10

BigInt literal

The BigInt type is a numeric primitive in JavaScript that can represent integers with arbitrary precision. BigInt literals are created by appending n to the end of an integer.

js
123456789123456789n     // 123456789123456789
0o777777777777n         // 68719476735
0x123456789ABCDEFn      // 81985529216486895
0b11101001010101010101n // 955733

BigInt literals cannot use the legacy leading-zero form (a 0 followed by further digits), which avoids confusion with legacy octal literals.

js
0755n; // SyntaxError: invalid BigInt syntax

For octal BigInt numbers, always use zero followed by the letter "o" (uppercase or lowercase):

js
0o755n;

For more information about BigInt, see also JavaScript data structures.
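A brief sketch of why the literal form matters: a Number literal silently rounds integers above 2^53, while the same digits with an n suffix are kept exactly. The variable names are illustrative.

```javascript
// Number literals lose integer precision above 2^53, so these two
// different digit sequences denote the same Number value.
const numbersCollide = 9007199254740993 === 9007199254740992;
console.log(numbersCollide); // true

// BigInt literals keep every digit.
const bigintsCollide = 9007199254740993n === 9007199254740992n;
console.log(bigintsCollide); // false

// The prefixes for other bases combine with the `n` suffix.
const sameOctal = 0o755n === 493n;
const sameHex = 0xffn === 255n;
const sameBinary = 0b101n === 5n;
console.log(sameOctal, sameHex, sameBinary); // true true true
console.log(typeof 0o755n); // "bigint"
```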

Numeric separators

To improve readability for numeric literals, underscores (_, U+005F) can be used as separators:

js
1_000_000_000_000
1_050.95
0b1010_0001_1000_0101
0o2_2_5_6
0xA0_B0_C0
1_000_000_000_000_000_000_000n

Note these limitations:

js
// More than one underscore in a row is not allowed
100__000; // SyntaxError

// Not allowed at the end of numeric literals
100_; // SyntaxError

// Can not be used after leading 0
0_1; // SyntaxError
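The following sketch shows that separators do not change a literal's value. It also illustrates a related point that goes slightly beyond this section and is included as an aside: separators are literal syntax only, so string-to-number conversion does not accept them.

```javascript
const sameDecimal = 1_000_000 === 1000000;
const sameHexValue = 0xA0_B0_C0 === 0xa0b0c0;
const sameFraction = 1_050.95 === 1050.95;
console.log(sameDecimal, sameHexValue, sameFraction); // true true true

// Aside: Number() rejects the underscore, and parseInt() stops at it.
const converted = Number("1_000");
const parsed = parseInt("1_000", 10);
console.log(converted, parsed); // NaN 1
```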

String literals

A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for these code points:

  • U+005C \ (backslash)
  • U+000D <CR>
  • U+000A <LF>
  • The same kind of quote that begins the string literal

Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values Unicode code points are UTF-16 encoded.

js
'foo'
"bar"

The following subsections describe various escape sequences (\ followed by one or more characters) available in string literals. Any escape sequence not listed below becomes an "identity escape" that becomes the code point itself. For example, \z is the same as z. There's a deprecated octal escape sequence syntax described in the Deprecated and obsolete features page. Many of these escape sequences are also valid in regular expressions; see Character escape.

Escape sequences

Special characters can be encoded using escape sequences:

Escape sequence | Unicode code point
\0 | null character (U+0000 NULL)
\' | single quote (U+0027 APOSTROPHE)
\" | double quote (U+0022 QUOTATION MARK)
\\ | backslash (U+005C REVERSE SOLIDUS)
\n | newline (U+000A LINE FEED; LF)
\r | carriage return (U+000D CARRIAGE RETURN; CR)
\v | vertical tab (U+000B LINE TABULATION)
\t | tab (U+0009 CHARACTER TABULATION)
\b | backspace (U+0008 BACKSPACE)
\f | form feed (U+000C FORM FEED)
\ followed by a line terminator | empty string

The last escape sequence, \ followed by a line terminator, is useful for splitting a string literal across multiple lines without changing its meaning.

js
const longString =
  "This is a very long string which needs \
to wrap across multiple lines because \
otherwise my code is unreadable.";

Make sure there is no space or any other character after the backslash (except for a line break), otherwise it will not work. If the next line is indented, the extra spaces will also be present in the string's value.

You can also use the + operator to append multiple strings together, like this:

js
const longString =
  "This is a very long string which needs " +
  "to wrap across multiple lines because " +
  "otherwise my code is unreadable.";

Both of the above methods result in identical strings.
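A short sketch comparing the two approaches, including the indentation pitfall mentioned above. The strings and variable names are illustrative.

```javascript
// Line continuation: the backslash and the line break produce nothing.
const continued = "first \
second";
const concatenated = "first " + "second";
console.log(continued === concatenated); // true

// Indentation on the continued line becomes part of the string.
const indented = "first \
    second";
const expectedIndented = "first " + " ".repeat(4) + "second";
console.log(indented === expectedIndented); // true
console.log(indented === concatenated); // false
```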

Hexadecimal escape sequences

Hexadecimal escape sequences consist of \x followed by exactly two hexadecimal digits representing a code unit or code point in the range 0x0000 to 0x00FF.

js
"\xA9"; // "©"

Unicode escape sequences

A Unicode escape sequence consists of exactly four hexadecimal digits following \u. It represents a code unit in the UTF-16 encoding. For code points U+0000 to U+FFFF, the code unit is equal to the code point. Code points U+10000 to U+10FFFF require two escape sequences representing the two code units (a surrogate pair) used to encode the character; the surrogate pair is distinct from the code point.

See also String.fromCharCode() and String.prototype.charCodeAt().

js
"\u00A9"; // "©" (U+A9)

Unicode code point escapes

A Unicode code point escape consists of \u{, followed by a code point in hexadecimal, followed by }. The value of the hexadecimal digits must be between 0 and 0x10FFFF inclusive. Code points in the range U+10000 to U+10FFFF do not need to be represented as a surrogate pair.

See also String.fromCodePoint() and String.prototype.codePointAt().
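The three escape forms (hexadecimal, Unicode, and code point) can be compared directly. The sketch below uses illustrative variable names and contrasts UTF-16 code units with code points.

```javascript
// Four spellings of the same one-character string.
const copyrightForms = ["\xA9", "\u00A9", "\u{A9}", "©"];
const allEqual = copyrightForms.every((form) => form === "©");
console.log(allEqual); // true

// A code point above U+FFFF occupies two UTF-16 code units.
const ideograph = "\u{2F804}";
console.log(ideograph === "\uD87E\uDC04"); // true
console.log(ideograph.length); // 2 (code units)
console.log([...ideograph].length); // 1 (code points)
console.log(ideograph.codePointAt(0).toString(16)); // "2f804"
console.log(ideograph.charCodeAt(0).toString(16)); // "d87e"
```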

js
"\u{2F804}"; // CJK COMPATIBILITY IDEOGRAPH-2F804 (U+2F804)

// the same character represented as a surrogate pair
"\uD87E\uDC04";

Regular expression literals

Regular expression literals are enclosed by two forward slashes (/). The lexer consumes all characters up to the next unescaped forward slash or the end of the line, unless the forward slash appears within a character class ([]). Some characters (namely, those that are identifier parts) can appear after the closing slash, denoting flags.

The lexical grammar is very lenient: not all regular expression literals that get identified as one token are valid regular expressions.

See also RegExp for more information.

js
/ab+c/g;
/[/]/;

A regular expression literal cannot start with two forward slashes (//), because that would be a line comment. To specify an empty regular expression, use /(?:)/.
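A small sketch of these points follows. The invalid pattern is compiled from a string with `new Function` so that the surrounding script still parses; the variable names are illustrative.

```javascript
// A slash inside a character class does not end the literal.
const matchesSlash = /[/]/.test("a/b");
console.log(matchesSlash); // true

// An empty pattern is written /(?:)/, which is also what RegExp reports
// as the source of an empty pattern.
const emptySource = new RegExp("").source;
console.log(emptySource); // "(?:)"

// The lexer accepts /(/ as one token, but it is not a valid pattern,
// so the code is rejected with a SyntaxError before it runs.
let rejected = false;
try {
  new Function("return /(/;");
} catch (error) {
  rejected = error instanceof SyntaxError;
}
console.log(rejected); // true
```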

Template literals

One template literal consists of several tokens: `xxx${ (template head), }xxx${ (template middle), and }xxx` (template tail) are individual tokens, while any expression may come between them.

See also template literals for more information.

js
`string text`;

`string text line 1
 string text line 2`;

`string text ${expression} string text`;

tag`string text ${expression} string text`;
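To see how the head, middle, and tail pieces surface at runtime, a tag function can return what it receives. The tag name `inspect` is hypothetical and only serves this sketch.

```javascript
// Hypothetical tag function that just returns what it receives.
function inspect(strings, ...values) {
  return { cooked: [...strings], raw: [...strings.raw], values };
}

// Head: "head ", middle: " middle\n ", tail: " tail".
const result = inspect`head ${1} middle\n ${2} tail`;

console.log(result.cooked.length); // 3
console.log(result.cooked[1] === " middle\n "); // true (escape interpreted)
console.log(result.raw[1] === " middle\\n "); // true (backslash kept)
console.log(result.values); // [ 1, 2 ]
```

There is always one more string piece than there are substitutions, which matches the head/middle/tail token structure.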

Automatic semicolon insertion

Some JavaScript statements' syntax definitions require semicolons (;) at the end. They include:

  • var, let, const
  • Expression statements
  • do...while
  • continue, break, return, throw
  • debugger
  • Class field declarations (public or private)
  • import, export

However, to make the language more approachable and convenient, JavaScript is able to automatically insert semicolons when consuming the token stream, so that some invalid token sequences can be "fixed" to valid syntax. This step happens after the program text has been parsed to tokens according to the lexical grammar. There are three cases when semicolons are automatically inserted:

1. When a token not allowed by the grammar is encountered, and it's separated from the previous token by at least one line terminator (including a block comment that includes at least one line terminator), or the token is "}", then a semicolon is inserted before the token.

js
{ 1
2 } 3

// is transformed by ASI into:

{ 1
;2 ;} 3;

// Which is valid grammar encoding three statements,
// each consisting of a number literal

The ending ")" of do...while is taken care of as a special case by this rule as well.

js
do {
  // …
} while (condition) /* ; */ // ASI here
const a = 1

However, semicolons are not inserted if the semicolon would then become the separator in the for statement's head.

js
for (
  let a = 1 // No ASI here
  a < 10 // No ASI here
  a++
) {}

Semicolons are also never inserted as empty statements. For example, in the code below, if a semicolon is inserted after ")", then the code would be valid, with an empty statement as the if body and the const declaration being a separate statement. However, because automatically inserted semicolons cannot become empty statements, this causes a declaration to become the body of the if statement, which is not valid.

js
if (Math.random() > 0.5)
const x = 1 // SyntaxError: Unexpected token 'const'

2. When the end of the input stream of tokens is reached, and the parser is unable to parse the single input stream as a complete program, a semicolon is inserted at the end.

js
const a = 1 /* ; */ // ASI here

This rule is a complement to the previous rule, specifically for the case where there's no "offending token" but the end of input stream.

3. When the grammar forbids line terminators in some place but a line terminator is found, a semicolon is inserted. These places include:

  • expr <here> ++, expr <here> --
  • continue <here> lbl
  • break <here> lbl
  • return <here> expr
  • throw <here> expr
  • yield <here> expr
  • yield <here> * expr
  • (param) <here> => {}
  • async <here> function, async <here> prop(), async <here> function*, async <here> *prop(), async <here> (param) <here> => {}

Here ++ is not treated as a postfix operator applying to variable b, because a line terminator occurs between b and ++.

js
a = b
++c

// is transformed by ASI into

a = b;
++c;

Here, the return statement returns undefined, and the a + b becomes an unreachable statement.

js
return
a + b

// is transformed by ASI into

return;
a + b;
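Both transformations can be observed at runtime. In the sketch below, the function `sum` and the variables are illustrative; the missing semicolons are deliberate.

```javascript
// `return` followed by a line break returns undefined.
function sum(a, b) {
  return
  a + b
}
const sumResult = sum(1, 2);
console.log(sumResult); // undefined

// The line break before ++ makes it a prefix operator on c, not a
// postfix operator on b.
let b = 1;
let c = 1;
let a;
a = b
++c
console.log(a, b, c); // 1 1 2
```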

Note that ASI would only be triggered if a line break separates tokens that would otherwise produce invalid syntax. If the next token can be parsed as part of a valid structure, semicolons would not be inserted. For example:

js
const a = 1
(1).toString()

const b = 1
[1, 2, 3].forEach(console.log)

Because () can be seen as a function call, it would usually not trigger ASI. Similarly, [] may be a member access. The code above is equivalent to:

js
const a = 1(1).toString();

const b = 1[1, 2, 3].forEach(console.log);

This happens to be valid syntax. 1[1, 2, 3] is a property accessor with a comma-joined expression. Therefore, you would get errors like "1 is not a function" and "Cannot read properties of undefined (reading 'forEach')" when running the code.
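A sketch that runs both pitfalls inside a small wrapper and records the error type. The helper `run` is hypothetical, and the exact error messages vary between engines, so only the error name is checked.

```javascript
// Hypothetical helper: runs `body` and reports the name of any error.
function run(body) {
  try {
    body();
    return "no error";
  } catch (error) {
    return error.name;
  }
}

// Parsed as `const a = 1(1).toString()`: calling the number 1.
const callPitfall = run(() => {
  const a = 1
  (1).toString()
});

// Parsed as `const b = 1[1, 2, 3].forEach(console.log)`: 1[3] is undefined.
const indexPitfall = run(() => {
  const b = 1
  [1, 2, 3].forEach(console.log)
});

console.log(callPitfall, indexPitfall); // TypeError TypeError
```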

Within classes, class fields and generator methods can be a pitfall as well.

js
class A {
  a = 1
  *gen() {}
}

It is seen as:

js
class A {
  a = 1 * gen() {}
}

And therefore will be a syntax error around {.
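The class-field pitfall can be checked by compiling both versions from strings. The helper `parses` is hypothetical and only reports whether a snippet is syntactically valid.

```javascript
// Hypothetical helper: returns true if `source` parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Without a semicolon, the field initializer continues as `1 * gen() {}`.
const withoutSemicolon = parses("class A {\n  a = 1\n  *gen() {}\n}");

// With a semicolon, the field and the generator method are separate.
const withSemicolon = parses("class A {\n  a = 1;\n  *gen() {}\n}");

console.log(withoutSemicolon, withSemicolon); // false true
```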

There are the following rules-of-thumb for dealing with ASI, if you want to enforce semicolon-less style:

  • Write postfix ++ and -- on the same line as their operands.

    Avoid:

    js
    const a = b
    ++
    console.log(a) // ReferenceError: Invalid left-hand side expression in prefix operation

    Prefer:

    js
    const a = b++
    console.log(a)
  • The expressions after return, throw, or yield should be on the same line as the keyword.

    Avoid:

    js
    function foo() {
      return
        1 + 1 // Returns undefined; 1 + 1 is ignored
    }

    Prefer:

    js
    function foo() {
      return 1 + 1
    }

    function foo() {
      return (
        1 + 1
      )
    }
  • Similarly, the label identifier after break or continue should be on the same line as the keyword.

    Avoid:

    js
    outerBlock: {
      innerBlock: {
        break
          outerBlock // SyntaxError: Illegal break statement
      }
    }

    Prefer:

    js
    outerBlock: {
      innerBlock: {
        break outerBlock
      }
    }
  • The => of an arrow function should be on the same line as the end of its parameters.

    Avoid:

    js
    const foo = (a, b)
      => a + b

    Prefer:

    js
    const foo = (a, b) =>
      a + b
  • The async of async functions, methods, etc. cannot be directly followed by a line terminator.

    Avoid:

    js
    async
    function foo() {}

    Prefer:

    js
    async function
    foo() {}
  • If a line starts with one of (, [, `, +, -, / (as in regex literals), prefix it with a semicolon, or end the previous line with a semicolon.

    Avoid:

    js
    // The () may be merged with the previous line as a function call
    (() => {
      // …
    })()

    // The [ may be merged with the previous line as a property access
    [1, 2, 3].forEach(console.log)

    // The ` may be merged with the previous line as a tagged template literal
    `string text ${data}`.match(pattern).forEach(console.log)

    // The + may be merged with the previous line as a binary + expression
    +a.toString()

    // The - may be merged with the previous line as a binary - expression
    -a.toString()

    // The / may be merged with the previous line as a division expression
    /pattern/.exec(str).forEach(console.log)

    Prefer:

    js
    ;(() => {
      // …
    })()
    ;[1, 2, 3].forEach(console.log)
    ;`string text ${data}`.match(pattern).forEach(console.log)
    ;+a.toString()
    ;-a.toString()
    ;/pattern/.exec(str).forEach(console.log)
  • Class fields should preferably always be ended with semicolons. In addition to the previous rule (which includes a field declaration followed by a computed property, since the latter starts with [), semicolons are also required between a field declaration and a generator method.

    Avoid:

    js
    class A {
      a = 1
      [b] = 2
      *gen() {} // Seen as a = 1[b] = 2 * gen() {}
    }

    Prefer:

    js
    class A {
      a = 1;
      [b] = 2;
      *gen() {}
    }

Specifications

Specification
ECMAScript® 2026 Language Specification


This page was last modified on byMDN contributors.

