Lexical grammar
This page describes JavaScript's lexical grammar. JavaScript source text is just a sequence of characters; for the interpreter to understand it, the string has to be parsed into a more structured representation. The initial step of parsing is called lexical analysis, in which the text is scanned from left to right and converted into a sequence of individual, atomic input elements. Some input elements are insignificant to the interpreter and are stripped after this step: they include white space and comments. The others, including identifiers, keywords, literals, and punctuators (mostly operators), are used for further syntax analysis. Line terminators and multiline comments are also syntactically insignificant, but they guide the process of automatic semicolon insertion, which makes certain invalid token sequences valid.
Format-control characters
Format-control characters have no visual representation but are used to control the interpretation of the text.
| Code point | Name | Abbreviation | Description |
|---|---|---|---|
| U+200C | Zero width non-joiner | <ZWNJ> | Placed between characters to prevent being connected into ligatures in certain languages (Wikipedia). |
| U+200D | Zero width joiner | <ZWJ> | Placed between characters that would not normally be connected in order to cause the characters to be rendered using their connected form in certain languages (Wikipedia). |
| U+FEFF | Byte order mark | <BOM> | Used at the start of the script to mark it as Unicode and to allow detection of the text's encoding and byte order (Wikipedia). |
In JavaScript source text, <ZWNJ> and <ZWJ> are treated as identifier parts, while <BOM> (also called a zero-width no-break space <ZWNBSP> when not at the start of text) is treated as white space.
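As a rough illustration of the rules above, the following sketch compiles small source strings at run time with the `Function` constructor, so the invisible characters can be written as visible `\u` escapes. The variable names are made up for this example.

```js
// "\u200D" inside these ordinary string literals becomes a real ZWJ
// character in the source text handed to the Function constructor.
const zwjSource = "var a\u200Db = 1; return a\u200Db;";
const zwjResult = new Function(zwjSource)(); // ZWJ is an identifier part

// ZWNJ and ZWJ can only continue an identifier, never start one.
let startsWithZwnj = "accepted";
try {
  new Function("var \u200Cx = 1;");
} catch (err) {
  startsWithZwnj = err.name;
}

// U+FEFF in the middle of the source acts as white space between tokens.
const bomResult = new Function("return\uFEFF42;")();

console.log(zwjResult, startsWithZwnj, bomResult); // 1 SyntaxError 42
```

Compiling from strings is only a convenience for making the characters visible; the same rules apply to characters typed directly into a source file.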
White space
White space characters improve the readability of source text and separate tokens from each other. These characters are usually unnecessary for the functionality of the code. Minification tools are often used to remove white space in order to reduce the amount of data that needs to be transferred.
| Code point | Name | Abbreviation | Description | Escape sequence |
|---|---|---|---|---|
| U+0009 | Character tabulation | <TAB> | Horizontal tabulation | \t |
| U+000B | Line tabulation | <VT> | Vertical tabulation | \v |
| U+000C | Form feed | <FF> | Page breaking control character (Wikipedia). | \f |
| U+0020 | Space | <SP> | Normal space | |
| U+00A0 | No-break space | <NBSP> | Normal space, but no point at which a line may break | |
| U+FEFF | Zero-width no-break space | <ZWNBSP> | When not at the start of a script, the BOM marker is a normal whitespace character. | |
| Others | Other Unicode space characters | <USP> | Characters in the "Space_Separator" general category | |
Note: Of those characters that have the "White_Space" property but are not in the "Space_Separator" general category, U+0009, U+000B, and U+000C are still treated as white space in JavaScript; U+0085 NEXT LINE has no special role; the others become the set of line terminators.
Note: Changes to the Unicode standard used by the JavaScript engine may affect programs' behavior. For example, ES2016 upgraded the reference Unicode standard from 5.1 to 8.0.0, which caused U+180E MONGOLIAN VOWEL SEPARATOR to be moved from the "Space_Separator" category to the "Format (Cf)" category, making it a non-white-space character. Subsequently, the result of `"\u180E".trim().length` changed from 0 to 1.
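One way to probe which characters an engine treats as white space is `String.prototype.trim()`, which removes white space and line terminators from both ends of a string. The sketch below is illustrative only; the `candidates` list is an arbitrary selection, and the result for U+180E depends on the Unicode version the engine implements.

```js
// Each entry: [label, character]. If trim() leaves an empty string,
// the engine considers the character white space (or a line terminator).
const candidates = [
  ["U+0009 TAB", "\t"],
  ["U+00A0 NBSP", "\u00A0"],
  ["U+FEFF ZWNBSP", "\uFEFF"],
  ["U+3000 IDEOGRAPHIC SPACE", "\u3000"], // in Space_Separator
  ["U+0085 NEXT LINE", "\u0085"], // no special role
  ["U+180E MONGOLIAN VOWEL SEPARATOR", "\u180E"], // engine-dependent
];

const trimmed = {};
for (const [label, ch] of candidates) {
  trimmed[label] = ch.trim().length === 0;
  console.log(label, trimmed[label] ? "is trimmed" : "is kept");
}
```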
Line terminators
In addition to white space characters, line terminator characters are used to improve the readability of the source text. However, in some cases, line terminators can influence the execution of JavaScript code, as there are a few places where they are forbidden. Line terminators also affect the process of automatic semicolon insertion.
Outside the context of lexical grammar, white space and line terminators are often conflated. For example, `String.prototype.trim()` removes all white space and line terminators from the beginning and end of a string. The `\s` character class escape in regular expressions matches all white space and line terminators.
Only the following Unicode code points are treated as line terminators in ECMAScript. Other line-breaking characters are not (for example, U+0085 NEXT LINE has no special role in the lexical grammar).
| Code point | Name | Abbreviation | Escape sequence |
|---|---|---|---|
| U+000A | Line feed | <LF> | \n |
| U+000D | Carriage return | <CR> | \r |
| U+2028 | Line separator | <LS> | |
| U+2029 | Paragraph separator | <PS> | |
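A small sketch of the difference, assuming source text compiled at run time with the `Function` constructor (all variable names here are illustrative): a line comment ends at any line terminator, so whether the code after it runs shows whether the character counts as one.

```js
// U+2028 (LS) is a line terminator, so it ends the line comment and the
// return statement after it is executed.
const afterLs = new Function("// comment\u2028return 42;")();

// U+0085 NEXT LINE is not a line terminator, so the comment swallows the
// rest of the source text and the function body is effectively empty.
const afterNel = new Function("// comment\u0085return 42;")();

// Outside the lexical grammar, \s and trim() cover white space and
// line terminators alike.
const matchesAll = /^\s+$/.test(" \t\n\r\u2028\u2029");
const trimsAll = "\n\u2028 text \r\u2029".trim();

console.log(afterLs, afterNel, matchesAll, trimsAll); // 42 undefined true text
```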
Comments
Comments are used to add hints, notes, suggestions, or warnings to JavaScript code. This can make it easier to read and understand. They can also be used to disable code to prevent it from being executed; this can be a valuable debugging tool.
JavaScript has two long-standing ways to add comments to code: line comments and block comments. In addition, there's a special hashbang comment syntax.
Line comments
The first way is the `//` comment; this makes all text following it on the same line into a comment. For example:

    function comment() {
      // This is a one line JavaScript comment
      console.log("Hello world!");
    }
    comment();

Block comments
The second way is the `/* */` style, which is much more flexible.
For example, you can use it on a single line:

    function comment() {
      /* This is a one line JavaScript comment */
      console.log("Hello world!");
    }
    comment();

You can also make multiple-line comments, like this:

    function comment() {
      /* This comment spans multiple lines. Notice
         that we don't need to end the comment until we're done. */
      console.log("Hello world!");
    }
    comment();

You can also use it in the middle of a line, if you wish, although this can make your code harder to read, so it should be used with caution:

    function comment(x) {
      console.log("Hello " + x /* insert the value of x */ + " !");
    }
    comment("world");

In addition, you can use it to disable code to prevent it from running, by wrapping code in a comment, like this:

    function comment() {
      /* console.log("Hello world!"); */
    }
    comment();

In this case, the `console.log()` call is never issued, since it's inside a comment. Any number of lines of code can be disabled this way.
Block comments that contain at least one line terminator behave like line terminators in automatic semicolon insertion.
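The effect of a line terminator inside a block comment can be observed with `return`, which does not allow a line terminator before its operand. This is a minimal sketch; the source strings are compiled with the `Function` constructor only so that the line break is easy to see as `\n`.

```js
// The block comment stays on one line: return keeps its operand.
const sameLine = new Function("return /* note */ 1;")();

// The block comment contains a line terminator, so it counts as one for
// automatic semicolon insertion: a semicolon is inserted after return.
const splitLine = new Function("return /* multi\nline */ 1;")();

console.log(sameLine, splitLine); // 1 undefined
```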
Hashbang comments
There's a special third comment syntax, the hashbang comment. A hashbang comment behaves exactly like a single-line (`//`) comment, except that it begins with `#!` and is only valid at the absolute start of a script or module. Note also that no white space of any kind is permitted before the `#!`. The comment consists of all the characters after `#!` up to the end of the first line; only one such comment is permitted.
Hashbang comments in JavaScript resemble shebangs in Unix, which provide the path to a specific JavaScript interpreter that you want to use to execute the script. Before the hashbang comment became standardized, it had already been de facto implemented in non-browser hosts like Node.js, where it was stripped from the source text before being passed to the engine. An example is as follows:

    #!/usr/bin/env node
    console.log("Hello world");

The JavaScript interpreter will treat it as a normal comment; it only has semantic meaning to the shell if the script is run directly in a shell.
Warning: If you want scripts to be runnable directly in a shell environment, encode them in UTF-8 without a BOM. Although a BOM will not cause any problems for code running in a browser (because it's stripped during UTF-8 decoding, before the source text is analyzed), a Unix/Linux shell will not recognize the hashbang if it's preceded by a BOM character.
You must only use the `#!` comment style to specify a JavaScript interpreter. In all other cases, just use a `//` comment (or multiline comment).
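The placement rule can be checked without creating files by evaluating source strings as scripts. The sketch below assumes an engine that implements the standardized hashbang grammar and uses indirect `eval`, which parses its argument as a script; `tryScript` is a hypothetical helper written for this example.

```js
// Calling eval through another name makes it an indirect eval, which
// parses its argument as a script in the global scope.
const indirectEval = eval;

// Returns "ok: <completion value>" or the name of the error thrown.
function tryScript(source) {
  try {
    return `ok: ${indirectEval(source)}`;
  } catch (err) {
    return err.name;
  }
}

// Valid only at the absolute start of the source text.
const atStart = tryScript("#!/usr/bin/env node\n1 + 1");
// Not valid after white space or on a later line.
const afterSpace = tryScript(" #!/usr/bin/env node\n1 + 1");
const secondLine = tryScript("1 + 1\n#!/usr/bin/env node");

console.log(atStart, afterSpace, secondLine);
```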
Identifiers
An identifier is used to link a value with a name. Identifiers can be used in various places:

    const decl = 1; // Variable declaration (may also be `let` or `var`)
    function fn() {} // Function declaration
    const obj = { key: "value" }; // Object keys
    // Class declaration
    class C {
      #priv = "value"; // Private field
    }
    lbl: console.log(1); // Label

In JavaScript, identifiers are commonly made of alphanumeric characters, underscores (`_`), and dollar signs (`$`). Identifiers are not allowed to start with numbers. However, JavaScript identifiers are not limited to ASCII; many Unicode code points are allowed as well. Namely:
- Start characters can be any character in the ID_Start category plus `_` and `$`.
- After the first character, you can use any character in the ID_Continue category plus U+200C (ZWNJ) and U+200D (ZWJ).
Note: If, for some reason, you need to parse some JavaScript source yourself, do not assume all identifiers follow the pattern `/[A-Za-z_$][\w$]*/` (i.e., ASCII-only)! The range of identifiers can be described by the regex `/[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*/u` (excluding Unicode escape sequences).
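As a minimal sketch of such a check, the helper below anchors a Unicode-aware pattern and tests a few sample strings. `looksLikeIdentifier` is a hypothetical name; the pattern adds U+200C and U+200D explicitly, ignores `\u` escape sequences, and does not reject reserved words.

```js
// ID_Start / ID_Continue are Unicode binary properties, available in
// regular expressions that use the `u` flag.
const IDENTIFIER_NAME = /^[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*$/u;

function looksLikeIdentifier(text) {
  return IDENTIFIER_NAME.test(text);
}

const samples = ["decl", "$_1", "你好", "café", "a\u200Db", "1abc", "a-b", ""];
for (const sample of samples) {
  console.log(JSON.stringify(sample), looksLikeIdentifier(sample));
}
```

The first five samples are accepted; the last three are rejected because they start with a digit, contain a hyphen, or are empty.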
In addition, JavaScript allows using Unicode escape sequences in the form of `\u0000` or `\u{000000}` in identifiers, which encode the same string value as the actual Unicode characters. For example, `你好` and `\u4f60\u597d` are the same identifier:

    const 你好 = "Hello";
    console.log(\u4f60\u597d); // Hello

Not all places accept the full range of identifiers. Certain syntaxes, such as function declarations, function expressions, and variable declarations, require identifier names that are not reserved words.

    function import() {} // Illegal: import is a reserved word.

Most notably, private elements and object properties allow reserved words.

    const obj = { import: "value" }; // Legal despite `import` being reserved
    class C {
      #import = "value";
    }

Keywords
Keywords are tokens that look like identifiers but have special meanings in JavaScript. For example, the keyword `async` before a function declaration indicates that the function is asynchronous.
Some keywords are reserved, meaning that they cannot be used as identifiers for variable declarations, function declarations, etc. They are often called reserved words. A list of these reserved words is provided below. Not all keywords are reserved; for example, `async` can be used as an identifier anywhere. Some keywords are only contextually reserved; for example, `await` is only reserved within async function bodies and module code, and `let` is only reserved in strict mode code and as a name declared by `const` and `let` declarations.
Identifiers are always compared by string value, so escape sequences are interpreted. For example, this is still a syntax error:

    const els\u{65} = 1;
    // `els\u{65}` encodes the same identifier as `else`

Reserved words
These keywords cannot be used as identifiers for variables, functions, classes, etc. anywhere in JavaScript source.
- `break`
- `case`
- `catch`
- `class`
- `const`
- `continue`
- `debugger`
- `default`
- `delete`
- `do`
- `else`
- `export`
- `extends`
- `false`
- `finally`
- `for`
- `function`
- `if`
- `import`
- `in`
- `instanceof`
- `new`
- `null`
- `return`
- `super`
- `switch`
- `this`
- `throw`
- `true`
- `try`
- `typeof`
- `var`
- `void`
- `while`
- `with`
The following are only reserved when they are found in strict mode code:
- `let` (also reserved in `const`, `let`, and class declarations)
- `static`
- `yield` (also reserved in generator function bodies)
The following are only reserved when they are found in module code or async function bodies:
- `await`
Future reserved words
The following are reserved as future keywords by the ECMAScript specification. They have no special functionality at present, but they might at some future time, so they cannot be used as identifiers.
These are always reserved:
enum
The following are only reserved when they are found in strict mode code:
- `implements`
- `interface`
- `package`
- `private`
- `protected`
- `public`
Future reserved words in older standards
The following are reserved as future keywords by older ECMAScript specifications (ECMAScript 1 through 3).
- `abstract`
- `boolean`
- `byte`
- `char`
- `double`
- `final`
- `float`
- `goto`
- `int`
- `long`
- `native`
- `short`
- `synchronized`
- `throws`
- `transient`
- `volatile`
Identifiers with special meanings
A few identifiers have a special meaning in some contexts without being reserved words of any kind. They include:
- `arguments` (not a keyword, but cannot be declared as an identifier in strict mode)
- `as` (`import * as ns from "mod"`)
- `async`
- `eval` (not a keyword, but cannot be declared as an identifier in strict mode)
- `from` (`import x from "mod"`)
- `get`
- `of`
- `set`
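The different levels of reservation described in this section can be probed by checking which declarations parse. This is a sketch under the assumption that function bodies compiled by the `Function` constructor are non-strict unless they contain their own `"use strict"` directive; `parses` is a hypothetical helper.

```js
// Returns true when `source` parses as a function body, false when it is
// rejected with a SyntaxError. The body is compiled but never executed.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const results = {
  enumAnywhere: parses("var enum = 1;"), // always reserved
  letSloppy: parses("var let = 1;"),
  letStrict: parses('"use strict"; var let = 1;'),
  implementsSloppy: parses("var implements = 1;"),
  implementsStrict: parses('"use strict"; var implements = 1;'),
  awaitOutsideAsync: parses("var await = 1;"),
  contextual: parses("var async = 1, of = 2, get = 3, set = 4, as = 5, from = 6;"),
  evalStrict: parses('"use strict"; var eval = 1;'),
};
console.log(results);
```

Only `letSloppy`, `implementsSloppy`, `awaitOutsideAsync`, and `contextual` come out `true`; the other declarations are rejected at parse time.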
Literals
Note: This section discusses literals that are atomic tokens. Object literals and array literals are expressions that consist of a series of tokens.
Null literal
See also null for more information.

    null

Boolean literal
See also boolean type for more information.

    true
    false

Numeric literals
The Number and BigInt types use numeric literals.
Decimal

    1234567890
    42

Decimal literals can start with a zero (`0`) followed by another decimal digit, but if all digits after the leading `0` are smaller than 8, the number is interpreted as an octal number. This is considered a legacy syntax, and number literals prefixed with `0`, whether interpreted as octal or decimal, cause a syntax error in strict mode; use the `0o` prefix instead.

    0888 // 888 parsed as decimal
    0777 // parsed as octal, 511 in decimal

Exponential
The decimal exponential literal is specified by the following format: `beN`, where `b` is a base number (integer or floating-point), followed by an `E` or `e` character (which serves as separator or exponent indicator) and `N`, which is the exponent or power number, a signed integer.

    0e-5 // 0
    0e+5 // 0
    5e1 // 50
    175e-2 // 1.75
    1e3 // 1000
    1e-3 // 0.001
    1E3 // 1000

Binary
Binary number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "B" (`0b` or `0B`). Any character after the `0b` that is not 0 or 1 will terminate the literal sequence.

    0b10000000000000000000000000000000 // 2147483648
    0b01111111100000000000000000000000 // 2139095040
    0B00000000011111111111111111111111 // 8388607

Octal
Octal number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "O" (`0o` or `0O`). Any character after the `0o` that is outside the range (01234567) will terminate the literal sequence.

    0O755 // 493
    0o644 // 420

Hexadecimal
Hexadecimal number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "X" (`0x` or `0X`). Any character after the `0x` that is outside the range (0123456789ABCDEF, in either case) will terminate the literal sequence.

    0xFFFFFFFFFFFFF // 4503599627370495
    0xabcdef123456 // 188900967593046
    0XA // 10

BigInt literal
The BigInt type is a numeric primitive in JavaScript that can represent integers with arbitrary precision. BigInt literals are created by appending `n` to the end of an integer.

    123456789123456789n // 123456789123456789
    0o777777777777n // 68719476735
    0x123456789ABCDEFn // 81985529216486895
    0b11101001010101010101n // 955733

Decimal BigInt literals cannot have a leading `0` followed by other digits, to avoid confusion with legacy octal literals.

    0755n; // SyntaxError: invalid BigInt syntax

For octal BigInt numbers, always use zero followed by the letter "o" (uppercase or lowercase):

    0o755n;

For more information about BigInt, see also JavaScript data structures.
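The following sketch compares the notations above. The legacy `0`-prefixed forms are evaluated through the `Function` constructor (whose bodies are non-strict by default) so that the snippet itself remains valid when run as strict or module code; the variable names are illustrative.

```js
// Legacy leading-zero literals: octal if every digit is below 8.
const legacyOctal = new Function("return 0777;")(); // 511
const legacyDecimal = new Function("return 0888;")(); // 888

// The same literal is rejected in strict mode.
let strictOutcome = "accepted";
try {
  new Function('"use strict"; return 0777;');
} catch (err) {
  strictOutcome = err.name; // "SyntaxError"
}

// The prefixed and exponential forms all denote the same Number here.
const sameValue = [0b111101101, 0o755, 0x1ed, 4.93e2].every((n) => n === 493);

// BigInt literals accept the same prefixes.
const bigMatches = 0o755n === 493n && 0x1edn === 493n;

console.log(legacyOctal, legacyDecimal, strictOutcome, sameValue, bigMatches);
// 511 888 SyntaxError true true
```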
Numeric separators
To improve readability for numeric literals, underscores (`_`, U+005F) can be used as separators:

    1_000_000_000_000
    1_050.95
    0b1010_0001_1000_0101
    0o2_2_5_6
    0xA0_B0_C0
    1_000_000_000_000_000_000_000n

Note these limitations:

    // More than one underscore in a row is not allowed
    100__000; // SyntaxError

    // Not allowed at the end of numeric literals
    100_; // SyntaxError

    // Can not be used after leading 0
    0_1; // SyntaxError

String literals
A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for these code points:
- U+005C \ (backslash)
- U+000D <CR>
- U+000A <LF>
- The same kind of quote that begins the string literal
Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values, Unicode code points are UTF-16 encoded.

    'foo'
    "bar"

The following subsections describe various escape sequences (`\` followed by one or more characters) available in string literals. Any escape sequence not listed below becomes an "identity escape" that becomes the code point itself. For example, `\z` is the same as `z`. There's a deprecated octal escape sequence syntax described in the Deprecated and obsolete features page. Many of these escape sequences are also valid in regular expressions; see Character escape.
Escape sequences
Special characters can be encoded using escape sequences:
| Escape sequence | Unicode code point |
|---|---|
| `\0` | null character (U+0000 NULL) |
| `\'` | single quote (U+0027 APOSTROPHE) |
| `\"` | double quote (U+0022 QUOTATION MARK) |
| `\\` | backslash (U+005C REVERSE SOLIDUS) |
| `\n` | newline (U+000A LINE FEED; LF) |
| `\r` | carriage return (U+000D CARRIAGE RETURN; CR) |
| `\v` | vertical tab (U+000B LINE TABULATION) |
| `\t` | tab (U+0009 CHARACTER TABULATION) |
| `\b` | backspace (U+0008 BACKSPACE) |
| `\f` | form feed (U+000C FORM FEED) |
| `\` followed by a line terminator | empty string |
The last escape sequence, `\` followed by a line terminator, is useful for splitting a string literal across multiple lines without changing its meaning.

    const longString = "This is a very long string which needs \
    to wrap across multiple lines because \
    otherwise my code is unreadable.";

Make sure there is no space or any other character after the backslash (except for a line break), otherwise it will not work. If the next line is indented, the extra spaces will also be present in the string's value.
You can also use the `+` operator to append multiple strings together, like this:

    const longString =
      "This is a very long string which needs " +
      "to wrap across multiple lines because " +
      "otherwise my code is unreadable.";

Both of the above methods result in identical strings.
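The equivalence, and the indentation pitfall, can be confirmed with a short check. The variable names below are chosen for this sketch; note that the continuation lines deliberately start in the first column.

```js
// A backslash directly before the line break removes the break from the value.
const continued = "This is a very long string which needs \
to wrap across multiple lines because \
otherwise my code is unreadable.";

const concatenated =
  "This is a very long string which needs " +
  "to wrap across multiple lines because " +
  "otherwise my code is unreadable.";

// Indenting the continuation line leaks the indentation into the string.
const indented = "first \
    second";

console.log(continued === concatenated); // true
console.log(JSON.stringify(indented)); // "first     second"
```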
Hexadecimal escape sequences
Hexadecimal escape sequences consist of `\x` followed by exactly two hexadecimal digits representing a code unit or code point in the range 0x0000 to 0x00FF.

    "\xA9"; // "©"

Unicode escape sequences
A Unicode escape sequence consists of exactly four hexadecimal digits following `\u`. It represents a code unit in the UTF-16 encoding. For code points U+0000 to U+FFFF, the code unit is equal to the code point. Code points U+10000 to U+10FFFF require two escape sequences representing the two code units (a surrogate pair) used to encode the character; the surrogate pair is distinct from the code point.
See also `String.fromCharCode()` and `String.prototype.charCodeAt()`.

    "\u00A9"; // "©" (U+A9)

Unicode code point escapes
A Unicode code point escape consists of `\u{`, followed by a code point in hexadecimal base, followed by `}`. The value of the hexadecimal digits must be between 0 and 0x10FFFF inclusive. Code points in the range U+10000 to U+10FFFF do not need to be represented as a surrogate pair.
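The relationship between the three escape forms can be checked directly. This is a small sketch with illustrative variable names, using only standard string methods.

```js
const viaCodePointEscape = "\u{2F804}";
const viaSurrogatePair = "\uD87E\uDC04";

// Both spellings produce the same String value.
const sameString = viaCodePointEscape === viaSurrogatePair; // true

// The value holds two UTF-16 code units but a single code point.
const codeUnits = viaCodePointEscape.length; // 2
const codePoints = [...viaCodePointEscape].length; // 1
const firstUnit = viaCodePointEscape.charCodeAt(0).toString(16); // "d87e"
const wholePoint = viaCodePointEscape.codePointAt(0).toString(16); // "2f804"

// For code points up to U+00FF, all three escape forms are interchangeable.
const copyright = "\xA9" === "\u00A9" && "\u00A9" === "\u{A9}"; // true

console.log(sameString, codeUnits, codePoints, firstUnit, wholePoint, copyright);
```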
See also `String.fromCodePoint()` and `String.prototype.codePointAt()`.

    "\u{2F804}"; // CJK COMPATIBILITY IDEOGRAPH-2F804 (U+2F804)

    // the same character represented as a surrogate pair
    "\uD87E\uDC04";

Regular expression literals
Regular expression literals are enclosed by two forward slashes (`/`). The lexer consumes all characters up to the next unescaped forward slash or the end of the line, unless the forward slash appears within a character class (`[]`). Some characters (namely, those that are identifier parts) can appear after the closing slash, denoting flags.
The lexical grammar is very lenient: not all regular expression literals that get identified as one token are valid regular expressions.
See also RegExp for more information.

    /ab+c/g;
    /[/]/;

A regular expression literal cannot start with two forward slashes (`//`), because that would be a line comment. To specify an empty regular expression, use `/(?:)/`.
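A sketch of these lexing rules, assuming the candidate literals are compiled at run time so that a rejected one can be caught instead of breaking the whole snippet; `compile` is a hypothetical helper.

```js
// Evaluates `source` as the operand of a return statement and returns its
// value, or the name of the error thrown while compiling it.
function compile(source) {
  try {
    return new Function(`return ${source};`)();
  } catch (err) {
    return err.name;
  }
}

// A slash inside a character class does not end the literal.
const slashInClass = compile("/[/]/");
// Lexed as a single token, but "(" is not a valid pattern.
const unbalanced = compile("/(/");
// Two slashes start a line comment, so nothing is returned.
const lineComment = compile("//");
// The spelling to use for an empty regular expression.
const emptyRegex = compile("/(?:)/");

console.log(slashInClass.test("/"), unbalanced, lineComment, emptyRegex.source);
// true SyntaxError undefined (?:)
```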
Template literals
One template literal consists of several tokens: `` `xxx${ `` (template head), `}xxx${` (template middle), and `` }xxx` `` (template tail) are individual tokens, while any expression may come between them.
See also template literals for more information.

    `string text`;

    `string text line 1
     string text line 2`;

    `string text ${expression} string text`;

    tag`string text ${expression} string text`;

Automatic semicolon insertion
Some JavaScript statements' syntax definitions require semicolons (`;`) at the end. They include:
- `var`, `let`, `const`, `using`, `await using`
- Expression statements
- `do...while`
- `continue`, `break`, `return`, `throw`
- `debugger`
- Class field declarations (public or private)
- `import`, `export`
However, to make the language more approachable and convenient, JavaScript is able to automatically insert semicolons when consuming the token stream, so that some invalid token sequences can be "fixed" to valid syntax. This step happens after the program text has been parsed to tokens according to the lexical grammar. There are three cases when semicolons are automatically inserted:
1. When a token not allowed by the grammar is encountered, and it's separated from the previous token by at least one line terminator (including a block comment that includes at least one line terminator), or the token is "}", then a semicolon is inserted before the token.

    { 1
    2 } 3

    // is transformed by ASI into:

    { 1
    ;2 ;} 3;

    // Which is valid grammar encoding three statements,
    // each consisting of a number literal

The ending ")" of `do...while` is taken care of as a special case by this rule as well.

    do {
      // …
    } while (condition) /* ; */ // ASI here
    const a = 1

However, semicolons are not inserted if the semicolon would then become the separator in the `for` statement's head.

    for (
      let a = 1 // No ASI here
      a < 10 // No ASI here
      a++
    ) {}

Semicolons are also never inserted as empty statements. For example, in the code below, if a semicolon is inserted after ")", then the code would be valid, with an empty statement as the `if` body and the `const` declaration being a separate statement. However, because automatically inserted semicolons cannot become empty statements, this causes a declaration to become the body of the `if` statement, which is not valid.

    if (Math.random() > 0.5)
    const x = 1 // SyntaxError: Unexpected token 'const'

2. When the end of the input stream of tokens is reached, and the parser is unable to parse the single input stream as a complete program, a semicolon is inserted at the end.

    const a = 1 /* ; */ // ASI here

This rule is a complement to the previous rule, specifically for the case where there's no "offending token" but the end of input stream.
3. When the grammar forbids line terminators in some place but a line terminator is found, a semicolon is inserted. These places include:
- `expr <here> ++`, `expr <here> --`
- `continue <here> lbl`
- `break <here> lbl`
- `return <here> expr`
- `throw <here> expr`
- `yield <here> expr`
- `yield <here> * expr`
- `(param) <here> => {}`
- `async <here> function`, `async <here> prop()`, `async <here> function*`, `async <here> *prop()`, `async <here> (param) <here> => {}`
- `using <here> id`, `await <here> using <here> id`
Here `++` is not treated as a postfix operator applying to variable `b`, because a line terminator occurs between `b` and `++`.

    a = b
    ++c

    // is transformed by ASI into

    a = b;
    ++c;

Here, the `return` statement returns `undefined`, and the `a + b` becomes an unreachable statement.

    return
    a + b

    // is transformed by ASI into

    return;
    a + b;

Note that ASI is only triggered if a line break separates tokens that would otherwise produce invalid syntax. If the next token can be parsed as part of a valid structure, semicolons are not inserted. For example:

    const a = 1
    (1).toString()

    const b = 1
    [1, 2, 3].forEach(console.log)

Because `()` can be seen as a function call, it would usually not trigger ASI. Similarly, `[]` may be a member access. The code above is equivalent to:

    const a = 1(1).toString();

    const b = 1[1, 2, 3].forEach(console.log);

This happens to be valid syntax. `1[1, 2, 3]` is a property accessor with a comma-joined expression. Therefore, you would get errors like "1 is not a function" and "Cannot read properties of undefined (reading 'forEach')" when running the code.
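The fact that these are run-time errors rather than syntax errors can be demonstrated with a small harness. This is a sketch; `outcome` is a hypothetical helper that compiles and runs a source string and reports the kind of error, if any.

```js
// Runs `source` as a function body; returns "ok" or the error's name.
function outcome(source) {
  try {
    new Function(source)();
    return "ok";
  } catch (err) {
    return err.name;
  }
}

// No semicolon is inserted: parsed as 1(1).toString()
const callMerged = outcome("const a = 1\n(1).toString()");
// No semicolon is inserted: parsed as 1[1, 2, 3].forEach(...)
const accessMerged = outcome("const b = 1\n[1, 2, 3].forEach(String)");
// Explicit semicolons keep the statements apart.
const separated = outcome("const a = 1;\n(1).toString();");

console.log(callMerged, accessMerged, separated); // TypeError TypeError ok
```

Both merged variants compile successfully and only fail when executed, which is why this class of mistake is easy to miss.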
Within classes, class fields and generator methods can be a pitfall as well.

    class A {
      a = 1
      *gen() {}
    }

It is seen as:

    class A {
      a = 1 * gen() {}
    }

And therefore will be a syntax error around `{`.
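A quick way to see the difference, assuming an engine with class field support: the sketch below only checks whether each class body parses. `classParses` is a hypothetical helper, and the class is compiled inside a function body that is never run.

```js
// Returns true if `class A {<body>}` parses, false on a SyntaxError.
function classParses(body) {
  try {
    new Function(`class A {${body}}`);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// The * is read as multiplication, so the body fails to parse.
const withoutSemicolon = classParses("\n  a = 1\n  *gen() {}\n");
// An explicit semicolon ends the field declaration.
const withSemicolon = classParses("\n  a = 1;\n  *gen() {}\n");
// A plain method name cannot continue the expression, so ASI applies.
const plainMethod = classParses("\n  a = 1\n  gen() {}\n");

console.log(withoutSemicolon, withSemicolon, plainMethod); // false true true
```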
There are the following rules-of-thumb for dealing with ASI, if you want to enforce semicolon-less style:
- Write postfix `++` and `--` on the same line as their operands.

  Avoid:

      const a = b
      ++
      console.log(a) // ReferenceError: Invalid left-hand side expression in prefix operation

  Prefer:

      const a = b++
      console.log(a)

- The expressions after `return`, `throw`, or `yield` should be on the same line as the keyword.

  Avoid:

      function foo() {
        return
          1 + 1 // Returns undefined; 1 + 1 is ignored
      }

  Prefer:

      function foo() {
        return 1 + 1
      }

      function foo() {
        return (
          1 + 1
        )
      }

- Similarly, the label identifier after `break` or `continue` should be on the same line as the keyword.

  Avoid:

      outerBlock: {
        innerBlock: {
          break
            outerBlock // SyntaxError: Illegal break statement
        }
      }

  Prefer:

      outerBlock: {
        innerBlock: {
          break outerBlock
        }
      }

- The `=>` of an arrow function should be on the same line as the end of its parameters.

  Avoid:

      const foo = (a, b)
        => a + b

  Prefer:

      const foo = (a, b) =>
        a + b

- The `async` of async functions, methods, etc. cannot be directly followed by a line terminator.

  Avoid:

      async
      function foo() {}

  Prefer:

      async function
      foo() {}

- The `using` keyword in `using` and `await using` statements should be on the same line as the first identifier it declares.

  Avoid:

      using
      resource = acquireResource()

  Prefer:

      using resource = acquireResource()

- If a line starts with one of `(`, `[`, `` ` ``, `+`, `-`, `/` (as in regex literals), prefix it with a semicolon, or end the previous line with a semicolon.

  Avoid:

      // The () may be merged with the previous line as a function call
      (() => {
        // …
      })()

      // The [ may be merged with the previous line as a property access
      [1, 2, 3].forEach(console.log)

      // The ` may be merged with the previous line as a tagged template literal
      `string text ${data}`.match(pattern).forEach(console.log)

      // The + may be merged with the previous line as a binary + expression
      +a.toString()

      // The - may be merged with the previous line as a binary - expression
      -a.toString()

      // The / may be merged with the previous line as a division expression
      /pattern/.exec(str).forEach(console.log)

  Prefer:

      ;(() => {
        // …
      })()
      ;[1, 2, 3].forEach(console.log)
      ;`string text ${data}`.match(pattern).forEach(console.log)
      ;+a.toString()
      ;-a.toString()
      ;/pattern/.exec(str).forEach(console.log)

- Class fields should preferably always be ended with semicolons. In addition to the previous rule (which includes a field declaration followed by a computed property, since the latter starts with `[`), semicolons are also required between a field declaration and a generator method.

  Avoid:

      class A {
        a = 1
        [b] = 2
        *gen() {} // Seen as a = 1[b] = 2 * gen() {}
      }

  Prefer:

      class A {
        a = 1;
        [b] = 2;
        *gen() {}
      }

Specifications
| Specification |
|---|
| ECMAScript® 2026 Language Specification |
See also
- Grammar and types guide
- Micro-feature from ES6, now in Firefox Aurora and Nightly: binary and octal numbers by Jeff Walden (2013)
- JavaScript character escape sequences by Mathias Bynens (2011)