Lexical grammar
This page describes JavaScript's lexical grammar. JavaScript source text is just a sequence of characters; for the interpreter to understand it, the text has to be parsed into a more structured representation. The initial step of parsing is called lexical analysis, in which the text is scanned from left to right and converted into a sequence of individual, atomic input elements. Some input elements are insignificant to the interpreter and are stripped after this step: they include white space and comments. The others, including identifiers, keywords, literals, and punctuators (mostly operators), are used for further syntax analysis. Line terminators and multiline comments are also syntactically insignificant, but they guide the process of automatic semicolon insertion, which makes certain invalid token sequences valid.
Format-control characters
Format-control characters have no visual representation but are used to control the interpretation of the text.
Code point | Name | Abbreviation | Description |
---|---|---|---|
U+200C | Zero width non-joiner | <ZWNJ> | Placed between characters to prevent being connected into ligatures in certain languages (Wikipedia). |
U+200D | Zero width joiner | <ZWJ> | Placed between characters that would not normally be connected in order to cause the characters to be rendered using their connected form in certain languages (Wikipedia). |
U+FEFF | Byte order mark | <BOM> | Used at the start of the script to mark it as Unicode and to allow detection of the text's encoding and byte order (Wikipedia). |
In JavaScript source text, <ZWNJ> and <ZWJ> are treated as identifier parts, while <BOM> (also called a zero-width no-break space <ZWNBSP> when not at the start of text) is treated as white space.
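The following is a minimal sketch of how these characters behave in source text. It builds small programs with `new Function` so that the invisible characters can be written as visible escapes; the variable names are illustrative only.

```javascript
// U+FEFF between tokens is skipped like any other white space.
const sum = new Function("return 1\uFEFF+\uFEFF2;")();

// U+200D (ZWJ) may appear inside an identifier, after the first character.
const joined = new Function("const a\u200Db = 5; return a\u200Db;")();

// ZWJ is an identifier part, not white space, so it cannot separate tokens.
let zwjSeparatesTokens = true;
try {
  new Function("return 1\u200D+ 2;");
} catch (error) {
  zwjSeparatesTokens = !(error instanceof SyntaxError);
}

console.log(sum, joined, zwjSeparatesTokens); // 3 5 false
```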
White space
White space characters improve the readability of source text and separate tokens from each other. These characters are usually unnecessary for the functionality of the code. Minification tools are often used to remove white space in order to reduce the amount of data that needs to be transferred.
Code point | Name | Abbreviation | Description | Escape sequence |
---|---|---|---|---|
U+0009 | Character tabulation | <TAB> | Horizontal tabulation | \t |
U+000B | Line tabulation | <VT> | Vertical tabulation | \v |
U+000C | Form feed | <FF> | Page breaking control character (Wikipedia). | \f |
U+0020 | Space | <SP> | Normal space | |
U+00A0 | No-break space | <NBSP> | Normal space, but no point at which a line may break | |
U+FEFF | Zero-width no-break space | <ZWNBSP> | When not at the start of a script, the BOM marker is a normal whitespace character. | |
Others | Other Unicode space characters | <USP> | Characters in the "Space_Separator" general category | |
Note: Of those characters with the "White_Space" property that are not in the "Space_Separator" general category, U+0009, U+000B, and U+000C are still treated as white space in JavaScript; U+0085 NEXT LINE has no special role; the others form the set of line terminators.
Note: Changes to the Unicode standard used by the JavaScript engine may affect programs' behavior. For example, ES2016 upgraded the reference Unicode standard from 5.1 to 8.0.0, which caused U+180E MONGOLIAN VOWEL SEPARATOR to be moved from the "Space_Separator" category to the "Format (Cf)" category, making it a non-white-space character. As a result, the value of `"\u180E".trim().length` changed from `0` to `1`.
Line terminators
In addition to white space characters, line terminator characters are used to improve the readability of the source text. However, in some cases, line terminators can influence the execution of JavaScript code as there are a few places where they are forbidden. Line terminators also affect the process of automatic semicolon insertion.
Outside the context of lexical grammar, white space and line terminators are often conflated. For example, `String.prototype.trim()` removes all white space and line terminators from the beginning and end of a string. The `\s` character class escape in regular expressions matches all white space and line terminators.
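As a small illustration of that conflation, the sketch below checks `trim()` and `\s` against a mix of white space and line terminator characters, and against U+0085, which belongs to neither set. The sample strings are made up for the example.

```javascript
// A string padded with white space (tab, NBSP, BOM, space) and
// line terminators (LF, CR, U+2028, U+2029).
const padded = "\t\u00A0\uFEFF\n\u2028 value \r\n\u2029";
const trimmed = padded.trim();

// \s matches every line terminator as well as white space.
const lineTerminators = ["\n", "\r", "\u2028", "\u2029"];
const allMatch = lineTerminators.every((ch) => /^\s$/.test(ch));

// U+0085 NEXT LINE is neither white space nor a line terminator in JavaScript.
const nextLineMatches = /\s/.test("\u0085");
const nextLineTrimmedLength = "\u0085".trim().length;

console.log(JSON.stringify(trimmed)); // "value"
console.log(allMatch, nextLineMatches, nextLineTrimmedLength); // true false 1
```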
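Only the following Unicode code points are treated as line terminators in ECMAScript. Other line-breaking characters are not treated as line terminators (for example, U+0085 NEXT LINE is not a line terminator and, as noted above, has no special role).
Code point | Name | Abbreviation | Description | Escape sequence |
---|---|---|---|---|
U+000A | Line feed | <LF> | New line character on Unix-like systems | \n |
U+000D | Carriage return | <CR> | Used together with LF as the line ending on Windows | \r |
U+2028 | Line separator | <LS> | Unicode line separator | |
U+2029 | Paragraph separator | <PS> | Unicode paragraph separator | |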
Comments
Comments are used to add hints, notes, suggestions, or warnings to JavaScript code. This can make it easier to read and understand. They can also be used to disable code to prevent it from being executed; this can be a valuable debugging tool.
JavaScript has two long-standing ways to add comments to code: line comments and block comments. In addition, there's a special hashbang comment syntax.
Line comments
The first way is the `//` comment; this makes all text following it on the same line into a comment. For example:
function comment() {
  // This is a one line JavaScript comment
  console.log("Hello world!");
}
comment();
Block comments
The second way is the `/* */` style, which is much more flexible.
For example, you can use it on a single line:
function comment() {
  /* This is a one line JavaScript comment */
  console.log("Hello world!");
}
comment();
You can also make multiple-line comments, like this:
function comment() {
  /* This comment spans multiple lines. Notice
     that we don't need to end the comment until we're done. */
  console.log("Hello world!");
}
comment();
You can also use it in the middle of a line, if you wish, although this can make your code harder to read so it should be used with caution:
function comment(x) {
  console.log("Hello " + x /* insert the value of x */ + " !");
}
comment("world");
In addition, you can use it to disable code to prevent it from running, by wrapping code in a comment, like this:
function comment() {
  /* console.log("Hello world!"); */
}
comment();
In this case, the `console.log()` call is never issued, since it's inside a comment. Any number of lines of code can be disabled this way.
Block comments that contain at least one line terminator behave like line terminators in automatic semicolon insertion.
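A brief sketch of this effect, assuming a `return` statement as the test case: a block comment that spans lines between `return` and its expression acts like a line break there, so a semicolon is inserted after `return`. The function names are illustrative.

```javascript
// The comment contains a line terminator, so it behaves like a line break
// after `return`, and the function returns undefined.
const multiLine = new Function("return /* first line\n second line */ 1;");

// The comment stays on one line, so `return 1;` is parsed as usual.
const singleLine = new Function("return /* same line */ 1;");

console.log(multiLine()); // undefined
console.log(singleLine()); // 1
```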
Hashbang comments
There's a special third comment syntax, the hashbang comment. A hashbang comment behaves exactly like a single-line (`//`) comment, except that it begins with `#!` and is only valid at the absolute start of a script or module. Note also that no white space of any kind is permitted before the `#!`. The comment consists of all the characters after `#!` up to the end of the first line; only one such comment is permitted.
Hashbang comments in JavaScript resemble shebangs in Unix, which provide the path to the specific interpreter that you want to use to execute the script. Before the hashbang comment became standardized, it had already been de facto implemented in non-browser hosts like Node.js, where it was stripped from the source text before being passed to the engine. An example is as follows:
#!/usr/bin/env node
console.log("Hello world");
The JavaScript interpreter will treat it as a normal comment — it only has semantic meaning to the shell if the script is directly run in a shell.
Warning: If you want scripts to be runnable directly in a shell environment, encode them in UTF-8 without a BOM. A BOM will not cause any problems for code running in a browser, because it's stripped during UTF-8 decoding, before the source text is analyzed. However, a Unix/Linux shell will not recognize the hashbang if it's preceded by a BOM character.
You must only use the `#!` comment style to specify a JavaScript interpreter. In all other cases just use a `//` comment (or multiline comment).
Identifiers
An identifier is used to link a value with a name. Identifiers can be used in various places:
const decl = 1; // Variable declaration (may also be `let` or `var`)

function fn() {} // Function declaration

const obj = { key: "value" }; // Object keys

// Class declaration
class C {
  #priv = "value"; // Private field
}

lbl: console.log(1); // Label
In JavaScript, identifiers are commonly made of alphanumeric characters, underscores (`_`), and dollar signs (`$`). Identifiers are not allowed to start with digits. However, JavaScript identifiers are not limited to ASCII; many Unicode code points are allowed as well. Namely:
- Start characters can be any character in the ID_Start category plus `_` and `$`.
- After the first character, you can use any character in the ID_Continue category plus `$`, U+200C (ZWNJ), and U+200D (ZWJ).
Note: If, for some reason, you need to parse some JavaScript source yourself, do not assume all identifiers follow the pattern `/[A-Za-z_$][\w$]*/` (i.e., ASCII-only)! The range of identifiers can be described by the regex `/[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*/u` (excluding Unicode escape sequences).
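As a rough sketch, the pattern from the note can be anchored with `^` and `$` to test whole strings; the version below also admits U+200C and U+200D as continuation characters, as the list above allows. It does not handle Unicode escape sequences and does not reject reserved words, and the sample strings are arbitrary.

```javascript
// Anchored variant of the identifier pattern; the `u` flag enables \p{...}.
const identifierPattern = /^[$_\p{ID_Start}][$\u200C\u200D\p{ID_Continue}]*$/u;

const samples = ["你好", "$el_1", "_private", "1abc", "a-b", ""];
const results = samples.map((sample) => identifierPattern.test(sample));

console.log(results); // [ true, true, true, false, false, false ]
```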
In addition, JavaScript allows using Unicode escape sequences in the form of `\u0000` or `\u{000000}` in identifiers, which encode the same string value as the actual Unicode characters. For example, `你好` and `\u4f60\u597d` are the same identifiers:
const 你好 = "Hello";
console.log(\u4f60\u597d); // Hello
Not all places accept the full range of identifiers. Certain syntaxes, such as function declarations, function expressions, and variable declarations, require identifier names that are not reserved words.
function import() {} // Illegal: import is a reserved word.
Most notably, private elements and object properties allow reserved words.
const obj = { import: "value" }; // Legal despite `import` being reserved

class C {
  #import = "value";
}
Keywords
Keywords are tokens that look like identifiers but have special meanings in JavaScript. For example, the keyword `async` before a function declaration indicates that the function is asynchronous.
Some keywords are reserved, meaning that they cannot be used as an identifier for variable declarations, function declarations, etc. They are often called reserved words. A list of these reserved words is provided below. Not all keywords are reserved: for example, `async` can be used as an identifier anywhere. Some keywords are only contextually reserved: for example, `await` is only reserved within the body of an async function, and `let` is only reserved in strict mode code, or in `const` and `let` declarations.
Identifiers are always compared by string value, so escape sequences are interpreted. For example, this is still a syntax error:
const els\u{65} = 1;
// `els\u{65}` encodes the same identifier as `else`
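The sketch below checks this behavior without crashing the surrounding script. It uses a hypothetical helper, `parses`, that hands the source to `new Function` (which parses but does not run it) and reports whether a `SyntaxError` was raised.

```javascript
// Hypothetical helper: true if `source` parses as a (non-strict) function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// `els\u{65}` is compared by string value, so it is the reserved word `else`.
const escapedKeywordAsVariable = parses("const els\\u{65} = 1;");
// `els\u{65}where` is the ordinary identifier `elsewhere`.
const escapedOrdinaryIdentifier = parses("const els\\u{65}where = 1;");
// Property names accept reserved words, escaped or not.
const escapedKeywordAsProperty = parses("const obj = { els\\u{65}: 1 };");

console.log(escapedKeywordAsVariable, escapedOrdinaryIdentifier, escapedKeywordAsProperty);
// false true true
```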
Reserved words
These keywords cannot be used as identifiers for variables, functions, classes, etc. anywhere in JavaScript source.
break
case
catch
class
const
continue
debugger
default
delete
do
else
export
extends
false
finally
for
function
if
import
in
instanceof
new
null
return
super
switch
this
throw
true
try
typeof
var
void
while
with
The following are only reserved when they are found in strict mode code:
let (also reserved in const, let, and class declarations)
static
yield (also reserved in generator function bodies)
The following are only reserved when they are found in module code or async function bodies:
await
Future reserved words
The following are reserved as future keywords by the ECMAScript specification. They have no special functionality at present, but they might at some future time, so they cannot be used as identifiers.
These are always reserved:
enum
The following are only reserved when they are found in strict mode code:
implements
interface
package
private
protected
public
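To see the difference between always-reserved and strict-mode-only reserved words, the following sketch parses a few declarations with and without a `"use strict"` directive. The `parses` helper is hypothetical and relies on `new Function` bodies being non-strict unless they opt in.

```javascript
// Hypothetical helper: true if `source` parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const words = ["let", "static", "yield", "implements", "enum"];
const report = words.map((word) => ({
  word,
  sloppy: parses(`var ${word} = 1;`),
  strict: parses(`"use strict"; var ${word} = 1;`),
}));

for (const { word, sloppy, strict } of report) {
  console.log(word, sloppy, strict);
}
// let true false
// static true false
// yield true false
// implements true false
// enum false false
```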
Future reserved words in older standards
The following are reserved as future keywords by older ECMAScript specifications (ECMAScript 1 till 3).
abstract
boolean
byte
char
double
final
float
goto
int
long
native
short
synchronized
throws
transient
volatile
Identifiers with special meanings
A few identifiers have a special meaning in some contexts without being reserved words of any kind. They include:
arguments (not a keyword, but cannot be declared as an identifier in strict mode)
as (import * as ns from "mod")
async
eval (not a keyword, but cannot be declared as an identifier in strict mode)
from (import x from "mod")
get
of
set
Literals
Note: This section discusses literals that are atomic tokens. Object literals and array literals are expressions that consist of a series of tokens.
Null literal
See also `null` for more information.
null
Boolean literal
See also boolean type for more information.
true
false
Numeric literals
The Number and BigInt types use numeric literals.
Decimal
1234567890
42
Decimal literals can start with a zero (`0`) followed by another decimal digit, but if all digits after the leading `0` are smaller than 8, the number is interpreted as an octal number. This is considered a legacy syntax, and number literals prefixed with `0`, whether interpreted as octal or decimal, cause a syntax error in strict mode, so use the `0o` prefix instead.
0888 // 888 parsed as decimal
0777 // parsed as octal, 511 in decimal
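A short sketch of the legacy behavior follows. Because the surrounding file may itself be strict (for example, a module), the legacy literals are evaluated inside `new Function` bodies, which are non-strict unless they contain a `"use strict"` directive; the variable names are illustrative.

```javascript
// Non-strict code: a leading 0 followed by digits below 8 is read as octal.
const legacyOctal = new Function("return 0777;")();
// A digit of 8 or 9 forces a decimal interpretation.
const decimalWithZero = new Function("return 0888;")();

// Strict code rejects both forms at parse time.
let strictRejects = false;
try {
  new Function('"use strict"; return 0777;');
} catch (error) {
  strictRejects = error instanceof SyntaxError;
}

// The 0o prefix works everywhere.
const modernOctal = 0o777;

console.log(legacyOctal, decimalWithZero, strictRejects, modernOctal);
// 511 888 true 511
```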
Exponential
The decimal exponential literal is specified by the following format: `beN`, where `b` is a base number (integer or floating-point), followed by an `E` or `e` character (which serves as separator or exponent indicator) and `N`, which is the exponent or power number, a signed integer.
0e-5 // 0
0e+5 // 0
5e1 // 50
175e-2 // 1.75
1e3 // 1000
1e-3 // 0.001
1E3 // 1000
Binary
Binary number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "B" (`0b` or `0B`). Any character after the `0b` that is not 0 or 1 will terminate the literal sequence.
0b10000000000000000000000000000000 // 2147483648
0b01111111100000000000000000000000 // 2139095040
0B00000000011111111111111111111111 // 8388607
Octal
Octal number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "O" (`0o` or `0O`). Any character after the `0o` that is outside the range (01234567) will terminate the literal sequence.
0O755 // 493
0o644 // 420
Hexadecimal
Hexadecimal number syntax uses a leading zero followed by a lowercase or uppercase Latin letter "X" (`0x` or `0X`). Any character after the `0x` that is outside the range (0123456789ABCDEF) will terminate the literal sequence.
0xFFFFFFFFFFFFF // 4503599627370495
0xabcdef123456 // 188900967593046
0XA // 10
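The sketch below compares the three prefixed forms and shows what "terminating the literal" means in practice: an out-of-range digit or letter ends the literal, but a numeric literal may not be immediately followed by a digit or identifier character, so the result is usually a syntax error rather than two tokens. The `parses` helper is hypothetical.

```javascript
// Hypothetical helper: true if `source` parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// All four literals below denote ordinary Number values.
const values = [0b1010, 0o755, 0xff, 0XA];

// An out-of-range character directly after the literal is rejected...
const rejected = [parses("0b12"), parses("0o78"), parses("0xFG")];
// ...while a separated token sequence is fine.
const accepted = parses("0b1 + 2");

console.log(values); // [ 10, 493, 255, 10 ]
console.log(rejected, accepted); // [ false, false, false ] true
```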
BigInt literal
The BigInt type is a numeric primitive in JavaScript that can represent integers with arbitrary precision. BigInt literals are created by appending `n` to the end of an integer.
123456789123456789n // 123456789123456789
0o777777777777n // 68719476735
0x123456789ABCDEFn // 81985529216486895
0b11101001010101010101n // 955733
BigInt literals cannot start with `0` to avoid confusion with legacy octal literals.
0755n; // SyntaxError: invalid BigInt syntax
For octal BigInt numbers, always use zero followed by the letter "o" (uppercase or lowercase):
0o755n;
For more information about BigInt, see also JavaScript data structures.
Numeric separators
To improve readability for numeric literals, underscores (`_`, U+005F) can be used as separators:
1_000_000_000_000
1_050.95
0b1010_0001_1000_0101
0o2_2_5_6
0xA0_B0_C0
1_000_000_000_000_000_000_000n
Note these limitations:
// More than one underscore in a row is not allowed
100__000; // SyntaxError

// Not allowed at the end of numeric literals
100_; // SyntaxError

// Can not be used after leading 0
0_1; // SyntaxError
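As a quick illustration, separators affect only how a literal is written in source, not its value. The sketch also shows a related point that goes beyond the text above: string-to-number conversion functions do not accept separators.

```javascript
// Separators do not change the numeric value.
const million = 1_000_000;
const color = 0xA0_B0_C0;
const price = 1_050.95;

// Separators are literal syntax only; parsing strings does not honor them.
const fromNumber = Number("1_000"); // NaN
const fromParseInt = parseInt("1_000", 10); // stops at the underscore

console.log(million === 1000000, color === 0xa0b0c0, price === 1050.95); // true true true
console.log(fromNumber, fromParseInt); // NaN 1
```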
String literals
A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for these code points:
- U+005C \ (backslash)
- U+000D <CR>
- U+000A <LF>
- The same kind of quote that begins the string literal
Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values Unicode code points are UTF-16 encoded.
'foo'
"bar"
The following subsections describe various escape sequences (`\` followed by one or more characters) available in string literals. Any escape sequence not listed below is an "identity escape" that evaluates to the code point itself. For example, `\z` is the same as `z`. There's a deprecated octal escape sequence syntax described in the Deprecated and obsolete features page. Many of these escape sequences are also valid in regular expressions; see Character escape.
Escape sequences
Special characters can be encoded using escape sequences:
Escape sequence | Unicode code point |
---|---|
\0 | null character (U+0000 NULL) |
\' | single quote (U+0027 APOSTROPHE) |
\" | double quote (U+0022 QUOTATION MARK) |
\\ | backslash (U+005C REVERSE SOLIDUS) |
\n | newline (U+000A LINE FEED; LF) |
\r | carriage return (U+000D CARRIAGE RETURN; CR) |
\v | vertical tab (U+000B LINE TABULATION) |
\t | tab (U+0009 CHARACTER TABULATION) |
\b | backspace (U+0008 BACKSPACE) |
\f | form feed (U+000C FORM FEED) |
\ followed by a line terminator | empty string |
The last escape sequence, `\` followed by a line terminator, is useful for splitting a string literal across multiple lines without changing its meaning.
const longString =
  "This is a very long string which needs \
to wrap across multiple lines because \
otherwise my code is unreadable.";
Make sure there is no space or any other character after the backslash (except for a line break), otherwise it will not work. If the next line is indented, the extra spaces will also be present in the string's value.
You can also use the `+` operator to append multiple strings together, like this:
const longString =
  "This is a very long string which needs " +
  "to wrap across multiple lines because " +
  "otherwise my code is unreadable.";
Both of the above methods result in identical strings.
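A small check of that equivalence, with made-up strings, also shows the indentation pitfall mentioned above:

```javascript
// Backslash + line break contributes nothing to the string value.
const continued = "first part \
second part";

// Concatenation produces the same value.
const concatenated = "first part " + "second part";

// Indenting the continuation line leaks the leading spaces into the string.
const indented = "first part \
    second part";

console.log(continued === concatenated); // true
console.log(indented === concatenated); // false
console.log(indented.length - concatenated.length); // 4
```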
Hexadecimal escape sequences
Hexadecimal escape sequences consist of `\x` followed by exactly two hexadecimal digits representing a code unit or code point in the range 0x0000 to 0x00FF.
"\xA9"; // "©"
Unicode escape sequences
A Unicode escape sequence consists of exactly four hexadecimal digits following `\u`. It represents a code unit in the UTF-16 encoding. For code points U+0000 to U+FFFF, the code unit is equal to the code point. Code points U+10000 to U+10FFFF require two escape sequences representing the two code units (a surrogate pair) used to encode the character; the surrogate pair is distinct from the code point.
See also `String.fromCharCode()` and `String.prototype.charCodeAt()`.
"\u00A9"; // "©" (U+A9)
Unicode code point escapes
A Unicode code point escape consists of `\u{`, followed by a code point in hexadecimal base, followed by `}`. The value of the hexadecimal digits must be between 0 and 0x10FFFF inclusive. Code points in the range U+10000 to U+10FFFF do not need to be represented as a surrogate pair.
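The relationship between a code point escape and the equivalent surrogate pair can be worked out with the standard UTF-16 arithmetic. The sketch below uses a hypothetical helper, `toSurrogatePair`, and the code point U+2F804 as the example.

```javascript
// Hypothetical helper: split a code point above U+FFFF into UTF-16 code units.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000; // 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x2f804);
console.log(high.toString(16), low.toString(16)); // d87e dc04

const viaCodePoint = "\u{2F804}";
const viaSurrogates = "\uD87E\uDC04";

console.log(viaCodePoint === viaSurrogates); // true
console.log(viaCodePoint.length); // 2 (code units)
console.log([...viaCodePoint].length); // 1 (code point)
console.log(viaCodePoint.codePointAt(0) === 0x2f804); // true
```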
See also `String.fromCodePoint()` and `String.prototype.codePointAt()`.
"\u{2F804}"; // CJK COMPATIBILITY IDEOGRAPH-2F804 (U+2F804)

// the same character represented as a surrogate pair
"\uD87E\uDC04";
Regular expression literals
Regular expression literals are enclosed by two forward slashes (`/`). The lexer consumes all characters up to the next unescaped forward slash or the end of the line, unless the forward slash appears within a character class (`[]`). Some characters (namely, those that are identifier parts) can appear after the closing slash, denoting flags.
The lexical grammar is very lenient: not all regular expression literals that get identified as one token are valid regular expressions.
See also RegExp for more information.
/ab+c/g;
/[/]/;
A regular expression literal cannot start with two forward slashes (`//`), because that would be a line comment. To specify an empty regular expression, use `/(?:)/`.
Template literals
One template literal consists of several tokens: `xxx${ (template head), }xxx${ (template middle), and }xxx` (template tail) are individual tokens, while any expression may come between them.
See also template literals for more information.
`string text`;

`string text line 1
 string text line 2`;

`string text ${expression} string text`;

tag`string text ${expression} string text`;
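One way to observe the head, middle, and tail pieces is through a tag function, which receives the literal text segments separately from the substituted values. The tag `showParts` and the variables below are hypothetical names for illustration.

```javascript
// Hypothetical tag function: returns the raw pieces it was called with.
function showParts(strings, ...values) {
  return { strings: [...strings], values };
}

const name = "world";
const count = 3;
const parts = showParts`Hello ${name}, you have ${count} messages`;

// The text of the head, middle, and tail tokens, in order.
console.log(parts.strings); // [ 'Hello ', ', you have ', ' messages' ]
// The expressions that came between them.
console.log(parts.values); // [ 'world', 3 ]
```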
Automatic semicolon insertion
Some JavaScript statements' syntax definitions require semicolons (`;`) at the end. They include:
- `var`, `let`, `const`
- Expression statements
- `do...while`
- `continue`, `break`, `return`, `throw`
- `debugger`
- Class field declarations (public or private)
- `import`, `export`
However, to make the language more approachable and convenient, JavaScript is able to automatically insert semicolons when consuming the token stream, so that some invalid token sequences can be "fixed" to valid syntax. This step happens after the program text has been parsed to tokens according to the lexical grammar. There are three cases when semicolons are automatically inserted:
1. When a token not allowed by the grammar is encountered, and it's separated from the previous token by at least one line terminator (including a block comment that includes at least one line terminator), or the token is "}", then a semicolon is inserted before the token.
{ 1
2 } 3

// is transformed by ASI into:

{ 1
;2 ;} 3;

// Which is valid grammar encoding three statements,
// each consisting of a number literal
The ending ")" of `do...while` is taken care of as a special case by this rule as well.
do {
  // …
} while (condition) /* ; */ // ASI here
const a = 1
However, semicolons are not inserted if the semicolon would then become the separator in the `for` statement's head.
for (
  let a = 1 // No ASI here
  a < 10 // No ASI here
  a++
) {}
Semicolons are also never inserted as empty statements. For example, in the code below, if a semicolon is inserted after ")", then the code would be valid, with an empty statement as the `if` body and the `const` declaration being a separate statement. However, because automatically inserted semicolons cannot become empty statements, this causes a declaration to become the body of the `if` statement, which is not valid.
if (Math.random() > 0.5)
const x = 1 // SyntaxError: Unexpected token 'const'
2. When the end of the input stream of tokens is reached, and the parser is unable to parse the single input stream as a complete program, a semicolon is inserted at the end.
const a = 1 /* ; */ // ASI here
This rule is a complement to the previous rule, specifically for the case where there's no "offending token" but the end of input stream.
3. When the grammar forbids line terminators in some place but a line terminator is found, a semicolon is inserted. These places include:
- expr <here> ++, expr <here> --
- continue <here> lbl
- break <here> lbl
- return <here> expr
- throw <here> expr
- yield <here> expr
- yield <here> * expr
- (param) <here> => {}
- async <here> function, async <here> prop(), async <here> function*, async <here> *prop(), async <here> (param) <here> => {}
Here `++` is not treated as a postfix operator applying to variable `b`, because a line terminator occurs between `b` and `++`.
a = b
++c

// is transformed by ASI into

a = b;
++c;
Here, the `return` statement returns `undefined`, and the `a + b` becomes an unreachable statement.
return
a + b

// is transformed by ASI into

return;
a + b;
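Both cases can be observed at run time. The following sketch uses hypothetical function and variable names; the line breaks after `return` and before `++` are intentional.

```javascript
function sumWithLineBreak(a, b) {
  // The line break after `return` triggers ASI: `return;`
  return
  a + b;
}

function sumOnOneLine(a, b) {
  return a + b;
}

let first = 1;
let second = 1;
// The line break before `++` makes it a prefix operator on `second`.
const result = first
++second;

console.log(sumWithLineBreak(1, 2)); // undefined
console.log(sumOnOneLine(1, 2)); // 3
console.log(result, first, second); // 1 1 2
```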
Note that ASI would only be triggered if a line break separates tokens that would otherwise produce invalid syntax. If the next token can be parsed as part of a valid structure, semicolons would not be inserted. For example:
const a = 1
(1).toString()

const b = 1
[1, 2, 3].forEach(console.log)
Because `()` can be seen as a function call, it would usually not trigger ASI. Similarly, `[]` may be a member access. The code above is equivalent to:
const a = 1(1).toString();

const b = 1[1, 2, 3].forEach(console.log);
This happens to be valid syntax. `1[1, 2, 3]` is a property accessor with a comma-joined expression. Therefore, you would get errors like "1 is not a function" and "Cannot read properties of undefined (reading 'forEach')" when running the code.
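The sketch below runs both snippets, with and without explicit semicolons, and records the kind of error thrown. The `run` helper is hypothetical, and a no-op callback stands in for `console.log` to keep the output quiet.

```javascript
// Hypothetical helper: run `source` as a function body and report the outcome.
function run(source) {
  try {
    new Function(source)();
    return "no error";
  } catch (error) {
    return error.constructor.name;
  }
}

const outcomes = [
  run("const a = 1\n(1).toString()"), // parsed as 1(1).toString()
  run("const b = 1\n[1, 2, 3].forEach(() => {})"), // parsed as 1[1, 2, 3].forEach(...)
  run("const a = 1;\n(1).toString()"),
  run("const b = 1;\n[1, 2, 3].forEach(() => {})"),
];

console.log(outcomes); // [ 'TypeError', 'TypeError', 'no error', 'no error' ]
```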
Within classes, class fields and generator methods can be a pitfall as well.
class A {
  a = 1
  *gen() {}
}
It is seen as:
class A {
  a = 1 * gen() {}
}
This is therefore a syntax error around `{`.
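A parse-only check of this pitfall, assuming an engine that supports class fields, might look like the following; `parses` is a hypothetical helper built on `new Function`.

```javascript
// Hypothetical helper: true if `source` parses as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Without a semicolon, the field initializer swallows `*gen()`.
const withoutSemicolon = parses("class A {\n  a = 1\n  *gen() {}\n}");
// With a semicolon, the field and the generator method are separate members.
const withSemicolon = parses("class A {\n  a = 1;\n  *gen() {}\n}");

console.log(withoutSemicolon, withSemicolon); // false true
```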
The following rules of thumb help when dealing with ASI, if you want to enforce a semicolon-less style:
- Write postfix `++` and `--` on the same line as their operands.
  Bad:
  const a = b
  ++
  console.log(a) // ReferenceError: Invalid left-hand side expression in prefix operation
  Good:
  const a = b++
  console.log(a)
- The expressions after `return`, `throw`, or `yield` should be on the same line as the keyword.
  Bad:
  function foo() {
    return
      1 + 1 // Returns undefined; 1 + 1 is ignored
  }
  Good:
  function foo() {
    return 1 + 1
  }

  function foo() {
    return (
      1 + 1
    )
  }
- Similarly, the label identifier after `break` or `continue` should be on the same line as the keyword.
  Bad:
  outerBlock: {
    innerBlock: {
      break
        outerBlock // SyntaxError: Illegal break statement
    }
  }
  Good:
  outerBlock: {
    innerBlock: {
      break outerBlock
    }
  }
- The `=>` of an arrow function should be on the same line as the end of its parameters.
  Bad:
  const foo = (a, b)
    => a + b
  Good:
  const foo = (a, b) =>
    a + b
- The `async` of async functions, methods, etc. cannot be directly followed by a line terminator.
  Bad:
  async
  function foo() {}
  Good:
  async function
  foo() {}
- If a line starts with one of `(`, `[`, `` ` ``, `+`, `-`, `/` (as in regex literals), prefix it with a semicolon, or end the previous line with a semicolon.
  Bad:
  // The () may be merged with the previous line as a function call
  (() => {
    // …
  })()

  // The [ may be merged with the previous line as a property access
  [1, 2, 3].forEach(console.log)

  // The ` may be merged with the previous line as a tagged template literal
  `string text ${data}`.match(pattern).forEach(console.log)

  // The + may be merged with the previous line as a binary + expression
  +a.toString()

  // The - may be merged with the previous line as a binary - expression
  -a.toString()

  // The / may be merged with the previous line as a division expression
  /pattern/.exec(str).forEach(console.log)
  Good:
  ;(() => {
    // …
  })()
  ;[1, 2, 3].forEach(console.log)
  ;`string text ${data}`.match(pattern).forEach(console.log)
  ;+a.toString()
  ;-a.toString()
  ;/pattern/.exec(str).forEach(console.log)
- Class fields should preferably always be ended with semicolons. In addition to the previous rule (which includes a field declaration followed by a computed property, since the latter starts with `[`), semicolons are also required between a field declaration and a generator method.
  Bad:
  class A {
    a = 1
    [b] = 2
    *gen() {} // Seen as a = 1[b] = 2 * gen() {}
  }
  Good:
  class A {
    a = 1;
    [b] = 2;
    *gen() {}
  }
Specifications
Specification |
---|
ECMAScript® 2026 Language Specification |
See also
- Grammar and types guide
- Micro-feature from ES6, now in Firefox Aurora and Nightly: binary and octal numbers by Jeff Walden (2013)
- JavaScript character escape sequences by Mathias Bynens (2011)