Jun 9, 2025 · May 21, 2025 · May 21, 2025 · May 21, 2025 · May 21, 2025 · May 28, 2025
diff --git a/Doc/reference/grammar.rst b/Doc/reference/grammar.rst
 The version here omits details related to code generation and
 error recovery.

 The notation isa mixture of `EBNF
 <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
 and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
 In particular, ``&`` followed by a symbol, token or parenthesized
 group indicates a positive lookahead (i.e., is required to match but
 not consumed), while ``!`` indicates a negative lookahead (i.e., is
 required *not* to match).  We use the ``|``separator to mean PEG's
 "ordered choice" (written as ``/``in traditional PEG grammars). See
 :pep:`617` for more details on the grammar's syntax.
 The notationused hereisthe same as in the preceding docs,
 and is described in the :ref:`notation <notation>` section,
 except for a few extra complications:

 * ``&e``: a positive lookahead (that is, ``e`` is required to match but
 not consumed)
 * ``!e``: a negative lookahead (that is, ``e``is required *not* to match)
 * ``~``("cut"): commit to the current alternative and fail the rule
  even if this fails to parse

 .. literalinclude:: ../../Grammar/python.gram
  :language: peg
diff --git a/Doc/reference/introduction.rst b/Doc/reference/introduction.rst

 .. index:: BNF, grammar, syntax, notation

 The descriptions of lexical analysis and syntax use a modified
 `Backus–Naur form (BNF) <https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form>`_ grammar
 notation.  This uses the following style of definition:

 .. productionlist:: notation
   name: `lc_letter` (`lc_letter` | "_")*
   lc_letter: "a"..."z"

 The first line says that a ``name`` is an ``lc_letter`` followed by a sequence
 of zero or more ``lc_letter``\ s and underscores.  An ``lc_letter`` in turn is
 any of the single characters ``'a'`` through ``'z'``.  (This rule is actually
 adhered to for the names defined in lexical and grammar rules in this document.)

 Each rule begins with a name (which is the name defined by the rule) and
 ``::=``.  A vertical bar (``|``) is used to separate alternatives; it is the
 least binding operator in this notation.  A star (``*``) means zero or more
 repetitions of the preceding item; likewise, a plus (``+``) means one or more
 repetitions, and a phrase enclosed in square brackets (``[ ]``) means zero or
 one occurrences (in other words, the enclosed phrase is optional).  The ``*``
 and ``+`` operators bind as tightly as possible; parentheses are used for
 grouping.  Literal strings are enclosed in quotes.  White space is only
 meaningful to separate tokens. Rules are normally contained on a single line;
 rules with many alternatives may be formatted alternatively with each line after
 the first beginning with a vertical bar.

 .. index:: lexical definitions, ASCII

 In lexical definitions (as the example above), two more conventions are used:
 Two literal characters separated by three dots mean a choice of any single
 character in the given (inclusive) range of ASCII characters.  A phrase between
 angular brackets (``<...>``) gives an informal description of the symbol
 defined; e.g., this could be used to describe the notion of 'control character'
 if needed.

 Even though the notation used is almost the same, there is a big difference
 between the meaning of lexical and syntactic definitions: a lexical definition
 operates on the individual characters of the input source, while a syntax
 definition operates on the stream of tokens generated by the lexical analysis.
 All uses of BNF in the next chapter ("Lexical Analysis") are lexical
 definitions; uses in subsequent chapters are syntactic definitions.

 The descriptions of lexical analysis and syntax use a grammar notation that
 is a mixture of
 `EBNF <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
 and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
 For example:

 .. grammar-snippet::
   :group: notation

   name:   `letter` (`letter` | `digit` | "_")*
   letter: "a"..."z" | "A"..."Z"
   digit:  "0"..."9"

 In this example, the first line says that a ``name`` is a ``letter`` followed
 by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
 A ``letter`` in turn is any of the single characters ``'a'`` through
 ``'z'`` and ``A`` through ``Z``; a ``digit`` is a single character from ``0``
 to ``9``.

 Each rule begins with a name (which identifies the rule that's being defined)
 followed by a colon, ``:``.
 The definition to the right of the colon uses the following syntax elements:

 * ``name``: A name refers to another rule.
  Where possible, it is a link to the rule's definition.

  * ``TOKEN``: An uppercase name refers to a :term:`token`.
    For the purposes of grammar definitions, tokens are the same as rules.

 * ``"text"``, ``'text'``: Text in single or double quotes must match literally
  (without the quotes). The type of quote is chosen according to the meaning
  of ``text``:

  * ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
  * ``"case"``: A name in double quotes denotes a
    :ref:`soft-keyword <soft-keywords>`.
  * ``'@'``: A non-letter symbol in single quotes denotes an
    :py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
    :ref:`operator <operators>`.

 * ``e1 e2``: Items separated only by whitespace denote a sequence.
  Here, ``e1`` must be followed by ``e2``.
 * ``e1 | e2``: A vertical bar is used to separate alternatives.
  It denotes PEG's "ordered choice": if ``e1`` matches, ``e2`` is
  not considered.
  In traditional PEG grammars, this is written as a slash, ``/``, rather than
  a vertical bar.
  See :pep:`617` for more background and details.
 * ``e*``: A star means zero or more repetitions of the preceding item.
 * ``e+``: Likewise, a plus means one or more repetitions.
 * ``[e]``: A phrase enclosed in square brackets means zero or
  one occurrences. In other words, the enclosed phrase is optional.
 * ``e?``: A question mark has exactly the same meaning as square brackets:
  the preceding item is optional.
 * ``(e)``: Parentheses are used for grouping.
 * ``"a"..."z"``: Two literal characters separated by three dots mean a choice
  of any single character in the given (inclusive) range of ASCII characters.
  This notation is only used in
  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.
 * ``<...>``: A phrase between angular brackets gives an informal description
  of the matched symbol (for example, ``<any ASCII character except "\">``),
  or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
  This notation is only used in
  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.

 The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
 the vertical bar (``|``) binds most loosely.

 White space is only meaningful to separate tokens.

 Rules are normally contained on a single line, but rules that are too long
 may be wrapped:

 .. grammar-snippet::
   :group: notation

   literal: stringliteral | bytesliteral
            | integer | floatnumber | imagnumber

 Alternatively, rules may be formatted with the first line ending at the colon,
 and each alternative beginning with a vertical bar on a new line.
 For example:


 .. grammar-snippet::
   :group: notation-alt

   literal:
      | stringliteral
      | bytesliteral
      | integer
      | floatnumber
      | imagnumber

 This does *not* mean that there is an empty first alternative.

 .. index:: lexical definitions

 .. _notation-lexical-vs-syntactic:

 Lexical and Syntactic definitions
 ---------------------------------

 There is some difference between *lexical* and *syntactic* analysis:
 the :term:`lexical analyzer` operates on the individual characters of the
 input source, while the *parser* (syntactic analyzer) operates on the stream
 of :term:`tokens <token>` generated by the lexical analysis.
 However, in some cases the exact boundary between the two phases is a
 CPython implementation detail.

 The practical difference between the two is that in *lexical* definitions,
 all whitespace is significant.
 The lexical analyzer :ref:`discards <whitespace>` all whitespace that is not
 converted to tokens like :data:`token.INDENT` or :data:`~token.NEWLINE`.
 *Syntactic* definitions then use these tokens, rather than source characters.

 This documentation uses the same BNF grammar for both styles of definitions.
 All uses of BNF in the next chapter (:ref:`lexical`) are lexical definitions;
 uses in subsequent chapters are syntactic definitions.
Original file line number	Diff line number	Diff line change
Expand Up		@@ -8,15 +8,15 @@ used to generate the CPython parser (see :source:`Grammar/python.gram`).
		The version here omits details related to code generation and
		error recovery.

		The notation isa mixture of `EBNF
		<https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
		and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
		In particular, ``&`` followed by a symbol, token or parenthesized
		group indicates a positive lookahead (i.e., is required to match but
		not consumed), while ``!`` indicates a negative lookahead (i.e., is
		required not to match). We use the ``\|``separator to mean PEG's
		"ordered choice" (written as ``/``in traditional PEG grammars). See
		:pep:`617` for more details on the grammar's syntax.
		The notationused hereisthe same as in the preceding docs,
		and is described in the :ref:`notation <notation>` section,
		except for a few extra complications:

		* ``&e``: a positive lookahead (that is, ``e`` is required to match but
		not consumed)
		* ``!e``: a negative lookahead (that is, ``e``is required not to match)
		* ``~``("cut"): commit to the current alternative and fail the rule
		even if this fails to parse

		.. literalinclude:: ../../Grammar/python.gram
		:language: peg
Original file line number	Diff line number	Diff line change
Expand Up		@@ -90,44 +90,122 @@ Notation

		.. index:: BNF, grammar, syntax, notation

		The descriptions of lexical analysis and syntax use a modified
		`Backus–Naur form (BNF) <https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form>`_ grammar
		notation. This uses the following style of definition:

		.. productionlist:: notation
		name: `lc_letter` (`lc_letter` \| "_")*
		lc_letter: "a"..."z"

		The first line says that a ``name`` is an ``lc_letter`` followed by a sequence
		of zero or more ``lc_letter``\ s and underscores. An ``lc_letter`` in turn is
		any of the single characters ``'a'`` through ``'z'``. (This rule is actually
		adhered to for the names defined in lexical and grammar rules in this document.)

		Each rule begins with a name (which is the name defined by the rule) and
		``::=``. A vertical bar (``\|``) is used to separate alternatives; it is the
		least binding operator in this notation. A star (``*``) means zero or more
		repetitions of the preceding item; likewise, a plus (``+``) means one or more
		repetitions, and a phrase enclosed in square brackets (``[ ]``) means zero or
		one occurrences (in other words, the enclosed phrase is optional). The ``*``
		and ``+`` operators bind as tightly as possible; parentheses are used for
		grouping. Literal strings are enclosed in quotes. White space is only
		meaningful to separate tokens. Rules are normally contained on a single line;
		rules with many alternatives may be formatted alternatively with each line after
		the first beginning with a vertical bar.

		.. index:: lexical definitions, ASCII

		In lexical definitions (as the example above), two more conventions are used:
		Two literal characters separated by three dots mean a choice of any single
		character in the given (inclusive) range of ASCII characters. A phrase between
		angular brackets (``<...>``) gives an informal description of the symbol
		defined; e.g., this could be used to describe the notion of 'control character'
		if needed.

		Even though the notation used is almost the same, there is a big difference
		between the meaning of lexical and syntactic definitions: a lexical definition
		operates on the individual characters of the input source, while a syntax
		definition operates on the stream of tokens generated by the lexical analysis.
		All uses of BNF in the next chapter ("Lexical Analysis") are lexical
		definitions; uses in subsequent chapters are syntactic definitions.

		The descriptions of lexical analysis and syntax use a grammar notation that
		is a mixture of
		`EBNF <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
		and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
		For example:

		.. grammar-snippet::
		:group: notation

		name: `letter` (`letter` \| `digit` \| "_")*
		letter: "a"..."z" \| "A"..."Z"
		digit: "0"..."9"

		In this example, the first line says that a ``name`` is a ``letter`` followed
		by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
		A ``letter`` in turn is any of the single characters ``'a'`` through
		``'z'`` and ``A`` through ``Z``; a ``digit`` is a single character from ``0``
		to ``9``.

		Each rule begins with a name (which identifies the rule that's being defined)
		followed by a colon, ``:``.
		The definition to the right of the colon uses the following syntax elements:

		* ``name``: A name refers to another rule.
		Where possible, it is a link to the rule's definition.
Copy link Member lysnikolaouMay 29, 2025 Choose a reason for hiding this comment The reason will be displayed to describe this comment to others.Learn more. What possibilities exist, other than referencing another rule? Copy link MemberAuthor encukouJun 3, 2025 Choose a reason for hiding this comment The reason will be displayed to describe this comment to others.Learn more. Rules that aren't defined by formal grammar in the docs, for example`strings` inhttps://docs.python.org/3/reference/compound_stmts.html#literal-patterns I consider these to be issues that#127833 should eventually fix, but until then I'd keep the weasel word in. Also: tokens are currently unlinked, and many don't have a (lexical) grammar rule so theyshould eventually link to prose. lysnikolaou reacted with thumbs up emoji

		* ``TOKEN``: An uppercase name refers to a :term:`token`.
		For the purposes of grammar definitions, tokens are the same as rules.

		* ``"text"``, ``'text'``: Text in single or double quotes must match literally
		(without the quotes). The type of quote is chosen according to the meaning
		of ``text``:

		* ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
		* ``"case"``: A name in double quotes denotes a
		:ref:`soft-keyword <soft-keywords>`.
		* ``'@'``: A non-letter symbol in single quotes denotes an
		:py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
		:ref:`operator <operators>`.

		* ``e1 e2``: Items separated only by whitespace denote a sequence.
		Here, ``e1`` must be followed by ``e2``.
		* ``e1 \| e2``: A vertical bar is used to separate alternatives.
		It denotes PEG's "ordered choice": if ``e1`` matches, ``e2`` is
		not considered.
		In traditional PEG grammars, this is written as a slash, ``/``, rather than
		a vertical bar.
		See :pep:`617` for more background and details.
		* ``e*``: A star means zero or more repetitions of the preceding item.
		* ``e+``: Likewise, a plus means one or more repetitions.
		* ``[e]``: A phrase enclosed in square brackets means zero or
		one occurrences. In other words, the enclosed phrase is optional.
		* ``e?``: A question mark has exactly the same meaning as square brackets:
		the preceding item is optional.
		* ``(e)``: Parentheses are used for grouping.
		* ``"a"..."z"``: Two literal characters separated by three dots mean a choice
		of any single character in the given (inclusive) range of ASCII characters.
		This notation is only used in
		:ref:`lexical definitions <notation-lexical-vs-syntactic>`.
		* ``<...>``: A phrase between angular brackets gives an informal description
		of the matched symbol (for example, ``<any ASCII character except "\">``),
		or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
		This notation is only used in
		:ref:`lexical definitions <notation-lexical-vs-syntactic>`.

		The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
		the vertical bar (``\|``) binds most loosely.

		White space is only meaningful to separate tokens.

		Rules are normally contained on a single line, but rules that are too long
		may be wrapped:

		.. grammar-snippet::
		:group: notation

		literal: stringliteral \| bytesliteral
		\| integer \| floatnumber \| imagnumber

		Alternatively, rules may be formatted with the first line ending at the colon,
		and each alternative beginning with a vertical bar on a new line.
		For example:


		.. grammar-snippet::
		:group: notation-alt

		literal:
		\| stringliteral
		\| bytesliteral
		\| integer
		\| floatnumber
		\| imagnumber

		This does not mean that there is an empty first alternative.

		.. index:: lexical definitions

		.. _notation-lexical-vs-syntactic:

		Lexical and Syntactic definitions
		---------------------------------

		There is some difference between lexical and syntactic analysis:
		the :term:`lexical analyzer` operates on the individual characters of the
		input source, while the parser (syntactic analyzer) operates on the stream
		of :term:`tokens <token>` generated by the lexical analysis.
		However, in some cases the exact boundary between the two phases is a
		CPython implementation detail.

		The practical difference between the two is that in lexical definitions,
		all whitespace is significant.
		The lexical analyzer :ref:`discards <whitespace>` all whitespace that is not
		converted to tokens like :data:`token.INDENT` or :data:`~token.NEWLINE`.
		Syntactic definitions then use these tokens, rather than source characters.

		This documentation uses the same BNF grammar for both styles of definitions.
		All uses of BNF in the next chapter (:ref:`lexical`) are lexical definitions;
		uses in subsequent chapters are syntactic definitions.