Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-127833: Reword and expand the Notation section#134443

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
encukou wants to merge4 commits intopython:main
base:main
Choose a base branch
Loading
fromencukou:grammar-notation
Open
Show file tree
Hide file tree
Changes fromall commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 8 additions & 9 deletionsDoc/reference/grammar.rst
View file
Open in desktop
Original file line numberDiff line numberDiff line change
Expand Up@@ -8,15 +8,14 @@ used to generate the CPython parser (see :source:`Grammar/python.gram`).
The version here omits details related to code generation and
error recovery.

The notation is a mixture of `EBNF
<https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
In particular, ``&`` followed by a symbol, token or parenthesized
group indicates a positive lookahead (i.e., is required to match but
not consumed), while ``!`` indicates a negative lookahead (i.e., is
required *not* to match). We use the ``|`` separator to mean PEG's
"ordered choice" (written as ``/`` in traditional PEG grammars). See
:pep:`617` for more details on the grammar's syntax.
The notation used here is the same as in the preceding docs,
and is described in the :ref:`notation <notation>` section,
except for a few extra complications:

* ``&e``: a positive lookahead (that is, ``e`` is required to match but
not consumed)
* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
* ``~`` ("cut"): commit to the current alternative, even if it fails to parse

.. literalinclude:: ../../Grammar/python.gram
:language: peg
144 changes: 103 additions & 41 deletionsDoc/reference/introduction.rst
View file
Open in desktop
Original file line numberDiff line numberDiff line change
Expand Up@@ -90,44 +90,106 @@ Notation

.. index:: BNF, grammar, syntax, notation

The descriptions of lexical analysis and syntax use a modified
`Backus–Naur form (BNF) <https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form>`_ grammar
notation. This uses the following style of definition:

.. productionlist:: notation
name: `lc_letter` (`lc_letter` | "_")*
lc_letter: "a"..."z"

The first line says that a ``name`` is an ``lc_letter`` followed by a sequence
of zero or more ``lc_letter``\ s and underscores. An ``lc_letter`` in turn is
any of the single characters ``'a'`` through ``'z'``. (This rule is actually
adhered to for the names defined in lexical and grammar rules in this document.)

Each rule begins with a name (which is the name defined by the rule) and
``::=``. A vertical bar (``|``) is used to separate alternatives; it is the
least binding operator in this notation. A star (``*``) means zero or more
repetitions of the preceding item; likewise, a plus (``+``) means one or more
repetitions, and a phrase enclosed in square brackets (``[ ]``) means zero or
one occurrences (in other words, the enclosed phrase is optional). The ``*``
and ``+`` operators bind as tightly as possible; parentheses are used for
grouping. Literal strings are enclosed in quotes. White space is only
meaningful to separate tokens. Rules are normally contained on a single line;
rules with many alternatives may be formatted alternatively with each line after
the first beginning with a vertical bar.

.. index:: lexical definitions, ASCII

In lexical definitions (as the example above), two more conventions are used:
Two literal characters separated by three dots mean a choice of any single
character in the given (inclusive) range of ASCII characters. A phrase between
angular brackets (``<...>``) gives an informal description of the symbol
defined; e.g., this could be used to describe the notion of 'control character'
if needed.

Even though the notation used is almost the same, there is a big difference
between the meaning of lexical and syntactic definitions: a lexical definition
operates on the individual characters of the input source, while a syntax
definition operates on the stream of tokens generated by the lexical analysis.
All uses of BNF in the next chapter ("Lexical Analysis") are lexical
definitions; uses in subsequent chapters are syntactic definitions.

The descriptions of lexical analysis use a grammar notation that is a mixture
of `EBNF <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
For example:

.. grammar-snippet::
:group: notation

name: `letter` (`letter` | `digit` | "_")*
letter: "a"..."z" | "A"..."Z"
digit: "0"..."9"

In this example, the first line says that a ``name`` is a ``letter`` followed
by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
A ``letter`` in turn is any of the single characters ``'a'`` through
``'z'`` and ``A`` through ``Z``; a ``digit`` is a single character from ``0``
to ``9``.

Each rule begins with a name (which identifies the rule that's being defined)
followed by a colon, ``:``.
The definition to the right of the colon uses the following syntax elements:

* ``name``: A name refers to another rule.
Where possible, it is a link to the rule's definition.

* ``TOKEN``: An uppercase name refers to a :term:`token`.
For the purposes of grammar definitions, tokens are the same as rules.

* ``"text"``, ``'text'``: Text in single or double quotes must match literally
(without the quotes). The type of quote is chosen according to the meaning
of ``text``:

* ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
* ``"case"``: A name in double quotes denotes a
:ref:`soft-keyword <soft-keywords>`.
* ``'@'``: A non-letter symbol in single quotes denotes an
:py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
:ref:`operator <operators>`.

* ``"a"..."z"``: Two literal characters separated by three dots mean a choice
of any single character in the given (inclusive) range of ASCII characters.
* ``<...>``: A phrase between angular brackets gives an informal description
of the matched symbol (for example, ``<any ASCII character except "\">``),
or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
* ``e1 e2``: Items separated only by whitespace denote a sequence.
Here, ``e1`` must be followed by ``e2``.
* ``e1 | e2``: A vertical bar is used to separate alternatives.
It denotes PEG's "ordered choice": if ``e1`` matches, ``e2`` is
not considered.
In traditional PEG grammars, this is written as a slash, ``/``, rather than
a vertical bar.
See :pep:`617` for more background and details.
* ``e*``: A star means zero or more repetitions of the preceding item.
* ``e+``: Likewise, a plus means one or more repetitions.
* ``[e]``: A phrase enclosed in square brackets means zero or
one occurrences. In other words, the enclosed phrase is optional.
* ``e?``: A question mark has exactly the same meaning as square brackets:
the preceding item is optional.
* ``(e)``: Parentheses are used for grouping.

The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
the vertical bar (``|``) binds most loosely.

White space is only meaningful to separate tokens.

Rules are normally contained on a single line, but rules that are too long
may be wrapped:

.. grammar-snippet::
:group: notation

literal: stringliteral | bytesliteral
| integer | floatnumber | imagnumber

Alternatively, rules may be formatted with the first line ending at the colon,
and each alternative beginning with a vertical bar on a new line.
For example:


.. grammar-snippet::
:group: notation-alt

literal:
| stringliteral
| bytesliteral
| integer
| floatnumber
| imagnumber

This does *not* mean that there is an empty first alternative.

.. index:: lexical definitions

.. note::

There is some difference between *lexical* and *syntactic* analysis:
the :term:`lexical analyzer` operates on the individual characters of the
input source, while the *parser* (syntactic analyzer) operates on the stream
of :term:`tokens <token>` generated by the lexical analysis.
However, in some cases the exact boundary between the two phases is a
CPython implementation detail.

This documentation uses the same BNF grammar for both.
Loading

[8]ページ先頭

©2009-2025 Movatter.jp