gh-127833: Reword and expand the Notation section #134443
Changes from all commits
211f83d
ec90d40
3f3a0db
160ff42
231e2ba
fa17c00
327b90d
3c5caa6
@@ -90,44 +90,122 @@ Notation

.. index:: BNF, grammar, syntax, notation

The descriptions of lexical analysis and syntax use a grammar notation that
is a mixture of
`EBNF <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
For example:

.. grammar-snippet::
   :group: notation

   name:   `letter` (`letter` | `digit` | "_")*
   letter: "a"..."z" | "A"..."Z"
   digit:  "0"..."9"

In this example, the first line says that a ``name`` is a ``letter`` followed
by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
A ``letter`` in turn is any of the single characters ``'a'`` through
``'z'`` and ``'A'`` through ``'Z'``; a ``digit`` is a single character from
``'0'`` to ``'9'``.
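
As a rough illustration, the three example rules can be transcribed into a
Python regular expression. This is only a sketch of the *example* grammar
above, not of Python's real identifier rule; the names ``LETTER``, ``DIGIT``,
``NAME`` and ``is_name`` are made up for this illustration.

```python
import re

# Hypothetical transcription of the example rules (not Python's real
# identifier grammar).
LETTER = r"[a-zA-Z]"   # letter: "a"..."z" | "A"..."Z"
DIGIT = r"[0-9]"       # digit:  "0"..."9"

# name: letter (letter | digit | "_")*
NAME = re.compile(rf"{LETTER}(?:{LETTER}|{DIGIT}|_)*")


def is_name(text: str) -> bool:
    """Return True if the whole string matches the example ``name`` rule."""
    return NAME.fullmatch(text) is not None


print(is_name("spam42"))  # starts with a letter
print(is_name("_spam"))   # starts with an underscore, which is not a letter
print(is_name("4spam"))   # starts with a digit
```

Under this example grammar a leading underscore is rejected, because the
first item of ``name`` must be a ``letter``.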

Each rule begins with a name (which identifies the rule that's being defined)
followed by a colon, ``:``.
The definition to the right of the colon uses the following syntax elements:

* ``name``: A name refers to another rule.
  Where possible, it is a link to the rule's definition.

Review comment: What possibilities exist, other than referencing another rule?

Reply: Rules that aren't defined by formal grammar in the docs, for example Also: tokens are currently unlinked, and many don't have a (lexical) grammar rule so they should eventually link to prose.

  * ``TOKEN``: An uppercase name refers to a :term:`token`.
    For the purposes of grammar definitions, tokens are the same as rules.

* ``"text"``, ``'text'``: Text in single or double quotes must match literally
  (without the quotes). The type of quote is chosen according to the meaning
  of ``text``:

  * ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
  * ``"case"``: A name in double quotes denotes a
    :ref:`soft-keyword <soft-keywords>`.
  * ``'@'``: A non-letter symbol in single quotes denotes an
    :py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
    :ref:`operator <operators>`.

* ``e1 e2``: Items separated only by whitespace denote a sequence.
  Here, ``e1`` must be followed by ``e2``.
* ``e1 | e2``: A vertical bar is used to separate alternatives.
  It denotes PEG's "ordered choice": if ``e1`` matches, ``e2`` is
  not considered.
  In traditional PEG grammars, this is written as a slash, ``/``, rather than
  a vertical bar.
  See :pep:`617` for more background and details.
* ``e*``: A star means zero or more repetitions of the preceding item.
* ``e+``: Likewise, a plus means one or more repetitions.
* ``[e]``: A phrase enclosed in square brackets means zero or
  one occurrence. In other words, the enclosed phrase is optional.
* ``e?``: A question mark has exactly the same meaning as square brackets:
  the preceding item is optional.
* ``(e)``: Parentheses are used for grouping.
* ``"a"..."z"``: Two literal characters separated by three dots mean a choice
  of any single character in the given (inclusive) range of ASCII characters.
  This notation is only used in
  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.
* ``<...>``: A phrase between angle brackets gives an informal description
  of the matched symbol (for example, ``<any ASCII character except "\">``),
  or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
  This notation is only used in
  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.

The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
the vertical bar (``|``) binds most loosely.

Whitespace is significant only as a separator between tokens.
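
The behaviour of these operators, and ordered choice in particular, can be
sketched with a few tiny parser combinators. The helpers ``lit``, ``seq``,
``choice``, ``star`` and ``opt`` below are hypothetical and exist only for
this illustration; they are not how CPython's parser generator is
implemented. Each parser takes ``(text, pos)`` and returns the new position,
or ``None`` on failure.

```python
def lit(s):
    """Match the literal text ``s``."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*items):
    """``e1 e2``: every item must match, one after another."""
    def parse(text, pos):
        for item in items:
            pos = item(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*alts):
    """``e1 | e2``: ordered choice; the first matching alternative wins."""
    def parse(text, pos):
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                return end  # later alternatives are not considered
        return None
    return parse


def star(item):
    """``e*``: zero or more repetitions."""
    def parse(text, pos):
        while True:
            end = item(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse


def opt(item):
    """``[e]`` or ``e?``: zero or one occurrence."""
    def parse(text, pos):
        end = item(text, pos)
        return pos if end is None else end
    return parse


# Ordered choice: the order of alternatives changes how much is consumed.
short_first = choice(lit("a"), lit("ab"))
long_first = choice(lit("ab"), lit("a"))

# Precedence: "a" "b"* | "c" groups as ("a" ("b"*)) | "c".
rule = choice(seq(lit("a"), star(lit("b"))), lit("c"))

print(short_first("ab", 0), long_first("ab", 0))
print(rule("abbb", 0), rule("c", 0), rule("b", 0))
```

With ``short_first``, the alternative ``"ab"`` is never tried on the input
``"ab"`` because ``"a"`` already matched, which is the practical consequence
of ordered choice.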

Rules are normally contained on a single line, but rules that are too long
may be wrapped:

.. grammar-snippet::
   :group: notation

   literal: stringliteral | bytesliteral
          | integer | floatnumber | imagnumber

Alternatively, rules may be formatted with the first line ending at the colon,
and each alternative beginning with a vertical bar on a new line.
For example:

.. grammar-snippet::
   :group: notation-alt

   literal:
      | stringliteral
      | bytesliteral
      | integer
      | floatnumber
      | imagnumber

This does *not* mean that there is an empty first alternative.
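
To make the equivalence of the two layouts concrete, here is a deliberately
naive sketch that splits a flat rule into its alternatives. The function
``alternatives`` is hypothetical, and it assumes the rule contains no quoted
vertical bars and no nested groups, which holds for the ``literal`` example
but not for grammar rules in general.

```python
def alternatives(rule: str) -> list[str]:
    """Split a flat rule (no quoted bars, no groups) into its alternatives."""
    _name, _, body = rule.partition(":")
    parts = [part.strip() for part in body.split("|")]
    # A bar that opens the body only introduces the first alternative;
    # it does not create an empty one.
    if parts and parts[0] == "":
        parts = parts[1:]
    return parts


wrapped = """literal: stringliteral | bytesliteral
       | integer | floatnumber | imagnumber"""

bar_first = """literal:
   | stringliteral
   | bytesliteral
   | integer
   | floatnumber
   | imagnumber"""

print(alternatives(wrapped))
print(alternatives(wrapped) == alternatives(bar_first))
```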

.. index:: lexical definitions

.. _notation-lexical-vs-syntactic:

Lexical and Syntactic definitions
---------------------------------

There is some difference between *lexical* and *syntactic* analysis:
the :term:`lexical analyzer` operates on the individual characters of the
input source, while the *parser* (syntactic analyzer) operates on the stream
of :term:`tokens <token>` generated by the lexical analysis.
However, in some cases the exact boundary between the two phases is a
CPython implementation detail.

The practical difference between the two is that in *lexical* definitions,
all whitespace is significant.
The lexical analyzer :ref:`discards <whitespace>` all whitespace that is not
converted to tokens like :data:`token.INDENT` or :data:`~token.NEWLINE`.
*Syntactic* definitions then use these tokens, rather than source characters.

This documentation uses the same BNF grammar for both styles of definitions.
All uses of BNF in the next chapter (:ref:`lexical`) are lexical definitions;
uses in subsequent chapters are syntactic definitions.
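
The token stream that syntactic definitions work with can be inspected using
the standard library's :mod:`tokenize` module. This is only an approximation
for illustration: ``tokenize`` is a separate tokenizer exposed to Python
code, and, as noted above, the exact boundary between the two phases inside
CPython is an implementation detail. The ``source`` string is an arbitrary
example.

```python
import io
import tokenize

# A small, arbitrary piece of source code with indentation and
# spaces between tokens.
source = "if x:\n    y = 1\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok.type is the generic token type (for example NAME, OP, NEWLINE).
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

The single spaces around ``=`` do not show up as tokens of their own, while
the line break and the leading indentation appear as ``NEWLINE`` and
``INDENT`` tokens.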