Commit c7364f7

gh-127833: lexical analysis: Improve section on Names (GH-131474)

encukou, StanFromIreland, and blaisep authored

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>

1 parent 109f759, commit c7364f7

2 files changed: +77 -53 lines changed


‎Doc/reference/lexical_analysis.rst

Lines changed: 76 additions & 52 deletions

@@ -288,58 +288,81 @@ forms a legal token, when read from left to right.
 
 .. _identifiers:
 
-Identifiers and keywords
-========================
+Names (identifiers and keywords)
+================================
 
 .. index:: identifier, name
 
-Identifiers (also referred to as *names*) are described by the following lexical
-definitions.
+:data:`~token.NAME` tokens represent *identifiers*, *keywords*, and
+*soft keywords*.
 
-The syntax of identifiers in Python is based on the Unicode standard annex
-UAX-31, with elaboration and changes as defined below; see also :pep:`3131` for
-further details.
-
-Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
-include the uppercase and lowercase letters ``A`` through
-``Z``, the underscore ``_`` and, except for the first character, the digits
+Within the ASCII range (U+0001..U+007F), the valid characters for names
+include the uppercase and lowercase letters (``A-Z`` and ``a-z``),
+the underscore ``_`` and, except for the first character, the digits
 ``0`` through ``9``.
-Python 3.0 introduced additional characters from outside the ASCII range (see
-:pep:`3131`). For these characters, the classification uses the version of the
-Unicode Character Database as included in the :mod:`unicodedata` module.
 
-Identifiers are unlimited in length. Case is significant.
+Names must contain at least one character, but have no upper length limit.
+Case is significant.
 
-.. productionlist:: python-grammar
-   identifier: `xid_start` `xid_continue`*
-   id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
-   id_continue: <all characters in `id_start`, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
-   xid_start: <all characters in `id_start` whose NFKC normalization is in "id_start xid_continue*">
-   xid_continue: <all characters in `id_continue` whose NFKC normalization is in "id_continue*">
-
-The Unicode category codes mentioned above stand for:
-
-* *Lu* - uppercase letters
-* *Ll* - lowercase letters
-* *Lt* - titlecase letters
-* *Lm* - modifier letters
-* *Lo* - other letters
-* *Nl* - letter numbers
-* *Mn* - nonspacing marks
-* *Mc* - spacing combining marks
-* *Nd* - decimal numbers
-* *Pc* - connector punctuations
-* *Other_ID_Start* - explicit list of characters in `PropList.txt
-  <https://www.unicode.org/Public/16.0.0/ucd/PropList.txt>`_ to support backwards
-  compatibility
-* *Other_ID_Continue* - likewise
-
-All identifiers are converted into the normal form NFKC while parsing; comparison
-of identifiers is based on NFKC.
-
-A non-normative HTML file listing all valid identifier characters for Unicode
-16.0.0 can be found at
-https://www.unicode.org/Public/16.0.0/ucd/DerivedCoreProperties.txt
+Besides ``A-Z``, ``a-z``, ``_`` and ``0-9``, names can also use "letter-like"
+and "number-like" characters from outside the ASCII range, as detailed below.
+
+All identifiers are converted into the `normalization form`_ NFKC while
+parsing; comparison of identifiers is based on NFKC.
+
+Formally, the first character of a normalized identifier must belong to the
+set ``id_start``, which is the union of:
+
+* Unicode category ``<Lu>`` - uppercase letters (includes ``A`` to ``Z``)
+* Unicode category ``<Ll>`` - lowercase letters (includes ``a`` to ``z``)
+* Unicode category ``<Lt>`` - titlecase letters
+* Unicode category ``<Lm>`` - modifier letters
+* Unicode category ``<Lo>`` - other letters
+* Unicode category ``<Nl>`` - letter numbers
+* {``"_"``} - the underscore
+* ``<Other_ID_Start>`` - an explicit set of characters in `PropList.txt`_
+  to support backwards compatibility
+
+The remaining characters must belong to the set ``id_continue``, which is the
+union of:
+
+* all characters in ``id_start``
+* Unicode category ``<Nd>`` - decimal numbers (includes ``0`` to ``9``)
+* Unicode category ``<Pc>`` - connector punctuations
+* Unicode category ``<Mn>`` - nonspacing marks
+* Unicode category ``<Mc>`` - spacing combining marks
+* ``<Other_ID_Continue>`` - another explicit set of characters in
+  `PropList.txt`_ to support backwards compatibility
+
+Unicode categories use the version of the Unicode Character Database as
+included in the :mod:`unicodedata` module.
+
+These sets are based on the Unicode standard annex `UAX-31`_.
+See also :pep:`3131` for further details.
+
+Even more formally, names are described by the following lexical definitions:
+
+.. grammar-snippet::
+   :group: python-grammar
+
+   NAME:         `xid_start` `xid_continue`*
+   id_start:     <Lu> | <Ll> | <Lt> | <Lm> | <Lo> | <Nl> | "_" | <Other_ID_Start>
+   id_continue:  `id_start` | <Nd> | <Pc> | <Mn> | <Mc> | <Other_ID_Continue>
+   xid_start:    <all characters in `id_start` whose NFKC normalization is
+                 in (`id_start` `xid_continue`*)">
+   xid_continue: <all characters in `id_continue` whose NFKC normalization is
+                 in (`id_continue`*)">
+   identifier:   <`NAME`, except keywords>
+
+A non-normative listing of all valid identifier characters as defined by
+Unicode is available in the `DerivedCoreProperties.txt`_ file in the Unicode
+Character Database.
+
+
+.. _UAX-31: https://www.unicode.org/reports/tr31/
+.. _PropList.txt: https://www.unicode.org/Public/16.0.0/ucd/PropList.txt
+.. _DerivedCoreProperties.txt: https://www.unicode.org/Public/16.0.0/ucd/DerivedCoreProperties.txt
+.. _normalization form: https://www.unicode.org/reports/tr15/#Norm_Forms
 
 
 .. _keywords:
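
A minimal sketch, separate from the diff, of how the rules described in this hunk can be observed from Python. It assumes CPython 3 and uses only ``unicodedata.normalize``, ``str.isidentifier`` and ``exec``; the helper ``normalized_name`` and the sample names are hypothetical and chosen only for illustration.

```python
import unicodedata


def normalized_name(source_name: str) -> str:
    """Return the NFKC form of a name (hypothetical helper for illustration)."""
    return unicodedata.normalize("NFKC", source_name)


# U+FB01 (LATIN SMALL LIGATURE FI) normalizes to the two letters "fi".
ligature_name = "\ufb01le"
assert normalized_name(ligature_name) == "file"

# str.isidentifier() checks the lexical shape only, so keywords pass it too.
assert "na\u00efve".isidentifier()   # non-ASCII lowercase letter (category Ll)
assert not "9lives".isidentifier()   # a digit cannot be the first character
assert not "".isidentifier()         # names need at least one character
assert "class".isidentifier()        # lexically a NAME, although a keyword

# The parser normalizes names in source code, so the binding is stored as "file".
namespace = {}
exec("\ufb01le = 1", namespace)
assert "file" in namespace
assert ligature_name not in namespace
```

Because ``str.isidentifier`` does not exclude keywords, a check for a usable identifier would typically combine it with ``keyword.iskeyword``.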
@@ -351,7 +374,7 @@ Keywords
    single: keyword
    single: reserved word
 
-The following identifiers are used as reserved words, or *keywords* of the
+The following names are used as reserved words, or *keywords* of the
 language, and cannot be used as ordinary identifiers. They must be spelled
 exactly as written here:

@@ -375,18 +398,19 @@ Soft Keywords
 
 .. versionadded:: 3.10
 
-Some identifiers are only reserved under specific contexts. These are known as
-*soft keywords*. The identifiers ``match``, ``case``, ``type`` and ``_`` can
-syntactically act as keywords in certain contexts,
+Some names are only reserved under specific contexts. These are known as
+*soft keywords*:
+
+- ``match``, ``case``, and ``_``, when used in the :keyword:`match` statement.
+- ``type``, when used in the :keyword:`type` statement.
+
+These syntactically act as keywords in their specific contexts,
 but this distinction is done at the parser level, not when tokenizing.
 
 As soft keywords, their use in the grammar is possible while still
 preserving compatibility with existing code that uses these names as
 identifier names.
 
-``match``, ``case``, and ``_`` are used in the :keyword:`match` statement.
-``type`` is used in the :keyword:`type` statement.
-
 .. versionchanged:: 3.12
    ``type`` is now a soft keyword.

‎Tools/unicode/makeunicodedata.py

Lines changed: 1 addition & 1 deletion

@@ -43,7 +43,7 @@
 # When changing UCD version please update
 # * Doc/library/stdtypes.rst, and
 # * Doc/library/unicodedata.rst
-# * Doc/reference/lexical_analysis.rst (two occurrences)
+# * Doc/reference/lexical_analysis.rst (three occurrences)
 UNIDATA_VERSION = "16.0.0"
 UNICODE_DATA = "UnicodeData%s.txt"
 COMPOSITION_EXCLUSIONS = "CompositionExclusions%s.txt"
