@@ -288,58 +288,81 @@ forms a legal token, when read from left to right.
288
288
289
289
.. _identifiers:
290
290
291
- Identifiers and keywords
292
- ========================
291
+ Names (identifiers and keywords)
292
+ ================================
293
293
294
294
.. index:: identifier, name
295
295
296
- Identifiers (also referred to as *names*) are described by the following lexical
297
- definitions.
296
+ :data:`~token.NAME` tokens represent *identifiers*, *keywords*, and
297
+ *soft keywords*.
298
298
299
- The syntax of identifiers in Python is based on the Unicode standard annex
300
- UAX-31, with elaboration and changes as defined below; see also :pep:`3131` for
301
- further details.
302
-
303
- Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
304
- include the uppercase and lowercase letters ``A`` through
305
- ``Z``, the underscore ``_`` and, except for the first character, the digits
299
+ Within the ASCII range (U+0001..U+007F), the valid characters for names
300
+ include the uppercase and lowercase letters (``A-Z`` and ``a-z``),
301
+ the underscore ``_`` and, except for the first character, the digits
306
302
``0`` through ``9``.
307
- Python 3.0 introduced additional characters from outside the ASCII range (see
308
- :pep:`3131`). For these characters, the classification uses the version of the
309
- Unicode Character Database as included in the :mod:`unicodedata` module.
310
303
311
- Identifiers are unlimited in length. Case is significant.
304
+ Names must contain at least one character, but have no upper length limit.
305
+ Case is significant.
312
306
313
- .. productionlist:: python-grammar
314
- identifier: `xid_start` `xid_continue`*
315
- id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
316
- id_continue: <all characters in `id_start`, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
317
- xid_start: <all characters in `id_start` whose NFKC normalization is in "id_start xid_continue*">
318
- xid_continue: <all characters in `id_continue` whose NFKC normalization is in "id_continue*">
319
-
320
- The Unicode category codes mentioned above stand for:
321
-
322
- * *Lu* - uppercase letters
323
- * *Ll* - lowercase letters
324
- * *Lt* - titlecase letters
325
- * *Lm* - modifier letters
326
- * *Lo* - other letters
327
- * *Nl* - letter numbers
328
- * *Mn* - nonspacing marks
329
- * *Mc* - spacing combining marks
330
- * *Nd* - decimal numbers
331
- * *Pc* - connector punctuations
332
- * *Other_ID_Start* - explicit list of characters in `PropList.txt
333
- <https://www.unicode.org/Public/16.0.0/ucd/PropList.txt>`_ to support backwards
334
- compatibility
335
- * *Other_ID_Continue* - likewise
336
-
337
- All identifiers are converted into the normal form NFKC while parsing; comparison
338
- of identifiers is based on NFKC.
339
-
340
- A non-normative HTML file listing all valid identifier characters for Unicode
341
- 16.0.0 can be found at
342
- https://www.unicode.org/Public/16.0.0/ucd/DerivedCoreProperties.txt
307
+ Besides ``A-Z``, ``a-z``, ``_`` and ``0-9``, names can also use "letter-like"
308
+ and "number-like" characters from outside the ASCII range, as detailed below.
309
+
310
+ All identifiers are converted into the `normalization form`_ NFKC while
311
+ parsing; comparison of identifiers is based on NFKC.
312
+
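As a small illustration of the NFKC rule above (a sketch, not part of the reference text): the ligature character U+FB01 is chosen here purely as an example, and ``exec`` with a scratch dictionary is just a convenient way to see which name ends up bound.

```python
import unicodedata

# U+FB01 is the single-character ligature "fi"; NFKC folds it to "f" + "i".
ligature_name = "\ufb01le"
normalized = unicodedata.normalize("NFKC", ligature_name)

# Binding the ligature spelling stores the variable under the normalized name.
namespace = {}
exec(f"{ligature_name} = 1", namespace)
stored_names = [key for key in namespace if key != "__builtins__"]

print(normalized, stored_names)
```

Both spellings therefore refer to the same variable, because the comparison happens on the normalized form.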
313
+ Formally, the first character of a normalized identifier must belong to the
314
+ set ``id_start``, which is the union of:
315
+
316
+ * Unicode category ``<Lu>`` - uppercase letters (includes ``A`` to ``Z``)
317
+ * Unicode category ``<Ll>`` - lowercase letters (includes ``a`` to ``z``)
318
+ * Unicode category ``<Lt>`` - titlecase letters
319
+ * Unicode category ``<Lm>`` - modifier letters
320
+ * Unicode category ``<Lo>`` - other letters
321
+ * Unicode category ``<Nl>`` - letter numbers
322
+ * {``"_"``} - the underscore
323
+ * ``<Other_ID_Start>`` - an explicit set of characters in `PropList.txt`_
324
+ to support backwards compatibility
325
+
326
+ The remaining characters must belong to the set ``id_continue``, which is the
327
+ union of:
328
+
329
+ * all characters in ``id_start``
330
+ * Unicode category ``<Nd>`` - decimal numbers (includes ``0`` to ``9``)
331
+ * Unicode category ``<Pc>`` - connector punctuations
332
+ * Unicode category ``<Mn>`` - nonspacing marks
333
+ * Unicode category ``<Mc>`` - spacing combining marks
334
+ * ``<Other_ID_Continue>`` - another explicit set of characters in
335
+ `PropList.txt`_ to support backwards compatibility
336
+
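The two sets above can be approximated with ``unicodedata.category``. The sketch below is illustrative only: the helper names ``approx_id_start`` and ``approx_id_continue`` are hypothetical, and the ``Other_ID_Start`` and ``Other_ID_Continue`` property lists are deliberately ignored, so the results can differ from the real definition for the few characters those lists cover.

```python
import unicodedata

# General categories named in the id_start and id_continue lists.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA_CATEGORIES = {"Nd", "Pc", "Mn", "Mc"}


def approx_id_start(ch):
    """Approximate id_start membership (Other_ID_Start is not consulted)."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def approx_id_continue(ch):
    """Approximate id_continue membership (Other_ID_Continue is not consulted)."""
    return (
        approx_id_start(ch)
        or unicodedata.category(ch) in ID_CONTINUE_EXTRA_CATEGORIES
    )


# A letter, an accented letter, a Roman numeral, the underscore,
# a digit, a combining accent, and a currency sign.
samples = ["A", "\u00e9", "\u2163", "_", "9", "\u0301", "$"]
for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        approx_id_start(ch),
        approx_id_continue(ch),
    )
```

The digit and the combining accent are accepted only as continuation characters, which matches the rule that a name cannot begin with them.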
337
+ Unicode categories use the version of the Unicode Character Database as
338
+ included in the :mod:`unicodedata` module.
339
+
340
+ These sets are based on the Unicode standard annex `UAX-31`_.
341
+ See also :pep:`3131` for further details.
342
+
343
+ Even more formally, names are described by the following lexical definitions:
344
+
345
+ .. grammar-snippet::
346
+ :group: python-grammar
347
+
348
+ NAME: `xid_start` `xid_continue`*
349
+ id_start: <Lu> | <Ll> | <Lt> | <Lm> | <Lo> | <Nl> | "_" | <Other_ID_Start>
350
+ id_continue: `id_start` | <Nd> | <Pc> | <Mn> | <Mc> | <Other_ID_Continue>
351
+ xid_start: <all characters in `id_start` whose NFKC normalization is
352
+ in (`id_start` `xid_continue`*)>
353
+ xid_continue: <all characters in `id_continue` whose NFKC normalization is
354
+ in (`id_continue`*)>
355
+ identifier: <`NAME`, except keywords>
356
+
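The last rule of the grammar (an identifier is a ``NAME`` that is not a keyword) can be checked from Python code. This is a minimal sketch: the helper name ``is_identifier_not_keyword`` is hypothetical, and it relies on ``str.isidentifier`` to test the ``NAME`` shape and on ``keyword.iskeyword`` to exclude reserved words.

```python
import keyword


def is_identifier_not_keyword(text):
    """Return True if text has the shape of a NAME and is not a keyword."""
    return text.isidentifier() and not keyword.iskeyword(text)


for candidate in ["spam", "for", "match", "2abc", ""]:
    print(repr(candidate), is_identifier_not_keyword(candidate))
```

Soft keywords such as ``match`` pass this check, since they remain usable as ordinary identifiers.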
357
+ A non-normative listing of all valid identifier characters as defined by
358
+ Unicode is available in the `DerivedCoreProperties.txt`_ file in the Unicode
359
+ Character Database.
360
+
361
+
362
+ .. _UAX-31: https://www.unicode.org/reports/tr31/
363
+ .. _PropList.txt: https://www.unicode.org/Public/16.0.0/ucd/PropList.txt
364
+ .. _DerivedCoreProperties.txt: https://www.unicode.org/Public/16.0.0/ucd/DerivedCoreProperties.txt
365
+ .. _normalization form: https://www.unicode.org/reports/tr15/#Norm_Forms
343
366
344
367
345
368
.. _keywords:
@@ -351,7 +374,7 @@ Keywords
351
374
single: keyword
352
375
single: reserved word
353
376
354
- The following identifiers are used as reserved words, or *keywords* of the
377
+ The following names are used as reserved words, or *keywords* of the
355
378
language, and cannot be used as ordinary identifiers. They must be spelled
356
379
exactly as written here:
357
380
@@ -375,18 +398,19 @@ Soft Keywords
375
398
376
399
.. versionadded:: 3.10
377
400
378
- Some identifiers are only reserved under specific contexts. These are known as
379
- *soft keywords*. The identifiers ``match``, ``case``, ``type`` and ``_`` can
380
- syntactically act as keywords in certain contexts,
401
+ Some names are reserved only in specific contexts. These are known as
402
+ *soft keywords*:
403
+
404
+ - ``match``, ``case``, and ``_``, when used in the :keyword:`match` statement.
405
+ - ``type``, when used in the :keyword:`type` statement.
406
+
407
+ These syntactically act as keywords in their specific contexts,
381
408
but this distinction is done at the parser level, not when tokenizing.
382
409
383
410
As soft keywords, their use in the grammar is possible while still
384
411
preserving compatibility with existing code that uses these names as
385
412
identifier names.
386
413
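A short sketch of what this means in practice, using the standard :mod:`tokenize` module: the tokenizer reports ``match`` as an ordinary ``NAME`` token, and code that uses it as a variable still runs. The sample source line is made up for illustration.

```python
import io
import keyword
import tokenize

source = "match = 1\n"

# The tokenizer does not distinguish soft keywords from other names.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
first = tokens[0]
first_kind = tokenize.tok_name[first.type]
print(first_kind, first.string)

# Existing code that binds ``match`` as a variable keeps working.
namespace = {}
exec(source, namespace)
print(namespace["match"], keyword.iskeyword("match"))
```

On versions that define soft keywords, ``keyword.softkwlist`` lists them separately from ``keyword.kwlist``.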
387
- ``match``, ``case``, and ``_`` are used in the :keyword:`match` statement.
388
- ``type`` is used in the :keyword:`type` statement.
389
-
390
414
.. versionchanged:: 3.12
391
415
``type`` is now a soft keyword.
392
416