python/cpython (Public)

gh-135676: Simplify docs on lexing names#140464

Draft

encukou wants to merge 5 commits into python:main from encukou:lex-analysis-names-simpler

+87 −58

encukou (Member) commented Oct 22, 2025 • edited by github-actions bot

This simplifies the Lexical Analysis section on Names (but keeps it technically correct) by putting all the info about non-ASCII characters in a separate (and very technical) section.

It uses a mental model where the parser doesn't handle Unicode complexity “immediately”, but:

parses any non-ASCII character (outside strings/comments) as part of a name, since these can't (yet) be e.g. operators
normalizes the name
validates the name, using the id_start/id_continue sets (referred to in previous sections as “letter-like” and “number-like” characters, with a link to the details)

This also means we don't need xid_start/xid_continue to define the behaviour :)
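
The normalize-then-validate part of this mental model can be sketched in Python. This is only an illustration, not CPython's tokenizer: the helper names `is_id_start`, `is_id_continue` and `check_name` are hypothetical, NFKC is assumed as the normalization form (as the language reference specifies for identifiers), and the Other_ID_Start/Other_ID_Continue properties are left out because `unicodedata` offers no direct lookup for them here. Step 1 (collecting non-ASCII characters into the name token) is not modeled.

```python
import unicodedata

# General categories from the language reference's id_start definition.
# Simplification: characters with the Other_ID_Start property are ignored.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Extra categories allowed after the first character.
# Simplification: characters with Other_ID_Continue are ignored.
ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch: str) -> bool:
    """Approximate check for a "letter-like" character (or underscore)."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch: str) -> bool:
    """Approximate check for a "letter-like" or "number-like" character."""
    return is_id_start(ch) or unicodedata.category(ch) in ID_CONTINUE_EXTRA


def check_name(raw: str) -> str:
    """Normalize a candidate name, then validate the normalized form."""
    normalized = unicodedata.normalize("NFKC", raw)  # step 2: normalize
    if not normalized or not is_id_start(normalized[0]):  # step 3: validate
        raise ValueError(f"invalid start of name: {raw!r}")
    for ch in normalized[1:]:
        if not is_id_continue(ch):
            raise ValueError(f"invalid character {ch!r} in name: {raw!r}")
    return normalized


print(check_name("nᵘₘᵇₑʳ"))
```

Validating after normalization is what lets the sketch use only the plain category sets; whether this matches the real tokenizer for every code point is exactly the kind of detail the PR discussion is about.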

Issue: Reword the Lexical Analysis chapter of the docs #135676

📚 Documentation preview 📚: https://cpython-previews--140464.org.readthedocs.build/

encukou and others added 4 commits

October 8, 2025 17:58

@encukou, @StanFromIreland, @blaisep, @MichaByte, @KeithTheEE

Simplify Names section

4606120

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

@encukou

Casing; 3 dots for character ranges

6163c24

@encukou

Clean-ups

de6d1af

@encukou

Mention Unicode's *ID_Start* and *ID_Continue*

152e7aa

@encukou

encukou requested review from AA-Turner and willingc as code owners

October 22, 2025 15:53

bedevere-app bot added the docs (Documentation in the Doc dir) and skip news labels

github-project-automation bot added this to Docs PRs

github-project-automation bot moved this to Todo in Docs PRs

bedevere-app bot mentioned this pull request in: Reword the Lexical Analysis chapter of the docs #135676 (Open)

bedevere-app bot added the awaiting core review label

StanFromIreland linked an issue that may be closed by this pull request: Docs: note requirement to normalise unicode identifiers passed to globals() and locals() #86846 (Open)
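
The behaviour behind the linked issue can be illustrated with a short sketch. It assumes NFKC normalization of identifiers and uses a private namespace dict with `exec()` as a stand-in for `globals()`, so that it runs the same way in any context; the variable names are chosen for illustration only.

```python
import unicodedata

# Run an assignment in a private namespace; the compiler normalizes the
# identifier before it is stored as a dictionary key.
namespace = {}
exec("nᵘₘᵇₑʳ = 3", namespace)

raw_name = "nᵘₘᵇₑʳ"
normalized_name = unicodedata.normalize("NFKC", raw_name)

# String keys are NOT normalized on lookup, so the raw spelling is missing.
print(raw_name in namespace)
print(normalized_name in namespace)
print(namespace[normalized_name])
```

Code that looks names up by string in `globals()` or `locals()` therefore needs to normalize the string itself before the lookup.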

willingc approved these changes

willingc (Contributor) left a comment:

Outstanding document, @encukou. I had one small suggestion: be a bit more explicit in the normalization example about what the name normalizes to.

Doc/reference/lexical_analysis.rst

		This means that, for example, some typographic variants of characters are
		converted to their "basic" form, for example::

		>>> nᵘₘᵇₑʳ = 3

willingc (Contributor) commented Oct 22, 2025:

It would be helpful to add an explicit comment that the normalized form of nᵘₘᵇₑʳ is number.
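
For readers who want to see where the normalized form comes from, the following sketch prints each character of the example name next to its NFKC form. It assumes NFKC is the normalization applied to identifiers and relies only on the standard `unicodedata` module.

```python
import unicodedata

name = "nᵘₘᵇₑʳ"

# Show how each typographic variant maps to its "basic" form under NFKC.
for ch in name:
    basic = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {basic!r}")

# Normalizing the whole name gives the identifier Python actually uses.
normalized = unicodedata.normalize("NFKC", name)
print(normalized)
```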

encukou (Member, Author) commented Oct 29, 2025:

Does this look good?

bedevere-app bot added the awaiting merge label and removed the awaiting core review label

@encukou

Make it clear that nᵘₘᵇₑʳ normalizes to number

fce5e98

encukou mentioned this pull request in: gh-129117: Expose _PyUnicode_IsXidContinue/Start in unicodedata #140269 (Merged)

@encukou

encukou marked this pull request as draft

November 5, 2025 10:45

bedevere-app bot removed the awaiting merge label

encukou (Member, Author) commented Nov 5, 2025:

There was an insightful conversation in #140269. I'll update this PR to make things even clearer.
