Description
Feature or enhancement
Proposal:
Python currently interns certain strings, such as keywords and some ASCII/Unicode characters, as well as module-specific strings. I propose extending this interning mechanism to the string representations of operators (e.g., "+=", "==", "|=").
Rationale:
Interning these strings could improve performance, particularly in code parsing workflows, by:
- Reducing memory overhead for repeated operator strings.
- Accelerating string comparisons (e.g., during AST construction or bytecode generation).
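To illustrate the two points above, here is a minimal sketch using the public `sys.intern` function. It is not the proposed change itself, which would pre-intern these strings inside the interpreter. The helper name `build` is hypothetical and exists only to create the strings at runtime, so the compiler cannot share them as constants.

```python
import sys


def build(parts):
    # Hypothetical helper: join at runtime so each call yields a fresh object
    # rather than a constant the compiler could share.
    return "".join(parts)


a = build(["<", "<", "="])
b = build(["<", "<", "="])

# The two strings compare equal, but in CPython they are typically
# separate objects, so each one occupies its own memory.
same_value = a == b

# After interning, both names refer to one canonical object, so an
# identity check can succeed before any character-by-character comparison.
ia = sys.intern(a)
ib = sys.intern(b)
same_object = ia is ib

print(same_value, same_object)
```

With pre-interned operator strings, the lookups above would hit an existing table entry instead of adding a new one.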
Target Symbols:
The following multi-character syntactic literals (with len() > 1) are candidates for interning:
# Syntax literals
'...', '->'
# Operators
'**', '//', '==', '!=', '>=', '<=', ':=',
'+=', '-=', '*=', '/=', '//=', '%=', '**=',
'<<', '>>', '<<=', '>>=', '&=', '|=', '^='
# And maybe the character sequence used in the REPL?
'>>>'
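As a rough illustration of how often these literals repeat in a parsing workflow, the sketch below collects multi-character operator tokens with the standard `tokenize` module and interns them with `sys.intern`. The sample source and the helper name `multi_char_ops` are assumptions made for this example; the PoC may handle interning quite differently at the C level.

```python
import io
import sys
import tokenize

# Hypothetical sample input; any Python source would do.
SOURCE = "x += 1\ny += 2\nz = x == y\nw = x == z\n"


def multi_char_ops(source):
    """Return the text of every operator token longer than one character."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tok.string
        for tok in tokens
        if tok.type == tokenize.OP and len(tok.string) > 1
    ]


ops = multi_char_ops(SOURCE)

# Interning maps every occurrence of the same operator to one object.
interned = [sys.intern(op) for op in ops]
distinct_values = len(set(interned))
distinct_objects = len({id(op) for op in interned})

print(ops)
print(distinct_values, distinct_objects)
```

After interning, the number of distinct objects matches the number of distinct operator spellings, regardless of how many times each operator appears in the source.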
Proof of Concept:
A preliminary implementation is available here, demonstrating the feasibility of this change.
Considerations:
- The change would be low-risk, as it targets immutable, statically known strings.
- The impact on startup time and memory usage should be negligible, given the small set of operators.
Would this be a worthwhile optimization for CPython? I’d appreciate feedback on the idea and the PoC.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response