
gh-136595: Normalize surrogate pairs in REPL input to fix UnicodeEnco… #136639


Draft
vedant713 wants to merge 3 commits into python:main from vedant713:patch-4

Conversation

@vedant713 commented Jul 14, 2025 (edited by bedevere-app bot)

The new REPL implementation (_pyrepl) crashes on Windows when the user inputs Unicode characters outside the Basic Multilingual Plane (≥ U+10000), such as emoji (e.g. 🐍). This happens because the Windows input layer provides surrogate pairs (UTF-16 code units) that _pyrepl attempts to process and tokenize directly, leading to unpaired surrogate handling issues.

This commit introduces a `normalize_surrogates()` helper in `Reader` to explicitly normalize surrogate pairs by encoding to UTF-16 with 'surrogatepass' and decoding back. The `get_unicode()` method is patched to use this normalization so that any code consuming REPL input (e.g. syntax highlighting via tokenize) receives valid Unicode text.
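For illustration, the round trip described above could look roughly like the following. This is a minimal standalone sketch, not the code from the patch: the function is written as a free function rather than a `Reader` method, and its exact signature is an assumption.

```python
def normalize_surrogates(text: str) -> str:
    """Combine UTF-16 surrogate pairs in `text` into single code points.

    Hypothetical standalone version of the helper described above.
    'surrogatepass' lets the encoder emit surrogate code points as raw
    UTF-16 code units; the strict decoder then reads each high/low pair
    back as one non-BMP character.
    """
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


# U+1F40D (the snake emoji) delivered as two separate UTF-16 code units.
as_code_units = "\ud83d\udc0d"
combined = normalize_surrogates(as_code_units)
```

Here `combined` is the single character `"\U0001F40D"`, and text without surrogates passes through unchanged.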

This resolves UnicodeEncodeError crashes in the REPL when typing emoji or other non-BMP characters on Windows.

Fixes #136595

@bedevere-app

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@serhiy-storchaka (Member)

This implementation fails if there are lone surrogate characters. Even after fixing this, it will not completely solve the original issue for the case of lone surrogate characters -- we need to handle this at the encoding to UTF-8 step.
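Both failure modes can be reproduced outside the REPL. The sketch below assumes the strict encode/decode round trip from the PR description. Using `backslashreplace` at the UTF-8 step is only one possible way to avoid the crash, not necessarily what a final fix would use.

```python
def strict_round_trip(text: str) -> str:
    # Same idea as the helper in the PR description: strict decoding
    # rejects any surrogate that is not part of a valid pair.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


lone = "\ud83d"  # a high surrogate with no low surrogate after it

# 1. The normalization step itself fails on a lone surrogate.
try:
    strict_round_trip(lone)
    round_trip_failed = False
except UnicodeDecodeError:
    round_trip_failed = True

# 2. Even if the lone surrogate is left in the string, strict UTF-8
#    encoding fails later on.
try:
    lone.encode("utf-8")
    utf8_failed = False
except UnicodeEncodeError:
    utf8_failed = True

# One possible mitigation (an assumption, not the agreed fix): pick an
# error handler at the UTF-8 step so output never raises.
safe_bytes = lone.encode("utf-8", "backslashreplace")
```

Both `round_trip_failed` and `utf8_failed` end up `True`, while `safe_bytes` holds the escaped text `\ud83d` rather than raising.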

See also a different (regular expression based) implementation in #121219.
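As a rough idea of what a regular expression based approach can look like (this is not the code from that PR; the names below are made up for illustration), valid pairs can be matched and combined while lone surrogates are left untouched:

```python
import re

# A high surrogate immediately followed by a low surrogate.
_SURROGATE_PAIR = re.compile("[\ud800-\udbff][\udc00-\udfff]")


def combine_surrogate_pairs(text: str) -> str:
    """Replace each valid surrogate pair with the code point it encodes.

    Hypothetical helper: lone surrogates do not match the pattern, so
    they pass through unchanged instead of raising.
    """
    def _join(match: re.Match) -> str:
        high, low = match.group()
        code_point = 0x10000 + ((ord(high) - 0xD800) << 10) + (ord(low) - 0xDC00)
        return chr(code_point)

    return _SURROGATE_PAIR.sub(_join, text)


paired = combine_surrogate_pairs("\ud83d\udc0d")
unpaired = combine_surrogate_pairs("\ud83dx")
```

With this sketch, `paired` is `"\U0001F40D"` and `unpaired` is returned as-is, so any remaining lone surrogate still has to be dealt with when the text is encoded to UTF-8.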

Reviewers

@pablogsal, @lysnikolaou, @ambv (code owners; review will be requested when the pull request is marked ready for review)

Development

Successfully merging this pull request may close these issues.

Unicode characters ≥ 0x10000 cannot be inputted/behaves unusually at the REPL terminal.
2 participants: @vedant713, @serhiy-storchaka
