Use non-BOM encodings #2370

Merged
lostmsu merged 2 commits into pythonnet:master from filmor:fix-bom-strings
May 10, 2024

filmor (Member) commented May 4, 2024 (edited)

Use non-BOM encodings for both C# -> Python and Python -> C#, as the byte order is always the native one and the strings passed in either direction carry no BOM.

Fixes #2369.

filmor force-pushed the fix-bom-strings branch 2 times, most recently from 49db3bf to 07f65c7 (May 5, 2024 18:41)
The documentation of `PyUnicode_DecodeUTF16`, which we use, states that not passing `*byteorder` or passing a 0 results in the first two bytes, if they are the BOM (U+FEFF, zero-width no-break space), being interpreted as a byte order mark and skipped. That is incorrect when we convert a known "non BOM" string, which all strings from C# are.
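The behaviour described in this commit message can be reproduced from Python itself. The following is a minimal sketch, assuming a CPython interpreter where `ctypes.pythonapi` is available; the helper name `decode_utf16` is made up for illustration and is not part of pythonnet. It calls `PyUnicode_DecodeUTF16` once with `*byteorder` set to 0 (auto-detect) and once with -1 (explicit little endian).

```python
import ctypes

# Bind the CPython C API function:
# PyObject *PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size,
#                                 const char *errors, int *byteorder)
_decode = ctypes.pythonapi.PyUnicode_DecodeUTF16
_decode.argtypes = [
    ctypes.c_char_p,
    ctypes.c_ssize_t,
    ctypes.c_char_p,
    ctypes.POINTER(ctypes.c_int),
]
_decode.restype = ctypes.py_object


def decode_utf16(data: bytes, byteorder: int) -> str:
    """Hypothetical helper: decode `data` with the given *byteorder value.

    0 lets the decoder look for a BOM, -1 forces little endian,
    1 forces big endian.
    """
    order = ctypes.c_int(byteorder)
    # errors=None is passed as NULL, which selects strict error handling.
    return _decode(data, len(data), None, ctypes.byref(order))


# A string whose first character really is U+FEFF, encoded as UTF-16 LE
# without any additional BOM, like a string coming from C# would be.
raw = "\ufeffabc".encode("utf-16-le")

auto_detected = decode_utf16(raw, 0)    # leading FF FE is consumed as a BOM
explicit_le = decode_utf16(raw, -1)     # leading U+FEFF is kept as data

print(repr(auto_detected), repr(explicit_le))
```

With auto-detection the leading character is lost, while the explicit byte order preserves it, which is the difference this pull request relies on.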
filmor marked this pull request as ready for review (May 5, 2024 18:42)
filmor requested a review from lostmsu (May 5, 2024 18:44)
lostmsu (Member) commented

@filmor can you ELI5? For someone not familiar with intricacies of BOM, but aware of byte order issues.

My biggest question is whether this change has any potential to introduce bugs in handling strings that actually have a BOM. E.g. imagine a scenario where someone serialized and persisted something with a BOM using 3.0.3; after this change, if they read it back in 3.0.4, the BOM will be in their string data.

filmor (Member, Author) commented

It's the other way round. Strings that are being passed between Python and .NET are UTF-16 in the respective native byte order (usually LE), without a BOM. The functions that we were using for the conversions (in particular `PyUnicode_DecodeUTF16` and the default encoding objects from `Encoding`) try to be "smart" and will interpret a leading FE FF or FF FE as the byte order mark, removing it from the converted string. By passing the correct endianness explicitly, this behaviour is disabled.
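The same distinction exists in Python's own codecs, which makes for a quick way to see the effect without touching the C API. This is only an illustrative sketch, not the pythonnet code: the `utf-16` codec stands in for the "smart" BOM-detecting decoder, and `utf-16-le` / `utf-16-be` stand in for a decoder with explicit endianness. The variable names are made up for the example.

```python
import sys

# Pick the codec matching this machine's native byte order, mirroring
# "UTF-16 in the respective native byte order, without a BOM".
native_codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"

# A payload that legitimately starts with U+FEFF (zero-width no-break space).
original = "\ufeffhello"
wire_bytes = original.encode(native_codec)  # no BOM is prepended here

# BOM-detecting decode: the first two bytes are taken as a byte order mark
# and dropped, so the round trip silently loses a character.
smart_result = wire_bytes.decode("utf-16")

# Explicit-endianness decode: nothing is interpreted as a BOM.
explicit_result = wire_bytes.decode(native_codec)

print(repr(smart_result))     # leading U+FEFF is gone
print(repr(explicit_result))  # identical to the original
```

Only the explicit-endianness round trip returns the original string unchanged.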

filmor self-assigned this (May 7, 2024)
lostmsu merged commit 195cde6 into pythonnet:master (May 10, 2024)
27 checks passed
filmor deleted the fix-bom-strings branch (May 10, 2024 19:55)
Reviewers: lostmsu (awaiting requested review)
Assignees: filmor
Linked issue (may be closed by merging this pull request): Possible bug in reading zero width no-break space character
Participants (2): filmor, lostmsu
