The documentation of `PyUnicode_DecodeUTF16`, which we use, states that not passing `*byteorder` or passing 0 causes the first two bytes, if they are the BOM (U+FEFF, zero-width no-break space), to be interpreted and skipped. This is incorrect when we convert a known "non-BOM" string, which all strings from C# are.
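The behaviour described above can be observed from Python itself. The following is a minimal sketch, assuming a CPython interpreter where `ctypes.pythonapi` exposes the C API; the wrapper name `decode_utf16` is hypothetical and is not part of pythonnet or of this PR.

```python
import ctypes

# Bind the CPython C API function (CPython only).
# C signature: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size,
#                                              const char *errors, int *byteorder)
_decode = ctypes.pythonapi.PyUnicode_DecodeUTF16
_decode.restype = ctypes.py_object
_decode.argtypes = [
    ctypes.c_char_p,
    ctypes.c_ssize_t,
    ctypes.c_char_p,
    ctypes.POINTER(ctypes.c_int),
]


def decode_utf16(data: bytes, byteorder: int) -> str:
    """Hypothetical helper: decode UTF-16 bytes with a given *byteorder.

    byteorder: -1 = little endian, 0 = detect from BOM, 1 = big endian.
    """
    order = ctypes.c_int(byteorder)
    # errors=None is passed as NULL, which means strict error handling.
    return _decode(data, len(data), None, ctypes.byref(order))


# A string whose first character really is U+FEFF, encoded without
# any extra BOM being prepended.
raw = "\ufeffabc".encode("utf-16-le")

auto = decode_utf16(raw, 0)       # leading FF FE is treated as a BOM and skipped
explicit = decode_utf16(raw, -1)  # leading U+FEFF is kept as string data
```

With `*byteorder` set to 0 the leading U+FEFF is dropped, while an explicit byte order preserves it.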

filmor force-pushed the fix-bom-strings branch from 07f65c7 to dc6f5ef

May 5, 2024 18:42

filmor marked this pull request as ready for review

May 5, 2024 18:42

filmor requested a review from lostmsu

May 5, 2024 18:44

lostmsu (Member) commented May 6, 2024

@filmor can you ELI5? For someone not familiar with intricacies of BOM, but aware of byte order issues.

My biggest question is whether this change has any potential to introduce bugs in handling strings that actually have a BOM. E.g. imagine a scenario where someone serialized and persisted something with a BOM using 3.0.3, but after this change in 3.0.4, if they read it back, the BOM will be in their string data.

filmor (Member, Author) commented May 6, 2024

It's the other way round. Strings that are being passed between Python and .NET are UTF-16 in the respective native byte order (usually LE), without a BOM. The functions that we were using for the conversions (in particular `PyUnicode_DecodeUTF16` and the default encoding objects from `Encoding`) try to be "smart" and will interpret a leading FE FF or FF FE as the byte order mark, removing it from the converted string. By passing the correct endianness explicitly, this behaviour is disabled.
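As a rough analogy, Python's own codecs show the same difference between a BOM-sniffing decoder and one with a fixed byte order. This sketch uses only the standard `utf-16` and `utf-16-le` codecs; it does not reproduce the .NET `Encoding` side of the change, and the variable names are illustrative only.

```python
# A string that legitimately starts with U+FEFF (zero-width no-break space).
text = "\ufeffabc"

# Encode in little-endian order without prepending any extra BOM,
# which mirrors how the strings are passed between the runtimes.
payload = text.encode("utf-16-le")

# The generic codec inspects the first two bytes, treats FF FE as a BOM,
# and removes it from the result.
sniffed = payload.decode("utf-16")

# The codec with explicit endianness does no sniffing, so the leading
# U+FEFF survives the round trip.
exact = payload.decode("utf-16-le")

print(repr(sniffed))
print(repr(exact))
```

Only the decode with explicit endianness returns the original string unchanged, which is the property the change relies on.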

filmor self-assigned this

May 7, 2024

lostmsu merged commit 195cde6 into pythonnet:master

May 10, 2024

27 checks passed

filmor deleted the fix-bom-strings branch

May 10, 2024 19:55


Use non-BOM encodings #2370

filmor commented May 4, 2024 (edited)
