Use non-BOM encodings #2370
Force-pushed from 49db3bf to 07f65c7.
The documentation of `PyUnicode_DecodeUTF16` states that not passing `*byteorder`, or passing 0, causes the first two bytes to be interpreted as a BOM (U+FEFF, zero-width no-break space) and skipped if they are one. That is incorrect when converting a string known to have no BOM, which is the case for all strings coming from C#.
@filmor can you ELI5? For someone not familiar with the intricacies of BOM, but aware of byte order issues. My biggest question is whether this change has any potential to introduce bugs in handling strings that actually have a BOM. E.g. imagine a scenario where someone serialized and persisted something with a BOM using 3.0.3; after this change, if they read it back in 3.0.4, the BOM will be in their string data.
It's the other way round. Strings that are being passed between Python and .NET are UTF-16 in the respective native byte order (usually LE), without a BOM. The functions that we were using for the conversions (in particular [...]
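To make the difference concrete, here is a minimal sketch using Python's built-in codecs rather than the C API calls touched by this PR. It assumes the `utf-16` codec behaves like a BOM-sniffing decode and that `utf-16-le`/`utf-16-be` behave like a fixed-byte-order decode; the `raw` buffer is only a hypothetical stand-in for what .NET hands over.

```python
import sys

# A string whose first character happens to be U+FEFF.
# It is ordinary data here, not a byte order mark.
text = "\ufeffabc"

# Hypothetical stand-in for a .NET string buffer:
# UTF-16 in native byte order, with no BOM prepended.
native_codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
raw = text.encode(native_codec)

# BOM-sniffing decode: the leading U+FEFF is taken for a BOM and dropped.
bom_aware = raw.decode("utf-16")

# Fixed-byte-order decode: every code unit is kept as data.
fixed_order = raw.decode(native_codec)

print(repr(bom_aware))
print(repr(fixed_order))
```

In this sketch the BOM-sniffing decode silently loses the first character, while the fixed-order decode round-trips the string unchanged.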
Merged commit 195cde6 into pythonnet:master.
Use non-BOM encodings for both C#->Python and Python->C#, as the byte order is always the native one and a BOM is never needed.
Fixes #2369.
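For the Python->C# direction, the same idea can be sketched with Python's built-in codecs (again an illustration under assumptions, not the code changed in this PR): the generic `utf-16` encoder prepends a BOM that the receiver would have to skip, while an explicit native-order codec emits only the string's code units.

```python
import codecs
import sys

text = "abc"

# Explicit native byte order: no BOM is written.
native_codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
without_bom = text.encode(native_codec)

# Generic codec: a native-order BOM is prepended to the payload.
with_bom = text.encode("utf-16")

print(len(without_bom), len(with_bom))
print(with_bom[:2] == codecs.BOM_UTF16)
```

The two results differ only by the two leading BOM bytes, so choosing the explicit byte-order codec avoids any need to strip them afterwards.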