bpo-46659: Update the test on the mbcs codec alias #31168
encodings registers the _alias_mbcs() codec search function before the search_function() codec search function. Previously, _alias_mbcs() was never used.

Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code page, not a fake ANSI code page number.

Remove the test_site.test_aliasing_mbcs() test: the alias is now implemented in the encodings module, no longer in the site module.
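Why registration order matters can be shown with a small sketch. It does not use the actual encodings code: the two search functions and the codec name "demoansialias" are hypothetical names made up for this illustration. It assumes only that codecs.lookup() tries registered search functions in registration order and stops at the first one that returns a CodecInfo.

```python
import codecs

calls = []  # records which hypothetical search function was consulted


def first_search(name):
    # Hypothetical search function, registered first.
    calls.append(("first", name))
    if name == "demoansialias":
        return codecs.lookup("cp1252")
    return None


def second_search(name):
    # Hypothetical search function, registered second; it would resolve
    # the same name differently, but is never reached for it.
    calls.append(("second", name))
    if name == "demoansialias":
        return codecs.lookup("latin-1")
    return None


codecs.register(first_search)
codecs.register(second_search)

info = codecs.lookup("demoansialias")
print(info.name)
print(calls)
```

The function registered first answers the lookup, so the second one is never called for that name. By the same logic, a search function registered after one that already resolves a given name never runs for it.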
# The encodings module create a "mbcs" alias to the ANSI code page
codec = codecs.lookup(encoding)
self.assertEqual(codec.name, "mbcs")
This was never true before. With 1252 as my ANSI code page, I checked codecs.lookup('cp1252') in 2.7, 3.4, 3.5, 3.6, 3.9, and 3.10, and none of them return the "mbcs" encoding. It's not equivalent, and not supposed to be. The implementation of "cp1252" should be cross-platform, regardless of whether we're on a Windows system with 1252 as the ANSI code page, as opposed to a Windows system with some other ANSI code page, or a Linux or macOS system.
The differences are that "mbcs" maps every byte, whereas our code-page encodings do not map undefined bytes, and the "replace" handler of "mbcs" uses a best-fit mapping (e.g. "α" -> "a") when encoding text, instead of mapping all undefined characters to "?".
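A minimal sketch of the "cp1252" side of this comparison, which runs on any platform. The "mbcs" call is guarded because that codec exists only on Windows, and its result depends on the machine's ANSI code page, so it is printed rather than asserted. The variable names are illustrative only.

```python
import codecs
import sys

# The Python-implemented code-page codec keeps its own name.
cp1252_name = codecs.lookup("cp1252").name

# Byte 0x81 is undefined in Python's cp1252 codec, so strict decoding fails.
try:
    b"\x81".decode("cp1252")
    undefined_byte_decodes = True
except UnicodeDecodeError:
    undefined_byte_decodes = False

# With the "replace" handler, an unencodable character becomes "?".
replaced = "α".encode("cp1252", "replace")

print(cp1252_name, undefined_byte_decodes, replaced)

if sys.platform == "win32":
    # "mbcs" is available only on Windows. The output depends on the
    # current ANSI code page and may be a best-fit character.
    print("α".encode("mbcs", "replace"))
```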
This issue is worse than I expected. I created https://bugs.python.org/issue46668 to discuss it.
# On Windows, the encoding name must be the ANSI code page
encoding = locale.getpreferredencoding(False)
self.assertTrue(encoding.startswith('cp'), encoding)
This will fail if PYTHONUTF8 is set in the environment, because it overrides getpreferredencoding(False) and _get_locale_encoding().
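The override can be observed with a short sketch that starts a child interpreter with PYTHONUTF8=1 and prints what locale.getpreferredencoding(False) reports there. It assumes Python 3.7 or later, where UTF-8 mode exists. The spelling of the result differs between versions ("UTF-8" or "utf-8"), so it is lowercased before comparing.

```python
import os
import subprocess
import sys

# Run a child interpreter with UTF-8 mode enabled through the environment.
env = dict(os.environ, PYTHONUTF8="1")
result = subprocess.run(
    [sys.executable, "-c",
     "import locale; print(locale.getpreferredencoding(False))"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
reported = result.stdout.strip()
print(reported)

# In UTF-8 mode the reported name does not start with "cp", even on
# Windows, so an assertion like encoding.startswith('cp') would fail.
starts_with_cp = reported.lower().startswith("cp")
print(starts_with_cp)
```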
Move the test on the "mbcs" codec alias from test_site to
test_codecs. Moreover, the test now uses
locale.getpreferredencoding(False) rather than
locale.getdefaultlocale() to get the ANSI code page.
https://bugs.python.org/issue46659